Cellular Automata and Parallel Processing for Practical Fluid-Dynamics Problems H. Abarbanel K. Case A. Despain F Dyson M. Freedman C. Max D. Nelson 0. Rothaus September 1990 JSR-86-303 This report was prepared as an account of work sponsored by an agency of the United States Government Neither the United States Government nor any of their employees, makes any warranty. express or implied, or assumes any legal liability or responsibility for the accuracy completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights Reference herein to any specific commercial product, process, or service by trade name, trademarx, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof JASON The MITRE Corporation 7525 Colshire Drive McLean, Virginia 22102-3481 (703) 883-6997 i -ppT.e pub* ,i. T REPORT DOCUMENTATION PAGE Form Approvedi O0M8 No. 07(4-0188 I pW&K te",inin burdenl for tis CoificCioE of informsation i t ited to aefeWI "0ta ou OtrcIoome. inciudinq ivtie fm.9orreiro...q instrojoni.m ,tachrnq "ntinq data souc.V re ieno u udnetmte~a' n and.re*Ce-nq the. coliition of -nformaton 1efld rommntSre9~l gatha.nq and maintaining fthe data neded. and CORN01011ing ,oiptdro,rotormstoe. *nidadir'q *, qqasf1tOes for reducing [PIPS burden. to WosnnAQoa, n*addaarfe,s $vinefen. Oireeorar o nomation, Oinatioms and itecnu. 121 ) W. oenfs HNgqvSV. Suit# 104. Atrnqon. VA 22202-4302. and to theaOffice of Manaqcemept and Suoaqvit faoerwora atdaucion Proft (07044188S). Woiasfuosqnoas OCJOS03 i. AGENCY USE ONLY (Leave blank) 2REPORT OAT4 ISeptember 3. EPORT TYPE AND DATES COVERED 9 -0, 199 4. TITLE AND SUBTITLE S. FUNDING NUMBERS Cellular Automata and Parallel Processing for Practical Fluid-Dynamics Problems P 6. AUTHOR(S) 53 H. Abarbanel etal 7. PERFORMING ORGANIZATION NAME(S) AND ADORESS(ES) 3. PERFORMING ORGANIZATION REPORT NUMBER The MITRE Corporation JASON Program Office A10 7525 Coishire Drive McLean, VA 22102 JSR-86-303 9. SPONSORING. MONITORING AGENCY NAME(S) AND ADDRESS(ES) of -AGENCY Depatmen 10. SPONSORING/ MONITORING REPORT NUMBER ~j= JSR-86-303 Washington. DC 2 111.SUPPLEMENTARY NOTES 12b. DISTRIBUTION CODE 112a. DISTRIBUTION/ AVAILABILITY STATEMENT -r,.s report %as prepared as an account of work sponsored by an agency of the United States Goversnent Neither the liability epresents ma'. as use would not infringe pnnaiely owenedrights Reference herein to any specific commercial product. rnicess, or sersicc ns trade name, trademark, mnanufacturer, or otherwise. does not necessarly constitute or imply its ,ndorsemen:. rccormedation. or favoing by the United Staies Govemnment or any agency thereof The views and opinions ai~viors exprcssed herein do not nececssarcy suite or reflect those of the Uruted States Gonemment or any agency, thereof 13. ABSTRACT (Maximum 200 words) During the 1986 JASON Summer Study a group of JASONs undertook to examine, under the sponsorship of the Department of Energy and DARPA )the utility of cellular automata in physical science calculations, especially in fluid dynamics. We expanded the scope ofour study to include several related topics which we concluded would be of some interest to our sponsors. These include: (1j a comparison of cellular automata (CA) techniques to "conventional" methods of solving the partial differential equations of fluid dynamics or other physical si'tuations, and (2) the utility and status of using parallel or concurrent processing machines for doing either CA or conventional fluids calculations. 1S. NUMBER OF PAGES 14. SuBJ ECT TERMS cellular automata, vlsi chip, impressible fluid algorithrnns; two, three, or four dimensions; octogonal quasilattice 17. SECURITY CLASSIFICATION OF REPORT UNCLASSIFIED NSN 7547' '-5500 I 13. SECURITY CLASSIFICATION OF THIS PAGE UNCLASSIFIFD ___________ 16. PRICE coDE 19. SECURITY CLASSIFICATION IUN('LASSIFIFI) OF ABSTRACT 20. LIMITATION OF ABSTRACT SAR Stanoard Fort" 298 (ROV "V5Coom Or A*NP6%to 1)9. I 2-89) Contents 1 INTRODUCTION 2 1 CELLULAR AUTOMATA RULE FINDING FOR USE IN 3-D FLUID DYNAMICS 5 2.1 3-D Cellular Automata ...................... 12 3 A VLSI CHIP FOR A CELLULAR AUTOMATA MACHINE 3.1 Introduction .. .... .. ... ...... ....... 3.2 Computational Cell for 4-D Rules ................ 4 .. .. GENERAL COMPARISON OF CELLULAR AUTOMATA AND CONVENTIONAL FLUID-DYNAMICAL METHODS 4.1 Cellular Automata: Potential Advantages ............ 4.2 Cellular Automata: Disadvantages ............... 4.3 Conventional Methods: Advantages ............... 4.4 Conventional Methods: Disadvantages ............. 4.5 Comparison on the Basis of Computational Work ........ 4.5.1 Number of Grid or Lattice Points ............ 4.5.2 Number of Bits per Lattice or Grid Point .......... .... 17 17 19 27 27 27 28 28 29 29 33 4.5.3 Number and Speed of Operations per Timestep 4.5.4 4.5.5 4.5.6 Number of Timeteps ................... 38 Ease of Adapting to a Parallel Computation Environment 39 Overall Comparison of Computational Effort ....... 40 36 5 QUASILATTICES FOR CELLULAR AUTOMATON FLUID CALCULATIONS 47 5.1 Introduction . . . . . . . . .. . . . .. .. .. .. . .. . . .. 47 5.2 Octagonal Quasilattices ..................... 49 5.3 Icosahedral Quasilattices ..................... 59 6 COMPARISON BETWEEN CONVENTIONAL KINETIC THEORY AND CELLULAR AUTOMATA DERIVATIONS OF HYDRODYNAMICS 69 ill 6.1 6.2 6.3 6.4 6.5 6.6 7 Introduction ............................ Equations for One Particle Distribution Function .. .. .. .. .... ... ... .... ... 6.2.1 Molecules .. .. .. ... .. ... ... ... ... .... ... .. .. ... 6.2.2 C.A. ...... ... Macroscopic Conservation Laws. .. .. .. ... ... .... .... ... ... 6.3.1 Molecules .. .. .. ... .. ... ... .... ... ... 6.3.2 C.A. .. .. .. .. ... ... .... ... .... The Euler Equations. .. .. .. ... ... ..... .... ... ... 6.4.1 Molecules .. .. .. ... ... .. ... ... .... ... 6.4.2 C.A. .. .. .. .. ... .. ... ... The Chapman-Enskog Expansion .. .. .. ... .... ... ... .... ... 6.5.1 Molecules .. .. .. ... .. ... ... ... ... .... 6.5.2 C. A. .. .. .. .. ... .. ... ... ... ... ... Conclusion. .. .. .. .. .... 69 70 70 71 72 72 73 74 74 75 76 76 78 80 83 PACKING A PLANAR LATTICE Accession For V0 -N~~tTIS G-RA &I DTlC TAB0 uniannounced Justifioatio 0l Sponsoring/Monitoring agency was U. S. Dept. of Energy, office of Program Analysis/ER- 3 2 . Washington, DC 20585. By Di tributiol/ Dist. "A" per Charles mandelbaumn. Office of Program Analysis/ER-32. U. S. DeparteahngoD ment of Energy/ER32 20585. AvailabilitY Codes jAvail and/or ctt Special VHG iv 11/07/90 1 INTRODUCTION During the 1986 JASON Summer Study a group of JASONs undertook to examine, under the sponsorship of the Department of Energy and DARPA, the utility of cellular automata in physical science calculations, especially in fluid dynamics. We expanded the scope of our study to include several related topics which we concluded would be of some interest to our sponsors. These include: (1) a comparison of cellular automata (CA) techniques to "conventional" methods of solving the partial differential equations of fluid dynamics or other physical situations; and (2) the utility and status of using parallel or concurrent processing machines for doing either CA or conventional fluids calculations. To assist our study we hosted a series of external briefers who kindly gave of their time and expertise. On the subject of CA and applications to fluid dynamics we heard from Dr. Jay Boris of the Naval Research Laboratory who spoke on "Cellular Model for Tasking and Correlation," Dr. T. Toffoli of MIT who spoke about "Primitives of Computation and of Physics as Applied to Cellular Automata," Dr.Gary Doolen of the Los Alamcs National Laboratory who addressed us on the topic of "Lattice Gases," Dr. S. Wolfram of the University of Illinois speaking on "CA and Hydrodynamics," Dr. S. Omohundro of Illinois speaking on "Applications of the Connection Machine Architecture," Dr. B. Nemnich who spoke on "The Connection Machine," and Dr. P. Collela from the Lawrence Livermore National Laboratory speaking on "Multiple Scale Problems in Hydrodynamics." On the subject of parallel processing we heard, in addition to the talks of Nemnich and Omohundro, from Dr. J. Barhen of Oak Ridge National Laboratory on "Hypercube Computer: Architecture and Algorithms for Advanced Applications," and from Dr. J. Fier of the AMETEK Computer Research Division on "Hypercube Architecture and Applications." To all these workers in the field we give our thanks for their assistance. The work reported on in the present study was begun at JASON in the summer of 1986. Since that time there has been a considerable maturation of the field of cellular automata. The reader desiring further background may refer to two excellent review volumes: 21 1. Complex Systems volume 1, no. 4 (August 1987); this volume is based largely on presentations of a workshop held in Santa Fe, NM in October of 1986. 2. Lattice Gas Methods for Partial Differential Equations, edited by G. Dollan et al. (Addison-Wesley, N.Y., 1989). Our own work has been organized along the following lines: " We have examined the relationship, in terms of effort and efficiency, of doing fluid dynamics calculations using cellular automata versus more conventional spectral or finite difference methods. We conclude that cellular automata calculations are likely to be competitive with standard finite element or spectral methods for the Navier-Stokes equations primarily for low Reynolds numbers and Mach numbers. An exception to this may occur in complex geometries or with boundary conditions where conventional methods are often quite difficult. " We have reviewed the derivation of ordinary Navier-Stokes Newtonian fluid dynamics from kinetic theory and compared it to the derivation of fluid dynamics from CA. Only for low Mach numbers do we have some confidence that fluid dynamics is being simulated in these calculations. " We have looked into two schemes for carrying out CA calculations in three dimensions-one uses the notion of tiling or covering 3-D space with a quasi-periodic lattice of Penrose type, and the other investigates the idea of doing CA computations in dimensions larger than three and then projecting the results back into three dimensions. In each case the issue is achieving enough structure in the underlying covering of 3-D space to assure correct tensorial characteristics of the quantities entering the Navier Stokes equations. " We have formulated in an abstract fashion the problem of representing in a physical 3-D computational structure 2-D highly parallel CA computations. Perhaps it is useful to define what we mean by some of the terms used already in this introduction, especially those which will appear throughout the report. First of all, we need to address the meaning of cellular automata. 2 The idea of cellular automata is that by using very restricted information on the position and velocities of individual particles in their microscopic collision and interaction one can, nonetheless, arrive at a good representation of macroscopic dynamical equations such as the Navier-Stokes equations, since the latter involve averaging over larger numbers of individual particles. Fr- thermore, the averaging is over space and time scales large compared to the microscopic dynamics, so one might think that a crude representation of the latter could result in realistic and acceptable macroscopic physics. The generality of macroscopic equations such as the diffusion equation or the fluid dynamics equations which are parametrized by a few transport coefficients in which all the microphysics is buried would also support this point of view. The key notion of CA, as opposed to the ideas of molecular dynamics, is to very crudely represent each individual particle motion. Crude here means giving the position on a simple lattice covering the space of interest and giving the velocity as one of a few choices such as plus or minus unity. In other words, by representing the phase space coordinates of individual particles by only a few bits of information, one hopes that the aggregate average needed to construct the macroscopic velocity or density is accurately and efficiently computable. 3 2 CELLULAR AUTOMATA RULE FINDING FOR USE IN 3-D FLUID DYNAMICS There is some difficulty in finding simple cellular automata dynamics in three dimensions which adequately simulate the Navier-Stokes equation. From a purely formal (i.e., nonphysical) point of view, there are a number of ways of overcoming these difficulties, and as we will show, many of the purely formal procedures have a straightforward physical interpretation. As a consequence of these investigations, we can present what appears to be the simplest, and most straightforward, cellular automata simulation of 3-D fluid motion. To begin, whether in two- or three-dimensions, our lattice sites will always be the points with all integer coordinates, i.e., the usual integral lattice. In addition are given a collection of vectors e 1 , e 2 , ... eN each vector belonging to the lattice. In the list, a given vector may occur repeatedly. At any instant of time, the total state of our automaton is described by an N-dimensional vector of zeros and ones given for each lattice site. In other words, the state of our automata at time t is described by specifying A,(x, t), a = 1, 2,... N, and x running through the lattice sites. For fixed x and t, the vector [A,(x,t),A 2 (X,t), ... ,AN(X,t)] is called the state vector at site x, time t. The evolution or dynamics in our automaton is given a. follows: A(X,t+ 1) = F,,a = 1,2... N, where each F. is a fixed function of the state vectors of the automaton at time i at sites in a fixed neighborhood of x. What we mean to say here is simply that the rule for advancing in time is the same function of neighboring states at each site and all times. You may think of the Fa as including both the collision laws and the motion of the more conventional description of cellular automata, but in general they have no such simple physical description. In order to bring out clearly what the essential properties of the dynamics are, we will write: Fa = A(x-ea,t)/fl 5 (2-1) and demand that identically for all total states of the automata at time t, we have: no = 0 (2-2) 0. (2-3) a eafl. = a The I's are simply the measure of the difference in the dynamics of the automaton from straight collisionless motion of the particles. They are rarely mentioned specifically for most of the familiar constructs, but they do play an important formal role, as we shall see momentarily. It is quite easy to see that conditions in Equation (2-2) are met, as a matter of fact, for the familiar square and hexagonal lattice fluid dynamics constructs, without even specifically bringing in the QVs. For our original list of vectors e', we are going to insist that all the tensors a Ei eO, e 0 ea, Ea e a e & ea and eie ,e ea(ea be isotropic. Thus, for example, if we take in two dimensions the 20 vectors: (±1, ±1) once each, (±1,0) 4 times each, and (0, ±1) 4 times each, we satisfy the isotropy requirements. In three dimensions, we may take the 24 vectors (±1,±1,0) once each, (±1,0, 0) twice each, (0, ±1,0) twice each, and (0, 0, ±1) twice each to satisfy the isotropy requirements. In general, for a given set of e's, it seems quite easy to produce a large number of Fa's satisfying the requirements (2-1) and (2-2) above. Subsequently, we will produce Fe's for the e's described in the paragraph immediately above. Suppose we assume that our automata locally equilibrate in space. Denote by E the admittedly vague notion of expectation operator for the local spatial equilibration, and put fa(x,t) = E(A,(x,t)). By virtue of the re- quirements (2-1) and (2-2), we obtain: Efa(xlt+ 1) = a Eeafa(x,t a E f.(Xea, t) (2-4) a + 1) = Eefa(..X-ea,t). a 6 (2-5) Denote n(x,t) = E.fa(x,t) and n(x,t) u(x,t) In the continuum limit of long times and large lattices, the equations above become: S-+V-nu = Eefa(X,t). = 0 (2-6) -j(nu) + Ee(e.Vf.) = 0. (2-7) a It is convenient to refer to n(x, t)as particle density, u(x, t) as average velocity, and nu as momentum density. With this convention the requirements (2-1) and (2-2) provide for conservation of particle number and momentum. For if we pick an initial state for the automaton in which only finitely many of the state vectors are non-zero, we get from (2-1) that n(xt + 1) = n(xt) or in the continuum hrmit d J n(x, t)dx-= 0. Similarly (2-2) gives conservation of momentum in time. Analogously, if the initial state for the automaton is periodic in the spatial variables, so also are subsequent states and with the same periods, and the total particle number and total momentum over a periodic rectangular parallelogram or parallelepiped is conserved in time. Imposing periodocity is equivalent, of course, to running the dynamics on a 2- or 3-D torus. At this point, we make the bold assumption that the functions fa(x, t) can be determined from the equilibrium parameters u(x, t) and n(x, t) by a Chapman Enskog expansion. Arguing just as in Wolfram 1 we end with Navier-Stokes like equations for a continuum fluid. For the rest of this section, we are going to show that, at least in some cases, the formal procedures described above really work. Along the way, we will produce a particularly simple way of running 3-D fluid dynamics at the cellular automata level. We will begin with a 4-D cellular automaton with the lattice sites being as usual the points with all integral coordinates. Lattice points are labelled (x,y,z,w). We take the sublattice of integral points for which the coordinate 7 sum is even, which is of index 2 in the full integral lattice. This lattice is connected with a very interesting tessalation of tUe 4-D space, but this does not concern us here. In the sublattice we take all points of least nonzero distance from the origin, these being precisely the 24 vectors: (±1, ±1,0, 0) and its permutations. These 24 vectors are the list ea, a = 1,2,..., N = 24. It is easy to verify that the requirements of isotropy are met by this list, that is, for example: E(,a)(,a)(e' ) (,a), = i3'kt + bikbit + bitbik. a There are a number of available scattering laws to give a good momentum scramble, e.g.: A) Binary: (1,-1, 0,0)+ (-1, 1, 0,0) - (0, 0, 1,-1) + (0, 0,-1, 1) B) Ternary: (1, -1,0, 0) + (0, 1, -1,0) + (-1, 0, 1, 0) - (-1, 1, 0, 0) + (0, -1, 1, 0) + (1, 0, -1, 0) C) Ternary: (1, 0, 0, -1)+ (1, 0, 0, 1) + (-1. 1, 0, 0) 4--* (0, 1, 0, 1) + (0, 1, 0,-i) + (1,-1, 0, 0) and D) Ternary: (1, 0, 0,-1) + (1, 0, 0, 1) + (-1, 1, 0, 0) +-+(1, 1, 0, 0) + (0, 1, 1, 0) + (0, -1, -1, 0) and so avoid undesirable conservation laws. If we run a 4-D cellular automaton with a suitable collection of such scattering laws on the integral lattice, there is however one inevitable conservation law, which will not concern us in the applications we make. This arises simply from the fact that the dynamics takes place in two uncoupled sets, namely the sublattice of points whose coordinate sum is even and the coset of points whose coordinate sum is odd. The application of this 4-D cellular automaton to 3-D problems is now quite straightforward. Take any initial state in the 3-D section (x,y,z,w = 0), and extend it to 4-D by repeating it in every section w = k, k an integer. If we have deterministic scattering laws, then throughout the evolution all sections w = k remain the same as the section w = 0. [Even if we have probabilistic scattering laws, we can maintain the identity of sections by insisting that the choice made at a site (x,y,z,0) in the section w = 0 be repeated at all sites (x,y,z,w)l. 8 All sections remain the same, thus macroscopic particle density, momentum density and average velocity vector are independent of w. since we get the same average over a region and any translation of the region along the w-axis. The even sublattice and its colattice are now coupled together because the sites (x,y,z, even) and (x,y,z, odd) have the same state. If we now look at the 4-D Navier Stokes equation which our 4-D automaton is presumably simulating, then since density and momentum density are not functions of w, the projection 7ru of the velocity vector u into the section w = 0 together with the original density n, satisfy a Navier Stokes like system in 3-D. Since the sections w = k remain the same throughout the evolution of the 4-D automaton, the dynamics can be fully described in the section w = 0. How do we do so? We need to describe the state at each 3-D site with a 24 bit vector, corresponding to presence or absence of a particle headed in direction ea;a = 1,2,... ,24. Since we will be concerned only with the projection of the 4-D momentum into the section w = 0, we project the 24 vectors ea to get: (±l, ±1, 0) (±1,0, ±1) (0, ±1, ±1) (±1,0, 0) (0, ±1, 0) (0,0, ±1) once each once each once each twice each twice each twice each. The versions i the projected vectors which occur twice are to be labelled; we call one spin plus, the other spin minus, depending on the sign of the invisible 4th coordinate. The scattering laws such as (A), (B), (C), (D) described earlier are replaced by their versions with the last "oordinate suppressed, but the vectors therein labelled spin plus or minus as required. The 3-D motion after scattering takes place along the 24 projection of the vectors F a. Here we have precisely what was described only as a possibility earlier, namely a 3-D Navier Stokes simulation using some vectors ea repeatedly. The 3-D isotropy is obvious. 9 The 2-D simulation using 20 vectors, described earlier, can be simulated by projecting once more, into the section z = 0. It is rAatively easy to see that the four rest particles arising from the projection of (0,0, ±1), can be dispensed with altogether, but we do not go into details. As matters currently stand, we can run our 3-D Navier Stokes simulation with a 24-bit vector to describe the state at each site. We want to argue now that 24 can be reduced to 18. To do so, we return to our 4-D simulation, with all sections w = k, k an integer, the same. Suppose for the moment that the state vector at each site in the section w = 0 is reflectively symmetric. By this we mean that if one of the particle directions such as (1, 0, 0, 1) occurs, so also does (1, 0,0, -1). This being the case, the state vector can be described by an 18 bit vector. Suppose also that the zattering rules we choose to use are reflectivity symmetric. By this we mean that for every scattering law, the law obtained by changing the signs of last coordinates is also a scattering law we use. (With some care, probabilistic scattering laws can be handled). All this being so, it is easy to see that in the evolution of the 4-D automata, the section w = 0 remains reflectively symmetric, and so the 3-D dynamics needs only 18 bits per site. At the same time, we must assure ourselves that we have enough applicable scattering laws to avoid undesirable conservation laws. It is clear that we will not succeed by using only those vectors which occur singly. But the purpose of our writing down scattering laws (C) and (D) earlier was to persuade the reader now that enough scattering survives, these being two instances of scattering laws in which the spin plus and spin minus directions occur in pairs. Actually, a binary law, such as (1, 0, 0, +1) + (1, 0, 0, -1) ( (1, 1, 0, 0) + (1, -1, 0, 0), will eliminate unwanted conservations. It is interesting to note that in running a 4-D reflectively symmetric cellular automaton with all sections w = k the same, the macroscopic momentum density has no component in the direction of the w-axis. Now that we are done describing our 3-D simulation, it all seems quite trivial. We have simply reversed a standard device in fluid dynamics. If, for example, one wishes to describe a 3-D planar flow about a cylindrical obstacle, one reduces to a 2-D problem. We have, perversely, taken a 3D problem and imbedded it in a 4-D hyperplane flow setting, to gain the advantage of the useful lattices existing in four dimensions. 10 There is another 4-D sublattice of the integral lattice which offers some promise. Essentially the dual of the sublattice considered earlier, it consists of points all of whose coordinates are even, or all of whose coordinates are odd. It is of degree 8 in the full integral lattice. The vectors to consider in this instance, 24 in number also, are (±2, 0, 0, 0) and its permutations and (±,+1, ±1, ±1). These vectors give the desired isotropy, and projecting into w = 0 gives two rest particles, 8 particle directions which occur twice, and 6 particle directions occurring once. The rest particles can be eliminated; running the remainder demanding reflective symmetry as before will give us a 3D simulation requiring a 14 bit vector to describe the state at each site. Scattering laws such as (1111) + (111 - 1) + (-2000) -, (-1111) + (-111 - 1) + (2000) will give us some momentum scramble, but we do not appear to have an analogue of the very useful scattering law (D) used in the earlier lattice. Without such analogue, the total number of particles in all of the directions given by the 8 paired spin plus and spin minus directions is conserved, and this gives us a conservation law. The use of higher order scattering laws can eliminate conservation. The fact that the 3-D dynamics takes place in 4 uncoupled colattices presents no difficulties; restrict one's attention to the sublattice: all coordinates even, or all coordinates odd. It is appropriate to describe at this point what we feel is the right way to search for conservation laws. Recall that there is an assumption underlying the whole discussion of cellular automata, namely that particle number and momentum are the only conserved quantities. Now we suppose that the automaton dynamics is broken into two parts, the applications of the scattering laws followed by the lattice motions. At an instant in time we have a state vector A(x, t) which after scattering becomes A'(x, t), followed by motion to neighboring sites, so that Ao(x, f+ 1) = A'(xea , t). Now suppose there is a vector v such that EvaAa(x, a t)= Ev°A.(x,t), a 11 no matter what x and t, and such that also E. voA.(x, t) is not identically zero. Then we have a particle number conservation law, since now VaA(x,t + 1)= vAo(x - e.,t). a Integrating over all space and taking expectations yields J v.- f (x, t)dx-=f v.- f (x, t + dx which is not the usual conservation law if v • f(x, t) is not a scalar multiple of n(x,t). A law different from the usual momentum conservation law would arise from a vector v such that ZVae'A (x,t) = Zvae AA(x,t). a a On taking the dot product of the last with a fixed arbitrary vector, we would find a particle conservation law of the sort considered just above, so it suffices to search for particle number conservation laws. As noted above, these arise from vectors v orthogonal to A(x, t) - A'(x, t) for all possible x and t. 2.1 3-D Cellular Automata We will now discuss in greater detail some of the technical issues involved in efficient use of 3-D cellular automata for simulating fluid flow, confining our attention to the two kinds of automata described in Section 2. The first of these is associated with the 4-D lattice of integral points whose coordinate sum is even and for which we pick lattice motions corresponding to the 24 directions (±1, ±1,0, 0) and its permutations, this choice assuring sufficient isotropy of the flow to mimic the Navier-Stokes equations. The second choice is the lattice of integral points for which all coordinates are even or all are odd. In this case we pick 24 lattice motions associated to the vectors (±2, 0, 0, 0) and its permutations and (±1, ±1, ±1, ±1), enough again to insure isotropy. In both cases we are going to use the 4-D automaton to simulate 3-D problems. This is effected by choosing initial conditions in the section w = 0, and repeating them in all the other sections w = k. If the scattering law used at the site (a, b, c, k) is the same for all k at each fixed instant in time, 12 then all sections remain the same in the evolution of the automaton, and it is enough to simply follow the evolution in the section w = 0. In order to further reduce the computational burden, the following useful device, already pointed out in Section 2, if a particle is needed in direction (a, b, c, d), there is also one headed in the direction (a, b, c, -d). For the first lattice mentioned above, this convention permits the state at a 3-D site to be described by an 18-bit vector. For the second, the state at a site requires 16 bits. But since it is easy to see that the effect on the evolution of the automaton arising from the pair of directions (0, 0, 0, ±2), if these are not used in any scattering laws, is irrelevant and does not create any unwanted conservation in the 3-D section w = 0, the number of bits necessary to describe the state at a site can be reduced to 14. For the second lattice the pair of directions (0, 0, 0, ±2), which look like rest particles from the 3-D point of view, have simply been dropped. For either lattice, a pair of directions such as (a, b, c, 1) and (a, b, c, -1), both of which occur at a site, or neither, is called a "married pair." For both lattices, a variety of scattering laws are available, much more so than in the simple 2-D hexagonal lattice, even with the restrictions created by the married pairs. The problem, as we see it, is to have a rich enough collection of scattering laws to eliminate any undesirable particle or momentum conservation laws, but not so many that the computational logic at a site becomes unwieldy or too slow. Part of the difficulty in achieving this goal is a consequence of the observation that some randomization is needed in applying scattering laws. For example, if the pair (1, 1, 0, 0) and (-1, -1, 0, 0) are present at a site, they can be replaced by any other velocity vector pair which sums to zero. How shall the choice be made? Similarly, we may have scattering laws: (A) (1, 1, 0, 0) + (-1,-1, 0, 0) (B) (1, 1, 0, 0) + (1,-1, 0, 0) 4-- (1, 0,-1, 0) + (-1, 0, 1, 0) (1, 0, 1, 0) + (1, 0,-1, 0). If all three velocity vectors on the left are present at the site, and none of those on the right, which of the two laws should be applied? 13 Recall that the ultimate macroscopic features of the flow arc to be obtained by averaging over suitable sub-regions of the 3-D grid. If the scattering is very limited, or not always applied when available, then mean-free-paths will be quite long; consequently the suitable sub-regions of the grid will have to be undesirably large. On the other hand, attempting to put in all the scattering available will require intricate combinatorial decisions, and a fair amount of randomization. For these and other reasons, it is desirable to keep the scattering laws as simple as possible. The simplest scattering laws are, of course, binary scatters, and physically these are the most likely to occur. It is fortunate that for the first of the two lattices described earlier, with 18 bits per site for the state vector, the use of binary scattering only suffice. We describe the situation here in a little more detail now. We can interchange any of the four following pairs 1 (I) 1 0 0 -1 -1 -1 0 0 -1 1 0 0 1 0 1 0 0 1 0 M 1 0 0 1 0 1 0-1 -1 or any of the following three pairs (m (i) 1 1 0 0 1 -1 0 0 ) (1 1 0 1 0 0 -1 0 1 0 0 ) 1) 0 0 -1 (1 and analogously, 0 0),0 --1IIi 11 0 0 (V (I)(1 1 0 0 0 1 0 1 1 -1 0 ) (0 0 0 0 1 1 0 0 1 0 0 1 0 )'0 -1 1 0 )'0 1 0 1 0 1 -1 ) 1 ), an d 1 1 -1 (I) gives us 6 binary scatters, (II) to (VII) give us 3 binary scatters each, for a grand total of 24 scatters. Notice that these scattering laws meet the restriction imposed by the married pairs. It is also possible to verify that these binary laws are sufficient to avoid undesirable conservation laws (as described in Section 2); it is the exchanges offered by (II) through (VII) which mix up the "married pairs" with the "single" velocity vectors. We have not investigated the question of how many of the binary scatters can be dropped while still avoiding conservation laws, though it is clear that some can be. We could, for example, rather than permit the interchanging 14 0 1 0 of all the pairs of (I), interchange only pair 1 with pair 2, 2 with 3, ... , pair 4 with pair 1. Even with limitations such as this, the question of which interchange is to be made has to be decided with a randomization; if the second pair is present, and not the first or the third, which of the two interchanges should be effected? It is problematic whether using a thin set of the binary scattering laws, big enough to avoid conservation, will give us results as favorable as using them all. We do not see any particular advantage to limited use, and are going to proceed on the basis of using all the binary laws, and on something like an equal footing. Now we describe one scheme for using all of the 24 laws of the first lattice. The laws are labeled from one to twenty-four. A random number generator, somewhere on the side, picks a number between one and twenty-four, one for each site, and applies the selected scattering law if it can be applied; otherwise no scattering takes place. The random choice of scatter at each site is followed by motion of particles to neighboring sites, and then repetition of the procedure. In practice it may be desirable to have the random selection of the integer between one and twenty-four other than uniform. Since there are a very large number of fast methods for generating random numbers with frequencies, we will not go into these questions here. There is, nonetheless, quite a large bit of randomization-one generator for each site. Possibly the randomization can be carried out intrinsically as follows: use the bits, and their complements, in the far field of a site to generate the random number at the site. There are obvious risks and deficiencies in this procedure, however. For some purposes, the method outlined above will not effect scattering with sufficient frequency, for an available scattering law at a site has roughly a probability of 1/24 of taking place. At the cost of slowing the procedure down, a straightforward alternative is available as follows. After the first choice of scattering law is made and applied if possible, and before the particle motion to neighboring sites, a second independent choice is made of binary scatter and applied if possible. This may be repeated as many times as necessary before motion. An attractive feature of this scheme is that the number of repetitions may be settable by the program. Whether the scheme is any better than running the simulation faster with only one randomization 15 per motion depends on the time trade-off for scattering steps compared to motion steps. There is still another way of proceeding which in a sense eliminates the need for randomization. One may simply choose, once and for all, at each site in the field one of the 24 binary scattering laws, the choice to be made as randomly as possible. This is probably the cheapest and fastest way to simulate, though it might be useful for some purposes if the fixed random allocation of scattering laws to sites could be easily changed. So much for the first lattice. The second lattice has a smaller state vector at the site, but is considerably poorer in scattering laws. With the restriction imposed by the "married pairs," there are only three binary scattersinterchange any of the following three pairs (I 2 -2 0 0 0 0 0 0 ' (0 0 2 0 0 2 0 0 ' (0 0 0 2 0 -2 0 0 ' none of which mix up married and singles. There are no ternary laws; there are, however, 24 special quaternary laws of which a general example is (1,1,1,1) + (1,1,1,-1) + (-1,-1,-1 -1) *-+ (2,0,0,0) + (-2,0,0,0) + (0, 2, 0, 0) + (0,-2,0,0) and these 24 together with the three of (I) eliminate unwanted conservation. This lattice with the 27 scattering laws above can be used along any of the lines described for the first lattice, though it is not at all clear what the relative frequency of scatters coming from (I) with those coming from (II) should be. In general, especially in instances in which overall density is fairly large or fairly small, it is going to be difficult to use (II) and its ilk. Since without (II) the marrieds and singles are never mixed, it seems clear that somewhat greater emphasis must be put on finding applications of (II). In the final analysis, the performance in practice of either lattice, along any of the lines described above, must be studied by actual computer simulations. A great deal will depend on the size of the region over which averages are taken in order to obtain macroscopic estimates, on details of the scattering laws used, and on particle density. 16 3 3.1 A VLSI CHIP FOR A CELLULAR AUTOMATA MACHINE Introduction The 4-D scheme discussed in Section 2 above turns out to be very complex. To explain a VLSI circuit that can implement it, we begin by explaining a 2-D square-lattice system first. The scheme conserves particles and momentum. (However, it is not isotropic, and is hence unsuitable for simulating real physics.) The collision rules are a function of the incoming particles from the North (N), South (S), East (E), and West (W) directions, a random collision bit (C) that causes a collision to occur if it is possible to have one, and a random bit (RN), but we will include this function for possible later use. The most straightforward method is a simple table look up. Table 3-1 illustrates this. Given a set of particles, a collision bit, and a random bit, the output particles are determined. 17 Table 3-1 INPUT NSEW 0000 0001 0010 0011 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1100 1101 1110 1111 C RN 0 1 - - - - - - - - - - - - - - - OUTPUT NSEW 0000 0001 0010 0011 1100 0100 0101 0110 0111 1000 1001 1010 1011 1100 0011 1101 1110 1111 - - - 0 1 - - - - - - - - Now consider a 'factored' solution to the same problem. Only head-on collisions can result in scattering and if scattering is to occur, then there must be an output channel for each particle to occupy. Thus there are two 'scattering' rules: 1) N.S.E,. W 2) I.S C EW E E. W.C .. N. S. The first rule can be read as "If there is a particle from the North, and a particle from the South, and no particle from the East, and no particle from the West, and a collision is to occur, then send a particle out to the East and send a particle out to the West." 18 The enabling condition parts of these rules are simply implemented by 'and' gates. Next we must decide which rule to employ if both should be enabled. (Of course, in our example this can never actually occur, but it will occur in more complex systems.) We will employ a priority encoder to choose a rule, with a dynamic priority assignment by the random bit. Finally, the chosen rule will select which output channels to block and which output channels to inject particles into. The complete circuit is illustrated in Figure 3-1. It is, of course, more complex than a simple table look-up approach for this very simple system. However, it illustrates the method we intend to use to implement the 4-D cellular automata scheme. 3.2 Computational Cell for 4-D Rules The 4-D case has a total set of 24 incoming particles but some (6) are paired, so only 18 bits are needed to specify the incoming and outgoing particles. In addition there are 66 rules, so that seven input priority specification bits are needed to randomize rule selection. These will be decoded to the one in 66 priority selection bits. The 'C' bit (C) suffices to determine if an allowed collision will occur. The straight-forward table look-up scheme will no longer work efficiently as a table of 18 * 2 * *(18 + 7 + 1) - 10' bits is needed. So we will employ the 'factored' approach illustrated above. The computational node to accomplish this is shown in Figure 3-2. The rule enabling conditions will be calculated in the 'AND-plane' of a PLA (programmcd logic array). Thus we input the collision enabling bit and all of our particle signals, as well as the logical complement of each. This is some 38 total input signals. The result is 66 rule enable requests (R). Figure 3-3 illustrates the connections for the 'AND-plane.' From Figure 3-3, it is easy to count that there are 282 transistors needed to implement the connections. A few more (,, 100) will also be needed to function as inverters, drivers, etc., for a total of about 500 transistors. 19 RANDOM PRIORITY ASSIGNMENT RN) REQUEST GRANT RULE 1 RULE 1 C REQUEST NN 3RANT RULE 2 --- Ei SS W, w COLLISION RULES Figure 3-1. PRIORITY SELECTION "FACTORED" Computation Cell. 20 SWITCH PARTICLES To Neighbors Stored State (Lattice Pts) I HSwich Prioity H Priority Figure 3-2. A computational node for the VLSI chip. 21 And PLA----- MARRIAGES AND DIVORCES 1 2 3 4 INTER-GROUP HEAD-ON MIXING COLLISIONS COLLISIONS 5 6 7 8 9 10 1112 13 14 15 16 1718 192021222324252627282930313233 . .. 1ii T . ...... + 00o 0 OUTER-GROUP HEAD-ON COLLISIONS 1T 000+0 0-0 -00 0.. .. 111 0- + 0++ 0 - - -+0+) -0F:.R ++0 Figure 3-3. ; ;E l 1: A ..I: 1 iT: eI . i ::I ii. : e ... .. .. Io InI ""''' """ AND-plane' of Rule Enable Signal Generation Circuit. (The complement of each input signal appears in the row next to the signal.) 22 The next step is to select (randomly) a rule to fire. We will decode the seven random input bits into 66 signals to control the 'AND-gates' of the priority circuit. This will require about 500 transistors. If a simple extension of the priority circuit of Figure 3-1 is employed, it would require three gates and two inverters per stage, or about sixteen transistors per stage. This would be about 1000 transistors for the simple priority circuit. Such a priority circuit would be far too slow, and if pipelined would have too much latency. So a 'carry-look-ahead' scheme will be employed. It will require about 1200 transistors total if full look-ahead is employed using a gate fan of about 4. This combined with modest pipelining would require about 1500 transistors total. So we will assume this number. The output switch circuitry will require 36 gates or about 150 transistors. Thus we see in summing up that the factored circuitry will require about 3000 transistors per computation node. If one-half the chip is dedicated to computation circuitry and one-half to memory (to store the virtual node state) then about 8 x 8 (64) real computation nodes/chip is about the limit that ,.u,,i re expccted for nearterm VLSI technology (see Table 3-2" Tabie .1-1 VLSI TECHNOLOGY (CMOS) - -, --, Today ] in about 5 years - cm 2 cm active area 200 pins - 400 pins 1 Mbit DRAM -, 4 Mbit DRAM 50K 'Random' transistors - 200K Tran. 10 nsec internal clock - nsec 80 nsec external drive - nsec 2 These nodes are laid out as illustrated in Figure 3-4. The performance of the proposed 'ldttice gas computer' is now easily estimated (see Table 3-3). 23 To~ Too Mux Figure 3-4. Layout of nodes on Lhe VLSI chip. 24 A Computational Node Table 3-3 PERFORMANCE ESTIMATES Future (5 years) Present Day 512 lattice points 64 nodes/chip 106 chips 32K lattice points/chip 3 x 1010 lattice points TOTAL z 1011 10 nsec update rate 64 x 10' updates/chip/sec 10"' 1016 updates/sec About 106 VLSI chips is the limit for a single computer system (modern large scale supercomputers have about 3 x 10' chips). At 512 lattice points per node, 64 nodes/chip and 106 chips, up to 3 x 1010 lattice points may be feasible. With a 10 nanosecond update rate, 64 x 108 updates per chip and 106 chips yields an update rate of , 1016 per second. Advances in VLSI technology over the next five years will allow ; 3 times the number of lattice points and , 10 times the internal clock rate to yield ; 1011 lattice points updated at the rate of zt 1017 updates per second. 25 4 GENERAL COMPARISON OF CELLULAR AUTOMATA AND CONVENTIONAL FLUID-DYNAMICAL METHODS Before looking in detail at the computational requirements for cellular automata and conventional fluid dynamics, we first attempt to summarize some of their relative advantages and disadvantages in a more general way. 4.1 Cellular Automata: Potential Advantages Cellular automata have the nice properties of elegance, in the sense that their microphysics is immediately transparent, and of local simplicity. They are more immediately suitable for parallelization than conventional fluid methods, and they lend themselves to the design of custom computer architectures which may give very large increases in speed relative to conventional multipurpose machines. In addition, because of the "modular" nature of their geometry, they may be able to handle some types of complicated boundary conditions more readily than conventional fluid techniques. 4.2 Cellular Automata: Disadvantages On the other hand, the macroscopic physics which is being described by the cellular automata model is not always clear. This is the case, for example, for flows with Mach numbers approaching unity or for lattices lacking the appropriate isotropy properties. Additionally, cellular automata cannot be used for hypersonic flows, and they scale more poorly to high Reynolds numbers than conventional methods, as shown in Reference 2. C.A. calculations suffer with respect to Navier-Stokes equations because of the need to calculate much more than required by the application to fluid flow. Celiular automata provide a "few state" representation of configuration and velocity space, and are basically very simple versions of kinetic theory. Thus one must average over large numbers of individual sites in Z pace to construct the macroscopic density, n ( 4 t), and mean velocities, I (4 t), 27 which enter the fluid equations directly. This problem is shared by all kinetic theories, of course. Indeed, the point of solving macroscopic equations such as those of Navier-Stokes is to first filter out the microscopic spatial and temporal scales, and then solve for n and X. 4.3 Conventional Methods: Advantages For conventional solutions of the Navier-Stokes equations, the physical assumptions are more immediately apparent. The physical model is also more flexible, since for example additional terms may be added or the magnitude of the viscosity may be changed without altering the underlying spatial or grid structure. Conventional fluid dynamics seems better able to deal with a large dynamic range in spatial or velocity coordinates. This is partially because of the more favorable scaling to high Reynolds number discussed above, and partially due to the ability to use adaptive grid techniques in those locations where high spatial resolution is needed, without having to use the fine grid spacing in those locations where nothing much is going on. They also have the potential advantage that the assumption of incompressibility can be incorporated explicitly in the algorithm if the flow is highly subsonic, leading to a substantial savings in computer time. By contrast there is no comparable technique for taking advantage of incompressibility for cellular automata. Again this is because C.A. share with kinetic theories or molecular dynamics the feature of calculating all time scales at once. 4.4 Conventional Methods: Disadvantages Conventional fluid techniques have more complexity and more bits per cell than cellular automata. Per cell, they thus have higher memory requirements. The floating-point operations required for finite-difference solutions of the Navier-Stokes equations take much longer to perform, per operation, than the simple logical operations or table look-ups of the cellular automata case. Conventional fluid techniques are not as easy to parallelize as cellular automata, and they may in some cases be less able to deal with complex boundary conditions. 28 4.5 Comparison on the Basis of Computational Work Here we attempt to quantify some of the explicitly computational pros and cons of the two methods. Consider solving the same fluid-dynamics problem two ways: using cellular automata and using conventional methods based on finite-difference solution of the Navier-Stokes partial differential equations. We compare the computational work needed in the two methods. We emphasize 3-D applications, since in our judgment these represent the next major step in computational fluid dynamics over the coming decade. When the specifics of computer architecture enter our discussion, we shall compare a conventional Navier-Stokes fluid calculation performed on a Cray2 class supercomputer with a cellular automata calculation performed on a special-purpose massively parallel machine that does not exist today. Some details of this hypothetical special-purpose machine were described in Section 3; others will emerge as the discussion proceeds. Both numerical techniques involve subdividing the fluid volume of interest into a discrete grid or lattice. First we discuss how many grid or lattice points are needed for the two approaches (Reference 2). 4.5.1 Number of Grid or Lattice Points Let L be a macroscopic scale length characterizing the fluid-dynamics problem, and let U be a macroscopic velocity. For example L might be the length of an obstruction in the flow, the width of an shear layer in a jet, or the thickness of a channel or pipe through which the fulid is flowing. The cell size for both the conventional and the cellular automata calculations must be clearly be much smaller than L, so as to resolve the macroscopic structure. Figure 4-1 depicts the relations between the macroscopic scale L, the size of a cell t, in the fluid code, and the lattice spacing a in the cellular automata model. Because the equivalent of fluid quantities must be determined in the cellular automata model by averaging over the noisy data of many lattice points, it is clear that L >> t, >> a. 29 a Figure 4-1. Schematic of relations between macroscopic scale 1, for conventional hydrodynamical calculation, and lattice spacing a for cellular automata model. 30 In a conventional hydrodynamics calculation, the viscosity determines the required size of the cell, because a cell should resolve the Kolmogorov dissipation scale 17: 4, 77 = (L v3/U 3 )' / 4 = L Re - 3/ 4 , (4 - 1) where v is the viscosity and Re is the Reynolds number, Re = U L/v. (4 -2) Thus from (4-1), the number of fluid cells in a length L is LIM - Re 3 / 4 . (4 - 3) This quantity, Re3 /4 , represents the dynamic range in scale sizes which the calculation must resolve, since in any case the finite viscosity will not permit shear flow to develop on scales smaller than n. The dynamic range in velocities required in the calculation is of order Re'/4. From Equation (4-3) we see that in the conventional fluid-dynamic calculation, the number of cells required in three dimensions is approximately NcelI( 3 D fluid) = (LI )3 - Re91 4 . (4 -4) It is well known that this strong scaling of the number of cells with Reynolds number makes it difficult to carry out 3-D fluid-dynamics calculations even at Reynolds numbers of a few hundred. For example if one used a three dimensional grid with 64 grid points on a side (2.6 x 10' total grid points in 3-D), one could by the above criterion do a good job with a Navier-Stokes fluid algorithm simulating Reynolds numbers up to about (64) 4 / 3 = 256. To simulate a Reynolds number of 1000 would require 178 grid points on a side, or 5.6 x 106 fluid cells in all. For the celluar automata case, there are several different criteria which must be met in order to produce a physically meaningful simulation of a fluid. These are reviewed in Reference 2. For our purposes the most stringent condition turns out to concern the way in which collisions in the celluar automata model must represent the viscosity of the fluid, when averaged over many cellular automata nodes or lattice points. In the cellular automata model, all particles move with one discrete velocity (or at most a few). We shall call this velocity v,. If the mean free 31 path of a cellular automata particles is A, then the effective viscosity v which the model will produce when averaged over distances much longer than the mean free path is (4 -5) V -' A V.. The relation between the lattice spacing a and the mean free path A varies with the collision rules chosen for the cellular automata. In some models the mean free path will be considerably larger than a, if the rules state that a scattering event can only occur when the lattice sites which will be the final states of the collision are initially unoccupied. In other cases the mean free path will be comparable with, or even in some cases less than, a; this may be the situation when rules state that there can be more than one cellular automata particle occupying a given site. In the discussion to follow, we shall assume that the mean free path is approximately comparable with the lattice spacing, a. When one averages over distances long compared to the mean free path, the macroscopic viscosity v will be roughly v - av,. Av (4-6) Dividing this inequality by a and multiplying by the macroscopic scale length L, this implies that the number of lattice sites in a macroscopic scale length L is (4-7) = ReIM, (V Lia - L vo/v = U V where M is the Mach number, M= U/v,. (4-8) Equation (4-7) is the cellular automata counterpart of Equation (4-3) for the conventional fluid-dynamics case. The Reynolds number Re is considerably larger than unity in most cases of interest. Additionally, the Mach number M must be small because it is shown in Section 6 of this report that the cellular automata model does not reproduce Navier-Stokes fluid dynamics unless M << 1. As a consequence, the scaling for cellular automata given in Equation (4-7) is even more stringent than that for conventional fluid calculations given in Equation (4-3). The number of lattice points required in three dimensions is approximately Ncell (3D C.A.) -t (L/a)3 - (Re/M)3 . 32 (4 - 9) To continue our numerical example, using Re = 256 and Mach number M = 0.2 the number of cells required is 2.1 x 10'. A Reynolds number of 1000 would require 1.3 x 1011 cells for a Mach number of 0.2. Figure 4-2 illustrates the number of cells required for the equivalent Navier-Stokes and cellular automata calculations, at several Mach numbers. In three dimensions, the ratio of the number of grid points needed for cellular automata and conventional fluid-dynamical calculations of the same problem is rL/a] 3 Re 3/ 4 (4-10) M3 >> 1. LL/, (7a Consider a numerical example just discussed, where the fluid dynamics calculation has 643 grid points. The cellular automata model needs a factor of Re3/4/M 3 = 8000 more grid points than the fluid dynamics calculation would need. For a Reynolds number of 1000, the cellular automata model would need a factor of 2.2 x 10' more grid points than the fluid model. 4.5.2 Number of Bits per Lattice or Grid Point The computational effort needed to solve a given problem depends on a number of factors besides the number of lattice or grid points. In the next few subsections we consider these in turn. First, we discuss the number of bits needed at each grid or lattice position, to describe the state of the fluid or cellular automata model. This is ultimately related to the memory requirements. In general cellular automata models will require considerably less memory per cell, but will have many more cells than the conventional models. For the simplest triangular-lattice cellular automata models in two dimensions, the magnitude of a particle's velocity is always v,, and after a scattering event each particle goes in one of 6 discrete directions. Thus in the simplest case there are 6 bits per node to keep track of. However the situation is more complex in three dimensions, since the required lattices may not have simple symmetries (see Sections 2 and 5 of 33 Memory Limit 1012 For CA Models 1011 o 10l 9 10 CA, M - 0.2 0 z Memory Limit For Fluid Models 108 E Eo z 7 10 C7 - - 106 1 - 0 Fluid 4 10 103 100 200 I I I I I I I 300 400 500 600 700 800 900 1000 Reynolds Number Figure 4-2. Number of fluid cells, Nc,., (3D fluid), required in three dimensional Navier-Stokes computation, compared with number of cellular automata lattice points, Nc,,j (C.A.) required to perform the equivalent calculation. Memory limits are derived in Sections 4.5.2 and 4.5.3; they correspond in the cellular automata case to a 106 -chip parallel supercomputer with several megabits of local memory per node, and in the fluid case to a Cray-2 with 64 million words of shared memory. 34 the present report). The number of bits required per node ranges from 14 at the low end to 30 or 40 at the high end, depending on the lattice geometry and collision rules chosen. For the purposes of this discussion we will estimate that about 20 bits per node are required in three dimensions. Next we consider the number of bits required at each cell or grid point in a conventional fluid-dynamics calculation, recalling that each fluid-dynamic variable is now typically described by a 64-bit word. Since the Mach number for cellular automata models must be small, it is appropriate to compare these models with the computational requirements of incompressible hydrodynamics. In this case the only independent variables are two components of the velocity, since the third component is determined from div X = 0. The pressure is found from solving a Poisson equation of the form p = Q(v.:, vy, v'). (4 - 11) However computationally one must in fact utilize more than just two 64-bit words. For example since one must find the pressure by solving Equation (4-11), the pressure and all three velocity components must be known or calculated at each cell or grid point. A previous JASON report (Reference 3) analyzed one particular 3-D finite-difference technique for solving Equation (4-11), and found that of order 10 words per cell were needed, or approximately 640 bits (if the words were 64 bits each). Thus very roughly, the ratio of the number of bits per cell or node required for cellular automata and fluid calculations is Nbits(Cellular Automata) ~ 20 - = Nbits(Fluid Dynamics) 640 3.1X 10. (4 - 12) As anticipated, there are considerably fewer bits per node required for the cellular automata case. However the total memory requirements for cellular automata may still be more than for the fluid-dynamics calculation, since the number of cells is larger: Total Memory (Cellular Automata) ~ (3.1 x 10- 2 )Re 3/ 4 M3 Total Memory (Fluid Dynamics) (4 - 13) For our previous numerical examples, Re = 256, M = 0.2, the cellular automata model requires 250 times more total memory than the fluid dynamics calculation, but only 1/32nd the amount of memory per node. A calculation at a Reynolds number of 1000 require 690 times more total memory for a cellular automata model than for a fluid one. 35 4.5.3 Number and Speed of Operations per Timestep Another way to compare cellular automata techniques with conventional fluid dynamics is to look at the required number of operations per timestep, and the speed with which they can be executed. First we consider the number of operations required by conventional fluid dynamics. These will of course vary with the choice of algorithm. Here we choose one example, which we hope is typical. Reference 3 studied one 3-D incompressible fluid dynamics algorithm in detail, and found the following number of total operations per time step, for a computational grid of N , NJ, and N, zones in the x, y, and z directions: Additions: N.NvN 2(95 + 3 log 2 NNV) Multiplications: N.gNN(81 + 2 log 2 N NY) Memory Transfers: 28 N, Ny X, (4- 14) Continuing our numerical example, if N, = Nv = N. = 64, corresponding to a Reynolds number of 256, there would be 3.4 x10 additions, 2.8 x 107 multiplications, and 7.3 x 106 memory transfers per timestep for the fluid calculation. Per fluid cell, there would be 131 additions, 105 multiplications, and 28 memory transfers per timestep. Of course each of the additions and multiplications is a floating-point operation. Conventional fluid calculations running on a Cray-2 computer in vector mode, with pipelining of memory fetches, would require a clock cycle of about 4 nsec for each of the required adds, multiplies, and memory fetches. For a 3-D cellular automata model, the number of required operations per timestep is considerably more uncertain, since it depends on the particular choice of lattice and collision rules, neither of which has yet been well explored. Here we consider for the sake of definiteness the "4-D" lattice described in Section 2 of this report. For this lattice, 18 bits at each node are required to describe the "state vector" of the system. First we ask how complex the collision rules are, and in particular how much space it would take to realize them on a VLSI chip. 36 The rule function accepts 18 bits to represent incoming particles, utilizes 2 to 4 random bits, and produces 18 bits to represent the outgoing particles after each collision. A straightforward table look-up for the collision rules would thus require a table of about 106 entries. Symmetry, however, greatly reduces the functional complexity. A rough estimate based on collision rules designed by Rothaus (Section 2 of the present report) indicates that there are 24 "parallel collision sites" per lattice point. Ifthe rule functions were to be hard-wired, as in a special-purpose computer devoted exclusively to cellular automata, each "parallel collision site" would require about 8 inputs and 4 outputs or about 500 transistors each in a PLA (Programmed Logic Array). Thus the 24 "parallel collision sites" needed for each rule function would require about 10' transistors or 1% of the area of a VLSI chip, exclusive of wiring. There could therefore be about 102 collision rule calculation engines per chip, and the collision rule calculation could be parallelized by a factor of up to 100. The limiting number of lattice points per chip is easily calculated by assuming that approximately 20 bits are needed to represent one lattice point. Random access memory chips can presently contain up to 4 x 106 bits. So up to 200,000 lattice points can reside in a single chip. Since today about 106 chips can be practically assembled into a super computer-scale system, the number of cellular automata lattice points is limited to about 2 x 1011. A given chip will have a fixed area, so there is a design tradeoff betwen the number of lattice point states that can be put on a chip. If equal chip area were allocated to each of these functions, we would have 10' lattice point states per chip (or 1011 total lattice points in our hypothetical 106 -chip supercomputer), and 50 parallel lattice point engines per chip. According to Equation (4-9), the Reynolds number accessible to cellular automata calculations on this special-purpose computer would then be less than or equal to 1000 at a Mach number of 0.2, and less than or equal to 300 at a Mach number of 0.05. To access Reynolds numbers larger than these values, lattice points could be stored on disc rather than in local memory, and then shuttled in and out of the computational nodes as needed. This use of "virtual memory" would be very costly in execution time, however. In principle one could also increase the accessible Reynolds numbers somewhat by allocating a larger fraction of the chip area to lattice points, at the expense of parallelism in the rule engines. But since the accessible Reynolds 37 number only increases as the 1/3 power of the number of lattice points, there would not be a great deal to be gained by allocating more than 50% of the chip area to lattice points. This discussion leads us to a preliminary view of the required characteristics for the special-purpose cellular automata computer which we are postulating. The desire to simulate fluid behavior in three spatial dinensions and at large Reynolds numbers requires a great many lattice points, and implies a massively parallel architecture which nevertheless has a great many (of order 10') lattice points at each computational node or chip. This is in contrast to the prevalent practice to date, in which for two spatial dimensions and moderate Reynolds numbers one can devote one computational node to a single lattice point. The cost of placing many lattice points at each computational node is some increased complexity, due to the requirement to communicate to neighboring lattice points which are sometimes off-chip and sometimes on-chip. However, with appropriate lattice layout the bulk of these nearest-neighbor communications will occur within a single chip, and thus the considerably slower off-chip communication times will not be as much of a penalty as they are today. A second contrast between this type of computer and today's parallel machines is in the sophistication and cost of each computational node. Today, devices such as the "connection machine" endeavor to use computational nodes which are relatively simple, inexpensive, and which have only a modest amount of local memory. The computer which we have envisioned here would require several megabits of local memory at each computational node, and would in addition use custom designed VLSI rule engines at each node to achieve adequate parallelism. These requirements are due to the desire to perform 3-D simulations at Reynolds numbers of 102 - 10'. 4.5.4 Number of Timesteps The different grid sizes and algorithms will impose different requirements on the number of timesteps required to perform the same physical calculation using cellular automata or conventional fluid-dynamics models. 38 For cellular automata, there is a genuine Courant condition since "sound waves" can propagate at the velocity v,. Thus the timestep is limited to the time it takes a sound wave to cross from one lattice point to another: At. = a/v.. (4-15) From Equation (4-7), this is equivalent to A tc, c (L/U)(M 2 /Re) (4 - 16) For incompressible fluid dynamics, there is not a Courant condition because sound waves effectively move instantaneously around the grid. Rather, the timestep is determined by the time for the smallest scale eddies to evolve. Since the smallest spatial scale is the grid spacing 77 L Re - 314 and the smallest velocity is the corresponding eddy velocity v,. =- U Re - 1 4 , the evolution time for the smallest eddies is r,/v,. Thus the timestep is approximately Atfluid (L/U) Re - 112. (4- 17) The ratio of the minimum timestep size for the cellular automata and fluid dynamics models is A tca//tfluid - M 2 Re - 11 2 . (4 - 18) Continuing with our numerical example of Re = 256 and M = 0.2, this would imply that the cellular automata model would need Re'/ 2 /M 2 =400 times more timesteps to calculate the same physical problem as the fluid dynamics model. For a Reynolds number of 1000, the cellular automata calculation would require about 790 times more timesteps tharb the equivalent NavierStokes calculation. 4.5.5 Ease of Adapting to a Parallel Computation Environment A final factor influencing the relative promise of the cellular automata and conventional fluid dynamics approaches is the ease with which each can be adapted to a computational environment which is highly parallel. By design, the cellular automata algorithms are well suited to massively parallel computer architectures; one can assign a node or group of nodes to a specific 39 microprocessor and its associated memory. Since we have seen that the memory requirements per lattice point or cell are much less stringent for cellular automata than for conventional techniques, and since only logical operations are required, each microprocessor can be relatively unsophisticated. In addition, the collision rules are designed so that each lattice point need only communicate with its nearest neighbors. Thus the communications overhead is not severe, provided that lattice points which are logically adjacent are also connected by short physical paths (see Section 7 for a more complete discussion of the latter issue). In the discussion of cellular automata which follows, we shall continue to hypothesize a special-purpose parallel machine which does not exist today but which we believe to be within today's state-of-the-art: a computer of order 106 computational nodes, a 1 nsec clock, and about 4 megabits of fast local memory per node. Thus before any additional parallelization at the chip level, one gains a factor of 106 over a one-processor machine. If at the chip level one uses the type of hard-wired rule engine which we discussed in Section 4.5.3, there is an additional gain of a factor of 50 due to the ability to compute collision rules for 50 lattice points at once. Conventional Navier-Stokes hydrodynamics can also be adapted for paralel computing, although the individual processors must be considerably more powerful than those needed for cellular automata. Typical requirements are 32- or 64-bit words and fast floating-point arithmetic. Many algorithms also need substantial shared memory accessible by all the processors. For the purpose of the present discussion, we contrast massively parallel cellular automata calculations with Navier-Stokes hydrodynamics computed on a 4-processor Cray-2 scale machine. Experience to date suggests that multiprocessing on such a machine can yield speed-ups in elapsed time of about a factor of 3.6, relative to the same calculation performed on one processor alone. 4.5.6 Overall Comparison of Computational Effort One can attempt to combine the criteria developed in the preceeding sections into an overall figure of merit for the cellular automata approach, 40 relative to conventional fluid dynamical methods. This figure of merit is in some ways a narrow one, in the sense that it does not take into account practical issues such as mesh tangling, numerical stability, accuracy and noise properties, flexibility, and so forth. What our overall figure of merit does measure is the relative amount of elapsed computer time needed to perform the same physical calculation, using the two methods. What we describe as the computational effort is an estimate of the overall computer time needed to complete a fluid-dynamics calculation of physical duration L/U, the timescale for macroscopic evolution of the flow. Of course most actual computations will be carried out for physical durations longer than this. But since the number of L/U times required will vary with the specific problem being solved, we shall use L/U as a convenient scaling parameter. The computational effort thus defined is made up of a series of factors: Effort operations Ntimesteps x (parallelization speed-up) - 1 x =Ncell [operation speed x Noperations / cell/timestep]. (4-19) The comparison we shall present is unfair in one import,--at way: it compares an as yet unbuilt special-purpose parallel supercomputer for cellular automata calculations with a four or five year-old Cray-2 machine for conventional hydrodynamics. In a sense one is comparing computers which are a generation apart, and the reader should be cautioned that this inserts a bias favoring cellular automata algorithms on massively parallel machines. Table 4-1 summarizes the properties needed for the evaluation of Equation (4-19). We have assumed that Cray-2 memory fetches are pipelined and are thus executable in one clock cycle, and that adds and multiplies are in vector mode. Inserting the values from Table 4-1 into Equation (4-19), we obtain for the Navier-Stokes calculation on a Cray-2 class machine: Effort (Navier-Stokes) = 10- 7 Re11 / 4 (1 + 0.16 log10 Re)sec, (4 - 20) where we have assumed that the adds give the dominant timing. For the special-purpose cellular automata computer, the effort is approximately Effort (C.A.) 2 x 10-17 Re 4 M- 5 sec. 41 (4 -21) Table 4-1 Comparison of Cellular Automata with Incompressible Fluid Algorithmns in Three Dimensions Ncell Cellular Automata Hypothetical Parallel Computer (Re/M 3 ) Imcompressible Fluid, Cray-2 Computer Re91 4 Ntimesteps Re/M' Re 1 / 2 Speedup due to parallelization 106 chips x 50 rule engines/chip 3.6 Property = 5 X 107 Noperations per cell pet timestop Rule engine: 1 Memory fetch: 28 Add: 95 + 4.5 log 2 Re Multiply: 81 + 3 log 2 Re Operation speed 1 nsec 4 nsec 42 The resulting effort values are tabluated in Table 4-2 for several Reynolds numbers and two Mach numbers, assuming that the cellular automata ruleengine parallelism is a factor of 50. The parentheses for higher Reynolds numbers and low Mach numbers indicate that the cellular automata calculation would not fit on a special-purpose computer with 106 total chips, if the size of a typical chip is limited to 4 megabits. Similarly, for the fluid case the parentheses indicate that the calculation would not fit in the 64 million word memory of a Cray-2. Under both of these circumstances, virtual disc-based memory could be utilized, but at a substantial degradation in execution speed. Figure 4-3 illustrates the relation between computational effort and Reynolds number graphically. One sees that for a Mach number of 0.2 and Reynolds numbers between 100 and 1000, the cellular automata model is predicted to execute about three orders of magnitude faster than the conventional fluid calculation. To be sure, we have made several quite idealized assumptions concerning the charactersitics of our hypothetical cellular automata computer. But even if one therefore gives the fluid calculation credit for an additional factor of 10 in speed, due for example to an improved next generation of general-purpose supercomputers, the cellular automata approach appears to be quite promising. The same cannot be said, however, for the case of Mach number equal to 0.05. Here the two approaches show much more comparable execution speeds, and the cellular automata technique requires so much more memory that it cannot plausibly fit in a 10'-chip parallel supercomputer for Reynolds numbers larger than about 300. Furthermore, if as before one gives the conventional fluid approach credit for an additional factor of 10 in speed, the fluid algorithms emerge as somewhat superior. The cellular automata approach retains interest primarily in those situations where it has a unique advantage; this might be the case, for example, in studies of fluid behavior within a boundary layer, where the Reynolds numbers are not large and the boundary properties may be complex. Thus our conclusions regarding the relative execution speeds of the two approaches depend on the range of Mach numbers in which one is interested. If, as suggested in the discussion of the Chapman-Enskog expansion in Section 6, one must limit Mach numbers to very small values in order to assure that the cellular automata model reduces to the Navier-Stokes equations, 43 Table 4-2 Computational Effort for Navier-Stokes Fluid and Cellular Automata (C.A.) Methods in Three Dimensions Reynolds Number Fluid Effort: Cray-2 (sec) C.A. Effort: Special-Purpose Parallel Computer (sec) 100 4.2 x 10 - 2 6.3 x 10-' Ratio, Fluid to C.A. Effort 6.7 x IF 256 0.58 2.7 x 10 - 4 2.2 x 103 1000 26 6.3 x 10 - 2 4.2 x 102 5000 (2.4 x 103) (39) (60) 4-2a. Mach number equal to 0.2 Reynolds Number Fluid Effort: Cray-2 (sec) Ratio, Fluid to C.A. Effort 4.2 x 10 - 2 C.A. Effort: Special-Purpose Parallel Computer (sec) 6.4 x 10 - 3 100 256 0.58 2.8 x 10- ' 2.1 1000 26 (64.5) (4.0 x 10-') 5000 (2.4 x 103) (39.9 x 103) (60 x 10- 2) 6.5 4-2b. Mach number equal to 0.05 44 Fluid 101 10 100 - , - " o -o 10 - . CA Memory Limit =0.2 -M M =0.05 M =0.05 "'CA, 0, 10 - 3 _ 1C-4 CA, M =0.2 10 - 5 100 200 300 400 500 600 700 800 900 1000 Reynolds Number Figure 4-3. Overall computational effort for fluid and cellular automata models, as a function of Reynolds number. In our definition, computational effort is a measure of the elapsed time required to perform a given calculation, exclusive of I/0, averaging in the cellular automata case, and diagnostics. Memory limits for cellular automata are derived in Sections 4.5.2 and 4.5.3, and correspond to a 106 -chip parallel supercomputer with several megabits of local memory per node. 45 then for 3-D problems the cellular automata technique does not appear to have notable advantages over conventional fluid techniques. Under these circumstances we do not imagine that there will be interest in developing or purchasing an expensive special-purpose parallel computer for cellular automata work. It may turn out, on the other hand, that Mach numbers as high as one or two tenths are adequate for assuring the reduction of the cellular automata results to the Navier-Stokes model. Only considerably more experience in doing 3-D calculations will suggest whether this is the case. If it is, then the factors of 100 - 1000 improvement in execution speed suggested in Figure 4-3 for M = 0.2 would be a powerful incentive for further pursuing the design and execution of a dedicated massively parallel supercomputer for cellular automata work. 46 5 5.1 QUASILATTICES FOR CELLULAR AUTOMATON FLUID CALCULATIONS Introduction It is possible that lattice models of hydrodynamics with discrete velocities, or "cellular automata fluids," may prove to be an efficient way to simulate complex fluid mechanics problems, at least for virtually incompressible flows at low Mach number. Lattice models were introduced into the physics literature in 1968 by Kadanoff and Swift 4 , who were interested in dynamic critical phenomena, and later by Hardy and Pomeau 5 , who were interested in fluid properties in general. The recent attention focused on these models was stimulated by the observation of Frisch, Hasslacher and Pomeau6 that cellular automata on a triangular lattice can provide good approximations to the two-dimensional Navier-Stokes equations of an isotropic fluid. Cellular automata lend themselves to highly efficient parallel processing on digital computers. A discussion of different models and methods of obtaining approximately isotropic equations in the continuum limit has been given by 1 Wolfram. Cellular automata fluid models usually allow a collection of particles to travel with one of a few, discrete velocities along a specified set of directions or "channels" in two or three dimensions. The directions are specified by a set of unit vectors {go}. On the triangular lattice, for example, particles typically travel with unit velocity along one of the six directions connecting a given site to its neighbors. The models determine the time evolution of a set of probabilities fa(X, t), which give the fraction of particles in velocity channel a at position Y and time t. The number density n(l, t), momentum density gi( ,t), and stress tensor aii(i,t) of the fluid are determined from the f.'s via the relations (5-1) n =ZfA a g = fa ea (5-2) a i = Ee ea a a 47 (5-3) The Navier-Stokes equations may be written (5-4) oigi + ajerij = 0 where the momentum density is related to the fluid velocity il(i, t) by ff = nil. A closed set of equations for il results if we can express aij in terms of U-.As shown, for example, by Wolfram', the Chapman-Enskog expansion in small velocities and gradients for aii takes the form a [ = oe+ bl y -Jr e2 (Z 2 e'ej ek : 'e / eke, "e a e E, e'je a a i i k a eaek a -- ; (1 ee e e' (5-5) k -1/2 a UkU i e "ea k )ukU + (................ ] If the {a} connect nearest neighbors on a triangular lattice, the sums over a in Equation (5-5) are guaranteed to be isotropic up to quantities fourth order in {a}l: eie c 6,j (5-6) a a eaee a o 6ijbk + bijbi + 6 il6 . (5-7) Although Equations (5-6) and (5-7) are enough to ensure that the usual truncation of the gradient expansion leading to the Navier-Stokes equations is isotropic, higher order corrections involving sixth rank tensors will be anisotropic. There is, moreover, no simple analogue in three dimensions of the triangular lattice which ensures that even fourth rank tensors will transform properly under rotations'. One solution to this problem is to introduce additional directions (beyond nearest neighbors) of propagation on lattices with low symmetries (the (11) direction on square lattices and the (110) and (111) directions on cubic lattices, for example) and adjust the collision rules to eliminate higher order anisotropies. A new variant of this solution was described in Section 2. Another approach, mentioned for example in Reference 1, is to have particles moving on a quasilattice, which can have symmetries which are forbidden on conventional crystallographic lattices. In this section we discuss the latter approach, and show in particular how to construct an octagonal quasilattice in two dimensions which ensures 48 isotropy of all tensors up to eighth order. Particles occupy the faces of a four-dimensional hypercubic lattice, and hop from face to face along a twodimensional hyperplane which cuts this lattice at a set of irrational angles. When projected into two dimensions, particles flow along eight channels in a way which, despite its apparent complexity, can be simply described using four dimensional integer arithmetic. Momentum is exchanged among the eight channels by allowing special collisions for particles intersecting at right angles, and two distinct velocities. We also discuss a model in three dimensions which allows particles to flow along one of thirty directions corresponding to the midpoints of the bonds of an icosahedron. Among these directions can be found ten groups of six which are coplanar, and point to the vertic s of a regular hexagon. This feature allows three-particle collisions and momentum transfer between channels. The model requires only one velocity, and is in many ways reminiscent of a triangular lattice cellular automaton in two dimensions. The icosahedral quasilattice is an irrational projection of a six dimensional hypercubic lattice, and particle positions can be specified using 6d integer arithmetic. 5.2 Octagonal Quasilattices In Figure 5-la we show a set of eight basis vectors pointing to the vertices of a regular octagon. One might naively hope to generate a lattice where particles move along one of these eight directions by taking integer linear combinations, 4 Fi(5 -8) ji1 of the basis vectors e- through i'4 (the vectors 65 through F8 are redundant, since they are the negatives of e1 through e4). This clearly will not work, however, because this set is overcomplete, and the basis vectors have irrational projections on each other: If arbitrary quartets of integers {n], n 2 , n 3 , n4 } are allowed in (5-8), we will fill space very inefficiently with an irregular array of points, many of which are almost on top of one another. These difficulties are the basis of a theorem of classical crystallography which states that regular lattices with an octagonal symmetry are impossible. We need to find a way to restrict the integers {n} so that the "lattice" points are approximately equally spaced. 49 I I I I I | e 3 e4 e, . (a) 64 e -Il -II e2 (b) Figure 5-1. e3 (C) Eightfold basis vectors for a two dimensional octagonal lattice. The basis set indicated in (a) can be regarded as the projection of a four dimensional basis set, as shown in (b). The projection of the 4D basis set onto a perpendicular "shadow" plane is shown in (c). 50 One solution of this problem is embodied in Figure 5-1b, where we choose to regard the basis vectors {g } as a two dimensional projection of an orthonormal basis set in four dimensions. A projection matrix P whic!, --'unit vectors along the coordinate axes in four dimensions to give the vectors {I} is vr2 P= 18 01 -1 1 /2 1 0 0-1 1r/2 01 1 V2 " (5-9) {-} The components of the are given by the columns of this matrix. They occupy a common 2-D plane, and it is easily checked by taking dot products that the angles between them are consistent with the octagonal geometry of Figure 5-1a. Let us call this subspace the parallel or "physical" plane. This will be the space occupied by the quasilattice. One can also choose to project normal to this plane into a perpendicular or "shadow" subspace via the matrix Q = 1 - P, ,F2 -10 1 0 2 -1 -1 IF2 -1 Q = 7 0 -1 1 0 -1 (5-10) v'2- and obtain the four alternative projections of the four dimensions basis set labelled {F-,} in Figure 5-1c. The idea behind the projection method for generating quasilattices s is illustrated for a 2-to-i projection in Figure 5-2. Here, a 2-D lattice point (n 1 ,n 2 ) is projected if it is at the lower left hand corner of a square which intersects the line t. The sites are projected onto the axis labelled 611. A useful reformulation of this projection criterion has been given by Elser s in terms of the projection of a potential quasilattice point onto the axis labelled x.L: A potential site q- (nIn 2 ) will be accepted into the quasilattice if and only if its projection Qn- contained in the shadowed region of the x± axis CL V.i),I C(X) = Q [-C(o-)] + Y., (5- 11) where Q [-C(o-)] is the set formed by applying Q to the inversion of a cube centered at the origin, and i, a vector marking the intersection of the line f with the x± axis (see Figure 5-2). By varying the parameter x,, we obtain 51 X2 Physical space XO Shadow Figure 5-2. Projection ideas illustrated for a 2 to 1 projection. 52 a continuous family of possible projections. A more formed definition of this set of lattice points (call it S(xo)) is (5 -12) ,EZ2 S() = This set is indicated by the darkened portion of the Y± axis labelled "shadow" in Figure 5-2. Its bounded extent along the x. axis ensures that the projected points will not be too close together. The generalization of these ideas to the octagonal 4 -- 2 projection is straightforward. The analogue of the "shadow" CL(Xo,) in Figure 5-2 is the two dimensional shadow of a 4-D hypercube, which is the interior of a regular octagon. If the 4-D lattice has unit spacing, it can be shown straightforwardly that the distance between parallel faces of the shadow octagon is (see Figure 5-3) d = 1 + v/2/2. (5 - 13) There is now a 2-D family of possible projections obtained by translating the shadow octagon within the perpendicular plane. The translation vector Zo may be conveniently written in terms of the orthogonal vectors Ja> = 1/2(1,O,-1,v'2_) (5-14) Ib > = 1/2 (-1, v',-1,0) (5-15) as io = sla > +tlb > (5- 16) where s and t are arbitrary real parameters. Note that the transverse projection operator Q may be written Q = Ia >< al + Ib >< bl (5- 17) and that Pla >= Plb >= 0. Allowed quasilattice positions n- = {ni,n 2 , n3 ,n 4} are now determined by the set S(io) = {ffeZ4I Q eC (Y")}. (5 - 18) It is easy to use these ideas to "grow" an octagonal quasilattice on a computer. A program written in Pascal which runs on an IBM PC is included in the Appendix. The program tests all points of Z 4 inside a large hypercube to see if their projection falls into a particular shadow octagon. If the test 53 e + e4 e- + e:-+e el- e+ Figure 5-3. 64 3 4 Shadow of a four dimensional cube, obtained by applying the transverse projection operator Q to the unit cube consisting of points of the form r = - x~e2, whereO0 <x<1. i=l 54 _4 .,-I ff , '2 , -,',.4I I.'-' , ,:.--4tc >N i - .. -44, :: ,.p, :--d - -----. ,: ': "/ "" ': N c . *~-~-4-","-.t--? g, ......y..... .. 4 , ,p",&. , Figure 5-4. Growth of an octagonal quasilattice. There are 321 vertices in the bottom picture. The image of these vertices in shadow space is shown in the upper right. 55 is succsssful, it then draws the projection into physical space of the lines connecting this point to any nearest neighbors which also happen to be in the set S(io). A growing octagonal quasilattice is shown in Figure 5-4. The square and rhombic cells in this projection are the images of faces of 4-D hypercubes which intersect the projection plane. Also shown, in the upper right hand corner, are the projections onto the perpendicular shadow space of the quasilattice vertices. It can be shown using the methods of Reference 8 that the shadow octagon is filled with a uniform density of points in the limit of an infinite quasilattice. This feature can be used to determine the frequency and location of various patterns in the quasilattice s . One way an octoganal quasilattice could be used in a cellular automaton simulation is illustrated in Figure 5-5. In the absence of collisions, particles in one time step hop between adjacent cells connected by a common set of parallel bonds. Several such jagged particle trajectories are shown in the upper portion of Figure 5-5. Particles moving in this way are defined to have unit velocity in a direction normal to the bonds they are traversing. Binary collisions occur whenever two particles occupy the same cell, as illustrated in the lower portion of the figure. Wolfram has suggested using octagonal quasilattices to improve isotropyi, but instead of Figure 5-5 draws a picture like that shown in Figure 5-6. The straight lines shown here may be viewed as approximations to the jagged trajectories of Figure 5-5. We find our implementation of the quasilattice idea preferable because particle positions may be indexed using simple 4-D integer arithmetic, rather than solving for the intersections of a set of incommensurate parallel lines. Note also that there are a number of near "coincidences" in Figure 5-6, where three or more lines almost intersect in a point. Such potential ambiguities are resolved automatically in Figure 5-5. All cellular automaton models with only binary collisions have additional, unwanted conservation laws, in addition to conservation of particle number, energy, and momentum'. On an octagonal quasilattice, for example, the momentum contained in particles flowing along each of four directions (each direction corresponding to a pair of channels) is conserved. This is because a binary collision, like that shown in Figure 5-5, extracts a net zero momentum from one direction, and adds a net of zero momentum to another. On a triangular lattice, equilibration of momentum between three different channels is achieved by allowing triple collisions. 56 one Figure 5-5. Particle trajectories on an octagonal quasilattice. Particles travel in Octagonal bottom. the at of eight directions. A binary collision is shown figures such as those highlighted on the left side are sites of collisions that transfer momentum between channels. 57 Figure 5-6. Alternative representation of the momentum channels shown in Fig, 5-5. 58 Figure 5-7 illustrates one way to achieve a similar effect on an octagonal lattice. Particles are now lalowed to have velocities 0, 1, or v2-. Collisions which transfer momentum between directions occur whenever two particles moving with unit velocity at right angles to each other meet inside an octagon consisting of two squares and four rhombuses. Several such octagons are highlighted in Figure 5-5. Because these octagons have eight symmetrically distributed edges, a representative trajectory irom all four possible directions will pass through every octagon. The result of such a collision is defined to be a particle at rest somewhere inside the octagon, as well as a particle moving with velocity V/2 in the direction given by the vector sum of the two incident velocities. The inverse process occurs when a particle with velocity V2 enters an octagon containing a rest particle and produces a pair of particles with velocity 1. Note that energy (velocity squared) is conserved in both cases. Both v2 -velocity and unit-velocity particles also have the usual binary collisions indicated in Figure 5-7 as well. The eightfold symmetry of the octagonal quasilattice cellular automaton model insures the isotropy of all tensors with rank six or lower in a Chapman-Enskog expansion of the Navier-Stokes equations. The image of the bonds encompassing one of the octagons discussed above is an eight pointed star when projected into the shadow octagon (see Figure 5-8). By translating this star around so that it remains completely inside the shadow octagon, one obtains the set of all octagons which could be used as potential collision sites in physical space. Computer implementations of these are conveniently viewed as shuffling particles around the sites of a regular four-dimensional lattice. The test (514) can be used to confine the particles to the "physical" subset of 4-D lattice points. Particles travel on jagged paths in the 4-D space whose projections correspond to the trajectories in Figure 5-5. 5.3 Icosahedral Quasilattices It is straightforward to apply these ideas in three dimensions and obtain cellular automata with an overall icosahedral symmetry, thus insuring the isotropy of all tensors with rank four and lower. In analogy with Figure 5-6, Wolfram has considered cellular automata models obtained from the 59 00 Inverse: 0 /1 Figure 5-7. Different kinds of binary collisions on an octagonal quasilattice. 60 Figure 5-8. Image of an octagon, like those highlighted in Fig. 5-5, in shadow space. The star shaped object is the image of an octagon whose edges are traced out by starting at the origin and applying e I, e2 ,. e8 in succession. 61 intersections of sets of equally spaced planes such that particles move in directions given either by the vertices of a regular icosahedron, or by its dual, the dodecahedron.' Particles can then flow in one of twelve or twenty symmetrically arranged channels. In our view, it would be preferable to have particles move in directions piercing the midpoints of the bonds of an icosahedron, thus obtaining a thirty channel model. These 30 channels can be grouped into ten coplanar sets of six, which point to the vertices of a regular hexagon. It is then possible to equilibrate momenta across channels using triple collisions in much the same way as for the triangular lattice. Such simple triple collisions are not possible with the other two arrangements.' In any event, we think it may be better to obtain icosahedral quasilattices using the projection technique, rather than from the intersections of incommensurate planes, for the reasons discussed in the previous subsection. The projection technology is well-developed in this case, because it has been used to model recently discovered metallic alloys with an icosahedral symmetry.9 One now tries to obtain a lattice of points of the form 6 F= nFi (5-19) t=1 where six {e} now point to six of the twelve vertices of an icosahedron (the remaining six vertices are obtained by inversion). The allowed sextets of integers .n.,..., n 6 } are now obtained by projecting a six dimensional hypercubic lattice into three dimensions.' g' is now given by' ° The projection matrix into physical space 1 V I 1 1 1 1 -1 1 -1 1 1 1 1 1 1 v/ 5 -1 1 V/5 1 -11 1 V5 1 -1 -1 -1-1i 1 1 (5-20) 1 V/5 and points are projected only if their perpendicular projection (obtained from Q = 1 - P) falls within the three dimensional shadow of a 6D cube, which is the interior of a rhombic triacontahedron. In physical space, one obtains a tiling of space by the two rhombahedra shown in Figure 5-9. These two shapes are analogous to the square and rhombus which appear in Figure 5-5. To obtain a cellular automaton, one 62 Figure 5-9. Prolate and oblate rhombahedral tiles that comprise the icosahedral quasilattice. 63 Figure 5-10. Schematic representation of a triple collision in a plane normal to a three fold symmetry axis of triacontahedron. 65 allows particles to hop between centers of rhombahedra, across a set of rhombic faces with identical orientation. The set of jagged paths obtained in this way leads to thirty possible momentum channels. Particles suffering head on binary collisions within a particular cell exit via one of two sets of parallel faces different from those by which they entered. Just as one finds regular octagons in the 2-D quasilattice, we can find many regular rhombic triacontahedra in 3-D. These triacontahedra have thirty rhombic faces, corresponding to the thirty channels discussed above; the centers of the faces may be put into a one to one correspondence with the centers of the bonds of a regular icosahedron. Each triacontahedron is composed of ten large (prolate) and ten smaller (oblate) rhombahedra. A stereo view of a triacontahedron filled in this way is given in the paper by MacKay 11 Momentum conserving triple collisions can be implemented using these triacontahedra as discussed above: Normal to each of the ten three-fold symmetry axes of the tricontrahedron there are six coplanar momentum channels pointing to the vertices of a hexagon. Whenever three particles enter a triacontanedron as sketched in Figure 5-10, we have them backscatter into the directions from whence they came. Implementations are possible along the lines sketched for the octagonal quasilattice in Section 5.2. 64 APPENDIX TO SECTION 5 OCTAGONAL QUASILATTICE PROGRAM The follwoing program was written and run on an IBM personal computer equipped with a high resolution graphics card and an 8087 chip to speed up arithmetic operations. Less than seven minutes were required to draw the 321 vertex lattice shown in Figure 5-4. The program was written using the TURBO Pascal compiler developed for the IBM PC by Borland International, Scotts Valley, California. 66 type octagon.pas Appendix: Octagonal Quasilattice Program The following program was written and run on an IBM personal computer equipped with a high resolutior graphics card and an 8)87 Less than seven minutes were chip to speed LIP arithmetic operations. The program required to draw the 321 vertex lattice shown in Fig. 4. was written using the TURBO pascal compiler developed by the IBrM1 PC by Borland International, Scotts Valley, California. program octagon; e: array[I. .2,1..4) of real; ep: array[1..2,1..4] of real. of real; q: array[l..4,1..4] of real; p: arrayl1..4,1. .4 rs : array[l..4) of real% of real; array[l..4) ::: real; s, t,r,,ry,s:,y,dl,d,,d4 n: array[l..4) of real; m: array[l..4) of real; integer; :i ,syi ,count: i ,j,|,l ,uv~w~rx:i ryi const phi 1.414213562. r2 = r8 = 2.8G28427125; = 0.785981l3; dm= ().607-553 begin HiFes; HiResColor(15). 'd physical space and 2d shadow space3 'initialize basis vectors in for j :- 1 to 4 do begin e[.,j] := sn((j - O.5)*phi): ell,j] :cos((j - ('.5)*phi) := sin(Z.(*(j := cos(..C)*(j - i..5>*phi); ep[2,j] ep[l,jl end: Cinitiali:e projection matrices) := .).5; Qgl,2) := -l.,-ir8; ,11.13 q(2,23 := ().5, /8 = -. qL2,1] : : q[-7,1 qC 4 ,1] 0.0; q[3,2] 1.0/r8; q[ 4 ,] qgl,zJ : qE2,3] := q[7,7] qE4,3) - := -l.0/r8: 0.): :- (compute displacement vector" 0. 1=5; I=0.1t s -EZI = s*I.i t*0.5: ::-11) := s*.5 -s*0C.5 - t*0).5; xC,[4] := srZ/2.) .test all lattice sites near oriQin of octagonal latttice) CoLnt : -:be for i = -4 to 4 do begin for j -4 to 4 do begin -4 to 4 do begin : for -4 to 4 do begin fc,r 1 : nil) n[2] :i: for rslu] for end: rstu] end: := j; n[- : ; := I to 4 do begin := 0.C' I to 4 do begin v : := rs~u] + qCLI,vJ*n[v]; rsEuL rs[Lt] + X4[u] 67 1.0/re: =0.' q[3,4) := -1.')ir8; 0 .5. := 0).5; -1.0/rB ; q[4,4] t*rZ/2.0' - t*O.0 Z4 for nE4] :1: L :- * : 0.0; q~l,4] := 1.0/r8: q[2,4] inclusion in _d 9 1; d1 ri5Cl]*qCJ,1] rE-[.2I*qEZ,lI + rSE7]*q[E,1 + rs[4).q[4,1): rsLlJ~qC1,2)] rsC2*qZ,Z + +:.L, rsf4J*q[4.Z]; dl rLJQl. + rr-L::J~q(2,7 + rs 3'cL:[,:] + rsE4)*qr4,:);d4 rsIl*q1I,43 + rsC23*q(2-,4J + rs[t.]*q[7,41 + rs[43*qE4,43. If u(absdu'dm) and (abs(d2)-.dm) and (abs(d7,),.dm) and (abs(d4)-'Idm)) then begin cif site oi. then check ats four neighbors and draw line joining then? if these also check out; r.! e(1,1J*n[IJ + eLlZJ*nL)I + eLl.3)aInL]1 + ell,4J*n[4): ry eC2,I1*n1lJ + eC2.2J*n[EJ + e[Z,71n(.J +eE2,4J*n[4J; f:ep[1,13-i-111 + ep~l,Z2*rU[22 + ep[l.7l*mL)- +' eptl,4J*m(4]: Sy epL2,1J*nClJ + ep[:,2)*M(2J + ep[2,4J*mE43: + epE:,Zim[)l si=4e(:) + ro--Und t45.q-u*s'x). syi 50) roL'nd(16.0*sy); d2 cc-unt for w Count + 1; 1 to 4 do bean MEW) :=MEW] + 1; for u 1 to 4 do begin 0 . 0. rsfu] 1 to 4 do begin for v rstu3 :- rs~uJ + q~u,vJ*mCvI; ersd. N~J rscu.tl rs-CLI end; rs[4)*qE4,1J: rs113*q1I.l) + rsEZ)3*qEZ,13 + rs[73*qC7,1] dl rstl)*q(1,2J + rsl:2Ja*q[2,23 + rs[33*qE:-.] + rsE43*q(4,2): dZ s4*4,; 7- rs(1)*QE1.7) + rs[2.JaqC2,7) + rsC73)*qC7,7J + 4 rs[2l*q[,4J + rs[3*qE7,4) + rs(4J*q[4. 3% d4 :rs1lJ*qEI,4J If ttabs(di)h.dm? and (abs(dZ) din) and (abstd3)*:.dm) and (abs(d4),dn1) then begin s~:e[1.l3.m[IlJ +e(I.2*mE2J + e(l1*mn(,3 + eE1,43*m[4Js sy' eLZ,IJ*(mLl) + eE2.J3*M() * eE2,43*mt4J: + eE:.J*m3 - round (I(.-ry)=110) ryi Z40 + r cjund (25. ()*rx:) r:-1 110 - round(1C1sy aund(25.C)*sxd: syi 40 + s:1i : end; end: end. end; end; end; writeln~total end. 86 end; number of vertices is ,count:6); COMPARISON BETWEEN CONVENTIONAL KINETIC THEORY AND CELLULAR AUTOMATA DERIVATIONS OF HYDRODYNAMICS 6 6.1 Introduction There has been much work done on trying to solve partial differential equations by computing the evolution of properly chosen cellular automata (CA 1 ,4, 5 ). Most attention has been focused on the Navier-Stokes (N-S) equa- tion to which we restrict ourselves here. The essential idea of CA is the following. At any instant of time the state of the automata are described by giving the numbers of "particles" at each point of a fairly regular space-filling lattice. Each of the particles has one of a discrete set of velocities. At the succeeding time step the state is changed so that: 1. A particle with a given velocity moves to the nearest lattice point in the direction of the velocity. 2. Two or more particles can have "collided" and become particles moving with other of the discrete velocities. The claim has been made that with appropriate choices of lattices and collision rules, the behavior of the CA when sufficiently averaged represents solutions of the N-S equations. Here we wish to investigate this. Discussions of the process have been given elsewhere. 0 '4- 6 ) However, we thought it to be useful to give a derivation completely in parallel with a conventional derivation of the N-S equations. In particular we try to be as general as possible - for example not specifying lattice or dimension whenever possible. Some (important) technical points are ignored. Thus the question of how to choose parameters so that the isotropy of the N-S equations is obtained is not considered. We drop the restriction frequently made that at most one particle with a given velocity can be at a site. This restriction seems mostly 69 to simplify numerical computation. It is rather artificial and could be an additional barrier to obtaining a desired macroscopic equation. In the following sections we give a step by step derivation of the N-S equation (denoted as "molecules" in the text headings to follow) taken directly from reference( 2 ) . At each step we then obtain the corresponding equation for CA. Discussion of similarities and differences are given when appropriate. 6.2 6.2.1 Equations for One Particle Distribution Function Molecules In principle we start with a Hamiltonian describing all particles of the system. Then a distribution function f (L, ...... r , t) is introduced. Integrating the Liouville equation over all particle coordinates but one, we obtain an equation for the one particle distribution function (f(Z,, Z1 , t)), -L + V. Vf = J". (6-1) 5t J, the collision term, contains all reference to two and many body collisions. We have for simplicity omitted any external potentials. These are readily included. This equation is really no particular simplification because J, involves the two particle distribution function. The equation for this involves the three particle distribution for which we have an equation involving the four particle function and so on to N. The usual assumption is, following Boltzmann, that of molecular chaos. This is to the effect that the two particle function can be approximated as the product of one particle functions. This means that particles before and after collision are uncorrelated - which is reasonable if the correlation function falls off rapidly in space and time (for example, exponentially). Recently("3 ) this has been found not to be necessarily the case. Power law behavior has been found. Consequently, the molecular chaos assumption which we will make can only be assumed safe for sufficiently low 70 density. This does not mean that the N-S equations do not hold for dense fluids (e.g. water). Rather our derivation is then suspect. With the assumption we have J,, = Id3V,I dl g I(g, G)[ f'fi - f f I. et." and The integrand describes the collision of particles of velocity 1)/ 1 ducing or being produced by particles with velocity (6-2) i pro- 0, '01. We assume mo- mentum conservation (6-3) + and energy conservation Iv )2 -4)2 +1/2( i)2 . (6-4) 1/22 1/2. +1,2v,= 1/2(- (Here all molecules are assumed to have unit mass.) I(g, 0) is essentially the differential scattering cross-section and - g"=II = i1. (6-5) The conservation laws tell us that J, dy 6.2.2 = J, 2 J, d3v = 0. d (6-6) C.A Describing the state of the C.A. by a distribution function f.("-, t) giving the number of particles at --, t with velocity " , and replacing small time and space differences by derivatives, gives the Boltzmann-like equation 9f+ e -V f. = J. (6- 7) The le are the allowed velocities which we here require all to be of unit magnitude. The J. can involve two, three, four, .... products of the f's depending on the collision law chosen. To have an analogy with Equation (6-6) we must require that the particle number and momentum are conserved in the collison. Then we have F, Ee a Ja=O=a^,. J. (6-8) We always have one less such equation than for the true Boltzmann equation. In the C.A. models "energy" is trivially conserved. 71 6.3 Macroscopic Conservation Laws 6.3.1 Molecules Define for any 0( ) f O(O) f(v)d3 a f f(v) d3v Multiplying Equation (6-1) by 1, - and T respectively and integrating over V gives in view of Equation (6-6) On+ (nu) = 0, (6-9) Z(nui) + -9 n -- = 0, (6-10) ax, an I partial n 2 =0. + 2 dt dx, 1,ere the number density n = f fdAv 2 (6-11) 0' and Z is the average velocity, i.e. X (L,t)= ,-. (6-12) Tqie Equations (6-10) and (6-11) can be put in a more useful form by introd icing the deviation of - from its mean - -, (,-,t). (6- 13) '1 lien, for example, vij = uiu, + UUj. Equation (6-2) becomes d 0 a (nu,) + _-nuu = d ---- P (6-14) with Pj = n 7. 72 (6- 15) Using mass conservation Equation (6-10) simplifies to a Similiarly introducing U Equation (6-3) and simplifying using Equations (6-3) and (6-14) we get a U.V Q} }{ n{-+ with a + -xiq, = -P, jDj, 17) (6- n 7-5 Q=n -u2 Q " ,qi _ Z U~(6-18) 2 2 and au x,+ Dij = 1/2 ( 6.3.2 (6-19) C.A. Following the above procedure we multiply Equation (6-7) by 1 and and sum over a. We obtain a an and -(nui) + a n(- at (6-20) nui = 0 ++ji Za )i( a)j 0. ax, (6 -21) Here n=afa (6-22) and ,== (6-23) r Again introducing velocities relative to the mean U e (6-24) - and using Equation (6-18) to simplify Equation (6-19) we obtain n=u + at x P. (6-25) with n Tij , Pi, = P.= T.T.U =('a)i(E.)j. (Ti u 73 (6 - 26) 6.4 6.4.1 The Euler Equations Molecules We are interested in solutions slightly away from equilibrium. Then as a zeroth approximation we should use an f such that the collision integral is zero. This will be so if In f is a linear function of the conserved quantities. This leads to ) . (0 n 0o 2kT (2fkT)/2 e , (6-27) the Maxwell distribution. (Note: At this point one usually invokes the Htheorem to further justify our choice of ft. However, to our ki.owledge there is no general H-theorem for CA's). Notice that if Z, l, T are all space-time independent Equation (6-25) is indeed an exact solution of the Boltzmann equation. However, even if Z, I", T are -pace time functions we still have J,[f°] = 0. A reasonable lowest order approximation for our macroscopic equations is to use fo allowing T, 1, -,, to vary and then calculate the quantities occurring in Equations (6-13) and (6-16). One obtains the set 3 q, =0, Q = Pq = an -+-a a't n {- at+ p pbij, p = nkT a9nui =0 (6-28) ax, +'-Vu,} .V)(nT- 3 '2 ) -Vp = 0 (6-29) (6 - 30) which are the Euler equations with an adiabatic temperature law and the ideal gas equation of state. These are true for any distribution function f° which is even in the variable 0 - u (A, t). 74 6.4.2 C.A. Here again it is true that if In f is a linear function of the conserved constants, then (6-31) Ja[I 0 even if the coefficients of the linear function depend on " i t. We then assume as a lowest order approximation W e ' °=(6-32) b b e- Here then indeed n is the number density. w is a vector to be related to Specifically: E2e e .. ua' -. a (6-33) W- e a e - ~ '~° Using Equation (6-30) for the distribution function we obtain for the T in Equation (6-26) j =a (a e -u)j uL b e - ~ ~b ea (6-34) These "Euler" equations then differ significantly from those for molecules: 1. The parameter P. in the distribution is not the average velocity. 2. The pressure tensor now has a rather strange dependence on t.5 If however, It is small (i.e.lw « 1) we can expand the exponentials E ui a e C W (6-35) a bI and T -0 a 121 Ee (-a)i -) b7 75 (6 -36) Assuming enough symmetry that these sums over a are isotropic (see Sections 2 and 5) we find U r(6-37) and Pij = pbij with n n . (6-38) (6-39) (Here d is the number of spatial dimensions) i.e. we have the Euler equations at constant temperature kT = 1/d. The lesson to be learned at this stage is that the C.A. are describing Euler hydrodynamics at constant temperature provided I < 1. This restricts C.A. models to small Mach numbers. 6.5 The Chapman-Enskog Expansion 6.5.1 Molecules We have noted that the Maxwell-Boltzmann distribution function satisfies J"[fI = 0, (6 - 40) even if T are functions of ,-, t. However, in this case the left hand side of Equation (6-1) will not be zero. Consider the case where n, o, T are slowly varying on the spatial scale of the mean free path and time scale of the mean time between collisions. We proceed so. Let f = f0 [1 + 1. (6-41) Insert this in Equation (6-1). On the left hand side we keep only f* while on the right we keep only terms linear in 4. The result is Of* + .76 76 This is a linear integral equation for 0. Since we have the 5 conserved quantities, there are 5 zero eigenvalues of L, corresponding to eigenfunctions V2 (6-42) 1, For Equation (6-37) to have a solution the inhomogeneous terms on the left must be orthogonal to these. But these conditions are just our "Euler" equations. Hence to evaluate the left hand side of Equation (6-37) where n, T, and 8T and t we use the Euler equations to eliminate 2n depend on 0.%. 8n I o~t v . This results in the integral equation uiu 1/3 b kT (6-43) T axi U U, 1 - 2 [ 2fT 2] The solution will be unique if we demand that I 1 fO(v)d=0 For a general scattering cross-section we have no possibility of solving this analytically. However, we can see qualitatively what will result. Let Oi, Ai be the eigenfunction of L,. It can be readily shown that the Oi are orthogonal to each other with weight functions fo, i.e. we can take the Oi to satisfy J Oipif*dv = bij. (6 - 44) The first 5 of the Oi correspond to the eigenvalue zero. The Equation (6-39) is satisfied if we expand so that S= (6-45) _6 ai ib. Further we note that the Oi will be functions only of u - s and can be chosen to be even or odd in u Then the coefficients of the even fb nctions are determined by the terms Di in Equation (6-38) and the coefficients of the odd functions are determined by the terms Pij = nkT ij+c 3 1 (T)(Dij- qi = c2(T)0 77 IT~-. The net result is that IDkk3.ij), (6-46) Note: Here cl and c2 are independent of - . They are also independent of n. This last property results from the fact that we are considering only two body collisions. In Equation (6-38) there is one factor of n on the left and two on the right, therefore 1 (6-47) n The portions of Pij and qj which depend on 0 are then n independent. 6.5.2 C.A. To complete the macroscopic Equations (6-18) and (6-23) we need to compute To= (u')i (Ua)j (6-48) a (ua)i(vo)jfa = E afa For this we need fa to be obtained from Equation (6-7). conservation laws Ja[f*]= O Because of the (6-49) where (6-50) f,(°) = E w e ( a e - '' even when n and t (i.e. t) are functions of space and time. We assume these are slowly varying functions and try f. = fa[ €] (6-51) Inserting this in Equation (6-7), keeping only the first term on the left and terms linear in 0 on the right gives -fT+- C"V fa = L.[0] (6-52) which is an inhomogeneous linear equation for 4. The inhomogeneous terms are given by the derivatives of f*. The linear operator L. has zero eigenvalues corresponding to the number and momentum conservation laws. Thus for Equation (6-46) to have solutions we must require that a,- .Vf)=O 78 (6-53) For the solution to be unique we require 0 to be orthogonal to the eigenfunctions of zero eigenvalue. The Equation (6-47) are just our Euler equations. In Section 6.1 we used these to eliminate 2- and of Equation (6-4C. Unfortuatic from the analog ocuaeebutahrths f,, nately it is not the derivatives of "Owhich occur here but rather tose of' We thus obtain ° = (Z-,"V)f= ja _uz _ +b Vf U ±( _ u~ ),oj ',,, z -' u),'9 O9w*1 a a'u ) n (6-54) 1f [n1~ok all (9x i 49Xk where T (° ) is as given in Equation (6-31). However, this can be simplified. Thus, since ui = a -(-. e- a e E au,O9uk ik - -.-. (6-55) ' e = -ia''9uk - (a )) (e or Ou- = -(T°)l" =- (6-56) Then Equation (6-48) becomes Za -V) f. = {-V. Z +( a - (- + -' aui )k(T°), ( -7 (6-57) u)(TO) a TO) I,a- Finally we note that after some calculation we can show that 0 )-l a' T(O) - Pr. (7T = TXk kj '9u-m6-8 partalXk (6- 58) where I, ea (~ea -u)i - u 79 -- m m ,,m • nam MMnm MxMnn N N II 0 )k. (6-59) Thus our linear equation for 0 is V{ Tonkj]I +K -. ) (To)-' Oak ea - )- .m(T 0 )7,n (6-60) f The complicated dependence on u of this equation should be compared with the simple form of Equation (6-38). Indeed the T' depend on tand this is an implicit function of X. To see what is happening we look at this equation for small It. The lowest order terms in the { } of Equation { } (6-53) are: 1 d(=%),(a)k }d (6-61) 8T This should be compared to the terms in Equation (6-38) left when 2- = 0. We see this will lead to a pressure tensor of the form of Equation (6-42) (As explained above the ci will not now necessarily be independent of n.) If we look at second order terms we obtain d+2 d - + ( o)i (6-62) V')k "k Oxk Oui (6-63) The first two terms of this expression will give rise to terms which have n) analogue in the N-S equation. The last term gives rise to a modification of the convection term in the N-S equation. 6.6 Conclusion The class of Cellular Automata considered here can model low velocity (i.e. incompressible) Navier-Stokes flows, for which the Mach number is << 1. Sound waves can be modeled, though they require a sound velocity that is specified arbitrarily. When calculations are made for Mach numbers approaching unity, the results are questionable. Some non-linear partial 80 differential equations are being solved but they are not the Navier-Stokes equations. In addition, we note that for both the Navier-Stokes and cellular automata systems, the Chapman-Enskog expansion assumes that typical scales for spatial variation are long compared with the collision mean free path. For fluid models this condition is easily satisfied since the typical cell size is usually kept large compared to a mean free path. The situation is more difficult for cellular automata. Here the effective mean free path is generally a few times the distance between adjacent lattice points. Thus care must be taken that external sructures and boundaries introduced into the flow be already smoothed or rounded to the required spatial scales, since the physics of the lattice model alone will not guarantee slow enough variation of fluid quantities. This requirement for smooth variation of boundaries and other structures in the flow has some interesting consequences. Although cellular automata models may be useful in the study of boundary layers for which the boundary geometry is complex, as suggested in Section 4, the Chapman-Enskog expansion requires that boundary surfaces of a cellular automata model must vary slowly relative to the lattice spacing. They cannot be too jagged or bumpy. This raises interesting questions about recent two-dimensional simulations of flow past a plate, for example, because at the abrupt ends of the plate there is a region where these assumptions are violated, as illustrated in Figure 6-1. In a fan-like region emanating from the corners of the plate, the Navier-Stokes equations are not being well modeled. It would therefore be interesting to repeat the calculation using a finite thickness plate with rounded ends, to see whether the previous sharp edges affected the downstream vortex structure or the long-time behavior of the flow. 81 Sharp Edges, Corners Incoming Fluid Outgoing Fluid Flat - Plate Downstream Vortices Figure 6-1. Sharp edges in cellular automata computation of flow past a plate raise questions about validity of Chapman-Eriskog expansion near corners. 82 7 PACKING A PLANAR LATTICE Cellular automata have been proposed for problems whose initial data have a natural parameterization by a metrical surface, usually the Euclidian plane E2 . Such applications include the study of: global weather, vision, two-dimensional fluid flow. The physical realization of such C.A.'s suggests an optimization problem in discrete geometry. Let S. be the square of side = x in the usual planar lattice Z + Z and Cy the cube of side y in the cubical lattice Z 3 . Our problem is to represent the model S, in the computer CY so that: (1) model-adjacent nodes can exchange information quickly, and (2) a high density A = y 3 /x 2 is achieved. On S. we take the "city street" metric 11 xll = lxol+lxll. We imagine Cy to be composed of horizontal layers or leaves within which communication time is essentially distance (e.g., as defined on S.). Communication between distinct layers only occurs between vertically adjacent nodes at the surface of the cube. (We fix a constant c to be the "distance" between such pairs.) min This determines a metric on Cy, dist (p,q) = tn (# (horizontal edges of-i )+ c# (vertical edges of -t ) ) where -y is an edge path from p to q whose only vertical edges are on the cube's surface. Fabrication and heat dissipation problems make this model more realistic than a homogeneous cube. It can be proved [14] that any one to one map f : S. -+ Cy which fills the cube with a fixed density p > 0 must produce a distortion of distance (= communication time) which grows as c1/2 const. (p) x 1/16 . For p near zero the constant is at least ,6; for p near 1 the constant is > 1/10.) Furthermore, we explicitly define a map F: S, -+ C,, with density ; 1, 1/6power stretching, and small leading coefficient (stretchF - V-c'/2 x 1/6 ). The map F describes the best way to cut a two-dimensional data set apart and reassemble it on a stack of trays. Dicing into squares and stacking is not optimal since it concentrates all the stretching in the vertical direction (stretch ; x/ arises). The idea in constructing F is to share the necessary stretching equally (hence in the amount X 1/6 ) between hori7ontal and vertical stretching; these add - not multiply - to give the total stretching. For 83 convenience we fix c = 2 and assume x 1/6 is an integer; here is the definition of F: F-(each z-level set) is a xs/6 by x ' /2 rectangle R×. x1/2. (7 - 1) Define h(p,q) = (p(p.) [p,],x1/ 6 q + p-) where P = P.X 213 + P,0 < P1 213 _1 X. and p (po)CZ 2, the group of two elements. The nontrivial element is the permutation: 0 1 z2/ - 1 X21/1 1 X 21 2 0 and or(odd) is nontrivial, a (even) is trivial. The map h is a bijection which stretches by no more than a factor of x '/6 . Now set: F(u, v) = (h(r'(u.)[u1l, r"(v ,)[vi1), u0 + 0 X1/ 6 V°), where u= uoX5/6 + u,0 u, X5/6 vI X - 1 and V = Vor 1/ 2 + vI,0 1 - 1 a' (odd) and a" (odd) are permutations which (resp.) reverse the ordered sets: As a final note, if Cy is given the more homogeneous metric 1x = I XI I + IX21 + IX2 I+ Ix31 then packings with p ; 1 and stretching = 3 (independent of x) can be constructed. I thank J. Komlos for this sharp bound on stretching. 84 DISTRIBUTION LIST Dr. Henry D.I. Abarbanel Institute for Nonlinear Science Mail Code R002/Building CMRR/Room 115 University of California/San Diego La Jolla, CA 92093-0402 Dr. Herbert L. Buchanan, III Director DARPA/DSO 1400 Wilson Boulevard Arlington, VA 22209-2308 Dr. Donald M. Austin Scientific Computing Staff Office of Energy Research U.S. Department of Energy ER-7, GTN Washington, DC 20545 Dr. Curtis G. Callan, Jr. Physics Department P.O. Box 708 Princeton University Princeton, NJ 08544 The Honorable John A. Betti Undersecretary of Defense for Acquisition The Pentagon, Room 3E933 Washington, DC 20301-3000 Dr. Kenneth M. Case Institute for Nonlinear Science Mail Code R-002 University of California/San Diego San Diego, CA 92093-0402 Dr. Arthur E. Bisson Technical Director of Submarine and SSBN Security Program Department of the Navy, OP-02T The Pentagon, Room 4D534 Washington, DC 20350-2000 Dr. Ferdinand N. Cirillo, Jr. Central Intelligence Agency Washington, DC 20505 Ambassador Henry F. Cooper Director/SDIO-D Strategic Defense Initiative Organization Room 1E1081 The Pentagon Wasington, DC 20301-7 100 Mr. Edward C. Brady Sr. Vice President and General Manager The MITRE Corporation Mail Stop Z605 7525 Colshire Drive McLean, VA 22102 Mr. John Darrah Senior Scientist and Technical Advisor HQAF SPACOM/CN Peterson AFB, CO 80914-5001 Mr. Edward Brown Assistant Director on Nuclear Monitoring DARPA/PM 1400 Wilson Boulevard Arlington, VA 22209-2308 85 DISTRIBUTION LIST Dr. Alvin M. Despain Electrical Engineering Systems SAL-318 University nf Southern California Los Angeles, CA 90089-0781 Dr. Michael H. Freedman Department of Mathematics C-012 University of California/San Diego La Jolla, CA 92093-0112 DTIC [2] Defense Technical Information Center Dr. Richard Gajewski Director, Division of Advanced Energy Cameron Station Alexandria, VA 22314 Projects ER-16 U.S. Department of Energy Washington, DC 20545 Professor Freeman J. Dyson Institute for Advanced Study Dr. Larry Gershwin Central Intelligence Agency Washington, DC 20505 Olden Lane Princeton, NJ 08540 Maj Gen Robert D. Eaglet Assistant Deputy SAF/AQ The Pentagon, Room 4E969 Washington, DC 20330-1000 Dr. Fred M. Glaser Office of Technical Coordination U.S. Department of Energy FE-14/GTN Washington, DC 20545 Mr. John N. Entzminger Director DARPA/ITO 1400 Wilson Boulevard Arlington, VA 22209-2308 Dr. S. William Gouse Sr. Vice President and General Manager The MITRE Corporation Mail Stop Z605 Dr. Robert Foord [2] Central Intelligence Agency Washington, DC 20505 7525 Colshire Drive McLean, VA 22102 Dr. David A. Hammer Laboratory of Plasma Studies 369 Upson Hall Cornell University Ithaca, NY 14853 86 DISTRIBUTION LIST LTGEN Robert D. Hammond Mr. Alfred Lieberman Commander and Program Executive Officer ACDA/OA U.S. Army / CSSD-ZA Strategic Defense Command P.O. Box 15280 Arlington, VA 22215-0150 Room 5726 State 320 21st Street N.W. Washington, DC 20451 Mr. Robert Madden [2] Department of Defense National Security Agency ATIN: R-9 (Mr. Madden) Ft. George G. Meade, MD 20755-6000 Mr. Thomas H. Handel Office of Naval Intelligence The Pentagon Room 5D662 Washington, DC 20350-2000 Mr. Charles R. Mandelbaum U.S. Department of Energy Code ER-32 Mail Stop: G-236 Washington, DC 20545 Mr. Joe Harrison Central Intelligence Agency P.O. Box 1925 Room GV 1710 NHB Washington, DC 20505 Mr. Arthur F. Manfredi, Jr. [10] OSWR Central Intelligence Agency Washington, DC 20505 Dr. Robert G. Henderson Director JASON Program Office The MITRE Corporation 7525 Colshire Drive, Z561 McLean, VA 22102 Dr. Oscar P. Manley Office of Basic Energy Research U.S. Department of Energy Code ER- 15/GTN Washington, DC 20545 JASON Library [5] The MITRE Corporation Mail Stop: W002 7525 Colshire Drive McLean, VA 22102 Mr. Joe Martin Director Naval Warfare and Mobility Office of Tactical Warfare Programs The Pentagon Washington, DC 20301 Dr. O'D'ean P. Judd Lc , Alamos National ' ab Mail Stop A- 110 Los Alamos, NM 87545 87 DISTRIBUTION LIST Dr. Claire E. Max Inst. of Geophysics & Planetary Physics Lawrence Livermore Nail Lab L-413 P.O. Box 808 Livermore, CA 94550 Dr. Peter G. Pappas Chief Scientist U.S. Army Strategic Defense Command P.O. Box 15280 Arlington, VA 22215-0280 MGEN Thomas S. Moorman, Jr. Director of Space and SDI Programs MAJ Donald R. Ponikvar Deputy for Division Engineering Systems ODDR&E/DS Code SAF/AQS The Pentagon Washington, DC 20330-1000 The Pentagon Room 3D 136 Washington, DC 20301-3090 Dr. Julian C. Nall Institute for Defense Analyses 1801 North Beauregard Street Alexandria, VA 22311 Mr. John Rausch [21 NAVOPINTCEN Detachment, Suitland 4301 Suitland Road Washington, DC 20390 Dr. David R. Nelson Records Resources Department of Physics The MITRE Corporation Harvard University Mailstop: Wl15 Cambridge, MA 02138 7525 Colshire Drive McLean, VA 22102 Dr. Robert L. Norwood [2] Director for Space and Strategic Systems Office of the Assistant Secretary of the Army The Pentagon, Room 3E474 Washington, DC 20310-0103 Dr. Victor H. Reis Acting Director DARPA 1400 Wilson Boulevard Arlington, VA 22209-2308 Mr. Gordon Oehler Central Intelligence Agency Dr. Oscar S. Rothaus Math Department Washington, DC 20505 Cornell University Ithaca, NY 14853 88 DISTRJBUTION LIST Dr. Fred E. Saalfeld Mr. Richard Vitali Director Office of Naval Research 800 North Quincy Street Arlington, VA 22217-5000 Director of Corporate Laboratory U.S. Army Laboratory Command 2800 Powder Mill Road Adelphi, MD 20783-1145 Dr. Philip A. Selwyn Dr. Edward C. Whitman [2] Director Office of Naval Technology Duputy Assistance Secretary of the Navy C31 Electronic Warfare & Space Room 907 Department of the Navy 800 North Quincy Street Arlington, VA 22217-5000 The Pentagon, 4D745 Washington, DC 20350-5000 Dr. Donald K. Stevens Associate Director for Basic Energy Sciences,ER- 10 U.S. Department of Energy Office of Energy Research, GTN/Room J304 Washington, DC 20545 RADM Ray Witter Director - Undersoa Warfare Space and Naval Warfare Systems Command Code: PD-80 Department of the Navy Washington, DC 20363-5 100 Superintendent Code 1424 Attn: Documents Librarian Naval Postgraduate School Monterey, CA 93943 ADM Daniel J. Wolkensdorfer Director DASWD (OASN/RD&A) The Pentagon Room 5C676 Washington, DC 20350-1000 Dr. George W. Ullrich [3] Deputy Director Defense Nuclear Agency 6801 Telegraph Road Alexandria, VA 22310 Dr. Linda Zall Central Intelligence Agency Washington, DC 20505 Ms. Michelle Van Cleave Assistant Director for National Security Affairs Mr. Charles A. Zraket President and Chief Executive Officer The MITRE Corporation Mail Stop A265 Office of Science and Technology Plicy New Executive Office Building 17th and Pennsylvania Avenue Washington, DC 20506 Burlington Road Bedford, MA 01730 89

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Download PDF

advertising