  The program described here is based on Chapter 38 of GPU Gems: Programming Techniques, Tips, and Tricks for Real-Time Graphics. The chapter is available in pdf format as a book excerpt (click here to download). Other sample chapters can be viewed here.

The algorithm depends on the GPU to run smoothly in real-time, but at low resolution, the CPU can handle it... well, almost. I wanted to run the algorithm at a fixed frame rate regardless of the speed of your computer; however, I found that even on a dual-core 2.8 GHz Pentium, I could only get something like 10-15 frames/sec at 80×40 resolution. I ended up removing the constant frame rate code. Instead, it pushes your CPU to the limit. If you have a better processor, it will run faster.

Chapter 38 can get a tad confusing especially if you’re rusty in ordinary differential equations and linear algebra. I’ll try to fill in some of the blanks.

The algorithm simulates an incompressible, homogenous fluid. The fluid acts something like a large container full marbles. As the marbles flow around, they individually exert forces on their neighbors. The average force applied across the surface of a marble is its pressure, a scalar value. The marbles don’t deform regardless of the pressure. They’re not like rubber tennis balls.

The algorithm simulates such a fluid by iteratively updating a set of 2-dimensional fields, which are conceptually like 2D arrays. However, the elements of the array may not be scalars. For instance, each element of the velocity vector field has vx and vy components. To store those values, I opted for a 3D array, where the third dimension holds 2 values. The pressure field is a 2D array of scalars. Dye concentration can also be represented as a 2D array of scalars, but I decided to store concentrations for red, green and blue, which required a 3D array. My code also allocates a 2D array for divergence (a scalar), which is used in the computation of the pressure field.

The `UpdateModel()` method in my source breaks down the generation of the next frame into the series of steps described in the chapter:

2. Diffuse the velocity field
3. Add forces to the velocity field
4. Compute pressure field
5. Subtract the gradient of the pressure field from the velocity field
6. Inject dye where the user is holding down the mouse
7. Advect the dye concentration using the velocity field

Advection is the simplest to understand. As explained in the chapter, each point of the field vacuums in stuff based on the velocity. Dye is carried by the fluid via advection (step 7). In a sense, the fluid is oblivious of the dye since it doesn’t affect the velocity field. The separate components of my dye (red, green and blue) are advected individually. The velocity field is itself advected (step 1), which is an odd concept since unlike dye concentration, vacuuming in velocity vectors is more difficult to relate to. But, just as dye advection is applied to each color component, velocity advection is simply applied to each velocity vector component. Essentially, the fluid particles that were pulled in maintain their speed.

Advection relies on linear interpolation. Linear interpolation in 1-dimension is straightforward. For 2D interpolation, consider this cell:

```  |     |
--A-----B--
|     |
|     |
--C-----D--
|     |
```

Each element of the 2D array represents the upper-left corner of a grid cell. Like the pixels on a computer screen, the y-axis increase downwards and the x-axis increases rightward. Each cell is a unit square (the dx and dy in the finite difference forms of the equations both equal 1). To find the value of a point within the square, my code uses 1-dimensional linear interpolation 3 times. It uses the x-coordinate twice to interpolate along the AB line and then along the CD line yielding new values E and F below:

```  |     |
--A---E-B--
|   | |
|   | |
--C---F-D--
|     |
```

Then it uses the y-coordinate to interpolate along the EF line. A, B, C and D are scalars. To interpolate velocity vectors, this must be applied to the 2 components separately. For dye concentration, it must be applied once for each color.

Diffusion is a dissipation of velocity based on the viscosity of the fluid. It’s computed using the Jacobi method. The process considers the field as one giant matrix and applies an operation on every element. This is primary reason the algorithm runs so slowly on a CPU. The field b is the velocity field. Since dx and dy equal 1 in the finite difference forms, alpha reduces to `1 / (VISCOSITY * DT)` and beta is 4 + alpha. `DT` is the time step. `VISCOSITY` is a constant that describes the thickness of the fluid. Since the resolution is so low, I choose a very low viscosity value as to induce vortices in such a tiny region. The Jacobi method is independently applied to each component of velocity. Though in my code, I decided to do it in a single (ugly) loop. Note, when using the Jacobi method, a temporary array is required. Each iteration produces a new matrix from the current matrix. Updating the source matrix mid-iteration would erroneously affect successively computed values.

If the user is dragging the with the mouse, a force is applied to that point of the velocity field relative to the size of the last mouse movement.

```F = m*a = m*dv/dt
dv = F*dt/m
```

I defined a mass for the mouse cursor.

To compute the pressure field, the code first computes the divergence of the velocity field. Divergence is a scalar that represents how much stuff is entering/leaving a region. Next, the Jacobi method is applied again using divergence as the field b. Alpha is -1 and and beta is 4.

Finally, the velocity field is modified by subtracting the gradient of the pressure field. Gradient is a vector quantity that represents the rate of change of pressure in the x and y directions.

The only step yet mentioned is step 6, where dye is added (maxing out the dye concentration array) around the point that the user is pressing with the mouse cursor. Each click results in a different color. Hence, dragging results in injecting dye and applying force to the vector field.

As mentioned above, this algorithm is not really practical in real-time for a CPU. Though, it was actually used in this bizarre entry to the Java 4K 2006 competition:

http://www.xmunkki.org/wiki/doku.php?id=projects:flow4k