Nintendo Tetris AI Revisited

About

Previously, I programmed a plugin that procedurally propels the perpetual procession of plummeting pieces presented to the player in the puzzle pastime Tetris. But, my efforts had some shortcomings:

The bot disables gravity, enabling it to perform slides and spins beyond what is possible in the lowest level of Nintendo Tetris. It never moves pieces upward, but the only way that pieces move down is via controlled soft drops. As such, it plays in a theoretical, idealized world. Or, to be less kind, it cheats.
The bot is risk averse. Its primary objective is long-term survival as opposed to striving for a high score. It shuns the point-valuable Doubles, Triples and Tetrises because they are the products of letting the stack grow, an unnecessary gamble that increases the odds of a top out. Consequently, maxing out the score usually does not happen until level 90. But, in the real game where gravity exists, most players cannot endure level 29 or above due to the extremely high drop speed.
The bot’s analytical capabilities are based on a weighted sum of various influential factors. But, the weights were selected randomly. It is only by complete happenstance that it performs well. This is further evidence that Tetris without gravity is not much of a challenge at all.

This article describes my upgraded bot that plays Nintendo Tetris without disabling gravity. It assesses and manages risk while aggressively aiming for max outs under high drop speed.

Videos

Watch the bot max out Nintendo Tetris starting from level 19 in all the videos below.

Download

TetrisAI_2018-01-28.zip

The .zip contains:

src - The source tree.
TetrisAI.jar - The compiled binary.
lgpl-2.1.txt - The free software license.

Run

Prerequisites

Nintaco - the NES / Famicom emulator.
Tetris (U) [!].nes - the Nintendo Tetris ROM.

Launch the Plugin

Start Nintaco and open Tetris (U) [!].nes.
Extract TetrisAI.jar from the downloaded .zip.
Open the Run Program window by selecting Tools | Run Program...
Enter the path to the file in the JAR name field or browse to it using the Find JAR... button.
Press Load JAR to load it.
Press Run.
The plugin automatically skips the copyright and title screens, transporting you directly to the GAME TYPE and MUSIC TYPE menu screen. Using the D-pad (the arrow keys in the default button mapping), select A-TYPE and any music type of your choice. Then, press Start (Enter) to advance to the next menu screen.
On the A-TYPE menu screen, use the D-pad (the arrow keys) to select LEVEL 9. Finally, hold down gamepad button A and press Start (hold down keyboard key X and press Enter) to begin level 19 and to pass control to the bot.

Note, the bot is designed for level 19 and above only. It is not able to control the pieces in the lower levels.

Set the Speed

To watch it play faster, select Machine | Speed | Max.

Details

Plateaus

Below level 10, the drop speed of each level is slightly faster than the one before it. But, for 10 and above, there are several plateaus where the speed remains constant across multiple levels. This is a consequence of how the drop mechanism works. The speed is represented as frames per drop, which is an integer value. That left few options for the higher levels: 10–12, 13–15, 16–18, 19–28 and 29+ are 5, 4, 3, 2, and 1 frames/drop, respectively.
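The plateaus listed above can be written as a small lookup. This is a minimal sketch, not code from the plugin: the class and method names are made up for illustration, and levels below 10 are left out because their exact values are not given here.

```java
/** Sketch: drop speed for levels 10 and above, using only the plateaus listed in the text. */
final class DropSpeed {
    private DropSpeed() {}

    /** Returns the number of frames between automatic drops for the given level (10 or higher). */
    static int framesPerDrop(int level) {
        if (level < 10) {
            // Hypothetical guard: the article does not list values for the lower levels.
            throw new IllegalArgumentException("levels below 10 are not covered by this sketch");
        }
        if (level <= 12) return 5;
        if (level <= 15) return 4;
        if (level <= 18) return 3;
        if (level <= 28) return 2;
        return 1; // level 29 and above
    }
}
```

Because the value is an integer number of frames, there is nothing between 2 and 1 frames/drop, which is why the jump from level 28 to level 29 doubles the speed.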

The bot was designed to tackle the 19–28 plateau only. On even frames, it asserts either Left, Right, A, B or nothing on the gamepad. And, on odd frames, it lets the automatic drop happen without asserting any of the buttons. The game does not seem to accept horizontal movement that coincides with rotation; so, each button is asserted independently on an even frame.

Unlike the masters playing on the higher levels, the bot does not take advantage of the Delayed Auto Shift (DAS), a.k.a. autorepeat, and the associated techniques to keep it charged. It’s more like Thor Aackerlund’s vibrating thumb technique. But, it pushes the vibration to the theoretical maximum that the game can handle.

Players are rewarded with 40, 100, 300 and 1200 points for Single, Double, Triple and Tetris clears, respectively. And, the point values are multiplied by the level number plus 1. In other words, to achieve a high score, the player should strive for the maximum number of Tetrises while playing at the highest levels for as long as possible.
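The scoring rule above is simple enough to state as code. The following is an illustrative sketch only; the class and method names are assumptions and do not come from the plugin source.

```java
/** Sketch: points awarded for a line clear, per the rule described in the text. */
final class LineClearScore {
    // Base points for clearing 0, 1, 2, 3 or 4 lines at once.
    private static final int[] BASE_POINTS = {0, 40, 100, 300, 1200};

    private LineClearScore() {}

    /** Returns base points multiplied by (level + 1). */
    static int points(int linesCleared, int level) {
        if (linesCleared < 0 || linesCleared > 4) {
            throw new IllegalArgumentException("a piece can clear 0 to 4 lines");
        }
        return BASE_POINTS[linesCleared] * (level + 1);
    }
}
```

For example, points(4, 19) is 1200 × 20 = 24,000, whereas four Singles at the same level add up to only 4 × 40 × 20 = 3,200.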

19 happens to be the highest starting level, enabling the bot to jump right into the 19–28 plateau. But, due to a bug in the level counting mechanism, the game will advance to level 20 on 140 line clears, instead of the expected 200. From there, it will advance one level every 10 lines. And, when it reaches 230, the bot will graduate from the plateau and quickly succumb. In summary, it needs to achieve as many Tetrises as possible before clearing 230 lines.
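The level progression for a game started on level 19 can be sketched as follows. This only encodes the behavior described above (level 20 at 140 lines, then one level per 10 lines); the names are hypothetical and the method is not meant to model any other starting level.

```java
/** Sketch: current level as a function of total lines cleared, for a level 19 start only. */
final class Level19Progression {
    private Level19Progression() {}

    static int levelAfter(int totalLines) {
        if (totalLines < 0) {
            throw new IllegalArgumentException("line count cannot be negative");
        }
        if (totalLines < 140) {
            return 19; // still on the starting level
        }
        // Level 20 begins at 140 lines; every further 10 lines adds one level.
        return 20 + (totalLines - 140) / 10;
    }
}
```

With this rule, levelAfter(229) is 28 and levelAfter(230) is 29, which marks the end of the plateau.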

Soft Drops

Soft drops can also potentially increase the score. To gain points, the piece needs to be soft dropped all the way down until it locks into the playfield. Any brief soft drop that might occur along the way in positioning the piece will not contribute to the score. If successful, the player will gain 1 point for every row crossed during the soft drop. And, the resultant value is not multiplied by the level number even if the soft drop clears lines.

Soft dropping only has a marginal effect on the overall score. Nonetheless, whenever possible, the bot will complete a placement by asserting Down on the gamepad to gain those few extra points. In rare cases, it might mean the difference between a very high score and a max out.

AI Algorithm

When a piece spawns, the bot examines every possible placement of the current piece and the next piece. A valid placement is a position in which a piece is supported either by solid cells or by the playfield floor. It can be reached from the spawn location through a sequence of horizontal movements, rotations and drops. The valid placements and their pathway sequences are found using breadth-first search (see Searcher).
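To make the search concrete, here is a heavily simplified breadth-first search over piece poses. It is a sketch under stated assumptions and not the plugin's Searcher: the board is a boolean grid with row 0 at the top, a piece is a hypothetical list of cell offsets per orientation, every pose must lie fully inside the grid, and the timing constraint imposed by gravity (one automatic drop every other frame) is ignored.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch: BFS over (x, y, orientation) poses to find every supported resting pose. */
final class PlacementSearch {
    private PlacementSearch() {}

    /**
     * board[y][x] is true for solid cells. orientations[r] lists {dx, dy} offsets of the
     * piece cells in orientation r. Returns each reachable pose {x, y, r} that cannot drop.
     */
    static List<int[]> findPlacements(boolean[][] board, int[][][] orientations,
                                      int spawnX, int spawnY) {
        List<int[]> placements = new ArrayList<>();
        if (!fits(board, orientations[0], spawnX, spawnY)) {
            return placements; // spawn is blocked
        }
        int n = orientations.length;
        Set<List<Integer>> visited = new HashSet<>();
        Deque<int[]> queue = new ArrayDeque<>();
        visited.add(Arrays.asList(spawnX, spawnY, 0));
        queue.add(new int[] {spawnX, spawnY, 0});
        while (!queue.isEmpty()) {
            int[] p = queue.remove();
            int x = p[0], y = p[1], r = p[2];
            if (!fits(board, orientations[r], x, y + 1)) {
                placements.add(p); // supported by solid cells or by the floor
            }
            // Neighbors: left, right, down, rotate one way, rotate the other way.
            int[][] next = {
                {x - 1, y, r}, {x + 1, y, r}, {x, y + 1, r},
                {x, y, (r + 1) % n}, {x, y, (r + n - 1) % n}
            };
            for (int[] q : next) {
                if (fits(board, orientations[q[2]], q[0], q[1])
                        && visited.add(Arrays.asList(q[0], q[1], q[2]))) {
                    queue.add(q);
                }
            }
        }
        return placements;
    }

    /** True if every cell of the piece is inside the grid and on an empty cell. */
    private static boolean fits(boolean[][] board, int[][] cells, int x, int y) {
        for (int[] c : cells) {
            int cx = x + c[0], cy = y + c[1];
            if (cx < 0 || cx >= board[0].length || cy < 0 || cy >= board.length
                    || board[cy][cx]) {
                return false;
            }
        }
        return true;
    }
}
```

Recording the parent of each pose when it is first enqueued would yield the pathway sequence as well; that bookkeeping is omitted here to keep the sketch short.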

Placing a piece into the playfield has consequences: 4 empty cells become solid cells and any completed lines will be cleared, dropping the rows above. For each valid placement of the current piece and its associated playfield consequences, the bot tries each valid placement of the next piece, evaluating the combined consequences. This chain of searches is represented by SearchChain.

Each of the combined consequences is fed into an evaluation function, which scores the contents of the playfield. The combo with the lowest score wins out and the current piece is placed accordingly. The results of the search chain only affect the current piece. When the next piece spawns, it is evaluated with the next-next piece and so on.
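The two-piece lookahead can be sketched generically. The code below is an illustration and not the plugin's SearchChain: the board type B and the three function parameters are hypothetical stand-ins for "all playfields reachable by placing the current piece", "all playfields reachable by placing the next piece" and the evaluation function.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.ToDoubleFunction;

/** Sketch: choose the current-piece placement whose best follow-up scores lowest. */
final class TwoPieceSearch {
    private TwoPieceSearch() {}

    /**
     * Returns the index (into placeCurrent's result) of the winning placement of the
     * current piece, or -1 if no combination of placements exists.
     */
    static <B> int bestFirstPlacement(B board,
                                      Function<B, List<B>> placeCurrent,
                                      Function<B, List<B>> placeNext,
                                      ToDoubleFunction<B> evaluate) {
        int bestIndex = -1;
        double bestScore = Double.POSITIVE_INFINITY;
        List<B> afterCurrent = placeCurrent.apply(board);
        for (int i = 0; i < afterCurrent.size(); i++) {
            for (B afterNext : placeNext.apply(afterCurrent.get(i))) {
                double score = evaluate.applyAsDouble(afterNext);
                if (score < bestScore) { // the lowest score wins
                    bestScore = score;
                    bestIndex = i;
                }
            }
        }
        return bestIndex;
    }
}
```

Only the index for the current piece is returned, mirroring the point that the placement of the next piece is reconsidered from scratch once it spawns.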

Evaluation Function

The evaluation function is a weighted sum of the following factors:

Total Lines Cleared – the number of lines cleared as a consequence of introducing both Tetriminos.
Total Lock Height – the sum of the heights above the playfield floor where the Tetriminos locked. The lock height of an individual piece is the vertical distance it would drop if all of the other solid squares in the playfield were removed and the orientation of the piece was maintained.
Total Well Cells – the number of cells within the wells. A well cell is an empty cell located above all the solid cells within its column such that its left and right neighbors are both solid cells; the playfield walls are treated as solid cells in this determination. The idea is that a well is a structure open at the top, sealed at the bottom and surrounded by walls on both sides. The possibility of intermittent gaps in the well walls means that well cells do not necessarily appear in a contiguous stack within a column.
Total Deep Wells – the number of wells containing 3 or more well cells.
Total Column Holes – the number of empty cells with solid cells immediately above them. The playfield floor is not compared to the cell directly above it. Empty columns contain no holes.
Total Weighted Column Holes – the sum of the column hole row indices. In this case, the rows are indexed from top to bottom starting with 1. The idea is to penalize holes located deeper in the stack more than those closer to the surface, since fewer lines need to be cleared to reach and fill the shallower ones.
Total Column Hole Depths – the sum of the vertical distances between each hole and the top of the column in which it resides. The top is the uppermost solid cell within a column and the depth of the hole is the difference between the row index of the hole and the row index of the top.
Min Column Hole Depth – the smallest column hole depth. If there are no holes, this defaults to the playfield height (20).
Max Column Hole Depth – the largest column hole depth. If there are no holes, this defaults to 0.
Total Column Transitions – the number of empty cells adjacent to a solid cell (or vice versa) within the same column. The changeover from the highest solid block in the column to the empty space above it is not considered a transition. Similarly, the playfield floor is not compared to the cell directly above it. As a result, a completely empty column has no transitions.
Total Row Transitions – the number of empty cells adjacent to a solid cell (or vice versa) within the same row. Empty cells adjoining playfield walls are considered transitions. The total is computed across all rows in the playfield. However, rows that are completely empty do not contribute to the sum.
Total Column Heights – the sum of the vertical distances between the top of each column and the playfield floor. A column containing only 1 solid cell has a height of 1 whereas a completely empty column has a height of 0.
Pile Height – the largest column height.
Column Height Spread – the height difference between the tallest and the shortest columns.
Total Solid Cells – the number of solid cells in the entire playfield.
Total Weighted Solid Cells – the sum of the heights of all the solid cells. The row immediately above the playfield floor has a height of 1.
Column Height Variance – the sum of the absolute differences between heights of all adjacent columns.
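A few of the factors above are easy to compute directly from a grid. The sketch below covers only Total Column Heights (via per-column heights), Pile Height, Total Column Holes and Column Height Variance. It assumes a boolean grid with row 0 at the top; the class and method names are illustrative and not taken from the plugin.

```java
/** Sketch: a handful of the evaluation factors, computed from board[y][x] (true = solid). */
final class PlayfieldFeatures {
    private PlayfieldFeatures() {}

    /** Height of each column: distance from its uppermost solid cell to the floor, 0 if empty. */
    static int[] columnHeights(boolean[][] board) {
        int rows = board.length, cols = board[0].length;
        int[] heights = new int[cols];
        for (int x = 0; x < cols; x++) {
            for (int y = 0; y < rows; y++) {
                if (board[y][x]) {
                    heights[x] = rows - y;
                    break;
                }
            }
        }
        return heights;
    }

    /** Pile Height: the largest column height. */
    static int pileHeight(int[] heights) {
        int max = 0;
        for (int h : heights) max = Math.max(max, h);
        return max;
    }

    /** Total Column Holes: empty cells with a solid cell immediately above them. */
    static int totalColumnHoles(boolean[][] board) {
        int holes = 0;
        for (int y = 1; y < board.length; y++) {
            for (int x = 0; x < board[0].length; x++) {
                if (!board[y][x] && board[y - 1][x]) holes++;
            }
        }
        return holes;
    }

    /** Column Height Variance: sum of absolute height differences between adjacent columns. */
    static int columnHeightVariance(int[] heights) {
        int sum = 0;
        for (int x = 1; x < heights.length; x++) {
            sum += Math.abs(heights[x] - heights[x - 1]);
        }
        return sum;
    }
}
```

Total Column Heights is then just the sum of the array returned by columnHeights, and Column Height Spread is the difference between its largest and smallest entries.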

Machine Learning

The Particle Swarm Optimization (PSO) variant described in reference [1] was used to find the weights for the evaluation function. The suggested inertia and acceleration coefficients were applied to achieve good convergent behavior. And, maximum particle step sizes were established by clamping their velocity magnitudes.
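For readers unfamiliar with PSO, a single particle update in the standard inertia-weight formulation looks roughly like the sketch below. The coefficient values are ones commonly quoted in the PSO literature for convergent behavior and are an assumption here, as is the choice to clamp the Euclidean norm of the velocity; the exact values and clamping rule used for the bot are not stated in this article.

```java
import java.util.Random;

/** Sketch: one standard PSO update for a single particle, with velocity clamping. */
final class PsoStep {
    static final double INERTIA = 0.7298;       // assumed value, not taken from the bot
    static final double ACCELERATION = 1.49618; // assumed value, used for both attractors

    private PsoStep() {}

    /** Updates velocity and position in place. */
    static void update(double[] position, double[] velocity,
                       double[] personalBest, double[] globalBest,
                       double maxVelocity, Random random) {
        double squaredNorm = 0.0;
        for (int i = 0; i < position.length; i++) {
            double r1 = random.nextDouble();
            double r2 = random.nextDouble();
            velocity[i] = INERTIA * velocity[i]
                    + ACCELERATION * r1 * (personalBest[i] - position[i])
                    + ACCELERATION * r2 * (globalBest[i] - position[i]);
            squaredNorm += velocity[i] * velocity[i];
        }
        // Clamp the step size by scaling the velocity vector down to maxVelocity.
        double norm = Math.sqrt(squaredNorm);
        double scale = norm > maxVelocity ? maxVelocity / norm : 1.0;
        for (int i = 0; i < position.length; i++) {
            velocity[i] *= scale;
            position[i] += velocity[i];
        }
    }
}
```

Here the position vector plays the role of the evaluation function weights, personalBest is the best position this particle has visited and globalBest is the best position found by the swarm.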

During each iteration, the particles were evaluated in parallel to fully exploit the available computational resources. In addition, after convergence was detected (no advancement after a certain number of iterations), the PSO was set up to automatically restart with a new set of randomly selected weights, enabling it to explore more of the search space.

Each particle’s position vector (its weights) was evaluated by simulating 100 runs of the 19–28 plateau. A full run means 230 line clears, but many end in a top out. The scores of the runs are sorted and the particle’s evaluation is defined as the average of the best 33 out of the 100 runs. The idea is to select for aggression. By focusing exclusively on the upper-third, the particles only experience and become accustomed to favorable piece sequences, limiting the need to act conservatively. As a result, they tend to push a typical game to the brink, hanging on just long enough for that next long bar.
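The particle evaluation described above reduces to "sort the scores and average the top slice". A minimal sketch, with hypothetical names, follows; for the setup in this article it would be called with 100 scores and keep set to 33.

```java
import java.util.Arrays;

/** Sketch: fitness of a particle as the mean of its best runs. */
final class ParticleFitness {
    private ParticleFitness() {}

    /** Returns the average of the `keep` highest values in `scores`. */
    static double averageOfBest(int[] scores, int keep) {
        if (keep <= 0 || keep > scores.length) {
            throw new IllegalArgumentException("keep must be between 1 and scores.length");
        }
        int[] sorted = scores.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);           // ascending, so the best runs are at the end
        long sum = 0;
        for (int i = sorted.length - keep; i < sorted.length; i++) {
            sum += sorted[i];
        }
        return (double) sum / keep;
    }
}
```

Runs that top out early fall into the discarded lower portion, so a particle is not punished for them as long as enough of its other runs score well.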

The piece sequences for the 100 runs were generated prior to PSO execution and the same sequences were used over and over again. This was necessary to fix the search space, to make the candidate solutions comparable to each other. The sequences were produced by employing the logic of the actual Nintendo Tetris PRNG, which was designed to reduce the odds of consecutive duplicates. But, the PRNG is also flawed: it does not pick pieces uniformly.

Initial attempts produced bots that were way too aggressive. If they made it through the 19–28 plateau, they would usually max out the score. But, unfortunately, they often just topped out early. In response, four things were done to pacify the bots:

They were told to be greedy: if the current piece or the next piece could be used to achieve a Tetris, do it. Prior to this directive, the bots would use long bars to further grow a perfectly clean stack with a very deep well. Greedy behavior potentially exchanges long-term survival for short-term gains. But, the bots do not need to survive indefinitely; they only need to get through 230 lines. And, experiments revealed that taking Tetrises when available facilitated that goal. However, the same cannot be said about Singles, Doubles or Triples. Greedily going after them produced bots that were too conservative; they would reach the end of the plateau with a low score.
They were told not to put blocks close to the playfield ceiling. A penalty was introduced in the evaluation function that applies to solid cells in any of the upper 7 rows. And, the penalty is inversely proportional to the vertical distance between the block and the ceiling.
They were told that if they are forced to put a block close to the playfield ceiling, at least keep it away from the spawn point. The search chain currently rejects placement combinations that would interfere with spawning of any of the 7 Tetriminos.
They were told that if they place a block that abuts the ceiling, to make sure that it does not partition the playfield, which would make progress impossible. If the search chain detects ceiling contact, it performs a seed fill from the spawn point (see SeedFiller). And, if the seed fill is unable to completely fill any row, then the search chain rejects the placement combination.
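The partition check in the last rule can be illustrated with a flood fill. The sketch below is one reading of that rule, not the plugin's SeedFiller: it fills the empty cells reachable from a hypothetical spawn cell and then asks whether at least one row would be complete if every reachable cell in it were filled. If no such row exists, no line can ever be cleared.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Sketch: flood fill from the spawn cell to test whether any row can still be completed. */
final class SeedFillCheck {
    private SeedFillCheck() {}

    /** Marks every empty cell connected to (startX, startY); board[y][x] is true for solid. */
    static boolean[][] fill(boolean[][] board, int startX, int startY) {
        int rows = board.length, cols = board[0].length;
        boolean[][] reached = new boolean[rows][cols];
        if (board[startY][startX]) {
            return reached; // the seed itself is solid, nothing is reachable
        }
        int[][] directions = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};
        Deque<int[]> stack = new ArrayDeque<>();
        reached[startY][startX] = true;
        stack.push(new int[] {startX, startY});
        while (!stack.isEmpty()) {
            int[] cell = stack.pop();
            for (int[] d : directions) {
                int x = cell[0] + d[0], y = cell[1] + d[1];
                if (x >= 0 && x < cols && y >= 0 && y < rows
                        && !board[y][x] && !reached[y][x]) {
                    reached[y][x] = true;
                    stack.push(new int[] {x, y});
                }
            }
        }
        return reached;
    }

    /** True if some row consists only of solid cells and cells reachable from the seed. */
    static boolean canCompleteSomeRow(boolean[][] board, int startX, int startY) {
        boolean[][] reached = fill(board, startX, startY);
        for (int y = 0; y < board.length; y++) {
            boolean complete = true;
            for (int x = 0; x < board[0].length; x++) {
                if (!board[y][x] && !reached[y][x]) {
                    complete = false;
                    break;
                }
            }
            if (complete) return true;
        }
        return false;
    }
}
```

A column of blocks that reaches the ceiling and splits the playfield in two makes canCompleteSomeRow return false, since every row then contains at least one empty cell on the far side of the divider.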

With the pacification rules in place, PSO provided the following weights:

Factor	Weight
Total Lines Cleared	0.286127095297893900
Total Lock Height	1.701233676909959200
Total Well Cells	0.711304230768307700
Total Deep Wells	0.910665415998680400
Total Column Holes	1.879338064244357000
Total Weighted Column Holes	2.168463848297177000
Total Column Hole Depths	−0.265587111961757270
Min Column Hole Depth	0.289886584949610500
Max Column Hole Depth	0.362361055261181730
Total Column Transitions	−0.028668795795469625
Total Row Transitions	0.874179981113233100
Total Column Heights	−0.507409683144361900
Pile Height	−2.148676202831281000
Column Height Spread	−1.187558540281141700
Total Solid Cells	−2.645656132241128000
Total Weighted Solid Cells	0.242043416268706620
Column Height Variance	0.287838126164431440

Since the search chain hunts for the combination that minimizes the evaluation function, factors paired with positive weights can be viewed as penalties and the remainder can be viewed as bonuses. But, the magnitudes of the weights do not necessarily indicate the significance of their associated factors; they are not normalized in some way that makes them comparable.
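Mechanically, the evaluation is a dot product between the weight vector and the factor values measured on a candidate playfield. The sketch below shows only that step; the names are hypothetical, and the ceiling penalty and rejection rules introduced by the pacification rules are not included.

```java
/** Sketch: evaluation score as a weighted sum of factor values. */
final class WeightedSum {
    private WeightedSum() {}

    /** Returns the sum of weights[i] * factors[i]; lower scores are preferred by the search. */
    static double evaluate(double[] weights, double[] factors) {
        if (weights.length != factors.length) {
            throw new IllegalArgumentException("weights and factors must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * factors[i];
        }
        return sum;
    }
}
```

With the table above, weights would hold the 17 values in the order listed and factors would hold the corresponding measurements for one combined placement of the current and next pieces.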

AI Strength

To estimate the strength of the AI, scores were collected across ~1.7 million simulated runs of the 19–28 plateau. The scores do not reflect any play into level 29 or above, and they omit points gained from soft drops. But, they do include games that ended prematurely in a top out. The Nintendo Tetris PRNG logic was used to generate the Tetrimino sequences for the simulated runs.

Within these results, the maximum score is 1,313,600. And, the minimum is 0.

The mean score is 816,379, which sounds low. But, as you’ll see below, the data is skewed such that the median score, 989,200, provides a much better idea of a typical value.

As discussed earlier, the PSO optimized the weights based on the average of the best third of its runs. In this case the average score of the best third is 1,108,860. In fact, the average score of the top 75% is 1,000,000.

The bot has a probability of 47% of maxing out the score prior to level 29. It has a probability of 61% of obtaining at least 900,000 points prior to level 29. In fact, the chart below provides the odds of attaining any particular score prior to level 29.

probability density

The probability appears to drop linearly up until about 900,000 points. Then, it transitions to an inverted sigmoid curve.

Below is a smoothed-out histogram showing the number of runs for each attained score. Its shape is determined by the derivative of the plot above it.

histogram

Ignoring the wobble, it’s flat up until about 900,000 and then it transitions to a normal distribution centered around 1,050,000 points. The exact source of the wobble is unclear. It seems to suggest that scores prefer to jump in units of 20,000 points. It might be related to the cycle of stack building and achieving Tetrises.

RAM and ROM Maps

The plugin uses the Nintaco API to manipulate CPU Memory, to assert gamepad buttons and to receive frame rendered events. All memory addresses were discovered through exploration with the Nintaco Debugging Tools and the information has been added to the Data Crystal ROMhacking.net wiki. In the source, they appear as constants within the Addresses interface.

References

[1] van den Bergh, F.; Engelbrecht, A.P. (2006)
A study of particle swarm optimization particle trajectories
In: Information Sciences 176 (2006), pp. 937–971
Retrieved from http://researchspace.csir.co.za/dspace/bitstream/handle/10204/1155/van%20den%20bergh_2006_D.pdf