Applying Artificial Intelligence to Handicapping NES Tetris
[ About | Sanity Check | Hacking the Game | Training the AI | Calculating Handicap | Videos | Download | Run ]
What would happen if you sat down for a head-to-head match against one of the NES Tetris masters in the Classic Tetris World Championship? Unless you possess lightning-fast fingers and savant-level stacking skills, expect nothing less than total and complete defeat. But what if there were a way to even the playing field, to enable players of different proficiency to compete on equal terms? In an email discussion with Trey Harrison—the distinguished CTWC pro and the techie behind the graphics streamed to the world during the competition—he proposed a way:
To make the match more even, the really good player could have the floor of their well raised up N rows. So, instead of 20 rows of space to work within, they only have (20 - N). I think it's obvious that this will lower the overall total score you can expect from that player, but by how much? Would 1 row make much of a difference? How about 4? Seems like 4 rows would make a massive difference at level 19.
With the help of my NES Tetris AI, I explored these questions. And in this article, I present my findings.
Before I got started, I contemplated if eliminating rows would make the game unplayable. And, perhaps more importantly, if the skills that a player attained while practicing on the standard 20 rows are translatable to shorter playfields. Game Boy Tetris provided some insights. Due to the handheld system's limited screen resolution, its playfield is only 18 rows tall. And pieces spawn on the second row from the top, effectively shrinking the playfield to 17 rows. Nonetheless, it’s considered one of the best versions of the game, even the one favored by Tetris creator Alexey Pajitnov. Most players are oblivious to the smaller size, suggesting that difficulty is not significantly impacted by the removal of a few rows.
What about more than a few? Tetris DX, the Game Boy Color enhanced version of the original Game Boy Tetris, supports a mode that challenges the player to clear 40 lines at a chosen drop speed and playfield height. The height can be reduced all the way down to 8 rows. The result is a lot more difficult, but it’s playable and surprisingly fun.
Another thing that I considered is the rise and fall of the stack as standard games plays out. The surface of the stack is a barrier, a sort of artificial floor, that intermittently emulates the effects of shorter playfields. This is particularly notable when skilled players build deep wells while waiting for long bars; gameplay gets squeezed into a very tight space. These transient yet frequent experiences might equate to training on all playfield sizes.
With these encouraging observations in mind I wanted to experience NES Tetris with different heights...
Since the game does not provide an option to adjust the playfield height, it was time to dive into the source. A single byte change is all that is necessary to prevent the pieces from falling all the way to the bottom. The snippet of code below shows the vertical range check applied to the 4 squares of each Tetrimino. The value 20 ($16 in hex) at $94B3 defines the deepest row. And, if you’re wondering, the −2 check applies to the 2 hidden rows above the playfield.
94A6: LDA #$04 94A8: STA $00AA ; for(i = 0; i < 4; i++) { 94AA: LDA $8A9C,X ; squareY = orientationTable[index]; 94AD: CLC 94AE: ADC $0041 ; cellY = squareY + tetriminoY; 94B0: ADC #$02 ; if (cellY < -2 || cellY >= 20) { 94B2: CMP #$16 ; return false; 94B4: BCS $94E9 ; }
The image below shows the result of halving the value.
I experimented with different ways of making the lower portion of the playfield appear solid. Since I ultimately intended to connect this to the AI project that I developed earlier and that project interfaces with the game through the Nintaco NES emulator, I ended up simply drawing a gray rectangle using the graphics functions provided by that emulator's API.
The first thing I noticed while playing is how often the music tempo changes since the stack frequently approaches the ceiling. More annoyingly, when the switch happens, the song starts over from the beginning. Tetris DX’s 40-Lines mode suffers a similar effect when the playfield is very small. And that game contains 2 intensities of panic music resulting in an audible mess.
I decided to disable the tempo changes. When a piece locks into place, the routine listed below executes. It scans across row 5 (the sixth row from the top) and if any solid blocks are detected, then the music speeds up. If that row is empty, the music resumes normal pace.
9D17: LDX #$05 ; SCANNED_ROW_INDEX = 5; 9D19: LDA $96D6,X 9D1C: TAY ; offset = 10 * SCANNED_ROW_INDEX; 9D1D: LDX #$0A ; column = 10; 9D1F: LDA ($B8),Y ; do { 9D21: CMP #$EF ; if (*(PLAYFIELD + offset) != EMPTY) { 9D23: BNE $9D3C ; goto foundBlockInRow; ; } 9D25: INY ; ++offset; 9D26: DEX ; --column; 9D27: BNE $9D1F ; } while(column > 0); 9D29: LDA $00BA 9D2B: BEQ $9D50 ; if (allegro == true) { 9D2D: LDA #$00 9D2F: STA $00BA ; allegro = false; 9D31: LDX $00C2 9D33: LDA $85D2,X 9D36: JSR $9E07 ; adjustMusicSpeed(); ; } 9D39: JMP $9D50 ; return; foundBlockInRow: 9D3C: LDA $00BA 9D3E: BNE $9D50 ; if (allegro == false) { 9D40: LDA #$FF 9D42: STA $00BA ; allegro = true; 9D44: LDA $00C2 9D46: CLC 9D47: ADC #$04 9D49: TAX 9D4A: LDA $85D2,X 9D4D: JSR $9E07 ; adjustMusicSpeed(); ; } 9D50: RTS ; return;
The scanned row index is at $9D18. I set it to 0. Meaning, game over will occur before the tempo gets an opportunity to change.
I also doctored the score counter to display values beyond max out. The hack is only relevant to the AI and human Tetris masters.
The 6-digit score is stored using packed BCD. And while doing BCD addition, digits may temporarily contain values larger than 9. Normally, the overflow gets carried into higher digits. However, as seen in the code fragment below, if the uppermost digit is 10 or greater, then all digits become 9.
9C84: LDA $0055 9C86: AND #$F0 9C88: CMP #$A0 9C8A: BCC $9C94 ; if (uppermost digit >= 10) { 9C8C: LDA #$99 9C8E: STA $0053 9C90: STA $0054 9C92: STA $0055 ; set all digits to 9; ; }
Part of the comparison involves isolating the uppermost digit from the packed byte. This is done by masking the byte with $F0. By changing the mask to $00, the check will always fail. And due the arrangement of tiles in the pattern tables, that modification causes the uppermost digit to display as a hexadecimal value.
The digits A, B, C, D, E and F correspond to 1,000,000, 1,100,000, 1,200,000, 1,300,000, 1,400,000, and 1,500,000, respectively. Beyond that, the score will roll over.
On various websites, I’ve encountered Game Genie code ENEOOGEZ that does something similar. But, instead of clearing the mask, it changes the value compared against the uppermost digit from 10 to 15. That means if the score ever reaches 1,500,000, the check will succeed and the score will erroneously max out at 999,999.
My version is equivalent to Game Genie code AEEPNGEY. The other changes above could also be expressed as Game Genie codes; however, I implemented all these hacks using the AccessPointListener provided by the Nintaco API.
With the modifications in place, I got back to playing. As expected, removing a few rows from the playfield is hardly noticeable. But the experience also revealed that there’s a tipping point where difficulty ramps up.
When a piece spawns, the AI simulates every possible placement of the current piece and the next piece. The resultant piles are ranked using a weighted sum of various metrics such as height, density, surface roughness, well depths, and so on. Ultimately, the AI directs the current piece to the position accordant with the highest ranked pile.
“Training the AI” amounts to tuning the weights used in the sum. This was accomplished using Particle Swarm Optimization (PSO). It evaluated candidate weights by simulating 100 games, each of which started on level 19 and either deliberately ended at the transition to level 29 or ended prematurely with a top out.
I initially setup the PSO to rate each candidate by its average score. However, that method caused the candidates to migrate toward cautious play. To rake in points, an AI needs to diligently pursue simultaneous line clears. And that demands repeatedly building deep wells, a technique that can bring the game to an abrupt end, dragging the average down with it. Getting a decent average meant being risk-averse.
To get around that, I configured the PSO to use the average of the best 33 out of the 100. That method focused on the subset of candidates that were exposed to favorable piece sequences, the ones that didn’t need to act conservatively. The result was an aggressive, risk-taking AI capable of stratospheric scores. However, when I plotted its probability density function—a normalized histogram, showing the odds of attaining any particular score—I ended up with this:
The large peak confirms its high-scoring potential. But the broad region on the left reveals that the AI often fails early. The plot shows a bimodal distribution indicative of 2 separate groups: the upper-third that hung on long enough to score high and the lower-two-thirds that took too many risks.
To suppress this effect, I instructed the PSO to use a weighted average across all 100 scores, biasing it toward the high end. That produced something much closer to a normal distribution:
It’s mean, median and standard deviation are 1,036,706, 1,061,820, and 149,458, respectively. The low standard deviation and high mean implies that it plays well consistently. Since the mean rests a bit below the median, the curve is somewhat negatively skewed. Fitting a Gaussian to it yields:
The Gaussian’s mean is 1,066,136 and its standard deviation is 67,656. On the left side, the red line rides slightly above the blue, exposing that AI can still fail early, though not very often.
The Gaussian fit also provides answers to some interesting questions. For example, reaching level 29 via exclusively achieving Tetrises happens roughly only once every trillion games! Consistent play, unfortunately, means that absolutely extraordinary games are just as rare as the bad ones.
Integrating the original probability density function from a given score to infinity produces this cumulative distribution:
It depicts the total probability of the AI getting or exceeding a particular score. For instance, the chances of a max out (999,999 points) is 80%.
I extended this training technique to obtained separate optimized weights for each playfield height.
Below, I plotted the average score of the AI at all playfield heights.
Since it’s S-shaped (sigmoidal), I fitted a logistic curve to the values:
Finally, I normalized the logistic curve’s scale:
A player’s expected average score at a particular height is the handicap percent of the player’s average score at the standard height. For example, at Tetris DX’s playfield height (18 rows), you should be able to get about 96% of what you normally achieve at 20 rows. But at Game Boy Tetris’s height (effectively 17 rows), expect about 91%. For the smallest playfield setting in Tetris DX’s 40-Line mode (8 rows), anticipate only around 4% of your typical score. The curve reveals everything in between.
To calculate a meaningful average score, all the runs must begin on a common starting level. This is due to the fact the rate at which points accumulate depend on that level. For the curve above, the AI always initiated play on level 19. However, after every run, it reperformed the same sequence of moves under the scoring systems of all 20 starting levels. In other words, it computed what its final scored would have been had it commenced on any of the lower levels.
Each starting level produces a different set of average scores. But when they are fitted to logistic curves and normalized to the same scale, they end up nearly identical. In the following chart, the curves for all 20 starting levels are shown overlapping.
A logistic curve is defined as the function
where k is the growth rate or steepness, x0 is the midpoint and L is the maximum value. If you want to compute the handicap percent for a specific starting level and height, refer to the table below for the associated constants.
level | k | x0 | L |
---|---|---|---|
0 | 0.635429136476100 | 13.623164176577156 | 101.738709037280190 |
1 | 0.633972899770532 | 13.620287025234648 | 101.751732068134570 |
2 | 0.631114654355038 | 13.612829587953332 | 101.775591024550740 |
3 | 0.627195897107233 | 13.601248616665740 | 101.807418477625220 |
4 | 0.622532395890976 | 13.586087913561940 | 101.844673187735510 |
5 | 0.617428140821707 | 13.567896570838954 | 101.884775139035870 |
6 | 0.612141984054329 | 13.547265835826323 | 101.925490406957280 |
7 | 0.606895937567774 | 13.524792810517173 | 101.964805479462770 |
8 | 0.601859525106324 | 13.501078168682877 | 102.001170338946780 |
9 | 0.597150840829724 | 13.476731302125021 | 102.033573922583870 |
10 | 0.595130004334894 | 13.464342732164772 | 102.045422660007920 |
11 | 0.593402147811258 | 13.453406179006349 | 102.055270258108080 |
12 | 0.591908492905426 | 13.443681556719616 | 102.063553748699860 |
13 | 0.590604878944027 | 13.434978730229760 | 102.070596217722500 |
14 | 0.589457522734079 | 13.427145215718749 | 102.076640591006680 |
15 | 0.588440151383866 | 13.420057323011818 | 102.081872483177430 |
16 | 0.585861067199916 | 13.403352684198659 | 102.096881727694470 |
17 | 0.583489124535944 | 13.386846672485404 | 102.109534148677570 |
18 | 0.581345726299131 | 13.370845964185436 | 102.119838312928080 |
19 | 0.579845427283322 | 13.348273093799587 | 102.113185988647810 |
Alternatively, when each of those columns are averaged together you end up a single curve that approximates the handicap percent across all starting levels (k = 0.603260297943194, x0 = 13.490715607520380 and L = 101.978036758426840):
Does this chart really enable me to play against a CTWC pro? Well, if my average is 150,000 and the pro’s average is 800,000, then my typical score is 18.75% of the pro’s. From the chart, the pro would have to play on height 11 to even out our skill levels. That’s not impossible. But most players are going to find the game painfully difficult below height 14, to the left of the midpoint of the curve. Put another way, this system is only practical when the ratio of the players' averages scores is not smaller than 60%.
Watch the AI take on playfields of various heights in the videos below.
HandicappedTetris_2019-08-17.zip
The .zip contains:
Copyright © 2019 meatfighter.com |