Applying Artificial Intelligence to Handicapping NES Tetris

[ About | Sanity Check | Hacking the Game | Training the AI | Calculating Handicap | Videos | Download | Run ]


About

What would happen if you sat down for a head-to-head match against one of the NES Tetris masters in the Classic Tetris World Championship? Unless you possess lightning-fast fingers and savant-level stacking skills, expect nothing less than total and complete defeat. But what if there were a way to even the playing field, to enable players of different proficiency to compete on equal terms? In an email discussion with Trey Harrison—the distinguished CTWC pro and the techie behind the graphics streamed to the world during the competition—he proposed a way:

To make the match more even, the really good player could have the floor of their well raised up N rows. So, instead of 20 rows of space to work within, they only have (20 - N). I think it's obvious that this will lower the overall total score you can expect from that player, but by how much? Would 1 row make much of a difference? How about 4? Seems like 4 rows would make a massive difference at level 19.

With the help of my NES Tetris AI, I explored these questions. And in this article, I present my findings.


Sanity Check

Before I got started, I contemplated whether eliminating rows would make the game unplayable. And, perhaps more importantly, whether the skills that a player attains while practicing on the standard 20 rows translate to shorter playfields. Game Boy Tetris provided some insights. Due to the handheld system's limited screen resolution, its playfield is only 18 rows tall. And pieces spawn on the second row from the top, effectively shrinking the playfield to 17 rows. Nonetheless, it’s considered one of the best versions of the game, even the one favored by Tetris creator Alexey Pajitnov. Most players are oblivious to the smaller size, suggesting that difficulty is not significantly impacted by the removal of a few rows.

Game Boy Tetris

What about more than a few? Tetris DX, the Game Boy Color enhancement of the original Game Boy Tetris, supports a mode that challenges the player to clear 40 lines at a chosen drop speed and playfield height. The height can be reduced all the way down to 8 rows. The result is a lot harder, but it’s playable and surprisingly fun.

Tetris DX 40-Lines

Another thing that I considered was the rise and fall of the stack as a standard game plays out. The surface of the stack is a barrier, a sort of artificial floor, that intermittently emulates the effects of shorter playfields. This is particularly notable when skilled players build deep wells while waiting for long bars; gameplay gets squeezed into a very tight space. These transient yet frequent experiences might equate to training on all playfield sizes.

NES Tetris

With these encouraging observations in mind, I wanted to experience NES Tetris with different heights...


Hacking the Game

Since the game does not provide an option to adjust the playfield height, it was time to dive into the source. A single-byte change is all that is necessary to prevent the pieces from falling all the way to the bottom. The snippet of code below shows the vertical range check applied to the 4 squares of each Tetrimino. The byte at $94B3, $16 in hex (22 in decimal), covers the 20 playfield rows plus the 2 hidden rows above them. And, if you’re wondering, the +2 offset enables a single unsigned comparison to handle both the cellY < −2 and cellY >= 20 checks.

94A6: LDA #$04
94A8: STA $00AA    ; for(i = 0; i < 4; i++) {

94AA: LDA $8A9C,X  ;   squareY = orientationTable[index];
94AD: CLC
94AE: ADC $0041    ;   cellY = squareY + tetriminoY; 
94B0: ADC #$02     ;   if (cellY < -2 || cellY >= 20) {
94B2: CMP #$16     ;     return false;
94B4: BCS $94E9    ;   }
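The check can be rendered in Java as the following sketch, based on my reading of the disassembly. The limit parameter stands in for the byte at $94B3; patching it to a smaller value raises the artificial floor.

```java
public class VerticalRangeCheck {

    // The 6502 folds both bounds into one unsigned compare:
    // ADC #$02 / CMP #$16 / BCS reject. With 8-bit wraparound,
    // (cellY + 2) & 0xFF < limit rejects both cellY < -2 and
    // cellY >= limit - 2. The stock limit $16 = 22 covers the
    // 20 visible rows plus the 2 hidden rows.
    static boolean inRange(int squareY, int tetriminoY, int limit) {
        int cellY = squareY + tetriminoY;
        return ((cellY + 2) & 0xFF) < limit;
    }

    public static void main(String[] args) {
        System.out.println(inRange(0, 19, 0x16)); // deepest legal row: true
        System.out.println(inRange(0, 20, 0x16)); // below the floor: false
        System.out.println(inRange(0, 19, 0x0B)); // halved limit: false
    }
}
```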

The image below shows the result of halving the value.

Modding the height

I experimented with different ways of making the lower portion of the playfield appear solid. Since I ultimately intended to connect this to the AI project that I developed earlier, which interfaces with the game through the Nintaco NES emulator, I ended up simply drawing a gray rectangle using the graphics functions provided by that emulator's API.

Gray rectangle

The first thing I noticed while playing was how often the music tempo changes, since the stack frequently approaches the ceiling. More annoyingly, when the switch happens, the song starts over from the beginning. Tetris DX’s 40-Lines mode suffers a similar effect when the playfield is very small. And that game contains 2 intensities of panic music, resulting in an audible mess.

I decided to disable the tempo changes. When a piece locks into place, the routine listed below executes. It scans across row 5 (the sixth row from the top), and if any solid blocks are detected, the music speeds up. If that row is empty, the music resumes its normal pace.

9D17: LDX #$05     ; SCANNED_ROW_INDEX = 5;
9D19: LDA $96D6,X
9D1C: TAY          ; offset = 10 * SCANNED_ROW_INDEX;
9D1D: LDX #$0A     ; column = 10;
9D1F: LDA ($B8),Y  ; do {
9D21: CMP #$EF     ;   if (*(PLAYFIELD + offset) != EMPTY) {
9D23: BNE $9D3C    ;     goto foundBlockInRow;
                   ;   }
9D25: INY          ;   ++offset;  
9D26: DEX          ;   --column; 
9D27: BNE $9D1F    ; } while(column > 0);
9D29: LDA $00BA     
9D2B: BEQ $9D50    ; if (allegro == true) {  
9D2D: LDA #$00
9D2F: STA $00BA    ;   allegro = false;
9D31: LDX $00C2
9D33: LDA $85D2,X
9D36: JSR $9E07    ;   adjustMusicSpeed();
                   ; } 
9D39: JMP $9D50    ; return;
                 
foundBlockInRow:
   
9D3C: LDA $00BA     
9D3E: BNE $9D50    ; if (allegro == false) { 
9D40: LDA #$FF
9D42: STA $00BA    ;   allegro = true;
9D44: LDA $00C2
9D46: CLC 
9D47: ADC #$04
9D49: TAX 
9D4A: LDA $85D2,X
9D4D: JSR $9E07    ;   adjustMusicSpeed();
                   ; }
9D50: RTS          ; return;

The scanned row index is stored at $9D18. I set it to 0, meaning game over will occur before the tempo gets an opportunity to change.
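In Java terms, the scan boils down to this sketch ($EF is the empty-cell value and rows are 10 cells wide, both per the disassembly above):

```java
public class PanicMusicCheck {

    static final byte EMPTY = (byte) 0xEF; // empty playfield cell value
    static final int COLUMNS = 10;

    // True when any cell of the scanned row is occupied, the condition
    // that switches the music to its faster tempo. The byte at $9D18
    // supplies scannedRow; the game uses 5, and patching it to 0 means
    // the stack tops out before the tempo can change.
    static boolean rowOccupied(byte[] playfield, int scannedRow) {
        for (int column = 0; column < COLUMNS; column++) {
            if (playfield[COLUMNS * scannedRow + column] != EMPTY) {
                return true;
            }
        }
        return false;
    }
}
```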

I also doctored the score counter to display values beyond max out. The hack is only relevant to the AI and human Tetris masters.

The 6-digit score is stored using packed BCD. And while doing BCD addition, digits may temporarily contain values larger than 9. Normally, the overflow gets carried into higher digits. However, as seen in the code fragment below, if the uppermost digit is 10 or greater, then all digits become 9.

9C84: LDA $0055    
9C86: AND #$F0
9C88: CMP #$A0
9C8A: BCC $9C94  ; if (uppermost digit >= 10) {
9C8C: LDA #$99
9C8E: STA $0053         
9C90: STA $0054         
9C92: STA $0055  ;   set all digits to 9;
                 ; }

Part of the comparison involves isolating the uppermost digit from the packed byte. This is done by masking the byte with $F0. Changing the mask to $00 makes the check always fail. And due to the arrangement of tiles in the pattern tables, that modification causes the uppermost digit to display as a hexadecimal value.

A65361 points

The digits A, B, C, D, E and F correspond to 1,000,000, 1,100,000, 1,200,000, 1,300,000, 1,400,000, and 1,500,000, respectively. Beyond that, the score will roll over.
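Reading such a score is straightforward: only the uppermost digit goes hexadecimal, while the lower five digits remain ordinary BCD. A sketch of the conversion:

```java
public class HexScore {

    // Converts a displayed score like "A65361" to points. The uppermost
    // digit is read as hex ($0-$F); the remaining five are decimal.
    static int value(String displayed) {
        int top = Character.digit(displayed.charAt(0), 16);
        return top * 100_000 + Integer.parseInt(displayed.substring(1));
    }

    public static void main(String[] args) {
        System.out.println(value("A65361")); // 1065361
        System.out.println(value("F99999")); // 1599999, the roll-over point
    }
}
```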

On various websites, I’ve encountered the Game Genie code ENEOOGEZ, which does something similar. But, instead of clearing the mask, it changes the value compared against the uppermost digit from 10 to 15. That means if the score ever reaches 1,500,000, the check will succeed and the score will erroneously max out at 999,999.

My version is equivalent to Game Genie code AEEPNGEY. The other changes above could also be expressed as Game Genie codes; however, I implemented all these hacks using the AccessPointListener provided by the Nintaco API.

With the modifications in place, I got back to playing. As expected, removing a few rows from the playfield is hardly noticeable. But the experience also revealed that there’s a tipping point where difficulty ramps up.


Training the AI

When a piece spawns, the AI simulates every possible placement of the current piece and the next piece. The resultant piles are ranked using a weighted sum of various metrics such as height, density, surface roughness, well depths, and so on. Ultimately, the AI directs the current piece to the placement that produces the highest-ranked pile.
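The ranking itself is just a dot product. A sketch, with illustrative metrics (the actual metric set and weights differ):

```java
public class PileRanker {

    // Weighted sum of pile metrics, as described above. The article lists
    // height, density, surface roughness, and well depths among the
    // metrics; the weights are what training tunes.
    static double rank(double[] metrics, double[] weights) {
        double sum = 0;
        for (int i = 0; i < metrics.length; i++) {
            sum += weights[i] * metrics[i];
        }
        return sum; // the placement yielding the highest rank wins
    }
}
```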

“Training the AI” amounts to tuning the weights used in the sum. This was accomplished using Particle Swarm Optimization (PSO). It evaluated candidate weights by simulating 100 games, each of which started on level 19 and either deliberately ended at the transition to level 29 or ended prematurely with a top out.
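A single PSO update looks like the textbook step below. The fitness evaluation (simulate 100 games and aggregate the scores) is abstracted away, and the inertia/acceleration constants are common defaults, not necessarily the ones this project used.

```java
import java.util.Random;

public class SwarmStep {

    static final double INERTIA = 0.72, COGNITIVE = 1.49, SOCIAL = 1.49;
    static final Random random = new Random();

    // Nudges one particle's weight vector toward its personal best and
    // the swarm's global best, retaining some of its prior velocity.
    static void step(double[] position, double[] velocity,
            double[] personalBest, double[] globalBest) {
        for (int i = 0; i < position.length; i++) {
            velocity[i] = INERTIA * velocity[i]
                    + COGNITIVE * random.nextDouble()
                            * (personalBest[i] - position[i])
                    + SOCIAL * random.nextDouble()
                            * (globalBest[i] - position[i]);
            position[i] += velocity[i]; // drift toward promising weights
        }
    }
}
```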

I initially set up the PSO to rate each candidate by its average score. However, that method caused the candidates to migrate toward cautious play. To rake in points, an AI needs to diligently pursue simultaneous line clears. And that demands repeatedly building deep wells, a technique that can bring the game to an abrupt end, dragging the average down with it. Getting a decent average meant being risk-averse.

To get around that, I configured the PSO to rate each candidate by the average of its best 33 scores out of the 100. That method focused on the games blessed with favorable piece sequences, the ones that didn’t demand conservative play. The result was an aggressive, risk-taking AI capable of stratospheric scores. However, when I plotted its probability density function—a normalized histogram, showing the odds of attaining any particular score—I ended up with this:

bimodal distribution

The large peak confirms its high-scoring potential. But the broad region on the left reveals that the AI often fails early. The plot shows a bimodal distribution indicative of 2 separate groups: the upper third of games that hung on long enough to score high and the lower two-thirds that took too many risks.

To suppress this effect, I instructed the PSO to use a weighted average across all 100 scores, biasing it toward the high end. That produced something much closer to a normal distribution:

normal distribution

Its mean, median, and standard deviation are 1,036,706, 1,061,820, and 149,458, respectively. The low standard deviation and high mean imply that it plays well consistently. Since the mean rests a bit below the median, the curve is somewhat negatively skewed. Fitting a Gaussian to it yields:

Gaussian fit

The Gaussian’s mean is 1,066,136 and its standard deviation is 67,656. On the left side, the red line rides slightly above the blue, exposing that the AI can still fail early, though not very often.

The Gaussian fit also provides answers to some interesting questions. For example, reaching level 29 exclusively by scoring Tetrises happens only about once every trillion games! Consistent play, unfortunately, means that absolutely extraordinary games are just as rare as the bad ones.
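That once-per-trillion figure can be sanity-checked against the fitted Gaussian. The sketch below assumes an all-Tetris run to level 29 is worth somewhere around 1.55 million points (my rough estimate, not a figure from the measurements) and uses the standard asymptotic expansion for the normal tail:

```java
public class GaussianTail {

    // Upper-tail probability of a standard normal via the asymptotic
    // expansion Q(z) ~ phi(z)/z * (1 - 1/z^2), accurate for large z.
    static double q(double z) {
        double phi = Math.exp(-z * z / 2) / Math.sqrt(2 * Math.PI);
        return phi / z * (1 - 1 / (z * z));
    }

    public static void main(String[] args) {
        double mean = 1_066_136, sd = 67_656; // the Gaussian fit above
        double z = (1_550_000 - mean) / sd;   // roughly 7 standard deviations
        System.out.println(q(z));             // on the order of 1e-13
    }
}
```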

Integrating the original probability density function from a given score to infinity produces this cumulative distribution:

cumulative distribution

It depicts the total probability of the AI getting or exceeding a particular score. For instance, the chance of a max out (999,999 points) is 80%.

I extended this training technique to obtain separate optimized weights for each playfield height.


Calculating Handicap

Below, I plotted the average score of the AI at all playfield heights.

means at all heights

Since it’s S-shaped (sigmoidal), I fitted a logistic curve to the values:

logistic and means at all heights

Finally, I normalized the logistic curve’s scale:

handicap percent

A player’s expected average score at a particular height is the handicap percent of the player’s average score at the standard height. For example, at Tetris DX’s playfield height (18 rows), you should be able to get about 96% of what you normally achieve at 20 rows. But at Game Boy Tetris’s height (effectively 17 rows), expect about 91%. For the smallest playfield setting in Tetris DX’s 40-Line mode (8 rows), anticipate only around 4% of your typical score. The curve reveals everything in between.

To calculate a meaningful average score, all the runs must begin on a common starting level because the rate at which points accumulate depends on that level. For the curve above, the AI always initiated play on level 19. However, after every run, it replayed the same sequence of moves under the scoring systems of all 20 starting levels. In other words, it computed what its final score would have been had it commenced on any of the lower levels.
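The dependence on starting level comes straight from the scoring formula: a line clear is worth a base amount times (level + 1). A sketch:

```java
public class LineClearPoints {

    // NES Tetris line-clear bases for 0-4 simultaneous lines.
    static final int[] BASE = { 0, 40, 100, 300, 1200 };

    static int points(int lines, int level) {
        return BASE[lines] * (level + 1);
    }

    public static void main(String[] args) {
        // The same Tetris is worth 20 times more on level 19 than on
        // level 0, which is why averages are only comparable when every
        // run is rescored from a common starting level.
        System.out.println(points(4, 19)); // 24000
        System.out.println(points(4, 0));  // 1200
    }
}
```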

Each starting level produces a different set of average scores. But when they are fitted to logistic curves and normalized to the same scale, they end up nearly identical. In the following chart, the curves for all 20 starting levels are shown overlapping.

all handicap percents

A logistic curve is defined as the function

logistic function

where k is the growth rate or steepness, x0 is the midpoint and L is the maximum value. If you want to compute the handicap percent for a specific starting level and height, refer to the table below for the associated constants.

level  k                  x0                  L
0      0.635429136476100  13.623164176577156  101.738709037280190
1      0.633972899770532  13.620287025234648  101.751732068134570
2      0.631114654355038  13.612829587953332  101.775591024550740
3      0.627195897107233  13.601248616665740  101.807418477625220
4      0.622532395890976  13.586087913561940  101.844673187735510
5      0.617428140821707  13.567896570838954  101.884775139035870
6      0.612141984054329  13.547265835826323  101.925490406957280
7      0.606895937567774  13.524792810517173  101.964805479462770
8      0.601859525106324  13.501078168682877  102.001170338946780
9      0.597150840829724  13.476731302125021  102.033573922583870
10     0.595130004334894  13.464342732164772  102.045422660007920
11     0.593402147811258  13.453406179006349  102.055270258108080
12     0.591908492905426  13.443681556719616  102.063553748699860
13     0.590604878944027  13.434978730229760  102.070596217722500
14     0.589457522734079  13.427145215718749  102.076640591006680
15     0.588440151383866  13.420057323011818  102.081872483177430
16     0.585861067199916  13.403352684198659  102.096881727694470
17     0.583489124535944  13.386846672485404  102.109534148677570
18     0.581345726299131  13.370845964185436  102.119838312928080
19     0.579845427283322  13.348273093799587  102.113185988647810

Alternatively, when each of those columns is averaged, you end up with a single curve that approximates the handicap percent across all starting levels (k = 0.603260297943194, x0 = 13.490715607520380, and L = 101.978036758426840):

all averaged handicap percents
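Plugging the averaged constants into the logistic function reproduces the examples from earlier:

```java
public class HandicapPercent {

    // Averaged logistic constants from the table above.
    static final double K = 0.603260297943194;
    static final double X0 = 13.490715607520380;
    static final double L = 101.978036758426840;

    // Handicap percent at playfield height x.
    static double handicap(double x) {
        return L / (1 + Math.exp(-K * (x - X0)));
    }

    public static void main(String[] args) {
        System.out.printf("%.0f%%%n", handicap(18)); // Tetris DX height: 96%
        System.out.printf("%.0f%%%n", handicap(17)); // Game Boy height: 91%
        System.out.printf("%.0f%%%n", handicap(8));  // smallest DX field: 4%
    }
}
```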

Does this chart really enable me to play against a CTWC pro? Well, if my average is 150,000 and the pro’s average is 800,000, then my typical score is 18.75% of the pro’s. From the chart, the pro would have to play on a height-11 playfield to even out our skill levels. That’s not impossible. But most players are going to find the game painfully difficult below height 14, to the left of the midpoint of the curve. Put another way, this system is only practical when the ratio of the players' average scores is no smaller than 60%.
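Inverting the logistic gives the height directly from the score ratio. A sketch using the averaged constants:

```java
public class HandicapHeight {

    static final double K = 0.603260297943194;   // averaged constants
    static final double X0 = 13.490715607520380; // from the section above
    static final double L = 101.978036758426840;

    // The playfield height at which the stronger player's expected score
    // falls to the given percent of their full-height average.
    static double height(double percent) {
        return X0 - Math.log(L / percent - 1) / K;
    }

    public static void main(String[] args) {
        System.out.printf("%.1f%n", height(18.75)); // about 11 rows for the pro
    }
}
```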


Videos

Watch the AI take on playfields of various heights in the videos below.


Download

HandicappedTetris_2019-08-17.zip

The .zip contains:


Run

Prerequisites

Load the JAR

  1. Extract HandicappedTetris.jar from the downloaded .zip.
  2. Start Nintaco and open Tetris (U) [!].nes.
  3. Open the Run Program window by selecting Tools | Run Program...
  4. Enter the path to the file in the JAR name field or browse to it using the Find JAR... button.
  5. Press Load JAR to load it.

Either Start Human Mode

  1. Set Main class to handicappedtetris.bot.HandicappedTetrisPlugin
  2. Optionally, in the Arguments field, specify the playfield height (4–20). The default is the standard 20 rows.
  3. Press Run.
  4. Return to the main window and play Tetris as you normally would.

Or Start AI Mode

  1. Set Main class to handicappedtetris.bot.HandicappedTetrisBot
  2. Optionally, in the Arguments field, specify a playfield height (4–20). By default, the height randomly changes between games.
  3. Press Run.
  4. Return to the main window to watch the AI play.

Tips


Copyright © 2019 meatfighter.com
This project is free software; you can redistribute it and/or modify it under the terms of LGPLv2.1.

Home