Applying Artificial Intelligence to

Hatris

[ About | Videos | Download | Run | The Game | Details ]


About

This article describes a Java plug-in for the Nintaco NES / Famicom emulator that automatically plays the puzzle game Hatris.


Videos

Check out the AI playing through the final 10 levels:

Watch the AI blast through all levels in fast motion:

Here are all 6 Hatris cutscenes that were collected while testing the AI:


Download

HatrisAI_2017-08-06.zip

The .zip contains:


Run

Prerequisites

Launch the Plug-in

  1. Start Nintaco and open Hatris (U) [!].nes.
  2. Extract HatrisAI.jar from the downloaded .zip.
  3. Open the Run Program window by selecting Tools | Run Program...
  4. Enter the path to the file in the JAR name field or browse to it using the Find JAR... button.
  5. Press Load JAR to load it.
  6. Press Run.
  7. From the Hatris main menu, select a starting Shop and Stage. Enabling or disabling music will launch gameplay and the AI will take over from there.

Set the Speed

To watch it play faster, select Machine | Speed | Max.


The Game

Rules

Hatris is a puzzle game somewhat similar to Tetris. Randomly selected hat pairs drop into the playfield. The hats within the pair can be swapped and the pair can be shifted left and right, positioning it above any of the 6 playfield columns. As hats land, they pile up and if any pile reaches the top, it's Game Over. To thwart that, a contiguous stack of 5 of the same hats will vanish. The goal in each level is to eliminate a specified number of stacks of 5.

If the pair lands on the playfield floor (the heads) or onto piles of equal height, then the hats will land together. However, if only one of the hats encounters a pile, then the second hat will remain free to be positioned independently.

As hats descend, horizontal movement is initially restricted only by the playfield walls. But, further down, tall piles can further confine them, particularly limiting the placement of the second hat. Pile height ordering strategies can be employed to maximize placement options. For instance, the player can attempt to keep the piles sorted by height or they can be maintained in a U-shaped pattern.

Descending height order

U-shaped order

There are 6 types of hats and each of them have different heights and nesting characteristics. Typically, hats of the same type nest compactly whereas mismatches vary to the detriment of the player. This is why matching only takes place vertically; the playfield is not a grid; hats will not lineup horizontally or diagonally. However, as mentioned above, piles interact with each other indirectly as a consequence of the height ordering.

Helpers

Hatris was developed by Alexey Pajitnov, the inventor of Tetris, and his close friend Vladimir Pokhilko.

License screen

Avatars of Alexey and Vladimir appear in the game in the form of helpers and in the cutscenes. Alexey is stationed above the upper-left of the playfield. When summoned, he slides down a pole and then the player can use him to remove up to 5 hats from the bottoms of any of the piles. Vladimir appears to the lower-right of the playfield and he has the ability to swap any 2 piles, potentially contributing to the sorting order.

Helpers are a limited resource. Acquiring one involves eliminating 5 stacks of 5 identical hats. Specifically, removing 5 stacks of crowns, derbies or wizard hats earns an Alexey. And, clearing 5 stacks of baseball caps, cowboy hats or top hats earns a Vladimir. Every time a stack vanishes, one of the progress bars to the right of the playfield advances. When the bar fills up, it resets and a helper is added to the Helper Pool, depicted directly above the progress bars. The last helper earned is the first helper that may be used (FIFO). The pool has a maximum capacity of 8 helpers.

When a helper is summoned, the currently falling hats vanish. And, after the helper is finished, the next pair drops. This effect can actually be used as a strategy. If the current piles cannot accommodate a particular randomly selected pair, a helper could potentially be employed to eliminate it immediately.

Conquering Hatris

The main menu enables the player to start on any of the 60 levels. The Shop is the most significant digit of the level, ranging from 0 to 5, and the Stage is the least significant digit of the level, ranging from 0 to 9. Oddly, during play, those digits are displayed together, denoted simply as the Shop.

Main menu screen

Baseball caps, cowboy hats and top hats are introduced in level 0. Derbies are first seen in level 1. Wizard hats follow in level 3. And, crowns appear in levels 6 and above.

When playing from the beginning, the number of stacks of 5 identical hats to eliminate remains at a constant 15 in every stage. To get through all 60 levels, the player needs to assemble 60 × 15 = 900 stacks in total. And, since hats spawn in pairs, that means surviving through a minimum of 5 × 900 / 2 = 2250 spawns. Well, almost.

Normally, at the end of each level, the playfield, the helper progress bars, the Helpers Pool and the random hat sequence remains undisturbed; it's one long continuum. However, every 10 levels (at the completion of each Shop), there is a cutscene (see Videos). After the cutscene, the playfield is cleared and then it is populated with 6 × Shop randomly selected hats, distributed equally among the columns. The rest remains unchanged. Depending on the prior contents of the playfield, that effect may be beneficial or detrimental to the player.

Refer to the Hatris Instruction Manual for more details.


Details

RAM and ROM Maps

The plug-in uses the Nintaco API to manipulate CPU Memory and to receive frame rendered events. All memory addresses were discovered through exploration with the Nintaco Debugging Tools and the information has been added to the Data Crystal ROMhacking.net wiki. In the source, they appear as constants within the Addresses interface.

AI Algorithm

When a hat pair spawns, the plug-in examines every possible placement of the current pair and the next pair. A placement is simply which piles each hat ends up in. As noted above, the piles do not necessarily need to be adjacent and landing the first hat restricts the options for the second. All of the possible placements are determined using nested loops (see Searcher).

Landing a hat onto a pile has consequences: the hat will nest with the one beneath it, increasing the pile height and if it completes an identical stack of 5, then the stack will vanish from the pile. For each valid placement of the current pair and its associated playfield consequences, the plug-in tries each valid placement of the next pair, evaluating the combined consequences. This chain of searches is represented by SearchChain.

Each of the combined consequences is fed to an evaluation function, which scores the contents of the playfield. The combo with the highest score wins out and the current pair is placed accordingly. The results of the search chain only affect the current pair. When the next pair spawns, it is evaluated with the next-next pair and so on.

In addition to examining every pair placement, when a helper is available, the plug-in also evaluates using it. For Alexey, this means it tries all possible ways to remove up to 5 hats from the bottom of piles. For Vladimir, it explores all of the potential pile swaps. Since using a helper annihilates the falling hats, the option is treated as one of the two search chain steps as opposed to introducing more. And, the same evaluation function applies.

Evaluation Function

The evaluation function scores a playfield using a series of metrics:

  1. The normalized complement of the sum of the squares of the number of hats in each column. In other words, the fewer the hats on the playfield and the more evenly they are distributed among the columns, the better.
  2. The normalized complement of the sum of the squares of the pile heights. Said another way, the shorter the piles, the better.
  3. The normalized sum of the squares of the run lengths > 2. This metrics inspects each pile, looking for contiguous spans of identical hats. Spans of 2 or more hats contribute to the sum.
  4. The normalized sum of the squares of the pile top run lengths > 2. This metric is identical to prior one, except that only the contiguous spans at the very top of the piles contribute to the sum.
  5. The normalized complement of the total number of mismatches. A mismatch refers to 2 different adjacent hats within a pile.
  6. The normalized complement of the pile heights permutation distance. This metric aims to keep the piles in height descending order. With the help of a lookup table, the minimal number of pile swaps required to completely sort them by height is determined.
  7. The normalized sum of the adjacent pile height differences. This metric has the same goal as the prior one, but it does it in a different way. It simply compares the heights of neighboring piles.
  8. The normalized cash value. In Hatris, the player's score is denoted as cash and the more cash the better.

Metrics (1) and (2) attempt to minimize and even out the amount of material in the playfield. Metrics (3), (4) and (5) look for conformity or discord within the piles. Metrics (6) and (7) try to keep the piles in sorted order. And, metric (8) aims for a high score.

These metrics are combined into a single number via a weighted sum (see EvaluationFunction).

Machine Learning

The road to wisdom?—Well, it's plain
and simple to express:
                   Err
                   and err
                   and err again,
                   but less
                   and less
                   and less.
                           —
Piet Hein

To optimize the weights used in the evaluation function, a Trainer was created that executes the following algorithm:

W := a weight vector, the elements initialized randomly
repeat indefinitely:
  S := a randomly generated sequence of hat pairs
  R := W rated by running the AI across S
  repeat a few times:
    Z := a set of weight vectors, each a randomly generated variant of W
    for each weight vector, V, in Z:
      T := V rated by running the AI across S
      if T is better than R:
        R := T
        W := V

It's a variation of Hill Climbing. Here are more details:

Here are the results, rounded to 5 decimal place:

MetricWeight (%)
# Hats20.00100
Pile Heights20.00055
Run Lengths21.78585
Top Run Lengths20.06811
# Mismatches88.31852
Permutation Distance0.00048
Height Differences0.06392
Cash9.76156

Each metric is normalized to a number in between 0 and 1 in attempt to make them comparable. And, if they are, then the total number of mismatches (different adjacent hats) is by far the most significant. Total cash (the score) comes in a distant second. And, the run lengths (identical hat spans) comes in as an even more distant third. The height ordering metrics, although not that significant percentagewise, are still relevant to Vladimir and potentially to long-term play.

AI Strength

After training the AI, the number of spawns required to reach Game Over were recorded over a large batch of simulated runs (see MeasureStats). Below is the average and standard deviation:

That's abysmal. As mentioned above, 2250 is the theoretical minimum number of spawns required to defeat all 60 levels. However, when interfaced with the real game, the AI demonstrated a surprising degree of proficiency; it appeared to be able to get through every stage without issue.

The AI was clearly taking advantage of some aspect of the real game missing in the simulation. One possibility was how the playfield is reset after the cutscenes, a feature that the simulation ignores. But, that happens so sparely that the long-term effect should not be that significant. Another possibility is that the real game starts out with only 4 hat types. But, all successive levels after the fifth one use 6 hat types. That also seems insignificant in the long-term.

The most likely candidate was the pseudorandom number generator (PRNG) used to produce the hat pair sequence. It could be rigged to make the game easier to play, which would actually be consistent with some of the other Tetris variants of the era. For example, Super Tetris 2 + Bombliss, developed and published by Bullet Proof Software in 1992 (3 years after this game) for the Super Famicom, is able to detect when the player is getting close to topping out and it will respond by providing extra long-bars.

There are many simply ways that the PRNG could be rigged to aid the player. For instance, it could occasionally produce needed pairs by inspecting the tops of adjacent piles. And, it could be setup to do this with increasing frequency as a function of the maximum pile height. Another simple option is to generate repeated pairs that could easily be collected by the player. In fact, the latter does appear to happen every now and then. But, that may just be a statistical fluke.

The PRNG could be analyzed empirically. At the very least, that would reveal if it were rigged or not. But, the data probably could not explain the exact mechanisms behind it. The only way to get a definitive answer would be to peek at the code...

All of the variables that appear in the following 6502 code comments are single-byte global variables. The PRNG depends on 2 such variables, designated as h1 and h2, which are seeded on game launch:

seedRNG:

8CB2: LDY #$00
8CB4: LDA $8CDC,Y
8CB7: STA $03C0,Y
8CBA: INY 
8CBB: CPY #$28
8CBD: BNE $8CB4
8CBF: LDY #$00
8CC1: LDA $8D04,Y
8CC4: STA $03F0,Y
8CC7: INY 
8CC8: CPY #$10
8CCA: BNE $8CC1

8CCC: LDA #$04
8CCE: STA $03E5

8CD1: LDA #$3F
8CD3: STA $0304    ; h1 = 63;

8CD6: LDA #$85
8CD8: STA $0305    ; h2 = 133;   

8CDB: RTS          ; return;

Below is the main PRNG subroutine:

randomizeHats:

88D8: LDA $0300
88DB: EOR $0304
88DE: ROR A
88DF: STA $0304  ; h1 = rotateRight(frameCounter ^ h1);

88E2: LDA $0300
88E5: EOR $0305
88E8: ROL A
88E9: STA $0305  ; h2 = rotateLeft(frameCounter ^ h2);

88EC: LDA $0304
88EF: STA $0080
88F1: SEC 
88F2: SBC #$06
88F4: BCS $88EF  
88F6: LDA $0080
88F8: CLC 
88F9: ADC #$01
88FB: STA $0080  ; hat1 = h1 % 6 + 1;

88FD: LDX $0080
88FF: CPX #$07
8901: BCS $8901

8903: LDA $0305
8906: STA $0081
8908: SEC 
8909: SBC #$06
890B: BCS $8906
890D: LDA $0081
890F: CLC 
8910: ADC #$01
8912: STA $0081  ; hat2 = h2 % 6 + 1;

8914: RTS        ; return;

frameCounter is incremented once per frame (since startup). h1 and h2 are XORed with frameCounter and then they are circularly shifted right and left, respectively. hat1 and hat2 are set to h1 and h2 mod 6 plus 1, respectively. Those are hat types, the output of the routine.

The main PRNG subroutine is only called by the following:

pickHats:

                 ; while(true) {

889E: JSR $88D8  ;   randomizeHats();

88A1: LDA $03A8  
88A4: BNE $889D  
88A6: LDA $03A9  ;   if (stage >= 6) {
88A9: CMP #$06   ;     return;
88AB: BCS $889D  ;   }

88AD: LDA #$03   
88AF: CMP $0080  
88B1: BEQ $889E  ;   if (hat1 == CROWN || hat2 == CROWN) {
88B3: CMP $0081  ;     continue;
88B5: BEQ $889E  ;   }

88B7: LDA $03A9  ;   if (stage >= 3) {
88BA: CMP #$03   ;     return;
88BC: BCS $889D  ;   }

88BE: LDA #$02
88C0: CMP $0080
88C2: BEQ $889E  ;   if (hat1 == WIZARD || hat2 == WIZARD) {
88C4: CMP $0081  ;     continue;
88C6: BEQ $889E  ;   }

88C8: LDA $03A9  ;   if (stage != 0) {
88CB: BNE $889D  ;     return;
                 ;   }

88CD: LDA #$05   
88CF: CMP $0080
88D1: BEQ $889E  ;   if (hat1 != DERBY && hat2 != DERBY) {
88D3: CMP $0081  ;     break;
88D5: BEQ $889E  ;   }

                 ; }

88D7: RTS        ; return;

The loop just makes sure that the early stages use fewer hat types. No surprises there.

The initialization subroutine, which is invoked once immediately after starting a new game from the main menu, provides the first clue:

init:

8C60: LDY #$00
8C62: LDA #$00
8C64: STA $00A0

8C66: LDA #$60
8C68: STA $00A1

8C6A: LDA #$00
8C6C: STA ($A0),Y
8C6E: INY 
8C6F: BNE $8C6A
8C71: INC $00A1
8C73: LDX $00A1
8C75: CPX #$7C
8C77: BNE $8C6A
8C79: JSR $A1CE

8C7C: JSR $8CB2  ; seedRNG();

8C7F: JSR $8F14
8C82: JSR $A139

8C85: LDA #$FF
8C87: STA $032A
8C8A: JSR $B722

8C8D: LDA #$00
8C8F: STA $032A

8C92: JSR $889E  ; pickHats();

8C95: LDA $0080    
8C97: CMP $0081  ; if (hat1 == hat2) {
8C99: BNE $8C9E  ;   pickHats();
8C9B: JSR $889E  ; }

8C9E: LDA $0080
8CA0: STA $03D1  ; nextType1 = hat1; 

8CA3: LDA $0081
8CA5: STA $03D9  ; nextType2 = hat2;

8CA8: LDA #$0F
8CAA: CLC 
8CAB: ADC $03A9
8CAE: STA $03A3

8CB1: RTS        ; return;

It seeds the PRNG and then it picks a random pair to begin the sequence. However, it check if the both hats in the pair match. If they do, it picks a second time. nextType1 and nextType2 are the result, the first pair in the sequence.

That same logic appears in the pair spawn subroutine, which is called every time the sequence needs to be advanced:

spawn:

86CA: LDA $03F0
86CD: BNE $86C9
86CF: LDA $03C0
86D2: BEQ $86C9

86D4: LDA #$00
86D6: STA $03C0  ; playing=true;

86D9: LDA #$FF
86DB: STA $03C8
86DE: LDA #$00
86E0: STA $03C2
86E3: LDA #$08
86E5: PHA 
86E6: JSR $8B3B
86E9: JSR $94EA
86EC: JSR $8894
86EF: PLA 
86F0: SEC 
86F1: SBC #$01
86F3: BNE $86E5

86F5: LDA #$A9
86F7: STA $03D3
86FA: STA $03DB  ; hatY1 = hatY2 = 169;

86FD: LDA #$68
86FF: STA $03D4  ; hatSpriteX1 = 104;

8702: LDA #$80
8704: STA $03DC  ; hatSpriteX2 = 128;

8707: LDA #$02
8709: STA $03D2  ; hatColumn1 = 2;

870C: LDA #$03
870E: STA $03DA  ; hatColumn2 = 3;

8711: LDA $03D1
8714: STA $03D0  ; hatType1 = nextType1;

8717: LDA $03D9
871A: STA $03D8  ; hatType2 = nextType2;

871D: JSR $889E  ; pickHats();

8720: LDA $0080  
8722: CMP $0081  ; if (hat1 == hat2) {
8724: BNE $8729  ;   pickHats();
8726: JSR 889E   ; }

8729: LDA $0080
872B: STA $03D1  ; nextType1 = hat1;

872E: LDA $0081
8730: STA $03D9  ; nextType2 = hat2;

8733: JSR $89F8
8736: JSR $8A2E

8739: LDA #$00
873B: STA $03E0  ; fallSpeed = 0;
873E: LDA #$01
8740: STA $03E1

8743: LDA #$00
8745: STA $03D5
8748: STA $03DD  ; hatFalling1 = hatFalling2 = true;

874B: RTS        ; return;

hatType1 and hatType2 is the current pair, which receives its values from the next pair. Random hats are picked and, as with the initialization routine, if the 2 hats match, a second pick occurs. Finally, nextType1 and nextType2 are updated with the results.

Since the number of frames between spawns is determined by the player's actions, this means that the game actually taps into the inherently random nature of the agent it is interfaced with. The main PRNG subroutine provides equally probable results. But, the check in the initialization and spawn subroutines skews the odds.

Since there are 6 hat types, there are 6 × 6 = 36 ways to assign h1 and h2 . And, there are 6 ways that h1 equals h2. Consequentially, the odds of both hats matching on the first pick is 1/6. The same applies to the second pick. Hence, the odds of a match after the check is reduced to 1/36.

Reducing the odds of both hats matching makes the game easier to play. The reason is that the number of columns in the playfield exactly equals the number of hat types. Consequentially, for the playfield to accommodate any particular hat type, there must be a column assigned to that type. Since there is no way to position both hats in a pair onto the same column, matching hats forces the player to contaminate at least one column with the wrong type.

The AI was further trained with the PRNG logic added to the simulator and new stats were measured:

The new average is more than twice the theoretical minimum number of spawns to needed to beat all 60 levels. However, the standard deviation reveals a very wide spread. The AI will play well enough most of the time, but it's not always guaranteed to get through all the levels. That depends on the luck of the draw.


Copyright © 2017 meatfighter.com
This project is free software; you can redistribute it and/or modify it under the terms of LGPLv2.1.