Applying Artificial Intelligence to
[ About | Videos | Download | Run | The Game | Details ]
This article describes a Java plug-in for the Nintaco NES / Famicom emulator that automatically plays the puzzle game Hatris.
Check out the AI playing through the final 10 levels:
Watch the AI blast through all levels in fast motion:
Here are all 6 Hatris cutscenes that were collected while testing the AI:
The .zip contains:
To watch it play faster, select Machine | Speed | Max.
Hatris is a puzzle game somewhat similar to Tetris. Randomly selected hat pairs drop into the playfield. The hats within the pair can be swapped and the pair can be shifted left and right, positioning it above any of the 6 playfield columns. As hats land, they pile up and if any pile reaches the top, it's Game Over. To thwart that, a contiguous stack of 5 of the same hats will vanish. The goal in each level is to eliminate a specified number of stacks of 5.
If the pair lands on the playfield floor (the heads) or onto piles of equal height, then the hats will land together. However, if only one of the hats encounters a pile, then the second hat will remain free to be positioned independently.
As hats descend, horizontal movement is initially restricted only by the playfield walls. But, further down, tall piles can further confine them, particularly limiting the placement of the second hat. Pile height ordering strategies can be employed to maximize placement options. For instance, the player can attempt to keep the piles sorted by height or they can be maintained in a U-shaped pattern.
There are 6 types of hats and each of them have different heights and nesting characteristics. Typically, hats of the same type nest compactly whereas mismatches vary to the detriment of the player. This is why matching only takes place vertically; the playfield is not a grid; hats will not lineup horizontally or diagonally. However, as mentioned above, piles interact with each other indirectly as a consequence of the height ordering.
Hatris was developed by Alexey Pajitnov, the inventor of Tetris, and his close friend Vladimir Pokhilko.
Avatars of Alexey and Vladimir appear in the game in the form of helpers and in the cutscenes. Alexey is stationed above the upper-left of the playfield. When summoned, he slides down a pole and then the player can use him to remove up to 5 hats from the bottoms of any of the piles. Vladimir appears to the lower-right of the playfield and he has the ability to swap any 2 piles, potentially contributing to the sorting order.
Helpers are a limited resource. Acquiring one involves eliminating 5 stacks of 5 identical hats. Specifically, removing 5 stacks of crowns, derbies or wizard hats earns an Alexey. And, clearing 5 stacks of baseball caps, cowboy hats or top hats earns a Vladimir. Every time a stack vanishes, one of the progress bars to the right of the playfield advances. When the bar fills up, it resets and a helper is added to the Helper Pool, depicted directly above the progress bars. The last helper earned is the first helper that may be used (FIFO). The pool has a maximum capacity of 8 helpers.
When a helper is summoned, the currently falling hats vanish. And, after the helper is finished, the next pair drops. This effect can actually be used as a strategy. If the current piles cannot accommodate a particular randomly selected pair, a helper could potentially be employed to eliminate it immediately.
The main menu enables the player to start on any of the 60 levels. The Shop is the most significant digit of the level, ranging from 0 to 5, and the Stage is the least significant digit of the level, ranging from 0 to 9. Oddly, during play, those digits are displayed together, denoted simply as the Shop.
Baseball caps, cowboy hats and top hats are introduced in level 0. Derbies are first seen in level 1. Wizard hats follow in level 3. And, crowns appear in levels 6 and above.
When playing from the beginning, the number of stacks of 5 identical hats to eliminate remains at a constant 15 in every stage. To get through all 60 levels, the player needs to assemble 60 × 15 = 900 stacks in total. And, since hats spawn in pairs, that means surviving through a minimum of 5 × 900 / 2 = 2250 spawns. Well, almost.
Normally, at the end of each level, the playfield, the helper progress bars, the Helpers Pool and the random hat sequence remains undisturbed; it's one long continuum. However, every 10 levels (at the completion of each Shop), there is a cutscene (see Videos). After the cutscene, the playfield is cleared and then it is populated with 6 × Shop randomly selected hats, distributed equally among the columns. The rest remains unchanged. Depending on the prior contents of the playfield, that effect may be beneficial or detrimental to the player.
Refer to the Hatris Instruction Manual for more details.
The plug-in uses the Nintaco API to manipulate CPU Memory and to receive frame rendered events. All memory addresses were discovered through exploration with the Nintaco Debugging Tools and the information has been added to the Data Crystal ROMhacking.net wiki. In the source, they appear as constants within the Addresses interface.
When a hat pair spawns, the plug-in examines every possible placement of the current pair and the next pair. A placement is simply which piles each hat ends up in. As noted above, the piles do not necessarily need to be adjacent and landing the first hat restricts the options for the second. All of the possible placements are determined using nested loops (see Searcher).
Landing a hat onto a pile has consequences: the hat will nest with the one beneath it, increasing the pile height and if it completes an identical stack of 5, then the stack will vanish from the pile. For each valid placement of the current pair and its associated playfield consequences, the plug-in tries each valid placement of the next pair, evaluating the combined consequences. This chain of searches is represented by SearchChain.
Each of the combined consequences is fed to an evaluation function, which scores the contents of the playfield. The combo with the highest score wins out and the current pair is placed accordingly. The results of the search chain only affect the current pair. When the next pair spawns, it is evaluated with the next-next pair and so on.
In addition to examining every pair placement, when a helper is available, the plug-in also evaluates using it. For Alexey, this means it tries all possible ways to remove up to 5 hats from the bottom of piles. For Vladimir, it explores all of the potential pile swaps. Since using a helper annihilates the falling hats, the option is treated as one of the two search chain steps as opposed to introducing more. And, the same evaluation function applies.
The evaluation function scores a playfield using a series of metrics:
Metrics (1) and (2) attempt to minimize and even out the amount of material in the playfield. Metrics (3), (4) and (5) look for conformity or discord within the piles. Metrics (6) and (7) try to keep the piles in sorted order. And, metric (8) aims for a high score.
These metrics are combined into a single number via a weighted sum (see EvaluationFunction).
The road to wisdom?—Well, it's plain
and simple to express:
Err
and err
and err again,
but less
and less
and less.
— Piet Hein
To optimize the weights used in the evaluation function, a Trainer was created that executes the following algorithm:
W := a weight vector, the elements initialized randomly repeat indefinitely: S := a randomly generated sequence of hat pairs R := W rated by running the AI across S repeat a few times: Z := a set of weight vectors, each a randomly generated variant of W for each weight vector, V, in Z: T := V rated by running the AI across S if T is better than R: R := T W := V
It's a variation of Hill Climbing. Here are more details:
Here are the results, rounded to 5 decimal place:
Metric | Weight (%) |
---|---|
# Hats2 | 0.00100 |
Pile Heights2 | 0.00055 |
Run Lengths2 | 1.78585 |
Top Run Lengths2 | 0.06811 |
# Mismatches | 88.31852 |
Permutation Distance | 0.00048 |
Height Differences | 0.06392 |
Cash | 9.76156 |
Each metric is normalized to a number in between 0 and 1 in attempt to make them comparable. And, if they are, then the total number of mismatches (different adjacent hats) is by far the most significant. Total cash (the score) comes in a distant second. And, the run lengths (identical hat spans) comes in as an even more distant third. The height ordering metrics, although not that significant percentagewise, are still relevant to Vladimir and potentially to long-term play.
After training the AI, the number of spawns required to reach Game Over were recorded over a large batch of simulated runs (see MeasureStats). Below is the average and standard deviation:
That's abysmal. As mentioned above, 2250 is the theoretical minimum number of spawns required to defeat all 60 levels. However, when interfaced with the real game, the AI demonstrated a surprising degree of proficiency; it appeared to be able to get through every stage without issue.
The AI was clearly taking advantage of some aspect of the real game missing in the simulation. One possibility was how the playfield is reset after the cutscenes, a feature that the simulation ignores. But, that happens so sparely that the long-term effect should not be that significant. Another possibility is that the real game starts out with only 4 hat types. But, all successive levels after the fifth one use 6 hat types. That also seems insignificant in the long-term.
The most likely candidate was the pseudorandom number generator (PRNG) used to produce the hat pair sequence. It could be rigged to make the game easier to play, which would actually be consistent with some of the other Tetris variants of the era. For example, Super Tetris 2 + Bombliss, developed and published by Bullet Proof Software in 1992 (3 years after this game) for the Super Famicom, is able to detect when the player is getting close to topping out and it will respond by providing extra long-bars.
There are many simply ways that the PRNG could be rigged to aid the player. For instance, it could occasionally produce needed pairs by inspecting the tops of adjacent piles. And, it could be setup to do this with increasing frequency as a function of the maximum pile height. Another simple option is to generate repeated pairs that could easily be collected by the player. In fact, the latter does appear to happen every now and then. But, that may just be a statistical fluke.
The PRNG could be analyzed empirically. At the very least, that would reveal if it were rigged or not. But, the data probably could not explain the exact mechanisms behind it. The only way to get a definitive answer would be to peek at the code...
All of the variables that appear in the following 6502 code comments are single-byte global variables. The PRNG depends on 2 such variables, designated as h1 and h2, which are seeded on game launch:
seedRNG: 8CB2: LDY #$00 8CB4: LDA $8CDC,Y 8CB7: STA $03C0,Y 8CBA: INY 8CBB: CPY #$28 8CBD: BNE $8CB4 8CBF: LDY #$00 8CC1: LDA $8D04,Y 8CC4: STA $03F0,Y 8CC7: INY 8CC8: CPY #$10 8CCA: BNE $8CC1 8CCC: LDA #$04 8CCE: STA $03E5 8CD1: LDA #$3F 8CD3: STA $0304 ; h1 = 63; 8CD6: LDA #$85 8CD8: STA $0305 ; h2 = 133; 8CDB: RTS ; return;
Below is the main PRNG subroutine:
randomizeHats: 88D8: LDA $0300 88DB: EOR $0304 88DE: ROR A 88DF: STA $0304 ; h1 = rotateRight(frameCounter ^ h1); 88E2: LDA $0300 88E5: EOR $0305 88E8: ROL A 88E9: STA $0305 ; h2 = rotateLeft(frameCounter ^ h2); 88EC: LDA $0304 88EF: STA $0080 88F1: SEC 88F2: SBC #$06 88F4: BCS $88EF 88F6: LDA $0080 88F8: CLC 88F9: ADC #$01 88FB: STA $0080 ; hat1 = h1 % 6 + 1; 88FD: LDX $0080 88FF: CPX #$07 8901: BCS $8901 8903: LDA $0305 8906: STA $0081 8908: SEC 8909: SBC #$06 890B: BCS $8906 890D: LDA $0081 890F: CLC 8910: ADC #$01 8912: STA $0081 ; hat2 = h2 % 6 + 1; 8914: RTS ; return;
frameCounter is incremented once per frame (since startup). h1 and h2 are XORed with frameCounter and then they are circularly shifted right and left, respectively. hat1 and hat2 are set to h1 and h2 mod 6 plus 1, respectively. Those are hat types, the output of the routine.
The main PRNG subroutine is only called by the following:
pickHats: ; while(true) { 889E: JSR $88D8 ; randomizeHats(); 88A1: LDA $03A8 88A4: BNE $889D 88A6: LDA $03A9 ; if (stage >= 6) { 88A9: CMP #$06 ; return; 88AB: BCS $889D ; } 88AD: LDA #$03 88AF: CMP $0080 88B1: BEQ $889E ; if (hat1 == CROWN || hat2 == CROWN) { 88B3: CMP $0081 ; continue; 88B5: BEQ $889E ; } 88B7: LDA $03A9 ; if (stage >= 3) { 88BA: CMP #$03 ; return; 88BC: BCS $889D ; } 88BE: LDA #$02 88C0: CMP $0080 88C2: BEQ $889E ; if (hat1 == WIZARD || hat2 == WIZARD) { 88C4: CMP $0081 ; continue; 88C6: BEQ $889E ; } 88C8: LDA $03A9 ; if (stage != 0) { 88CB: BNE $889D ; return; ; } 88CD: LDA #$05 88CF: CMP $0080 88D1: BEQ $889E ; if (hat1 != DERBY && hat2 != DERBY) { 88D3: CMP $0081 ; break; 88D5: BEQ $889E ; } ; } 88D7: RTS ; return;
The loop just makes sure that the early stages use fewer hat types. No surprises there.
The initialization subroutine, which is invoked once immediately after starting a new game from the main menu, provides the first clue:
init: 8C60: LDY #$00 8C62: LDA #$00 8C64: STA $00A0 8C66: LDA #$60 8C68: STA $00A1 8C6A: LDA #$00 8C6C: STA ($A0),Y 8C6E: INY 8C6F: BNE $8C6A 8C71: INC $00A1 8C73: LDX $00A1 8C75: CPX #$7C 8C77: BNE $8C6A 8C79: JSR $A1CE 8C7C: JSR $8CB2 ; seedRNG(); 8C7F: JSR $8F14 8C82: JSR $A139 8C85: LDA #$FF 8C87: STA $032A 8C8A: JSR $B722 8C8D: LDA #$00 8C8F: STA $032A 8C92: JSR $889E ; pickHats(); 8C95: LDA $0080 8C97: CMP $0081 ; if (hat1 == hat2) { 8C99: BNE $8C9E ; pickHats(); 8C9B: JSR $889E ; } 8C9E: LDA $0080 8CA0: STA $03D1 ; nextType1 = hat1; 8CA3: LDA $0081 8CA5: STA $03D9 ; nextType2 = hat2; 8CA8: LDA #$0F 8CAA: CLC 8CAB: ADC $03A9 8CAE: STA $03A3 8CB1: RTS ; return;
It seeds the PRNG and then it picks a random pair to begin the sequence. However, it check if the both hats in the pair match. If they do, it picks a second time. nextType1 and nextType2 are the result, the first pair in the sequence.
That same logic appears in the pair spawn subroutine, which is called every time the sequence needs to be advanced:
spawn: 86CA: LDA $03F0 86CD: BNE $86C9 86CF: LDA $03C0 86D2: BEQ $86C9 86D4: LDA #$00 86D6: STA $03C0 ; playing=true; 86D9: LDA #$FF 86DB: STA $03C8 86DE: LDA #$00 86E0: STA $03C2 86E3: LDA #$08 86E5: PHA 86E6: JSR $8B3B 86E9: JSR $94EA 86EC: JSR $8894 86EF: PLA 86F0: SEC 86F1: SBC #$01 86F3: BNE $86E5 86F5: LDA #$A9 86F7: STA $03D3 86FA: STA $03DB ; hatY1 = hatY2 = 169; 86FD: LDA #$68 86FF: STA $03D4 ; hatSpriteX1 = 104; 8702: LDA #$80 8704: STA $03DC ; hatSpriteX2 = 128; 8707: LDA #$02 8709: STA $03D2 ; hatColumn1 = 2; 870C: LDA #$03 870E: STA $03DA ; hatColumn2 = 3; 8711: LDA $03D1 8714: STA $03D0 ; hatType1 = nextType1; 8717: LDA $03D9 871A: STA $03D8 ; hatType2 = nextType2; 871D: JSR $889E ; pickHats(); 8720: LDA $0080 8722: CMP $0081 ; if (hat1 == hat2) { 8724: BNE $8729 ; pickHats(); 8726: JSR 889E ; } 8729: LDA $0080 872B: STA $03D1 ; nextType1 = hat1; 872E: LDA $0081 8730: STA $03D9 ; nextType2 = hat2; 8733: JSR $89F8 8736: JSR $8A2E 8739: LDA #$00 873B: STA $03E0 ; fallSpeed = 0; 873E: LDA #$01 8740: STA $03E1 8743: LDA #$00 8745: STA $03D5 8748: STA $03DD ; hatFalling1 = hatFalling2 = true; 874B: RTS ; return;
hatType1 and hatType2 is the current pair, which receives its values from the next pair. Random hats are picked and, as with the initialization routine, if the 2 hats match, a second pick occurs. Finally, nextType1 and nextType2 are updated with the results.
Since the number of frames between spawns is determined by the player's actions, this means that the game actually taps into the inherently random nature of the agent it is interfaced with. The main PRNG subroutine provides equally probable results. But, the check in the initialization and spawn subroutines skews the odds.
Since there are 6 hat types, there are 6 × 6 = 36 ways to assign h1 and h2 . And, there are 6 ways that h1 equals h2. Consequentially, the odds of both hats matching on the first pick is 1/6. The same applies to the second pick. Hence, the odds of a match after the check is reduced to 1/36.
Reducing the odds of both hats matching makes the game easier to play. The reason is that the number of columns in the playfield exactly equals the number of hat types. Consequentially, for the playfield to accommodate any particular hat type, there must be a column assigned to that type. Since there is no way to position both hats in a pair onto the same column, matching hats forces the player to contaminate at least one column with the wrong type.
The AI was further trained with the PRNG logic added to the simulator and new stats were measured:
The new average is more than twice the theoretical minimum number of spawns to needed to beat all 60 levels. However, the standard deviation reveals a very wide spread. The AI will play well enough most of the time, but it's not always guaranteed to get through all the levels. That depends on the luck of the draw.
Copyright © 2017 meatfighter.com
This project is free software; you can redistribute it and/or modify it under the terms of LGPLv2.1.