About

The original Puyo Puyo, released for the Famicom in 1991, did not provide an AI opponent. To fill that gap, this article provides a Java plug-in for the Nintaco NES / Famicom emulator. In 1 Player Endless Mode, the plug-in plays perpetually. In 2 Player Mode, it serves as a digital adversary. Or, it can be configured to launch two AI's to battle against each other.

Videos

They call it "Endless Mode", but the score will eventually max out at 99,999,990:

Check out this epic AI vs. AI battle:

Download

PuyoPuyoAI_2017-06-25.zip

The .zip contains:

src - The source tree.
PuyoPuyoAI.jar - The compiled binary.
lgpl-2.1.txt - The free software license.

Run

Prerequisites

Nintaco - the NES / Famicom emulator.
Puyo Puyo (J).nes - the ROM for the Famicom cartridge edition of Puyo Puyo released in 1993.

Launch the Plug-in

Start Nintaco and open Puyo Puyo (J).nes.
Extract PuyoPuyoAI.jar from the downloaded .zip.
Open the Run Program window by selecting Tools | Run Program...
Enter the path to the file in the JAR name field or browse to it using the Find JAR... button.
Press Load JAR to load it.
Press Run.
From the Puyo Puyo title screen, press Start to advance to the main menu.
From the main menu, select either 1 Player Endless Mode or 2 Player Mode to see the AI in action.

Set the Speed

To set the speed, first, press the Stop button. In the Arguments field, enter a numerical speed value from 0 (slowest) to 10 (fastest). Then, press Run again. Select a mode from the Puyo Puyo main menu to return control to the AI.

To watch it play even faster, select Machine | Speed | Max.

AI vs. AI

To have 2 AI's battle it out, first, resume normal speed (Machine | Speed | Normal) and reset the game (Machine | Reset). Next, press the Stop button and in the Arguments field, enter the single word: AIvsAI. Finally, press Run and select 2 Player Mode.

A speed value can also be provided by separating the arguments with a space (e.g. 5 AIvsAI).

Details

RAM Map

The plug-in uses the Nintaco API to manipulate CPU Memory and to receive frame rendered events. All memory addresses were discovered through exploration with the Nintaco Hex Editor and the information has been added to the Data Crystal ROMhacking.net wiki. The plug-in uses a particular set of memory addresses determined by the current mode in which it is operating. The Addresses interface represents a potential set and Player1Addresses, Player2Addresses and Player12Addresses are the implementations for 1 Player Endless Mode, 2 Player Mode and player 1 in AI vs. AI mode, respectively.

AI Algorithm

When a piece spawns, the program examines every possible lock combination of the current piece and the next piece. A valid lock position of a single piece is a position in which one or both of the Puyo in the pair is supported by the playfield floor or by another Puyo. All of the possible lock positions of a single piece are computed using breath-first search (see Searcher).

Locking a piece into the playfield has consequences: unsupported Puyo drop, connected groups of 4 or more identical Puyo are removed, and that process may repeat chain-reaction style. For each valid lock position of the current piece and its associated playfield consequences, the program tries each valid lock position of the next piece, evaluating the combined consequences. This chain of searches is represented by SearchChain.

Each of the combined consequences is fed to an evaluation function, which scores the contents of the playfield. The combo with the highest score wins out and the current piece is moved into position accordingly. The results of the search chain only affect the current piece. When the next piece spawns, it is paired up with the next-next piece and so on.

Evaluation Function

The evaluation function scores a playfield using a series of metrics:

The number of Puyo in the playfield, not including Nuisance Puyo.
The number of Nuisance Puyo in the playfield.
The edge penalty. Puyo along the playfield sides are penalized since they can only potentially link to 3 other Puyo as opposed to up to 4 when not placed against an edge. Puyo in the corners are penalized even more since they can only form 2 links max.
The spawn distance penalty. Puyo are penalized based on their proximity to the spawn coordinates. The idea is to keep the column heights as short and even as possible. And, of course, a Puyo locked at the spawn coordinates ends the game; so, the further away from that point, the better.
Consecutive colors. All rows and all columns are scanned, measuring the lengths of uniformly colored Puyo strips, when empty spaces are ignored. Longer lengths are preferred.
Color variances. For each color and independently along all rows and all columns, the average Puyo coordinate is computed and then the deviations from that average are used to evaluate variance. A smaller total variance suggests that common colors are closer together.
The number of Puyo links. Identically colored Puyo link together horizontally and vertically, but not diagonally. However, Nusiance Puyo do not form links.

These metrics are combined into a single number via a weighted sum (see EvaluationFunction).

Machine Learning

To optimize the weights used in the evaluation function, a Trainer was created that executes the following steps:

Generate a random set of weight vectors, each containing 7 elements, one for each metric.
Randomly select 2 different vectors out of the set.
Have 2 AI instances battle it out using the chosen vectors in their associated evaluation functions.
Based on the outcome, take a weighted average of the elements of the 2 vectors: 80% of the winner and 20% of the loser.
Randomly modify the elements of that merged vector by ±5%.
Replace the losing vector in the set with the merged-and-randomized vector.
Repeat from step 2.

The idea falls somewhere in between Hill Climbing and a Genetic Algorithm. Better competitors have a higher probability of survival. And, the losers are replaced with offspring that mostly resemble the winners with slight mutations.

Here are the results, rounded to nearest integer percents:

Metric	Weight (%)
# Colored Puyo	16
# Nuisance Puyo	25
Edge Penalty	8
Spawn Distance	8
Consecutive Colors	16
Color Variance	2
# Linked Puyo	25

Each metric is normalized to a number in between 0 and 1 in attempt to make them comparable. And, if they are, then the most significant metrics appear to be the number of Nuisance Puyo and the number of linked Puyo, followed by the number of other Puyo and the total consecutive colors.

AI Strength

Maxing out the score aside, Endless Mode really does appear to be endless. How far can the AI really go?

Similar to the Trainer, it is possible to run the AI at high speed as a single player outside of Nintaco to push it to the limit. However, doing so reveals that the game still never ends.

Instead, as the AI played through 50 million pieces, the number of Puyo in the playfield was recorded after each lock (see SelfPlayer). The data shows some interesting things. First, the AI cleared the playfield 3,326 times, each effectively equivalent to restarting the game. Next, it never had more than 37 Puyo on playfield; meaning, it never even occupied half of the available space; and, it only hit that number once. Most of the time was spent around 10 Puyo on the playfield.

Below, the results are plotted as a probability curve. The x-axis is the number of Puyo in the playfield and the y-axis is the probably of finding the AI in that state at any given moment.

Probability curve

It resembles a bell curve. The Excel Solver worked out constants for a Gaussian fit:

Probability curve with Gaussian approximation

Assuming that the AI won't give up until the playfield is completely full, then the game won't end until x is 84. According to the Gaussian, the probability of that happening is 1 in 10⁸⁹. Of course, as the playfield runs out of room, its behavior may radically change. Instead, assume that the AI just spontaneously gives up if it were to inadvertently fill up half of the playfield, where x is 42. The Gaussian says the odds of that happening is 1 in 10¹⁷. With either assumption, it sounds like Endless Mode truly does deserves that title.

It should also be pointed out that the AI handles the special items. The Big Puyo is capable of clearing 2 full columns. And, the Carbuncle modifies Puyo that it contacts, changing them to identical colors. The algorithm considers all the configurations of those items when they come up during its search. However, the program that collected the data for the plots above did not simulate special items and the advantages they offer. Consequentially, the AI in a real game may be far stronger than those curves suggest.