🡄 Previous

Next 🡆

Contents > Abstract Machines > State Machine

Classifier

A classifier is a device that categorizes an input string. If there are only two categories, then the device is referred to as an acceptor. For instance, a regular expression matcher determines if an input string is consistent with a specific pattern or not.

The following approach for implementing a classifier embeds an input string, s, consisting of L single-byte characters, into an MC program. As with the sequencer, the first N bytes of the array, a0‥aN−1, serve as an N-byte register, and a process initializes it to a starting value. At runtime, the program iterates over the input string. During each pass, it assigns the element immediately right of the register, aN, to the ith character of the input string, si, and it applies a transition function, Tc, to the segment of the array containing both the register and the character:

for i ← 0 to L−1
  aN ← si
  a0‥aN ← Tc(a0‥aN)

Tc only modifies the register. It passes the character through, unchanged.

At completion, the process reads the computed category, which is stored in a subset or all bytes of the register.

Unfortunately, since the string is hard-coded, this approach requires a separate MC program per input. Alternatively, and preferably, a process loads the input string into the elements right of the register:

aN+i ← si∈[0,L-1]

Once there, a reusable classifier program categorizes it.

There are two basic strategies for a reusable program. The first moves the characters to the register:

for i ← 0 to L−1
  for j ← i down to 1
    [aN+j−1,aN+j] ← [aN+j,aN+j−1]
  a0‥aN ← Tc(a0‥aN)

Lines 2–3 repeatedly swap the ith character of the string, from index N+i, leftward along the array, until it reaches index N, the element immediately right of the register. The program performs:

0 + 1 + … + L−1 = L(L−1)/2 swaps

In doing so, it reverses the input string, as shown in the animation below, which depicts a classifier with a 3-byte register.

Classifier Animation: Move Data to Register

To maintain the order, the following version swaps each character back to its original location after applying the transition function.

for i ← 0 to L−1
  for j ← i down to 1
    [aN+j−1,aN+j] ← [aN+j,aN+j−1]
  a0‥aN ← Tc(a0‥aN)
  for j ← 1 to i
    [aN+j−1,aN+j] ← [aN+j,aN+j−1]

It performs L(L−1) swaps:

Classifier Animation: Move Data to Register and Back

The second strategy moves the register to the characters:

for i ← 0 to L−1
  ai‥aN+i ← Tc(ai‥aN+i)
  for j ← N−1 down to 0
    [ai+j+1,ai+j] ← [ai+j,ai+j+1]

Line 2 applies the transition function at index i, rather than index 0. Lines 3–4 move the register right, from index i to index i+1, by repeatedly swapping the ith character of the string leftward, through the register.

It takes LN swaps. For strings significantly larger than the width of the register, the second strategy has order 𝒪(L), far more efficient than the first strategy, which has order 𝒪(L2). The animation below demonstrates the improved performance.

Classifier Animation: Register to Data

Regarding reusability, the program contains the number of cycles required to run the classifier to completion. In other words, while the string is no longer hard-coded, the string length is. However, if Tc ignores zeros—all characters right of the input string—then the program is applicable to strings with length ≤ L.

🡄 Previous

Next 🡆