17  Finite Automata

As we saw in Section 16.3, it can be a bit tricky to come up with a regular expression for a language. Often it’s easier to draw what is called a finite state automaton (FSA) instead. For instance, for the no \(11\)s language, we make states representing the last character seen as we move through the string and if the last character was a \(1\) and we see a second \(1\), we know the string should be thrown out.

stateDiagram
    direction LR
    [*] --> 0
    0 --> 0 : 0
    0 --> 1 : 1
    1 --> 0 : 0
Figure 17.1: The Fibonacci automaton

17.1 Interpreting a FSA

The way to use a FSA is to input a string one digit at a time, let’s say from left to right. So if we take the string \(010011\) and input it into the automaton in Figure 17.1, we start in state \(0\). The arrow

stateDiagram
    direction LR
    [*] --> 0
Figure 17.2: Start arrow

indicates which state to start in.

Then we read the first character in our string, which is \(0\) and this tells us to follow the edge labeled \(0\) from our current state, which puts us back in state \(0\).

Next we read a \(1\) and this tells us to follow the edge labeled \(1\) and puts us in state \(1\). Then another \(0\) puts us back in state \(0\). Then state \(0\) again. Then state \(1\).

With the last \(1\), we try to follow an edge labeled \(1\) out of state \(1\), but there is no such edge. So at this point we fail and that means the string \(010011\) is rejected by the automaton.

17.2 A FSA for even ’1’s

To create this automaton, we need one state to represent an even number of \(1\)s and one state to represent an odd number of \(1\)s. To distinguish between success states and failure states, we will use a thick border on the success states.

stateDiagram
    direction LR

    classDef accept stroke-width:5px

    [*] --> 0
    0 --> 0 : 0
    0 --> 1 : 1
    1 --> 0 : 1
    1 --> 1 : 0

    class 0 accept
Figure 17.3: Automaton for an even number of \(1\)s

These success or failure states are used at the end of reading a string. If we finish in a success state, the string is good. If we finish in a failure state, the string is bad.

17.3 State Elimination

Let’s have a look at how to turn a FSA into a regular expression. To do this, we will modify our automata step by step, marking each path by what regular expression goes through each state.

17.3.1 One in, one out

For example, a state like

stateDiagram
    direction LR

    [*] --> S : a
    S --> S : b
    S --> [*] : c
Figure 17.4: Pre-state elimination

Can be replaced by

stateDiagram
    direction LR

    [*] --> [*] : ab*c
Figure 17.5: Post-state elimination

In order to enter state \(S\) we need to see an \(a\), then we can use \(b\) as many times as we want, then we need to see a \(c\) to leave.

17.3.2 Parallel edges

Given two parallel edges, we can combine them using \(+\) meaning “or”.

stateDiagram
    direction LR

    [*] --> [*] : a
    [*] --> [*] : b
Figure 17.6: Pre-state elimination for parallel edges
stateDiagram
    direction LR

    [*] --> [*] : a + b
Figure 17.7: Post-state elimination for parallel edges

This says to get from the first state to the second, we can take either \(a\) or \(b\).

17.3.3 Multiple ins or outs

Suppose we have multiple in-edges or out-edges on a state. We can replace the state with a path from each input to each output and record the expression for that path according to Section 17.3.1.

stateDiagram
    direction LR

    I1 --> S : a
    I2 --> S : b
    S --> S : c
    S --> O1 : d
    S --> O2 : e
Figure 17.8: Pre-state elimination for multiple edges
stateDiagram
    direction LR

    I1 --> O1 : ac*d
    I1 --> O2 : ac*e
    I2 --> O1 : bc*d
    I2 --> O2 : bc*e
Figure 17.9: Pre-state elimination for multiple edges

17.3.4 Trinary Example

Consider the language of trinary strings containing at least one \(1\) and one \(2\). Let’s setup states recording whether we’ve yet seen a \(1\), a \(2\) or both.

stateDiagram
    direction LR

    [*] --> 0
    0 --> 0 : 0
    0 --> 01 : 1
    0 --> 02 : 2
    01 --> 01 : 0,1
    02 --> 02 : 0,2
    01 --> 012 : 2
    02 --> 012 : 1
    012 --> 012 : 0,1,2
    012 --> [*]
Figure 17.10: FSA for trinary strings with at least one 1 and one 2

In order to eliminate every state, we make sure to have a starting and ending node that we can get an answer going from start to finish. There is only one accepting state here, “012”, which can go to the ending node whenever we finish.

Next, we eliminate states “01” and “02” using Section 17.3.1.

stateDiagram
    direction LR

    [*] --> 0
    0 --> 0 : 0
    0 --> 012 : 1(0+1)*2
    0 --> 012 : 2(0+1)*1
    012 --> 012 : 0,1,2
    012 --> [*]
Figure 17.11: FSA for trinary strings with at least one 1 and one 2, begin state elimination

Then we use Section 17.3.2 and finish off with Section 17.3.1 to eliminate the remaining states.

stateDiagram
    direction LR

    [*] --> [*] : 0*[1(0+1)*2 + 2(0+2)*1](0 + 1 + 2)*0
Figure 17.12: FSA for trinary strings with at least one 1 and one 2, finish state elimination

This gives us the regular expression \[ 0^*[1(0+1)^*2 + 2(0+2)^*1](0+1+2)^*. \]

17.4 Fibonacci Example

Consider the language of binary strings with no “11.” We can make an automata with three states: a good state where we can take anything, a state representing that we’ve last seen a “1” and one which represents that we’ve seen two “1”s in a row.

stateDiagram
    direction LR

    [*] --> 0
    0 --> 0 : 0
    0 --> 1 : 1
    1 --> 0 : 0
    1 --> 11 : 1
    11 --> 11 : 0, 1
    0 --> [*] : ε
    1 --> [*] : ε
Figure 17.13: FSA for binary strings with no 11

As before, we have a start and end state represented by just circles.

So first, we can eliminate the state for “11” since going there will never lead to a valid string.

stateDiagram
    direction LR

    [*] --> 0
    0 --> 0 : 0
    0 --> 1 : 1
    1 --> 0 : 0
    0 --> [*] : ε
    1 --> [*] : ε
Figure 17.14: FSA for binary strings with no 11, no sink state

The easiest state to eliminate here is state “1” because it has the fewest number of arrows and no loops. There is one path into state “1” and two out. So following Section 17.3.3, we replace these with the path \(1\varepsilon = 1\) from state “0” to the end and with the path \(10\) from \(0\) back to itself.

stateDiagram
    direction LR

    [*] --> 0
    0 --> 0 : 0
    0 --> 0 : 10
    0 --> [*] : 1
    0 --> [*] : ε
Figure 17.15: FSA for binary strings with no 11, eliminate state 1

We now have some parallel loops and parallel edges, which we can add together to get a loop of \(0 + 10\) and an edge of \(\varepsilon + 1\). So the regular expression we come to is \[ (0 + 10)^*(\varepsilon + 1). \]