Consider the regular expression R = (a + b)* (aa + bb) (a + b)* Which…
2007
Consider the regular expression R = (a + b)* (aa + bb) (a + b)*
Which deterministic finite automaton accepts the language represented by the regular expression R ?
Attempted by 100 students.
Show answer & explanation
Key insight: the language contains exactly those strings over {a,b} that have the substring "aa" or the substring "bb" anywhere.
Construct a deterministic finite automaton that recognizes strings that contain at least one occurrence of two equal consecutive letters. Use four states:
q0: start state (no previous symbol remembered and no double seen).
q1: the last symbol seen was 'a' and no double has been seen yet.
q2: the last symbol seen was 'b' and no double has been seen yet.
q3: accepting sink state meaning "a double (aa or bb) has been seen". Once reached, remain in q3 on all inputs.
Define transitions as follows:
From q0: on 'a' -> q1; on 'b' -> q2.
From q1 (last was 'a'): on 'a' -> q3 (seen 'aa'); on 'b' -> q2.
From q2 (last was 'b'): on 'b' -> q3 (seen 'bb'); on 'a' -> q1.
From q3: on both 'a' and 'b' stay in q3 (absorbing accepting state).
Why this is correct:
Before any double is seen the automaton records only the last symbol (or none at the start) so it can detect when two equal symbols occur consecutively.
When a double is seen the automaton moves to q3 and stays there, so any further input cannot invalidate the fact that a double occurred.
Thus the machine accepts exactly those strings that contain 'aa' or 'bb' at least once.
Simple test examples:
'aa' -> accepted (q0 --a--> q1 --a--> q3).
'baab' -> accepted (double 'aa' appears in the middle).
'abab' -> rejected (no 'aa' or 'bb' anywhere).
Why other simple DFA mistakes fail:
A machine that only checks the final two symbols (accepting exactly strings that end with 'aa' or 'bb') is wrong because it misses strings where the double appears earlier (e.g., 'aab').
A machine that does not remain in an accepting sink after seeing a double is wrong because subsequent inputs could incorrectly drive it out of acceptance.
Conclusion: The DFA described above (with the accepting sink reached when 'aa' or 'bb' is first observed) correctly accepts the language of the regular expression R = (a+b)* (aa + bb) (a+b)*.