Consider the regular expression R = (a + b)* (aa + bb) (a + b)* Which…

2007

Consider the regular expression R = (a + b)* (aa + bb) (a + b)*

Which deterministic finite automaton accepts the language represented by the regular expression R ?

Attempted by 100 students.

Show answer & explanation

Key insight: the language contains exactly those strings over {a,b} that have the substring "aa" or the substring "bb" anywhere.

Construct a deterministic finite automaton that recognizes strings that contain at least one occurrence of two equal consecutive letters. Use four states:

  1. q0: start state (no previous symbol remembered and no double seen).

  2. q1: the last symbol seen was 'a' and no double has been seen yet.

  3. q2: the last symbol seen was 'b' and no double has been seen yet.

  4. q3: accepting sink state meaning "a double (aa or bb) has been seen". Once reached, remain in q3 on all inputs.

Define transitions as follows:

  • From q0: on 'a' -> q1; on 'b' -> q2.

  • From q1 (last was 'a'): on 'a' -> q3 (seen 'aa'); on 'b' -> q2.

  • From q2 (last was 'b'): on 'b' -> q3 (seen 'bb'); on 'a' -> q1.

  • From q3: on both 'a' and 'b' stay in q3 (absorbing accepting state).

Why this is correct:

  • Before any double is seen the automaton records only the last symbol (or none at the start) so it can detect when two equal symbols occur consecutively.

  • When a double is seen the automaton moves to q3 and stays there, so any further input cannot invalidate the fact that a double occurred.

  • Thus the machine accepts exactly those strings that contain 'aa' or 'bb' at least once.

Simple test examples:

  • 'aa' -> accepted (q0 --a--> q1 --a--> q3).

  • 'baab' -> accepted (double 'aa' appears in the middle).

  • 'abab' -> rejected (no 'aa' or 'bb' anywhere).

Why other simple DFA mistakes fail:

  • A machine that only checks the final two symbols (accepting exactly strings that end with 'aa' or 'bb') is wrong because it misses strings where the double appears earlier (e.g., 'aab').

  • A machine that does not remain in an accepting sink after seeing a double is wrong because subsequent inputs could incorrectly drive it out of acceptance.

Conclusion: The DFA described above (with the accepting sink reached when 'aa' or 'bb' is first observed) correctly accepts the language of the regular expression R = (a+b)* (aa + bb) (a+b)*.

Explore the full course: Gate Guidance By Sanchit Sir