CS 435Homework 03
Name your .docx file with your answers: CS435HW03LastName.
For each of the following regular expressions in a) – d) do steps 1 – 3:
1. Draw the syntax (expression) tree.
2. Using Theorem 2.8.1 translate each regular expression into an NFA- ε. Give a state diagram
representation of the constructed NFA- ε. Be sure to indicate the start state and final
state(s) and be sure to label all states and transitions.
3. Using Algorithm 2.5 translate the NFA- ε into a DFA. Show all epsilon closures and all steps
as shown in the example in the textbook. (You can use 1, 2, … for state names as opposed to
q1, q2, … .) Show the state diagram of the DFA and show the tabular of the DFA. Be sure to
indicate the start state and final state(s) and be sure to label all states and transitions.
a)
b)
c)
d)
( a | b )*
( a* | b* )*
( ( ε | a ) b* )*
( a | b )* a b b ( a | b )*
Chapter 2
Introduction to Languages and Finite
tio
n
State Automata
Computer language designers must formally specify the set of legal programs in the designed language to
bu
users (programmers) and implementors of that language. Formally, a programming language is a set of
strings. Since the set may be infinite, a formal way of generating the set in a finite way is needed. To do this,
tri
formal grammars are used to generate the languages. Checking whether a program is syntactically correct is
is
tantamount to implementing an algorithm to realize set membership of strings in a possibly infinite set. To
rD
do this, implementors also use computational models, such as finite state automata, push-down automata,
or Turing machines to implement the membership test.
In this chapter and the next, formal languages, formal grammars, automata, and related concepts are
fo
covered. In addition, their application to building and engineering the front-end of translators (interpreters
and compilers) is covered. As presented in the previous chapter the first two phases of a translator usually
of a translator.
N
ot
include lexical analysis (scanning) and parsing. Together these phases are usually referred to as the front-end
To model lexical analysis, the lexemes are usually viewed as comprising the “sentences” of this lexical
language. The lexemes are categorized into groups called tokens. These tokens makes up the vocabulary of
the programming language being translated. To recognize lexemes, the computational model class known
as a finite state automata are utilized. There are different kinds of finite state automata: nondeterministic,
nondeterministic-with-ϵ-moves, and deterministic. Each are useful for different aspects of lexical analysis,
but, it turns out, all of these are equivalent in the sense that they recognize the same family of languages,
called regular languages. None of these finite automata recognize more languages than the other.
To help language designers express these lexemes of regular languages, regular expressions are used. We
say that regular expressions denote regular languages and finite state automata recognize regular languages.
This chapter looks at the formal definition of finite state automata and the simulation or execution of them
to test string membership in regular languages. We also examine how to express regular languages with
regular expressions along with the algebraic identities of regular expressions and the construction of finite
35
2.1 Formal Languages
Introduction to Languages and Finite State Automata
state automata from regular expressions. This latter translation is used by compiler building tools, such as
lex or antlr4.
The concepts presented in this chapter are intended to give the theoretical basis for building lexical
analyzers. The lexical analysis of a compiler not only involves recognizing the lexemes of a programming
language, but categorizing the lexeme as a token kind and returning the token and, where necessary, the
corresponding lexeme to a parser. In the next chapter grammars and automata related to parsing is examined.
The parser realizes the string membership function for a given language whose vocabulary is comprised of
the tokens recognized by the lexical analyzed (scanner).
2.1
Formal Languages
Computer scientists use finite state automata to model many types of hardware and software systems. The
n
primary focus here is to introduce finite state automata as regular language recognizers. This chapter
tio
ultimately provides the foundation of writing software for scanners in compilers and front ends for the
command and data entry components of user interfaces. It can be shown that given a set of regular expression,
bu
an finite state automata can be constructed that recognizes the regular language specified by the set of regular
expressions. We will study an algorithm that can translate a regular expression into a deterministic finite
tri
state automata. Being able to do this, allows one to develop compiler-writing tools that allows a compiler
is
writer specify the lexemes as a set of regular expression and a recognizer automatically generated. Flex
and Antl4 are examples of tools that use regular expressions to produce deterministic finite automata to be
rD
utilized in a scanner.
Regular expressions and the role as a denotation for regular languages are covered first. The syntax of
fo
regular expressions and their meanings (semantics) is given. To make regular expressions easier to express
and make more concise, extensions to meta-syntax and operators are shown. Also, identities surrounding
particular form.
N
ot
regular expressions are also shown for the purpose of simplifying or transforming a regular expression into a
A class of finite state automata, called deterministic finite automata, are defined first. Following their
definition and behavior, non-deterministic automata are introduced. It will be shown that non-deterministic
automata and deterministic automata recognize the same class of regular languages.
Following the presentation of the automata and their behavior, the theorems stating the equivalence of
finite automata to regular grammars, in sense of the class of languages recognized by finite automata is the
same as the class of languages generated, is shown. First, the transformation from a regular language that
generates a language L to a non-deterministic automata that recognizes that language L and the inverse
transformation are shown. In addition similar transformations between regular grammars and deterministic
automata are also established.
Copyright 2024
36
Introduction to Languages and Finite State Automata
2.2
2.2 Overview of Formal Languages
Overview of Formal Languages
Since the advent of the electronic digital computer the representation of data and instructions along with
the properties of the representations has been studied. In particular, the text representation of programming
languages and data along with the automatic translation of text from one natural or programming language to
another has been researched quite extensively by both computer scientists and linguists. Chomsky grammars
have played a central and foundational role in describing the syntax of these languages and the framework
for the formal semantics of these languages.
Although electronic digital computers are finite, the sets of finite strings representing programming
languages and abstract data are modeled as potentially infinite sets. Designers of a programming language
need a notation or metalanguage for formally specifying the potentially infinite number of legal strings of
their new language. Developers writing translators for the language must know the designer’s intent of what
n
the legal strings are comprising the new programming language.
tio
A particular family of formal grammars, first formulated by Noam Chomsky 1 , are widely used and studied
by computer scientists for describing formal languages. Chomsky grammars also allow one to formally
bu
describe the syntactic structure of a language, which can be useful in describing the semantics of that
language. This can help with proper use of the language and also translation of strings of that language to
tri
strings of other languages.
Many notations and constructions have been introduced to describe infinite sets. In mathematical litera-
is
ture, an author might describe an infinite set of non-negative, even integers to a reader through establishing
rD
a pattern sequence or through elementary arithmetic operations over constants and numbers chosen from a
given set. For example, one can describe the set of even non-negative integers using these two ways as:
= {0, 2, 4, . . .} = {2k|k ∈ N = {0, 1, 2, . . .}}.
fo
Ne
N
ot
To test membership of a number n in Ne one can either begin enumerating the members of Ne until n
is found. Another algorithmic approach is to test whether 2 divides n without remainder; if so, then n is in
Ne and if not, then it is not in the set.
Just as set construction notation involving functions and predicates is used to describe the numbers comprising an infinite set of those numbers, strings comprising the elements of an infinite set can be described
similarly. However, the operations and objects that comprise the functions and predicates must be operations over string objects. In this chapter we examine what these operations are and how formal grammars
serve as a finite way for constructing infinite (and finite) languages of strings and for as a means for constructing procedure or algorithms for testing membership of strings. This latter aspect of language theory
deals primarily with the study and design of algorithms and models that are often referred to as language
recognizers and automata. We shall concentrate the discussion first on the representation or description of
languages and postpone the study of recognizers, parsers, etc. until later.
The scope of this chapter is to introduce the reader to the fundamental concepts underlying formal language theory so that it may be applied later to translator writing systems, such as interpreters and compilers.
1 Noam Chomsky, a linguist, has provided the framework for the subject matter in this chapter.
was done in the late 1950’s.
Copyright 2024
37
Most of his original work
2.3 Strings
Introduction to Languages and Finite State Automata
Section 2.3 introduces definitions of concepts related to strings, string operations, and string properties, each
of which provide the foundational concepts necessary for discussing formal grammars. Next, we cover Chomsky grammars and the four hierarchical families of Chomsky languages along with the corresponding families
of Chomsky grammars. Context Free grammars, including regular grammars, and related concepts such as
leftmost and rightmost derivations, parse trees, ambiguity and properties of these grammars are introduced.
Finally, the relationship to parsing is introduced. A more complete discussion of parsing of context free
languages and related recognizers are discussed in a later chapter.
2.3
Strings
Within formal language theory, an alphabet (or vocabulary) is a finite set of symbols. A finite string is a
finite sequence over an alphabet A. Recall that a finite sequence is a total function restricted to an index set
n
domain: [n] = {0, 1, . . . , n − 1} where n ≥ 0; [0] represents the empty set. A string, α : [n] → A, has length
tio
n and is denoted |α| = n, where n ≥ 0. For each alphabet, A, there is only one string with the empty index
set; this string is called the empty string, denoted in this text as ϵA . Often the subscript A is dropped when
bu
the alphabet A is clear.
Notation: Although strings are finite sequences, we often replace the sequence delimiters , with
tri
quote ” symbol delimiters and leave out commas separating the symbols (terms). Where confusion will not
is
arise, the quote ” delimiters may be dropped.
rD
Example 2.1 Using the String definition
Let A = {a, b, c}. Example of strings are α1 = “a” and α2 = “aac”. More formally, we have:
N
ot
a),(1 7→ a), (2 7→ c)}.
fo
α1 : [1] → A, formally defined as: {(0 7→ a)} and α2 : [3] → A formally defined as: {(0 7→
Substrings, String Prefixes and String Suffixes
Given strings α, β, and ω, (each possibly empty) such that ϕ = αβω, α is the prefix string of ϕ and proper
prefix iff α ̸= ϕ and α ̸= ϵ; similarly, ω (ω possibly ϵ) is the suffix string of ϕ and the proper suffix iff ω ̸= ϕ
and ω ̸= ϵ; and α, β, and ω are all substrings of ϕ.
Example 2.2 Prefixes and Suffixes
Let ϕ = cabb be a string over an alphabet that contains {a, b, c}.
Prefixes of cabb: ϵ, c, ca, cab, and cabb;
Proper Prefixes of cabb: c, ca, cab;
Suffixes of cabb: cabb, abb,bb, b and ϵ;
Copyright 2024
38
Introduction to Languages and Finite State Automata
2.3 Strings
Proper Suffixes of cabb: abb,bb, and b .
Substrings of cabb: All prefixes and suffixes of cabb along with a, b, ab.
Concatenation
The concatenation of strings α : [m] → A and β : [n] → A is the string αβ : [m + n] → A, where αβ(i) = α(i)
for 0 ≤ i ≤ m and αβ(j + m) = β(j) for 0 ≤ j ≤ n. It follows that |αβ| = |α| + |β|. Note that for all strings
α, αϵ = ϵα = α. Also note that concatenation is not always commutative; that is, it does not always hold
that αβ = βα.
n
Example 2.3 Concatenation
tio
Again, let the alphabet A = {a, b, c} and let α1 = “a” and α2 = “aac”. The concatenation, α1 α2 = “aaac”.
= {(0 7→ a), (1 7→ a), (2 7→ a), (3 7→ c)}
rD
is
tri
α1 α2
bu
We have α1 α2 : [4] → A. Formally, we have:
Exponentiation and String Reverse Notation
fo
It is convenient to use exponentiation notation to represent repeated symbols in a string. For each symbol
a ∈ A, we may write an (n ≥ 0) as a shorthand notation for aa
· · · a}. For example, it will be convenient
| {z
n a′ s
N
ot
to write aaac as a3 b1 For each symbol a ∈ A, a1 = a. Usually, exponent of value 1, the exponent is left
implicit. The strings of length 1 are not only interpreted as strings, but as single symbols from an alphabet;
the context of usage is used for understanding the proper interpretation or explicit quotes are used: ”a”.
Also note that for any a ∈ A, a0 = ϵ (or more precisely, ϵA ).
For a ∈ A and m, n ≥ 0:
am an
= am+n .
It is also convenient to use exponentiation notation to represent repeated concatenation of repeated
strings over an alphabet. To denote the concatenation of n ≥ 0 copies of a string α, we write αn as
shorthand notation for |αα{z
· · · α}. Again, α0 = ϵ.
n α′ s
It is sometimes convenient to refer to the reverse of a string. The reverse of a string α = a1 a2 . . . an is a
string denoted and defined as αR = an . . . a2 a1 . A string α is a palindrome iff α = αR . The reverse of ababc,
written ababcR = cbaba. The string abba is a palindrome.
Copyright 2024
39
2.3 Strings
Introduction to Languages and Finite State Automata
Concatenation over Sets of Strings
Concatenation can also be extended over sets of strings. Let S and T each be sets of strings. The set
S • T = {αβ|α ∈ S and β ∈ T }. For an alphabet A, An (n ≥ 0) is recursively defined as follows:
{ϵ}
if n = 0
An =
An−1 • A if n > 0.
Kleene Closure
Note that An is the set of all strings of length n over A. The Kleene closure of an alphabet A is defined as:
A∗ = A0 ∪ A1 ∪ A2 ∪ · · · =
∞
[
Ai .
i=0
A∗ is the infinite set of all possible finite length strings over alphabet A. Any set L ⊆ A∗ is a language over
n
an alphabet A. Each string in language L is called a sentence.
bu
A+ = A∗ − {ϵ}
tio
It also will be convenient to use the following set and corresponding notation:
tri
An Algebraic Structure using String Concatenation
In summary, the concatenation operation over strings of A∗ forms an algebraic structure called a monoid.
rD
is
Such algebraic structures possess the following properties:
1. (Closure) For all α, β ∈ A∗ , αβ ∈ A∗ ;
fo
2. (Associative) For all α, β, γ ∈ A∗ , (αβ)γ = α(βγ);
N
ot
3. (Identity Element: ϵ) For each α ∈ A∗ , αϵ = ϵα = α;
These properties show that A∗ with concatenation forms an algebraic structure called an monoid. Because
a monoid is formed, properties of monoids prove to be important and useful in more theoretical studies of
language theory.
Examples of Languages
Example 2.4 Application of String and Language Definitions
Let A = {0, 1}.
A∗
= {ϵ, 0, 1, 00, 01, 10, 11,
000, 010, 100, 110, 001, 011, . . .}
The set of binary digit strings with no leading zeroes is one possible language.
The set L = {0n 1m | n, m ≥ 0} is another example of a language over this alphabet A. Some sentences of L
are 0 ∈ L since 01 10 = 0ϵ = 0, where n = 1 and m = 0. Similarly, 1 ∈ L since 00 11 = ϵ1 = 1, where n = 0
Copyright 2024
40
Introduction to Languages and Finite State Automata
2.3 Strings
and m = 1. For n = 1, m = 1, we have 01 11 ∈ L. For n = 5, m = 2, we have 05 12 = 0000011 ∈ L. Note that
ϵ ∈ L for n = m = 0. Languages, L, described in English is: any number of 0’s (possibly none), followed by
any number of 1’s (possibly none), but once a 1 is encountered (read from left to right) no 0 may occur.
As shown in the previous example some languages may contain the empty string. For each alphabet, is a
language that contains only the empty string, ϵ. Note that there is a difference between an empty language
and the language containing only the empty string. L = ∅ = { } and L = {ϵ} are both languages given any
alphabet.
Example 2.5 Language of Latin Letter Alphabet
= {ϵ, a, b, . . . , z,
tio
A∗
n
Let A = {a, b, . . . , z}.
bu
aa, ab, ac, . . . , az, ba, bb, . . . bz,
..
.
tri
. . . , za, zb, zc, . . . , zz,
is
aaa, aab, aac, . . . , aaz, aba, abb, . . . , abz,
..
.
rD
zza, zzb, zzc, . . . , zzz, . . . ,
. . .}
N
ot
in Webster’s dictionary.
fo
Here the alphabet is the set of lower-case Latin letters. One possible subset of A∗ is the set of English words
By using the terminology defined thus far, this example language has sentences that are words; this is
not commonly used terminology when discussing natural languages, such as English. It is more common to
think of the words of a dictionary as serving as the building blocks of the language as alphabet sets have
been defined above. We shall see in the development of formal grammars, the sentences of one language may
serve the role of an alphabet for another language. Such is the case here for English. When using alphabets
in this hierarchical way, the sentences forming the dictionary and, in turn, used as an alphabet for another
language, the latter alphabet is often called a vocabulary. This is shown in the next example.
Example 2.6 Language Over a Dictionary Alphabet (or Vocabulary)
Let the vocabulary V equal the set of words in Webster’s dictionary along with the usual punctuation marks
and a space. The set of syntactically legal English sentences is one possible language. Note that some of the
sentences might not make any sense semantically.
Copyright 2024
41
2.4 Regular Expressions
Introduction to Languages and Finite State Automata
Note that most natural languages, such as English, are dynamic in the sense that the set of syntactically legal sentences change; moreover, the vocabulary, which comprises an English dictionary, is also
ever-changing.
Programming languages, like many natural languages, can be viewed as being built from a vocabulary,
which can be viewed as a language built from an alphabet. In contrast to natural languages each of which
are based upon a dictionary of finite size that serves as its vocabulary or alphabet, programming language’s
alphabet may be theoretically infinite if there are no restrictions on the length of identifier (e.g. variable)
names. Each identifier can serve as an element in the alphabet and sentences are legally correct programs
in that language.
=
{ keywords: abstract , boolean, . . . , while }
tio
Let V
∪ { operators and/or delimiters:
{ , } , ( , ) , , , + , – , * , / , ++ , — , == , = , . . . }
x , X , x1 , Max , MAX , . . . , sort , . . . , }
bu
∪ { identifiers:
n
Example 2.7 Java Vocabulary
∪ { literals: abc, a , ’a’, 123 , -123 , +123 , 123L , 12.3 , 12.3f , 12.3e04 , . . . }
=
{ ϵ, abstract , class x { } , . . . , abstract class, if , . . . }
tri
V
∗
Regular Expressions
fo
2.4
rD
is
The Java language is one possible subset of V ∗ . Listed above are strings that are not in the Java language.
Regular expressions serve as a convenient way to describe string patterns that can be searched in a larger
N
ot
string, such as a text file. In language design and compiler engineering, regular expressions are used to
describe lexemes and, with these regular expressions, FSAs can be constructed and, in turn, can represent
scanners. We now look at the syntax and semantics of regular expressions.
Copyright 2024
42
Introduction to Languages and Finite State Automata
2.4.1
2.4 Regular Expressions
Regular Expression Syntax and Semantics
Definition 2.4.1 Regular Expression Syntax
Let T be an alphabet. Regular expressions over the alphabet, T ∪ {∅, ϵ, |, ∗, (, )}, with T ∩ {∅, ϵ, |, ∗, (, )} = ∅
are defined as follows:
∅ is a regular expression;
ϵ is a regular expression;
for each a ∈ T , a is a regular expression;
if r1 and r2 are regular expressions, then r1 |r2 is a regular expression;
if r1 and r2 are regular expressions, then r1 r2 is a regular expression;
n
if r is a regular expression, then r∗ is a regular expression;
tio
if r is a regular expression, the (r) is a regular expression.
bu
Definition 2.4.2 Regular Expression Semantics Let T be an alphabet and r1 , r2 , and r be regular expressions.
tri
∅ is denotes ∅;
is
ϵ is denotes {ϵ};
rD
each regular expression a is denotes {a};
if r1 and r2 denotes sets, R1 and R2 , r1 |r2 denotes R1 ∪ R2 = {α|α ∈ R1 or α ∈ R2 };
if r1 and r2 denotes sets, R1 and R2 , r1 r2 denotes R1 R2 = {αβ | α ∈ R1 , β ∈ R2 };
fo
if r denotes set R, then r∗ denotes R∗ ;
N
ot
if r denotes set R, the (r) denotes R.
The operators precedence levels are highest to lowest as follows: Kleene closure (∗), Concatenation, and then
Union (|). The parentheses can alter the precedence rules.
Example 2.8 Regular Expression
Identifiers in C/C++ can be denoted as follows. Let letter → a|b| . . . z|A|B| . . . Z
Let digit → 0|1| . . . 9
Let ident → (letter| )(letter| |digit)∗
Example 2.9 Regular Expression
Consider this regular expression: abc∗ is denoted by the set {a}{b}{c}∗ = {a}{b, bc, bcc, bccc, . . .} = {ab, abc, abcc, abccc, . . .}.
Copyright 2024
43
2.4 Regular Expressions
Introduction to Languages and Finite State Automata
Example 2.10 Regular Expression
a|bc∗ is denoted by the set {a} ∪ {b}{c}∗ = {a} ∪ {b, bc, bcc, bccc, . . .} = {a, b, bc, bcc, bccc, . . .}
2.4.2
Regular Expression Identities
In Figure 2.1 is a list of identities.
Let r, s, and t represent regular expressions in the following. For a regular expression, r, the following
notation is introduced for convenience: r+ = rr∗ = r∗ r.
r|s
= s|t
(Commutative)
(Associative)
= r(st)
r(s|t)
= rs|rt
(Distributive)
(s|t)r
= sr|tr
r|∅
= ∅|r = r
rϵ
= ϵr = r
r∅
= ∅r = ∅
tio
rD
is
tri
(Identity)
(Zero)
= r
(Idempotent)
(r∗ )∗
= r∗
(Closure-related Identities)
r∗
= ϵ + r+
ϵ∗
= ϵ
ϵ+
= ϵ
∗
= ϵ
N
ot
fo
r|r
n
(rs)t
bu
(r|s)|t = r|(s|t)
∅
Figure 2.1: Regular Expression Identities
2.4.3
Extended Notation for Regular Expressions
The class of regular languages is the family or set of languages that can be described with the set constructions
operations used for defining the semantics of regular expressions. The sets of lexemes for most programming
languages form small regular languages. Since many languages have are based upon fairly large character
sets, regular expressions used for denoting these regular languages can become quite long and cumbersome.
Copyright 2024
44
Introduction to Languages and Finite State Automata
2.5 Finite State Automata
For example, using the standard ASCII character set, a regular expression for denoting a string of upper-case
or lower-case letters, requires a regular expression like:
[a | b | c | … | z | A | B | C | … | Z]*
2.5
Finite State Automata
Abstractly, a finite state automaton (FSA) is a computational model for checking membership of strings
in a language over a given alphabet. Given a string over an alphabet, an FSA reads a string from left to
right and indicates whether that string is accepted or not as a member of the language. An FSA, includes
a read-head, a finite set of states and a (finite) set of transitions between any two states (possibly the same
state). A transition is dependent on a state and also a specific alphabet symbol for determining what state
n
becomes the current state. A subset of states are designated as final (or accepting) states. When ”running”
an FSA, an FSA is always in one state and over time the machine changes states based upon the specification
tio
of the transitions and what symbol is being read. As each symbol is read the read head is moved to the
bu
next symbol. When the end of a input string is reached and the automaton is in a final state, the string
is considered accepted. If not, the string is rejected. Depending on how an FSA’s transitions are specified,
tri
there may be a case where in a given state, there are no transitions for a particular alphabet symbol; in this
case, the machine is considered ”stuck” and, in such as case, the string is rejected.
ai
a2
an
fo
a1
N
ot
a0
rD
is
An abstract graphical depiction of is shown in Figure 2.2.
Finite Control
qj
…
tk
…
qi
Figure 2.2: Abstract view of a Finite State Automata
As a recognizer, a finite state automaton may recognize many prefixes in a given string. It does this informally
as follows. The machine starts in a start state. The machine transitions between state qi to qj if the symbol
that the read-head is reading matches the symbol label of the directed edge from qi to qj . The entire string
Copyright 2024
45
2.5 Finite State Automata
Introduction to Languages and Finite State Automata
is recognized only when the last symbol is read and the state of the automaton is in a final accepting state. If
not in a final state, the string is not recognized as being in the language. We now formally define a particular
kind of finite state automaton called a Deterministic Finite Automaton (DFA). It is deterministic because
there is no choice in a transition to a new state based upon the current symbol in the input. Following
the definition, graphical and matrix representations of DFAs are shown and formal meaning of a DFA as a
language is defined.
Definition 2.5.1 Deterministic Finite Automaton A Deterministic Finite Automaton is a 5-tuple <
Q, T, δ, q0 , F > where
Q is a non-empty finite set of states;
T is an alphabet;
tio
n
δ : Q × T → Q is a (possibly partial) function, called the transition function;
q0 ∈ Q is the start state; and
Representations of DFAs
is
2.5.1
tri
bu
F ⊆ Q is a non-empty (finite) set of final states.
Each DFA = M = < Q, T, δ, q0 , F > is usually presented or expressed with a matrix or an edge-labeled and
rD
vertex-labeled directed graph with added notations for the start and final states. Assuming that |Q| = n
and |T | = m, a transition function δ can be specified with a n × m-matrix as follows:
fo
rows are bijectively labeled with states q ∈ Q;
columns are bijectively labeled with alphabet symbols t ∈ T ;
N
ot
each (row-q,column-t) entry is q ′ when δ is defined and q ‘ = δ(q, t); and
each(row-q,column-t)entry has a blank entry or the symbol ⊥ when δ(q, t) is undefined.
δ
..
.
···
..
.
t
..
.
···
..
.
q
..
.
···
..
.
q′
..
.
···
..
.
and when undefined
δ
..
.
···
..
.
t
..
.
···
..
.
q
..
.
···
..
.
⊥
..
.
···
..
.
One can also graphically present instances of a DFA as follows:
q
a state
/ q
qf
the start state
a final state
q
t / ′
q
a transition
In order to define what is meant by a string α being recognized by an DFA, we extend the δ transition
function to δ̂ : Q × T ∗ → Q and define δ̂ as follows:
Copyright 2024
δ̂(q, ϵ)
=
q
δ̂(q, βa)
=
δ(δ̂(q, β), a)
46
Introduction to Languages and Finite State Automata
2.5 Finite State Automata
Definition 2.5.2 String Recognition by an DFA A string α is recognized by an DFA, M =< Q, T, δ, q0 , F >
when δ(q0 , α) = qf , where qf ∈ F . A language L recognized by an DFA, M , denoted L(M ), is defined as
follows:
L(M ) = {α | δ̂(q0 , α) = qf , qf ∈ F }
A string α is not recognized (rejected) when either at some point in the trace, reading symbol a while in
state q, the transition function, δ, is undefined for that state-alphabet pair, (q, a), or the end of the string is
reached and the machine is in not in a final state.
Example 2.11 M2.11
a
b
q0
q0
q1
q1
q1
q0
/) q0
bu
tio
δ
tri
n
Let M2.11 be a finite state automaton: Q = {q0 , q1 }. Start state = q0 . T = {a, b}. F = {q1 }.
a
b
k
a
+ q
u
1
rD
is
b
Figure 2.3: State diagram of M2.11
fo
We trace the steps for δ̂(q0 , aba).
N
ot
δ̂(q0 , aba)
=
δ(δ̂(q0 , ab), a)
=
δ(δ(δ̂(q0 , a), b), a)
= δ(δ(δ(δ̂(q0 , ϵ), a), b), a)
=
δ(δ(δ(q0 , a), b), a)
=
δ(δ(q0 , b), a)
=
δ(q1 , a)
=
q1
Since q1 ∈ F , the string aba is recognized by M . To simplify the tracing of the acceptance or rejection of
strings the following notation is introduced. For each step δ(q , a) = q we use q a q . So, to denote the trace
i
j
i
j
of aba using machine M , we have the q a q b q a q . Shown below are several example traces of strings using
0
0
1
1
M:
1. abaa ∈ L(M ):
q aq bq aq aq
0
0
1
1
1
2. bbbab is not in L(M ):
q bq bq bq aq bq
0
1
0
Copyright 2024
1
1
0
47
2.5 Finite State Automata
3. bababa ∈ L(M )
q bq aq bq aq bq aq
0
1
1
0
0
1
Introduction to Languages and Finite State Automata
1
4. bababab is not in L(M ):
q bq aq bq aq bq aq bq
0
1
1
0
0
1
1
0
L(M ) consists of strings over {a, b}+ that contains an odd number of bs. Intuitively, we could use a phrase
to represent a state that would convey a more meaningful name than say, q0 or q1 . When in state q0 , we are
in a state of having read an even number of b’s; in state q1 we are in a state of having read an odd number of
b’s. Although this gives more insight into the motivation for machine and structure and intended semantics
of the language recognized by the machine, it is too cumbersome to label the states with such phrases or
assertions. Often, if such assertions prove useful legends indicating the associations of the state labels with
tio
n
the assertions can be included.
Example 2.12 M2.12
bu
Let M2.12 be a finite state automaton: Q = {q0 , q1 , q2 }. Start state = q0 . T = {a, b, c}. F = {q2 }.
a
b
c
q0
q0
q1
⊥
q1
q1
q2
⊥
⊥
q0
/) q1
b
q2
rD
q2
is
tri
δ
a
a
d
a
/ q2 u
c
N
ot
fo
/) q0
b
Figure 2.4: State diagram for M2.12
Shown below are several example traces of strings using M :
1. ababa ∈ L(M ) :
q aq bq aq bq aq
0
0
1
1
2
2
2. bb ∈ L(M ) :
q bq bq
0
1
2
3. ababacababa ∈ L(M ):
q aq bq aq bq aq cq aq bq aq bq aq
0
0
1
1
2
2
0
0
1
1
2
2
4. ac is not in L(M ):
q aq c
0
0
5. ababaca is not in L(M ):
q0 a q0 b q1 a q1 b q2 a q2 c q0 a q0 and q0 ∈
/F
Copyright 2024
48
Introduction to Languages and Finite State Automata
2.5 Finite State Automata
L(M ) is the set of strings that is comprised of one or more substrings separated (delimited) by symbol c and
with each of these substrings being a string over {a, b}+ with exactly two bs (not necessarily consecutive).
2.5.2
An alternative approach to DFA transition functions: Total transition
functions
An alternative to using partial transition function is to introduce an error state, qe , where qe is required not
to be a final state and to modify the transition function, δ into a transition function, δe , as follows. δe is
identical to δ except where δ is undefined. For all δ : (q, t) 7→ ⊥ define δe : (q, t) 7→ qe . and add another |T |
transitions, δ(qe , t) = qe , for all t ∈ T . This introduces many more transitions and complicates the graphical
n
representation.
tio
Example 2.13 Alternative DFA: M2.12
Consider the DFA, M2.12 , from Example 2.12. Using the alternative approach we have the alternative DFA,
bu
e
e
e
M2.12
, graphically presented below. It can be shown that L(M2.12 ) = L(M2.12
). M2.12
is graphically depicted
tri
e
in Figure 2.5. Tracing a string, α ∈ L(M2.12
) is identical to that shown above for L(M2.12 ) in Example 2.12.
b
a
)5/ qe u
T`
is
c
rD
c
a
a
d
b
/) q1
fo
/) q0
c
b
a
b
/ q2 u
c
N
ot
e
Figure 2.5: State diagram for alternative, total, transition function for M2.12
However, tracing a string that is not in the language can be different. When encountering the
1. ac is not in L(M ):
q aq cq
0
0
e
2. ababaca is not in L(M ):
q a q b q a q b q a q c qa
0
0
1
Copyright 2024
1
2
2
0
49
2.6 Non-deterministic Finite Automata (NFAs)
2.5.3
Introduction to Languages and Finite State Automata
Deterministic Finite Automata as Language Recognizers
We present an algorithm as a solution to the decision problem: Given a DFA M and a string α, does
α ∈ L(M )?
Algorithm 2.1: Algorithm for a DFA as a Recognizer
Input : Deterministic Finite Automaton: M =< Q, T, δ, Start State, F >,
α$, where $ ∈
/ T and α ∈ T ∗ .
Output: α ∈ L(M )?: Yes/No
Q currentState;
T currentSymbol;
currentState ← Start State;
currentSymbol ← getSymbol();
n
while currentSymbol ̸= $ and currentState ̸= ⊥ do
tio
currentState ← δ[currentState ][currentSymbol ];
if currentSymbol = $ and currentState ∈ F then
print(”Yes”)
tri
else
print(”No”)
bu
currentSymbol ← getSymbol();
Non-deterministic Finite Automata (NFAs)
fo
2.6
rD
is
;
As FSAs are defined above, at most one transition can be taken for each symbol read from the input string.
N
ot
This class of FSAs are collectively referred to as Deterministic Finite Automata (DFA). Another class of
FSA, where this constraint is lifted, are Non-deterministic Finite State Automata (NFA). At each state there
may be more than one transition applicable with each input symbol. A nondeterministic finite automaton
is defined as follows:
Definition 2.6.1 Non-deterministic Finite State Automaton A non-deterministic finite state automaton
(NFA) is a 5-tuple < Q, T, δ, q0 , F > where
Q is a non-empty finite set of states;
T is an alphabet;
δ : (Q × (T ∪ {ϵ})) → 2Q is a total function, called the transition function;
q0 ∈ Q is the start state; and
F ⊆ Q is a non-empty (finite) set of final states.
Copyright 2024
50
Introduction to Languages and Finite State Automata
a
/ q0
2.7 Equivalence of DFAs and NFAs
a
b
/ q1 k
+ q
9 2
a
Figure 2.6: State diagram for M2.14
The transition matrix contains sets of states rather than states. Note δ is a total function and that ∅ is
used instead of ⊥.
Example 2.14 M2.14
Let M2.14 be specified as follows:
Q = {q0 , q1 , q2 }. T = {a, b}. Start = q0 . F = {q2 }.
a
b
q0
{q1 , q2 }
∅
q1
{q2 }
∅
q2
∅
{q1 }
bu
tio
n
δ
Tracing strings for NFAs can be graphically depicted via trees. This representation is also referred to as the
tri
proliferation of states. A string α is recognized if at least on path may be found from the start state (root)
is
to a final state labeling a leaf. If all leaves are either labeled with ∅ or a state that is a non-final state, the
rD
string α is not accepted (rejected).
1. aba is accepted. A trace via a proliferation of states is shown below:
a /
q1
a
N
ot
fo
q0
b
/∅
q2
b /
q1
b
/∅
a /
q2
2. abab is not accepted by L(M ). Trace:
q0
a /
q1
a
2.7
q2
b /
q1
a /
q2
b /
q1
Equivalence of DFAs and NFAs
Let DFA and N F A be the set of DFA and NFA machines, respectively. Let LDF A be the set of languages
accepted by at least one machine M ∈ DF A and let LN F A be the set of languages accepted by at least one
machine M ∈ N F A. It can be shown that LDF A = LN F A . This means that the non-determinism with
finite automata recognize the same class of languages deterministic finite automata recognize.
Copyright 2024
51
2.7 Equivalence of DFAs and NFAs
Introduction to Languages and Finite State Automata
Theorem 2.7.1 LDF A ⊆ LN F A .
Proof: It must be shown that if a DFA, MD , recognizes language, L(MD ), then an NFA, MN , can be
constructed such that L(MD ) = L(MN ). If MD is a DFA, one could convert it to a NFA, MN , simply by
changing the states in the range of δ to singleton sets and changing ⊥ to ∅. From this construction we have
L(MD ) = L(MN ). .
It can be shown that for regular language L(MN ) where MN is an NFA, then there is a DFA, MD , such
that L(MD ) = L(MN ). This is stated in Theorem 2.7.2. The proof given below is a partial proof. An
algorithm is given for constructing a DFA from an NFA. For a full proof, a correctness proof of the algorithm
is needed.
Theorem 2.7.2 LN F A ⊆ LDFA .
n
The proof is based upon subset-construction algorithm. The DFA constructed simulates moving through
tio
the distinct NFA transitions with the same label in “parallel.” All the next states reached on the same
symbol from one state form a subset that will become a state in the DFA. In the worst case, all subsets of
bu
the NFA state set could be generated. If the NFA state set, QN , has n states, then this could result in 2N
states for the constructed DFA. However, in practice, the DFAs used for scanners in translators have nearly
tri
the same number of states of a corresponding NFA.
Given these two theorems, we have the following corollary, which means that NFAs and DFAs recognize
rD
Corollary 2.7.1 LDFA = LN F A .
is
or accept the same family of languages, namely the regular languages.
Before the subset-construction algorithm is shown, we need some supporting algorithms to handle ϵ transi-
fo
tions and for moving through multiple transitions in parallel.
Algorithm 2.2: ϵ-closure(q)
N
ot
/* This function returns the set of states reachable using only ϵ transitions from
state q in NFA N .
*/
Input : q ∈ QN , where NFA N =< QN , TN , δN , qN0 , FN >.
Output: ClosureSet Of q
begin
ClosureSet Of q ← {q};
while There exists a q ∈ ClosureSet Of q and ((q, ϵ) 7→ q ′ ) ∈ δN and q ′ ∈
/ ClosureSet Of q do
ClosureSet Of q.insert(q ′ )
return ClosureSet Of q;
Copyright 2024
52
Introduction to Languages and Finite State Automata
2.7 Equivalence of DFAs and NFAs
Algorithm 2.3: ϵ-closure(J)
/* This function returns the union of all subsets of states reachable using only ϵ
transitions from each state q in J:
∪q∈J ϵ-closure(q).
*/
Input : J ⊆ QN , where QN is a set of states of an NFA, N =< QN , TN , δN , qN0 , FN >.
Output: ClosureSet Of J
begin
push all states j ∈ J on Stack S;
ClosureSet Of J = J;
while Stack S is not empty do
q = top of S;
pop S;
if q ′ ∈
/ ClosureSet Of J then
tio
ClosureSet Of J ← ClosureSet Of J ∪ ϵ-Closure(q ′ ) ;
n
foreach q ′ = δN ((q, ϵ)) do
bu
push q ′ on S ;
tri
return ClosureSet Of J
Algorithm 2.4: move(J, t)
rD
on a symbol t in NFA N .
is
/* This function returns the union of all states reachable from each state q in J
and
Input : t ∈ T
begin
N
ot
Output: ClosureSet Of Jt
fo
Input : J ⊆ QN , where QN is a set of states of an NFA, N =< QN , TN , δN , qN0 , FN >
ClosureSet Of Jt ← ∅;
foreach q ∈ J do
ClosureSet Of Jt ← ClosureSet Of Jt ∪ δN ((q, t))
return ClosureSet Of Jt
Constructive part of proof of Theorem 2.7.2:
Copyright 2024
53
*/
2.7 Equivalence of DFAs and NFAs
Introduction to Languages and Finite State Automata
Algorithm 2.5: N F A to DF A Subset Construction
/* This function constructs a DF A D from an N F A N .
*/
Input : N =< QN , TN , δN , q0N , FN >
Output: D =< QD , TD , δD , q0D , FD >
/* Each state in QD of DF A is a unique set from 2QN .
*/
begin
TD = TN ;
QD ← ∅;
q0D ← ϵ-closure(q0N );
Mark q0D as unvisited;
QD .insert(q0D );
while there is an unvisited state qD ∈ QD do
n
Mark qD as visited;
tio
foreach t ∈ TN do
X ← ϵ-closure(move(qD , t));
bu
if X ∈Q
/ D then
Set X as unvisited;
foreach Y ∈ QD do
FD .insert(Y );
N
ot
return D.
fo
if FN ∩ Y ̸= ∅ then
rD
FD ← ∅;
is
δD .insert( ((qD , t) 7→ X) );
tri
QD .insert(X);
ϵ
>
/ q0
a / q
1
ϵ / q
2
q3
b / q
4
ϵ
ϵ
>
ϵ
ϵ
q5
c / q
6
ϵ
Figure 2.7: An NFA machine, MN .
Copyright 2024
54
q7
ϵ / q
8
?
d / q
9
Introduction to Languages and Finite State Automata
2.7 Equivalence of DFAs and NFAs
Example 2.15 Constructing a DFA from an NFA
Let MN be the NFA shown in Figure 2.7. We construct the DFA MD using Algorithm 2.5. For reference in
tracing the algorithm, each ϵ-closure(q) for all q ∈ MN is given in the table immediately following.
ϵ-closure(q0 ) = {q0 }
ϵ-closure(q1 ) = {q1 , q2 , q3 , q5 , q8 }
ϵ-closure(q2 ) = {q2 , q3 , q5 }
ϵ-closure(q3 ) = {q3 }
ϵ-closure(q4 ) = {q2 , q3 , q4 , q5 , q7 , q8 }
ϵ-closure(q5 ) = {q5 }
ϵ-closure(q6 ) = {q2 , q3 , q5 , q6 , q7 , q8 }
ϵ-closure(q7 ) = {q2 , q3 , q5 , q7 , q8 }
ϵ-closure(q8 ) = {q8 }
ϵ-closure(q9 ) = {q9 }
Get start state. q0D = ϵ-closure(q0N ) = {q0 }. Next compute QD and δD through constructing subsets. The
overbar notation means the DFA state qD , representing the subset of MN states, is visited. Upper-case Latin
letters are used as labels of the subsets and are used as the name of states in the constructed DFA.
tio
n
QD = {}, δD = {}.
Consider q0N = A = {q0 }. Mark A as visited.
bu
QD = {A = {q0 }}
ϵ-closure(move({q0}, a)) = ϵ-closure(q1 ) = B = {q1 , q2 , q3 , q5 , q8 }
tri
ϵ-closure(move({q0}, b)) = ϵ-closure(∅) = ∅
ϵ-closure(move({q0}, c)) = ϵ-closure(∅) = ∅
is
ϵ-closure(move({q0}, d)) = ϵ-closure(∅) = ∅
rD
QD = {A = {q0 }, B = {q1 , q2 , q3 , q5 , q8 } }
fo
δD = {((A, a) 7→ B)}
Consider B = {q1 , q2 , q3 , q5 , q8 }. Mark B as visited.
N
ot
QD = {A = {q0 }, B = {q1 , q2 , q3 , q5 , q8 }}
ϵ-closure(move({q1 , q2 , q3 , q5 , q8 }, a)) = ∅
ϵ-closure(move({q1 , q2 , q3 , q5 , q8 }, b)) = ϵ-closure({q4 }) = C = {q2 , q3 , q4 , q5 , q7 , q8 }
ϵ-closure(move({q1 , q2 , q3 , q5 , q8 }, c)) = ϵ-closure({q6 }) = D = {q2 , q3 , q5 , q6 , q7 , q8 }
ϵ-closure(move({q1 , q2 , q3 , q5 , q8 }, d)) = ϵ-closure({q9 }) = E = {q9 }
QD = {A = {q0 }, B = {q1 , q2 , q3 , q5 }, C = {q2 , q3 , q4 , q5 , q7 , q8 }, D = {q2 , q3 , q5 , q6 , q7 , q8 }, E = {q9 } }
δD = { ((A, a) 7→ B), ((B, b) 7→ C), ((B, c) 7→ D), ((B, d) 7→ E) }
Copyright 2024
55
2.7 Equivalence of DFAs and NFAs
Introduction to Languages and Finite State Automata
Consider C = {q2 , q3 , q4 , q5 , q7 , q8 }. Mark C as visited.
QD = { A = {q0 }, B = {q1 , q2 , q3 , q5 , q8 }, C = {q2 , q3 , q4 , q5 , q7 , q8 }, D = {q2 , q3 , q5 , q6 , q7 , q8 }, E = {q9 } }
ϵ-closure(move({q2 , q3 , q4 , q5 , q7 , q8 }, a)) = ∅
ϵ-closure(move({q2 , q3 , q4 , q5 , q7 , q8 }, b)) = ϵ-closure({q4 }) = C = {q2 , q3 , q4 , q5 , q7 , q8 }
ϵ-closure(move({q2 , q3 , q4 , q5 , q7 , q8 }, c)) = ϵ-closure({q6 }) = D = {q2 , q3 , q5 , q6 , q7 , q8 }
ϵ-closure(move({q2 , q3 , q4 , q5 , q7 , q8 }, d)) = ϵ-closure({q9 }) = E = {q9 }
QD = { A = {q0 }, B = {q1 , q2 , q3 , q5 }, C = {q2 , q3 , q4 , q5 , q7 , q8 }, D = {q2 , q3 , q5 , q6 , q7 , q8 }, E = {q9 } }
δD = { ((A, a) 7→ B), ((B, b) 7→ C), ((B, c) 7→ D), ((B, d) 7→ E), ((C, b) 7→ C), ((C, c) 7→ D)
n
((C, d) 7→ E), }
tio
Consider D = {q2 , q3 , q5 , q6 , q7 , q8 }. Mark D as visited.
QD = { A = {q0 }, B = {q1 , q2 , q3 , q5 , q8 }, C = {q2 , q3 , q4 , q5 , q7 , q8 },
bu
D = {q2 , q3 , q5 , q6 , q7 , q8 }, E = {q9 } }
ϵ-closure(move({q2 , q3 , q5 , q6 , q7 , q8 }, a)) = ∅
tri
ϵ-closure(move({q2 , q3 , q5 , q6 , q7 , q8 }, b)) = ϵ-closure({q4 }) = C = {q2 , q3 , q4 , q5 , q7 , q8 }
ϵ-closure(move({q2 , q3 , q5 , q6 , q7 , q8 }, c)) = ϵ-closure({q6 }) = D = {q2 , q3 , q5 , q6 , q7 , q8 }
is
ϵ-closure(move({q2 , q3 , q5 , q6 , q7 , q8 }, d)) = ϵ-closure({q9 }) = E = {q9 }
rD
QD = {A = {q0 }, B = {q1 , q2 , q3 , q5 }, C = {q2 , q3 , q4 , q5 , q7 , q8 },
D = {q2 , q3 , q5 , q6 , q7 , q8 }, E = {q9 } }
δD = { ((A, a) 7→ B), ((B, b) 7→ C), ((B, c) 7→ D), ((B, d) 7→ E), ((C, b) 7→ C),((C, c) 7→ D)
N
ot
fo
((C, d) 7→ E), ((D, b) 7→ C), ((D, c) 7→ D), ((D, d) 7→ E) }
Consider E = {q9 }. Mark E as visited.
QD = { A = {q0 }, B = {q1 , q2 , q3 , q5 , q8 }, C = {q2 , q3 , q4 , q5 , q7 , q8 }, D = {q2 , q3 , q5 , q6 , q7 , q8 }, E = {q9 } }
ϵ-closure(move({q9 }a)) = ∅
ϵ-closure(move({q9 }b)) = ∅
ϵ-closure(move({q9 }c)) = ∅
ϵ-closure(move({q9 }d)) = ∅
QD = {A = {q0 }, B = {q1 , q2 , q3 , q5 , q8 }, C = {q2 , q3 , q4 , q5 , q7 , q8 },
D = {q2 , q3 , q5 , q6 , q7 , q8 }, E = {q9 } }
δD = { ((A, a) 7→ B), ((B, b) 7→ C), ((B, c) 7→ D), ((B, d) 7→ E), ((C, b) 7→ C),
((C, d) 7→ E) ((D, c) 7→ D), ((D, d) 7→ E) }
E is the only final state for MD since that is the only set that has a non-empty intersection with FN for the
MN .
The state diagram for MD is shown in Figure 2.8.
Copyright 2024
End of Example 2.15.
56
Introduction to Languages and Finite State Automata
2.8 REs to DFAs
d
/ A
a
b
b /
C
J
/ B
c
b
c
/ E
?
d
d
D i
c
Figure 2.8: DFA constructed from NFA in Figure 2.7
2.8
REs to DFAs
n
We will use an algorithm based upon a proof that shows all languages defined by a RE can be recognized
tio
by a DFA. The algorithm is based McNaughton-Yamada-Thompson algorithm first presented in [7]. First,
we show how to construct a NFA from an regular expression. Once that is created we can use the subset
bu
construction of Algorithm 2.5.
tri
Theorem 2.8.1 For each for each regular expression r, there is a DFA, MD , such that L(r) = L(MD ).
Proof: This is a proof by using an algorithm for constructing a DFA from a regular expression. This is
(Algorithm):
INPUT: Regular Expression, r.
rD
is
a partial proof. A complete proof would require a correctness argument and a proof that it terminates.
fo
OUTPUT: NFA, M , such that L(M ) = L(r).
We have start with the fundamental expressions, ϵ, ∅ and a.
N
ot
For ∅ we have a machine:
/ q0
qf
/ q0
ϵ / q
f
/ q0
a / q
f
For ϵ we have a machine:
For a we have a machine:
For r1 |r2 we have a machine:
ϵ
qM(r1 ) 0
:
M (r1 )
qM(r1 ) f
ϵ
$
f inal
:
/ start
ϵ
Copyright 2024
$
qM(r2 ) 0
ϵ
M (r2 )
57
qM(r2 ) f
2.8 REs to DFAs
Introduction to Languages and Finite State Automata
For r1 r2 we have a machine:
/ start
qM(r12)
M (r1 )
M (r2 )
f inal
For r1 ∗ we have a machine:
ϵ
/ start
ϵ / qM
ϵ
qM(r1 ) f
M (r1 )
(r1 ) 0
/ f inal
>
ϵ
n
End of construction of an NFA using recursive definition of regular expressions.
tio
Consider the regular expression: a ( b | c)* d. The syntax tree is shown in Figure 2.9.
bu
r9
r8
tri
r7
r6
rD
is
r1
r5
(
r4
r2
|
N
ot
fo
a
d
*
)
r3
b
c
Figure 2.9: Syntax Tree for a (b | c)* d.
In Example 2.16 an NFA is constructed from this regular expression.
Example 2.16 Constructing an NFA from a Regular Expression: a ( b | c)* d
Using the tree in Figure 2.9 and using the construction of an NFA N from a regular expression r shown in
the proof of Theorem 2.8.1
This last NFA machine is the same one that was used to construct a DFA in
Example 2.15. See Figure 2.8.
By using the construction of a NFA, N , from regular expression, r, described in the proof in Algorithm 2.8.1
and then constructing a DFA, D, from NFA, N , using the subset-construction shown in the proof of Algorithm 2.5, we have an overall algorithm for constructing a DFA, D, from a regular expression, r. This is
just one of the algorithms used for building DFAs from regular expressions. This algorithm has the time and
Copyright 2024
58
Introduction to Languages and Finite State Automata
/ q0
a / q
1
b / q
4
/ q3
r1 =a
2.8 REs to DFAs
c / q
6
/ q5
r2 = b
/ q8
r3 =c
d / q
9
r8 =d
Figure 2.10: NFA machines for single symbol regular sub-expressions of a (b | c)* d.
q3
>
b / q
4
ϵ
ϵ
/ q2
q7
>
ϵ
ϵ
n
q5
c / q
6
bu
tio
Figure 2.11: NFA machine for r4 = b | c and r5 = (b | c) .
ϵ
ϵ
rD
ϵ / q
2
is
ϵ
/ q1
b / q
4
tri
q3
>
>
ϵ
fo
q5
q7
ϵ / q
8
?
ϵ
c / q
6
N
ot
ϵ
Figure 2.12: NFA machine for r6 = (b | c)*.
ϵ
>
/ q0
a / q
1
ϵ / q
2
q3
b / q
4
ϵ
ϵ
>
ϵ
ϵ
q5
q7
ϵ / q
8
?
c / q
6
ϵ
Figure 2.7: NFA machine for r9 = r1 r6 r8 = a(b|c)*d.
Copyright 2024
59
d / q
9
2.8 REs to DFAs
Introduction to Languages and Finite State Automata
space complexity of O(2n ) where n is the number of states in the NFA, N . In the worst case, the number of
states in the constructed DFA is 2n . However, this is usually not the case for the DFAs used for scanners in
translators for most programming languages. When using regular expressions for multiple pattern matching
instances, this algorithm is acceptable. If the regular expression will be used once or is temporary, it’s best
N
ot
fo
rD
is
tri
bu
tio
n
to convert to the NFA and then run the NFA with a backtracking algorithm on the proliferation of states.
Copyright 2024
60
Introduction to Languages and Finite State Automata
2.8.1
2.8 REs to DFAs
Regular Expressions Constructed from FSAs
A regular expression, r, can be constructed from a DFAs, MD , or NFA, MN , such that L(r) = L(MD ) =
L(MN ). Of course can construct a DFA MD from an NFA, MN and then construct a regular expression.
This construction can be useful when given an FSA in a specification and one needs to implement it in
software. With an equivalent regular expression we can use a software tool like Flex or Antrl4 to construct
the implementation of the equivalent DFA.
Theorem 2.8.2 For each FSM, M , such that L(r) = L(M ).
We do not show a proof here or given an algorithm for constructing regular expressions from FSAs.
Instead, examples of constructing regular expressions from FSAs are given.
Example 2.17 Regular Expression for L(M2.11 )
n
The regular expression for the language recognized by the DFA first shown in Example 2.11 (and whose state
tio
diagram is presented in Figure 2.14) is
a∗ ba∗ (ba∗ ba∗ )∗
is
tri
bu
The state diagram of M2.11 is shown below.
a
a
+ q
1
u
b
fo
rD
/) q0 k
b
N
ot
Figure 2.14: State diagram of M2.11
Example 2.18 Regular Expression for L(M2.12 )
The regular expression for the DFA first shown in Example 2.12 (and whose state diagram is presented in
Figure 2.15) is
a∗ ba∗ ba∗ (ca∗ ba∗ ba∗ )∗
a
a
/) q0
d
b
a
/) q1
b
/ q2 u
c
Figure 2.15: State diagram for M2.12
Copyright 2024
61
2.8 REs to DFAs
Introduction to Languages and Finite State Automata
Example 2.19 Two Regular Expressions for a NFA: L(M2.19 )
The state diagram from Example 2.14 is shown below along with two (equivalent) regular expressions that
describe the language recognized by M2.14 . The state diagram for M2.19 is presented in Figure 2.16.
a(ba)∗ |aa(ba)∗ = (a|aa)(ba)∗
a
/ q0
/ q1 k
a
b
+ q
9 2
a
tio
n
Figure 2.16: State diagram for M2.14
bu
From Example 2.19 we see that regular expression identities can be applied to give alternative expressions
N
ot
fo
rD
is
tri
or, in some cases, simpler expressions in the sense that there are fewer operators.
Copyright 2024
62