Hello there,
Here, I’m attaching all the homework documents and required readings. Let me know if you can access all of them.
CS 435 Project 2
Recursive Descent Parser (with Scanner) for Simple_PL1
Overview of Assignment
•
•
Using your Project 1 scanner, implement a recursive descent parser in C for
Simple_PL1.
If a lexical error or parser error is encountered, report the error according to the
specification shown below and exit.
Review of Simple_PL1 Lexical Specification
Your scanner must recognize the following tokens. For definitions below: letter =
[a..zA..Z] and digit = [0..9].
Token Name (used
internally by compiler)
ID
READ
Regular expression
definition
WRITE
write
NUMBER
ASSIGN
PLUS
MINUS
TIMES
DIV
SEMICOLON
COMMA
LPAREN
RPAREN
SCAN_EOF
digit digit*
:=
+
*
/
;
,
(
)
(letter|_)(letter|digit|_)*
read
Lexeme examples
max, a_1, _a1,
note: read is an identifier,
but is reserved and needs
to be returned from
scanner as distinct token.
note: write is an identifier,
but is reserved and needs
to be returned from
scanner as distinct token.
1, 21, 1024
scanner returns this when
the source end-of-file is
reached.
WHAT TO SUBMIT: Name your project CS435P02. After completing
the project, zip the entire project folder and test cases. Name your zip file
CS435P02.
I will grade your project by running it against a collection of test files. (The executable
file name for this example is CS435P02Jeffrey. Your executable will be your project
name.)
Example: The command line should be: $ CS435P02Jeffrey src1.txt
TO DO: This parser must be written in C using the Visual Studio 2022 IDE. Utilize
your scanner from Project 1. You must use the command-line parameters to access
Project 2 Parser for Simple_PL1
Page 1 of 5
the source file, i.e. argv[1]. Open the source file and parse it. You must use recursive
descent parsing. For this simple parser, when a parsing error occurs, just print the
expected token (use mnemonic array of strings), or, indicate that an unexpected token
was found and exit the program. You will need to analyze which message is
appropriate. When a lexical error occurs, indicate that an unrecognized character was
seen. Do not attempt error recovery.
Simple_PL1 Syntax Definition and Grammar:
The Simple_Pl1 programming language is defined and is generated by the grammar
shown below. (The “$” represents the SCAN_EOF token.) The recursive descent parser
must be based upon the following grammar, which meets the conditions for building a
recursive descent parser. You can also alter it to make it more concise and for building
a more efficient recursive descent parser. For example, see the syntax diagram on the
next page.
Note that you can take advantage of checking whether the next token is in the first set
of a particular non-terminal when parsing a production rule for that non-terminal.
program
stmt_list
stmt
expr_list
expr_list_tail
id_list
id_list_tail
expr
term_tail
term
factor_tail
factor
add_op
mult_op
→
→
→
→
→
→
→
→
→
→
→
→
→
→
stmt_list $
stmt stmt_list | ε
id := expr; | read( id_list ); | write( expr_list );
expr expr_list_tail
, expr expr_list_tail | ε
id id_list_tail
, id id_list_tail | ε
term term_tail
add_op term term_tail |ε
factor factor_tail
mult_op factor factor_tail | ε
( expr ) | id | number
+ | * | /
Project 2 Parser for Simple_PL1
Page 2 of 5
Syntax Diagram: This syntax diagram expresses an alternative presentation of a
grammar. This grammar is a revision of the original grammar shown above and
generates the same Simple_PL1 language. This grammar shown below can aid in the
engineering of a recursive descent parser.
program
stmt
stmt
EOF
ID
:=
expr
;
READ
(
ID
)
;
,
expr
(
WRITE
)
;
,
expr
term
term
factor
+
*
–
/
factor
ID
NUMBER
(
Project 2 Parser for Simple_PL1
expr
)
Page 3 of 5
Handling Syntax Errors in Simple_PL1 source code.
If there are no parsing errors, print to standard output the following message:
Parsing complete. No errors.
If there is a detected parsing (or lexical error), print the token name that was going to
be or was recognized and exit the program. Use the following format for a parser error:
Expected symbol: or Unexpected symbol: .
You can also use the first sets to detect an error inside a certain construct, such as
expressions. Use an error message such as (just an example):
Error in expression: Expected ID, NUMBER, or ‘(‘.
The expected or unexpected symbol should be a token nearby the actual parsing error.
Note: sometimes the error might appear to be misleading. Error reporting and recovery
is hard. Utilize the mnemonic array of c-strings from Project 1 to print the token
name.
Shown below are some examples.
Source File
Example 1
Example 2
x := 2;
y := 3;
read(a, b);
write(a,b,a+b*(2*x/y));
x := 2;
y := 3;
read(a, b);
write(a,b,a+b*(2*x/y);
Standard output
⇒
⇒
Parsing complete. No errors.
Expected symbol: RPAREN
Example 3
x := 2;
y := 3;
read(a, b)
write(a, b,a+b*(2*x/y));
⇒
Expected symbol: SEMICOLON
Example 4
x := 2;
y := 3;
read(a b);
write(a, b,a+b*(2*x/y));
⇒
Expected symbol: RPAREN
Example 5
read(x, y, z, v);
read( c ); temp := (x y)*(2*x + z/4) /v;
a := (c);
write( temp /
);
⇒
Error in expression: Expected
ID, NUMBER, or ‘(‘.
NOTES:
In Example 2, there is a missing right parenthesis in the last statement.
In Example 3, there is a missing semicolon that should terminate the read statement
on the third line.
Project 2 Parser for Simple_PL1
Page 4 of 5
In Example 4, there is a missing comma. Since a comma separates identifiers in an
identifier list for READ, the parser “thinks” it is at the end of the list when no comma
is found and expects a RPAREN. Perhaps the message should be “missing comma,” but
adding code to detect this kind of error would complicate the parser. Although this is a
simple fix in this case, this demonstrates the difficulty of reporting errors and the
design decisions compiler engineers face.
In Example 5, you can see how a specific construct can be referenced in the error
message. The first(Expr) set can be helpful for detecting and reporting this kind of
error.
Implementing a Recursive Descent Parser for Simple_PL1 and Some Supporting
Functions: Utilize the textbook as a guide for writing your recursive descent parser.
Also, shown below are two helpful functions for supporting your recursive descent
parser and implementing error reporting, such as in the examples shown above.
void parse_error(char *errMsg, char *lexeme) {
extern unsigned numErrs; //for future if error recovery used
numErrs++;
fprintf(stderr, “%s: %s\n”, errMsg, lexeme);
}
void match(enum tokenType expected)
{
if (currentToken == expected) {
currentToken = scan();
}
else{
parse_error(“Expected symbol”, mnemonic[expected]);
exit(1);
}
}
Project 2 Parser for Simple_PL1
Page 5 of 5
CS 435 Project 1
Scanner for Simple_PL1
Overview of Assignment
Complete a “hand-written” scanner for Simple_PL1.
This project must be written in C (code files “.c” and “.h”) using Visual Studio 2022.
Simple_PL1 Lexical Specification
For definitions below: letter = [a..zA..Z] digit = [0..9].
Token Name (used
internally by compiler)
ID
READ
Regular expression
definition
WRITE
write
NUMBER
ASSIGN
PLUS
MINUS
TIMES
DIV
SEMICOLON
COMMA
LPAREN
RPAREN
SCAN_EOF
digit digit*
:=
+
*
/
;
,
(
)
(letter|_)(letter|digit|_)*
read
Lexeme examples
max, a_1, _a1,
note: read is an identifier,
but is reserved and needs
to be returned from
scanner as distinct token.
note: write is an identifier,
but is reserved and needs
to be returned from
scanner as distinct token.
1, 21, 1024
scanner returns this when
the source end-of-file is
reached.
WHAT TO SUBMIT: Name your Visual Studio 2022 solution/project:
CS435P01. After completing the project, Zip the entire
solution/project folder and test cases. Name your zip file CS435P01.
I will grade your project by running it against a collection of test files.
Example: The command line should be:
C:> CS435P01YourLastName.exe src1.txt
See the example shown below illustrating the input – output relationship.
Project 1 Scanner for Simple_PL1
Page 1 of 5
x := 3;
y := 4;
read(x);
z1 := x + y;
write( x, y, z1,x*y/2-23 );
⇒
ID, x
ASSIGN
NUMBER, 3
SEMICOLON
ID, y
ASSIGN
NUMBER, 4
SEMICOLON
READ
LPAREN
ID, x
RPAREN
SEMICOLON
ID, z1
ASSIGN
ID, x
PLUS
ID, y
SEMICOLON
WRITE
LPAREN
ID, x
COMMA
ID, y
COMMA
ID, z1
COMMA
ID, x
TIMES
ID, y
DIV
NUMBER, 2
MINUS
NUMBER, 23
RPAREN
SEMICOLON
TO DO: Write a driver program to test your scanner. You must use the command-line
parameters to access the source file, i.e. argv[1]. Open the source file, read it and
output all tokens on separate lines. For identifiers and numbers, you must also output
the lexeme as shown above in the example.
You must use the internal names for tokens shown above in your scanner. That is, the
scanner should return these names to your driver program. For the purpose of testing
the scanner, you’ll need a corresponding array of strings that represent the internal
Token name. You’ll also need a lexeme char array (c-string).
Although it is poor programming practice to use global variables, for this project, you
will be allowed to use global variables current_token and lexeme. You can also use the
src file pointer variable as a global, so you don’t have to pass that as a parameter to
the scanner.
Shown below is some starting code.
Project 1 Scanner for Simple_PL1
Page 2 of 5
Outline of Scanner and Driver for testing the Scanner. Written in C.
//starter code for Simple_PL1 scanner
#include //for c I/o
#include // for exit()
#include // for isalpha(), isalnum(), …
enum tokenType {
READ, WRITE, ID, NUMBER, LPAREN, RPAREN, SEMICOLON, COMMA, ASSIGN, PLUS, MINUS, TIMES, DIV, SCAN_EOF
};
char *mnemonic[] = { “READ”, “WRITE”, “ID”, “NUMBER”, “LPAREN”, “RPAREN”, “SEMICOLON”, “COMMA”,
“ASSIGN”, “PLUS”, “MINUS”, “TIMES”, “DIV”, “SCAN_EOF”};
void lexical_error(char ch)
{
fprintf(stderr, “Lexical Error. Unexpected character: %c.\n”, ch);
}
char lexeme[256] = { ‘\0’ };
unsigned int lexLen = 0;
FILE *src;
enum tokenType scan()
{
static int currentCh = ‘ ‘;
static int tempCh = ‘ ‘;
char* reserved[2] = { “read”, “write” };
lexLen = 0;
// clear lexeme buffer for each scan
lexeme[0] = ‘\0’;
extern FILE *src;
//pointer to FILE handle that binds to source file.
if (feof(src)) {
return SCAN_EOF;
}
while ((currentCh = fgetc(src)) != EOF) {
if (isspace(currentCh)) {
continue;
}
else if (isalpha(currentCh)) { //needs to be modified
lexeme[0] = currentCh;
lexLen = 1;
for (tempCh = fgetc(src); isalnum(tempCh) || tempCh == ‘_’;) {
//build identifier lexeme
}
lexeme[lexLen] = ‘\0’;
ungetc(tempCh, src); //put back character that is not a alpha/digit or ‘_’
// see if lexeme is a reserved word, if not, return ID.
}
else if (isdigit(currentCh)) {
// build lexeme for number
// finish fixing lexeme string, ungetc the last character read that is not a digit and then return NUMBER
}
else if (currentCh == ‘+’) {
return PLUS;
}
// use selection statements to look for tokens for operators and delimiters and assignment (:=)
else {
lexical_error(currentCh);
}
}
return SCAN_EOF;
}
Project 1 Scanner for Simple_PL1
Page 3 of 5
int main(int argc, char *argv[])
{
extern FILE *src;
enum tokenType currentToken;
if (argc > 1) {//should be better testing for proper number of arguments, but not required for this project
if (fopen_s(&src, argv[1], “r”)) {
fprintf(stderr, “Error opening source file: %s “, argv[1]);
exit(1);
}
}
while ((currentToken = scan()) != SCAN_EOF) {
//finish body for displaying the string version of the internal token name and
//also print lexeme if the token is a ID or NUMBER. Do not print lexeme for the other tokens.
}
fclose(src);
return 0;
} //end driver
Project 1 Scanner for Simple_PL1
Page 4 of 5
A preview of a Grammar for Simple_PL1 is shown, so you can start writing programs
in Simple_PL1. However, you do not need to know proper syntax in order to test your
scanner.
The “$$” is treated as eof.
program
stmt_list
stmt
expr
term_tail
term
factor_tail
factor
add_op
mult_op
→
→
→
→
→
→
→
→
→
→
stmt_list $$
stmt stmt_list | ε
id := expr; | read id; | write expr;
term term_tail
add_op term term_tail |ε
factor factor_tail
mult_op factor factor_tail | ε
( expr ) | id | number
+ | * | /
Grammar specifications for Simple_PL1 will be given in the future.
Project 1 Scanner for Simple_PL1
Page 5 of 5
Chapter 1
Introduction
n
Programming a physical machine requires composing ordered sequences of machine-specific operation codes
ut
io
for instructions along with providing codes for data and access to data. The operation codes are decoded
by circuits that activate other circuits in the machine to execute the instructions. With the first electronic
computers, it was difficult to read, debug, and modify the numeric codes representing the programs, thus
rib
taking more time to get programs to run. Also, deploying the programs were time-consuming because it often
took many hours or days to load the program codes into the “memory” since it involved physically wiring
st
some of the circuits together. Because of these reasons, programming languages were designed to express
programs that resembled the syntax of natural languages, like English, but with very precise meaning.
di
Translator programs were developed to translate those programs into the machine code.
The first languages (called assembler languages) and translators (assemblers) were developed to overcome
or
the challenges of reading and writing machine level code. Assembler languages use mnemonic names for the
ot
f
operations, such as ADD; operands represented as names of registers or memory addresses (variables), such
as R1 or X; and operands as constant values represented with specific syntax, such as 2 or ’A’. The assembler
instructions stand in almost a one-to-one correspondence with the machine instructions. This correspondence
N
together with mnemonic names and names for operands are what makes assembler programs easier to read
and write than the respective machine program. An example of an assembler program is shown later in this
chapter. Assembler language and assemblers were introduced in the early 1950s. Since then, and still today,
assembler languages are used when certain applications require access to specific low-level hardware (e.g.,
device drivers) or where the programmer needs to write more time and/or space efficient code than that
produced by other higher-level language translators.
High-level programming languages were developed to express abstractions found in mathematical concepts, algorithms, and computational models and used by many programmers. One of the first high-level
programming languages developed originally at IBM in the mid 1950s was ForTran (which stands for Formula Translator), which was more abstract than assembler languages. Fortran was intended for representing
scientific computations and related abstractions, although has been used for other application domains. In
the late 1950s and early 1960s COBOL, Algol60, and Lisp (List Processing) were designed and implemented
with translators that generated assembler code or machine code. COBOL, Algol60 and Lisp were designed
1
Introduction
to “abstract away” lower-level views of hardware. COBOL is closest to English and was originally designed
for business applications. Many institutions still have large amounts of legacy COBOL code in use today.
Algol60 was introduced to be more general purpose in terms of use in application domains. Algol60 also introduced many programming concepts relevant to many widely-used programming languages of today. Lisp
is based upon the concepts of expressing programs as pure mathematical functions and treating functions
as values or objects as operands to other functions. The design philosophy of Lisp was based upon a pure
function evaluation without regards to an underlying computer.
Programming language concepts consist of abstractions for expressing computations and representing
data. They serve as means to examine and compare languages. Examples of concepts include scoping
rules, lifetimes of objects, bindings and binding times, recursion, block-structures, data types, orthogonality,
abstract control structures, concurrency, functions and procedures, and parameter passing rules. There
are many more. This book is organized around groups of related concepts. For each concept, several
n
languages are sampled and compared in terms of how they utilize and implement the concept and other
ut
io
related concepts. Many concepts are interrelated and cannot be studied in isolation. Again, they are
viewed from the perspectives of the designer, programmer (user), and implementor. For example, functions,
recursion, binding times, and parameters are all interrelated. We look at the motivation of the designer to
rib
include or not include some concept in a language, the pros and cons of how a language employs a concept
from the programmer perspective, and how implementors implement that concept on a computational model
st
according to the concept’s semantics.
di
To help study programming languages, it helps to have an understanding of three areas of study: 1) theory
of computation (theoretical computer science); 2) software design methodologies or techniques (software
or
engineering); and 3) general knowledge of computer architecture and organization (computer engineering).
Each of all three views (designer, user, implementor) of programming languages utilize all three knowledge
ot
f
areas. For example, from an implementor point of view, one must have a working knowledge of theoretical
computer science, such as formal grammar theory and related finite automata and push-down automata
computational model theory; must have a good working knowledge of design and implementation practices of
engineering.
N
software engineering; and have an understanding of machine architectures and operation related to computer
For the remainder of this chapter, discussion expands on topics already mentioned above, reflects on
several other aspects of the study of programming languages, and introduces translators, both compilers and
interpreters, and hybrids of these two types. First, because programming languages are used to represent
algorithms, which are models of some process in a real or virtual world, and that concepts of programming
languages are abstractions, a brief review of the roles of models and abstractions is given. Next, we look
at several reasons for why there are so many languages. Several examples of how some changes in technology influences programming language design and implementation mentioned above are presented. A brief
philosophical look at the purpose of programming languages is also included. Then, to provide the foundation for introducing language translators, semi-formal presentations of computational problems, algorithms,
computational models, syntax, semantics and pragmatics are given. Finally, we look at the implementation
of programming language translators, namely the high-level designs of them and supportive runtime sys-
Copyright 2024
2
Introduction
1.1 Models and Abstraction
tems, linkers and libraries. In this discussion an introduction of the role and application of abstractions,
computational models, and formal grammars are given. How two types of language translators, compilers
and interpreters, work along with their advantages and disadvantages are compared. Finally, categories of
languages and some programming languages are introduced. All sections in this chapter are intended to
provide a foundation for the rest of the book.
1.1
Models and Abstraction
Scientists and engineers use models and abstractions to understand and design complex systems. They
try to understand a complex system within the natural world by forming models and designing an abstract
representation of chosen relevant objects and relationships between them. Deciding upon the relevant objects
along with deciding upon what properties and relationships to represent is the primary issue in forming a
n
given model. Abstraction is the concept of choosing the details of the representation of the objects and
ut
io
operations required for the model. Part of constructing models involves deciding on the level of abstraction for
the representation of objects, relations, and operations. For example, engineers will not use the abstraction
level of the molecular structure of steel used in the model of a bridge to be built. They will use relevant
rib
objects and an appropriate abstraction level, such as representations of steel beams to be used in a physical
bridge, and use the appropriate physics models, such as the mathematics representing static and dynamic
st
forces.
Building software products is an engineering endeavor. Therefore, practices and concepts used by other
di
engineering disciplines are needed. Similarly, software developers write code that simulates certain aspects
of the natural world, such as the code used in flight simulators, air-traffic control systems, and physics in a
or
video game. In the study of collisions of objects, scientists develop models that consider the properties of
ot
f
mass, speed, and direction of the objects and arrive at the phenomena, such as conservation of momentum.
In simulation software, a software developer expresses a scientific model or engineering design model into a
model within the constraints of a computing model. Ultimately, the scientific or engineering model has to
N
be realized on a physical computing device.
Software engineers build programs as their artifacts and use models and abstraction in: 1) uncovering
requirements; 2) formally specifying the scope of the system; 3) the creating a design of the system; and
4)the implementation of the design in a given programming language. In this book, the term, software,
includes not just the target program or programs making up the system, but all of the documents created
during the construction of the program(s). At all stages of software development, abstractions are used and
the program acts a model of some task or phenomenon.
One of the general approaches to using abstractions for understanding a software system is to organize a
system into levels of abstraction. This is used for understanding many software systems, such as operating
systems and virtual machines. Abstraction is used extensively in building translators and support code,
like runtime systems and virtual machines for the purpose of fully implementing a programming language
semantics. The study of programming languages involves the foundations of computational theory, the
use of principles of software engineering design, and the knowledge of thinking on all layers of abstraction,
Copyright 2024
3
1.2 Some Reasons for Studying Programming Language Concepts
Introduction
from the lowest-layer to highest-layer of abstraction. A deeper discussion of levels of abstraction is given in
Section 1.10.
1.2
Some Reasons for Studying Programming Language Concepts
Knowing how to program in more than one language helps expand occupational opportunities for software
development, helps people who work in a wide spectrum of disciplines, and gives programmers insights into
solving problems. So, while there are good arguments for knowing several languages, why should one be
interested in comparing the concepts surrounding programming languages? While most developers are not
going to design a full language or write compilers or interpreters, why should one be well-versed in the
implementation of language or translators? There are many reasons for studying programming language
concepts and related theory. Some are given here and will be illustrated and developed throughout the
n
book.
ut
io
Enhancing ones ability to express and implement algorithms and data structures. Knowledge of a programming language helps one’s understanding of a computational problem and the algorithm used. (Computational problems and algorithms are more formally defined later in this introduction.) As with natural
rib
languages, those fluent in a language can better express their thoughts and ideas for solving problems. Studying the design goals of a programming language and the underlying computational models or mathematical
st
models for expressing the semantics of the language helps in not only implementing an algorithm, but may
give rise to new insights into how to solve problems or may provide more efficient solutions based upon the
di
underlying computational model. An example of this is the use of move-semantics in C++11 (and later
versions) for copy constructors or assignment-member functions. In this case, knowledge of the language
or
semantics and implementation of how objects and parameter passing leads to more time and space efficient
ot
f
C++ code.
Study of programming language concepts helps develop the ability to learn new languages. Change in
the software and hardware industries is inevitable and ubiquitous. New hardware gives rise to new possible
N
applications and provides room for new innovations. These new applications and innovations also results
in new approaches for software design and implementation strategies, which usually calls for new ways to
develop software or to express solutions. Another example of the ability to learn a new language quickly helps
when confronted with an internal, proprietary language within an organization. By studying how grammars
formally define the syntax of languages and how grammars are used for building translators, one can have a
more structured approach to learning a language and deeper understanding of the syntax of the language;
this helps learn languages. Knowing how concepts apply to other languages helps with understanding the
design and proper use or pragmatics of a new language.
Knowledge of constructs of one language helps simulate concepts in another language. Knowing how to
program in a language that has built-in features to express and implement some abstraction, concept or
software design, helps one simulate it in another language that does not have the corresponding features.
Sometimes, a programmer does not have the luxury of the language choice. For example, suppose that a
programmer has to program in C and that programmer knows C++. If one knows how to use pointers to
Copyright 2024
4
Introduction
1.2 Some Reasons for Studying Programming Language Concepts
functions, how to pass pointers to structures, and knows the meaning of static variables, the C programmer
can simulate the implementation of classes and objects. Simulating iterators and generators of current
languages (e.g. Python, Ruby) in older languages is another example. Later in the book we’ll take a look at
this in detail.
Viewing user interfaces, data cleaning, data manipulation, and report generation as mini-language design
and implementation. Most software engineers do not directly design or build translators for full languages.
However, designing a user interface is similar to designing a language. For example, the task of data validation
in a front-end web page is a kind of syntax checking of legal or illegal strings. Often grammars or regular
expressions are used to formally specify what is legal and illegal text for, say, a data field in a web form. The
languages models (e.g., regular expressions, grammars) and translator-building tools (e.g., Antrl4, flex) can
be used for identifying patterns in data, thus “mining” or extracting useful data or information. Similarly,
knowledge of these tools can help develop software for identifying errors in data and for transforming data
n
into another form. A particular pattern represents a set of legal phrases or sentences that are members
ut
io
of a “mini-language.” Another example is creating a report according to a specified format. Strings that
adhere to the format forms a language, namely the language of legal reports as specified by the requirements
of a client. Having a view of these tasks as languages helps develop programs for these common tasks.
rib
There are several programming languages that have libraries or built-in functionality that can be utilized
for implementing these tasks. Moreover, this approach can be used when using software applications that
st
process text documents, spreadsheets, multi-media documents, to name a few categories of data formats.
di
XML is a mark-up language with a strict grammatical structure to represent data that is utilized by many
web applications. Knowing how parsing works in a translator helps understand code to process XML code.
or
There are many translator-building tools (e.g., Antrl4, yacc) that can be used for building applications to
process XML. XSLT is language used for easing the effort in processing XML code.
ot
f
Understanding utility tools, lower-level designs, impacts of hardware architecture can be useful. Highlevel languages exist to avoid examining low-level code or bind the code to specific hardware configurations.
However, sometimes when debugging code, the programmer must invoke the debugger utility to look at
N
the code generated by a compiler or to look at how the supporting runtime environment handles stack
management or the memory allocation. This becomes important when programming embedded systems or
an “internet-of-things” application. Knowing how the runtime works is related to the programming language
semantics and language implementation; this knowledge helps the programmer debug code and/or write more
efficient code. This is also important with security issues, such as buffer overflow problems with the runtime
stack. Sometimes, when using several different languages with separate compilation one has to use a linker to
change the way parameters are passed (e.g., stacked or queued) between assembler or machine code files. An
example of how knowing the hardware configuration and how the compiler represents data structures is that
it helps the programmer write code so that cache memory hits are higher. Knowing the parallel architecture
or the distributed computer network topology can help the programmer write better code for parallel or
distributive programming. For example, the simple knowledge of the number of processors and whether
shared or local distributed memory is configured helps the way the code is written. Studying programming
language theory helps ask the right questions and helps to know how to answer them.
Copyright 2024
5
1.3 The Evolution of Languages
1.3
Introduction
The Evolution of Languages
There are over 1000 programming languages according to some counts. There are dialects, much like U.S.
English versus British English. The difference between a dialect and a new language is not clear. Nevertheless,
there are many languages and here is an attempt to try to address why there are so many.
New approaches to software design. In early days of software design it was clear that some structure was
needed in expressing the flow of an algorithms representing programs. Flowcharts were used and in the late
1960s and early 1970s saw the introduction of the discipline of writing structured flowcharts. There should
be one entrance and one exit for each process, where each process is a simple task or a selection of one of
two sub-processes or an iteration of one subprocess. These three constructs were sequenced. Fortran and
COBOL used goto statements and had no selection statements or iteration statements. So new languages
were designed with if-then block, while-loop blocks, nested blocks, and other similar structures to reflect
n
this structured design. Thus, structured programming was adopted and new languages all included the
ut
io
higher-level control constructs to reflect this software design approach. It became apparent that non-local
variables shared between code in the nested blocks became difficult to maintain. Object-oriented design
demonstrated many advantages in development, maintenance, and reuse of code and in the 1980s and
rib
1990s new programming languages and dialects began to appear to reflect object-oriented design concepts.
Languages like Smalltalk, C++, Eiffel, and Java were introduced.
st
Changes to new hardware technology and related software designs. Smaller and faster processors, denser
and faster main memories, faster graphics processors, new display technology, and new input technology
di
(e.g., mice, touch-pads) led to personal computers and mobile devices with graphical user interfaces, which
led to the need for new approaches to software designs, such as event-driven program, related design patterns,
or
such as the model-view-controller design pattern. New languages were developed to help reflect the software
designs in a more direct way that with say older (but widely-used) languages, like C. C could be used, but
ot
f
the implementing the designs were closely modeled and it was easier to express the underlying assumptions
of the designs, such as concurrent operations. For example, threading was included in the semantics of the
N
first versions, such as Java, Ruby, Swift, Objective-C, and C#. The concept of delegation is important in
event-driven program and the concept of a delegate is in the language definition of C#.
Domain-Specific Languages. Lisp was developed for A.I. applications. In these applications symbols and
nested data structures are important in representing algorithms and data structures in these applications.
Prolog was used for processing natural languages and A.I. applications through the use of logic for representing knowledge bases. R is good for statistical applications. There are several dialects of Lisp and Prolog and
they all have been used for more general applications. Some companies develop code for a particular product and rather than use a general-purpose language, choose to develop an in-house (sometimes proprietary)
language for developing code.
Legacy Languages. Programs and software systems that were written originally in early versions of
Fortran, PL/I, and COBOL still are in use today. There has been a large amount of time and money
invested into some large systems of programs using these older languages. Several systems are tailored for
core subsystems (e.g., transaction systems) in insurance, banking, government, and health institutions. It
would be too expensive to rewrite these systems with new languages and also opens up the possibility of
Copyright 2024
6
Introduction
1.4 Reflections on the Purpose of Programming Languages
introducing errors. New software engineers and programmers have to learn (at least part) of these legacy
languages and to create interfaces with the new code written in current languages that utilize the older
systems.
1.4
Reflections on the Purpose of Programming Languages
All human languages are used for communicating thoughts and ideas and to share knowledge. One use of a
language is to convey or express knowledge of how to complete a task or solve a problem, such as transforming some data into another form of data, how to construct an artifact, or to describe how to react to the
surrounding environment given certain stimuli. Computer scientists call the steps to complete tasks, algorithms, which must be expressed in a language and they are used to solve computational problems. Human
or natural languages are used for communicating solutions or algorithms directly from human to human
n
through notations or sounds and sometimes only with sounds. Algorithms can be viewed as computations
ut
io
that are carried out by some machines or cooperative machines with or without some kind of memory and
input/output interfaces, called a computer system or, simply, a computer. Computers or machines can be
abstract (computational models) or physical. Languages used to express computations or algorithms on
rib
physical or abstract machines are called programming languages. Algorithms, computational models, computational models and computational problems are all terms that have rigorous meanings and are covered
st
in Section 1.5.
As done within the study of human-to-human communication of computations, we look at the descrip-
di
tion and use of programming languages for human-to-machine communication of computations. With all
communication there is a sender and a receiver where the exchanged content is encoded in some language. In
or
the case of human-to-machine communication, content (the message) are the instructions that are expressed
ot
f
in a programming language, the sender is a programmer who is the generator of the instructions in that
language and the receiver is a machine that is designed to decode the instructions and execute the necessary
actions. A programming language also has to have the expressiveness for representing the objects, such as
N
numbers, and their properties utilized within the algorithm. This is a very simplistic and birds-eye view of
communication, languages, and the concept of a machine. It is intended to get the reader to think about
relationships of algorithms, machines (or computational models), and languages and their roles in computations. It will provide the foundation of looking at the design, implementation, and use of programming
languages. Before moving on, let’s take a look at a simple example.
Example 1.1 To help illustrate the use of some concepts and terminology introduced thus far, let’s look at
a simple computational problem. Suppose one person wants to communicate to another person on how to
sum a finite sequence of numbers. We want to transform these numbers into one number, their collective
sum. We need an algorithm expressed in some language, such as a natural language like English. Suppose
the sender (e.g., teacher) in this case, might give the following instructions in English:
“Keep a running total. Read the first number and let that be the current running total. As long as
there is another number, add that number to the current total. When there are no more numbers
the total is your answer.”
Copyright 2024
7
1.4 Reflections on the Purpose of Programming Languages
Introduction
The receiver (student) most likely will formulate some mental or physical model of a computer system to
execute instruction according to the semantics of the English phrases. From this example we see how humanto-human communication can take place. In natural language, many assumptions are made. In this example
it assumed the total is initially zero since no numbers are read at the start of the task. There is no explicit
statement about when the sequence ends; it may be assumed that there is some sort of ending signal from
the list, such as no more numbers on the source document containing the numbers. When communicating
with a machine or telling the machine how to carry out this algorithm, we need a more detailed and formal
description of a language and machine.
When communicating an algorithm to a machine, we need to know how the machine works, use a specific
syntax with rigorous semantics, and understand how the input data to the algorithm are retrieved and where
and how to send the output. Shown below is a C++ program to represent the solution to the computational
problem of summing a finite sequence of numbers. In contrast to the general problem, suppose we had to
n
pick a particular set of numbers, such as integers, on the underlying abstract machine, which is, in turn,
ut
io
restricted by the underlying physical machine. These are just a couple of items that have to be rigorously
defined and specified.
ot
f
or
di
st
rib
Sum Function in C++
#include
using namespace std;
int main()
{
int sumTotal = 0;
int num;
while (!cin.eof()) {
cin >> num;
sumTotal += num;
}
cout