Symbol
Tables Code Generation Issues
Introduction
In Intermediate code
generation we use syntax directed methods to translate the source program into
an intermediate form programming language constructs such as declarations,
assignments and flow-of-control statements.

There are three types of intermediate
representation:-
1. Syntax Trees
2. Postfix notation
3. Three Address Code
Semantic rules for generating three-address code
from common programming language constructs are similar to those for
constructing syntax trees of for generating postfix notation.
Graphical Representations
A syntax tree depicts the natural hierarchical structure
of a source program. A DAG (Directed Acyclic Graph) gives the same information
but in a more compact way because common sub-expressions are identified. A
syntax tree for
the assignment statement a:=b*-c+b*-c appear in
the figure.
.
fig8.2
Postfix notation is a linearized representation
of a syntax tree; it is a list of the nodes of the in which a node appears
immediately after its children. The postfix notation for the syntax tree in the
fig is
a b c
uminus + b c uminus * + assign
The edges in a syntax tree do not appear
explicitly in postfix notation. They can be recovered in the order in which the
nodes appear and the no. of operands that the operator at a node expects. The
recovery of edges is similar to the evaluation, using a staff, of an expression
in postfix notation.
Syntax
tree for assignment statements are produced by the syntax directed definition
in fig.
|
Production |
Semantic
Rule |
|
S
® id :=
E |
S.nptr
:= mknode( ‘assign’, mkleaf(id, id.place), E.nptr) |
|
E
® E1 + E2 |
E.nptr
:= mknode(‘+’, E1.nptr ,E2.nptr) |
|
E
® E1 * E2 |
E.nptr
:= mknode(‘* ’, E1.nptr ,E2.nptr) |
|
E
® - E1 |
E.nptr
:= mkunode(‘uminus’, E1.nptr) |
|
E
® ( E1 ) |
E.nptr
:= E1.nptr |
|
E
® id |
E.nptr
:= mkleaf(id, id.place) |
This
same syntax-directed definition will produce the dag if the functions mkunode(op, child) and mknode(op, left, right) return a
pointer to an existing node whenever possible, instead of constructing new
nodes. The token id has an attribute
place that points to the symbol-table entry for the identifier id.name, representing the lexeme
associated with that occurrence of id. lf the lexical analyzer holds all
lexemes in a single array of characters, then attribute name might be the index of the first character of the lexeme.
Two representations of the syntax tree in Fig8.2 appear in Fig.8.4. Each node
is represented as a record with a field for its operator and additional fields
for pointers to its children. In Fig 8.4(b), nodes are allocated from an array
of records and the index or position of the node serves as the pointer to the
node. All the nodes in the syntax tree can be visited by following pointers,
starting from the root at position IO.
fig8.4(a)
0
|
id
|
b
|
|
1
|
id
|
c
|
|
2
|
uminus
|
1
|
|
3
|
*
|
0
|
2
|
4
|
id
|
b
|
|
5
|
id
|
c
|
|
6
|
uminus
|
5
|
|
7
|
*
|
4
|
6
|
8
|
+
|
3
|
7
|
9
|
id
|
a
|
|
10
|
assign
|
9
|
8
|
11
|
……
|
|
|
Three-address
code is a sequence of statements of the general form
X:= Op Z
where
x, y, and z are names, constants, or compiler-generated temporaries; op stands
for any operator, such as a fixed- or floating-point arithmetic operator, or a
logical operator on Boolean-valued data. Note that no built-up arithmetic
expressions are permitted, as there is only one operator on the right side of a
statement. Thus a source language expression like x+y*z might be translated
into a sequence
Z
X
+
where
t1 and t2 are compiler-generated temporary names. This unraveling cf
complicated arithmetic expressions and of nested flow-of-control statements
makes three-address code desirable for target code generation and optimization.
The use of names for the intermediate values computed by a program allow-
three-address code to be easily rearranged – unlike postfix notation.
three-address code is a linearized representation of a syntax tree or a dag in
which explicit names correspond to the interior nodes of the graph. The syntax
tree and dag in Fig. 8.2 are represented by the three-address code sequences in
Fig. 8.5. Variable names can appear directly in three-address statements, so
Fig. 8.5(a) has no statements corresponding to the leaves in Fig. 8.4.
Code for syntax tree
t1 := -c
t2 := b
* t1
t3 := -c
t4 := b * t3
t5 := t2 + t4
a := t5
Code for DAG
t1 := -c
t2 := b * t1
t5 := t2 + t2
a := t5
The
reason for the term ”three-address code” is that each statement usually
contains three addresses, two for the operands and one for the result. In the
implementations of three-address code given later in this section, a
programmer-defined name is replaced by a pointer tc a symbol-table entry for
that name.
Three-address
statements are akin to assembly code. Statements can have symbolic labels and
there are statements for flow of control. A symbolic label represents the index
of a three-address statement in the array holding inter- mediate code. Actual
indices can be substituted for the labels either by making a separate pass, or
by using ”back patching,” discussed in Section 8.6. Here are the common
three-address statements used in the remainder of this book:
1.
Assignment statements of the form x: = y
op z, where op is a
binary arithmetic or logical
operation.
2.
Assignment instructions of the form x:= op y, where op is a unary operation.
Essential unary operations include unary minus, logical negation, shift
operators, and conversion operators that, for example, convert a fixed-point
number to a floating-point number.
3. Copy statements of the form x: = y where the
value of y is assigned to x.
4.
The unconditional jump goto L. The three-address statement with label L is the
next to be executed.
5.
Conditional jumps such as if x relop
y goto L. This instruction applies a relational operator (<, =, >=, etc.)
to x and y, and executes the statement with label L next if x stands in
relation relop to y. If not,
the three-address statement following if x relop y goto L is executed next, as in the usual sequence.
6.
param x and call p, n for procedure calls and return y, where y representing a
returned value is optional. Their typical use is as the sequence of
three-address statements
param
x1
param
x2
param
xn
call
p, n
generated
as part of a call of the procedure p(x,, x~,..., x”). The integer n indicating
the number of actual parameters in ”call p, n” is not redundant because calls
can be nested. The implementation of procedure calls is outline d in Section
8.7.
7.
Indexed assignments of the form x: = y[ i ] and x [ i ]: = y. The first of
these sets x to the value in the location i memory units beyond location y. The
statement x[i]:=y sets the contents of the location i units beyond x to the
value of y. In both these instructions, x, y, and i refer to data objects.
8.
Address and pointer assignments of the form x:= &y, x:= *y and *x: = y. The first of these sets the value
of x to be the location of y. Presumably y is a name, perhaps a temporary, that
denotes an expression with an I-value such as A[i, j], and x is a pointer name
or temporary. That is, the r-value of x is the l-value (location) of some
object!. In the statement x: = ~y, presumably y is a pointer or a temporary
whose r- value is a location. The r-value of x is made equal to the contents of
that location. Finally, +x: = y sets the r-value of the object pointed to by x
to the r-value of y.
The choice of allowable operators is
an important issue in the design of an intermediate form. The operator set must
clearly be rich enough to implement the operations in the source language. A
small operator set is easier to implement on a new target machine. However, a
restricted instruction set may force the front end to generate long sequences
of statements for some source, language operations. The optimizer and code
generator may then have to work harder if good code is to be generated.
When
three-address code is generated, temporary names are made up for the interior
nodes of a syntax tree. The value of non-terminal E on the left side of E à E1 + E will be
computed into a new temporary t. In general, the three- address code for id: =
E consists of code to evaluate E into some temporary t, followed by the
assignment id.place: = t. If an expression is a single identifier, say y, then
y itself holds the value of the expression. For the moment, we create a new
name every time a temporary is needed; techniques for reusing temporaries are
given in Section S.3. The S-attributed definition in Fig. 8.6 generates
three-address code for assignment statements. Given input a: = b+ – c + b+ – c,
it produces the code in Fig. 8.5(a). The synthesized attribute S.code
represents the three- address code for the assignment S. The non-terminal E has
two attributes:
1. E.place, the name that will hold the
value of E, and
2. E.code, the sequence of
three-address statements evaluating E.
The
function newtemp returns a sequence of distinct names t1, t2,... in response to
successive calls. For convenience, we use the notation gen(x ’: =’ y ’+’ z) in
Fig. 8.6 to represent the three-address statement x: = y + z. Expressions
appearing instead of variables like x, y, and z are evaluated when passed to
gen, and quoted operators or operands, like ’+’, are taken literally. In
practice, three- address statements might be sent to an output file, rather
than built up into the code attributes. Flow-of-control statements can be added
to the language of assignments in Fig. 8.6 by productions and semantic rules
)like the ones for while statements in Fig. 8.7. In the figure, the code for S
- while E do S, is generated using’ new attributes S.begin and S.after to mark
the first statement in the code for E and the statement following the code for S, respectively.


These
attributes represent labels created by a function new label that returns a new label every time it is called.
Note that S.after becomes the
label of the statement that comes after the code for the while statement. We
assume that a non-zero expression represents true; that is, when the value of F becomes zero, control leaves the
while statement. f:expressions that govern the flow of control may in general
be Boolean expressions containing relational and logical operators. The
semantic rules for while statements in Section 8.6 differ from those in Fig.
8.7 to allow for flow of contro1 within Boolean expressions. Postfix notation
-an be obtained by adapting the semantic rules in Fig. 8.6 (or see Fig. 2.5).
1he postfix notation for an identifier is the identifier itself. The rules for
the other productions concatenate only the operator after the code for the
operands. For example, associated with the production E – E, is the semantic rule
E.code:=
E1.code || ’uminus’
1n
general, the intermediate form produced by the syntax-directed translations in
this chapter can he changed by making similar modifications to the semantic
rules.
A
three-address statement is an abstract form of intermediate code. In a
compiler, these statements can be implemented as records with fields for the
operator and the operands. Three such representations are quadruples, triples,
and indirect triples.
A
quadruple is a record structure
with four fields, which we call op, arg l, arg 2, and result. The op field
contains an internal code for the operator. The three-address statement x:= y
op z is represented by placing y in arg 1. z in arg 2. and x in result.
Statements with unary operators like x: = – y or x: = y do not use arg 2.
Operators like param use neither arg2 nor result. Conditional and unconditional
jumps put the target label in result. The quadruples in Fig. H.S(a) are for the
assignment a: = b+ – c + b i – c. They are obtained from the three-address code
in Fig. 8.5(a). The contents of fields arg 1, arg 2, and result are normally
pointers to the symbol-table entries for the names represented by these fields.
If so, temporary names must be entered into the symbol table as they are
created.
To
avoid entering temporary names into the symbol table. we might refer to a temporary
value bi the position of the statement that computes it. If we do so,
three-address statements can be represented by records with only three fields:
op, arg 1 and arg2, as in Fig. 8.8(b). The fields arg l and arg2, for the
arguments of op, are either pointers to the symbol table (for programmer-
defined names or constants) or pointers into the triple structure (for
temporary values). Since three fields are used, this intermediate code format
is known as triples.’ Except for the treatment of programmer-defined names,
triples correspond to the representation of a syntax tree or dag by an array of
nodes, as in Fig. 8.4.
|
|
op |
Arg1 |
Arg2 |
Result |
|
(0) |
uminus |
c |
|
t1 |
|
(1) |
* |
b |
t1 |
t2 |
|
(2) |
uminus |
c |
|
t3 |
|
(3) |
* |
b |
t3 |
t4 |
|
(4) |
+ |
t2 |
t4 |
t5 |
|
(5) |
:= |
t5 |
|
a |
|
|
op |
Arg1 |
Arg2 |
|
(0) |
uminus |
c |
|
|
(1) |
* |
b |
(0) |
|
(2) |
uminus |
c |
|
|
(3) |
* |
b |
(2) |
|
(4) |
+ |
(1) |
(3) |
|
(5) |
:= |
a |
(4) |
fig8.8(a) Qudraples
fig8.8(b) Triples
Parenthesized numbers represent
pointers into the triple structure, while symbol-table pointers are represented
by the names themselves. In practice, the information needed to interpret the
different kinds of entries in the arg 1 and arg2 fields can be encoded into the
op field or some additional fields. The triples in Fig. 8.8(b) correspond to
the quadruples in Fig. 8.8(a). Note that the copy statement a:= t5 is encoded
in the triple representation by placing a in the arg 1 field and using the
operator assign. A ternary operation like x[ i ]: = y requires two entries in
the triple structure, as shown in Fig. 8.9(a), while x: = y[i] is naturally
represented as two operations in Fig. 8.9(b).

Another
implementation of three-address code that has been considered is that of
listing pointers to triples, rather than listing the triples themselves. This
implementation is naturally called indirect triples. For example, let us use an
array statement to list pointers
to triples in the desired order. Then the triples in Fig. 8.8(b) might be
represented as in Fig. 8.10.

The
difference between triples and quadruples may be regarded as a matter of how
much indirection is present in the representation. When we ultimately produce
target code, each name, temporary or programmer-defined, will be assigned some
run-time memory location. This location will be placed in the symbol-table
entry for the datum. Using the quadruple notation, a three- address statement
defining or using a temporary can immediately access the location for that
temporary via the symbol table.
A more important benefit of
quadruples appears in an optimizing compiler, where statements are often moved
around. Using the quadruple notation, the symbol table interposes an extra
degree of indirection between the computation of a value and its use. If we
move a statement computing x, the statements using x require no change.
However, in the triples declaration. moving a statement that defines a
temporary value requires us to change all references to that statement in the
arg 1 and arg2 arrays. This problem makes triples difficult to use in an
optimizing compiler.
Indirect triples present no such
problem. A statement can be moved by reordering the statement list. Since
pointers to temporary values refer to the r>p-arg 1-arg2 array(s), which are
not changed, none of those pointers need be changed. Thus, indirect triples
look very much like quadruples as far as their utility is concerned. The two
notations require about the same amount of space and they are equally efficient
for reordering of code. ~s with ordinary triples. Allocation of storage to
those temporaries needing it must be deferred in the code generation phase.
However, indirect triples can save some space compared with quadruples if the
same temporary value is used more than once. The reason is that two or more
entries in the statement array
can point to the same line of the
op-arg1-arg2 structure. For example, lines (l4) and (l6) of Fig. 8.10
could be combined and we could then combine (l5) and (17).
As the sequence of declarations
in a procedure or block is examined, we can lay out storage for names local to
the procedure. For each local name, we create a symbol-table entry with
information like the type and the relative address of the storage for the name.
The relative address consists of an offset from the base of the static data
area or the field for local data in an activation record. When the front end
generates addresses, it may have a target machine in mind. Suppose that
addresses of consecutive integers differ by 4 on a byte- addressable machine.
The address calculations generated by the front end may therefore include
multiplications by 4. The instruction set of the target machine may also favor
certain layouts of data objects, and hence their addresses. We ignore alignment
of data objects here, Example 7.3 shows how data objects are aligned by two
compilers.
The
syntax of languages such as C, Pascal, and Fortran, allows all the declarations
in a single procedure to be processed as a group. In this case, a global
variable, say offset, can keep track of the next avai1able relative address. In
the translation scheme of Fig. S.I1 non-terminal P generates a sequence of
declarations of the form id: T. Before ’.he first declaration is considered,
offset is set to 0. As each new name is seen, that name is entered in the
symbol table with offset equal to the current value of offset, and offset is
incremented by the width of the data object denoted by that name. The procedure
enter(name, type, offset)
creates a symbol-table entry for name, gives it type and relative address offset in its data area. We use synthesized
attributes type and width for non-terminal T to indicate the type and width, or
number of memory units taken by objects of that type. Attribute type represents
a type expression constructed from the basic types integer and real by applying
the type constructors pointer and array, as in Section 6.l. If type expressions
are represented by graphs, then attribute type might be a pointer to the node
representing a type expression. In Fig. 8. I , integers have width 4 and real
have width 8. The width of an array is obtained by multiplying the width of each element by the number of
elements in the array.- The width of each pointer is assumed to be 4.
P
à D
D
à D ; D
D
à id : T {enter (id.name, T.type,
offset);
Offset:= offset + T.width }
T
à integer {T.type :=integer;
T.width
:=4}
T
à real {T.type := real;
T.width
:= 8}
T
à array [num ]
of T1 {T.type :=array(num.val,
T1.type);
T.width
:=num.val X T1.width}
T
à ^T1 {T.type :=pointer
(T.type);
T.width:=4}
In Pascal and C, a
pointer may be seen before we learn the type of the object pointed to Storage
allocation for such types is simpler if all pointers have the same width. The
initialization of offset in the
translation scheme of Fig. 8.1 is more
evident if the first production appears on one line as:
P à{offset:= 0 } D
Non-terminals
generating a. called marker non-terminals in Section 5.6, can be used to
rewrite productions so that all actions appear at the ends of right sides.
Using a marker non-terminal M,
(8.2) can be restated as:
P à M D
M àε (offset:= 0}
In a language with
nested procedures, names local to each procedure can be assigned relative
addresses using the approach of Fig. 8.11 . When a nested procedure is seen,
processing of declarations in the enclosing procedure is temporarily suspended.
This approach will he illustrated by adding semantic rules to the following
language.
P à D
D
à D;D | id: T proc id; D;S
The
production for non-terminals S for statements and T for types are not shown
because we focus on declarations. The non-terminal T has synthesized attributes
type and width, as in the translation scheme of Fig. For simplicity, suppose
that there is a separate symbol table for each procedure in the language (8.3).
One possible implementation of a symbol table is a linked list of entries for
names. Clever implementations can be substituted if desired. A new symbol table
is created when a procedure declaration D proc id D~; 5 is seen, and entries for the declarations in D~ are created in the new table. The
new table points back to the symbol table of the enclosing procedure; the name
represented by id itself is local to the enclosing procedure. The only change
from the treatment of variable declarations in Fig. 8.11 is that the procedure
enter is told which symbol table to make an entry in. For example, symbol
tables for five procedures are shown in Fig. 8. l2. The nesting structure of
the procedures can be deduced from the links between the symbol tables; the
program is in Fig. 7.22. The symbol tables for procedures readarray, exchange,
and quicksort point back to that for the containing procedure sort, consisting
of the entire program. Since partition is declared within quicksort, its table
points to that of quick sort.

I.
mktable(previous) creates a new symbol table and returns a pointer to the new
table. The argument previous points to a previously created symbol table,
presumably that for the enclosing procedure. The pointer previous is placed in
a header for the new symbol table, along with additional information such as
the nesting depth of a procedure. We can also number the procedures in the
order they are declared and keep this number in the header.
2.
enter(table, name, type, offset) creates a new entry for name name in
the symbol table pointed to by table. Again, enter places type and relative address offset in fields
within the entry.
3. addwidth(table, width) records the
cumulative width of all the entries
table in the header associated with this symbol table.
4. enterproc (table, name, newtable)
creates a new entry for procedure name
in the symbol table pointed to by
table. The argument newtable
points to the symbol table for this procedure name.
The
translation scheme in Fig. S. l3 shows how data can be laid out in one pass,
using a stack tblptr to hold pointers to symbol tables of the enclosing
procedures. With the symbol tables ot’ Fig. 8.12, tblptr will contain pointers
to the tables for -ort, quicksort, and partition when the declarations in
partition are considered. The pointer to the current symbol table is on top.
The other stack offset is the natural generalization to nested procedures of
attribute offset in Fig. 8. l I. The top element of offset is the next
available relative address for a local of the current procedure. All semantic
actions in the sub-trees for B and C in
A B C { actionA}
are
done before actionA the end of the production occurs. Hence, the action associated
with the marker M in Fig. 8.l3 is the first to be done. The action for
non-terminal M initializes stack tblptr
with a symbol table for the outermost scope, created by operation mktable(nil).
The action also pushes relative address 0 onto stack offset. The non-terminal V
plays a similar role when a procedure declaration appears. Its action uses the
operation mktable(top(tblptr)) to create a new symbol table. Here the argument
top(tblptr) gives the enclosing scope of the new table. A pointer to the new
table is pushed above that for the enclosing scope. Again, 0 is pushed onto
offset.
For each variable declaration id: T. an entry is created for id in
the current symbol table. This declaration leaves the stack pointer unchanged; the
top of stack offset is incremented by T.width. when the action on the right
side of D proc id: N D,; S occurs. the width of all
declarations generated by D1 is on top of stack offset.’, it is recorded using addwidth.
and offset are then popped, and we revert to examining the declarations in the
closing procedure. At this point, the name of the enclosed procedure is entered
into the symbol table of its enclosing procedure.
P
à M D {addwidth(top(tblptr),
top(offset));
Pop(tblptr); pop(offset)}
M
à ε { t :=
mktable(nil);
Push(t,tblptr);
push(0,offset)}
D
à D1 ;D2
D
à proc id ; N D1
;S { t := top(tblptr);
addwidth(t.top(offset));
pop(tblptr);
pop(offset);
enterproc(top(tblptr),
id.name, t)}
D
à id : T {enter(top(tblptr),id.name,T.type,top(offset));
top(offset) := top(offset) +T.width }
N
à ε { t :=
mktable(top(tblptr));
Push(t, tblptr); push(0, offset)}
The
following production allows non-terminal
T to generate records in addition to basic types, pointers, and arrays:
T à record D end
The
actions in the translation scheme of Fig. S.I4 emphasize the similarity between
the layout of records as a language construct and activation records. Since procedure
definitions do not affect the width computations in Fig. 8.13, we overlook the
fact that the above production also allows procedure definitions to appear
within records.
T
à record L D end
{T.type :=
record(top(tblptr));
T.width
:= top(offset);
Pop(tblptr); pop(offset) }
Là ε {
t:= mktable(nil);
Push(t,
tblptr); push (0, offset) }
After
the keyword record is seen, the acting associated with the marker
creates
a new symbol table for the field names. A pointer to this symbol table is
pushed onto stack tblptr and
relative address 0 is pushed onto stack
. The action for D à
id: T in Fig. 8.13 therefore
enters information about the field name id into the symbol table for the
record. Furthermore, the top of stack will hold the width of all the data objects
within the record after the fields have been examined. The action following end
in Fig. 8. 14 returns the width as synthesized attribute T.width. The type
T.type is obtained by applying the constructor record to the pointer to the symbol table for this
record.
Expressions
can be of type integer, real, array and record in this section As part of the
translation of assignments into three-address code, we show how names can be
looked up in the symbol table and how elements of arrays and records can be
accessed.
In the previous Section we formed three-address
statements using names themselves, with the understanding that the names stood
for pointers to their symbol-table entries. The translation scheme in Fig. 8.15
shows how such symbol-table entries can be found. The lexeme for the name
represented by id is given by attribute id.name. Operation lookup (id.name) checks if there
is an entry for this occurrence of the name in the symbol table. If so, a
pointer to the entry is returned; otherwise, lookup returns nil
to indicate that no entry was found.
The semantic actions in Fig. 8.15
use procedure emit to emit three-address statements to an output file,
rather than building up code attributes for non-terminals, as in Fig.
8.6. From Section 2.3, translation can be done by emitting to an output file if
the code attributes of the non-terminals on the left side of productions
are formed by concatenating the code attributes of the non-terminals on
the right in the same order that the non-terminals appear on the right side.
Perhaps with some additional strings in between. By reinterpreting the
lookup operation in Fig.8.15, the translation scheme can be used even if
the most closely nested scope rule applies to nonlocal names, as in Pascal. For
concreteness, suppose that the context in which an assignment appears is given
by the following grammar
P
à MD
M
à ε
D
à d ; d | id :T
| proc id ; N D ; S
N
à ε
Non-terminal
P becomes the new start symbol when these productions are added to those in
Fig. 8.15.
S
à id := D {p=lookup(id.name);
if p!=nil
then
emit(P’:=’
E.place)
else error }
E
à E1 + E2 {E.place := newtemp;
emit(E.place
‘:=’E1.place ‘+’ E2.place)}
E
à E1 * E2 {E.place ;= newtemp;
Emit(E.place
‘:=’ E1.place ‘*’ E2.place)}
E
à -E1 {E.place := newtemp;
Emit(E.place
‘:=’ ‘uminus’ E1.place)}
E
à ( E1 ) {E.place := E1.place }
E
à id { p:= lookup(id.name);
If p != nil then
E.place
:= p
else error}
Translation
scheme to produce three-address code for assignments.
For each procedure generated by this
grammar, the translation scheme in Fig. 8. l3 sets up a separate symbol table.
Each such symbol table has a header containing a pointer to the table for the
enclosing procedure. When the statement forming a procedure body is examined. A
pointer to the symbol table for the procedure appears on top of the stack
tblptr. This pointer is pushed onto the stack by actions associated with
the marker non-terminal N on the right side of D àproc id; N D1 ; S. Let the
productions for non-terminal S be those in Fig. 8.l5. Names in an
assignment generated by S must have been declared in either the
procedure that S appears in. or in some enclosing procedure. When
applied to name, the modified lookup operation first checks if
name appears in the current symbol table, accessible through
top(tblptr). If not, lookup uses the pointer in the header of a
table to find the symbol table for the enclosing procedure and looks for the
name there. If the name cannot be found in any of these scopes, then lookup
returns nil.
For example, suppose that the symbol
tables are as in Fig. 8.l2 and that an assignment in the body of procedure
partition is being examined. Operation lookup(i) will find an entry in
the symbol table for partition. Since v is not in this symbol table, lookup(v)
will use the pointer in the header in this symbol table to continue the search
in the symbol table for the enclosing procedure quicksort.
We
have been going along assuming that newtemp generates a new temporary
name each time a temporary is needed. It is useful, especially in optimizing
compilers, to actually create a distinct name each time newtemp is
called. However, the temporaries used to hold intermediate values in expression
calculations tend to clutter up the symbol table, and space has to be allocated
to hold their values. Temporaries can be reused by changing newtemp. An
alternative approach of packing distinct temporaries into the same location
during code generation is explored in the next chapter. The, bulk of
temporaries denoting data are generated during the syntax-directed translation
of expressions, by rules such as those in Fig. 8.15. The code generated by the
rules for Eà
E1 + E2 has the general form:
evaluate
E1 into t1
evaluate
E2 into t2
t:=
t1 + t2
From the rules for the synthesized
attribute E.place it follows that t1 and t2 are not used elsewhere in
the program. The lifetimes of these temporaries are nested like matching pairs
of balanced parentheses. In fact, the lifetimes of all temporaries used in the
evaluation of E2 are contained in the lifetime of t1. It is therefore
possible to modify newtemp so that it uses, as if it were a stack, a
small array in a procedure’s data area to hold temporaries. Let us assume for
simplicity that we are dealing only with integers. Keep a count c,
initialized to zero. Whenever a temporary name is used as an operand, decrement
c by 1. Whenever a new temporary name is generated, use $c and increase
c by 1. Note that the ”stack” of temporaries is not pushed or popped at run
time, although it happens that stores and loads of temporary values are made by
the compiler to occur at the ”top.”
Temporaries that may be assigned
and/or used more than once, for example, in a conditional assignment, cannot be
assigned names in the last-in first-out manner described above. Since they tend
to be rare, all such temporary values can he assigned names of their own. The
same problem of temporaries defined or used more than once occurs when we
perform code optimization such as combining common sub-expressions or moving a
computation out of a loop . A reasonable strategy is to create a new name
whenever we create an additional definition or use for a temporary or move its
computation.
|
Statement |
Value
of c |
|
|
0 |
|
$0:=
a * b |
1 |
|
$1:=
c * d |
2 |
|
$0:=
$0 + $1 |
1 |
|
$1:=
e * f |
2 |
|
$0:=
$0 - $1 |
1 |
|
x
:= $0 |
0 |
Fig8.16 Three-address code with stacked temporaries.
Elements
of an array can be accessed quickly if the elements are stored in a block of consecutive
locations. If the width of each array element is w, then the i th
element of array A begins in location
base
+ (i – low) x w
(8.4)
where
low is the lower bound on the subscript and base is the relative address
of the storage allocated for the array. That is, base is the relative
address of A[low]. The expression (8.4) can be partially evaluated at compile
time if it is rewritten as
i
X w + (base – low x w)
The
sub-expression c = base – low x w can be evaluated when the declaration
of the array is seen. We assume that c is saved in the symbol table entry for
A, so the relative address of A[i] is obtained by simply adding i x w to
c. Compile-time pre-calculation can also be applied to address calculations
of elements of multi-dimensional arrays. A two-dimensional array is normally
stored in one of two forms, either row-major (row-by-row) or
column-major (column-by-column). Figure 8.l7 shows the layout of a
2x3 array A in (a) row-major form and (b) column-major form. Fortran uses
column-major form; Pascal uses row-major form, because A[i,j] is equivalent to
A[i] [ j], and the elements of each array A[i] are stored consecutively. In the
case of a two-dimensional array stored in row-major form, the relative address
of A[i1,i2] can be calculated by
the formula
base
+ ((i1 – low1) x n2 + i2 – low2) x w
where
low1 and low2 are the lower bounds on the values of i1 and
i2 and n2 is the number of values that i2 can take. That is,
if high 2 is the upper bound on the value of i2, then n2 =
high2 – low2 + 1. Assuming that i1 and i2 are the only values
that are not known at compile time, we can rewrite the above expression as
((i1
* n2) + i2) X w + (base – ((low1 X n2) + low2) X w) (8. 5)

The
last term in this expression can be determined at compile time. We can
generalize row- or column-major form to many dimensions. The generalization of
row-major form is to store the elements in such a way that, as we scan down a
block of storage, the rightmost subscripts appear to vary fastest, like the
numbers on an odometer. The expression (8.5) generalizes to the following
expression for the relative address of A[i1, i2,..., ik]
(
( ……((i1n2+i2)n3+i3)…… )nk + ik) X w (8.
6)
+
base – (( ……((low1 n2 + low2) n3 +low3)…… )nk + lowk) x w
Since
for all j, nj = highj – low + 1 is assumed fixed, the term on the second
line of (8.6) can be computed by the compiler and saved with the symbol-table
entry for A. Column-major form generalizes to the opposite arrangement, with
the leftmost subscripts varying fastest. Some languages permit the sizes of
arrays to be specified dynamically, when a procedure is called at run-time. The
formulas for accessing the elements of such arrays are the same as for
fixed-size arrays, but the upper and lower limits are not known at compile
time.
The
chief problem in generating code for array references is to relate the
computation of (8.6) to a grammar for array references. Array references can be
permitted in assignments if non-terminal L with the following
productions is allowed where id appears in Fig. 8.15:
L
à
id [Elist ] | id
Elist
à
Elist, E | E
In
order that the various dimensional limits nj of the array be available
as we group index expressions into an Elist, it is useful to rewrite the
productions as
L
à
Elist] | id
Elist
à
Elist, E | id [ E
That
is, the array name is attached to the leftmost index expression rather than
being joined to Elist when an L is formed. These productions
allow a pointer to the symbol-table entry for the array name to be passed as a
synthesized attribute array of Elist.
We also use Elist.ndim to
record the number of dimensions (index expressions) in the Elist. The
function limit(array, j) returns nj, the number of elements along
the j th dimension of the array whose symbol -table entry is pointed to by
array. Finally, Elist.place denotes the temporary holding a value
computed from index expressions in Elist. An Elist that produces
the first m indices of a k-dimensional array reference A[i1,i2,...,
ik ] will generate three-address code to compute
(
……((i1 n2 + i2)n3 + i3)…… )nm + im (8.7)
using
the recurrence
e1
= i1
em
= e(m-1) X nm + im (8.8)
Thus,
when m = k, a multiplication by the width w is all that will be needed
to compute the term on the firs( line of (8.6). Note that the ij’s here
may really he values of expressions and code to evaluate those expressions will
be interspersed with code to compute (8.7).
An l-value L will have two
attributes. L.place and L.offset. In the case that L is a
simple name. L.place will he a pointer to the, symbol-table entry for
that name, and L.offset will be null, indicating that the l-value is a
simple name rather than an array reference. The non-terminal E has the
same translation E.place, with the same meaning as in Fig 8.15.
Semantic
actions will be added to the grammar:
1. S
à L := E
2.
E à
E + E
3.
E à ( E )
4.
E à L
5.
L à Elist ]
6.
L à id
7.
Elist à Elist , E
8.
Elist à id [ E
As
in the case of expressions without array references, the three-address code
itself is produced by the emit procedure invoked in the semantic
actions. We generate a normal assignment if L is a simple name, and an
indexed assignment into the location denoted by L otherwise:
(1)
S à
L:=E { if L.offset = null then /* L is
a simple id */
emit
(L.place ’:=’ E.place );
else
emit (L.place’[’L.offset’]’ ’:=’ E.place)
The code for arithmetic expressions
is exactly the same as in Fig. 8.15:
(2)
E à
E1 + E2 {E.place:= newtemp;
emit
(E.place ’:=’ E1.place ’+’ E2.place) }
(3)
E à
( E1 ) { E.place:=
E1.place }
When an array reference L is
reduced to E, we want the r-value of L. Therefore we use indexing
to obtain the contents of the location L.place [L.offset]:
(4)
E à
L {if
L.offset = null then /* L is a simple id */ E.place:= L.place
else begin
E.place: = newtemp;
emit (E.place ’:=’ L.place ’[‘
L.offset’]’ )
end
}
Below, L.offset is a new
temporary representing the first term of (8.6); function width(Elist.array)
returns w in (8.6). L.place represents the second term of (8.6),
returned by the function c(Elist.array).
(5)
L à
Elist ] {L.place:=
newtemp;
L.offset:=
newtemp;
emit(L.place
’:=’ c(Elist.array));
emit(L.offset
’:=’ Elist.place’*width(Elist.array)))
A null offset indicates a simple
name.
(6)
L à
id { L.place:=
id.place;
L.offset: = null
When the next index expression is
seen, we apply the recurrence (8.8). In the following action, Elist1.place
corresponds to e(m-1) in (8.8) and Elist.place to em .
Note that if Elist1 has m–l
components, then Elist on the left side of the production has m
components.
(7)
Elist à
Elist1, E { t:=
newtemp;
m:=
Elist1.ndim + l;
emit(t
’:=’ Elist1.place ’*’limit(Elist1.array, m));
emit(t
’:=’ t ’+’ E.place);
Elist.array:
= Elist1.array;
Elist.place:=
t;
Elist.ndim:=
m }
E.place holds both the value of the
expression E and the value of (8.7) for m=1.
(8)
Elist à
id [ E { Elist.array:=
id.place;
Elist.place:=
E.place;
Elist.ndim:
= l )

In
practice, there would be many different types of variables and constants, so
the compiler must either reject certain mixed-type operations or generate
appropriate coercion (type conversion) instructions. Consider the grammar for
assignment statements as above, but suppose there are two types – real and
integer, with integers converted to reals when necessary. We introduce another
attribute E.type, whose value is either real or integer.
The semantic rule for E.type associated with the production E à
E + E is:
E
à
E+E {E.type :=
if E1.type =
integer and
E2.type
= integer then integer
else real }
The
entire semantic rule for E à
E + E. and most of the other productions must be modified
to generate, when necessary, three-address statements of the form x: =
inttoreal y, whose effect is to convert integer y to a real of equal value,
called x. We must also include with the operator code an indication of whether
fixed or floating-point arithmetic is intended. The complete semantic action
for a production of the form E à
E, + E. is listed in Fig. 8. 19.
E.place
:= newtemp;
if
E1.type = integer and E2.type = integer then begin
emit(E.place’:=’ E1.place ’int+’
E2.place);
E.type: = integer
end
else
if E1.type = real and E2.type = real then begin
emit (E.place ’:=’ E1.place
’real +’ E2.place);
E.type := real
end
else
if E1.type = integer and E2.type = real then begin
u := newtemp;
emit(u ’:=’
’inttoreal’ E1.place);
emit(E.place ’:=’ u
’real+’ E2.place);
E.type:= real
end
else
if E1.type = real and E2.type = integer then begin
u := newtemp;
emit(u ’:=’
’inttoreal’ E2.place);
emit(E.place ’:=’ E1.place
’real+’ u);
E.type: = real
end
else
E.type:=
type error;
Fig.
8.19. Semantic action for E à
E1 + E2.
The
semantic action of Fig. 8. l9 uses two attributes E.place and E.type
for the non-terminal E. As the number of types subject to conversion
increases. The number of cases that arise increases quadratically (or worse, if
there are operators with more than two arguments). Therefore with large numbers
of types, careful organization of the semantic actions becomes more important.
The
compiler must keep track of both the types and relative addresses of the fields
of a record. An advantage of keeping this information in symbol-table entries
for the field names is that the routine for looking up names in the symbol
table can also be used for field names. With this in mind, a separate symbol
table was created for each record type by the semantic actions in Fig. 8.14. lf
r is a pointer to the symbol table for a record type, then the type
record(t) formed by applying the constructor record to the pointer
was returned as T.type We use the expression
p↑.info
+ 1
to
illustrate how a pointer to the symbol table can be extracted from an attribute
E.type. From the operations in this expression it follows that p must be a
pointer to a record with a field name info whose type is arithmetic. If types
are constructed as in Fig. 8.13 and 8.14, the type of p must be given by a type
expression
pointer
(record(t))
The
type of pt is then record(t), from which t can be extracted. The
field name info is looked up in the symbol table pointed to by t.
____________________________________________________________________________________________________
Symbol
Tables Code Generation Issues