Basic Blocks
____________________________________________________________________________________________________
Criteria for Code-Improving Transformations
Simply
stated, the best program transformations are those that yield the most benefit
for the least effort. The transformations provided by an optimizing compiler
should have several properties.
First,
a transformation must preserve the meaning of programs. That is, an
"optimization" must not change the output produced by a program for a
given input, or cause an error, such as a division by zero, that was not
present in the original version of the source program. The influence of this
criterion pervades this chapter; at all times we take the "safe"
approach of missing an opportunity to apply a transformation rather than risk
changing what the program does.
Second,
a transformation must, on the average, speed up programs by a measurable amount.
Sometimes we are interested in reducing the space taken by the compiled code,
although the size of code has less importance than it once had. Of course, not
every transformation succeeds in improving every program, and occasionally an
"optimization" may slow down a program slightly, as long as on the
average it improves things.
Third,
a transformation must be worth the effort. It does not make sense for a
compiler writer to expend the intellectual effort to implement a code improving
transformation and to have the compiler expend the additional time compiling
source programs if this effort is not repaid when the target programs are
executed. Certain local or "peephole" transformations of the kind are
simple enough and beneficial enough to be included in any compiler.
Some
transformations can only be applied after detailed, often time-consuming,
analysis of the source program, so there is little point in applying them to
programs that will be run only a few times. For example, a fast, nonoptimizing,
compiler is likely to be more helpful during debugging or for "student
jobs that will be run successfully a few times and thrown away. Only when the
program in question takes up a significant
fraction of the machine's cycles does improved code quality justify the
time spent running an optimizing compiler on the program.
Before we get into optimization as such we need to familiarize
ourselves with a few things
such as
x := x +0
Or
x := x*1
can
be eliminated from a basic block without changing the set of expressions it
computes. The exponentiation operator in the statements
x := y ** 2
usually
requires a function call to implement. Using an algebraic transformation, this
statement can be replaced by cheaper, but equivalent statement
x := y*y
B1
is a predecessor of B2, and B2is a successor of B1.
|
Prod := 0
|
B1
|
t2 := a [ t1 ] t3 := 4 * i t4 := b [ t3 ] t5 := t2 * t4 t6:= prod + t5 t7:=i+1 i := t7 if I <= 20 goto B2 |
B2
Fig .9 flow graph for program
REPRESENTATION
OF BASIC BLOCKS
Basic Blocks are
represented by variety of data structures. For example, after partitioning the
three address statements by Algorithm 1, each basic block can be represented by a record consisting of
a count of number of quadruples in the block, followed by a pointer to the
leader of the block, and by the list of predecessors and successors of the
block. For example the block B2 running from the statement (3) through (12) in
the intermediate code of figure 2 were moved elsewhere in the quadruples array
or were shrunk, the (3) in if i<=20 goto(3)
would
have to be changed.
Loop is a collection of nodes in
a flow graph such that
1.
All nodes in the collection are strongly connected; from any node in the
loop to any other, there is path of length one or more, wholly within the loop,
and
2.
The collection of nodes has a unique entry, a node in the loop such that
is, a node in the loop such that the only way to reach a node of the loop from
a node outside the loop is to first go through the entry.
A loop that contains no other loops
is called an inner loop.
A statement-by-statement
code-generations strategy often produce target code that contains redundant
instructions and suboptimal
constructs .The quality of such
target code can be improved by applying optimizing transformations to the
target program.
A simple but effective technique for
improving the target code is peephole optimization, a method for trying
to improving the performance of the target program by examining a short
sequence of target instructions (called the peephole) and replacing these
instructions by a shorter or faster sequence, whenever possible.
The peephole
is a small, moving window on the target program. The code in the peephole need
not contiguous, although some implementations do require this. We shall give
the following examples of program transformations that are characteristic of
peephole optimizations:
Redundant-instructions elimination
Flow-of-control optimizations
Algebraic simplifications
Use of machine idioms
REDUNTANT
LOADS AND STORES
If
we see the instructions sequence
(1)
(1) MOV R0,a
(2)
(2) MOV a,R0
-we
can delete instructions (2) because whenever (2) is executed. (1) will ensure
that the value of a is already in register R0.If (2) had a label we
could not be sure that (1) was always executed immediately before (2) and so we
could not remove (2).
Another opportunity for
peephole optimizations is the removal of unreachable instructions. An unlabeled
instruction immediately following an unconditional jump may be removed. This
operation can be repeated to eliminate a sequence of instructions. For example,
for debugging purposes, a large program may have within it certain segments
that are executed only if a variable debug is 1.In C, the source code
might look like:
#define debug 0
.
If
( debug ) {
Print debugging information
}
In
the intermediate representations the if-statement may be translated as:
If debug =1 goto L2
Goto L2
L1: print debugging information
L2:
(a)
One
obvious peephole optimization is to eliminate jumps over jumps .Thus no matter
what the value of debug, (a) can be replaced by:
If debug ≠1 goto L2
Print debugging information
L2:
(b)
As the argument of the statement of (b)
evaluates to a constant true it can be replaced by
If debug ≠0 goto L2
Print debugging information
L2:
(c)
As the argument of the first statement of (c)
evaluates to a constant true, it can be replaced by goto L2.
Then all the
statement that print debugging aids are manifestly unreachable and can be
eliminated one at a time.
The unnecessary jumps can
be eliminated in either the intermediate code or the target code by the following
types of peephole optimizations. We can replace the jump sequence
goto L2
.
L1 : gotoL2
by
the sequence
goto L2
.
L1 : goto L2
If
there are now no jumps to L1, then it may be possible to eliminate the
statement L1:goto L2 provided it is preceded by an unconditional jump
.Similarly, the sequence
if a < b goto L1
.
L1 : goto L2
can
be replaced by
if a < b goto
L2
.
L1 : goto L2
Finally,
suppose there is only one jump to L1 and L1 is preceded by an unconditional
goto. Then the sequence
goto L1
..
L1:if a<b goto L2
L3:
..(1)
may
be replaced by
if a<b goto L2
goto L3
.
L3:
.(2)
While
the number of instructions in (1) and (2) is the same, we sometimes skip the
unconditional jump in (2), but never in (1).Thus (2) is superior to (1) in
execution time
There is no end to the amount of algebraic simplification that can be
attempted through peephole optimization. Only a few algebraic identities occur
frequently enough that it is worth considering implementing them .For example,
statements such as
x := x+0
Or
x := x * 1
are
often produced by straightforward intermediate code-generation algorithms, and
they can be eliminated easily through peephole optimization.
ELIMINATION
OF COMMON SUBEXPRESSIONS
Common
sub expressions need not be computed
over and over again. Instead they can be computed once and kept in store from
where its referenced when encountered again of course providing the variable
values in the expression still remain constant.
Its
possible that a large amount of dead(useless) code may exist in the program.
This might be especially caused when introducing variables and procedures as
part of construction or error-correction of a program once declared and
defined, one forgets to remove them in case they serve no purpose. Eliminating
these will definitely optimize the code
REDUCTION
IN STRENGTH
Reduction in strength
replaces expensive operations by equivalent cheaper ones on the target machine.
Certain machine instructions are considerably cheaper than others and can often
be used as special cases of more expensive operators. For example, x² is
invariably cheaper to implement as x*x than as a call to an exponentiation
routine. Fixed-point multiplication or division by a power of two is cheaper to
implement as a shift. Floating-point division by a constant can be implemented
as multiplication by a constant, which may be cheaper.
The target machine may
have hardware instructions to implement certain specific operations
efficiently. Detecting situations that permit the use of these instructions can
reduce execution time significantly. For example, some machines have
auto-increment and
auto-decrement
addressing modes. These add or subtract one from an operand before or after
using its value. The use of these modes greatly improves the quality of code
when pushing or popping a stack, as in parameter passing. These modes can also
be used in code for statements like i : =i+1.
Getting Better Performance
Dramatic
improvements in the running time of a program-such as cutting the running time
form a few hours to a few seconds-are usually obtained by improving the program
at all levels, from the source level to the target level, as suggested by fig.
At each level, the available options fall between the two extremes of finding a
better algorithm and of implementing a given algorithm so that fewer operations
are performed.
Algorithmic
transformations occasionally produce spectacular improvements in running time.
For example, Bentley relates that the running time of a program for sorting N
elements dropped from 2.02N^2 microseconds to 12Nlog2N microseconds then a
carefully coded "insertion sort" was replaced by "quicksort".
THE PRINCIPAL SOURCES OF
OPTIMIZATION
Here
we introduce some of the most useful code-improving transformations. Techniques
for implementing these transformations are presented in subsequent sections. A
transformation of a program is called local if it can be performed by looking
only at the statements in a bas9ic block; otherwise, it is called global. Many
transformations can be performed at both the local and global levels. Local
transformations are usually performed first.
Function-Preserving
Transformations
There
are a number of ways in which a compiler can improve a program without changing
the function it computes. Common subexpression elimination, copy propagation,
dead-code elimination, and constant folding are common examples of such function-preserving
transformations. The other transformations come up primarily when global
optimizations are performed.
Frequently,
a program will include several calculations of the same value, such as an
offset in an array. Some of these duplicate calculations cannot be avoided by
the programmer because they lie below the level of detail accessible within the
source language. For example, block B5 shown in fig recalculates 4*i and 4*j.

Local common
subexpression elimination
Common Subexpressions
An
occurrence of an expression E is called a common subexpression if E was
previously computed, and the values of variables in E have not changed since
the previous computation. We can avoid recomputing the expression if we can use
the previously computed value. For example, the assignments to t7 and t10 have
the common subexpressions 4*I and 4*j, respectively, on the right side in Fig.
They have been eliminated in Fig by using t6 instead of t7 and t8 instead of
t10. This change is what would result if we reconstructed the intermediate code
from the dag for the basic block.
Example:
Fig shows the result of eliminating both global and local common subexpressions
from blocks B5 and B6 in the flow graph of Fig. We first discuss the
transformation of B5 and then mention some subtleties involving arrays.

B5
and B6 after common subexpression elimination
After
local common subexpressions are eliminated B5 still evaluates 4*i and 4*j, as
shown in the earlier fig. Both are common subexpressions; in particular, the
three statements
t8:=
4*j; t9:= a[t[8]; a[t8]:=x
in
B5 can be replaced by
t9:=
a[t4]; a[t4:= x using t4 computed in block B3. In Fig. observe that as control
passes from the evaluation of 4*j in B3 to B5, there is no change in j, so t4
can be used if 4*j is needed.
Another
common subexpression comes to light in B5 after t4 replaces t8. The new
expression a[t4] corresponds to the value of a[j] at the source level. Not only
does j retain its value as control leaves b3 and then enters B5, but a[j], a
vlue computed into a temporary t5, does too becaoude there are no assignments
to elements of the array a in the interim. The statement
t9:=
a[t4]; a[t6]:= t9
in
B5 can therefore be replaced by
a[t6]:=
t5
The
expression in blocks B1 and B6 is not considered a common subexpression
although t1 can be used in both places.After control leaves B1 and before it
reaches B6,it can go through B5,where there are assignments to a.Hence, a[t1]
may not have the same value on reaching
B6 as it did in leaving B1, and it is not safe to treat a[t1] as a common
subexpression.
Copy Propagation
Block
B5 in Fig. can be further improved by eliminating x using two new
transformations. One concerns assignments of the form f:=g called copy
statements, or copies for short. Had we gone into more detail in Example 10.2,
copies would have arisen much sooner, because the algorithm for eliminating
common subexpressions introduces them, as do several other algorithms. For
example, when the common subexpression in c:=d+e is eliminated in Fig., the
algorithm uses a new variable t to hold the value of d+e. Since control may
reach c:=d+e either after the assignment to a or after the assignment to b, it
would be incorrect to replace c:=d+e by
either c:=a or by c:=b.
The
idea behind the copy-propagation transformation is to use g for f, wherever
possible after the copy statement f:=g. For example, the assignment x:=t3 in
block B5 of Fig. is a copy. Copy propagation applied to B5 yields:
x:=t3
a[t2]:=t5
a[t4]:=t3
goto
B2

Copies introduced during common subexpression elimination.
This
may not appear to be an improvement, but as we shall see, it gives us the
opportunity to eliminate the assignment to x.
Dead-Code Eliminations
A
variable is live at a point in a program if its value can be used subsequently;
otherwise, it is dead at that point. A related idea is dead or useless code,
statements that compute values that never get used. While the programmer is
unlikely to introduce any dead code intentionally, it may appear as the result
of previous transformations. For example, we discussed the use of debug that is
set to true or false at various points in the program, and used in statements
like
If (debug) print
By
a data-flow analysis, it may be possible to deduce that each time the program
reaches this statement, the value of debug is false. Usually, it is because
there is one particular statement
Debug :=false
That
we can deduce to be the last assignment to debug prior to the test no matter
what sequence of branches the program actually takes. If copy propagation
replaces debug by false, then the print statement is dead because it cannot be
reached. We can eliminate both the test and printing from the o9bject code.
More generally, deducing at compile time that the value of an expression is a
co9nstant and using the constant instead is known as constant folding.
One
advantage of copy propagation is that it often turns the copy statement into
dead code. For example, copy propagation followed by dead-code elimination
removes the assignment to x and transforms 1.1 into
a
[t2 ] := t5
a
[t4] := t3
goto
B2
Loop Optimizations
We
now give a brief introduction to a very important place for optimizations,
namely loops, especially the inner loops where programs tend to spend the bulk
of their time. The running time of a program may be improved if we decrease the
number of instructions in an inner loop, even if we increase the amount of code
outside that loop. Three techniques are important for loop optimization: code
motion, which moves code outside a loop; induction-variable elimination, which
we apply to eliminate I and j from the inner loops B2 and B3 and, reduction in
strength, which replaces and expensive operation by a cheaper one, such as a
multiplication by an addition.
Code Motion
An
important modification that decreases the amount of code in a loop is code
motion. This transformation takes an expression that yields the same result
independent of the number of times a loop is executed ( a loop-invariant
computation) and places the expression before the loop. Note that the notion
before the loop assumes the existence of an entry for the loop. For example,
evaluation of limit-2 is a loop-invariant computation in the following
while-statement:
While
(i<= limit-2 )
Code
motion will result in the equivalent of
t=
limit-2;
while
(i<=t)
Induction Variables and
Reduction in Strength
While
code motion is not applicable to the
quicksort example we have been considering the other two transformations
are.Loops are usually processed inside out.For example consider the loop around
B3.
Note
that the values of j and t4 remain in lock-step;every time the value of j
decreases by 1 ,that of t4 decreases by 4 because 4*j is assigned to t4.Such
identifiers are called induction variables.
When
there are two or more induction
variables in a loop, iit may be
possible to get rid of all but one, by the process of induction-variable
elimination.For the inner loop around B3 in Fig. we cannot ger rid of either j
or t4 completely.; t4 is used in B3 and j in B4. However, we can illustrate
reduction in strength and illustrate a part of the process of induction-variable elimination.
Eventually j will be eliminated when the outer loop of B2 - B5 is considered.
Example:
As the relationship t4:=4*j surely holds after such an assignment to t4 in Fig.
and t4 is not changed elsewhere in the inner loop around B3, it follows that
just after the statement j:=j-1 the relationship t4:= 4*j-4 must hold. We may
therefore replace the assignment t4:= 4*j by t4:= t4-4. The only problem is
that t4 does not have a value when we enter block B3 for the first time. Since
we must maintain the relationship t4=4*j on entry to the block B3, we place an
intializations\ of t4 at the end of the blcok where j itself is initialized, shown
by the dashed addt\ition to block B1 in second Fig.
The
replacement of a multiplication by a subtraction will speed up the object code
if multiplication takes more time than addition or subtraction, as is the case
on many machines.

Before
After
strength reduction applied to 4*j in block B3
____________________________________________________________________________________________________
Basic Blocks
.