Previous Page                                                                           Home

Basic Blocks

____________________________________________________________________________________________________

 

 

 

 

CODE OPTIMIZATION

 

Criteria for Code-Improving Transformations

 

Simply stated, the best program transformations are those that yield the most benefit for the least effort. The transformations provided by an optimizing compiler should have several properties.

First, a transformation must preserve the meaning of programs. That is, an "optimization" must not change the output produced by a program for a given input, or cause an error, such as a division by zero, that was not present in the original version of the source program. The influence of this criterion pervades this chapter; at all times we take the "safe" approach of missing an opportunity to apply a transformation rather than risk changing what the program does.

 

Second, a transformation must, on the average, speed up programs by a measurable amount. Sometimes we are interested in reducing the space taken by the compiled code, although the size of code has less importance than it once had. Of course, not every transformation succeeds in improving every program, and occasionally an "optimization" may slow down a program slightly, as long as on the average it improves things.

 

Third, a transformation must be worth the effort. It does not make sense for a compiler writer to expend the intellectual effort to implement a code improving transformation and to have the compiler expend the additional time compiling source programs if this effort is not repaid when the target programs are executed. Certain local or "peephole" transformations of the kind are simple enough and beneficial enough to be included in any compiler.

 

Some transformations can only be applied after detailed, often time-consuming, analysis of the source program, so there is little point in applying them to programs that will be run only a few times. For example, a fast, nonoptimizing, compiler is likely to be more helpful during debugging or for "student jobs” that will be run successfully a few times and thrown away. Only when the program in question takes up a significant  fraction of the machine's cycles does improved code quality justify the time spent running an optimizing compiler on the program.

 

 

Before we get into optimization as such we need to familiarize ourselves with a few things

 

ALGEBRAIC TRANSFORMATION

                    

                     Countless algebraic transformations can be used to change the set of expressions computed by a basic block into an algebraically equivalent set. The useful ones are those that simplify expressions or replace expensive operations by cheaper ones. For example, statements

such as

 

     x := x +0

Or

  

      x := x*1

 

can be eliminated from a basic block without changing the set of expressions it computes. The exponentiation operator in the statements

 

  x := y ** 2

 

usually requires a function call to implement. Using an algebraic transformation, this statement can be replaced by cheaper, but equivalent statement

 

     x := y*y

 

FLOW GRAPHS

                    

                     We can add the flow-of –control information to the set of basic blocks making up a program by constructing a directed graph called a flow graph. The nodes of the flow graph are the basic blocks. One node is distinguished as initial; it is the block whose leader is the first statement. There is a directed edge from block B1 to block B2can be immediately follow B1in some execution sequence; that is, if

 

 

  1. there is a conditional or unconditional jump from the last statement of B2, or
  2. B2 immediately follow B1in the order of the program, and B1 does not end in the unconditional jump

B1 is a predecessor of B2, and B2is a successor of B1.

 

 

Example 4:The flow graph of the program of fig. 7 is shown in fig. 9, B1 is the initial node.

 

 

Prod := 0

I:=1

B1
  

t1 :=  4 * i

t2 := a [ t1 ]

t3 := 4 * i

t4 := b [ t3 ]

t5 := t2 * t4

t6:= prod + t5

t7:=i+1

i := t7

if I <= 20 goto B2

B2

 

 

 

 

 

 

 

 

 

 

                     Fig .9 flow graph for program

 

REPRESENTATION OF BASIC BLOCKS

 

                     Basic Blocks are represented by variety of data structures. For example, after partitioning the three address statements by Algorithm 1, each basic block  can be represented by a record consisting of a count of number of quadruples in the block, followed by a pointer to the leader of the block, and by the list of predecessors and successors of the block. For example the block B2 running from the statement (3) through (12) in the intermediate code of figure 2 were moved elsewhere in the quadruples array or were shrunk, the (3) in if i<=20 goto(3)

would have to be changed.

 

 

LOOPS

                     Loop is a collection of nodes in a flow graph such that

                

1. All nodes in the collection are strongly connected; from any node in the loop to any other, there is path of length one or more, wholly within the loop, and

 

2. The collection of nodes has a unique entry, a node in the loop such that is, a node in the loop such that the only way to reach a node of the loop from a node outside the loop is to first go through the entry.

          A loop that contains no other loops is called an inner loop.     

 

 

 

 

      

 

 

 PEEPHOLE OPTIMIZATION

             

                      A statement-by-statement code-generations strategy often produce target code that contains redundant instructions and suboptimal  constructs  .The quality of such target  code can be improved by  applying “optimizing” transformations to the target program.

                   

                          A simple but effective technique for improving the target code is peephole optimization, a method for trying to improving the performance of the target program by examining a short sequence of target instructions (called the peephole) and replacing these instructions by a shorter or faster sequence, whenever possible.

                   

                                The peephole is a small, moving window on the target program. The code in the peephole need not contiguous, although some implementations do require this. We shall give the following examples of program transformations that are characteristic of peephole optimizations:

 

        • Redundant-instructions elimination

        • Flow-of-control optimizations

       • Algebraic simplifications

        • Use of machine idioms

 

 

REDUNTANT LOADS AND STORES

 

If we see the instructions sequence

 

(1)   (1)   MOV R0,a

(2)   (2)   MOV a,R0

 

-we can delete instructions (2) because whenever (2) is executed. (1) will ensure that the value of a is already in register R0.If (2) had a label we could not be sure that (1) was always executed immediately before (2) and so we could not remove (2).

 

UNREACHABLE CODE

                     Another opportunity for peephole optimizations is the removal of unreachable instructions. An unlabeled instruction immediately following an unconditional jump may be removed. This operation can be repeated to eliminate a sequence of instructions. For example, for debugging purposes, a large program may have within it certain segments that are executed only if a variable debug is 1.In C, the source code might look like:

              

               #define debug 0

               ….

If ( debug  ) {

   Print debugging information     

}

 

In the intermediate representations the if-statement may be translated as:

      

                If debug =1 goto L2

                Goto L2

             L1: print debugging information

             L2:                                                  …………………………(a)

 

 

One obvious peephole optimization is to eliminate jumps over jumps .Thus no matter what the value of debug, (a) can be replaced by:

 

                If debug ≠1 goto L2

                Print debugging information

            L2:                                                ……………………………(b)

 As the argument of the statement of (b) evaluates to a constant true it can be replaced by              

             

 

                If debug ≠0 goto L2

                Print debugging information

            L2:                                               ……………………………(c)

            

 As the argument of the first statement of (c) evaluates to a constant true, it can be replaced by goto L2.  Then  all the statement that print debugging aids are manifestly unreachable and can be eliminated one at a time.

 

FLOW-OF-CONTROL OPTIMIZATIONS

       

                     The unnecessary jumps can be eliminated in either the intermediate code or the target code by the following types of peephole optimizations. We can replace the jump sequence

                      

                   goto L2

                    ….

            L1 : gotoL2

by the sequence

 

                  goto L2

                  ….

           L1 : goto L2

If there are now no jumps to L1, then it may be possible to eliminate the statement L1:goto L2 provided it is preceded by an unconditional jump .Similarly, the sequence

 

                 if a < b goto L1

                 ….

           L1 : goto L2

can be replaced by

 

               if a < b  goto  L2

               ….

          L1 : goto L2  

Finally, suppose there is only one jump to L1 and L1 is preceded by an unconditional goto. Then the sequence

              

           goto L1

                    ……..

               L1:if a<b goto L2

               L3:                     …………………………………..(1)

 

 

may be replaced by

        

            if a<b goto L2

                   goto L3

                   …….

             L3:                        ………………………………….(2)

 

 

 

While the number of instructions in (1) and (2) is the same, we sometimes skip the unconditional jump in (2), but never in (1).Thus (2) is superior to (1) in execution time

 

 

ALGEBRAIC SIMPLIFICATION

                    

                     There is no end to the amount of algebraic simplification that can be attempted through peephole optimization. Only a few algebraic identities occur frequently enough that it is worth considering implementing them .For example, statements such as

      x := x+0

    Or

     x := x * 1

are often produced by straightforward intermediate code-generation algorithms, and they can be eliminated easily through peephole optimization.

 

ELIMINATION OF COMMON SUBEXPRESSIONS

 

Common sub expressions need  not be computed over and over again. Instead they can be computed once and kept in store from where its referenced when encountered again – of course providing the variable values in the expression still remain constant.

 

ELIMINATION OF DEAD CODE

 

Its possible that a large amount of dead(useless) code may exist in the program. This might be especially caused when introducing variables and procedures as part of construction or error-correction of a program – once declared and defined, one forgets to remove them in case they serve no purpose. Eliminating these will definitely optimize the code

 

REDUCTION IN STRENGTH

 

                     Reduction in strength replaces expensive operations by equivalent cheaper ones on the target machine. Certain machine instructions are considerably cheaper than others and can often be used as special cases of more expensive operators. For example, x² is invariably cheaper to implement as x*x than as a call to an exponentiation routine. Fixed-point multiplication or division by a power of two is cheaper to implement as a shift. Floating-point division by a constant can be implemented as multiplication by a constant, which may be cheaper.

 

 

USE OF MACHINE IDIOMS

       

                     The target machine may have hardware instructions to implement certain specific operations efficiently. Detecting situations that permit the use of these instructions can reduce execution time significantly. For example, some machines have auto-increment and

auto-decrement addressing modes. These add or subtract one from an operand before or after using its value. The use of these modes greatly improves the quality of code when pushing or popping a stack, as in parameter passing. These modes can also be used in code for statements like i : =i+1.

 

 

 

 

 

 

 

 

 

Getting Better Performance

 

Dramatic improvements in the running time of a program-such as cutting the running time form a few hours to a few seconds-are usually obtained by improving the program at all levels, from the source level to the target level, as suggested by fig. At each level, the available options fall between the two extremes of finding a better algorithm and of implementing a given algorithm so that fewer operations are performed.

 

Algorithmic transformations occasionally produce spectacular improvements in running time. For example, Bentley relates that the running time of a program for sorting N elements dropped from 2.02N^2 microseconds to 12Nlog2N microseconds then a carefully coded "insertion sort" was replaced by "quicksort".

 

THE PRINCIPAL SOURCES OF OPTIMIZATION

 

Here we introduce some of the most useful code-improving transformations. Techniques for implementing these transformations are presented in subsequent sections. A transformation of a program is called local if it can be performed by looking only at the statements in a bas9ic block; otherwise, it is called global. Many transformations can be performed at both the local and global levels. Local transformations are usually performed first.

 

Function-Preserving Transformations

 

There are a number of ways in which a compiler can improve a program without changing the function it computes. Common subexpression elimination, copy propagation, dead-code elimination, and constant folding are common examples of such function-preserving transformations. The other transformations come up primarily when global optimizations are performed.

Frequently, a program will include several calculations of the same value, such as an offset in an array. Some of these duplicate calculations cannot be avoided by the programmer because they lie below the level of detail accessible within the source language. For example, block B5 shown in fig recalculates 4*i and 4*j.

 

 

                                                          Local common subexpression elimination

 

 

 

Common Subexpressions

 

 

An occurrence of an expression E is called a common subexpression if E was previously computed, and the values of variables in E have not changed since the previous computation. We can avoid recomputing the expression if we can use the previously computed value. For example, the assignments to t7 and t10 have the common subexpressions 4*I and 4*j, respectively, on the right side in Fig. They have been eliminated in Fig by using t6 instead of t7 and t8 instead of t10. This change is what would result if we reconstructed the intermediate code from the dag for the basic block.

Example: Fig shows the result of eliminating both global and local common subexpressions from blocks B5 and B6 in the flow graph of Fig. We first discuss the transformation of B5 and then mention some subtleties involving arrays.

 

                                          B5 and B6 after common subexpression elimination

 

 

After local common subexpressions are eliminated B5 still evaluates 4*i and 4*j, as shown in the earlier fig. Both are common subexpressions; in particular, the three statements

t8:= 4*j; t9:= a[t[8]; a[t8]:=x

in B5 can be replaced by

t9:= a[t4]; a[t4:= x using t4 computed in block B3. In Fig. observe that as control passes from the evaluation of 4*j in B3 to B5, there is no change in j, so t4 can be used if 4*j is needed.

Another common subexpression comes to light in B5 after t4 replaces t8. The new expression a[t4] corresponds to the value of a[j] at the source level. Not only does j retain its value as control leaves b3 and then enters B5, but a[j], a vlue computed into a temporary t5, does too becaoude there are no assignments to elements of the array a in the interim. The statement

t9:= a[t4]; a[t6]:= t9

in B5 can therefore be replaced by

a[t6]:= t5

The expression in blocks B1 and B6 is not considered a common subexpression although t1 can be used in both places.After control leaves B1 and before it reaches B6,it can go through B5,where there are assignments to a.Hence, a[t1] may not have the same  value on reaching B6 as it did in leaving B1, and it is not safe to treat a[t1] as a common subexpression.

 

Copy Propagation

 

Block B5 in Fig. can be further improved by eliminating x using two new transformations. One concerns assignments of the form f:=g called copy statements, or copies for short. Had we gone into more detail in Example 10.2, copies would have arisen much sooner, because the algorithm for eliminating common subexpressions introduces them, as do several other algorithms. For example, when the common subexpression in c:=d+e is eliminated in Fig., the algorithm uses a new variable t to hold the value of d+e. Since control may reach c:=d+e either after the assignment to a or after the assignment to b, it would be incorrect  to replace c:=d+e by either c:=a or by c:=b.

 

The idea behind the copy-propagation transformation is to use g for f, wherever possible after the copy statement f:=g. For example, the assignment x:=t3 in block B5 of Fig. is a copy. Copy propagation applied to B5 yields:

x:=t3

a[t2]:=t5

a[t4]:=t3

goto B2

 

                                                   Copies introduced during common subexpression elimination.

 

 

This may not appear to be an improvement, but as we shall see, it gives us the opportunity to eliminate the assignment to x.

 

 

 

 

Dead-Code Eliminations

 

A variable is live at a point in a program if its value can be used subsequently; otherwise, it is dead at that point. A related idea is dead or useless code, statements that compute values that never get used. While the programmer is unlikely to introduce any dead code intentionally, it may appear as the result of previous transformations. For example, we discussed the use of debug that is set to true or false at various points in the program, and used in statements like

            If (debug) print …

By a data-flow analysis, it may be possible to deduce that each time the program reaches this statement, the value of debug is false. Usually, it is because there is one particular statement

            Debug :=false

That we can deduce to be the last assignment to debug prior to the test no matter what sequence of branches the program actually takes. If copy propagation replaces debug by false, then the print statement is dead because it cannot be reached. We can eliminate both the test and printing from the o9bject code. More generally, deducing at compile time that the value of an expression is a co9nstant and using the constant instead is known as constant folding.

 

One advantage of copy propagation is that it often turns the copy statement into dead code. For example, copy propagation followed by dead-code elimination removes the assignment to x and transforms 1.1 into

 

a [t2 ] := t5

a [t4] := t3

goto B2

 

 

Loop Optimizations

 

We now give a brief introduction to a very important place for optimizations, namely loops, especially the inner loops where programs tend to spend the bulk of their time. The running time of a program may be improved if we decrease the number of instructions in an inner loop, even if we increase the amount of code outside that loop. Three techniques are important for loop optimization: code motion, which moves code outside a loop; induction-variable elimination, which we apply to eliminate I and j from the inner loops B2 and B3 and, reduction in strength, which replaces and expensive operation by a cheaper one, such as a multiplication by an addition.

 

Code Motion

 

An important modification that decreases the amount of code in a loop is code motion. This transformation takes an expression that yields the same result independent of the number of times a loop is executed ( a loop-invariant computation) and places the expression before the loop. Note that the notion “before the loop” assumes the existence of an entry for the loop. For example, evaluation of limit-2 is a loop-invariant computation in the following while-statement:

 

While (i<= limit-2 )

Code motion will result in the equivalent of

 

t= limit-2;

while (i<=t)

 

 

Induction Variables and Reduction in Strength

 

While code motion is not  applicable to the quicksort example we have been considering the other two transformations are.Loops are usually processed inside out.For example consider the loop around B3.

Note that the values of j and t4 remain in lock-step;every time the value of j decreases by 1 ,that of t4 decreases by 4 because 4*j is assigned to t4.Such identifiers are called induction variables.

When there are two or more induction  variables in a loop, iit may be  possible to get rid of all but one, by the process of induction-variable elimination.For the inner loop around B3 in Fig. we cannot ger rid of either j or t4 completely.; t4 is used in B3 and j in B4. However, we can illustrate reduction in strength and illustrate a part of the process  of induction-variable elimination. Eventually j will be eliminated when the outer loop of B2 - B5 is considered.

 

Example: As the relationship t4:=4*j surely holds after such an assignment to t4 in Fig. and t4 is not changed elsewhere in the inner loop around B3, it follows that just after the statement j:=j-1 the relationship t4:= 4*j-4 must hold. We may therefore replace the assignment t4:= 4*j by t4:= t4-4. The only problem is that t4 does not have a value when we enter block B3 for the first time. Since we must maintain the relationship t4=4*j on entry to the block B3, we place an intializations\ of t4 at the end of the blcok where j itself is initialized, shown by the dashed addt\ition to block B1 in second Fig.

The replacement of a multiplication by a subtraction will speed up the object code if multiplication takes more time than addition or subtraction, as is the case on many machines.

 

 

 

 

 

 

 

 

 

                      Before                                                                                             After

                                                       

                                               strength reduction applied to 4*j in block B3   

 

 

 

____________________________________________________________________________________________________

 

 

 

 

Previous Page                                                                           Home

Basic Blocks

 

 

 

.