Electrical Fire Primitive Graph Control Scheduling

Electrical Fire

Design

Primitive Graph Format

Control Layer

Control Scheduling

Example
Proof

How do we convert a control graph into a linear instruction stream with branches? On this page we present an algorithm that should produce good code that tries to minimize the following:

The number of conditional and unconditional branches encountered dynamically when the function is executed,
The size of the generated function, and
The number of cache misses -- to help with cache performance code executed together should be located together.

Introduction

In general, without additional information such as profiles we won't know how often individual control nodes are executed, so we can't produce an optimal dynamic schedule. Consider a simple if-then-else control graph below. There are 4! = 24 ways to order the four control nodes A, B, C, and D. Many of these result in several unnecessary branches; Figure 1 lists four reasonable orderings.

Order 1

Order 2

Order 3

Order 4

L0: A bc L1 B bra L2 L1: C L2: D return

L0: A bnc L1 C bra L2 L1: B L2: D return

L0: A bc L1 B L2: D return L1: C bra L2

L0: A bnc L1 C L2: D return L1: B bra L2

Figure 1

	Order 1	Order 2	Order 3	Order 4
L0: A bc L1 B bra L2 L1: C L2: D return	L0: A bnc L1 C bra L2 L1: B L2: D return	L0: A bc L1 B L2: D return L1: C bra L2	L0: A bnc L1 C L2: D return L1: B bra L2
Figure 1

Here bc branches if the condition is true, bnc branches if the condition is false, and bra branches unconditionally. For simplicity we assume that the execution cost of any of the three kinds of branches is the same, regardless of whether a conditional branch was taken or not.

The code in orders 1 and 4 will execute two branches if it takes the path through B and one branch if it takes the path through C. The code in orders 2 and 3 will execute one branch if it takes the path through B and two branches if it takes the path through C. Without knowing the relative frequency of execution of B and C we don't know which order to prefer.

Nevertheless, there are assumptions we can make that are usually true in practice. We will assume that a loop is executed several times on the average so that the graph in Figure 2 should be scheduled using order 5 instead of order 6 because order 5 has in one fewer branch in each iteration of the loop even though order 6 has one fewer branch when the loop is executed zero times.

Order 5

Order 6

L0: A bra L2 L1: C L2: B bc L1 D return

L0: A L1: B bnc L2 C bra L1 L2: D return

Figure 2

	Order 5	Order 6
L0: A bra L2 L1: C L2: B bc L1 D return	L0: A L1: B bnc L2 C bra L1 L2: D return
Figure 2

The Scheduling Algorithm

In brief, we schedule control nodes according to the following guidelines:

We place all control nodes that can be reached without throwing an exception before any control nodes that can only be reached via an exception handler. We apply this rule recursively to handlers of exceptions that can be thrown from exception handlers, etc.
We schedule loops from the innermost to the outermost. We try to keep the nodes in each loop contiguous.
We schedule acyclic code trying to minimize the number of unconditional branches.

The scheduling process is described in more detail below. The stages are illustrated in green text using the example control graph from the control edge details page.

1. Discover loops

In this stage we discover the loop hierarchy tree. It's likely that this information will already be available from previous stages of the compiler. The results of this stage are illustrated in Figure 3.

Control Graph Loop Hierarchy Tree

Figure 3


Control Graph	Loop Hierarchy Tree
Figure 3

2. Determine main body nodes

Since exception handlers are executed less frequently than the main body of the function, we'd like to schedule the main body without interference from control flow paths to or from exception handlers. We determine which nodes are in the main body of the function by marking the begin node and then taking the transitive closure of marking the targets of all forward and backward (but not return or exception) control edges whose sources are marked.

For the next few stages, we will call the main body nodes working nodes. All nodes outside the main body are non-working nodes.

The result of doing this process in our example is illustrated in Figure 4, in which all non-working nodes are greyed out.

Control Graph Loop Hierarchy Tree

Figure 4


Control Graph	Loop Hierarchy Tree
Figure 4

3. Schedule from the inside out

At this point we'll walk the loop hierarchy tree and schedule the leaf loops using the approach outlined below. After scheduling all subloops of a loop we can then schedule the loop. We ignore loops whose headers are non-working nodes -- such loops must consiste entirely of non-working nodes. Having scheduled all the loops in the main body of the function, we can then schedule the main body itself using the approach below.

In our example, we might schedule the loops in the following order: loop 4, loop 12, loop 16, loop 9, loop 2, loop 25, and then the main body (starts at node 1).

Let us now take a look at how to schedule one loop or the main body. We'll call the portion that we're scheduling (either a loop or the entire main body) a region REG.

We will produce a schedule SREG for region REG. SREG is a set of one or more cliques, where each clique is an ordered list of working nodes in REG. The order in which the cliques should be scheduled is not determined as long as the cliques are not interleaved.

3a. Determining the nodes in a region

Given a region REG:

Let Rlocal be the set of nodes in REG that are not contained in any subloop of REG.

Let Rsubheaders be the set of headers of immediate subloops of REG.

Let Rcombined be the union of Rlocal and Rsubheaders.

We will define R to be the intersection of Rcombined and the set of all working nodes. We will also define RR to be the intersection of REG and the set of all working nodes.

In our example, if we're scheduling loop 2, REG is loop 2 and the above sets are as follows:

REG = {2, 3, ..., 24, 26, 27, 28, 29}

Rlocal = {2, 3, 8, 22, 29}

Rsubheaders = {4, 9, 23, 27}

Rcombined = {2, 3, 4, 8, 9, 22, 23, 27, 29}

R = {2, 3, 4, 8, 9}

RR = {2, 3, ..., 21}

3b. Initializing long edge flags

With each edge E: A->B for which at least one of A or B is in R, we associate a flag long(E) that states whether the edge E is a short or long edge. Informally, an edge is long if there is no reason to try to place node B immediately after node A.

We initialize the flag long(E) for each edge E: A->B for which at least one of A or B is in R as follows:

If A is outside RR, then E is a long edge (this will only happen if B is the region header H or if A is a non-working node).
If E is any exception or return edge, then E is a long edge (this will only happen when scheduling exception handlers after the function's main body has already been scheduled).
If E is any backward edge, then E is a long edge.
If A is a switch node, then E is a long edge (on our target architectures there is no reason to keep a switch statement contiguous with any of its successors).
If A is a node in some subloop inside REG, then let L be the largest subloop smaller than REG that contains A. Since we schedule loops from the inside out, we have already computed the schedule SL of loop L. If A is not the last node of some clique in SL, then E is a long edge.
If B is a node in some subloop inside REG, then let L be the largest subloop smaller than REG that contains B. If B is not the first node in some clique in the schedule SL of loop L, then E is a long edge. (Note that both A and B cannot be in L, because if they were then A=B=the header of L, so E is a backward edge, and we already covered that case.)
Otherwise, E is a short edge.

3c. Defining ready nodes

We will call a node N ready if each of its incoming edges E: A->N is long.

We will call a node N ready with respect to node P if, for each of N's incoming edges E: A->N, either E is long or A=P.

3d. Linear scheduling

Let H be region REG's header (i.e. the function's begin node or the loop's header). We always make H be the first node in the first clique C1 of our region's schedule SREG.

We initialize a set W to be the set of nodes in the region remaining to be scheduled. Initially W contains all nodes in R except for H:: W := R - {H}
We initialize a stack T of cliques of subloops waiting to be scheduled. Initially the stack T is empty:: T := []

Let P be the last node scheduled; initially P is H. As long either W or T isn't empty, we pick out the next node N to schedule as follows:

Alternative 1. If there exists a node N in W such that there exists at least one short edge from P to N and node N is ready with respect to P, then we pick any such node N to be the next node.
Alternative 2. Otherwise, if the stack T isn't empty, we pop a clique Ck from T, make it be the next clique of SREG, make every edge from P to any node A remaining in W into a long edge, set P to be the last node of Ck, and continue with step 1 above.
Alternative 3. Otherwise, there must exist a node N in W such that node N is ready with respect to P. We pick any such node N to be the next node.

After we have picked node N, we remove N from the set W and make every edge from P to any node A remaining in W into a long edge. Then:

If N is not the header of a subloop:
1. In either case, we let P := N.
If N is the header of a subloop L:
1. Let Ck be the clique of L's schedule SL that contains node N.
  1. If N is the first node of Ck and there exists some edge from P to N then we append the entire clique Ck to the current clique in our region's schedule SREG;
  2. Otherwise, we start a new clique Cn in SREG and make Ck be Cn's beginning.
2. In either case, we let P be the last node of Ck. Also, if SL contained more than one clique, we push all cliques in SL except Ck onto T.

Having scheduled a node or clique, we continue with the next node or clique.

Correctness. Why are we guaranteed that (1) the algorithm will terminate and (2) not get stuck not able to choose any of the three alternatives? See the proof.

3e. Loop scheduling

Step 3d will always produce a linear schedule SREG for a region REG, with the header H being the first node of the first clique C1. As Figure 2 illustrates, this is often not the best thing to do. We fix it as follows:

Let Ck be the last clique in SREG such that the last node N of Ck satisfies the following properties:

There exists an edge from N to H.
There are no short edges E: N->A for which A is outside R.
N is not a switch node.
N is not H.

If such a clique Ck exists, we remove the last node N of Ck and prepend it to C1. Now as long as Ck ends with some node N' that satisfies all of the properties below, we remove N' from the end of Ck and prepend it to C1:

There are no short edges E: N'->A for which A is outside R.
N' is not a switch node.
N' is not H.

We stop either when Ck is empty or we stop at some node N'. If Ck is empty, we remove it from the list of cliques in SREG. At this point our schedule SREG is complete.

4. Schedule exception handlers

After we have scheduled the main body using the algorithm from section 3, we can schedule the exception handlers for the function. Let {H1, H2, ..., Hn} be the catch nodes all of whose incoming edges come from nodes that have already been scheduled (either in the main body of the function or in other exception handlers that we scheduled). If there are no such catch nodes and yet we haven't scheduled every node in our function then we must have an exception handler that handles exceptions from itself. When this happens we just make {H1, H2, ..., Hn} be the set of all remaining unscheduled catch nodes.

Now we create a phantom node H that has one long edges pointing to each of H1, H2, ..., Hn. We make our set of working nodes be all nodes reachable from H that have not been scheduled yet. We apply the algorithm from section 3 to H (and the subloops of its region) to schedule our set of exception handlers. If there are still unscheduled nodes, we go back to the beginning of section 4.

Example

See the next page for the results of applying the scheduling algorithm to our Figure 4 example.