All information below is preliminary and is changing
frequently. Updated 2/3/97.
Translator Phases
The translator's translation of Java bytecodes to native machine language
instructions may be divided into the following phases. This is only a logical
division to help in identifying data structures and abstractions; some of
the phases may run concurrently or interleaved.
Phases in green are optional and may be
done later.
- Process bytecodes
- Create control flow graph from the bytecodes
- Parse bytecodes into a canonical form
- Some verification may take place here as well,
if desired.
- Generate primitive intermediate
format
- Analyze Java virtual machine stack usage
- Detect subroutines and either expand or convert
to subordinate functions
- Perform additional verification while analyzing
stack usage, if desired.
- Translate bytecodes into primitive graph nodes; convert complex bytecodes
into trees of primitive nodes to allow optimization.
- Detect and annotate asynchronous exception points
- Detect and annotate volatile references
- Construct SSA dataflow graph
- Annotate dataflow graph with control and memory dependencies
- Optionally perform simple optimizations to reduce
the amount of work done by the rest of the translator
- Generate argument-moving code if using stack calling conventions
- Optimize
- Perform restricted dataflow optimizations for
primitive nodes that were parts of more complex bytecodes
- Optimize null pointer checks for field accesses
and method calls
- Optimize array bounds checks
- Optimize multiple requests for the size of the
same array
- Optimize checked casts
- Generate code
- Split basic blocks directed acyclic graphs (DAGs) into trees
- Use a BURS tree matcher to generate templates for machine instructions
- Strength-reduce machine instructions
- Check whether instruction patterns with more
than one output can apply (i.e. instructions that generate both a value
in a register and set the condition codes as well as instructions that
use autoincrement or autodecrement addressing modes) and substitute them
as appropriate. These instruction patterns cannot be represented in BURS
because they aren't trees.
- Schedule code
- Linearize the dataflow graphs inside basic blocks according to their
internal dependencies, balancing the following:
- Avoid generating code that causes pipeline stalls
- Try to generate code that uses execution units
in a balanced manner so that it executes well on a superscalar machine
- Try to generate code that uses as few temporary registers as possible
- Allocate registers
- Compute preferences for incoming arguments, the outgoing result, and
the arguments and results of any called functions
- Combine phi-nodes from dataflow graph
- Preallocate intra-basic-block temporaries
- Allocate local variables to registers using either a graph coloring
scheme or a priority register assignment scheme
- Generate spill code
- Minimize spill points
- Schedule spill code to avoid pipeline stalls
and execution unit conflicts
- On CISC machines try to combine loads and/or
stores of spilled values with instructions that reference them (i.e. make
use of memory-to-register arithmetic instructions, etc.)
- Postprocess
- Generate function header and epilogue
- Generate exception tables
- Generate debugging tables, if desired
- Copy code templates to the code's final location in the code cache,
filling in all fields
- Link generated code into the rest of the Java runtime
Translator Subsystems
Important translator abstractions include:
Abstract Containers
- Bit sets; incremental bit sets
- Red-black trees
- Algorithmic heaps
- Interval maps
- Directed graphs
Concrete Containers
- Register maps
- The processed bytecode format. This format presents the same information
as in the bytecodes in a slightly easier-to-use manner. Control flow edges
are explicit. Alternative variants of bytecode instructions (for example,
variants of bytecodes that read local variables) are reduced into one common
format.
- The menu of primitive node types
- The primitive graph format
- Templates for machine instructions
Algorithms
- Graph walkers
- Forward and reverse dataflow analyzers over the intermediate dataflow
graph formats
- Incremental and/or lazy dataflow analyzers
- BURS pattern matcher
Converters
- The bytecode processor
- The dataflow graph generator
- The optimizer
- The instruction generator
- The instruction scheduler
- The register allocator
- The postprocessor
Debugging
- Compiler data structure inspection and dumping interface
- Byte code and native code disassemblers
- Bytecode-to-native-code mapper
- Variable location mapper
Runtime Subsystems
Electrical Fire needs a few subsystems that are not part of the translator
proper but are needed to support it. These include:
- A code cache manager
- A synchronous exception handler
- An asynchronous exception handler. This one is complicated because
it needs to emulate the thread receiving the exception until the next synchronization
point.
- Profilers for making compilation and quality-of-compilation decisions