Synthesis of digital systems


Most of this is remnants of notes from 2021 some of which were found in 2024…

High level synthesis:

Input:

Output:

Example: The behavioural model

y <= (A+B) * C - (D+E)/F;

which is implictly

y <= [(A+B) * C] - [(D+E)/F];

and becomes

Clock0:

Clock1:

Clock2:

Scheduling

|   │   |   │     ┆      |   │   |   │            
|   │   |   │     ┆     ┄|┄┄┄│┄┄┄│┄┄┄│┄┄ Clk0     
└─┬─┘   └─┬─┘     ┆      └─┬─┘   └─┬─┘                 
  +       +       ┆        +       +                 
  |       │       ┆     ┄┄┄|┄┄┄┄┄┄┄│┄┄┄┄ Clk1        
  └───┬───┘       ┆        └───┬───┘            
      +           ┆            +                 
      │                 ┄┄┄┄┄┄┄|┄┄┄┄┄┄┄┄ Clk2    

where the '+' denotes operations.

Function unit binding:

Register allocation:

FSM encoding

 Current state
+--------------+
|              |        
|              ^        
|              |        
|   +------------------------+
|   |    State register      |
|   +------------------------+
|              |
v              ^   Next state
|              |
|   +------------------------+
|   | Next state and output  |
|   |    output logic        |------> Output 
|   +------------------------+
|           |        |
|           ^        ^
|           |        |
+-----------+        |
                    Input

Mealy machine notation: condition/action

Synthesis

Detailed implementation

Example:

Netlist

Multiple abstraction levels

Comparison with compilers

Both need to be clever enough to choose the 'best' path.

Metrics to measure 'goodness'.

Measuring performance

Metrics to measure efficiancy

Perf => Energy
Energy => Power
Power => Heating up

These are orthogonal metrics. We need to make a choice. Trade-offs for realistic designs.

Tools should be able to deal with large designs

Abstract models

Target independent optimizations

Data-Flow transformations

Tree height reduction

With respect to the parse tree. Applies arithmetic properties to the arithmetic expression tree. Assume no hardware restriction now. Expression split into two-operand expressions. => Exploits parallelism better.

Eg:

x = a + b + c + d
x = a + b
x = x + c
x = x + d

But what if later when we take hardware restrictions into account? What if there was only one adder? Still this optimizations is good because it is better in most hardware configurations??

Constant and variable propagation (ie, folding)

Pre-compute values statically. Detect copies of the variable/constant. The input to this form of optimization itself may have been the result of some other optimizations.

Eg:

a = 0
b = a
c = 10 * b
d = a * b + c + 1

becomes

a = 0
~b = a~
c = 10 * a
d = a * a + c + 1

becomes

a = 0
~b = 0~
c = 0
d = 1

Common subexpression elimination

Common subexpression patterns show up as isomorphic (ie, similar in form) patterns in parse trees (especially when reduced to 3 address code) Operator commutativity can be exploited?? No need to compute the subexpression over and over again.

Eg:

a = x + y
b = a + 2
c = (x + y) * (a + 1)

becomes

a = x + y
b = a + 2
c = a * (a + 1)

Operator strength reduction

Replace time consuming operations with faster operations.

Eg:

b = a * 2

could be made

b = a << 1

because shift register cheaper and faster than building up a multiplier.

This is particularly useful when the only kind of multiplication is by a power of 2. Then we don't need a multiplier and can be okay with a shift register instead.

But what if there's an overflow on shifting??

Code motion

'Move' code to make it more efficient. Applies to loop invariants.

for(int i=0; i<a*b; ++i) {
    ...
}

Provided `a` and `b` doesn't change within the loop, this could become

t = a * b;
for(int i=0; i<t; ++i) {
    ...
}

Control flow based transformations

Model expansion

'Flatten' models (submodules) that are called only once. ie, remove the hierarchy. This would expose optimization opportunities across the model's boundaries. ie, larger scope for optimizations.

Conditionals expansion

Control flow is converted to a data flow. Hardware is inherently parallel (unlike in the case of software). This could in turn expose more opportunities for optimization.

y = ab
if (a) {
    x = b+d
} else {
    x = bd
}

could be made into

y = ab

which could in turn become

y = ab
x = ab + ad + a'bd

to become

y = ab
x = y + ad + a'bd  // -> x = y + d(a + a'b) -> x = y + d(a' + d)

Loop unrolling (Loop expansion)

Used in software world for Software pipelining.

Dbts

Abbreviations

Misc

CHERI

Capability Enhanced Enhanced RISC Instructions (CHERI)

MUX vs serializers