Most of this is remnants of notes from 2021 some of which were found in 2024…
Input:
Output:
Example: The behavioural model
y <= (A+B) * C - (D+E)/F;
which is implictly
y <= [(A+B) * C] - [(D+E)/F];
and becomes
Clock0:
Clock1:
Clock2:
| │ | │ ┆ | │ | │
| │ | │ ┆ ┄|┄┄┄│┄┄┄│┄┄┄│┄┄ Clk0
└─┬─┘ └─┬─┘ ┆ └─┬─┘ └─┬─┘
+ + ┆ + +
| │ ┆ ┄┄┄|┄┄┄┄┄┄┄│┄┄┄┄ Clk1
└───┬───┘ ┆ └───┬───┘
+ ┆ +
│ ┄┄┄┄┄┄┄|┄┄┄┄┄┄┄┄ Clk2
where the '+' denotes operations.
Function unit binding:
Register allocation:
Current state
+--------------+
| |
| ^
| |
| +------------------------+
| | State register |
| +------------------------+
| |
v ^ Next state
| |
| +------------------------+
| | Next state and output |
| | output logic |------> Output
| +------------------------+
| | |
| ^ ^
| | |
+-----------+ |
Input
Mealy machine notation: condition/action
Example:
In compilers, there may be a strict order in which the instructions are to be completed.
But in digital circuits, parallelism is there. Parallel activity of gates and components.
Instructions in compilers.
Gates and modules in sods.
Both need to be clever enough to choose the 'best' path.
Metrics to measure 'goodness'.
—
Constraints in chip on silicon wafer (die)
Processor is just one kind of a chip.
What is the fastest chip that can be made? Keeping in mind all limitations?
Speed with which wires transmit signals?
Crosstalk btw wires (capacitance?)
There are physical limits on how fast a chip can operate.
Perf => Energy
Energy => Power
Power => Heating up
These are orthogonal metrics. We need to make a choice. Trade-offs for realistic designs.
Tools should be able to deal with large designs
State transition diagrams
DFG (Data Flow Graph)
CDFG (Control/Data Flow Graphs)
Sequence graphs
Model call: Calling a module in a sequence graph
Links and operations
With respect to the parse tree. Applies arithmetic properties to the arithmetic expression tree. Assume no hardware restriction now. Expression split into two-operand expressions. => Exploits parallelism better.
Eg:
x = a + b + c + d
x = a + b
x = x + c
x = x + d
But what if later when we take hardware restrictions into account? What if there was only one adder? Still this optimizations is good because it is better in most hardware configurations??
Pre-compute values statically. Detect copies of the variable/constant. The input to this form of optimization itself may have been the result of some other optimizations.
Eg:
a = 0
b = a
c = 10 * b
d = a * b + c + 1
becomes
a = 0
~b = a~
c = 10 * a
d = a * a + c + 1
becomes
a = 0
~b = 0~
c = 0
d = 1
Common subexpression patterns show up as isomorphic (ie, similar in form) patterns in parse trees (especially when reduced to 3 address code) Operator commutativity can be exploited?? No need to compute the subexpression over and over again.
Eg:
a = x + y
b = a + 2
c = (x + y) * (a + 1)
becomes
a = x + y
b = a + 2
c = a * (a + 1)
Replace time consuming operations with faster operations.
Eg:
b = a * 2
could be made
b = a << 1
because shift register cheaper and faster than building up a multiplier.
This is particularly useful when the only kind of multiplication is by a power of 2. Then we don't need a multiplier and can be okay with a shift register instead.
But what if there's an overflow on shifting??
'Move' code to make it more efficient. Applies to loop invariants.
for(int i=0; i<a*b; ++i) {
...
}
Provided `a` and `b` doesn't change within the loop, this could become
t = a * b;
for(int i=0; i<t; ++i) {
...
}
'Flatten' models (submodules) that are called only once. ie, remove the hierarchy. This would expose optimization opportunities across the model's boundaries. ie, larger scope for optimizations.
Control flow is converted to a data flow. Hardware is inherently parallel (unlike in the case of software). This could in turn expose more opportunities for optimization.
y = ab
if (a) {
x = b+d
} else {
x = bd
}
could be made into
y = ab
which could in turn become
y = ab
x = ab + ad + a'bd
to become
y = ab
x = y + ad + a'bd // -> x = y + d(a + a'b) -> x = y + d(a' + d)
Used in software world for Software pipelining.
Capability Enhanced Enhanced RISC Instructions (CHERI)