BOA: Branches and Binary Operators
Next, lets add
- Branches (
if
-expressions) - Binary Operators (
+
,-
, etc.)
In the process of doing so, we will learn about
- Intermediate Forms
- Normalization
Branches
Lets start first with branches (conditionals).
We will stick to our recipe of:
- Build intuition with examples,
- Model problem with types,
- Implement with type-transforming-functions,
- Validate with tests.
Examples
First, lets look at some examples of what we mean by branches.
- For now, lets treat
0
as “false” and non-zero as “true”
Example: If1
if 10:
22
else:
0) sub1(
- Since
10
is not0
we evaluate the “then” case to get22
Example: If2
if sub(1):
22
else:
0) sub1(
- Since
sub(1)
is0
we evaluate the “else” case to get-1
QUIZ: If3
if-else
is also an expression so we can nest them:
What should the following evaluate to?
let x = if sub(1):
22
else:
0)
sub1(in
if x:
add1(x)else:
999
- A.
999
- B.
0
- C.
1
- D.
1000
- E.
-1
Control Flow in Assembly
To compile branches, we will use labels, comparisons and jumps
Labels
our_code_label:
...
Labels are “landmarks”
from which execution (control-flow) can be started, or
to which it can be diverted
Comparisons
cmp a1, a2
Perform a (numeric) comparison between the values
a1
anda2
, andStore the result in a special processor flag
Jumps
jmp LABEL # jump unconditionally (i.e. always)
je LABEL # jump if previous comparison result was EQUAL
jne LABEL # jump if previous comparison result was NOT-EQUAL
Use the result of the flag set by the most recent cmp
- To continue execution from the given
LABEL
QUIZ
Which of the following is a valid x86 encoding of
if 10:
22
else
33
Strategy
To compile an expression of the form
if eCond:
eThenelse:
eElse
We will:
- Compile
eCond
- Compare the result (in
rax
) against0
- Jump if the result is zero to a special
"IfFalse"
label- At which we will evaluate
eElse
, - Ending with a special
"IfExit"
label.
- At which we will evaluate
- (Otherwise) continue to evaluate
eTrue
- And then jump (unconditionally) to the
"IfExit"
label.
- And then jump (unconditionally) to the
Example: If-Expressions to Asm
Lets see how our strategy works by example:
Example: if1
Example: if2
Example: if3
Oops, cannot reuse labels across if-expressions!
- Can’t use same label in two places (invalid assembly)
Oops, need distinct labels for each branch!
- Require distinct tags for each
if-else
expression
Types: Source
Lets modify the Source Expression to add if-else
expressions
data Expr a
= Number Int a
| Add1 (Expr a) a
| Sub1 (Expr a) a
| Let Id (Expr a) (Expr a) a
| Var Id a
| If (Expr a) (Expr a) (Expr a) a
Polymorphic tags of type a
for each sub-expression
- We can have different types of tags
- e.g. Source-Position information for error messages
Lets define a name for Tag
(just integers).
type Tag = Int
We will now use:
type BareE = Expr () -- AST after parsing
type TagE = Expr Tag -- AST with distinct tags
Types: Assembly
Now, lets extend the Assembly with labels, comparisons and jumps:
data Label
= BranchFalse Tag
| BranchExit Tag
data Instruction
= ...
| ICmp Arg Arg -- Compare two arguments
| ILabel Label -- Create a label
| IJmp Label -- Jump always
| IJe Label -- Jump if equal
| IJne Label -- Jump if not-equal
Transforms
We can’t expect programmer to put in tags (yuck.)
- Lets squeeze in a
tagging
transform into our pipeline
Transforms: Parse
Just as before, but now puts a dummy ()
into each position
> let parseStr s = fmap (const ()) (parse "" s)
λ
> let e = parseStr "if 1: 22 else: 33"
λ
> e
λIf (Number 1 ()) (Number 22 ()) (Number 33 ()) ()
> label e
λIf (Number 1 ((),0)) (Number 22 ((),1)) (Number 33 ((),2)) ((),3)
Transforms: Tag
The key work is done by doTag i e
- Recursively walk over the
BareE
namede
starting tagging at counteri
- Return a pair
(i', e')
of updated counteri'
and tagged expressione'
QUIZ
doTag :: Int -> BareE -> (Int, TagE)
Number n _) = (i + 1 , Number n i)
doTag i (Var x _) = (i + 1 , Var x i)
doTag i (Let x e1 e2 _) = (_2 , Let x e1' e2' i2)
doTag i (where
= doTag i e1
(i1, e1') = doTag _1 e2 (i2, e2')
What expressions shall we fill in for _1
and _2
?
{- A -} _1 = i
= i + 1
_2
{- B -} _1 = i
= i1 + 1
_2
{- C -} _1 = i
= i2 + 1
_2
{- D -} _1 = i1
= i2 + 1
_2
{- E -} _1 = i2
= i1 + 1 _2
(ProTip: Use mapAccumL
)
We can now tag the whole program by
Calling
doTag
with the initial counter (e.g.0
),Throwing away the final counter.
tag :: BareE -> TagE
= e' where (_, e') = doTag 0 e tag e
Transforms: Code Generation
Now that we have the tags we lets implement our compilation strategy
If eCond eTrue eFalse i)
compile env (= compile env eCond ++ -- compile `eCond`
ICmp (Reg RAX) (Const 0) -- compare result to 0
[ IJe (BranchFalse i) -- if-zero then jump to 'False'-block
,
]++ compile env eTrue ++ -- code for `True`-block
IJmp lExit ] -- jump to exit (skip `False`-block!)
[ ++
ILabel (BranchFalse i) -- start of `False`-block
: compile env eFalse ++ -- code for `False`-block
ILabel (BranchExit i) ] -- exit [
Recap: Branches
Tag
each sub-expression,- Use tag to generate control-flow labels implementing branch.
Lesson: Tagged program representation simplifies compilation…
- Next: another example of how intermediate representations help.
Binary Operations
You know the drill.
- Build intuition with examples,
- Model problem with types,
- Implement with type-transforming-functions,
- Validate with tests.
Compiling Binary Operations
Lets look at some expressions and figure out how they would get compiled.
- Recall: We want the result to be in
rax
after the instructions finish.
QUIZ
What is the assembly corresponding to 33 - 10
?
1 rax, ?2
?3 rax, ?4 ?
A.
?1 = sub
,?2 = 33
,?3 = mov
,?4 = 10
B.
?1 = mov
,?2 = 33
,?3 = sub
,?4 = 10
C.
?1 = sub
,?2 = 10
,?3 = mov
,?4 = 33
D.
?1 = mov
,?2 = 10
,?3 = sub
,?4 = 33
Example: Bin1
Lets start with some easy ones. The source:
Strategy: Given n1 + n2
- Move
n1
intorax
, - Add
n2
torax
.
Example: Bin2
What if the first operand is a variable?
Simple, just copy the variable off the stack into rax
Strategy: Given x + n
- Move
x
(from stack) intorax
, - Add
n
torax
.
Example: Bin3
Same thing works if the second operand is a variable.
Strategy: Given x + n
- Move
x
(from stack) intorax
, - Add
n
torax
.
QUIZ
What is the assembly corresponding to (10 + 20) * 30
?
mov rax, 10
1 rax, ?2
?3 rax, ?4 ?
A.
?1 = add
,?2 = 30
,?3 = mul
,?4 = 20
B.
?1 = mul
,?2 = 30
,?3 = add
,?4 = 20
C.
?1 = add
,?2 = 20
,?3 = mul
,?4 = 30
D.
?1 = mul
,?2 = 20
,?3 = add
,?4 = 30
Second Operand is Constant
In general, to compile e + n
we can do
compile e ++ -- result of e is in rax
[add rax, n]
Example: Bin4
But what if we have nested expressions
1 + 2) * (3 + 4) (
- Can compile
1 + 2
with result inrax
… - .. but then need to reuse
rax
for3 + 4
Need to save 1 + 2
somewhere!
Idea: How about use another register for 3 + 4
?
But then what about (1 + 2) * (3 + 4) * (5 + 6)
?
- In general, may need to save more sub-expressions than we have registers.
Question:
Why are 1 + 2
and x + y
so easy to compile but (1 + 2) * (3 + 4)
not?
Idea: Immediate Expressions
Why were 1 + 2
and x + y
so easy to compile but (1 + 2) * (3 + 4)
not?
As 1
and x
are immediate expressions: their values don’t require any computation!
Either a constant, or,
variable whose value is on the stack.
Idea: Administrative Normal Form (ANF)
An expression is in Administrative Normal Form (ANF)
ANF means all primitive operations have immediate arguments.
Primitive Operations: Those whose values we need for computation to proceed.
v1 + v2
v1 - v2
v1 * v2
QUIZ
ANF means all primitive operations have immediate arguments.
Is the following expression in ANF?
1 + 2) * (4 - 3) (
A. Yes, its ANF.
B. Nope, its not, because of +
C. Nope, its not, because of *
D. Nope, its not, because of -
E. Huh, WTF is ANF?
Conversion to ANF
So, the below is not in ANF as *
has non-immediate arguments
1 + 2) * (4 - 3) (
However, note the following variant is in ANF
let t1 = 1 + 2
= 4 - 3
, t2 in
* t2 t1
How can we compile the above code?
; TODO in class
Binary Operations: Strategy
We can convert any expression to ANF
- By adding “temporary” variables for sub-expressions
- Step 1: Compiling ANF into Assembly
- Step 2: Converting Expressions into ANF
Types: Source
Lets add binary primitive operators
data Prim2
= Plus | Minus | Times
and use them to extend the source language:
data Expr a
= ...
| Prim2 Prim2 (Expr a) (Expr a) a
So, for example, 2 + 3
would be parsed as:
Prim2 Plus (Number 2 ()) (Number 3 ()) ()
Types: Assembly
Need to add X86 instructions for primitive arithmetic:
data Instruction
= ...
| IAdd Arg Arg
| ISub Arg Arg
| IMul Arg Arg
Types: ANF
We can define a separate type for ANF (try it!)
… but …
super tedious as it requires duplicating a bunch of code.
Instead, lets write a function that describes immediate expressions
isImm :: Expr a -> Bool
Number _ _) = True
isImm (Var _ _) = True
isImm (= False isImm _
We can now think of immediate expressions as:
type ImmExpr = {e:Expr | isImm e == True}
The subset of Expr
such that isImm
returns True
QUIZ
Similarly, lets write a function that describes ANF expressions
ANF means all primitive operations have immediate arguments.
isAnf :: Expr a -> Bool
Number _ _) = True
isAnf (Var _ _) = True
isAnf (Prim2 _ e1 e2 _) = _1
isAnf (If e1 e2 e3 _) = _2
isAnf (Let x e1 e2 _) = _3 isAnf (
What should we fill in for _1
?
{- A -} isAnf e1
{- B -} isAnf e2
{- C -} isAnf e1 && isAnf e2
{- D -} isImm e1 && isImm e2
{- E -} isImm e2
QUIZ
Similarly, lets write a function that describes ANF expressions
ANF means all primitive operations have immediate arguments.
isAnf :: Expr a -> Bool
Number _ _) = True
isAnf (Var _ _) = True
isAnf (Prim1 _ e1 _) = isAnf e1
isAnf (Prim2 _ e1 e2 _) = isImm e1 && isImm e2
isAnf (If e1 e2 e3 _) = _2 && isANF e2 && isANF e3
isAnf (Let x e1 e2 _) = isANF e1 && isANF e2 isAnf (
What should we fill in for _2
?
{- A -} isAnf e1
{- B -} isImm e1
{- C -} True
{- D -} False
We can now think of ANF expressions as:
type AnfExpr = {e:Expr | isAnf e == True}
The subset of Expr
such that isAnf
returns True
Use the above function to test our ANF conversion.
Types & Strategy
Writing the type aliases:
type BareE = Expr ()
type AnfE = Expr () -- such that isAnf is True
type AnfTagE = Expr Tag -- such that isAnf is True
type ImmTagE = Expr Tag -- such that isImm is True
we get the overall pipeline:
Transforms: Compiling AnfTagE
to Asm
The compilation from ANF is easy, lets recall our examples and strategy:
Strategy: Given v1 + v2
(where v1
and v2
are immediate expressions)
- Move
v1
intorax
, - Add
v2
torax
.
compile :: Env -> TagE -> Asm
Prim2 o v1 v2)
compile env (= [ IMov (Reg RAX) (immArg env v1)
Reg RAX) (immArg env v2)
, (prim2 o) ( ]
where we have a helper to find the Asm
variant of a Prim2
operation
prim2 :: Prim2 -> Arg -> Arg -> Instruction
Plus = IAdd
prim2 Minus = ISub
prim2 Times = IMul prim2
and another to convert an immediate expression to an x86 argument:
immArg :: Env -> ImmTag -> Arg
Number n _) = Const n
immArg _ (Var x _) = RegOffset RBP i
immArg env (where
= fromMaybe err (lookup x env)
i = error (printf "Error: '%s' is unbound" x) err
QUIZ
Which of the below are in ANF ?
{- 1 -} 2 + 3 + 4
{- 2 -} let x = 12 in
+ 1
x
{- 3 -} let x = 12
= x + 6
, y in
+ y
x
{- 4 -} let x = 12
= 18
, y = x + y + 1
, t in
if t: 7 else: 9
A.
1, 2, 3, 4
B.
1, 2, 3
C.
2, 3, 4
D.
1, 2
E.
2, 3
Transforms: Compiling Bare
to Anf
Next lets focus on A-Normalization i.e. transforming expressions into ANF
A-Normalization
We can fill in the base cases easily
Number n) = Number n
anf (Var x) = Var x anf (
Interesting cases are the binary operations
Example: Anf-1
Left operand is not immediate
Key Idea: Helper Function
imm :: BareE -> ([(Id, AnfE)], ImmE)
imm e
returns ([(t1, a1),...,(tn, an)], v)
where
ti, ai
are new temporary variables bound to ANF expressionsv
is an immediate value (either a constant or variable)
Such that e
is equivalent to
let t1 = a1
...
, = an
, tn in
v
Lets look at some more examples.
Example: Anf-2
Left operand is not internally immediate
Example: Anf-3
Both operands are not immediate
ANF: General Strategy
- Invoke
imm
on both the operands - Concat the
let
bindings - Apply the binary operator to the immediate values
ANF Implementation: Binary Operations
Lets implement the above strategy
Prim2 o e1 e2) = lets (b1s ++ b2s)
anf (Prim2 o (Var v1) (Var v2))
(where
= imm e1
(b1s, v1) = imm e2
(b2s, v2)
lets :: [(Id, AnfE)] -> AnfE -> AnfE
= e
lets [] e' :bs) e' = Let x e (lets bs e') lets ((x,e)
Intuitively, lets
stitches together a bunch of definitions:
lets [(x1, e1), (x2, e2), (x3, e3)] e===> Let x1 e1 (Let x2 e2 (Let x3 e3 e))
ANF Implementation: Let-bindings
For Let
just make sure we recursively anf
the sub-expressions.
Let x e1 e2) = Let x e1' e2'
anf (where
= anf e1
e1' = anf e2 e2'
ANF Implementation: Branches
Same principle applies to If
- use
anf
to recursively transform the branches.
If e1 e2 e3) = If e1' e2' e3'
anf (where
= anf e1
e1' = anf e2
e2' = anf e3 e3'
ANF: Making Arguments Immediate via imm
The workhorse is the function
imm :: BareE -> ([(Id, AnfE)], ImmE)
which creates temporary variables to crunch an arbitrary Bare
into an immediate value.
No need to create an variables if the expression is already immediate:
Number n l) = ( [], Number n l )
imm (Id x l) = ( [], Id x l ) imm (
The tricky case is when the expression has a primitive operation:
Prim2 o e1 e2) = ( b1s ++ b2s ++ [(t, Prim2 o v1 v2)]
imm (Id t )
, = makeFreshVar ()
t = imm e1
(b1s, v1) = imm e2 (b2s, v2)
Oh, what shall we do when:
If e1 e2 e3) = ???
imm (Let x e1 e2) = ??? imm (
Lets look at an example for inspiration.
That is, simply
anf
the relevant expressions,- bind them to a fresh variable.
@(If _ _ _) = immExp e
imm e@(Let _ _ _) = immExp e
imm e
immExp :: Expr -> ([(Id, AnfE)], ImmE)
= ([(t, e')], t)
immExp e where
= anf e
e' = makeFreshVar () t
One last thing: Whats up with makeFreshVar
?
Wait a minute, what is this magic FRESH ?
How can we create distinct names out of thin air?
(Sorry, no “global variables” in Haskell…)
We will use a counter, but will pass its value around
Just like
doTag
anf :: Int -> BareE -> (Int, AnfE)
Number n l) = (i, Number n l)
anf i (
Id x l) = (i, Id x l)
anf i (
Let x e b l) = (i'', Let x e' b' l)
anf i (where
= anf i e
(i', e') = anf i' b
(i'', b')
Prim2 o e1 e2 l) = (i'', lets (b1s ++ b2s) (Prim2 o e1' e2' l))
anf i (where
= imm i e1
(i' , b1s, e1') = imm i' e2
(i'', b2s, e2')
If c e1 e2 l) = (i''', lets bs (If c' e1' e2' l))
anf i (where
= imm i c
(i' , bs, c') = anf i' e1
(i'' , e1') = anf i'' e2 (i''', e2')
and
imm :: Int -> AnfE -> (Int, [(Id, AnfE)], ImmE)
Number n l) = (i , [], Number n l)
imm i (
Var x l) = (i , [], Var x l)
imm i (
Prim2 o e1 e2 l) = (i''', bs, Var v l)
imm i (where
= imm i e1
(i' , b1s, v1) = imm i' e2
(i'' , b2s, v2) = fresh i''
(i''', v) = b1s ++ b2s ++ [(v, Prim2 o v1 v2 l)]
bs
@(If _ _ _ l) = immExp i e
imm i e
@(Let _ _ _ l) = immExp i e
imm i e
immExp :: Int -> BareE -> (Int, [(Id, AnfE)], ImmE)
= (i'', bs, Var v ())
immExp i e l where
= anf i e
(i' , e') = fresh i'
(i'', v) = [(v, e')] bs
where now, the fresh
function returns a new counter and a variable
fresh :: Int -> (Int, Id)
= (n+1, "t" ++ show n) fresh n
Note this is super clunky. There is a really slick way to write the above code without the clutter of the i
but thats too much of a digression, but feel free to look it up yourself
Recap and Summary
Just created Boa
with
- Branches (
if
-expressions) - Binary Operators (
+
,-
, etc.)
In the process of doing so, we will learned about
- Intermediate Forms
- Normalization
Specifically,