BOA: Branches and Binary Operators
Next, lets add
- Branches (
if-expressions) - Binary Operators (
+,-, etc.)
In the process of doing so, we will learn about
- Intermediate Forms
- Normalization
Branches
Lets start first with branches (conditionals).
We will stick to our recipe of:
- Build intuition with examples,
- Model problem with types,
- Implement with type-transforming-functions,
- Validate with tests.
Examples
First, lets look at some examples of what we mean by branches.
- For now, lets treat
0as “false” and non-zero as “true”
Example: If1
if 10:
22
else:
sub1(0)- Since
10is not0we evaluate the “then” case to get22
Example: If2
if sub(1):
22
else:
sub1(0)- Since
sub(1)is0we evaluate the “else” case to get-1
QUIZ: If3
if-else is also an expression so we can nest them:
What should the following evaluate to?
let x = if sub(1):
22
else:
sub1(0)
in
if x:
add1(x)
else:
999- A.
999 - B.
0 - C.
1 - D.
1000 - E.
-1
Control Flow in Assembly
To compile branches, we will use labels, comparisons and jumps
Labels
our_code_label:
...Labels are “landmarks”
from which execution (control-flow) can be started, or
to which it can be diverted
Comparisons
cmp a1, a2Perform a (numeric) comparison between the values
a1anda2, andStore the result in a special processor flag
Jumps
jmp LABEL # jump unconditionally (i.e. always)
je LABEL # jump if previous comparison result was EQUAL
jne LABEL # jump if previous comparison result was NOT-EQUAL Use the result of the flag set by the most recent cmp
- To continue execution from the given
LABEL
QUIZ
Which of the following is a valid x86 encoding of
if 10:
22
else
33
Strategy
To compile an expression of the form
if eCond:
eThen
else:
eElseWe will:
- Compile
eCond - Compare the result (in
rax) against0 - Jump if the result is zero to a special
"IfFalse"label- At which we will evaluate
eElse, - Ending with a special
"IfExit"label.
- At which we will evaluate
- (Otherwise) continue to evaluate
eTrue- And then jump (unconditionally) to the
"IfExit"label.
- And then jump (unconditionally) to the
Example: If-Expressions to Asm
Lets see how our strategy works by example:
Example: if1

Example: if2

Example: if3

Oops, cannot reuse labels across if-expressions!
- Can’t use same label in two places (invalid assembly)

Oops, need distinct labels for each branch!
- Require distinct tags for each
if-elseexpression

Types: Source
Lets modify the Source Expression to add if-else expressions
data Expr a
= Number Int a
| Add1 (Expr a) a
| Sub1 (Expr a) a
| Let Id (Expr a) (Expr a) a
| Var Id a
| If (Expr a) (Expr a) (Expr a) aPolymorphic tags of type a for each sub-expression
- We can have different types of tags
- e.g. Source-Position information for error messages
Lets define a name for Tag (just integers).
type Tag = IntWe will now use:
type BareE = Expr () -- AST after parsing
type TagE = Expr Tag -- AST with distinct tags
Types: Assembly
Now, lets extend the Assembly with labels, comparisons and jumps:
data Label
= BranchFalse Tag
| BranchExit Tag
data Instruction
= ...
| ICmp Arg Arg -- Compare two arguments
| ILabel Label -- Create a label
| IJmp Label -- Jump always
| IJe Label -- Jump if equal
| IJne Label -- Jump if not-equal
Transforms
We can’t expect programmer to put in tags (yuck.)
- Lets squeeze in a
taggingtransform into our pipeline

Transforms: Parse
Just as before, but now puts a dummy () into each position
λ> let parseStr s = fmap (const ()) (parse "" s)
λ> let e = parseStr "if 1: 22 else: 33"
λ> e
If (Number 1 ()) (Number 22 ()) (Number 33 ()) ()
λ> label e
If (Number 1 ((),0)) (Number 22 ((),1)) (Number 33 ((),2)) ((),3)
Transforms: Tag
The key work is done by doTag i e
- Recursively walk over the
BareEnamedestarting tagging at counteri - Return a pair
(i', e')of updated counteri'and tagged expressione'
QUIZ
doTag :: Int -> BareE -> (Int, TagE)
doTag i (Number n _) = (i + 1 , Number n i)
doTag i (Var x _) = (i + 1 , Var x i)
doTag i (Let x e1 e2 _) = (_2 , Let x e1' e2' i2)
where
(i1, e1') = doTag i e1
(i2, e2') = doTag _1 e2What expressions shall we fill in for _1 and _2 ?
{- A -} _1 = i
_2 = i + 1
{- B -} _1 = i
_2 = i1 + 1
{- C -} _1 = i
_2 = i2 + 1
{- D -} _1 = i1
_2 = i2 + 1
{- E -} _1 = i2
_2 = i1 + 1
(ProTip: Use mapAccumL)
We can now tag the whole program by
Calling
doTagwith the initial counter (e.g.0),Throwing away the final counter.
tag :: BareE -> TagE
tag e = e' where (_, e') = doTag 0 e
Transforms: Code Generation
Now that we have the tags we lets implement our compilation strategy
compile env (If eCond eTrue eFalse i)
= compile env eCond ++ -- compile `eCond`
[ ICmp (Reg RAX) (Const 0) -- compare result to 0
, IJe (BranchFalse i) -- if-zero then jump to 'False'-block
]
++ compile env eTrue ++ -- code for `True`-block
[ IJmp lExit ] -- jump to exit (skip `False`-block!)
++
ILabel (BranchFalse i) -- start of `False`-block
: compile env eFalse ++ -- code for `False`-block
[ ILabel (BranchExit i) ] -- exit
Recap: Branches
Tageach sub-expression,- Use tag to generate control-flow labels implementing branch.
Lesson: Tagged program representation simplifies compilation…
- Next: another example of how intermediate representations help.
Binary Operations
You know the drill.
- Build intuition with examples,
- Model problem with types,
- Implement with type-transforming-functions,
- Validate with tests.
Compiling Binary Operations
Lets look at some expressions and figure out how they would get compiled.
- Recall: We want the result to be in
raxafter the instructions finish.
QUIZ
What is the assembly corresponding to 33 - 10 ?
?1 rax, ?2
?3 rax, ?4A.
?1 = sub,?2 = 33,?3 = mov,?4 = 10B.
?1 = mov,?2 = 33,?3 = sub,?4 = 10C.
?1 = sub,?2 = 10,?3 = mov,?4 = 33D.
?1 = mov,?2 = 10,?3 = sub,?4 = 33
Example: Bin1
Lets start with some easy ones. The source:

Strategy: Given n1 + n2
- Move
n1intorax, - Add
n2torax.
Example: Bin2
What if the first operand is a variable?

Simple, just copy the variable off the stack into rax
Strategy: Given x + n
- Move
x(from stack) intorax, - Add
ntorax.
Example: Bin3
Same thing works if the second operand is a variable.

Strategy: Given x + n
- Move
x(from stack) intorax, - Add
ntorax.
QUIZ
What is the assembly corresponding to (10 + 20) * 30 ?
mov rax, 10
?1 rax, ?2
?3 rax, ?4A.
?1 = add,?2 = 30,?3 = mul,?4 = 20B.
?1 = mul,?2 = 30,?3 = add,?4 = 20C.
?1 = add,?2 = 20,?3 = mul,?4 = 30D.
?1 = mul,?2 = 20,?3 = add,?4 = 30
Second Operand is Constant
In general, to compile e + n we can do
compile e
++ -- result of e is in rax
[add rax, n]
Example: Bin4
But what if we have nested expressions
(1 + 2) * (3 + 4)- Can compile
1 + 2with result inrax… - .. but then need to reuse
raxfor3 + 4
Need to save 1 + 2 somewhere!
Idea: How about use another register for 3 + 4?
But then what about (1 + 2) * (3 + 4) * (5 + 6) ?
- In general, may need to save more sub-expressions than we have registers.
Question:
Why are 1 + 2 and x + y so easy to compile but (1 + 2) * (3 + 4) not?
Idea: Immediate Expressions
Why were 1 + 2 and x + y so easy to compile but (1 + 2) * (3 + 4) not?
As 1 and x are immediate expressions: their values don’t require any computation!
Either a constant, or,
variable whose value is on the stack.
Idea: Administrative Normal Form (ANF)
An expression is in Administrative Normal Form (ANF)
ANF means all primitive operations have immediate arguments.
Primitive Operations: Those whose values we need for computation to proceed.
v1 + v2v1 - v2v1 * v2
QUIZ
ANF means all primitive operations have immediate arguments.
Is the following expression in ANF?
(1 + 2) * (4 - 3)A. Yes, its ANF.
B. Nope, its not, because of +
C. Nope, its not, because of *
D. Nope, its not, because of -
E. Huh, WTF is ANF?
Conversion to ANF
So, the below is not in ANF as * has non-immediate arguments
(1 + 2) * (4 - 3)However, note the following variant is in ANF
let t1 = 1 + 2
, t2 = 4 - 3
in
t1 * t2How can we compile the above code?
; TODO in class
Binary Operations: Strategy
We can convert any expression to ANF
- By adding “temporary” variables for sub-expressions

- Step 1: Compiling ANF into Assembly
- Step 2: Converting Expressions into ANF
Types: Source
Lets add binary primitive operators
data Prim2
= Plus | Minus | Timesand use them to extend the source language:
data Expr a
= ...
| Prim2 Prim2 (Expr a) (Expr a) aSo, for example, 2 + 3 would be parsed as:
Prim2 Plus (Number 2 ()) (Number 3 ()) ()
Types: Assembly
Need to add X86 instructions for primitive arithmetic:
data Instruction
= ...
| IAdd Arg Arg
| ISub Arg Arg
| IMul Arg Arg
Types: ANF
We can define a separate type for ANF (try it!)
… but …
super tedious as it requires duplicating a bunch of code.
Instead, lets write a function that describes immediate expressions
isImm :: Expr a -> Bool
isImm (Number _ _) = True
isImm (Var _ _) = True
isImm _ = FalseWe can now think of immediate expressions as:
type ImmExpr = {e:Expr | isImm e == True}The subset of Expr such that isImm returns True
QUIZ
Similarly, lets write a function that describes ANF expressions
ANF means all primitive operations have immediate arguments.
isAnf :: Expr a -> Bool
isAnf (Number _ _) = True
isAnf (Var _ _) = True
isAnf (Prim2 _ e1 e2 _) = _1
isAnf (If e1 e2 e3 _) = _2
isAnf (Let x e1 e2 _) = _3What should we fill in for _1?
{- A -} isAnf e1
{- B -} isAnf e2
{- C -} isAnf e1 && isAnf e2
{- D -} isImm e1 && isImm e2
{- E -} isImm e2
QUIZ
Similarly, lets write a function that describes ANF expressions
ANF means all primitive operations have immediate arguments.
isAnf :: Expr a -> Bool
isAnf (Number _ _) = True
isAnf (Var _ _) = True
isAnf (Prim1 _ e1 _) = isAnf e1
isAnf (Prim2 _ e1 e2 _) = isImm e1 && isImm e2
isAnf (If e1 e2 e3 _) = _2 && isANF e2 && isANF e3
isAnf (Let x e1 e2 _) = isANF e1 && isANF e2 What should we fill in for _2?
{- A -} isAnf e1
{- B -} isImm e1
{- C -} True
{- D -} FalseWe can now think of ANF expressions as:
type AnfExpr = {e:Expr | isAnf e == True}The subset of Expr such that isAnf returns True
Use the above function to test our ANF conversion.
Types & Strategy
Writing the type aliases:
type BareE = Expr ()
type AnfE = Expr () -- such that isAnf is True
type AnfTagE = Expr Tag -- such that isAnf is True
type ImmTagE = Expr Tag -- such that isImm is Truewe get the overall pipeline:

Transforms: Compiling AnfTagE to Asm

The compilation from ANF is easy, lets recall our examples and strategy:
Strategy: Given v1 + v2 (where v1 and v2 are immediate expressions)
- Move
v1intorax, - Add
v2torax.
compile :: Env -> TagE -> Asm
compile env (Prim2 o v1 v2)
= [ IMov (Reg RAX) (immArg env v1)
, (prim2 o) (Reg RAX) (immArg env v2)
]where we have a helper to find the Asm variant of a Prim2 operation
prim2 :: Prim2 -> Arg -> Arg -> Instruction
prim2 Plus = IAdd
prim2 Minus = ISub
prim2 Times = IMuland another to convert an immediate expression to an x86 argument:
immArg :: Env -> ImmTag -> Arg
immArg _ (Number n _) = Const n
immArg env (Var x _) = RegOffset RBP i
where
i = fromMaybe err (lookup x env)
err = error (printf "Error: '%s' is unbound" x)
QUIZ
Which of the below are in ANF ?
{- 1 -} 2 + 3 + 4
{- 2 -} let x = 12 in
x + 1
{- 3 -} let x = 12
, y = x + 6
in
x + y
{- 4 -} let x = 12
, y = 18
, t = x + y + 1
in
if t: 7 else: 9A.
1, 2, 3, 4B.
1, 2, 3C.
2, 3, 4D.
1, 2E.
2, 3
Transforms: Compiling Bare to Anf
Next lets focus on A-Normalization i.e. transforming expressions into ANF

A-Normalization
We can fill in the base cases easily
anf (Number n) = Number n
anf (Var x) = Var xInteresting cases are the binary operations
Example: Anf-1
Left operand is not immediate

Key Idea: Helper Function
imm :: BareE -> ([(Id, AnfE)], ImmE)imm e returns ([(t1, a1),...,(tn, an)], v) where
ti, aiare new temporary variables bound to ANF expressionsvis an immediate value (either a constant or variable)
Such that e is equivalent to
let t1 = a1
, ...
, tn = an
in
vLets look at some more examples.
Example: Anf-2
Left operand is not internally immediate

Example: Anf-3
Both operands are not immediate

ANF: General Strategy

- Invoke
immon both the operands - Concat the
letbindings - Apply the binary operator to the immediate values
ANF Implementation: Binary Operations
Lets implement the above strategy
anf (Prim2 o e1 e2) = lets (b1s ++ b2s)
(Prim2 o (Var v1) (Var v2))
where
(b1s, v1) = imm e1
(b2s, v2) = imm e2
lets :: [(Id, AnfE)] -> AnfE -> AnfE
lets [] e' = e
lets ((x,e):bs) e' = Let x e (lets bs e')Intuitively, lets stitches together a bunch of definitions:
lets [(x1, e1), (x2, e2), (x3, e3)] e
===> Let x1 e1 (Let x2 e2 (Let x3 e3 e))
ANF Implementation: Let-bindings
For Let just make sure we recursively anf the sub-expressions.
anf (Let x e1 e2) = Let x e1' e2'
where
e1' = anf e1
e2' = anf e2
ANF Implementation: Branches
Same principle applies to If
- use
anfto recursively transform the branches.
anf (If e1 e2 e3) = If e1' e2' e3'
where
e1' = anf e1
e2' = anf e2
e3' = anf e3
ANF: Making Arguments Immediate via imm
The workhorse is the function
imm :: BareE -> ([(Id, AnfE)], ImmE)which creates temporary variables to crunch an arbitrary Bare into an immediate value.
No need to create an variables if the expression is already immediate:
imm (Number n l) = ( [], Number n l )
imm (Id x l) = ( [], Id x l )The tricky case is when the expression has a primitive operation:
imm (Prim2 o e1 e2) = ( b1s ++ b2s ++ [(t, Prim2 o v1 v2)]
, Id t )
t = makeFreshVar ()
(b1s, v1) = imm e1
(b2s, v2) = imm e2 Oh, what shall we do when:
imm (If e1 e2 e3) = ???
imm (Let x e1 e2) = ???Lets look at an example for inspiration.

That is, simply
anfthe relevant expressions,- bind them to a fresh variable.
imm e@(If _ _ _) = immExp e
imm e@(Let _ _ _) = immExp e
immExp :: Expr -> ([(Id, AnfE)], ImmE)
immExp e = ([(t, e')], t)
where
e' = anf e
t = makeFreshVar ()
One last thing: Whats up with makeFreshVar ?
Wait a minute, what is this magic FRESH ?
How can we create distinct names out of thin air?
(Sorry, no “global variables” in Haskell…)
We will use a counter, but will pass its value around
Just like
doTag
anf :: Int -> BareE -> (Int, AnfE)
anf i (Number n l) = (i, Number n l)
anf i (Id x l) = (i, Id x l)
anf i (Let x e b l) = (i'', Let x e' b' l)
where
(i', e') = anf i e
(i'', b') = anf i' b
anf i (Prim2 o e1 e2 l) = (i'', lets (b1s ++ b2s) (Prim2 o e1' e2' l))
where
(i' , b1s, e1') = imm i e1
(i'', b2s, e2') = imm i' e2
anf i (If c e1 e2 l) = (i''', lets bs (If c' e1' e2' l))
where
(i' , bs, c') = imm i c
(i'' , e1') = anf i' e1
(i''', e2') = anf i'' e2and
imm :: Int -> AnfE -> (Int, [(Id, AnfE)], ImmE)
imm i (Number n l) = (i , [], Number n l)
imm i (Var x l) = (i , [], Var x l)
imm i (Prim2 o e1 e2 l) = (i''', bs, Var v l)
where
(i' , b1s, v1) = imm i e1
(i'' , b2s, v2) = imm i' e2
(i''', v) = fresh i''
bs = b1s ++ b2s ++ [(v, Prim2 o v1 v2 l)]
imm i e@(If _ _ _ l) = immExp i e
imm i e@(Let _ _ _ l) = immExp i e
immExp :: Int -> BareE -> (Int, [(Id, AnfE)], ImmE)
immExp i e l = (i'', bs, Var v ())
where
(i' , e') = anf i e
(i'', v) = fresh i'
bs = [(v, e')]where now, the fresh function returns a new counter and a variable
fresh :: Int -> (Int, Id)
fresh n = (n+1, "t" ++ show n)Note this is super clunky. There is a really slick way to write the above code without the clutter of the i but thats too much of a digression, but feel free to look it up yourself
Recap and Summary
Just created Boa with
- Branches (
if-expressions) - Binary Operators (
+,-, etc.)
In the process of doing so, we will learned about
- Intermediate Forms
- Normalization
Specifically,
