BOA: Branches and Binary Operators

Next, lets add

Branches (if-expressions)
Binary Operators (+, -, etc.)

In the process of doing so, we will learn about

Intermediate Forms
Normalization

Branches

Lets start first with branches (conditionals).

We will stick to our recipe of:

Build intuition with examples,
Model problem with types,
Implement with type-transforming-functions,
Validate with tests.

Examples

First, lets look at some examples of what we mean by branches.

For now, lets treat 0 as “false” and non-zero as “true”

Example: `If1`

if 10:
  22
else:
  sub1(0)

Since 10 is not 0 we evaluate the “then” case to get 22

Example: `If2`

if sub(1):
  22
else:
  sub1(0)

Since sub(1) is 0 we evaluate the “else” case to get -1

QUIZ: `If3`

if-else is also an expression so we can nest them:

What should the following evaluate to?

let x = if sub(1):
          22
        else:
          sub1(0)
in
  if x:
    add1(x)
  else:
    999

A. 999
B. 0
C. 1
D. 1000
E. -1

Control Flow in Assembly

To compile branches, we will use labels, comparisons and jumps

Labels

our_code_label:
  ...

Labels are “landmarks”

from which execution (control-flow) can be started, or
to which it can be diverted

Comparisons

cmp a1, a2

Perform a (numeric) comparison between the values a1 and a2, and
Store the result in a special processor flag

Jumps

jmp LABEL     # jump unconditionally (i.e. always)
je  LABEL     # jump if previous comparison result was EQUAL  
jne LABEL     # jump if previous comparison result was NOT-EQUAL

Use the result of the flag set by the most recent cmp

To continue execution from the given LABEL

QUIZ

Which of the following is a valid x86 encoding of

if 10:
  22
else
  33

Strategy

To compile an expression of the form

if eCond:
  eThen
else:
  eElse

We will:

Compile eCond
Compare the result (in rax) against 0
Jump if the result is zero to a special "IfFalse" label
- At which we will evaluate eElse,
- Ending with a special "IfExit" label.
(Otherwise) continue to evaluate eTrue
- And then jump (unconditionally) to the "IfExit" label.

Example: If-Expressions to `Asm`

Lets see how our strategy works by example:

Example: if1

Example: if2

Example: if3

Oops, cannot reuse labels across if-expressions!

Can’t use same label in two places (invalid assembly)

Oops, need distinct labels for each branch!

Require distinct tags for each if-else expression

Types: Source

Lets modify the Source Expression to add if-else expressions

data Expr a
  = Number Int                        a
  | Add1   (Expr a)                   a
  | Sub1   (Expr a)                   a
  | Let    Id (Expr a) (Expr a)       a
  | Var    Id                         a
  | If     (Expr a) (Expr a) (Expr a) a

Polymorphic tags of type a for each sub-expression

We can have different types of tags
e.g. Source-Position information for error messages

Lets define a name for Tag (just integers).

type Tag   = Int

We will now use:

type BareE = Expr ()     -- AST after parsing
type TagE  = Expr Tag    -- AST with distinct tags

Types: Assembly

Now, lets extend the Assembly with labels, comparisons and jumps:

data Label
  = BranchFalse Tag
  | BranchExit  Tag

data Instruction
  = ...
  | ICmp   Arg   Arg  -- Compare two arguments
  | ILabel Label      -- Create a label
  | IJmp   Label      -- Jump always
  | IJe    Label      -- Jump if equal  
  | IJne   Label      -- Jump if not-equal

Transforms

We can’t expect programmer to put in tags (yuck.)

Lets squeeze in a tagging transform into our pipeline

Transforms: Parse

Just as before, but now puts a dummy () into each position

λ> let parseStr s = fmap (const ()) (parse "" s)

λ> let e = parseStr "if 1: 22 else: 33"

λ> e
If (Number 1 ()) (Number 22 ()) (Number 33 ()) ()

λ> label e
If (Number 1 ((),0)) (Number 22 ((),1)) (Number 33 ((),2)) ((),3)

Transforms: Tag

The key work is done by doTag i e

Recursively walk over the BareE named e starting tagging at counter i
Return a pair (i', e') of updated counter i' and tagged expression e'

QUIZ

doTag :: Int -> BareE -> (Int, TagE)
doTag i (Number n _)    = (i + 1 , Number n i)
doTag i (Var    x _)    = (i + 1 , Var     x i)
doTag i (Let x e1 e2 _) = (_2    , Let x e1' e2' i2)
  where
    (i1, e1')           = doTag i  e1
    (i2, e2')           = doTag _1 e2

What expressions shall we fill in for _1 and _2 ?

{- A -}   _1 = i
          _2 = i + 1

{- B -}   _1 = i
          _2 = i1 + 1

{- C -}   _1 = i
          _2 = i2 + 1

{- D -}   _1 = i1
          _2 = i2 + 1

{- E -}   _1 = i2
          _2 = i1 + 1

(ProTip: Use mapAccumL)

We can now tag the whole program by

Calling doTag with the initial counter (e.g. 0),
Throwing away the final counter.

tag :: BareE -> TagE
tag e = e'  where  (_, e') = doTag 0 e

Transforms: Code Generation

Now that we have the tags we lets implement our compilation strategy

compile env (If eCond eTrue eFalse i)
  =   compile env eCond ++          -- compile `eCond`
    [ ICmp (Reg RAX) (Const 0)      -- compare result to 0
    , IJe (BranchFalse i)           -- if-zero then jump to 'False'-block
    ]
   ++ compile env eTrue  ++         -- code for `True`-block
    [ IJmp   lExit      ]           -- jump to exit (skip `False`-block!)
   ++
      ILabel (BranchFalse i)        -- start of `False`-block
   : compile env eFalse ++          -- code for `False`-block
    [ ILabel (BranchExit i) ]       -- exit

Recap: Branches

Tag each sub-expression,
Use tag to generate control-flow labels implementing branch.

Lesson: Tagged program representation simplifies compilation…

Next: another example of how intermediate representations help.

Binary Operations

You know the drill.

Build intuition with examples,
Model problem with types,
Implement with type-transforming-functions,
Validate with tests.

Compiling Binary Operations

Lets look at some expressions and figure out how they would get compiled.

Recall: We want the result to be in rax after the instructions finish.

QUIZ

What is the assembly corresponding to 33 - 10 ?

?1 rax, ?2
?3 rax, ?4

A. ?1 = sub, ?2 = 33, ?3 = mov, ?4 = 10
B. ?1 = mov, ?2 = 33, ?3 = sub, ?4 = 10
C. ?1 = sub, ?2 = 10, ?3 = mov, ?4 = 33
D. ?1 = mov, ?2 = 10, ?3 = sub, ?4 = 33

Example: Bin1

Lets start with some easy ones. The source:

Strategy: Given n1 + n2

Move n1 into rax,
Add n2 to rax.

Example: Bin2

What if the first operand is a variable?

Simple, just copy the variable off the stack into rax

Strategy: Given x + n

Move x (from stack) into rax,
Add n to rax.

Example: Bin3

Same thing works if the second operand is a variable.

Strategy: Given x + n

Move x (from stack) into rax,
Add n to rax.

QUIZ

What is the assembly corresponding to (10 + 20) * 30 ?

mov rax, 10
?1  rax, ?2
?3  rax, ?4

A. ?1 = add, ?2 = 30, ?3 = mul, ?4 = 20
B. ?1 = mul, ?2 = 30, ?3 = add, ?4 = 20
C. ?1 = add, ?2 = 20, ?3 = mul, ?4 = 30
D. ?1 = mul, ?2 = 20, ?3 = add, ?4 = 30

Second Operand is Constant

In general, to compile e + n we can do

     compile e      
  ++              -- result of e is in rax
     [add rax, n]

Example: Bin4

But what if we have nested expressions

(1 + 2) * (3 + 4)

Can compile 1 + 2 with result in rax …
.. but then need to reuse rax for 3 + 4

Need to save 1 + 2 somewhere!

Idea: How about use another register for `3 + 4`?

But then what about (1 + 2) * (3 + 4) * (5 + 6) ?

In general, may need to save more sub-expressions than we have registers.

Question:

Why are 1 + 2 and x + y so easy to compile but (1 + 2) * (3 + 4) not?

Idea: Immediate Expressions

Why were 1 + 2 and x + y so easy to compile but (1 + 2) * (3 + 4) not?

As 1 and x are immediate expressions: their values don’t require any computation!

Either a constant, or,
variable whose value is on the stack.

Idea: Administrative Normal Form (ANF)

An expression is in Administrative Normal Form (ANF)

ANF means all primitive operations have immediate arguments.

Primitive Operations: Those whose values we need for computation to proceed.

v1 + v2
v1 - v2
v1 * v2

QUIZ

ANF means all primitive operations have immediate arguments.

Is the following expression in ANF?

(1 + 2) * (4 - 3)

A. Yes, its ANF.

B. Nope, its not, because of +

C. Nope, its not, because of *

D. Nope, its not, because of -

E. Huh, WTF is ANF?

Conversion to ANF

So, the below is not in ANF as * has non-immediate arguments

(1 + 2) * (4 - 3)

However, note the following variant is in ANF

let t1 = 1 + 2
  , t2 = 4 - 3
in  
    t1 * t2

How can we compile the above code?

; TODO in class

Binary Operations: Strategy

We can convert any expression to ANF

By adding “temporary” variables for sub-expressions

Step 1: Compiling ANF into Assembly
Step 2: Converting Expressions into ANF

Types: Source

Lets add binary primitive operators

data Prim2
  = Plus | Minus | Times

and use them to extend the source language:

data Expr a
  = ...
  | Prim2 Prim2  (Expr a) (Expr a) a

So, for example, 2 + 3 would be parsed as:

Prim2 Plus (Number 2 ()) (Number 3 ()) ()

Types: Assembly

Need to add X86 instructions for primitive arithmetic:

data Instruction
  = ...
  | IAdd Arg Arg
  | ISub Arg Arg
  | IMul Arg Arg

Types: ANF

We can define a separate type for ANF (try it!)

… but …

super tedious as it requires duplicating a bunch of code.

Instead, lets write a function that describes immediate expressions

isImm :: Expr a -> Bool
isImm (Number _ _) = True
isImm (Var    _ _) = True
isImm _            = False

We can now think of immediate expressions as:

type ImmExpr = {e:Expr | isImm e == True}

The subset of Expr such that isImm returns True

QUIZ

Similarly, lets write a function that describes ANF expressions

ANF means all primitive operations have immediate arguments.

isAnf :: Expr a -> Bool
isAnf (Number  _     _) = True
isAnf (Var     _     _) = True
isAnf (Prim2 _ e1 e2 _) = _1
isAnf (If e1 e2 e3   _) = _2
isAnf (Let x e1 e2   _) = _3

What should we fill in for _1?

{- A -} isAnf e1
{- B -} isAnf e2
{- C -} isAnf e1 && isAnf e2
{- D -} isImm e1 && isImm e2
{- E -} isImm e2

QUIZ

Similarly, lets write a function that describes ANF expressions

ANF means all primitive operations have immediate arguments.

isAnf :: Expr a -> Bool
isAnf (Number  _     _) = True
isAnf (Var     _     _) = True
isAnf (Prim1 _ e1 _)    = isAnf e1 
isAnf (Prim2 _ e1 e2 _) = isImm e1 && isImm e2 
isAnf (If e1 e2 e3   _) = _2       && isANF e2 && isANF e3
isAnf (Let x e1 e2   _) = isANF e1 && isANF e2

What should we fill in for _2?

{- A -} isAnf e1
{- B -} isImm e1
{- C -} True
{- D -} False

We can now think of ANF expressions as:

type AnfExpr = {e:Expr | isAnf e == True}

The subset of Expr such that isAnf returns True

Use the above function to test our ANF conversion.

Types & Strategy

Writing the type aliases:

type BareE   = Expr ()
type AnfE    = Expr ()  -- such that isAnf is True
type AnfTagE = Expr Tag -- such that isAnf is True
type ImmTagE = Expr Tag -- such that isImm is True

we get the overall pipeline:

Transforms: Compiling `AnfTagE` to `Asm`

The compilation from ANF is easy, lets recall our examples and strategy:

Strategy: Given v1 + v2 (where v1 and v2 are immediate expressions)

Move v1 into rax,
Add v2 to rax.

compile :: Env -> TagE -> Asm
compile env (Prim2 o v1 v2)
  = [ IMov      (Reg RAX) (immArg env v1)
    , (prim2 o) (Reg RAX) (immArg env v2)
    ]

where we have a helper to find the Asm variant of a Prim2 operation

prim2 :: Prim2 -> Arg -> Arg -> Instruction
prim2 Plus  = IAdd
prim2 Minus = ISub
prim2 Times = IMul

and another to convert an immediate expression to an x86 argument:

immArg :: Env -> ImmTag -> Arg
immArg _   (Number n _) = Const n
immArg env (Var    x _) = RegOffset RBP i
  where
    i                   = fromMaybe err (lookup x env)
    err                 = error (printf "Error: '%s' is unbound" x)

QUIZ

Which of the below are in ANF ?

{- 1 -} 2 + 3 + 4

{- 2 -} let x = 12 in
          x + 1

{- 3 -} let x = 12
          , y = x + 6
        in
          x + y

{- 4 -} let x = 12
          , y = 18
          , t = x + y + 1
        in
          if t: 7 else: 9

A. 1, 2, 3, 4
B. 1, 2, 3
C. 2, 3, 4
D. 1, 2
E. 2, 3

Transforms: Compiling `Bare` to `Anf`

Next lets focus on A-Normalization i.e. transforming expressions into ANF

A-Normalization

We can fill in the base cases easily

anf (Number n)      = Number n
anf (Var x)         = Var x

Interesting cases are the binary operations

Example: Anf-1

Left operand is not immediate

Key Idea: Helper Function

imm :: BareE -> ([(Id, AnfE)], ImmE)

imm e returns ([(t1, a1),...,(tn, an)], v) where

ti, ai are new temporary variables bound to ANF expressions
v is an immediate value (either a constant or variable)

Such that e is equivalent to

let t1 = a1
  , ...
  , tn = an
in
   v

Lets look at some more examples.

Example: Anf-2

Left operand is not internally immediate

Example: Anf-3

Both operands are not immediate

ANF: General Strategy

Invoke imm on both the operands
Concat the let bindings
Apply the binary operator to the immediate values

ANF Implementation: Binary Operations

Lets implement the above strategy

anf (Prim2 o e1 e2) = lets (b1s ++ b2s)
                        (Prim2 o (Var v1) (Var v2))
  where
    (b1s, v1)       = imm e1
    (b2s, v2)       = imm e2

lets :: [(Id, AnfE)] -> AnfE -> AnfE
lets []         e' = e
lets ((x,e):bs) e' = Let x e (lets bs e')

Intuitively, lets stitches together a bunch of definitions:

lets [(x1, e1), (x2, e2), (x3, e3)] e
  ===> Let x1 e1 (Let x2 e2 (Let x3 e3 e))

ANF Implementation: Let-bindings

For Let just make sure we recursively anf the sub-expressions.

anf (Let x e1 e2)   = Let x e1' e2'
  where
    e1'             = anf e1
    e2'             = anf e2

ANF Implementation: Branches

Same principle applies to If

use anf to recursively transform the branches.

anf (If e1 e2 e3) = If e1' e2' e3'  
  where
    e1'           = anf e1
    e2'           = anf e2
    e3'           = anf e3

ANF: Making Arguments Immediate via `imm`

The workhorse is the function

imm :: BareE -> ([(Id, AnfE)], ImmE)

which creates temporary variables to crunch an arbitrary Bare into an immediate value.

No need to create an variables if the expression is already immediate:

imm (Number n l) = ( [], Number n l )
imm (Id     x l) = ( [], Id     x l )

The tricky case is when the expression has a primitive operation:

imm (Prim2 o e1 e2) = ( b1s ++ b2s ++ [(t,  Prim2 o v1 v2)]
                      , Id t  )
  t                 = makeFreshVar ()
  (b1s, v1)         = imm e1
  (b2s, v2)         = imm e2

Oh, what shall we do when:

imm (If e1 e2 e3)   = ???
imm (Let x e1 e2)   = ???

Lets look at an example for inspiration.

That is, simply

anf the relevant expressions,
bind them to a fresh variable.

imm e@(If _ _ _)  = immExp e
imm e@(Let _ _ _) = immExp e

immExp :: Expr -> ([(Id, AnfE)], ImmE)
immExp e = ([(t, e')], t)
  where
    e'   = anf e
    t    = makeFreshVar ()

One last thing: Whats up with `makeFreshVar` ?

Wait a minute, what is this magic FRESH ?

How can we create distinct names out of thin air?

(Sorry, no “global variables” in Haskell…)

We will use a counter, but will pass its value around

Just like doTag

anf :: Int -> BareE -> (Int, AnfE)

anf i (Number n l)      = (i, Number n l)

anf i (Id     x l)      = (i, Id     x l)

anf i (Let x e b l)     = (i'', Let x e' b' l)
  where
    (i',  e')           = anf i e
    (i'', b')           = anf i' b

anf i (Prim2 o e1 e2 l) = (i'', lets (b1s ++ b2s) (Prim2 o e1' e2' l))
  where
    (i' , b1s, e1')     = imm i  e1
    (i'', b2s, e2')     = imm i' e2

anf i (If c e1 e2 l)    = (i''', lets bs  (If c' e1' e2' l))
  where
    (i'  , bs, c')      = imm i   c
    (i'' ,     e1')     = anf i'  e1
    (i''',     e2')     = anf i'' e2

and

imm :: Int -> AnfE -> (Int, [(Id, AnfE)], ImmE)

imm i (Number n l)        = (i  , [], Number n l)

imm i (Var x l)           = (i  , [], Var x l)

imm i (Prim2 o e1 e2 l) = (i''', bs, Var v l)
  where
    (i'  , b1s, v1)     = imm i  e1
    (i'' , b2s, v2)     = imm i' e2
    (i''', v)           = fresh i''
    bs                  = b1s ++ b2s ++ [(v, Prim2 o v1 v2 l)]

imm i e@(If _ _ _  l)   = immExp i e

imm i e@(Let _ _ _ l)   = immExp i e

immExp :: Int -> BareE -> (Int, [(Id, AnfE)], ImmE)
immExp i e l  = (i'', bs, Var v ())
  where
    (i' , e') = anf i e
    (i'', v)  = fresh i'
    bs        = [(v, e')]

where now, the fresh function returns a new counter and a variable

fresh :: Int -> (Int, Id)
fresh n = (n+1, "t" ++ show n)

Note this is super clunky. There is a really slick way to write the above code without the clutter of the i but thats too much of a digression, but feel free to look it up yourself

Recap and Summary

Just created Boa with

Branches (if-expressions)
Binary Operators (+, -, etc.)

In the process of doing so, we will learned about

Intermediate Forms
Normalization

Specifically,

BOA: Branches and Binary Operators

Branches

Examples

Example: If1

Example: If2

QUIZ: If3

Control Flow in Assembly

Labels

Comparisons

Jumps

QUIZ

Strategy

Example: If-Expressions to Asm

Example: if1

Example: if2

Example: if3

Types: Source

Types: Assembly

Transforms

Transforms: Parse

Transforms: Tag

QUIZ

Transforms: Code Generation

Recap: Branches

Binary Operations

Compiling Binary Operations

QUIZ

Example: Bin1

Example: Bin2

Example: Bin3

QUIZ

Second Operand is Constant

Example: Bin4

Idea: How about use another register for 3 + 4?

Idea: Immediate Expressions

Idea: Administrative Normal Form (ANF)

QUIZ

Conversion to ANF

Binary Operations: Strategy

Types: Source

Types: Assembly

Types: ANF

QUIZ

QUIZ

Types & Strategy

Transforms: Compiling AnfTagE to Asm

QUIZ

Transforms: Compiling Bare to Anf

A-Normalization

Example: Anf-1

Example: Anf-2

Example: Anf-3

ANF: General Strategy

ANF Implementation: Binary Operations

ANF Implementation: Let-bindings

ANF Implementation: Branches

ANF: Making Arguments Immediate via imm

One last thing: Whats up with makeFreshVar ?

Recap and Summary

Example: `If1`

Example: `If2`

QUIZ: `If3`

Example: If-Expressions to `Asm`

Idea: How about use another register for `3 + 4`?

Transforms: Compiling `AnfTagE` to `Asm`

Transforms: Compiling `Bare` to `Anf`

ANF: Making Arguments Immediate via `imm`

One last thing: Whats up with `makeFreshVar` ?