bound provides a powerful but simple toolbox for dealing with capture-avoiding substitution in your code. It lets you use classes you already know: Monad, Foldable and Traversable to manipulate your syntax tree, and it factors out issues of name capture into a reusable monad transformer named Scope.

What I want to do today is help motivate the design of the bound library by talking a bit about alternatives. This may serve, with a few digressions, as a bit of a crash course on capture-avoiding substitution, for those who don't deal with this issue on a day-to-day basis. This presentation is based on a talk I gave on bound, but it has been upgraded to work with the School of Haskell tools for easier online consumption. In particular, I've tried to make the code fragments executable, and greatly expand on the content to cover things in greater depth. The original slides are also available, but were designed with me speaking over them in mind.

What is in a Name?

We use names in lots of contexts, but any program that deals with names has to deal with a number of issues such as capture avoidance and deciding alpha equivalance... and others that will come up as we go.

If you go to write a syntax tree, the dumbest thing that can possibly work would be something like:

type Name = String

data Exp 
  = Var Name 
  | App Exp Exp
  | Lam Name Exp
  deriving (Eq,Show,Read)

main = do
  print $ Var "x"
  print $ Lam "x" (Var "x")
  print $ Lam "x" (Lam "y" (Var "x"))

Two problems we want to address in almost any syntax tree are:

1.) Capture Avoidance

Blindly substituting

Lam "x" (Var "y")

into

Lam "y" (Var "z")

for z would yield

Lam "y" (Lam "x" (Var "y"))

which now causes the formerly free variable y to now reference the y bound by the outer lambda.

2.) Alpha Equivalence

Lam "x" (Var "x")

and

Lam "y" (Var "y")

both mean the same thing and it'd be nice to be able to check this easily, make them hash the same for common sub-expression elimination, etc.

There is a cottage industry of solutions to the naming problem:

e.g.

Naïve substitution
The Barendregt convention
HOAS
Weak HOAS/PHOAS
"I am not a Number: I am a Free Variable!"
The Locally Nameless Representation
Unbound, which mixes Barendregt with Locally Nameless
Locally Nameless Syntax with De Bruijn Indices

I'm not looking to address all of these here, just a few, to showcase issues with the different points in the design space, and then to try to offer up a nice point in the design space (bound) that combines many of the advantages with few of the disadvantages.

Naïve Substitution

Pros:

1.) The syntax trees are pretty and easy to read.

2.) Easy to use to get started

Cons:

1.) It is easy even for experts to make mistakes.

2.) Alpha equivalence checking is tedious

3.) It is comically slow.

import Data.List (union, span, (\\))

type Name = String

data Exp 
  = Var Name 
  | App Exp Exp
  | Lam Name Exp
  deriving (Eq,Show,Read)

-- show
freeVars :: Exp -> [Name]
freeVars (Var x) = [x]
freeVars (App a b) = freeVars a `union` freeVars b 
freeVars (Lam n x) = freeVars x \\ [n]

allVars :: Exp -> [Name]
allVars (Var x) = [x]
allVars (App a b) = allVars a `union` allVars b
allVars (Lam n x) = allVars x

subst :: Name -> Exp -> Exp -> Exp
subst x s b = sub vs0 b where
  sub _ e@(Var v)
    | v == x = s
    | otherwise = e
  sub vs e@(Lam v e')
    | v == x = e
    | v `elem` fvs = Lam v' (sub (v':vs) e'')
    | otherwise = Lam v (sub vs e') where
    v' = newId vs
    e'' = subst v (Var v') e'
  sub vs (App f a) = sub vs f `App` sub vs a
  fvs = freeVars s
  vs0 = fvs `union` allVars b
  
newId :: [Name] -> Name
newId vs = head (names \\ vs)

names :: [Name]
names = [ [i] | i <- ['a'..'z']] ++ [i : show j | j <- [1..], i <- ['a'..'z'] ]
-- /show

-- show and we can see that this deals with capture avoidance
main = print $ subst "z" (Lam "x" (Var "y")) (Lam "y" (Var "z"))
-- /show

This code is adapted from Lennart Augustsson's excellent λ-calculus cooked four ways, which provides a nice crash course on name binding for evaluation.

[Edited August 27 2013: The code fragment above was updated based on a conversation with Marc André Ziegert to deal with an issue where in the v elem fvs case under sub it wasn't properly updating the set of allVars b when it introduced v' into scope. I told you naïve substitution was tricky!]

The Barendregt Convention

This one is a serious contender. It is what GHC uses.

Pros:

1.) The Secrets of the Glasgow Haskell Compiler inliner paper by Simon and Simon describes a technique they call "the Rapier," which can make this really fast.

Cons:

1.) Easy even for experts to screw up.

2.) Alpha equivalence is still tedious.

3.) Need a globally unique variable supply. e.g. concurrent-supply

4.) The obvious implementation technique chews through a scarily large number of variable IDs! Without the Rapier there is a lot of "administrative" renaming going on.

Higher-Order Abstract Syntax (HOAS)

HOAS cheats and borrows substitution from the host language.

-- show
data Exp a
  = Var a
  | Lam (Exp a -> Exp a)
  | App (Exp a) (Exp a)
-- /show
main = putStrLn "It typechecks, but a Show instance here is hard!"

Pros:

1.) It provides ridiculously fast substitution by comparison to other techniques.

2.) Meijer and Hutton's "Bananas in Space", Fegaras and Sheard's "Revisiting Catamorphisms over Datatypes with Embedded Functions (or, Programs from Outer Space) and Weirich and Washburn's "Boxes Go Bananas: Encoding Higher-order Abstract Syntax with Parametric Polymorphism" each describe forms of catamorphism that can be used on these to work under binders.

Cons:

1.) It doesn't work in theorem provers like Coq or Agda, because Exp occurs in both positive and negative position, causing it to fail the positivity check.

2.) It is quite hard to work under binders. You often have to invert your control flow for passes and think "inside out."

3.) You have to deal with exotic terms that do bad things like inspect the expression they are given, so it is in some sense "too big."

4.) Alpha equivalence is still tedious.

5.) In "Rotten Bananas" on comonad.com I go through the issues with general recursion in this approach.

Variants such as Weak HOAS/PHOAS/Weirich and Washburn exist to address some of these issues (e.g. working in Coq/Agda, restoring positivity, ruling out exotic terms) at the expense of other problems (e.g. losing the notion of catamorphism or general recursion).

De Bruijn Indices

M'colleague Bob Atkey once memorably described the capacity to put up with De Bruijn indices as a Cylon detector, the kind of reverse Turing Test that the humans in Battlestar Galactica invent, the better to recognize one another by their common inadequacies. He had a point
-Conor McBride, "I am not a number, I am a classy hack"

We can split variables into bound and free:

-- show
{-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable #-}
import Data.Foldable
import Data.Traversable

data Exp a 
  = Free a
  | Bound !Int
  | App (Exp a) (Exp a)
  | Lam (Exp a)
  deriving (Eq,Show,Read,Functor,Foldable,Traversable)
-- /show
main = putStrLn "It typechecks, but it is boring."

We could define combinators to bind names and instantiate them

abstract :: Eq a => a -> Exp a -> Exp a
instantiate :: Exp a -> Exp a -> Exp a

but let's adopt Conor McBride's less error prone convention instead:

{-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable #-}
import Data.Foldable
import Data.Traversable
-- show
newtype Scope f a = Scope (f a) deriving (Eq,Show,Read,Functor,Foldable,Traversable)
data Exp a 
  = Free a
  | Bound !Int
  | App (Exp a) (Exp a)
  | Lam (Scope Exp a)
-- /show
  deriving (Eq,Show,Read,Functor,Foldable,Traversable)
main = putStrLn "It typechecks, and is even slightly interesting."

With Conor's convention, abstract and instantiate actually have useful types that prevent you from doing bad things. Renaming Bound to B and Free to F, then adapting the approach he takes in his excellent functional pearl with James McKinna "I am not a Number -- I am a Free Variable" to our running example to try to keep Conor's somewhat absurd sense of humor intact we get:

{-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable #-}
import Data.Foldable
import Data.Traversable

newtype Scope f a = Scope (f a) deriving (Eq,Show,Read,Functor,Foldable,Traversable)
data Exp a 
  = F a
  | B !Int
  | App (Exp a) (Exp a)
  | Lam (Scope Exp a)
  deriving (Eq,Show,Read,Functor,Foldable,Traversable)
-- show
abstract :: Eq a => a -> Exp a -> Scope Exp a
abstract me expr = Scope (letmeB 0 expr) where
  letmeB this (F you) | you == me = B this
                      | otherwise = F you
  letmeB this (B that)            = B that
  letmeB this (App fun arg)       = letmeB this fun `App` letmeB this arg
  letmeB this (Lam (Scope body))  = Lam $ Scope $ letmeB (succ this) body

instantiate :: Exp a -> Scope Exp a -> Exp a
instantiate what (Scope body) = what'sB 0 body where
  what'sB this (B that) | this == that = what
                        | otherwise    = B that
  what'sB this (F you)                 = F you
  what'sB this (App fun arg)           = what'sB this fun `App` what'sB this arg
  what'sB this (Lam (Scope body))      = Lam $ Scope $ what'sB (succ this) body
-- /show
main = putStrLn "It typechecks, and is even slightly interesting."

We could even make a monad that does capture avoiding substitution for Exp, but it is an awkward one-off experience.

Pros:

1.) Scope, abstract, and instantiate made it harder to screw up walking under binders.

2.) Alpha equivalence is just (==) due to the power of De Bruijn indices.

3.) We can make a Monad for Exp that does capture avoiding substitution.

4.) We can use Traversable and Foldable to find free variables and close terms.

Using these we can define combinators like

import Data.Foldable as Foldable
import Data.Traversable
  
-- show
closed :: Traversable f => f a -> Maybe (f b)
closed = traverse (const Nothing)

isClosed :: Foldable f => f a -> Bool
isClosed = Foldable.all (const False)
-- /show
    
main = putStrLn "The types check out."

to check for closed terms, and we can use `Exp Void` directly as a closed term.

Cons:

1.) We succ a lot in letmeB and what'sB.

2.) Illegal terms such as Lam (Scope (B 2)) still exist.

3.) We have to define one-off versions of abstract and instantiate for each expression type we come up with. This means this is a design pattern, not a library.

4.) The Monad for Exp is similarly a one-off deal.

Bird and Paterson, Part 1

We can turn to Bird and Paterson's encoding of De Bruijn indices in terms of polymorphic recursion from the first half of De Bruijn notation as a nested datatype.

{-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable #-}
import Data.Foldable
import Data.Traversable

-- show
data Exp a
  = Var a
  | App (Exp a) (Exp a)
  | Lam (Exp (Maybe a))
  deriving (Functor, Foldable, Traversable)
  
instance Monad Exp where
  return = Var
  Var a >>= f = f a
  App l r >>= f = App (l >>= f) (r >>= f)
  Lam xs >>= f = undefined -- go for it!
-- /show

main = print "Did you finish (>>=)?"

Pros:

1.) This eliminates the illegal terms from Con #2 above once and for all, and we can write a one-off Monad for it.

2.) We have the Maybe to mark our extra variables, so we can't forget to deal with our bound term at any point in our recursion.

Cons:

1.) We still succ a lot, except now we call succ Just.

2.) We still need to write one-off abstract and instantiate combinators.

3.) The Monad for Exp is a one-off deal.

4.) They lean somewhat unnecessarily on RankNTypes to manipulate these expressions in terms of their folds, when the Monad and other classes can all be defined within the confines of Haskell 98.

Attempt #1

We can modify Bird and Paterson's approach in a few ways, to steal some of the desirable properties from McBride's "I am not a Number; I'm a Free Variable" by noticing that Scope forms a Monad transformer!

{-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable #-}
import Control.Monad
import Control.Monad.Trans
import Data.Foldable
import Data.Functor
import Data.Traversable

-- show
newtype Scope f a = Scope { runScope :: f (Maybe a) }
  deriving (Functor, Foldable, Traversable)

instance Monad f => Monad (Scope f) where
  return = Scope . return . Just
  Scope m >>= f = undefined -- go for it!

instance MonadTrans Scope where
  lift = Scope . liftM Just
-- /show

main = print "Scope scopes out, but did you finish (>>=)?"

The astute observer will recognize that this form of Scope is just MaybeT from transformers.

We can finally even define abstract and instantiate from Conor's approach once and for all, completely independently of our expression type. Sadly, they find themselves now devoid of humor.

abstract :: (Functor f, Eq a) => a -> f a -> Scope f a
abstract x xs = Scope (fmap go xs) where
  go y = y <$ guard (x /= y)
  
instantiate :: Monad f => f a -> Scope f a -> f a
instantiate x (Scope xs) = xs >>= go where
  go Nothing = x
  go (Just y) = return y

This is starting to feel like a library, rather than design pattern.

Moreover, with that in hand, we can revisit the definition of Exp and define our Monad by borrowing from Scope's expression-type-agnostic definition.

{-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable #-}
import Control.Monad
import Control.Monad.Trans
import Data.Foldable
import Data.Functor
import Data.Traversable

newtype Scope f a = Scope { runScope :: f (Maybe a) }
  deriving (Functor, Foldable, Traversable)

instance Monad f => Monad (Scope f) where
  return = Scope . return . Just
  Scope m >>= f = Scope $ m >>= maybe (return Nothing) (runScope . f)
 
instance MonadTrans Scope where
  lift = Scope . liftM Just

abstract :: (Functor f, Eq a) => a -> f a -> Scope f a
abstract x xs = Scope (fmap go xs) where
  go y = y <$ guard (x /= y)
  
instantiate :: Monad f => f a -> Scope f a -> f a
instantiate x (Scope xs) = xs >>= go where
  go Nothing  = x
  go (Just y) = return y

-- show
data Exp a
  = Var a
  | App (Exp a) (Exp a)
  | Lam (Scope Exp a)
  deriving (Functor, Foldable, Traversable)
  
instance Monad Exp where
  return = Var
  Var a >>= f    = f a
  App l r >>= f  = App (l >>= f) (r >>= f)
  Lam body >>= f = Lam (body >>= lift . f)
-- /show

main = putStrLn "Now we're getting somewhere!"

Pros:

1.) We were able to factor out Scope, abstract and instantiate into code we can write once and package up.

2.) We've gained the benefits of McBride's conventions, as abstract and instantiate are really nice to reason about.

Cons:

1.) This is still going to succ just as much as our earlier De Bruijn solutions. Using lift to embed into Scope has to traverse all of the leaves to mangle them with a Just.

2.) Eq Ord, Show, and Read are now non-trivial to write due to polymorphic recursion.

Bird and Paterson, Part 2

Bird and Paterson did write the other half of their paper for a reason, though. In the other half they showed we can encode a generalized De Bruijn index notion using a different polymorphic recursion pattern.

data Exp a
  = Var a
  | App (Exp a) (Exp a)
  | Lam (Exp (Maybe (Exp a)))

The astute observer will now note that this is looking a bit less like MaybeT. So what happened?

Normally when you work with De Bruijn indices you succ only the variables down at the leaves.

What Bird and Paterson's generalized De Bruijn form allows you to do is Just née succ whole trees at a time!

If we adopt my MonadTrans variant of McBride's Scope as before, and generalize Bird and Paterson's approach, sticking to Haskell 98. We get very close to the final solution actually encoded in bound.

{-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable #-}
import Control.Monad
import Control.Monad.Trans
import Data.Foldable
import Data.Functor
import Data.Maybe
import Data.Traversable

-- show
newtype Scope f a = Scope { runScope :: f (Maybe (f a)) }
  deriving (Functor, Foldable, Traversable)
  
instance Monad f => Monad (Scope f) where
  return = Scope . return . Just . return
  Scope e >>= f = Scope $ e >>= \v -> case v of
    Nothing -> return Nothing
    Just ea -> ea >>= runScope . f
    
instance MonadTrans Scope where
  lift = Scope . return . Just
  
abstract :: (Monad f, Eq a) => a -> f a -> Scope f a
abstract a e = Scope (liftM k e) where
  k b = return b <$ guard (a /= b)

instantiate :: Monad f => f a -> Scope f a -> f a
instantiate k (Scope e) = e >>= fromMaybe k
-- /show
main = putStrLn "Almost there!"

The key difference realized by Bird and Paterson's design is that now, lifting an expression that does not have our bound term in it into our new Scope, no longer requires touching every leaf in the expression. In fact it is O(1)!

Scope is still a Monad transformer, but it is a new one!

This is a salvageable representation, but I want to take one more step before we turn it into a library, before we talk about Pros and Cons.

Bound: Generalized Generalized De Bruijn

Often we need to deal with simultaneous substitution of several variables. e.g. all of the variables bound by a pattern, all of the variables bound by a lambda, many variables bound by recursive let

De Bruijn even as generalized above still only lets me abstract or instantiate a single variable at a time.

What I want relates to EitherT as the above monad relates to MaybeT.

Finally, I'm going to alpha rename Either, because really when we do finally get to show you these syntax trees, they are going to be bad enough without meaningless Left and Right names cluttering up my tree. Let's use B for bound (pronounced zero) and F for free (pronounced succ) in homage to Conor's conventions above.

{-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable #-}
import Control.Monad
import Control.Monad.Trans
import Data.Foldable
import Data.Functor
import Data.Maybe
import Data.Traversable

-- show Var
data Var b a 
  = B b 
  | F a 
  deriving (Eq,Ord,Show,Read,Functor,Foldable,Traversable)

instance Monad (Var b) where
  return = F
  F a >>= f = f a
  B b >>= _ = B b
-- /show

-- show Scope
newtype Scope b f a = Scope { runScope :: f (Var b (f a)) }
  deriving (Functor,Foldable,Traversable)
  
instance Monad f => Monad (Scope b f) where
  return = Scope . return . F . return
  Scope e >>= f = Scope $ e >>= \v -> case v of
    B b  -> return (B b)
    F ea -> ea >>= runScope . f
    
instance MonadTrans (Scope b) where
  lift = Scope . return . F
-- /show

-- show Abstraction and Instantiation
abstract :: Monad f => (a -> Maybe b) -> f a -> Scope b f a
abstract f e = Scope (liftM k e) where
  k y = case f y of
    Just z  -> B z
    Nothing -> F (return y)

instantiate :: Monad f => (b -> f a) -> Scope b f a -> f a
instantiate k e = runScope e >>= \v -> case v of
  B b -> k b
  F a -> a
-- /show

main = putStrLn "It typechecks, therefore it must be correct!"

From here on out we can actually use the real bound library.

Pros:

1.) Because we can now succ/F a whole tree, we don't have to pay the usual De Bruijn performance tax. We get O(1) lifting, and when we instantiate we can skip past the whole F'd tree.

2.) We use Foldable and Traversable to extract information about free variables and do variable -> variable substitution.

3.) We can use the MonadTrans instance for Scope b to facilitate the construction of Exp. We do this so often that we turn the >>= lift . f idiom into a combinator (>>>=) that we'll talk about further when we talk about patterns.

This is a complete example syntax tree example that supports the use of `Foldable`/`Traversable`/`Monad`
to extract information about the free variables, and do capture-avoiding substitution!

-- /show
{-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable #-}
import Data.Foldable
import Data.Traversable
import Bound

data Exp a
 = Var a
 | App (Exp a) (Exp a)
 | Lam (Scope () Exp a)
 | Let [Scope Int Exp a] (Scope Int Exp a)
 deriving (Functor,Foldable,Traversable)
  
instance Monad Exp where
  return = Var
  Var a >>= f = f a
  App l r >>= f = App (l >>= f) (r >>= f)
  Lam s >>= f = Lam (s >>>= f)
  Let xs b >>= f = Let (map (>>>= f) xs) (b >>>= f)
-- /show
main = putStrLn "That's it!"

4.) We can now deal with simultaneous substitution. To make things interesting in the example above we've extended the example with a simultaneous binding for all of the variables in a recursive let!

Cons:

1.) I still haven't shown you how to get Eq, Ord, Read, Show such that Eq and Ord respect alpha-equivalence.

Loose Ends

To work around the polymorphic recursion in Scope, we turn to my fairly boring and generically named prelude-extras package. This package provides higher-rank versions of typeclasses from the Prelude.

For example

class Eq1 t where
  (==#) :: Eq a => t a -> t a -> Bool
  (/=#) :: Eq a => t a -> t a -> Bool

along with Eq2, Ord1, Ord2, Show1, Show2, Read1 and Read2.

I told you it was boring.

There has also been some discussion on the libraries mailing list of randomly renaming these things and putting (a subset?) of these in transformers to permit Haskell 98 Show instances things like IdentityT and WriterT.

Now, what we can do is use the fact that these classes have default definitions and the instances for Scope are defined in terms of Eq1, Ord1, Show1, and Read1 for the base data type, to obtain the final solution!

-- show
{-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable #-}
import Bound
import Control.Applicative
import Control.Monad
import Data.Foldable
import Data.Traversable
import Prelude.Extras

data Exp a 
  = Var a 
  | App (Exp a) (Exp a) 
  | Lam (Scope () Exp a)
  deriving (Eq,Ord,Show,Read,Functor,Foldable,Traversable)

instance Eq1 Exp
instance Ord1 Exp
instance Show1 Exp
instance Read1 Exp
instance Applicative Exp where 
  pure = Var
  (<*>) = ap

instance Monad Exp where
  return = Var
  Var a   >>= f = f a
  App x y >>= f = App (x >>= f) (y >>= f)
  Lam e   >>= f = Lam (e >>>= f)
-- /show
whnf :: Exp a -> Exp a
whnf (App f a) = case whnf f of
  Lam b -> whnf (instantiate1 a b)
  f'    -> App f' a
whnf e = e

lam :: Eq a => a -> Exp a -> Exp a
lam v b = Lam (abstract1 v b)

nf :: Exp a -> Exp a
nf e@Var{}   = e
nf (Lam b) = Lam $ toScope $ nf $ fromScope b
nf (App f a) = case whnf f of
  Lam b -> nf (instantiate1 a b)
  f' -> nf f' `App` nf a

main = putStrLn "It compiles"

and we can finally show syntax tree terms again.

The instances of Eq and Ord for Scope in particular are careful to compare only up to alpha-equivalence by quotienting out the placement of any internal F levels in the tree as if they'd all been pushed out to the leaves.

We can use abstract1 and instantiate1, which are analogous to the abstract and instantiate we were using before we generalized our generalized De Bruijn index representation to define things like smart constructors

lam :: Eq a => a -> Exp a -> Exp a
lam v b = Lam (abstract1 v b)

and evaluation strategies such as computing weak head normal form:

whnf :: Exp a -> Exp a
whnf (App f a) = case whnf f of
  Lam b -> whnf (instantiate1 a b)
  f'    -> App f' a
whnf e = e

One thing the f (Maybe a) representation was good for was for walking under binders. bound offers the round trip through this representation as fromScope and toScope. Now, if we need to walk under a lambda, we can!

nf :: Exp a -> Exp a
nf e@V{}   = e
nf (Lam b) = Lam $ toScope $ nf $ fromScope b
nf (App f a) = case whnf f of
  Lam b -> nf (instantiate1 a b)
  f' -> nf f' `App` nf a

Bound.Class

Finally, it is worth commenting on the generality of (>>>=) as it points to the class for which this package is named.

In addition to Scope and Var, the bound package provides the Bound class:

class Bound t where
  (>>>=) :: Monad f => t f a -> (a -> f b) -> t f b
  default (>>>=) :: (MonadTrans t, Monad f) => t f a -> (a -> f b) -> t f b
  m >>>= f = m >>= lift . f

An instance of Bound is required to form a left module over all monads. That is to say it should satisfy the following two laws that were identified by Andrea Vezzosi:

1.) The first such law is analogous to the first monad law:

m >>>= return ≡ m

2.) The second such law is an associativity law:

m >>>= (λ x → k x >>= h) ≡ (m >>>= k) >>>= h

Trivially, any valid MonadTrans instance satisfies these laws, hence the default definition.

This extra flexibility is used in the bound examples/ folder to deal with complex pattern matching or other binding/telescoping structures.

What's Next?

You've just taken a crash course on name capture and been shown how bound deals with common isses and makes it easy to work with De Bruijn indices without ever thinking about De Bruijn indices.

Hopefully, you want to learn more. Perhaps after a slight break to let this settle. If so, I can offer you three further avenues for exploration off the top of my head.

First, I'd like to note that there are a few middlingly-complex examples in the bound examples/ folder to explore and use as a template. Notably:

1.) Simple.hs continues in the vein we've taken here but fleshes out recursive let and sticks to Haskell 98 religiously. 2.) Deriving.hs takes over from where we're leaving off with here and uses

{-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable #-}

to reduce boilerplate, and importantly adds pattern matching. This showcases the need for the Bound class for dealing with patterns, case alternatives, and similar module structures.

3.) Overkill.hs offers a trip down the rabbit hole of type safety. It shows how strong the type guarantees can get, by using custom kinds to index into its patterns and improves the safety of pattern matching and let binding over and above the Deriving.hs approach, but at the expense of a great deal more code!

Second, it is also possible to derive a higher order version of bound that can deal with strongly typed EDSLs as well. I've included a worked example version of it here under a spoiler tag simply for completeness, including an example of what could be turned into a network serializable EDSL replete with local variable bindings and lambdas, but it isn't for the faint of heart. It is also probably buggy and is completely untested. If you fix any issues with it, please feel free to email me!


{-# LANGUAGE GADTs, Rank2Types, KindSignatures, ScopedTypeVariables, TypeOperators, DataKinds, PolyKinds, MultiParamTypeClasses, FlexibleInstances, TypeFamilies, DoRec, ExtendedDefaultRules #-}

import Control.Applicative
import Control.Category
import Control.Comonad
import Control.Monad.Fix
import Control.Monad (ap)
import Data.Functor.Identity
import Data.Typeable
import Data.Monoid
import Data.Unique
import System.IO.Unsafe
import Unsafe.Coerce
import Prelude hiding ((.),id)

infixl 1 >>>-, >>-

type Nat f g = forall x. f x -> g x

class HFunctor t where
  hmap :: Nat f g -> Nat (t f) (t g)

class HFunctor t => HTraversable t where
  htraverse :: Applicative m => (forall x. f x -> m (g x)) -> t f a -> m (t g a)

class HFunctor t => HMonad t where
  hreturn :: Nat f (t f)
  (>>-)   :: t f a -> Nat f (t g) -> t g a

infixr 1 -<<
(-<<) :: HMonad t => Nat f (t g) -> Nat (t f) (t g)
f -<< m = m >>- f

class HBound s where
  (>>>-) :: HMonad t => s t f a -> Nat f (t g) -> s t g a

class HMonadTrans s where
  hlift :: HMonad t => Nat (t f) (s t f)

data Var b f a where
  B :: b a -> Var b f a
  F :: f a -> Var b f a

instance HFunctor (Var b) where
  hmap _ (B b) = B b
  hmap f (F a) = F (f a)

instance HTraversable (Var b) where
  htraverse _ (B b) = pure (B b)
  htraverse f (F a) = F <$> f a

instance HMonad (Var b) where
  hreturn   = F
  B b >>- _ = B b
  F a >>- f = f a

newtype Scope b t f a = Scope { unscope :: t (Var b (t f)) a }

instance HFunctor t => HFunctor (Scope b t) where
  hmap f (Scope b) = Scope (hmap (hmap (hmap f)) b)

instance HTraversable t => HTraversable (Scope b t) where
  htraverse f (Scope b) = Scope <$> htraverse (htraverse (htraverse f)) b

instance HMonad t => HMonad (Scope b t) where
  hreturn = Scope . hreturn . F . hreturn
  Scope e >>- f  = Scope $ e >>- \v -> case v of
    B b -> hreturn (B b)
    F ea -> ea >>- unscope . f

instance HMonadTrans (Scope b) where
  hlift = Scope . hreturn . F

instance HBound (Scope b) where
  Scope m >>>- f = Scope (hmap (hmap (>>- f)) m)

data Equal a b where
   Refl :: Equal a a

instance Category Equal where
  id = Refl
  Refl . Refl = Refl

abstract :: HMonad t => (forall x. f x -> Maybe (b x)) -> Nat (t f) (Scope b t f)
abstract f = Scope . hmap (\y -> case f y of
  Just b -> B b
  Nothing -> F (hreturn y))

instantiate :: HMonad t => Nat b (t f) -> Nat (Scope b t f) (t f)
instantiate k (Scope e) = e >>- \v -> case v of
  B b -> k b
  F a -> a

data Ix :: [*] -> * -> * where
  Z :: Ix (a ': as) a
  S :: Ix as b -> Ix (a ': as) b

data Vec :: (* -> *) -> [*] -> * where
  HNil :: Vec f '[]
  (:::) :: f b -> Vec f bs -> Vec f (b ': bs)

infixr 5 :++, :::

type family (:++) (as :: [*]) (bs :: [*]) :: [*]
type instance '[] :++ bs = bs
type instance (a ': as) :++ bs = a ': (as :++ bs)

happend :: Vec f as -> Vec f bs -> Vec f (as :++ bs)
happend HNil bs = bs
happend (a ::: as) bs = a ::: happend as bs

hsingleton :: f a -> Vec f '[a]
hsingleton x = x ::: HNil

instance HFunctor Vec where
  hmap _ HNil = HNil
  hmap f (x ::: xs) = f x ::: hmap f xs

instance HTraversable Vec where
  htraverse _ HNil = pure HNil
  htraverse f (x ::: xs) = (:::) <$> f x <*> htraverse f xs

class EqF f where
  (==?) :: f a -> f b -> Maybe (Equal a b)

data Lit t where
  Integer :: Integer -> Lit Integer
  Double  :: Double -> Lit Double
  String :: String -> Lit String

instance EqF Lit where
  Integer a ==? Integer b = Just Refl
  ...

value :: Lit a -> a
value (Integer i) = i
value (Double d) = d
value (String s) = s

class Literal a where
  literal :: a -> Lit a

instance Literal Integer where
  literal = Integer

instance Literal String where
  literal = String

data Remote :: (* -> *) -> * -> * where
  Var :: f a -> Remote f a
  Lit :: Lit a -> Remote f a
  Lam :: Scope (Equal b) Remote f a -> Remote f (b -> a)
  Let :: Vec (Scope (Ix bs) Remote f) bs -> Scope (Ix bs) Remote f a -> Remote f a
  Ap  :: Remote f (a -> b) -> Remote f a -> Remote f b

lam_ :: EqF f => f a -> Remote f b -> Remote f (a -> b)
lam_ v f = Lam (abstract (v ==?) f)
 
lit :: Literal a => a -> Remote f a
lit = Lit . literal

instance HFunctor Remote where
  hmap f (Var a)    = Var (f a)
  hmap _ (Lit t)    = Lit t
  hmap f (Lam b)    = Lam (hmap f b)
  hmap f (Let bs b) = Let (hmap (hmap f) bs) (hmap f b)
  hmap f (Ap x y)   = Ap (hmap f x) (hmap f y)

instance HTraversable Remote where
  htraverse f (Var a)    = Var <$> f a
  htraverse _ (Lit t)    = pure $ Lit t
  htraverse f (Lam b)    = Lam <$> htraverse f b
  htraverse f (Let bs b) = Let <$> htraverse (htraverse f) bs <*> htraverse f b
  htraverse f (Ap x y)   = Ap <$> htraverse f x <*> htraverse f y

data MyF a where
  Mean :: MyF (Double -> Double)

instance HMonad Remote where
  hreturn        = Var
  Var a    >>- f = f a
  Lit t    >>- _ = Lit t
  Lam b    >>- f = Lam (b >>>- f)
  Let bs b >>- f = Let (hmap (>>>- f) bs) (b >>>- f)
  Ap x y   >>- f = Ap (x >>- f) (y >>- f)

(!) :: Vec f v -> Ix v a -> f a
(b ::: _)  ! Z   = b
(_ ::: bs) ! S n = bs ! n

eval :: Remote Identity a -> a
eval (Var w) = extract w
eval (Lit i) = value i
eval (Lam b) = \a -> eval (instantiate (\Refl -> Var (Identity a)) b)
eval (Let bs b) = eval (instantiate (vs !) b) where vs = hmap (instantiate (vs !)) bs
eval (Ap x y) = eval x (eval y)

closed :: HTraversable t => t f a -> Maybe (t g a)
closed = htraverse (const Nothing)

newtype V (a :: *) = V Unique

instance EqF V where
  V a ==? V b
    | a == b    = Just (unsafeCoerce Refl)
    | otherwise = Nothing

lam :: (Remote V a -> Remote V b) -> Remote V (a -> b)
lam f = unsafePerformIO $ do
  x <- fmap V newUnique
  return $ Lam $ abstract (x ==?) $ f $ Var x

data Binding a = V a := Remote V a

rhs :: Binding a -> Remote V a
rhs (_ := a) = a

data Bindings = forall bs. Bindings (Vec Binding bs)

elemIndex :: Vec Binding bs -> V a -> Maybe (Ix bs a)
elemIndex HNil              _ = Nothing
elemIndex ((x := r) ::: xs) y = case x ==? y of
  Just Refl -> Just Z
  Nothing   -> S <$> elemIndex xs y

instance Monoid Bindings where
  mempty = Bindings HNil
  Bindings xs `mappend` Bindings ys = Bindings (happend xs ys)

-- Allow the use of DoRec to define let statements

newtype Def a = Def { runDef :: IO (a, Bindings) }

instance Functor Def where
  fmap f (Def m) = Def $ do
    (a,w) <- m
    return (f a, w)

instance Applicative Def where
  pure = return
  (<*>) = ap

instance Monad Def where
  return a = Def $ return (a, mempty)
  Def m >>= f = Def $ do
    (a, xs) <- m
    (b, ys) <- runDef (f a)
    return (b, xs <> ys)

instance MonadFix Def where
  mfix m = Def $ mfix $ \ ~(a, _) -> runDef (m a)

def :: Remote V a -> Def (Remote V a)
def v@Var{} = Def $ return (v, mempty) -- this thing already has a name
def r = Def $ do
  v <- fmap V newUnique
  return (Var v, Bindings (hsingleton (v := r)))

let_ :: Def (Remote V a) -> Remote V a
let_ (Def m) = unsafePerformIO $ do
    (r, Bindings bs) <- m
    return $ Let (hmap (abstract (elemIndex bs) . rhs) bs)
                 (abstract (elemIndex bs) r)

Third, for a much larger "industrial scale" example you can explore our work-in-progress compiler for Ermine that uses bound (and lens) extensively for manipulating its Term language, Type system, Kind system and the Core language it spits out as a witness during type checking.

Happy Hacking!

-Edward Kmett

August 19 2013

P.S. Apparently the proper Dutch convention with names is to use Nicolaas Govert de Bruijn when writing out a full name, but to capitalize De Bruijn when using the surname in isolation.

Bound

Sections