Show me how you handle errors and I'll tell you what programmer you are. Error handling is fundamental to all programming. Language support for error handling varies from none whatsoever (C) to special language extensions (exceptions in C++, Java, etc.). Haskell is unique in its approach because it's expressive enough to let you build your own error handling frameworks. Haskell doesn't need built-in exception support: it implements it in libraries.
We've seen one way of dealing with errors: calling the error
function that terminates the program. This works fine for runtime assertions, which alert us to bugs in the program. But many "errors" are actually expected. We've seen one such example: Data.Map.lookup
fails when called with a key that's not present in the map. The possibility of failure is encoded in the Maybe
return type of lookup
. It's interesting to compare this with similar functions in other languages. In C++, for instance, std::map
defines multiple accessor functions varying only in their failure behavior:
at
throws an exceptionfind
returns an empty iteratoroperator[]
inserts a dummy value using a default constructor for it.
The last one is the most bizarre of the three. Since the array access operator must return a reference, even if the key is not found, it has to create a dummy value. The behavior of at
is potentially dangerous if the client forgets to catch the exception. Of the three, find
is the safest, since the return type suggests to the client iteration rather than straight dereference; and iteration normally starts with checking for termination.
In functional programming, failure is another way of saying that the computation is partial; that is, not defined for all values of arguments. In Haskell we always try to use total functions -- functions defined for all values of their arguments. If the domain of a computation is known at compile time, we can often define a restricted data type to be used for its arguments; for instance, an enumeration instead of an integer. This is not always possible or feasible, so the other option is to turn a partial function into a total function by changing its return type. C++ method find
does this trick by returning an iterator (it always returns an iterator for any value of its argument); Haskell lookup
does it by returning a Maybe
.
This trick of returning a different type in order to turn a non-functional computation into a pure function is used extensively in Haskell and finds its full expression in monads.
"Computation" or "notion of computation" often describes what we want to do in terms that may or may not have immediate pure function implementation: that is values in, values out. Transforming computations into functions is what functional programming is all about.
Either May Be Better than Maybe
The trick with Maybe
is a bit limited. All we know about the failure is that it occurred. In practice, we'd like to know more. So the next step in error handling is to use the Either
data structure defined in the Prelude:
data Either a b = Left a | Right b
Either
is parameterized by two types, not one. A value of the Either
type either contains a value of type a
or of type b
. We can discriminate between the two possibilities by pattern matching on either constructor. Either
is mostly used as a generalization of Maybe
in which Left
not only encodes failure but is accompanied by an error message. Right
encodes success and the accompanying value. So a
is often fixed to be a String
.
Here's how we can use Either
to encode the failure of a lookup without terminating the whole program:
lookUp :: String -> SymTab -> Either String (Double, SymTab)
lookUp str symTab =
case M.lookup str symTab of
Just v -> Right (v, symTab)
Nothing -> Left ("Undefined variable " ++ str)
Now the burden is on the caller to pattern match the result of any such call and either continue with the successful result, or handle the failure. More often than not, the error is just passed unchanged to the caller, and so on; and somewhere at the top of the call tree, displayed to the user (remember, the very top is always an IO
function).
It's a know phenomenon that error propagation spreads like a disease throughout the code. You could have seen that in C and sometimes even in C++, when exceptions are not an option. And indeed, you can create the same mess in Haskell. Here's a naive implementation of the evaluator of SumNode
, which checks for errors that can happen in the evaluation of its children, and propagates them if necessary:
evaluate :: Tree -> SymTab -> Either String (Double, SymTab)
evaluate (SumNode op left right) symTab =
case evaluate left symTab of
Left msg -> Left msg
Right (lft, symTab') ->
case evaluate right symTab' of
Left msg -> Left msg
Right (rgt, symTab'') ->
case op of
Plus -> Right (lft + rgt, symTab'')
Minus -> Right (lft - rgt, symTab'')
Notice the creeping indentation. I'm showing you this code so that you know what awaits you if you refuse to learn about monads. However ugly this code is, it works, so the principle is right. We just need some monadic sugar to make it palatable.
Errors are propagated up, until they are finally dealt with in the main IO loop:
loop symTab = do
str <- getLine
if null str
then
return ()
else
let toks = tokenize str
tree = parse toks
in
case evaluate tree symTab of
Left msg -> do
putStrLn $ "Error: " ++ msg
loop symTab -- use old symTab
Right (v, symTab') -> do
print v
loop symTab'
The nice thing is that, since we keep around the old copy of the symbol table, we can start the next iteration after a failure as if nothing happened. It's as if one transaction had been aborted and another started from the the same state.
import Data.Char
import qualified Data.Map as M
data Operator = Plus | Minus | Times | Div
deriving (Show, Eq)
data Token = TokOp Operator
| TokAssign
| TokLParen
| TokRParen
| TokIdent String
| TokNum Double
| TokEnd
deriving (Show, Eq)
operator :: Char -> Operator
operator c | c == '+' = Plus
| c == '-' = Minus
| c == '*' = Times
| c == '/' = Div
tokenize :: String -> [Token]
tokenize [] = []
tokenize (c : cs)
| elem c "+-*/" = TokOp (operator c) : tokenize cs
| c == '=' = TokAssign : tokenize cs
| c == '(' = TokLParen : tokenize cs
| c == ')' = TokRParen : tokenize cs
| isDigit c = number c cs
| isAlpha c = identifier c cs
| isSpace c = tokenize cs
| otherwise = error $ "Cannot tokenize " ++ [c]
identifier :: Char -> String -> [Token]
identifier c cs = let (name, cs') = span isAlphaNum cs in
TokIdent (c:name) : tokenize cs'
number :: Char -> String -> [Token]
number c cs =
let (digs, cs') = span isDigit cs in
TokNum (read (c : digs)) : tokenize cs'
---- parser ----
data Tree = SumNode Operator Tree Tree
| ProdNode Operator Tree Tree
| AssignNode String Tree
| UnaryNode Operator Tree
| NumNode Double
| VarNode String
deriving Show
lookAhead :: [Token] -> Token
lookAhead [] = TokEnd
lookAhead (t:ts) = t
accept :: [Token] -> [Token]
accept [] = error "Nothing to accept"
accept (t:ts) = ts
expression :: [Token] -> (Tree, [Token])
expression toks =
let (termTree, toks') = term toks
in
case lookAhead toks' of
(TokOp op) | elem op [Plus, Minus] ->
let (exTree, toks'') = expression (accept toks')
in (SumNode op termTree exTree, toks'')
TokAssign ->
case termTree of
VarNode str ->
let (exTree, toks'') = expression (accept toks')
in (AssignNode str exTree, toks'')
_ -> error "Only variables can be assigned to"
_ -> (termTree, toks')
term :: [Token] -> (Tree, [Token])
term toks =
let (facTree, toks') = factor toks
in
case lookAhead toks' of
(TokOp op) | elem op [Times, Div] ->
let (termTree, toks'') = term (accept toks')
in (ProdNode op facTree termTree, toks'')
_ -> (facTree, toks')
factor :: [Token] -> (Tree, [Token])
factor toks =
case lookAhead toks of
(TokNum x) -> (NumNode x, accept toks)
(TokIdent str) -> (VarNode str, accept toks)
(TokOp op) | elem op [Plus, Minus] ->
let (facTree, toks') = factor (accept toks)
in (UnaryNode op facTree, toks')
TokLParen ->
let (expTree, toks') = expression (accept toks)
in
if lookAhead toks' /= TokRParen
then error "Missing right parenthesis"
else (expTree, accept toks')
_ -> error $ "Parse error on token: " ++ show toks
parse :: [Token] -> Tree
parse toks = let (tree, toks') = expression toks
in
if null toks'
then tree
else error $ "Leftover tokens: " ++ show toks'
---- evaluator ----
-- show
type SymTab = M.Map String Double
lookUp :: String -> SymTab -> Either String (Double, SymTab)
lookUp str symTab =
case M.lookup str symTab of
Just v -> Right (v, symTab)
Nothing -> Left ("Undefined variable " ++ str)
addSymbol :: String -> Double -> SymTab -> Either String ((), SymTab)
addSymbol str val symTab =
let symTab' = M.insert str val symTab
in Right ((), symTab')
evaluate :: Tree -> SymTab -> Either String (Double, SymTab)
evaluate (SumNode op left right) symTab =
case evaluate left symTab of
Left msg -> Left msg
Right (lft, symTab') ->
case evaluate right symTab' of
Left msg -> Left msg
Right (rgt, symTab'') ->
case op of
Plus -> Right (lft + rgt, symTab'')
Minus -> Right (lft - rgt, symTab'')
evaluate (ProdNode op left right) symTab =
case evaluate left symTab of
Left msg -> Left msg
Right (lft, symTab') ->
case evaluate right symTab' of
Left msg -> Left msg
Right (rgt, symTab'') ->
case op of
Times -> Right (lft * rgt, symTab)
Div -> Right (lft / rgt, symTab)
evaluate (UnaryNode op tree) symTab =
case evaluate tree symTab of
Left msg -> Left msg
Right (x, symTab') ->
case op of
Plus -> Right ( x, symTab')
Minus -> Right (-x, symTab')
evaluate (NumNode x) symTab = Right (x, symTab)
evaluate (VarNode str) symTab = lookUp str symTab
evaluate (AssignNode str tree) symTab =
case evaluate tree symTab of
Left msg -> Left msg
Right (v, symTab') ->
case addSymbol str v symTab' of
Left msg -> Left msg
Right (_, symTab'') -> Right (v, symTab'')
main = do
loop (M.fromList [("pi", pi)])
loop symTab = do
str <- getLine
if null str
then
return ()
else
let toks = tokenize str
tree = parse toks
in
case evaluate tree symTab of
Left msg -> do
putStrLn $ "Error: " ++ msg
loop symTab -- use old symTab
Right (v, symTab') -> do
print v
loop symTab'
Abstracting the Either Pattern
Looking at the code above, you can see a pattern arising. First of all, we have all these functions that return their results wrapped in the Either
type. Every time we get an Either
value, we pattern match it and fork the computation: When the result is Right
we make the value in it available to the rest of the computation. If the result is Left
, we skip the rest of the computation and propagate the error. We would like to capture this pattern. We'd like to isolate the boilerplate code, leaving "holes" for the client-provided variables. One such hole is to be filled by the initial Either
value. The tricky part is the "skip the rest of the computation" part of the pattern. This can be done if the pattern has another hole for "the rest of the computation" so that it can either execute it or not.
Let's first identify this pattern in the evaluation of a unary node:
The first pattern filler is the value returned by evaluate tree symTab
. The "rest of the calculation" filler is the code:
case op of
Plus -> Right ( x, symTab')
Minus -> Right (-x, symTab')
We can abstract this code into a lambda function. Notice that this code uses x
and symTab'
, which were extracted from the first filler. So when we abstract it, the resulting lambda function should take this tuple as an argument:
\(x, symTab') -> case op of
Plus -> Right ( x, symTab')
Minus -> Right (-x, symTab')
The parameter op
is captured by the lambda from the outer environment -- remember, lambdas are closures: they can capture values from the environment in which they are defined.
Let's call this new pattern bindE
and implement it as a function of two arguments. Here's how we would call it from the evaluator:
evaluate (UnaryNode op tree) symTab =
bindE (evaluate tree symTab)
(\(x, symTab') ->
case op of
Plus -> Right ( x, symTab')
Minus -> Right (-x, symTab'))
This is just a straight call to bindE
with two arguments -- the second one being a multi-line lambda function.
The implementation of bindE
is pretty straightforward. It just picks the common parts of the pattern and combines them in one function. It deals with the tedium of pattern matching the first argument and propagating the error. In case of success, it calls the continuation with the right arguments:
bindE :: Either String (Double, SymTab)
-> ((Double, SymTab) -> Either String (Double, SymTab))
-> Either String (Double, SymTab)
bindE ev k =
case ev of
Left msg -> Left msg
Right (x, symTab') -> k (x, symTab')
Have a good look at the signature of bindE
. Again, the second argument to bindE
is the function that encapsulates the rest of the computation. It takes a pair (Double, SymTab)
and returns another pair encapsulated in Either
. This function argument is often called a continuation, because it continues the computation. The corresponding argument is often named k
(think kontinuation; c
would be too generic a name).
This pattern can be further abstracted by parameterizing it on the type of the contents of Right
. In fact we need this if we want to include addSymbol
in our scheme (see the unit in the return type):
addSymbol :: String -> Double -> SymTab -> Either String ((), SymTab)
If we want to bind the result of, say, UnaryNode
evaluator to addSymbol
, the input will be of type Either String (Double, SymTab)
and the result of type Either String ((), SymTab)
, so bindE
really needs two type parameters, a
and b
:
bindE :: Either String a
-> (a -> Either String b)
-> Either String b
bindE ev k =
case ev of
Left msg -> Left msg
Right v -> k v
Notice that as long as the client only uses bindE
to deal with Either
values, they don't have to deal with them directly (e.g., pattern match), except when they have to create them; and in the final unpacking, when they want to display the error message. The creation of Either
values can also be abstracted by providing two more functions, return
and fail
:
return :: a -> Either String a
return x = Right x
fail :: String -> Either a
fail msg = Left msg
We can use these functions in the implementation of lookUp
lookUp :: String -> SymTab -> Either String (Double, SymTab)
lookUpb str symTab =
case M.lookup str symTab of
Just v -> return (v, symTab)
Nothing -> fail ("Undefined variable " ++ str)
We can also replace all Right
constructors that are used to return values by calls to return
. This way the client code can be written without any knowledge of the fact that values are encapsulated in the Either
type.
Here's, for instance, an improved version of the UnaryNode
evaluator, this time with no mention of Either
or any of its constructors:
evaluate (UnaryNode op tree) symTab =
bindE (evaluate tree symTab)
(\(x, symTab') ->
case op of
Plus -> return ( x, symTab')
Minus -> return (-x, symTab'))
Maybe it doesn't seem like these transformations buy us much in code size or readability, but they are a step in the direction of hiding the details of error handling.
But we can do even better, once we realize that we have just defined a monad.
The Either Monad
A monad in Haskell is defined by a type constructor (a type parameterized by another type) and two functions, bind and return (optionally, fail). In our case, the type constructor is based on the Either a b
type, with the first type variable fixed to String
(yes, it's exactly like currying a type function). Let's formalize it by defining a new (paremeterized) type:
newtype Evaluator a = Ev (Either String a)
The newtype
declaration is a compromise between type
alias and a full-blown data
declaration. It can be used to define data types that have only one data constructor that takes only one argument -- Ev
in this case. (newtype
is preferred over similar data
declaration because of slightly better performance characteristics.)
The first change to our code will be to replace the type signature of evaluate
from:
evaluate :: Tree -> SymTab -> Either String (Double, SymTab)
to:
evaluate :: Tree -> SymTab -> Evaluator (Double, SymTab)
We can now redefine bindE
, return
, and fail
in terms of Evaluator
. Then we have to tell Haskell that we are defining a monad. Why does Haskell need to know about our monad? Because with this knowledge it will allow us to use the do
notation -- that's the sugar we've been craving.
Monad
is a typeclass -- I'll talk more about typeclasses soon. For now, it's enough to know that, in order to tell Haskell that we are defining a monad, we need to instantiate Monad
with our Evaluator
. When instantiating a Monad
we have to provide the appropriate definitions of bind, return and, optionally, fail. The only tricky one is bind, since Monad
defines it as an infix operator, >>=
(pronounced bind). Operators are just functions with funny names composed of special characters. An infix operator is a function that takes two arguments, one on the left and one on the right. Without further ado, here's the instance declaration for our first monad:
instance Monad Evaluator where
(Ev ev) >>= k =
case ev of
Left msg -> Ev (Left msg)
Right v -> k v
return v = Ev (Right v)
fail msg = Ev (Left msg)
These are exactly the definitions we've seen before, except for the additional layer of the Ev
constructor. Notice that I used infix notation in defining operator >>=
. Its left argument is the pattern (Ev ev)
, and its right argument is the continuation k
.
Let's see how this new monad works in the evaluator of SumNode
-- first without the do
notation:
evaluate (SumNode op left right) symTab =
evaluate left symTab >>= \(lft, symTab') ->
evaluate right symTab' >>= \(rgt, symTab'') ->
case op of
Plus -> return (lft + rgt, symTab'')
Minus -> return (lft - rgt, symTab'')
Notice that the second argument to the first >>=
is a lambda that continues up to the end of the function. Inside its body there is another >>=
with its own lambda. Notice also that the innermost lambda has access not only to rgt
-- which is its argument -- but also to the external lft
and op
, which it captures from its environment.
Here's the same code in do
notation:
evaluate (SumNode op left right) symTab = do
(lft, symTab') <- evaluate left symTab
(rgt, symTab'') <- evaluate right symTab'
case op of
Plus -> return (lft + rgt, symTab'')
Minus -> return (lft - rgt, symTab'')
As you can see, the do
block hides the binding >>=
between the lines, it automatically converts "the rest of the code" to continuations, and it lets you treat the arguments to lambdas as if they were local variables to be assigned to. It's this syntactic sugar that makes do
blocks look so convincingly imperative, even though they are purely functional in nature.
Compare this with our starting point:
evaluate (SumNode op left right) symTab =
case evaluate left symTab of
Left msg -> Left msg
Right (lft, symTab') ->
case evaluate right symTab' of
Left msg -> Left msg
Right (rgt, symTab'') ->
case op of
Plus -> Right (lft + rgt, symTab'')
Minus -> Right (lft - rgt, symTab'')
That's definitely progress. If we could only get rid of this noise of threading symTab
all over the place. Oh, wait, that's the next tutorial and another monad.
Let's bask in the monadic sunshine some more. Did you notice that we have essentially implemented exceptions? fail
behaves very much like throw
. It shortcuts the execution of the current function, propagates the error to its caller, who will in turn shortcut its execution and so on, until the "exception is caught." Catching the exception means unpacking the Evaluator
returned by a function, and either retrieving the result or doing something with the error.
Can you forget to "catch" the exception? Not really -- it's part of the return type. You can't even access the value of your computation without unpacking it. And if your unpacking matches all possible Either
patterns, as it should, you'll be forced to deal with the error case anyway.
Do you remember exception specifications in Java or C++ (currently obsoleted)? In Haskell there's no need for this ad hoc feature because "exception specification" is encoded in the return type of a function. You can't ignore types in Haskell, so you essentially leave it to the compiler to enforce exception safety.
There is a full-blown exception library Control.Exception
in Haskell, complete with throw
, catch
, and a long list of predefined exception types. The Prelude also defines the Monad
instance for Either
, so we could have used it directly. But I thought "inventing" the monad from scratch would give you a better learning experience.
Type Classes
I said that Monad
is a typeclass, but I haven't explained what a typeclass is. It's not exactly like a class in OO languages, but there are some similarities.
If you're familiar with Java or C++, you know that a class may define a set of virtual functions. When you have a polymorphic reference to one of the descendants of such a class, and call one of these virtual functions on it, the function that is called depends on the actual type of the object, which is determined at runtime. This kind of late binding is possible because a polymorphic object carries with it a vtable: an array of function pointers.
Similarly, in Haskell, a type class let's you define the signatures of a set of functions. You may think of a typeclass as an interface. (Although it is possible for a typeclass to provide default implementations for some of the functions -- the monadic fail
, for instance, has such implementation: it calls error
by default.) There's even an analog of a vtable that is (invisibly) passed to polymorphic functions.
The actual implementation of typeclass functions is provided by the client every time they declare some type to be an instance
of a typeclass. In Java or C++ this would happen when the client defines concrete classes that implement an interface. In Haskell you define instances instead. The interesting thing is that, in Haskell, the client is able to connect any typeclass with any type (including built-in types) "after the fact." (In most OO languages you can't make an int
a subclass of IFoo
-- in Haskell you can.)
Let's work through an example:
class Valuable a where
evaluate :: a -> Double
data Expr = Const Double | Add Expr Expr
instance Valuable Expr where
evaluate (Const x) = x
evaluate (Add lft rgt) = evaluate lft + evaluate rgt
instance Valuable Bool where
evaluate True = 1
evaluate False = 0
test :: Valuable a => a -> IO ()
test v = print $ evaluate v
expr :: Expr
expr = Add (Const 2) (Add (Const 1.5) (Const 2.5))
main = do
test expr
test True
I have defined a typeclass Valuable
with one function evaluate
, which takes an instance of Valuable
, a
, and returns a Double
. Separately I have defined a data type Expr
, which has no knowledge of the class Valuable
. Then I realized that Expr
is Valuable
, so I wrote an instance
declaration that makes this connection and provides the "witness" -- the actual implementation of the function evaluate
.
To make things even more exciting, I decided that the built-in type Bool
is also Valuable
and provided and instance
declaration to prove it. Now I was able to write a polymorphic function test
that calls evaluate
on its argument. But unlike in previous examples of polymorphism, the generic argument to test
is constrained: it has to be an instance of the class Valuable
. The type expression before the double arrow =>
defines class constraints for what follows. In this case we have one constraint, Valuable a
, meaning a
must be an instance of Valuable
. In general there may be several such constraints listed between a set of parentheses and separated by commas.
Hadn't I provided the type signature for test
, the compiler would have figured it out by analyzing the body of test
and seeing that I called evaluate
on its argument.
This is an example of a more general mechanism: If you define a function that adds its arguments, the compiler will deduce the Num
constraint for them. If you compare arguments for (in-) equality, it will deduce Eq
constraints. If you print
them, it will deduce Show
, etc.
The C++ Standards Committee almost voted in the concepts proposal in 2011, which would have added constrained polymorphism to C++ templates. But then they gave up because of the tremendous complexity imposed by the requirement of backwards compatibility.
Solution to the Expression Problem
In the previous tutorial I described the expression problem: How can you create a library that would be open to adding new data and new functions. We've seen that, in Haskell, extending a library by adding new functions was easy, but the addition of new varieties of data required modifications to the library. Clever use of typeclasses will let us have the cake and eat it too.
Let's work the magic on the previous example. First, instead of having one type Expr
with many constructors, let's replace it with one typeclass Expr
and many individual data types (by many I mean two in this case). We then glue these data types together by making them instances of Expr
. Expr
itself is empty -- its only purpose is to unify all expression types.
class Expr a
data Const = Const Double
data Add a b = Add a b
instance Expr Const
instance (Expr a, Expr b) => Expr (Add a b)
Notice that I used the names Const
and Add
both to name data types and their constructors. These two namespaces are separate, so if you're running out of names, it's a common practice to reuse them this way. Also, notice the use of two class constraints in the instance definition for Add
. Both a
and b
must be Expr
for Add a b
to be Expr
.
Now we would like to define the function evaluate
for both Const
and Add
nodes. But these are no longer just two constructors -- they are two different data types. The only way to define evaluate
for both is to overload it. This is possible, but only if evaluate
is part of a typeclass. The name Valuable
for such a class seems natural. Let's make it work for Const
and Add
:
class (Expr e) => Valuable e where
evaluate :: e -> Double
instance Valuable Const where
evaluate (Const x) = x
instance (Valuable a, Valuable b) => Valuable (Add a b) where
evaluate (Add lft rgt) = evaluate lft + evaluate rgt
This time we made sure that only expressions can be evaluated: That's the constraint Expr e
in the definition of Valuable
.
So that's our library. Let's now see how a client may extend it by, for instance, adding a new expression type: Mul
.
data Mul a b = Mul a b
instance (Expr a, Expr b) => Expr (Mul a b)
instance (Valuable a, Valuable b) => Valuable (Mul a b) where
evaluate (Mul lft rgt) = evaluate lft * evaluate rgt
Plugging Mul
into the library required only making it the instance of Expr
and Valuable
. Perfect! Our library is now open to adding new data.
Let's check if it's still open to new functions. Let's add a pretty printing function to expressions, including the newly defined Mul
. The trick is to make pretty
a member of a new class Pretty
and then make all the expression types its instances:
class (Expr e) => Pretty e where
pretty :: e -> String
instance Pretty Const where
pretty (Const x) = show x
instance (Pretty a, Pretty b) => Pretty (Add a b) where
pretty (Add x y) = "(" ++ pretty x ++ " + " ++ pretty y ++ ")"
instance (Pretty a, Pretty b) => Pretty (Mul a b) where
pretty (Mul x y) = pretty x ++ " * " ++ pretty y
Granted, there is a bit of syntactic noise here, but we have accomplished quite a feat: We have overcome the expression problem!
class Expr a
data Const = Const Double
data Add a b = Add a b
instance Expr Const
instance (Expr a, Expr b) => Expr (Add a b)
class (Expr e) => Valuable e where
evaluate :: e -> Double
instance Valuable Const where
evaluate (Const x) = x
instance (Valuable a, Valuable b) => Valuable (Add a b) where
evaluate (Add lft rgt) = evaluate lft + evaluate rgt
-- client code
data Mul a b = Mul a b
instance (Expr a, Expr b) => Expr (Mul a b)
instance (Valuable a, Valuable b) => Valuable (Mul a b) where
evaluate (Mul lft rgt) = evaluate lft * evaluate rgt
class (Expr e) => Pretty e where
pretty :: e -> String
instance Pretty Const where
pretty (Const x) = show x
instance (Pretty a, Pretty b) => Pretty (Add a b) where
pretty (Add x y) = "(" ++ pretty x ++ " + " ++ pretty y ++ ")"
instance (Pretty a, Pretty b) => Pretty (Mul a b) where
pretty (Mul x y) = pretty x ++ " * " ++ pretty y
expr = Mul (Const 2) (Add (Const 1.5) (Const 2.5))
main = do
putStrLn $ pretty expr
print $ evaluate expr
The Monad Typeclass
Let's go back to the Monad
typeclass. This is how it's defined in the Prelude:
class Monad m where
(>>=) :: m a -> (a -> m b) -> m b
(>>) :: m a -> m b -> m b
return :: a -> m a
fail :: String -> m a
mv >> k = mv >>= \_ -> k
fail s = error s
There's something different about this typeclass: it's not a typeclass that unifies types; rather it unifies type constructors. The parameter m
is a type constructor. It requires another type parameter to become a type. See how m
always acts on either a
or b
, which are type variables. Going back to the Evaluator
in our calculator, it was a type constructor too, with the type parameter a
:
newtype Evaluator a = Ev (Either String a)
That's why we were able to make Evaluator
an instance of Monad
:
instance Monad Evaluator where
Specifically, notice that we used Evaluator
, not Evaluator a
in this definition. You always supply a type constructor to the instance definition of Monad
.
Monad
also defines operator >>
, which ignores the value from its first argument (see its default implementation). You can bind two monadic functions using >>
if you're only calling the first one for its "side effects." This will make more sense in the next tutorial, when we talk about the state monad. Operator >>
is often used with the IO
monad to sequence output functions, as in:
main :: IO ()
main = putStrLn "Hello " >> putStrLn "World"
which is the same as:
main :: IO ()
main = do
putStrLn "Hello "
putStrLn "World"
Exercises
Ex 1. Define the WhyNot
monad:
data WhyNot a = Nah | Sure a
deriving Show
instance Monad WhyNot where
...
safeRoot :: Double -> WhyNot Double
safeRoot x =
if x >= 0 then
return (sqrt x)
else
fail "Boo!"
test :: Double -> WhyNot Double
test x = do
y <- safeRoot x
z <- safeRoot (y - 4)
w <- safeRoot z
return w
main = do
print $ test 9
print $ test 400
data WhyNot a = Nah | Sure a
deriving Show
instance Monad WhyNot where
Sure x >>= k = k x
Nah >>= _ = Nah
return x = Sure x
fail _ = Nah
safeRoot :: Double -> WhyNot Double
safeRoot x =
if x >= 0 then
return (sqrt x)
else
fail "Boo!"
test :: Double -> WhyNot Double
test x = do
y <- safeRoot x
z <- safeRoot (y - 4)
w <- safeRoot z
return w
main = do
print $ test 9
print $ test 400
Ex 2. Define a monad instance for Trace
(no need to override fail
). The idea is to create a trace of execution by sprinkling you code with calls to put
. The result of executing this code should look something like this:
["fact 3","fact 2","fact 1","fact 0"]
6
Hint: List concatenation is done using ++
(we've seen it used for string concatenation, because String
is just a list of Char
).
newtype Trace a = Trace ([String], a)
instance Monad Trace where
...
put :: Show a => String -> a -> Trace ()
put msg v = Trace ([msg ++ " " ++ show v], ())
fact :: Integer -> Trace Integer
fact n = do
put "fact" n
if n == 0
then return 1
else do
m <- fact (n - 1)
return (n * m)
main = let Trace (lst, m) = fact 3
in do
print lst
print m
newtype Trace a = Trace ([String], a)
instance Monad Trace where
return x = Trace ([], x)
(Trace (lst, x)) >>= k =
let Trace (lst', y) = k x
in Trace (lst ++ lst', y)
put :: Show a => String -> a -> Trace ()
put msg v = Trace ([msg ++ " " ++ show v], ())
fact :: Integer -> Trace Integer
fact n = do
put "fact" n
if n == 0
then return 1
else do
m <- fact (n - 1)
return (n * m)
main = let Trace (lst, m) = fact 3
in do
print lst
print m
Ex 3. Instead of deriving Show
, define explicit instances of the Show
typeclass for Operator
and Tree
such that expr
is displayed as:
x = (13.0 - 1.0) / y
It's enough that you provide the implementation of the show
function in the instance declaration. This function should take an Operator
(or a Tree
) and return a string.
data Operator = Plus | Minus | Times | Div
data Tree = SumNode Operator Tree Tree
| ProdNode Operator Tree Tree
| AssignNode String Tree
| UnaryNode Operator Tree
| NumNode Double
| VarNode String
instance Show Operator where
show Plus = " + "
...
instance Show Tree where
show = undefined
expr = AssignNode "x" (ProdNode Div (SumNode Minus (NumNode 13) (NumNode 1)) (VarNode "y"))
main = print expr
data Operator = Plus | Minus | Times | Div
data Tree = SumNode Operator Tree Tree
| ProdNode Operator Tree Tree
| AssignNode String Tree
| UnaryNode Operator Tree
| NumNode Double
| VarNode String
instance Show Operator where
show Plus = " + "
show Minus = " - "
show Times = " * "
show Div = " / "
instance Show Tree where
show (SumNode op lft rgt) = "(" ++ show lft ++ show op ++ show rgt ++ ")"
show (ProdNode op lft rgt) = show lft ++ show op ++ show rgt
show (AssignNode str tree) = str ++ " = " ++ show tree
show (UnaryNode op tree) = show op ++ show tree
show (NumNode x) = show x
show (VarNode str) = str
expr = AssignNode "x" (ProdNode Div (SumNode Minus (NumNode 13) (NumNode 1)) (VarNode "y"))
main = print expr
Ex 4 This is an example that mimics elements of OO programming. Chess pieces are implemented as separate data types: here, for simplicity, just one, Pawn
. The constructor of Pawn
takes the Color
of the piece and its position on the board (0-7 in both dimensions). Pieces are instances of the class Piece
, which declares the following functions: color
, pos
, and moves
. The moves
function takes a piece and returns a list of possible future positions after one move (without regard to other pieces, but respecting the boundaries of the board). Define both the typeclass and the instance, so that the following program works:
data Color = White | Black
deriving (Show, Eq)
data Pawn = Pawn Color (Int, Int)
class Piece a where
...
instance Piece Pawn where
...
pieces = [Pawn White (3, 1), Pawn Black (4, 1), Pawn White (0, 7), Pawn Black (5, 0)]
main = print $ map moves pieces
data Color = White | Black
deriving (Show, Eq)
data Pawn = Pawn Color (Int, Int)
class Piece a where
color :: a -> Color
pos :: a -> (Int, Int)
moves :: a -> [(Int, Int)]
instance Piece Pawn where
color (Pawn c _) = c
pos (Pawn _ pos) = pos
moves pwn = if color pwn == White
then mvs (pos pwn)
else map refl (mvs $ refl (pos pwn))
where
refl (x, y) = (x, 7 - y)
mvs (x, y) = if y == 1
then [(x, y + 1), (x, y + 2)]
else if y == 7
then []
else [(x, y + 1)]
pieces = [Pawn White (3, 1), Pawn Black (4, 1), Pawn White (0, 7), Pawn Black (5, 0)]
main = do
print $ map moves pieces
The Symbolic Calculator So Far
Below is the source code for the current state of the project. Notice how lookUp
and addSymbol
nicely fit into the monadic scheme. I haven't made any changes to the lexer and parser files.
{-# START_FILE Main.hs #-}
module Main where
import qualified Data.Map as M
import Lexer (tokenize)
import Parser (parse)
import Evaluator
main = do
loop (M.fromList [("pi", pi), ("e", exp 1.0)])
loop symTab = do
str <- getLine
if null str
then
return ()
else
let toks = tokenize str
tree = parse toks
Ev ev = evaluate tree symTab
in
case ev of
Left msg -> do
putStrLn $ "Error: " ++ msg
loop symTab -- use old symTab
Right (v, symTab') -> do
print v
loop symTab'
{-# START_FILE Lexer.hs #-}
-- show
module Lexer (Operator(..), Token(..), tokenize, lookAhead, accept) where
-- /show
import Data.Char
data Operator = Plus | Minus | Times | Div
deriving (Show, Eq)
data Token = TokOp Operator
| TokAssign
| TokLParen
| TokRParen
| TokIdent String
| TokNum Double
| TokEnd
deriving (Show, Eq)
lookAhead :: [Token] -> Token
lookAhead [] = TokEnd
lookAhead (t:ts) = t
accept :: [Token] -> [Token]
accept [] = error "Nothing to accept"
accept (t:ts) = ts
tokenize :: String -> [Token]
tokenize [] = []
tokenize (c : cs)
| elem c "+-*/" = TokOp (operator c) : tokenize cs
| c == '=' = TokAssign : tokenize cs
| c == '(' = TokLParen : tokenize cs
| c == ')' = TokRParen : tokenize cs
| isDigit c = number c cs
| isAlpha c = identifier c cs
| isSpace c = tokenize cs
| otherwise = error $ "Cannot tokenize " ++ [c]
identifier :: Char -> String -> [Token]
identifier c cs = let (name, cs') = span isAlphaNum cs in
TokIdent (c:name) : tokenize cs'
number :: Char -> String -> [Token]
number c cs =
let (digs, cs') = span isDigit cs in
TokNum (read (c : digs)) : tokenize cs'
operator :: Char -> Operator
operator c | c == '+' = Plus
| c == '-' = Minus
| c == '*' = Times
| c == '/' = Div
{-# START_FILE Parser.hs #-}
-- show
module Parser (Tree(..), parse) where
-- /show
import Lexer
data Tree = SumNode Operator Tree Tree
| ProdNode Operator Tree Tree
| AssignNode String Tree
| UnaryNode Operator Tree
| NumNode Double
| VarNode String
deriving Show
parse :: [Token] -> Tree
parse toks = let (tree, toks') = expression toks
in
if null toks'
then tree
else error $ "Leftover tokens: " ++ show toks'
expression :: [Token] -> (Tree, [Token])
expression toks =
let (termTree, toks') = term toks
in
case lookAhead toks' of
(TokOp op) | elem op [Plus, Minus] ->
let (exTree, toks'') = expression (accept toks')
in (SumNode op termTree exTree, toks'')
TokAssign ->
case termTree of
VarNode str ->
let (exTree, toks'') = expression (accept toks')
in (AssignNode str exTree, toks'')
_ -> error "Only variables can be assigned to"
_ -> (termTree, toks')
term :: [Token] -> (Tree, [Token])
term toks =
let (facTree, toks') = factor toks
in
case lookAhead toks' of
(TokOp op) | elem op [Times, Div] ->
let (termTree, toks'') = term (accept toks')
in (ProdNode op facTree termTree, toks'')
_ -> (facTree, toks')
factor :: [Token] -> (Tree, [Token])
factor toks =
case lookAhead toks of
(TokNum x) -> (NumNode x, accept toks)
(TokIdent str) -> (VarNode str, accept toks)
(TokOp op) | elem op [Plus, Minus] ->
let (facTree, toks') = factor (accept toks)
in (UnaryNode op facTree, toks')
TokLParen ->
let (expTree, toks') = expression (accept toks)
in
if lookAhead toks' /= TokRParen
then error "Missing right parenthesis"
else (expTree, accept toks')
_ -> error $ "Parse error on token: " ++ show toks
{-# START_FILE Evaluator.hs #-}
module Evaluator (evaluate, Evaluator(..)) where
import Lexer
import Parser
import qualified Data.Map as M
newtype Evaluator a = Ev (Either String a)
instance Monad Evaluator where
(Ev ev) >>= k =
case ev of
Left msg -> Ev (Left msg)
Right v -> k v
return v = Ev (Right v)
fail msg = Ev (Left msg)
type SymTab = M.Map String Double
evaluate :: Tree -> SymTab -> Evaluator (Double, SymTab)
evaluate (SumNode op left right) symTab = do
(lft, symTab') <- evaluate left symTab
(rgt, symTab'') <- evaluate right symTab'
case op of
Plus -> return (lft + rgt, symTab'')
Minus -> return (lft - rgt, symTab'')
evaluate (ProdNode op left right) symTab = do
(lft, symTab') <- evaluate left symTab
(rgt, symTab'') <- evaluate right symTab'
case op of
Times -> return (lft * rgt, symTab)
Div -> return (lft / rgt, symTab)
evaluate (UnaryNode op tree) symTab = do
(x, symTab') <- evaluate tree symTab
case op of
Plus -> return ( x, symTab')
Minus -> return (-x, symTab')
evaluate (NumNode x) symTab = return (x, symTab)
evaluate (VarNode str) symTab = lookUp str symTab
evaluate (AssignNode str tree) symTab = do
(v, symTab') <- evaluate tree symTab
(_, symTab'') <- addSymbol str v symTab'
return (v, symTab'')
lookUp :: String -> SymTab -> Evaluator (Double, SymTab)
lookUp str symTab =
case M.lookup str symTab of
Just v -> return (v, symTab)
Nothing -> fail ("Undefined variable " ++ str)
addSymbol :: String -> Double -> SymTab -> Evaluator ((), SymTab)
addSymbol str val symTab =
let symTab' = M.insert str val symTab
in return ((), symTab')