Now that we are done with the preliminaries, I'd like to show you how to design and develop a small application -- a symbolic calculator. It's a console application: the user types in expressions that are evaluated, and the results are displayed. To make it more interesting, the calculator supports symbolic variables that can be assigned and re-assigned and used in expressions. Here's an example of a user session:
In fact you can run the calculator right here, on the spot:
import Text.Parsec
import Text.Parsec.String
import Text.Parsec.Token
import Text.Parsec.Expr
import Text.Parsec.Language
import qualified Data.Map as M
import qualified Control.Monad.State as S
import Control.Monad.Error
import Control.Monad.Identity
-- Lexer
def = emptyDef { identStart = letter
, identLetter = alphaNum
, opStart = oneOf "+-*/="
, opLetter = oneOf "+-*/="
}
lexer :: TokenParser ()
lexer = makeTokenParser def
-- Expression tree
data Expression = Constant Double
| Identifier String
| Addition Expression Expression
| Subtraction Expression Expression
| Multiplication Expression Expression
| Division Expression Expression
| Negation Expression
| Assignment Expression Expression
deriving Show
-- Parser
parseNumber :: Parser Expression
parseNumber = do
v <- naturalOrFloat lexer
case v of
Left i -> return $ Constant $ fromIntegral i
Right n -> return $ Constant n
parseIdentifier :: Parser Expression
parseIdentifier = do
i <- identifier lexer
return $ Identifier i
parseExpression :: Parser Expression
parseExpression = (flip buildExpressionParser) parseTerm [
[ Prefix (reservedOp lexer "-" >> return Negation)
, Prefix (reservedOp lexer "+" >> return id)
]
, [ Infix (reservedOp lexer "*" >> return Multiplication) AssocLeft
, Infix (reservedOp lexer "/" >> return Division) AssocLeft
]
, [ Infix (reservedOp lexer "+" >> return Addition) AssocLeft
, Infix (reservedOp lexer "-" >> return Subtraction) AssocLeft
]
, [ Infix (reservedOp lexer "=" >> return Assignment) AssocRight
]
]
parseTerm :: Parser Expression
parseTerm = parens lexer parseExpression
<|> parseNumber
<|> parseIdentifier
parseInput :: Parser Expression
parseInput = do
whiteSpace lexer
ex <- parseExpression
eof
return ex
-- Evaluator
type SymTab = M.Map String Double
type Evaluator a = S.StateT SymTab (ErrorT String Identity) a
runEvaluator :: Evaluator Double -> SymTab -> Either String (Double, SymTab)
runEvaluator calc symTab = runIdentity $ runErrorT $ S.runStateT calc symTab
eval :: Expression -> Evaluator Double
eval (Constant x) = return x
eval (Identifier i) = do
symtab <- S.get
case M.lookup i symtab of
Nothing -> fail $ "Undefined variable " ++ i
Just e -> return e
eval (Addition eLeft eRight) = do
lft <- eval eLeft
rgt <- eval eRight
return $ lft + rgt
eval (Subtraction eLeft eRight) = do
lft <- eval eLeft
rgt <- eval eRight
return $ lft - rgt
eval (Multiplication eLeft eRight) = do
lft <- eval eLeft
rgt <- eval eRight
return $ lft * rgt
eval (Division eLeft eRight) = do
lft <- eval eLeft
rgt <- eval eRight
return $ lft / rgt
eval (Negation e) = do
val <- eval e
return $ -val
eval (Assignment (Identifier i) e) = do
val <- eval e
S.modify (M.insert i val)
return val
eval (Assignment _ _) =
fail "Left of assignment must be an identifier"
defaultVars :: M.Map String Double
defaultVars = M.fromList
[ ("e", exp 1)
, ("pi", pi)
]
--runEvaluator returns Either String (Double, SymTab Double)
calculate :: SymTab -> String -> (String, SymTab)
calculate symTab s =
case parse parseInput "" s of
Left err -> ("error: " ++ (show err), symTab)
Right exp -> case runEvaluator (eval exp) symTab of
Left err -> ("error: " ++ err, symTab)
Right (val, newSymTab) -> (show val, newSymTab)
loop :: SymTab -> IO ()
loop symTab = do
line <- getLine
if null line
then return ()
else do
let (result, symTab') = calculate symTab line
putStrLn result
loop symTab'
main = loop defaultVars
-- show
-- Enter expressions, one per line. Empty line to quit --
This is not the implementation I'll be describing. I wrote this one using Haskell libraries such as Parsec (one of the standard parsing libraries) and several monad transformers. In this series of tutorials I'd like to implement the same functionality from scratch, so you'll be able to clearly see each step and learn the language in the process.
The Design
- At the very high level, the calculator is a loop that gets a line of text from the user and then calculates and displays the result.
- The calculation is done is three steps:
- Lexical analysis: The string is converted to tokens
- Parsing: Building an expression tree
- Evaluation: The expression is evaluated.
In the first phase of implementation we won't worry about error handling and symbolic variables -- we'll add them later.
Notice that there's nothing Haskell-specific in this design -- it's just a piece of good old software engineering. Some people worry that programming in Haskell means re-learning everything from scratch. This is based on seriously underestimating the amount of software engineering that is common to all programming tasks -- independent of the language.
Designing with Types
We haven't talked about types yet because, even though Haskell is a strongly typed language, it has a powerful type inference system. This feature makes quick prototyping easy. Sometimes you just want to write a function and not worry about types. The compiler will figure them out. (C++11 introduced a modicum of type inference with the keyword auto
.)
On the other hand, software design in Haskell often starts with types. Let's try this approach.
1. Lexical analyzer
Lexical analyzer is implemented as a function tokenize
that takes a string (of type String
) and returns a list of tokens. We'll define the Token
data type later. A list of tokens has the type [Token]
-- the square brackets are used to create lists (both list types, like [Int]
, and list literals, like [1, 2, 3]
). Finally, a function type is constructed with an arrow ->
between the type of the argument and the type of the result (we'll get to multi-argument functions later). Putting all this together, we can write the Haskell type signature for the function tokenize
as follows:
tokenize :: String -> [Token]
This is read as: Tokenize is a function taking a string and returning a list of tokens. The double colon is used to introduce a type signature.
Type names must always start with capital letters, as in String
or Double
(except for names constructed from special characters, like the list type, []
).
2. Parser
The parsing function takes a list of tokens and produces an expression. We'll define the Expression
type later. For now, this is the type of parse
:
parse :: [Token] -> Expression
3. Evaluator
We'll make evaluate
take an Expression
and return a value of the built in type Double
(double precision floating point number).
evaluate :: Expression -> Double
We can define dummy data types for Token
and Expression
, and dummy function bodies; and fire up the compiler to typecheck our design:
data Token
data Expression
tokenize :: String -> [Token]
tokenize = undefined
parse :: [Token] -> Expression
parse = undefined
evaluate :: Expression -> Double
evaluate = undefined
main :: IO ()
main = putStrLn "It works, so are we done yet?"
You might wonder how undefined
plays with the type checker. It turns out that the type of undefined
is the bottom of the type hierarchy, which means it can be implicitly converted to any type. For instance, in the definition of tokenize
the type of undefined
becomes the function type: String->[Token]
.
The type of main
is always IO ()
: the type of IO
monadic action that produces no result (only side effects). The type ()
itself is called unit -- loosely corresponding to void
in C-like languages.
Recursion
Our design calls for a loop that accepts user input and displays the results. All loops in Haskell are implemented either using recursion or using (higher-order) functions whose implementation uses recursion.
You might be concerned about the performance or recursion and the possibility of blowing the stack -- in most cases this is not a problem since the compiler is able to turn most recursions into loops. This is called tail recursion optimization, where the recursive call at the very end of a function is simply turned into a goto to the beginning of the function. More serious performance concerns arise occasionally from Haskell's laziness but we'll talk about it later.
With that in mind, we are ready to implement the top-level loop of our calculator:
main :: IO ()
main = do
line <- getLine
putStrLn line
main
You can think of main
as first calling getLine
, storing the result in the variable line
, then calling putStrLn
with that line
, and then calling itself again. This will create an infinite loop, but no stack will be hurt in the process, since this is a typical case of tail recursion.
Of course, what really happens when the program is running is slightly different because of the IO
monad and general laziness. So it would be more appropriate to say that main
is an IO
action that is a sequence of three other actions: the ones returned by getLine
, putStrLn
, and main
. The last action, when the time comes to execute it, will produce three new actions, etc. But everything happens on the need to run basis, so the inner main
will not be evaluated until the (blocking) action produced by getLine
delivers its result.
Thinking about recursion rather than looping might initially seem unnatural, but it quickly becomes second nature. The main reason Haskell doesn't use loops is because of immutability: Loops in imperative languages usually have some kind of mutable counter or a mutable pointer. It's relatively easy to replace those loops with recursion. But first we need to learn about conditionals: We have to be able to break out of recursion at some point.
Conditional
A conditional in Haskell is just a simple if
, then
, else
construct. The else
is mandatory.
Anything between if
and then
is the condition (you don't even have to surround it with parentheses), and it must evaluate to a Boolean. You can pretty much use the familiar equality and comparison operators, >
, >=
, <
, <=
, ==
, to create Boolean values; except for the not-equal operator which is /=
. You can also combine them using &&
and ||
for logical and and or, and not
for not.
However, unlike in imperative languages, the Haskell if/then/else is not a statement but an expression (similar to C's (?:
) construct). It evaluates to either the then
or the else
expression, both of which have to be of the same type. For instance:
main = do
putStrLn "Enter a number"
str <- getLine
print (if read str >= 1 then 1 else 0)
Here, the if/then/else expression that is the argument to print
evaluates to either 1 or 0.
You might be wandering about the short-circuitting properties of if/then/else or the binary Boolean operators &&
and ||
. In most languages the property of not evaluating the branch that is not taken has to be built into the language as a special feature. Otherwise constructs like:
x = (p != nullptr) ? *p : 0
wouldn't work properly. In Haskell, short-circuiting is just the side effect of laziness. The branch not taken is never used, so it won't be evaluated.
Several explanations are in order: I used the function read
to turn a string into a value. It's an interesting function -- it's overloaded on the return type. Here, the compiler deduced that an integral value was needed because it was compared to another integral value, 1. (Try experimenting with this code by inputing a floating point number. Then change the 1 in the if clause to 1.0 and see if the behavior changes.)
We are now ready to convert a simple imperative loop that prints numbers from 0 to 4 to Haskell. Here's the C++ loop:
for (int i = 0; i < 5; ++i)
std::cout << i << std::endl
And here's its recursive counterpart written in Haskell:
loop :: Int -> IO ()
loop n = do
if n < 5
then do
putStrLn (show n)
loop (n + 1)
else
return ()
main :: IO ()
main = loop 0
The Haskell code looks straightforward, although it's more verbose than its C++ counterpart. That's because I made it control-driven -- which is closer to the imperative version -- rather than data-driven. In Haskell one should really try to think at a higher abstraction level. Here, the goal is to print a list of integers from 0 to 4, so it would be more natural to start with such a list: [0, 1, 2, 3, 4]
or, using a handy shorthand, [0..4]
; and apply a function to it. We'll see examples of this approach later.
Let's talk about types: loop
returns a "void" IO
action, so both branches of the if must also return an IO ()
action. The first branch is a sequence of two actions (hence the use of do
in that branch), the last of which is indeed of the type IO ()
(that's the result of calling loop
). The second branch is more interesting. At first sight you might not even notice anything out of the ordinary: Well, it does return a unit value ()
, which is of the type unit ()
. But how does this value become an IO ()
action? The trick is that return
is not a built-in keyword, it's actually an important monadic function (every monad has it).
The return
function turns whatever value it's given into a monadic value: here it turns ()
into IO ()
. It could also turn "Hello!" into IO String
, etc. We'll see more examples of using return
to "return" a value from a do
block in the future. Also notice the use of the Int
type -- it's a fixed precision integer.
In the next installment we'll start implementing the lexical analyzer and learn more about data types.
Exercises
Ex 1. Print squares of numbers from 1 to 10.
loop :: Int -> IO ()
loop n = undefined
main :: IO ()
main = loop 1
loop :: Int -> IO ()
loop n = do
if n <= 10
then do
putStrLn (show (n * n))
loop (n + 1)
else
return ()
main :: IO ()
main = loop 1
Ex 2. No exposition of recursion is complete without factorial. Use the following property: Factorial of n is n times the factorial of (n - 1), and the factorial of 0 is 1.
fact :: Int -> Int
fact n = undefined
main = print (fact 20)
fact :: Int -> Int
fact n = if n > 0 then n * fact (n - 1) else 1
main = print (fact 20)
Ex 3. The evaluation of factorial starts returning incorrect results right about n = 21 because of the Int
overflow. Try implementing a version that uses the infinite precision Integer
instead of Int
.
fact :: Int -> Int
fact n = if n > 0 then n * fact (n - 1) else 1
fullFact :: ...
fullFact n = ...
main = do
print (fact 23)
print (fullFact 23)
fact :: Int -> Int
fact n = if n > 0 then n * fact (n - 1) else 1
fullFact :: Integer -> Integer
fullFact n = if n > 0 then n * fullFact (n - 1) else 1
main = do
print (fact 23)
print (fullFact 23)
Ex 4. No exposition of recursion is complete without Fibonacci numbers. (I'm using these mathematical examples because we haven't learned about data structures. In general, Haskell is not just about math.) Use the following property of Fibonacci numbers: The n'th Fibonacci number is the sum of the (n-1)'st and the (n-2)'nd, and the first and second Fibonacci numbers are both 1.
fib :: Int -> Int
fib n = undefined
main = print (fib 20)
fib :: Int -> Int
fib n = if n > 2 then fib (n - 1) + fib (n - 2) else 1
main = print (fib 20)