ByteString Basics
ByteString provides a more efficient alternative to Haskell's built-in String type, which can be used to store 8-bit character strings and also to handle binary data. It provides its own versions of I/O functions such as readFile, as well as equivalents of the standard list manipulation functions:
-- Main.hs
import qualified Data.ByteString as B
main = do
    contents <- B.readFile "foo.txt"
    print $ B.reverse contents
-- foo.txt
... em esreveR
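Because the example above depends on an external file, here is a self-contained sketch of a few of the list-style functions applied to an in-memory ByteString. It uses the Data.ByteString.Char8 versions so the results print as text. The sample string is made up for illustration:

```haskell
import qualified Data.ByteString.Char8 as BC

-- Made-up sample data, standing in for the contents of a file.
sample :: BC.ByteString
sample = BC.pack "hello, bytestring"

main :: IO ()
main = do
    print (BC.length sample)                 -- 17
    BC.putStrLn (BC.take 5 sample)           -- hello
    print (BC.elem ',' sample)               -- True
    BC.putStrLn (BC.filter (/= 'l') sample)  -- heo, bytestring
    print (BC.words sample)
```

Each function mirrors its Prelude list counterpart but works directly on the packed bytes.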
Characters or bytes?
Depending on the context, we may prefer to view the ByteString as being made up of a list of elements of type Char or of Word8 (Haskell's standard representation of a byte). There's only one ByteString data structure for both, but the library exposes different functions depending on how we want to interpret the contents:
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
bytestring = BC.pack "I'm a ByteString, not a [Char]"
bytes = B.unpack bytestring
chars = BC.unpack bytestring
main = do
    BC.putStrLn bytestring
    print $ head bytes
    print $ head chars
Here we've used the pack function to convert a String into a ByteString. We then used two different unpack functions to get back both a list of Chars (the original String) and a list of Word8s. Data.ByteString provides the Word8 functions, while Data.ByteString.Char8 provides the Char equivalents.
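The reverse direction also works. The following sketch builds a ByteString from raw Word8 values with B.pack and then views those same bytes as characters. The byte values 72 and 105 are simply the ASCII codes for 'H' and 'i', chosen for illustration:

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.Char (chr)
import Data.Word (Word8)

-- 72 and 105 are the ASCII codes for 'H' and 'i'.
raw :: [Word8]
raw = [72, 105]

bs :: B.ByteString
bs = B.pack raw

main :: IO ()
main = do
    print (B.unpack bs)   -- [72,105]
    print (BC.unpack bs)  -- "Hi"
    -- Converting each Word8 to a Char by hand gives the same result.
    print (map (chr . fromIntegral) (B.unpack bs) == BC.unpack bs)  -- True
```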
Of course, we don't need to unpack the ByteString to a list to get the first element. We can just use the head functions provided by the library itself:
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
bytestring = BC.pack "I'm a ByteString, not a [Char]"
main = do
    BC.putStrLn bytestring
    print $ B.head bytestring
    print $ BC.head bytestring
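Like the Prelude's head, these head functions throw an exception when given an empty ByteString. If the input might be empty, uncons is a safer choice: it is exported by both modules and returns Nothing instead of failing. Here is a small sketch. The name firstChar is a hypothetical helper, not part of the library:

```haskell
import qualified Data.ByteString.Char8 as BC

-- Hypothetical helper: the first character, if there is one.
firstChar :: BC.ByteString -> Maybe Char
firstChar = fmap fst . BC.uncons

main :: IO ()
main = do
    print (firstChar (BC.pack "ByteString"))  -- Just 'B'
    print (firstChar BC.empty)                -- Nothing
```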
ByteStrings and Unicode
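The ByteString character functions only work with 8-bit characters, hence the Char8 in the module name. BC.pack keeps just the low byte of each Char, which is fine for ASCII but not for anything beyond it. If you try to use Unicode Strings, the result will be garbled: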
import qualified Data.ByteString.Char8 as BC
hello = "你好"
helloBytes = BC.pack hello
main = do
    putStrLn hello
    BC.putStrLn helloBytes
    print $ BC.length helloBytes
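To see exactly what goes wrong, look at the code points involved. 你 is U+4F60 and 好 is U+597D, so truncating each to its low byte leaves 0x60 and 0x7D. Those bytes happen to be the ASCII characters '`' and '}'. The following sketch checks this directly:

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.Char (ord)

hello :: String
hello = "你好"

main :: IO ()
main = do
    -- Code points of the original characters (0x4f60 and 0x597d).
    print (map ord hello)              -- [20320,22909]
    -- Only the low byte of each survives (0x60 and 0x7d).
    print (B.unpack (BC.pack hello))   -- [96,125]
    -- Which read back as ordinary ASCII characters.
    print (BC.unpack (BC.pack hello))  -- "`}"
```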
If you are working with Unicode text, you should use the Text type from the text package instead.
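When Unicode text does have to end up as bytes, for example to write it to a socket or pass it to a ByteString-based API, the usual approach is to encode it explicitly. The following sketch assumes the text package is available and uses UTF-8 via Data.Text.Encoding. In UTF-8, each of these two characters takes three bytes:

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import qualified Data.Text.IO as TIO

hello :: T.Text
hello = T.pack "你好"

-- Explicit UTF-8 encoding instead of BC.pack's byte truncation.
helloBytes :: B.ByteString
helloBytes = TE.encodeUtf8 hello

main :: IO ()
main = do
    TIO.putStrLn hello
    print (T.length hello)       -- 2 characters
    print (B.length helloBytes)  -- 6 bytes
    -- decodeUtf8' returns Left on invalid input instead of throwing.
    print (either (const False) (== hello) (TE.decodeUtf8' helloBytes))  -- True
```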
Lazy ByteStrings
ByteString also has a lazy version, which is a better choice if you are processing large amounts of data and don't want to read it all into memory at once. Just import Data.ByteString.Lazy instead of Data.ByteString. Sometimes you will find libraries which use one type when you are using the other. For example, Aeson uses lazy ByteStrings, but you may only be dealing with small JSON snippets and want to write your own code using the strict version. You can convert between them easily enough if you have to:
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy.Char8 as BLC
strict = BC.pack "I'm a strict ByteString (or am I)"
lazy = BLC.pack "I'm a lazy ByteString (or am I)"
strictToLazy = BL.fromChunks [strict]
lazyToStrict = B.concat $ BL.toChunks lazy
main = do
    BLC.putStrLn strictToLazy
    BC.putStrLn lazyToStrict
Newer versions of the library have toStrict and fromStrict functions in the Data.ByteString.Lazy module, which you can use instead.
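Assuming your bytestring version exports these functions (they appeared around version 0.10), the conversions can be written directly, without going through chunk lists. The following sketch reuses the same example strings:

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Char8 as BLC

strict :: B.ByteString
strict = BC.pack "I'm a strict ByteString (or am I)"

lazy :: BL.ByteString
lazy = BLC.pack "I'm a lazy ByteString (or am I)"

-- Direct conversions instead of fromChunks / toChunks.
strictToLazy :: BL.ByteString
strictToLazy = BL.fromStrict strict

lazyToStrict :: B.ByteString
lazyToStrict = BL.toStrict lazy

main :: IO ()
main = do
    BLC.putStrLn strictToLazy
    BC.putStrLn lazyToStrict
```

Converting lazy to strict forces the whole lazy string into memory, so it only makes sense for data you were happy to hold in full anyway.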
The OverloadedStrings Language Extension
When you enter a string literal, Haskell will normally assume it is of type String ([Char]). This useful language extension allows us to have string literals interpreted as ByteStrings, provided we import Data.ByteString.Char8:
{-# LANGUAGE OverloadedStrings #-}
import Data.ByteString.Char8 ()
import qualified Data.ByteString as B
bytes = "I'm a ByteString, not a [Char]" :: B.ByteString
str = "I'm just an ordinary [Char]" :: String
main = do
    print bytes
    print str
As you can see here, we might have to add explicit types in some cases to let Haskell know which kind of string we want. In ghci, you can get the same behaviour by starting it with the command below, or by entering :set -XOverloadedStrings at an existing prompt:
ghci -XOverloadedStrings
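Annotations are only needed where nothing else pins down the type of a literal. In the following sketch, greet is a hypothetical helper whose signature already says it takes a ByteString, so literals passed to it need no annotation:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Char8 as BC

-- Hypothetical helper: the signature fixes the literal types inside and out.
greet :: BC.ByteString -> BC.ByteString
greet name = BC.append "Hello, " name

main :: IO ()
main = do
    -- The literal's type is inferred from greet's argument type.
    BC.putStrLn (greet "ByteString")
    -- BC.length also fixes the type, so no annotation is needed here either.
    print (BC.length "twelve bytes")  -- 12
```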
ByteString binary data
Manipulating binary data is easy with ByteString. In fact, these notes are really a collection of bits and pieces I picked up along the way while doing the exercises for Coursera's Cryptography I course, when I had to use ByteString for the first time.
Hex and Base64 Encoding
Binary data is often encoded as hex or base64 to provide an ASCII text representation, so we need an easy way of decoding these to a ByteString containing the bare bytes. This is exactly what the base16-bytestring and base64-bytestring packages were written for.
Here's an example for base64:
{-# LANGUAGE OverloadedStrings #-}
import Data.ByteString.Char8 ()
import qualified Data.ByteString as B
import Data.ByteString.Base64 (encode, decode)
Right bytes = decode "SSdtIGEgYmFzZTY0IGVuY29kZWQgQnl0ZVN0cmluZw=="
main = print bytes
And one for a hex-encoded string:
{-# LANGUAGE OverloadedStrings #-}
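import Data.ByteString.Char8 ()
import Data.ByteString.Base16 (encode, decode)
-- Before base16-bytestring 1.0, decode returns a pair of the decoded bytes
-- and any undecodable remainder, so fst picks the decoded part. From 1.0 on
-- it returns Either String ByteString, so match on Right instead.
bytes = fst $ decode "49276d2061206865782d656e636f6465642042797465537472696e6720286f722077617329"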
main = print bytes
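Encoding works in the other direction, and both packages export an encode function of type ByteString -> ByteString. The following sketch encodes a few arbitrary bytes as hex and base64, then checks that base64 decoding gives back the original bytes. The byte values are made up for illustration:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.ByteString.Char8 ()
import qualified Data.ByteString as B
import qualified Data.ByteString.Base16 as B16
import qualified Data.ByteString.Base64 as B64

-- Four arbitrary bytes to encode.
raw :: B.ByteString
raw = B.pack [0xde, 0xad, 0xbe, 0xef]

main :: IO ()
main = do
    -- Hex: two lowercase hex digits per byte.
    print (B16.encode raw)     -- "deadbeef"
    -- Base64 of the ASCII bytes of "hello".
    print (B64.encode "hello") -- "aGVsbG8="
    -- Decoding what we just encoded gives back the original bytes.
    print (B64.decode (B64.encode raw) == Right raw)  -- True
```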
One-Time Pad
If you want to XOR one ByteString against another, to implement one-time pad encryption for example, you can use B.zipWith. It returns a list rather than a ByteString, so we pack the result back up with B.pack:
{-# LANGUAGE OverloadedStrings #-}
import Data.ByteString.Char8 ()
import qualified Data.ByteString as B
import Data.ByteString.Base64 (decode)
import Data.Bits (xor)
Right key = decode "kTSFoLQRrR+hWJlLjAwXqOH5Z3ZLDWray5mBgNK7lLuHdTwab8m/v96y"
encrypt = B.pack . B.zipWith xor key
decrypt = encrypt
main = do
    let encrypted = encrypt "I'm a secret message"
    print encrypted
    print $ decrypt encrypted
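One caveat: B.zipWith stops at the end of the shorter input, so a message longer than the key would be silently truncated. A one-time pad is also only secure if the key is at least as long as the message and is never reused. The following sketch shows a stricter variant that refuses a key that is too short. The name otpEncrypt is hypothetical, and the key is a fixed demo value, not a securely generated one:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.ByteString.Char8 ()
import qualified Data.ByteString as B
import Data.Bits (xor)

-- Hypothetical helper: fail instead of silently dropping the tail of the message.
otpEncrypt :: B.ByteString -> B.ByteString -> Maybe B.ByteString
otpEncrypt key msg
    | B.length key < B.length msg = Nothing
    | otherwise = Just (B.pack (B.zipWith xor key msg))

main :: IO ()
main = do
    let key = "0123456789abcdef" :: B.ByteString  -- demo key only, not random
    print (otpEncrypt key "short msg")
    print (otpEncrypt key "this message is too long for the key")  -- Nothing
```

As with encrypt above, applying otpEncrypt a second time with the same key decrypts, because XOR is its own inverse.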
That's about it. You can view the full bytestring package documentation on Hackage to see what other functions are available.