ByteString Bits and Pieces


ByteString Basics

ByteString is a more efficient alternative to Haskell's built-in String. It can be used to store 8-bit character strings and also to handle binary data. The library provides its own versions of functions such as readFile, along with equivalents of the standard list manipulation functions:

{-# START_FILE Main.hs #-}
import qualified Data.ByteString as B

main = do
    contents <- B.readFile "foo.txt"
    print $ B.reverse contents
{-# START_FILE foo.txt #-}

... em esreveR
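
To give a feel for the list-like API, here is a minimal sketch using a few functions that mirror their Prelude namesakes. It works on an in-memory value rather than on foo.txt so that it runs on its own; the name sample is only an illustration.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC

-- Hypothetical sample data standing in for the contents of a file.
sample :: B.ByteString
sample = BC.pack "Reverse me ..."

main :: IO ()
main = do
    print $ B.length sample             -- number of bytes
    print $ B.take 7 sample             -- the first seven bytes
    print $ B.reverse sample            -- bytes in reverse order
    print $ B.null (B.drop 100 sample)  -- dropping past the end leaves an empty ByteString
```

Because these names clash with the Prelude versions, the qualified import is what keeps the two apart.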

Characters or bytes?

Depending on the context, we may prefer to view the ByteString as being made up of a list of elements of type Char or of Word8 (Haskell's standard representation of a byte). There's only one ByteString data structure for both, but the library exposes different functions depending on how we want to interpret the contents:

import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC

bytestring = BC.pack "I'm a ByteString, not a [Char]"

bytes = B.unpack bytestring
chars = BC.unpack bytestring

main = do
    BC.putStrLn bytestring
    print $ head bytes
    print $ head chars

Here we've used the pack function to convert a String into a ByteString and then used two different unpack functions to get back both a list of Chars (the original String) and a list of Word8s. Data.ByteString provides the Word8 functions while Data.ByteString.Char8 provides the Char equivalents.

Of course we don't need to unpack the ByteString to a list to get the first element. We can just use the head functions provided by the library itself:

import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC

bytestring = BC.pack "I'm a ByteString, not a [Char]"

main = do
    BC.putStrLn bytestring
    print $ B.head bytestring
    print $ BC.head bytestring
    

ByteStrings and Unicode

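ByteString's character functions only work with 8-bit characters (ASCII and Latin-1), hence the Char8 in the module name. If you try to use Unicode Strings containing characters outside that range, each character is truncated to 8 bits and the text is corrupted: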

import qualified Data.ByteString.Char8 as BC

hello      = "你好"
helloBytes = BC.pack hello

main = do
    putStrLn hello
    BC.putStrLn helloBytes
    print $ BC.length helloBytes

If you are working with Unicode, you should use the text package (Data.Text) instead.
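
When Unicode text does need to end up as bytes, for example to write it to a file or socket, one option is to encode it explicitly. The following is a minimal sketch that assumes the text package is installed and that UTF-8 is the encoding you want; it uses encodeUtf8 and decodeUtf8 from Data.Text.Encoding.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

hello :: T.Text
hello = T.pack "你好"

-- Encode the text as UTF-8 to get the raw bytes.
helloBytes :: B.ByteString
helloBytes = TE.encodeUtf8 hello

main :: IO ()
main = do
    print $ T.length hello                    -- counts characters
    print $ B.length helloBytes               -- counts bytes in the UTF-8 encoding
    print $ TE.decodeUtf8 helloBytes == hello -- decoding gives back the original text
```

Note that decodeUtf8 throws an exception on invalid input, so for untrusted data decodeUtf8' (which returns an Either) is the safer choice.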

Lazy ByteStrings

ByteString also has a lazy version, which is a better choice if you are processing large amounts of data and don't want to read it all into memory at once. Just import Data.ByteString.Lazy instead of Data.ByteString. Sometimes you will find libraries which use one type when you are using the other. For example, Aeson uses lazy ByteStrings, but you may only be dealing with small JSON snippets and want to write your own code using the strict version. You can convert between them easily enough if you have to:

import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy.Char8 as BLC

strict = BC.pack "I'm a strict ByteString (or am I)"
lazy = BLC.pack "I'm a lazy ByteString (or am I)"

strictToLazy = BL.fromChunks [strict]
lazyToStrict = B.concat $ BL.toChunks lazy

main = do
    BLC.putStrLn strictToLazy
    BC.putStrLn lazyToStrict

Newer versions of the library have toStrict and fromStrict functions in the Data.ByteString.Lazy module which you can use instead.
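
As a sketch, the conversions above could be rewritten with those two functions, assuming a version of bytestring recent enough to export them:

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy.Char8 as BLC

strict :: B.ByteString
strict = BC.pack "I'm a strict ByteString (or am I)"

lazy :: BL.ByteString
lazy = BLC.pack "I'm a lazy ByteString (or am I)"

-- Wrap the strict value as a lazy ByteString.
strictToLazy :: BL.ByteString
strictToLazy = BL.fromStrict strict

-- Copy every chunk of the lazy value into one strict ByteString.
lazyToStrict :: B.ByteString
lazyToStrict = BL.toStrict lazy

main :: IO ()
main = do
    BLC.putStrLn strictToLazy
    BC.putStrLn lazyToStrict
```

Keep in mind that toStrict has to bring the whole lazy ByteString into memory, so it gives up the benefit of laziness for large inputs.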

The OverloadedStrings Language Extension

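When you enter a string literal, Haskell will normally assume it is of type String ([Char]). This useful language extension allows string literals to be interpreted as ByteStrings. With older versions of the library this requires importing Data.ByteString.Char8, which is where the IsString instance was defined; more recent versions also make the instance available from Data.ByteString, so the extra import is harmless: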

{-# LANGUAGE OverloadedStrings #-}

import Data.ByteString.Char8 ()
import qualified Data.ByteString as B

bytes = "I'm a ByteString, not a [Char]" :: B.ByteString

str   = "I'm just an ordinary [Char]"    :: String

main = do
  print bytes
  print str

As you can see here, we might have to add explicit types in some cases to let Haskell know which kind of string we want. In GHCi, you can get the same behaviour by entering :set -XOverloadedStrings at the prompt, or by starting it using:

ghci -XOverloadedStrings

ByteString binary data

Manipulating binary data is easy with ByteString. In fact, these notes are really a collection of bits and pieces I picked up while doing the exercises for Coursera's Cryptography I, when I had to use ByteString for the first time.

Hex and Base64 Encoding

Binary data is often encoded as hex or base64 to provide an ASCII text representation, so we need an easy way of decoding these to a ByteString containing the bare bytes. This is exactly what the base16-bytestring and base64-bytestring packages were written for.

Here's an example for base64:

{-# LANGUAGE OverloadedStrings #-}

import Data.ByteString.Char8 ()
import qualified Data.ByteString as B

import Data.ByteString.Base64 (encode, decode)

Right bytes = decode "SSdtIGEgYmFzZTY0IGVuY29kZWQgQnl0ZVN0cmluZw=="

main = print bytes

And one for a hex-encoded string:

{-# LANGUAGE OverloadedStrings #-}

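import Data.ByteString.Char8 ()
import Data.ByteString.Base16 (decode)

-- With base16-bytestring before 1.0, decode returns a pair: the decoded
-- bytes and any remaining input that could not be decoded. From 1.0
-- onwards it returns an Either, like the base64 decode, so the binding
-- would need a Right pattern instead of fst.
bytes = fst $ decode "49276d2061206865782d656e636f6465642042797465537472696e6720286f722077617329"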

main = print bytes

At the time of writing, base16-bytestring wasn't available in Stackage, so this example couldn't be provided as active code.
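
If you only need to go in the other direction, from bytes to hex, one alternative that stays within the bytestring package is Data.ByteString.Builder. This is a minimal sketch rather than a replacement for base16-bytestring: toHex is a hypothetical helper name, and it only encodes, it does not decode.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as BL
import Data.ByteString.Builder (byteStringHex, toLazyByteString)

-- toHex is a hypothetical helper, not part of any library.
-- byteStringHex writes each byte as two lowercase hex digits.
toHex :: B.ByteString -> B.ByteString
toHex = BL.toStrict . toLazyByteString . byteStringHex

main :: IO ()
main = BC.putStrLn $ toHex (BC.pack "I'm a hex-encoded ByteString")
```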

One-Time Pad

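If you want to XOR one ByteString against another, to implement one-time pad encryption for example, you can use zipWith. The zipWith in Data.ByteString returns a list of Word8s rather than a ByteString, so the result has to be packed again. Like the list version, it stops at the end of the shorter input, so the key must be at least as long as the message: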

{-# LANGUAGE OverloadedStrings #-}

import Data.ByteString.Char8 ()
import qualified Data.ByteString as B
import Data.ByteString.Base64 (decode)
import Data.Bits (xor)

Right key = decode "kTSFoLQRrR+hWJlLjAwXqOH5Z3ZLDWray5mBgNK7lLuHdTwab8m/v96y"

encrypt = B.pack . B.zipWith xor key
decrypt = encrypt

main = do
    let encrypted = encrypt "I'm a secret message"
    print encrypted
    print $ decrypt encrypted
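
Because zipWith stops at the end of the shorter input, a key that is too short would silently truncate the message. The following sketch shows one way to guard against that. The helper name xorWithKey and the demo key are made up for illustration, and it depends only on the bytestring package.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.Bits (xor)

-- xorWithKey is a hypothetical helper: it refuses keys shorter than the
-- message instead of letting zipWith drop the tail of the message.
xorWithKey :: B.ByteString -> B.ByteString -> Maybe B.ByteString
xorWithKey key msg
    | B.length key < B.length msg = Nothing
    | otherwise                   = Just . B.pack $ B.zipWith xor key msg

main :: IO ()
main = do
    let key = B.pack [0x91, 0x34, 0x85, 0xa0, 0xb4]  -- made-up demo key, not random
        msg = BC.pack "hello"
    print $ xorWithKey key msg                        -- encrypted bytes
    print $ xorWithKey key msg >>= xorWithKey key     -- XOR twice to get the message back
    print $ xorWithKey key (BC.pack "too long for the key")
```

Returning a Maybe keeps the failure visible to the caller; a real one-time pad would also need a truly random key that is never reused.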

That's about it. You can view the full package documentation to see what other functions are available.
