Moore for Less

I was playing around while stuck on a plane this morning, and realized a few things that had previously escaped me about Moore machines.

In previous posts, I've talked about the notion of an (infinite) Moore machine.

Here we have a machine where each state of a Moore a b has a label b, and given an input a we transition along an edge to a new state. Unlike a traditional Moore machine, we may well have an infinite number of states, which removes all those pesky limitations on what you can recognize with such a machine.

data Moore a b = Moore b (a -> Moore a b)

In this form it can be seen to be a Cofree comonad.

data Cofree f a = Cofree a (f (Cofree f a))

We can see that Moore a b is isomorphic to Cofree ((->) a) b.
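
We can witness the isomorphism directly (a quick sketch against the two definitions above; the helper names are hypothetical):

toCofree :: Moore a b -> Cofree ((->) a) b
toCofree (Moore b k) = Cofree b (toCofree . k)

fromCofree :: Cofree ((->) a) b -> Moore a b
fromCofree (Cofree b k) = Moore b (fromCofree . k)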

Soul of a Nu Machine

An equivalent definition of a Moore machine that I've covered before is to switch to an explicit 'state' type.

We can derive that definition from the more direct definition above by one of several different means.

Probably the most straightforward way to do so is to exploit the fact that Cofree f a = Fix (Compose ((,) a) f), where Fix f is the greatest fixed point of f, and look at the two different ways to encode the greatest fixed point in Haskell.

The usual definition of Fix is given by:

data Fix f = In { out :: f (Fix f) }

This definition exploits the fact that in Haskell the least and greatest fixed points coincide, but we can also use the direct definition of the greatest fixed point, which we get when we write down the type of an anamorphism, the recursion scheme we use to build a member of the greatest fixed point of f:

ana :: Functor f => (s -> f s) -> s -> Nu f

and just take its type signature as the definition of Nu.

data Nu f where
  Nu :: (s -> f s) -> s -> Nu f

ana = Nu 

Nu represents the greatest fixed point in a much more portable way than Fix. Fix only really works as a greatest fixed point due to laziness in Haskell; in a strict language these two definitions are not equivalent.
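
In Haskell the two convert freely (a sketch; toNu and fromNu are hypothetical names):

toNu :: Fix f -> Nu f
toNu = Nu out

fromNu :: Functor f => Nu f -> Fix f
fromNu (Nu h s) = In (fromNu . Nu h <$> h s)

In a strict language toNu still works, but fromNu would diverge on any infinite structure.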

Substituting Nu into Cofree instead of Fix we get:

Cofree f b =
Nu (Compose ((,) b) f) =
∃s. (s -> Compose ((,) b) f s, s) =
∃s. (s -> (b, f s), s) =
∃s. (s -> b, s -> f s, s)

In the particular case of a Moore machine:

Moore a b = 
Cofree ((->) a) b =
∃s. (s -> b, s -> a -> s, s)

This corresponds to the data type:

data Moore a b where
  Moore :: (s -> b) -> (s -> a -> s) -> s -> Moore a b

I've talked about this data type before, as has Gabriel Gonzalez. I have it packaged up in my folds package, and he has a version of it in his foldl package.
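
To connect this back to the first definition, we can unfold the explicit state (a sketch; DirectMoore stands in for the earlier coinductive Moore, renamed here to avoid the clash):

data DirectMoore a b = DirectMoore b (a -> DirectMoore a b)

unfold :: Moore a b -> DirectMoore a b
unfold (Moore f t s) = go s where
  go s' = DirectMoore (f s') (go . t s')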

Distributive Functors, Represent!

Now for something new: In category theory a representable functor f is a functor for which there exists an object x such that we can equip our functor with a natural isomorphism between f a and (x -> a). It describes all of the arrows out of some object x.

I have long had this definition split across two packages.

We first have the "Haskell 98" distributive package, which provides

class Functor g => Distributive g where
  distribute :: Functor f => f (g a) -> g (f a)
  collect :: Functor f => (a -> g b) -> f a -> g (f b)

Distributive is effectively a co-Traversable. Since the package is Haskell 98 / 2010, it can't supply us with x, though for every Distributive functor such an x should exist. In the adjunctions package, you can get your hands on it with:

class Distributive f => Representable f where
  type Rep f
  tabulate :: (Rep f -> a) -> f a
  index :: f a -> Rep f -> a

Here, x = Rep f, and tabulate and index are inverses.
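
To make that concrete, here is a small instance that isn't just a function (Pair is a hypothetical two-slot functor, not part of either package, and the deriving clause needs DeriveFunctor):

data Pair a = Pair a a deriving Functor

instance Distributive Pair where
  distribute fp = Pair ((\(Pair x _) -> x) <$> fp) ((\(Pair _ y) -> y) <$> fp)

instance Representable Pair where
  type Rep Pair = Bool
  tabulate f = Pair (f False) (f True)
  index (Pair x _) False = x
  index (Pair _ y) True  = y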

Cofree Anyone?

There is an obvious representation for ((->) x), as (x -> a) is clearly isomorphic to (x -> a)!

instance Representable ((->) x) where
  type Rep ((->) x) = x
  tabulate = id
  index = id

With this we can just randomly embellish the definition of a cofree comonad by explicitly choosing s to be the representation of some representable functor, and just not telling you what it is.

data Cofree f a where
  Cofree :: Representable k => k a -> k (f (Rep k)) -> Rep k -> Cofree f a

instance Functor (Cofree f) where
  fmap f (Cofree k u s) = Cofree (fmap f k) u s

instance Comonad (Cofree f) where
  extract (Cofree k _ s) = index k s
  duplicate (Cofree k u s) = Cofree (tabulate (Cofree k u)) u s

Representable Machines

Now we're equipped to play with a similarly modified definition of a Moore machine.

data Moore a b where
  Moore :: Representable k => k b -> k (a -> Rep k) -> Rep k -> Moore a b

Above and beyond what we can say about Cofree in general, we know something else in the Moore case, namely that both our unknown representable functor k and (->) a are Distributive, so we can freely interchange them in the definition above. This yields the following definition, in terms of which all subsequent instances are defined:

data Moore a b where
  Moore :: Representable k => k b -> (a -> k (Rep k)) -> Rep k -> Moore a b
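
The interchange itself is just distribute in both directions (a small witness; the names are hypothetical):

swapOut :: Representable k => k (a -> Rep k) -> a -> k (Rep k)
swapOut = distribute

swapIn :: Representable k => (a -> k (Rep k)) -> k (a -> Rep k)
swapIn = distribute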

Since the u argument isn't used at all in the definition of the Comonad for Cofree f and is just silently passed along, these instances for Moore a work either way.

instance Functor (Moore a) where
  fmap f (Moore k u b) = Moore (fmap f k) u b

instance Comonad (Moore a) where
  extract (Moore k _ s)   = index k s
  duplicate (Moore k u s) = Moore (tabulate (Moore k u)) u s
  extend f (Moore k u s)  = Moore (tabulate (f . Moore k u)) u s

Due to the Representable ((->) x) instance no power is lost, but now some choices of k might be suitable for memoization, acting as a trie to hold onto the results rather than recomputing them each time they are asked.
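
For example, with the hypothetical Pair functor from earlier serving as a two-state trie, here is a machine whose label reports whether we've seen an odd number of 'x' characters:

-- labels, indexed by state: the state False is labeled False, True is labeled True
-- transitions: an 'x' flips the state, anything else leaves it alone
parity :: Moore Char Bool
parity = Moore (Pair False True) go False where
  go 'x' = Pair True False
  go _   = Pair False True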

Doing Two Things at Once

While I've put the Comonad for Moore to good use in previous posts, much of the original motivation for using Moore was to give us the ability to describe how to fuse together multiple passes over the data.

To derive the Applicative for our new machine we'll need suitable functors to use to memoize all of our states. Rather than using functions, let's look for some things that are better behaved.

The next simplest instance after the naïve function instance above is:

instance Representable Identity where
  type Rep Identity = ()
  tabulate f = Identity (f ())
  index (Identity a) () = a

Next, given two representable functors, their composition is also representable:

instance (Representable f, Representable g) => Representable (Compose f g) where
  type Rep (Compose f g) = (Rep f, Rep g)
  index (Compose fg) (i,j) = index (index fg i) j
  tabulate = Compose . tabulate . fmap tabulate . curry

So let's put these instances to work:

instance Applicative (Moore a) where
  pure a = Moore (Identity a) (\_ -> Identity ()) ()
  Moore kf uf sf <*> Moore ka ua sa =
    Moore (Compose ((<$> ka) <$> kf))
          (\x -> Compose $ (\y -> (,) y <$> ua x) <$> uf x)
          (sf, sa)

instance ComonadApply (Moore a) where
  (<@>) = (<*>)

This is just the definition we used to use for the foldl-style Moore machine, but now instead of using functions from our state, we just use representable functors that have our states as their representations.
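
As a tiny usage sketch, this is what lets us fuse two passes into one (liftA2 comes from Control.Applicative):

both :: Moore a b -> Moore a c -> Moore a (b, c)
both = liftA2 (,)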

Finally, as a small but useful aside, a Moore machine is a Profunctor, so we can map contravariantly over the inputs as well as covariantly over the results.

instance Profunctor Moore where
  dimap f g (Moore k u s) = Moore (g <$> k) (u . f) s

Stepping Lightly

So how do we run the machine?

We can feed our machine in one of two ways. We can define

step1 :: a -> Moore a b -> b
step1 a (Moore k u s) = index k (index (u a) s)

and rely on the ability to extend (step1 a) to change out all of the labels on our machine. This unfortunately builds up and tears down a whole representable functor worth of data.

On the other hand, we can simply move the start state and get a whole new machine:

step :: a -> Moore a b -> Moore a b
step a (Moore k u s) = Moore k u (index (u a) s)

The choice between step a and extend (step1 a) is indistinguishable to the outside observer except in terms of performance. While step isn't a Cokleisli arrow, it is much faster.
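
That claim can be stated as a QuickCheck-style property (a sketch only, since nothing here gives Moore an Arbitrary instance):

observably :: Eq b => a -> Moore a b -> Bool
observably a m = extract (step a m) == step1 a m
              && step1 a m == extract (extend (step1 a) m)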

Running a Tab

That said, we don't have to run the machine one step at a time!

Dan Piponi once wrote an article on recognizing a regular language with a monoid. What he did was build a data type to represent the tabulation of transitions in his DFA.

The a -> f (Rep f) in the body of our Moore machine suggests what such a tabulation might look like, generically. It takes a to a structure that contains references to all the new states.

newtype Tab f = Tab { getTab :: f (Rep f) }

instance Representable f => Monoid (Tab f) where
  mempty = Tab $ tabulate id
  mappend (Tab fs) (Tab gs) = Tab (index gs <$> fs)

You can view this as a form of Endo (Rep f) that happens to be able to memoize the results of each argument to the function if f is sufficiently "nice".
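
With f = ((->) s), Tab really is an endomorphism on s, though note that the Monoid above composes in the opposite order from Endo (toEndo is a hypothetical name; Endo comes from Data.Monoid):

-- Tab fs <> Tab gs runs fs first, then gs; Endo composes the other way around
toEndo :: Tab ((->) s) -> Endo s
toEndo (Tab f) = Endo f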

We can now feed our machine a whole Foldable container at a time.

feed :: Foldable f => f a -> Moore a b -> Moore a b
feed as (Moore k u s) = Moore k u (index (getTab $ foldMap (Tab . u) as) s)
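
For example, with the hypothetical parity machine from earlier:

>>> extract (feed "xyxx" parity)
True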

Compressive Parsing

The next trick is finding the right container type to fold over to make use of the memoized internal states.

For this, I'll turn to an old package of mine, compressed, which supplies a rather peculiar LZ78 container type. LZ78 is a compression scheme by Lempel and Ziv that has a number of nice properties for my purposes.

data Token a = Token {-# UNPACK #-} !Int a deriving (Eq, Ord)
data LZ78 a
  = Cons {-# UNPACK #-} !(Token a) (LZ78 a)
  | Nil

The idea is this:

1.) You start with a dictionary that maps integers to lists of values and which contains a single entry mapping 0 to the empty string.

2.) Now you receive (or generate) a series of (Int, value) pairs, where each Int represents an existing slot in the dictionary, and the value represents something you want to snoc onto the end of it to make a fresh dictionary entry.

More advanced versions of this scheme collect old entries, but we can define a particularly naive LZ78 encoder / decoder very easily.

We can encode with a variety of different constraints and running times, depending on how we represent the dictionary during construction.

In O(n²) time we can construct an LZ78 stream using a list internally, requiring no more than an Eq constraint on a.

encodeEq :: Eq a => [a] -> LZ78 a
encodeEq = go [] 1 0 where
  go _ _ _ [] = Nil
  go _ _ p [c] = Cons (Token p c) Nil
  go d f p (c:cs) = let t = Token p c in case List.lookup t d of
    Just p' -> go d f p' cs
    Nothing -> Cons t (go ((t, f):d) (succ f) 0 cs)

With a Map our time upgrades to O(n log n) with an Ord constraint.

encodeOrd :: Ord a => [a] -> LZ78 a
encodeOrd = go Map.empty 1 0 where
  go _ _ _ [] = Nil
  go _ _ p [c] = Cons (Token p c) Nil
  go d f p (c:cs) = let t = Token p c in case Map.lookup t d of
    Just p' -> go d f p' cs
    Nothing -> Cons t (go (Map.insert t f d) (succ f) 0 cs)

We can also turn to a HashMap if we have Hashable inputs.

encode :: (Hashable a, Eq a) => [a] -> LZ78 a
encode = go HashMap.empty 1 0 where
  go _ _ _ [] = Nil
  go _ _ p [c] = Cons (Token p c) Nil
  go d f p (c:cs) = let t = Token p c in case HashMap.lookup t d of
    Just p' -> go d f p' cs
    Nothing -> Cons t (go (HashMap.insert t f d) (succ f) 0 cs)

But regardless of how it was constructed, we can decode with Foldable.

instance Foldable LZ78 where
  foldMap f = go (Seq.singleton mempty) mempty where
    go _ m Nil = m
    go s m (Cons (Token w c) ws) = m `mappend` go (s |> v) v ws where
      v = Seq.index s w `mappend` f c

The key here is that the decompression scheme never actually looks at the values it decodes, so it is possible to decompress directly in any target Monoid you want.

When you do so you'll gain some sharing of intermediate values.
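
In particular, decoding back to an ordinary list is just toList from Data.Foldable, and we should have decode (encode xs) == xs:

decode :: LZ78 a -> [a]
decode = toList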

Other compression schemes may also be useful depending on your application.

The Story So Far

We can open up such a machine and borrow its internal type of tabulations to generate anything that can be generated by such a machine in parallel or incrementally.

It is possible to run a Representable Moore machine directly on compressed inputs and pay proportionally to the size of the compressed data, not the decompressed data.

Representability and Adjunctions

In category theory we have the notion of an adjunction.

Given two functors F : D -> C and G : C -> D, when F a -> b is naturally isomorphic to a -> G b we describe this situation in one of several equivalent ways: we say that F -| G, that G is right adjoint to F, or that F is left adjoint to G, and if the categories matter, sometimes we'll write F -| G :: C -> D.

Different authors have slightly different conventions on the latter and may give the signature for F instead of G.

  • The left (or right) adjoint of a functor is unique up to isomorphism if it exists at all.

  • All adjunctions F -| G :: C -> Hask have the property that G is representable and that F () represents G.

  • Since adjoints are unique up to isomorphism, the fact that the right adjoint is isomorphic to (F () -> a) lets us go back across the (,) (F ()) -| (->) (F ()) adjunction to see that F a is isomorphic to (,) (F ()) a. In other words, F contains exactly one a. On top of that, every left adjoint F :: Hask -> Hask looks like F = (,) x for some x.

  • Representations of representable functors are unique up to isomorphism.

With all of these constraints, if we write down a class describing adjunctions from Hask -> Hask it is rather poorly inhabited! All instances of this class are isomorphic (for some s) to the canonical (,) s -| (->) s adjunction that gives rise to the State monad and Store comonad!

class (Functor f, Representable g) => f -| g where
  leftAdjunct :: (f a -> b) -> a -> g b
  rightAdjunct :: (a -> g b) -> f a -> b
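
Here is a sketch of the one canonical instance (modulo the MultiParamTypeClasses, TypeOperators, and FlexibleInstances extensions the class already requires):

instance (,) s -| (->) s where
  leftAdjunct f a s = f (s, a)
  rightAdjunct f (s, a) = f a s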

First, consider that if f and g are Representable, then Product f g is isomorphic to (->) (Either (Rep f) (Rep g)). We can also look at a couple of our recently explored instances:

  • Identity is isomorphic to (->) ()

  • If f and g are Representable then Compose f g is isomorphic to (->) (Rep f, Rep g)

If you squint at these a bit the representation looks a lot like the "logarithm" of the data type in question. Exponents become products, products become sums, etc. Conor McBride is fond of calling representable functors "Napierian" and using Log instead of Rep.

Since every adjunction gives us a representable functor (the right adjoint), and every representable functor is the right adjoint in some adjunction, and all of this stuff is the same up to isomorphism, we could rephrase everything we just wrote in terms of an adjunction from Hask -> Hask.

We get a couple of options (all of equivalent expressive power) for how to arrange things:

Moore :: (f -| g) => g b -> f (a -> g (f ())) -> Moore a b
Moore :: (f -| g) => g b -> f (g (a -> f ())) -> Moore a b
Moore :: (f -| g) => f (g b) -> (a -> g (f ())) -> Moore a b

I think the last one of those is the most interesting. When f -| g, then f·g is a comonad and g·f is a monad. Here our Moore machine is pairing up some comonad with a function that gives a monadic action that is intimately related to that comonad.

Since all adjunctions from Hask -> Hask look like the representable cases we've already explored we haven't yet gained anything from this, but we could work with adjunctions that visit other categories than Hask.

I'm My Own Grandpa

There is a nice adjunction from Hask -> Hask^op that we use in Haskell a great deal.

Using backwards arrows to denote arrows in Hask^op, any such adjunction would look like a statement that f a <- b is isomorphic to a -> g b. f and g here are contravariant functors, and if we turn this around we get that b -> f a is isomorphic to a -> g b.

There is such an adjunction:

(-> r) -| (-> r)

When composed in one direction we get a monad in Hask, when composed the other way around we get a comonad in Hask^op. This is why Cont r has no comonadic equivalent. It is its own inverse.

This requires that (b -> a -> r) be isomorphic to (a -> b -> r), which is witnessed by flip in both directions.

Less Like Moore

Now, we can mechanically massage that last definition of a Moore machine to go round trip through Hask^op.

Wat :: (b -> r -> r) -> a -> (() -> r) -> r

The () -> in there adds no value, so we can apply it to () to get more or less Rich Hickey's notion of a transducer.

type Transducer a b = forall r. (b -> r -> r) -> (a -> r -> r)

Hickeys's transducers derive from the type signature of foldl:

>>> :t foldl
foldl :: (b -> a -> b) -> b -> [a] -> b
>>> :t foldl.foldl
foldl.foldl :: (b -> a -> b) -> b -> [[a]] -> b
...

and you can convert from that form to this form with a couple of judiciously placed flips.
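
That conversion might look like this (fromReducer is a hypothetical name; the argument has to stay polymorphic in r, so this needs RankNTypes):

fromReducer :: (forall r. (r -> b -> r) -> r -> a -> r) -> Transducer a b
fromReducer t k a r = t (flip k) r a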

So in this sense a Transducer is a "generalized Moore machine". The generalization here is powerful enough to allow the transducer to emit multiple bs per a.
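
For example, here is a transducer that emits each input twice, something no single Moore transition can do (a hypothetical example):

dup :: Transducer a a
dup k a r = k a (k a r)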

Representing Transducers

But now we have functions to and from some arbitrary r in our Transducer definition, so we can replay the same motivating trick and exploit representations on that definition as well.

type Transducer a b = forall f. Representable f => (b -> f (Rep f)) -> a -> f (Rep f)

which is equivalent to

type Transducer a b = forall f. Representable f => (b -> Tab f) -> a -> Tab f

In fact we can apply such a transducer to a Moore machine to get one that turns each a into potentially several bs, and makes that whole chain of transitions in the state diagram at once.

transduce :: Moore b c -> Transducer a b -> Moore a c
transduce (Moore k u s) t = Moore k (getTab #. t (Tab #. u)) s

(Here #. is the coercion-friendly composition operator from Data.Profunctor.Unsafe; ordinary (.) would work just as well, since Tab is a newtype.)

Finally, if you can work with all tabulations of functions, nothing stops you from working with any monoid m, since Tab f is really just a memoized form of Endo (Rep f), so you might as well just go to:

type Transducer a b = forall m. Monoid m => (b -> m) -> a -> m

but this is just what lens calls a Fold:

type Fold a b = forall f. (Contravariant f, Applicative f) => (b -> f b) -> a -> f a

If you are (legally) both Contravariant and Functor then your argument must be phantom, and using contramap and fmap you can freely change it to anything you want, so this is isomorphic to the last definition.
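
That change of argument is the phantom combinator from Data.Functor.Contravariant, which can be written as:

phantom :: (Functor f, Contravariant f) => f a -> f b
phantom = contramap (const ()) . fmap (const ())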

Open Thoughts

Similar changes can be applied to a coiterative comonad generated by a comonad, which looks very similar to the Mealy machine even if it has wildly different semantics. But given that coincidence, what does such a Comonad mean for a Mealy machine that has a Monoid on its input type? How would such a machine have to work? What does it do?

Just like we ultimately massaged the transducer into a form where it was obvious we could make the same machinery work for any Monoid once it supported tabulated endomorphisms, can we find a series of direct generalizations that take us from a Moore machine to one that uses an intermediate Monoid? Either like the M machine in folds, or using a Monoid action to update the state instead of accepting just any new state.

We built a tabulation of a deterministic not-necessarily-finite automaton. What about a non-deterministic automaton? For that we can make a set of representations using the trie we get for a representable functor f.

newtype Set f = Set { getSet :: f Bool }

instance Representable f => Monoid (Set f) where
  mempty = Set $ tabulate (const False)
  mappend (Set as) (Set bs) = Set $ tabulate $ \i -> index as i || index bs i
  
singleton :: (Representable f, Eq (Rep f)) => Rep f -> Set f
singleton i = Set $ tabulate (i==)

insert :: (Representable f, Eq (Rep f)) => Rep f -> Set f -> Set f
insert i (Set is) = Set $ tabulate $ \j -> index is j || (i==j)

contains :: Set f -> Rep f -> Bool
contains (Set is) i = index is i 

but it'd probably be better to use a real "Set" in practice for most applications. You need something like Foldable f as well as Representable in order to make the Monoid for:

newtype N f = N { getN :: f (Set f) }
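
As a hedged sketch, here is one way that Monoid might go, reusing Set and singleton from above plus liftR2 from Data.Functor.Rep, and adding Eq (Rep f) so that mempty can be the identity relation:

instance (Foldable f, Representable f, Eq (Rep f)) => Monoid (N f) where
  -- the identity relation: each state steps to exactly itself
  mempty = N (tabulate singleton)
  -- follow the first table, then union the second table over everything reached
  mappend (N fs) (N gs) = N (follow <$> fs) where
    follow (Set bs) = fold (liftR2 (\b s -> if b then s else mempty) bs gs)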

This would start to limit the domain to at-most-countably-infinite automata.

Given that, can we define a nice compiler that takes a regular expression, builds an NFA replete with the appropriate functor as it goes, and then converts it to a DFA?

The compressive parsing technique provided by LZ78 above works for lots of monoids, not just this one.

For instance, we can modify the code in Efficient Parallel and Incremental Parsing of Practical Context-Free Languages to work with a Monoid rather than the notion of a sequence algebra they use there. (A sequence algebra can be converted to a Monoid by using a finger tree to peel off one symbol of work on one side.)

This would let us parse context-free languages using this same machinery.

The main body of code here is available in this gist.

(Also, the code above probably needs a couple of tweaks. Notably, it should use a strict pair for the internal state.)

-Edward Kmett May 28, 2015
