May 122013
 

Problem 1: The source of exceptions is obscured

main = getArgs >>= readFile . head >>= print . length

Even though length is a pure function, this is where the I/O will happen (lazily), which means that is where any exceptions relating to I/O will get raised. Pure code should avoid raising exceptions, which this example violates.

Problem 2: Sharing may cause file contents to remain in memory

main = getArgs >>= readFile . head >>= print . (length . words &&& length)

Because of the way that lazy I/O reads in strings, this line of code will cause the entire contents of the file to be loaded into memory by the call to either length or words, and then it will stay in memory to be handled by the other call to length. You would expect it to process the input at the very least one line at a time, to avoid exhausting memory on very large files.

NOTE: It has been pointed out that this is not really a problem with Lazy I/O, but with laziness in general. The only real way, then, in which an iteratee-type library helps here is that it’s more typical to connect sources and sinks directly together, than to read all the data from a source at one time, and then hand it to two sinks that way. So the problem there is not solved either, it’s just less common to the idiom.

Problem 3: File handles are not closed when you might expect

main = getArgs >>= mapM_ (readFile >=> print . length)

If getArgs returns N files, Haskell will open N file handles, rather than one at a time as you might expect, meaning that running this in a very large directory may exhaust system resources.

Conclusion: Use conduit/pipes/io-streams library to avoid surprises

Lazy I/O is great for prototypical simple examples, but for serious code these problems can be hard to track down — and are eliminated by a library such as conduit.

 Posted by at 7:44 pm
May 122013
 

I think one reason I’ve been avoiding posting to my blog lately is the time commitment of writing something of decent length. To get over this hump, I’m going to shift my focus to writing smaller little discoveries of things I find during my researches into Haskell and technology. Let’s see how that goes.

 Posted by at 7:43 pm
Nov 212012
 

The following is the first in a series of articles I hope to write as a gentle introduction to Edward Kmett’s excellent lens library.

Control.Lens provides a composable way to access and modify sub-parts of data structures (where by modify I mean: return a new copy with that part changed). In this introduction I won’t be talking about the theory or laws behind Lens — both of which are worthy of study — but rather how you can use them to write simpler, more expressive code.

The mighty tuple

You’ve probably used tuples quite often by now, and may have discovered that (,) e is a Functor, letting you do things like:

ghci> fmap (+1) (1,2)
  (1,3)

After which a very common question is: "How do I do that for the first element?" One way is with the Arrow library:

ghci> first (+1) (1,2)
  (2,2)

But this method does not generalize for n-tuples. What about the 3rd element of a 3-tuple? Enter lenses. Here’s the lens equivalent of fmap and first:

ghci> over _1 (+1) (1,2)
  (2,2)
ghci> over _2 (+1) (1,2)
  (1,3)
ghci> over _3 (+1) (1,2,3)
  (1,2,4)

In this example, _1 is a lens, which means it represents both a getter and setter focused on a single element of a data structure. In this case we are using it in both senses, since we are first getting the value from the tuple, applying (+1), and then setting the result back to create a new tuple.

You can also apply your function to both elements of a pair at the same time:

ghci> over both (+1) (3,4)
  (4,5)

There is also an infix operator notation for this:

ghci> both %~ (+1) $ (3,4)
  (4,5)

For the simple case of integer addition, you can use +~ n instead of %~ (+n):

ghci> both +~ 1 $ (3,4)
  (4,5)

Most of the math operators are available in this form: -~, *~, //~, ^~, etc. Or you can use .~ to force both members to a specific value:

ghci> both .~ 9 $ (3,4)
  (9,9)

In the operator form it’s fairly easy to chain applications, which makes it easier to apply the same operator to the odd members of, say, a 5-tuple. I’ll use a new function for this, set, which sets the element selected by the lens. Since the result is the new value, we can compose these:

ghci> set _1 9 . set _3 9 $ (1,1,1,1,1)
  (9,1,9,1,1)

Consider what that would look like without lenses:

ghci> let (a,b,c,d,e) = (1,1,1,1,1) in (9, b, 9, d, e)
  (9,1,9,1,1)

The main difference is that you have to capture and pass through all the members, just to modify the few you’re interested in. But even more, the lens version above will work on any tuple with 3 or more elements, while the non-lens version is fixed to working with 5-tuples.

There are lots of these modifier operators for lenses, here’s a selection:

Operator Meaning
.~ Replace the focused element with some x
?~ Replace the focused element with Just x
%~ Apply a function to the element
+~ Add to it
-~ Subtract
*~ Multiple
//~ Divide
^~ Take the integral power
^^~ Take the fractional power
**~ Take an arbitrary power
||~ Logically or with a boolean
&&~ Logically and with a boolean

In the next post, we will look at how lenses interact with some of the other basic functors: Maybe, Either and List.

 Posted by at 5:28 pm
Oct 202012
 

I have found that arrows in Haskell are far simpler than they might appear based on the literature. They are simply abstractions of functions.

To see how this is practically useful, consider that you have a bunch of functions you want to compose, where some of them are pure and some are monadic. For example, f :: a -> b, g :: b -> m1 c, and h :: c -> m2 d.

Knowing each of the types involved, I could build a composition by hand, but the output type of the composition would have to reflect the intermediate monad types (in the above case, m1 (m2 d)). What if I just wanted to treat the functions as if they were just a -> b, b -> c, and c -> d? That is, I want to abstract away the presence of monads and reason only about the underlying types. I can use arrows to do exactly this.

Here is an arrow which abstracts away the presence of IO for functions in the IO monad, such that I can compose them with pure functions without the user needing to know that IO is involved. We start by defining an IOArrow to wrap IO functions:

data IOArrow a b = IOArrow { runIOArrow :: a -> IO b }

instance Category IOArrow where
  id = IOArrow return
  IOArrow f . IOArrow g = IOArrow $ f <=< g

instance Arrow IOArrow where
  arr f = IOArrow $ return . f
  first (IOArrow f) = IOArrow $ \(a, c) -> do
    x <- f a
    return (x, c)

Then I make some simple functions I want to compose:

foo :: Int -> String
foo = show

bar :: String -> IO Int
bar = return . read

And use them:

main :: IO ()
main = do
  let f = arr (++ "!") . arr foo . IOArrow bar . arr id
  result <- runIOArrow f "123"
  putStrLn result

Here I am calling IOArrow and runIOArrow, but if I were passing these arrows around in a library of polymorphic functions, they would only need to accept arguments of type “Arrow a => a b c”. None of the library code would need to be made aware that a monad was involved. Only the creator and end user of the arrow needs to know.

Generalizing IOArrow to work for functions in any Monad is called the “Kleisli arrow”, and there is already a built-in arrow for doing just that:

main :: IO ()
main = do
  let g = arr (++ "!") . arr foo . Kleisli bar . arr id
  result <- runKleisli g "123"
  putStrLn result

You could of course also use arrow composition operators, and proc syntax, to make it a little clearer that arrows are involved:

arrowUser :: Arrow a => a String String -> a String String
arrowUser f = proc x -> do
  y <- f -< x
  returnA -< y

main :: IO ()
main = do
  let h =     arr (++ "!")
          <<< arr foo
          <<< Kleisli bar
          <<< arr id
  result <- runKleisli (arrowUser h) "123"
  putStrLn result

Here it should be clear that although main knows the IO monad is involved, arrowUser does not. There would be no way of “hiding” IO from arrowUser without arrows — not without resorting to unsafePerformIO to turn the intermediate monadic value back into a pure one (and thus losing that context forever). For example:

arrowUser' :: (String -> String) -> String -> String
arrowUser' f x = f x

main' :: IO ()
main' = do
  let h      = (++ "!") . foo . unsafePerformIO . bar . id
      result = arrowUser' h "123"
  putStrLn result

Try writing that without unsafePerformIO, and without arrowUser' having to deal with any Monad type arguments.

 Posted by at 10:56 am
Sep 062012
 

To show just how significant parallelized algorithms can be, today I discovered pxz, a parallelized version of the xz compression utility, which I use constantly. The proof is in the numbers:

Command Before After Ratio Time
xz 2937M 305M 0.104 32m
pxz -9e 2937M 281M 0.096 4m(!)

I put this alias in my .zshrc:

alias tar='tar --use-compress-program=pxz'

Note that to build pxz on the Mac, I had to comment out a reference to MAP_POPULATE, which the OS X’s mmap function doesn’t support.

 Posted by at 3:01 am