I’ve been reading Real World Haskell now, after having finished the delightful Learn You a Haskell Tutorial. I’m up to chapter 6, about to dive into Typeclasses. In the meantime, I’ve picked a toy project that also has a taste of usefulness: a script to convert the Hackage database into MacPorts Portfiles, respecting inter-package and external library dependencies. I call it HackPorts, of course.
Requirements
This translation should require two things:
The Cabal package, for read information about all packages known to it. This avoids writing a custom parser, or using HTTP to crawl the online Hackage database.
A mapping file of external dependency names to MacPorts port names. This is for dependencies on things like
libbz2
, where the script will need to be taught how MacPorts names that library. This is likely to be the most labor-intensive step, having nothing to do with Haskell.
Initial experiences
Haskell makes a concerted point about separating “pure” from “impure” code. Anything which talks to the outside world, such as reading and writing files, is impure. Anything which can be expressed in terms of standard data types – or compositions thereof – is pure.
Take for example a program to count lines in a file. The pure part of the code
receives a giant string, splits it into lines at line boundaries, counts those
lines, and returns an integer. The impure part takes a command-line argument,
interprets it as a FilePath
(an impure type, since it must concern itself with
operating system-dependent naming conventions), and reads the contents of the
file at that location. The program flows by passed the file contents as a
string to the pure code, and receiving an integer to be printed on the output
device.
This division into pure and impure has an interesting side-effect (no pun
intended): Most of a program’s code is written in isolation of its context of
usage. Take Cabal, as a case in point here. Part of Cabal deals with
downloading information from the Web, reading and writing package files, and
executing external commands, like make
. But another part of Cabal is concerned
only with the structure of package files, and determining the total set of
dependencies required for building a package. These latter details can be
discussed in complete isolation from what is done with that information.
As a result – and I’m not sure whether the Cabal authors designed it this way
or not – Cabal is naturally part “program”, and part API. I was able to start
taking apart package files almost instantly, with extremely little code.
Here’s a toy program to print out a package’s maintainer, if given the path to
a .cabal
file:
import System.Environment (getArgs)
import Distribution.Verbosity (verbose)
import Distribution.PackageDescription
import Distribution.PackageDescription.Parse (readPackageDescription)
= do
main - getArgs
args ,- readPackageDescription verbose (head args)
pkg ,print . maintainer . packageDescription $ pkg
Now, I do suppose it’s just as easy to do a similar thing in Python’s distutils, for example:
import sys
from distutils.extension import *
= read_setup_file(sys.argv[1])
exts print exts[0].language # print the ext 'language'
What excites me is that Haskell uniquely encourages the separation of alogrithm and application – the isolation of context-dependent knowledge into as small a region of a program as possible.
Too many times I’ve tried to use a utility’s code as a “library”, only to find
it was so caught up in its idea of how it should be used, it had never
bothered to abstract its core principles into a set of “pure” function,
independent from that intent. This happens, for example, with the version
control system Git. Although many have wanted a libgit.a
for accessing Git’s
data structures directly from other languages, yet none exists. One is forced
to either shell out to the git
command, or write another implementation to
interface with the “pure” side of what Git does.