Haskell blog engine

I recently moved my blog from Django to static pages, generated by a Perl script that a git post-receive hook triggers. It was too tempting not to rewrite that script in Haskell. The full source code is available on GitHub.

Let’s start with the central data structure, the Entry data type:

data Entry = Entry {
    entryUrl :: String,
    entryDate :: CalendarTime,
    entryTitle :: String,
    entryTags :: String,
    entryBody :: String}
        deriving (Eq, Show)

An Entry contains its relative url, its post date, title, tags and body. I chose CalendarTime to get the full names of the days of the week, but other formats would do too. I will change the tags to a list of strings to group entries by tags. I defined a couple of functions to take care of the formatting of the dates for the RSS feed and the sitemap.
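Once the tags become a list of strings, grouping entries by tag could be done with a Map. This is only a minimal sketch of the idea, not code from the blog engine: Post and postsByTag are hypothetical names, and Post stands in for an Entry whose tags are already a [String].

```haskell
import qualified Data.Map as M

-- Hypothetical, trimmed-down stand-in for Entry, with tags as a list.
data Post = Post
    { postTitle :: String
    , postTags  :: [String]
    } deriving (Eq, Show)

-- Build a map from each tag to the titles of the posts carrying it.
-- flip (++) appends new titles after the ones already collected,
-- so titles keep the order of the input list.
postsByTag :: [Post] -> M.Map String [String]
postsByTag ps = M.fromListWith (flip (++))
    [ (tag, [postTitle p]) | p <- ps, tag <- postTags p ]
```

A tag page could then be generated from each key of the map and its list of titles.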

I will need to sort entries according to their post date to display the latest one on the main page and to list them on the archive page. So let’s just define an instance of Ord accordingly:

instance Ord Entry where
    compare e1 e2 = compare (entryDate e1) (entryDate e2)

Two lines. In Perl I had to hash the date to sort it with the builtin function. Not a big deal, but more bookkeeping and code.
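The same ordering can be expressed with comparing from Data.Ord, and Down gives a newest-first sort without a separate reverse. The sketch below uses a hypothetical Post record with an Int in place of the CalendarTime, purely to stay self-contained.

```haskell
import Data.List (sortBy)
import Data.Ord (comparing, Down(..))

-- Hypothetical stand-in for Entry; the Int plays the role of the post date.
data Post = Post
    { postDate  :: Int
    , postTitle :: String
    } deriving (Eq, Show)

-- Order posts by date only, like the Ord instance for Entry.
instance Ord Post where
    compare = comparing postDate

-- Newest first: Down flips the ordering of the dates.
newestFirst :: [Post] -> [Post]
newestFirst = sortBy (comparing (Down . postDate))
```

With this in place, maximum picks the latest post and newestFirst gives the order needed for an archive page.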

The body is of type String to make it more portable. Using Pandoc’s Markdown facilities, it is also simple to convert it to html:

markdownToHtml :: String -> String
markdownToHtml s = let md = readMarkdown defaultParserState {stateStrict = True} s
                   in writeHtmlString defaultWriterOptions md

Now the html part, the core of the script. I mainly use the blaze-html library from Jasper Van der Jeugt. The approach is elegant and convenient, using combinators to write html. It also has nice documentation and tutorials. One thing that I like about it is how easy it is to create templates. I have two different types of pages: blog entries and the rest. The template changes in a few places, namely the meta tags, the title and the comments. The head of a non-entry page simply needs the title of the page, if any:

htmlHead :: String -> Markup
htmlHead t = do
    H.head $ do
        meta ! httpEquiv "content-type" ! content "text/html; charset=UTF-8"
        meta ! name "keywords" ! content "adrien haxaire, programming geomechanics"
        link ! type_ "text/css" ! rel "stylesheet" ! media "all" ! href "base.css"
        H.title $ toHtml ("Programming Geomechanics" ++ t)

The (!) function allows a natural chaining of the tags, as you can see for the link to the css file for example. For an Entry, I add more information through the meta tags; the structure is similar:

entryHead :: Entry -> Markup
entryHead e = do
    H.head $ do
        meta ! httpEquiv "content-type" ! content "text/html; charset=UTF-8"
        meta ! httpEquiv "Last-Modified" ! content (toValue $ show $ rfc882 e)
        meta ! httpEquiv "expires" ! content (toValue $ show $ expireDate e)
        meta ! name "keywords" ! content "adrien haxaire, programming geomechanics"
        -- entryTitle and entryTags are already Strings; show would add quotes
        meta ! name "keywords" ! content (toValue $ entryTitle e)
        meta ! name "keywords" ! content (toValue $ entryTags e)
        link ! type_ "text/css" ! rel "stylesheet" ! media "all" ! href "base.css"
        H.title $ toHtml ("Programming Geomechanics | " ++ entryTitle e)

The function expireDate is just some formatting on the date to fit the requirements of the http-equiv attribute. You can see that it is quite simple so far. I could try to factor out some bits and pieces, but I’ll do the polishing later.
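The date helpers themselves are not shown in this post. To give a rough idea of what such formatters can look like, here is a sketch that uses the time package instead of CalendarTime, which is a swap made only to keep the example self-contained. The names rfc822Date, sitemapDate and exampleDate are hypothetical.

```haskell
import Data.Time (UTCTime(..), fromGregorian, formatTime, defaultTimeLocale)

-- RFC 822 style date, the shape used by RSS pubDate and HTTP date headers.
rfc822Date :: UTCTime -> String
rfc822Date = formatTime defaultTimeLocale "%a, %d %b %Y %H:%M:%S GMT"

-- Plain YYYY-MM-DD date, one of the forms accepted in a sitemap lastmod.
sitemapDate :: UTCTime -> String
sitemapDate = formatTime defaultTimeLocale "%Y-%m-%d"

-- Midnight UTC on 5 March 2012, used only for illustration.
exampleDate :: UTCTime
exampleDate = UTCTime (fromGregorian 2012 3 5) 0
```

Here rfc822Date exampleDate gives "Mon, 05 Mar 2012 00:00:00 GMT" and sitemapDate exampleDate gives "2012-03-05".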

Now that the head is defined, we can define the template for the body, taking the contents of the body as argument:

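htmlBody :: Html -> Markup
-- The body of this function was cut off in the listing; the order below is an assumption.
htmlBody contents = H.body $ entete >> contents >> htmlFooter >> googleAnnalytics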

Here, entete, htmlFooter and googleAnnalytics are necessary boilerplate for the navigation and the banner, the footer with the licence and the Google Analytics script. Now comes the best part: we can compose heads and body to generate the templates for the entries and the other pages:

htmlTemplate :: String -> Html -> Html
htmlTemplate pageTitle contents = docTypeHtml ! lang "en" $ do
                                    htmlHead pageTitle
                                    htmlBody contents


entryTemplate :: Entry -> Html
entryTemplate e = docTypeHtml ! lang "en" $ do
                    entryHead e
                    htmlBody $ entryContents e

Nice, isn’t it? That’s flexible, simple and easy to troubleshoot, which is something you want when writing html. The entryContents function simply wraps the text of the blog entry in a div. It also includes the javascript code for the Disqus comments.

To write my entries to disk, I use the standard writeFile IO function, calling renderHtml to generate the output string:

writeEntry :: Entry -> IO ()
writeEntry e = writeFile filename (renderHtml $ entryTemplate e)
    where
      filename = "public/" ++ entryUrl e ++ ".html"

To apply it to all the blog entries located in the entries folder, I list the files in that directory, create an Entry from each markdown file, then apply writeEntry to all of them.

markdownEntries :: IO [String]
markdownEntries = do
    es <- getDirectoryContents "entries/"
    return $ L.map (\x -> "entries/" ++ x) (filterMarkdown es)
  where
    filterMarkdown = L.delete "." . L.delete ".." . L.filter (L.isSuffixOf ".md")

entries :: IO [Entry]
entries = do
    mds <- markdownEntries 
    rmds <- mapM readFile mds
    return $ L.map entry rmds

updateEntries :: [Entry] -> IO ()
updateEntries = mapM_ writeEntry
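The filtering step in markdownEntries is pure, so it can be pulled out and checked without touching the disk. A minimal sketch, assuming the filepath package is available (it ships with GHC); markdownPaths is a hypothetical name, not a function from the blog engine.

```haskell
import System.FilePath ((</>), takeExtension)

-- Keep only the .md files of a directory listing and prefix them with
-- the directory. "." and ".." do not have the ".md" extension, so they
-- are dropped by the same test and need no special handling.
markdownPaths :: FilePath -> [FilePath] -> [FilePath]
markdownPaths dir names = [ dir </> n | n <- names, takeExtension n == ".md" ]
```

The IO part then reduces to getDirectoryContents followed by a call to this function.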

I am quite sure the entries function can be turned into a one-liner; I’ll investigate. At the time I simply wanted something that works. Edit: I’ve just found it:

entries :: IO [Entry]
entries = liftM (L.map entry) (markdownEntries >>= mapM readFile)
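The shape of that one-liner does not depend on IO. The sketch below states it for any monad and shows the equivalent spelling with the functor operator; loadWith, loadWith' and their parameter names are hypothetical, chosen only for this illustration.

```haskell
import Control.Monad (liftM)

-- Generic shape of the pipeline: list the sources, read each one,
-- then parse each result with a pure function.
loadWith :: Monad m => m [src] -> (src -> m raw) -> (raw -> a) -> m [a]
loadWith list readOne parse = liftM (map parse) (list >>= mapM readOne)

-- The same pipeline written with <$> instead of liftM.
loadWith' :: Monad m => m [src] -> (src -> m raw) -> (raw -> a) -> m [a]
loadWith' list readOne parse = map parse <$> (list >>= mapM readOne)
```

With these names, entries would correspond to loadWith markdownEntries readFile entry.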

I forgot to mention how to create an Entry from a file. The entry function wraps the different parsers and yields an Entry from a string, the contents of the markdown file. I use Parsec for that (this will be the topic of a future blog post).

If we do not consider the boilerplate needed for the head of the template, you might acknowledge that it’s quite simple, clean and small. I’d like to add here that in other frameworks, the templates live in separate files, with a specific templating system. Here, everything fits in one file. That’s quite nice to maintain and to get the global picture. And I really do like the concept of my blog being only one file.

That’s all for the conversion of the markdown entries. The index page is the latest entry, without a title. It already pays to have decomposed the templates, because I want the title and the meta tags to remain unchanged, at least if I understood some basic SEO. So let’s just apply the html template to the latest entry; that’s straightforward.

updateIndex :: [Entry] -> IO ()
updateIndex es = let filename = "public/index.html"
                     contents = entryContents $ maximum es
                 in writeFile filename (renderHtml $ htmlTemplate "" contents)

I will skip the RSS feed and the sitemap as they are mainly boilerplate. They have some dynamics in them, as they are updated depending on the number of entries. This is also the case for the archive page, so let’s explain this one instead. Good practice is to get away from the IO monad as soon as possible, to work with pure functions. It is just a write operation, so it might not seem so risky. The point is, in Perl I was opening the file and writing directly to it. Lots of things could happen before that was finished. I prefer to reduce the IO to its minimum, i.e. writing a string to a file in one go. For this, I create a wrapper around the archiveContents function:

updateArchive :: [Entry] -> IO ()
updateArchive es = let filename = "public/archive.html"
                       contents = archiveContents es
                   in writeFile filename (renderHtml $ htmlTemplate " | Archive" contents)

In the archiveContents function, we take advantage of the monadic behaviour of blaze-html by using forM_, which acts as a for loop, on the function archiveLine that writes one line for each entry:

archiveContents :: [Entry] -> Html
archiveContents es = let ses = reverse $ L.sort es
                     in do 
                       H.div ! A.id "contents" ! A.class_ "container" $ do 
                          p "Previous posts:"
                          forM_ ses archiveLine

archiveLine :: Entry -> Markup
archiveLine e = do 
  p $ do 
    a ! href (toValue $ absoluteUrl e) $ toHtml (entryTitle e)
    toHtml (", " ++ (publishedOn e))

That was really an eye-opener for me. It is much more than appending strings. I could have had the same effect in Perl by using the for loop to build a string and then flushing it in one print statement. The difference is that in Perl, it felt awkward to do so. In Haskell it seems natural. Run away from the IO monad, and write functions with predictable behaviour. It also shows that the blaze-html library is well thought out, as this is easily done using monadic operations. To generate my archive page, it only takes three small functions. I could have made a single bigger one, but I prefer small pieces that each do one thing well to confusing longer ones.
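To make the separation explicit, here is a stripped-down sketch of the same idea with plain strings instead of blaze-html: two pure functions build the text, and a single IO action writes it. Post, archiveLineText, archiveText and writeArchive are hypothetical names, and the sketch does no html escaping, which blaze-html takes care of in the real code.

```haskell
import Data.List (sortBy)
import Data.Ord (comparing, Down(..))

-- Hypothetical stand-in for Entry; the Int plays the role of the post date.
data Post = Post
    { postDate  :: Int
    , postUrl   :: String
    , postTitle :: String
    }

-- Pure: one archive line per post (no escaping in this sketch).
archiveLineText :: Post -> String
archiveLineText p =
    "<p><a href=\"" ++ postUrl p ++ "\">" ++ postTitle p ++ "</a></p>"

-- Pure: all the lines, newest post first, one per line.
archiveText :: [Post] -> String
archiveText = unlines . map archiveLineText . sortBy (comparing (Down . postDate))

-- The only IO: write the finished string in a single call.
writeArchive :: FilePath -> [Post] -> IO ()
writeArchive path = writeFile path . archiveText
```

Everything up to archiveText can be tested in GHCi without creating a file.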

To run it as a script with the runhaskell command, I gather all the updates for my blog in the main function.

main :: IO () 
main = do
  es <- entries
  updateEntries es
  updateIndex es
  updateArchive es
  updateRSS es
  updateSitemap es

On my web server, I replaced the call to the Perl script in the post-receive hook with this new Haskell script, after sourcing my bashrc file to load my custom Haskell install on the server. It’s funny to call it a Haskell script. It sounds diminishing, but it is quite the contrary: it shows that Haskell is also well suited to this kind of application. Little scripts that used to belong to the realm of Perl are also simple and small in Haskell.

But most of all, this is a very good way to learn a lot of Haskell: the basic parts, simple IO, parsing files, and so on, while ending up with something that does the job. It doesn’t need to be shiny, just an answer to a simple need: moving my blog from Django to static html files. A good way to understand monads is to use them, be it IO, blaze-html or Parsec. I’m really happy to have this very small piece of code working, because it is a real application that does something for me and because it demystifies Haskell for me. I used to think of Haskell libraries as ethereal, not really tangible, like the one I am writing, which needs many pieces to be useful. This project makes Haskell more concrete to me. And that’s a good step.