Wednesday, September 19, 2007

These blog entries is to record my early adventures in Haskell, to advertise the SACO, and show off the awesomeness of Literal Haskell (which means that this entire post can be pasted into a text file and compiled) by documenting my solutions to programming contest problems.

But this time I'm actually doing it.

Since we've just hosted another successful South African Computer Olympiad, I decided to code some Haskell solutions. Looking through the problems, I decided on Living Dead Parrots: I didn't code a check solution before the contest, so it wouldn't be too boring, but it was easy enough that I would spend more time learning the language than figuring out the algorithm. Most importantly, it was an interactive problem, meaning I would be brought face-to-face with the IO monad, a fearful prospect for the novice Haskeller.

Well, it wasn't as scary as I'd feared. By no means.

In short, the problem we're solving is this: the evaluator has an array of N booleans. You can ask it for the number of true values between two indices (inclusive, base one). Less than a hundredth of the booleans are true. There is a limit on the number of queries you can make, but we ignore it, since it is always larger than the number needed by the correct algorithm.

This solution scores 100% (within the C++ time limit, when compiled with GHC), but is probably not the best Haskell ever written. I'd appreciate any advice you can give for improvements. That's why I'm posting, after all.

First, the preliminaries:

`> module Main>     where>> import System.IO>> rInt :: String -> Int> rInt = read>> flush :: IO ()> flush = hFlush stdout`

Next, the function to query the evaluator. We output "Q a b", and read in an integer. The answer is obviously wrapped in the IO monad.

`> query :: Int -> Int -> IO Int> query a b = do putStrLn ("Q " ++ (show a) ++ " " ++ (show b))>                flush>                answer <- getLine>                return \$ rInt answer`

I'm not particularly proud of this function, which wraps a value in the IO monad. If it's the right thing to do, there's a library function for it; if it's the wrong thing to do, I shouldn't do it. If you know which it is, leave a comment.

`> coerceIO :: a -> IO a> coerceIO x = do return x`

Here comes the meaty part. Search takes a range, and the number of parrots (oh, yes, those booleans represent whether a parrot in a certain cage is pining for the fjords or has kicked the bucket) in that range, and determines where those parrots are.

`> search :: Int -> Int -> Int -> IO [Bool]`

There are three cases. The first is that the range contains no parrots.

`> search start end 0 = coerceIO \$ take (end-start+1) (repeat False)`

The next case is that the range contains only one element, then we have found the location of a living parrot.

`> search start end 1          | start == end  = coerceIO [True]`

The interesting case is the remaining one, where we divide the list into two, and recurse.

`> search start end numParrots = let middle = (end+start) `div` 2>                                in do leftNum <- query start middle>                                      left <- search start middle leftNum>                                      right <- search (middle+1) end (numParrots-leftNum)>                                      return (left ++ right)`

All that remains is `main`, which contains my most suspicious use of `coerceIO`, to bind purely functional variables in a monad: I haven't figured out how to use `let` in a `do`.

It first reads a line containing two integers: the size of the parrot array, and the number of questions. We ignore the latter. We then find the number of live parrots in the array, and begin the search. The output format is "A 0 0 0 0 1 0 1 0 1 ...".

`> main :: IO ()> main = do line <- getLine>           n <- coerceIO \$ rInt \$ head \$ words line>           numParrots <- query 1 n>           parrots <- search 1 n numParrots>           putStrLn \$ "A " ++ (unwords \$ map (\x -> if x then "1" else "0") parrots)>           return ()`

Notes on Haskell syntax, for non-Haskellers: `f \$ x` is the same as `f x`, except that it binds more loosely. Its purpose is mostly to eliminate lots of irritating superfluous parentheses. `(\x -> y)` is an anonymous function mapping `x` to `y`

We now run a parallel contest online: same problems, but open to anyone. You should check it out this time next year. We might also be running the contests from the training camps online (training camps happen about three times a year, starting in February). Unfortunately, Haskell is not one of the supported languages, but after the contests you can download the evaluator programs and test data to score solutions in any language.

In the SACO, interactive problem solutions communicate through pipes with an evaluator, rather than reading an input file and writing an output file. This allows problems where access to the full dataset is unnecessary, and is either not allowed or too big to be read completely within the time limit.

1. Your "coerceIO", as its definition implies, can be written as just "return".

("do" is just sugar for writing multiple IOs in sequence. So "do return x" is the same as "return x"; the only time you need "do" is when you you have more than one thing to do, like in "query".)

Let in a do: "do let x = foo; let y = bar; return z" (semicolons and newlines are interchangable).

2. Same goes to "rInt": if you just use

> query :: Int -> Int -> IO Int
> query = ...

compiler will note the type of "query" and use Int-specific "read" automatically.

Hope it helps.

3. Just in addition to evan's comment, I think it's worth pointing out that in Haskell, return is not a keyword, like it is in most imperative languages. It is simply a function which takes a value, and produces a computation which does nothing but return that value.

Consequently, it doesn't have the control effects which return does in most imperative languages: return x in the middle of a do-block in Haskell is a complete no-op.

This might sound inconvenient at first to some imperative programmers, but it's surprisingly good for readability and refactoring. It means that every do-block will run to completion (unless there's an exception thrown).

Hence, whenever you have a line in a do-block which refers to some other computation which is defined as a do-block, it's like you can simply splice the lines of that block in without worrying if something in the middle of it will cause the block to finish too early. Similarly, you can trivially factor out a set of lines from a do-block without worrying about whether there are any early returns.

These properties of do-notation are in fact guaranteed more generally by the monad laws.

4. evan, thank you. D'oh. A little bit of thought is all I need. Good to know that you can do lets in a do. Is this a special case or am I not understanding let, either? The let I know has to be followed by an "in" clause.

panda, indeed. I'm not sure why I used rInt... the examples of read I looked at all do funny ::Int stuff, but I now realise that's because they don't have enough context to determine type.

cale, I'm aware that return is a function... I just hadn't grokked what it actually did. I guess I need to spend an intimate evening with the monad axioms.

5. The let in do-syntax is a special case. The translation rule is:

do { let { decls }; stmts }
--> let { decls } in do { stmts }

which means that the declarations made scope over the remainder of the do-block.

6. You can replace:

> take (end-start+1) (repeat False)

with:

> replicate (end-start+1) False

Also, I believe that:

> tree2list (Tree l r) = (tree2list l) ++ (tree2list r)
(this isn't exactly what you were doing, but is similar)

is less efficient than:

> tree2list t = tree2list' t []
> tree2list' (Tree l r) acc = tree2list l (tree2list r acc)
> tree2list' (Leaf v) acc = v : acc

See here: http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-16.html#%_thm_2.63

I hope that's understandable/correct?

7. Rodrigo, you are right.