The following post is part of Spoke Proof. Tell your friends.
You could’ve invented Haskell’s I/O
Dilemma
Dilemma: you are a lazy, pure, typed functional language and you want to be able to read standard input.
In an impure language like C this isn’t a big deal. Those languages
have a read
that look mighty like
// Reads up to newline from standard input, and returns
// it as an array of bytes.
// (You'll have to free the pointer, I guess.)
char *read(void) {
// ...
}
(void)
is our way of telling the compiler that read
takes no
arguments. Which means read
starts with nothing!
Nothing but the C environment, that is, which just so happens to include all sorts of
useful things. Things like global variables, system calls, malloc(3)
, free(3)
,
rand(3)
, and all of libc. All the things you need to
implement read(3)
.
In you — a pure language — a function that takes no arguments
starts with nothing. And this time we mean actually nothing.
A pure language cannot by definition create side effects. (You
probably find this highly restrictive. But purity is not simply a tax;
it gives us equational reasoning, and a faster compiler, and stronger
types.) So anyway whatever unless we are more clever, there is only
one way to implement read
with this type:
-- Reads up to newline from standard input, yada yada.
read :: Bytes
read = utf8Encode "hello, world"
We can vary the string, but it always has to be a constant.
[Brief authorial interjection] I am assuming some familiarity with ML/Haskell-like syntax here. I am often uncertain of the amount of explaining that goes into a good post. Every dispatch from the frontier seems to require two other dispatches. Just stick with me here. I’ll write up a how-to-read-Haskell post later.
And so we abandon the idea of pure I/O.…
… or we can be more clever.
Being more clever
We said earlier that you lack an “environment,” for some definition of
that word, in read
and also every other function, and we ascribed
this handcuff to purity. This is not quite true.
There is exactly one place where you do have an
environment. In C, it is everywhere. You can
call malloc(3)
from anywhere or you can run a shell with
rm -rf *
from anywhere, and the C runtime knows exactly
what you mean.
But you only have one, and it is the main
function.
In C we are used to main
functions that are very dumb.
// Please return an exit code.
int main(void) {
// ...
}
You might think that you have to do the same thing in your language, and ask executable programs to implement an entry point with the same type.
main :: Int
main = undefined
But why? Without the baggage of C’s int main(void)
we might
adventure onward and explore more interesting types.
For example, what if main
could ask the runtime to do something?
data Request =
| ReadStdIn
main :: Request
main = ReadStdIn
Cute, and pure. But you’ll notice immediately a big problem. main
can ask questions, but it can never hear answers. You might at this
point try
submitRequest :: Request -> Bytes
submitRequest = undefined
main :: Int
main = submitRequest ReadStdin
But you’ll end up with the same problem as before: there is no way to
implement Request -> Bytes
in a pure language, besides all the
trivial useless ways.
You might then ask: can we send the bytes to main
directly? We did
make a big fuss about main
having some sort of “environment.” And
the answer is yes!
data Request =
| ReadStdin
| Exit Int
data Event =
| ProgramJustStarted
| ReadStdinComplete Bytes
main :: Event -> Request
main ProgramJustStarted = ReadStdin
main (ReadStdinComplete bytes) = Exit 0
You now have I/O in a completely pure way. And they said it couldn’t be done.
Troubling rumors from the edge of civilization
An astute reader might criticize your library design on the point that it seems impossible to read from stdin more than once. (After all, how would you distinguish between different reads?)
As a computer scientist, you solve problems by either caching, counting, or naming. In this case, counting seems best.
data Event =
| ProgramJustStarted
| ReadStdinComplete Int Bytes
main :: Event -> [Request]
main ProgramJustStarted = ReadStdin
main (ReadStdinComplete 0 bytes) = ReadStdin
main (ReadStdinComplete 1 bytes) = Exit 0
Ah, thinks the astute reader. She has allowed me to read
bytes
, and so I have two bytes
variables, but I could never use
both in the same expression.
This problem is very interesting. The problem of sequencing and interleaving computation is something on which you could spend an entire novel; you could lead an entire seminar on the problem, and how humans have tackled it. Generals have led soldiers to battle on much simpler problems. The problem transcends both creation/art and death/war.
But there’s something simple at hand that you can reach for, which is lists.
Lists?
You and I, we have been using [a]
very innocently so far, to
represent a list of values of type a
. But we are a lazy language.
And in a lazy language, lists are actually much closer to streams.
In a lazy language, all the values in a list may not be determined at the time the list is passed into a function. So you could have this program:
data Event =
| ProgramJustStarted
| ReadStdinComplete Int Bytes
main :: [Event] -> [Request]
main [ProgramJustStarted, ReadStdinComplete a, ReadStdinComplete b] =
[ ReadStdin
, ReadStdin
, Exit 0
]
And here’s how you could run such a program, if you were the runtime (and the metaphorical conceit of this post is that you are, so just go with it):
Pass
(ProgramJustStarted : thunk) :: [Event]
to this program’smain
.Take the
head
ofmain
’s list of requests.If the request is
Exit Int
, exit with that exit code.Else if the request is
ReadStdin
, make a system call to readbytes
from your local friendly POSIX-compliant operating system. Then expand thunk out by one step:(ProgramJustStarted : ReadStdinComplete bytes) :: Event
. Then repeat.
The key here is that the list is eventually all three events.1
We cautiously venture that this might actually work.
It does
It does! This is, at its core, the design of I/O library in the Haskell 1.0 Report (see section 7 and figure 3 of this impossibly well-written and interesting, “you should read the entire thing and not just section 7”-type paper).2 Haskell has since wildly improved upon this design, but that is a post for another day.
Let’s round out this longwinded weblog post with something useful: a program that reads two integers from standard input and prints the sum to standard out.
data Handle =
| StandardIn
| StandardOut
| File Int -- !
data Event =
| ProgramJustStarted
| OpenFileComplete Handle -- !
| ReadComplete Handle Bytes
| PrintComplete
data Request =
| OpenFile FilePath -- !
| Read Handle
| Print Handle Bytes
| Exit Int
-- I mean, why not?
integerOf :: Bytes -> Int
integerOf bytes = case utf8Decode bytes of
| "0" -> 0
| "1" -> 1
| "2" -> 2
| "3" -> 3
| "4" -> 4
| "5" -> 5
| "6" -> 6
| "7" -> 7
| "8" -> 8
| "9" -> 9
-- and so on
show :: Int -> Bytes
show = undefined -- same thing, but in reverse
main :: [Event] -> [Request]
main [ ProgramJustStarted
, ReadComplete StandardIn a
, ReadComplete Standardin b] =
[ Read StandardIn
, Read StandardIn
, (Print StandardOut . show) (integerOf a + integerOf b)
, Exit 0
]
It may seem to you strange that we are able to define a list of requests whose values depend on the list of events whose values depend on said list of requests whose values depend on.…
Why is that possible? And how is it possible in such few lines of code?
Perhaps we should stop teaching C and Java as first languages.
Next time let’s talk about continuations and, after that, monads.
-
Haskell users might point out here that this requires lazy pattern-matching, whereas Haskell defaults to strict, so at this point the weblog post begins to deviate from Haskell. And but so Is this really Haskell I/O?. To which we say, take it down a notch. ↩
Haskell jumps into lazy pattern matching if you prepend a tilde to your patterns. We have simply elided the tildes here.
-
See, I told you tildes were the way to go. ↩
Honestly, if you read the PDF at this link you could skip this entire post. Understatement: Simon Peyton-Jones is a good writer. His being one of the architects of Haskell and GHC is just icing.
But: if you want to translate the code from this post into code from the paper, let me help you a little:
In Haskell and in the paper, tildes start lazy pattern matching, but prevent the nice square-bracket syntax we’ve been using.
The paper is doing the correct thing ending with an underscore e.g.
x : y : _
, which allows more events than we were expecting. Our program would simply crash if more events were sent than we expected.spj has aliased
Behaviour
to[Response] -> [Request]
, which is equivalent to our[Event] -> [Request]
. Thusmain :: Behaviour
.ProgramHasStarted
is actuallySuccess
.