---
title: What it takes to add a new backend to Futhark
description: A reader asked and the Futhark devblog answers the call.
---

Recently Scott Pakin [suggested writing a blog post on how to add a
new backend to the Futhark
compiler](https://github.com/diku-dk/futhark/discussions/2224), and
since there's active fiddling with the backends at this very moment,
this is not a bad idea. Let us manage expectations up front: this will
not be a *tutorial* on adding a backend. I will not go into the deep
details of the specific internal APIs that should be used. Instead, I
will focus on the core representations, and give an idea of the kind
of work (often complicated) and magnitude (sometimes relatively
little) it takes to add a new backend. It's also somewhat open
precisely what a "backend" even means. There's a significant
difference in complexity between adding a command `futhark foo
bar.fut` that produces *something* based on `bar.fut` (very easy),
implementing another C-like GPU backend (not hard, but you need to
touch a lot of pieces), and creating a fundamentally new backend for
an alien piece of hardware (depending on your needs, this can be
extremely challenging).

I will still link pertinent pieces of the source code as applicable -
sometimes it is instructive to see just how simple (or simplistic)
the final glue around the complex bits is. The Futhark compiler
currently supports a fairly diverse set of targets (sequential CPU,
multicore, different GPU APIs, C, Python). To achieve this without
undue duplication of code and effort, the compiler uses fairly
heavily parameterised representations of the program being compiled.
I'll try to get the gist across, but the full details are very, well,
detailed (and I always feel like they should be simplified - it's not
the aspect of the compiler we're most proud of).

For a drier exposition, there is also [the internal compiler
documentation](https://hackage.haskell.org/package/futhark-0.25.27/docs/Futhark.html).

## Architectural overview

The Futhark compiler is a monolithic program written in Haskell. All
passes and backends are part of the same executable. In principle, it
would not be too difficult to make it possible to write a backend in
a different language, as a separate executable, although it hasn't
been relevant so far.

The compiler consists of three main parts:

* The *frontend*, which is concerned with parsing the Futhark source
  language, type-checking it, and transforming it to the core
  intermediate representation (IR).

* The *middle-end*, which performs gradual refinement,
  transformation, and optimisation on a program represented in
  various dialects of the IR format (more on that below).

* The *backend*, which translates from the IR representation into
  some lower-level representation, such as C - likely via several
  steps.

These parts form a chain. The compiler will always run the frontend,
ultimately producing an intermediate representation of the program,
then run an appropriate middle-end *pipeline*, which produces another
representation of the program, and finally pass this to the backend.

The frontend is pretty much the same no matter how you invoke the
Futhark compiler (`futhark c`, `futhark cuda`, etc.), but the
middle-end and backend behave very differently based on the compiler
mode. For example, rather than having a single IR, the compiler
actually has a *family* of IR dialects, suited for different
purposes, and used at different stages of the compiler. To give a
gist of what I mean, consider an extensible Haskell datatype for
representing very simple expressions:

```Haskell
data Exp o = Var String
           | Val Int
           | Add (Exp o) (Exp o)
           | Sub (Exp o) (Exp o)
           | Op o
```

Apart from representing variables, values, additions, and
subtractions, this `Exp` type also has a type parameter `o` that
represents some *other* kind of operation, via the `Op` constructor.
This means we can instantiate a variant of `Exp` that contains, say,
square root as the operation:

```Haskell
data SqrtOp = Sqrt ExpWithSqrt

type ExpWithSqrt = Exp SqrtOp
```

But we could also have one with some general notion of a function call:

```Haskell
data FunOp = Fun String [ExpWithFun]

type ExpWithFun = Exp FunOp
```

We can now write functions that operate on `Exp o` values, as long as
we parameterise those functions with how to handle the `o` cases
(using function parameters, type classes, or what have you). This
technique is useful when we want to have a collection of types that
are largely the same. For example, in the middle-end of the Futhark
compiler, we initially use an IR dialect called `SOACS`, where
parallelism is expressed using nested higher-order operations quite
similar to the source language. Eventually, the program undergoes a
[flattening transformation](2019-02-18-futhark-at-ppopp.html), after
which parallelism is expressed using a different vocabulary of
flat-parallel operations. Later, even the representation of types
goes from something similar to the source language to a
representation that also contains information about memory layout.
Most of the language-level details of the IR, such as how arithmetic
and control flow are expressed, remain the same, and so does a lot of
compiler code, such as simplification passes, which can operate on
any dialect.

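To make this dialect-generic style concrete, here is a small sketch
using the toy `Exp` type from above. All names here are invented for
the example (this is not actual compiler code): an evaluator takes a
handler for the `Op` cases as a function parameter, and the shared
traversal handles everything else.

```Haskell
-- The toy definitions repeated, so the example is self-contained.
data Exp o = Var String
           | Val Int
           | Add (Exp o) (Exp o)
           | Sub (Exp o) (Exp o)
           | Op o

-- An evaluator that works for *any* dialect: the caller provides
-- `evalOp` for the dialect-specific `Op` cases.
eval :: (o -> Int) -> [(String, Int)] -> Exp o -> Int
eval evalOp env expr =
  case expr of
    Var v -> maybe (error ("unbound variable: " ++ v)) id (lookup v env)
    Val n -> n
    Add x y -> go x + go y
    Sub x y -> go x - go y
    Op op -> evalOp op
  where
    go = eval evalOp env

-- Instantiating the square-root dialect from above.
newtype SqrtOp = Sqrt (Exp SqrtOp)

evalSqrt :: SqrtOp -> Int
evalSqrt (Sqrt e) =
  floor (sqrt (fromIntegral (eval evalSqrt [] e)) :: Double)
```

With this, `eval evalSqrt [] (Add (Val 2) (Op (Sqrt (Val 9))))`
evaluates to `5`, and swapping in a different handler gives an
evaluator for a different dialect without touching the shared cases.
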
The actual representation is a little more involved than explained
above, and is quite similar to the approach described in the paper
[Trees that
Grow](https://www.cs.tufts.edu/comp/150FP/archive/simon-peyton-jones/trees-that-grow.pdf).
In particular, type-level functions are used to avoid having a
different type parameter for everything that can vary.

We use this notion of IR dialects pervasively throughout both the
middle-end and the backend. The middle-end uses a *pipeline* chosen
based on the compilation mode, which eventually produces a program in
some IR dialect. That is, pipelines can be considered pure functions
from some IR dialect to some other (or the same) IR dialect. For the
`c` backend, this dialect is called `SeqMem` (no parallelism, with
[information about array layout and
allocations](2024-03-06-array-representation.html)); for the
`multicore` and `ispc` backends it is `MCMem` (multicore parallel
operations); and for the GPU backends it is called `GPUMem`. You can
[see some of the default pipelines
here](https://github.com/diku-dk/futhark/blob/4f2d5e67c744bfef87e4410176e7d52e980603da/src/Futhark/Passes.hs).

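As a mental model of "pipelines as pure functions between dialects",
consider this toy sketch. The real `Pipeline` type in the compiler is
richer (it carries named passes and runs in a monad); all names below
are invented for illustration.

```Haskell
-- A toy model: the type parameter tracks which IR dialect a program
-- is in, so composing pipelines in the wrong order is a type error.
newtype Prog rep = Prog String  -- stand-in for a real program

data SOACS   -- nested parallelism, close to the source language
data GPU     -- flat parallelism, no memory information
data GPUMem  -- flat parallelism, with memory information

newtype Pipeline from to = Pipeline (Prog from -> Prog to)

-- Composing two pipelines chains their dialect transformations.
andThen :: Pipeline a b -> Pipeline b c -> Pipeline a c
andThen (Pipeline f) (Pipeline g) = Pipeline (g . f)
```

A `Pipeline SOACS GPU` composed with a `Pipeline GPU GPUMem` yields a
`Pipeline SOACS GPUMem`, which is essentially the shape of the GPU
backends' middle-end.
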
Writing a new backend thus consists of picking a pipeline that
transforms the IR into the dialect you wish, and then doing
*something* with that IR - where the compiler is actually quite
agnostic regarding what that *something* might be. Not every backend
needs a distinct IR dialect - all of the GPU backends use the same IR
dialect, for example.

## Backend actions

In Futhark compiler implementation lingo, the "backend" is called an
"action", and is an essentially arbitrary procedure that runs on the
result of a middle-end pipeline:

```Haskell
data Action rep = Action
  { actionName :: String,
    actionDescription :: String,
    actionProcedure :: Prog rep -> FutharkM ()
  }
```

Here `rep` is a type-level token representing the IR dialect accepted
by the action, and `FutharkM` is a monad that supports IO effects,
meaning that these "actions" can perform arbitrary IO. For example,
the action for `futhark c` will do a bunch of code generation in pure
Haskell code, but *also* write some files and run a C compiler:

```Haskell
compileCAction :: FutharkConfig -> CompilerMode -> FilePath -> Action SeqMem
compileCAction fcfg mode outpath =
  Action
    { actionName = "Compile to sequential C",
      actionDescription = "Compile to sequential C",
      actionProcedure = helper
    }
  where
    helper prog = do
      cprog <- handleWarnings fcfg $ SequentialC.compileProg versionString prog
      let cpath = outpath `addExtension` "c"
          hpath = outpath `addExtension` "h"
          jsonpath = outpath `addExtension` "json"

      case mode of
        ToLibrary -> do
          let (header, impl, manifest) = SequentialC.asLibrary cprog
          liftIO $ T.writeFile hpath $ cPrependHeader header
          liftIO $ T.writeFile cpath $ cPrependHeader impl
          liftIO $ T.writeFile jsonpath manifest
        ToExecutable -> do
          liftIO $ T.writeFile cpath $ SequentialC.asExecutable cprog
          runCC cpath outpath ["-O3", "-std=c99"] ["-lm"]
        ToServer -> do
          liftIO $ T.writeFile cpath $ SequentialC.asServer cprog
          runCC cpath outpath ["-O3", "-std=c99"] ["-lm"]
```

Here the `SequentialC.compileProg` function does the actual C code
generation. I'll elaborate a bit on it, but at an architectural
level, it is not constrained at all in what it does. In principle, an
action could just dump the final IR to disk and run some entirely
different program that takes care of code generation. You might even
write an action that expects the program to still be in one of the
early IR dialects, such as the ones that do not have memory
information, or even the one that still has nested parallelism. This
might be appropriate if you are targeting some other (relatively)
high level language.

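As a sketch of how little an action has to do, here is a
self-contained toy version of the "dump the IR and stop" idea. The
`Action` record mirrors the one above, but `Prog` and the monad are
stand-ins here - the real definitions live in the compiler source.

```Haskell
-- Stand-ins for the real compiler types.
newtype Prog rep = Prog String  -- pretend this is the final IR
type FutharkM = IO              -- the real FutharkM supports IO effects

data Action rep = Action
  { actionName :: String,
    actionDescription :: String,
    actionProcedure :: Prog rep -> FutharkM ()
  }

-- An action that writes the IR to a file and stops, leaving actual
-- code generation to some external tool.
dumpIRAction :: FilePath -> Action rep
dumpIRAction outpath =
  Action
    { actionName = "Dump IR",
      actionDescription = "Write the final IR to a file.",
      actionProcedure = \(Prog ir) -> writeFile outpath ir
    }
```

Running `actionProcedure (dumpIRAction "out.ir")` on a program simply
produces a file; everything downstream of that is up to you.
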
Ultimately, if you wish to write a backend that does not need a new
IR dialect, and also does not need to reuse any of the existing C
generation machinery, then this is consequently quite easy - at least
as far as integration with the compiler is concerned.

To actually hook up a pipeline with an action and produce something
that can be invoked from the command line, you need to write a
largely boilerplate `main` definition, like [this one for `futhark
c`](https://github.com/diku-dk/futhark/blob/master/src/Futhark/CLI/C.hs):

```Haskell
main :: String -> [String] -> IO ()
main = compilerMain
  ()
  []
  "Compile sequential C"
  "Generate sequential C code from optimised Futhark program."
  seqmemPipeline
  $ \fcfg () mode outpath prog ->
    actionProcedure (compileCAction fcfg mode outpath) prog
```

And then [finally hook it up to the big list of
subcommands](https://github.com/diku-dk/futhark/blob/master/src/Futhark/CLI/Main.hs).
That's all it takes.

## Imperative code generation

While it is true that an action can be arbitrary imperative code, in
*practice* all of Futhark's C-based backends (and even the [Python
ones](https://futhark-lang.org/blog/2016-04-15-futhark-and-pyopencl.html))
make use of significant shared infrastructure to avoid having to
reinvent the wheel too often.

As a starting point, the Futhark compiler defines an *imperative*
intermediate representation, called
[Imp](https://github.com/diku-dk/futhark/blob/master/src/Futhark/CodeGen/ImpCode.hs).
As with the middle-end IR, Imp is actually an extensible language
with various dialects - for example, a [sequential
dialect](https://github.com/diku-dk/futhark/blob/master/src/Futhark/CodeGen/ImpCode/Sequential.hs).
In contrast to the functional middle-end IR, which is very well
defined, with type checking rules and a well-defined external syntax,
Imp is a lot more ad hoc, and does not, for example, have a parser.
Semantically, it's largely a simplified form of C. In fact, it is not
even in [SSA
form](https://en.wikipedia.org/wiki/Static_single-assignment_form),
which still works out alright, because we do *no* optimisation at the
Imp level.

The translation from the functional IR to Imp is done by a module
called
[ImpGen](https://github.com/diku-dk/futhark/blob/master/src/Futhark/CodeGen/ImpGen.hs).
It is heavily parameterised, because it essentially has to go from an
*arbitrary IR dialect* to an *arbitrary Imp dialect*. It is full of
implementation details, but not particularly interesting.

Once the compiler has obtained an Imp representation of the program,
it can then turn that program into C or Python, or even some other
language. This is largely a mechanical process - the semantic gap
from Imp to C (or the baroque form of Python produced by Futhark) is
not great, and mostly involves mapping Imp constructs to the
facilities provided by the (very small) Futhark runtime system, and
of course generating syntactically valid code. To ease maintenance of
the three GPU backends (`cuda`, `opencl`, `hip`), we also make use of
a small GPU abstraction layer
([gpu.h](https://github.com/diku-dk/futhark/blob/master/rts/c/gpu.h),
discussed in [this
paper](https://futhark-lang.org/publications/fproper24.pdf)).

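To give a feel for how mechanical this final step is, here is a toy
illustration: a miniature Imp-like language and a printer to C-like
syntax. The real Imp and its C generator are of course far richer,
and all names here are invented for the example.

```Haskell
-- A miniature Imp-like language: not SSA, just structured
-- imperative code with declarations, assignments, and loops.
data Code
  = DeclareScalar String        -- e.g. "int x;"
  | SetScalar String String     -- e.g. "x = y;"
  | For String Int [Code]       -- counted loop over a body

-- Translation to C is little more than printing.
toC :: Code -> String
toC (DeclareScalar v) = "int " ++ v ++ ";"
toC (SetScalar v e) = v ++ " = " ++ e ++ ";"
toC (For i n body) =
  "for (int " ++ i ++ " = 0; " ++ i ++ " < " ++ show n ++ "; " ++ i ++ "++) {"
    ++ concatMap ((' ' :) . toC) body
    ++ " }"
```

For instance, `toC (For "i" 3 [SetScalar "x" "i"])` yields
`for (int i = 0; i < 3; i++) { x = i; }` - the interesting work
happened long before this point.
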
## Advice on writing a backend

Futhark is not set up to make it especially easy to add new backends,
but neither is it particularly difficult. After all, as of this
writing we support 10 different backends. Here is some advice for
anyone who wishes to seek glory by adding a backend:

* If you want to target a very high-level parallel language, use only
  Futhark's frontend and the middle-end up to the `SOACS`
  representation. This will give you a monomorphic first-order
  program (except for parallel operations) where all types are
  scalars or arrays of scalars, but still with nested parallelism,
  although that parallelism will be well fused. I do think it would
  be a fun experiment to generate code for a fork-join language, such
  as [MPL](https://github.com/MPLLang/mpl), from this representation.

* If you want to target a slightly less high-level parallel language,
  in particular one that does not handle nested parallelism well,
  consider processing the output of the `GPU` representation. Despite
  the name, it is not *truly* GPU-specific (the parts that are can be
  ignored or modified, and are mostly about metadata and tuning), but
  merely guarantees the absence of nested parallelism. It is still
  high level and with value-oriented semantics, with no notion of
  memory.

* If you want to target a new GPU backend, implement the `gpu.h`
  abstraction layer. The CPU-side code generation will then be fairly
  straightforward, although you may still need to do significant work
  to generate the actual GPU kernel code. We are currently going
  through this process with the in-progress [WebGPU
  backend](https://github.com/diku-dk/futhark/pull/2140), and most of
  the challenges are related to the particular limitations of WebGPU
  (a post for another time), and not so much the compiler
  engineering.

* If you want to generate low-level code of any kind, you will likely
  find it easiest to use one of the IR dialects with memory
  information. If you want to generate something that is relatively
  C-like (and both machine code and JavaScript are "C-like" in this
  regard), then using the existing machinery for generating Imp is
  almost certainly easiest.

However, in all cases, I would say a very good idea is to contact one
of the Futhark developers for advice and help. Having a third party
add a new backend is not really something we have considered much
(all of the backends have been written under our close supervision),
and while the *technical* challenges are not all that major by the
standards of writing compiler backends, the *documentation* is not
really up to the task. But I would certainly be very excited for
someone to give it a try.
