| 1 | +--- |
| 2 | +title: What it takes to add a new backend to Futhark |
| 3 | +description: A reader asked and the Futhark devblog answers the call. |
| 4 | +--- |
| 5 | + |
| 6 | +Recently Scott Pakin [suggested writing a blog post on how to add a |
| 7 | +new backend to the Futhark |
| 8 | +compiler](https://github.com/diku-dk/futhark/discussions/2224), and |
| 9 | +since there's active fiddling with the backends at this very moment, |
| 10 | +this is not a bad idea. Let us manage expectations up front: this will |
| 11 | +not be a *tutorial* on adding a backend. I will not go into the deep |
| 12 | +details on the specific internal APIs that should be used. Instead, I |
| 13 | +will focus on the core representations, and give an idea about the
| 14 | +kind of work (often complicated) and magnitude (sometimes relatively |
| 15 | +little) it takes to add a new backend. It's also somewhat open |
| 16 | +precisely what a "backend" even means. There's a significant |
| 17 | +difference in complexity between adding a command `futhark foo |
| 18 | +bar.fut` that produces *something* based on `bar.fut` (very easy), to |
| 19 | +implementing another C-like GPU backend (not hard, but you need to |
| 20 | +touch a lot of pieces), to creating a fundamentally new backend for an |
| 21 | +alien piece of hardware (depending on your needs, can be extremely |
| 22 | +challenging). |
| 23 | + |
| 24 | +I will still link pertinent pieces of the source code as applicable - |
| 25 | +sometimes it is instructive to see just how simple (or simplistic) it is to
| 26 | +finally glue together the complex bits. The Futhark compiler currently |
| 27 | +supports a fairly diverse set of targets (sequential CPU, multicore, |
| 28 | +different GPU APIs, C, Python). To achieve this without undue |
| 29 | +duplication of code and effort, the compiler uses fairly heavily |
| 30 | +parameterised representations of the program being compiled. I'll try |
| 31 | +to get the gist across, but the full details are very, well, detailed |
| 32 | +(and I always feel like they should be simplified - it's not the |
| 33 | +aspect of the compiler we're most proud of). |
| 34 | + |
| 35 | +For a drier exposition, there is also [the internal compiler |
| 36 | +documentation](https://hackage.haskell.org/package/futhark-0.25.27/docs/Futhark.html). |
| 37 | + |
| 38 | +## Architectural overview |
| 39 | + |
| 40 | +The Futhark compiler is a monolithic program written in Haskell. All |
| 41 | +passes and backends are part of the same executable. In principle, it |
| 42 | +would not be too difficult to make it possible to write a backend in a |
| 43 | +different language, as a separate executable, although it hasn't been |
| 44 | +relevant so far. |
| 45 | + |
| 46 | +The compiler consists of three main parts: |
| 47 | + |
| 48 | + * The *frontend*, which is concerned with parsing the Futhark source |
| 49 | + language, type-checking it, and transforming it to the core |
| 50 | + intermediate representation (IR). |
| 51 | + |
| 52 | + * The *middle-end*, which performs gradual refinement, |
| 53 | + transformation, and optimisation, on a program represented in |
| 54 | + various dialects of the IR format (more on that below).
| 55 | + |
| 56 | + * The *backend*, which translates from the IR representation into |
| 57 | + some lower level representation, such as C - likely via several |
| 58 | + steps. |
| 59 | + |
| 60 | +These parts form a chain. The compiler will always run the frontend, |
| 61 | +ultimately producing an intermediate representation of the program, |
| 62 | +then run an appropriate middle-end *pipeline*, which produces another |
| 63 | +representation of the program, and finally pass this to the backend. |
| 64 | + |
| 65 | +The frontend is pretty much the same no matter how you invoke the |
| 66 | +Futhark compiler (`futhark c`, `futhark cuda`, etc), but the |
| 67 | +middle-end and backend behave very differently based on the compiler
| 68 | +mode. For example, rather than having a single IR, the compiler |
| 69 | +actually has a *family* of IR dialects, suited for different purposes, |
| 70 | +and used at different stages of the compiler. To give a gist of what I |
| 71 | +mean, consider an extensible Haskell datatype for representing very |
| 72 | +simple expressions: |
| 73 | + |
| 74 | +```Haskell |
| 75 | +data Exp o = Var String |
| 76 | + | Val Int |
| 77 | + | Add (Exp o) (Exp o) |
| 78 | + | Sub (Exp o) (Exp o) |
| 79 | + | Op o |
| 80 | +``` |
| 81 | + |
| 82 | +Apart from representing variables, values, additions, and |
| 83 | +subtractions, this `Exp` type also has a type parameter `o` that
| 84 | +represents some *other* kind of operation, via the `Op` constructor. |
| 85 | +This means we can instantiate a variant of `Exp` that contains, say, |
| 86 | +square root as the operation: |
| 87 | + |
| 88 | +```Haskell |
| 89 | +data SqrtOp = Sqrt ExpWithSqrt |
| 90 | + |
| 91 | +type ExpWithSqrt = Exp SqrtOp |
| 92 | +``` |
| 93 | + |
| 94 | +But we could also have one with some general notion of a function call: |
| 95 | + |
| 96 | +```Haskell |
| 97 | +data FunOp = Fun String [ExpWithFun] |
| 98 | + |
| 99 | +type ExpWithFun = Exp FunOp |
| 100 | +``` |
| 101 | + |
| 102 | +We can now write functions that operate on `Exp o` values, as long as |
| 103 | +we parameterise those functions with how to handle the `o` cases |
| 104 | +(using function parameters, type classes, or what have you). This |
| 105 | +technique is useful when we want to have a collection of types that |
| 106 | +are largely the same. For example, in the middle-end of the Futhark |
| 107 | +compiler, we initially use an IR dialect called `SOACS`, where |
| 108 | +parallelism is expressed using nested higher order operations quite |
| 109 | +similar to the source language. Eventually, the program undergoes a |
| 110 | +[flattening transformation](2019-02-18-futhark-at-ppopp.html), after |
| 111 | +which parallelism is expressed using a different vocabulary of |
| 112 | +flat-parallel operations. Later, even the representation of types goes |
| 113 | +from something similar to the source language, to a representation |
| 114 | +that also contains information about memory layout. Most of the |
| 115 | +language-level details of the IR, such as how arithmetic and control |
| 116 | +flow are expressed, remain the same, and so does a lot of compiler
| 117 | +code, such as simplification passes, that can operate on any dialect. |
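
As a minimal sketch of what "parameterising a function with how to handle the `o` case" can look like, here is a toy evaluator for the `Exp` type above. The names `evalExp`, `evalSqrt`, and `Env` are made up for this illustration and are not part of the Futhark compiler; the handler is passed as a plain function argument, which is only one of the options mentioned.

```haskell
-- The extensible expression type from the text.
data Exp o
  = Var String
  | Val Int
  | Add (Exp o) (Exp o)
  | Sub (Exp o) (Exp o)
  | Op o

-- The square root instantiation (a newtype, but otherwise as in the text).
newtype SqrtOp = Sqrt (Exp SqrtOp)

-- Hypothetical environment type: variable names mapped to values.
type Env = [(String, Int)]

-- Dialect-agnostic evaluator. It handles the shared constructors
-- itself and delegates 'Op' to the supplied handler.
evalExp :: (o -> Maybe Int) -> Env -> Exp o -> Maybe Int
evalExp _ env (Var v) = lookup v env
evalExp _ _ (Val x) = Just x
evalExp evalOp env (Add a b) =
  (+) <$> evalExp evalOp env a <*> evalExp evalOp env b
evalExp evalOp env (Sub a b) =
  (-) <$> evalExp evalOp env a <*> evalExp evalOp env b
evalExp evalOp _ (Op o) = evalOp o

-- Handler for the square root dialect (integer square root, rounded
-- down). It calls back into the generic evaluator for its operand.
evalSqrt :: Env -> SqrtOp -> Maybe Int
evalSqrt env (Sqrt e) = do
  x <- evalExp (evalSqrt env) env e
  if x < 0
    then Nothing
    else Just (floor (sqrt (fromIntegral x :: Double)))
```

Only `evalSqrt` knows anything about square roots; `evalExp` could be reused unchanged for `ExpWithFun` by passing a different handler.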
| 118 | + |
| 119 | +The actual representation is a little more involved than explained |
| 120 | +above, and is quite similar to the approach described in the paper |
| 121 | +[Trees that |
| 122 | +Grow](https://www.cs.tufts.edu/comp/150FP/archive/simon-peyton-jones/trees-that-grow.pdf). |
| 123 | +In particular, type-level functions are used to avoid having a |
| 124 | +different type parameter for everything that can vary. |
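
To give a rough idea of the type-level function trick, here is a toy sketch using GHC's `TypeFamilies` extension. The families `OpOf` and `Annot`, the dialect tokens `Plain` and `WithMem`, and the function `freeVars` are all invented for this example; the real compiler has more families and different names.

```haskell
{-# LANGUAGE TypeFamilies #-}

-- Type-level functions from a dialect token to the things that vary.
type family OpOf rep
type family Annot rep

-- A single type parameter, no matter how many things vary.
data Exp rep
  = Var String (Annot rep)
  | Val Int
  | Add (Exp rep) (Exp rep)
  | Op (OpOf rep)

-- Toy dialect 1: no extra operations, no annotations on variables.
data Plain
data NoOp
type instance OpOf Plain = NoOp
type instance Annot Plain = ()

-- Toy dialect 2: square roots, and variables annotated with the name
-- of a (pretend) memory block.
data WithMem
newtype SqrtOp = Sqrt (Exp WithMem)
type instance OpOf WithMem = SqrtOp
type instance Annot WithMem = String

-- Dialect-agnostic traversal, parameterised only on the 'Op' case.
freeVars :: (OpOf rep -> [String]) -> Exp rep -> [String]
freeVars _ (Var v _) = [v]
freeVars _ (Val _) = []
freeVars opVars (Add a b) = freeVars opVars a ++ freeVars opVars b
freeVars opVars (Op o) = opVars o

-- Handler for the square root operation of the 'WithMem' dialect.
sqrtVars :: SqrtOp -> [String]
sqrtVars (Sqrt e) = freeVars sqrtVars e
```

Adding another thing that varies between dialects then means adding another type family, rather than another type parameter on `Exp` and on every function that mentions it.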
| 125 | + |
| 126 | +We use this notion of IR dialects pervasively throughout both the |
| 127 | +middle-end and the backend. The middle-end uses a *pipeline* based on |
| 128 | +the compilation mode, which eventually produces a program in some IR |
| 129 | +dialect. That is, pipelines can be considered pure functions from some |
| 130 | +IR dialect to some other (or the same) IR dialect. For the `c` |
| 131 | +backend, this dialect is called `SeqMem` (no parallelism, with |
| 132 | +[information about array layout and |
| 133 | +allocations](2024-03-06-array-representation.html)), for the |
| 134 | +`multicore` and `ispc` backends it is `MCMem` (multicore parallel |
| 135 | +operations), and for the GPU backends it is called `GPUMem`. You can |
| 136 | +[see some of the default pipelines |
| 137 | +here](https://github.com/diku-dk/futhark/blob/4f2d5e67c744bfef87e4410176e7d52e980603da/src/Futhark/Passes.hs). |
| 138 | + |
| 139 | +Writing a new backend thus consists of picking a pipeline that |
| 140 | +transforms the IR into the dialect you wish, and then doing |
| 141 | +*something* with that IR - where the compiler is actually quite |
| 142 | +agnostic regarding what that *something* might be. Not every backend |
| 143 | +needs a distinct IR dialect - all of the GPU backends use the same IR |
| 144 | +dialect, for example. |
| 145 | + |
| 146 | +## Backend actions |
| 147 | + |
| 148 | +In Futhark compiler implementation lingo, the "backend" is called an |
| 149 | +"action", and is an essentially arbitrary procedure that runs on the |
| 150 | +result of a middle-end pipeline: |
| 151 | + |
| 152 | +```Haskell |
| 153 | +data Action rep = Action |
| 154 | + { actionName :: String, |
| 155 | + actionDescription :: String, |
| 156 | + actionProcedure :: Prog rep -> FutharkM () |
| 157 | + } |
| 158 | +``` |
| 159 | + |
| 160 | +Here `rep` is a type-level token representing the IR dialect accepted |
| 161 | +by the action, and `FutharkM` is a monad that supports IO effects,
| 162 | +meaning that these "actions" can perform arbitrary IO. For example, |
| 163 | +the action for `futhark c` will do a bunch of code generation in pure |
| 164 | +Haskell code, but *also* write some files and run a C compiler: |
| 165 | + |
| 166 | +```Haskell |
| 167 | +compileCAction :: FutharkConfig -> CompilerMode -> FilePath -> Action SeqMem |
| 168 | +compileCAction fcfg mode outpath = |
| 169 | + Action |
| 170 | + { actionName = "Compile to sequential C", |
| 171 | + actionDescription = "Compile to sequential C", |
| 172 | + actionProcedure = helper |
| 173 | + } |
| 174 | + where |
| 175 | + helper prog = do |
| 176 | + cprog <- handleWarnings fcfg $ SequentialC.compileProg versionString prog |
| 177 | + let cpath = outpath `addExtension` "c" |
| 178 | + hpath = outpath `addExtension` "h" |
| 179 | + jsonpath = outpath `addExtension` "json" |
| 180 | + |
| 181 | + case mode of |
| 182 | + ToLibrary -> do |
| 183 | + let (header, impl, manifest) = SequentialC.asLibrary cprog |
| 184 | + liftIO $ T.writeFile hpath $ cPrependHeader header |
| 185 | + liftIO $ T.writeFile cpath $ cPrependHeader impl |
| 186 | + liftIO $ T.writeFile jsonpath manifest |
| 187 | + ToExecutable -> do |
| 188 | + liftIO $ T.writeFile cpath $ SequentialC.asExecutable cprog |
| 189 | + runCC cpath outpath ["-O3", "-std=c99"] ["-lm"] |
| 190 | + ToServer -> do |
| 191 | + liftIO $ T.writeFile cpath $ SequentialC.asServer cprog |
| 192 | + runCC cpath outpath ["-O3", "-std=c99"] ["-lm"] |
| 193 | +``` |
| 194 | + |
| 195 | +Here the `SequentialC.compileProg` function does the actual C code |
| 196 | +generation. I'll elaborate a bit on it, but at an architectural level, |
| 197 | +it is not constrained at all in what it does. In principle, an action |
| 198 | +could just dump the final IR to disk and run some entirely different |
| 199 | +program that takes care of code generation. You might even write an |
| 200 | +action that expects the program to still be in one of the early IR |
| 201 | +dialects, such as the ones that do not have memory information, or |
| 202 | +even the one that still has nested parallelism. This might be |
| 203 | +appropriate if you are targeting some other (relatively) high level |
| 204 | +language. |
| 205 | + |
| 206 | +Ultimately, if you wish to write a backend that does not need a new IR |
| 207 | +dialect, and also does not need to reuse any of the existing C |
| 208 | +generation machinery, then this is consequently quite easy - at least |
| 209 | +as far as integration with the compiler is concerned. |
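
The way pipelines and actions fit together can be sketched with a self-contained toy model. This is not the compiler's actual API: `Prog` is reduced to a list of strings, `IO` stands in for `FutharkM`, and `Pipeline`, `toySeqPipeline`, `printAction`, and `runCompiler` are hypothetical names. The point is only that the dialect token in the types forces a pipeline and an action to agree.

```haskell
-- Type-level tokens for two dialects (no values, only used in types).
data SOACS
data SeqMem

-- Toy program: just a list of statements, tagged with its dialect.
newtype Prog rep = Prog [String]

-- A pipeline is a pure function from one dialect to another.
newtype Pipeline from to = Pipeline {runPipeline :: Prog from -> Prog to}

-- Simplified version of the Action record from the text.
data Action rep = Action
  { actionName :: String,
    actionProcedure :: Prog rep -> IO ()
  }

-- Pretend middle-end: turns a SOACS program into a SeqMem program.
toySeqPipeline :: Pipeline SOACS SeqMem
toySeqPipeline = Pipeline $ \(Prog stms) -> Prog (map ("alloc; " ++) stms)

renderProg :: Prog rep -> String
renderProg (Prog stms) = unlines stms

-- Pretend backend: prints the final IR instead of generating code.
printAction :: Action SeqMem
printAction =
  Action
    { actionName = "Print toy IR",
      actionProcedure = putStr . renderProg
    }

-- The output dialect of the pipeline must match the action's dialect.
runCompiler :: Pipeline from to -> Action to -> Prog from -> IO ()
runCompiler pipeline action = actionProcedure action . runPipeline pipeline
```

With these definitions, `runCompiler toySeqPipeline printAction (Prog ["x = 1"])` type checks, while pairing `printAction` with a pipeline that ends in some other dialect is rejected by the type checker.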
| 210 | + |
| 211 | +To actually hook up a pipeline with an action and produce something |
| 212 | +that can be invoked by the command line, you need to write a largely |
| 213 | +boilerplate `main` definition, like [this one for `futhark
| 214 | +c`](https://github.com/diku-dk/futhark/blob/master/src/Futhark/CLI/C.hs):
| 215 | + |
| 216 | +```Haskell
| 217 | +main :: String -> [String] -> IO () |
| 218 | +main = compilerMain |
| 219 | + () |
| 220 | + [] |
| 221 | + "Compile sequential C" |
| 222 | + "Generate sequential C code from optimised Futhark program." |
| 223 | + seqmemPipeline |
| 224 | + $ \fcfg () mode outpath prog -> |
| 225 | + actionProcedure (compileCAction fcfg mode outpath) prog |
| 226 | +``` |
| 227 | +
|
| 228 | +And then [finally hook it up to the big list of |
| 229 | +subcommands](https://github.com/diku-dk/futhark/blob/master/src/Futhark/CLI/Main.hs).
| 230 | +That's all it takes. |
| 231 | +
|
| 232 | +## Imperative code generation |
| 233 | +
|
| 234 | +While it is true that an action can be arbitrary imperative code, in |
| 235 | +*practice* all of Futhark's C-based backends (and even the [Python |
| 236 | +ones](https://futhark-lang.org/blog/2016-04-15-futhark-and-pyopencl.html)) |
| 237 | +make use of significant shared infrastructure to avoid having to
| 238 | +reinvent the wheel too often.
| 239 | +
|
| 240 | +As a starting point, the Futhark compiler defines an *imperative* |
| 241 | +intermediate representation, called |
| 242 | +[Imp](https://github.com/diku-dk/futhark/blob/master/src/Futhark/CodeGen/ImpCode.hs). |
| 243 | +As with the middle-end, Imp is actually an extensible language, with |
| 244 | +various dialects, such as [sequential
| 245 | +Imp](https://github.com/diku-dk/futhark/blob/master/src/Futhark/CodeGen/ImpCode/Sequential.hs).
| 246 | +In contrast to the functional middle-end IR, which is very well |
| 247 | +defined, with type checking rules and a well-defined external syntax, |
| 248 | +Imp is a lot more ad hoc, and does not for example have a parser. |
| 249 | +Semantically, it's largely a simplified form of C. In fact, it is not |
| 250 | +even in [SSA |
| 251 | +form](https://en.wikipedia.org/wiki/Static_single-assignment_form), |
| 252 | +which still works out alright, because we do *no* optimisation at the |
| 253 | +Imp level. |
| 254 | +
|
| 255 | +The translation from the functional IR to Imp is done by a module |
| 256 | +called |
| 257 | +[ImpGen](https://github.com/diku-dk/futhark/blob/master/src/Futhark/CodeGen/ImpGen.hs). |
| 258 | +It is heavily parameterised, because it essentially has to go from an |
| 259 | +*arbitrary IR dialect* to *an arbitrary Imp dialect*. It is full of |
| 260 | +implementation details, but not particularly interesting. |
| 261 | +
|
| 262 | +Once the compiler has obtained an Imp representation of the program, |
| 263 | +it can then turn that program into C or Python, or even some other |
| 264 | +language. This is largely a mechanical process - the semantic gap from |
| 265 | +Imp to C (or the baroque form of Python produced by Futhark) is not |
| 266 | +great, and mostly involves mapping Imp constructs to the facilities |
| 267 | +provided by the (very small) Futhark runtime system, and of course |
| 268 | +generating syntactically valid code. To ease maintenance of the three |
| 269 | +GPU backends (`cuda`, `opencl`, `hip`), we also make use of a small |
| 270 | +GPU abstraction layer |
| 271 | +([gpu.h](https://github.com/diku-dk/futhark/blob/master/rts/c/gpu.h), |
| 272 | +discussed in [this |
| 273 | +paper](https://futhark-lang.org/publications/fproper24.pdf)).
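
To illustrate how small the gap from an Imp-like language to C can be, here is a toy imperative IR with an extensible `Op` case and a translation to lines of C. It is only loosely modelled on the idea of Imp: the constructors, their fields, and the functions `cExp` and `cCode` are simplifications invented for this sketch, not the definitions in the linked modules.

```haskell
-- Toy scalar expressions.
data Exp
  = Const Int
  | Var String
  | BinOp String Exp Exp

-- Toy imperative code, extensible in the same way as the functional IR.
data Code op
  = Skip
  | Code op :>>: Code op -- sequencing
  | DeclareScalar String
  | SetScalar String Exp
  | For String Exp (Code op) -- loop variable, bound, body
  | Op op -- dialect-specific operation

cExp :: Exp -> String
cExp (Const k) = show k
cExp (Var v) = v
cExp (BinOp o x y) = "(" ++ cExp x ++ " " ++ o ++ " " ++ cExp y ++ ")"

-- Translate to lines of C, given a translation of the 'Op' case.
cCode :: (op -> [String]) -> Code op -> [String]
cCode _ Skip = []
cCode f (a :>>: b) = cCode f a ++ cCode f b
cCode _ (DeclareScalar v) = ["int64_t " ++ v ++ ";"]
cCode _ (SetScalar v e) = [v ++ " = " ++ cExp e ++ ";"]
cCode f (For i n body) =
  ["for (int64_t " ++ i ++ " = 0; " ++ i ++ " < " ++ cExp n ++ "; " ++ i ++ "++) {"]
    ++ map ("  " ++) (cCode f body)
    ++ ["}"]
cCode f (Op o) = f o

-- Example program in a dialect with no extra operations.
sumTo :: Code ()
sumTo =
  DeclareScalar "acc"
    :>>: SetScalar "acc" (Const 0)
    :>>: For "i" (Var "n") (SetScalar "acc" (BinOp "+" (Var "acc") (Var "i")))
```

A backend for a GPU-like dialect would supply a more interesting function for the `Op` case, for instance one that emits calls into a runtime library, while reusing everything else.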
| 274 | +
|
| 275 | +## Advice on writing a backend |
| 276 | +
|
| 277 | +Futhark is not set up to make it especially easy to add new backends, |
| 278 | +but neither is it particularly difficult. After all, as of this |
| 279 | +writing we support 10 different backends. Here is some advice for any |
| 280 | +prospective people who wish to seek glory by adding a backend: |
| 281 | +
|
| 282 | +* If you want to target a very high level parallel language, use only |
| 283 | + Futhark's frontend and the middle-end up to the `SOACS` |
| 284 | + representation. This will give you a monomorphic first-order program |
| 285 | + (except for parallel operations) where all types are scalars or |
| 286 | + arrays of scalars, but still with nested parallelism, although that |
| 287 | + parallelism will be well fused. I do think it would be a fun |
| 288 | + experiment to generate code for a fork-join language, such as |
| 289 | + [MPL](https://github.com/MPLLang/mpl), from this representation. |
| 290 | +
|
| 291 | +* If you want to target a slightly less high level parallel language, |
| 292 | + in particular one that does not handle nested parallelism well, |
| 293 | + consider processing the output of the `GPU` representation. Despite |
| 294 | + the name, it is not *truly* GPU specific (the parts that are can be |
| 295 | + ignored or modified, and are mostly about metadata and tuning), but |
| 296 | + merely guarantees the absence of nested parallelism. It is still |
| 297 | + high level and with value-oriented semantics, with no notion of |
| 298 | + memory. |
| 299 | +
|
| 300 | +* If you want to target a new GPU backend, implement the `gpu.h` |
| 301 | + abstraction layer. The code generation work for CPU-side work will |
| 302 | + then be fairly straightforward, although you may still need to do |
| 303 | + significant work to generate the actual GPU kernel code. We are |
| 304 | + currently going through this process with the in-progress [WebGPU |
| 305 | + backend](https://github.com/diku-dk/futhark/pull/2140), and most of |
| 306 | + the challenges are related to the particular limitations of WebGPU |
| 307 | + (a post for another time), and not so much the compiler engineering. |
| 308 | +
|
| 309 | +* If you want to generate low level code of any kind, you will likely |
| 310 | + find it easiest to use one of the IR dialects with memory |
| 311 | + information. If you want to generate something that is relatively |
| 312 | + C-like (and generating e.g. machine code or JavaScript both count as
| 313 | + "C-like" in this regard), then using the existing machinery for |
| 314 | + generating Imp is almost certainly easiest. |
| 315 | +
|
| 316 | +However, in all cases, I would say a very good idea is to contact one |
| 317 | +of the Futhark developers for advice and help. Having a third party |
| 318 | +add a new backend is not really something we have considered much (all |
| 319 | +of the backends have been written under our close supervision), and |
| 320 | +while the *technical* challenges are not all that major by the |
| 321 | +standards of writing compiler backends, the *documentation* is not |
| 322 | +really up to the task. But I would certainly be very excited for |
| 323 | +someone to give it a try. |