Further thoughts on TornadoVM

10Mar20

TornadoVM was definitely the coolest thing I learned about at QCon London last week, which is why I wrote up the presentation on InfoQ.

It seems that people on the Orange web site are also interested in the intersection of Java, GPUs and FPGAs, as the piece was #1 there last night as I went to bed, and as I write this there are 60 comments in the thread.

I’ve been interested in this stuff for a very long time

and wrote about FPGA almost 6 years ago, so no need to go over the same old ground here.

What I didn’t mention then was that FPGA was the end of a whole spectrum of solutions I’d looked at during my banking days as ways to build smaller grids (or even not build grids at all and solve problems on a single box). If we take multi-core CPUs as the starting point, the spectrum included GPUs, the Cell Processor, QuickSilver and ultimately FPGAs; and notionally there was a trade-off between developer complexity and pure performance potential at each step along that spectrum.

Just-in-time (JIT) compilers are perfectly positioned to reason about performance

Which means that virtual machine (VM) based languages like Java (and Go[1]) might be best positioned to exploit the acceleration on offer.

After all, the whole point of JIT is to optimise the trade-off between language interpretation and compilation, so why not add extra dimensionality to that and optimise the trade-off between different hardware targets?

TornadoVM can speculatively execute code on different accelerators to see which might offer the best speedup. Of course that sort of testing and profiling is only useful in a dev environment; once the right accelerator is found for a given workload it can be locked into place for production.
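For a flavour of how that looks from the developer side, here’s a minimal sketch based on the dynamic reconfiguration API described in the TornadoVM papers (the package, class and Policy names are as I understand the TornadoVM API at the time of writing, and may differ between releases):

```java
import uk.ac.manchester.tornado.api.Policy;
import uk.ac.manchester.tornado.api.TaskSchedule;
import uk.ac.manchester.tornado.api.annotations.Parallel;

public class SpeculativeRun {

    // A simple data-parallel kernel: double every element in place.
    public static void scale(float[] data) {
        for (@Parallel int i = 0; i < data.length; i++) {
            data[i] = data[i] * 2f;
        }
    }

    public static void main(String[] args) {
        float[] data = new float[8192];
        java.util.Arrays.fill(data, 1f);

        TaskSchedule ts = new TaskSchedule("s0")
                .task("t0", SpeculativeRun::scale, data)
                .streamOut(data);

        // Ask the runtime to try the task on each available device
        // (multi-core CPU, GPU, FPGA, ...) and keep whichever wins.
        ts.execute(Policy.PERFORMANCE);
    }
}
```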

It’s just OpenCL under the hood

Yep. But the whole point is that developers don’t have to learn OpenCL and the associated toolchains. That work has been done for them.

Again, this is powerful in addressing trade-offs, this time between developer productivity and system cost (ultimately energy). My FPGA experiment at the bank crashed on the rocks of economics: it would have cost $1.5m in quant time to save $1m in data centre costs, and nobody invests in a negative payoff. If the quants could have kept on going with C++ (or Haskell or anything else they liked) rather than needing to learn VHDL or Verilog, it would have been a very different picture.

Which means it’s not real Java

So what? Arithmetic doesn’t need fancy objects.

There’s some residual cognitive load here for developers. They first need to identify suitable code blocks and apply annotations, and then they must ensure that those blocks are simple enough to run on the accelerator.
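To make that concrete, here’s roughly what a Tornado-friendly method looks like: a sketch along the lines of the project’s vector addition example, assuming the TaskSchedule and @Parallel API from TornadoVM’s documentation:

```java
import uk.ac.manchester.tornado.api.TaskSchedule;
import uk.ac.manchester.tornado.api.annotations.Parallel;

public class VectorAdd {

    // The kernel stays deliberately simple: primitive arrays, a flat
    // loop, no object allocation. @Parallel marks the loop that the
    // runtime may map onto the accelerator's parallel hardware.
    public static void add(float[] a, float[] b, float[] c) {
        for (@Parallel int i = 0; i < c.length; i++) {
            c[i] = a[i] + b[i];
        }
    }

    public static void main(String[] args) {
        float[] a = new float[1024];
        float[] b = new float[1024];
        float[] c = new float[1024];
        java.util.Arrays.fill(a, 1f);
        java.util.Arrays.fill(b, 2f);

        // Everything else is plain Java: declare the task and run it.
        new TaskSchedule("s0")
                .task("t0", VectorAdd::add, a, b, c)
                .streamOut(c)
                .execute();
    }
}
```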

If I had greater faith in compilers I’d maybe suggest that they could figure this stuff out for themselves, and save developers from that effort; but I don’t: the compiler will not save you.

Conclusion

My FPGA experiment some 15 years ago taught me a hard lesson about system boundaries when thinking about performance, and the trade-off between developer productivity and system productivity turns out to matter a lot. TornadoVM looks to me like something that’s addressing that trade-off, which is why I look forward to watching how it develops and what it might get used for.

Updates

10 Mar 2020: I’d originally written ‘TornadoVM doesn’t (yet) speculatively execute code on different accelerators to see which might offer the best speedup, but that’s the sort of thing that could come along down the line’, but Dr Juan Fumero corrected me on Twitter and pointed to a pre-print of the paper explaining how it works.

Note

[1] Rob Taylor at Reconfigure.io worked on running Golang on FPGA.
