WebAssembly 'initial-memory' Trickery
Well, it’s been a while. So long, in fact, that I forgot how I had the site set up and couldn’t figure out how to add a new entry for about 15 minutes. Well, on the plus side, everything is up to date now.
I don’t really do as many side projects anymore. I find that 40 hours a week of full-time programming tends to discourage hobby projects, so instead I spend my free time on more relaxing hobbies that don’t take any real mental effort, like video games.
I was inspired to write this one by a very recent problem at work, where I’m working on a rendering engine that also compiles to WebAssembly (i.e. as close as I comfortably want to get to ‘web development’.)
Specifically, there was an unexpected and considerable performance deficit when initialising a 3D scene in Chrome. The same scene performed fine on a native Win32 build and nearly as well in Firefox, so evidently there was something browser-specific at play.
This project is set up intentionally with a very lightweight toolchain (i.e. not Emscripten.) Just plain C code being compiled by a fairly recent Clang/LLVM version, so it was odd that this was happening.
Now, I personally tend to program with a very crude ‘trial & error’ methodology in all my work. It may not be the most efficient approach, but I like it because it lets me learn very quickly, and the iterations occasionally turn up observations that deviate quite a bit from the original goal but prove interesting in retrospect.
In this case, it came in particularly handy. I knew up front that this was browser-specific, so I figured there was little point in changing the code around to accommodate a single browser engine. Rather than trying one little change at a time, I went for a bit of a “shotgun” approach and made a whole bunch of changes: compiler flags, linker flags, toolchain upgrades... All kinds of things that in combination were quite impractical, but it allowed me to quickly see whether the full combination had any effect. And it did.
At this point I had to go in reverse. Revert my changes one by one until I found what really made an impact. Maybe it was a combination of things? Maybe it was a change that relied on another?
Turns out it was a single change: the initial WebAssembly memory size. With that changed, deserializing my data no longer took ~5 minutes, but just ~20 seconds. Still twice as slow as the Firefox & native variants, but far more acceptable than... well... 5 minutes. Huh? I decided to investigate further why this had such a drastic effect.
For a very brief bit of background, this application sets a very small initial size, but explicitly sets a maximum memory size of 4GB (because at the time, if the maximum wasn’t set explicitly, some browsers only allowed up to 2GB of memory usage.) Upon closer inspection, I found out that Chrome, unlike Firefox, does not reserve the full maximum address space. Instead, when the heap is exhausted, a new block of address space is allocated and the old data is copied across. Each copy gets more expensive as the heap gets larger, which explains why there’s such a significant performance difference. It looks like Chrome does in fact reserve some address space, but only up to 1GB.
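To get a feel for why copy-on-grow hurts so much, here is a rough back-of-the-envelope model in C. It is only a sketch: the function name is made up for illustration, and it assumes every single growth relocates the entire heap, which is a simplification of whatever Chrome really does (it ignores the ~1GB that seems to be reserved, for one).

```c
#include <assert.h>
#include <stdint.h>

/* Toy model, NOT Chrome's real allocator: total bytes copied while a heap
 * grows from `initial` to at least `target` bytes in fixed `step` increments,
 * assuming every growth moves the whole existing heap to a new allocation. */
uint64_t bytes_copied_when_growing(uint64_t initial, uint64_t target, uint64_t step)
{
    assert(step > 0); /* a zero step would never reach the target */

    uint64_t size = initial;
    uint64_t copied = 0;
    while (size < target) {
        copied += size; /* the old contents are copied across */
        size += step;   /* the heap is now one step larger */
    }
    return copied;
}
```

For example, `bytes_copied_when_growing(16ull << 20, 1ull << 30, 16ull << 20)` (16MB to 1GB in 16MB steps) returns 33822867456, i.e. roughly 31.5GB of copying to end up with a 1GB heap. If the address space is reserved up front, growing in place copies nothing at all.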
Ironically, setting the initial memory size to 4GB (which sounds like a terrible idea, yes, but bear with me) doesn’t actually initialise 4GB. If anything, it behaves quite similarly to what Firefox was already doing: reserving 4GB of address space / virtual memory and allocating a small and practical heap up front that grows over time. I’ll have to test that this doesn’t cause instability on mobile devices, but otherwise this seems to completely fix the performance problems with loading large data up front.
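If you want to try this yourself, the sizes have to be whole WebAssembly pages (64KiB each). The helpers below are a small sketch for working out those numbers. The helper names are my own for illustration, and I’m assuming a link step through wasm-ld, whose `--initial-memory` and `--max-memory` options take byte values that must be page-aligned. Your toolchain may spell this differently.

```c
#include <assert.h>
#include <stdint.h>

/* A WebAssembly page is 64 KiB. */
#define WASM_PAGE_SIZE 65536ull

/* Hypothetical helper: number of whole pages needed to hold `bytes`. */
uint64_t wasm_pages_for_bytes(uint64_t bytes)
{
    return (bytes + WASM_PAGE_SIZE - 1) / WASM_PAGE_SIZE;
}

/* Hypothetical helper: `bytes` rounded up to a page-aligned byte count,
 * i.e. the kind of value a byte-based memory size option would expect. */
uint64_t wasm_page_aligned_bytes(uint64_t bytes)
{
    uint64_t aligned = wasm_pages_for_bytes(bytes) * WASM_PAGE_SIZE;
    assert(aligned % WASM_PAGE_SIZE == 0);
    return aligned;
}
```

With these, 4GB works out to `wasm_pages_for_bytes(4ull << 30)`, which is 65536 pages or 4294967296 bytes, the most a 32-bit WebAssembly memory can address.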
TL;DR - Chrome doesn’t reserve the full maximum address space for WebAssembly memory. Consider being bold and setting your “initial” size high if you deal with potentially large data, even if your gut tells you this is wrong. Maybe when memory64 (i.e. 64-bit WebAssembly) becomes standardised this problem goes away, but we’ll see.