Boosting WebAssembly Performance with Speculative Inlining and Deoptimization in V8

WebAssembly execution has traditionally relied on static optimization, but with the introduction of WasmGC, V8 now brings speculative techniques—specifically call_indirect inlining and deoptimization—to the platform. Shipping in Chrome M137, these optimizations leverage runtime feedback to generate faster machine code, delivering dramatic speedups for garbage-collected languages like Dart, Java, and Kotlin. This Q&A explores how they work, why they matter, and what’s next.

1. What new WebAssembly optimizations did V8 introduce in Chrome M137?

V8 implemented two complementary optimizations: speculative call_indirect inlining and deoptimization support for WebAssembly. Speculative inlining allows the compiler to inline indirect function calls (call_indirect) based on past execution patterns, assuming the same target will be called again. Deoptimization support provides a safety net: if the assumption proves wrong at runtime, V8 can discard the optimized code and fall back to a slower, correct execution path. This pair enables V8 to generate aggressive, type-specialized machine code for WebAssembly, particularly benefiting WasmGC programs that involve dynamic dispatch and rich type hierarchies. Together, they mark V8’s first use of runtime feedback–driven optimizations for WebAssembly, moving beyond purely ahead-of-time approaches.
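To make the feedback-driven idea concrete, here is a minimal Python sketch of call-site profiling: baseline code records which callee each indirect call resolves to, and the optimizer speculates only when one target dominates. All names here (`CallSiteFeedback`, `dominant_target`, the 80% threshold) are illustrative assumptions, not V8 internals.

```python
from collections import Counter

class CallSiteFeedback:
    """Per-call-site profile, of the kind baseline code collects.

    Hypothetical model for illustration; V8's real feedback vectors
    live inside the engine and are structured differently.
    """
    def __init__(self):
        self.targets = Counter()

    def record(self, target):
        """Called on every indirect call during unoptimized execution."""
        self.targets[target] += 1

    def dominant_target(self, min_share=0.8):
        """Return the most frequent callee if it dominates, else None."""
        total = sum(self.targets.values())
        if total == 0:
            return None
        target, count = self.targets.most_common(1)[0]
        return target if count / total >= min_share else None

# Baseline execution observes the same callee almost every time...
feedback = CallSiteFeedback()
for callee in ["square", "square", "square", "square", "cube"]:
    feedback.record(callee)

# ...so the optimizing tier can speculate on it (and guard the assumption).
assert feedback.dominant_target() == "square"
```

If no single target dominates, the sketch returns `None` and the call site would simply keep its generic dispatch code — speculation only pays off when the feedback is strongly biased.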

Source: v8.dev

2. How significant are the performance improvements from these optimizations?

The combined optimizations yield substantial speedups. On a set of Dart microbenchmarks, average performance improves by more than 50%. For larger, realistic applications—such as those compiled from Java or Kotlin via WasmGC—the speedup ranges from 1% to 8%. While the microbenchmark gains are dramatic, the more modest improvements on larger codebases still represent meaningful wins, especially in areas where indirect calls are common. The deoptimization infrastructure also lays the groundwork for even more aggressive future optimizations, promising further gains as V8 continues to refine its WebAssembly pipeline.

3. Why didn’t WebAssembly need speculative optimizations before?

WebAssembly 1.0, launched in 2017, was designed for statically typed languages like C, C++, and Rust. These languages already benefit from strong ahead-of-time (AOT) compilation via toolchains such as Emscripten (LLVM-based) or Binaryen. The resulting binaries are highly optimized, with static typing at the function, instruction, and variable levels eliminating many runtime checks. JavaScript, in contrast, requires speculative optimization because its dynamic nature makes it impossible to know variable types or function targets at compile time. WebAssembly’s static guarantees meant V8 could generate efficient code without runtime feedback, so deoptimization—a cornerstone of JavaScript JITs—was unnecessary. WasmGC changes this by introducing higher-level, dynamically typed constructs that mirror some of JavaScript’s complexity.

4. What is WasmGC, and how does it motivate these changes?

WasmGC is a WebAssembly proposal that adds native support for garbage collection, enabling managed languages like Java, Kotlin, and Dart to compile to WebAssembly. Its bytecode is far more abstract than WebAssembly 1.0: it includes rich types such as structs and arrays, subtyping, and operations on those types. This higher-level representation reduces the amount of compile-time information available—for instance, indirect calls can have many possible targets. As a result, static AOT compilation alone leaves performance on the table. Speculative techniques become essential: by profiling which types or call targets actually appear at runtime, V8 can generate specialized, fast machine code, narrowing the gap with natively compiled code. Without these optimizations, WasmGC’s overhead would limit its adoption for performance-sensitive applications.

5. How does deoptimization work for WebAssembly in V8?

Deoptimization (deopt) allows V8 to roll back from optimized code when the assumptions it relied on are violated. For example, if V8 speculatively inlines a function call based on a particular target, but at runtime the call resolves to a different function, the deopt mechanism throws away the optimized machine code and resumes execution in the slower, but always correct, code produced by V8’s baseline WebAssembly compiler, Liftoff. This process mirrors JavaScript’s deopt strategy: feedback is collected during unoptimized execution, then used to generate optimized code with embedded guards. If a guard fails, the system jumps to a “deoptimization continuation,” ensuring correctness. The key novelty is applying this to WebAssembly, where previously such dynamic feedback wasn’t needed. The deopt infrastructure is also reusable for future optimizations, like speculative type conversion or branch prediction.
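The guard-and-fallback cycle described above can be sketched in a few lines of Python. This is a toy model, not V8’s machinery: the "optimized" function embeds a guard on the speculated callee, and a failed guard discards it in favor of generic dispatch. Real deopt additionally transfers register and stack state to the matching point in baseline code, which this sketch omits.

```python
class DeoptException(Exception):
    """Signals that a speculative guard failed in optimized code."""

def square(x): return x * x
def cube(x): return x * x * x

table = [square, cube]   # the module's function table

def baseline(index, arg):
    """Generic, always-correct indirect dispatch (the slow path)."""
    return table[index](arg)

def make_optimized(expected_target, inlined_body):
    """Build 'optimized' code that guards on the speculated callee."""
    def optimized(index, arg):
        if table[index] is not expected_target:   # embedded guard
            raise DeoptException()                # assumption violated
        return inlined_body(arg)                  # fast, inlined path
    return optimized

# Feedback said this call site almost always targets `square`.
tier = {"code": make_optimized(square, lambda x: x * x)}

def call(index, arg):
    try:
        return tier["code"](index, arg)
    except DeoptException:
        tier["code"] = baseline   # throw away the optimized code
        return baseline(index, arg)

assert call(0, 3) == 9    # guard holds: inlined fast path
assert call(1, 3) == 27   # guard fails: deopt, correct result via baseline
```

After the failed guard, subsequent calls through `tier["code"]` run the generic path; a real engine could later re-profile and re-optimize with updated feedback.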

6. What is speculative call_indirect inlining, and why is it important for WasmGC?

In WebAssembly, indirect function calls (call_indirect) use a table of function references, so the callee isn’t known at compile time. Without speculation, V8 must generate generic dispatch code that looks up the table, checks types, and jumps—an inherently slow path. Speculative call_indirect inlining assumes that the most frequently encountered target will continue to be called. V8 embeds the address of that target directly, akin to a direct call. If the assumption holds, execution is much faster. WasmGC programs, with virtual method dispatch and interface calls, rely heavily on indirect calls. By inlining the common case, V8 reduces call overhead and enables further optimizations like constant propagation and dead code elimination. This technique is particularly potent for object-oriented code, where only a few implementations dominate hot paths.
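The contrast between the generic lowering and the speculative one can be illustrated with a small Python sketch. The generic path models what `call_indirect` must always do (bounds check, signature check, table load, indirect jump), while the speculative path replaces all of that with one identity comparison plus the callee’s body pasted inline. Function names and the signature strings are illustrative assumptions, not engine internals.

```python
def square(x): return x * x
def cube(x): return x * x * x

table = [square, cube]
sigs = ["i32->i32", "i32->i32"]   # declared signature of each table slot

def call_indirect_generic(index, arg, expected_sig):
    """Generic lowering: every call pays checks plus an indirect jump."""
    if index >= len(table):
        raise RuntimeError("table index out of bounds")
    if sigs[index] != expected_sig:
        raise RuntimeError("indirect call signature mismatch")
    return table[index](arg)      # load function pointer, indirect jump

def call_indirect_speculative(index, arg):
    """Speculative lowering for a site whose feedback says `square`."""
    if table[index] is square:    # single cheap guard on the profiled target
        return arg * arg          # body of `square`, inlined at the call site
    # Uncommon case: fall back to the fully checked generic dispatch.
    return call_indirect_generic(index, arg, "i32->i32")

assert call_indirect_speculative(0, 5) == 25   # fast, inlined path
assert call_indirect_speculative(1, 2) == 8    # rare target: generic path
```

Once the body is inlined, the surrounding optimizer can see through it, which is what unlocks the follow-on wins the text mentions, such as constant propagation into the callee and dead code elimination.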

7. What future optimizations does deopt support enable for WebAssembly?

Deoptimization is a foundational building block for many advanced JIT techniques. With it in place, V8 can now consider a range of speculative optimizations for WebAssembly. For instance, speculative type narrowing could assume that a field of a struct holds an integer, generating fast integer operations instead of generic ones. Speculative branch selection might optimize control flow based on common branch outcomes. Polymorphic inline caching could cache recently resolved types for virtual calls. All these can fall back to deopt if assumptions break. Additionally, combined with inlining, deopt could enable adaptive optimization that profiles hot code paths and re-optimizes them with tighter assumptions. As WasmGC and other proposals mature, these techniques will be crucial for closing the performance gap with native execution, especially for languages that rely heavily on dynamic dispatch and garbage collection.
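Of the techniques listed above, polymorphic inline caching is the easiest to sketch. The toy Python version below caches a handful of (type → method) entries per call site and falls back to a full lookup on a miss; the class names, the capacity of four, and the miss counter are all illustrative assumptions rather than anything V8 ships.

```python
class InlineCache:
    """A tiny polymorphic inline cache for one virtual call site.

    Hypothetical sketch: remembers up to `capacity` receiver types and
    their resolved methods, doing the slow lookup only on a miss.
    """
    def __init__(self, capacity=4):
        self.entries = {}      # receiver type -> resolved method
        self.capacity = capacity
        self.misses = 0        # number of slow lookups taken

    def dispatch(self, receiver, method_name):
        klass = type(receiver)
        method = self.entries.get(klass)   # fast path: cached resolution
        if method is None:
            self.misses += 1               # slow path: full method lookup
            method = getattr(klass, method_name)
            if len(self.entries) < self.capacity:
                self.entries[klass] = method
        return method(receiver)

class Circle:
    def area(self): return 12.0
class Square:
    def area(self): return 9.0

ic = InlineCache()
areas = [ic.dispatch(s, "area") for s in
         [Circle(), Square(), Circle(), Square()]]
assert ic.misses == 2   # one slow lookup per class, then cache hits
```

In an engine, each cached entry would be a guarded, possibly inlined fast path, and a site that overflows the cache (a "megamorphic" site) would deopt or fall back to generic dispatch permanently.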
