WebAssembly / Web runtime (both for wasm-simd and WebGPU) #8216
Replies: 9 comments 1 reply
-
cc: @mcr229 or @digantdesai regarding running XNNPACK via wasm
-
Also cc: @mergennachin
-
I've talked with @digantdesai about this before. I think for XNNPACK he mentioned it should just be plug and play. I've been wanting to try out wasm for some time now, I just haven't had the bandwidth.
-
I also wonder about the fusion capabilities of ExecuTorch :) Does it allow Inductor-codegen'd fused kernels (e.g. quant/dequant fused directly into the flash attention kernel, with the positional embedding computation also fused into that kernel)? Another interesting backend is webgpu/wgpu: https://github.com/huggingface/ratchet . Even raw wgpu/wgsl shaders could in theory be a compilation target for fused kernels.
But even if ExecuTorch does not support wild codegen/fusions, it would still be good to have it as a baseline, with comparisons against ort-web, tflite+tfjs, tvm-wasm, and ggml compiled to wasm. This should show roughly where all these frameworks stand (especially if compiling is relatively doable).
-
And given that PyTorch currently does not have its own inference wasm/WebGPU story, having ExecuTorch compiled to wasm-simd might be a nice baseline to have (especially if it's minimalistic and relatively simple to compile).
-
I suspect much of the core should be compilable with the emscripten C++ compiler. Probably not the optimized operators, though, and I'm not too sure about backends/xnnpack.
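For reference, an emscripten cross-compile of a CMake project usually just wraps the configure step. A minimal sketch (the `EXECUTORCH_BUILD_XNNPACK` option name and output directory here are assumptions, not a documented ExecuTorch wasm workflow; only the `emcmake` wrapper is standard Emscripten):

```shell
# Hypothetical sketch: cross-compile a CMake tree to wasm with Emscripten.
# Requires the emsdk toolchain to be installed and activated first.
git clone https://github.com/pytorch/executorch.git
cd executorch
emcmake cmake -B cmake-out-wasm \
  -DCMAKE_BUILD_TYPE=Release \
  -DEXECUTORCH_BUILD_XNNPACK=ON   # assumed option name, unverified for wasm
cmake --build cmake-out-wasm -j
```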
-
Maybe the best first step would be adding some sort of GitHub Actions CI job that compiles it with emscripten... (even if no tests using it exist so far).
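A compile-only CI job of that sort could be quite small. A hedged sketch (the workflow name, build flags, and the third-party `setup-emsdk` action are assumptions, not an existing ExecuTorch workflow):

```yaml
# Hypothetical compile-only CI sketch; names and flags are assumptions.
name: emscripten-build
on: [push, pull_request]
jobs:
  build-wasm:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: mymindstorm/setup-emsdk@v14   # assumed third-party emsdk action
      - run: emcmake cmake -B cmake-out-wasm -DCMAKE_BUILD_TYPE=Release
      - run: cmake --build cmake-out-wasm -j
```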
-
It should be, given a bunch of WASM[SIMD] kernels. I haven't tried it myself, though. IIRC there isn't any CI for that on github/xnnpack either.
-
XNNPACK is also known to compile (and maybe even be tested) for wasm/simd, so somehow this should be achievable... I don't know if any compact backend library/project exists for WebGPU kernels.
-
I'm wondering if ExecuTorch can be compiled for a WebAssembly target? As far as I understand, XNNPACK exists for wasm-simd, so at least for CPU this should theoretically be doable? (e.g. to be compared with tflite+tfjs, ort-web, and tvm-wasm, at least for some popular models like MobileNets)
(This is especially interesting if strong fusion/codegen can be done to produce fused wasm-simd code / fused WebGPU programs, although maybe this is an ask for Inductor.)
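Whatever runtime ends up targeting the browser, the host-side entry point is the standard WebAssembly JS API. A minimal, ExecuTorch-agnostic sanity check that an engine can at least validate a wasm module (only the standard API is used here, nothing framework-specific):

```javascript
// Smallest possible wasm module: the 8-byte header alone
// (magic "\0asm" followed by version 1) is a valid empty module.
const emptyModule = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // magic: "\0asm"
  0x01, 0x00, 0x00, 0x00, // version: 1
]);

// True in any JS engine with WebAssembly support (browsers, Node.js).
const hasWasm =
  typeof WebAssembly === "object" && WebAssembly.validate(emptyModule);

console.log(hasWasm); // true
```

Detecting wasm-simd support specifically would additionally require validating a module that uses a SIMD instruction (as libraries like wasm-feature-detect do), since `WebAssembly.validate` only reports what the engine accepts.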