LLMFusedOpHandlers
CPU fallback implementations for fused LLM operations.
These handlers decompose fused ops back into sequences of TensorOps calls. They produce correct results on any backend but don't provide the performance benefit of a true fused kernel. Platform-specific backends (Metal, CUDA) should register their own handlers to override these.
Register via:
LLMFusedOpHandlers.registerAll()
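The decomposition described above can be sketched as follows. This is a minimal illustration only: every type and signature here (`ComputeGraphExecutor`'s handler table, `TensorOps` methods, the `"rmsNormLinear"` fused op) is a hypothetical stand-in for the real project API, which is not shown in this page.

```kotlin
// Hypothetical stand-in for the real TensorOps primitive library.
object TensorOps {
    // RMS-normalize a vector: x / sqrt(mean(x^2) + eps).
    fun rmsNorm(x: FloatArray, eps: Float = 1e-6f): FloatArray {
        val rms = kotlin.math.sqrt(x.map { it * it }.average().toFloat() + eps)
        return FloatArray(x.size) { x[it] / rms }
    }

    // Dense matrix-vector product: one output element per weight row.
    fun matmulVec(w: Array<FloatArray>, x: FloatArray): FloatArray =
        FloatArray(w.size) { r ->
            w[r].indices.sumOf { c -> (w[r][c] * x[c]).toDouble() }.toFloat()
        }
}

// Hypothetical handler registry, standing in for ComputeGraphExecutor.
object ComputeGraphExecutor {
    val handlers = mutableMapOf<String, (FloatArray, Array<FloatArray>) -> FloatArray>()
}

object LLMFusedOpHandlers {
    // CPU fallback: decompose the fused "rmsNormLinear" op back into
    // two unfused TensorOps calls. Correct anywhere, but no fusion speedup.
    fun registerAll() {
        ComputeGraphExecutor.handlers["rmsNormLinear"] = { x, w ->
            TensorOps.matmulVec(w, TensorOps.rmsNorm(x))
        }
    }
}

fun main() {
    LLMFusedOpHandlers.registerAll()
    val x = floatArrayOf(1f, 2f, 3f)
    val w = arrayOf(floatArrayOf(1f, 0f, 0f))
    println(ComputeGraphExecutor.handlers["rmsNormLinear"]!!(x, w).toList())
}
```

A platform backend would overwrite the same registry entry with a handler that dispatches a single fused kernel instead.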
Functions
registerAll
Register all CPU fallback handlers with ComputeGraphExecutor. Call once at application startup.