NATIVE_OPTIMIZED

Mixed mode: load unquantized tensors (F32, F16, BF16) as FP32, widening F16 and BF16 values, but keep quantized weight tensors (Q4_0, Q8_0, etc.) as raw bytes for native kernels to consume directly.

This allows loading with dtype=FP32 while the quantized weights stay in their original encoding, so platform-specific optimized kernels (e.g. MemorySegment-backed SIMD) can read them without a dequantization pass.
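
The per-tensor decision described above can be sketched roughly as follows. This is a minimal illustration and not the loader's actual code: the class `NativeOptimizedSketch`, the enums `TensorType` and `LoadAction`, and the methods `decide` and `bf16ToFloat` are all hypothetical names introduced here. It assumes only that F32/F16/BF16 tensors are converted to FP32 and that every other (quantized) type is left as raw bytes.

```java
// Hypothetical sketch of the mixed-mode decision; names are illustrative only.
class NativeOptimizedSketch {

    // Assumed subset of tensor types, taken from the types named in the description.
    enum TensorType { F32, F16, BF16, Q4_0, Q8_0 }

    // Assumed outcome of the per-tensor decision.
    enum LoadAction { CONVERT_TO_FP32, KEEP_RAW_BYTES }

    // Unquantized float types are converted to FP32; quantized types stay raw.
    static LoadAction decide(TensorType type) {
        if (type == TensorType.F32 || type == TensorType.F16 || type == TensorType.BF16) {
            return LoadAction.CONVERT_TO_FP32;
        }
        return LoadAction.KEEP_RAW_BYTES;
    }

    // BF16 holds the upper 16 bits of an IEEE-754 binary32 value,
    // so widening is a 16-bit left shift of the raw bit pattern.
    static float bf16ToFloat(short bits) {
        return Float.intBitsToFloat((bits & 0xFFFF) << 16);
    }

    public static void main(String[] args) {
        for (TensorType type : TensorType.values()) {
            System.out.println(type + " -> " + decide(type));
        }
        System.out.println(bf16ToFloat((short) 0x3F80));
    }
}
```

In a real loader, the `KEEP_RAW_BYTES` branch would presumably hand the tensor's backing memory to the native kernel unchanged, while the `CONVERT_TO_FP32` branch would write widened values into a float buffer.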