MetaXuda is an experimental CUDA-compatible runtime shim for Apple Silicon, written in Rust, that allows Numba CUDA kernels to run unmodified by transparently mapping CUDA runtime calls to Apple Metal.
It is designed as a drop-in replacement for core CUDA runtime libraries, enabling GPU-accelerated Python workflows on macOS without requiring the NVIDIA CUDA Toolkit or NVIDIA hardware.
✨ Features
- Drop-in replacement for libcudart.dylib and libcuda.dylib
- Run Numba CUDA kernels (@cuda.jit) directly on Apple Metal
- Metal-backed implementations of core CUDA APIs (see the ctypes sketch after this list):
  - cudaMalloc / cudaFree
  - cudaMemcpy / cudaMemcpyAsync
  - cudaLaunchKernel
- Asynchronous execution with stream-style overlap (copy / compute / copy)
- Precompiled Metal .metallib shaders for fused math operations
- cuda_pipeline.so, exposing a low-level execution API that lets Numba and other callers bypass the CUDA runtime shim and dispatch operations directly
- No CUDA Toolkit, NVIDIA drivers, or NVIDIA GPU required
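Because the shim exports the standard CUDA runtime entry points, the core APIs can be exercised directly from Python. A minimal sketch via ctypes, assuming the in-package library path shown under Package Layout and the standard CUDA runtime signatures (the kind value 1 is cudaMemcpyHostToDevice):

import os
import ctypes
import metaxuda

# Load the shim's CUDA runtime (path assumes the default package layout)
libdir = os.path.join(os.path.dirname(metaxuda.__file__), "native")
cudart = ctypes.CDLL(os.path.join(libdir, "libcudart.dylib"))

nbytes = 1024
dev_ptr = ctypes.c_void_p()
# cudaError_t cudaMalloc(void** devPtr, size_t size); 0 == cudaSuccess
assert cudart.cudaMalloc(ctypes.byref(dev_ptr), ctypes.c_size_t(nbytes)) == 0

host = (ctypes.c_byte * nbytes)()
# cudaError_t cudaMemcpy(dst, src, count, kind); kind 1 == host-to-device
assert cudart.cudaMemcpy(dev_ptr, host, ctypes.c_size_t(nbytes), 1) == 0

assert cudart.cudaFree(dev_ptr) == 0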
⚠️ Project Status
Alpha / Research Prototype
MetaXuda is under active development and currently targets:
- Numba CUDA kernels
- Single-GPU execution on Apple Silicon
Not all CUDA APIs are implemented, and behavior may differ from NVIDIA CUDA in edge cases.
⚙️ Installation
Requirements
- macOS 13+
- Python >= 3.10
- NumPy >= 1.23
- Numba >= 0.59
Install (Editable / Dev)
# Clone the repository
git clone https://github.com/perinban/MetaXuda.git
cd MetaXuda
# Install in editable mode
pip install -e .
The installation places the required shim libraries (libcudart.dylib, libcuda.dylib, and libdevice.bc) inside the package so they can be discovered by Numba at runtime.
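Discovery is handled automatically by env.py and patch.py, but when debugging it can help to point Numba at the shim explicitly. A sketch using Numba's standard NUMBA_CUDA_DRIVER override; the shim path is an assumption derived from the package layout below:

import os
import metaxuda

# NUMBA_CUDA_DRIVER is Numba's documented override for the driver library
# path; the path below assumes the default in-package location of the shim.
shim = os.path.join(os.path.dirname(metaxuda.__file__), "native", "libcuda.dylib")
os.environ["NUMBA_CUDA_DRIVER"] = shim

from numba import cuda
print(cuda.is_available())  # should report True once the shim is picked up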
📁 Package Layout
MetaXuda ships demos and helper modules inside the Python package so they are available in editable and installed modes:
metaxuda/
├── buffers/       # GPU, managed, and tiered buffer abstractions
├── execution/     # Direct and pooled execution backends
├── streams/       # Stream and async execution helpers (Numba-compatible)
├── demos/         # End-to-end demos and debug examples
├── native/        # Native shims and pipelines
│   ├── libcudart.dylib
│   ├── libcuda.dylib
│   ├── libnvvm.dylib
│   ├── libdevice.bc
│   └── cuda_pipeline.so
├── env.py         # Environment detection and setup
├── patch.py       # Numba / runtime patching hooks
└── __init__.py
The demos/ directory contains runnable examples covering kernel execution, buffers, streams, disk tiering, and the direct math pipeline; they can be run directly once the package is installed.

🚀 Usage

With MetaXuda installed, existing Numba CUDA code should run without modification:
from numba import cuda
import numpy as np

@cuda.jit
def add(a, b, out):
    i = cuda.grid(1)
    if i < out.size:
        out[i] = a[i] + b[i]

n = 1024
a = np.arange(n, dtype=np.float32)
b = np.arange(n, dtype=np.float32)
out = np.zeros_like(a)

add[32, 32](a, b, out)  # 32 blocks x 32 threads = 1024 threads
print(out[:5])
Execution is transparently dispatched to Metal via the MetaXuda runtime.
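The stream-style overlap listed under Features can be driven through Numba's standard stream API, which the streams/ helpers are described as compatible with. A sketch of the copy / compute / copy pattern, assuming the shim honors Numba's stream semantics:

from numba import cuda
import numpy as np

@cuda.jit
def scale(x, out):
    i = cuda.grid(1)
    if i < out.size:
        out[i] = x[i] * 2.0

n = 1 << 20
a = np.arange(n, dtype=np.float32)
out = np.zeros_like(a)

stream = cuda.stream()
d_a = cuda.to_device(a, stream=stream)                # async host-to-device copy
d_out = cuda.device_array_like(out, stream=stream)
scale[(n + 255) // 256, 256, stream](d_a, d_out)      # compute on the same stream
d_out.copy_to_host(out, stream=stream)                # async device-to-host copy
stream.synchronize()                                  # wait for copy / compute / copy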
🗜️ Quantization, Compression, and Disk Tiering
MetaXuda supports quantized and compressed data storage for non-resident buffers and intermediate results. These behaviors are controlled via environment variables and handled by the runtime initialization logic in env.py.
This is primarily used for Tier-3 (disk-backed) storage, allowing large workloads to exceed GPU memory limits while minimizing I/O and storage overhead.
Environment Configuration
The shim reads the following environment variables at startup:
- MX_ENABLE_DATASTORE_COMPRESSION (default: 1)
  Enable or disable compression for spilled data blocks.
- MX_DATASTORE_COMPRESSION_TYPE (default: lz4)
  Compression algorithm to use (e.g. lz4).
- MX_DATASTORE_COMPRESSION_LEVEL (default: 3)
  Compression level passed to the backend compressor.
- MX_DISK_PARALLELISM_LEVEL (default: auto)
  Controls parallel read/write behavior for disk operations.
- MX_DISK_SPILL_ENABLED (default: 0)
  Enable spilling GPU buffers to disk when memory pressure occurs.
- MX_TIER3_STRATEGY (default: prefer_external)
  Strategy for selecting Tier-3 storage locations.
- MX_TIER3_INTERNAL_PATH (default: block_store)
  Directory used for internal Tier-3 storage.
- MX_TIER3_EXTERNAL_DEVICES (format: id:path,id:path)
  Comma-separated list of external devices or paths for Tier-3 storage.
- MX_DEBUG (options: memory)
  Enable debug logging for specific subsystems.
These settings allow fine-grained control over compression, quantization, disk spill behavior, and debugging without changing application code.
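Since env.py reads these variables at startup, they should be set before the runtime initializes. A sketch enabling disk spill with lz4 compression; the external device value is a hypothetical example of the documented id:path format:

import os

# Configure Tier-3 tiering before the runtime reads its environment.
os.environ["MX_DISK_SPILL_ENABLED"] = "1"
os.environ["MX_ENABLE_DATASTORE_COMPRESSION"] = "1"
os.environ["MX_DATASTORE_COMPRESSION_TYPE"] = "lz4"
os.environ["MX_DATASTORE_COMPRESSION_LEVEL"] = "3"
os.environ["MX_TIER3_STRATEGY"] = "prefer_external"
os.environ["MX_TIER3_EXTERNAL_DEVICES"] = "ssd0:/Volumes/Scratch/mx_tier3"  # hypothetical device/path
os.environ["MX_DEBUG"] = "memory"

import metaxuda  # env.py picks up the configuration at startup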
🧮 Operation Coverage
MetaXuda includes a precompiled Metal math pipeline (cuda_pipeline.so) implementing a broad set of scalar and elementwise operations that can be invoked directly by Numba or higher-level tooling.
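The exported symbols of cuda_pipeline.so are not listed here, so the following is purely illustrative: mx_elementwise_add is a hypothetical export name standing in for whatever the real low-level API exposes, and the load path assumes the default package layout.

import os
import ctypes
import numpy as np
import metaxuda

# Hypothetical sketch: load the math pipeline and dispatch an elementwise op
# directly, bypassing the CUDA runtime shim. The export name is a placeholder.
libdir = os.path.join(os.path.dirname(metaxuda.__file__), "native")
pipeline = ctypes.CDLL(os.path.join(libdir, "cuda_pipeline.so"))

n = 1024
a = np.arange(n, dtype=np.float32)
b = np.ones(n, dtype=np.float32)
out = np.empty(n, dtype=np.float32)

pipeline.mx_elementwise_add(                # hypothetical export name
    a.ctypes.data_as(ctypes.c_void_p),
    b.ctypes.data_as(ctypes.c_void_p),
    out.ctypes.data_as(ctypes.c_void_p),
    ctypes.c_size_t(n),
)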
MetaXuda is not affiliated with NVIDIA. CUDA is a trademark of NVIDIA Corporation. This project is an independent compatibility layer intended for research and development purposes.