# Changelog

## v0.1.0 (2026-03-26)
First release.
### Features
- Direct model forward -- no graph execution overhead, single shared forward loop
- Paged KV cache -- block pool with reference counting, LRU eviction, prefix caching
- Continuous batching -- decode-first scheduling with chunked prefill
- OpenAI-compatible HTTP API with SSE streaming and Web UI
- Docker deployment -- multi-stage build for closed-source distribution
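The paged KV cache described above can be sketched as a block pool with per-block reference counts and LRU eviction of freed blocks. This is a minimal illustration of the technique, not the project's actual implementation; all names (`BlockPool`, `share`, `release`) are hypothetical:

```python
from collections import OrderedDict

class BlockPool:
    """Toy paged-KV block pool: refcounted blocks, LRU eviction of cached ones."""

    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))  # never-used block ids
        self.refcount = {}                   # block id -> live reference count
        self.lru = OrderedDict()             # refcount-0 blocks kept for prefix reuse

    def allocate(self):
        if self.free:
            bid = self.free.pop()
        elif self.lru:
            bid, _ = self.lru.popitem(last=False)  # evict least recently used
        else:
            raise RuntimeError("out of KV blocks")
        self.refcount[bid] = 1
        return bid

    def share(self, bid):
        """Prefix cache hit: another sequence reuses an existing block."""
        self.lru.pop(bid, None)              # block is live again, not evictable
        self.refcount[bid] = self.refcount.get(bid, 0) + 1

    def release(self, bid):
        self.refcount[bid] -= 1
        if self.refcount[bid] == 0:          # keep contents cached, but evictable
            self.lru[bid] = True

pool = BlockPool(2)
a = pool.allocate()
pool.share(a)       # a second sequence hits the same prefix
pool.release(a)
pool.release(a)     # refcount hits 0 -> block parked in the LRU cache
b = pool.allocate() # served from the free list
c = pool.allocate() # free list empty -> evicts the cached block
```

The key property is that a refcount of zero does not immediately destroy a block's contents: the block stays cached for prefix reuse and is only reclaimed under memory pressure, in LRU order.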
### Supported Models
- Qwen2: DeepSeek-R1-Distill-Qwen-1.5B, Qwen2.5-Math-1.5B-Instruct
- Qwen3: DeepSeek-R1-0528-Qwen3-8B, Qwen3-8B
### Operator Backends
| Operator | CPU | GPU |
|---|---|---|
| Linear | oneDNN (runtime ISA dispatch) | cuBLAS / cuBLASLt |
| Attention | Paged GQA (decode/prefill/batched) | Custom paged kernels |
| RMSNorm, RoPE, SwiGLU, Add | AVX vectorized | CUDA vectorized |
### Performance (B200, DeepSeek-R1-Distill-Qwen-1.5B, BF16)
| Batch Size | Throughput |
|---|---|
| 1 | ~129 tok/s |
| 4 | ~672 tok/s |
| 32 | ~1,820 tok/s |
| 128 | ~2,170 tok/s |
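A quick read of the table above: aggregate decode throughput scales roughly 17x from batch 1 to batch 128, while per-request decode speed drops as the batch grows. The arithmetic, using the (approximate) figures from the table:

```python
# Aggregate decode throughput by batch size, from the table above (tok/s).
throughput = {1: 129, 4: 672, 32: 1820, 128: 2170}

for batch, total in throughput.items():
    per_req = total / batch
    print(f"batch {batch:>3}: {total:>5} tok/s aggregate, {per_req:6.1f} tok/s per request")

scaling = throughput[128] / throughput[1]
print(f"aggregate scaling, batch 1 -> 128: {scaling:.1f}x")
```

At batch 128 each request still decodes at roughly 17 tok/s, so batching trades per-request latency for much higher total throughput.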