Fast, lossless LLM inference via dual-view diffusion decoding.
- chiennv/Orthrus-Qwen3-4B
  Text Generation • 5B • Updated • 300 • 7
- chiennv/Orthrus-Qwen3-8B
  Text Generation • 10B • Updated • 1.72k • 16
- chiennv/Orthrus-Qwen3-1.7B
  Text Generation • 2B • Updated • 543 • 7
- Orthrus: Memory-Efficient Parallel Token Generation via Dual-View Diffusion
  Paper • 2605.12825 • Published • 11