DeepSeek-R1T-Chimera

Model merge of DeepSeek-R1 and DeepSeek-V3 (0324)

An open-weights model combining the intelligence of R1 with the token efficiency of V3.

For details on the construction process and analyses of Chimera model variants, please read our paper.

Paper on arXiv | Announcement on X | LinkedIn post | Try it on OpenRouter

Update: we have released R1T2-Chimera, which is both faster and smarter than R1.

Model Details

Architecture: DeepSeek-MoE Transformer-based language model
Combination Method: Merged model weights from DeepSeek-R1 and DeepSeek-V3 (0324)
Release Date: 2025-04-27
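
To give a rough sense of what merging model weights can mean, here is a minimal sketch of linear interpolation between two checkpoints that share tensor names and shapes. It is an illustration only, not the construction actually used for Chimera (the paper describes that). The function `merge_state_dicts`, the `weight_a` parameter, and the use of plain float lists as stand-ins for tensors are all assumptions made for this example.

```python
from typing import Dict, List

# Stand-in for a real weight tensor (assumption: flat list of floats).
Tensor = List[float]


def merge_state_dicts(
    a: Dict[str, Tensor],
    b: Dict[str, Tensor],
    weight_a: float = 0.5,
) -> Dict[str, Tensor]:
    """Linearly interpolate two checkpoints with identical tensor names and sizes.

    Hypothetical helper for illustration; not the method from the paper.
    """
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must be in [0, 1]")
    if a.keys() != b.keys():
        raise ValueError("checkpoints must share the same tensor names")

    merged: Dict[str, Tensor] = {}
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"size mismatch for tensor {name!r}")
        # Element-wise convex combination of the two parent tensors.
        merged[name] = [
            weight_a * x + (1.0 - weight_a) * y
            for x, y in zip(a[name], b[name])
        ]
    return merged
```

With `weight_a=1.0` the result equals the first parent, with `weight_a=0.0` the second; values in between blend the two. A real merge would operate on framework tensors and could treat different tensor groups differently.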

Use, Out-of-scope Use, Limitations, Risks, and Recommendations

Regarding R1T Chimera, we ask you to follow the careful guidelines that Microsoft has created for their "MAI-DS-R1" DeepSeek-based model.

These guidelines are available here on Hugging Face.

Contact

Email: [email protected]
X.com: @tngtech

Citation

@misc{tng_technology_consulting_gmbh_2025,
    author       = { TNG Technology Consulting GmbH },
    title        = { DeepSeek-R1T-Chimera },
    year         = 2025,
    month        = {April},
    url          = { https://huggingface.co/tngtech/DeepSeek-R1T-Chimera },
    doi          = { 10.57967/hf/5330 },
    publisher    = { Hugging Face }
}


Safetensors

Model size: 685B params
Tensor types: F32, BF16, F8_E4M3

Model tree for tngtech/DeepSeek-R1T-Chimera

Base model: deepseek-ai/DeepSeek-R1


Paper for tngtech/DeepSeek-R1T-Chimera

Assembly of Experts: Linear-time construction of the Chimera LLM variants with emergent and adaptable behaviors

Paper • 2506.14794 • Published May 31, 2025