CohenQu/Joint-Train-deepscalar_RL_hard_500_verl_0.35_0.001_0.001_32_32_20k_4_0713 2B • Updated Jul 14, 2025 • 5
CohenQu/Joint-Train-deepscalar_RL_hard_500_verl_0.35_0.001_0.001_32_32_20k_4_0710 2B • Updated Jul 12, 2025 • 2
CohenQu/Joint-Train-deepscalar_RL_hard_500_verl_0.35_0.001_0.001_32_32_20k_4_new 2B • Updated Jun 28, 2025 • 2
CohenQu/Joint-Train-deepscalar_RL_hard_500_verl_0.35_0.001_0.001_32_32_20k_4 2B • Updated Jun 27, 2025 • 3
CohenQu/Joint-Train-deepscalar_RL_easy_500_verl_0.35_0.001_0.001_16_16k 2B • Updated Jun 13, 2025 • 3
CohenQu/DeepSeek-R1-Distill-Qwen-1.5B_with_info_gain_Iteration2 Text Generation • 2B • Updated May 26, 2025 • 3
CohenQu/Qwen3-4B-SFT_HintGen-STaR.04.00_1e-6_no_think Text Generation • 4B • Updated May 21, 2025 • 4
CohenQu/Qwen3-4B-Base_HintGen-STaR.04.00_1e-6_no_think Text Generation • 4B • Updated May 21, 2025 • 6