Emergent Social Intelligence Risks in Generative Multi-Agent Systems Paper • 2603.27771 • Published Mar 29 • 52
Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning Paper • 2512.15687 • Published Dec 17, 2025 • 22
yujunzhou/SFT_Advanced_Risk_Self_Grading_Qwen3-4B-Base Text Generation • 4B • Updated Dec 17, 2025 • 4
yujunzhou/SFT_Advanced_Risk_Reward_Tampering_Qwen3-4B Text Generation • 4B • Updated Dec 17, 2025 • 3
yujunzhou/SFT_Advanced_Risk_Reward_Tampering_Qwen3-4B-Base Text Generation • 4B • Updated Dec 16, 2025 • 3
yujunzhou/SFT_Advanced_Risk_Situation_Aware_Qwen3-4B-Base Text Generation • 4B • Updated Dec 16, 2025 • 1