SkeptiSTEM-4B-v2 Stage R2 (Format LoRA)
This is the Stage R2 format-priming LoRA adapter for SkeptiSTEM-4B-v2.
Purpose
This adapter teaches the model to emit its structured reasoning in the following format:
<start_working_out>
... working out the problem step by step ...
<end_working_out>
<SOLUTION>
final answer
</SOLUTION>
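For downstream evaluation it is convenient to pull the final answer out of this wrapper. The snippet below is a minimal parsing sketch; the regex and the helper name are illustrative, not part of the training code.
import re

def extract_solution(text: str) -> str | None:
    # Return whatever sits between the <SOLUTION> tags, or None if the tags are missing.
    match = re.search(r"<SOLUTION>(.*?)</SOLUTION>", text, re.DOTALL)
    return match.group(1).strip() if match else None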
Training Details
- Base model: HallD/SkeptiSTEM-4B-v2-stageR1-merged-16bit
- Dataset: OpenMathReasoning-mini (CoT subset)
- Examples: ~2,403
- Epochs: 1 (format priming only)
- LoRA rank: 64
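For reference, a rank-64 adapter of this kind is typically attached with Unsloth as sketched below. The alpha value and target modules are assumptions, since they are not listed on this card.
from unsloth import FastLanguageModel

# Attach a rank-64 LoRA to the Stage R1 base (alpha and target modules assumed).
model = FastLanguageModel.get_peft_model(
    model,
    r=64,
    lora_alpha=64,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)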
Usage
from unsloth import FastLanguageModel
from peft import PeftModel
# Load base
base, tokenizer = FastLanguageModel.from_pretrained(
"HallD/SkeptiSTEM-4B-v2-stageR1-merged-16bit",
max_seq_length=4096,
load_in_4bit=True,
)
# Apply R2 format adapter
model = PeftModel.from_pretrained(base, "HallD/SkeptiSTEM-4B-v2-stageR2-format-lora")
# Optionally merge the adapter into the base weights
model = model.merge_and_unload()
FastLanguageModel.for_inference(model)
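A minimal generation example is sketched below. The prompt preamble is an assumption (the exact template used during training is not documented here); the key point is that the model is expected to answer inside the tags shown above.
prompt = (
    "Solve the following problem. Show your reasoning between "
    "<start_working_out> and <end_working_out>, then give the final answer "
    "between <SOLUTION> and </SOLUTION>.\n\n"
    "What is 17 * 23?"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))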
Next Stage
This adapter is used as a foundation for:
- Stage R3: GRPO training with DOUBT framework
- Stage CD: Chat restoration + DPO
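One reason to prime the format before GRPO is that the tags make a simple format reward easy to write. The function below is a hypothetical illustration of that idea, not the actual DOUBT reward used in Stage R3.
import re

def format_reward(completion: str) -> float:
    # Hypothetical: reward 1.0 if the completion follows the R2 output format, else 0.0.
    pattern = (r"<start_working_out>.*?<end_working_out>\s*"
               r"<SOLUTION>.*?</SOLUTION>")
    return 1.0 if re.search(pattern, completion, re.DOTALL) else 0.0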
Trained with Unsloth.