SkeptiSTEM-4B-v2 Stage R2 (Format LoRA)

This is the Stage R2 format-priming LoRA adapter for SkeptiSTEM-4B-v2.

Purpose

This adapter teaches the model to output structured reasoning in the following format:

<start_working_out>
... working out the problem step by step ...
<end_working_out>

<SOLUTION>
final answer
</SOLUTION>

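Because the tags are plain text, downstream code can recover the reasoning trace and final answer with a small parser. Below is a minimal sketch; the parse_response helper is illustrative, not part of the released code, and assumes the tags appear exactly as shown above:

import re

def parse_response(text):
    # Split a generation into its reasoning trace and final answer.
    # Returns None for either field if its tags are missing.
    working = re.search(r"<start_working_out>(.*?)<end_working_out>", text, re.DOTALL)
    solution = re.search(r"<SOLUTION>(.*?)</SOLUTION>", text, re.DOTALL)
    return {
        "working_out": working.group(1).strip() if working else None,
        "solution": solution.group(1).strip() if solution else None,
    }
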
Training Details

  • Base model: HallD/SkeptiSTEM-4B-v2-stageR1-merged-16bit (ultimately derived from Qwen/Qwen3-4B-Base)
  • Dataset: OpenMathReasoning-mini (CoT subset)
  • Examples: ~2,403
  • Epochs: 1 (format priming only)
  • LoRA rank: 64

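For reference, here is a minimal sketch of how such an adapter can be attached with Unsloth. Only the rank (r=64) and the base model come from the details above; target_modules, lora_alpha, and dropout are assumptions based on common Unsloth defaults:

from unsloth import FastLanguageModel

# Load the Stage R1 merged base model
model, tokenizer = FastLanguageModel.from_pretrained(
    "HallD/SkeptiSTEM-4B-v2-stageR1-merged-16bit",
    max_seq_length=4096,
    load_in_4bit=True,
)

# Attach a rank-64 LoRA adapter; everything except r=64 is assumed
model = FastLanguageModel.get_peft_model(
    model,
    r=64,                        # LoRA rank from the training details
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # assumed
    lora_alpha=64,               # assumed; not stated in the card
    lora_dropout=0.0,            # assumed
    use_gradient_checkpointing="unsloth",
)
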
Usage

from unsloth import FastLanguageModel
from peft import PeftModel

# Load the Stage R1 merged base model
base, tokenizer = FastLanguageModel.from_pretrained(
    "HallD/SkeptiSTEM-4B-v2-stageR1-merged-16bit",
    max_seq_length=4096,
    load_in_4bit=True,
)

# Apply R2 format adapter
model = PeftModel.from_pretrained(base, "HallD/SkeptiSTEM-4B-v2-stageR2-format-lora")

# Optionally merge the adapter into the base weights for standalone use
model = model.merge_and_unload()

FastLanguageModel.for_inference(model)
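
A minimal generation example follows. The bare prompt is illustrative only; check the tokenizer's chat template for the exact prompt format the model expects:

# Illustrative generation call (prompt format is an assumption)
prompt = "Solve: what is 17 * 24?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))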

Next Stage

This adapter serves as the foundation for the later training stages:

  • Stage R3: GRPO training with DOUBT framework
  • Stage CD: Chat restoration + DPO

Trained with Unsloth.
