SkeptiSTEM-4B-v2 Stage R2 (Format LoRA)

This is the Stage R2 format-priming LoRA adapter for SkeptiSTEM-4B-v2.

Purpose

This adapter teaches the model to output structured reasoning in the following format:

<start_working_out>
... working out the problem step by step ...
<end_working_out>

<SOLUTION>
final answer
</SOLUTION>

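Because the tags are plain text, downstream code can recover the reasoning trace and final answer with a small parser. Below is a minimal sketch; the parse_response helper is illustrative, not part of the released code, and assumes the tags appear exactly as shown above:

import re

def parse_response(text):
    # Split a generation into its reasoning trace and final answer.
    # Returns None for either field if its tags are missing.
    working = re.search(r"<start_working_out>(.*?)<end_working_out>", text, re.DOTALL)
    solution = re.search(r"<SOLUTION>(.*?)</SOLUTION>", text, re.DOTALL)
    return {
        "working_out": working.group(1).strip() if working else None,
        "solution": solution.group(1).strip() if solution else None,
    }
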
Training Details

  • Base model: HallD/SkeptiSTEM-4B-v2-stageR1-merged-16bit (ultimately derived from Qwen/Qwen3-4B-Base)
  • Dataset: OpenMathReasoning-mini (CoT subset)
  • Examples: ~2,403
  • Epochs: 1 (format priming only)
  • LoRA rank: 64

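For reference, here is a minimal sketch of how such an adapter can be attached with Unsloth. Only the rank (r=64) and the base model come from the details above; target_modules, lora_alpha, and dropout are assumptions based on common Unsloth defaults:

from unsloth import FastLanguageModel

# Load the Stage R1 merged base model
model, tokenizer = FastLanguageModel.from_pretrained(
    "HallD/SkeptiSTEM-4B-v2-stageR1-merged-16bit",
    max_seq_length=4096,
    load_in_4bit=True,
)

# Attach a rank-64 LoRA adapter; everything except r=64 is assumed
model = FastLanguageModel.get_peft_model(
    model,
    r=64,                        # LoRA rank from the training details
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # assumed
    lora_alpha=64,               # assumed; not stated in the card
    lora_dropout=0.0,            # assumed
    use_gradient_checkpointing="unsloth",
)
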
Usage

from unsloth import FastLanguageModel
from peft import PeftModel

# Load the Stage R1 merged base model
base, tokenizer = FastLanguageModel.from_pretrained(
    "HallD/SkeptiSTEM-4B-v2-stageR1-merged-16bit",
    max_seq_length=4096,
    load_in_4bit=True,
)

# Apply R2 format adapter
model = PeftModel.from_pretrained(base, "HallD/SkeptiSTEM-4B-v2-stageR2-format-lora")

# Optionally merge the adapter into the base weights for standalone use
model = model.merge_and_unload()

FastLanguageModel.for_inference(model)
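
A minimal generation example follows. The bare prompt is illustrative only; check the tokenizer's chat template for the exact prompt format the model expects:

# Illustrative generation call (prompt format is an assumption)
prompt = "Solve: what is 17 * 24?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))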

Next Stage

This adapter serves as the foundation for the later training stages:

  • Stage R3: GRPO training with DOUBT framework
  • Stage CD: Chat restoration + DPO

Trained with Unsloth.
