| Topic | Replies | Views | Date |
|---|---|---|---|
| Wave Field LLM — O(n log n) attention via wave equation dynamics, within 5% of standard transformer | 0 | 10 | February 18, 2026 |
| LLaVA Steering: Why does grounding fix hallucinations in captioning but not in Yes/No QA? | 0 | 5 | February 18, 2026 |
| Issue: Hidden State Steering Improves Generative Grounding (CHAIR) but Fails on Yes/No Probing (POPE) | 0 | 3 | February 18, 2026 |
| Gemma 3 12B: 4-bit Quantization failing/ignored in Transformers v5.1.0 (Gemma3ForConditionalGeneration) | 7 | 28 | February 17, 2026 |
| KV caching problem with Gemma 3 | 2 | 18 | February 17, 2026 |
| num_beam_groups removed in v5? | 1 | 10 | February 14, 2026 |
| [LLaVA-1.5] Implementing Control Barrier Functions (LCBF) via Attention Hooking – Persistent AttributeError: 'LlamaAttention' object has no attribute 'rotary_emb' | 4 | 9 | February 13, 2026 |
| Error while importing "Trainer" | 1 | 25 | February 13, 2026 |
| [LLaVA-1.5] Very low hallucination rate & weak attention correlation in "Attention Gap" experiment – Is my implementation of output_attentions correct? | 4 | 19 | February 12, 2026 |
| Confusion with freezing Whisper's feature encoder | 3 | 13 | February 11, 2026 |
| When using Whisper, pipeline notifies that generation_config default values have been modified, even for base models | 4 | 34 | February 8, 2026 |
| Hyperparameters vs. message-format prompt tuning | 2 | 26 | February 6, 2026 |
| SFT Conversation llama3-8b-Instruct fails with assistant_only_loss=True | 2 | 54 | February 5, 2026 |
| How to train T5 to distinguish task-relevant tokens from contextual noise? | 1 | 19 | February 5, 2026 |
| Fine-tuning Whisper: attention mask not set and cannot be inferred | 5 | 6180 | February 4, 2026 |
| Abnormal generation after multi-GPU | 4 | 37 | February 4, 2026 |
| 500 Internal Error – We're working hard to fix this as soon as possible | 46 | 3160 | February 1, 2026 |
| Caching image prototype embeddings for image-guided object detection using OWL-ViT | 3 | 494 | January 31, 2026 |
| [Question] How to specify 'model_type' of 'Qwen/Qwen3-VL-8B-Instruct-GGUF'? | 4 | 47 | January 30, 2026 |
| SAM3Video: CLIPTextModelOutput passed as tensor causes crash with text prompts | 0 | 40 | January 29, 2026 |
| Different lm_head size and vocab_size | 1 | 918 | January 28, 2026 |
| Custom KV Cache Steering Implementation Fails with IndexError in LLaVA Generation | 1 | 17 | January 28, 2026 |
| Transformers v5 timelines | 1 | 39 | January 28, 2026 |
| Issue: Discrepancy Between Layer-Wise Density Plots vs. Mean Trajectory Plots in LLaVA-1.5 Attention Analysis | 2 | 18 | January 25, 2026 |
| [Discussion] Validating Attention Map Visualization for Visual Fading in LLaVA-1.5 | 4 | 45 | January 23, 2026 |
| No fix for high vulnerabilities in latest transformers package | 2 | 36 | January 22, 2026 |
| How to disable caching in .from_pretrained() | 4 | 1272 | January 18, 2026 |
| DetLLM – Deterministic Inference Checks | 0 | 26 | January 17, 2026 |
| Distributed LLaMA Inference Engine Built from Scratch (KV Cache, GQA, RoPE) | 0 | 29 | January 16, 2026 |
| Run name issue: different run name file in web page vs. local | 1 | 91 | January 16, 2026 |