Tau-Bench looping
Thank you for this model; it was a great surprise.
I also wanted to point out that even in shb777's repo, the config.json (see here) is still missing two eos_token_id entries that are declared in generation_config.json.
It should be:
"eos_token_id": [
128001,
128008,
128009
],
At least, that's what it is for Llama 3.1 8B, but who knows. I do think it makes sense, since those ids are also declared in generation_config.json.
Anyway, I thought maybe that's what's causing the looping?
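If it's useful, here's a minimal sketch for checking whether a config.json is missing eos ids that generation_config.json declares. The file paths assume a local checkout of the repo; adjust as needed:

```python
import json

# Assumed local paths to a checkout of the model repo.
CONFIG_PATH = "config.json"
GEN_CONFIG_PATH = "generation_config.json"

def load_eos_ids(path: str) -> set:
    """Read eos_token_id from a config file; it can be an int or a list."""
    with open(path) as f:
        eos = json.load(f).get("eos_token_id")
    return set(eos) if isinstance(eos, list) else {eos}

missing = load_eos_ids(GEN_CONFIG_PATH) - load_eos_ids(CONFIG_PATH)
print("missing from config.json:", sorted(missing) if missing else "none")
```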
Here is the entire 3.1 8B config in case you need it:
{
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": [
    128001,
    128008,
    128009
  ],
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 131072,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": {
    "factor": 8.0,
    "low_freq_factor": 1.0,
    "high_freq_factor": 4.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "llama3"
  },
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.42.3",
  "use_cache": true,
  "vocab_size": 128256
}
Nah, I don't think that's the issue with Tau-Bench. What was happening (I need to upload these logs later) is that it tried to individually check the details of every available flight, got confused, and kept repeating the same checks.
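For anyone who wants to spot this failure mode in their own runs, a rough sketch of flagging repeated tool calls in a trace is below. The trace shape (a list of message dicts carrying a `tool_call` with `name` and `arguments`) is an illustrative assumption, not Tau-Bench's actual log schema:

```python
import json
from collections import Counter

def find_repeated_calls(trace, threshold=3):
    """Flag tool calls repeated with identical arguments.

    Assumes each entry looks like
    {"role": "assistant", "tool_call": {"name": ..., "arguments": {...}}};
    adapt the accessors to whatever your logs actually contain.
    """
    counts = Counter()
    for msg in trace:
        call = msg.get("tool_call")
        if call:
            # Key on tool name plus canonicalized arguments.
            counts[(call["name"], json.dumps(call["arguments"], sort_keys=True))] += 1
    return [(name, args, n) for (name, args), n in counts.items() if n >= threshold]
```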
Got it, that makes sense. Thanks!
Also, for some reason the benchmark was scoring a bunch of clearly incorrect traces as correct, where the model just redirected to a human agent instantly. That's the main reason it was removed; I'm not confident the scores are coherent.
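A cheap sanity filter for that failure mode might look like the sketch below. The tool name `transfer_to_human_agents` is a guess based on Tau-Bench's domains having a human-handoff tool, and the trace shape is the same assumed one as above; check your logs for the exact strings:

```python
def redirected_immediately(trace, transfer_tool="transfer_to_human_agents"):
    """True if the agent's first tool call is a handoff to a human.

    Same assumed message-dict trace shape as the sketch above;
    the tool name is an assumption, not a confirmed Tau-Bench identifier.
    """
    for msg in trace:
        call = msg.get("tool_call")
        if call:
            return call["name"] == transfer_tool
    return False

# e.g. drop suspicious "passes" before trusting the aggregate score:
# clean = [t for t in traces if not redirected_immediately(t)]
```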