Tau-Bench looping
Thank you for this model; it was a great surprise.
I also wanted to point out that even in shb777's repo, the config.json (see here) is still missing two eos_token_id entries that are declared in generation_config.json.
It should be:
"eos_token_id": [
128001,
128008,
128009
],
At least, that's what it is for Llama 3.1 8B, but who knows. I do think it makes sense, since those ids are also declared in generation_config.json.
Anyway, I thought maybe that's what's causing the looping?
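If it's useful, here's a minimal sketch for checking whether a config.json is missing eos ids that generation_config.json declares. The file paths assume a local checkout of the repo; adjust as needed:

```python
import json

# Assumed local paths to a checkout of the model repo.
CONFIG_PATH = "config.json"
GEN_CONFIG_PATH = "generation_config.json"

def load_eos_ids(path: str) -> set:
    """Read eos_token_id from a config file; it can be an int or a list."""
    with open(path) as f:
        eos = json.load(f).get("eos_token_id")
    return set(eos) if isinstance(eos, list) else {eos}

missing = load_eos_ids(GEN_CONFIG_PATH) - load_eos_ids(CONFIG_PATH)
print("missing from config.json:", sorted(missing) if missing else "none")
```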
Here is the entire 3.1 8B config in case you need it:
{
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": [
    128001,
    128008,
    128009
  ],
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 131072,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": {
    "factor": 8.0,
    "low_freq_factor": 1.0,
    "high_freq_factor": 4.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "llama3"
  },
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.42.3",
  "use_cache": true,
  "vocab_size": 128256
}
Nah, I don't think that's the issue with Tau-Bench. What was happening (I need to upload these logs later) is that it tried to individually check the details of every available flight, got confused, and kept repeating the same checks.
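For anyone who wants to spot this failure mode in their own runs, a rough sketch of flagging repeated tool calls in a trace is below. The trace shape (a list of message dicts carrying a `tool_call` with `name` and `arguments`) is an illustrative assumption, not Tau-Bench's actual log schema:

```python
import json
from collections import Counter

def find_repeated_calls(trace, threshold=3):
    """Flag tool calls repeated with identical arguments.

    Assumes each entry looks like
    {"role": "assistant", "tool_call": {"name": ..., "arguments": {...}}};
    adapt the accessors to whatever your logs actually contain.
    """
    counts = Counter()
    for msg in trace:
        call = msg.get("tool_call")
        if call:
            # Key on tool name plus canonicalized arguments.
            counts[(call["name"], json.dumps(call["arguments"], sort_keys=True))] += 1
    return [(name, args, n) for (name, args), n in counts.items() if n >= threshold]
```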
Got it, that makes sense. Thanks!
Also, for some reason the benchmark was scoring a bunch of clearly incorrect traces as correct, where the model just redirected to a human agent instantly. That's the main reason it was removed; I'm not confident the scores are coherent.
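A cheap sanity filter for that failure mode might look like the sketch below. The tool name `transfer_to_human_agents` is a guess based on Tau-Bench's domains having a human-handoff tool, and the trace shape is the same assumed one as above; check your logs for the exact strings:

```python
def redirected_immediately(trace, transfer_tool="transfer_to_human_agents"):
    """True if the agent's first tool call is a handoff to a human.

    Same assumed message-dict trace shape as the sketch above;
    the tool name is an assumption, not a confirmed Tau-Bench identifier.
    """
    for msg in trace:
        call = msg.get("tool_call")
        if call:
            return call["name"] == transfer_tool
    return False

# e.g. drop suspicious "passes" before trusting the aggregate score:
# clean = [t for t in traces if not redirected_immediately(t)]
```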