id2label and label2id are incompatible with multi_nli dataset

by kslnet - opened Apr 27, 2023

Apr 27, 2023

Hi, the id2label and label2id in config.json are:

"id2label": {
  "0": "CONTRADICTION",
  "1": "NEUTRAL",
  "2": "ENTAILMENT"
},
"label2id": {
  "CONTRADICTION": 0,
  "NEUTRAL": 1,
  "ENTAILMENT": 2
},

However, according to the multi_nli dataset card (https://huggingface.co/datasets/multi_nli), 0 should be mapped to "ENTAILMENT" and 2 to "CONTRADICTION":

"label: a classification label, with possible values including entailment (0), neutral (1), contradiction (2)."

Therefore, I believe id2label and label2id need to be corrected. Is that right?

sgugger

Apr 27, 2023

No, because the model was trained with the labels in this order.

kslnet

Apr 27, 2023

But don't the labels need to correspond to the way the data was annotated?

kslnet

Apr 28, 2023

Are you saying that you did not use the multi_nli data as-is, but reversed the labels before using it to fine-tune roberta-large? Sorry if I am misunderstanding something basic.

sgugger

May 1, 2023

I'm saying the model was trained to predict 0 for contradiction, 1 for neutral and 2 for entailment.
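
For anyone comparing this model's predictions against multi_nli labels, here is a minimal sketch of one way to reconcile the two orderings: match labels by name instead of by integer id. It assumes only the two mappings quoted in this thread (the model's config.json and the dataset card). The names `MODEL_ID2LABEL`, `DATASET_ID2LABEL`, `build_dataset_to_model_map`, and `to_model_labels` are hypothetical helpers for illustration, not part of any library, and this is not necessarily how the model was actually trained.

```python
# Label order from the model's config.json, as quoted in this thread.
MODEL_ID2LABEL = {0: "CONTRADICTION", 1: "NEUTRAL", 2: "ENTAILMENT"}

# Label order from the multi_nli dataset card, as quoted in this thread.
DATASET_ID2LABEL = {0: "entailment", 1: "neutral", 2: "contradiction"}


def build_dataset_to_model_map(dataset_id2label, model_id2label):
    """Map each dataset label id to the model id that carries the same name.

    Names are compared case-insensitively because the two sources use
    different casing.
    """
    model_label2id = {name.lower(): idx for idx, name in model_id2label.items()}
    return {
        idx: model_label2id[name.lower()]
        for idx, name in dataset_id2label.items()
    }


# Hypothetical lookup table: dataset id -> model id.
DATASET_TO_MODEL = build_dataset_to_model_map(DATASET_ID2LABEL, MODEL_ID2LABEL)


def to_model_labels(dataset_labels):
    """Convert a list of dataset label ids into the model's label ids."""
    return [DATASET_TO_MODEL[label] for label in dataset_labels]


print(DATASET_TO_MODEL)            # {0: 2, 1: 1, 2: 0}
print(to_model_labels([0, 2, 1]))  # [2, 0, 1]
```

Matching by name means the model's config stays untouched, which is consistent with the point above: the ids in config.json describe what the model outputs, not how the dataset was annotated.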
