For comparison, here is the content of the text_batch.json file for BART (which seems to work well):
{
"input_ids": [
"<s> who sings does he love me with reba?</s><pad><pad><pad><pad><pad><pad><pad><pad>",
"<s> what is the smallest prime number that is greater than 30?</s><pad><pad><pad><pad><pad><pad>",
"<s> who introduced the system of civil services in india?</s><pad><pad><pad><pad><pad><pad><pad>",
"<s> when was the public service commission original version of the upsc set up?</s><pad><pad><pad>",
"<s> who wrote the song two out of three ain't bad?</s><pad><pad><pad><pad><pad><pad>",
"<s> who has the most receiving yards in one game?</s><pad><pad><pad><pad><pad><pad><pad><pad>",
"<s> how many games to get premier league medal?</s><pad><pad><pad><pad><pad><pad><pad><pad><pad>",
"<s> what do they call snowboarders in johnny tsunami?</s><pad><pad><pad><pad><pad><pad>",
"<s> who is the old man in waiting on a woman?</s><pad><pad><pad><pad><pad><pad><pad>",
"<s> in attack on titan who is the female titan?</s><pad><pad><pad><pad><pad><pad><pad><pad>",
"<s> who got pregnant in gossip girl season 5?</s><pad><pad><pad><pad><pad><pad><pad><pad><pad>",
"<s> who sang the theme from the greatest american hero?</s><pad><pad><pad><pad><pad><pad><pad>",
"<s> who wins the 2017 australian open men's single title?</s><pad><pad><pad><pad><pad>",
"<s> who does the voice for love island australia?</s><pad><pad><pad><pad><pad><pad><pad>",
"<s> when did the king kong ride burn down?</s><pad><pad><pad><pad><pad><pad><pad><pad>",
"<s> 5 types of control that could be programmed on a gui?</s><pad><pad><pad><pad><pad><pad>",
"<s> who is the ceo of t rowe price?</s><pad><pad><pad><pad><pad><pad><pad>",
"<s> who plays chaka in land of the lost?</s><pad><pad><pad><pad><pad><pad><pad><pad>",
"<s> who is the head coach of the minnesota timberwolves?</s><pad><pad><pad><pad><pad><pad>",
"<s> when was the planning commission set up to prepare a blue print of development for the country?</s>",
"<s> where is the mesophyll located in a plant?</s><pad><pad><pad><pad><pad><pad><pad>",
"<s> who sings bartender i really did it this time?</s><pad><pad><pad><pad><pad><pad><pad><pad>",
"<s> who wrote somebody like you by keith urban?</s><pad><pad><pad><pad><pad><pad><pad><pad>",
"<s> in what year did japan attack pearl harbor?</s><pad><pad><pad><pad><pad><pad><pad><pad>",
"<s> when is the show six coming back on?</s><pad><pad><pad><pad><pad><pad><pad><pad><pad>",
"<s> who does haruhi end up with in ouran highschool host club?</s><pad><pad>",
"<s> who was the first woman appointed to the supreme court?</s><pad><pad><pad><pad><pad><pad><pad>",
"<s> who controlled blue cross when it was formed?</s><pad><pad><pad><pad><pad><pad><pad><pad><pad>",
"<s> the most readily absorbed form of iron in the diet is?</s><pad><pad><pad><pad><pad><pad>",
"<s> who has the highest minimum wage in the usa?</s><pad><pad><pad><pad><pad><pad><pad>",
"<s> when does fairy tail dragon cry come out in canada?</s><pad><pad><pad><pad><pad><pad>",
"<s> who is known as the father of humanism?</s><pad><pad><pad><pad><pad><pad><pad><pad>"
],
"attention_mask": [
32,
20
],
"labels": [
"<s> Linda Davis</s><pad><pad><pad><pad><pad>",
"<s> 31</s><pad><pad><pad><pad><pad><pad>",
"<s> Charles Cornwallis</s><pad><pad><pad><pad>",
"<s> October 1, 1926</s><pad><pad><pad>",
"<s> Meat Loaf</s><pad><pad><pad><pad>",
"<s> Flipper Anderson</s><pad><pad><pad><pad>",
"<s> a minimum of five</s><pad><pad><pad>",
"<s> Urchins</s><pad><pad><pad><pad>",
"<s> Andy Griffith</s><pad><pad><pad><pad><pad>",
"<s> Ymir Fritz</s><pad><pad><pad><pad>",
"<s> Blair</s><pad><pad><pad><pad><pad><pad>",
"<s> American singer Joey Scarbury</s><pad><pad>",
"<s> Roger Federer</s><pad><pad><pad><pad>",
"<s> Eoghan McDermott</s><pad><pad>",
"<s> 2008</s><pad><pad><pad><pad><pad><pad>",
"<s> List box</s><pad><pad><pad><pad><pad>",
"<s> William Stromberg</s><pad><pad><pad>",
"<s> Jorma Taccone</s><pad>",
"<s> Thomas Joseph Thibodeau Jr.</s>",
"<s> 15 March 1950</s><pad><pad><pad><pad>",
"<s> In leaves</s><pad><pad><pad><pad><pad>",
"<s> American southern rock group Rehab</s><pad><pad>",
"<s> John Shanks</s><pad><pad><pad><pad>",
"<s> 1941</s><pad><pad><pad><pad><pad><pad>",
"<s> May 28, 2018</s><pad><pad><pad>",
"<s> Tamaki</s><pad><pad><pad><pad><pad>",
"<s> Sandra Day O'Connor</s><pad><pad>",
"<s> 1929</s><pad><pad><pad><pad><pad><pad>",
"<s> animal products</s><pad><pad><pad><pad><pad>",
"<s> Washington</s><pad><pad><pad><pad><pad><pad>",
"<s> August 14, 2017</s><pad><pad><pad>",
"<s> Petrarch</s><pad><pad><pad><pad><pad>"
],
"ids": [
"<s>",
".",
" and",
"-",
" is",
" The",
" it",
" be",
" are",
" (",
" will",
" \ufffd",
"\ufffd",
" we",
" had",
",\"",
" can",
" $",
".\"",
" year",
" two",
" our",
" into",
" new",
" In",
"I",
"S",
"'",
" 1",
"?",
" get",
" back"
],
"decoder_input_ids": [
"</s><s> Linda Davis</s><pad><pad><pad><pad>",
"</s><s> 31</s><pad><pad><pad><pad><pad>",
"</s><s> Charles Cornwallis</s><pad><pad><pad>",
"</s><s> October 1, 1926</s><pad><pad>",
"</s><s> Meat Loaf</s><pad><pad><pad>",
"</s><s> Flipper Anderson</s><pad><pad><pad>",
"</s><s> a minimum of five</s><pad><pad>",
"</s><s> Urchins</s><pad><pad><pad>",
"</s><s> Andy Griffith</s><pad><pad><pad><pad>",
"</s><s> Ymir Fritz</s><pad><pad><pad>",
"</s><s> Blair</s><pad><pad><pad><pad><pad>",
"</s><s> American singer Joey Scarbury</s><pad>",
"</s><s> Roger Federer</s><pad><pad><pad>",
"</s><s> Eoghan McDermott</s><pad>",
"</s><s> 2008</s><pad><pad><pad><pad><pad>",
"</s><s> List box</s><pad><pad><pad><pad>",
"</s><s> William Stromberg</s><pad><pad>",
"</s><s> Jorma Taccone</s>",
"</s><s> Thomas Joseph Thibodeau Jr.",
"</s><s> 15 March 1950</s><pad><pad><pad>",
"</s><s> In leaves</s><pad><pad><pad><pad>",
"</s><s> American southern rock group Rehab</s><pad>",
"</s><s> John Shanks</s><pad><pad><pad>",
"</s><s> 1941</s><pad><pad><pad><pad><pad>",
"</s><s> May 28, 2018</s><pad><pad>",
"</s><s> Tamaki</s><pad><pad><pad><pad>",
"</s><s> Sandra Day O'Connor</s><pad>",
"</s><s> 1929</s><pad><pad><pad><pad><pad>",
"</s><s> animal products</s><pad><pad><pad><pad>",
"</s><s> Washington</s><pad><pad><pad><pad><pad>",
"</s><s> August 14, 2017</s><pad><pad>",
"</s><s> Petrarch</s><pad><pad><pad><pad>"
]
}
Comparing this output with the file corresponding to t5-small, two things come to my mind:
- In text_batch.json of t5-small, I don’t see any <pad> or <s> tags in either labels or input_ids. Is that normal?
- For both models, attention_mask is quite short. Shouldn’t this be a list of 0s and 1s, with the same length as input_ids?
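
To make the second question concrete, here is a minimal sketch of the attention mask I would expect for a padded batch. The helper build_attention_mask is hypothetical (it is not a transformers function), the token ids are made up for illustration, and the pad id of 1 is an assumption. My guess, and it is only a guess, is that the dump stores the shape of attention_mask instead of its values, which would explain 32 (the number of rows in input_ids) and 20 (the padded length).

```python
# Hypothetical helper, not part of any library: derive a 0/1 mask from padded ids.
def build_attention_mask(input_ids, pad_token_id):
    """Return 1 for every real token and 0 for every padding token."""
    return [[0 if tok == pad_token_id else 1 for tok in row] for row in input_ids]


PAD = 1  # assumed pad token id, for illustration only

# Made-up token ids: the first row ends with two padding tokens.
batch = [
    [0, 8155, 7884, 2, PAD, PAD],
    [0, 12196, 16, 5, 17700, 2],
]

mask = build_attention_mask(batch, PAD)
shape = [len(mask), len(mask[0])]

print(mask)   # one 0/1 entry per input id, same lengths as the rows of batch
print(shape)  # [rows, columns], analogous to the [32, 20] seen in the dump
```

With a mask like this, every row has the same length as the matching row of input_ids, which is what I expected to find in the file.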