Am I doing this right?

I’m fairly new to ML and very new to transformers, and I want to make sure I’m doing the right thing. I’m trying to do text classification with a small dataset and thought this would be a good option (is it?).

Here are the basics of my code:

texts = ["random text string...", ...]
labels = [1, 0, ...]

tokenized_sents = []
attention_masks = []

for sentence in texts:
    tokenized_sents.append(tokenizer.encode(sentence, add_special_tokens=True, ...))
input_ids = pad_sequences(tokenized_sents)

for sentence in input_ids:
    # 1 for real tokens, 0 for padding
    att_mask = [int(token_id > 0) for token_id in sentence]
    attention_masks.append(att_mask)

dataset =, attention_masks, labels)

# copied from
optimizer = tf.keras.optimizers.Adam(learning_rate=3e-5)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer=optimizer, loss=loss)
model.fit(dataset, epochs=2, steps_per_epoch=115)

I was pretty confident this all worked, but then when I did the following test:

sent = ["I like to watch movies"]
sent = tokenizer.encode(sentence, add_special_tokens=True, ...)
att_mask = [int(token_id > 0) for token_id in sent]
ds =, att_mask)

I got a super long array. But the labels can only be 1 or 0 and there’s only one sample, so I was expecting a 1 by 2 array. Any idea why this doesn’t work?
Also, what’s the best way to save this model and use it for predictions later?
Thank you.

I would suggest expanding this example instead:
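Separately from that example, one likely cause of the "super long array" (a guess from the snippet, not a confirmed diagnosis) is that the test sentence is passed as a flat list of token ids with no batch dimension, so each token may end up treated as its own sample. The sketch below uses only plain Python to show the shapes involved. The token ids and logits are made-up placeholder values, `pad_batch` and `softmax` are hypothetical helpers rather than library functions, and padding is added on the right, which may differ from what `pad_sequences` does in your setup.

```python
import math

def pad_batch(sequences, pad_id=0):
    """Right-pad lists of token ids to equal length; return (input_ids, attention_masks)."""
    max_len = max(len(seq) for seq in sequences)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
    # 1 for real tokens, 0 for padding, same idea as in the question
    attention_masks = [[int(tok != pad_id) for tok in row] for row in input_ids]
    return input_ids, attention_masks

def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Placeholder ids standing in for the encoded test sentence (assumed values)
single = [101, 1045, 2066, 2000, 3422, 5691, 102]

# Wrap the single sample in a list so it becomes a batch of shape (1, seq_len)
input_ids, attention_masks = pad_batch([single])
shape = (len(input_ids), len(input_ids[0]))

# Placeholder logits of shape (1, 2), as a two-class head would return for one sample
logits = [[-0.3, 1.2]]
probs = [softmax(row) for row in logits]
predicted = [row.index(max(row)) for row in probs]
```

With a batch of one sample, the model output should have one row of two logits, which is the 1 by 2 array you expected. On saving: if this is a Hugging Face transformers model, `model.save_pretrained(...)` together with the matching class's `from_pretrained(...)` is the usual pair, though I'm assuming that from the `tokenizer.encode` call rather than from anything stated in the post.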