When I do: from transformers import AutoTokenizer, AutoModel
I would have expected to find an AutoTokenizer.py file and an AutoModel.py file, but they aren’t there.
(Typically, from module_a import b, c means that b.py and c.py exist as Python files in the module_a directory?)
What is wrong with my thinking?
Where can I find the code for AutoTokenizer and AutoModel please?
Hey @iamholmes in general we use the __init__.py file under src/transformers to define all the class imports. For example, here is the import for AutoModel
Above that line you can see we import from models.auto and indeed here are all the Python files associated with the auto-classes for models and tokenizers.
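In case it helps to see the mechanism in isolation, here is a minimal sketch of a package whose __init__.py re-exports a class defined in a nested file. Everything in it (demo_pkg, AutoThing, the folder layout) is made up for illustration and is not the real transformers layout.

```python
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk (all names here are hypothetical):
#   demo_pkg/__init__.py         re-exports AutoThing
#   demo_pkg/models/__init__.py  empty
#   demo_pkg/models/auto.py      actually defines AutoThing
root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg"
(pkg / "models").mkdir(parents=True)
(pkg / "models" / "__init__.py").write_text("")
(pkg / "models" / "auto.py").write_text("class AutoThing:\n    pass\n")
(pkg / "__init__.py").write_text("from .models.auto import AutoThing\n")

# Make the temporary directory importable.
sys.path.insert(0, str(root))

from demo_pkg import AutoThing  # works although there is no AutoThing.py

# A class records the module in which it was really defined.
defining_module = AutoThing.__module__
print(defining_module)  # demo_pkg.models.auto
```

So from package import Name only needs Name to be an attribute of the package, and __init__.py can put it there. On the real library, printing AutoModel.__module__ or calling inspect.getsourcefile(AutoModel) should point you to the defining file in the same way.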
Hi guys, a quick question about the Accelerate API.
I saw in the course that you explain how to use it in a PyTorch training loop. I was wondering if there’s a way to integrate it into a Trainer API-based loop, or if there is a way to exploit multiple GPUs in the Trainer API itself.
Hi @Neuroinformatica! You can exploit multiple GPUs with the Trainer (see the docs). By default it will use all available GPUs for training, but you can configure that by setting:
import os
# Just train on a single device
os.environ["CUDA_VISIBLE_DEVICES"]="0"
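If you want more than one GPU but not all of them, the same variable takes a comma-separated list. Here is a small sketch; restrict_visible_gpus is a hypothetical helper name, not part of any library. It assumes you call it before anything initialises CUDA, since the variable is only read at that point.

```python
import os


def restrict_visible_gpus(gpu_ids):
    """Expose only the given GPU indices to CUDA (hypothetical helper)."""
    # CUDA reads this variable when it is first initialised, so this has to
    # run before any code touches the GPU.
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return os.environ["CUDA_VISIBLE_DEVICES"]


visible = restrict_visible_gpus([0, 1])
print(visible)  # 0,1
```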
I see where you’re coming from since len(tf_train_dataset) corresponds to tokenized_datasets["train"] divided by the batch size. Nevertheless, you could either elaborate a bit more on that in the comment or just leave it out so it’s less misleading for beginners like me. However, this is simply a suggestion and may be obvious or irrelevant to many people!
Thanks @bonschorno - I’m glad you’re enjoying the course.
Btw after chatting with @Rocketknight1 (the TensorFlow maintainer of transformers), he pointed out that:
The tf.data.Dataset objects here already have a batch() operation applied - they’re more like torch DataLoader objects. After a batch operation, their len() is num_samples // batch_size already, so we shouldn’t need to do that division twice.
In any case, we’ll improve the comment because it confused me as well.
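To make the arithmetic concrete, here is a plain-Python sketch with made-up numbers (1000 samples, batch size 8, 3 epochs). The num_batches helper is hypothetical and only mimics what len() reports for an already batched dataset; as far as I know a final partial batch is counted too unless it is dropped, hence the ceil.

```python
import math


def num_batches(num_samples, batch_size, drop_remainder=False):
    """Number of batches in one pass over the data (hypothetical helper)."""
    if drop_remainder:
        return num_samples // batch_size
    # A final, smaller batch still counts as one batch.
    return math.ceil(num_samples / batch_size)


num_samples = 1000  # made-up dataset size
batch_size = 8
num_epochs = 3

# This plays the role of len(tf_train_dataset) after batching.
batches_per_epoch = num_batches(num_samples, batch_size)

# Correct: one optimizer step per batch, for every epoch.
num_train_steps = batches_per_epoch * num_epochs

# Dividing by the batch size a second time undercounts the steps.
wrong_steps = (batches_per_epoch // batch_size) * num_epochs

print(batches_per_epoch, num_train_steps, wrong_steps)  # 125 375 45
```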
This works well, but it has the disadvantage of returning a dictionary (with our keys, input_ids, attention_mask, and token_type_ids, and values that are lists of lists). It will also only work if you have enough RAM to store your whole dataset during the tokenization (whereas the datasets from the Datasets library are Apache Arrow files stored on the disk, so you only keep the samples you ask for loaded in memory).
Can somebody explain why returning a dictionary is a disadvantage?
Note that the “evaluate” package mentioned in the example requires a package called “sklearn”, and if you try to run metric.compute() locally you will get a message about needing to run “pip install sklearn”.
However, the “sklearn” package is going through a “brownout” and is only a stub. To get the proper package, you need to install “scikit-learn” instead:
pip install scikit-learn
In this section of chapter 3, you have mentioned the following:
The Trainer will work out of the box on multiple GPUs or TPUs and provides lots of options, like mixed-precision training (use fp16 = True in your training arguments). We will go over everything it supports in Chapter 10.
**Fine-tuning a model with the Trainer API**
Hi @lewtun, I trust you are well. The course uses:
predictions = trainer.predict(tokenized_datasets["validation"])
Please, how do I make predictions in inference mode, on raw text?
Something more like: trainer.predict(['The man is sick'])
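One possible way to do this, sketched under assumptions: trainer.predict expects a tokenized dataset, so for raw strings you can tokenize them yourself and call the model directly. predict_texts below is a hypothetical helper, not a Trainer method; it assumes a PyTorch sequence-classification model whose output has a .logits attribute and a tokenizer that accepts a list of strings.

```python
import torch


def predict_texts(texts, tokenizer, model):
    """Return the predicted class index for each raw string (hypothetical helper)."""
    # Turn the raw strings into a padded batch of PyTorch tensors.
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    # Put the inputs on the same device as the model weights.
    device = next(model.parameters()).device
    inputs = {name: tensor.to(device) for name, tensor in inputs.items()}
    model.eval()  # switch off dropout
    with torch.no_grad():  # no gradients are needed for inference
        logits = model(**inputs).logits
    return logits.argmax(dim=-1).tolist()
```

With a trained Trainer this would be called as predict_texts(["The man is sick"], tokenizer, trainer.model). For a sentence-pair task such as MRPC the tokenizer call would need both sentences, so the helper would have to be adapted.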
Hi @lewtun
I wrote the training + evaluation loop, but the script never gets to the evaluation part. Please do you know why?
import torch
from tqdm.auto import tqdm

progress_bar = tqdm(range(num_training_steps))
count = 0
for epoch in range(num_epochs):
    # Training pass over the whole training set
    model.train()
    for data in train_dl:
        data = {k: v.to(device) for k, v in data.items()}
        output = model(**data)
        loss = output.loss
        loss.backward()
        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()
        progress_bar.update(1)
        count += 1
        if count % 100 == 0:
            print(count)
    # Evaluation pass, run once at the end of each epoch
    model.eval()
    for data in validation_dl:
        data = {k: v.to(device) for k, v in data.items()}
        with torch.no_grad():
            # Use the loop variable `data`; `batch` is not defined in this loop
            outputs = model(**data)
        logits = outputs.logits
        preds = torch.argmax(logits, dim=-1)
        metric.add_batch(predictions=preds, references=data["labels"])
    print(metric.compute())
At the end of “Processing the data” you suggest a harder challenge of building a processing function that works for all GLUE tasks. I took a swing and had some questions:
It looks like “ax” only has “test” – we don’t do any tokenization of the “test” set that I saw – I imagine we should, though?
“ax” also has “premise” and “hypothesis” – I’m guessing these just become “sentence1” and “sentence2”?
Given these differences, do we basically write conditional code for train/test/validation and “sentence” vs “sentence1/2” vs “hypothesis/premise” or is there a better way to do this? I don’t imagine the “AutoTokenizer” handles this for us in a convenient way?
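One way to avoid conditional code is a lookup table from task name to column names. The sketch below is only that: the column names are the ones I believe each GLUE config uses, so please check them against the dataset features, and make_tokenize_function is a hypothetical helper.

```python
# Text column(s) for each GLUE config; None means a single-sentence task.
TASK_TO_KEYS = {
    "ax": ("premise", "hypothesis"),
    "cola": ("sentence", None),
    "mnli": ("premise", "hypothesis"),
    "mrpc": ("sentence1", "sentence2"),
    "qnli": ("question", "sentence"),
    "qqp": ("question1", "question2"),
    "rte": ("sentence1", "sentence2"),
    "sst2": ("sentence", None),
    "stsb": ("sentence1", "sentence2"),
    "wnli": ("sentence1", "sentence2"),
}


def make_tokenize_function(task, tokenizer):
    """Build a tokenize function for one GLUE task (hypothetical helper)."""
    key1, key2 = TASK_TO_KEYS[task]

    def tokenize_function(example):
        if key2 is None:
            # Single-sentence task: only one text column to tokenize.
            return tokenizer(example[key1], truncation=True)
        # Sentence-pair task: pass both columns so they are encoded together.
        return tokenizer(example[key1], example[key2], truncation=True)

    return tokenize_function
```

Usage would look like raw_datasets.map(make_tokenize_function("mrpc", tokenizer), batched=True). Calling map on a DatasetDict applies the function to every split it contains, so a config that only has "test" gets tokenized without any special casing for train/validation/test.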
Regarding this statement made in Chapter 3, in the “Fine-tuning a model with the Trainer API” section:
You will notice that unlike in Chapter 2, you get a warning after instantiating this pretrained model. This is because BERT has not been pretrained on classifying pairs of sentences, so the head of the pretrained model has been discarded and a new head suitable for sequence classification has been added instead.
I wanted to ask whether this is the case for any kind of model. Here, BERT was not pretrained on classifying sentence pairs, so when we use it for that purpose the head is replaced. If it were some other model, say GPT or T5, would the same scenario apply (the head getting replaced to make up for the missing pretraining objective)?