I got "not equivalent" for the 15th entry of the training dataset and "equivalent" for the 87th entry in the activity.
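A minimal way to check those two entries (a sketch, assuming the MRPC dataset from this chapter and that the second entry refers to the validation split):

from datasets import load_dataset

raw_datasets = load_dataset("glue", "mrpc")
# ClassLabel names for MRPC: ['not_equivalent', 'equivalent']
label_names = raw_datasets["train"].features["label"].names
print(label_names[raw_datasets["train"][15]["label"]])
print(label_names[raw_datasets["validation"][87]["label"]])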
from datasets import load_dataset
from transformers import AutoTokenizer, DataCollatorWithPadding

raw_datasets = load_dataset("glue", "sst2")

# Tokenizer checkpoint as used elsewhere in the chapter
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

# Number of feature columns in the training split (not the number of sentences)
num_columns = len(raw_datasets["train"].features)
print(num_columns)

def tokenizer_function(example):
    if "sentence" in example:  # single-sentence datasets such as SST-2
        return tokenizer(example["sentence"], truncation=True)
    elif "sentence1" in example and "sentence2" in example:  # sentence-pair datasets such as MRPC
        return tokenizer(example["sentence1"], example["sentence2"], truncation=True)
    else:
        raise ValueError(
            "Invalid dataset format: example must contain either a 'sentence' field "
            "or 'sentence1' and 'sentence2' fields."
        )

tokenized_datasets = raw_datasets.map(tokenizer_function, batched=True)

# Drop whichever raw text columns the dataset actually has
try:
    tokenized_datasets = tokenized_datasets.remove_columns(["idx", "sentence1", "sentence2"])
except ValueError:
    pass
try:
    tokenized_datasets = tokenized_datasets.remove_columns(["idx", "sentence"])
except ValueError:
    pass

tokenized_datasets = tokenized_datasets.rename_column("label", "labels")
tokenized_datasets = tokenized_datasets.with_format("torch")

batch = data_collator(tokenized_datasets["train"][:8])
[len(x) for x in tokenized_datasets["train"][:8]["input_ids"]]
{k: v.shape for k, v in batch.items()}
My solution to the exercise.
I'm getting a "You must call wandb.init() before wandb.log()" error when I attempt to run
trainer.train()
I tried to import wandb as Gemini recommended, but then it asked for a key upon pip install, so I do not believe that is the correct solution. Any help is appreciated.
full message:
Error                                     Traceback (most recent call last)
in <cell line: 1>()
----> 1 trainer.train()

7 frames
/usr/local/lib/python3.10/dist-packages/wandb/sdk/lib/preinit.py in preinit_wrapper(*args, **kwargs)
     34     ) -> Callable:
     35         def preinit_wrapper(*args: Any, **kwargs: Any) -> Any:
---> 36             raise wandb.Error(f"You must call wandb.init() before {name}()")
     37
     38         preinit_wrapper.__name__ = str(name)
Hi NLP Course Team,
I’ve been exploring the fine-tuning content in your NLP course and creating educational materials in this space. The PEFT library tutorial is very comprehensive for implementation aspects. While working through the material, I noticed an opportunity to deepen the theoretical foundations - explaining the ‘why’ behind the ‘what’ and ‘how’ of these methods:
- Deriving PEFT methods from first principles
- Mathematical intuition behind different approaches when applicable
- Trade-offs between various fine-tuning methods
- Building foundations for innovation in PEFT
I’ve been writing about these topics on LinkedIn and creating tutorials that bridge theory with implementation. Understanding the underlying principles can help us not just implement existing methods, but also innovate and develop our own approaches.
I strongly believe in the power of knowledge sharing and collaboration, and my goal is to be part of something greater than myself—building tools and fostering communities that inspire others to learn, innovate, and create.
Looking forward to your thoughts!
Calling the trainer.train() function in Colab asks for the wandb API key. Why is that needed? Is it free? Could someone explain a bit about it? (I might have missed it during the course.)
If you don't specify the report_to option in TrainingArguments, the Trainer reports to every installed logging integration (including wandb), which is why it asks for a key. Just a strange default.
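A minimal sketch of turning the integration off (the "test-trainer" output directory matches the one used in the chapter):

from transformers import TrainingArguments

# report_to="none" disables all logging integrations (wandb, TensorBoard, ...),
# so trainer.train() no longer asks for a wandb API key
training_args = TrainingArguments("test-trainer", report_to="none")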
I got this error with the following block of code, could you please guide me?
import tensorflow as tf
from transformers import TFAutoModelForSequenceClassification

# checkpoint and opt are defined earlier in the notebook (not shown in this post)
model = TFAutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer=opt, loss=loss, metrics=["accuracy"])
ERROR:
ValueError: Could not interpret optimizer identifier: <keras.src.optimizers.adam.Adam object at 0x7e80ed478950>
It seems that you need to change the import statement depending on the version of Keras.
#from keras.optimizers import Adam
from keras.optimizers import adam_v2
Oh, thank you bro
Sorry, but
from keras.optimizers import adam_v2
didn’t work for me
Legacy Keras issue…?
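If the error comes from mixing an optimizer built with the standalone keras package and the tf.keras model that transformers creates, a sketch that may avoid the mismatch is to create the optimizer from tf.keras directly (the checkpoint name and learning rate here are assumptions matching the chapter):

import tensorflow as tf
from transformers import TFAutoModelForSequenceClassification

checkpoint = "bert-base-uncased"  # assumption: the checkpoint used in the chapter
model = TFAutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Build the optimizer from the same tf.keras namespace as the model,
# instead of importing it from the standalone keras package
opt = tf.keras.optimizers.Adam(learning_rate=5e-5)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer=opt, loss=loss, metrics=["accuracy"])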
The example in chapter 3 has an error: the AdamW function needs to be imported as follows:
from torch.optim import AdamW
and not from the transformers library.
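For reference, a minimal self-contained sketch with that import (the checkpoint and learning rate are assumptions matching the chapter):

from torch.optim import AdamW
from transformers import AutoModelForSequenceClassification

# AdamW is no longer provided by transformers; it lives in torch.optim
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
optimizer = AdamW(model.parameters(), lr=5e-5)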
It seems to have disappeared…
In the Dynamic Padding part, why does the code not throw an error? tokenized_datasets does not have a "labels" field, only a "label" field, yet setting
label_cols=["labels"]
in the .to_tf_dataset() method works, even though the available columns are ['sentence1', 'sentence2', 'label', 'idx', 'input_ids', 'token_type_ids', 'attention_mask'].
Also, after setting the learning rate with PolynomialDecay, the accuracy after 3 epochs is only 62%. That is not acceptable; how can I improve it? Even traditional approaches like the Viterbi algorithm achieve more than this.
This is because DataCollatorWithPadding returns a dict that includes "labels" among its keys. If you want to handle other data appropriately, you will need to write your own collate_fn (data collator).
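As an illustration, a minimal sketch of a custom collate_fn that pads the inputs and exposes "label" under the name "labels" (the checkpoint and field names are assumptions based on the MRPC columns listed above):

import numpy as np
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumption: the chapter's checkpoint

def collate_fn(examples):
    # Pad input_ids / token_type_ids / attention_mask to the longest example in the batch
    batch = tokenizer.pad(
        [{k: ex[k] for k in ("input_ids", "token_type_ids", "attention_mask")} for ex in examples],
        return_tensors="np",
    )
    # Expose the label under the name "labels", which is what label_cols=["labels"] refers to
    batch["labels"] = np.array([ex["label"] for ex in examples])
    return batch

Such a function can then be passed to .to_tf_dataset(..., collate_fn=collate_fn).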
Why is section 4, "A full training", not available for TensorFlow? Can I skip this section? I have never worked with torch. What do I do? I need guidance.
The AdamW function has moved out of the transformers library; it now needs to be imported as follows:
from torch.optim import AdamW