I got "not equivalent" for the 15th entry of the training dataset and "equivalent" for the 87th entry in the activity.
from datasets import load_dataset
from transformers import AutoTokenizer, DataCollatorWithPadding

raw_datasets = load_dataset("glue", "sst2")

# Assuming the course's usual checkpoint here; the original snippet used
# `tokenizer` without defining it anywhere.
checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Note: this counts the dataset's columns (sentence, label, idx), not sentences.
sentences = len(raw_datasets["train"].features)
print(sentences)

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
def tokenizer_function(example):
    if "sentence" in example:  # Check if 'sentence' field exists
        return tokenizer(example["sentence"], truncation=True)
    elif "sentence1" in example and "sentence2" in example:  # Check for 'sentence1' and 'sentence2'
        return tokenizer(example["sentence1"], example["sentence2"], truncation=True)
    else:
        raise ValueError("Invalid dataset format: example must contain either 'sentence' or 'sentence1' and 'sentence2' fields.")

tokenized_datasets = raw_datasets.map(tokenizer_function, batched=True)
try:
    tokenized_datasets = tokenized_datasets.remove_columns(["idx", "sentence1", "sentence2"])
except ValueError:
    pass
try:
    tokenized_datasets = tokenized_datasets.remove_columns(["idx", "sentence"])
except ValueError:
    pass
tokenized_datasets = tokenized_datasets.rename_column("label", "labels")
tokenized_datasets = tokenized_datasets.with_format("torch")
batch = data_collator(tokenized_datasets["train"][:8])
[len(x) for x in tokenized_datasets["train"][:8]["input_ids"]]
{k: v.shape for k, v in batch.items()}
My solution to the exercise.
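For anyone curious what `DataCollatorWithPadding` is doing under the hood, here is a minimal pure-Python sketch of dynamic padding (the helper name and toy token IDs are my own, not from transformers): each batch is padded only to the length of its own longest sequence, so batch shapes vary between batches instead of being fixed dataset-wide.

```python
def pad_batch(input_ids, pad_token_id=0):
    """Pad every sequence in the batch to the length of the longest one."""
    max_len = max(len(ids) for ids in input_ids)
    return [ids + [pad_token_id] * (max_len - len(ids)) for ids in input_ids]

# Toy token IDs: two sequences of different lengths end up the same length.
batch = pad_batch([[101, 2054, 102], [101, 2054, 2003, 2023, 102]])
print([len(ids) for ids in batch])  # → [5, 5]
```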
I'm getting a "You must call wandb.init() before wandb.log()" error when I attempt to run
trainer.train()
I tried importing wandb as Gemini recommended, but after the pip install it asked for an API key, so I don't believe that is the correct solution. Any help is appreciated.
full message:
Error                                     Traceback (most recent call last)
in <cell line: 1>()
----> 1 trainer.train()

7 frames
/usr/local/lib/python3.10/dist-packages/wandb/sdk/lib/preinit.py in preinit_wrapper(*args, **kwargs)
     34 ) -> Callable:
     35     def preinit_wrapper(*args: Any, **kwargs: Any) -> Any:
---> 36         raise wandb.Error(f"You must call wandb.init() before {name}()")
     37
     38     preinit_wrapper.__name__ = str(name)
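In case it helps anyone hitting the same error: the Trainer auto-detects an installed wandb package and tries to log to it. A common workaround, if you don't actually want W&B logging, is to disable the integration before building the Trainer (sketch below; `WANDB_DISABLED` is the environment variable the transformers integration checks, and `report_to="none"` is the TrainingArguments option in recent versions):

```python
import os

# Disable the W&B integration so trainer.train() never calls wandb.log().
os.environ["WANDB_DISABLED"] = "true"

# Alternatively, on recent transformers versions, opt out per-run instead
# (requires transformers installed; shown here as a sketch):
# from transformers import TrainingArguments
# training_args = TrainingArguments("test-trainer", report_to="none")

print(os.environ["WANDB_DISABLED"])
```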
Hi NLP Course Team,
I’ve been exploring the fine-tuning content in your NLP course and creating educational materials in this space. The PEFT library tutorial is very comprehensive on the implementation side. While working through the material, I noticed an opportunity to deepen the theoretical foundations by explaining the ‘why’ behind the ‘what’ and ‘how’ of these methods:
- Deriving PEFT methods from first principles
- Mathematical intuition behind different approaches when applicable
- Trade-offs between various fine-tuning methods
- Building foundations for innovation in PEFT
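As one concrete illustration of the kind of intuition I have in mind (a sketch of my own, with assumed layer dimensions, not material from the course): LoRA freezes a weight matrix W and trains only a low-rank update B·A, which is why the trainable-parameter count drops so sharply.

```python
# LoRA parameter-count intuition: train B (d×r) and A (r×k) instead of W (d×k).
d, k, r = 768, 768, 8  # assumed hidden dims and a typical small rank, r << min(d, k)

full_params = d * k          # full fine-tuning of this layer
lora_params = d * r + r * k  # low-rank factors only

print(full_params, lora_params, full_params // lora_params)  # → 589824 12288 48
```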
I’ve been writing about these topics on LinkedIn and creating tutorials that bridge theory with implementation. Understanding the underlying principles can help us not just implement existing methods, but also innovate and develop our own approaches.
I strongly believe in the power of knowledge sharing and collaboration, and my goal is to be part of something greater than myself—building tools and fostering communities that inspire others to learn, innovate, and create.
Looking forward to your thoughts!