Hello all, I have a Dataset object, train_ds.
Printing it gives:
Dataset({
    features: ['filepath', 'class', 'fold'],
    num_rows: 6810
})
When I map a simple preprocess function over it, this works correctly:
def preprocess_function(examples):
    examples['newclass'] = examples['class']
    return examples

train_dataset = train_ds.map(
    preprocess_function,
    batch_size=100,
    batched=True,
    num_proc=4,
    load_from_cache_file=False
)
Dataset({
    features: ['filepath', 'class', 'fold', 'newclass'],
    num_rows: 6810
})
However, it seems I cannot use anything defined outside preprocess_function, or something else is going wrong. For example, this version fails:
def preprocess_function(examples):
    examples['audio'] = [torchaudio.load(path) for path in examples['filepath']]
    return examples
I get the same error even if I define my own function and call it inside preprocess_function. It’s as if preprocess_function “forgets” all other variables and functions in the notebook when it runs.
def test_function(path):
    return path

# This next function should give me a new column named "audio" containing the file paths as text
def preprocess_function(examples):
    examples['audio'] = [test_function(path) for path in examples['filepath']]
    return examples

train_dataset = train_ds.map(
    preprocess_function,
    batch_size=100,
    batched=True,
    num_proc=4,
    load_from_cache_file=False
)
769 return self._value
770 else:
--> 771 raise self._value
NameError: name 'test_function' is not defined
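For anyone trying to reproduce or work around this, here is a minimal sketch of a self-contained variant. It rests on an unconfirmed assumption: that the worker processes started by num_proc=4 do not see names defined in the notebook's global scope. Under that assumption, defining the helper inside preprocess_function means the function no longer depends on any notebook global. The sample batch below is made up for illustration and only mimics the column-to-list layout that a batched map passes in.

```python
def preprocess_function(examples):
    # Helper defined inside the function body, so no lookup of a
    # notebook-level name is needed when this runs in another process.
    def test_function(path):
        return path

    examples["audio"] = [test_function(path) for path in examples["filepath"]]
    return examples


# Hypothetical batch with the same columns as train_ds (values invented).
batch = {"filepath": ["a.wav", "b.wav"], "class": [0, 1], "fold": [1, 1]}
result = preprocess_function(batch)
print(result["audio"])

# With the real dataset, the call would stay the same as in the post:
# train_dataset = train_ds.map(preprocess_function, batch_size=100,
#                              batched=True, num_proc=4,
#                              load_from_cache_file=False)
```

The same idea might apply to the torchaudio version by placing the import statement inside preprocess_function. Running the original map without num_proc may also help confirm whether multiprocessing is involved at all.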
Can someone help me identify what’s going on? I am using this as an example but working in VSCode: