Hi, as the title states, I'm using batch transform and hadn't hit any errors with smaller batch sizes, but when I pass a larger set of 1000 texts I get:

DefaultCPUAllocator: can't allocate memory: you tried to allocate 12582912000 bytes. Error code 12 (Cannot allocate memory) : 400
I'm using a custom BERT-base model; here's my inference.py:
import csv
import json
from io import StringIO

import torch

def predict_fn(data, model_tokenizer):
    model, tokenizer = model_tokenizer
    print("LENGTH OF DATA: ", len(data))  # This prints 1000, i.e. the whole batch was passed in at once
    # Every record is padded to the full 512 tokens regardless of its actual length
    encoded_inputs = tokenizer(data, padding='max_length', max_length=512, truncation=True, return_tensors="pt")
    with torch.no_grad():
        output = model(encoded_inputs['input_ids'], encoded_inputs['attention_mask'])
    return {"output": output.tolist()}
def input_fn(input_data: str, content_type):
    if content_type == 'text/csv':
        stream = StringIO(input_data)
        request_list = list(csv.DictReader(stream))
        return {"inputs": [entry['inputs'] for entry in request_list]}
    elif content_type == 'application/json':
        # One JSON object per line; the trailing newline leaves an empty final element, hence the [:-1]
        return [json.loads(line)['inputs'] for line in input_data.split(sep='\n')[:-1]]
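For reference, each line of the input .jsonl file looks like this (the ad text here is invented, but the 'inputs' key is what input_fn reads):

{"inputs": "Some ad text to classify..."}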
And here's the code I'm using to set up the transform job. I've tried manually setting max_concurrent_transforms to a range of values, but even the default hits the same problem.
batch_job = self.model.transformer(
    instance_count=instance_count,  # 1
    instance_type=instance_type,    # ml.g4dn.xlarge
    output_path=s3_path_join('s3://', self.bucket, 'output'),
    strategy='MultiRecord',
    assemble_with='Line')

batch_job.transform(
    data=s3_path_join('s3://', self.bucket, 'input', f'uncat_{self.model_type}_ads.jsonl'),
    content_type='application/json',
    split_type='Line')
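For concreteness, a typical override attempt looked roughly like this, with the exact values varying between runs (the numbers shown are illustrative, and max_payload, in MB, is the other knob I poked at):

batch_job = self.model.transformer(
    instance_count=instance_count,
    instance_type=instance_type,
    output_path=s3_path_join('s3://', self.bucket, 'output'),
    strategy='MultiRecord',
    assemble_with='Line',
    max_concurrent_transforms=1,  # one of several values tried
    max_payload=1)                # cap each request's payload at 1 MB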
Any thoughts? If I'm doing the math right, the failed 12582912000-byte allocation is exactly 1000 × 12 × 512 × 512 × 4 bytes, i.e. the fp32 self-attention scores for all 1000 records at max_length across BERT-base's 12 heads, which lines up with the print statement showing the whole batch arriving in one call. My fallback is to chunk the records manually inside predict_fn, sketched below, but that seems like a poor workaround.
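Something like this is what I have in mind. The chunk size is arbitrary, I switched padding to 'longest' so each chunk only pads to its longest text, and torch.cat assumes the model returns a plain tensor (mine does):

def predict_fn(data, model_tokenizer):
    model, tokenizer = model_tokenizer
    outputs = []
    chunk_size = 32  # arbitrary; small enough that one forward pass fits in memory
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        # 'longest' pads only to the longest text in this chunk instead of always 512
        encoded = tokenizer(chunk, padding='longest', max_length=512, truncation=True, return_tensors="pt")
        with torch.no_grad():
            outputs.append(model(encoded['input_ids'], encoded['attention_mask']))
    return {"output": torch.cat(outputs).tolist()}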