Hi,
While training a BERT model for masked language modelling, I am getting a CUDA OOM error. I cannot train for more than one batch, even with a batch size of 1. I used torch.cuda.memory_summary()
to monitor GPU usage between batches. Below are my code and the memory output.
I want to know why, despite deleting the variables, the current memory usage is still so high. Are there other variables still attached to the computational graph? How can I free them up?
Code:
for epoch in range(epochs):
    loop = tqdm(loader, leave=True)
    for batch in loop:
        print("Memory before:")
        print(torch.cuda.memory_summary(0))
        optim.zero_grad()
        # build the masked inputs and move everything to the GPU
        ids = apply_mask(torch.stack(batch.input_ids).t())
        input_ids = ids.to(device)
        attention_mask = torch.stack(batch.attention_mask).t().to(device)
        labels = batch.labels.to(device)
        print("data loaded")
        outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
        print("output")
        loss = outputs.loss
        logits = outputs.logits
        loss.backward(retain_graph=False)
        print("backprop")
        optim.step()
        # explicitly drop every tensor created in this iteration
        del loss, logits, outputs, input_ids, attention_mask, labels, ids
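For reference, this is the extra cleanup I have been considering adding at the end of each iteration. It is only a minimal sketch: the gc.collect() and torch.cuda.empty_cache() calls are my assumption about what might release the cached memory, and I have not confirmed they fix the growth.

import gc

        # ...same loop as above, replacing the final del line with:
        del loss, logits, outputs, input_ids, attention_mask, labels, ids
        gc.collect()                          # drop any lingering Python references
        torch.cuda.empty_cache()              # return cached, unused blocks to the GPU
        print(torch.cuda.memory_summary(0))   # re-check usage after cleanup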
Memory summary output:
Memory before:
|===========================================================================|
| PyTorch CUDA memory summary, device ID 0 |
|---------------------------------------------------------------------------|
| CUDA OOMs: 0 | cudaMalloc retries: 0 |
|===========================================================================|
| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed |
|---------------------------------------------------------------------------|
| Allocated memory | 1479 MB | 1479 MB | 1479 MB | 0 B |
| from large pool | 1477 MB | 1477 MB | 1477 MB | 0 B |
| from small pool | 2 MB | 2 MB | 2 MB | 0 B |
|---------------------------------------------------------------------------|
| Active memory | 1479 MB | 1479 MB | 1479 MB | 0 B |
| from large pool | 1477 MB | 1477 MB | 1477 MB | 0 B |
| from small pool | 2 MB | 2 MB | 2 MB | 0 B |
|---------------------------------------------------------------------------|
| GPU reserved memory | 1492 MB | 1492 MB | 1492 MB | 0 B |
| from large pool | 1488 MB | 1488 MB | 1488 MB | 0 B |
| from small pool | 4 MB | 4 MB | 4 MB | 0 B |
|---------------------------------------------------------------------------|
| Non-releasable memory | 12591 KB | 22137 KB | 43513 KB | 30922 KB |
| from large pool | 10978 KB | 21346 KB | 39490 KB | 28512 KB |
| from small pool | 1612 KB | 2021 KB | 4023 KB | 2410 KB |
|---------------------------------------------------------------------------|
| Allocations | 204 | 204 | 204 | 0 |
| from large pool | 26 | 26 | 26 | 0 |
| from small pool | 178 | 178 | 178 | 0 |
|---------------------------------------------------------------------------|
| Active allocs | 204 | 204 | 204 | 0 |
| from large pool | 26 | 26 | 26 | 0 |
| from small pool | 178 | 178 | 178 | 0 |
|---------------------------------------------------------------------------|
| GPU reserved segments | 6 | 6 | 6 | 0 |
| from large pool | 4 | 4 | 4 | 0 |
| from small pool | 2 | 2 | 2 | 0 |
|---------------------------------------------------------------------------|
| Non-releasable allocs | 5 | 5 | 5 | 0 |
| from large pool | 3 | 3 | 3 | 0 |
| from small pool | 2 | 2 | 2 | 0 |
|---------------------------------------------------------------------------|
| Oversize allocations | 0 | 0 | 0 | 0 |
|---------------------------------------------------------------------------|
| Oversize GPU segments | 0 | 0 | 0 | 0 |
|===========================================================================|
data loaded
output
backprop
Memory before:
|===========================================================================|
| PyTorch CUDA memory summary, device ID 0 |
|---------------------------------------------------------------------------|
| CUDA OOMs: 0 | cudaMalloc retries: 0 |
|===========================================================================|
| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed |
|---------------------------------------------------------------------------|
| Allocated memory | 5916 MB | 8107 MB | 13730 MB | 7814 MB |
| from large pool | 5906 MB | 8102 MB | 13649 MB | 7742 MB |
| from small pool | 9 MB | 29 MB | 81 MB | 71 MB |
|---------------------------------------------------------------------------|
| Active memory | 5916 MB | 8107 MB | 13730 MB | 7814 MB |
| from large pool | 5906 MB | 8102 MB | 13649 MB | 7742 MB |
| from small pool | 9 MB | 29 MB | 81 MB | 71 MB |
|---------------------------------------------------------------------------|
| GPU reserved memory | 10654 MB | 10654 MB | 10654 MB | 0 B |
| from large pool | 10624 MB | 10624 MB | 10624 MB | 0 B |
| from small pool | 30 MB | 30 MB | 30 MB | 0 B |
|---------------------------------------------------------------------------|
| Non-releasable memory | 767367 KB | 850 MB | 1021 MB | 278738 KB |
| from large pool | 750674 KB | 848 MB | 925 MB | 197055 KB |
| from small pool | 16693 KB | 21 MB | 96 MB | 81683 KB |
|---------------------------------------------------------------------------|
| Allocations | 810 | 817 | 1671 | 861 |
| from large pool | 104 | 106 | 136 | 32 |
| from small pool | 706 | 712 | 1535 | 829 |
|---------------------------------------------------------------------------|
| Active allocs | 810 | 817 | 1671 | 861 |
| from large pool | 104 | 106 | 136 | 32 |
| from small pool | 706 | 712 | 1535 | 829 |
|---------------------------------------------------------------------------|
| GPU reserved segments | 27 | 27 | 27 | 0 |
| from large pool | 12 | 12 | 12 | 0 |
| from small pool | 15 | 15 | 15 | 0 |
|---------------------------------------------------------------------------|
| Non-releasable allocs | 26 | 48 | 404 | 378 |
| from large pool | 7 | 8 | 10 | 3 |
| from small pool | 19 | 42 | 394 | 375 |
|---------------------------------------------------------------------------|
| Oversize allocations | 0 | 0 | 0 | 0 |
|---------------------------------------------------------------------------|
| Oversize GPU segments | 0 | 0 | 0 | 0 |
|===========================================================================|