I am trying to get accelerate working on a video task and I am running into problems with processes getting stuck.
Here’s a brief summary of my problem: I have multiple directories, each containing up to a thousand image frames. Because loading all images for a batch of videos at once is impossible due to memory constraints, I iteratively encode each video with a ResNet and feed the cached embeddings to a sequence model. I also want to fine-tune the encoder, so precomputing the embeddings is not an option.
My thinking goes like this:
1. Get a list of paths to all video directories.
2. Distribute subsets of the paths evenly among all available GPUs.
3. On each GPU, sequentially loop over its subset of paths and:
   3.1 for each video directory, create a dataset and dataloader,
   3.2 iteratively encode batches from this loader with a partially frozen ResNet and store the results in a cache,
   3.3 finally, aggregate the caches for a given batch, pad all image sequences to the same length, and feed the resulting batch to a sequence model.
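To make step 2 concrete, here is the slicing logic in isolation (with dummy directory names instead of real paths); note that the floor division silently drops any remainder, so the last `len(paths) % num_processes` directories are never assigned to any process:

```python
def split_for_process(paths, process_index, num_processes):
    """Give each process a contiguous, even slice of the path list.

    With floor division, len(paths) % num_processes trailing paths
    are dropped, so all processes get slices of equal length."""
    split_size = len(paths) // num_processes
    return paths[process_index * split_size:(process_index + 1) * split_size]


dirs = [f'video_{i:03d}' for i in range(5)]
print(split_for_process(dirs, 0, 2))  # ['video_000', 'video_001']
print(split_for_process(dirs, 1, 2))  # ['video_002', 'video_003']
# 'video_004' is dropped entirely
```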
Here’s the code that I use:
```python
import torch
from torchvision.models import resnet18
from accelerate import Accelerator
from torch.utils.data import Dataset, DataLoader
from pathlib import Path
from torchvision import transforms
from tqdm import tqdm
from PIL import Image


def chunker(seq, size):
    """Chunk the given sequence into batches of the given size."""
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))


class ImageDataset(Dataset):
    def __init__(self, img_dir):
        super().__init__()
        self.img_paths = list(img_dir.rglob('*.jpg'))
        self.transform = transforms.Compose([
            transforms.Resize(224),
            transforms.ToTensor(),
            transforms.Normalize(
                (0.485, 0.456, 0.406),
                (0.229, 0.224, 0.225)),
        ])

    def __getitem__(self, idx):
        path = self.img_paths[idx]
        img = self.transform(Image.open(path))
        return img

    def __len__(self):
        return len(self.img_paths)


def main():
    torch.multiprocessing.set_sharing_strategy('file_system')  # resolves "too many open files" error
    accelerator = Accelerator(fp16=True, device_placement=False)

    # partially freeze network
    model = resnet18(pretrained=False)
    for param in model.parameters():
        param.requires_grad = False
    model.fc = torch.nn.Linear(model.fc.in_features, 256)
    model.to(accelerator.device)
    model = accelerator.prepare_model(model)

    # 1. get a list of paths to all video directories
    data_path = Path('/path/to/data_root/')
    img_dirs = list(data_path.glob('*'))

    # 2. distribute subsets of the paths evenly among all available GPUs
    n_dirs = len(img_dirs)
    split_size = n_dirs // accelerator.num_processes
    img_dirs = img_dirs[accelerator.process_index * split_size:
                        (accelerator.process_index + 1) * split_size]

    # just use a chunk size of 1 (a single video per batch) for testing;
    # in practice we would use larger chunks
    img_dir_batch = list(chunker(img_dirs, 1))

    states = []  # container to collect outputs
    for (img_dir,) in img_dir_batch:
        # 3.1 create dataset and loader for the current video
        ds = ImageDataset(img_dir)
        dl = DataLoader(ds, batch_size=16, num_workers=1)

        # 3.2 iteratively encode and cache frames
        outs = []
        progress = tqdm(dl, disable=not accelerator.is_local_main_process)
        for img in progress:
            torch.cuda.empty_cache()  # free memory
            out = model(img.to(accelerator.device))
            outs.append(out)
        outs = torch.cat(outs, dim=0)
        states.append(outs)

    # 3.3 aggregate batch containing multiple videos:
    # here we would zero-pad `states` into a single batch so all sequences
    # have the same length, then feed the batch into a sequence model


if __name__ == "__main__":
    main()
    print('done.')
```
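For reference, the zero-padding in step 3.3 (not yet implemented above) would look roughly like this, using `torch.nn.utils.rnn.pad_sequence` with random tensors standing in for the per-video caches in `states`:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# illustrative stand-ins for the per-video embedding caches in `states`:
# three videos of 10, 7, and 3 frames, each frame encoded to 256 dims
states = [torch.randn(10, 256), torch.randn(7, 256), torch.randn(3, 256)]

# zero-pad to the longest sequence, giving shape (batch, max_len, feat)
batch = pad_sequence(states, batch_first=True)
print(batch.shape)  # torch.Size([3, 10, 256])
```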
This seems to work, and the first of the two processes finishes fine (i.e. it reaches the print statement at the end). The second process, however, completes the encoding stage and then just hangs indefinitely.
I am using a single machine with 2 GPUs.
Thank you all, any help is appreciated