Accelerate: video encoding across GPUs fails with CUDA out of memory

I am running inference on multiple GPUs, but first I have to encode the input video.

Below is the code I use for that:

import ffmpeg  # from the ffmpeg-python package
import numpy as np
import torch
from accelerate import Accelerator

# set up accelerator to use multiple GPUs
accelerator = Accelerator()
accelerator.state.num_processes = 2
accelerator.state.distributed_type = "MULTI_GPU"
device = accelerator.device

# get video stream
probe = ffmpeg.probe(args.video_example)
video_stream = next(
    (stream for stream in probe["streams"] if stream["codec_type"] == "video"), None
)

# resize to have smaller dimension = 224, but maintain aspect ratio
width = int(video_stream["width"])
height = int(video_stream["height"])
# avg_frame_rate is a fraction string such as "30000/1001"
num, denom = video_stream["avg_frame_rate"].split("/")
frame_rate = int(num) / int(denom)
if height >= width:
    h, w = int(height * 224 / width), 224
else:
    h, w = 224, int(width * 224 / height)
assert frame_rate >= 1

cmd = ffmpeg.input(args.video_example).filter("fps", fps=1).filter("scale", w, h)
x = int((w - 224) / 2.0)
y = int((h - 224) / 2.0)
cmd = cmd.crop(x, y, 224, 224)
out, _ = cmd.output("pipe:", format="rawvideo", pix_fmt="rgb24").run(
    capture_stdout=True, quiet=True
)

# preprocess video and shift preprocessed frames to GPU
h, w = 224, 224
video = np.frombuffer(out, np.uint8).reshape([-1, h, w, 3])
video = torch.from_numpy(video.astype("float32"))
video = video.permute(0, 3, 1, 2)
video = video.squeeze()
video = preprocess(video)
with torch.no_grad():
    video = backbone.encode_image(video.to(device))
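To make the resize and crop arithmetic in the code above easier to check, here is a minimal standalone sketch of the same calculation. The function name `resize_and_crop_box` and the 1920x1080 input are hypothetical and only for illustration; the formulas are the ones from my code.

```python
def resize_and_crop_box(width, height, target=224):
    """Return (w, h, x, y): the scaled size and the top-left corner of the center crop."""
    if height >= width:
        # portrait or square: width becomes `target`, height scales with it
        h, w = int(height * target / width), target
    else:
        # landscape: height becomes `target`, width scales with it
        h, w = target, int(width * target / height)
    x = int((w - target) / 2.0)
    y = int((h - target) / 2.0)
    return w, h, x, y


# Hypothetical 1920x1080 (landscape) input: the shorter side becomes 224.
print(resize_and_crop_box(1920, 1080))  # (398, 224, 87, 0)
```

So every frame that reaches the model is 224x224, and the tensor passed to `encode_image` has one entry per second of video because of the `fps=1` filter.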

But I get a CUDA out-of-memory error, while one of the GPUs is completely unused.

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.67 GiB (GPU 0; 14.76 GiB total capacity; 9.96 GiB already allocated; 1.90 GiB free; 11.87 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

According to the stack trace, the error originates at:

with torch.no_grad():
    video = backbone.encode_image(video.to(device))
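Note that this call receives all frames of the video at once. To show the kind of split I have in mind, here is a minimal sketch in plain Python that assigns a contiguous range of frame indices to each process and then cuts that range into fixed-size batches. The helper names `shard_range` and `batches`, the batch size of 16, and the frame count of 100 are all hypothetical and not part of my code. I am assuming that in a real run the rank and world size would come from `accelerator.process_index` and `accelerator.num_processes`.

```python
def shard_range(num_frames, rank, world_size):
    """Contiguous [start, stop) range of frame indices for one process."""
    per_rank = -(-num_frames // world_size)  # ceiling division
    start = min(rank * per_rank, num_frames)
    stop = min(start + per_rank, num_frames)
    return start, stop


def batches(start, stop, batch_size):
    """Split [start, stop) into consecutive chunks of at most batch_size frames."""
    return [(i, min(i + batch_size, stop)) for i in range(start, stop, batch_size)]


# Hypothetical numbers: 100 frames, 2 processes, 16 frames per forward pass.
for rank in range(2):
    start, stop = shard_range(100, rank, 2)
    print(rank, batches(start, stop, 16))
```

Each `(a, b)` pair would then correspond to one smaller forward pass over `video[a:b]` instead of a single pass over the whole tensor.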

Am I doing something wrong, or is it not possible to distribute the encoding across GPUs?