HF Dataset as a Replay Buffer for RL applications

I am interested in using Hugging Face models & datasets for a reinforcement learning use case. For my purposes, I would need to implement a replay buffer.

I considered using HF Datasets because it (1) couples easily with HF models and (2) offers efficient zero-copy reads by memory-mapping the whole dataset. However, I do not see any functionality for (efficiently) augmenting the dataset. Does this functionality exist?

Additionally, I need the other standard replay buffer functionality: sampling based on priorities, unloading the buffer, etc.

Do you think I should customize HF Datasets for my use case, or would it be better to couple some other replay buffer (e.g. RLlib, Stable Baselines) with HF models?

Thanks in advance. cc the HF RL team: @ThomasSimonini @edbeeching @natolambert @lvwerra

Hi Blazej – I agree. Are there any structural blockers to using datasets for this? I guess the challenge is how to implement the FIFO/LIFO nature of a replay buffer. I wonder if it’s interesting to just keep all of the data and have a wrapper that exposes only the N most recent items.
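Roughly, a minimal sketch of that wrapper idea, assuming a toy transition schema (the `obs`/`action`/`reward` column names here are made up for illustration):

```python
from datasets import Dataset

# toy transitions; column names are illustrative, not a fixed schema
buffer = Dataset.from_dict({
    "obs": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]],
    "action": [0, 1, 0, 1],
    "reward": [1.0, 0.0, 0.5, 2.0],
})

# "keep the N most recent" as a view over the full dataset:
# nothing is deleted, select() just builds an indices mapping
# over the memory-mapped data
N = 2
window = buffer.select(range(max(0, len(buffer) - N), len(buffer)))
print(window["reward"])  # [0.5, 2.0]
```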

What do you mean by augmenting?

I think there are some discussions around this with another collaborator; let me follow up internally on this too.

Hi Nathan, thanks for the response.
Indeed, FIFO/LIFO sampling and removal functionality is something that I need. Additionally, sampling proportional to an item’s priority is desired – something like the sketch below. Would this be possible with Datasets while retaining their efficiency?
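
For concreteness, here is a rough sketch of the priority-proportional sampling I have in mind, with priorities kept outside the Dataset (all names here are made up):

```python
import numpy as np
from datasets import Dataset

buffer = Dataset.from_dict({
    "obs": [[0.1], [0.2], [0.3], [0.4]],
    "reward": [1.0, 0.0, 0.5, 2.0],
})

# one priority per row, maintained outside the Dataset
# (e.g. absolute TD errors in prioritized experience replay)
priorities = np.array([0.5, 1.0, 0.1, 2.0])
probs = priorities / priorities.sum()

# draw a batch of row indices proportionally to priority,
# then materialize only the sampled rows
batch_idx = np.random.choice(len(buffer), size=2, p=probs)
batch = buffer.select(batch_idx.tolist())
```

The open question is whether updating the priorities and removing old rows can be made as efficient as the reads.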

By augmenting the dataset I mean adding new items to the buffer (in an efficient manner), along the lines of the sketch below.
Having such functionality would definitely push HF forward as a place for RL experiments.
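
As far as I can tell, the closest existing APIs are `add_item` and `concatenate_datasets`, both of which return a new Dataset rather than appending in place, so I am not sure how efficient they are inside a hot training loop (the schema below is again made up):

```python
from datasets import Dataset, concatenate_datasets

buffer = Dataset.from_dict({"obs": [[0.1]], "reward": [1.0]})

# add a single transition; add_item returns a *new* Dataset object
buffer = buffer.add_item({"obs": [0.2], "reward": 0.0})

# add a whole batch of transitions at once
new_rows = Dataset.from_dict({"obs": [[0.3], [0.4]], "reward": [0.5, 2.0]})
buffer = concatenate_datasets([buffer, new_rows])

print(len(buffer))  # 4
```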

Yeah, that makes sense. I’ve shared it with the RL & dataset teams.

Thanks. Looking forward to hearing back from you!

Hi Nathan! Any updates on the matter from the RL or Dataset team?

Not really, sadly. I’ve been mostly working on non-RL things, but there’s a note of this internally. Hopefully more can get built on it soon.