[Announcement] Switching our git infra to Gitaly

[This is a copy of a message on our internal Slack, re-shared here to solicit feedback from the community. Please ping us with any feedback or questions!]

We are hard at work with the Infra and Hub backend teams to migrate our git storage to a new, more scalable system.

The number of repos hosted on hf.co has grown considerably (now 53k models, 9k datasets, 3k spaces). More and more people also rely on the Hub for production use cases (for instance, 30 million loads of pretrained models from Python libraries per day), so we need to move from a single-point-of-failure system to a distributed, highly available one (for instance, we want to minimize the risk of losing user repos…).

We are going to deploy Gitaly, an infra package by GitLab, implemented in Go, which proxies calls to git repos over the network (gRPC calls) instead of running them against a local filesystem. To my knowledge, GitLab and GitHub are pretty much the only two other companies that handle git repos at scale, so it’s good to reuse their open source work instead of rebuilding from scratch. (We have a collaboration channel with the Gitaly team over at GitLab :fire:)
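To make the idea a bit more concrete, here is a toy sketch of "git calls over the network instead of on a local filesystem". It is not Gitaly's actual API: `GitBackend`, `LocalBackend`, `RemoteBackend` and `show_refs` are hypothetical names made up for illustration, the repo names are placeholders, and the network hop is simulated by a plain method call rather than real gRPC.

```python
from typing import Dict, List, Protocol


class GitBackend(Protocol):
    """Hypothetical interface the application codes against."""

    def list_refs(self, repo: str) -> List[str]: ...


class LocalBackend:
    """Stands in for git storage on one machine (a dict instead of a disk)."""

    def __init__(self, repos: Dict[str, List[str]]) -> None:
        self._repos = repos

    def list_refs(self, repo: str) -> List[str]:
        return sorted(self._repos[repo])


class RemoteBackend:
    """Stands in for an RPC proxy: routes each call to the node holding the repo.

    Assumption: a real deployment would make a network call here; this sketch
    just forwards to an in-process object.
    """

    def __init__(self, nodes: Dict[str, LocalBackend], routing: Dict[str, str]) -> None:
        self._nodes = nodes
        self._routing = routing

    def list_refs(self, repo: str) -> List[str]:
        node = self._nodes[self._routing[repo]]
        return node.list_refs(repo)


def show_refs(backend: GitBackend, repo: str) -> List[str]:
    # The caller does not care whether storage is local or remote.
    return backend.list_refs(repo)


# Two hypothetical storage nodes, each holding a different repo.
node_a = LocalBackend({"user/model": ["refs/heads/main", "refs/heads/dev"]})
node_b = LocalBackend({"user/dataset": ["refs/heads/main"]})
remote = RemoteBackend(
    nodes={"a": node_a, "b": node_b},
    routing={"user/model": "a", "user/dataset": "b"},
)
```

The point of the sketch is only that the calling code stays the same whichever backend it talks to, which is why a migration like this can be invisible from the outside.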

The migration should be transparent to the outside world, but let us know if you have any questions. We’re going to be hard at work and focused on this task for the next ~week or so, so we wanted to mention it for everyone to be aware of the context. The migration is primarily worked on by @XciD @anthony Eliott Coyac @pierric @sbrandeis :tada: with assists from the larger Infra and Hub teams.

Ping us with any questions!


also thanks to @coyotte508 !

and… it’s now live!!

(since last week actually)
