From Crypto Mining to LLM Fine-tuning: Unlocking Large Language Model Fine-tuning through Collaborative Compute Pools

I would like to start a discussion on collaborative compute pools for LLM fine-tuning. Imagine a world where anyone, not just tech giants with supercomputers, can contribute to cutting-edge AI research. Inspired by mining pools in the cryptocurrency world, these pools would aggregate individual computing resources to tackle the immense computational demands of fine-tuning LLMs.

Why is this necessary? While recent advances allow some fine-tuning on consumer GPUs, their limited memory (typically 6-8 GB) is too small for full fine-tuning of even the smallest open-source LLMs, such as Llama 7B, whose weights alone occupy about 14 GB in 16-bit precision. Pooling resources unlocks the potential to fine-tune even larger models with tens of billions of parameters, democratizing access to LLM development.
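To put rough numbers on that memory gap, here is a back-of-the-envelope sketch. It assumes full fine-tuning with Adam in mixed precision at roughly 16 bytes per parameter (16-bit weights and gradients, plus 32-bit master weights and two Adam moments), ignores activations entirely, and assumes the state can be sharded perfectly across devices. The function names and byte counts are my own assumptions for illustration, not measurements.

```python
import math


def finetune_memory_gb(n_params, bytes_per_param=16):
    """Approximate memory (in GB, 10^9 bytes) for weights, gradients and
    optimizer state. Activations and framework overhead are NOT included."""
    return n_params * bytes_per_param / 1e9


def min_devices(n_params, device_gb, bytes_per_param=16):
    """Lower bound on the number of devices needed, assuming the training
    state is split evenly with no duplication or communication buffers."""
    return math.ceil(finetune_memory_gb(n_params, bytes_per_param) / device_gb)


if __name__ == "__main__":
    seven_b = 7_000_000_000
    # Weights only, 2 bytes per parameter
    print("weights only (GB):", finetune_memory_gb(seven_b, bytes_per_param=2))
    # Weights + gradients + Adam state, assumed 16 bytes per parameter
    print("full fine-tuning (GB):", finetune_memory_gb(seven_b))
    # Hypothetical pool of 8 GB consumer cards
    print("8 GB devices needed (at least):", min_devices(seven_b, device_gb=8))
```

Under these assumptions a 7B model needs on the order of a hundred gigabytes for full fine-tuning, so even an optimistic estimate calls for more than a dozen 8 GB cards. In practice the count would be higher once activations and communication buffers are included.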

This aligns perfectly with the open-source ethos shaping the LLM landscape. Just as open-source data, models, and knowledge have fueled rapid progress, this kind of open-source compute could be the next game-changer. Individual contributions converge into a shared resource hub, enabling users to tap into a vast compute reservoir for LLM fine-tuning.

Technically, this hinges on model parallelism (splitting the LLM across multiple devices), together with distributed training, inter-device communication, and synchronization. Libraries such as DeepSpeed and Megatron-LM could potentially facilitate this. Data parallelism, in which each device processes a different slice of the training data, can also be employed to further scale the training process.
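As a rough illustration of the "splitting the LLM across multiple devices" step, the sketch below assigns consecutive layers to pool members according to how much memory each one offers. This covers only the placement decision behind pipeline-style model parallelism. It does no training or communication, and it is not how DeepSpeed or Megatron-LM do it. The function name, the greedy strategy, and the example sizes are all assumptions made for illustration.

```python
def assign_layers(layer_gb, device_gb):
    """Greedily place consecutive layers on devices, in order.

    layer_gb:  memory needed by each layer, in GB (hypothetical figures)
    device_gb: memory offered by each pool member, in GB
    Returns one list of layer indices per device.
    Raises ValueError if the layers do not fit in the pool.
    """
    assignment = [[] for _ in device_gb]
    dev = 0
    free = device_gb[0] if device_gb else 0.0
    for i, size in enumerate(layer_gb):
        # Advance to the next device until this layer fits.
        while dev < len(device_gb) and size > free:
            dev += 1
            if dev < len(device_gb):
                free = device_gb[dev]
        if dev >= len(device_gb):
            raise ValueError(f"layer {i} does not fit in the remaining pool")
        assignment[dev].append(i)
        free -= size
    return assignment


if __name__ == "__main__":
    # Four hypothetical 3.5 GB blocks spread over three contributed GPUs.
    print(assign_layers([3.5, 3.5, 3.5, 3.5], [8, 8, 6]))
```

Keeping layers contiguous means each device only has to pass activations to its neighbor, which matters when pool members are connected over ordinary internet links rather than a datacenter interconnect. A real scheduler would also have to account for bandwidth, latency, and members dropping out.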

The pool could implement a voting system in which users propose different training methods and the community votes on the most promising approaches, thereby deciding how the shared resources are used. This fosters knowledge sharing and research collaboration, and lowers the barrier to entry for newcomers.

I am interested in hearing the thoughts and insights of the community on the feasibility, potential issues, and challenges of this concept. I am particularly interested in discussing any specific model parallelism or communication frameworks that would be well-suited for its implementation.


I think it's a great idea.
I have thought about these things myself.
I was wondering if perhaps distributed support could be built into, for example, PyTorch.

I would love to work with the PyTorch C++ code if that approach could be viable.