From Crypto Mining to LLM Fine-tuning: Unlocking Large Language Model Fine-tuning through Collaborative Compute Pools

I would like to initiate a discussion on the concept of collaborative computing pools for LLM fine-tuning. Imagine a world where anyone, not just tech giants with supercomputers, can contribute to cutting-edge AI research. Collaborative computing pools bring that vision closer to reality. Inspired by mining pools in the cryptocurrency world, these pools would aggregate individual computing resources to tackle the immense computational demands of fine-tuning LLMs.

Why is this necessary? While the latest advancements allow fine-tuning on consumer GPUs, their limited memory (typically 6-8 GB) makes full fine-tuning of even the smallest open-source LLMs, such as Llama 7B, impractical: the fp16 weights of a 7B model alone take roughly 14 GB, before gradients and optimizer states are even considered (a rough estimate is sketched below). Pooling resources unlocks the potential to fine-tune even larger models with tens of billions of parameters, democratizing access to LLM development.
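To give a sense of the gap, here is a back-of-the-envelope memory estimate, a minimal sketch assuming fp16 weights and gradients plus fp32 Adam optimizer states; it deliberately ignores activations, which add more on top and depend on batch size and sequence length:

```python
# Back-of-the-envelope GPU memory estimate for full fine-tuning with Adam.
# Assumes fp16 weights/gradients and fp32 optimizer states (master weights,
# momentum, variance); activations are ignored and would add more on top.
def full_finetune_memory_gb(n_params: float) -> float:
    weights   = 2 * n_params    # fp16 weights, 2 bytes per parameter
    gradients = 2 * n_params    # fp16 gradients
    optimizer = 12 * n_params   # fp32 master weights + Adam m and v
    return (weights + gradients + optimizer) / 1e9

print(f"7B model: ~{full_finetune_memory_gb(7e9):.0f} GB before activations")
# -> roughly 112 GB, far beyond a single 6-8 GB consumer GPU
```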

This aligns perfectly with the open-source ethos shaping the LLM landscape. Just as open-source data, models, and knowledge have fueled rapid progress, this kind of open-source compute could be the next game-changer. Individual contributions converge into a shared resource hub, enabling users to tap into a vast compute reservoir for LLM fine-tuning.

Technically, this hinges on model parallelism (splitting the LLM across multiple devices), distributed training, and the communication and synchronization between participants. DeepSpeed and Megatron-LM are potential libraries for facilitating this, and data parallelism can be employed to scale the training process further. A minimal illustration of the model-parallel idea follows.
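As a rough sketch of what "splitting the LLM across multiple devices" means in practice, here is naive layer-wise model parallelism in plain PyTorch. It assumes two CUDA devices are available; the layer sizes and the two-stage split are illustrative only, not how DeepSpeed or Megatron-LM actually partition a model:

```python
# Minimal sketch of naive (layer-wise) model parallelism in PyTorch.
# Assumes two CUDA devices; layer sizes are illustrative placeholders.
import torch
import torch.nn as nn

class TwoStageModel(nn.Module):
    def __init__(self, hidden=4096):
        super().__init__()
        # First half of the layers lives on GPU 0, second half on GPU 1.
        self.stage0 = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU()).to("cuda:0")
        self.stage1 = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU()).to("cuda:1")

    def forward(self, x):
        x = self.stage0(x.to("cuda:0"))
        # Activations are transferred between devices explicitly.
        x = self.stage1(x.to("cuda:1"))
        return x

if __name__ == "__main__":
    model = TwoStageModel()
    batch = torch.randn(8, 4096)
    out = model(batch)               # output ends up on cuda:1
    print(out.shape, out.device)
```

In a real pool, each "device" could be a different participant's machine, which is where the communication and synchronization costs become the dominant concern.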

The pool could implement a voting system: users propose diverse methods for model training, the community votes on the most promising approaches, and the shared resources are then allocated accordingly. This fosters knowledge sharing and research collaboration, and lowers the barrier to entry for newcomers.

I would like to hear the community's thoughts on the feasibility, potential issues, and challenges of this concept. I am particularly interested in discussing specific model parallelism or communication frameworks that would be well-suited for its implementation.


I think it's a great idea.
I have thought about these things myself.
I was wondering whether distributed support for this could be built into, for example, PyTorch.

I would love to work on the PyTorch C++ code if that approach turns out to be viable.
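For reference, PyTorch already ships distributed primitives in `torch.distributed`. A minimal data-parallel sketch, assuming it is launched with `torchrun` (which sets the rank/world-size environment variables) and using a placeholder model and dummy loss rather than a real LLM, could look like this:

```python
# Minimal DistributedDataParallel sketch; run with:
#   torchrun --nproc_per_node=2 ddp_sketch.py
# The model and loss are placeholders, not a real LLM.
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="gloo")    # "nccl" on GPU nodes
    rank = dist.get_rank()

    model = nn.Linear(512, 512)                # stand-in for a real model
    ddp_model = DDP(model)
    optimizer = torch.optim.AdamW(ddp_model.parameters(), lr=1e-4)

    x = torch.randn(16, 512)
    loss = ddp_model(x).pow(2).mean()          # dummy loss
    loss.backward()                            # gradients are all-reduced here
    optimizer.step()

    if rank == 0:
        print("step done, loss:", loss.item())
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

The open question for a pool is extending this beyond a single trusted cluster: unreliable nodes, slow home internet links, and verification of contributed work.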


I would like to continue the discussion with a related question: are there any specialists here who know how to effectively combine the computing power of mining rigs? I am particularly interested in the challenges involved and how they might be addressed.

For instance, many people still have old rigs with 10 NVIDIA GTX 1060 graphics cards assembled in one system. Each of these cards has significant resources: over a thousand CUDA cores and up to 6 GB of VRAM. Some people run such computations on the CPU, but graphics cards offer far more parallel processing cores.

As far as I understand, high bandwidth between the graphics cards is not required for such tasks. The main computation can be performed within each card, and only the results need to be aggregated at the end. In the case of diffusion models, for example, the final result is just a small image of a few hundred kilobytes at most.

The question is: can these tasks be effectively parallelized across multiple graphics cards to use their combined power? Perhaps some of you have already dealt with similar tasks and can suggest existing approaches or potential bottlenecks?
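For the embarrassingly parallel case described above, where each card works on its own job and only tiny results are gathered, a minimal sketch with `torch.multiprocessing` might look like the following. The matmul is a placeholder standing in for real per-GPU inference, and it assumes a machine where `torch.cuda.device_count()` is at least one:

```python
# Sketch of "embarrassingly parallel" use of a multi-GPU rig:
# each GPU runs independent jobs and only small scalar results are gathered.
# The matmul is a placeholder for real per-GPU inference work.
import torch
import torch.multiprocessing as mp

def worker(gpu_id, task_queue, result_queue):
    device = torch.device(f"cuda:{gpu_id}")
    while True:
        task = task_queue.get()
        if task is None:                       # poison pill: no more work
            break
        x = torch.randn(task, task, device=device)
        result = (x @ x).mean().item()         # tiny result to aggregate
        result_queue.put((gpu_id, task, result))

if __name__ == "__main__":
    mp.set_start_method("spawn")
    n_gpus = torch.cuda.device_count()
    task_queue, result_queue = mp.Queue(), mp.Queue()

    tasks = [256, 512, 1024, 2048]             # placeholder job sizes
    for t in tasks:
        task_queue.put(t)
    for _ in range(n_gpus):
        task_queue.put(None)

    procs = [mp.Process(target=worker, args=(i, task_queue, result_queue))
             for i in range(n_gpus)]
    for p in procs:
        p.start()
    for _ in tasks:
        print(result_queue.get())              # only small results cross GPUs
    for p in procs:
        p.join()
```

The likely bottleneck on old mining rigs is not this kind of per-card throughput but the PCIe x1 riser links and limited per-card VRAM, which matter as soon as a single job no longer fits on one card.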

I would be glad to hear your ideas, suggestions, or links to useful resources.