google/flan-ul2

How can we optimize the hardware requirements when running this model?

I’m training this model for a chatbot using LangChain on a dominolab GPU, but I’d like to know how to run it locally without requiring extremely expensive hardware.

Optimizing the cost

Hi,

Hugging Face provides the 🤗 Optimum library to optimize HF models. This includes things like ONNX export (ONNX is an efficient format for storing and running neural networks) and quantization (rather than using 32 bits = 4 bytes to store each parameter, you can use 8 or 4 bits).
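As a minimal sketch of both approaches (assuming you have `transformers`, `accelerate`, `bitsandbytes`, `optimum`, and `onnxruntime` installed; keep in mind flan-ul2 has ~20B parameters, so even in 8-bit it needs roughly 20 GB of memory):

```python
# Sketch: two ways to shrink google/flan-ul2 for cheaper inference.
# Assumes: pip install transformers accelerate bitsandbytes optimum onnxruntime

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-ul2")

# Option 1: 8-bit quantization via bitsandbytes (needs a GPU).
# Each parameter is stored in 1 byte instead of 4, so the ~20B-parameter
# model drops from ~80 GB to roughly ~20 GB of memory.
model = AutoModelForSeq2SeqLM.from_pretrained(
    "google/flan-ul2",
    device_map="auto",   # spread layers across available GPUs/CPU
    load_in_8bit=True,
)

inputs = tokenizer("Translate to German: How are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Option 2: export to ONNX with Optimum and run it with ONNX Runtime
# (more CPU-friendly):
# from optimum.onnxruntime import ORTModelForSeq2SeqLM
# ort_model = ORTModelForSeq2SeqLM.from_pretrained("google/flan-ul2", export=True)
```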

There are also frameworks like TGI (Text Generation Inference) and ggml that make chatbot-style models run as fast as possible, even on a local laptop. Both of them seem to support the T5 architecture, which google/flan-ul2 uses; see the sketch below for querying a TGI server.
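As an illustration, once a TGI server is running locally you can query it over HTTP. The payload below follows TGI's `/generate` API; the port mapping and launch command in the comments are assumptions based on TGI's standard Docker example:

```python
# Sketch: querying a local Text Generation Inference (TGI) server.
# Assumes the server was started with something like:
#   docker run --gpus all -p 8080:80 ghcr.io/huggingface/text-generation-inference \
#       --model-id google/flan-ul2 --quantize bitsandbytes
import requests

response = requests.post(
    "http://localhost:8080/generate",  # port 8080 is the mapping assumed above
    json={
        "inputs": "What is the capital of France?",
        "parameters": {"max_new_tokens": 50},
    },
)
response.raise_for_status()
print(response.json()["generated_text"])
```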