I am currently using LLaVA for inference and I was wondering if there was a way to avoid reloading the checkpoint shards every time I predict a new sample. For reference, I am closely following the doc and the colab here: llava-hf/llava-1.5-7b-hf · Hugging Face.
Provide us with a snippet of your code; otherwise, I can only say: don't call LlavaForConditionalGeneration.from_pretrained every time you run inference.
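A minimal sketch of the load-once pattern (assuming the llava-hf/llava-1.5-7b-hf checkpoint and the USER/ASSISTANT prompt template from its model card; the predict helper is only illustrative):

from PIL import Image
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"

# Load the processor and the checkpoint shards once, at startup.
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

def predict(image_path, text_prompt):
    # Reuses the already-loaded model; nothing is reloaded per call.
    image = Image.open(image_path)
    prompt = f"USER: <image>\n{text_prompt} ASSISTANT:"
    inputs = processor(images=image, text=prompt, return_tensors="pt").to(
        model.device, torch.float16
    )
    output = model.generate(**inputs, max_new_tokens=200)
    return processor.decode(output[0], skip_special_tokens=True)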
I’m using the pipeline from Hugging Face, so my code is super simple:
from PIL import Image
from transformers import pipeline

# model_id, quantization_config, image_path, and text_prompt are defined earlier
image = Image.open(image_path)
pipe = pipeline(
    "image-to-text",
    model=model_id,
    model_kwargs={"quantization_config": quantization_config},
)
output = pipe(
    image, prompt=text_prompt, generate_kwargs={"max_new_tokens": 200}
)
Should I use the transformers library directly instead, to have more control?
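For reference, this is roughly the reuse pattern I have in mind if I keep the pipeline: construct it once, then call it per sample (model_id, quantization_config, and text_prompt as above; image_paths is a hypothetical list of inputs):

from PIL import Image
from transformers import pipeline

# Build the pipeline once; the checkpoint shards load here, not per sample.
pipe = pipeline(
    "image-to-text",
    model=model_id,
    model_kwargs={"quantization_config": quantization_config},
)

for image_path in image_paths:
    image = Image.open(image_path)
    output = pipe(
        image, prompt=text_prompt, generate_kwargs={"max_new_tokens": 200}
    )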