ZeroGPU issues: Assertion `srcIndex < srcSelectDimSize` failed

Hi I am quite new to ZeroGPU, thanks for any help in advance!

I am trying to use ZeroGPU, but it oddly keeps moving my pipeline to the CPU. When I manually call `pipeline.model.to('cuda')`, the function raises a bunch of assertion errors like this one:

`indexSelectLargeIndex: block: [30,0,0], thread: [32,0,0] Assertion srcIndex < srcSelectDimSize failed.`
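
From what I understand, this device-side assertion fires when an index fed to an embedding or index_select lookup is out of range, typically a token ID that is greater than or equal to the embedding table's row count. A minimal PyTorch reproduction of the same assertion (assuming a CUDA device is available):

    import torch

    # An out-of-range token ID reproduces the same CUDA assertion.
    emb = torch.nn.Embedding(num_embeddings=10, embedding_dim=4).to('cuda')
    bad_ids = torch.tensor([[3, 12]], device='cuda')  # 12 >= 10 rows
    out = emb(bad_ids)  # Assertion `srcIndex < srcSelectDimSize` failed
    torch.cuda.synchronize()  # CUDA is async; the error may only surface here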

I have properly set the `@spaces.GPU` decorator, like this:

    @spaces.GPU(duration=20)
    def __call__(self, *args):
        """Performs masked language modeling prediction.

        This method should be called only after the `load` method has been executed
        to ensure that the model and pipeline are properly initialized. It accepts
        arguments to pass to the Hugging Face fill-mask pipeline.

        Args:
            *args: Variable length argument list to pass to the pipeline.

        Returns:
            The generated text produced by the pipeline.

        Raises:
            BrokenPipeError: If the model has not been loaded before calling this method.
        """
        if self.pipeline is None:
            msg = "Model was not initialized, have you run load()?"
            raise BrokenPipeError(msg)
        
        self.logger.info(f"Called with arguments {args = }")
        self.logger.info(f"Model: {self.pipeline.model.device = }")
        self.pipeline.model.to('cuda')
        pipe_out, = self.pipeline(*args)
        pipe_out = pipe_out['generated_text']
        self.logger.info(f"Generated text: {pipe_out}")
        
        # Hard-coded cleanup: truncate the output at the first known repeated phrase
        mo = re.search(r"\. (questionable|anterio|zius)", pipe_out)
        
        if mo is not None:
            end_sig = mo.start()
            pipe_out = pipe_out[:end_sig + 1]
        self.logger.info(f"Displayed text: {pipe_out}")
        return pipe_out

This function works as expected on my local machine.
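
To get a readable error instead of the opaque CUDA assertion, I can run the same inputs on the CPU (where an out-of-range ID raises a plain IndexError), or check the token IDs against the embedding table first. A rough sketch; the helper below is hypothetical, not part of my app:

    # Hypothetical debug helper: verify every token ID fits the embedding table.
    def check_token_ids(pipeline, text: str) -> None:
        ids = pipeline.tokenizer(text, return_tensors="pt").input_ids
        vocab_rows = pipeline.model.get_input_embeddings().num_embeddings
        max_id = int(ids.max())
        assert max_id < vocab_rows, (
            f"token id {max_id} out of range for {vocab_rows} embedding rows"
        )

Setting the environment variable CUDA_LAUNCH_BLOCKING=1 also makes the assertion surface at the failing call rather than at a later one.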

The pipeline was created like this, in a class method that's called during the Gradio build:

            self.pipeline = hf_pipeline("text2text-generation", 
                                        model=self.model, 
                                        tokenizer=self.tokenizer, 
                                        device='cuda', 
                                        num_beams=4,
                                        do_sample=True,
                                        top_k=5,
                                        temperature=0.95,
                                        early_stopping=True,
                                        no_repeat_ngram_size=5, 
                                        max_new_tokens=60)
            self.pipeline.model.to('cuda')  # I specify this to make sure it's on CUDA
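
For comparison, the ZeroGPU docs show building the pipeline at module import time and only running inference inside the @spaces.GPU function. A minimal sketch of that pattern; the model name is just a placeholder:

    import spaces
    from transformers import pipeline as hf_pipeline

    # Built at import time; ZeroGPU defers the actual CUDA allocation until a
    # @spaces.GPU-decorated call runs.
    pipe = hf_pipeline(
        "text2text-generation",
        model="google/flan-t5-base",  # placeholder model for illustration
        device='cuda',
    )

    @spaces.GPU(duration=20)
    def generate(prompt: str) -> str:
        out, = pipe(prompt, max_new_tokens=60)
        return out['generated_text']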

I am really scratching my head now.


The specifications for ZeroGPU Spaces have changed significantly since the summer, making it much more difficult to get them working. Well, it is a bug.
Please follow the details below.
