ZeroGPU issues: Assertion `srcIndex < srcSelectDimSize` failed

Hi I am quite new to ZeroGPU, thanks for any help in advance!

I am trying to use ZeroGPU, but it oddly keeps moving my pipeline to the CPU. When I manually call `pipeline.model.to('cuda')`, the function raises a bunch of assertion errors like this one:

`indexSelectLargeIndex: block: [30,0,0], thread: [32,0,0] Assertion srcIndex < srcSelectDimSize failed.`
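
From what I understand, this device-side assertion fires when an index fed to an embedding or index_select lookup is out of range, typically a token ID that is greater than or equal to the embedding table's row count. A minimal PyTorch reproduction of the same assertion (assuming a CUDA device is available):

    import torch

    # An out-of-range token ID reproduces the same CUDA assertion.
    emb = torch.nn.Embedding(num_embeddings=10, embedding_dim=4).to('cuda')
    bad_ids = torch.tensor([[3, 12]], device='cuda')  # 12 >= 10 rows
    out = emb(bad_ids)  # Assertion `srcIndex < srcSelectDimSize` failed
    torch.cuda.synchronize()  # CUDA is async; the error may only surface here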

I have properly set the `@spaces.GPU` decorator, like this:

    @spaces.GPU(duration=20)
    def __call__(self, *args):
        """Performs masked language modeling prediction.

        This method should be called only after the `load` method has been executed
        to ensure that the model and pipeline are properly initialized. It accepts
        arguments to pass to the Hugging Face fill-mask pipeline.

        Args:
            *args: Variable length argument list to pass to the pipeline.

        Returns:
            The generated text produced by the pipeline.

        Raises:
            BrokenPipeError: If the model has not been loaded before calling this method.
        """
        if self.pipeline is None:
            msg = "Model was not initialized, have you run load()?"
            raise BrokenPipeError(msg)
        
        self.logger.info(f"Called with arguments {args = }")
        self.logger.info(f"Model: {self.pipeline.model.device = }")
        self.pipeline.model.to('cuda')
        pipe_out, = self.pipeline(*args)
        pipe_out = pipe_out['generated_text']
        self.logger.info(f"Generated text: {pipe_out}")
        
        # Hard-coded cleanup: truncate the output at the first known repeated phrase
        mo = re.search(r"\. (questionable|anterio|zius)", pipe_out)
        
        if mo is not None:
            end_sig = mo.start()
            pipe_out = pipe_out[:end_sig + 1]
        self.logger.info(f"Displayed text: {pipe_out}")
        return pipe_out

This function works as expected on my local machine.
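
To get a readable error instead of the opaque CUDA assertion, I can run the same inputs on the CPU (where an out-of-range ID raises a plain IndexError), or check the token IDs against the embedding table first. A rough sketch; the helper below is hypothetical, not part of my app:

    # Hypothetical debug helper: verify every token ID fits the embedding table.
    def check_token_ids(pipeline, text: str) -> None:
        ids = pipeline.tokenizer(text, return_tensors="pt").input_ids
        vocab_rows = pipeline.model.get_input_embeddings().num_embeddings
        max_id = int(ids.max())
        assert max_id < vocab_rows, (
            f"token id {max_id} out of range for {vocab_rows} embedding rows"
        )

Setting the environment variable CUDA_LAUNCH_BLOCKING=1 also makes the assertion surface at the failing call rather than at a later one.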

The pipeline was created like this, in a class method that's called during the Gradio build:

            self.pipeline = hf_pipeline("text2text-generation", 
                                        model=self.model, 
                                        tokenizer=self.tokenizer, 
                                        device='cuda', 
                                        num_beams=4,
                                        do_sample=True,
                                        top_k=5,
                                        temperature=0.95,
                                        early_stopping=True,
                                        no_repeat_ngram_size=5, 
                                        max_new_tokens=60)
            self.pipeline.model.to('cuda')  # I specify this to make sure it's on CUDA
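
For comparison, the ZeroGPU docs show building the pipeline at module import time and only running inference inside the @spaces.GPU function. A minimal sketch of that pattern; the model name is just a placeholder:

    import spaces
    from transformers import pipeline as hf_pipeline

    # Built at import time; ZeroGPU defers the actual CUDA allocation until a
    # @spaces.GPU-decorated call runs.
    pipe = hf_pipeline(
        "text2text-generation",
        model="google/flan-t5-base",  # placeholder model for illustration
        device='cuda',
    )

    @spaces.GPU(duration=20)
    def generate(prompt: str) -> str:
        out, = pipe(prompt, max_new_tokens=60)
        return out['generated_text']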

I am really scratching my head now.


The specifications for ZeroGPU Spaces have changed significantly since the summer, making it much more difficult to get them working. Well, it is a bug.
Please follow the details below.
