Hi, I am quite new to ZeroGPU, so thanks in advance for any help!
I am trying to use ZeroGPU, but it oddly keeps moving my pipeline to the CPU. When I manually call `pipeline.model.to('cuda')`, the function raises a bunch of assertion errors like this one:

```
indexSelectLargeIndex: block: [30,0,0], thread: [32,0,0] Assertion srcIndex < srcSelectDimSize failed.
```
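From what I could find, this device-side assert usually means some index (typically a token id) is outside the range of the dimension being indexed, e.g. an embedding lookup. A minimal sketch of how it can be reproduced (my assumption about the cause, not my actual code):

```python
import torch

# Hypothetical repro: an id >= num_embeddings only trips this device-side
# assert on CUDA; the same lookup on CPU raises a plain IndexError instead.
emb = torch.nn.Embedding(num_embeddings=10, embedding_dim=4).to('cuda')
bad_ids = torch.tensor([12], device='cuda')  # 12 >= 10, so out of range
out = emb(bad_ids)
torch.cuda.synchronize()  # CUDA is asynchronous; force the error to surface here
```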
I have properly set the `@spaces.GPU` decorator, like this:
```python
@spaces.GPU(duration=20)
def __call__(self, *args):
    """Performs text-to-text generation.

    This method should be called only after the `load` method has been
    executed, to ensure that the model and pipeline are properly
    initialized. It accepts arguments to pass to the Hugging Face
    text2text-generation pipeline.

    Args:
        *args: Variable length argument list to pass to the pipeline.

    Returns:
        The output of the text2text-generation pipeline.

    Raises:
        BrokenPipeError: If the model has not been loaded before calling
            this method.
    """
    if self.pipeline is None:
        msg = "Model was not initialized, have you run load()?"
        raise BrokenPipeError(msg)
    self.logger.info(f"Called with arguments {args = }")
    self.logger.info(f"Model: {self.pipeline.model.device = }")
    self.pipeline.model.to('cuda')
    pipe_out, = self.pipeline(*args)
    pipe_out = pipe_out['generated_text']
    self.logger.info(f"Generated text: {pipe_out}")
    # Remove repeated trailing text by hard-coding known end markers.
    mo = re.search(r"\. (questionable|anterio|zius)", pipe_out)  # raw string avoids the invalid-escape warning
    if mo is not None:
        end_sig = mo.start()
        pipe_out = pipe_out[:end_sig + 1]
    self.logger.info(f"Displayed text: {pipe_out}")
    return pipe_out
```
This function works as expected on my local machine.
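Since the exact same weights run fine on CPU locally, and the assert smells like an out-of-range index, one sanity check I can add is to compare the token ids against the model's embedding table (a debugging sketch of my own, using standard transformers attributes):

```python
# Debugging sketch: make sure no token id exceeds the embedding table size.
vocab_rows = self.pipeline.model.get_input_embeddings().num_embeddings
ids = self.pipeline.tokenizer("a representative sample input")["input_ids"]
self.logger.info(f"max token id = {max(ids)}, embedding rows = {vocab_rows}")
assert max(ids) < vocab_rows, "token id out of range for the embedding table"
```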
The pipeline is created like this, in a class method that is called during the Gradio build:
```python
self.pipeline = hf_pipeline(
    "text2text-generation",
    model=self.model,
    tokenizer=self.tokenizer,
    device='cuda',
    num_beams=4,
    do_sample=True,
    top_k=5,
    temperature=0.95,
    early_stopping=True,
    no_repeat_ngram_size=5,
    max_new_tokens=60,
)
self.pipeline.model.to('cuda')  # I specify this to make sure it's on CUDA
```
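For completeness, this is the pattern I understood from the ZeroGPU examples (a sketch with a hypothetical model name, not my actual Space): the pipeline is built and moved to `'cuda'` once at startup, and only the decorated function actually runs on the allocated GPU.

```python
import spaces
from transformers import pipeline as hf_pipeline

# Sketch of the ZeroGPU pattern as I understand it (hypothetical model name).
pipe = hf_pipeline("text2text-generation", model="google/flan-t5-small")
pipe.model.to('cuda')  # done once at startup, outside the decorated function

@spaces.GPU(duration=20)
def generate(prompt: str) -> str:
    # Runs on the GPU that ZeroGPU attaches for the duration of the call.
    return pipe(prompt)[0]["generated_text"]
```

Is this the intended pattern, or should device placement be handled differently under ZeroGPU?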
I am really scratching my head now.