@fifthwheel you can do something like this:
from typing import List, Optional
from langchain_community.llms import DeepSparse
from langchain.llms.utils import enforce_stop_tokens
class LLMService:
    """Minimal LangChain-style wrapper around a DeepSparse LLM.

    Usage: construct the service, call ``load_model()`` once, then call
    ``_call(prompt)`` for completions. Until ``load_model()`` runs,
    ``model`` is ``None`` and ``_call`` raises ``RuntimeError``.
    """

    # Loaded DeepSparse pipeline; None until load_model() is called.
    model: Optional[object] = None

    @property
    def _llm_type(self) -> str:
        """Identifier for this LLM backend (LangChain convention)."""
        return "deepsparse"

    def _call(self,
              prompt: str,
              stop: Optional[List[str]] = None) -> str:
        """Generate a completion for ``prompt``.

        Args:
            prompt: Input text passed to the underlying model.
            stop: Optional stop sequences; output is truncated at the
                first occurrence via ``enforce_stop_tokens``.

        Returns:
            The generated text.

        Raises:
            RuntimeError: If ``load_model()`` has not been called yet.
        """
        if self.model is None:
            # Fail clearly instead of "'NoneType' object is not callable".
            raise RuntimeError("Model not loaded; call load_model() first.")
        response = self.model(prompt)
        if stop is not None:
            response = enforce_stop_tokens(response, stop)
        return response

    def load_model(self, model_name_or_path: str = "hf:neuralmagic/mpt-7b-chat-pruned50-quant") -> None:
        """Instantiate the DeepSparse pipeline and store it on ``self.model``.

        Args:
            model_name_or_path: SparseZoo/HF stub or local path of the
                model to load.
        """
        self.model = DeepSparse(
            model=model_name_or_path,
            model_config={"sequence_length": 2048},
            generation_config={"max_new_tokens": 300},
        )
if __name__ == '__main__':
    # Build the service and eagerly load the default DeepSparse model.
    service = LLMService()
    service.load_model()