Hello everyone. I am using the IBM Granite 20B model for a code-generation task. It works pretty well, but when my prompt and the in-prompt examples get longer, generation becomes very slow. Can anyone tell me how to make inference faster with longer prompts? I have already applied quantization, etc.
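For context, one common reason long prompts feel slow is recomputing attention over the whole prefix at every decoding step instead of reusing cached keys and values (what `use_cache=True` does in `transformers`). Below is a toy NumPy sketch of single-head attention, not Granite's actual implementation, showing that incremental decoding with a key/value cache produces the same outputs as recomputing the full prefix each step, while doing far less projection work per token:

```python
import numpy as np

def attention(q, K, V):
    # q: (d,), K/V: (t, d) -> softmax(K q / sqrt(d)) @ V
    scores = K @ q / np.sqrt(q.shape[0])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

rng = np.random.default_rng(0)
d, steps = 8, 5
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
xs = rng.normal(size=(steps, d))  # stand-in token embeddings

# Naive decoding: re-project K and V over the entire prefix at every step.
naive = []
for t in range(steps):
    prefix = xs[: t + 1]
    K, V = prefix @ Wk.T, prefix @ Wv.T
    naive.append(attention(xs[t] @ Wq.T, K, V))

# Cached decoding: append one new K/V row per step and reuse the rest.
K_cache = np.empty((0, d))
V_cache = np.empty((0, d))
cached = []
for t in range(steps):
    K_cache = np.vstack([K_cache, (xs[t] @ Wk.T)[None]])
    V_cache = np.vstack([V_cache, (xs[t] @ Wv.T)[None]])
    cached.append(attention(xs[t] @ Wq.T, K_cache, V_cache))

# Both strategies give identical attention outputs.
assert np.allclose(naive, cached)
```

Beyond caching, the prefill pass over a long prompt is inherently quadratic in prompt length, so shortening few-shot examples, batching requests, or using an optimized attention kernel (e.g. `attn_implementation="sdpa"` when loading the model) are the usual levers.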