I’m currently working on a project for which I had a working prototype. Now that I’m building the production version, I’m re-adapting some of my formats. One of them is a Pydantic class like this:
from pydantic import BaseModel

class Object(BaseModel):
    ...  # some variables here

class TheActualClass(BaseModel):
    var: Object
    ...  # some other variables
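For context, a schema like this is typically passed to TGI as a JSON grammar on the `/generate` endpoint. The sketch below assumes TGI's guidance request format (`parameters.grammar` with `type: "json"`). The field names and prompt are hypothetical, and the hand-written schema only mimics the `$defs`/`$ref` layout that Pydantic v2's `TheActualClass.model_json_schema()` emits for a nested `BaseModel`.

```python
import json

# Hypothetical schema in the shape Pydantic v2 produces for a nested BaseModel:
# the inner model is stored under "$defs" and referenced with "$ref".
SCHEMA = {
    "$defs": {
        "Object": {
            "properties": {
                "name": {"title": "Name", "type": "string"},      # hypothetical field
                "count": {"title": "Count", "type": "integer"},   # hypothetical field
            },
            "required": ["name", "count"],
            "title": "Object",
            "type": "object",
        }
    },
    "properties": {
        "var": {"$ref": "#/$defs/Object"},
        "summary": {"title": "Summary", "type": "string"},        # hypothetical field
    },
    "required": ["var", "summary"],
    "title": "TheActualClass",
    "type": "object",
}


def build_generate_request(prompt: str, schema: dict, max_new_tokens: int = 512) -> dict:
    """Build a body for POST /generate asking TGI to constrain output to `schema`."""
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            # TGI compiles this JSON schema into a grammar that guides decoding.
            "grammar": {"type": "json", "value": schema},
        },
    }


if __name__ == "__main__":
    body = build_generate_request("Extract the fields from this text: ...", SCHEMA)
    print(json.dumps(body, indent=2))
```

Swapping in the TypedDict variant only changes the `value` schema, which makes it easy to send both versions to the same server and compare how each behaves.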
In production, Object has a lot more variables than it had in the prototype, and this triggers a strange behavior in TGI that I also ran into when processing long-context data (probably too large for my VRAM?): TGI does not react at all. My GPU isn't doing anything and everything just seems frozen, so I have to restart the container, because even after half an hour of waiting absolutely nothing happens.
I found a workaround by changing Object from a BaseModel to a TypedDict, but I’m curious why the engine just freezes like this, especially since it is the engine I plan to use in production, and if this happens there it could become a pretty big problem.
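Since a silent hang is the main production risk here, one hedged client-side mitigation is to never wait on the server without a deadline. Below is a minimal standard-library sketch; the URL, the `GenerationTimeout` exception, and the 120-second default are assumptions for illustration. It only protects the caller from blocking forever; it does not unfreeze the server.

```python
import json
import socket
import urllib.error
import urllib.request


class GenerationTimeout(Exception):
    """Hypothetical error raised when the server does not answer in time."""


def generate_with_timeout(url: str, payload: dict, timeout_s: float = 120.0) -> dict:
    """POST a JSON payload and give up after `timeout_s` seconds of silence."""
    body = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    try:
        # `timeout` applies to the connect and to each blocking socket read, so a
        # server that accepts the request but never answers will trip it.
        with urllib.request.urlopen(req, timeout=timeout_s) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except socket.timeout as exc:
        # Read timeout: http.client raises it directly while waiting for the reply.
        raise GenerationTimeout(f"no response from {url} after {timeout_s}s") from exc
    except urllib.error.URLError as exc:
        # Connect timeout: urllib wraps the socket timeout in a URLError.
        if isinstance(exc.reason, socket.timeout):
            raise GenerationTimeout(f"could not connect to {url} in {timeout_s}s") from exc
        raise
```

A timed-out call can then raise an alert or trigger a container restart instead of tying up a worker indefinitely.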
Does anyone have an idea what the problem might be?
I don’t think so (or at least it’s not the only cause), since I had exactly the same behavior with no formatting at all, just a very, very long context. What’s really strange is that nothing happens: no error to debug, no generation (I can usually hear the GPU when it generates), nothing in the Docker logs. TGI just doesn’t act at all and doesn’t even receive any query; when this happens, it simply becomes unusable.
With the workarounds I found (TypedDict for the formatting, and cutting long contexts into smaller blocks), I haven’t run into this again. Maybe I’m just at the limit of my hardware, but I still find it strange that there isn’t even an error or anything.
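The “cut long context into smaller blocks” workaround can be sketched as a simple overlapping splitter. This is a sketch under stated assumptions: it counts whitespace-separated words as a rough stand-in for tokens, and `max_words`/`overlap` are hypothetical defaults. The server’s real limits are in tokens, so in practice you would size chunks with the model’s own tokenizer.

```python
def chunk_words(text: str, max_words: int = 800, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most `max_words` words.

    Consecutive chunks share `overlap` words so content cut at a boundary
    still appears whole in at least one chunk. Words approximate tokens only.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once this chunk reaches the end, to avoid a tail made only of overlap.
        if start + max_words >= len(words):
            break
    return chunks
```

Each chunk can then be sent as its own request and the partial results merged afterwards.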
Wow… that’s pretty weird. If TGI were crashing for some logical reason, there should be some kind of log or output…
The fact that the GPU fan isn’t spinning up suggests it never even reached the point of running the model, and that it really did get stuck at the very beginning.
Well, it’s good that there seems to be a workaround…