From Python, the only way to do this is to offload the torch model and tensors to the CPU from the narrowest scope that makes sense, explicitly delete the objects, make sure the tensors are not still referenced from anywhere, and then call gc.collect() and torch.cuda.empty_cache(). Be careful: there are cases where tqdm and similar tools hold implicit references to them.
In other words, this is the standard approach right now. If it doesn’t work, something is wrong, and you should suspect a bug or a problem in the library.
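Here is a rough sketch of that cleanup sequence, in case it helps. The names `release_cuda_memory`, `run_and_release`, `build_model` and `run` are my own placeholders, not part of any library, and I'm assuming `run` returns only plain Python or CPU data so that nothing on the GPU escapes the function scope.

```python
import gc
import weakref


def release_cuda_memory():
    """Collect garbage, then ask PyTorch to release its cached CUDA blocks."""
    gc.collect()
    try:
        import torch  # imported lazily so this helper also loads without torch
    except ImportError:
        return
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


def run_and_release(build_model, run):
    """Keep the model inside one function scope so no reference outlives it.

    build_model and run are hypothetical callables supplied by the caller.
    Returns the result plus a weak reference for checking the model is gone.
    """
    model = build_model()
    probe = weakref.ref(model)   # does not keep the model alive
    result = run(model)          # assumed to return plain Python / CPU data
    if hasattr(model, "to"):
        model.to("cpu")          # for an nn.Module this moves parameters in place
    del model                    # drop the only strong reference we hold
    release_cuda_memory()
    return result, probe
```

If `probe()` is still not `None` afterwards, something else is holding the model (a progress bar, a cached traceback, a global list, and so on), and `gc.get_referrers(probe())` can help track down what it is.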
Another option is to move the model execution into a separate script and run it in a subprocess. The OS then reclaims all of that process's memory when it exits, so this is more forceful than anything you can do inside Python. However, it is not clean, and it takes extra time.
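A minimal sketch of the subprocess route could look like this. `run_isolated` and the worker script name `infer_worker.py` are hypothetical, and I'm assuming the worker reads a JSON payload from stdin and prints exactly one JSON document to stdout.

```python
import json
import subprocess
import sys


def run_isolated(worker_args, payload, timeout=None):
    """Run a worker in a fresh Python process and return its JSON output.

    worker_args is the argument list passed to the interpreter, for example
    ["infer_worker.py"] (a hypothetical script that loads and runs the model).
    When the child exits, the OS frees everything it allocated, VRAM included.
    """
    proc = subprocess.run(
        [sys.executable, *worker_args],
        input=json.dumps(payload),   # sent to the worker's stdin
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,                  # raise CalledProcessError on failure
    )
    return json.loads(proc.stdout)
```

You would call it like `run_isolated(["infer_worker.py"], {"prompt": "hello"})`. The cost is that the model has to be loaded again on every call, which is where the extra time goes.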
@not-lain This could be a tricky VRAM leak problem.