Runtime Error in Paid Space after upgrade and no info

Hi,

We have a Space with a Gradio app that was running perfectly last week on the free plan:

Since Monday this week we have been getting a runtime error saying:

# Runtime error
## Memory limit exceeded (16Gi)

The build logs were fine and the error occurs when the container starts up. We decided to upgrade to a paid plan, and after upgrading the error we now get is:

# Runtime error
## Memory limit exceeded (32Gi)

Could someone from infra please give us more information about what is going on here and how to solve it? We need to present a poster at a conference next week, where we want to share this dashboard. @radames @michellehbn

Many thanks

Rosa
P.S.: I checked the forum, but the similar posts were not helpful, since we have no information about the problem.

Hi @cyberosa, could you trace your app's memory footprint? Perhaps run it locally to see whether it is growing above the Space hardware limit?

Hi @radames ,

The app works perfectly fine locally, so we cannot reproduce the issue. Do you have any other container logs that could tell us why the app works locally but not on HF?

Best

P.S.: My local machine is a Mac-Pro M3 with 18 GB of RAM. I monitored the execution and could not detect any error.

Trying to restart the space again…


Ok, it did not work :frowning: Could someone increase the verbosity of the Docker logs so that I can get more information?

I recommend logging the memory usage in your app and comparing the values in the logs both locally and on Spaces.
As for logs on Spaces, the log tab is all we can access as well.
Let me try running it locally here; I’ll let you know.
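
For anyone following along, here is a minimal sketch of what such in-app memory logging could look like, using only the Python standard library. The helper names `peak_rss_mib` and `log_memory` and the stage labels are made up for illustration; they are not part of the original app. Note that `ru_maxrss` is reported in bytes on macOS and in kilobytes on Linux, which matters when comparing a Mac with a Space.

```python
import logging
import resource
import sys

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
logger = logging.getLogger("memory")


def peak_rss_mib() -> float:
    """Return this process's peak resident set size in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is in bytes on macOS and in kilobytes on Linux.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def log_memory(stage: str) -> float:
    """Log the peak memory seen so far, tagged with a stage label."""
    peak = peak_rss_mib()
    logger.info("[%s] peak RSS so far: %.1f MiB", stage, peak)
    return peak


if __name__ == "__main__":
    log_memory("startup")
    data = [0] * 5_000_000  # stand-in for loading the app's data
    log_memory("after loading data")
```

Calling `log_memory(...)` before and after each heavy step (data loading, dataframe preparation, building the UI) should make the same lines show up in the local console and in the Space's log tab, so the two runs can be compared step by step.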


It would be helpful if someone could kill the container on HF and delete the old image to force a new build. In my own experience with Docker, and colleagues confirm the same, rebuilding without removing the old image sometimes does not work.

Many thanks for your help. Looking forward to your own local check :pray:

You can do that with a factory rebuild. Have you tried it?

Factory rebuild

Click this button to trigger a factory rebuild of your Space. This will invalidate Docker layer caches and rebuild your space from scratch, reinstalling all dependencies.

Was not aware of this option… let me try

No. It did not help. Same error. This is the memory consumption of the app running on my computer:

I am playing with the app and it works perfectly fine. Any other ideas?


Hi, it seems there is a memory spike in prepare_data(), which is causing the OOM issue.
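
One way to check a spike like this from inside Python is to wrap the suspect function with `tracemalloc`. This is only a sketch and not the method used in this thread: the `prepare_data` below is a hypothetical stand-in for the app's real function, and `measure_peak` is a made-up helper. Keep in mind that `tracemalloc` only sees allocations made through Python's memory allocators, so memory allocated directly by native libraries may not be counted.

```python
import tracemalloc


def prepare_data(n_rows: int) -> list:
    # Hypothetical stand-in for the app's real prepare_data().
    return [float(i) for i in range(n_rows)]


def measure_peak(func, *args, **kwargs):
    """Run func and return (result, peak traced memory in MiB)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak / (1024 * 1024)


if __name__ == "__main__":
    rows, peak_mib = measure_peak(prepare_data, 1_000_000)
    print(f"prepare_data peak: {peak_mib:.1f} MiB for {len(rows)} rows")
```

If the reported peak is far larger than the size of the data being loaded, that points at intermediate copies or a parsing step that blows up the data.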

Many thanks @radames. May I ask which tool you used to identify the source of the trouble?
I will work on refactoring the app and try my luck again next week :slight_smile:

On Linux you can run htop and watch the memory usage; it peaked at over 28.7 GB of RAM.


In the end, the issue was in the way we were computing errors in one parquet file. The app was parsing the data the wrong way. Once that was changed, the app started running again. I also spotted some performance improvements that can be made, and we will be working on them soon.
Many thanks for your helpful support.


Amazing! I’m glad it worked!

