Saving results when an error occurs while applying the map function to a dataset

I use an API (like huggingface_hub) to let a language model answer questions from my dataset.
I want to send every example to the language model, so I wrote a function that makes the API call and use the map function to apply it to every example of my dataset.

My issue is: if an error occurs at any point (e.g. the API throws an error after one hour because something happened), I lose all the information. Let's say the map worked for the first 100 examples and then the API throws an error: I lose the generated data for all 100 examples that were handled correctly before the error happened.

Is there a way to make the map function return all the items it has already finished when an error occurs? I do not know how to access this information and always have to start over…

Hi! You can use try/except to catch the error and return None as the response for that example, so the processing is not interrupted:


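def get_api_response(ex):
    try:
        # API call...
        return {"response": "..."}
    except Exception:
        # On failure, store None so map continues with the remaining examples
        return {"response": None}

# map returns a new dataset, so keep the result
dataset = dataset.map(get_api_response)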
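
To make the idea above a bit more concrete, here is a minimal sketch that runs without any external library. The names `call_api`, `retries`, `wait`, and the `question` and `error` fields are assumptions made for illustration and are not part of any real API. A list comprehension stands in for `dataset.map`, which merges the returned dict into each example in a similar way. The function retries a few times, then stores None together with the error message, so failed examples can be found and re-run later.

```python
import time

def call_api(question):
    """Hypothetical stand-in for the real API call; fails on one input."""
    if "fail" in question:
        raise RuntimeError("simulated API error")
    return f"answer to: {question}"

def get_api_response(ex, retries=2, wait=0.0):
    """Return the response, or None plus the error message once all attempts fail."""
    last_error = None
    for _attempt in range(retries + 1):
        try:
            return {"response": call_api(ex["question"]), "error": None}
        except Exception as err:
            last_error = repr(err)
            time.sleep(wait)  # short pause before the next attempt
    return {"response": None, "error": last_error}

examples = [{"question": "q1"}, {"question": "please fail"}, {"question": "q3"}]

# Stand-in for dataset.map: merge the returned columns into each example.
results = [{**ex, **get_api_response(ex)} for ex in examples]

# Examples whose call failed; these are the ones to send again later.
failed = [r for r in results if r["response"] is None]
```

With a real dataset, the same function could be passed to `dataset.map`, and something like `dataset.filter(lambda ex: ex["response"] is None)` should then give you the examples that still need an answer.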

Thanks for that solution. I was already thinking about doing something like that, but I thought the map function might have an option for doing it automatically…
But as you show, it is very easy to implement on my own :)
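
As a follow-up on the "always have to start over" part: if even a crash of the whole script should not cost finished work, one option is to write each result to disk as soon as it is done and skip those examples on the next run. This is only a sketch under assumptions: `call_api` is a hypothetical placeholder for the real API call, every example is assumed to carry a unique `id` field, and the JSON Lines file is just one possible storage format. It does not use map at all.

```python
import json
import os
import tempfile

def call_api(question):
    """Hypothetical stand-in for the real API call."""
    return f"answer to: {question}"

def process_with_checkpoint(examples, path):
    """Append each finished example to a JSON Lines file; skip ones already there."""
    done = set()
    if os.path.exists(path):
        with open(path) as f:
            done = {json.loads(line)["id"] for line in f if line.strip()}
    with open(path, "a") as f:
        for ex in examples:
            if ex["id"] in done:
                continue  # finished in an earlier run
            row = {**ex, "response": call_api(ex["question"])}
            f.write(json.dumps(row) + "\n")
            f.flush()  # hand the row to the OS right away

def load_results(path):
    """Read back everything that has been saved so far."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

examples = [{"id": i, "question": f"q{i}"} for i in range(5)]
path = os.path.join(tempfile.mkdtemp(), "responses.jsonl")

process_with_checkpoint(examples[:3], path)  # pretend the first run stopped after 3 examples
process_with_checkpoint(examples, path)      # the rerun only handles the remaining 2
results = load_results(path)
```

The resulting file should be loadable back into a dataset afterwards, for example with `load_dataset("json", data_files=path)`.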