Problem with XGBoost and the Hosted Inference API

Hi,
I posted this to the beginner's forum and unfortunately got no reply.

When I train and upload a pickled xgboost model (with default hyperparameters) and use the default settings for the HF Hosted Inference API in Gradio

iface = gr.Interface.load(…)
iface.launch()

I get this error from Gradio

DataFrame.dtypes for data must be int, float, bool or category. When categorical type is supplied, The experimental DMatrix parameter enable_categorical must be set to True. Invalid columns: x1: object, x2: object

I think the HF Hosted Inference API passes the values as object, whereas xgboost expects float values. Any idea on how to deal with this on my (client) side would be much appreciated.

Could be a gradio bug. What's the model/full code to reproduce, @uisikdag?

Thank you so much for your reply

The model is at

uisikdag/simple_clasi_okl25

Would this be fine? I also messaged you the full code, @freddyaboulton

Thank you @uisikdag !

Yes, that is enough! I see that the Inference API also returns that error message, so it may not be a problem with gradio.

Maybe @merve @adrin can look into this since they are skops maintainers? I can’t find Benjamin’s forum profile to tag him lol

It seems like xgboost “remembers” the dtypes of its inputs at training time and raises an error if they don’t match at inference time. The inference widget does not seem to perform any coercion, resulting in “object” dtype.

Not exactly the same issue, but sounds like it’s related: A dtype converter transformer · Issue #36 · skops-dev/skops · GitHub

@BenjaminB thank you very much for your reply. On the same grounds, I trained the xgboost model both with a pandas DataFrame (float dtypes) and, after converting the df to a numpy array, re-trained; unfortunately, both gave the same error. The current model at HF is the one trained with a pandas DataFrame with float-type input variables…

Could you please try calling predict on your local model (trained with the df), with the input being a df where all dtypes are object? If that works, then the problem cannot only be the missing coercion.

@BenjaminB xgboost only allowed me to convert to int, float, boolean, or category. As the nearest thing to object, I converted all columns of the X (x1, x2, x3) df to category

X = X.astype('category')

and trained with

model = XGBClassifier(tree_method="gpu_hist", enable_categorical=True)

unfortunately, the problem persists…

the latest version of the model trained with this option is at

uisikdag/simple_clasi_okl25

as a note:

print(X.dtypes)
Temp category
CO2 category
Hum category
dtype: object

@BenjaminB @merve @adrin further ideas will be much appreciated…

I downloaded your model and created a data frame to predict with. It worked as long as the df used numerical dtypes, but when I cast it to object, it failed with the same error message as you got. So the problem is most likely that the inference API passes the data as object dtype.

One quick solution would be if it called df.convert_dtypes() on the df before calling predict, but maybe there are situations where we don’t want that. Do you have any idea @merve?

@uisikdag There is not much you can do right now until we fix it on the inference API side. If you really need it right now, you could create an sklearn Pipeline with a FunctionTransformer as the first step that calls the mentioned method on the df, and your xgboost model as the second step.

I think I can work on this; it's a bit tricky to convert dtypes on the inference API side though, as it requires making assumptions. :confused:

@BenjaminB thank you so much for your suggestion. For the moment,

pipe = Pipeline([('transform', FunctionTransformer(np.nan_to_num)),('classifier', XGBClassifier())])

or

pipe = Pipeline([('transform', FunctionTransformer(np.float64)),('classifier', XGBClassifier())])

seems to be working :slightly_smiling_face: but I will be very happy to be notified if you change the inference API.

Your help is very much appreciated :pray: