Problem with XGBoost and the Hosted Inference API

Hi,
I posted this to the beginner's forum and unfortunately got no reply.

When I train and upload a pickled xgboost model (with default hyperparameters) and use the default settings for the HF Hosted Inference API in Gradio

iface = gr.Interface.load(…)
iface.launch()

I get this error from Gradio

DataFrame.dtypes for data must be int, float, bool or category. When categorical type is supplied, The experimental DMatrix parameter enable_categorical must be set to True. Invalid columns: x1: object, x2: object

I think the HF Hosted Inference API passes the values as object, whereas xgboost expects float values. Any idea on how to deal with this on my (client) side would be much appreciated.

Could be a gradio bug. What's the model/full code to reproduce, @uisikdag?

Thank you so much for your reply

The model is at

uisikdag/simple_clasi_okl25

Would this be fine? I also messaged you the full code, @freddyaboulton

Thank you @uisikdag !

Yes, that is enough! I see that the Inference API also returns that error message, so it may not be a problem with gradio.

Maybe @merve @adrin can look into this since they are skops maintainers? I can’t find Benjamin’s forum profile to tag him lol

It seems like xgboost “remembers” the dtypes of its inputs at training time and raises an error if they don’t match at inference time. The inference widget does not seem to perform any coercion, resulting in “object” dtype.

Not exactly the same issue, but sounds like it’s related: A dtype converter transformer · Issue #36 · skops-dev/skops · GitHub

@BenjaminB thank you very much for your reply. On the same grounds, I trained the xgboost model both with a pandas DataFrame (float dtypes) and, after converting the df to a numpy array, re-trained; unfortunately, both gave the same error. The current model at HF is the one trained with a pandas DataFrame with float-type input variables…

Could you please try calling predict on your local model (trained with the df), with the input being a df where all dtypes are object? If that works, then the problem cannot only be the missing coercion.

@BenjaminB xgboost only allowed me to convert to int, float, boolean, or category. As the nearest thing to object, I converted all columns of the X (x1, x2, x3) df to category

X = X.astype('category')

and trained with

model = XGBClassifier(tree_method="gpu_hist", enable_categorical=True)

unfortunately, the problem persists…

the latest version of the model trained with this option is at

uisikdag/simple_clasi_okl25

as a note:

print(X.dtypes)
Temp category
CO2 category
Hum category
dtype: object

@BenjaminB @merve @adrin further ideas will be much appreciated…

I downloaded your model and created a data frame to predict with. It worked as long as the df used numerical dtypes, but when I cast it to object, it failed with the same error message as you got. So the problem is most likely that the inference API passes the data as object dtype.

One quick solution would be if it called df.convert_dtypes() on the df before calling predict, but maybe there are situations where we don’t want that. Do you have any idea @merve?

@uisikdag There is not much you can do right now until we fix it on the inference API side. If you really need it right now, you could create an sklearn Pipeline with a FunctionTransformer as the first step that calls the mentioned method on the df, and your xgboost model as the second step.

I think I can work on this; it's a bit tricky to convert dtypes on the inference API side though, as it requires making assumptions. :confused:

@BenjaminB thank you so much for your suggestion. For the moment,

pipe = Pipeline([('transform', FunctionTransformer(np.nan_to_num)),('classifier', XGBClassifier())])

or

pipe = Pipeline([('transform', FunctionTransformer(np.float64)),('classifier', XGBClassifier())])

seems to be working :slightly_smiling_face: but I will be very happy to be notified if you change the inference API.

Your help is very much appreciated :pray: