I posted this to the beginners' forum and unfortunately got no reply.
When I train and upload a pickled xgboost model (with default hyperparameters) and use the default settings for the HF Hosted Inference API in Gradio
iface = gr.Interface.load(…)
I get this error from Gradio
DataFrame.dtypes for data must be int, float, bool or category. When categorical type is supplied, The experimental DMatrix parameter enable_categorical must be set to True. Invalid columns: x1: object, x2: object
I think the HF Hosted Inference API passes the values as object, where xgboost expects float values. Any idea on how to deal with this on my (client) side would be much appreciated.
Could be a gradio bug. What’s the model/full code to reproduce @uisikdag ?
Thank you so much for your reply
The model is at
Would this be fine? Also messaged you the full code @freddyaboulton
Thank you @uisikdag !
Yes, that is enough! I see that the Inference API also returns that error message, so it may not be a problem with gradio.
Maybe @merve @adrin can look into this since they are skops maintainers? I can’t find Benjamin’s forum profile to tag him lol
It seems like xgboost “remembers” the dtypes of its inputs at training time and raises an error if they don’t match at inference time. The inference widget does not seem to perform any coercion, resulting in “object” dtype.
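A minimal pandas sketch of what likely happens (no xgboost needed): values forwarded as JSON strings give the frame object dtype, which is exactly what xgboost's dtype check rejects, while an explicit client-side cast restores the float dtypes seen at fit time:

```python
import pandas as pd

# Payloads forwarded as strings end up as object dtype columns
raw = pd.DataFrame({"x1": ["1.5", "2.0"], "x2": ["0.1", "0.2"]})
print([str(d) for d in raw.dtypes])       # ['object', 'object']

# Casting restores the dtypes xgboost "remembers" from training
coerced = raw.astype(float)
print([str(d) for d in coerced.dtypes])   # ['float64', 'float64']
```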
Not exactly the same issue, but sounds like it’s related: A dtype converter transformer · Issue #36 · skops-dev/skops · GitHub
@BenjaminB thank you very much for your reply. On the same grounds, I trained the xgboost model both with a pandas DataFrame (float dtypes) and, after converting the df to a numpy array, re-trained; unfortunately, both gave the same error. The current model at HF is the one trained with the pandas DataFrame with float-typed input variables…
Could you please try calling predict on your local model (trained with the df), with the input being a df whose dtypes are all object? If that works, then the missing coercion can't be the only problem.
@BenjaminB xgboost only allowed me to convert to int, float, boolean, or category. As the nearest thing to object, I converted all columns of the x (x1, x2, x3) df to category
and trained with
model = XGBClassifier(tree_method="gpu_hist", enable_categorical=True)
unfortunately the problem seems to be persisting…
the latest version of the model trained with this option is at
as a note:
@BenjaminB @merve @adrin further ideas will be much appreciated…
I downloaded your model and created a data frame to predict with. It worked as long as the df used numerical dtypes, but when I cast it to object, it failed with the same error message as you got. So the problem is most likely that the inference API passes the data as object dtype.
One quick solution would be for it to call df.convert_dtypes() on the df before calling predict, but maybe there are situations where we don't want that. Do you have any idea @merve?
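For what it's worth, a quick sketch of how convert_dtypes behaves, which illustrates why blanket coercion needs assumptions: it only upgrades object columns whose values are already numeric, while numeric strings stay strings rather than becoming floats:

```python
import pandas as pd

# object column that actually holds floats -> nullable Float64
nums = pd.DataFrame({"x1": [1.5, 2.0]}).astype(object)
print(nums.convert_dtypes().dtypes["x1"])   # Float64

# object column of numeric *strings* becomes a string dtype, not a float one
strs = pd.DataFrame({"x1": ["1.5", "2.0"]})
print(strs.convert_dtypes().dtypes["x1"])
```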
@uisikdag There is not much you can do right now until we fix it on the inference API side. If you really need it right now, you could create an sklearn Pipeline with a FunctionTransformer as the first step that calls the mentioned method on the df, and with your xgboost model as the second step.
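A runnable sketch of that workaround, assuming sklearn is available; the to_float helper is my own illustration, and in the real pipeline the second step would be the pickled XGBClassifier:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import FunctionTransformer

def to_float(X):
    # Coerce whatever the inference API sends (object/str values) to float64
    return np.asarray(X, dtype=np.float64)

coerce = FunctionTransformer(to_float)

# Inference-API-style input: an all-object frame
X = pd.DataFrame({"x1": ["1.5", "2.0"], "x2": ["0.1", "0.2"]}, dtype=object)
out = coerce.fit_transform(X)
print(out.dtype)   # float64
```

In the full setup this would be the first step, e.g. Pipeline([('transform', coerce), ('classifier', XGBClassifier())]).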
I can work on this I think, it’s a bit tricky to convert dtypes on inference API side though as it requires to make assumptions.
@BenjaminB thank you so much for your suggestion. For the moment,
pipe = Pipeline([('transform', FunctionTransformer(np.nan_to_num)),('classifier', XGBClassifier())])
pipe = Pipeline([('transform', FunctionTransformer(np.float64)),('classifier', XGBClassifier())])
seems to be working, but I will be very happy to be notified if you change the inference API.
Your help is very much appreciated