Hi,
I posted this to the beginner's forum and unfortunately got no reply.
When I train and upload a pickled xgboost model (with default hyperparameters) and use the default settings for the HF Hosted Inference API in Gradio:
iface = gr.Interface.load(…)
iface.launch()
I get this error from Gradio:
DataFrame.dtypes for data must be int, float, bool or category. When categorical type is supplied, The experimental DMatrix parameterenable_categorical must be set to True. Invalid columns:x1: object, x2: object
I think the HF Hosted Inference API passes the values as object, whereas xgboost expects float values. Any idea on how to deal with this on my (client) side would be much appreciated.
It seems like xgboost "remembers" the dtypes of its inputs at training time and raises an error if they don't match at inference time. The inference widget does not seem to perform any coercion, resulting in "object" dtype.
@BenjaminB thank you very much for your reply. On exactly that assumption, I trained xgboost with a pandas DataFrame (float dtypes), then converted the df to a numpy array and re-trained. Unfortunately, both gave the same error. The current model on HF is the one trained with a pandas DataFrame with float-typed input variables…
Could you please try calling predict on your local model (trained with the df), with the input being a df whose dtypes are all object? If that works, then the missing coercion cannot be the only cause.
@BenjaminB xgboost only allowed me to convert to int, float, boolean or category. As the nearest thing to object, I converted all columns of the X df (x1, x2, x3) to category:
X = X.astype("category")
and trained with
model = XGBClassifier(tree_method="gpu_hist", enable_categorical=True)
Unfortunately, the problem seems to persist…
The latest version of the model, trained with this option, is at
uisikdag/simple_clasi_okl25
as a note:
print(X.dtypes)
Temp category
CO2 category
Hum category
dtype: object
I downloaded your model and created a data frame to predict with. It worked as long as the df used numerical dtypes, but when I cast it to object, it failed with the same error message as you got. So the problem is most likely that the inference API passes the data as object dtype.
One quick solution would be for it to call df.convert_dtypes() on the df before calling predict, but maybe there are situations where we don't want that. Do you have any idea, @merve?
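For reference, a small pandas-only sketch of what that coercion does, using made-up values. It compares `convert_dtypes()` with a plain `astype("float64")` cast on an object-dtype frame, and also shows the case where the values arrive as strings, which is an assumption about what an API might send rather than something confirmed in this thread:

```python
import pandas as pd

# Floats stored in object-dtype columns, as in the failing case.
df_obj = pd.DataFrame({"x1": [1.5, 2.5], "x2": [3.25, 4.75]}).astype(object)

# convert_dtypes() infers the best pandas extension dtype per column.
converted = df_obj.convert_dtypes()

# A stricter alternative: force plain NumPy float64.
as_float = df_obj.astype("float64")

# If the values were strings instead, convert_dtypes() would keep them
# as a string dtype rather than turning them into numbers.
from_strings = pd.DataFrame({"x1": ["1.5", "2.5"]}).convert_dtypes()

print(df_obj.dtypes)
print(converted.dtypes)
print(as_float.dtypes)
print(from_strings.dtypes)
```

Note that `convert_dtypes()` returns pandas nullable extension dtypes rather than NumPy ones, and whether xgboost accepts those may depend on the installed version.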
@uisikdag There is not much you can do right now until we fix it on the inference API side. If you really need it right now, you could create an sklearn Pipeline with a FunctionTransformer as a first step that calls the mentioned method on the df, and with your xgboost model as second step.