Can the T5 model classify codes such as codebert-small-v1?

kirilinko · April 27, 2025, 10:03am

Hello.
I’m doing code classification with codebert-small-v1, but its maximum sequence length is 512 tokens, which may limit me when the code to classify is long. On the other hand, I’ve noticed that T5 allows a longer maximum sequence. Is it possible to use the T5 model for this sort of code classification and get the same output as codebert-small-v1, in the sense that I get the probability of each vulnerability class appearing in the code?
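
For reference, a common workaround for the 512-token limit, whichever model is used, is to split the token sequence into overlapping windows, classify each window, and combine the per-window class probabilities. This is only a minimal sketch: `sliding_windows` and `average_probabilities` are hypothetical helper names, the stride of 256 is an arbitrary choice, and the per-window probabilities are assumed to come from the existing classifier.

```python
def sliding_windows(token_ids, max_len=512, stride=256):
    """Split a token-id sequence into overlapping windows of at most max_len.

    Assumption: max_len already leaves room for any special tokens the
    model adds (for a 512-token model that usually means a slightly
    smaller window). stride should not exceed max_len, otherwise tokens
    between windows are skipped.
    """
    if max_len <= 0 or stride <= 0:
        raise ValueError("max_len and stride must be positive")
    if len(token_ids) <= max_len:
        return [list(token_ids)]

    windows = []
    start = 0
    while True:
        windows.append(list(token_ids[start:start + max_len]))
        if start + max_len >= len(token_ids):
            break  # the last window reached the end of the sequence
        start += stride
    return windows


def average_probabilities(per_window_probs):
    """Average class probabilities over windows (one list per window)."""
    n_windows = len(per_window_probs)
    n_classes = len(per_window_probs[0])
    return [
        sum(probs[c] for probs in per_window_probs) / n_windows
        for c in range(n_classes)
    ]


if __name__ == "__main__":
    windows = sliding_windows(list(range(1000)))
    print([len(w) for w in windows])
    # Placeholder probabilities standing in for real classifier outputs.
    print(average_probabilities([[0.2, 0.8], [0.6, 0.4]]))
```

Averaging is one possible way to combine windows; for vulnerability detection, taking the maximum "vulnerable" probability over windows may be a better fit, since a single vulnerable window is enough.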

John6666 · April 27, 2025, 10:27am

I’m not familiar with it, but it seems possible.

kirilinko · April 28, 2025, 9:12am

But I’m a bit surprised: when I try to classify with “TFAutoModelForSequenceClassification”, I get an error telling me that the T5 model is not compatible. With codeBert small, however, there is no problem. I want to try another model because my predictions lack performance. My current model classifies the code well by CWE (around 8 classes), but not when it has to decide whether the code is vulnerable (only two classes). Do you have any idea what to do?

John6666 · April 28, 2025, 12:50pm

Hmm…

github.com/huggingface/transformers

Problem running T5 (configuration) with text classification

opened 10:14PM - 25 Feb 21 UTC

closed 05:13PM - 26 Feb 21 UTC

ioana-blue

## Environment info

- `transformers` version: 4.3.2
- Platform: Linux-4.18….0-193.el8.x86_64-x86_64-with-glibc2.10
- Python version: 3.8.3
- PyTorch version (GPU?): 1.5.1+cu101 (True)
- Tensorflow version (GPU?): not installed (NA)
- Using GPU in script?: yes
- Using distributed or parallel set-up in script?: single gpu

### Who can help

Perhaps @patrickvonplaten, @patil-suraj could help?

## Information

Model I am using (Bert, XLNet ...): T5

The problem arises when using:

* [ ] the official example scripts: (give details below)
* [x] my own modified scripts: (give details below)

The tasks I am working on is:

* [ ] an official GLUE/SQUaD task: (give the name)
* [x] my own task or dataset: (give details below)

## To reproduce

I'm trying to run the T5 base model. It seems that I use the correct model path (i.e., t5-base) and it finds and downloads the model, but crashes when it tries to instantiate it. The problem seems to be around the configuration class not being found. This is what I get:

```
  File "../../../models/tr-4.3.2/run_puppets.py", line 279, in main
    model = AutoModelForSequenceClassification.from_pretrained(
  File "/dccstor/redrug_ier/envs/last-tr/lib/python3.8/site-packages/transformers/models/auto/modeling_auto.py", line 1362, in from_pretrained
    raise ValueError(
ValueError: Unrecognized configuration class <class 'transformers.models.t5.configuration_t5.T5Config'> for this kind of AutoModel: AutoModelForSequenceClassification.
Model type should be one of ConvBertConfig, LEDConfig, DistilBertConfig, AlbertConfig, CamembertConfig, XLMRobertaConfig, MBartConfig, BartConfig, LongformerConfig, RobertaConfig, SqueezeBertConfig, LayoutLMConfig, BertConfig, XLNetConfig, MobileBertConfig, FlaubertConfig, XLMConfig, ElectraConfig, FunnelConfig, DebertaConfig, GPT2Config, OpenAIGPTConfig, ReformerConfig, CTRLConfig, TransfoXLConfig, MPNetConfig, TapasConfig.
```

I dig a bit and I may have a hunch why this happens.

The config file is there: https://github.com/huggingface/transformers/blob/master/src/transformers/models/t5/configuration_t5.py#L32

but it's not recorded here: https://github.com/huggingface/transformers/blob/master/src/transformers/models/auto/modeling_auto.py#L514

So the check here fails: https://github.com/huggingface/transformers/blob/master/src/transformers/models/auto/modeling_auto.py#L1389

And the ValueError is raised. I hope this is it. It looks like an easy fix :) Thanks!

PS: I'm running the same scripts/files with other models without problems. This seems to be something specific to T5.

Even though T5 can be used very well for text classification, it remains a text-to-text only model. So you can only load the model via
from transformers import AutoModelForSeq2SeqLM
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
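
To get a probability per class out of a text-to-text model, one option is to score each candidate label string as a target sequence and normalise the scores with a softmax. The sketch below is a rough illustration, not a tested recipe: it assumes a T5 checkpoint that has already been fine-tuned to emit the label names (an off-the-shelf `t5-small` will not give meaningful scores), and `label_probabilities`, `score_labels_with_t5`, and the label strings are hypothetical names chosen for the example.

```python
import math


def label_probabilities(log_likelihoods):
    """Softmax over per-label log-likelihoods, returned as {label: prob}."""
    peak = max(log_likelihoods.values())  # subtract the max for stability
    exps = {label: math.exp(ll - peak) for label, ll in log_likelihoods.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


def score_labels_with_t5(code, labels, model_name="t5-small"):
    """Score each label string as a T5 target and return class probabilities.

    Assumption: model_name points to a checkpoint fine-tuned so that the
    target text is one of the label strings.
    """
    # Imported lazily so the softmax helper works without these packages.
    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    model.eval()

    inputs = tokenizer(code, return_tensors="pt", truncation=True)
    log_likelihoods = {}
    with torch.no_grad():
        for label in labels:
            target_ids = tokenizer(label, return_tensors="pt").input_ids
            output = model(**inputs, labels=target_ids)
            # output.loss is the mean negative log-likelihood per target
            # token, so multiply by the target length to get the total.
            log_likelihoods[label] = -output.loss.item() * target_ids.shape[1]
    return label_probabilities(log_likelihoods)


if __name__ == "__main__":
    # Placeholder scores standing in for real model log-likelihoods.
    print(label_probabilities({"vulnerable": -1.0, "safe": -3.0}))
```

Recent PyTorch releases of transformers also appear to include a `T5ForSequenceClassification` class, so it may be worth checking the installed version before going the text-to-text route.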

kirilinko · April 30, 2025, 11:23am

Thank you!

