Hi,
I am trying to use a pre-trained Graphormer, fine-tuned on a custom dataset, for regression. I can pass the data to the Trainer without a problem, but I can't correctly run the test set through the model.
I tried the pipeline snippet suggested on the web:
```python
pipe = pipeline("graph-ml", model="clefourrier/graphormer-base-pcqm4mv2")
```
which throws this error:
```
KeyError Traceback (most recent call last)
Cell In[188], line 4
1 # Use a pipeline as a high-level helper
2 from transformers import pipeline
----> 4 pipe = pipeline("graph-ml", model="clefourrier/graphormer-base-pcqm4mv2")
File ~/opt/anaconda3/envs/trans/lib/python3.8/site-packages/transformers/pipelines/__init__.py:744, in pipeline(task, model, config, tokenizer, feature_extractor, image_processor, framework, revision, use_fast, use_auth_token, device, device_map, torch_dtype, trust_remote_code, model_kwargs, pipeline_class, **kwargs)
740 pipeline_class = get_class_from_dynamic_module(
741 class_ref, model, revision=revision, use_auth_token=use_auth_token
742 )
743 else:
--> 744 normalized_task, targeted_task, task_options = check_task(task)
745 if pipeline_class is None:
746 pipeline_class = targeted_task["impl"]
File ~/opt/anaconda3/envs/trans/lib/python3.8/site-packages/transformers/pipelines/__init__.py:487, in check_task(task)
445 def check_task(task: str) -> Tuple[str, Dict, Any]:
446 """
447 Checks an incoming task string, to validate it's correct and return the default Pipeline and Model classes, and
448 default models if they exist.
(...)
485
486 """
--> 487 return PIPELINE_REGISTRY.check_task(task)
File ~/opt/anaconda3/envs/trans/lib/python3.8/site-packages/transformers/pipelines/base.py:1197, in PipelineRegistry.check_task(self, task)
1194 return task, targeted_task, (tokens[1], tokens[3])
1195 raise KeyError(f"Invalid translation task {task}, use 'translation_XX_to_YY' format")
--> 1197 raise KeyError(
1198 f"Unknown task {task}, available tasks are {self.get_supported_tasks() + ['translation_XX_to_YY']}"
1199 )
KeyError: "Unknown task graph-ml, available tasks are ['audio-classification', 'automatic-speech-recognition', 'conversational', 'depth-estimation', 'document-question-answering', 'feature-extraction', 'fill-mask', 'image-classification', 'image-segmentation', 'image-to-text', 'mask-generation', 'ner', 'object-detection', 'question-answering', 'sentiment-analysis', 'summarization', 'table-question-answering', 'text-classification', 'text-generation', 'text2text-generation', 'token-classification', 'translation', 'video-classification', 'visual-question-answering', 'vqa', 'zero-shot-audio-classification', 'zero-shot-classification', 'zero-shot-image-classification', 'zero-shot-object-detection', 'translation_XX_to_YY']"
```
Then I tried loading a tokenizer together with the model:
```python
from transformers import AutoTokenizer, GraphormerForGraphClassification

tokenizer = AutoTokenizer.from_pretrained("clefourrier/graphormer-base-pcqm4mv2")
model = GraphormerForGraphClassification.from_pretrained("clefourrier/graphormer-base-pcqm4mv2")
```
which throws the following error:
```
KeyError Traceback (most recent call last)
Cell In[187], line 6
2 from transformers import AutoTokenizer, GraphormerForGraphClassification
4 model_checkpoint = "Mpro_pcqm4mv2_trnsformer" # pre-trained model from which to fine-tune
----> 6 tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
7 model = GraphormerForGraphClassification.from_pretrained("clefourrier/graphormer-base-pcqm4mv2")
File ~/opt/anaconda3/envs/trans/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py:718, in AutoTokenizer.from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
716 model_type = config_class_to_model_type(type(config).__name__)
717 if model_type is not None:
--> 718 tokenizer_class_py, tokenizer_class_fast = TOKENIZER_MAPPING[type(config)]
719 if tokenizer_class_fast and (use_fast or tokenizer_class_py is None):
720 return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
File ~/opt/anaconda3/envs/trans/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py:674, in _LazyAutoMapping.__getitem__(self, key)
672 model_name = self._model_mapping[mtype]
673 return self._load_attr_from_module(mtype, model_name)
--> 674 raise KeyError(key)
KeyError: <class 'transformers.models.graphormer.configuration_graphormer.GraphormerConfig'>
```
During training, GraphormerDataCollator() is passed to the Trainer, but I don't know how to pass it as an argument when predicting.
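For context, a data collator is essentially a callable that takes a list of already-preprocessed examples and stacks them into one padded batch, so in principle it can be called directly rather than only through the Trainer. The sketch below is a deliberately simplified, pure-Python illustration of that idea; `collate_node_features` is a hypothetical helper, not the library's implementation, and it only handles the `input_nodes` field by zero-padding every graph to the largest node count in the batch (the real collator also builds `attn_bias`, `spatial_pos`, `input_edges`, and the other fields).

```python
from typing import Dict, List


def collate_node_features(features: List[Dict], pad_value: int = 0) -> List[List[List[int]]]:
    """Hypothetical, simplified collator: pad each graph's `input_nodes`
    (num_nodes x num_node_features) to the largest graph in the batch and
    stack them into a nested list of shape [batch, max_nodes, num_node_features]."""
    max_nodes = max(len(f["input_nodes"]) for f in features)
    num_feats = len(features[0]["input_nodes"][0])
    batch = []
    for f in features:
        # Copy the rows so the caller's example is not modified.
        nodes = [list(row) for row in f["input_nodes"]]
        # Append all-padding rows so every graph has max_nodes rows.
        nodes += [[pad_value] * num_feats for _ in range(max_nodes - len(nodes))]
        batch.append(nodes)
    return batch


# Two toy graphs with 2 and 3 nodes, 2 features per node (made-up values).
graph_a = {"input_nodes": [[1, 2], [3, 4]]}
graph_b = {"input_nodes": [[5, 6], [7, 8], [9, 10]]}

print(collate_node_features([graph_a]))           # [[[1, 2], [3, 4]]]
print(collate_node_features([graph_a, graph_b]))  # graph_a padded with a [0, 0] row
```

With the actual class in transformers 4.31, the corresponding call for one test instance would presumably be `batch = GraphormerDataCollator()([test[0]])` followed by `model(**batch)` inside `torch.no_grad()`. Note the list around `test[0]`: the collator takes a list of examples even for a batch of one. In that version the collator also appears to read each row's `labels` column, so the test rows should keep it.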
I tried the following code to call the model for prediction:
```python
import torch

with torch.no_grad():
    outputs = model(
        input_nodes=input_nodes,
        input_edges=input_edges,
        n_node=num_nodes,
        attn_bias=attn_bias,
        in_degree=in_degree,
        out_degree=out_degree,
        spatial_pos=spatial_pos,
        attn_edge_type=attn_edge_type,
        ignore_mismatched_sizes=True,
    )
```
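One way to narrow down dimension errors like these is to compare shapes: a single preprocessed row stores, for example, `input_nodes` as [num_nodes, num_node_features], whereas the model's forward pass expects a leading batch axis ([batch, num_nodes, num_node_features]), which the collator normally adds. The hypothetical `nested_shape` helper below (not part of transformers) reports the shape of a rectangular nested list, so a row's fields can be inspected before they are turned into tensors; the toy row uses made-up values.

```python
from typing import Any, Tuple


def nested_shape(x: Any) -> Tuple[int, ...]:
    """Hypothetical helper: shape of a rectangular nested list (like numpy's .shape)."""
    shape = []
    while isinstance(x, list):
        shape.append(len(x))
        if not x:  # empty list: cannot descend further
            break
        x = x[0]
    return tuple(shape)


# Toy stand-in for one preprocessed row: 3 nodes, 9 features each (made-up values).
row = {"input_nodes": [[1] * 9 for _ in range(3)]}

print(nested_shape(row["input_nodes"]))    # (3, 9): no batch axis
print(nested_shape([row["input_nodes"]]))  # (1, 3, 9): batch of one
```

Separately, `ignore_mismatched_sizes` is a `from_pretrained` argument (used when loading a checkpoint whose head has a different size), not an input of the forward pass.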
I still get dimension errors, even though I am using the preprocessed test split of my dataset. All splits were processed following the tutorial, like this:
```python
from transformers.models.graphormer.collating_graphormer import preprocess_item, GraphormerDataCollator

train = train_test_valid_dataset["train"].map(preprocess_item, batched=False)
test = train_test_valid_dataset["test"].map(preprocess_item, batched=False)
val = train_test_valid_dataset["valid"].map(preprocess_item, batched=False)
```
Before processing, the test split looks like this:
```
Dataset({
    features: ['edge_index', 'edge_attr', 'node_feat', 'num_nodes', 'y'],
    num_rows: 6
})
```
After processing:
```
Dataset({
    features: ['edge_index', 'edge_attr', 'node_feat', 'num_nodes', 'y', 'input_nodes', 'attn_bias', 'attn_edge_type', 'spatial_pos', 'in_degree', 'out_degree', 'input_edges', 'labels'],
    num_rows: 6
})
```
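The processed dataset keeps the raw columns (`edge_index`, `edge_attr`, `node_feat`, `num_nodes`, `y`) next to the new ones. Judging from the 4.31 collator, only `input_nodes`, `input_edges`, `attn_bias`, `in_degree`, `out_degree`, `spatial_pos`, `attn_edge_type` and `labels` end up as arguments to the model. The sketch below filters a row down to those keys; `MODEL_INPUT_KEYS` and `select_model_inputs` are hypothetical names, and the key list is an assumption based on that version, not a documented contract.

```python
from typing import Any, Dict

# Assumed forward inputs of GraphormerForGraphClassification (transformers 4.31).
MODEL_INPUT_KEYS = (
    "input_nodes", "input_edges", "attn_bias", "in_degree",
    "out_degree", "spatial_pos", "attn_edge_type", "labels",
)


def select_model_inputs(row: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: keep only the keys the model's forward pass consumes."""
    missing = [k for k in MODEL_INPUT_KEYS if k not in row]
    if missing:
        raise KeyError(f"row is missing model inputs: {missing}")
    return {k: row[k] for k in MODEL_INPUT_KEYS}


# Column names copied from the processed test split; values are placeholders.
columns = ["edge_index", "edge_attr", "node_feat", "num_nodes", "y",
           "input_nodes", "attn_bias", "attn_edge_type", "spatial_pos",
           "in_degree", "out_degree", "input_edges", "labels"]
row = {name: None for name in columns}

print(sorted(select_model_inputs(row)))
```

In the real pipeline this filtering happens implicitly, because the collator only reads the fields it needs; the helper just makes that explicit when debugging a single row.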
Could someone kindly give me an example of how to call the model for inference on a single instance? @clefourrier, should I use a tokenizer to pass graphs through a pipeline? Does the test data need to be processed differently?
Specs:
- transformers version: 4.31.0
- Platform: macOS-10.16-x86_64-i386-64bit
- Python version: 3.8.17
- Huggingface_hub version: 0.16.4
- Safetensors version: 0.3.1
- Accelerate version: 0.21.0
- Accelerate config: not found
- PyTorch version (GPU?): 2.0.1 (False)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script? No
- Using distributed or parallel set-up in script? No