Missing config.json file after AutoTrain fine-tuning in Colab (Colab Pro account, V100)

Why is a config.json not generated by AutoTrain by default? Is there a specific setting or flag that needs to be enabled to output this file?
File Size Issue:
What could cause pytorch_model.bin to be so small (888 bytes)?
Could this be a symptom of an incomplete or failed save operation?
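One way to tell whether the 888-byte file is a failed save is to look at what it actually contains. The sketch below is a heuristic, not an official tool. It assumes that a Git LFS pointer starts with the standard `version https://git-lfs.github.com/spec/v1` line, and that files written by `torch.save` (zip format by default since PyTorch 1.6) are zip archives with one entry under a `data/` folder per tensor storage. `describe_checkpoint` is a hypothetical helper name.

```python
import zipfile
from pathlib import Path

# First line of every Git LFS pointer file.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def describe_checkpoint(path):
    """Return a short, heuristic description of what a .bin file contains."""
    p = Path(path)
    size = p.stat().st_size
    with p.open("rb") as fh:
        head = fh.read(len(LFS_POINTER_PREFIX))
    # A pointer means only the LFS stub was stored, not the actual weights.
    if head == LFS_POINTER_PREFIX:
        return f"{size} bytes: Git LFS pointer, weights not actually stored"
    # torch.save zip archives keep one entry per tensor storage under .../data/
    if zipfile.is_zipfile(p):
        with zipfile.ZipFile(p) as zf:
            storages = [n for n in zf.namelist() if "/data/" in n]
        return f"{size} bytes: torch zip archive with {len(storages)} tensor storage entries"
    return f"{size} bytes: unrecognized format"
```

A pointer result would mean the weights never reached storage. An archive with zero storage entries would suggest an (almost) empty state dict was saved, which is consistent with a run whose real trained weights live in adapter_model.bin instead.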
Manual Configuration:
Are there standard procedures or checks to verify that a manually created config.json is accurate?
Are there tools to validate the config.json against the actual PyTorch model file?
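A minimal sanity check for a hand-written config.json could look like the sketch below. The required-key list is an assumed minimum for a Llama config, not the full schema, and `check_config` is a hypothetical name. It confirms the core architecture keys exist, that every id in added_tokens.json fits inside `vocab_size`, and that LoRA settings such as `r` or `lora_alpha` from adapter_config.json were not copied into config.json, since those describe the adapter rather than the base architecture.

```python
import json
from pathlib import Path

# Assumed minimum set of architecture keys for a Llama config.json (not the full schema).
REQUIRED_LLAMA_KEYS = ("model_type", "hidden_size", "intermediate_size",
                       "num_attention_heads", "num_hidden_layers", "vocab_size")


def check_config(model_dir):
    """Return a list of problems found in model_dir/config.json; [] means none detected."""
    d = Path(model_dir)
    config = json.loads((d / "config.json").read_text())
    problems = [f"missing key: {k}" for k in REQUIRED_LLAMA_KEYS if k not in config]
    if "model_type" in config and config["model_type"] != "llama":
        problems.append(f"unexpected model_type: {config['model_type']!r}")
    # Every id in added_tokens.json must index a row of the embedding matrix.
    added = d / "added_tokens.json"
    if added.exists() and "vocab_size" in config:
        max_id = max(json.loads(added.read_text()).values(), default=-1)
        if max_id >= config["vocab_size"]:
            problems.append(f"added token id {max_id} >= vocab_size {config['vocab_size']}")
    # LoRA settings (r, lora_alpha, ...) describe the adapter, not the base architecture.
    adapter = d / "adapter_config.json"
    if adapter.exists():
        overlap = set(json.loads(adapter.read_text())) & set(config)
        if overlap:
            problems.append(f"adapter key(s) in config.json: {sorted(overlap)}")
    return problems
```

The most direct reference is the base model's own config.json (NousResearch/Llama-2-7b-chat-hf); a key-by-key comparison against that file catches anything this short list misses.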
Error Resolution:
How can I resolve the OSError I get when loading the model?
Are there specific requirements for the directory structure when loading models from a Hugging Face repository?
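Which loader applies depends on what the repository actually holds. The sketch below (hypothetical helper `loading_plan`; the weight-file list is limited to the PyTorch and safetensors names `transformers` looks for, plus their sharded-index variants) prefers the adapter files when both layouts are present, because in an adapter-style output a tiny pytorch_model.bin is unlikely to be the real training result.

```python
import json
from pathlib import Path

# PyTorch / safetensors weight names (single-file and sharded index) that transformers checks for.
FULL_WEIGHT_FILES = {"pytorch_model.bin", "model.safetensors",
                     "pytorch_model.bin.index.json", "model.safetensors.index.json"}
# Weight files written by PEFT for an adapter.
ADAPTER_WEIGHT_FILES = {"adapter_model.bin", "adapter_model.safetensors"}


def loading_plan(model_dir):
    """Return ("adapter", base_model_id) or ("full", None); raise if nothing is loadable."""
    d = Path(model_dir)
    files = {p.name for p in d.iterdir()}
    # Adapter layout: adapter_config.json names the base model the adapter was trained on.
    if "adapter_config.json" in files and files & ADAPTER_WEIGHT_FILES:
        adapter_cfg = json.loads((d / "adapter_config.json").read_text())
        return ("adapter", adapter_cfg.get("base_model_name_or_path"))
    # Full-model layout: config.json plus at least one recognized weight file.
    if "config.json" in files and files & FULL_WEIGHT_FILES:
        return ("full", None)
    raise FileNotFoundError(f"no loadable weights in {d}: {sorted(files)}")
```

If it returns `("adapter", base_id)`, the usual route with the `peft` library is to load the base model with `AutoModelForCausalLM.from_pretrained(base_id)` and wrap it with `PeftModel.from_pretrained(model, "Kabatubare/meta_douglas_2")`. Calling `merge_and_unload()` on that result and then `save_pretrained(...)` would write a standalone model that includes its own config.json.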
Model Integrity:
Given the missing config.json and the small size of pytorch_model.bin, are there steps to verify the integrity of the trained model?
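For integrity, a per-file manifest (size plus SHA-256) of the local AutoTrain output can be compared against the SHA-256 the Hub lists for LFS-tracked files. The sketch below also runs `zipfile.ZipFile.testzip()` on zip-format `.bin` files, which checks every entry's CRC, and flags weight files (names containing "model") below an arbitrary 1 MB threshold. `manifest` and `min_weight_bytes` are hypothetical names, and the threshold is an assumption.

```python
import hashlib
import zipfile
from pathlib import Path


def manifest(model_dir, min_weight_bytes=1_000_000):
    """Map each file in model_dir to (size_in_bytes, sha256_hex, note)."""
    report = {}
    for p in sorted(Path(model_dir).iterdir()):
        if not p.is_file():
            continue
        # Hash in 1 MiB chunks so multi-GB checkpoints need not fit in memory.
        digest = hashlib.sha256()
        with p.open("rb") as fh:
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                digest.update(chunk)
        size = p.stat().st_size
        notes = []
        # torch.save zip archives: testzip() returns the first entry with a bad CRC, or None.
        if p.suffix == ".bin" and zipfile.is_zipfile(p):
            with zipfile.ZipFile(p) as zf:
                bad = zf.testzip()
            if bad is not None:
                notes.append(f"corrupt entry {bad}")
        # Heuristic: weight files (names containing "model") should not be tiny.
        if p.suffix in (".bin", ".safetensors") and "model" in p.stem and size < min_weight_bytes:
            notes.append(f"suspiciously small weight file ({size} bytes)")
        report[p.name] = (size, digest.hexdigest(), "; ".join(notes))
    return report
```

Small non-weight files such as training_args.bin are deliberately not flagged, since they are expected to be only a few kilobytes.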

Context:
Environment: Google Colab (Pro version, V100 GPU) for training.
Tool: Hugging Face AutoTrain for fine-tuning a language model.
Sequence of Events:

Initial Training:
Successfully trained a NousResearch/Llama-2-7b-chat-hf model using AutoTrain on a dataset (Kabatubare/frederick).
The process appeared to complete without errors and produced several output files, but config.json was not among them, so the model could not be used.
Missing config.json:
Despite the apparently successful training, I noticed that no config.json file had been generated.
Without config.json, the trained model cannot be loaded for inference or further training.

Manual Configuration:
Created a config.json manually (thanks GPT-4) based on the 'base model' used for fine-tuning (NousResearch/Llama-2-7b-chat-hf), plus additional training and adapter parameters taken from the files AutoTrain uploads to the HF repository for the fine-tuned model.
Uploaded this config.json to the Hugging Face repository where the model resides.
Upload to Repository:
Uploaded all relevant files, including pytorch_model.bin, adapter_config.json, adapter_model.bin, and others, to a Hugging Face repository named Kabatubare/meta_douglas_2.
Model Loading Error:
Attempted to load the model and encountered the following error:
OSError: Kabatubare/meta_douglas_2 does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt, or flax_model.msgpack.

File Size Anomaly:
Noticed that the uploaded pytorch_model.bin is only 888 bytes, far smaller than a full set of weights for a 7B-parameter model, which normally runs to many gigabytes.

The repository contains the following files (config.json only after I added it manually):
adapter_config.json
adapter_model.bin
added_tokens.json
config.json (manually added)
pytorch_model.bin (888 bytes, suspected to be incorrect or incomplete)
Tokenizer files (tokenizer.json, tokenizer.model, etc.)
Training parameters (training_args.bin, training_params.json)
