Detailed Problem Summary
- Environment: Google Colab (Pro Version using a V100) for training.
- Tool: Utilizing Hugging Face AutoTrain for fine-tuning a language model.
Sequence of Events:
- Initial Training:
- Successfully trained a model using AutoTrain.
- Process seemingly completed without errors, resulting in several output files.
- Despite successful training, noticed that the config.json file was not generated.
- Without config.json, the trained model cannot be loaded for inference or further training.
- Manual Configuration:
- Created a config.json manually, based on the base model used for fine-tuning (NousResearch/Llama-2-7b-chat-hf) plus additional training and adapter parameters derived from the files AutoTrain uploads to the HF repository for the fine-tuned model.
- Uploaded this config.json to the Hugging Face repository where the model resides.
- Upload to Repository:
- Uploaded all relevant files, including adapter_model.bin and others, to a Hugging Face repository.
- Model Loading Error:
- Attempted to load the model and encountered the following error:
OSError: Kabatubare/meta_douglas_2 does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt, or flax_model.msgpack.
- File Size Anomaly:
- Noticed that the uploaded pytorch_model.bin is only 888 bytes, far smaller than is typical for such files.
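
An 888-byte file cannot hold the weights of a 7B-parameter model, so a quick first step is to look at what the file actually contains. The sketch below is one possible check, not a diagnosis: it tests whether a local copy of the file is a Git LFS pointer (a short text stub that stands in for the real blob) or simply a very small file. The function name `inspect_small_file` and the 1 MB threshold are hypothetical choices made for illustration.

```python
from pathlib import Path

# First line of a Git LFS pointer file, as defined by the Git LFS pointer format.
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def inspect_small_file(path, min_bytes=1_000_000):
    """Return a short verdict about a suspiciously small weights file.

    min_bytes is an arbitrary illustrative threshold, not an official limit.
    """
    p = Path(path)
    size = p.stat().st_size
    # Read only the first few bytes so large files are not loaded into memory.
    with p.open("rb") as fh:
        head = fh.read(len(LFS_HEADER))
    if head == LFS_HEADER:
        return "lfs-pointer"  # text stub; the real blob is stored elsewhere
    if size < min_bytes:
        return "too-small"    # a real file, but far too small for model weights
    return "plausible"
```

Calling `inspect_small_file("pytorch_model.bin")` on a downloaded copy would show whether the 888 bytes are a pointer stub or a near-empty file.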
Repository File Structure:
- pytorch_model.bin (888 bytes, suspected to be incorrect or incomplete)
- Tokenizer files
- Training parameters
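
To make the file listing above checkable, the sketch below pulls file names and sizes from the Hub and flags weight files that look too small. It assumes the `huggingface_hub` package is installed. The helper names, the list of weight file names, and the 10 kB threshold are illustrative assumptions. Sharded checkpoints are not handled.

```python
FULL_WEIGHTS = ("pytorch_model.bin", "model.safetensors")
ADAPTER_WEIGHTS = ("adapter_model.bin", "adapter_model.safetensors")


def fetch_repo_files(repo_id):
    """Return {filename: size_in_bytes} for a model repo on the Hub (needs network)."""
    from huggingface_hub import HfApi  # imported lazily; only needed for this call

    info = HfApi().model_info(repo_id, files_metadata=True)
    return {s.rfilename: s.size for s in info.siblings}


def classify_repo(files, tiny_bytes=10_000):
    """Summarise a {filename: size} mapping.

    tiny_bytes is an arbitrary threshold chosen for illustration.
    """
    weight_names = FULL_WEIGHTS + ADAPTER_WEIGHTS
    tiny = sorted(
        name for name in weight_names
        if files.get(name) is not None and files[name] < tiny_bytes
    )
    return {
        "has_config": "config.json" in files,
        "has_adapter_weights": any(n in files for n in ADAPTER_WEIGHTS),
        "tiny_weight_files": tiny,
    }
```

For example, `classify_repo(fetch_repo_files("Kabatubare/meta_douglas_2"))` would report whether the repository holds adapter weights and which weight files fall under the threshold.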
Specific Questions for the Hugging Face Community:
- Configuration File: Why is config.json not generated by AutoTrain by default? Is there a specific setting or flag that needs to be enabled to output this file?
- File Size Issue:
- What could cause pytorch_model.bin to be so small (888 bytes)?
- Could this be a symptom of an incomplete or failed save operation?
- Manual Configuration:
- Are there standard procedures or checks to verify that a manually created config.json is correct?
- Are there tools to validate the config.json against the actual PyTorch model file?
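
One simple check for a hand-written config.json, assuming the fine-tune did not change the architecture, is to diff it against the base model's configuration. The sketch below works on plain dictionaries. The function name `diff_configs` and the default list of ignored keys are assumptions made for illustration.

```python
import json


def diff_configs(base, manual, ignore=("_name_or_path", "transformers_version")):
    """Return {key: (base_value, manual_value)} for keys that differ.

    A key present on only one side is reported with "<missing>" on the other.
    """
    differences = {}
    for key in sorted(set(base) | set(manual)):
        if key in ignore:
            continue
        base_value = base.get(key, "<missing>")
        manual_value = manual.get(key, "<missing>")
        if base_value != manual_value:
            differences[key] = (base_value, manual_value)
    return differences


def load_json(path):
    """Read a config file from disk into a dictionary."""
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)
```

The base dictionary could come from `AutoConfig.from_pretrained("NousResearch/Llama-2-7b-chat-hf").to_dict()` in `transformers`, and the manual one from `load_json("config.json")`. Extra keys in the manual file, such as adapter or training parameters, would appear as present on one side only.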
- Error Resolution:
- How can the OSError encountered while loading the model be resolved?
- Are there specific requirements for the directory structure when loading models from a Hugging Face repository?
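
Since the repository contains adapter_model.bin, one plausible reading is that AutoTrain saved a PEFT adapter instead of full model weights. If so, the usual pattern is to load the base model first and attach the adapter on top. The sketch below assumes that reading is correct, that `transformers` and `peft` are installed, and that the repository also holds an adapter_config.json, which `peft` needs and which the post does not confirm. The function name `load_finetuned` is hypothetical.

```python
BASE_ID = "NousResearch/Llama-2-7b-chat-hf"   # base model named in the post
ADAPTER_ID = "Kabatubare/meta_douglas_2"      # repository named in the error message


def load_finetuned(base_id=BASE_ID, adapter_id=ADAPTER_ID):
    """Load the base model, then attach the adapter weights from adapter_id.

    Assumes adapter_id points at a PEFT adapter repository.
    """
    # Lazy imports keep this module importable without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base_model, adapter_id)
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    return model, tokenizer
```

With this pattern the adapter repository does not need its own config.json or pytorch_model.bin, because the architecture and full weights come from the base model.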
- Model Integrity:
- Given the missing config.json and the small size of pytorch_model.bin, are there steps to verify the integrity of the trained model?