Then, I tried to avoid saving through model.base_model.save_pretrained().
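(For reference, the usual way to avoid model.base_model.save_pretrained() when a separate LoRA adapter is involved is to merge the adapter into the base weights with PEFT's merge_and_unload() and save the merged model; a minimal sketch, where the base/adapter ids are illustrative, not necessarily the actual ALMA-7B-R layout:)
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM
# Illustrative ids; adjust to the actual base model and LoRA adapter being used
base = AutoModelForCausalLM.from_pretrained("haoranxu/ALMA-7B-Pretrain", torch_dtype=torch.float16, device_map="auto")
peft_model = PeftModel.from_pretrained(base, "haoranxu/ALMA-7B-Pretrain-LoRA")
merged = peft_model.merge_and_unload()   # fold the LoRA deltas into the base weights
merged.save_pretrained("merged-model")   # save the merged model, not peft_model.base_model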
Step 0: Remove the previously saved local model directory
! rm -rf "alvations/ALMA-7B-R"
Step 1: Save the model + tokenizer without going through model.base_model
import torch
from transformers import AutoModelForCausalLM
from transformers import AutoTokenizer
# Load base model and LoRA weights
model = AutoModelForCausalLM.from_pretrained("haoranxu/ALMA-7B-R", torch_dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("haoranxu/ALMA-7B-R", padding_side='left')
# Add the source sentence into the prompt template
prompt="Translate this from Chinese to English:\nChinese: 我爱机器翻译。\nEnglish:"
input_ids = tokenizer(prompt, return_tensors="pt", padding=True, max_length=40, truncation=True).input_ids.cuda()
# Translation
with torch.no_grad():
    generated_ids = model.generate(input_ids=input_ids, num_beams=5, max_new_tokens=20, do_sample=True, temperature=0.6, top_p=0.9)
outputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
print(outputs)
model.save_pretrained("alvations/ALMA-7B-R")
tokenizer.save_pretrained("alvations/ALMA-7B-R")
import os
os._exit(00)  # hard-restart the Colab runtime to free GPU memory before the next step
[out]:
['Translate this from Chinese to English:\nChinese: 我爱机器翻译。\nEnglish: I love machine translation.']
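(Sanity check of what save_pretrained() wrote; one would expect config.json, generation_config.json, the sharded *.safetensors weights plus their index, and the tokenizer files:)
! ls alvations/ALMA-7B-R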
Step 2: Load the model + tokenizer from the local directory
It seems to work now.
import torch
from transformers import AutoModelForCausalLM
from transformers import AutoTokenizer
model = AutoModelForCausalLM.from_pretrained(
"alvations/ALMA-7B-R",
local_files_only=True,
torch_dtype=torch.float16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(
"alvations/ALMA-7B-R",
local_files_only=True
)
# Add the source sentence into the prompt template
prompt="Translate this from Chinese to English:\nChinese: 我爱机器翻译。\nEnglish:"
input_ids = tokenizer(prompt, return_tensors="pt", padding=True, max_length=40, truncation=True).input_ids.cuda()
# Translation
with torch.no_grad():
    generated_ids = model.generate(input_ids=input_ids, num_beams=5, max_new_tokens=20, do_sample=True, temperature=0.6, top_p=0.9)
outputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
print(outputs)
import os
os._exit(00)
[out]:
['Translate this from Chinese to English:\nChinese: 我爱机器翻译。\nEnglish: I love machine translation.']
Step 3a: Make sure HF_HUB_OFFLINE=1 works and throws an error for a model not found locally
! HF_HUB_OFFLINE=1 python -c 'from transformers import AutoTokenizer; tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-600M")'
[out]:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1238, in hf_hub_download
metadata = get_hf_file_metadata(
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
return fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1631, in get_hf_file_metadata
r = _request_wrapper(
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 385, in _request_wrapper
response = _request_wrapper(
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 408, in _request_wrapper
response = get_session().request(method=method, url=url, **params)
File "/usr/local/lib/python3.10/dist-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.10/dist-packages/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_http.py", line 78, in send
raise OfflineModeIsEnabled(
huggingface_hub.utils._http.OfflineModeIsEnabled: Cannot reach https://huggingface.co/facebook/nllb-200-distilled-600M/resolve/main/config.json: offline mode is enabled. To disable it, please unset the `HF_HUB_OFFLINE` environment variable.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py", line 398, in cached_file
resolved_file = hf_hub_download(
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
return fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1371, in hf_hub_download
raise LocalEntryNotFoundError(
huggingface_hub.utils._errors.LocalEntryNotFoundError: An error happened while trying to locate the file on the Hub and we cannot find the requested files in the local cache. Please check your connection and try again or make sure your Internet connection is on.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py", line 782, in from_pretrained
config = AutoConfig.from_pretrained(
File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py", line 1111, in from_pretrained
config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py", line 633, in get_config_dict
config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py", line 688, in _get_config_dict
resolved_config_file = cached_file(
File "/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py", line 441, in cached_file
raise EnvironmentError(
OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like facebook/nllb-200-distilled-600M is not the path to a directory containing a file named config.json.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
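(The same offline check can also be done from within Python, as long as the variable is set before huggingface_hub/transformers are imported, since it is read at import time; a minimal sketch:)
import os
os.environ["HF_HUB_OFFLINE"] = "1"  # must be set before the imports below
from transformers import AutoTokenizer
# Expected to fail with OfflineModeIsEnabled / a local cache error for an uncached model
tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-600M")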
Step 3b: Run Step 2 with HF_HUB_OFFLINE=1
This seems to work with the local model directory saved using .save_pretrained(...)
%%writefile test.py
import torch
from transformers import AutoModelForCausalLM
from transformers import AutoTokenizer
model = AutoModelForCausalLM.from_pretrained(
"alvations/ALMA-7B-R",
local_files_only=True,
torch_dtype=torch.float16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(
"alvations/ALMA-7B-R",
local_files_only=True
)
# Add the source sentence into the prompt template
prompt="Translate this from Chinese to English:\nChinese: 我爱机器翻译。\nEnglish:"
input_ids = tokenizer(prompt, return_tensors="pt", padding=True, max_length=40, truncation=True).input_ids.cuda()
# Translation
with torch.no_grad():
    generated_ids = model.generate(input_ids=input_ids, num_beams=5, max_new_tokens=20, do_sample=True, temperature=0.6, top_p=0.9)
outputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
print(outputs)
Then:
! HF_HUB_OFFLINE=1 python test.py
[out]:
['Translate this from Chinese to English:\nChinese: 我爱机器翻译。\nEnglish: I love machine translation.']
Step 4a: Now let's push that model to the Hugging Face Hub
from huggingface_hub import notebook_login
notebook_login()
import torch
from transformers import AutoModelForCausalLM
from transformers import AutoTokenizer
model = AutoModelForCausalLM.from_pretrained(
"alvations/ALMA-7B-R",
local_files_only=True,
torch_dtype=torch.float16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(
"alvations/ALMA-7B-R",
local_files_only=True
)
model.push_to_hub("alvations/ALMA-7B-R-remerged")
tokenizer.push_to_hub("alvations/ALMA-7B-R-remerged")
import os
os._exit(00)
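(An alternative sketch that avoids reloading the weights into GPU memory just to push them: upload the already-saved directory directly with huggingface_hub:)
from huggingface_hub import HfApi
api = HfApi()
api.create_repo("alvations/ALMA-7B-R-remerged", exist_ok=True)
# Uploads config, safetensors shards and tokenizer files from disk as-is
api.upload_folder(folder_path="alvations/ALMA-7B-R", repo_id="alvations/ALMA-7B-R-remerged", repo_type="model")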
Step 4b: Re-download the model from the HF Hub to a local cache directory using snapshot_download()
from huggingface_hub import snapshot_download
snapshot_download("alvations/ALMA-7B-R-remerged", cache_dir="mynewcachedir")
Step 4c: (With Internet) Reload the model not from the local directory but from the cache_dir
! rm -rf alvations/*
Then:
import torch
from transformers import AutoModelForCausalLM
from transformers import AutoTokenizer
model = AutoModelForCausalLM.from_pretrained(
"alvations/ALMA-7B-R-remerged",
local_files_only=True,
cache_dir="mynewcachedir",
torch_dtype=torch.float16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(
"alvations/ALMA-7B-R-remerged",
cache_dir="mynewcachedir",
local_files_only=True
)
# Add the source sentence into the prompt template
prompt="Translate this from Chinese to English:\nChinese: 我爱机器翻译。\nEnglish:"
input_ids = tokenizer(prompt, return_tensors="pt", padding=True, max_length=40, truncation=True).input_ids.cuda()
# Translation
with torch.no_grad():
    generated_ids = model.generate(input_ids=input_ids, num_beams=5, max_new_tokens=20, do_sample=True, temperature=0.6, top_p=0.9)
outputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
print(outputs)
import os
os._exit(00)
[out]:
['Translate this from Chinese to English:\nChinese: 我爱机器翻译。\nEnglish: I love machine translation.']
Step 5: (Without Internet) Reload the model not from the local directory but from the cache_dir, with HF_HUB_OFFLINE=1
%%writefile test2.py
import torch
from transformers import AutoModelForCausalLM
from transformers import AutoTokenizer
model = AutoModelForCausalLM.from_pretrained(
"alvations/ALMA-7B-R-remerged",
local_files_only=True,
cache_dir="mynewcachedir",
torch_dtype=torch.float16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(
"alvations/ALMA-7B-R-remerged",
cache_dir="mynewcachedir",
local_files_only=True
)
# Add the source sentence into the prompt template
prompt="Translate this from Chinese to English:\nChinese: 我爱机器翻译。\nEnglish:"
input_ids = tokenizer(prompt, return_tensors="pt", padding=True, max_length=40, truncation=True).input_ids.cuda()
# Translation
with torch.no_grad():
    generated_ids = model.generate(input_ids=input_ids, num_beams=5, max_new_tokens=20, do_sample=True, temperature=0.6, top_p=0.9)
outputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
print(outputs)
Then:
! HF_HUB_OFFLINE=1 python test2.py
[out]:
['Translate this from Chinese to English:\nChinese: 我爱机器翻译。\nEnglish: I love machine translation.']
Then we try:
! HF_HUB_OFFLINE=1 TRANSFORMERS_OFFLINE=1 python test2.py
[out]:
['Translate this from Chinese to English:\nChinese: 我爱机器翻译。\nEnglish: I love machine translation.']
Q: Then why did we initially get different errors, with Hugging Face reaching out to the Hub to resolve the safetensors files?
A: Maybe different transformers and tokenizers versions?
FYI, all of the above in this comment is from Colab with these versions from pip freeze:
transformers==4.38.2
tokenizers==0.15.2
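(To compare environments quickly, the relevant versions can also be printed from Python:)
import transformers, tokenizers, huggingface_hub
print(transformers.__version__, tokenizers.__version__, huggingface_hub.__version__)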