Accessing Local Files in Inference Endpoints

I’ve already created an issue on GitHub (Loading local files with Interface Endpoints doesn't work · Issue #2106 · huggingface/huggingface_hub · GitHub), but I’m posting my question here as well.

I’m trying to load a local .pkl file and use it in a Hugging Face Inference Endpoint.

Code

from typing import  Dict, List, Any
import torch
from UniKP.build_vocab import WordVocab
from UniKP.pretrain_trfm import TrfmSeq2seq
from UniKP.utils import split
from transformers import T5EncoderModel, T5Tokenizer
import re
import gc
import numpy as np
import pickle
import math

class EndpointHandler():
	def __init__(self, path=""):
		self.tokenizer = T5Tokenizer.from_pretrained("Rostlab/prot_t5_xl_half_uniref50-enc", do_lower_case=False, torch_dtype=torch.float16)
		self.model = T5EncoderModel.from_pretrained("Rostlab/prot_t5_xl_half_uniref50-enc")

		self.trfm_pkl = torch.load("UniKP_models/trfm.pkl")
		self.vocab = WordVocab.load_vocab("UniKP_models/vocab.pkl")

		self.Km_model_path = "UniKP_models/Km.pkl"
		self.Kcat_model_path = "UniKP_models/Kcat.pkl"
		self.Kcat_over_Km_model_path = "UniKP_models/Kcat_over_Km.pkl"

Problem

For some reason, it can’t find the file named trfm_12_23000.pkl or vocab.pkl.

In the next section, the file structure will be shown.


Project Structure

C:.
│   .gitattributes
│   .gitignore
│   ** handler.py **
│   README.md
│   requirements.txt
│
├───UniKP
│   │
│   │   **build_vocab.py**
│   │    ...
│   │   utils.py
│   │   __init__.py
│   ...
│
│
└───UniKP_models
        Kcat.pkl
        Kcat_over_Km.pkl
        Km.pkl
        **trfm_12_23000.pkl**
        **vocab.pkl**

As you can see, handler.py sits at the repository root and the model files are one level down, in the UniKP_models folder, so the handler has to go into that folder to find them.

Things I’ve tried

  • Changing the path format to: “./UniKP …” and “…/UniKP …”. Neither worked.
  • Using hf_hub_download, which downloads files from repos and returns the local file path. I tried it on a file that is currently in my own repo so that I could use the returned path, but that didn’t work either:
# example
self.REPO_ID = "repo/id"

self.vocab_path = hf_hub_download(self.REPO_ID, filename="UniKP_models/vocab.pkl", local_files_only=True)
self.trfm_pkl_path = hf_hub_download(self.REPO_ID, filename="UniKP_models/trfm.pkl", local_files_only=True)

self.trfm_pkl = torch.load(self.trfm_pkl_path)
self.vocab = WordVocab.load_vocab(self.vocab_path)

I’ve also looked for models on Hugging Face that use Inference Endpoints and load files locally (especially .pkl files).

Furthermore, I also understand that .pkl files aren’t the most secure, but that shouldn’t stop Hugging Face from loading them in.

Question

Is it possible for a model deployed on an Inference Endpoint to load local files from its own repository?

If anyone can help me with this, that would be greatly appreciated.
If there is any extra information that I need to provide, just ask.

Thank you for your time.

Reproduction

from typing import  Dict, List, Any
import torch
import pickle

class EndpointHandler():
	def __init__(self, path=""):
		self.trfm_pkl = torch.load("UniKP_models/trfm.pkl")

	def __call__(self, data: Dict[str, Any]) -> List[Dict[str, Any]]:
		"""Perform call."""

		return []

System Info

- huggingface_hub version: 0.14.1
- Platform: Windows-10-10.0.22000-SP0
- Python version: 3.10.0
- Running in iPython ?: No
- Running in notebook ?: No
- Running in Google Colab ?: No
- Has saved token ?: True
- Configured git credential helpers: manager-core, store
- FastAI: N/A
- Tensorflow: N/A
- Torch: 2.0.0
- Jinja2: 3.1.2
- Graphviz: N/A
- Pydot: N/A
- Pillow: 9.4.0
- hf_transfer: N/A
- gradio: N/A
- ENDPOINT: https://huggingface.co
- HF_HUB_OFFLINE: False
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: False

Thank you for your help.


The contents of this issue were also posted on GitHub, as mentioned before. Since the issue has been resolved, I’m copy-pasting the comment that I made on GitHub, in case someone else has the same problem.


Since the Hugging Face problem is resolved, I can close this issue. The next headings will contain a recap of the conversation.

Main problem

The HF Inference Endpoints couldn’t find the file when I referred to it by its bare name, like this: "vocab.pkl". It turned out later that the problem had nothing to do with the .pkl extension.
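
One plausible reading of this symptom, which the thread does not confirm: a bare file name such as "vocab.pkl" is resolved against the process's current working directory, and that need not be the directory holding the repository files. The sketch below reproduces that behaviour with two temporary directories. repo_dir and other_dir are stand-ins for illustration, not paths used by the endpoint.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as repo_dir, tempfile.TemporaryDirectory() as other_dir:
    # Put a placeholder file where the repository files would live.
    (Path(repo_dir) / "vocab.pkl").write_bytes(b"placeholder")

    previous_cwd = os.getcwd()
    os.chdir(other_dir)  # simulate a process started from a different directory
    try:
        # A bare name is looked up in the current working directory.
        found_by_bare_name = Path("vocab.pkl").exists()
        # A path built from the known base directory does not depend on it.
        found_by_full_path = (Path(repo_dir) / "vocab.pkl").exists()
    finally:
        os.chdir(previous_cwd)  # restore before the temp dirs are removed

print(found_by_bare_name, found_by_full_path)
```

The same file is invisible under its bare name and visible once the base directory is part of the path.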

Solution

@philschmid came up with the solution to use this type of format: f"{path}/trfm.pkl". That solved the main issue.
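
A minimal sketch of that pattern, assuming the path argument passed to __init__ is the directory that contains the repository files. The UniKP_models subfolder and the two file names are taken from the project structure in the question; the early existence check is my own addition and only resolves paths, it does not load anything.

```python
import tempfile
from pathlib import Path
from typing import Any, Dict, List


class EndpointHandler:
    def __init__(self, path: str = ""):
        # Assumption: `path` is the directory containing the repository files.
        # The "UniKP_models" subfolder comes from the project structure in the question.
        model_dir = Path(path) / "UniKP_models"
        self.vocab_path = model_dir / "vocab.pkl"
        self.km_model_path = model_dir / "Km.pkl"

        # Fail early and report the resolved paths, so a wrong base directory is easy to spot.
        missing = [str(p) for p in (self.vocab_path, self.km_model_path) if not p.is_file()]
        if missing:
            raise FileNotFoundError(f"Missing files: {missing}")

    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, Any]]:
        return []


# Local check against a fake repository directory.
with tempfile.TemporaryDirectory() as fake_repo:
    models = Path(fake_repo) / "UniKP_models"
    models.mkdir()
    for name in ("vocab.pkl", "Km.pkl"):
        (models / name).write_bytes(b"placeholder")
    handler = EndpointHandler(path=fake_repo)
    resolved_ok = handler.vocab_path.is_file()
```

In the real handler, the calls that read the files (torch.load, WordVocab.load_vocab) would take these resolved paths instead of bare relative names.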

Secondary problem

The secondary problem has nothing to do with Hugging Face or Inference Endpoints. The problem lies with pickle: it can’t find the class it needs to unpickle the file. The same problem is described here:
https://blog.csdn.net/m0_45447650/article/details/135009018
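
As background on this kind of error: pickle stores a reference to the class (module name plus class name), not the class itself, so loading fails when that module can't be imported under the same name. Whether that is exactly what happened in my case isn't something this sketch proves. It uses a hypothetical module name, legacy_vocab, and a stand-in Vocab class rather than the real WordVocab.

```python
import pickle
import sys
import types


class Vocab:
    """Stand-in class; not the real WordVocab."""

    def __init__(self, itos):
        self.itos = itos


# Pretend the class lives in a module called "legacy_vocab" (hypothetical name).
legacy_module = types.ModuleType("legacy_vocab")
Vocab.__module__ = "legacy_vocab"
legacy_module.Vocab = Vocab
sys.modules["legacy_vocab"] = legacy_module

payload = pickle.dumps(Vocab(["<pad>", "C", "N"]))

# Simulate an environment where that module is not importable.
del sys.modules["legacy_vocab"]
try:
    pickle.loads(payload)
    load_error = None
except (ImportError, AttributeError) as exc:
    load_error = exc

# Making the module importable again under the same name lets the load succeed.
sys.modules["legacy_vocab"] = legacy_module
restored = pickle.loads(payload)
```

The first load fails because the recorded module can't be imported; the second succeeds once a module with that name and class is available again.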

Secondary solution

The solution that I’m going to implement is the following:

  • unpickle the file locally without the Interface Endpoints
  • save the contents of the pickle file to a .txt file in the following manner:
import pickle
from build_vocab import WordVocab

vocab_path = "vocab.pkl"

vocab = WordVocab.load_vocab(vocab_path)

vocab_content = "\n".join(vocab.itos)

with open("vocab_content.txt", "w", encoding="utf-8") as f:
  f.write(vocab_content)
  • load the file back in the following manner:
# path to the vocab_content 
vocab_content_path = f"{path}/vocab_content.txt"

# load the vocab_content instead of the pickle file
with open(vocab_content_path, "r", encoding="utf-8") as f:
	vocab_content = f.read().strip().split("\n")

# load the vocab and trfm model
self.vocab = WordVocab(vocab_content)

This works because the WordVocab constructor takes a texts parameter. It might be a hacky solution, but it works.
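
The save and load steps can be checked together as a round trip. This is a sketch with hypothetical helper names (save_tokens, load_tokens), and a plain list of strings stands in for vocab.itos. It assumes no token contains a newline or consists only of whitespace.

```python
import tempfile
from pathlib import Path
from typing import List


def save_tokens(tokens: List[str], file_path: Path) -> None:
    # One token per line; explicit UTF-8 so the file reads the same on any platform.
    file_path.write_text("\n".join(tokens), encoding="utf-8")


def load_tokens(file_path: Path) -> List[str]:
    # Mirrors the loading step above: read, strip, split on newlines.
    return file_path.read_text(encoding="utf-8").strip().split("\n")


with tempfile.TemporaryDirectory() as tmp_dir:
    tokens = ["<pad>", "<unk>", "C", "N", "O"]
    target = Path(tmp_dir) / "vocab_content.txt"
    save_tokens(tokens, target)
    round_tripped = load_tokens(target)
```

If round_tripped equals the original list, the text file carries the same token order as the pickle did, which is what the rebuilt vocabulary relies on.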

Thanks again to @Wauplin and @philschmid for helping me solve the main problem.

