How to load an old (ko)bert .pt file, and convert the code that was supposed to run the old model into modern transformers/huggingface code

hey everyone, I essentially have code that was written for the old kobert (korean bert) and gluonnlp python libraries. I also have a .pt model file.
I would like to be able to run the trained model using huggingface.
The details:
The training seems to have been done on one of the kobert models (skt/kobert-base-v1 · Hugging Face or maybe monologg/kobert · Hugging Face) with these hyper parameters:
max_len = 512
batch_size = 6
warmup_ratio = 0.1
num_epochs = 20
max_grad_norm = 1
log_interval = 20
learning_rate = 5e-6
num_workers = 2
n_splits = 5
and this category list:
category_list = [“화장품”, “패션”, “요리음식”, “여행아웃도어”, “인테리어”, “엔터테인먼트”, “육아”, “아이티”, “자동차”, “헬스/피트니스”, “반려동물”]
e.g. it has been trained to take a single sentence and return a relevant category.
a dataset class has been defined like this:
from import Dataset
import gluonnlp as nlp

class BERTDataset(Dataset):
def init(self, dataset, bert_tokenizer, max_len,
pad, pair):
transform =
bert_tokenizer, max_seq_length=max_len, pad=pad, pair=pair)

    self.sentences = []

    for data in dataset:
        if len(data)<=max_len:

def __getitem__(self, i):
    return (self.sentences[i])

def __len__(self):
    return (len(self.sentences))

from what I can tell, beyond the ugly code, this is just equivalent to adding “truncation=True” in a tokenizer, but I’m not 100% sure.

a classifier has been defined which seems to just create the equivalent of ‘token_type_ids’ in a tokenizer:
from torch import nn

class BERTClassifier(nn.Module):
def init(self,
hidden_size = 768,
super(BERTClassifier, self).init()
self.bert = bert
self.dr_rate = dr_rate

    self.classifier = nn.Linear(hidden_size , num_classes)
    if dr_rate:
        self.dropout = nn.Dropout(p=dr_rate)

def gen_attention_mask(self, token_ids, valid_length):
    attention_mask = torch.zeros_like(token_ids)
    for i, v in enumerate(valid_length):
        attention_mask[i][:v] = 1
    return attention_mask.float()

def forward(self, token_ids, valid_length, segment_ids):
    attention_mask = self.gen_attention_mask(token_ids, valid_length)
    _, pooler = self.bert(input_ids = token_ids, token_type_ids = segment_ids.long(), attention_mask = attention_mask.float().to(token_ids.device))
    if self.dr_rate:
        out = self.dropout(pooler)
    return self.classifier(out)

and a tokenizer & initial model are loaded:
from kobert.utils import get_tokenizer
from kobert.pytorch_kobert import get_pytorch_kobert_model

bertmodel, vocab = get_pytorch_kobert_model()

tokenizer = get_tokenizer()
tok =, vocab, lower=False)
threshold = 5.26
device = torch.device(“cuda:0”)

modelbest = torch.load(model_name, map_location=device)

then the model is used like this:
def GetMediaCategory(captionlist):
for i in range(len(captionlist)):
captionlist[i] = unicodedata.normalize(‘NFC’,captionlist[i])
captionlist[i] = ’ ‘.join(re.compile(’[가-힣]+').findall(captionlist[i]))
if len(captionlist[i]) == 0:
captionlist[i] = ‘기타’

datalist = BERTDataset(captionlist, tok, max_len, True, False)
test_dataloader =, batch_size=batch_size, num_workers=num_workers)

for batch_id,(token_ids, valid_length, segment_ids) in enumerate(notebook.tqdm(test_dataloader)):
    token_ids = token_ids.long().to(device)
    segment_ids = segment_ids.long().to(device)
    valid_length= valid_length
    outlist = []
    valuelist = []
    out = modelbest(token_ids, valid_length, segment_ids)
    for outi in out:
        if outi.max().tolist() > threshold:

return wholeout, wholevalue

again, please excuse the shitty code. I’ve refactored it, but since I couldn’t run the original or refactored code, it felt wrong to share undebugged code. presumably this was debugged by the original authors :confused:

anyhow, I hope this is enough information to be able to tell how to load and use the trained model, using the huggingface library ecosystem. If more of the original code would be useful, I’ll gladly upload it, I just thought that these pieces might be the most relevant ones, if at all.