Creating a custom model

I’d like to train a model on my custom dataset (I don’t want to use a pretrained tokenizer or model). As I understand it, I should do the following:

  1. Select the desired network architecture, for example 'ibert'
  2. Get the corresponding config: config = AutoConfig.for_model('ibert')
  3. Create the model from the config: model = AutoModel.from_config(config)
  4. Create a tokenizer
    Here is my first question: can I create and train any tokenizer from the 'tokenizers' package when I train my own model from scratch?
    My second question: if I’d like to reuse the tokenizer config of an existing model, for example 'allenai/scibert_scivocab_uncased', how can I do that? I don’t need the pretrained tokenizer itself; I’d like to train it from scratch on my own dataset.
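
To make step 4 concrete, here is a minimal sketch of what I have in mind, assuming a WordPiece tokenizer from the 'tokenizers' package. The toy corpus, the requested vocab_size of 200 and the special-token names are placeholders I made up for illustration, and I am not sure WordPiece is the right choice for 'ibert'. The one thing I believe must hold is that the config's vocab_size matches the trained tokenizer.

```python
from tokenizers import Tokenizer
from tokenizers.models import WordPiece
from tokenizers.normalizers import BertNormalizer
from tokenizers.pre_tokenizers import BertPreTokenizer
from tokenizers.trainers import WordPieceTrainer
from transformers import AutoConfig, PreTrainedTokenizerFast

# Placeholder corpus; in practice this would iterate over my own dataset.
corpus = [
    "training a tokenizer from scratch on a custom dataset",
    "the model is created from a config, not from pretrained weights",
    "custom tokenizers and custom models must agree on the vocabulary size",
]

# Assumed special tokens (BERT-style names, chosen only for illustration).
special_tokens = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]

# Build an untrained WordPiece tokenizer and train it on the corpus.
tokenizer = Tokenizer(WordPiece(unk_token="[UNK]"))
tokenizer.normalizer = BertNormalizer(lowercase=True)
tokenizer.pre_tokenizer = BertPreTokenizer()
trainer = WordPieceTrainer(vocab_size=200, special_tokens=special_tokens)
tokenizer.train_from_iterator(corpus, trainer=trainer)

# Wrap it so it can be used like any other transformers tokenizer.
fast_tokenizer = PreTrainedTokenizerFast(
    tokenizer_object=tokenizer,
    pad_token="[PAD]",
    unk_token="[UNK]",
    cls_token="[CLS]",
    sep_token="[SEP]",
    mask_token="[MASK]",
)

# Create the config with a vocabulary size and pad id taken from the tokenizer.
config = AutoConfig.for_model(
    "ibert",
    vocab_size=tokenizer.get_vocab_size(),
    pad_token_id=tokenizer.token_to_id("[PAD]"),
)
```

From here, model = AutoModel.from_config(config) as in step 3 should give a randomly initialised model sized for this vocabulary. For my second question, my guess is that loading the existing tokenizer with AutoTokenizer.from_pretrained('allenai/scibert_scivocab_uncased') and calling train_new_from_iterator(corpus, vocab_size) on it would keep its settings while learning a new vocabulary, but I have not confirmed that this is the intended way.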