Question about BERT for QA

Hello,

I am very new to NLP and transformers, and I am working/playing with BERT for a question answering task, trying to understand it. I already looked at the example notebook but still have a few outstanding questions. Say I am trying to fine-tune a BERT model on SQuAD:

  1. I am trying to understand the output. After running the model, it returns start_logits and end_logits. Say my input is 50 tokens, where the first 10 are the question and the remaining 40 are the context. The example I saw in the notebook sorts the logits from largest to smallest, takes the top n_best_size, and keeps a start/end index pair if it falls within the context. But I am curious: don't we have to zero out the first 10 logits before taking the top n_best_size? Otherwise the candidate list contains indices from the question part, and the start index could accidentally land in the question.
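To make my question concrete, here is a small sketch of what I mean by "zeroing out" the question part before taking the top candidates. The logits are dummy values, and I'm ignoring special tokens like [CLS] and [SEP] for simplicity:

```python
import numpy as np

n_best_size = 5
seq_len = 50
question_len = 10  # tokens 0..9 are the question part (hypothetical layout)

# Dummy logits standing in for the model's start_logits / end_logits.
rng = np.random.default_rng(0)
start_logits = rng.normal(size=seq_len)
end_logits = rng.normal(size=seq_len)

# Mask out the question part so top candidates can only come from the context.
masked_start = start_logits.copy()
masked_end = end_logits.copy()
masked_start[:question_len] = -1e9  # effectively removes the question tokens
masked_end[:question_len] = -1e9

# Top-n_best_size start/end indices; all are guaranteed to be context tokens.
start_candidates = np.argsort(masked_start)[-n_best_size:][::-1]
end_candidates = np.argsort(masked_end)[-n_best_size:][::-1]

assert all(i >= question_len for i in start_candidates)
assert all(i >= question_len for i in end_candidates)
```

Without the masking step, the same argsort over the raw logits could return indices below 10, which is exactly the situation I'm worried about.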

  2. About preprocessing the training data: say I set max_length=30 and my context is long enough that the tokenizer splits it into 5 chunks. My answer is also long (for example 50 tokens), so the answer span stretches across 3 chunks. The notebook example tries to find the single chunk that contains the answer span, and in this case it cannot find one (since the answer covers multiple chunks). What is the proper way to mark the answer span in this case? I was thinking of labeling it as [token_start:index_end] for the 1st chunk, [index_start:index_end] for the 2nd chunk, and [index_start:token_end] for the 3rd chunk, where index_start and index_end are the start and end indices of the context tokens in the corresponding chunk.
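To illustrate the labeling scheme I have in mind, here is a rough sketch. The chunk boundaries and answer positions are made-up, and I'm ignoring the stride/overlap and special tokens that the real tokenizer would add:

```python
def chunk_labels(answer_start, answer_end, chunk_start, chunk_end):
    """Clamp a global answer token span to one chunk's boundaries.

    All positions are token indices into the full tokenized context,
    with inclusive ends (my own convention for this sketch). Returns
    (start, end) relative to the chunk, or None if the answer does not
    overlap this chunk at all.
    """
    if answer_end < chunk_start or answer_start > chunk_end:
        return None  # answer lies entirely outside this chunk
    start = max(answer_start, chunk_start) - chunk_start
    end = min(answer_end, chunk_end) - chunk_start
    return (start, end)

# Answer spans context tokens 25..74; chunks are 30 tokens each.
chunks = [(0, 29), (30, 59), (60, 89), (90, 119), (120, 149)]
labels = [chunk_labels(25, 74, cs, ce) for cs, ce in chunks]
# chunk 1: (25, 29) -> token_start to chunk end
# chunk 2: (0, 29)  -> the whole chunk
# chunk 3: (0, 14)  -> chunk start to token_end
# chunks 4 and 5: None (no overlap)
```

So each of the 3 overlapping chunks would get a partial label rather than being dropped as "no answer". Is this a reasonable way to handle it, or is there a standard approach?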

  3. Similar to question 1, but in the situation from question 2: at inference time, how do I properly reconstruct the full answer span if the answer ends up spanning 3 chunks?

Thank you so much for the help!