I found a bug in your course.

In the Question answering section of the Main NLP tasks chapter, the function preprocess_training_examples should have

if offset[context_start][0] > end_char or offset[context_end][1] < start_char or offset[context_end][1] < end_char:

instead of

if offset[context_start][0] > end_char or offset[context_end][1] < start_char:

because if the tokenized context contains only part of the answer, offset[context_end][1] is smaller than end_char, which results in incorrect labels.
Try example 965 of the training dataset: it sets the end position on the [SEP] token.
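
To make the symptom easier to see without loading the dataset, here is a minimal sketch with a made-up offset mapping (not example 965 itself). The helper `label_feature` and its `extra_clause` flag are hypothetical names for illustration; the search loops are assumed to follow the same pattern as the course code.

```python
# Toy offset mapping for one feature: [CLS] question [SEP] c0 c1 c2 [SEP].
# Special tokens get (0, 0); context tokens carry (start_char, end_char) spans.
offsets = [(0, 0), (0, 5), (0, 0), (0, 4), (5, 9), (10, 14), (0, 0)]
context_start, context_end = 3, 5  # token indices of c0 and c2
start_char, end_char = 5, 20       # the answer runs past this chunk, which ends at char 14


def label_feature(offsets, context_start, context_end, start_char, end_char, extra_clause):
    """Hypothetical helper: return (start_token, end_token) for one feature."""
    first, last = offsets[context_start], offsets[context_end]
    # Original condition from the thread.
    outside = first[0] > end_char or last[1] < start_char
    if extra_clause:
        # Additional clause proposed above: the chunk ends before the answer ends.
        outside = outside or last[1] < end_char
    if outside:
        return (0, 0)

    # Walk forward to the last context token starting at or before start_char.
    idx = context_start
    while idx <= context_end and offsets[idx][0] <= start_char:
        idx += 1
    start_token = idx - 1

    # Walk backward to the first context token ending at or after end_char.
    idx = context_end
    while idx >= context_start and offsets[idx][1] >= end_char:
        idx -= 1
    end_token = idx + 1
    return (start_token, end_token)


original = label_feature(offsets, context_start, context_end, start_char, end_char, False)
patched = label_feature(offsets, context_start, context_end, start_char, end_char, True)
print(original)  # (4, 6): index 6 is the trailing [SEP]
print(patched)   # (0, 0)
```

With the original condition the backward loop never runs, so the end label becomes context_end + 1, which is the [SEP] token.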

I think that instead of:

if offset[context_start][0] > end_char or offset[context_end][1] < start_char:
    start_positions.append(0)
    end_positions.append(0)

It should be:

if offset[context_start][0] > start_char or offset[context_end][1] < end_char:
    start_positions.append(0)
    end_positions.append(0)

Am I correct? This is because earlier in the section, it’s written that:

We will also set those labels (0, 0) in the unfortunate case where the answer has been truncated so that we only have the start (or end) of it.

The following diagram explains this:

Context 1 fully contains the answer. Context 2 STARTS AFTER the answer STARTS. Context 3 ENDS BEFORE the answer ENDS.
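
The three contexts can be checked with plain character spans. This is only a sketch: the numbers are invented, `ctx_start` stands for offset[context_start][0], `ctx_end` stands for offset[context_end][1], and both function names are hypothetical.

```python
def should_label_zero_original(ctx_start, ctx_end, start_char, end_char):
    # Original check: true only when the context and the answer do not overlap at all.
    return ctx_start > end_char or ctx_end < start_char


def should_label_zero_fixed(ctx_start, ctx_end, start_char, end_char):
    # Proposed check: true unless the context covers the whole answer.
    return ctx_start > start_char or ctx_end < end_char


start_char, end_char = 50, 60  # made-up answer span in the full context
contexts = {
    "context 1": (0, 100),   # fully contains the answer
    "context 2": (55, 150),  # starts after the answer starts
    "context 3": (0, 55),    # ends before the answer ends
}

results = {
    name: (
        should_label_zero_original(s, e, start_char, end_char),
        should_label_zero_fixed(s, e, start_char, end_char),
    )
    for name, (s, e) in contexts.items()
}
for name, (orig, fixed) in results.items():
    print(name, "original:", orig, "fixed:", fixed)
```

Only the proposed check flags contexts 2 and 3. Since start_char <= end_char, anything the original check catches is also caught by the proposed one, so it should not lose any of the (0, 0) cases.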

Yes, I think you are right.

Thanks for reporting this @Gozdi! It should now be fixed on the website :slight_smile:
