I found a bug in your Course

In the section Main NLP tasks, chapter Question answering, the function preprocess_training_examples
should have
if offset[context_start][0] > end_char or offset[context_end][1] < start_char or offset[context_end][1] < end_char:
instead of
if offset[context_start][0] > end_char or offset[context_end][1] < start_char:
because if the tokenized context contains only part of the answer, offset[context_end][1] is smaller than end_char, which results in incorrect labels.
Try example 965 of the training dataset: it sets the end position on the [SEP] token.
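
To illustrate the report above, here is a minimal sketch with made-up offsets (not the real example 965). It assumes the start/end search loops in preprocess_training_examples look roughly like the ones below; they are not quoted in this thread, so treat them as an assumption. The helper name find_labels and the use_extra_check flag are hypothetical.

```python
# Hypothetical offsets for the tokens: [CLS] question [SEP] ctx0 ctx1 ctx2 [SEP]
# Special tokens are given (0, 0) here, which is an assumption of this sketch.
offsets = [(0, 0), (0, 8), (0, 0), (0, 4), (5, 9), (10, 14), (0, 0)]
context_start, context_end = 3, 5  # indices of the first and last context tokens
start_char, end_char = 5, 20       # the answer runs past char 14, where this chunk ends


def find_labels(offset, context_start, context_end, start_char, end_char, use_extra_check):
    # Condition from the course at the time of the report
    outside = offset[context_start][0] > end_char or offset[context_end][1] < start_char
    if use_extra_check:
        # Extra clause proposed in the post: the chunk ends before the answer ends
        outside = outside or offset[context_end][1] < end_char
    if outside:
        return (0, 0)

    # Assumed search loops: walk inward to the tokens that bound the answer
    idx = context_start
    while idx <= context_end and offset[idx][0] <= start_char:
        idx += 1
    start_position = idx - 1

    idx = context_end
    while idx >= context_start and offset[idx][1] >= end_char:
        idx -= 1
    end_position = idx + 1
    return (start_position, end_position)


without_check = find_labels(offsets, context_start, context_end, start_char, end_char, False)
with_check = find_labels(offsets, context_start, context_end, start_char, end_char, True)
print(without_check)  # (4, 6): index 6 is the trailing [SEP]
print(with_check)     # (0, 0)
```

With the original condition the end loop never runs, so the end position falls one past the last context token, which is the [SEP].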


I think that instead of:

if offset[context_start][0] > end_char or offset[context_end][1] < start_char:
    start_positions.append(0)
    end_positions.append(0)

It should be:

if offset[context_start][0] > start_char or offset[context_end][1] < end_char:
    start_positions.append(0)
    end_positions.append(0)

Am I correct? This is because earlier in the section, it’s written that:

“We will also set those labels (0, 0) in the unfortunate case where the answer has been truncated so that we only have the start (or end) of it.”

The following diagram explains this:

Context 1 fully contains the answer. Context 2 STARTS AFTER the answer STARTS. Context 3 ENDS BEFORE the answer ENDS.
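
The three cases can also be checked with a small sketch. The character spans below are made up for illustration, and the two helper functions are hypothetical names that just wrap the two conditions being compared, with each context reduced to its first and last character positions.

```python
def is_null_label_original(ctx_first_char, ctx_last_char, start_char, end_char):
    # Original condition: true only when the context and the answer do not overlap at all
    return ctx_first_char > end_char or ctx_last_char < start_char


def is_null_label_proposed(ctx_first_char, ctx_last_char, start_char, end_char):
    # Proposed condition: true whenever the answer is not fully inside the context
    return ctx_first_char > start_char or ctx_last_char < end_char


start_char, end_char = 50, 60  # hypothetical answer span
contexts = {
    "Context 1": (0, 100),   # fully contains the answer
    "Context 2": (55, 150),  # starts after the answer starts
    "Context 3": (0, 55),    # ends before the answer ends
}

results = {}
for name, (first, last) in contexts.items():
    results[name] = (
        is_null_label_original(first, last, start_char, end_char),
        is_null_label_proposed(first, last, start_char, end_char),
    )
    print(name, results[name])
```

In this sketch the original condition returns False for all three contexts, so the truncated answers in Context 2 and Context 3 are not labeled (0, 0), while the proposed condition returns True for both of them.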


Yes, I think you are right.

Thanks for reporting this @Gozdi! It should now be fixed on the website :slight_smile:
