
Commit 9abfe4b

[ML] fix NLP question_answering task when best answer is only one token (elastic#88347)
There are scenarios where question_answering finds the best start and end tokens and they are the same token. For example, given the context "My name is Ben and I live in London" and the question "Where do I live?", the correct answer is "London", which is a single token. Without this fix, we return "in London" with a lower probability.
1 parent 4762dc3 commit 9abfe4b


2 files changed: +7 -2 lines changed

docs/changelog/88347.yaml

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+pr: 88347
+summary: Fix NLP `question_answering` task when best answer is only one token
+area: Machine Learning
+type: bug
+issues: []

x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/inference/nlp/QuestionAnsweringProcessor.java

Lines changed: 2 additions & 2 deletions
@@ -212,7 +212,7 @@ static void topScores(
                 if (startNormalized[i] == 0) {
                     continue;
                 }
-                for (int j = i + 1; j < (maxAnswerLength + i) && j < tokenSize; j++) {
+                for (int j = i; j < (maxAnswerLength + i) && j < tokenSize; j++) {
                     double score = startNormalized[i] * endNormalized[j];
                     if (score > maxScore) {
                         maxScore = score;
@@ -224,7 +224,7 @@ static void topScores(
             return;
         }
         for (int i = seq2Start; i < tokenSize; i++) {
-            for (int j = i + 1; j < (maxAnswerLength + i) && j < tokenSize; j++) {
+            for (int j = i; j < (maxAnswerLength + i) && j < tokenSize; j++) {
                 topScoresCollector.accept(
                     new ScoreAndIndices(i - seq2Start, j - seq2Start, startNormalized[i] * endNormalized[j], spanIndex)
                 );
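
To make the effect of the loop-bound change concrete, here is a minimal, self-contained sketch of the span search. It is not the Elasticsearch implementation: the class `SingleTokenSpanSketch`, the method `bestSpan`, the `allowSingleToken` flag, the whole-word tokenization, and the probabilities are all hypothetical and chosen only to mirror the example in the commit message. It skips the `seq2Start` offset and the top-scores collector entirely.

```java
import java.util.Arrays;

// Hypothetical, simplified stand-in for the span search in topScores(); not the Elasticsearch code.
class SingleTokenSpanSketch {

    /** Returns {start, end} (inclusive) of the highest-scoring span, or {-1, -1} if none qualifies. */
    static int[] bestSpan(double[] startProbs, double[] endProbs, int maxAnswerLength, boolean allowSingleToken) {
        int[] best = { -1, -1 };
        double maxScore = 0.0;
        for (int i = 0; i < startProbs.length; i++) {
            // Old behaviour: j starts at i + 1, so a span can never end on its own start token.
            // New behaviour: j starts at i, so single-token spans are considered as well.
            int firstEnd = allowSingleToken ? i : i + 1;
            for (int j = firstEnd; j < (maxAnswerLength + i) && j < endProbs.length; j++) {
                double score = startProbs[i] * endProbs[j];
                if (score > maxScore) {
                    maxScore = score;
                    best = new int[] { i, j };
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Assumed tokenization: one token per word (real models use sub-word tokens).
        String[] tokens = { "My", "name", "is", "Ben", "and", "I", "live", "in", "London" };
        // Made-up probabilities: "London" is the most likely start and the most likely end.
        double[] start = { 0, 0, 0, 0, 0, 0, 0, 0.1, 0.9 };
        double[] end = { 0, 0, 0, 0, 0, 0, 0, 0.05, 0.95 };

        int[] before = bestSpan(start, end, 5, false);
        int[] after = bestSpan(start, end, 5, true);

        // prints "before: in London"
        System.out.println("before: " + String.join(" ", Arrays.copyOfRange(tokens, before[0], before[1] + 1)));
        // prints "after: London"
        System.out.println("after: " + String.join(" ", Arrays.copyOfRange(tokens, after[0], after[1] + 1)));
    }
}
```

With these illustrative numbers, the old bound can only pair the weak start at "in" with the end at "London" (score 0.1 * 0.95), while the new bound also scores the single-token span "London" (0.9 * 0.95), which wins.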
