Thank you, this would be really useful. I was thinking tokenization would be an issue, but Moses is fine. I don't remember whether Moses supports Chinese, though. |
@colmantse: there is a Chinese-tokenization version of BLEU (used in recent WMT) https://github.com/mjpost/sacreBLEU/blob/b38690e1537cd4719c3517ef77c8255c5a107cc8/sacrebleu.py#L414-L528 |
The WMT17 homepage references the same Chinese character n-gram tokenization here: http://www.statmt.org/wmt17/tokenizeChinese.py |
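For readers unfamiliar with the character-level approach mentioned in the two comments above, here is a minimal sketch. It is not the code from sacreBLEU or tokenizeChinese.py: the function name `split_cjk_chars` is hypothetical, and it assumes that only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) needs separating, whereas the real scripts handle more ranges.

```python
def split_cjk_chars(line):
    """Surround each CJK ideograph with spaces so that BLEU n-grams
    are counted over characters rather than unsegmented strings."""
    pieces = []
    for ch in line:
        # Assumption: only the basic CJK Unified Ideographs block is handled.
        if "\u4e00" <= ch <= "\u9fff":
            pieces.append(" " + ch + " ")
        else:
            pieces.append(ch)
    # Collapse the runs of whitespace introduced above.
    return " ".join("".join(pieces).split())
```

Non-CJK tokens such as Latin words and numbers are left intact, so they would still need the usual tokenization afterwards.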
Fix BLEU computation for the edge case of no matching 4-gram (or trigram, ...). Smoothing is the default in the official BLEU implementation https://github.com/moses-smt/mosesdecoder/blob/master/scripts/generic/mteval-v14.pl#L843-L885 (Smoothing is not present in multi-bleu.perl, but this script explicitly says it is for internal purposes only and it is recommended to use mteval-v14.pl instead.)
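The edge case can be illustrated with a short sketch. It follows my reading of the smoothing in mteval-v14.pl (each n-gram order with zero matches doubles a smoothing factor and gets precision 1/(smooth * total)); treat that as an assumption rather than a verified port. The function names are hypothetical, not this PR's API.

```python
import math

def smoothed_precisions(matches, totals):
    """matches[i] and totals[i] are the matched and possible (i+1)-gram
    counts over the whole corpus. Returns one precision per order."""
    precisions = []
    smooth = 1.0
    for m, t in zip(matches, totals):
        if t == 0:
            # No n-grams of this order at all (very short output).
            precisions.append(0.0)
        elif m > 0:
            precisions.append(m / t)
        else:
            # Zero matches: back off to a small non-zero precision.
            smooth *= 2
            precisions.append(1.0 / (smooth * t))
    return precisions

def geometric_mean(precisions):
    """Geometric mean used by BLEU; 0.0 if any precision is zero."""
    if min(precisions) <= 0:
        return 0.0
    return math.exp(sum(math.log(p) for p in precisions) / len(precisions))
```

Without the smoothing branch, a single order with zero matches would force the geometric mean, and therefore BLEU, to zero.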
so one can compute real BLEU on two files (MT translation=hypothesis and reference). I've tested this on a few datasets and it seems to agree with the official implementation mteval-v14.pl.
not necessarily the latest one in a given output_dir
So it can be used for continuous evaluation or for resuming an older evaluation from a checkpoint with a given number of steps. It is also possible to specify the name of the events subdirectory and tag suffix.
Fix for the case when evaluating one checkpoint takes longer than creating a new checkpoint.
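The behaviour described in the fragments above (evaluating every checkpoint rather than only the newest one, and resuming from a given step) could be sketched roughly as follows. This is an illustration, not the PR's code: the function name and parameters are hypothetical, and it assumes checkpoints are stored as files named model.ckpt-<step>.index.

```python
import re

def pending_checkpoint_steps(filenames, evaluated_steps, min_steps=0):
    """Return the sorted step numbers of checkpoints that still need
    evaluation, skipping those already evaluated or below min_steps."""
    # Assumption: one "model.ckpt-<step>.index" file per checkpoint.
    pattern = re.compile(r"model\.ckpt-(\d+)\.index$")
    steps = set()
    for name in filenames:
        match = pattern.search(name)
        if match:
            steps.add(int(match.group(1)))
    return sorted(
        s for s in steps if s >= min_steps and s not in evaluated_steps
    )
```

Because the result is a list in step order, a slow evaluation does not skip checkpoints that were written in the meantime; they simply remain pending on the next poll.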
Hello,
|
@Gldkslfmsd thanks for the bug report, but I believe this is not related to this PR (although we could fix it here) - I have not changed the way brevity penalty is computed. |
Yes. And maybe it can happen if the reference file is empty.
+1 |
with a clear error message instead of misleading "division by zero"
@Gldkslfmsd: your error was because of empty translation. Empty reference would fail elsewhere. |
It happened because I was making my own extension to this code and I was
debugging it on a one-line source file. Some models translated it as a
blank file, but it was a valid translation. So I would vote for no crash in
this case.
2017-11-30 20:41 GMT+01:00 Martin Popel <[email protected]>:
… @Gldkslfmsd <https://github.com/gldkslfmsd>: your error was because of
empty translation. Empty reference would fail elsewhere.
Either way, I changed my mind and I think this is very suspicious, so we
should fail with a clear error
rather than return bleu=0. Thus I have added two assertions.
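As an illustration of the "fail with a clear error" choice discussed here, a guarded brevity penalty might look like the sketch below. It uses the standard BLEU brevity penalty; the function name and the assertion messages are hypothetical and are not claimed to match the two assertions actually added in this PR.

```python
import math

def brevity_penalty(translation_length, reference_length):
    """Standard BLEU brevity penalty, with explicit checks so that empty
    input fails with a readable message instead of a division by zero."""
    assert reference_length > 0, "Reference is empty; cannot compute BLEU."
    assert translation_length > 0, "Translation is empty; cannot compute BLEU."
    ratio = translation_length / reference_length
    if ratio >= 1.0:
        return 1.0
    # Translation shorter than reference: exp(1 - reference/translation).
    return math.exp(1.0 - 1.0 / ratio)
```

The alternative raised in the thread, returning a BLEU of 0 for an empty translation, would replace the second assertion with an early `return 0.0`.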
Thanks!
FYI: |
@martinpopel @lukaszkaiser it seems that the changes of this PR were removed from master in 20c7e41. |
@noe: Yes. It was not an intentional removal within that commit; it is just a consequence of the chosen merging strategy between GitHub and the internal Google repo. This PR has not been fully integrated into the internal repo. @lukaszkaiser wrote it was because of Python2 problems, but I am not aware of any (those I marked in the code with comments, i.e. the |
This PR adds a t2t-bleu script which computes "real" BLEU (giving the same result as sacréBLEU with --tokenization intl and as mteval-v14.pl with --international-tokenization). It can be used in two ways:
t2t-bleu --translation=my-wmt13.de --reference=wmt13_deen.de
I find the second way more useful than t2t-trainer --schedule=continous_eval because:
--schedule=continous_eval which uses the newest checkpoint always).
Now, I have rebased this PR to v1.3.0.