Could you provide a consistent way to calculate BLEU? #405
Comments
Depending on how you tokenize your text, you can get different BLEU values.
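To illustrate the tokenization point above, here is a minimal sketch in plain Python. It is not the T2T implementation and not any official script: `corpus_bleu`, `whitespace_tokenize` and `punct_tokenize` are names made up for this example, and the BLEU variant is a simplified one (one reference per sentence, no smoothing). The sentence pair is also invented.

```python
import math
import re
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hyps, refs, max_n=4):
    """Simplified corpus-level BLEU: one reference per hypothesis, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            matches[n - 1] += sum((h & r).values())  # clipped n-gram matches
            totals[n - 1] += sum(h.values())
    if min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


def whitespace_tokenize(text):
    return text.split()


def punct_tokenize(text):
    # Hypothetical tokenizer: split punctuation off into separate tokens.
    return re.findall(r"\w+|[^\w\s]", text)


hyp = "The cat sat on the mat, quietly."
ref = "The cat sat on the mat, and purred."

bleu_ws = corpus_bleu([whitespace_tokenize(hyp)], [whitespace_tokenize(ref)])
bleu_pt = corpus_bleu([punct_tokenize(hyp)], [punct_tokenize(ref)])
print(f"whitespace: {bleu_ws:.2f}  punctuation-split: {bleu_pt:.2f}")
```

The same hypothesis and reference get two different scores, purely because of how the text was split into tokens. This is why scores are only comparable when the tokenization is fixed.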
Our BLEU functions are correct; use the utils/get...bleu.sh script for results comparable with publications.
I think no one uses the get_ende_bleu.sh script, because two of its lines are hard-wrapped and it cannot be executed as is: tensor2tensor/tensor2tensor/utils/get_ende_bleu.sh, lines 15 to 18 in 097ea5f.
It would be nice to have a way to compute the official BLEU (both case-sensitive and case-insensitive) for a beam-searched translation (of a possibly avg_checkpointed model) and see this curve in TensorBoard. Now that beam-search decoding is fast (4 minutes for 3000 sentences), it seems doable (I use --save_checkpoints_secs=3600, that is, one checkpoint and evaluation per hour). It's on my todo list (together with character-based metrics, e.g. chrF3 or characTER), but unfortunately at the bottom of the list, which is rather a wish list :-).
I plan to make a PR with a script as described in my previous post.
Would be great to have as a metric! We're also thinking about reporting all future results (on test) with https://github.com/mjpost/sacreBLEU -- what do you think, guys?
I will send a PR soon (just tidying and testing).
Yes. Even if sacreBLEU is integrated into T2T, I will still need most of my new code, which evaluates all checkpoints in a directory and stores the curve in a TensorBoard events file (
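The "evaluate every checkpoint in a directory" idea can be sketched roughly as below. This is not the code from the PR. The file naming (`step-<N>.txt`, one decoded file per checkpoint), the function names and `exact_match_rate` are all assumptions made for illustration. `exact_match_rate` is a toy stand-in for a real BLEU function, which would be passed as `score_fn` instead.

```python
import os
import re
import tempfile


def read_lines(path):
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def score_curve(decode_dir, ref_path, score_fn, pattern=r"step-(\d+)\.txt$"):
    """Score every decoded file in decode_dir; return [(step, score), ...] sorted by step."""
    refs = read_lines(ref_path)
    curve = []
    for name in os.listdir(decode_dir):
        match = re.match(pattern, name)
        if match:  # files that do not follow the (assumed) naming are skipped
            hyps = read_lines(os.path.join(decode_dir, name))
            curve.append((int(match.group(1)), score_fn(hyps, refs)))
    return sorted(curve)


def exact_match_rate(hyps, refs):
    # Toy stand-in metric, NOT BLEU: percentage of lines identical to the reference.
    return 100.0 * sum(h == r for h, r in zip(hyps, refs)) / len(refs)


# Tiny demo with made-up data in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    ref_path = os.path.join(tmp, "ref.txt")
    with open(ref_path, "w", encoding="utf-8") as f:
        f.write("a b\nc d\n")
    for step, text in [(2000, "a b\nc d\n"), (1000, "a b\nx y\n")]:
        with open(os.path.join(tmp, f"step-{step}.txt"), "w", encoding="utf-8") as f:
            f.write(text)
    curve = score_curve(tmp, ref_path, exact_match_rate)

print(curve)
```

Each `(step, score)` pair is one point of the curve. Writing the points to a TensorBoard events file is left out here, since that part depends on the TensorFlow version in use.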
I made the PR: #436
It takes several steps to calculate BLEU, and it is not entirely clear how BLEU should be calculated from the decoded text.
It would be nice to have an eval function that just calculates BLEU in the standard, correct way against a given target.