-
Notifications
You must be signed in to change notification settings - Fork 3.6k
scripts for proper BLEU evaluation, batch translation and averaging #488
Conversation
dd2387c
to
7fce2d2
Compare
`t2t-bleu` computes the "real" BLEU (giving the same result as [sacréBLEU](https://github.com/awslabs/sockeye/tree/master/contrib/sacrebleu) with `--tokenization intl` and as [mteval-v14.pl](https://github.com/moses-smt/mosesdecoder/blob/master/scripts/generic/mteval-v14.pl) with `--international-tokenization`). It can be used in two ways: * To evaluate an already translated file: `t2t-bleu --translation=my-wmt13.de --reference=wmt13_deen.de` * To evaluate all translations in a given directory. `t2t-translate-all` translates all checkpoints in a given directory. A custom command (e.g. SGE cluster wrapper) can be used instead of `t2t-decoder` for the translation. `t2t-avg-all` for each checkpoint in a given directory it averages it with the N preceding ones. All three scripts wait a given number of minutes for new checkpoints (produced by t2t-decoder, which can be run concurrently with these scripts).
7fce2d2
to
584a290
Compare
Thank you! |
it seems has some small problems. Traceback (most recent call last): |
I see, I am sorry for the problems. class UnicodeRegex:
"""Ad-hoc hack to recognize all punctuation and symbols.
without dependening on https://pypi.python.org/pypi/regex/."""
def _property_chars(prefix):
return ''.join(six.unichr(x) for x in range(sys.maxunicode)
if unicodedata.category(six.unichr(x)).startswith(prefix))
punctuation = _property_chars('P')
nondigit_punct_re = re.compile(r'([^\d])([' + punctuation + r'])')
punct_nondigit_re = re.compile(r'([' + punctuation + r'])([^\d])')
symbol_re = re.compile('([' + _property_chars('S') + '])') which someone changed into class UnicodeRegex(object):
"""Ad-hoc hack to recognize all punctuation and symbols."""
def __init__(self):
def _property_chars(prefix):
return ''.join(six.unichr(x) for x in range(sys.maxunicode)
if unicodedata.category(six.unichr(x)).startswith(prefix))
punctuation = self._property_chars('P')
self.nondigit_punct_re = re.compile(r'([^\d])([' + punctuation + r'])')
self.punct_nondigit_re = re.compile(r'([' + punctuation + r'])([^\d])')
self.symbol_re = re.compile('([' + _property_chars('S') + '])') but did not adapt accordingly the code few lines below. |
Great! Thanks |
see #514 |
and allow keeping timestamp in that case. This is needed for t2t-translate-all + t2t-bleu to work as expected (I forgot to add this commit to tensorflow#488).
* no pip download progress bars in Travis log see #523 * allow specifying --checkpoint_path with t2t-decoder and allow keeping timestamp in that case. This is needed for t2t-translate-all + t2t-bleu to work as expected (I forgot to add this commit to #488). * prevent tf.gfile.Glob crashes due to concurrent filesystem edits tf.gfile.Glob may crash with tensorflow.python.framework.errors_impl.NotFoundError: xy/model.ckpt-1130761_temp_9cb4cb0b0f5f4382b5ea947aadfb7a40; No such file or directory Let's use standard glob.glob instead, it seems to be more reliable. * reintroducing FLAGS deleted by someone this is needed for **locals() to work * speedup BLEU tokenization As I think about it, I would prefer my original implementation https://github.com/tensorflow/tensor2tensor/blob/bb1173adce940e62c840970fa0f06f69fd9398db/tensor2tensor/utils/bleu_hook.py#L147-L157 But it seems there are some T2T/Google internal Python guidelines forbidding this, so we have to live with the singleton. * another solution of #523 * make save_checkpoints_secs work again The functionality was broken during the adoption of TPU trainer_lib.py instead of the original trainer_utils.py. Currently, the default is to save checkpoints each 2000 steps, while in previous T2T versions the default was each 10 minutes. * adapt according to @rsepassi's review * Update NotFoundError
t2t-bleu
computes the "real" BLEU(giving the same result as sacréBLEU with
--tokenization intl
and as mteval-v14.pl with
--international-tokenization
).It can be used in two ways:
t2t-bleu --translation=my-wmt13.de --reference=wmt13_deen.de
t2t-translate-all
).t2t-translate-all
translates all checkpoints in a given directory (created byt2t-trainer
ort2t-avg-all
).A custom command (e.g. SGE cluster wrapper) can be used instead of
t2t-decoder
for the translation.t2t-avg-all
for each checkpoint in a given directoryit averages it with the N preceding ones.
All three scripts wait a given number of minutes for new checkpoints
(produced by t2t-decoder, which can be run concurrently with these scripts).
This PR is an improved version of #436 (which was not fully merged).