This repository was archived by the owner on Jul 7, 2023. It is now read-only.

scripts for proper BLEU evaluation, batch translation and averaging #488

Merged
merged 1 commit into tensorflow:master on Jan 9, 2018

Conversation

martinpopel
Contributor

@martinpopel martinpopel commented Dec 23, 2017

This PR is an improved version of #436 (which was not fully merged).

`t2t-bleu` computes the "real" BLEU
(giving the same result as [sacréBLEU](https://github.com/awslabs/sockeye/tree/master/contrib/sacrebleu) with `--tokenization intl`
and as [mteval-v14.pl](https://github.com/moses-smt/mosesdecoder/blob/master/scripts/generic/mteval-v14.pl) with `--international-tokenization`).
It can be used in two ways:
* To evaluate an already translated file: `t2t-bleu --translation=my-wmt13.de --reference=wmt13_deen.de`
* To evaluate all translations in a given directory (created e.g. by `t2t-translate-all`).

`t2t-translate-all` translates all checkpoints in a given directory (created by `t2t-trainer` or `t2t-avg-all`).
A custom command (e.g. SGE cluster wrapper) can be used instead of `t2t-decoder` for the translation.
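To make the custom-command option more concrete, here is a hypothetical Python sketch of how a per-checkpoint wrapper command could be templated and expanded with `**locals()` (a mechanism one of the follow-up commits below mentions); the `qsub` wrapper and the flag names are purely illustrative assumptions, not the script's actual interface:

```python
# Hypothetical sketch only: expanding a per-checkpoint command template.
# The qsub wrapper and flag names are illustrative assumptions.
import subprocess

decoder_command = ('qsub -b y "t2t-decoder --checkpoint_path={checkpoint_path} '
                   '--decode_to_file={translation_file}"')

checkpoint_path = '/tmp/t2t_train/model.ckpt-250000'    # example checkpoint
translation_file = 'translations/model.ckpt-250000.de'  # example output file

subprocess.call(decoder_command.format(**locals()), shell=True)
```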

`t2t-avg-all` averages each checkpoint in a given directory with the N preceding ones.

All three scripts wait a given number of minutes for new checkpoints
(produced by `t2t-trainer`, which can be run concurrently with these scripts).
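For anyone who prefers to call the underlying helper directly, here is a minimal Python sketch of the single-file mode; it uses `bleu_hook.bleu_wrapper` from `tensor2tensor/utils/bleu_hook.py`, the same call `t2t-bleu` makes internally, with placeholder file paths:

```python
# Minimal sketch of the single-file mode, using the helper that
# t2t-bleu calls internally. Paths are placeholders; both files must
# contain one plain-text sentence per line, aligned with each other.
from tensor2tensor.utils import bleu_hook

reference_file = "wmt13_deen.de"   # reference translations
translation_file = "my-wmt13.de"   # system output to evaluate

bleu = 100 * bleu_hook.bleu_wrapper(
    reference_file, translation_file, case_sensitive=False)
print("BLEU (uncased) = %.2f" % bleu)
```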
@rsepassi rsepassi merged commit f55462a into tensorflow:master Jan 9, 2018
@rsepassi
Contributor

rsepassi commented Jan 9, 2018

Thank you!

@terrychenism

It seems to have some small problems:

Traceback (most recent call last):
  File "/usr/local/bin/t2t-bleu", line 6, in <module>
    exec(compile(open(__file__).read(), __file__, 'exec'))
  File "/workspace/tf_project/tensor2tensor/tensor2tensor/bin/t2t-bleu", line 137, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/workspace/tf_project/tensor2tensor/tensor2tensor/bin/t2t-bleu", line 80, in main
    bleu = 100 * bleu_hook.bleu_wrapper(FLAGS.reference, FLAGS.translation, case_sensitive=False)
  File "/workspace/tf_project/tensor2tensor/tensor2tensor/utils/bleu_hook.py", line 202, in bleu_wrapper
    ref_tokens = [bleu_tokenize(x) for x in ref_lines]
  File "/workspace/tf_project/tensor2tensor/tensor2tensor/utils/bleu_hook.py", line 187, in bleu_tokenize
    string = UnicodeRegex.nondigit_punct_re.sub(r'\1 \2 ', string)
AttributeError: type object 'UnicodeRegex' has no attribute 'nondigit_punct_re'

@martinpopel
Contributor Author

I see, I am sorry for the problems.
In my original code (#436), there was

class UnicodeRegex:
  """Ad-hoc hack to recognize all punctuation and symbols.

  without depending on https://pypi.python.org/pypi/regex/."""
  def _property_chars(prefix):
    return ''.join(six.unichr(x) for x in range(sys.maxunicode)
                   if unicodedata.category(six.unichr(x)).startswith(prefix))
  punctuation = _property_chars('P')
  nondigit_punct_re = re.compile(r'([^\d])([' + punctuation + r'])')
  punct_nondigit_re = re.compile(r'([' + punctuation + r'])([^\d])')
  symbol_re = re.compile('([' + _property_chars('S') + '])')

which someone changed into

class UnicodeRegex(object):
  """Ad-hoc hack to recognize all punctuation and symbols."""

  def __init__(self):
    def _property_chars(prefix):
      return ''.join(six.unichr(x) for x in range(sys.maxunicode)
                     if unicodedata.category(six.unichr(x)).startswith(prefix))
    punctuation = self._property_chars('P')
    self.nondigit_punct_re = re.compile(r'([^\d])([' + punctuation + r'])')
    self.punct_nondigit_re = re.compile(r'([' + punctuation + r'])([^\d])')
    self.symbol_re = re.compile('([' + _property_chars('S') + '])')

but did not adapt the code a few lines below accordingly.
In the end it is my fault: I did not include any tests, and I rebased my new scripts on top of master without checking the changes done there.
I will send a fix PR tomorrow.
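For the record, a minimal sketch of the kind of fix I have in mind, assuming the instance-based `UnicodeRegex` above stays: build one module-level instance in `bleu_hook.py` and let `bleu_tokenize` use it instead of the class attributes that no longer exist:

```python
# Sketch of a possible fix in bleu_hook.py: create the (expensive)
# regexes once at import time and use the instance in bleu_tokenize,
# instead of the class attributes that the refactoring removed.
uregex = UnicodeRegex()

def bleu_tokenize(string):
  """Tokenize a string following the mteval-v14 international tokenization."""
  string = uregex.nondigit_punct_re.sub(r'\1 \2 ', string)
  string = uregex.punct_nondigit_re.sub(r' \1 \2', string)
  string = uregex.symbol_re.sub(r' \1 ', string)
  return string.split()
```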

@terrychenism

Great! Thanks

@martinpopel
Contributor Author

see #514

martinpopel added a commit to martinpopel/tensor2tensor that referenced this pull request Jan 17, 2018
allow specifying --checkpoint_path with t2t-decoder

and allow keeping timestamp in that case.
This is needed for t2t-translate-all + t2t-bleu to work
as expected (I forgot to add this commit to tensorflow#488).
rsepassi pushed a commit that referenced this pull request Jan 25, 2018
* no pip download progress bars in Travis log

see #523

* allow specifying --checkpoint_path with t2t-decoder

and allow keeping timestamp in that case.
This is needed for t2t-translate-all + t2t-bleu to work
as expected (I forgot to add this commit to #488).

* prevent tf.gfile.Glob crashes due to concurrent filesystem edits

tf.gfile.Glob may crash with
 tensorflow.python.framework.errors_impl.NotFoundError:
 xy/model.ckpt-1130761_temp_9cb4cb0b0f5f4382b5ea947aadfb7a40; No such file or directory
Let's use standard glob.glob instead; it seems to be more reliable.

* reintroducing FLAGS deleted by someone

this is needed for **locals() to work

* speedup BLEU tokenization

As I think about it, I would prefer my original implementation
https://github.com/tensorflow/tensor2tensor/blob/bb1173adce940e62c840970fa0f06f69fd9398db/tensor2tensor/utils/bleu_hook.py#L147-L157
But it seems there are some T2T/Google internal Python guidelines forbidding this,
so we have to live with the singleton.

* another solution of #523

* make save_checkpoints_secs work again

The functionality was broken during the adoption of the TPU trainer_lib.py
instead of the original trainer_utils.py.
Currently, the default is to save checkpoints every 2000 steps,
while in previous T2T versions the default was every 10 minutes
(the two modes are illustrated in a short sketch after this commit list).

* adapt according to @rsepassi's review

* Update NotFoundError
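
As a hedged illustration of the save_checkpoints_secs commit above (plain TF 1.x `tf.estimator.RunConfig`, not T2T-specific code; the 600-second value simply mirrors the old 10-minute default):

```python
# Illustration only: time-based vs. step-based checkpoint saving on a
# TF 1.x RunConfig. Exactly one of the two options should be active.
import tensorflow as tf

# Old T2T behaviour: save a checkpoint every 10 minutes.
config_time_based = tf.estimator.RunConfig(
    model_dir="/tmp/t2t_train",   # example directory
    save_checkpoints_secs=600,
    save_checkpoints_steps=None)

# Behaviour described above as the current default: every 2000 steps.
config_step_based = tf.estimator.RunConfig(
    model_dir="/tmp/t2t_train",
    save_checkpoints_steps=2000,
    save_checkpoints_secs=None)
```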