Sentences longer than the max_length parameter are excluded from training, so lowering this parameter helps to prevent OOM errors and allows using a higher batch_size, which makes it quite useful.
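For reference, here is a minimal sketch of how these two hyperparameters can be overridden in tensor2tensor, assuming the usual transformer_base setup (the concrete values are just the ones discussed in this issue; the same effect can be obtained on the command line with --hparams='max_length=70,batch_size=2000'):

```python
# Hedged sketch: a custom hparams set that lowers max_length and raises batch_size.
# transformer_base() and the registry decorator are the standard tensor2tensor APIs;
# the name of the hparams set is made up for this example.
from tensor2tensor.models import transformer
from tensor2tensor.utils import registry


@registry.register_hparams
def transformer_base_maxlen70():
  hparams = transformer.transformer_base()
  hparams.max_length = 70    # training sentences longer than 70 subwords are dropped
  hparams.batch_size = 2000  # measured in subword tokens, not sentences (if I read the batching code correctly)
  return hparams
```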
Unfortunately, setting this parameter too low results in low BLEU and slower-converging learning curves. The graph below shows the curves (evaluated on the dev set) for max_length 25, 50, 70, 150, 200 and 400:
There are two possible explanations, but I think both of them are false:
Setting max_length too low makes the training data smaller. However, with max_length=70 only 2.1% of my training sentences are excluded. Moreover, the "70" BLEU curve is already decreasing after the first hour of training, while processing the whole training data (one epoch) takes more than two days, so a slightly smaller dataset cannot explain a difference that appears so early.
A model trained only on short sentences does not achieve good results when applied to long sentences. However, only 2.2% of the sentences in my dev set are longer than 70 subwords (and only 0.3% are longer than 100 subwords), so this does not seem to be the cause either.
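(As a side note, here is a rough sketch of how such fractions can be counted, in case anyone wants to reproduce the percentages on their own data; the vocab and corpus file names are placeholders, and SubwordTextEncoder is assumed to match the problem's vocabulary:)

```python
# Hedged sketch: estimate what fraction of sentences exceeds a given subword length.
from tensor2tensor.data_generators import text_encoder

encoder = text_encoder.SubwordTextEncoder("vocab.subwords")  # placeholder vocab file


def fraction_longer_than(corpus_path, max_len):
  longer = total = 0
  with open(corpus_path) as f:
    for line in f:
      total += 1
      if len(encoder.encode(line.strip())) > max_len:
        longer += 1
  return float(longer) / total


# The issue above reports roughly 0.021 for the training data and 0.022 for the dev set at 70 subwords.
print(fraction_longer_than("train.src", 70))
```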
When I increased the batch_size from 1500 to 2000, the results improved: the "25" and "50" curves were still lagging behind, but "70" and higher achieved the same result as training without any max_length restriction.
Can someone explain this? Or even fix it if it is a bug?
@martinpopel are these numbers from tensor2tensor 1.2.9 or from a more recent version? (I ask this in relation to bug #529, as 1.2.9 is the version some of us are working with.)
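(To check which version is installed, something like this should print it; pkg_resources is just one generic way to query the installed distribution:)

```python
# Print the installed tensor2tensor version.
import pkg_resources
print(pkg_resources.get_distribution("tensor2tensor").version)
```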