how to reproduce the result of "attention is all you need"? #691

DC-Swind · 2018-04-08T08:29:53Z

Description

I want to use the Transformer to reproduce the results on the EN-DE 4.5M dataset used in the paper "Attention Is All You Need", but I can't find any guideline.

What I need:

How do I run the Transformer?
There are some examples that use "t2t-trainer", but the EN-DE 4.5M dataset is not in the problems list.
How do I feed the EN-DE 4.5M dataset into the model?
I only have 4.5M EN-DE sentence pairs. How do I produce "target_space_id" or the other features the model expects? How should I initialize the embedding matrix?
The inputs/outputs of the functions in the Transformer code are not clearly documented. For example, "target_space_id" is described only as "A Tensor". Where can I find more details about these inputs/outputs?

Is there a guideline for reproducing the results of the paper, or at least one explaining how to use the Transformer to train a model on a dataset that contains only sentence pairs?

TensorFlow and tensor2tensor versions

Tensorflow 1.4, tensor2tensor 1.5.6

martinpopel · 2018-04-08T09:37:03Z

But I can't find any guideline.

It is here: https://github.com/tensorflow/tensor2tensor#walkthrough
You should get BLEU > 20 on a single GPU, depending on the batch_size you can fit into your GPU and how long you let it train (--train_steps).
To get better results (replicating the paper or doing even better), you should use 8 GPUs and batch size 4096.
For further discussion of the issues with replicating the EN-DE results, see #317 (and close this issue) and the recent Gitter discussion.
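To make the batch-size advice concrete: as far as I know, `batch_size` for text problems in tensor2tensor is a per-GPU budget of (padded) subword tokens rather than a number of sentence pairs, so 8 GPUs with batch size 4096 process roughly 8 × 4096 = 32768 tokens per step. The sketch below uses hypothetical helper names (not part of tensor2tensor) to compute that figure and the number of single-GPU steps needed to see the same number of tokens.

```python
def tokens_per_step(batch_size_tokens: int, num_gpus: int) -> int:
    """Approximate tokens processed per training step.

    Assumption: batch_size is a per-GPU token budget, as for
    tensor2tensor text problems, so the total scales with GPU count.
    """
    if batch_size_tokens <= 0 or num_gpus <= 0:
        raise ValueError("batch size and GPU count must be positive")
    return batch_size_tokens * num_gpus


def steps_for_same_tokens(reference_steps: int,
                          reference_tokens_per_step: int,
                          tokens_per_step_now: int) -> int:
    """Steps needed to process as many tokens as the reference run (rounded up)."""
    if tokens_per_step_now <= 0:
        raise ValueError("tokens per step must be positive")
    total_tokens = reference_steps * reference_tokens_per_step
    # Integer ceiling division, avoiding floating-point rounding.
    return -(-total_tokens // tokens_per_step_now)


if __name__ == "__main__":
    multi_gpu = tokens_per_step(4096, 8)   # 8 GPUs, batch size 4096
    single_gpu = tokens_per_step(4096, 1)  # 1 GPU, same batch size
    print("tokens/step on 8 GPUs:", multi_gpu)
    # 100000 is an illustrative step count, not a recommendation.
    print("single-GPU steps for the same tokens:",
          steps_for_same_tokens(100000, multi_gpu, single_gpu))
```

Matching the token count is only a rough guide for choosing --train_steps: the learning-rate schedule is tied to the step count, so a longer single-GPU run does not behave identically to the 8-GPU run.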

but EN-DE 4.5M dataset is not in the problems list.

It is: translate_ende_wmt32k.

How to feed the EN-DE 4.5M dataset into the model?

Just follow the Walkthrough.
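For reference, the Walkthrough boils down to two commands: t2t-datagen downloads and encodes the WMT EN-DE data for the problem, and t2t-trainer trains on it. The Python sketch below only assembles those command lines and does not run them. The helper functions, directory names and step count are illustrative assumptions; check the flag spellings with `t2t-trainer --helpfull` for your installed version (some older releases named the trainer flag `--problems` rather than `--problem`, hence the `problem_flag` parameter).

```python
import shlex
from typing import List, Optional

# Assumption: names as used in the tensor2tensor README walkthrough.
PROBLEM = "translate_ende_wmt32k"
MODEL = "transformer"
HPARAMS_SET = "transformer_base_single_gpu"  # single-GPU base configuration


def datagen_cmd(data_dir: str, tmp_dir: str) -> List[str]:
    """Hypothetical helper: command that downloads and encodes the WMT EN-DE data."""
    return ["t2t-datagen", f"--data_dir={data_dir}",
            f"--tmp_dir={tmp_dir}", f"--problem={PROBLEM}"]


def trainer_cmd(data_dir: str, output_dir: str, train_steps: int,
                batch_size: Optional[int] = None, worker_gpu: int = 1,
                problem_flag: str = "problem") -> List[str]:
    """Hypothetical helper: command that trains the Transformer on the problem."""
    cmd = [
        "t2t-trainer",
        f"--data_dir={data_dir}",
        f"--{problem_flag}={PROBLEM}",  # assumed spelling varies by version
        f"--model={MODEL}",
        f"--hparams_set={HPARAMS_SET}",
        f"--output_dir={output_dir}",
        f"--train_steps={train_steps}",
        f"--worker_gpu={worker_gpu}",
    ]
    if batch_size is not None:
        # batch_size is an hparam, so it is overridden via --hparams.
        cmd.append(f"--hparams=batch_size={batch_size}")
    return cmd


if __name__ == "__main__":
    print(shlex.join(datagen_cmd("t2t_data", "t2t_datagen")))
    print(shlex.join(trainer_cmd("t2t_data", "t2t_train/ende",
                                 train_steps=100000, batch_size=4096)))
```

With your own sentence pairs instead of the downloaded WMT data you would need a custom problem, which the Walkthrough does not cover; the registered translate_ende_wmt32k problem already handles vocabulary building and encoding.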

The input and target space ids were important for multi-task (multi-problem) training, which is now broken. You can ignore them.

DC-Swind · 2018-04-08T11:30:47Z

Thank you for your reply.

DC-Swind closed this as completed Apr 8, 2018
