This repository was archived by the owner on Jul 7, 2023. It is now read-only.

how to reproduce the result of "attention is all you need"? #691

Closed
DC-Swind opened this issue Apr 8, 2018 · 2 comments


@DC-Swind

DC-Swind commented Apr 8, 2018

Description

I want to use the Transformer to reproduce the result on the EN-DE 4.5M dataset used in the paper "Attention Is All You Need". But I can't find any guideline.

What I need:

  1. How to run the transformer?
    There are some examples which use "t2t-trainer", but EN-DE 4.5M dataset is not in the problems list.
  2. How to feed the EN-DE 4.5M dataset into the model?
    I just have 4.5M EN-DE sentence pairs. How do I produce "target_space_id" and the other features for the model? How do I initialize the embedding matrix?
  3. The explanation of the inputs/outputs of the functions in the transformer is not clear. E.g., the explanation of "target_space_id" is just "A Tensor". I want to know more about these inputs/outputs; where can I find that?

Is there any guideline for reproducing the result of the paper, or at least one explaining how to use the transformer to train a model on a dataset which only contains sentence pairs?

TensorFlow and tensor2tensor versions

TensorFlow 1.4, tensor2tensor 1.5.6

@martinpopel
Contributor

> But I can't find any guideline.

It is here: https://github.com/tensorflow/tensor2tensor#walkthrough
You should get BLEU > 20 on a single GPU, depending on the batch_size that fits into your GPU memory and how long you let it train (--train_steps).
To replicate the paper's results (or improve on them), you should use 8 GPUs and a batch size of 4096.
For further discussion of the issues with replicating the en-de results, see #317 (and close this issue) and the recent Gitter discussion.
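
To make the "8 GPUs and batch size 4096" advice concrete, here is a small arithmetic sketch. It assumes that batch_size for text problems is counted in subword tokens per GPU, so the tokens processed per step scale with the number of GPUs. The helper name `tokens_per_step` is made up for this illustration and is not part of tensor2tensor.

```python
def tokens_per_step(num_gpus: int, batch_size_per_gpu: int) -> int:
    """Approximate subword tokens processed in one training step.

    Assumption: batch_size is a per-GPU token budget, so the effective
    batch grows linearly with the number of GPUs.
    """
    return num_gpus * batch_size_per_gpu


# Same batch_size on one GPU versus eight GPUs.
single_gpu = tokens_per_step(num_gpus=1, batch_size_per_gpu=4096)
eight_gpus = tokens_per_step(num_gpus=8, batch_size_per_gpu=4096)
print(single_gpu, eight_gpus)
```

Under that assumption a single-GPU run sees one eighth of the tokens per step, which is one plausible reason it needs many more steps and may still end up with a lower BLEU.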

> but EN-DE 4.5M dataset is not in the problems list.

It is: translate_ende_wmt32k

> How to feed the EN-DE 4.5M dataset into the model?

Just follow the Walkthrough.
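
For orientation, the Walkthrough boils down to a data-generation command followed by a training command. The sketch below only assembles those two command lines as strings and does not run them. The directory paths are placeholders, the hparams set `transformer_base_single_gpu` is an assumed choice for a single-GPU run, and the helper `build_commands` is hypothetical. Some releases spell the trainer flag `--problems` rather than `--problem`, so it is left as a parameter here; check `t2t-trainer --help` for your installed version.

```python
import shlex


def build_commands(data_dir, tmp_dir, output_dir,
                   problem="translate_ende_wmt32k",
                   hparams_set="transformer_base_single_gpu",
                   train_steps=None,
                   problem_flag="--problem"):
    """Return (datagen, trainer) argument lists for the two Walkthrough steps."""
    # Step 1: download the data, build the vocabulary, write TFRecords.
    datagen = [
        "t2t-datagen",
        f"--data_dir={data_dir}",
        f"--tmp_dir={tmp_dir}",
        f"--problem={problem}",
    ]
    # Step 2: train the Transformer on the generated data.
    trainer = [
        "t2t-trainer",
        f"--data_dir={data_dir}",
        f"{problem_flag}={problem}",
        "--model=transformer",
        f"--hparams_set={hparams_set}",
        f"--output_dir={output_dir}",
    ]
    if train_steps is not None:
        trainer.append(f"--train_steps={train_steps}")
    return datagen, trainer


# Placeholder paths; adjust to your own setup.
datagen, trainer = build_commands("/tmp/t2t_data", "/tmp/t2t_datagen",
                                  "/tmp/t2t_train", train_steps=250000)
for cmd in (datagen, trainer):
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The printed lines can be pasted into a shell once the paths and flags have been checked against the Walkthrough for your version.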

The input and target space ids were important for multi-task (multi-problem) training, which is now broken. You can ignore them.

@DC-Swind
Author

DC-Swind commented Apr 8, 2018

Thank you for your reply.

DC-Swind closed this as completed Apr 8, 2018