Latest PyTorch version known to work... #173

Closed
fabio-costa-movile opened this issue Oct 31, 2019 · 3 comments
@fabio-costa-movile

Hi!

I am trying to train a model on the LJSpeech data with the default preset, but training fails on the GPU with the latest PyTorch version.

For PyTorch 1.3:

I had to make some fixes to my drivers so that running

torch.cuda.is_available()

in my Python CLI returns True. I used the following conda setup:

conda install pytorch torchvision cudatoolkit=10.0 -c pytorch
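
For reports like this one it helps to record the exact environment. The snippet below is a minimal sketch, not part of the project: `describe_torch_env` is a hypothetical helper name, and it only reads `torch.__version__`, `torch.version.cuda` and `torch.cuda.is_available()`. It falls back gracefully when PyTorch is not installed.

```python
import importlib


def describe_torch_env():
    """Collect the PyTorch details that matter for a GPU training report."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        # PyTorch is missing from this environment.
        return {"installed": False, "version": None,
                "cuda_build": None, "cuda_available": False}
    return {
        "installed": True,
        "version": torch.__version__,          # e.g. the 1.3 release used above
        "cuda_build": torch.version.cuda,      # CUDA toolkit the binary was built with, or None
        "cuda_available": bool(torch.cuda.is_available()),
    }


info = describe_torch_env()
for key, value in info.items():
    print(f"{key}: {value}")
```

Pasting this output next to each conda command makes it easier to compare the 1.1, 1.2 and 1.3 setups.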

I also had to fix the get_mask_from_lengths function in modules.py to make it work and to silence a warning. The function became:

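def get_mask_from_lengths(memory, memory_lengths):
    """Get a padding mask tensor from a list of lengths.
    Args:
        memory: (batch, max_time, dim)
        memory_lengths: array like
    """
    # Allocate a (batch, max_time) bool tensor of the same kind as memory.
    mask = memory.data.new(memory.size(0), memory.size(1)).bool().zero_()
    for idx, l in enumerate(memory_lengths):
        # Mark the valid (non-padded) time steps of each sequence.
        mask[idx][:l] = 1
    # Flip the mask so that the padded positions are the ones set.
    return mask^1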
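
As an alternative to the loop above, the same padding mask can be built without per-row assignment. This is only a sketch under assumptions: it targets PyTorch >= 1.2, where comparisons return bool tensors, the name `padding_mask_from_lengths` is hypothetical, and it is not necessarily the fix that was later applied in the repository.

```python
import torch


def padding_mask_from_lengths(memory, memory_lengths):
    """Return a (batch, max_time) bool mask that is True at padded positions.

    Args:
        memory: tensor of shape (batch, max_time, dim)
        memory_lengths: array-like of per-sequence lengths
    """
    lengths = torch.as_tensor(memory_lengths, dtype=torch.long, device=memory.device)
    positions = torch.arange(memory.size(1), device=memory.device)
    # Broadcast (1, max_time) against (batch, 1): True where t >= length.
    return positions.unsqueeze(0) >= lengths.unsqueeze(1)


memory = torch.zeros(2, 4, 8)
mask = padding_mask_from_lengths(memory, [2, 4])
print(mask)
```

Because the result is already a bool tensor with padded positions set, no final inversion is needed.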

However I could not train the model due to "RuntimeError: Caught RuntimeError in pin memory thread for device 0." error

For PyTorch 1.2:

I used

conda install pytorch==1.2.0 torchvision==0.4.0 cudatoolkit=10.0 -c pytorch

But I could not train because of "RuntimeError: reduce failed to synchronize: device-side assert triggered".
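
A device-side assert is reported asynchronously, so the traceback often points at an unrelated line. A general debugging step (not something done in this thread) is to make CUDA kernel launches synchronous through the `CUDA_LAUNCH_BLOCKING` environment variable. A minimal sketch, assuming it runs at the very top of the training entry point:

```python
import os

# Make CUDA kernel launches synchronous so the Python traceback points at the
# operation that actually failed. Set this before CUDA is initialized, which
# in practice means before importing torch.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Import torch and start training below this line.
print("CUDA_LAUNCH_BLOCKING =", os.environ["CUDA_LAUNCH_BLOCKING"])
```

Running the same batch on the CPU is another common way to get a clearer error message, since such asserts are frequently caused by an out-of-range index.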

For PyTorch 1.1:

I did the setup using

conda install pytorch==1.1.0 torchvision==0.3.0 cudatoolkit=10.0 -c pytorch

So far the training is running...

Cheers!

@r9y9 r9y9 added the bug label Nov 1, 2019
@tripzero
Contributor

tripzero commented Nov 7, 2019

@fabio-costa-movile can you submit your return "mask^1" fix as a pull request?

@r9y9
Owner

r9y9 commented Dec 21, 2019

I have started fixing the code for PyTorch 1.3. So far I have fixed at least this issue:

However I could not train the model due to "RuntimeError: Caught RuntimeError in pin memory thread for device 0." error

Sorry for the inconvenience.

@r9y9
Owner

r9y9 commented Dec 21, 2019

The mask creation part is also fixed. It should work on PyTorch >= 1.2.
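
To turn the ">= 1.2" statement into a quick check, a version string can be compared against that minimum. This is a sketch with hypothetical helper names (`version_tuple`, `meets_minimum`). It only encodes the maintainer's statement above and says nothing about whether older releases such as 1.1 still work after the fix.

```python
import re


def version_tuple(version):
    """Extract (major, minor) from a string such as '1.3.0' or '1.2.0+cu100'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version, minimum=(1, 2)):
    """Return True if the version is at least the given (major, minor)."""
    return version_tuple(version) >= minimum


print(meets_minimum("1.1.0"))  # False
print(meets_minimum("1.3.0"))  # True
```

In a training script the argument would typically be `torch.__version__`.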
