Compiler error /cuda/setup.py #27

Open
puria-izady opened this issue Jan 19, 2019 · 9 comments · May be fixed by #67
@puria-izady

puria-izady commented Jan 19, 2019

Hello,

The compilation of setup.py in cpp is successful, but for /cuda/setup.py I get the following compile error. Do you have an idea what my mistake could be?

Best regards

System:

  • OS: Ubuntu 18.04.1 LTS
  • PyTorch version: 1.0
  • How you installed PyTorch (conda, pip, source): conda
  • Python version: 3.6.8
  • CUDA/cuDNN version: 10.0
  • GPU models and configuration: GeForce GTX 1080 Ti
  • GCC version (if compiling from source): 7.3.0

Error log:

running install
running bdist_egg
running egg_info
writing lltm_cuda.egg-info/PKG-INFO
writing dependency_links to lltm_cuda.egg-info/dependency_links.txt
writing top-level names to lltm_cuda.egg-info/top_level.txt
reading manifest file 'lltm_cuda.egg-info/SOURCES.txt'
writing manifest file 'lltm_cuda.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_ext
building 'lltm_cuda' extension
gcc -pthread -B /pizady/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/pizady/anaconda3/lib/python3.6/site-packages/torch/lib/include -I/pizady/anaconda3/lib/python3.6/site-packages/torch/lib/include/torch/csrc/api/include -I/pizady/anaconda3/lib/python3.6/site-packages/torch/lib/include/TH -I/pizady/anaconda3/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/pizady/anaconda3/include/python3.6m -c lltm_cuda.cpp -o build/temp.linux-x86_64-3.6/lltm_cuda.o -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=lltm_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
In file included from lltm_cuda.cpp:1:0:
/pizady/anaconda3/lib/python3.6/site-packages/torch/lib/include/torch/csrc/api/include/torch/torch.h:7:2: warning: #warning "Including torch/torch.h for C++ extensions is deprecated. Please include torch/extension.h" [-Wcpp]
 #warning \
  ^~~~~~~
/usr/local/cuda/bin/nvcc -I/pizady/anaconda3/lib/python3.6/site-packages/torch/lib/include -I/pizady/anaconda3/lib/python3.6/site-packages/torch/lib/include/torch/csrc/api/include -I/pizady/anaconda3/lib/python3.6/site-packages/torch/lib/include/TH -I/pizady/anaconda3/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/pizady/anaconda3/include/python3.6m -c lltm_cuda_kernel.cu -o build/temp.linux-x86_64-3.6/lltm_cuda_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --compiler-options '-fPIC' -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=lltm_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
lltm_cuda_kernel.cu(54): error: calling a __host__ function("std::fmax<double, float> ") from a __global__ function("_NV_ANON_NAMESPACE::lltm_cuda_forward_kernel<float> ") is not allowed

lltm_cuda_kernel.cu(54): error: identifier "std::fmax<double, float> " is undefined in device code

2 errors detected in the compilation of "/tmp/tmpxft_00000f0c_00000000-6_lltm_cuda_kernel.cpp1.ii".
@goldsborough
Contributor

I don't think you made any mistake.

So, for the warning:

Please include torch/extension.h

For the error, this has been asked a few times: https://github.com/pytorch/extension-cpp/issues?utf8=%E2%9C%93&q=is%3Aissue+fmax

I think the consensus was that this is an environment error, and the best solution is to build PyTorch from source.
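For reference, the deprecation warning in the log points at the first include in lltm_cuda.cpp. A minimal sketch of the header change the warning asks for (nothing else in the file needs to change):

// lltm_cuda.cpp
#include <torch/extension.h>  // instead of the deprecated #include <torch/torch.h>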

@dedoogong

dedoogong commented Apr 12, 2019

No, it is because of the CUDA API; it has no relevance to PyTorch.
Just cast the second argument to (double). That's the best solution.
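A minimal sketch of that cast-to-double fix. The thread does not show the kernel source, so the helper below (clamp_nonneg) and the exact expression are assumptions; the point is that an fmax call mixing a double literal with a float value resolves to the host-only std::fmax<double, float>, while casting the float argument to double selects the __device__ overload fmax(double, double):

// Hedged sketch: hypothetical device helper illustrating the cast.
template <typename scalar_t>
__device__ __forceinline__ scalar_t clamp_nonneg(scalar_t z) {
  // fmax(0.0, z) with scalar_t = float would pick std::fmax<double, float>,
  // a host function; casting z to double keeps both arguments double.
  return fmax(0.0, (double) z);
}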

@ClementPinard
Contributor

Got the same error here.

Ubuntu 16.04
CUDA 10.0
PyTorch 1.1.0a0+7e73783 (built from source)
Python 3.7

The solution from #21 seems to work, though.
Discussion in #15 also hints that casting to scalar_t might actually be the right thing to do if numbers are implicitly cast to double.

Normally I would just add the (scalar_t) cast and move on, but I wanted to submit a PR (see #31) and cannot build on a clean workspace.

Any hints on what to do? I could actually build before (last summer), but since then I have updated my Python version, along with CUDA (and of course PyTorch). I might try a Docker build to get a perfectly clean install, but if the problem is common enough, maybe we can add this cast on fmax (and fmin; casting everything to scalar_t is better than casting everything to double).
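A minimal sketch of the scalar_t variant of the same fix (again with a hypothetical helper, since the kernel source is not shown here). Casting the literal instead of the value keeps float kernels in float and double kernels in double:

// Hedged sketch: cast the literal to scalar_t so no implicit promotion to
// double happens; fmax(float, float) and fmax(double, double) both have
// __device__ overloads.
template <typename scalar_t>
__device__ __forceinline__ scalar_t clamp_nonneg(scalar_t z) {
  return fmax(static_cast<scalar_t>(0.0), z);
}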

@ClementPinard
Contributor

After some investigation, it seems related to the gcc version. I originally tested with gcc-7 and it didn't work; switching to gcc-5 with a simple update-alternatives made it work.
PyTorch itself was compiled from source with gcc-7.

Any idea what might have changed from gcc-5 to gcc-7?

@soumith
Member

soumith commented May 5, 2019

I reproduced this on Docker today and fixed the issue with commit 1031028.

@soumith soumith closed this as completed May 5, 2019
@ClementPinard
Contributor

ClementPinard commented May 6, 2019

Hi, thanks for the commit! Unfortunately, I believe fminf and fmaxf implicitly cast everything to float32. As a consequence, check.py and grad_check.py are now broken with CUDA, because the precision is not sufficient for float64 tensors.
Example output:

python check.py forward -c

Forward: Baseline (Python) vs. C++ ... Ok
Forward: Baseline (Python) vs. CUDA ... Traceback (most recent call last):
  File "check.py", line 104, in <module>
    check_forward(variables, options.cuda, options.verbose)
  File "check.py", line 45, in check_forward
    check_equal(baseline_values, cuda_values, verbose)
  File "check.py", line 22, in check_equal
    np.testing.assert_allclose(x, y, err_msg="Index: {}".format(i))
  File "/home/cpinard/anaconda3/lib/python3.7/site-packages/numpy/testing/_private/utils.py", line 1452, in assert_allclose
    verbose=verbose, header=header, equal_nan=equal_nan)
  File "/home/cpinard/anaconda3/lib/python3.7/site-packages/numpy/testing/_private/utils.py", line 789, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=1e-07, atol=0
Index: 0
(mismatch 13.333333333333329%)
 x: array([-1.206306e-04,  9.878260e-01, -2.557970e-01,  3.771263e-01,
       -1.863440e-01,  5.914125e-02,  6.415094e-01,  3.132478e-04,
        1.672588e-03, -4.412979e-03, -1.300380e-01, -7.609038e-01,
        5.438342e-01,  6.241342e-02, -3.342839e-01])
 y: array([-1.206305e-04,  9.878260e-01, -2.557970e-01,  3.771263e-01,
       -1.863440e-01,  5.914125e-02,  6.415094e-01,  3.132469e-04,
        1.672588e-03, -4.412979e-03, -1.300380e-01, -7.609038e-01,
        5.438342e-01,  6.241342e-02, -3.342839e-01])
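For context: fmaxf and fminf are the single-precision CUDA math functions, so a kernel instantiated for double silently rounds through float32, which matches the small mismatches above. A hedged host-side illustration of the narrowing (the value is taken from the mismatching index in the output; this only demonstrates the mechanism, it does not reproduce the kernel):

// narrowing_demo.cpp -- illustrates the float32 round-trip caused by fmaxf.
#include <cmath>
#include <cstdio>

int main() {
  double x = 3.132478e-04;                        // a mismatching element above
  double via_float = std::fmaxf(0.0f, (float) x); // rounds x to float32 first
  double direct    = std::fmax(0.0, x);           // stays in float64
  std::printf("via fmaxf: %.10e\ndirect fmax: %.10e\n", via_float, direct);
  return 0;
}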

@soumith
Member

soumith commented May 7, 2019

Whoops, this is my bad. Let me re-set up the environment and see what I can do about this.

@soumith soumith reopened this May 7, 2019
@stevewongv

@soumith Hi Soumith, did you find a solution for this precision problem? I ran into it in my C++ extension, too.

@saedrna

saedrna commented Mar 8, 2023

I also encountered a similar problem. After deleting some paths in the PATH variable that I felt might cause conflicts, I was able to solve it.
