Skip to content

CI skips some of the SocketCAN unit tests #1479

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
lumagi opened this issue Jan 6, 2023 · 13 comments
Closed

CI skips some of the SocketCAN unit tests #1479

lumagi opened this issue Jan 6, 2023 · 13 comments
Labels

Comments

@lumagi
Copy link
Collaborator

lumagi commented Jan 6, 2023

Describe the bug

The SocketCAN tests are only executed when the environment variable TEST_SOCKETCAN is present. The variable was likely introduced in the Travis CI environment and passed into test execution via tox. I noticed that it is no longer present in the current GitHub Workflow CI environment. This causes several of the SocketCAN tests to be skipped.

Updating the pipeline would include two things:

  • Executing test/open_vcan.sh on Linux machines
  • Adding the environment variable TEST_SOCKETCAN or removing it and simply executing these tests for all Linux systems

I know that workflow scripts have root permission on the GitHub runners. But I wasn't able to check if they also have capability NET_ADMIN to open vcan interfaces. Additionally, I wasn't sure if I could and didn't want to open a PR and mess with the workflow pipeline before opening an issue first. @zariiii9003 was the variable removed on purpose? If not, I can volunteer a patch, but I wanted to ask for some feedback first.

@lumagi lumagi added the bug label Jan 6, 2023
@zariiii9003
Copy link
Collaborator

I'm no linux expert, but i think the problem is, that the github actions runner images do not contain the vcan module

@lumagi
Copy link
Collaborator Author

lumagi commented Jan 8, 2023

You are correct. I did some testing, and Microsoft does not ship the vcan module with their Azure kernels. And it doesn't look like this is something they would like to change.
I did some tests and it's possible to compile the module manually. But this takes approx. 2 minutes. I guess the only option that would make sense would be to have a job that runs nightly and recompiles the module and pushes it into the repository cache. Then, the runners can pull the module from the cache and load it to open the vcan interfaces.
I set this up in a test repo: https://github.com/lumagi/gh_workflow_tests/tree/main/.github/workflows
But tbh, it's not entirely straight forward :-/ It relies on the fact that the kernel source code contained in the linux-source package is synced to the kernel version of the runner. For ubuntu-latest this seems to be the case because Azure uses version 5.15.0 which is the same as the version used by Ubuntu 22.04. If we were to use something else like Ubuntu 20.04 for the runner, it seems like Microsoft still runs kernel version 5.15.0 although Ubuntu 20.04 normally runs with 5.4.0.

@zariiii9003
Copy link
Collaborator

Is the repository cache reliable? Can we assume, that it will not be cleaned randomly?

The windows pypy tests currently take ~5min.
If it's only 2 minutes we could probably integrate it into our main CI workflow. Maybe via shell script? We already have one in test/open_vcan.sh.

CC @hartkopp

@lumagi
Copy link
Collaborator Author

lumagi commented Jan 8, 2023

There are two possible scenarios:

  • Rebuild the module with every test invocation: no cache needed, +2.5m execution time
  • Check current kernel version periodically and if necessary rebuild the module, store it in the cache: cache needed, test execution time won't increase

The former can be done by a periodic workflow trigger to run once a day.

The following is from the GitHub doc:

GitHub will remove any cache entries that have not been accessed in over 7 days. There is no limit on the number of caches you can store, but the total size of all caches in a repository is limited to 10 GB.

So I think the cache shouldn't be an issue. The workflow that I wrote is is basically a shell script anyways. So this could easily be converted.

But we need to be aware that this is likely going to be more maintenance intensive since the kernel module compilation has a lot of moving parts. I would also like to hear what @hartkopp has to say about this.

@hartkopp
Copy link
Collaborator

hartkopp commented Jan 8, 2023

That's really bad that Microsoft does not ship the vcan driver. What can you do with CAN network layer support without a CAN driver?!? :-D

Looking into @lumagi 's workflow script I wonder if we could speed up the kernel download/unpack/compile process by simply picking the correct Ubuntu modules from the Ubuntu kernel repositories at https://kernel.ubuntu.com/

E.g. in https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.15.82/ (replace v5.15.82 with the version from uname) there are the Linux kernel modules:

wget https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.15.82/amd64/linux-modules-5.15.82-051582-generic_5.15.82-051582.202212080647_amd64.deb

Then unpack the deb package with

mkdir x && dpkg-deb -x linux-modules-5.15.82-051582-generic_5.15.82-051582.202212080647_amd64.deb

And then the vcan.ko file can be found in x/lib/modules/5.15.82-051582-generic/kernel/drivers/net/can/vcan.ko

@hartkopp
Copy link
Collaborator

hartkopp commented Jan 8, 2023

Just checked where to find Azure Kernel packages.

https://packages.ubuntu.com/focal/amd64/linux-modules-extra-5.15.0-1029-azure/download

http://security.ubuntu.com/ubuntu/pool/main/l/linux-azure-5.15/linux-modules-extra-5.15.0-1029-azure_5.15.0-1029.36~20.04.1_amd64.deb

The question is, whether you could simply install the linux-modules-extra package which definitely contains the vcan.ko module.

@lumagi
Copy link
Collaborator Author

lumagi commented Jan 8, 2023

You are correct, the module is contained in the package I am able to bring up CAN interfaces afterwards. That is definitely the simplest solution. In essence, the following should suffice:

sudo apt-get -y install linux-modules-extra-$(uname -r)
sudo modprobe vcan

@zariiii9003
Copy link
Collaborator

I can confirm, that it works: https://github.com/zariiii9003/python-can/actions/runs/3876181524/jobs/6609703648
Some tests are failing though.

@lumagi
Copy link
Collaborator Author

lumagi commented Jan 9, 2023

Oh, it's interesting that PyPy seems to have a problem with socket.bind for CAN. Did the tests previously run with PyPy on Travis?

Edit:
It seems that this is a known issue for PyPy: https://foss.heptapod.net/pypy/pypy/-/issues/3809

@zariiii9003
Copy link
Collaborator

zariiii9003 commented Jan 10, 2023

I can't see the logs anymore, but it looks like the pypy tests ran without socketcan:
image

If PyPy does not support it, we can run the tests on cpython only.
@lumagi PR is coming, right? 😄

@lumagi
Copy link
Collaborator Author

lumagi commented Jan 10, 2023

Yes, I faintly remember something there :D I will write a PR over the next couple of days.

@lumagi
Copy link
Collaborator Author

lumagi commented Jan 10, 2023

Et voilà, work has begun #1484. I still need to investigate why that one test fails for CPython, though.

@lumagi
Copy link
Collaborator Author

lumagi commented Jan 14, 2023

Closed with PR #1484.

@lumagi lumagi closed this as completed Jan 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants