Support for Podman with CAPD #5146

Closed
stmcginnis opened this issue Aug 24, 2021 · 30 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@stmcginnis
Contributor

User Story

As a user I would like to use the CAPD provider on a system that has podman installed so I can use non-Docker runtimes.

Detailed Description

Raised by @mdbooth. CAPD previously shelled out to the docker CLI to perform container management. For multiple reasons, we wanted to move away from CLI calls to using the API directly. Before that change, podman could be aliased to docker to get things to work.

Since CAPD only guarantees compatibility with Docker today, this is considered an enhancement, even though it was something that could be worked around by the CLI in the past.

The changes to use the API were made in a way that should allow other runtime implementations to be plugged in. It should be possible to implement podman support by providing an implementation that uses the podman APIs directly.

Another possibility is the podman apiv2 work. Once implemented, this will provide both a native API and a Docker-compatibility API, which might allow the existing docker client to be reused simply by pointing it at the podman API socket.
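For illustration, a minimal sketch of that compatibility-API route, assuming podman's stock systemd socket unit and default rootless socket path (both may differ by distro/version):

# Enable podman's Docker-compatible API socket for the current (rootless) user.
systemctl --user enable --now podman.socket

# Point any Docker API client (CLI, SDK, or CAPD's docker client) at that socket.
export DOCKER_HOST=unix:///run/user/$(id -u)/podman/podman.sock

# Sanity check (needs the docker CLI or the podman-docker shim installed).
docker info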

/kind feature

@k8s-ci-robot k8s-ci-robot added the kind/feature label Aug 24, 2021
@mdbooth
Contributor

mdbooth commented Sep 2, 2021

It's possible to work round this using podman system service. I've tested this in CAPO and documented it here: kubernetes-sigs/cluster-api-provider-openstack#982

@fabriziopandini
Member

I'm still wrapping my mind around the problem, but I think there are many layers to this and we should break the problem down into smaller pieces.

One possible approach is to make different considerations for the E2E framework and CAPD.

CAPD:

  • uses a large portion (if not all) of our recently introduced container.Runtime interface
  • it runs in a container, with the only requirement being that it gets a docker socket as input
  • while recent changes modified the way we consume the docker socket (via the API instead of a copy of the CLI embedded in the CAPD container), this does not change the requirement above, and aliasing podman wasn't a viable solution even in the past.

I'm personally convinced that for CAPD the container.Runtime interface is a big step forward; we are already seeing improvements in error messages and we are now in a position that allows us to implement much more robust unit tests.

If we need to invest more work in this, IMO the next step is to use the CRI interface; implementing support for podman or any other CLI feels to me like a step back in the evolution of a tool that is crucial both for our test signal and for the quick start UX.

The E2E framework:

  • uses a small portion of our recently introduced container.Runtime interface
  • it runs within the user's local environment, and the recent changes switching the way we interact with the container runtime from CLI calls to Docker API socket calls impacted users who were aliasing podman
  • it can be used to provision clusters with CAPD as well as without it (using other providers), as the user who originally reported the issue does.

Also in this case I consider CRI support the best next option, given that it will embrace users with different runtime engines and be agnostic to the CLI tool they are using.

However, as a middle ground/temporary stopgap to help users, I would also be open to considering alternative solutions that shell out for one very specific use case:

  • the test framework running locally on machines with the docker CLI (or aliased podman) and executing tests involving infrastructure providers other than CAPD

Ideally this should be addressed by providing a new partial implementation of container.Runtime, but this probably requires some more research if we decide to go down this path instead of investing in CRI support, which is probably a better solution for this problem scope.
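For context, the aliasing that used to work while everything shelled out to the CLI was simply something along these lines (illustrative only; the package name is the Fedora/RHEL one):

# The distro shim package installs a docker command backed by podman...
sudo dnf install podman-docker
# ...or the CLI can be aliased by hand, so anything exec'ing "docker ..." runs podman instead.
alias docker=podman

# Neither helps once the code talks to the Docker API socket directly,
# which is why a socket-level workaround (or CRI support) is needed now.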

@stmcginnis
Contributor Author

Just to note: I am trying to verify that the instructions being added here will work for general CAPD use. I'm running into some system configuration problems, but if I get past those and it works, this seems like a reasonable workaround for now.

@mdbooth
Contributor

mdbooth commented Sep 5, 2021

@stmcginnis I'd be interested in adding any practical configuration issues you hit to that documentation. Guessing they're Ubuntu related? If it's generally useful even in the interim you're also welcome to move it to CAPD, in which case I'll replace the CAPO document with a link.

This formed part of an unexpectedly frustrating experience for me. I'll be delighted to help others avoid repeating it!

@stmcginnis
Contributor Author

Thanks @mdbooth, I think it would be great to have this documentation with CAPD.

In my case, I was running a default server deployment of RHEL8. It's quite possible I wasn't doing something right. I deployed the OS and it looks like podman was installed by default. I tried following the steps in your CAPO docs patch, but I would get an error when trying to do anything with the docker.sock that was created.

I am going to deploy a fresh install, just to make sure. I'll make sure podman is working, then try to follow the steps. I'm also getting a Fedora system set up.

@mdbooth
Contributor

mdbooth commented Sep 6, 2021

With RHEL 8 I also think you'll hit an issue with OverlayFS because the kernel isn't new enough. My first attempt was on CentOS 8 iirc and I hit that.

With podman system service you should see that it creates a unix socket when it starts up. I didn't need any configuration beyond that. Assuming you're not running it as root, you just need to delete the unix socket which was created by the podman-docker package and replace it with a symlink to your unprivileged socket.
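A rough sketch of that sequence, assuming podman's default rootless socket path (adjust for your system):

# Run the Docker-compatible API service as your non-root user; -t 0 disables the idle timeout.
podman system service -t 0 &

# Replace the socket installed by podman-docker with a symlink to the unprivileged one.
sudo rm /var/run/docker.sock
sudo ln -s /run/user/$(id -u)/podman/podman.sock /var/run/docker.sock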

@stmcginnis
Contributor Author

I'm trying to follow the steps on a Fedora 34 system, and running into an issue even trying to spin up a kind cluster.

First I hit the "short-name" issue that caused the image pull to fail. I was able to get around that by setting short-name-mode="enforcing" based on the comments in kubernetes-sigs/kind#2186. That got past the image pull step, but it then fails on the "Writing configuration" step with this:

ERROR: failed to create cluster: failed to generate kubeadm config content: failed to get kubernetes version from node: failed to get file: command "podman exec --privileged kind-control-plane cat /kind/version" failed with error: exit status 255
Command Output: Error: can only create exec sessions on running containers: container state improper

Any missing steps here @mdbooth?

@mdbooth
Contributor

mdbooth commented Sep 8, 2021

I just reproduced kind create cluster on a fresh system. Here's everything I did:

  • Install Fedora 34 cloud image
  • # dnf update
  • # Create delegate.conf and iptables.conf with contents from https://kind.sigs.k8s.io/docs/user/rootless/ (sketched after this list)
  • # Install podman-docker
  • # Reboot
  • $ Download kind binary and chmod
  • $ podman login docker.io
  • $ podman pull kindest/node:v1.21.1
    • Select docker.io from the list
  • $ ./kind create cluster
  • Profit
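For reference, a sketch of those two files as described in the linked kind rootless guide at the time (verify against the current page before relying on this):

# /etc/systemd/system/user@.service.d/delegate.conf
sudo mkdir -p /etc/systemd/system/user@.service.d
printf '[Service]\nDelegate=yes\n' | sudo tee /etc/systemd/system/user@.service.d/delegate.conf

# /etc/modules-load.d/iptables.conf
printf 'ip6_tables\nip6table_nat\nip_tables\niptable_nat\n' | sudo tee /etc/modules-load.d/iptables.conf

# A reboot (as in the list above) picks both up.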

Pulling the image manually seems to work round having to set short-name-mode. I must have forgotten that I did that.

Is there any chance that you're hitting docker.io rate limiting? I paid up to work round this, but perhaps we could host these images elsewhere, e.g. quay.io, for the benefit of others? This would be equally beneficial to users of docker.

@stmcginnis
Contributor Author

stmcginnis commented Sep 8, 2021

Great, I got this to work. Here's what I did:

  • Install Fedora 34
  • sudo dnf update
  • Create delegate.conf and iptables.conf with contents from https://kind.sigs.k8s.io/docs/user/rootless/
  • Did NOT install podman-docker. Wanted to see what would happen without that. No problems, so might just need a note to conditionally delete the /var/run/docker.sock file if podman-docker is not installed.
  • sudo reboot
  • Download kind binary and chmod
  • podman pull kindest/node:v1.21.1
    • Select docker.io from the list
  • tmux -c "podman system service -t 0"
  • sudo ln -s /run/user/$(id -u)/podman/podman.sock /var/run/docker.sock
  • Follow the Docker instructions in https://cluster-api.sigs.k8s.io/user/quick-start.html to mount /var/run/docker.sock and create a cluster (see the sketch below)
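For completeness, the quick-start piece referenced in the last step is roughly this kind config, which mounts the (now podman-backed) docker.sock into the bootstrap cluster (abbreviated; check the quick start for the current version):

cat > kind-cluster-with-extramounts.yaml <<EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraMounts:
    - hostPath: /var/run/docker.sock
      containerPath: /var/run/docker.sock
EOF

kind create cluster --config kind-cluster-with-extramounts.yaml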

We would probably need a note about this only working on newer Red Hat based distros since there is the issue with RHEL 8. Or at least some kind of disclaimer that this is a workaround that has worked in some environments, but it's not officially supported.

I think it would be great to get this added to our docs somewhere. I'm not sure where the best place would be for that though. Maybe under reference? Possibly with a link from a note in the quickstart guide? Any thoughts on this @fabriziopandini?

@mdbooth
Contributor

mdbooth commented Sep 8, 2021

Ah, yes. So for just creating a kind cluster and using it via clusterctl you don't need podman-docker because kind has podman support. You don't even need to create the docker.sock, because it won't be used.

However, the test framework explicitly connects to a running docker agent to push locally-built images into the kind cluster (I assume that's what it's doing?), so you do need it to run the E2E tests.

@stmcginnis
Contributor Author

I should clarify, I followed the quick start to create a CAPD cluster. So it uses the same mechanism as the e2e framework.

@mdbooth
Contributor

mdbooth commented Sep 9, 2021

Are you sure? I was able to start a cluster without podman system service following the quick start guide a few weeks ago, which is why I was surprised that the E2E tests didn't work. I described the problem in #4380 (comment): code in cluster-api/test/framework/bootstrap/kind_util.go explicitly connects to a docker agent. I wouldn't expect us to call this outside of a test suite.

I wonder if I'm missing something important because I don't know what the D in CAPD stands for 🤔

@sbueringer
Member

sbueringer commented Sep 9, 2021

D stands for Docker :)

But I agree the save func is different to the one in kind (which uses os.Exec). Apart from that I don't think we try to load images into kind in the quickstart.

We don't have to, as the quickstart uses released images. The CAPD-based e2e tests in the cluster-api repo use locally built images and thus have to load them into kind as configured in: https://github.com/kubernetes-sigs/cluster-api/blob/master/test/e2e/config/docker.yaml#L12-L19

@mdbooth
Contributor

mdbooth commented Sep 9, 2021

D stands for Docker :)

Suddenly it becomes clear 😂

But I agree the save func is different to the one in kind (which uses os.Exec). Apart from that I don't think we try to load images into kind in the quickstart.

We don't have to, as the quickstart uses released images. The CAPD-based e2e tests in the cluster-api repo use locally built images and thus have to load them into kind as configured in: https://github.com/kubernetes-sigs/cluster-api/blob/master/test/e2e/config/docker.yaml#L12-L19

This was my understanding.

@vincepri
Member

/milestone Next

@stmcginnis
Contributor Author

I spent a little time on this to see what it would look like to have podman support with the new runtime layer. I could not get everything working right with just manual testing. I think we may want to support podman for the e2e tests and the initial "setup" parts, but have the capd-controller runtime operations stay with docker inside the container.

POC code that is not fully working can be found here: https://github.com/stmcginnis/cluster-api/blob/capp/test/infrastructure/container/podman.go

On Fedora, I needed to install podman and podman-docker.

I am able to get a container running that can access the mounted docker.sock, but there are permission issues unless it is run with sudo. It also does not appear to return everything if I do a podman ps:

[smcginnis@fedora ~]$ podman ps
CONTAINER ID  IMAGE                           COMMAND               CREATED      STATUS             PORTS       NAMES
02cdda6de05f  docker.io/library/nginx:latest  nginx -g daemon o...  3 hours ago  Up 13 minutes ago              beautiful_galileo
[smcginnis@fedora ~]$ sudo podman run -v /run:/run --privileged --security-opt label=disable quay.io/podman/stable podman --remote ps
CONTAINER ID  IMAGE                         COMMAND               CREATED                 STATUS                     PORTS       NAMES
44cedf65cab8  quay.io/podman/stable:latest  podman --remote p...  Less than a second ago  Up Less than a second ago              sweet_pare

Just adding these notes here in case it helps anyone else looking into this and if anyone wants to pick up this work.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale label Mar 13, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten and removed lifecycle/stale labels Apr 12, 2022
@stmcginnis
Contributor Author

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten label Apr 12, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale label Jul 11, 2022
@fabriziopandini fabriziopandini added the triage/accepted label Jul 29, 2022
@fabriziopandini fabriziopandini removed this from the Next milestone Jul 29, 2022
@fabriziopandini fabriziopandini removed the triage/accepted label Jul 29, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten and removed lifecycle/stale labels Aug 28, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot closed this as not planned (won't fix, can't repro, duplicate, stale) Sep 27, 2022
@vincepri
Member

/reopen

@k8s-ci-robot
Contributor

@vincepri: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot reopened this Aug 21, 2023
@k8s-ci-robot k8s-ci-robot added the needs-triage label Aug 21, 2023
@vincepri
Member

vincepri commented Aug 21, 2023

What's the latest regarding CAPD + podman support?

@stmcginnis @mdbooth ?

@vincepri vincepri removed the lifecycle/rotten label Aug 21, 2023
@vincepri
Member

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted and removed needs-triage labels Aug 21, 2023
@stmcginnis
Contributor Author

What's the latest regarding CAPD + podman support?

I think if the podman-docker package is installed, it should work. That provides the docker.sock configuration so the API client can communicate with podman via the expected API interface.

That said, I have not tested lately, and I am sure there are all kinds of corner cases with rootless and other permission differences. If someone is interested, it would be great to get some real world testing done and report back the results here so we can start to understand what some of those corner cases might be.
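If someone does pick this up, a quick smoke test of the compatibility socket might look like this (assumes Fedora/RHEL package names and podman's default rootful socket path):

sudo dnf install podman-docker
sudo systemctl enable --now podman.socket

# The Docker API ping endpoint should answer "OK" if the compatibility layer is up.
sudo curl --unix-socket /run/podman/podman.sock http://localhost/_ping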

@vincepri
Member

vincepri commented Nov 8, 2023

Closing this, no activity

/close

@k8s-ci-robot
Contributor

@vincepri: Closing this issue.

In response to this:

Closing this, no activity

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
