OOMKilled during init-persistent-home on a devworkspace image with lots of files in /home/tooling #1404


Closed
isuftin opened this issue Mar 19, 2025 · 4 comments


isuftin commented Mar 19, 2025

Description

As we grow our development team's custom UDI, /home/tooling is growing as well. It is currently about 1.1G and contains many small files (including a virtualenv with a number of libraries installed, plus nvm with AWS CDK and some other tooling).

When attempting to deploy a new workspace with this image, the init-persistent-home container keeps getting OOMKilled.

The initContainers spec on the cluster has a memory limit of 128Mi and a memory request of 64Mi. This is not something we set, and I don't believe it can be set.

stow is known to consume a large amount of memory when it operates on a large number of small files.
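As a rough way to gauge how much work stow faces, counting the regular files under the tooling directory is a decent proxy (a sketch; /home/tooling is the path from this issue, adjust for your image):

```shell
#!/bin/sh
# count_tooling_files: print the number of regular files under a directory.
# The count is a rough proxy for how many symlinks stow must create,
# which is what drives memory use during init-persistent-home.
count_tooling_files() {
  find "$1" -type f | wc -l
}

# Example usage (path as used in this issue):
# count_tooling_files /home/tooling
```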

But since we can't set the initContainer's memory limit from the CheCluster config or from a devfile, is there anything that can be done to alleviate this?

How To Reproduce

Unfortunately I don't have an exact recipe for creating a large enough developer image for testing. We install:

  • NVM w/ aws-cdk, aws-cdk-libs, cdk-nag, jest, Yarn
  • Python 3.11 w/ Virtualenv and:
    • aws-cdk-lib
    • aws-sam-cli
    • boto3
    • cdk-nag
    • cfn-lint
    • cloudformation-cli
    • cloudformation-cli-python-plugin
    • cloudformation-cli-typescript-plugin
    • s3cmd
    • ansible-builder
    • ansible-core
    • ansible-creator
    • ansible-lint
    • ansible-navigator
    • ansible-sign
    • ansible<10.0.0
    • awxkit
    • cryptography
    • flaky
    • hvac[parser]
    • molecule
    • molecule-plugins
    • pytest-ansible
    • pytest-instafail
    • pytest-testinfra
    • pytest-xdist
    • pywinrm
    • testinfra
    • tox-ansible
    • yamllint

With Ansible, we also use ansible-galaxy to install Ansible collections:

  • amazon.aws
  • ansible.posix
  • ansible.scm
  • community.aws
  • community.crypto

Expected behavior

The init-persistent-home container runs without being OOMKilled.

Additional context

We run this via OpenShift DevSpaces Operator 3.19.0 and DevWorkspace Operator 0.32.1.

Example of error output:

  initContainerStatuses:
    - restartCount: 1
      started: false
      ready: false
      name: init-persistent-home
      state:
        terminated:
          exitCode: 137
          reason: OOMKilled
          startedAt: '2025-03-19T18:31:39Z'
          finishedAt: '2025-03-19T18:32:01Z'
          containerID: 'cri-o://2348815301cc156799b88f4fd431b59aa9391b8f669a23da8deae7061b90ac57'
      imageID: 'our.internal.registry/developer-environment@sha256:93057107ee797123795016fbd0872d31fb097eb16c836a5d9984a87144d3eea7'
      image: 'our.internal.registry/developer-environment:1'
      lastState:
        terminated:
          exitCode: 137
          reason: OOMKilled
          startedAt: '2025-03-19T18:31:15Z'
          finishedAt: '2025-03-19T18:31:37Z'
          containerID: 'cri-o://709bc9f23b125e23f146c7ffa07b31697f2ae9d1254910959c3c86c5ee038afc'
      containerID: 'cri-o://2348815301cc156799b88f4fd431b59aa9391b8f669a23da8deae7061b90ac57'
    - name: che-code-injector
@isuftin isuftin changed the title OOMKilled during init-persistent-home on a devworkspace image with lots of files /home/tooling OOMKilled during init-persistent-home on a devworkspace image with lots of files in /home/tooling Mar 19, 2025
@akurinnoy akurinnoy moved this to 🚧 In Progress in Eclipse Che Team B Backlog Mar 31, 2025
Collaborator

dkwon17 commented Mar 31, 2025

Hello @isuftin, thank you for reporting this issue. For the time being, could you try setting/creating:

kind: DevWorkspaceOperatorConfig
apiVersion: controller.devfile.io/v1alpha1
metadata:
  name: devworkspace-operator-config
  namespace: <operator install namespace>
config:
  workspace:
    defaultContainerResources:
      limits:
        memory: <new memory limit>

in your cluster?

@dkwon17 dkwon17 closed this as completed Mar 31, 2025
@github-project-automation github-project-automation bot moved this from 🚧 In Progress to ✅ Done in Eclipse Che Team B Backlog Mar 31, 2025
@dkwon17 dkwon17 reopened this Mar 31, 2025
@dkwon17 dkwon17 moved this from ✅ Done to 🚧 In Progress in Eclipse Che Team B Backlog Mar 31, 2025
Author

isuftin commented Apr 1, 2025

@dkwon17 - We hacked that configuration into a live deployment and it seems to have done the trick. Our issue is that we are deploying this in OpenShift through ArgoCD, and after doing a bit of research, I don't see a way to set this as part of an overall deployment override. In our deployment, the DWOC is a child of CheCluster, and I can't find a way in their API to coax this during a redeploy.

Author

isuftin commented Apr 2, 2025

@dkwon17 - Figured it out. I had to match the name that CheCluster deploys the DWOC under, which is devworkspace-config. Once I got that, the memory limit I set is being respected by initContainers.
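For reference, a sketch of a matching DWOC (the namespace and memory value here are illustrative assumptions, not the exact ones from our deployment):

```yaml
kind: DevWorkspaceOperatorConfig
apiVersion: controller.devfile.io/v1alpha1
metadata:
  # must match the name CheCluster deploys the DWOC under
  name: devworkspace-config
  # assumption: the namespace your CheCluster lives in
  namespace: openshift-devspaces
config:
  workspace:
    defaultContainerResources:
      limits:
        memory: 512Mi  # illustrative value; size it for your /home/tooling
```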

Collaborator

dkwon17 commented Apr 15, 2025

Thank you for confirming @isuftin. Please reopen the issue if you'd like further improvements for editing the init container memory limit.

@dkwon17 dkwon17 closed this as completed Apr 15, 2025
@github-project-automation github-project-automation bot moved this from 🚧 In Progress to ✅ Done in Eclipse Che Team B Backlog Apr 15, 2025