
Updates to use amazon linux 2023 ami and reflect the actual RealMemory of the compute nodes #34

Open
wants to merge 2 commits into plugin-v2

Conversation


@okram999 okram999 commented Apr 28, 2025

Issue #, if available:

Update the quickstart template.yaml to use Amazon Linux 2023 and include RealMemory in the partitions.json

Description of changes:

  • Changed the AMI ID SSM parameters to point to AL2023
  • Switched yum to dnf
  • Added RealMemory to partitions.json to reflect the available memory of the compute nodes. If it is not specified, Slurm fails to fetch the actual available memory.
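For illustration, a RealMemory entry in partitions.json might look like the sketch below. The surrounding field names (`Partitions`, `PartitionName`) are guesses at the shape, not the plugin's actual schema, and 3891 MiB is just an example value for a 4 GiB node after OS overhead:

```json
{
  "Partitions": [
    {
      "PartitionName": "compute",
      "RealMemory": 3891
    }
  ]
}
```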

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@okram999 okram999 changed the title Updates to use amazon linux 2023 ami from amazon linux 2 Updates to use amazon linux 2023 ami and reflect the actual RealMemory of the compute nodes Apr 28, 2025
tar -xf slurm-*.tar.bz2
cd /home/ec2-user/slurm-*
/home/ec2-user/slurm-*/configure --prefix=/nfs/slurm
make -j 4

What if SlurmPackageUrl is not .tar.bz2?

Author

@bollig - if it changes, the script will break. It's an upstream packaging decision that devs normally don't change on a whim, and the archive extension appears to be consistently .tar.bz2: https://download.schedmd.com/slurm/.

We can try to handle some of the variants, but it won't be foolproof.

Author

Something ugly like this:

wget -q ${SlurmPackageUrl}
# Extract based on file extension
if ls slurm-*.tar.gz >/dev/null 2>&1; then
    tar -xzf slurm-*.tar.gz
elif ls slurm-*.tar.bz2 >/dev/null 2>&1; then
    tar -xjf slurm-*.tar.bz2
elif ls slurm-*.tgz >/dev/null 2>&1; then
    tar -xzf slurm-*.tgz
elif ls slurm-*.tar >/dev/null 2>&1; then
    tar -xf slurm-*.tar
else
    echo "No recognized Slurm archive found"
    exit 1
fi

# Change to the extracted directory, excluding any archive files
cd "$(ls -d /home/ec2-user/slurm-* | grep -v -E '\.tar\.gz$|\.tar\.bz2$|\.tgz$|\.tar$')"


Fair. This is OK as is; just thinking about people who may roll/distribute their own patch-fixed version of Slurm.

@@ -47,6 +47,12 @@ Parameters:
    Default: 2
    Description: Number of vCPUs for the compute node instance type

  ComputeNodeMemory:

Why statically define this? Why not auto-detect RealMemory and have users provide a SchedulableMemory percentage (see ParallelCluster for example)?
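For reference, one way to auto-detect on the node itself is `slurmd -C`, which prints the node's detected hardware line including RealMemory. A sketch of deriving a schedulable amount from it; the sample line and the 90% margin are illustrative, not anything this plugin does today:

```shell
# On a real node you would capture the slurmd output, e.g.:
#   line=$(slurmd -C | head -n1)
# Here we parse a hypothetical sample line instead.
line='NodeName=compute-1 CPUs=2 RealMemory=3891 Sockets=1 CoresPerSocket=1'

# Extract the RealMemory value (MiB).
real_mem=$(printf '%s\n' "$line" | grep -oE 'RealMemory=[0-9]+' | cut -d= -f2)

# Reserve headroom for the OS; 90% is an arbitrary example margin.
schedulable=$((real_mem * 90 / 100))
echo "$schedulable"   # 3501
```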

Author

SchedulableMemory will need further research; that's for my next PR. :)

AFAIK, SchedulableMemory is a Slurm configuration option shipped with AWS ParallelCluster, and we are not dealing with ParallelCluster here. As for auto-detecting RealMemory: we'd need an EC2 describe call for the specified instance type, which is a broader change beyond the scope of this PR.
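The describe call mentioned above could look like the sketch below. The instance type and the 10% headroom are placeholders, and the arithmetic uses a hard-coded example value since the sketch cannot assume AWS credentials:

```shell
# Fetching memory for a given instance type via the AWS CLI
# (requires credentials; the instance type is an example):
#   mem_mib=$(aws ec2 describe-instance-types \
#       --instance-types c5.large \
#       --query 'InstanceTypes[0].MemoryInfo.SizeInMiB' --output text)
# Assume the call returned 4096 MiB for the example that follows.
mem_mib=4096

# Subtract OS headroom before writing RealMemory; 10% is illustrative.
real_memory=$((mem_mib * 90 / 100))
echo "$real_memory"   # 3686
```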
