Skip to content

Honor masking of systemd-sysctl.service #24234

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jun 6, 2017

Conversation

zeha
Copy link
Contributor

@zeha zeha commented Apr 21, 2017

During package install on systemd-based systems, some sysctl settings
should be set (e.g. vm.max_map_count).

In some environments, changing sysctl settings plainly does not work;
previously a global environment variable named ES_SKIP_SET_KERNEL_PARAMETERS
was introduced to skip calling sysctl, but this causes trouble for:

  • configuration management systems, which usually cannot apply an env
    var when running a package manager
  • package upgrades, which will not have the env var set any more, and
    thus leaving the package management system in a bad state (possibly
    half-way upgraded, can be very hard to recover)

This removes the env var again and instead of calling systemd-sysctl manually,
tells systemd to restart the wrapper unit - which itself can be masked by
system administrators or management tools if it is known that sysctl does
not work in a given environment.
The restart is not silent on systems in their default configuration, but
is ignored if the unit is masked.

Related: #21899, voxpupuli/puppet-elasticsearch#806

During package install on systemd-based systems, some sysctl settings
should be set (e.g. vm.max_map_count).

In some environments, changing sysctl settings plainly does not work;
previously a global environment variable named ES_SKIP_SET_KERNEL_PARAMETERS
was introduced to skip calling sysctl, but this causes trouble for:
 * configuration management systems, which usually cannot apply an env
   var when running a package manager
 * package upgrades, which will not have the env var set any more, and
   thus leaving the package management system in a bad state (possibly
   half-way upgraded, can be very hard to recover)

This removes the env var again and instead of calling systemd-sysctl manually,
tells systemd to restart the wrapper unit - which itself can be masked by
system administrators or management tools if it is known that sysctl does
not work in a given environment.
The restart is not silent on systems in their default configuration, but
is ignored if the unit is masked.
@elasticmachine
Copy link
Collaborator

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

@clintongormley clintongormley added :Delivery/Packaging RPM and deb packaging, tar and zip archives, shell and batch scripts >enhancement review labels Apr 25, 2017
* master: (619 commits)
  Add support for clear scroll to high level REST client (elastic#25038)
  Tiny correction in inner-hits.asciidoc (elastic#25066)
  Added release notes for 6.0.0-alpha2
  Expand index expressions against indices only when managing aliases (elastic#23997)
  Collapse inner hits rest test should not skip 5.x
  Settings: Fix secure settings by prefix (elastic#25064)
  add `exclude_keys` option to KeyValueProcessor (elastic#24876)
  Test: update missing body tests to run against versions >= 5.5.0
  Track EWMA[1] of task execution time in search threadpool executor
  Removes an invalid assert in resizing big arrays which does not always hold (resizing can result in a smaller size than the current size, while the assert attempted to verify the new size is always greater than the current).
  Fixed NPEs caused by requests without content. (elastic#23497)
  Plugins can register pre-configured char filters (elastic#25000)
  Build: Allow preserving shared dir (elastic#24962)
  Tests: Make secure settings available from settings builder for tests (elastic#25037)
  [TEST] Skip wildcard expansion test due to breaking change
  Test that gradle and Java version types match (elastic#24943)
  Include duplicate jar when jarhell check fails
  Change ScriptContexts to use needs instead of uses$. (elastic#25036)
  Change `has_child`, `has_parent` queries and `childen` aggregation to work with the new join field type and at the same time maintaining support for the `_parent` meta field type.
  Remove comma-separated feature parsing for GetIndicesAction
  ...
@jasontedor
Copy link
Member

In the course of doing my due diligence on this one, I've learned that we are violating packaging guidelines by allowing the installation scripts to exit with non-zero status code. In the bluntest possible terms:

All scriptlets MUST exit with the zero exit status.

Technically this is a breaking change so I will integrate this and add a note to the migration docs.

Thank you for this PR @zeha!

Copy link
Member

@jasontedor jasontedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@jasontedor jasontedor merged commit c3ec6a1 into elastic:master Jun 6, 2017
jasontedor pushed a commit that referenced this pull request Jun 6, 2017
During package install on systemd-based systems, some sysctl settings
should be set (e.g. vm.max_map_count).

In some environments, changing sysctl settings plainly does not work;
previously a global environment variable named
ES_SKIP_SET_KERNEL_PARAMETERS was introduced to skip calling sysctl, but
this causes trouble for:
 - configuration management systems, which usually cannot apply an env
   var when running a package manager
 - package upgrades, which will not have the env var set any more, and
   thus leaving the package management system in a bad state (possibly
   half-way upgraded, can be very hard to recover)

This removes the env var again and instead of calling systemd-sysctl
manually, tells systemd to restart the wrapper unit - which itself can
be masked by system administrators or management tools if it is known
that sysctl does not work in a given environment.

The restart is not silent on systems in their default configuration, but
is ignored if the unit is masked.

Relates #24234
@jasontedor
Copy link
Member

I pushed 2f8ba3a to add a note to the migration docs.

@pdf
Copy link

pdf commented Jul 6, 2017

This breaks deb systems where the systemd-sysctl.service is masked, because packaging scripts are implicitly run with -e, so the non-zero exit code from the call to restart a masked service aborts package installation/upgrade.

echo "unrecognized value [$ES_SKIP_SET_KERNEL_PARAMETERS] for ES_SKIP_SET_KERNEL_PARAMETERS; must [false] (default) or [true]"
exit 1
if command -v systemctl > /dev/null; then
systemctl restart systemd-sysctl.service
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this call is meant to fail silently, it should be:

systemctl restart systemd-sysctl.service || true

@porbas
Copy link

porbas commented Jul 10, 2017

When trying to upgrade ES on ubuntu 16.04:

root@runner1-gitlab-prd:~# root@runner1-gitlab-prd:~# apt install -f
Reading package lists... Done
Building dependency tree       
Reading state information... Done
0 upgraded, 0 newly installed, 0 to remove and 7 not upgraded.
1 not fully installed or removed.
After this operation, 0 B of additional disk space will be used.
Setting up elasticsearch (5.5.0) ...
Failed to restart systemd-sysctl.service: Unit systemd-sysctl.service is masked.
dpkg: error processing package elasticsearch (--configure):
 subprocess installed post-installation script returned error exit status 1
Errors were encountered while processing:
 elasticsearch
E: Sub-process /usr/bin/dpkg returned an error code (1)

change in postinst script, line 57 (add trailing "|| true"):

  systemctl restart systemd-sysctl.service || true

Now it works! :)

root@runner1-gitlab-prd:~# apt install -f
Reading package lists... Done
Building dependency tree       
Reading state information... Done
0 upgraded, 0 newly installed, 0 to remove and 7 not upgraded.
1 not fully installed or removed.
After this operation, 0 B of additional disk space will be used.
Setting up elasticsearch (5.5.0) ...
Failed to restart systemd-sysctl.service: Unit systemd-sysctl.service is masked.

@pdf
Copy link

pdf commented Jul 10, 2017

ping @jasontedor your currently live packages are borken.

@jasontedor
Copy link
Member

Sorry for this. I tested the PR on my workstation which runs Fedora 25 and it does not behave this way; it's only Debian based systems that see this problem. I opened #25657.

@juergenknaack
Copy link

juergenknaack commented Jul 12, 2017

Taking a look at #25657, i think adding /usr/lib/sysctl.d/elasticsearch.conf is breaking systemd-sysctl in the first place on systems, where this setting is not allowed/possible (e.g. container).
Masking the service as a consequence and working around it within elasticsearch installer may fix it for elasticsearch but still would leave it broken to others.
There has to be a way to avoid putting "vm.max_map_count=262144" into /usr/lib/sysctl.d/elasticsearch.conf. That would be the proper way for these cases.

@pdf
Copy link

pdf commented Jul 12, 2017

@juergenknaack not necessarily saying that this is a non-problem, but your unprivileged containers really should be masking systemd-sysctl.service by default, since it will never work, and other source may attempt to populate sysctl. This sounds more like a container infrastructure problem than an elasticsearch packaging problem.

@juergenknaack
Copy link

@pdf I see where you are coming from... If there is no reasonable usecase for systemd-sysctl in containers at all, masking would be an appropriate way. Agree to that.

@juergenknaack
Copy link

@pdf I rechecked our container environment (Virtuozzo). We use systemd-sysctl to enforce some settings (e.g. net.ipv4.ip_forward=1) and it's working for us. The additonal /usr/lib/sysctl.d/elasticsearch file breaks this.

@pdf
Copy link

pdf commented Jul 12, 2017

@juergenknaack looks like the way lxc handles this is to add start condition overrides to systemd-sysctl.service that check whether /proc/sys/net specifically is writable (rather than /proc/sys), then sets it to run systemd-sysctl with --prefix net added, to restrict it to only apply properties for net. This is absolutely out of scope for ES to fix - you should contact the VZ upstream or distro package maints.

NB: I'm not in any way affiliated with ES, they may decide to try and do something about it, but the above is my read.

@jasontedor
Copy link
Member

Thank you @pdf, you have it exactly right.

@mark-vieira mark-vieira added the Team:Delivery Meta label for Delivery team label Nov 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>breaking :Delivery/Packaging RPM and deb packaging, tar and zip archives, shell and batch scripts >enhancement Team:Delivery Meta label for Delivery team v6.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants