
Consolidate docker availability build logic #52548


Merged: 6 commits merged on Feb 21, 2020

Conversation

Contributor

@mark-vieira mark-vieira commented Feb 20, 2020

This pull request consolidates and standardizes the mechanisms by which we determine the availability of a compatible Docker installation on the build host, as well as conditionally avoid certain tasks based on that availability. The latter has been refactored such that it is always deferred to task execution or graph calculation time so that in the majority case, when a build does not request any docker-related tasks, we don't make any such attempt to determine if Docker exists on the system.

Prior to this refactoring we had 4 main sources of docker availability logic:

  1. The logic in DockerUtils which is quite robust and actually checks that Docker exists, can be executed with privileged commands and meets a minimum version requirement. This was used only for the purpose of determining whether we could build the Docker distributions.
  2. Some logic in BuildPlugin for throwing an exception if any task declared it required a Docker installation meeting the requirements defined in (1). Again, only the :distribution:docker project leveraged this.
  3. Logic in DistroTestPlugin to determine whether or not Docker distribution tests could be executed. This used a different, simpler implementation than DockerUtils, only assuming Windows never works, Mac always works, and Linux works unless it's a variant that has been explicitly blacklisted in CI.
  4. The last bit existed in TestFixturesPlugin for the purposes of determining whether docker-compose exists on the system. This was another relatively simple implementation, simply looking for the executable in a known list of locations.
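For readers unfamiliar with the layered check described in item 1, here is a minimal plain-Java sketch of the idea: binary present, version high enough, privileged command succeeds. The class name `DockerCheck`, its fields, and the minimum major version of 17 are assumptions made for illustration, not the actual DockerUtils code. The probe results are passed in as arguments so the ordering of the checks can be read without a Docker installation.

```java
import java.util.Optional;

/** Hypothetical result type: whether Docker is usable, and why not if it isn't. */
final class DockerCheck {
    // Assumed minimum major version, for illustration only.
    static final int MIN_MAJOR_VERSION = 17;

    final boolean available;
    final String reason;

    private DockerCheck(boolean available, String reason) {
        this.available = available;
        this.reason = reason;
    }

    /**
     * Applies the three checks in order and stops at the first failure.
     *
     * @param dockerPath     location of the docker binary, if one was found
     * @param versionOutput  version string reported by the binary, e.g. "19.03.5"
     * @param privilegedExit exit code of a command that needs daemon access
     */
    static DockerCheck evaluate(Optional<String> dockerPath, String versionOutput, int privilegedExit) {
        if (!dockerPath.isPresent()) {
            return new DockerCheck(false, "docker binary not found");
        }
        if (parseMajor(versionOutput) < MIN_MAJOR_VERSION) {
            return new DockerCheck(false, "docker version [" + versionOutput + "] is too old");
        }
        if (privilegedExit != 0) {
            return new DockerCheck(false, "docker found but a privileged command failed");
        }
        return new DockerCheck(true, "ok");
    }

    /** Extracts the leading integer of a dotted version string; returns -1 if unparsable. */
    static int parseMajor(String version) {
        if (version == null) {
            return -1;
        }
        String head = version.trim().split("\\.")[0];
        try {
            return Integer.parseInt(head);
        } catch (NumberFormatException e) {
            return -1;
        }
    }
}
```

By contrast, the simpler checks in items 3 and 4 would only look at the OS name or at whether an executable exists, so they could report "available" for an installation that fails the second or third step here.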

This pull request solves some of the inconsistencies listed above via:

  1. The introduction of a single DockerSupportService that is used in all instances where we need to determine whether a compatible Docker installation is available. This includes building our Docker images, testing them and spinning up Docker-based test fixtures.
  2. Some changes to ElasticsearchDistribution such that we can depend on a "lenient" distribution. If a distribution sets required = false and Docker is unavailable we simply skip building it. This allows testing that uses a locally built Docker distribution to be gracefully skipped in the absence of a Docker installation. This was previously done by checking for Docker availability at configuration time and omitting task dependencies (or entire tasks altogether) to avoid trying to build the Docker images.
  3. Refactoring of any direct dependencies on Docker build tasks to use the DistributionDownloadPlugin and lenient distributions described in (2).
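The two ideas in this list, deferring detection until something actually asks for it and letting a distribution opt out of failing, can be sketched in plain Java without Gradle. `LazyAvailability`, `Distribution`, `BuildPlan` and `DistributionPlanner` are hypothetical names for illustration; the real DockerSupportService and ElasticsearchDistribution APIs are not shown in this description.

```java
import java.util.function.BooleanSupplier;

/** Caches the result of an expensive probe and runs it only on first use. */
final class LazyAvailability {
    private final BooleanSupplier probe;
    private Boolean cached; // null until the probe has run
    int probeCount = 0;     // exposed only so the laziness is observable

    LazyAvailability(BooleanSupplier probe) {
        this.probe = probe;
    }

    synchronized boolean isAvailable() {
        if (cached == null) {
            probeCount++;
            cached = probe.getAsBoolean();
        }
        return cached;
    }
}

/** Hypothetical stand-in for a distribution declaration. */
final class Distribution {
    final boolean isDocker;
    final boolean required;

    Distribution(boolean isDocker, boolean required) {
        this.isDocker = isDocker;
        this.required = required;
    }
}

enum BuildPlan { BUILD, SKIP, FAIL }

final class DistributionPlanner {
    /** Non-Docker distributions short-circuit, so they never trigger the probe. */
    static BuildPlan planFor(Distribution d, LazyAvailability docker) {
        if (!d.isDocker || docker.isAvailable()) {
            return BuildPlan.BUILD;
        }
        // Docker distribution without Docker: lenient ones are skipped, strict ones fail.
        return d.required ? BuildPlan.FAIL : BuildPlan.SKIP;
    }
}
```

In this sketch a build that only touches non-Docker distributions never runs the probe at all, which is the "majority case" the description refers to.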

@mark-vieira mark-vieira added the :Delivery/Build Build or test infrastructure label Feb 20, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra (:Core/Infra/Build)

throwDockerRequiredException(message);
}

private boolean isBlacklistedOs() {
Contributor

We should stay away from terms like "blacklist" and "whitelist".

Suggested change
private boolean isBlacklistedOs() {
private boolean isExcludedOs() {

Contributor Author

Agreed. I've renamed this.

flavor = distroFlavor
type = 'docker'
version = VersionProperties.getElasticsearch()
required = false // This ensures we skip this testing if Docker is unavailable
Contributor

Does this only skip testing? Docker is required for a successful overall build on supported OSs, no?

Contributor Author

This skips building the upstream docker images that are consumed by the tests. I realize now that comment isn't great, I'll improve it.

Contributor Author

I updated the comment text to make it clear this simply makes building the images optional. The logic for actually skipping the tests themselves lives in TestFixturesPlugin.

Contributor

@pugnascotia pugnascotia left a comment

Just one thing I'd like to see changed and one question, but LGTM. This approach looks a lot tidier.

@mark-vieira
Contributor Author

@pugnascotia could you provide some context on the necessity of .ci/dockerOnLinuxExclusions? Are those machines with Docker installed but something is just borked with it? Would using your logic to actually run a command, test the version, etc. be enough to rule out "incompatible" agents or do we still need explicit exclusions?

Signed-off-by: Mark Vieira <[email protected]>
Member

@rjernst rjernst left a comment

This looks great. Just a few comments in addition to the terminology change of "required" to "failIfUnavailable" we discussed offline.

import java.util.stream.Collectors;

/**
* <p>Plugin providing {@link DockerSupportService} for detecting Docker installations and determining requirements for Docker-based
Member

nit: This shouldn't start with <p>, as the first line of a javadoc is the short description.

Contributor Author

Updated.

* <p>Plugin providing {@link DockerSupportService} for detecting Docker installations and determining requirements for Docker-based
* Elasticsearch build tasks.</p>
*
* <p>Additionally registers a task graph listener used to assert a compatible Docker installation exists when task requiring Docker are
Member

I know it is whacky, but <p> should basically be used as a paragraph break, without a closing tag, in javadocs. From oracle docs (https://www.oracle.com/technetwork/articles/java/index-137868.html):

If you have more than one paragraph in the doc comment, separate the paragraphs with a <p> paragraph tag

Contributor Author

Stupid javadocs. Done.
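Taken together, the two javadoc nits in this file amount to the following convention, shown here on a small made-up class (`ToolSupport` and its method are hypothetical and unrelated to the plugin under review): the first sentence is the short description and carries no tag, and each later paragraph starts with an opening `<p>` that is never closed.

```java
/**
 * Reports whether an external tool invocation succeeded.
 *
 * <p>The sentence above is the short description, so it is not wrapped in a tag.
 *
 * <p>Each later paragraph starts with an opening tag and has no closing tag.
 */
final class ToolSupport {
    /** Returns true when the given process exit code signals success. */
    static boolean succeeded(int exitCode) {
        return exitCode == 0;
    }
}
```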

project.getGradle().getTaskGraph().whenReady(graph -> {
List<String> dockerTasks = graph.getAllTasks().stream().filter(task -> {
ExtraPropertiesExtension ext = task.getExtensions().getExtraProperties();
return ext.has(REQUIRES_DOCKER_ATTRIBUTE) && ext.get(REQUIRES_DOCKER_ATTRIBUTE).equals("true");
Member

Why are we setting the property to a boolean string instead of an actual boolean?

Contributor Author

We were indeed setting to a boolean but were checking for a String. I've fixed this.
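The mismatch is easy to reproduce outside Gradle. The sketch below swaps the task's extra properties for a plain `Map` and assumes the attribute key is the string "requiresDocker"; both are stand-ins for illustration, and the actual fix in this PR may be written differently.

```java
import java.util.Map;

final class RequiresDockerFlag {
    // Assumed key value; the real constant's value is not shown in the diff.
    static final String REQUIRES_DOCKER_ATTRIBUTE = "requiresDocker";

    /** Mirrors the original check: compares the stored value against the String "true". */
    static boolean stringCheck(Map<String, Object> ext) {
        return ext.containsKey(REQUIRES_DOCKER_ATTRIBUTE) && ext.get(REQUIRES_DOCKER_ATTRIBUTE).equals("true");
    }

    /** Compares against the Boolean that was actually stored; also safe when the key is absent. */
    static boolean booleanCheck(Map<String, Object> ext) {
        return Boolean.TRUE.equals(ext.get(REQUIRES_DOCKER_ATTRIBUTE));
    }
}
```

With a `Boolean.TRUE` value stored under the key, `stringCheck` returns false because `Boolean.equals` never matches a `String`, so a task flagged this way would not have been picked up by the filter.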

*
* @throws GradleException if Docker is not available. The exception message gives the reason.
*/
void assertDockerIsAvailable(List<String> tasks) {
Member

nit: since we are using asserts, maybe use check or ensure terminology?

Contributor Author

Renamed.


private void throwDockerRequiredException(final String message, Exception e) {
throw new GradleException(
message + "\nyou can address this by attending to the reported issue, " + "removing the offending tasks from being executed.",
Member

removing -> or removing?

Contributor Author

I've reworded this message.

return project.getTasks().register(destructiveDistroTestTaskName(distribution), Test.class, t -> {
// Disable Docker distribution tests unless a Docker installation is available
t.onlyIf(t2 -> distribution.getType() != Type.DOCKER || dockerSupport.get().getDockerAvailability().isAvailable);
Member

Why onlyIf here instead of using the requiresDocker property?

Contributor Author

As discussed, using requiresDocker is strict. It fails the build if docker is unavailable. We want to simply skip that task in that scenario.
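The difference between the two mechanisms can be modelled in a few lines of plain Java. This is a rough sketch with invented types (`SketchTask`, `SketchRunner`), not Gradle's task graph: a strict flag is checked once for the whole graph and aborts the build, while a lenient predicate is evaluated per task and only skips that task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Hypothetical task model with both a strict flag and a lenient predicate. */
final class SketchTask {
    final String name;
    final boolean requiresDocker; // strict: the build fails without Docker
    final BooleanSupplier onlyIf; // lenient: the task is skipped when this is false

    SketchTask(String name, boolean requiresDocker, BooleanSupplier onlyIf) {
        this.name = name;
        this.requiresDocker = requiresDocker;
        this.onlyIf = onlyIf;
    }
}

final class SketchRunner {
    /** Returns the names of the tasks that ran; throws if a strict task cannot be satisfied. */
    static List<String> run(List<SketchTask> graph, boolean dockerAvailable) {
        // Graph-time check: fail before anything executes.
        List<String> strict = new ArrayList<>();
        for (SketchTask t : graph) {
            if (t.requiresDocker) {
                strict.add(t.name);
            }
        }
        if (!strict.isEmpty() && !dockerAvailable) {
            throw new IllegalStateException("Docker is required by tasks: " + strict);
        }
        // Execution-time check: tasks whose predicate is false are silently skipped.
        List<String> executed = new ArrayList<>();
        for (SketchTask t : graph) {
            if (t.onlyIf.getAsBoolean()) {
                executed.add(t.name);
            }
        }
        return executed;
    }
}
```

In this model a Docker distribution test would be declared with `requiresDocker = false` and a predicate that returns the availability result, so a host without Docker still gets a passing build with that one task skipped.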

Signed-off-by: Mark Vieira <[email protected]>
Signed-off-by: Mark Vieira <[email protected]>
@mark-vieira
Contributor Author

@rjernst I believe I've addressed your feedback.

Member

@rjernst rjernst left a comment

LGTM

@@ -270,6 +276,11 @@ void finalizeValues() {
"platform not allowed for elasticsearch distribution [" + name + "] of type [" + getType() + "]"
);
}
if (getType() == Type.DOCKER && bundledJdk.isPresent()) {
throw new IllegalArgumentException(
"bundledJdk not allowed for elasticsearch distribution [" + name + "] of type [docker]"
Member

bundledJdk setting?

Member

Or property, or something like that? As it is written it sounds like docker can't use the bundledJdk, when in fact it only uses it.

@pugnascotia
Contributor

@pugnascotia could you provide some context on the necessity of .ci/dockerOnLinuxExclusions? Are those machines with Docker installed but something is just borked with it? Would using your logic to actually run a command, test the version, etc. be enough to rule out "incompatible" agents or do we still need explicit exclusions?

There were / are some CI machines that can't run Docker for better or worse reasons. I settled on assuming Docker is available unless we know otherwise. I didn't want to rely on detection in case, say, a Packer build went wrong and failed to include Docker or broke it.

@mark-vieira
Contributor Author

I didn't want to rely on detection in case, say, a Packer build went wrong and failed to include Docker or broke it.

Right, I guess my thought is that now that we look for Docker, and try and execute it, we could just rely on that logic for the most part. The risk there though is that should all our Packer builds suddenly stop installing Docker, we'd simply start silently skipping all these tests.

@rjernst
Member

rjernst commented Feb 20, 2020

I would rather keep the exclusion list until we can expect docker on all CI images at some future time. We should expect docker to work, and error if it doesn't, except in these edge cases.

@mark-vieira
Contributor Author

I would rather keep the exclusion list until we can expect docker on all CI images at some future time. We should expect docker to work, and error if it doesn't, except in these edge cases.

Works for me. It's not particularly costly to maintain the status quo.
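One reading of the policy agreed on here, expressed as a small decision function. The names `CiDockerPolicy` and `Expectation` and the OS identifiers used with it are made up for illustration; the format of .ci/dockerOnLinuxExclusions and the code that reads it are not shown in this thread.

```java
import java.util.Set;

enum Expectation { RUN, SKIP, ERROR }

final class CiDockerPolicy {
    /**
     * Hosts on the exclusion list are never expected to have Docker, so Docker work is skipped.
     * Every other host is expected to have a working Docker, and a missing one is an error
     * rather than a silent skip.
     */
    static Expectation decide(String osId, Set<String> excludedOsIds, boolean dockerDetected) {
        if (excludedOsIds.contains(osId)) {
            return Expectation.SKIP;
        }
        return dockerDetected ? Expectation.RUN : Expectation.ERROR;
    }
}
```

Keeping the explicit list is what guards against the failure mode raised above: if a broken image stopped shipping Docker, a detection-only approach would skip quietly, whereas this policy would report an error on any host not listed.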

Signed-off-by: Mark Vieira <[email protected]>
Labels
:Delivery/Build Build or test infrastructure Team:Delivery Meta label for Delivery team v7.6.1 v7.7.0 v8.0.0-alpha1

5 participants