kdc vagrant fixture time out while booting #39977


Closed
andyb-elastic opened this issue Mar 12, 2019 · 11 comments
Labels
:Delivery/Build Build or test infrastructure · Team:Delivery Meta label for Delivery team · >test-failure Triaged test failures from CI

@andyb-elastic
Contributor

This has happened in a couple of recent builds. It may be another, more general issue with Vagrant or this image, but I'm not sure.

Looks like:

==> krb5kdc: Clearing any previously set network interfaces...
==> krb5kdc: Preparing network interfaces based on configuration...
    krb5kdc: Adapter 1: nat
==> krb5kdc: Forwarding ports...
    krb5kdc: 88 (guest) => 60088 (host) (adapter 1)
    krb5kdc: 22 (guest) => 2222 (host) (adapter 1)
==> krb5kdc: Booting VM...
==> krb5kdc: Waiting for machine to boot. This may take a few minutes...
    krb5kdc: SSH address: 127.0.0.1:2222
    krb5kdc: SSH username: vagrant
    krb5kdc: SSH auth method: private key
Timed out while waiting for the machine to boot. This means that
Vagrant was unable to communicate with the guest machine within
the configured ("config.vm.boot_timeout" value) time period.

If you look above, you should be able to see the error(s) that
Vagrant had when attempting to connect to the machine. These errors
are usually good hints as to what may be wrong.

If you're using a custom box, make sure that networking is properly
working and you're able to connect to the machine. It is a common
problem that networking isn't setup properly in these boxes.
Verify that authentication configurations are also setup properly,
as well.

If the box appears to be booting properly, you may want to increase
the timeout ("config.vm.boot_timeout") value.

> Task :plugins:repository-hdfs:krb5kdcFixture FAILED
:plugins:repository-hdfs:krb5kdcFixture (Thread[Execution worker for ':' Thread 9,5,main]) completed. Took 6 mins 18.541 secs.

:plugins:repository-hdfs:krb5kdcFixture#stop (Thread[Execution worker for ':' Thread 9,5,main]) started.

> Task :plugins:repository-hdfs:krb5kdcFixture#stop
Task ':plugins:repository-hdfs:krb5kdcFixture#stop' is not up-to-date because:
  Task has not declared any outputs despite executing actions.
Custom actions are attached to task ':plugins:repository-hdfs:krb5kdcFixture#stop'.
Starting process 'command 'vagrant''. Working directory: /var/lib/jenkins/workspace/elastic+elasticsearch+master+intake/plugins/repository-hdfs Command: vagrant halt krb5kdc
Successfully started process 'command 'vagrant''
==> krb5kdc: Attempting graceful shutdown of VM...
    krb5kdc: Guest communication could not be established! This is usually because
    krb5kdc: SSH is not running, the authentication information was changed,
    krb5kdc: or some other networking issue. Vagrant will force halt, if
    krb5kdc: capable.
==> krb5kdc: Forcing shutdown of VM...
:plugins:repository-hdfs:krb5kdcFixture#stop (Thread[Execution worker for ':' Thread 9,5,main]) completed. Took 19.844 secs.
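For reference, the timeout that Vagrant complains about is set in the fixture's Vagrantfile. A minimal sketch of raising it follows; the "krb5kdc" machine name comes from the log above, but the rest of the fixture's actual settings are assumptions, and Vagrant's default boot_timeout is 300 seconds:

```ruby
# Hypothetical Vagrantfile excerpt for the krb5kdc fixture.
# config.vm.boot_timeout controls how long Vagrant waits for SSH
# before failing with the "Timed out while waiting for the machine
# to boot" error seen above; the default is 300 seconds.
Vagrant.configure("2") do |config|
  config.vm.define "krb5kdc" do |kdc|
    kdc.vm.boot_timeout = 600 # double the default wait
  end
end
```

That said, in this failure the guest never became reachable over SSH at all ("Guest communication could not be established"), so a longer timeout would most likely just make the build fail more slowly.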

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+intake/2498/console
build-intake-master-2498.txt.zip

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.6+matrix-java-periodic/ES_BUILD_JAVA=java11,ES_RUNTIME_JAVA=zulu11,nodes=immutable&&linux&&docker/165/console

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.6+matrix-java-periodic/ES_BUILD_JAVA=java11,ES_RUNTIME_JAVA=java8,nodes=immutable&&linux&&docker/165/console

@andyb-elastic andyb-elastic added :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >test-failure Triaged test failures from CI labels Mar 12, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@ywelsch ywelsch added the :Delivery/Build Build or test infrastructure label Mar 13, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra

@ywelsch ywelsch removed the :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs label Mar 13, 2019
@alpar-t
Contributor

alpar-t commented Mar 13, 2019

@elasticdog this is happening because our Ubuntu image now has Vagrant installed and we are picking that up, but it's apparently not working.

@elasticdog
Contributor

@atorok that is interesting...both VirtualBox and Vagrant were added to the Ubuntu 18.04 image to test nested virtualization; specifically, I've been looking at running the packaging tests in GCP. The general gist is that it works, but it is very slow. I've not experienced any direct failures after bringing up a VM like the one shown in the output here, but I would expect more of our images to have VirtualBox/Vagrant installed moving forward if we can figure out where the performance hit comes from.

I'm open to suggestions on the approach you'd like to take here to prevent unexpected runs.

@alpar-t
Contributor

alpar-t commented Mar 15, 2019

@elasticdog I've been thinking about having a way to label images as "experimental", "beta", etc. to allow for experimentation in a way that limits the overall impact without putting a lot of burden on the team and infrastructure. It should be something simple; setting up a separate image build or some complex versioning scheme would not help us.

I think we should add a "stable" label and require it for critical jobs that pick from a pool of workers, like intake and PR builds. Then, when we do something like this, we could remove the "stable" label from Ubuntu 18.04 so it would no longer be used for those jobs, and add a job that runs only on this platform. It may be a bit of overkill for this situation, but I imagine there will be others we will want to try in the future, like adding more cores once we better parallelize the build, and changes in general that require some experimentation to get right.

In the meantime, we have been migrating away from Vagrant-based test fixtures (this is the last one), and we are inching towards a setup that would allow us to run the packaging tests against GCP, with each test running on its own VM to make them much faster. Even if we adapt the build to make that easier, we would still need Packer 2.0-based images. I think that won't be too hard to port from the existing ones; probably the more time-consuming bit is to also generate identical VirtualBox images.
VMX might give us some benefits right now, but in the long term I think running the tests in GCP VMs is what we should aim for.

@alpar-t
Contributor

alpar-t commented Mar 15, 2019

This would be fixed by #34095

@elasticdog
Contributor

elasticdog commented Mar 15, 2019

I definitely agree that it would be good to come up with some mechanism to have experimental images without affecting the regular ones. It can be tricky to balance our own Packer build times and complexity, but I'll think about how we can introduce things like this more cleanly (I hadn't expected there to be any side effects to having this software baked into the images).

@jaymode
Member

jaymode commented Mar 18, 2019

@romseygeek
Contributor

@ywelsch
Contributor

ywelsch commented Jul 29, 2019

A more recent failure of this (not sure if it's related or a different failure):

:test:fixtures:krb5kdc-fixture:composeUp FAILED
Building peppa
Building hdfs
Recreating 276cd8b2e6fb66d48075e05cd8bd231c_krb5kdc-fixture__peppa_1 ... 
Recreating 276cd8b2e6fb66d48075e05cd8bd231c_krb5kdc-fixture__hdfs_1  ... 
Recreating 276cd8b2e6fb66d48075e05cd8bd231c_krb5kdc-fixture__peppa_1 ... error

ERROR: for 276cd8b2e6fb66d48075e05cd8bd231c_krb5kdc-fixture__peppa_1  Cannot start service peppa: unable to remount dir as readonly: device or resource busy
Recreating 276cd8b2e6fb66d48075e05cd8bd231c_krb5kdc-fixture__hdfs_1  ... done

ERROR: for peppa  Cannot start service peppa: unable to remount dir as readonly: device or resource busy
Encountered errors while bringing up the project.
Stopping 276cd8b2e6fb66d48075e05cd8bd231c_krb5kdc-fixture__hdfs_1 ... 
Stopping 276cd8b2e6fb66d48075e05cd8bd231c_krb5kdc-fixture__hdfs_1 ... done
Removing 276cd8b2e6fb66d48075e05cd8bd231c_krb5kdc-fixture__hdfs_1               ... 
Removing 276cd8b2e6fb66d48075e05cd8bd231c_krb5kdc-fixture__peppa_1              ... 
Removing a75dff4beb91_276cd8b2e6fb66d48075e05cd8bd231c_krb5kdc-fixture__peppa_1 ... 
Removing 276cd8b2e6fb66d48075e05cd8bd231c_krb5kdc-fixture__peppa_1              ... done
Removing a75dff4beb91_276cd8b2e6fb66d48075e05cd8bd231c_krb5kdc-fixture__peppa_1 ... done
Removing 276cd8b2e6fb66d48075e05cd8bd231c_krb5kdc-fixture__hdfs_1               ... done
Removing network 276cd8b2e6fb66d48075e05cd8bd231c_krb5kdc-fixture__default

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':test:fixtures:krb5kdc-fixture:composeUp'.
> Exit-code 1 when calling /usr/local/bin/docker-compose, stdout: N/A

Build scan: https://gradle.com/s/zhozfy4xnifjs

@alpar-t
Contributor

alpar-t commented Sep 5, 2019

We switched to Docker-based fixtures, so this is no longer applicable.

@alpar-t alpar-t closed this as completed Sep 5, 2019
@mark-vieira mark-vieira added the Team:Delivery Meta label for Delivery team label Nov 11, 2020

9 participants