Skip to content

Deadlock issue at startup while using micrometer export #17765

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Liodene opened this issue Aug 2, 2019 · 2 comments
Closed

Deadlock issue at startup while using micrometer export #17765

Liodene opened this issue Aug 2, 2019 · 2 comments
Assignees
Labels
for: external-project For an external project and not something we can fix

Comments

@Liodene
Copy link

Liodene commented Aug 2, 2019

Hello everyone,

We came across a deadlock at startup on one of our applications related to Micrometer/Actuator metrics autoconfiguration.

For our business needs, I had to re-create an OpenTSDB metric exporter (since it's not bundled anymore with Spring Boot 2). I got inspired from the previous OpenTSDB exporter and current metric exporter (like Influx). The only specificity is that I wanted to validate my ConfigurationProperties class with javax.validation, which was where the problem begun.

Here is the thread dump of the deadlock

 =============================
 "localhost-startStop-1":
  waiting to lock monitor 0x00007fbe78e3d468 (object 0x00000000f7d22858, a java.lang.Object),
  which is held by "Service Thread"
 "Service Thread":
  waiting to lock monitor 0x00007fbe78e3b788 (object 0x00000000f6be5020, a java.util.concurrent.ConcurrentHashMap),
  which is held by "localhost-startStop-1"
 p),
 Java stack information for the threads listed above:
 ===================================================
 "localhost-startStop-1":
 #011at io.micrometer.core.instrument.MeterRegistry.getOrCreateMeter(MeterRegistry.java:558)
 #011- waiting to lock <0x00000000f7d22858> (a java.lang.Object)
 #011at io.micrometer.core.instrument.MeterRegistry.registerMeterIfNecessary(MeterRegistry.java:528)
 #011at io.micrometer.core.instrument.MeterRegistry.timer(MeterRegistry.java:268)
 #011at io.micrometer.core.instrument.Timer$Builder.register(Timer.java:464)
 #011at com.zaxxer.hikari.metrics.micrometer.MicrometerMetricsTracker.<init>(MicrometerMetricsTracker.java:78)
 #011at com.zaxxer.hikari.metrics.micrometer.MicrometerMetricsTrackerFactory.create(MicrometerMetricsTrackerFactory.java:20)
 #011at com.zaxxer.hikari.pool.HikariPool.setMetricsTrackerFactory(HikariPool.java:302)
 #011at com.zaxxer.hikari.HikariDataSource.setMetricsTrackerFactory(HikariDataSource.java:263)
 #011at org.springframework.boot.actuate.autoconfigure.metrics.jdbc.DataSourcePoolMetricsAutoConfiguration$HikariDataSourceMetricsConfiguration.bindMetricsRegistryToHikariDataSource(DataSourcePoolMetricsAutoConfiguration.java:126)
 #011at org.springframework.boot.actuate.autoconfigure.metrics.jdbc.DataSourcePoolMetricsAutoConfiguration$HikariDataSourceMetricsConfiguration.bindMetricsRegistryToHikariDataSources(DataSourcePoolMetricsAutoConfiguration.java:118)
 #011at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 #011at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 #011at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 #011at java.lang.reflect.Method.invoke(Method.java:498)
 #011at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor$AutowiredMethodElement.inject(AutowiredAnnotationBeanPostProcessor.java:708)
 #011at org.springframework.beans.factory.annotation.InjectionMetadata.inject(InjectionMetadata.java:90)
 #011at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor.postProcessProperties(AutowiredAnnotationBeanPostProcessor.java:374)
 #011at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.populateBean(AbstractAutowireCapableBeanFactory.java:1411)
 #011at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:592)
 #011at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:515)
 #011at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:320)
 #011at org.springframework.beans.factory.support.AbstractBeanFactory$$Lambda$129/789436751.getObject(Unknown Source)
 #011at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:222)
 #011- locked <0x00000000f6be5020> (a java.util.concurrent.ConcurrentHashMap)
 #011at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:318)
 #011at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:199)
 #011at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:845)
 #011at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:877)
 #011at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:549)
 #011- locked <0x00000000f6be4978> (a java.lang.Object)
 #011at org.springframework.boot.web.servlet.context.ServletWebServerApplicationContext.refresh(ServletWebServerApplicationContext.java:140)
 #011at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:742)
 #011at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:389)
 #011at org.springframework.boot.SpringApplication.run(SpringApplication.java:311)
 #011at org.springframework.boot.web.servlet.support.SpringBootServletInitializer.run(SpringBootServletInitializer.java:151)
 #011at org.springframework.boot.web.servlet.support.SpringBootServletInitializer.createRootApplicationContext(SpringBootServletInitializer.java:131)
 #011at org.springframework.boot.web.servlet.support.SpringBootServletInitializer.onStartup(SpringBootServletInitializer.java:91)
 #011at org.springframework.web.SpringServletContainerInitializer.onStartup(SpringServletContainerInitializer.java:171)
 #011at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5154)
 #011- locked <0x00000000f570b1e0> (a org.apache.catalina.core.StandardContext)
 #011at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
 #011- locked <0x00000000f570b1e0> (a org.apache.catalina.core.StandardContext)
 #011at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:743)
 #011at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:719)
 #011at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:714)
 #011at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:970)
 #011at org.apache.catalina.startup.HostConfig$DeployWar.run(HostConfig.java:1841)
 #011at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 #011at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 #011at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 #011at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 #011at java.lang.Thread.run(Thread.java:748)
 "Service Thread":
 #011at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:179)
 #011- waiting to lock <0x00000000f6be5020> (a java.util.concurrent.ConcurrentHashMap)
 #011at org.springframework.beans.factory.support.AbstractBeanFactory.isTypeMatch(AbstractBeanFactory.java:493)
 #011at org.springframework.beans.factory.support.DefaultListableBeanFactory.doGetBeanNamesForType(DefaultListableBeanFactory.java:518)
 #011at org.springframework.beans.factory.support.DefaultListableBeanFactory.getBeanNamesForType(DefaultListableBeanFactory.java:481)
 #011at org.springframework.beans.factory.BeanFactoryUtils.beanNamesForTypeIncludingAncestors(BeanFactoryUtils.java:227)
 #011at org.springframework.beans.factory.support.DefaultListableBeanFactory.findAutowireCandidates(DefaultListableBeanFactory.java:1415)
 #011at org.springframework.beans.factory.support.DefaultListableBeanFactory.doResolveDependency(DefaultListableBeanFactory.java:1214)
 #011at org.springframework.context.annotation.ContextAnnotationAutowireCandidateResolver$1.getTarget(ContextAnnotationAutowireCandidateResolver.java:90)
 #011at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:192)
 #011at com.sun.proxy.$Proxy94.forExecutables(Unknown Source)
 #011at org.springframework.validation.beanvalidation.MethodValidationInterceptor.invoke(MethodValidationInterceptor.java:99)
 #011at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
 #011at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:688)
 #011at my.package.OpenTsdbProperties$$EnhancerBySpringCGLIB$$62b04d96.getStep(<generated>)
 #011at org.springframework.boot.actuate.autoconfigure.metrics.export.properties.StepRegistryPropertiesConfigAdapter$$Lambda$395/784926852.apply(Unknown Source)
 #011at org.springframework.boot.actuate.autoconfigure.metrics.export.properties.PropertiesConfigAdapter.get(PropertiesConfigAdapter.java:53)
 #011at org.springframework.boot.actuate.autoconfigure.metrics.export.properties.StepRegistryPropertiesConfigAdapter.step(StepRegistryPropertiesConfigAdapter.java:51)
 #011at io.micrometer.core.instrument.step.StepMeterRegistry.defaultHistogramConfig(StepMeterRegistry.java:94)
 #011at io.micrometer.core.instrument.MeterRegistry.lambda$timer$2(MeterRegistry.java:270)
 #011at io.micrometer.core.instrument.MeterRegistry$$Lambda$447/1190935252.apply(Unknown Source)
 #011at io.micrometer.core.instrument.MeterRegistry.getOrCreateMeter(MeterRegistry.java:575)
 #011- locked <0x00000000f7d22858> (a java.lang.Object)
 #011at io.micrometer.core.instrument.MeterRegistry.registerMeterIfNecessary(MeterRegistry.java:528)
 #011at io.micrometer.core.instrument.MeterRegistry.timer(MeterRegistry.java:268)
 #011at io.micrometer.core.instrument.Timer$Builder.register(Timer.java:464)
 #011at io.micrometer.core.instrument.binder.jvm.JvmGcMetrics.lambda$bindTo$0(JvmGcMetrics.java:162)
 #011at io.micrometer.core.instrument.binder.jvm.JvmGcMetrics$$Lambda$413/692860962.handleNotification(Unknown Source)
 #011at sun.management.NotificationEmitterSupport.sendNotification(NotificationEmitterSupport.java:156)
 #011at sun.management.GarbageCollectorImpl.createGCNotification(GarbageCollectorImpl.java:143)
 56)
 Found 1 deadlock.

On the thread "localhost-startStop-1", which is the main thread:

  • start the application
  • lock on the singleton cache
  • autoconfigure actuator metrics
  • create meter and try to lock on the meter cache

On the thread "Service Thread":

  • GC is running which is instrumented by micrometer
  • try to export metrics
  • lock on meter cache
  • start instantiation of my custom exporter
  • try to validate my properties
  • try to lock on the singleton cache

Autoconfiguration class:

@Configuration
@AutoConfigureBefore({ CompositeMeterRegistryAutoConfiguration.class,
                       SimpleMetricsExportAutoConfiguration.class })
@AutoConfigureAfter(MetricsAutoConfiguration.class)
@ConditionalOnProperty(prefix = "management.metrics.export.opentsdb", name = "enabled", matchIfMissing = false)
@EnableConfigurationProperties(OpenTsdbProperties.class)
public class OpenTsdbMetricsExportAutoConfiguration {

    @Bean
    @ConditionalOnMissingBean
    public OpenTsdbConfig openTsdbConfig(OpenTsdbProperties openTsdbProperties) {
        return new OpenTsdbPropertiesConfigAdapter(openTsdbProperties);
    }

    @Bean
    @ConditionalOnMissingBean
    public OpenTsdbMeterRegistry openTsdbMeterRegistry(OpenTsdbConfig openTsdbConfig,
                                                       Clock clock) {
        return new OpenTsdbMeterRegistry(openTsdbConfig, clock);
    }
}

Properties class:

@ConfigurationProperties(prefix = "management.metrics.export.opentsdb")
@Validated
public class OpenTsdbProperties extends StepRegistryProperties {

    /**
     * URI of the OpenTSDB server.
     */
    @NotNull
    private String uri ;

    /**
     * Hostname to use in the tags.
     */
    @NotNull
    private String hostname;

    /**
     * Whether to enable GZIP compression of metrics batches published to OpenTSDB.
     */
    private boolean compressed = true;

    public String getUri() {
        return this.uri;
    }

    public void setUri(String uri) {
        this.uri = uri;
    }

    public boolean isCompressed() {
        return this.compressed;
    }

    public void setCompressed(boolean compressed) {
        this.compressed = compressed;
    }

    public String getHostname() {
        return hostname;
    }

    public void setHostname(String hostname) {
        this.hostname = hostname;
    }
}

Regarding the reproducibility, the deadlock window is really short in our case:

  • happened on 95% of starup on one application
  • happened on 5% of startup on another one with similar setup
  • changing logging level changed the behaviour since it changed the timings. In our case it was failing in 95% of startup, in 10-15% of startup in DEBUG and never in TRACE
  • we were not able to recreate those condition locally

I know this sounds like an unprobable series of event which leads to a one time problem, but I guess it could have been something else than a bean validation, and something else than a custom exporter.

Isn't there an issue with the order in which the micrometer elements are configured? It seems odd to start instrumenting elements and reporting metrics before the system is fully up and running. My spring-muggle opinion is that MeterRegistry should be configured before Meter/Metrics.

Version used:

  • Spring-boot: 2.1.6-RELEASE
  • micrometer: 1.1.5
  • hibernate-validator: 6.0.17.Final
  • javax validation-api: 2.0.1-Final

In the meantime, we fixed the issue by doing the field validation manually.

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Aug 2, 2019
@philwebb philwebb added the for: team-attention An issue we'd like other members of the team to review label Aug 2, 2019
@wilkinsona
Copy link
Member

Thanks for the detailed analysis.

Isn't there an issue with the order in which the micrometer elements are configured? It seems odd to start instrumenting elements and reporting metrics before the system is fully up and running. My spring-muggle opinion is that MeterRegistry should be configured before Meter/Metrics.

The MeterRegistry should be ready for use as soon as it's exposed as a bean. The two threads that are deadlocking are both trying to register a timer. The problem is that both Micrometer and Spring Framework are making foreign calls while holding locks. This is always going to be a recipe for deadlock, irrespective of any ordering changes.

I don't think there's anything we can do to fix this in Boot. While Micrometer's meter registration makes foreign calls while holding a lock, it will always be possible for deadlock to occur.

What do you think, @shakuzen?

@wilkinsona
Copy link
Member

I've opened a Framework issue and a Micrometer issue so that they can hopefully avoid calling foreign code while holding a lock. We'll pick up new versions of Framework and Micrometer that hopefully fix those issues in due course.

@wilkinsona wilkinsona added for: external-project For an external project and not something we can fix and removed status: waiting-for-triage An issue we've not yet triaged labels Aug 22, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
for: external-project For an external project and not something we can fix
Projects
None yet
Development

No branches or pull requests

4 participants