Troubleshooting excessive thread creation in Nacos


Original: www.liaochuntao.cn/2019/09/04/…

Problem description

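Users reported that while using Nacos, Java threads kept being created as the program ran, reaching two to three thousand threads and driving the CPU load metric to 100%.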
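When a thread count climbs like this, one way to see which family of threads is growing is to bucket the live threads by name from inside the JVM. The following is a minimal sketch, not part of Nacos; the class name ThreadNameHistogram and its digit-collapsing rule are assumptions made for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical diagnostic helper: buckets live threads by a normalized name.
class ThreadNameHistogram {

    // Collapse digit runs so "pool-1-thread-7" and "pool-2-thread-3" share one bucket.
    static String normalize(String threadName) {
        return threadName.replaceAll("\\d+", "#");
    }

    // Count the live threads of this JVM per normalized name.
    static Map<String, Integer> histogram() {
        Map<String, Integer> counts = new TreeMap<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(normalize(t.getName()), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        histogram().forEach((name, n) -> System.out.println(n + "\t" + name));
    }
}
```

A bucket whose count keeps rising between two runs points at the thread factory responsible for the growth.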

Troubleshooting process

Inspecting Nacos shows that the threads being created in bulk are ultimately tied to the NacosConfigService object:

public NacosConfigService(Properties properties) throws NacosException {
  String encodeTmp = properties.getProperty(PropertyKeyConst.ENCODE);
  if (StringUtils.isBlank(encodeTmp)) {
    encode = Constants.ENCODE;
  } else {
    encode = encodeTmp.trim();
  }
  initNamespace(properties);
  agent = new MetricsHttpAgent(new ServerHttpAgent(properties));
  agent.start();
  worker = new ClientWorker(agent, configFilterChainManager, properties);
}

The object that actually holds the threads, though, is the ClientWorker created in that constructor:

@SuppressWarnings("PMD.ThreadPoolCreationRule")
public ClientWorker(final HttpAgent agent, final ConfigFilterChainManager configFilterChainManager, final Properties properties) {
  this.agent = agent;
  this.configFilterChainManager = configFilterChainManager;

  // Initialize the timeout parameter

  init(properties);

  executor = Executors.newScheduledThreadPool(1, new ThreadFactory() {
      @Override
      public Thread newThread(Runnable r) {
        Thread t = new Thread(r);
        t.setName("com.alibaba.nacos.client.Worker." + agent.getName());
        t.setDaemon(true);
        return t;
      }
  });

  executorService = Executors.newScheduledThreadPool(Runtime.getRuntime().availableProcessors(), new ThreadFactory() {
      @Override
      public Thread newThread(Runnable r) {
        Thread t = new Thread(r);
        t.setName("com.alibaba.nacos.client.Worker.longPolling." + agent.getName());
        t.setDaemon(true);
        return t;
      }
  });

  executor.scheduleWithFixedDelay(new Runnable() {
      @Override
      public void run() {
        try {
          checkConfigInfo();
        } catch (Throwable e) {
          LOGGER.error("[" + agent.getName() + "] [sub-check] rotate check error", e);
        }
      }
  }, 1L, 10L, TimeUnit.MILLISECONDS);
}
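The constructor above builds its own executors for every ClientWorker, so the number of threads grows with the number of instances. The sketch below reproduces that effect with a hypothetical stand-in class (LeakyWorkerDemo and the "demo.Worker." thread-name prefix are invented for illustration and are not Nacos code): each instance creates a single-thread scheduler and schedules a periodic task, which starts one thread per instance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for ClientWorker: every instance owns its own scheduler.
class LeakyWorkerDemo {
    private final ScheduledExecutorService executor;

    LeakyWorkerDemo(String threadName) {
        executor = Executors.newScheduledThreadPool(1, r -> {
            Thread t = new Thread(r);
            t.setName(threadName);
            t.setDaemon(true);
            return t;
        });
        // Scheduling a periodic task makes the pool start its core thread.
        executor.scheduleWithFixedDelay(() -> { }, 1L, 10L, TimeUnit.MILLISECONDS);
    }

    void shutdown() {
        executor.shutdownNow();
    }

    // Creates `instances` workers and returns how many live threads they account for.
    static long threadsAfterCreating(String batch, int instances) {
        String prefix = "demo.Worker." + batch + ".";
        List<LeakyWorkerDemo> workers = new ArrayList<>();
        for (int i = 0; i < instances; i++) {
            workers.add(new LeakyWorkerDemo(prefix + i));
        }
        long count = Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.getName().startsWith(prefix))
                .count();
        workers.forEach(LeakyWorkerDemo::shutdown);
        return count;
    }

    public static void main(String[] args) {
        System.out.println("threads for 50 instances: " + threadsAfterCreating("main", 50));
    }
}
```

In this sketch only one pool per instance is modeled; the real ClientWorker also owns a second, long-polling pool, so its per-instance thread cost can be higher.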

So my first suspicion was that the user had created a large number of NacosConfigService objects.

User jmap data

[Figure: object histogram from the user's jmap output]

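The histogram shows more than two thousand ClientWorker objects in the JVM. As the Nacos source above shows, every ClientWorker holds its own thread pools: a single-thread scheduler plus a long-polling pool sized to the number of available processors.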

User self-check

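First, users were asked to check whether they had created many NacosConfigService instances themselves. At this point some users confirmed that a mistake on their side had indeed created a large number of NacosConfigService objects.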

Checking the Spring-Cloud-Alibaba component

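Other users, however, said they only depended on the spring-cloud-alibaba-nacos component and never touched NacosConfigService themselves, yet still saw large numbers of threads being created. In the end, feedback from one user's own investigation pinned the problem on a bug in spring-cloud-alibaba-nacos: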

@ConfigurationProperties(NacosConfigProperties.PREFIX)
public class NacosConfigProperties {
	...
	private ConfigService configService;
	...
	@Deprecated
	public ConfigService configServiceInstance() {

		if (null != configService) {
			return configService;
		}

		Properties properties = new Properties();
		...

		try {
			configService = NacosFactory.createConfigService(properties);
			return configService;
		}
		catch (Exception e) {
			log.error("create config service error!properties={},e=,", this, e);
			return null;
		}
	}
}

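This configuration class caches a ConfigService instance, with the intent of maintaining a singleton itself. In practice, however, the NacosConfigProperties bean is re-created every time the spring-cloud context is refreshed. A single configuration update therefore sets off this chain: config update -> context refresh -> NacosConfigProperties re-created -> ConfigService cache lost -> ConfigService re-created.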

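Because of this chain, the ConfigService cache stops working as soon as the context is refreshed, and every newly created ConfigService brings along a new ClientWorker with its own thread pools.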
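The chain described above can be reproduced without Spring. The following is a minimal simulation under stated assumptions: FakeService and FakeProperties are hypothetical stand-ins for ConfigService and NacosConfigProperties, and a "refresh" is modeled as constructing a fresh FakeProperties object.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical simulation of an instance-field cache that is lost on every refresh.
class RefreshCacheDemo {
    // Counts how many services were ever constructed.
    static final AtomicInteger created = new AtomicInteger();

    static class FakeService {
        FakeService() {
            created.incrementAndGet();
        }
    }

    // Caches the service in an instance field, like configServiceInstance() does.
    static class FakeProperties {
        private FakeService service;

        FakeService serviceInstance() {
            if (service == null) {
                service = new FakeService();
            }
            return service;
        }
    }

    // Simulates `refreshes` context refreshes, each one rebuilding the properties bean.
    static int servicesCreatedAfter(int refreshes) {
        created.set(0);
        for (int i = 0; i < refreshes; i++) {
            FakeProperties bean = new FakeProperties(); // bean re-created on refresh
            bean.serviceInstance();                     // cache miss: new service
            bean.serviceInstance();                     // cache hit within the same bean
        }
        return created.get();
    }

    public static void main(String[] args) {
        System.out.println("services created after 5 refreshes: " + servicesCreatedAfter(5));
    }
}
```

The cache works within one bean instance, but since each refresh starts from a new bean, the number of services equals the number of refreshes.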

The fix PR

public class NacosConfigManager implements ApplicationContextAware {

	public ConfigService getConfigService() {
		// Read from the static holder rather than a per-bean field, so the
		// instance survives context refreshes.
		return ServiceHolder.getInstance().getService();
	}

	@Override
	public void setApplicationContext(ApplicationContext applicationContext)
			throws BeansException {
		NacosConfigProperties properties = applicationContext
				.getBean(NacosConfigProperties.class);
		ServiceHolder holder = ServiceHolder.getInstance();
		// Create the ConfigService only the first time; later refreshes reuse it.
		if (!holder.alreadyInit) {
			ServiceHolder.getInstance().setService(properties.configServiceInstance());
		}
	}

	static class ServiceHolder {
		private ConfigService service = null;

		private boolean alreadyInit = false;

		private static final ServiceHolder holder = new ServiceHolder();

		ServiceHolder() {
		}

		static ServiceHolder getInstance() {
			return holder;
		}

		void setService(ConfigService service) {
			alreadyInit = true;
			this.service = service;
		}

		ConfigService getService() {
			return service;
		}
	}

}
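The idea behind the holder can be shown in isolation: state kept in a static field outlives any bean that is re-created on refresh. The sketch below is not the PR's code. It is a hypothetical variant (StaticHolderDemo, FakeService and getOrCreate are invented names) that also takes a lock around initialization, on the assumption that refreshes could overlap; the holder above uses a plain boolean flag instead.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical demo: a static holder keeps one service across simulated refreshes.
class StaticHolderDemo {
    // Counts how many services were ever constructed.
    static final AtomicInteger created = new AtomicInteger();

    static class FakeService {
        FakeService() {
            created.incrementAndGet();
        }
    }

    static final class Holder {
        private static FakeService service;

        // The factory runs only on the first call; the lock guards concurrent callers.
        static synchronized FakeService getOrCreate(Supplier<FakeService> factory) {
            if (service == null) {
                service = factory.get();
            }
            return service;
        }
    }

    // Each iteration stands for one context refresh asking for the service again.
    static int servicesCreatedAfter(int refreshes) {
        for (int i = 0; i < refreshes; i++) {
            Holder.getOrCreate(FakeService::new);
        }
        return created.get();
    }

    public static void main(String[] args) {
        System.out.println("services created after 5 refreshes: " + servicesCreatedAfter(5));
    }
}
```

However many refreshes are simulated, the construction counter stays at one, which is the property the fix relies on.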