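Users reported that while using nacos, Java threads kept being created as the program ran, eventually reaching two to three thousand and driving the CPU load to 100%.
Inspecting nacos showed that these mass-created threads are ultimately tied to the NacosConfigService object: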
public NacosConfigService(Properties properties) throws NacosException {
    String encodeTmp = properties.getProperty(PropertyKeyConst.ENCODE);
    if (StringUtils.isBlank(encodeTmp)) {
        encode = Constants.ENCODE;
    } else {
        encode = encodeTmp.trim();
    }
    initNamespace(properties);
    agent = new MetricsHttpAgent(new ServerHttpAgent(properties));
    agent.start();
    worker = new ClientWorker(agent, configFilterChainManager, properties);
}
The object that actually owns the threads, however, is ClientWorker, which creates two thread pools in its constructor:
@SuppressWarnings("PMD.ThreadPoolCreationRule")
public ClientWorker(final HttpAgent agent, final ConfigFilterChainManager configFilterChainManager, final Properties properties) {
    this.agent = agent;
    this.configFilterChainManager = configFilterChainManager;
    // Initialize the timeout parameter
    init(properties);
    executor = Executors.newScheduledThreadPool(1, new ThreadFactory() {
        @Override
        public Thread newThread(Runnable r) {
            Thread t = new Thread(r);
            t.setName("com.alibaba.nacos.client.Worker." + agent.getName());
            t.setDaemon(true);
            return t;
        }
    });
    executorService = Executors.newScheduledThreadPool(Runtime.getRuntime().availableProcessors(), new ThreadFactory() {
        @Override
        public Thread newThread(Runnable r) {
            Thread t = new Thread(r);
            t.setName("com.alibaba.nacos.client.Worker.longPolling." + agent.getName());
            t.setDaemon(true);
            return t;
        }
    });
    executor.scheduleWithFixedDelay(new Runnable() {
        @Override
        public void run() {
            try {
                checkConfigInfo();
            } catch (Throwable e) {
                LOGGER.error("[" + agent.getName() + "] [sub-check] rotate check error", e);
            }
        }
    }, 1L, 10L, TimeUnit.MILLISECONDS);
}
My first suspicion was therefore that the user had created a large number of NacosConfigService objects.
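Why many instances translate into many threads can be shown with a minimal sketch. `LeakyWorker` below is a hypothetical stand-in, not nacos code: like ClientWorker, it creates its own scheduled pool in the constructor and immediately schedules a periodic task, so every instance pins at least one thread until it is shut down. The real ClientWorker also owns a long-polling pool sized by `availableProcessors()`, so the per-instance cost there can be higher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an object that owns its own scheduler,
// in the same way ClientWorker does. Not part of nacos.
class LeakyWorker {
    // Counts every thread that any LeakyWorker's factory has been asked to create.
    static final AtomicInteger THREADS_CREATED = new AtomicInteger();

    private final ScheduledExecutorService executor;

    LeakyWorker(String name) {
        executor = Executors.newScheduledThreadPool(1, r -> {
            THREADS_CREATED.incrementAndGet();
            Thread t = new Thread(r);
            t.setName("demo.Worker." + name);
            t.setDaemon(true);
            return t;
        });
        // Scheduling a periodic task makes the pool start its worker thread,
        // which then stays alive until the pool is shut down.
        executor.scheduleWithFixedDelay(() -> { }, 1L, 10L, TimeUnit.MILLISECONDS);
    }

    void shutdown() {
        executor.shutdownNow();
    }

    // Creates `count` workers and returns them so the caller can shut them down.
    static List<LeakyWorker> createMany(int count) {
        List<LeakyWorker> workers = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            workers.add(new LeakyWorker("w" + i));
        }
        return workers;
    }

    public static void main(String[] args) {
        List<LeakyWorker> workers = createMany(100);
        System.out.println("threads created: " + THREADS_CREATED.get());
        workers.forEach(LeakyWorker::shutdown);
    }
}
```

In this sketch the thread count grows linearly with the number of instances, which matches the symptom: thousands of instances imply thousands of threads unless each instance is shut down.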
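User jmap data
The jmap output shows that the JVM held more than two thousand ClientWorker objects, and from the nacos source analysis above we know that each ClientWorker owns thread pools.
The first step was to ask users to check whether they had created many NacosConfigService instances themselves. At this point some users confirmed that they had indeed created a large number of NacosConfigService objects by mistake.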
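For such a self-check, one option is to count live threads by name from inside the application. The sketch below is only an illustration: `ThreadCensus` and `countByPrefix` are hypothetical names, and the prefix is taken from the thread names set in the ClientWorker constructor shown earlier.

```java
// Hypothetical helper for counting live threads by name prefix.
class ThreadCensus {
    // Returns the number of live threads whose name starts with `prefix`.
    static long countByPrefix(String prefix) {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.getName().startsWith(prefix))
                .count();
    }

    public static void main(String[] args) {
        // Prefix used by the ClientWorker thread factories.
        String prefix = "com.alibaba.nacos.client.Worker";
        System.out.println(prefix + "* -> " + countByPrefix(prefix));
    }
}
```

Logging this number periodically should show whether it stays flat or climbs after each configuration change.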
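Checking the Spring-Cloud-Alibaba component
Other users, however, said that they only depended on the spring-cloud-alibaba-nacos component and never touched NacosConfigService themselves, yet still saw large numbers of threads being created. One user's own investigation finally pinned the problem down to a bug in spring-cloud-alibaba-nacos: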
@ConfigurationProperties(NacosConfigProperties.PREFIX)
public class NacosConfigProperties {
    ...
    private ConfigService configService;
    ...
    @Deprecated
    public ConfigService configServiceInstance() {
        if (null != configService) {
            return configService;
        }
        Properties properties = new Properties();
        ...
        try {
            configService = NacosFactory.createConfigService(properties);
            return configService;
        }
        catch (Exception e) {
            log.error("create config service error!properties={},e=,", this, e);
            return null;
        }
    }
}
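This configuration class caches a ConfigService instance, with the intent of maintaining a singleton itself. In practice, however, the NacosConfigProperties bean is recreated every time the spring-cloud context is refreshed. Each configuration update therefore triggers the following chain: configuration update -> Context refresh -> NacosConfigProperties recreated -> cached ConfigService lost -> new ConfigService created.
Because of this chain, the ConfigService cache stops working as soon as the Context is refreshed, and every refresh leaks another ClientWorker together with its thread pools. The fix, in NacosConfigManager, moves the instance into a static ServiceHolder so that it is created only once and survives refreshes.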
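The chain described above can be reproduced in a small sketch without Spring. All names here (`RefreshDemo`, `FakeService`, `PropertiesBean`) are hypothetical stand-ins, and a "refresh" is modelled simply as constructing a new bean, which is an assumption about what matters for this bug rather than a model of the Spring lifecycle.

```java
import java.util.concurrent.atomic.AtomicInteger;

class RefreshDemo {
    // Counts how many hypothetical services have been constructed.
    static final AtomicInteger CREATED = new AtomicInteger();

    // Hypothetical stand-in for ConfigService.
    static class FakeService {
        FakeService() {
            CREATED.incrementAndGet();
        }
    }

    // Hypothetical stand-in for NacosConfigProperties: caches in an instance field.
    static class PropertiesBean {
        private FakeService service;

        FakeService serviceInstance() {
            if (service != null) {
                return service;
            }
            service = new FakeService();
            return service;
        }
    }

    // Simulates `refreshes` context refreshes and returns how many
    // services were created while doing so.
    static int servicesCreatedAfter(int refreshes) {
        int before = CREATED.get();
        for (int i = 0; i < refreshes; i++) {
            PropertiesBean bean = new PropertiesBean(); // bean is recreated on refresh
            bean.serviceInstance();                     // cache is empty, so a service is created
            bean.serviceInstance();                     // cache hit within the same bean
        }
        return CREATED.get() - before;
    }

    public static void main(String[] args) {
        System.out.println(servicesCreatedAfter(5)); // prints 5
    }
}
```

The instance-field cache only helps within one bean lifetime, so the number of services equals the number of refreshes.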
public class NacosConfigManager implements ApplicationContextAware {

    public ConfigService getConfigService() {
        // Always read from the static holder, which outlives context refreshes.
        return ServiceHolder.getInstance().getService();
    }

    @Override
    public void setApplicationContext(ApplicationContext applicationContext)
            throws BeansException {
        NacosConfigProperties properties = applicationContext
                .getBean(NacosConfigProperties.class);
        ServiceHolder holder = ServiceHolder.getInstance();
        // Create the ConfigService only the first time a context is initialized.
        if (!holder.alreadyInit) {
            ServiceHolder.getInstance().setService(properties.configServiceInstance());
        }
    }

    static class ServiceHolder {
        private ConfigService service = null;
        private boolean alreadyInit = false;
        private static final ServiceHolder holder = new ServiceHolder();

        ServiceHolder() {
        }

        static ServiceHolder getInstance() {
            return holder;
        }

        void setService(ConfigService service) {
            alreadyInit = true;
            this.service = service;
        }

        ConfigService getService() {
            return service;
        }
    }
}
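The `alreadyInit` check in ServiceHolder is a plain check-then-act on a non-volatile field. If initialization could ever run on more than one thread, a synchronized holder would be one way to keep the create-once guarantee. The following is a sketch under that assumption only; `OnceHolder` and `getOrCreate` are hypothetical names and not part of spring-cloud-alibaba.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical generic holder that runs its factory at most once.
final class OnceHolder<T> {
    private T value;

    // Runs `factory` only if nothing has been stored yet. The method is
    // synchronized so that concurrent callers cannot both create a value.
    synchronized T getOrCreate(Supplier<T> factory) {
        if (value == null) {
            value = factory.get();
        }
        return value;
    }

    public static void main(String[] args) throws InterruptedException {
        OnceHolder<Object> holder = new OnceHolder<>();
        AtomicInteger factoryCalls = new AtomicInteger();
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> holder.getOrCreate(() -> {
                factoryCalls.incrementAndGet();
                return new Object();
            }));
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println("factory calls: " + factoryCalls.get()); // prints "factory calls: 1"
    }
}
```

Creating the value inside the lock matters here: a compare-and-set approach could build a second ConfigService and then discard it, which would leak its threads again.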