Reposted

Cache Strategy Optimization

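Introduction to caching

  • In high-concurrency, multi-user systems, a cache is commonly used to improve read and write performance
  • Common choices include memcached, redis, and in-process memory caches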

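Symptoms

  • Shortly after a product went live, the service raised alerts, and the logs showed SQL timeout errors. Specifically:
    • Many page flows timed out or failed
    • The load on the db machine was high, and the DBA found a large number of identical SQL statements being executed over and over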

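Diagnosis

  • Take a jstack dump of the application service
  • Following my earlier blog post, analyze the stack dump file with the stackAnalysis tool. It showed a large number of threads all doing the same thing: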
40 threads at (state = RUNNABLE,
  locks_locked = [0x0000000725b33848, 0x0000000725b338f0, 0x0000000737ff37d0, 0x0000000737f88f08, 0x0000000737f817c8, 0x00000007fc8ba580, 0x0000000725d8e638, 0x0000000725d8e6e0, 0x0000000738274490, 0x0000000725b5f720, 0x0000000725b5f7c8, 0x00000007384c03f8, 0x00000007231683a8, 0x0000000723168450, 0x0000000731980608, 0x0000000725d27ab8, 0x0000000725d2fcd8, 0x00000007384b16c8, 0x0000000723221798, 0x00000007232299c0, 0x000000072efb1228, 0x00000007005b70c0, 0x00000007005aff10, 0x0000000738321660, 0x00000007318cb948, 0x00000007318c4780, 0x0000000737c7de70, 0x0000000725a02d30, 0x0000000725a02dd8, 0x00000007fc8f8b60, 0x00000007232918f8, 0x000000072329db00, 0x000000073186ee08, 0x0000000725b7b928, 0x0000000725b7bb98, 0x0000000738066408, 0x00000007230a6ef8, 0x00000007230a0160, 0x0000000738191a18, 0x0000000737f619e8, 0x0000000737f5a6d8, 0x00000007fc8b9518, 0x0000000725ba54d0, 0x0000000725ba5578, 0x0000000738239a40, 0x0000000725e885c0, 0x0000000725e810e8, 0x00000007b24ac378, 0x00000007230c47e8, 0x00000007230c4890, 0x0000000731907c58, 0x00000007005345a0, 0x000000070052d098, 0x0000000731a6d400, 0x00000007231879f8, 0x0000000723187aa0, 0x000000073846aa20, 0x00000007231e7128, 0x00000007231e71d0, 0x0000000731958f38, 0x00000007231b2500, 0x00000007231b25a8, 0x00000007fc8f8dc0, 0x0000000725e1af28, 0x0000000725e1afd0, 0x0000000738323388, 0x00000007319ad368, 0x00000007319a6588, 0x00000007384894f0, 0x00000007318b8af8, 0x00000007318b1ba8, 0x00000007380c9908, 0x0000000725c5e478, 0x0000000725c5e520, 0x0000000738256338, 0x00000007230c7cd0, 0x00000007230b9440, 0x000000072e8c7810, 0x0000000725dcd8d0, 0x0000000725dc66d8, 0x0000000732c2df18, 0x00000007232425a0, 0x0000000723242648, 0x0000000732c31da0, 0x0000000731a4fd78, 0x0000000731a4fe20, 0x0000000738139a10, 0x0000000725cda198, 0x0000000725cda240, 0x0000000738066638, 0x0000000702b936b8, 0x0000000702b929a0, 0x00000007384893f0, 0x00000007230f9150, 0x00000007230f91f8, 0x0000000738036fc8, 0x000000073198d218, 0x000000073198d2c0, 0x00000007384710c8, 
0x00000007231b0bf0, 0x00000007231b0c98, 0x00000007fc8bdea8, 0x00000007318a5808, 0x000000073189e0c8, 0x0000000731870018, 0x0000000723279d10, 0x0000000723279db8, 0x0000000738471170, 0x000000072e8fabd8, 0x000000072e8f8af8, 0x0000000732c51a38, 0x00000007319c69a8, 0x00000007319b9238, 0x0000000737fd5758, 0x0000000725b0c488, 0x0000000725b0c530, 0x00000007381f44a0, 0x0000000731a095b8, 0x0000000731a09660, 0x0000000735cbb2b0]) :
  "http-bio-*-exec-*" daemon prio=* tid=******** nid=******** runnable [********]
     java.lang.Thread.State: RUNNABLE
          at java.net.SocketInputStream.socketRead0(Native Method)
          at java.net.SocketInputStream.read(SocketInputStream.java:129)
          at com.mysql.jdbc.util.ReadAheadInputStream.fill(ReadAheadInputStream.java:114)
          at com.mysql.jdbc.util.ReadAheadInputStream.readFromUnderlyingStreamIfNecessary(ReadAheadInputStream.java:161)
          at com.mysql.jdbc.util.ReadAheadInputStream.read(ReadAheadInputStream.java:189)
          - locked <********> (a com.mysql.jdbc.util.ReadAheadInputStream)
          at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3014)
          at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3467)
          at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3456)
          at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3997)
          at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2468)
          at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2629)
          at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2719)
          - locked <********> (a com.mysql.jdbc.JDBC4Connection)
          at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2155)
          - locked <********> (a com.mysql.jdbc.JDBC4Connection)
          at com.mysql.jdbc.PreparedStatement.execute(PreparedStatement.java:1379)
          - locked <********> (a com.mysql.jdbc.JDBC4Connection)
          at com.mchange.v2.c3p0.impl.NewProxyPreparedStatement.execute(NewProxyPreparedStatement.java:67)
          at org.apache.ibatis.executor.statement.PreparedStatementHandler.query(PreparedStatementHandler.java:56)
          at org.apache.ibatis.executor.statement.RoutingStatementHandler.query(RoutingStatementHandler.java:70)
          at org.apache.ibatis.executor.SimpleExecutor.doQuery(SimpleExecutor.java:57)
          at org.apache.ibatis.executor.BaseExecutor.queryFromDatabase(BaseExecutor.java:259)
          at org.apache.ibatis.executor.BaseExecutor.query(BaseExecutor.java:132)
          at org.apache.ibatis.executor.CachingExecutor.query(CachingExecutor.java:105)
          at org.apache.ibatis.executor.CachingExecutor.query(CachingExecutor.java:81)
          at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:104)
          at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:98)
          at sun.reflect.GeneratedMethodAccessor30.invoke(Unknown Source)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          at java.lang.reflect.Method.invoke(Method.java:597)
          at org.mybatis.spring.SqlSessionTemplate$SqlSessionInterceptor.invoke(SqlSessionTemplate.java:358)
          at com.sun.proxy.$Proxy18.selectList(Unknown Source)
          at org.mybatis.spring.SqlSessionTemplate.selectList(SqlSessionTemplate.java:198)
          at org.apache.ibatis.binding.MapperMethod.executeForMany(MapperMethod.java:114)
          at org.apache.ibatis.binding.MapperMethod.execute(MapperMethod.java:58)
          at org.apache.ibatis.binding.MapperProxy.invoke(MapperProxy.java:43)
          at com.sun.proxy.$Proxy46.selectAllValidActivityPush(Unknown Source)
          at com.xxxx.xxxx.module.inbox.InboxAgent.selectActivityPush(InboxAgent.java:612)
          at com.xxxx.xxxx.service.SystemMessageService.getActivityPushMessage(SystemMessageService.java:975)
          at com.xxxx.xxxx.service.login.logic.impl.LogicLoginServiceImpl.updateLoginUser(LogicLoginServiceImpl.java:438)
          at com.xxxx.xxxx.service.login.logic.impl.LogicLoginServiceImpl.updateLoginUser(LogicLoginServiceImpl.java:374)
          at com.xxxx.xxxx.web.controller.login.LoginController.login(LoginController.java:119)
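  • 40 threads are waiting for the db to return data, and each of them got there from LoginController.login via selectAllValidActivityPush. Combined with the stack, this is enough to pin down the problematic code path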
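
The grouping step above ("N threads at the same stack") can be approximated from inside a JVM. The following is a minimal sketch, not the stackAnalysis tool itself: the class name StackGrouper and the method groupByStack are hypothetical, and it uses Thread.getAllStackTraces() in place of a jstack dump file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: groups threads that share an identical stack trace,
// roughly what a stack-dump analysis tool does with jstack output.
class StackGrouper {

    // Takes any thread-to-stack mapping and returns stack text -> thread names.
    static Map<String, List<String>> groupByStack(Map<Thread, StackTraceElement[]> dump) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (Map.Entry<Thread, StackTraceElement[]> e : dump.entrySet()) {
            String stack = Arrays.toString(e.getValue());
            groups.computeIfAbsent(stack, k -> new ArrayList<>()).add(e.getKey().getName());
        }
        return groups;
    }

    public static void main(String[] args) {
        Map<String, List<String>> groups = groupByStack(Thread.getAllStackTraces());
        // Largest groups first: many threads parked on one stack is the red flag.
        groups.entrySet().stream()
              .sorted((a, b) -> Integer.compare(b.getValue().size(), a.getValue().size()))
              .limit(5)
              .forEach(g -> System.out.println(g.getValue().size() + " threads at " + g.getKey()));
    }
}
```

A group of dozens of request threads that all end in the same JDBC read, as in the dump above, usually points at one hot query rather than at a generally slow database.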

Analysis

  • A close look at the corresponding code reveals the following caching strategy:
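    Object getObject() {
        // Try the cache first.
        Object o = getFromCache();
        if (o == null) {
            // Cache miss: fall back to the database.
            o = getFromDb();
            if (o != null) {
                // Only non-null results are ever written back to the cache.
                setToCache(o);
            }
        }
        return o;
    }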
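  • At first glance this looks fine. But when getFromDb() returns null, that is, when the database holds no matching data, nothing is written to the cache, so every thread executes getFromDb() and every request passes straight through to the db
  • Under heavy user traffic the pressure on the database becomes very high [the stack above comes from just one of several application web servers]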
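
The effect described above is easy to reproduce in a small simulation. This is only a sketch under assumptions: an in-memory map stands in for memcached/redis, a counter stands in for the database, and the names PenetrationDemo, dbCalls and the key "activityPush" are made up for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: the map plays the cache, the counter plays the database.
class PenetrationDemo {
    static final Map<String, Object> cache = new ConcurrentHashMap<>();
    static final AtomicInteger dbCalls = new AtomicInteger();

    // Simulated query that finds no row for any key.
    static Object getFromDb(String key) {
        dbCalls.incrementAndGet();
        return null;
    }

    // Same shape as the strategy above: only non-null results are cached.
    static Object getObject(String key) {
        Object o = cache.get(key);
        if (o == null) {
            o = getFromDb(key);
            if (o != null) {
                cache.put(key, o);
            }
        }
        return o;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            getObject("activityPush");
        }
        // Prints "db calls: 1000": the cache never absorbs a single read.
        System.out.println("db calls: " + dbCalls.get());
    }
}
```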

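Solution approach

  • When the database has no data, put an invalid placeholder object in the cache to mean "the data is empty, no need to query the db", as follows: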
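    Object getObject() {
        Object o = getFromCache(key);
        if (o == null) {
            o = getFromDb();
            if (o != null) {
                setToCache(key, o);
            } else {
                // Cache a placeholder so later requests skip the database.
                setToCache(key, invalidObject);
            }
        }
        // Never expose the placeholder to callers. equals() rather than ==,
        // because a remote cache hands back a deserialized copy.
        return invalidObject.equals(o) ? null : o;
    }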
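  • Going a step further, the getFromDb() logic above can still be run by several threads at once. Depending on the business scenario, a distributed lock can be added: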
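    Object getObject() {
        Object o = getFromCache(key);
        if (o == null) {
            // Only the caller that wins the lock goes to the database.
            if (cache.lock(key)) {
                try {
                    // Re-check: another caller may have filled the cache meanwhile.
                    o = getFromCache(key);
                    if (o == null) {
                        o = getFromDb();
                        if (o != null) {
                            setToCache(key, o);
                        } else {
                            setToCache(key, invalidObject);
                        }
                    }
                } finally {
                    // Unlock only if the lock was actually acquired.
                    cache.unlock(key);
                }
            }
            // Lock not acquired: another caller is loading this key and o stays null.
            // Depending on the business scenario, retry the cache read after a short wait.
        }
        return invalidObject.equals(o) ? null : o;
    }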
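
Both ideas above can be tried out in a single process. The sketch below is a simplified stand-in, not the original service code: a ConcurrentHashMap replaces the remote cache, a per-key monitor replaces the distributed lock, and the names NullCachingDemo, INVALID, locks and hammer are hypothetical. Because everything lives in one JVM, comparing the placeholder with == is safe here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

class NullCachingDemo {
    // Placeholder meaning "the database has no row for this key".
    static final Object INVALID = new Object();
    static final Map<String, Object> cache = new ConcurrentHashMap<>();
    // Per-key monitors: an in-process stand-in for a distributed lock (assumption).
    static final Map<String, Object> locks = new ConcurrentHashMap<>();
    static final AtomicInteger dbCalls = new AtomicInteger();

    // Simulated query that finds no row for any key.
    static Object getFromDb(String key) {
        dbCalls.incrementAndGet();
        return null;
    }

    static Object getObject(String key) {
        Object o = cache.get(key);
        if (o == null) {
            synchronized (locks.computeIfAbsent(key, k -> new Object())) {
                o = cache.get(key);              // re-check after acquiring the lock
                if (o == null) {
                    o = getFromDb(key);
                    if (o == null) {
                        o = INVALID;             // remember that the db was empty
                    }
                    cache.put(key, o);
                }
            }
        }
        return o == INVALID ? null : o;
    }

    // Releases `threads` workers at once and returns how many db calls they caused.
    static int hammer(String key, int threads) {
        int before = dbCalls.get();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await();
                    getObject(key);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        start.countDown();
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return dbCalls.get() - before;
    }

    public static void main(String[] args) {
        // Prints "db calls: 1": 40 concurrent readers, one database query.
        System.out.println("db calls: " + hammer("activityPush", 40));
    }
}
```

In a real deployment the placeholder would normally be stored with a short expiry, so that rows inserted later become visible; the sketch leaves expiry out to stay small.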

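Thoughts and recommendations

  • Multithreaded thinking: assume every line of code will be executed by many threads at high concurrency
  • Frugal thinking: for every line of code, and especially every network operation (cache or db), consider whether it can be eliminated or whether several operations can be merged into one
  • Batch thinking: can several actions be completed in one go? For example, when buying groceries people pick up all five vegetables in one trip to the market instead of making one trip per vegetable. Why not code the same way?
  • Think every piece of logic through carefully: any oversight can take down production and bring the service down, with serious consequences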
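
The grocery analogy for batch thinking can be made concrete by counting round trips. This is a toy sketch: BatchDemo, roundTrips and getAll are invented names, and the in-memory map only imitates a remote store. Real counterparts of the batched call include Redis MGET, memcached multi-get, and a SQL query with an IN (...) clause.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical remote store that counts simulated network round trips.
class BatchDemo {
    static final Map<String, String> remote = new HashMap<>();
    static int roundTrips = 0;

    // One key per call: one round trip each.
    static String get(String key) {
        roundTrips++;
        return remote.get(key);
    }

    // Many keys per call: a single round trip for the whole list.
    static Map<String, String> getAll(List<String> keys) {
        roundTrips++;
        Map<String, String> found = new HashMap<>();
        for (String k : keys) {
            String v = remote.get(k);
            if (v != null) {
                found.put(k, v);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "b", "c", "d", "e");
        for (String k : keys) {
            remote.put(k, "value-" + k);
        }

        for (String k : keys) {
            get(k);
        }
        System.out.println("one by one: " + roundTrips + " round trips"); // 5

        roundTrips = 0;
        getAll(keys);
        System.out.println("batched: " + roundTrips + " round trip");     // 1
    }
}
```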
Original: http://www.importnew.com/22709.html