一步一步写APM(八)--出现spanId重复现象,以及优化

很长时间没有更新这个系列,再次更新优化。

上次的agent在tomcat测试中发现了几个问题:

问题一:同应用多线程

多次请求同一个网址spanId出现了重复,导致server端绘制的流程出现了问题。

{"spanId":1,"startTime":1545485811618,"operationName":"Tomcat:/","endTime":1545485811633,"parentSpanId":0,"threadName":"http-apr-8080-exec-9"}
...
{"spanId":1,"startTime":1545485825542,"operationName":"Tomcat:/","endTime":1545485825560,"parentSpanId":0,"threadName":"http-apr-8080-exec-5"}
...
{"spanId":1,"startTime":1545485842160,"operationName":"Tomcat:/","endTime":1545485842171,"parentSpanId":0,"threadName":"http-apr-8080-exec-3"}
...
{"spanId":1,"startTime":1545485854594,"operationName":"Tomcat:/","endTime":1545485854613,"parentSpanId":0,"threadName":"http-apr-8080-exec-10"}
...
{"spanId":10,"startTime":1545485876965,"operationName":"Tomcat:/","endTime":1545485876981,"parentSpanId":0,"threadName":"http-apr-8080-exec-3"}
...
{"spanId":1,"startTime":1545485890771,"operationName":"Tomcat:/","endTime":1545485890790,"parentSpanId":0,"threadName":"http-apr-8080-exec-7"}
...
{"spanId":1,"startTime":1545485917760,"operationName":"Tomcat:/","endTime":1545485917768,"parentSpanId":0,"threadName":"http-apr-8080-exec-1"}
...
{"spanId":10,"startTime":1545485928376,"operationName":"Tomcat:/","endTime":1545485928386,"parentSpanId":0,"threadName":"http-apr-8080-exec-5"}

上面是我摘取一部分server端收到的span信息,我在span中加了一个字段threadName,大家可以看出问题:

  • spanId 出现重复
  • spanId如果出现重复则属于不同的thread
  • 同一个thread多次处理请求不会出现spanId重复

原因相信大家也很清楚,tomcat处理请求有一个线程池,每次请求有可能是不同的thread来处理,所以不同线程间会出现spanId相同的情况。

问题二:跨应用

跨应用调用时,比如下面是一个tomcat消费dubbo的常见情况

{"spanId":1,"startTime":1545542234519,"operationName":"Dubbo:Service.TestService.getData(int)","endTime":1545542234519,"parentSpanId":2,"threadName":"DubboServerHandler-192.168.2.67:29014-thread-5"}
{"spanId":2,"startTime":1545542234426,"operationName":"Dubbo:Service.TestService.getData(int)","endTime":1545542234537,"parentSpanId":1,"threadName":"http-apr-8080-exec-9"}
{"spanId":1,"startTime":1545542234404,"operationName":"Tomcat:/dubbo2.html","endTime":1545542234560,"parentSpanId":0,"threadName":"http-apr-8080-exec-9"}

这里的实际情况是先进行了3的tomcat调用,然后调用了2的getData,最后进行了1的getData(顺序相反是因为先完成的span先发送方的缘故---栈特性),这里就有2的parentSpanId 不清楚是哪个了,收到的图形也会出问题。

这时候就需要引入TraceID的概念,参见:http://www.younian.me/archives/RPC%E6%9C%8D%E5%8A%A1%E8%BF%BD%E8%B8%AA%E7%9A%84%E5%8E%9F%E7%90%86%E4%B8%8E%E5%AE%9E%E8%B7%B5

1. TraceID:用来标识每一条业务请求链的唯一ID,TraceID需要在整个调用链路上传递
2. Span:请求链中的每一个环节为一个Span,每一个Span有一个SpanId来标识,前后Span间形成父子关系

简单的TraceID生成策略:时间+进程ID+线程ID+自增序列

至于问题二,直接可以使下一个应用的spanId在上一个应用parentSpan的spanId上加1。

评论

Your browser is out-of-date!

Update your browser to view this website correctly. Update my browser now

×