- According to the official statement below, jaeger-client is no longer recommended; OpenTelemetry should be used as the client instead
Jaeger project recommends OpenTelemetry SDKs for instrumentation, instead of Jaeger's native SDKs that are now deprecated.
The OpenTracing and OpenCensus projects have merged into a new CNCF project called OpenTelemetry.
The Jaeger and OpenTelemetry projects have different goals. OpenTelemetry aims to provide APIs and SDKs in multiple languages to allow applications to export various telemetry data out of the process, to any number of metrics and tracing backends.
The Jaeger project is primarily the tracing backend that receives tracing telemetry data and provides processing, aggregation, data mining, and visualizations of that data. The Jaeger client libraries do overlap with OpenTelemetry in functionality.
OpenTelemetry natively supports Jaeger as a tracing backend and makes Jaeger native clients unnecessary. For more information please refer to a blog post Jaeger and OpenTelemetry.
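- Judging from the documentation, however, platform support is still incomplete (among Unix systems only Ubuntu is supported), so these notes stick with jaeger-client for now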
"github.com/opentracing/opentracing-go"
"github.com/uber/jaeger-client-go"
jaegercfg "github.com/uber/jaeger-client-go/config"
- Note that the container can then no longer be created with the `latest` image tag
- Create a trace and define a span
package main

import (
	"time"

	"github.com/uber/jaeger-client-go"
	jaegercfg "github.com/uber/jaeger-client-go/config"
)

func main() {
	cfg := jaegercfg.Configuration{
		Sampler: &jaegercfg.SamplerConfig{
			Type:  jaeger.SamplerTypeConst,
			Param: 1,
		},
		Reporter: &jaegercfg.ReporterConfig{
			LogSpans:           true,
			LocalAgentHostPort: "192.168.109.128:6831",
		},
		ServiceName: "vshop",
	}
	tracer, closer, err := cfg.NewTracer(jaegercfg.Logger(jaeger.StdLogger))
	if err != nil {
		panic(err)
	}
	defer closer.Close()

	span := tracer.StartSpan("go-grpc-order-web")
	time.Sleep(time.Second)
	defer span.Finish()
}
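- It runs successfully and the trace shows up in the UI; `4433a5c` is the trace ID
- Create nested spans. These are the spans of the individual functions along the call chain; they run in sequence, but the relationship can also be described as nesting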
func main() {
	cfg := jaegercfg.Configuration{
		Sampler: &jaegercfg.SamplerConfig{
			Type:  jaeger.SamplerTypeConst,
			Param: 1,
		},
		Reporter: &jaegercfg.ReporterConfig{
			LogSpans:           true,
			LocalAgentHostPort: "192.168.109.128:6831",
		},
		ServiceName: "shop",
	}
	tracer, closer, err := cfg.NewTracer(jaegercfg.Logger(jaeger.StdLogger))
	if err != nil {
		panic(err)
	}
	defer closer.Close()

	span_a := tracer.StartSpan("funcA")
	time.Sleep(time.Second)
	defer span_a.Finish()

	span_b := tracer.StartSpan("funcB")
	time.Sleep(2 * time.Second)
	defer span_b.Finish()
}
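- The code above has two problems. First, the two spans are not nested and do not even share a trace ID. Second, funcA is reported as taking 3s, i.e. it includes funcB's time, because the deferred `Finish` calls only run when `main` returns
- The fix is to create a parent span and to finish each span explicitly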
parent := tracer.StartSpan("father")
span_a := tracer.StartSpan("funcA", opentracing.ChildOf(parent.Context()))
time.Sleep(time.Second)
span_a.Finish()
span_b := tracer.StartSpan("funcB", opentracing.ChildOf(parent.Context()))
time.Sleep(2 * time.Second)
span_b.Finish()
parent.Finish()
- The parent span is one continuous timeline; any stretch between sibling spans that no child span covers shows up as a gap
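- Tracing gRPC calls
- Use this tool; only its otgrpc part is needed. It is implemented with interceptors, lets you specify the tracer and the parentSpan, and is usually wired in when dialing
- As in the example above, the called function has to be wrapped between the start and finish of a span
- The client code is as follows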
package main

import (
	"context"
	"fmt"

	"github.com/opentracing/opentracing-go"
	"github.com/uber/jaeger-client-go"
	jaegercfg "github.com/uber/jaeger-client-go/config"
	"google.golang.org/grpc"

	"shop_srvs/order_srv/tests/jaeger/otgrpc"
	"shop_srvs/order_srv/tests/jaeger/proto"
)

func main() {
	cfg := jaegercfg.Configuration{
		Sampler: &jaegercfg.SamplerConfig{
			Type:  jaeger.SamplerTypeConst,
			Param: 1,
		},
		Reporter: &jaegercfg.ReporterConfig{
			LogSpans:           true,
			LocalAgentHostPort: "192.168.109.128:6831",
		},
		ServiceName: "roy-shop",
	}
	tracer, closer, err := cfg.NewTracer(jaegercfg.Logger(jaeger.StdLogger))
	if err != nil {
		panic(err)
	}
	// Close the tracer once; the original deferred closer.Close() twice.
	defer closer.Close()
	opentracing.SetGlobalTracer(tracer)

	// The client interceptor starts a span for every unary call on this connection.
	conn, err := grpc.Dial(
		"127.0.0.1:50051",
		grpc.WithInsecure(),
		grpc.WithUnaryInterceptor(otgrpc.OpenTracingClientInterceptor(opentracing.GlobalTracer())),
	)
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	c := proto.NewGreeterClient(conn)
	r, err := c.SayHello(context.Background(), &proto.HelloRequest{Name: "Roy"})
	if err != nil {
		panic(err)
	}
	fmt.Println(r.Message)
}
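- The server side needs it as well; so far only the total time of the call as seen by the client is visible, and that includes network transfer time
- Integrating otgrpc on the server side
- Order service (create order), main.go: initialize jaeger; the key point is passing `grpc.UnaryInterceptor` when calling `NewServer`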
tracer, closer, err := cfg.NewTracer(jaegercfg.Logger(jaeger.StdLogger))
if err != nil {
	panic(err)
}
opentracing.SetGlobalTracer(tracer)
server := grpc.NewServer(grpc.UnaryInterceptor(otgrpc.OpenTracingServerInterceptor(tracer)))
- Looking at the source, this goes through `OpenTracingServerInterceptor`, which in turn calls `spanContext, err := extractSpanContext(ctx, tracer)`
func extractSpanContext(ctx context.Context, tracer opentracing.Tracer) (opentracing.SpanContext, error) {
	md, ok := metadata.FromIncomingContext(ctx)
	if !ok {
		md = metadata.New(nil)
	}
	return tracer.Extract(opentracing.HTTPHeaders, metadataReaderWriter{md})
}
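- `Extract` here is the counterpart of `Inject` on the client side: it recovers the parent span context from the incoming metadata. How does the handler then get hold of it?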
serverSpan := tracer.StartSpan(
	info.FullMethod,
	ext.RPCServerOption(spanContext),
	gRPCComponentTag,
)
defer serverSpan.Finish()
ctx = opentracing.ContextWithSpan(ctx, serverSpan)
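`ContextWithSpan` stores a value under `activeSpanKey`: the `serverSpan`, whose parent is the extracted span context. This is the span that local code later retrieves as parentSpan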
- Back in the create-order endpoint: add a `Ctx context.Context` field to `OrderListener`, and in the local transaction retrieve the span from it
parentSpan := opentracing.SpanFromContext(o.Ctx)
`SpanFromContext` reads the value stored under that same key from ctx, which yields the parentSpan
func SpanFromContext(ctx context.Context) Span {
	val := ctx.Value(activeSpanKey)
	if sp, ok := val.(Span); ok {
		return sp
	}
	return nil
}
- Next come the local operations on the server side: wrap whatever should be traced in a new child span, for example:
shopCartSpan := opentracing.GlobalTracer().StartSpan("select_shopcart", opentracing.ChildOf(parentSpan.Context()))
shopCartSpan.Finish()
- Running this, however, produces a lot of health check spans. The span name is taken from `info.FullMethod`, so add a check on it in the source to skip them
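One way such a check could look is sketched below. `shouldTrace` is a hypothetical helper, not something the notes or otgrpc define; it assumes the health checks use the standard gRPC health service, whose methods all start with `/grpc.health.v1.Health/`.

```go
package main

import (
	"fmt"
	"strings"
)

// healthServicePrefix is the full-method prefix of the standard gRPC health
// service (its methods are Check and Watch).
const healthServicePrefix = "/grpc.health.v1.Health/"

// shouldTrace reports whether a span should be created for the given
// info.FullMethod value; health check calls are skipped.
func shouldTrace(fullMethod string) bool {
	return !strings.HasPrefix(fullMethod, healthServicePrefix)
}

func main() {
	methods := []string{
		"/grpc.health.v1.Health/Check",
		"/Order/CreateOrder",
	}
	for _, m := range methods {
		fmt.Printf("%s traced=%v\n", m, shouldTrace(m))
	}
}
```

In the server interceptor, the idea would be to call the handler directly, without starting a span, whenever `shouldTrace(info.FullMethod)` is false.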
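- Start the service and call the create-order endpoint; the resulting trace is visible in the UI
- The client's /Order/CreateOrder span and the server's /Order/CreateOrder span duplicate each other, so the serverSpan created by the server can be dropped
- The gaps are probably network transfer time or the execution time of code that is not traced
- That is one example of using otgrpc on the server side; other services can follow the same pattern
- As with the proto files, both the client and the server have to use it to get complete web -> srv tracing
- Next up is the availability-related stage: circuit breaking and degradation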