
Unify the spans of different services using OpenTelemetry

WBOY (reposted)
2024-02-13 23:00:11


PHP editor Xiaoxin introduces OpenTelemetry, a tool that helps developers connect the spans produced by different services. In modern distributed systems, applications are often composed of multiple microservices, each producing its own logs, metrics, and traces. OpenTelemetry provides a common way to collect and correlate this telemetry, so developers can understand and debug the performance and behavior of the whole system, in local development as well as in production.

Question content

I just started using OpenTelemetry and created two (micro)services to try it out: standard and geomap.

The end user sends a request to the standard service, which in turn sends a request to geomap to obtain some information and then returns the result to the end user. I use gRPC for all communication.

I have instrumented my functions as follows:

For standard:

type standardService struct {
    pb.UnimplementedStandardServiceServer
}

func (s *standardService) GetStandard(ctx context.Context, in *pb.GetStandardRequest) (*pb.GetStandardResponse, error) {

    // The client connection is created with the incoming context,
    // before the GetStandard span is started.
    conn, _ := createClient(ctx, geomapSvcAddr)
    defer conn.Close()

    newCtx, span1 := otel.Tracer(name).Start(ctx, "GetStandard")
    defer span1.End()

    countryInfo, err := pb.NewGeoMapServiceClient(conn).GetCountry(newCtx,
        &pb.GetCountryRequest{
            Name: in.Name,
        })

    //...

    return &pb.GetStandardResponse{
        Standard: standard,
    }, nil

}

// createClient dials the given service with the OpenTelemetry client interceptor installed.
func createClient(ctx context.Context, svcAddr string) (*grpc.ClientConn, error) {
    return grpc.DialContext(ctx, svcAddr,
        grpc.WithTransportCredentials(insecure.NewCredentials()),
        grpc.WithUnaryInterceptor(otelgrpc.UnaryClientInterceptor()),
    )
}

For geomap:

type geomapService struct {
    pb.UnimplementedGeoMapServiceServer
}

func (s *geomapService) GetCountry(ctx context.Context, in *pb.GetCountryRequest) (*pb.GetCountryResponse, error) {

    _, span := otel.Tracer(name).Start(ctx, "GetCountry")
    defer span.End()

    span.SetAttributes(attribute.String("country", in.Name))

    span.AddEvent("Retrieving country info")

    //...

    span.AddEvent("Country info retrieved")

    return &pb.GetCountryResponse{
        Country: &country,
    }, nil

}

Both services are configured to send their spans to the Jaeger backend and share almost the same main function (minor differences are noted in the comments):

const (
    name        = "mapedia"
    service     = "geomap" //or standard
    environment = "production"
    id          = 1
)

func tracerProvider(url string) (*tracesdk.TracerProvider, error) {
    // Create the Jaeger exporter
    exp, err := jaeger.New(jaeger.WithCollectorEndpoint(jaeger.WithEndpoint(url)))
    if err != nil {
        return nil, err
    }
    tp := tracesdk.NewTracerProvider(
        // Always be sure to batch in production.
        tracesdk.WithBatcher(exp),
        // Record information about this application in a Resource.
        tracesdk.WithResource(resource.NewWithAttributes(
            semconv.SchemaURL,
            semconv.ServiceName(service),
            attribute.String("environment", environment),
            attribute.Int64("ID", id),
        )),
    )
    return tp, nil
}

func main() {

    tp, err := tracerProvider("http://localhost:14268/api/traces")
    if err != nil {
        log.Fatal(err)
    }

    defer func() {
        if err := tp.Shutdown(context.Background()); err != nil {
            log.Fatal(err)
        }
    }()
    otel.SetTracerProvider(tp)

    listener, err := net.Listen("tcp", ":"+port)
    if err != nil {
        panic(err)
    }

    s := grpc.NewServer(
        grpc.UnaryInterceptor(otelgrpc.UnaryServerInterceptor()),
    )
    reflection.Register(s)
    pb.RegisterGeoMapServiceServer(s, &geomapService{}) // or pb.RegisterStandardServiceServer(s, &standardService{})
    if err := s.Serve(listener); err != nil {
        log.Fatalf("Failed to serve: %v", err)
    }
}

When I look at the trace generated by the end user's request to the standard service, I can see that it is, as expected, calling its geomap service:

However, I don't see any of the attributes or events that I added to the child span (I added one attribute and two events when instrumenting geomap's GetCountry function).

I did notice that these attributes are available in a separate trace (listed under the "geomap" service in Jaeger) whose span IDs are completely unrelated to the child spans in the standard service:

What I expect is a single trace in which all attributes and events related to geomap appear in child spans under the standard span. How do I get the expected result from here?

Solution

The span context (containing the trace ID and span ID, as described in "service instrumentation & term") should be propagated from the parent span to the child span so that both are part of the same trace.
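To make the idea concrete, here is a minimal, standard-library-only sketch of what "propagating the span context" amounts to: the caller writes the IDs into a carrier (for gRPC, the request metadata) and the callee reads them back. The spanContext type and the inject and extract helpers are hypothetical simplifications for illustration, not the OpenTelemetry API; the header layout follows the W3C traceparent format (version-traceid-spanid-flags).

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// spanContext is a simplified, hypothetical stand-in for OpenTelemetry's
// span context: just the two IDs, kept as hex strings.
type spanContext struct {
	TraceID string // 32 hex characters
	SpanID  string // 16 hex characters
}

// inject writes the span context into the carrier (think: gRPC metadata)
// using the W3C traceparent layout: version-traceid-spanid-flags.
func inject(sc spanContext, carrier map[string]string) {
	carrier["traceparent"] = fmt.Sprintf("00-%s-%s-01", sc.TraceID, sc.SpanID)
}

// extract reads the span context back out on the receiving side.
func extract(carrier map[string]string) (spanContext, error) {
	parts := strings.Split(carrier["traceparent"], "-")
	if len(parts) != 4 || len(parts[1]) != 32 || len(parts[2]) != 16 {
		return spanContext{}, errors.New("missing or malformed traceparent")
	}
	return spanContext{TraceID: parts[1], SpanID: parts[2]}, nil
}

func main() {
	parent := spanContext{
		TraceID: "4bf92f3577b34da6a3ce929d0e0e4736",
		SpanID:  "00f067aa0ba902b7",
	}

	// Caller side: the IDs travel with the request.
	carrier := map[string]string{}
	inject(parent, carrier)

	// Callee side: the same trace ID is recovered, so a span started
	// here can join the caller's trace.
	remote, err := extract(carrier)
	fmt.Println("same trace:", err == nil && remote.TraceID == parent.TraceID)

	// If nothing was injected, the callee has no parent to attach to
	// and would start an unrelated trace.
	_, err = extract(map[string]string{})
	fmt.Println("without injection:", err)
}
```

The second case is the symptom described in the question: when no span context arrives with the request, the receiving service starts a trace of its own.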

With OpenTelemetry, this is usually done automatically by the instrumentation libraries provided for various frameworks, including gRPC.
However, propagation doesn't seem to be working properly in your case.

In your code, you start a new span in the GetStandard function and then use that context (newCtx) when making the GetCountry request. This is correct, because the new context contains the span context of the parent span (GetStandard).
But the problem may be related to your createClient function:

func createClient(ctx context.Context, svcAddr string) (*grpc.ClientConn, error) {
    return grpc.DialContext(ctx, svcAddr,
        grpc.WithTransportCredentials(insecure.NewCredentials()),
        grpc.WithUnaryInterceptor(otelgrpc.UnaryClientInterceptor()),
    )
}

You are correctly using otelgrpc.UnaryClientInterceptor here, which should ensure that the context is propagated correctly, but it is not clear when this function is called. If it is called before the span is started in GetStandard, the context used to create the client will not contain the span context from GetStandard.

As a test, make sure the client is created after the span is started in GetStandard, and that the same context is used throughout the request.

You can do this by passing newCtx to both createClient and GetCountry, as shown in this modified version of the GetStandard function:

func (s *standardService) GetStandard(ctx context.Context, in *pb.GetStandardRequest) (*pb.GetStandardResponse, error) {
    // Start the span first so that newCtx carries its span context.
    newCtx, span1 := otel.Tracer(name).Start(ctx, "GetStandard")
    defer span1.End()

    conn, _ := createClient(newCtx, geomapSvcAddr)
    defer conn.Close()

    countryInfo, err := pb.NewGeoMapServiceClient(conn).GetCountry(newCtx,
        &pb.GetCountryRequest{
            Name: in.Name,
        })

    //...

    return &pb.GetStandardResponse{
        Standard: standard,
    }, nil
}

The context used to create the client and make the GetCountry request now includes the span context from GetStandard, so both spans should appear as part of the same trace in Jaeger.

(As always, check the errors returned by functions such as createClient and GetCountry; the checks are omitted here for brevity.)

Also:

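  • Check your propagator: make sure both services use the same context propagator, preferably the W3C TraceContext propagator, which is the default named in the OpenTelemetry specification. Note that in the Go API the global propagator does nothing until one is set, so it is worth setting it explicitly.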

    You can set the propagator explicitly as follows:

    otel.SetTextMapPropagator(propagation.TraceContext{})
    

    Add the above lines to both services at the beginning of the main function.

  • Make sure metadata is passed: the gRPC interceptor should automatically inject the tracing context into the request's metadata and extract it on the receiving side, but double-check that this is actually happening.

    After starting the span in the GetCountry function, you can log the trace ID and span ID:

    ctx, span := otel.Tracer(name).Start(ctx, "GetCountry")
    sc := trace.SpanContextFromContext(ctx)
    log.Printf("Trace ID: %s, Span ID: %s", sc.TraceID(), sc.SpanID())
    defer span.End()
    

    And do the same in the GetStandard function:

    newCtx, span1 := otel.Tracer(name).Start(ctx, "GetStandard")
    sc := trace.SpanContextFromContext(newCtx)
    log.Printf("Trace ID: %s, Span ID: %s", sc.TraceID(), sc.SpanID())
    defer span1.End()
    

    If the context is propagated correctly, the trace IDs in the two services should match.


Statement:
This article is reproduced from stackoverflow.com. If there is any infringement, please contact admin@php.cn to have it removed.