
王林

Sep 05, 2024 pm 10:41 PM

Implementing an Order Processing System: Part 4 - Monitoring and Alerting

1. Introduction and Goals

Welcome to the fourth installment of our series on implementing a sophisticated order processing system! In the previous posts, we laid the foundation for the project, explored advanced Temporal workflows, and dug into advanced database operations. Today, we focus on an equally important aspect of any production-ready system: monitoring and alerting.

Recap of Previous Posts

In Part 1, we set up our project structure and implemented a basic CRUD API.
In Part 2, we expanded our use of Temporal, implementing complex workflows and exploring advanced concepts.
In Part 3, we focused on advanced database operations, including optimization, sharding, and ensuring consistency in distributed systems.

Importance of Monitoring and Alerting in Microservices Architecture

In a microservices architecture, especially one that handles complex processes like order management, effective monitoring and alerting are crucial. They allow us to:

Understand the behavior and performance of our system in real time
Quickly identify and diagnose issues before they impact users
Make data-driven decisions for scaling and optimization
Ensure the reliability and availability of our services

Overview of Prometheus and Its Ecosystem

Prometheus is an open-source systems monitoring and alerting toolkit. It has become a standard in the cloud-native world thanks to its powerful features and extensive ecosystem. Key components include:

Prometheus Server: Scrapes and stores time series data
Client Libraries: Allow easy instrumentation of application code
Alertmanager: Handles alerts from the Prometheus server
Pushgateway: Allows ephemeral and batch jobs to expose metrics
Exporters: Allow third-party systems to expose metrics to Prometheus

We’ll also be using Grafana, a popular open-source platform for monitoring and observability, to create dashboards and visualize our Prometheus data.

Goals for This Part of the Series

By the end of this post, you’ll be able to:

Set up Prometheus to monitor our order processing system
Implement custom metrics in our Go services
Create informative dashboards using Grafana
Set up alerting rules to notify us of potential issues
Monitor database performance and Temporal workflows effectively

Let’s dive in!

2. Theoretical Background and Concepts

Before we start implementing, let’s review some key concepts that will be crucial for our monitoring and alerting setup.

Observability in Distributed Systems

Observability refers to the ability to understand the internal state of a system by examining its outputs. In distributed systems like our order processing system, observability typically encompasses three main pillars:

Metrics: Numerical representations of data measured over intervals of time
Logs: Detailed records of discrete events within the system
Traces: Representations of causal chains of events across components

In this post, we’ll focus primarily on metrics, though we’ll touch on how these can be integrated with logs and traces.

Prometheus Architecture

Prometheus follows a pull-based architecture:

Data Collection: Prometheus scrapes metrics from instrumented jobs via HTTP
Data Storage: Metrics are stored in a time-series database on local storage
Querying: PromQL allows flexible querying of this data
Alerting: Prometheus can trigger alerts based on query results
Visualization: While Prometheus has a basic UI, it’s often paired with Grafana for richer visualizations

Metric Types in Prometheus

Prometheus offers four core metric types:

Counter: A cumulative metric that only goes up (e.g., number of requests processed)
Gauge: A metric that can go up and down (e.g., current memory usage)
Histogram: Samples observations and counts them in configurable buckets (e.g., request durations)
Summary: Similar to a histogram, but calculates configurable quantiles over a sliding time window
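
The histogram is the least intuitive of the four, so here is a small standard-library sketch of how it counts observations. The histogram type below is a simplified stand-in, not the client library’s implementation; it only illustrates that buckets are cumulative and that an implicit +Inf bucket counts every observation.

```go
package main

import "fmt"

// histogram is a simplified stand-in for a Prometheus histogram.
type histogram struct {
	upperBounds []float64 // sorted finite bucket bounds; +Inf is implicit
	counts      []uint64  // cumulative counts: one per bound, plus +Inf last
	sum         float64   // sum of all observed values
}

func newHistogram(bounds []float64) *histogram {
	return &histogram{upperBounds: bounds, counts: make([]uint64, len(bounds)+1)}
}

// observe increments every bucket whose upper bound is >= v,
// and always increments the +Inf bucket.
func (h *histogram) observe(v float64) {
	for i, ub := range h.upperBounds {
		if v <= ub {
			h.counts[i]++
		}
	}
	h.counts[len(h.upperBounds)]++
	h.sum += v
}

func main() {
	h := newHistogram([]float64{0.1, 0.5, 1})
	for _, seconds := range []float64{0.05, 0.3, 0.7, 2.5} {
		h.observe(seconds)
	}
	fmt.Println("cumulative bucket counts:", h.counts)
}
```

Because the counts are cumulative, the last finite bucket tells you how many observations were at or below its bound, and anything slower is only visible in +Inf.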

Introduction to PromQL

PromQL (Prometheus Query Language) is a powerful functional language for querying Prometheus data. It allows you to select and aggregate time series data in real time. Key features include:

Instant vector selectors
Range vector selectors
Offset modifier
Aggregation operators
Binary operators

We’ll see examples of PromQL queries as we build our dashboards and alerts.
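
Most of the queries in this post apply rate() to a range vector selector such as orders_created_total[5m]. The sketch below shows the core idea in plain Go: sum the increases between consecutive counter samples, treat a drop as a counter reset, and divide by the elapsed time. It is a deliberate simplification; the real rate() also extrapolates to the edges of the window, so its results can differ slightly. The sample type and simpleRate function are illustrative names.

```go
package main

import "fmt"

// sample is one scraped counter value at a point in time.
type sample struct {
	t float64 // timestamp in seconds
	v float64 // counter value
}

// simpleRate returns the average per-second increase across the samples.
// A decrease between two samples is treated as a counter reset, meaning
// the counter restarted from zero and the new value is the increase.
func simpleRate(samples []sample) float64 {
	if len(samples) < 2 {
		return 0
	}
	increase := 0.0
	for i := 1; i < len(samples); i++ {
		delta := samples[i].v - samples[i-1].v
		if delta < 0 {
			delta = samples[i].v
		}
		increase += delta
	}
	elapsed := samples[len(samples)-1].t - samples[0].t
	if elapsed <= 0 {
		return 0
	}
	return increase / elapsed
}

func main() {
	// The process restarts between t=10 and t=20, resetting the counter.
	window := []sample{{0, 50}, {10, 60}, {20, 10}, {30, 20}}
	fmt.Printf("%.2f orders/second\n", simpleRate(window))
}
```

This is why counters are preferred over gauges for event counts: a restart does not corrupt the computed rate.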

Overview of Grafana

Grafana is a multi-platform, open-source analytics and interactive visualization web application. It provides charts, graphs, and alerts for the web when connected to supported data sources, of which Prometheus is one. Key features include:

Flexible dashboard creation
Wide range of visualization options
Alerting capabilities
User authentication and authorization
Plugin system for extensibility

Now that we’ve covered these concepts, let’s start implementing our monitoring and alerting system.

3. Setting Up Prometheus for Our Order Processing System

Let’s begin by setting up Prometheus to monitor our order processing system.

Installing and Configuring Prometheus

First, let’s add Prometheus to our docker-compose.yml file:

services:
  # ... other services ...

  prometheus:
    image: prom/prometheus:v2.30.3
    volumes:
      - ./prometheus:/etc/prometheus
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/usr/share/prometheus/console_libraries'
      - '--web.console.templates=/usr/share/prometheus/consoles'
    ports:
      - 9090:9090

volumes:
  # ... other volumes ...
  prometheus_data: {}

Next, create a prometheus.yml file in the ./prometheus directory:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'order_processing_api'
    static_configs:
      - targets: ['order_processing_api:8080']

  - job_name: 'postgres'
    static_configs:
      - targets: ['postgres_exporter:9187']

This configuration tells Prometheus to scrape metrics from itself, our order processing API, and a Postgres exporter (which we’ll set up later).

Implementing Prometheus Exporters for Our Go Services

To expose metrics from our Go services, we’ll use the Prometheus client library. First, add it to your go.mod:

go get github.com/prometheus/client_golang

Now, let’s modify our main Go file to expose metrics:

package main

import (
    "strconv"

    "github.com/gin-gonic/gin"
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
    httpRequestsTotal = prometheus.NewCounterVec(
        prometheus.CounterOpts{
            Name: "http_requests_total",
            Help: "Total number of HTTP requests",
        },
        []string{"method", "endpoint", "status"},
    )

    httpRequestDuration = prometheus.NewHistogramVec(
        prometheus.HistogramOpts{
            Name: "http_request_duration_seconds",
            Help: "Duration of HTTP requests in seconds",
            Buckets: prometheus.DefBuckets,
        },
        []string{"method", "endpoint"},
    )
)

func init() {
    prometheus.MustRegister(httpRequestsTotal)
    prometheus.MustRegister(httpRequestDuration)
}

func main() {
    r := gin.Default()

    // Middleware to record metrics
    r.Use(func(c *gin.Context) {
        timer := prometheus.NewTimer(httpRequestDuration.WithLabelValues(c.Request.Method, c.FullPath()))
        c.Next()
        timer.ObserveDuration()
        // strconv.Itoa yields "200"; string(200) would yield the rune U+00C8 instead
        status := strconv.Itoa(c.Writer.Status())
        httpRequestsTotal.WithLabelValues(c.Request.Method, c.FullPath(), status).Inc()
    })

    // Expose metrics endpoint
    r.GET("/metrics", gin.WrapH(promhttp.Handler()))

    // ... rest of your routes ...

    r.Run(":8080")
}

This code sets up two metrics:

http_requests_total: A counter that tracks the total number of HTTP requests
http_request_duration_seconds: A histogram that tracks the duration of HTTP requests

Setting Up Service Discovery for Dynamic Environments

For more dynamic environments, Prometheus supports various service discovery mechanisms. For example, if you’re running on Kubernetes, you could use the Kubernetes SD configuration:

scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)

This configuration will automatically discover and scrape metrics from pods with the appropriate annotations.

Configuring Retention and Storage for Prometheus Data

Prometheus stores data in a time-series database on the local filesystem. In the Prometheus version used here (v2.30.3), retention time and storage size are set with command-line flags rather than in prometheus.yml, so add them to the command list of the prometheus service in docker-compose.yml:

    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=15d'
      - '--storage.tsdb.retention.size=50GB'
      # ... other flags ...

This configuration sets a retention period of 15 days and a maximum storage size of 50GB.
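
To judge whether a size limit like this is realistic, the Prometheus storage documentation suggests a rough formula: disk space is approximately retention time in seconds, multiplied by ingested samples per second, multiplied by bytes per sample (typically around 1 to 2 bytes after compression). The sketch below applies it; the series count and bytes-per-sample figures are assumptions for illustration, not measurements from this system.

```go
package main

import "fmt"

const secondsPerDay = 24 * 60 * 60

// estimateDiskBytes applies the rule of thumb:
// retention seconds * samples per second * bytes per sample.
func estimateDiskBytes(retentionDays float64, activeSeries int, scrapeIntervalSeconds, bytesPerSample float64) float64 {
	// Each active series produces one sample per scrape.
	samplesPerSecond := float64(activeSeries) / scrapeIntervalSeconds
	return retentionDays * secondsPerDay * samplesPerSecond * bytesPerSample
}

func main() {
	// Assumed workload: 15,000 active series scraped every 15 seconds,
	// kept for 15 days at roughly 2 bytes per sample.
	bytes := estimateDiskBytes(15, 15000, 15, 2)
	fmt.Printf("estimated disk usage: %.2f GB\n", bytes/1e9)
}
```

Under these assumptions the estimate is well below the size limit, so the 15-day time limit would be the one that actually triggers deletion.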

In the next section, we’ll dive into defining and implementing custom metrics for our order processing system.

4. Defining and Implementing Custom Metrics

Now that we have Prometheus set up and basic HTTP metrics implemented, let’s define and implement custom metrics specific to our order processing system.

Designing a Metrics Schema for Our Order Processing System

When designing metrics, it’s important to think about what insights we want to gain from our system. For our order processing system, we might want to track:

Order creation rate
Order processing time
Order status distribution
Payment processing success/failure rate
Inventory update operations
Shipping arrangement time

Let’s implement these metrics:

package metrics

import (
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
)

var (
    OrdersCreated = promauto.NewCounter(prometheus.CounterOpts{
        Name: "orders_created_total",
        Help: "The total number of created orders",
    })

    OrderProcessingTime = promauto.NewHistogram(prometheus.HistogramOpts{
        Name: "order_processing_seconds",
        Help: "Time taken to process an order",
        Buckets: prometheus.LinearBuckets(0, 30, 10), // upper bounds 0, 30, ..., 270 seconds; slower orders land in +Inf
    })

    OrderStatusGauge = promauto.NewGaugeVec(prometheus.GaugeOpts{
        Name: "orders_by_status",
        Help: "Number of orders by status",
    }, []string{"status"})

    PaymentProcessed = promauto.NewCounterVec(prometheus.CounterOpts{
        Name: "payments_processed_total",
        Help: "The total number of processed payments",
    }, []string{"status"})

    InventoryUpdates = promauto.NewCounter(prometheus.CounterOpts{
        Name: "inventory_updates_total",
        Help: "The total number of inventory updates",
    })

    ShippingArrangementTime = promauto.NewHistogram(prometheus.HistogramOpts{
        Name: "shipping_arrangement_seconds",
        Help: "Time taken to arrange shipping",
        Buckets: prometheus.LinearBuckets(0, 60, 5), // upper bounds 0, 60, ..., 240 seconds; slower runs land in +Inf
    })
)
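
It is easy to misread what LinearBuckets(start, width, count) produces, so the following sketch reimplements its documented behaviour with the standard library only (linearBuckets is a local stand-in, not the library function). It returns count upper bounds beginning at start, which means the highest bound is start + (count-1)*width rather than count*width.

```go
package main

import "fmt"

// linearBuckets mirrors prometheus.LinearBuckets: it returns count upper
// bounds, the first equal to start and each following one width higher.
func linearBuckets(start, width float64, count int) []float64 {
	bounds := make([]float64, count)
	for i := range bounds {
		bounds[i] = start + float64(i)*width
	}
	return bounds
}

func main() {
	fmt.Println("order processing:", linearBuckets(0, 30, 10))
	fmt.Println("shipping:        ", linearBuckets(0, 60, 5))
	// Starting at the first width instead covers the full 300 seconds.
	fmt.Println("alternative:     ", linearBuckets(30, 30, 10))
}
```

If a 300-second boundary matters for your dashboards or alerts, starting the buckets at 30 instead of 0 is one way to include it, and it also avoids spending a bucket on the bound 0.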

Implementing Application-Specific Metrics in Our Go Services

Now that we’ve defined our metrics, let’s implement them in our service:

package main

import (
    "time"

    "github.com/yourusername/order-processing-system/metrics"
)

func createOrder(order Order) error {
    startTime := time.Now()

    // Order creation logic...

    metrics.OrdersCreated.Inc()
    metrics.OrderProcessingTime.Observe(time.Since(startTime).Seconds())
    metrics.OrderStatusGauge.WithLabelValues("pending").Inc()

    return nil
}

func processPayment(payment Payment) error {
    // Payment processing logic...

    if paymentSuccessful {
        metrics.PaymentProcessed.WithLabelValues("success").Inc()
    } else {
        metrics.PaymentProcessed.WithLabelValues("failure").Inc()
    }

    return nil
}

func updateInventory(item Item) error {
    // Inventory update logic...

    metrics.InventoryUpdates.Inc()

    return nil
}

func arrangeShipping(order Order) error {
    startTime := time.Now()

    // Shipping arrangement logic...

    metrics.ShippingArrangementTime.Observe(time.Since(startTime).Seconds())

    return nil
}

Best Practices for Naming and Labeling Metrics

When naming and labeling metrics, consider these best practices:

Use a consistent naming scheme (e.g., <namespace>_<subsystem>_<name>)
Use clear, descriptive names
Include units in the metric name (e.g., _seconds, _bytes)
Use labels to differentiate instances of a metric, but be cautious of high cardinality
Keep the number of labels manageable
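
The warning about cardinality is easier to act on with numbers. Each distinct combination of label values is its own time series, and a histogram multiplies that again by its bucket count. The sketch below estimates the series count for a histogram like http_request_duration_seconds; the label value counts are invented for illustration, and the figure of 11 finite buckets corresponds to the client library’s default buckets.

```go
package main

import "fmt"

// labelCombinations returns the worst-case number of label value
// combinations: the product of the distinct values per label.
func labelCombinations(valuesPerLabel map[string]int) int {
	combos := 1
	for _, n := range valuesPerLabel {
		combos *= n
	}
	return combos
}

// histogramSeries returns the series a histogram exposes: one per finite
// bucket, plus the +Inf bucket, _sum and _count, per label combination.
func histogramSeries(combinations, finiteBuckets int) int {
	return combinations * (finiteBuckets + 3)
}

func main() {
	modest := labelCombinations(map[string]int{"method": 4, "endpoint": 20})
	risky := labelCombinations(map[string]int{"method": 4, "endpoint": 20, "user_id": 10000})

	fmt.Println("method + endpoint:          ", histogramSeries(modest, 11), "series")
	fmt.Println("method + endpoint + user_id:", histogramSeries(risky, 11), "series")
}
```

Adding a single unbounded label such as a user or order ID turns roughly a thousand series into millions, which is why identifiers belong in logs or traces rather than in metric labels.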

Instrumenting Key Components: API Endpoints, Database Operations, Temporal Workflows

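For API endpoints, we’ve already implemented basic instrumentation. For database operations, we can add metrics like this (DBOperationDuration is assumed to be a histogram vector with an operation label, defined in the metrics package alongside the metrics above):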

func (s *Store) GetOrder(ctx context.Context, id int64) (Order, error) {
    startTime := time.Now()
    defer func() {
        metrics.DBOperationDuration.WithLabelValues("GetOrder").Observe(time.Since(startTime).Seconds())
    }()

    // Existing GetOrder logic...
}

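For Temporal workflows, we can add metrics in our activity implementations (again assuming a WorkflowActivityDuration histogram vector with an activity label in the metrics package):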

func ProcessOrderActivity(ctx context.Context, order Order) error {
    startTime := time.Now()
    defer func() {
        metrics.WorkflowActivityDuration.WithLabelValues("ProcessOrder").Observe(time.Since(startTime).Seconds())
    }()

    // Existing ProcessOrder logic...
}

5. Creating Dashboards with Grafana

Now that we have our metrics set up, let’s visualize them using Grafana.

Installing and Configuring Grafana

First, let’s add Grafana to our docker-compose.yml:

services:
  # ... other services ...

  grafana:
    image: grafana/grafana:8.2.2
    ports:
      - 3000:3000
    volumes:
      - grafana_data:/var/lib/grafana

volumes:
  # ... other volumes ...
  grafana_data: {}

Connecting Grafana to Our Prometheus Data Source

Access Grafana at http://localhost:3000 (default credentials are admin/admin)
Go to Configuration > Data Sources
Click “Add data source” and select Prometheus
Set the URL to http://prometheus:9090 (this is the Docker service name)
Click “Save & Test”

Designing Effective Dashboards for Our Order Processing System

Let’s create a dashboard for our order processing system:

Click “Create” > “Dashboard”
Add a new panel

For our first panel, let’s create a graph of order creation rate:

In the query editor, enter: rate(orders_created_total[5m])
Set the panel title to “Order Creation Rate”
Under Settings, set the unit to “orders/second”

Let’s add another panel for order processing time:

Add a new panel
Query: histogram_quantile(0.95, rate(order_processing_seconds_bucket[5m]))
Title: “95th Percentile Order Processing Time”
Unit: “seconds”

For order status distribution:

Add a new panel
Query: orders_by_status
Visualization: Pie Chart
Title: “Order Status Distribution”

Continue adding panels for other metrics we’ve defined.
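
Since the percentile panel depends on histogram_quantile, it helps to know how that estimate is produced. The sketch below follows the same general approach in plain Go: find the bucket that contains the requested rank, then interpolate linearly inside it. The bucket type, the quantile function and the sample counts are illustrative; the real function works on per-second bucket rates rather than raw counts, which gives the same result.

```go
package main

import (
	"fmt"
	"math"
)

// inf is the upper bound of the mandatory +Inf bucket.
var inf = math.Inf(1)

// bucket holds a cumulative count of observations <= upperBound.
type bucket struct {
	upperBound float64
	count      float64
}

// quantile estimates the q-quantile (0 <= q <= 1) from cumulative buckets
// sorted by upper bound, where the last bucket must be +Inf.
func quantile(q float64, buckets []bucket) float64 {
	n := len(buckets)
	if n < 2 || !math.IsInf(buckets[n-1].upperBound, 1) {
		return math.NaN()
	}
	total := buckets[n-1].count
	if total == 0 {
		return math.NaN()
	}
	rank := q * total

	// Find the first bucket whose cumulative count reaches the rank.
	i := 0
	for i < n-1 && buckets[i].count < rank {
		i++
	}
	if i == n-1 {
		// The rank falls into +Inf: the best available answer is the
		// highest finite bound, so the estimate is capped there.
		return buckets[n-2].upperBound
	}
	if i == 0 && buckets[0].upperBound <= 0 {
		return buckets[0].upperBound
	}
	lower, below := 0.0, 0.0
	if i > 0 {
		lower, below = buckets[i-1].upperBound, buckets[i-1].count
	}
	inBucket := buckets[i].count - below
	if inBucket == 0 {
		return lower
	}
	// Assume observations are spread evenly within the bucket.
	return lower + (buckets[i].upperBound-lower)*(rank-below)/inBucket
}

func main() {
	processing := []bucket{{30, 50}, {60, 80}, {90, 100}, {inf, 100}}
	fmt.Println("median estimate:", quantile(0.5, processing))

	// 10% of observations exceed the highest finite bound of 270 seconds.
	slow := []bucket{{270, 90}, {inf, 100}}
	fmt.Println("capped p95 estimate:", quantile(0.95, slow))
}
```

Two practical consequences follow: the estimate is only as precise as the bucket widths, and it can never exceed the highest finite bucket bound, so any threshold you compare against should sit below that bound.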

Implementing Variable Templating for Flexible Dashboards

Grafana allows us to create variables that can be used across the dashboard. Let’s create a variable for time range:

Go to Dashboard Settings > Variables
Click “Add variable”
Name: time_range
Type: Interval
Values: 5m,15m,30m,1h,6h,12h,24h,7d

Now we can use this in our queries like this: rate(orders_created_total[$time_range])

Best Practices for Dashboard Design and Organization

Group related panels together
Use consistent color schemes
Include a description for each panel
Use appropriate visualizations for each metric type
Consider creating separate dashboards for different aspects of the system (e.g., Orders, Inventory, Shipping)

In the next section, we’ll set up alerting rules to notify us of potential issues in our system.

6. Implementing Alerting Rules

Now that we have our metrics and dashboards set up, let’s implement alerting to proactively notify us of potential issues in our system.

Designing an Alerting Strategy for Our System

When designing alerts, consider the following principles:

Alert on symptoms, not causes
Ensure alerts are actionable
Avoid alert fatigue by only alerting on critical issues
Use different severity levels for different types of issues

For our order processing system, we might want to alert on:

High error rate in order processing
Slow order processing time
Unusual spike or drop in order creation rate
Low inventory levels
High rate of payment failures

Implementing Prometheus Alerting Rules

Let’s create an alerts.yml file in our Prometheus configuration directory:

groups:
- name: order_processing_alerts
  rules:
  - alert: HighOrderProcessingErrorRate
    expr: rate(order_processing_errors_total[5m]) / rate(orders_created_total[5m]) > 0.05
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: High order processing error rate
      description: "Error rate is {{ $value }} over the last 5 minutes"

  - alert: SlowOrderProcessing
    expr: histogram_quantile(0.95, rate(order_processing_seconds_bucket[5m])) > 300
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: Slow order processing
      description: "95th percentile of order processing time is {{ $value }} seconds over the last 5 minutes"

  - alert: UnusualOrderRate
    expr: abs(rate(orders_created_total[1h]) - rate(orders_created_total[1h] offset 1d)) > (rate(orders_created_total[1h] offset 1d) * 0.3)
    for: 30m
    labels:
      severity: warning
    annotations:
      summary: Unusual order creation rate
      description: "Order creation rate has changed by more than 30% compared to the same time yesterday"

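  - alert: LowInventory
    # Assumed threshold, duration and severity; inventory_level is a gauge you would need to export.
    expr: inventory_level < 10
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: Low inventory level
      description: "Inventory level is {{ $value }}"

  - alert: HighPaymentFailureRate
    # sum() drops the status label so both sides of the division match.
    expr: sum(rate(payments_processed_total{status="failure"}[15m])) / sum(rate(payments_processed_total[15m])) > 0.1
    for: 15m
    labels:
      severity: critical
    annotations:
      summary: High payment failure rate
      description: "Payment failure rate is {{ $value }} over the last 15 minutes"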

Update your prometheus.yml to include this alerts file:

rule_files:
  - "alerts.yml"
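
The for field in these rules is what keeps a brief blip from paging anyone: the expression must stay true across consecutive evaluations for the whole duration before the alert fires. The sketch below models that state machine in plain Go. The type and method names are illustrative, and times are plain seconds to keep the example short.

```go
package main

import "fmt"

type alertState int

const (
	inactive alertState = iota
	pending
	firing
)

func (s alertState) String() string {
	return [...]string{"inactive", "pending", "firing"}[s]
}

// alertRule tracks how long a rule's condition has been continuously true.
type alertRule struct {
	forSeconds  float64 // the rule's "for" duration
	activeSince float64 // when the condition first became true
	state       alertState
}

// evaluate is called once per evaluation interval with the current time
// and whether the rule expression returned a result.
func (r *alertRule) evaluate(now float64, conditionTrue bool) alertState {
	if !conditionTrue {
		// Any false evaluation resets the clock.
		r.state = inactive
		return r.state
	}
	if r.state == inactive {
		r.activeSince = now
		r.state = pending
	}
	if now-r.activeSince >= r.forSeconds {
		r.state = firing
	}
	return r.state
}

func main() {
	rule := &alertRule{forSeconds: 300} // for: 5m
	steps := []struct {
		now  float64
		cond bool
	}{{0, true}, {150, true}, {300, true}, {315, false}, {330, true}}

	for _, s := range steps {
		fmt.Printf("t=%vs condition=%v -> %v\n", s.now, s.cond, rule.evaluate(s.now, s.cond))
	}
}
```

Only firing alerts are sent to Alertmanager, so a longer for value trades detection speed for fewer false alarms.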

Setting Up Alertmanager for Alert Routing and Grouping

Now, let’s set up Alertmanager to handle our alerts. Add Alertmanager to your docker-compose.yml:

services:
  # ... other services ...

  alertmanager:
    image: prom/alertmanager:v0.23.0
    ports:
      - 9093:9093
    volumes:
      - ./alertmanager:/etc/alertmanager
    command:
      - '--config.file=/etc/alertmanager/alertmanager.yml'

Create an alertmanager.yml in the ./alertmanager directory:

route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'email-notifications'

receivers:
- name: 'email-notifications'
  email_configs:
  - to: 'team@example.com'
    from: 'alertmanager@example.com'
    smarthost: 'smtp.example.com:587'
    auth_username: 'alertmanager@example.com'
    auth_identity: 'alertmanager@example.com'
    auth_password: 'password'

Update your prometheus.yml to point to Alertmanager:

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - alertmanager:9093

Configuring Notification Channels

In the Alertmanager configuration above, we’ve set up email notifications. You can also configure other channels like Slack, PagerDuty, or custom webhooks.

Implementing Alert Severity Levels and Escalation Policies

In our alerts, we’ve used severity labels. We can use these in Alertmanager to implement different routing or notification strategies based on severity:

route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'email-notifications'
  routes:
  - match:
      severity: critical
    receiver: 'pagerduty-critical'
  - match:
      severity: warning
    receiver: 'slack-warnings'

receivers:
- name: 'email-notifications'
  email_configs:
  - to: 'team@example.com'
- name: 'pagerduty-critical'
  pagerduty_configs:
  - service_key: '<your-pagerduty-service-key>'
- name: 'slack-warnings'
  slack_configs:
  - api_url: '<your-slack-webhook-url>'
    channel: '#alerts'
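
The routing logic in this configuration can be summarised as: try the child routes in order, use the first one whose match labels all equal the alert’s labels, and otherwise fall back to the top-level receiver. Here is a plain Go sketch of that decision. It is a simplified model that ignores options such as continue and regular expression matchers, and the route type and pickReceiver function are illustrative names.

```go
package main

import "fmt"

// route pairs a set of required label values with a receiver name.
type route struct {
	match    map[string]string
	receiver string
}

// matches reports whether every required label has the required value.
func (r route) matches(labels map[string]string) bool {
	for name, want := range r.match {
		if labels[name] != want {
			return false
		}
	}
	return true
}

// pickReceiver returns the receiver of the first matching child route,
// or the default receiver when no child route matches.
func pickReceiver(defaultReceiver string, routes []route, labels map[string]string) string {
	for _, r := range routes {
		if r.matches(labels) {
			return r.receiver
		}
	}
	return defaultReceiver
}

// severityRoutes mirrors the child routes in the configuration above.
var severityRoutes = []route{
	{match: map[string]string{"severity": "critical"}, receiver: "pagerduty-critical"},
	{match: map[string]string{"severity": "warning"}, receiver: "slack-warnings"},
}

func main() {
	for _, severity := range []string{"critical", "warning", "info"} {
		labels := map[string]string{"alertname": "Example", "severity": severity}
		fmt.Printf("%-8s -> %s\n", severity, pickReceiver("email-notifications", severityRoutes, labels))
	}
}
```

The fallback matters: an alert with an unexpected or missing severity label still reaches the email receiver instead of being dropped.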


7. Monitoring Database Performance

Monitoring database performance is crucial for maintaining a responsive and reliable system. Let’s set up monitoring for our PostgreSQL database.

Implementing the Postgres Exporter for Prometheus

First, add the Postgres exporter to your docker-compose.yml:

services:
  # ... other services ...

  postgres_exporter:
    image: wrouesnel/postgres_exporter:latest
    environment:
      DATA_SOURCE_NAME: "postgresql://user:password@postgres:5432/dbname?sslmode=disable"
    ports:
      - 9187:9187

Make sure to replace user, password, and dbname with your actual PostgreSQL credentials.

Key Metrics to Monitor for Postgres Performance

Some important PostgreSQL metrics to monitor include:

Number of active connections
Database size
Query execution time
Cache hit ratio
Replication lag (if using replication)
Transaction rate
Tuple operations (inserts, updates, deletes)

Creating a Database Performance Dashboard in Grafana

Let’s create a new dashboard for database performance:

Create a new dashboard in Grafana
Add a panel for active connections:
- Query: pg_stat_activity_count{datname="your_database_name"}
- Title: “Active Connections”
Add a panel for database size:
- Query: pg_database_size_bytes{datname="your_database_name"}
- Title: “Database Size”
- Unit: bytes(IEC)
Add a panel for transaction rate:
- Query: rate(pg_stat_database_xact_commit{datname="your_database_name"}[5m]) + rate(pg_stat_database_xact_rollback{datname="your_database_name"}[5m])
- Title: “Transactions per Second”
Add a panel for cache hit ratio:
- Query: pg_stat_database_blks_hit{datname="your_database_name"} / (pg_stat_database_blks_hit{datname="your_database_name"} + pg_stat_database_blks_read{datname="your_database_name"})
- Title: “Cache Hit Ratio”

Setting Up Alerts for Database Issues

Let’s add some database-specific alerts to our alerts.yml:

  - alert: HighDatabaseConnections
    expr: pg_stat_activity_count > 100
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: High number of database connections
      description: "There are {{ $value }} active database connections"

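  - alert: LowCacheHitRatio
    # Assumed threshold, duration and severity; tune them for your workload.
    expr: pg_stat_database_blks_hit / (pg_stat_database_blks_hit + pg_stat_database_blks_read) < 0.9
    for: 15m
    labels:
      severity: warning
    annotations:
      summary: Low database cache hit ratio
      description: "Cache hit ratio is {{ $value }}"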



8. Monitoring Temporal Workflows

Monitoring Temporal workflows is essential for ensuring the reliability and performance of our order processing system.

Implementing Temporal Metrics in Our Go Services

Temporal provides a metrics client that we can use to expose metrics to Prometheus. Let’s update our Temporal worker to include metrics:

<pre class="brush:php;toolbar:false">import (
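    "log"
    "time"

    prom "github.com/prometheus/client_golang/prometheus"
    "github.com/uber-go/tally/v4"
    "github.com/uber-go/tally/v4/prometheus"
    "go.temporal.io/sdk/client"
    sdktally "go.temporal.io/sdk/contrib/tally"
    "go.temporal.io/sdk/worker"
)

// newPrometheusScope builds a tally scope that reports to Prometheus.
// The Go SDK has no built-in Prometheus handler; its contrib/tally package
// adapts a tally scope to the SDK's MetricsHandler interface.
func newPrometheusScope(cfg prometheus.Configuration) tally.Scope {
    reporter, err := cfg.NewReporter(prometheus.ConfigurationOptions{
        Registry: prom.NewRegistry(),
        OnError: func(err error) {
            log.Println("Error in Prometheus reporter:", err)
        },
    })
    if err != nil {
        log.Fatalln("Unable to create Prometheus reporter", err)
    }

    scope, _ := tally.NewRootScope(tally.ScopeOptions{
        CachedReporter:  reporter,
        Separator:       prometheus.DefaultSeparator,
        SanitizeOptions: &sdktally.PrometheusSanitizeOptions,
    }, time.Second)
    return sdktally.NewPrometheusNamingScope(scope)
}

func main() {
    // ... other setup ...

    // Create the metrics handler. The reporter serves /metrics on this address;
    // the port is an assumption, add it to prometheus.yml as a scrape target.
    metricsHandler := sdktally.NewMetricsHandler(newPrometheusScope(prometheus.Configuration{
        ListenAddress: "0.0.0.0:9464",
        TimerType:     "histogram",
    }))

    // Create Temporal client with metrics
    c, err := client.Dial(client.Options{
        MetricsHandler: metricsHandler,
    })
    if err != nil {
        log.Fatalln("Unable to create Temporal client", err)
    }
    defer c.Close()

    // Workers created from this client report through its metrics handler
    w := worker.New(c, "order-processing-task-queue", worker.Options{})

    // ... register workflows and activities ...

    // Run the worker
    err = w.Run(worker.InterruptCh())
    if err != nil {
        log.Fatalln("Unable to start worker", err)
    }
}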

Key Metrics to Monitor for Temporal Workflows

Important Temporal metrics to monitor include:

Workflow start rate
Workflow completion rate
Workflow execution time
Activity success/failure rate
Activity execution time
Task queue latency

Creating a Temporal Workflow Dashboard in Grafana

Let’s create a dashboard for Temporal workflows:

Create a new dashboard in Grafana
Add a panel for workflow start rate:
- Query: rate(temporal_workflow_start_total[5m])
- Title: “Workflow Start Rate”
Add a panel for workflow completion rate:
- Query: rate(temporal_workflow_completed_total[5m])
- Title: “Workflow Completion Rate”
Add a panel for workflow execution time:
- Query: histogram_quantile(0.95, rate(temporal_workflow_execution_time_bucket[5m]))
- Title: “95th Percentile Workflow Execution Time”
- Unit: seconds
Add a panel for activity success rate:
- Query: rate(temporal_activity_success_total[5m]) / (rate(temporal_activity_success_total[5m]) + rate(temporal_activity_fail_total[5m]))
- Title: “Activity Success Rate”

Setting Up Alerts for Workflow Issues

Let’s add some Temporal-specific alerts to our alerts.yml:

  - alert: HighWorkflowFailureRate
    expr: rate(temporal_workflow_failed_total[15m]) / rate(temporal_workflow_completed_total[15m]) > 0.05
    for: 15m
    labels:
      severity: critical
    annotations:
      summary: High workflow failure rate
      description: "Workflow failure rate is {{ $value }} over the last 15 minutes"

  - alert: LongRunningWorkflow
    expr: histogram_quantile(0.95, rate(temporal_workflow_execution_time_bucket[1h])) > 3600
    for: 30m
    labels:
      severity: warning
    annotations:
      summary: Long-running workflows detected
      description: "95th percentile of workflow execution time is over 1 hour"

These alerts will help you detect issues with your Temporal workflows, such as high failure rates or unexpectedly long-running workflows.

In the next sections, we’ll cover some advanced Prometheus techniques and discuss testing and validation of our monitoring setup.

9. Advanced Prometheus Techniques

As our monitoring system grows more complex, we can leverage some advanced Prometheus techniques to improve its efficiency and capabilities.

Using Recording Rules for Complex Queries and Aggregations

Recording rules allow you to precompute frequently needed or computationally expensive expressions and save their result as a new set of time series. This can significantly speed up the evaluation of dashboards and alerts.

Let’s add some recording rules to our Prometheus configuration. Create a rules.yml file:

groups:
- name: example_recording_rules
  interval: 5m
  rules:
  - record: job:order_processing_rate:5m
    expr: rate(orders_created_total[5m])

  - record: job:order_processing_error_rate:5m
    expr: rate(order_processing_errors_total[5m]) / rate(orders_created_total[5m])

  - record: job:payment_success_rate:5m
    # sum() drops the status label so both sides of the division match
    expr: sum(rate(payments_processed_total{status="success"}[5m])) / sum(rate(payments_processed_total[5m]))

Add this file to your Prometheus configuration:

rule_files:
  - "alerts.yml"
  - "rules.yml"

Now you can use these precomputed metrics in your dashboards and alerts, which can be especially helpful for complex queries that you use frequently.

Implementing Push Gateway for Batch Jobs and Short-Lived Processes

The Pushgateway allows you to push metrics from jobs that can’t be scraped, such as batch jobs or serverless functions. Let’s add a Pushgateway to our docker-compose.yml:

services:
  # ... other services ...

  pushgateway:
    image: prom/pushgateway
    ports:
      - 9091:9091

Now, you can push metrics to the Pushgateway from your batch jobs or short-lived processes. Here’s an example using the Go client:

import (
    "log"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/push"
)

func runBatchJob() {
    // Define a counter for the batch job
    batchJobCounter := prometheus.NewCounter(prometheus.CounterOpts{
        Name: "batch_job_processed_total",
        Help: "Total number of items processed by the batch job",
    })

    // Run your batch job and update the counter
    // ...

    // Push the metric to the Pushgateway
    pusher := push.New("http://pushgateway:9091", "batch_job")
    pusher.Collector(batchJobCounter)
    if err := pusher.Push(); err != nil {
        log.Printf("Could not push to Pushgateway: %v", err)
    }
}

Don’t forget to add the Pushgateway as a target in your Prometheus configuration:

scrape_configs:
  # ... other configs ...

  - job_name: 'pushgateway'
    static_configs:
      - targets: ['pushgateway:9091']

Federated Prometheus Setups for Large-Scale Systems

For large-scale systems, you might need to set up Prometheus federation, where one Prometheus server scrapes data from other Prometheus servers. This allows you to aggregate metrics from multiple Prometheus instances.

Here’s an example configuration for a federated Prometheus setup:

scrape_configs:
  - job_name: 'federate'
    scrape_interval: 15s
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="order_processing_api"}'
        - '{job="postgres"}'
    static_configs:
      - targets:
        - 'prometheus-1:9090'
        - 'prometheus-2:9090'

This configuration allows a higher-level Prometheus server to scrape specific metrics from other Prometheus servers.

Using Exemplars for Tracing Integration

Exemplars allow you to link metrics to trace data, providing a way to drill down from a high-level metric to a specific trace. This is particularly useful when integrating Prometheus with distributed tracing systems like Jaeger or Zipkin.

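To use exemplars, you need to enable exemplar storage in Prometheus. It is controlled by a feature flag rather than a setting in prometheus.yml, so add the flag to the command list of the prometheus service in docker-compose.yml:

    command:
      # ... other flags ...
      - '--enable-feature=exemplar-storage'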

Then, when instrumenting your code, you can add exemplars to your metrics:

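import (
    "time"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
)

var (
    orderProcessingDuration = promauto.NewHistogramVec(
        prometheus.HistogramOpts{
            Name: "order_processing_duration_seconds",
            Help: "Duration of order processing in seconds",
            Buckets: prometheus.DefBuckets,
        },
        []string{"status"},
    )
)

func processOrder(order Order) {
    start := time.Now()
    // Process the order...
    duration := time.Since(start)

    // Observe takes only a value; histograms also implement
    // prometheus.ExemplarObserver, which accepts exemplar labels.
    // Exemplars are only exposed when the metrics endpoint serves the
    // OpenMetrics format (promhttp.HandlerOpts{EnableOpenMetrics: true}).
    observer := orderProcessingDuration.WithLabelValues(order.Status)
    observer.(prometheus.ExemplarObserver).ObserveWithExemplar(
        duration.Seconds(),
        prometheus.Labels{"traceID": getCurrentTraceID()},
    )
}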

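This allows you to link from a spike in order processing duration directly to the trace of a slow order, greatly aiding in debugging and performance analysis. Note that client_golang only exposes exemplars in the OpenMetrics exposition format, so the metrics handler needs to be created with promhttp.HandlerOpts{EnableOpenMetrics: true}.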

10. Testing and Validation

Ensuring the reliability of your monitoring system is crucial. Let’s explore some strategies for testing and validating our Prometheus setup.

Unit Testing Metric Instrumentation

When unit testing your Go code, you can use the testutil package from client_golang (github.com/prometheus/client_golang/prometheus/testutil) to verify that your metrics are being updated correctly. For deterministic metrics such as counters, testutil.CollectAndCompare can check the full exposition text. A duration histogram records a different value on every run, so here we only assert that a series was recorded:

import (
    "testing"

    "github.com/prometheus/client_golang/prometheus/testutil"
)

func TestOrderProcessing(t *testing.T) {
    // Process an order
    processOrder(Order{ID: 1, Status: "completed"})

    // The observed duration varies between runs, so comparing bucket counts
    // and the _sum value against fixed text would be flaky. Instead, check
    // that exactly one series (status="completed") was collected.
    if got := testutil.CollectAndCount(orderProcessingDuration); got != 1 {
        t.Errorf("expected 1 collected series, got %d", got)
    }
}

Integration Testing for Prometheus Scraping

To test that Prometheus is correctly scraping your metrics, you can set up an integration test that starts your application, waits for Prometheus to scrape it, and then queries Prometheus to verify the metrics:

import (
    "context"
    "testing"
    "time"

    "github.com/prometheus/client_golang/api"
    v1 "github.com/prometheus/client_golang/api/prometheus/v1"
    "github.com/prometheus/common/model"
)

func TestPrometheusIntegration(t *testing.T) {
    // Start your application
    go startApp()

    // Wait for Prometheus to scrape (adjust the sleep time as needed)
    time.Sleep(30 * time.Second)

    // Query Prometheus
    client, err := api.NewClient(api.Config{
        Address: "http://localhost:9090",
    })
    if err != nil {
        t.Fatalf("Error creating client: %v", err)
    }

    v1api := v1.NewAPI(client)
    ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()
    result, warnings, err := v1api.Query(ctx, "order_processing_duration_seconds_count", time.Now())
    if err != nil {
        t.Fatalf("Error querying Prometheus: %v", err)
    }
    if len(warnings) > 0 {
        t.Logf("Warnings: %v", warnings)
    }

    // Check the result. An instant query on a plain selector returns a
    // vector; use a checked assertion so a different type fails cleanly.
    vector, ok := result.(model.Vector)
    if !ok {
        t.Fatalf("Expected a vector result, got %T", result)
    }
    if vector.Len() == 0 {
        t.Errorf("Expected non-empty result")
    }
}

Load Testing and Observing Metrics Under Stress

It’s important to verify that your monitoring system performs well under load. You can use tools like hey or vegeta to generate load on your system while observing your metrics:

hey -n 10000 -c 100 http://localhost:8080/orders

While the load test is running, observe your Grafana dashboards and check that your metrics are updating as expected and that Prometheus is able to keep up with the increased load.

Validating Alerting Rules and Notification Channels

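To test your alerting rules, you can temporarily adjust the thresholds so that they trigger. To test your notification channels independently of the rules, you can post a test alert directly to Alertmanager's API. Note that this is the Alertmanager API, not the Prometheus API, and that current Alertmanager releases only serve the v2 endpoint, which expects a JSON array of alerts:

curl -H "Content-Type: application/json" -d '[
  {
    "labels": {
      "alertname": "HighOrderProcessingErrorRate",
      "severity": "critical"
    },
    "annotations": {
      "summary": "High order processing error rate"
    }
  }
]' http://localhost:9093/api/v2/alerts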

This will send a test alert to your Alertmanager, allowing you to verify that your notification channels are working correctly.

11. Challenges and Considerations

As you implement and scale your monitoring system, keep these challenges and considerations in mind:

Managing Cardinality in High-Dimensional Data

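High cardinality can lead to performance issues in Prometheus, because every unique combination of label values is stored as a separate time series. Be cautious when adding labels to metrics, especially labels with many possible values (like user IDs or IP addresses). Instead, consider recording distributions with histogram metrics rather than per-entity series, or reducing the cardinality by grouping similar values into a small, fixed set.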
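As a rough illustration of grouping similar values, the sketch below normalizes two common sources of unbounded label values before they are used as labels. The helper names statusClass and routeLabel are hypothetical and not part of the code from earlier in this series.

```go
package main

import (
	"fmt"
	"strings"
)

// statusClass collapses an HTTP status code into its class ("2xx",
// "4xx", ...), so the label has at most six distinct values.
func statusClass(code int) string {
	if code < 100 || code > 599 {
		return "other"
	}
	return fmt.Sprintf("%dxx", code/100)
}

// routeLabel replaces purely numeric path segments with ":id", so that
// /orders/42 and /orders/43 share a single label value.
func routeLabel(path string) string {
	segments := strings.Split(path, "/")
	for i, s := range segments {
		if s != "" && strings.Trim(s, "0123456789") == "" {
			segments[i] = ":id"
		}
	}
	return strings.Join(segments, "/")
}

func main() {
	fmt.Println(statusClass(404))
	fmt.Println(routeLabel("/orders/42/items/7"))
}
```

The normalized strings, rather than the raw status code or request path, would then be passed to WithLabelValues, which keeps the number of series bounded no matter how many orders pass through the system.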

Scaling Prometheus for Large-Scale Systems

For large-scale systems, consider:

Using the Pushgateway for batch jobs
Implementing federation for large-scale setups
Using remote storage solutions for long-term storage of metrics

Ensuring Monitoring System Reliability and Availability

Your monitoring system is critical infrastructure. Consider:

Implementing high availability for Prometheus and Alertmanager
Monitoring your monitoring system (meta-monitoring)
Regularly backing up your Prometheus data

Security Considerations for Metrics and Alerting

Ensure that:

Access to Prometheus and Grafana is properly secured
Sensitive information is not exposed in metrics or alerts
TLS is used for all communication in your monitoring stack

Handling Transient Issues and Flapping Alerts

To reduce alert noise:

Use appropriate time windows in your alerting rules
Implement alert grouping in Alertmanager
Consider using alert inhibition for related alerts

12. Next Steps and Preview of Part 5

In this post, we've covered comprehensive monitoring and alerting for our order processing system using Prometheus and Grafana. We've set up custom metrics, created informative dashboards, implemented alerting, and explored advanced techniques and considerations.

In the next part of our series, we'll focus on distributed tracing and logging. We'll cover:

Implementing distributed tracing with OpenTelemetry
Setting up centralized logging with the ELK stack
Correlating logs, traces, and metrics for effective debugging
Implementing log aggregation and analysis
Best practices for logging in a microservices architecture

Stay tuned as we continue to enhance our order processing system, focusing next on gaining deeper insights into our distributed system's behavior and performance!

Need Help?

Are you facing challenging problems, or do you need an external perspective on a new idea or project? I can help! Whether you're looking to build a technology proof of concept before making a larger investment, or you need guidance on difficult issues, I'm here to assist.

Services Offered:

Problem-Solving: Tackling complex issues with innovative solutions.
Consultation: Providing expert advice and fresh perspectives on your projects.
Proof of Concept: Developing preliminary models to test and validate your ideas.

If you're interested in working with me, please reach out via email at hungaikevin@gmail.com.

Let's turn your challenges into opportunities!
