
Pipeline Concurrency Pattern in Go: A Comprehensive Visual Guide

DDD ・ Original ・ 2024-12-31

⚠️ How to go about this series?

1. Run Every Example: Don't just read the code. Type it out, run it, and observe the behavior.
2. Experiment and Break Things: Remove sleeps and see what happens, change channel buffer sizes, modify goroutine counts. Breaking things teaches you how they work.
3. Reason About Behavior: Before running modified code, try predicting the outcome. When you see unexpected behavior, pause and think why. Challenge the explanations.
4. Build Mental Models: Each visualization represents a concept. Try drawing your own diagrams for modified code.


In our previous post, we explored the Generator concurrency pattern, the building block of Go's other concurrency patterns. You can give it a read here:


Generator Concurrency Pattern in Go: A Visual Guide

Souvik Kar Mahapatra ・ Dec 25

#go #tutorial #programming #learning

Now, let's look at how these primitives combine to form powerful patterns that solve real-world problems.

In this post we'll cover the Pipeline pattern and try to visualize it. So let's gear up, as we'll be hands-on throughout the process.


Pipeline Pattern

A pipeline is like an assembly line in a factory, where each stage performs a specific task on the data and passes the result to the next stage.

We build pipelines by connecting goroutines with channels, where each goroutine represents a stage that receives data, processes it, and sends it to the next stage.


Let's implement a simple pipeline that:

  1. Generates numbers
  2. Squares them
  3. Prints the results

package main

import "fmt"

// Stage 1: Generate numbers
func generate(nums ...int) <-chan int {
    out := make(chan int)
    go func() {
        defer close(out)
        for _, n := range nums {
            out <- n
        }
    }()
    return out
}

// Stage 2: Square numbers
func square(in <-chan int) <-chan int {
    out := make(chan int)
    go func() {
        defer close(out)
        for n := range in {
            out <- n * n
        }
    }()
    return out
}

// Stage 3: Print numbers
func print(in <-chan int) {
    for n := range in {
        fmt.Printf("%d ", n)
    }
    fmt.Println()
}

func main() {
    // Connect the pipeline
    numbers := generate(2, 3, 4)    // Stage 1
    squares := square(numbers)       // Stage 2
    print(squares)                   // Stage 3
}

✏️ Quick byte

<-chan int denotes a receive-only channel.
A channel of type <-chan int can only be used to receive values, not to send them. This is useful to enforce stricter communication patterns and prevent accidental writes to the channel by the receiver.

chan int denotes a bidirectional channel.
A channel of type chan int can be used to both send and receive values.

Let's go ahead and visualize the above example:


Here you can see that each building block of the pipeline is a goroutine following the generator pattern. This means that as soon as data is ready at any stage, the next stage in the pipeline can start processing it, unlike sequential processing.

Error Handling in Pipelines

Core principles should be:

  1. Each stage knows exactly what to do with both good and bad values
  2. Errors can't get lost in the pipeline
  3. Bad values don't cause panics
  4. The error message carries context about what went wrong
  5. The pipeline can be extended with more stages, and they'll all handle errors consistently

Let's update our code with some proper error handling.

package main

import "fmt"

type Result struct {
    Value int
    Err   error
}

func generateWithError(nums ...int) <-chan Result {
    out := make(chan Result)
    go func() {
        defer close(out)
        for _, n := range nums {
            if n < 0 {
                out <- Result{Err: fmt.Errorf("negative number: %d", n)}
                return
            }
            out <- Result{Value: n}
        }
    }()
    return out
}

func squareWithError(in <-chan Result) <-chan Result {
    out := make(chan Result)
    go func() {
        defer close(out)
        for r := range in {
            if r.Err != nil {
                out <- r  // Forward the error
                continue
            }
            out <- Result{Value: r.Value * r.Value}
        }
    }()
    return out
}

func main() {
    // Using pipeline with error handling
    for result := range squareWithError(generateWithError(2, -3, 4)) {
        if result.Err != nil {
            fmt.Printf("Error: %v\n", result.Err)
            continue
        }
        fmt.Printf("Result: %d\n", result.Value)
    }
}

Why Use Pipeline Pattern?

Let's take an example to understand better, we have a data processing workflow that follows the pipeline pattern as shown below.


  1. Each stage in a pipeline operates independently, communicating only through channels. This enables several benefits:

- Each stage can be developed, tested, and modified independently
- Changes to one stage's internals don't affect other stages
- Easy to add new stages or modify existing ones
- Clear separation of concerns


  2. Pipeline patterns naturally enable parallel/concurrent processing. Each stage can process different data simultaneously as soon as the data is available.


And the best part? We can run multiple instances of each stage (workers) for higher concurrency requirements, like so:


Hey, but isn't that the Fan-In and Fan-Out concurrency pattern?

Bingo! Good catch right there. It is indeed the Fan-Out, Fan-In pattern, which is a specific type of pipeline pattern. We'll cover it in detail in our next post, so fret not ;)

Real world use case

Processing images in a pipeline:


or something as complex as a log-processing pipeline.


Pipeline scaling patterns


Horizontal Scaling (Fan-Out, Fan-In)

This pattern is ideal for CPU-bound operations where work can be processed independently. The pipeline distributes work across multiple workers and then recombines the results. This is particularly effective when:

  1. Processing is CPU-intensive (data transformations, calculations)
  2. Tasks can be processed independently
  3. You have multiple CPU cores available

Buffered Pipeline

This pattern helps manage speed mismatches between pipeline stages. The buffer acts as a shock absorber, allowing fast stages to work ahead without being blocked by slower stages. This is useful when:

  1. Different stages have varying processing speeds
  2. You want to maintain steady throughput
  3. Memory usage for buffering is acceptable
  4. You need to handle burst processing

Batched Processing

This pattern optimizes I/O-bound operations by grouping multiple items into a single batch. Instead of processing items one at a time, it collects them into groups and processes them together. This is effective when:

  1. You're working with external systems (databases, APIs)
  2. Network round-trips are expensive
  3. The operation has significant fixed overhead per request
  4. You need to optimize throughput over latency

Each of these patterns can be combined as needed. For example, you might use batched processing with horizontal scaling, where multiple workers each process batches of items. The key is understanding your bottlenecks and choosing the appropriate pattern to address them.


That wraps up our deep dive into the Pipeline pattern! Coming up next, we'll explore the Fan-In and Fan-Out concurrency patterns, where we'll see how to split work across multiple workers and recombine their results.

If you found this post helpful, have any questions, or want to share your own experiences with pipelines, I'd love to hear from you in the comments below. Your insights and questions help make these explanations even better for everyone.

If you missed our visual guide to Golang's goroutines and channels, check it out here:


Understanding and visualizing Goroutines and Channels in Golang

Souvik Kar Mahapatra ・ Dec 20

#go #programming #learning #tutorial

Stay tuned for more Go concurrency patterns!

