Golang has proven to be very suitable for concurrent programming. Goroutine is more readable, elegant and efficient than asynchronous programming. This article proposes a Pipeline execution model suitable for implementation by Golang, which is suitable for batch processing of large amounts of data (ETL).
imagined such application scenarios: (Recommended Learning: Go )
## Load user reviews from Database A (Cassandra) (huge quantity, for example, for example 1 billion); associate user information from database B (MySQL) based on the user ID of each comment; call the NLP service (natural language processing) to process each comment; write the processing results to database C (ElasticSearch). Due to various problems encountered in the application, these requirements are summarized:Requirement 1: Data should be processed in batches, for example, 100 items per batch are specified. When a problem occurs (such as any database failure), it will be interrupted, and checkpoint will be used to resume from the interruption the next time the program starts.
Requirement 2: Set a reasonable number of concurrencies for each process, so that the database and NLP services have a reasonable load (without affecting other businesses, occupy as many resources as possible to improve ETL performance). For example, steps (1)-(4) set the concurrency numbers to 1, 4, 8, and 2 respectively.
Reusable Pipeline module
In order to complete the ETL work more efficiently, I abstracted Pipeline into modules. I'll paste the code first and then analyze the meaning. The module can be used directly, and the main interfaces used are: NewPipeline, Async, and Wait. Using this Pipeline component, our ETL program will be simple, efficient, and reliable, freeing programmers from cumbersome concurrent process control:package main import "log" func main() { //恢复上次执行的checkpoint,如果是第一次执行就获取一个初始值。 checkpoint := loadCheckpoint() //工序(1)在pipeline外执行,最后一个工序是保存checkpoint pipeline := NewPipeline(4, 8, 2, 1) for { //(1) //加载100条数据,并修改变量checkpoint //data是数组,每个元素是一条评论,之后的联表、NLP都直接修改data里的每条记录。 data, err := extractReviewsFromA(&checkpoint, 100) if err != nil { log.Print(err) break } //这里有个Golang著名的坑。 //“checkpoint”是循环体外的变量,它在内存中只有一个实例并在循环中不断被修改,所以不能在异步中使用它。 //这里创建一个副本curCheckpoint,储存本次循环的checkpoint。 curCheckpoint := checkpoint ok := pipeline.Async(func() error { //(2) return joinUserFromB(data) }, func() error { //(3) return nlp(data) }, func() error { //(4) return loadDataToC(data) }, func() error { //(5)保存checkpoint log.Print("done:", curCheckpoint) return saveCheckpoint(curCheckpoint) }) if !ok { break } if len(data) < 100 { break } //处理完毕 } err := pipeline.Wait() if err != nil { log.Print(err) } }
The above is the detailed content of How golang handles big data. For more information, please follow other related articles on the PHP Chinese website!

go语言有缩进。在go语言中,缩进直接使用gofmt工具格式化即可(gofmt使用tab进行缩进);gofmt工具会以标准样式的缩进和垂直对齐方式对源代码进行格式化,甚至必要情况下注释也会重新格式化。

go语言叫go的原因:想表达这门语言的运行速度、开发速度、学习速度(develop)都像gopher一样快。gopher是一种生活在加拿大的小动物,go的吉祥物就是这个小动物,它的中文名叫做囊地鼠,它们最大的特点就是挖洞速度特别快,当然可能不止是挖洞啦。

本篇文章带大家了解一下golang 的几种常用的基本数据类型,如整型,浮点型,字符,字符串,布尔型等,并介绍了一些常用的类型转换操作。

是,TiDB采用go语言编写。TiDB是一个分布式NewSQL数据库;它支持水平弹性扩展、ACID事务、标准SQL、MySQL语法和MySQL协议,具有数据强一致的高可用特性。TiDB架构中的PD储存了集群的元信息,如key在哪个TiKV节点;PD还负责集群的负载均衡以及数据分片等。PD通过内嵌etcd来支持数据分布和容错;PD采用go语言编写。

在写 Go 的过程中经常对比这两种语言的特性,踩了不少坑,也发现了不少有意思的地方,下面本篇就来聊聊 Go 自带的 HttpClient 的超时机制,希望对大家有所帮助。

go语言需要编译。Go语言是编译型的静态语言,是一门需要编译才能运行的编程语言,也就说Go语言程序在运行之前需要通过编译器生成二进制机器码(二进制的可执行文件),随后二进制文件才能在目标机器上运行。

删除map元素的两种方法:1、使用delete()函数从map中删除指定键值对,语法“delete(map, 键名)”;2、重新创建一个新的map对象,可以清空map中的所有元素,语法“var mapname map[keytype]valuetype”。


Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

Dreamweaver Mac version
Visual web development tools

MantisBT
Mantis is an easy-to-deploy web-based defect tracking tool designed to aid in product defect tracking. It requires PHP, MySQL and a web server. Check out our demo and hosting services.

Notepad++7.3.1
Easy-to-use and free code editor

SAP NetWeaver Server Adapter for Eclipse
Integrate Eclipse with SAP NetWeaver application server.

SublimeText3 Mac version
God-level code editing software (SublimeText3)
