The author of Go rqlite tells you: How important algorithms are when developing database software!


藏色散人

Jan 22, 2022 am 09:19 AM


This article comes from the golang tutorial column and presents "The author of Go rqlite tells you: How important algorithms are when developing database software!". I hope it helps those who need it.

Writing database programs is a fascinating job. I've been heavily involved in open source database development for the past two years, and database programming is probably the most inspiring project you can do as a software developer.

What is truly striking, however, is how much my attitude toward databases has changed over the past 6 years. Having started out uninterested, I have come to think that database systems are the pinnacle of software engineering.

I Didn't Know Any Better

For most of my career, my only experience with databases was reading about them. Usually in a boring context - open any undergraduate textbook on databases and you'll see what I mean. Usually you will see the following table as a typical use case for relational databases:

ID	FIRST	LAST	TITLE	DEPARTMENT
1	Robert	Kelly	Director	Marketing
2	Tom	Burke	Representative	Sales
3	John	Smith	Vice President	Sales

Could anything be more boring to read? If this was all there was to databases, I wanted nothing to do with them. What was the point? Software is much cooler than this, right? So I completely avoided anything to do with databases for a long time.

You never forget your first CRUD application

In 2009, after years of writing embedded software, Linux device drivers, and networking software, I found myself leading a team that needed to build a web-based system. You see, the AWS cloud had arrived, and licensing technology based on MAC addresses no longer worked there. My team had to build a licensing portal for our new EC2-based software appliance. Since we had a lot of experience with Python, we chose Django, running on MySQL. Something new happened: I actually started working with a database.

As our CRUD application went into service in the field, I began to realize how important the database was, and how central it was to our system. If we lost the database, all our software development would be in vain. If the database corrupted data, our customers' devices might become unlicensed and their networks would cease to function. If the database did not function properly, thousands of people would be affected simultaneously. But none of these things happened. The database always worked. It never let us down. I was impressed.
Later I discovered foreign key constraints, unique constraints, referential integrity, and indexes (remember, at the time I knew nothing about these things). The database could help me build a more robust system in all sorts of ways. I finally realized that modern databases are amazing. Databases are the most boring things in the world until you actually have to build a system with them.

You will also never forget your first search system

By 2012, I was leading a team that built a large key-value database on top of a large indexing and search system, with Elasticsearch at its core. It was eye-opening to see what a system like Elasticsearch, a technology built on world-class indexing, can do even with terabytes of log data underneath.
By then I had seen even databases and search systems fail, but I was fascinated by database technology. In 2014, I joined a small dedicated team developing the core of an open source time series database (github.com/influxdata/influxdb).

The algorithms I learned really are very important

Only in database development can Big O analysis really come alive. Databases are one of the few applications where programmers still need to loop, sort, and filter millions of objects. This is one of the few places where a lot of the boring material learned in CS classes is important.

This is not the case in many other kinds of software development. Writing boot ROM firmware? No, algorithms were never important to me there. A tuner device driver? No, it didn't matter. Network device management software? CRUD applications? Hardly. All of these disciplines require different skills and knowledge. For the most part, the only time I discussed runtime complexity was in interviews.
But with database development, all of this changed. It is a wonderful thing to watch a system return the same correct results in a fraction of the time because of an algorithm change, and to see it happen in your own code, in a system you built.
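To make the point concrete, here is a minimal Go sketch (not taken from rqlite or any real database) of the same lookup done two ways: a full scan, which costs O(n) comparisons, and a binary search over rows already sorted by ID, which costs O(log n). The `Row` type and the helper names `scanLookup`, `indexLookup`, and `buildRows` are hypothetical and exist only for this illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// Row is a hypothetical record keyed by an integer ID.
type Row struct {
	ID   int
	Name string
}

// scanLookup checks every row in turn: O(n) comparisons per lookup.
func scanLookup(rows []Row, id int) (Row, bool) {
	for _, r := range rows {
		if r.ID == id {
			return r, true
		}
	}
	return Row{}, false
}

// indexLookup assumes rows is sorted by ID and uses binary search:
// O(log n) comparisons per lookup.
func indexLookup(rows []Row, id int) (Row, bool) {
	i := sort.Search(len(rows), func(i int) bool { return rows[i].ID >= id })
	if i < len(rows) && rows[i].ID == id {
		return rows[i], true
	}
	return Row{}, false
}

// buildRows returns n rows with even IDs 0, 2, 4, ... in sorted order.
func buildRows(n int) []Row {
	rows := make([]Row, n)
	for i := range rows {
		rows[i] = Row{ID: i * 2, Name: fmt.Sprintf("row-%d", i)}
	}
	return rows
}

func main() {
	rows := buildRows(1000000)
	// Look up the last row both ways; the results must be identical.
	a, okA := scanLookup(rows, 1999998)
	b, okB := indexLookup(rows, 1999998)
	fmt.Println(a == b, okA, okB)
}
```

Both functions return the same row. Only the amount of work differs, and that difference is what becomes visible once the row count reaches the millions.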

Performance Matters, Too

There’s an old story in software that goes like this: A programmer writes some code that runs ten times faster than the previous version. He showed it, but someone pointed out that the data it produced was slightly different than the correct data. "But it's ten times faster," the programmer pointed out. "Well, if it doesn't need to be correct, I can make a version that takes up no space at all and runs infinitely fast," replied another.
This morality tale has always had a great impact on me. Being right is always more important than anything else. That is true. But it also led me to believe that a project is valuable simply because it produces the right results.

For databases, this is not the case.
Performance is more than just a feature. It is a requirement. Those who are willing to pay for databases often do so because they have large amounts of data. If the database doesn't perform well in that situation, if it doesn't return results quickly and efficiently, then it might as well not work at all.

Do you think the write path is complicated?

I think the thing that has struck me most about developing databases is how complex query engines are. I have a lot of experience building systems that write and store data to disk. Making these systems work well can be a significant challenge.
But this complexity is usually much less than that of the query engine. A flexible query system, in effect a system built to answer questions when you don't know what the questions will be, requires serious design thinking. The query planner must be valid. The query system must support many orthogonal requirements: filtering by certain dimensions, grouping by other dimensions, joining data from different tables, and sometimes supporting data from external sources. Finally, the query system must be efficient and perform well. This leads to a tension between abstraction and optimization in design and implementation, which requires real skill to manage well.
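As a rough illustration of just two of those orthogonal requirements, the Go sketch below filters on one dimension and groups on another, using the sample employee table from earlier in this article. The `Employee` type and the `countBy` and `sampleEmployees` helpers are hypothetical names invented for this example. A real query engine would also need parsing, planning, joins, and index selection, none of which appear here.

```go
package main

import "fmt"

// Employee mirrors the sample table shown earlier in the article.
type Employee struct {
	ID                             int
	First, Last, Title, Department string
}

// countBy keeps the rows accepted by filter, then counts them per group key.
// The caller chooses both dimensions, so the function does not know the
// "question" in advance.
func countBy(rows []Employee, filter func(Employee) bool, key func(Employee) string) map[string]int {
	counts := make(map[string]int)
	for _, r := range rows {
		if filter(r) {
			counts[key(r)]++
		}
	}
	return counts
}

// sampleEmployees returns the three rows of the example table.
func sampleEmployees() []Employee {
	return []Employee{
		{1, "Robert", "Kelly", "Director", "Marketing"},
		{2, "Tom", "Burke", "Representative", "Sales"},
		{3, "John", "Smith", "Vice President", "Sales"},
	}
}

func main() {
	// Roughly equivalent to:
	//   SELECT department, COUNT(*) FROM employees
	//   WHERE title <> 'Director' GROUP BY department
	counts := countBy(sampleEmployees(),
		func(e Employee) bool { return e.Title != "Director" },
		func(e Employee) string { return e.Department })
	fmt.Println(counts)
}
```

Passing the filter and the grouping key as functions keeps the sketch general, but it also hides them from any optimizer, which is a small instance of the tension between abstraction and optimization described above.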

In the real world, it must be operated

Any serious database must support basic operations such as backup, restore, shard management, and monitoring.
If I, as a serious operator, can't back up your database, I can't use it, simple as that. It doesn't matter how quickly the database accepts writes. It doesn't matter how small its memory footprint is during a query. If I can't protect the data in the database from failures beyond the control of you, the creator of the database, I will never be able to run it comfortably.
Of course, there are many ways to back up a database without its cooperation. But built-in methods are usually best. This was also my thinking for rqlite v2.0. If I want anyone to use rqlite seriously, I have to solve the real-world problem where a system can fail completely and stay behind on its data for a long time.

Therefore, when designing and implementing a database, build operational support from the beginning. Make it a fundamental part of your design. Your users will thank you for it.

The answer is usually "it depends"

When you first start working with a database, especially as an operator, you often ask questions like: At what rate can the system index? How quickly does it respond to queries? How much disk space do I need? How big can a shard get and still work? How can I speed it up? All asked without qualification. I used to ask them myself.
Maybe you get to talk to the database programmers and ask them these questions. And the answer you'll often, perhaps always, get is: it depends. You have to benchmark, you have to measure. This can be irritating to hear and may sound like they are dodging responsibility.

But that is not the case.

Now, when I hear questions like this, I smile. So naive.
The indexing rate may depend on the size of the data, not just the number of documents or data points. It may depend on the batching, the cardinality of the data, whether the database is clustered, which columns and fields in the data are indexed, whether it is new data or an update to existing data, the machine the database is running on, its RAM and IO performance, and the replication in use.
The variables that control performance never end.
For queries, it may depend on the time range of the time series data. It depends on the number of records hit, the number of fields queried, whether a range scan is involved, whether the data is indexed, the type of index used, the number of shards that may be accessed, whether the data is local, and the characteristics of the machine. Is it in stock? Is it undergoing maintenance? Is the network busy?

So the answer is always: it depends. Database designers are being honest. They can know everything about the system they built and still not know the answers to your questions.
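A small Go sketch shows why measurement is the only honest answer. It builds a toy in-memory index over the same number of rows at different cardinalities and times each run. The `buildIndex` and `measure` functions are hypothetical and stand in for a real database's indexing path. The absolute timings will differ from machine to machine, which is exactly the point.

```go
package main

import (
	"fmt"
	"time"
)

// buildIndex maps each distinct value to the positions of the rows holding it.
func buildIndex(values []int) map[int][]int {
	idx := make(map[int][]int)
	for pos, v := range values {
		idx[v] = append(idx[v], pos)
	}
	return idx
}

// measure indexes n rows whose values cycle through `cardinality` distinct
// values. It reports the elapsed time and the number of index entries.
func measure(n, cardinality int) (time.Duration, int) {
	values := make([]int, n)
	for i := range values {
		values[i] = i % cardinality
	}
	start := time.Now()
	idx := buildIndex(values)
	return time.Since(start), len(idx)
}

func main() {
	// Same row count, different cardinality: the indexing time changes.
	for _, c := range []int{10, 1000, 100000} {
		elapsed, entries := measure(1000000, c)
		fmt.Printf("cardinality=%d entries=%d elapsed=%v\n", c, entries, elapsed)
	}
}
```

Even this toy has three variables (row count, cardinality, and the machine), and a real database has many more, so a single headline number for "indexing rate" would be misleading.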

Programming Bucket List

If I had one piece of advice for developers who want to improve their programming skills, it would be to join a database development team. My programming skills have improved tremendously because of database development. It has been a wonderful coding experience.

Original address: https://www.philipotoole.com/what-i-learned-from-programming-a-database/
Translation address: https://learnku.com/go/t/64605
