Seven Levels of Programming Ability
Foreword
A programmer's skill improves gradually as experience accumulates, and I think programming ability can be divided into distinct levels.
The following discusses a hierarchy of programming ability along two dimensions.
One dimension is the level of programming skill; the other is the level of domain knowledge.
Programming skill level
Programming skill level refers to a programmer's ability to design and write programs. It is the programmer's foundation.
Level 0 - Non-programmer
A beginner who encounters a problem is completely at a loss and has no idea how to write a program to solve it. In other words, he is still a layman and cannot yet be called a "programmer"; the computer is still a mysterious black box to him.
Level 1 - Basic programmer
After studying programming for a while, you can write a program to complete a task when you are given one.
The code works under normal circumstances, but in real operation all kinds of bugs surface under special conditions. In other words, you can produce demo software, but if that software is actually delivered to customers, you will probably get an earful from them.
The programmer wrote the program, but why it sometimes works and sometimes does not, even he himself cannot say.
When a bug appears in operation, or requirements change and code has to be modified or added, the program's structure soon turns muddled, the code bloats, and bugs multiply. Before long even the original developer is unwilling to keep maintaining it.
Level 2 - Data structures
After a period of programming practice, programmers come to appreciate the old saying "data structures + algorithms = programs". They use algorithms to solve problems, and they further realize that algorithms ultimately depend on data structures: design a good data structure and a good algorithm follows.
With the wrong data structure, no good algorithm can be built on top of it.
I remember a foreign sage once saying: "Show me your data structure!"
Level 3 - Object orientation
After that, programmers come to appreciate the power of object-oriented programming, which most modern languages support. But using an object-oriented language, or even using and inheriting classes, does not by itself mean you are writing object-oriented code.
I have seen a lot of process-oriented code written in Java, Python, and Ruby.
Only when you have mastered interfaces, polymorphism, and the relationships between classes and between objects can you truly claim to have mastered object-oriented programming.
Even if you are using a traditional language with no object-oriented support, as long as you have "objects" in mind you can still write object-oriented programs.
For example, when I program in C, I consciously design in an object-oriented way: I use struct to simulate classes and group the functions that belong to the same concept together to simulate methods. If you doubt that object-oriented code can be written in C, look at the Linux kernel: it is written in C, yet a strong flavour of "objects" rises from between the lines of its source.
It is not easy to truly master object-oriented programming technology.
In my technical career, two hurdles gave me the biggest headaches.
The first came during the transition from DOS to Windows development: for a long time I could not grasp the concept of a framework. In the DOS era everything was a call into a function library, and your program actively called the functions. In the Windows era this was replaced by a framework: even your "main program" is actually called by the framework, and the UI thread fetches messages from the operating system and dispatches them to your code for handling. The Spring framework that Java programmers know so well is also this kind of inverted-call framework.
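To make the inversion concrete, here is a toy Python sketch (all names are invented for illustration): the library function is code that I call, while the miniature framework owns the main loop and calls back into the handler I registered with it.

```python
# A library: my code is in control and calls into it.
def parse_config(text):
    return dict(line.split("=", 1) for line in text.splitlines() if "=" in line)

settings = parse_config("host=localhost\nport=8080")  # I decide when this runs

# A toy framework: I register a callback, and the framework calls my code.
class MiniFramework:
    def __init__(self):
        self._handlers = {}

    def on(self, event):
        """Decorator that registers a handler for an event."""
        def register(func):
            self._handlers[event] = func
            return func
        return register

    def run(self, events):
        # The framework owns the main loop and decides when my code runs.
        for event in events:
            handler = self._handlers.get(event)
            if handler:
                handler(event)

app = MiniFramework()

@app.on("startup")
def handle_startup(event):
    print("the framework called me for:", event)

app.run(["startup", "shutdown"])  # only "startup" has a registered handler
```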
Nowadays, because the word "framework" sounds impressive, many "class libraries" and "function libraries" call themselves frameworks. In my view that is a misuse of the name.
A "class library" or "function library" is code that I call.
A "framework" is something I register my callbacks with, and the framework calls the functions I wrote.
The other hurdle was object orientation. For a long time I did not know how to design the relationships between classes and could not lay out a class hierarchy well.
I remember a book by a foreign expert that offered a simple, practical object-oriented design technique: "State the problem, pick out the nouns and turn them into classes, then pick out the verbs and turn them into the methods of those classes." The trick is quite effective, but it is rather crude, has no theoretical basis, and is not rigorous: if the problem is stated badly, the resulting class hierarchy will be flawed.
There must be many ways to master object-oriented thinking. I drew mine from relational databases, which became my inspiration for understanding and mastering object-oriented design.
In my view a relational database table is really a class, and each row is an instance of that class, that is, an object. The relationships between tables are the relationships between classes. O/R mapping technology (such as Hibernate) maps object-oriented code onto database tables, which also shows that classes and tables are logically equivalent.
Since database design and class design are equivalent, to design an object-oriented system you only need the design skills of a relational database.
The design of relational database table structure is very simple:
1. Identify the relationships between tables: one-to-one, one-to-many, many-to-one, or many-to-many. These are exactly the relationships between classes.
2. Identify the fields of each table. An object has countless attributes (for a person: height, weight, sex, age, name, ID number, driver's licence number, bank card number, passport number, entry permit number, employee number, medical history, marital history, and so on), but the program only needs to record the attributes we care about. Those attributes of interest become the fields of the table, that is, the attributes of the class. As the saying goes, "of three thousand rivers, take but one ladle to drink" - see the sketch below.
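A minimal Python sketch of this table-to-class mapping, with the Person and BankAccount tables and their fields invented purely for illustration: each table becomes a class, each row an object, and a foreign key carries the many-to-one relationship.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Person:                  # table: person
    person_id: int             # primary key
    name: str
    id_number: str             # only the attributes this system cares about

@dataclass
class BankAccount:             # table: bank_account
    account_id: int
    owner_id: int              # foreign key -> person.person_id (many-to-one)
    balance: float = 0.0

def accounts_of(person: Person, accounts: List[BankAccount]) -> List[BankAccount]:
    """The one-to-many side of the relationship: a person's accounts."""
    return [a for a in accounts if a.owner_id == person.person_id]

alice = Person(1, "Alice", "A123456")
accounts = [BankAccount(10, 1, 500.0), BankAccount(11, 2, 80.0)]
print(accounts_of(alice, accounts))    # only account 10 belongs to Alice
```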
Level 4 - Design patterns
I once saw this line on the Internet: "If you haven't written 100,000 lines of code, don't talk to me about design patterns." I agree completely.
I remember that when I first read the GoF book on design patterns, I found that although I had never heard of design patterns before, I had already been using some of them unconsciously in real programming. Design patterns are objective laws of programming; nobody invented them, they were merely discovered first by some experienced early programmers.
Without design patterns you can still write programs that meet the requirements. But once the requirements change, your program will not be flexible enough and will not be sustainable. A real program, once delivered to customers, always generates further feedback, and later versions always add requirements; that is a reality no programmer can avoid.
When writing UI programs, whether Web, desktop, mobile, or games, you must use the MVC pattern; otherwise your program will not survive later changes to the UI requirements.
The most important idea in design patterns is decoupling, and decoupling is achieved through interfaces: when requirements change later, you only have to supply a new implementation class, as the sketch below shows.
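A small, hypothetical Python example of decoupling through an interface (Exporter and the report function are invented names): the calling code depends only on the abstraction, so supporting a new output format means adding one implementation class and changing nothing else.

```python
from abc import ABC, abstractmethod
import json

class Exporter(ABC):
    """The interface the rest of the program depends on."""
    @abstractmethod
    def export(self, rows: list) -> str: ...

class CsvExporter(Exporter):
    def export(self, rows: list) -> str:
        return "\n".join(",".join(map(str, r)) for r in rows)

class JsonExporter(Exporter):
    def export(self, rows: list) -> str:
        return json.dumps(rows)

def generate_report(rows: list, exporter: Exporter) -> str:
    # This function never changes when a new export format is required.
    return exporter.export(rows)

rows = [[1, "apple"], [2, "pear"]]
print(generate_report(rows, CsvExporter()))
print(generate_report(rows, JsonExporter()))
```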
Most design patterns are object-oriented, so design patterns can be regarded as an advanced stage of object orientation; only when you have mastered them can you be said to have truly and thoroughly mastered object-oriented design.
Whenever I learn a new language (including non-object-oriented ones, such as functional languages), once I understand its syntax I look at how the common design patterns are realised in it. That is a useful tip for learning programming languages.
Level 5 - Language expert
After more practice, the programmer has become quite proficient in some particular language. Some become "language lawyers", good at explaining the language's usage and its many pitfalls to other programmers.
Programmers at this stage are often devout believers in their language. In communities and forums they argue with users of other languages about which is the best programming language, convinced that theirs is the finest in the world, bar none, and suitable for every scenario. They see only a hammer, and so every task looks like a nail.
Level 6 - Multi-language expert
Programmers at this stage have, through work or out of pure interest in technology, learned and mastered several programming languages. They have tasted the different design ideas behind different languages and understand each language's strengths and weaknesses better.
They now believe that the language itself is not what matters most; languages are just basic skills.
They choose different languages for different tasks and different resources, and no longer complain when a favourite language is not the one being used.
There are many schools and ideas of programming languages, and some programming languages support multiple programming paradigms at the same time.
Static type programming paradigm
In a statically typed language, every variable has a type that is fixed and known at compile time. Representative languages: C, C++, Pascal, Objective-C, Java, C#, VB.NET, Swift, Go.
The advantages are:
1. The compiler can catch type errors at compile time.
2. Because the compiler knows the types at compile time, it can generate faster code.
This paradigm holds that the programmer must know the type of every variable; if you don't, don't mess around, and the compiler will refuse to build the program.
Swift and Go are both statically typed, yet the type often does not have to be written out explicitly: the compiler determines it automatically through type inference.
Dynamic type programming paradigm
A dynamically typed language does not require types to be declared for its variables; any variable can refer to an object of any type. Representative languages: Python, Ruby, JavaScript.
The philosophy of dynamic typing can be summed up as duck typing, after the duck test attributed to James Whitcomb Riley: "When I see a bird that walks like a duck, swims like a duck, and quacks like a duck, I call that bird a duck."
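A tiny Python illustration of duck typing (the class names are invented): no interface is declared, anything with a quack() method is accepted, and an object without one only fails at runtime.

```python
class Duck:
    def quack(self):
        return "quack"

class Person:
    def quack(self):           # not a Duck, but it quacks like one
        return "I'm quacking"

def make_it_quack(thing):
    # No type check: we rely only on the behaviour the object supports.
    print(thing.quack())

make_it_quack(Duck())
make_it_quack(Person())
# make_it_quack(42)           # would raise AttributeError, but only at runtime
```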
This paradigm holds that the programmer must still know what type a variable refers to and which methods and properties it supports. If you don't, don't mess around: the program will crash at runtime, and who is to blame? Yourself, for not being a careful enough programmer.
The benefits of dynamic typing are:
There is no need to define interfaces or abstract types explicitly; as long as an object supports the required methods and properties, it will do. The code stays flexible and simple. The interfaces and base classes that C++, Java, and C# treat as their lifeblood count for nothing in dynamic languages.
The disadvantages are:
1. Type errors are not caught by a compiler; the program crashes at runtime instead.
2. Because the compiler does not know the types of variables, it cannot optimise as aggressively.
Object-oriented programming paradigm
The object-oriented programming paradigm began to rise in the late 1970s. It supports classes, and instances of classes, as the modules that encapsulate code. Representative languages: Smalltalk, C++, Objective-C, Java, C#, VB.NET, Swift, Go, Python, Ruby, ActionScript, OCaml.
Early programming languages were all procedural: sequence, conditionals, and loops composed into functions. As code grows, people find it necessary to modularise it, putting the code for one concept into one file, which makes parallel development and code management easier.
People also discovered the rule "program = data structures + algorithms", so the data structures and the functions belonging to one concept should live in one place. That is the idea behind classes.
The object-oriented programming paradigm has indeed greatly improved production efficiency, so it has been widely used. Therefore, there are many languages that support the object-oriented programming paradigm at the language level.
Although C has no language-level support for object orientation, modern C development applies the same modular idea: data structures and functions of the same concept go into one file and follow a common naming convention.
Still, C lacks object-oriented support at the language level, so many programmers have wanted to add it; C++ and Objective-C are the representative results.
C++ is a new language, though most of its language elements are compatible with C.
Objective-C is fully compatible with C. It adds a thin layer of syntactic sugar on top of C to support interfaces (what other languages call classes) and protocols (what other languages call interfaces); the earliest implementation was in fact a C precompiler. Frankly, apart from added syntax that does not feel like C, Objective-C's object system is rather elegantly designed. Jobs had a keen eye and brought Objective-C into the fold early on, and because it stayed closed inside the Apple/NeXTSTEP world, few people knew it. With the popularity of iOS it has become famous worldwide in recent years.
Functional programming paradigm
The functional programming paradigm comes from languages invented by mathematicians who regard programs as mathematical functions. Representative languages: Lisp, Erlang, JavaScript, OCaml, Prog.
Many experts have championed functional programming as something revolutionary, but I think they overestimate its power; I do not believe the functional paradigm is inherently better than the object-oriented one.
The core of a functional language is the function, and there is no concept of a class. But these are not the functions of a traditional procedural language: they support closures.
In my opinion, the functions of functional languages, that is, closures, are really "classes" in disguise. Programming languages have evolved to the point where modularisation, the binding of "data structure" to "algorithm", is required; whatever the language, a style of programming that does not combine the two has no future.
Object-oriented languages use the class to combine "data structure" and "algorithm". The core of a class is the data structure, its properties, not the algorithm, its methods; inside a class, the functions are attached to the properties.
Functional languages use the closure to combine the two: a closure is a function that captures variables from its enclosing scope, so here the "properties" are attached to the "function".
A "class" is therefore essentially equivalent to a "closure". Many object-oriented languages now support closures, and looking at how they do it, you find they actually implement "closures" with "classes".
Which is easier to use, the class or the closure? For most purposes, clearly the class.
But the closure is more concise, which is why closures are often used in object-oriented languages to replace anonymous classes: for a class with only one method, writing a full class is too much trouble, and a closure is neater. The sketch below shows the same small module written both ways.
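To see the equivalence, here is the same counter written both ways in Python; a sketch, not anything definitive: the class keeps its data in an attribute, while the closure keeps it in a captured variable.

```python
# The same counter as a class and as a closure: both bind a "data structure"
# (the count) to the "algorithm" (increment).

class Counter:
    def __init__(self):
        self.count = 0          # data lives on the object
    def __call__(self):
        self.count += 1
        return self.count

def make_counter():
    count = 0                   # data captured by the closure
    def increment():
        nonlocal count
        count += 1
        return count
    return increment

c1 = Counter()
c2 = make_counter()
print(c1(), c1())               # 1 2
print(c2(), c2())               # 1 2
```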
Let me grumble about OCaml. Its predecessor Caml was itself a very good functional language; OCaml then bolted a complete object-oriented mechanism onto it so that it supports both paradigms at once, and the result is that, like C++, it is easy to end up using it in a split-brained way.
Many die-hard fans of object-oriented languages are likewise annoyed by JavaScript and keep trying to add class-based object orientation to it. ActionScript is one such attempt; I have used it, and it really is not much different from Java.
Another grumble, about ExtJS: when choosing a web front-end framework, I compared ExtJS and jQuery.
ExtJS was obviously developed by Java experts. They used JavaScript to simulate the design ideas of Swing and built a UI library.
jQuery's developers clearly understood JavaScript's functional nature and built a UI library around the strengths of a dynamic, functional language, and it promptly beat ExtJS.
The story of ExtJS and jQuery shows how important multi-language ability is. ExtJS's author, proficient in and devoted to Java, picked up the scalpel that is JavaScript and swung it like Java's hammer, hammering away to little reward.
Functional languages also bring tricks such as tail recursion: a tail call does not need a new stack frame, so deep recursion cannot overflow the stack.
Template programming paradigm
Template programming takes types as parameters, so one set of functions can serve any number of types. Representative language: C++.
The need for templates arose while C++'s container libraries were being developed: containers have to hold objects of any type, hence the need for generics.
C++ templates work by having the compiler generate a concrete instantiation of the code for every type actually used in the source. Java and C# have a similar-looking mechanism called generics, but its implementation is very different from C++ templates: their compilers do not stamp out new code for each type; Java, for instance, erases the type parameters and relies on casts.
In a language without templates or generics, how do you store objects in a container? You store them as a common base-class type (Java and C# before generics) or as void* pointers (C), and cast back to the real type yourself when you take them out. Dynamically typed languages do not care about types at all: throw any object into the container, take it out, and use it directly.
Some C++ masters have built "template metaprogramming" on top of templates. Since templates are expanded by the compiler, template metaprogramming makes the compiler itself perform the computation, so the result is already computed by the time compilation finishes. Apart from research and showing off, I honestly do not know what it is good for.
Summary
I think there are several criteria for deciding whether a language is worth learning:
1. Whether you need it. If the job requires it, you learn it, no question about it; after all, we all have to eat.
2. Whether its language features feel refreshing. If they do, it is worth the price of admission. For example, Go abolished exceptions in favour of returning multiple values, and I agree with that choice; in fact I have avoided relying on exceptions for many years. Since C has no exceptions and gets along fine, why are they needed? When an error occurs, return an error code; for unrecoverable errors, just abort the program. Besides, exceptions violate a principle of procedural programming, that a function should have one entry and one exit; throwing exceptions adds extra exits. A small sketch of the error-return style appears after this list.
3. Whether it excels in some particular field. If all you have is a hammer, every task looks like a nail; with a variety of tools in the toolbox, different tasks become much easier to face.
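As a rough illustration of the "return an error value instead of throwing" style that Go favours, here it is mimicked in Python (parse_port is an invented example): the function has a single normal exit, and the caller inspects the error explicitly.

```python
def parse_port(text):
    """Return (port, error); error is None on success."""
    if not text.isdigit():
        return 0, f"invalid port: {text!r}"
    port = int(text)
    if not (0 < port < 65536):
        return 0, f"port out of range: {port}"
    return port, None

port, err = parse_port("8080")
if err is not None:
    print("error:", err)        # the caller decides how to handle the error
else:
    print("listening on", port)
```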
Level 7 - Architecture design
To design excellent software you also need to master architecture design. Some techniques of architecture design:
1. Layering
A piece of software is usually divided into:
Presentation layer - the UI part
Interface layer - the communication interface of the background service
Service layer - the part that actually performs the service
Storage layer - the persistence part, stored in files or a database
Layering decouples the modules from one another, supports parallel development, and makes the software easier to modify and easier to speed up. A toy sketch of this layering follows below.
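A toy Python sketch of layering (Storage, UserService, and show_user are invented names): each layer talks only to the layer below it, so the storage layer could later be swapped from an in-memory dict to files or a database without touching the presentation code.

```python
class Storage:                       # storage layer
    def __init__(self):
        self._rows = {}
    def save(self, key, value):
        self._rows[key] = value
    def load(self, key):
        return self._rows.get(key)

class UserService:                   # service layer
    def __init__(self, storage: Storage):
        self._storage = storage
    def register(self, name):
        self._storage.save(name, {"name": name})
    def find(self, name):
        return self._storage.load(name)

def show_user(service: UserService, name):   # presentation layer
    user = service.find(name)
    print(user["name"] if user else "not found")

service = UserService(Storage())
service.register("alice")
show_user(service, "alice")
```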
2. SOA
Modules talk to each other over the network and are loosely coupled. Each module can be deployed independently, and more instances can be added to improve performance. Each module can be built with a different language and platform, and services developed earlier can be reused. Commonly used SOA protocols include Web Services, REST, JSON-RPC, and so on.
3. Performance bottleneck
1) Turn synchronous processing into asynchronous processing.
Implement it with an in-memory queue (Redis), a workflow engine (jBPM), and the like. In-memory queues are fast but can lose data; a workflow engine persists each request to the database.
Converting synchronous requests into asynchronous ones solves basically 99.99% of performance problems. A minimal sketch of the queue-and-worker idea appears after this list.
2) Use single-machine parallel hardware processing.
For example, use GPU, FPGA and other hardware to process and improve performance.
3) Use cluster computers to process.
For example, a Hadoop cluster uses multiple computers to process data in parallel.
In your own software stack, you can also deploy multiple copies of a module for parallel processing.
4) Use a cache to satisfy requests. Once the cache has been warmed with commonly used content, most user requests are served straight from memory and performance improves dramatically.
Caching is sometimes called God's algorithm: its performance is only slightly below the theoretical optimum, as if you were God and could foresee the future. Now that x86 CPUs have hit a frequency ceiling, enlarging the high-speed caches is one of the main ways they continue to gain performance.
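Going back to point 1) above, here is a minimal Python sketch of turning a synchronous request into an asynchronous one with an in-memory queue and a worker thread; it is only a stand-in for the Redis or jBPM setups mentioned there, and handle_request is an invented name.

```python
import queue
import threading
import time

tasks = queue.Queue()

def worker():
    while True:
        job = tasks.get()
        time.sleep(0.1)          # pretend this is slow work
        print("processed", job)
        tasks.task_done()

threading.Thread(target=worker, daemon=True).start()

def handle_request(payload):
    tasks.put(payload)           # enqueue and return at once
    return "accepted"

print(handle_request({"order_id": 42}))
tasks.join()                     # wait for the background work to finish
```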
4. Build large systems out of small pieces
Don't panic when you meet a large system: divide it into modules, implement them as many small programs, and let them solve the problem together through SOA collaboration. This follows the Unix design philosophy: Unix accumulated a large number of single-purpose small programs and encouraged users to combine them through pipes to meet their needs. Pipes are of course too restrictive and not flexible enough, so today we let programs collaborate through URIs and SOA; applications on Android and iOS already collaborate through URIs. That can fairly be called a modern development of the Unix idea.
5. Sharding
There is a trend now called "de-IOE": I for IBM mainframes, O for Oracle databases, E for EMC storage. Large systems used to be architected on IOE, with an Oracle database deployed on a mainframe and its data stored on EMC. IOE represents the most powerful computer, database, and storage of the day, yet even they eventually cannot stand up to a truly massive system.
Oracle follows a share-everything design: it can run on a cluster (of at most 16 server nodes), and every node in the cluster shares the same storage.
The de-IOE movement marks the bankruptcy of the share-everything model; only a share-nothing design lets a system scale out without limit.
MySQL can handle data of any size, provided you understand sharding: split the large system into many small ones spread across cheap servers and storage, or, more commonly today, across a large number of virtual machines.
Take the Ministry of Railways' 12306 ticketing site. Every train ticket belongs to a particular train, so if we shard by train the site can be split into thousands of modules. One virtual machine can host several modules, and when particular trains become performance bottlenecks they can be moved onto their own virtual machines. Even if the services for some trains end up unavailable, the system as a whole never becomes completely unavailable.
The only truly global part of 12306 is user login, and even that can be handed to a third party: users can log in with WeChat, Weibo, QQ, and other accounts.
You can also implement the login service yourself, again by sharding: serve it from several Redis servers that hold each logged-in user's sessionId together with the userId, roles, permissions, and so on. The sessionId is generated randomly, and a few of its bits identify which Redis server it lives on. After login the sessionId is sent to the client, which sends it back with every request; the server looks it up on the corresponding Redis server, fetches the user information, and handles the request. If the sessionId cannot be found there, the user is asked to log in again. Even if every registered user logged in at the same time, the memory needed would not be large, and when session memory does run short you can evict the sessions of the users who logged in earliest and force them to log in again; the number of simultaneously active users is never that high. A sketch of this sessionId routing follows below.
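A minimal Python sketch of the sessionId routing just described, with plain dictionaries standing in for the Redis servers and all names invented: a few characters of the randomly generated sessionId say which shard stores the session.

```python
import os

NUM_SHARDS = 4
session_shards = [dict() for _ in range(NUM_SHARDS)]    # stand-ins for the Redis servers

def new_session(user_info):
    shard = int.from_bytes(os.urandom(1), "big") % NUM_SHARDS
    session_id = f"{shard:02d}-{os.urandom(8).hex()}"    # shard index embedded in the id
    session_shards[shard][session_id] = user_info
    return session_id

def lookup_session(session_id):
    shard = int(session_id.split("-")[0])
    return session_shards[shard].get(session_id)         # None means: ask the user to log in again

sid = new_session({"user_id": 7, "roles": ["admin"]})
print(sid, "->", lookup_session(sid))
```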
Levels of domain knowledge
All the previous levels focus on the skills of programming itself. To put it bluntly, they are basic skills and do not generate much value by themselves. But too many programmers waste too much time on those foundational levels.
Some programmers particularly like to dig into programming languages: whenever a new language appears, or an old one gets a wave of hype, they pour their energy into studying it. I am one of them, and I have wasted a great deal of energy on languages and their tricks.
I think C++ is a particularly big trap. It began as an object-oriented C; then template programming was discovered, and template programming and, further, template metaprogramming were heavily promoted; recently the C++11 and C++14 standards added many new things such as functional programming and type inference. C++ is too complex and has too many pitfalls that drain programmers' energy. When I use C++ I stick to the object-oriented part and the template part and leave the more exotic features alone.
Computer science is a very broad subject, and there are many areas of knowledge that require and deserve deep study before we can write valuable programs. Software only becomes valuable when it lands in a real industry; studying programming technique alone, without domain knowledge, will not produce valuable programs.
There are many fields of computer science, here are some examples:
Storage - block devices, file systems, cluster file systems, distributed file systems, Fibre Channel and SCSI, iSCSI, RAID, and so on.
Networking - Ethernet, optical fibre networks, cellular networks, Wi-Fi, VLANs, and so on.
Computer architecture - mainly CPU instruction sets: x86, ARM, and so on.
The USB protocol - you need to understand URB (USB Request Block) packets.
The PCI and PCI-E protocols - modern peripherals are all built on PCI and PCI-E, and graphics cards now attach to the computer over PCI-E; this convergence actually reduces the amount there is to learn. To work on virtualization you need a deep understanding of the PCI protocol.
Image processing - image compression, real-time video encoding, etc.
3D games
Relational databases
NoSQL databases
Operating systems
Distributed operating systems
Compilation principles
Machine learning - indispensable now that big data is everywhere.
Understanding a field also means knowing the commercial hardware, commercial software, and open-source software that already exist in it. Very often the job you want to do already has a ready-made tool: you just use it to get the work done, with no development at all. Sometimes all it takes is combining existing tools with a few scripts.
For example, I once needed two-way synchronisation. I found the excellent open-source tool Unison, wrote a configuration file, and the task was done without writing a line of code.
Another time, to achieve high availability, I glued several open-source programs together with Python and it was easily done.
Writing an installer, or customising an operating system, likewise takes only a few lines of script once you have the operating-system domain knowledge.
People who do not have domain knowledge may have to carry out a lot of unnecessary development, and may even find out after a long time that this is a dead end.
In addition, solid domain knowledge can greatly improve programming debugging and error checking capabilities. Knowing how compilers and programming language runtimes work allows you to quickly modify your code based on compilation errors and warnings.
Knowing how the operating system works underneath lets you find the root cause of runtime errors quickly. For example, I once wrote a Windows upgrade service. It ran as a Windows service and had to execute a DOS script that replaced the service itself, and I found that the script sometimes had no effect. After a whole night of digging I discovered that right after the service was installed, the first start ran the script with a permissions problem: the logs looked correct, but the script actually did nothing. Once the service had been started once, everything worked. This had to be something in Windows' underlying security mechanism; because I do not know the Windows kernel well, it took me a long time to find the problem and I never fully understood its root cause.
Level 0 - Domain knowledge novice
Someone with little knowledge of the field. He finds a few introductory articles about the field's software and hardware through a search engine and follows their instructions to configure and use the software. He can just barely make the existing software and hardware work.
Level 1 - Domain knowledge practitioner
He knows the hardware commonly used in the field and has a thorough command of the configuration and usage of its common software. He can fluently assemble solutions from existing software and hardware and can solve the various problems that come up in real work.
Level 2 - Domain knowledge expert
When you not only master the field's software and tools and know how to use them, but also understand their principles, knowing both the what and the why, you are a domain knowledge expert.
Only if you know how network protocols work can you tell, when the network misbehaves, whether the problem is a MAC conflict, an IP conflict, or a network loop.
Only if you know how storage works can you say why one storage scheme is unsuitable for virtualization, which scheme does suit virtualization, and which suits data backup.
Only if you know the PCI protocol can you virtualize a hardware device.
Only if you know the network card's hardware interface can you emulate a virtual NIC that a virtual machine can actually use.
Only if you know video encoding formats and their principles can you say which format uses the least bandwidth and which uses the least CPU.
Only if you understand the Intel VT and AMD-V instruction set extensions can you know how virtualization is implemented.
Only if you understand that a workflow is really a state machine can you design a workflow engine that meets the requirements when a complex workflow comes along; a bare-bones sketch follows below.
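A bare-bones Python sketch of a workflow as a state machine (the states and events are invented): the "engine" is just a table of allowed transitions plus a function that applies them.

```python
TRANSITIONS = {
    ("draft",     "submit"):  "in_review",
    ("in_review", "approve"): "approved",
    ("in_review", "reject"):  "draft",
}

def advance(state, event):
    next_state = TRANSITIONS.get((state, event))
    if next_state is None:
        raise ValueError(f"event {event!r} not allowed in state {state!r}")
    return next_state

state = "draft"
for event in ["submit", "approve"]:
    state = advance(state, event)
    print(event, "->", state)
```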
Level 3 - Scientist
You are an expert in domain knowledge, but all your knowledge comes from books and other people.
If you are content to remain a domain expert, you will only ever pick up other people's wisdom and never surpass them. Others may be unwilling to share their research results, and by the time they do tell you, they may already have found a newer theory and be about to release the next generation of products.
Scientists are people who explore the unknown, have the courage to innovate, and are people who promote the progress of human society.
Legend has it that a Cisco executive once said, half in jest: "If Cisco stopped developing new technology, Huawei would lose its sense of direction." That was a jibe at Huawei for sitting at the domain-expert level, able to imitate but not to surpass. I do not know Huawei's real situation, but I hope it has by now moved into the front-runner's position.
Irwin Jacobs saw the great potential of CDMA (code division multiple access) for communications and founded Qualcomm on it. Qualcomm lives mainly on patent licensing fees and employs a large number of scientists to do communications research. Some people call Qualcomm a patent troll; those people do not understand the value of knowledge. In their eyes the fair price of Windows is 5 yuan, the price of a CD, and the fair price of an iPhone is the thousand-odd yuan the bare hardware costs. If Qualcomm is just a patent troll, go and produce your own CDMA and LTE and show me!
The x86 chip was not designed with virtualization in mind, so it has what are called "virtualization holes": certain privileged CPU instructions, executed inside a virtual machine, do not raise an exception, so control cannot be switched back to the host. Because of that, virtual machines originally could not run on x86.
VMware was founded in 1998 by several scientists in the United States who discovered that binary translation could be used to run virtual machines on x86 computers.
The Xen virtualization software was also invented by scientists. They found that if the guest operating system's kernel and the host's kernel were both modified so that instructions falling into the "virtualization holes" called straight into the host, virtualization could be achieved and the virtual machine's performance greatly improved.
Later Intel added the VT instruction set to its chips and AMD added AMD-V, plugging the "virtualization holes". Hence the KVM virtualization software, which implements virtualization directly on top of these hardware instructions.
Because KVM runs guest CPU instructions directly on the physical CPU, it is extremely efficient; but virtual peripherals still have to be emulated in software, so the virtual machine's I/O is very slow.
IBM's Rusty Russell drew on the Xen experience and created VirtIO: a set of virtual PCI devices, with drivers inside the guest, whose device memory is accessible both to the host and, through the VirtIO driver, to the guest. In effect a region of memory is shared between the virtual machine and the host, which solves the virtual machine's I/O performance problem.
Let me tell another story, about search engines.
A long time ago I wanted to add search to a program. I started with SQL queries and found them far too slow. Then I found the open-source Lucene project, which builds inverted indexes in files and speeds search up enormously; a minimal inverted index is sketched below.
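A minimal Python sketch of an inverted index in the spirit of Lucene (the documents are made up): each word maps to the set of documents containing it, so a query becomes a set intersection instead of a scan.

```python
from collections import defaultdict

docs = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "quick dog tricks",
}

index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.split():
        index[word].add(doc_id)        # word -> documents that contain it

def search(query):
    words = query.split()
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)

print(search("quick dog"))             # -> [3]
```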
Google's two founders discovered the secret hidden in HTML links: a page's weight can be derived from the link relationships between pages. That is the PageRank algorithm (sketched below), and with it Google's automated search engine defeated Yahoo's manually categorised directory.
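And a bare-bones PageRank sketch in Python, assuming a toy link graph and the usual 0.85 damping factor: each page's weight is repeatedly redistributed along its outgoing links until the ranks settle.

```python
links = {                      # page -> pages it links to (a made-up toy graph)
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
}
pages = list(links)
rank = {p: 1.0 / len(pages) for p in pages}
damping = 0.85

for _ in range(50):            # power iteration until the ranks settle
    new_rank = {p: (1 - damping) / len(pages) for p in pages}
    for page, outlinks in links.items():
        share = rank[page] / len(outlinks)
        for target in outlinks:
            new_rank[target] += damping * share
    rank = new_rank

print({p: round(r, 3) for p, r in rank.items()})   # "c", with the most incoming weight, ranks highest
```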
With inverted indexes, PageRank, and a simple HTML crawler, you can build a search engine. But the Internet is huge and enormous numbers of new pages appear every day; building an inverted index of the whole Internet is hard.
A few years later Google published three more papers: on GFS, MapReduce, and Bigtable. The developers of Lucene then built the Hadoop project on Google's MapReduce paper. MapReduce stores data and runs computations across a large number of machines and finally merges the results. With Hadoop plus inverted indexes plus PageRank you can build a search engine, and Yahoo, Baidu, and other companies have built their own search engines on Hadoop.
Yet their search quality still cannot compare with Google's; we programmers know that best. I, for one, still climb over the wall just to use Google.
Dr. Wu Jun published a series of articles on Google Blackboard that introduced a great deal of machine-learning knowledge. From them you can see that Google uses machine learning to analyse the pages it collects. Google will obviously never publish that formula, and even if one day it did, you can be sure it would by then have developed still sharper secrets, and the imitators' search quality would still trail Google's.
Imitation is a necessary stage on the road to innovation: before leading a field you must pass through learning and copying. But to become the leader of an industry, to become the champion, you must dare to overtake on the bend, step bravely onto the road of innovation, and become a real scientist, a real expert!
Summary
Programming ability can be divided into two dimensions: one is the level of programming skills, and the other is the level of domain knowledge.
Some programmers pour all their energy into programming skill and know little about any domain. In daily work that does real harm: some needs already have ready-made, open-source, free solutions, or can be met quickly by combining a few existing programs, yet they spend great effort developing everything themselves. And without domain knowledge it is hard, when a program misbehaves unexpectedly, to locate the source of the problem quickly and fix the bug.