# System Development
1. High cohesion / Low coupling
High cohesion means that a software module is composed of highly related code and is responsible for only one task, which is often referred to as the single responsibility principle. Cohesion reflects how closely the code inside a module is related. Coupling describes the connections between modules: the tighter the connection between two modules, the stronger the coupling and the worse the modules' independence. The degree of coupling between modules depends on the complexity of their interfaces, the way they are called, and the information passed between them. For a complete system, modules should be as independent as possible. In general, the higher the cohesion of each module in the program structure, the lower the coupling between modules.
2. Over-design
Over-design means doing too much future-oriented design or over-complicating relatively simple things: an excessive pursuit of modularity, scalability, design patterns, and so on, which adds unnecessary complexity to the system.
3. Premature optimization
Premature optimization refers not to optimizing early in the development process, but to optimizing before the direction of future requirement changes is understood. Such optimizations may not only make it harder to implement new requirements; your guess about what needs optimizing may also be wrong, so you gain nothing but more complicated code.
The correct approach is to first implement the requirements well, write enough test cases, and then profile to find the real performance bottlenecks. Only then should you optimize.
4. Refactoring
Refactoring means improving the quality and performance of software by adjusting the program code, making the program's design patterns and architecture more reasonable, and improving its scalability and maintainability.
5. Broken windows effect
Also known as the broken windows theory, this is originally a theory in criminology. It holds that if undesirable phenomena in an environment are allowed to persist, people will be tempted to imitate them or even make them worse. Take a building with a few broken windows: if those windows are not repaired, vandals may break more windows, eventually break into the building and, if it is found uninhabited, perhaps settle there or set fire to it.
Applied to software engineering, hidden dangers in system code or architecture design must not be allowed to linger, because as time goes by they become more and more serious. Conversely, a high-quality system naturally leads people to write high-quality code.
6. The principle of mutual distrust
The principle of mutual distrust means that at no point in the entire upstream and downstream chain of a running program can anything be assumed to be absolutely reliable; any point may fail or behave unpredictably at any time, including the machine and network, the service itself, dependency environments, inputs and requests, and so on, so be on guard at all times.
7. Persistence
Persistence is a mechanism for converting program data between a temporary state and a persistent state. In layman's terms, temporary data (such as data in memory, which cannot be kept permanently) is persisted into durable data (for example, saved to a database or local disk, where it can be kept for a long time).
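As a minimal illustrative sketch (not tied to any particular framework), the following persists an in-memory Python dictionary to a local SQLite file and loads it back; the table and file names are made up for the example:

```python
import sqlite3

def save_scores(scores: dict, path: str = "scores.db") -> None:
    """Persist an in-memory dict to a local SQLite file."""
    with sqlite3.connect(path) as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS scores (name TEXT PRIMARY KEY, value INTEGER)")
        conn.executemany(
            "INSERT OR REPLACE INTO scores (name, value) VALUES (?, ?)",
            scores.items(),
        )

def load_scores(path: str = "scores.db") -> dict:
    """Load the persisted data back into memory."""
    with sqlite3.connect(path) as conn:
        return dict(conn.execute("SELECT name, value FROM scores"))

# Temporary data in memory ...
save_scores({"alice": 10, "bob": 7})   # ... persisted to disk
print(load_scores())                   # survives a process restart
```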
8. Critical section
A critical section represents a common resource or piece of shared data that can be used by multiple threads, but by only one thread at a time. Once the critical-section resource is occupied, other threads that want to use it must wait.
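A minimal Python sketch of the idea, using a `threading.Lock` so that only one thread at a time enters the critical section that updates a shared counter (the names here are purely illustrative):

```python
import threading

counter = 0                 # shared data (the critical-section resource)
lock = threading.Lock()     # guards the critical section

def worker(n: int) -> None:
    global counter
    for _ in range(n):
        with lock:          # only one thread may hold the lock at a time;
            counter += 1    # the others wait here until it is released

threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)  # always 400000 because the critical section is protected
```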
9. Blocking/non-blocking
Blocking and non-blocking usually describe the interaction between multiple threads. For example, if one thread occupies a critical-section resource, all other threads that need this resource must wait, and waiting causes those threads to hang; this situation is blocking. If the thread occupying the resource never releases it, all the other threads blocked on this critical section will be unable to work. A non-blocking approach, by contrast, allows multiple threads to enter the critical section at the same time.
10. Synchronization/Asynchronous
Usually synchronization and asynchronous refer to function/method calling.
Synchronous means that when a function call is issued, the call does not return until the result is obtained. An asynchronous call returns immediately, but the immediate return does not mean the task is complete: the task typically continues on a background thread, and the caller is notified through a callback or some other mechanism once it finishes.
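A small sketch of the difference using Python's standard `concurrent.futures` (the slow task is a stand-in for any long-running job): the synchronous call blocks until the result is ready, while the asynchronous call returns a future immediately and notifies the caller through a callback.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_task(x: int) -> int:
    time.sleep(1)          # stands in for a long-running job
    return x * 2

# Synchronous: the call does not return until the result is obtained.
print(slow_task(21))       # blocks for about 1 second, then prints 42

# Asynchronous: submit() returns immediately; a background thread runs the
# task and the caller is notified through a callback when it is done.
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(slow_task, 21)
    future.add_done_callback(lambda f: print("callback got:", f.result()))
    print("submit() returned immediately, task still running...")
# exiting the with-block waits for the pending task to finish
```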
11. Concurrency/Parallel
Parallelism means that multiple instructions are executed simultaneously on multiple processors at the same time, so from both a micro and a macro perspective they execute together.
Concurrency means that only one instruction executes at any given instant, but instructions from multiple processes are executed in rapid rotation. From a macro perspective this looks like multiple processes running at the same time; from a micro perspective they are not executed simultaneously: time is simply divided into segments so that multiple processes execute alternately and quickly.
# Architecture Design
1. High Concurrency
With the advent of distributed systems, high concurrency usually refers to ensuring, through design, that the system can process many requests in parallel at the same time. In layman's terms, high concurrency means that at the same point in time, many users access the same API or URL simultaneously. It commonly occurs in business scenarios with a large number of active users and a high concentration of those users.
2. High Availability
High availability (HA) is one of the factors that must be considered when designing a distributed system architecture. It usually refers to a system that has been specially designed to reduce downtime and keep its services available at a high level.
3. Read and write separation
To ensure the stability of database products, many databases provide dual-machine hot backup. That is, the first database server is the production server, providing external insert, delete, and update services, while the second database server mainly handles read operations.
4. Cold standby/hot standby
Cold standby: Two servers, one running and one not running as backup. In this way, once the running server goes down, the backup server will be running. The cold standby solution is relatively easy to implement, but the disadvantage of cold standby is that the standby machine will not automatically take over when the host fails, and requires active switching of services.
Hot standby: It is commonly known as active/standby mode. Server data, including database data, is written to two or more servers at the same time. When the active server fails, the standby machine is activated through software diagnosis (usually through heartbeat diagnosis) to ensure that the application is fully restored to normal use within a short period of time. When one server goes down, it automatically switches to another backup server.
5. Multi-active in remote locations
Multi-active in remote locations generally refers to establishing independent data centers in different cities. 'Active' is used in contrast to cold backup: a cold backup backs up the full data but usually does not serve business needs, and is only switched to when the primary data center fails. 'Multi-active' means that these data centers also handle traffic and provide business support in day-to-day operation.
6. Load Balance
Load balancing is a service that distributes traffic across multiple servers. It automatically allocates an application's external serving capacity among multiple instances, improves the availability of the application system by eliminating single points of failure, and gives the application a higher level of fault tolerance, thereby providing efficient, stable, and secure service.
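As a toy illustration of the idea (not a production load balancer), a minimal round-robin distributor over a fixed list of backend addresses might look like this; the addresses are made up:

```python
import itertools

class RoundRobinBalancer:
    """Distribute incoming requests evenly across backend instances."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)   # endless round-robin order

    def pick(self) -> str:
        return next(self._cycle)                  # next backend for this request

lb = RoundRobinBalancer(["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"])
for _ in range(6):
    print("forwarding request to", lb.pick())
```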
7. Dynamic and static separation
Dynamic and static separation refers to the architectural practice, in a web server architecture, of separating static pages (or static content interfaces) from dynamic pages (or dynamic content interfaces) so that they are served by different systems, thereby improving the overall access performance and maintainability of the service.
8. Cluster
The concurrent carrying capacity of a single server is always limited. When a single server reaches its performance bottleneck, multiple servers are combined to provide service. This combination is called a cluster, and each server in the cluster is called a "node". Every node provides the same service, so the concurrent processing capacity of the whole system is multiplied.
9. Distributed
A distributed system splits a complete system into many independent subsystems by business function; each subsystem is called a "service". The distributed system sorts requests and dispatches them to different subsystems, letting different services handle different requests. In a distributed system, subsystems run independently and are connected through network communication to achieve data interoperability and composite services.
10. CAP theory
CAP theory states that in a distributed system, Consistency, Availability, and Partition tolerance cannot all be satisfied at the same time.
- Consistency: It requires that at the same point in time, all data backups in the distributed system are the same or in the same state.
- Availability: After some nodes in the system cluster go down, the system can still respond to user requests correctly.
- Partition Fault Tolerance: The system can tolerate failures in network communication between nodes.
To put it simply, a distributed system can support at most two of the above three properties. But since the system is distributed, it is necessarily partitioned, and partition errors cannot be avoided 100%; therefore we can only choose between consistency and availability.
In distributed systems we usually prioritize availability over consistency. As for how to achieve high availability, there is another theory, the BASE theory, which further extends CAP.
11. BASE theory
BASE theory points out:
- Basically Available
- Soft state
- Eventually consistent
BASE theory is the result of a trade-off between consistency and availability in CAP. Its core idea is that even if strong consistency cannot be achieved, every application can, according to its own business characteristics, adopt appropriate methods to achieve eventual consistency.
12. Horizontal expansion/vertical expansion
Horizontal expansion (Scale Out) distributes load by adding more servers or program instances, thereby increasing storage capacity and computing power.
Vertical expansion Scale Up improves the processing capability of a single machine.
There are two ways to expand vertically:
- (1) Enhance single-machine hardware performance, for example: increase the number of CPU cores (e.g. 32 cores), upgrade to a better network card (e.g. 10G), upgrade to a better hard drive (e.g. SSD), expand hard drive capacity (e.g. 2 TB), or expand system memory (e.g. 128 GB);
- (2) Improve single-machine software or architecture performance, for example: use a cache to reduce the number of IO operations, use asynchronous processing to increase single-service throughput, or use lock-free data structures to reduce response time.
13. Parallel expansion
Parallel expansion is similar to horizontal expansion. The nodes in a cluster are peer nodes running in parallel; when expansion is needed, more nodes can be added to improve the cluster's service capability. Generally speaking, the critical paths of a server (such as login, payment, and core business logic) need to support dynamic parallel expansion at runtime.
14. Elastic expansion
Elastic expansion refers to dynamically expanding a deployed cluster online. An elastic expansion system can, according to the actual business environment and certain strategies, automatically add more nodes (including storage, compute, and network nodes) to increase system capacity, improve system performance, or enhance system reliability, or to achieve all three goals at once.
15. State synchronization/frame synchronization
State synchronization: State synchronization means that the server is responsible for computing all game logic and broadcasting the results of those computations, while the client is only responsible for sending the player's operations and rendering the game results it receives.
Features: State synchronization is highly secure, logic updates are convenient, and disconnection and reconnection are fast, but development efficiency is lower. Network traffic grows with the complexity of the game, and the server bears greater pressure.
Frame synchronization: The server only forwards messages without any logical processing. Each client has the same number of frames per second and processes the same input data in each frame.
Features: Frame synchronization needs to ensure that the system has the same output under the same input. Frame synchronization has high development efficiency, low traffic consumption and stability, and puts very little pressure on the server. However, the network requirements are high, the disconnection and reconnection time is long, and the client computing pressure is high.
# Network Communication
1. Connection pool
Establish a buffer pool of connections in advance and provide a set of strategies for connection use, allocation, and management, so that the connections in the pool can be reused efficiently and safely, avoiding the overhead of frequently establishing and closing connections.
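A simplified sketch of the mechanism, assuming a generic `create_connection()` factory (illustrative, not a specific database driver): connections are created up front, handed out from a queue, and returned for reuse instead of being closed.

```python
import queue

class ConnectionPool:
    """Pre-create connections and reuse them instead of opening new ones."""

    def __init__(self, create_connection, size: int = 5):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):                     # build the buffer pool up front
            self._pool.put(create_connection())

    def acquire(self, timeout: float = 5.0):
        return self._pool.get(timeout=timeout)    # blocks if all connections are in use

    def release(self, conn) -> None:
        self._pool.put(conn)                      # return for reuse, do not close

# Usage with a stand-in connection factory:
pool = ConnectionPool(create_connection=lambda: object(), size=3)
conn = pool.acquire()
try:
    pass  # ... use the connection ...
finally:
    pool.release(conn)
```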
2. Disconnection and reconnection
Due to network fluctuation, a user may be intermittently disconnected from the server. After the network recovers, the server tries to reconnect the user and restore the state and data from the moment of the last disconnection.
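One common way to implement the client side of this is to retry the connection with exponential backoff after the link drops; a minimal sketch, with `connect_fn` standing in for whatever transport the application actually uses:

```python
import random
import time

def reconnect(connect_fn, max_attempts: int = 5):
    """Retry connect_fn with exponential backoff plus jitter after a network drop."""
    for attempt in range(max_attempts):
        try:
            return connect_fn()            # on success, the caller restores session state
        except ConnectionError:
            delay = min(2 ** attempt, 30) + random.random()
            time.sleep(delay)              # back off a little longer before each retry
    raise RuntimeError("could not reconnect, giving up")

# A flaky stub stands in for the real transport connect (socket, websocket, ...):
state = {"tries": 0}
def flaky_connect():
    state["tries"] += 1
    if state["tries"] < 3:
        raise ConnectionError("network still down")
    return "session restored"

print(reconnect(flaky_connect))            # succeeds on the third attempt
```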
3. Session persistence
Session persistence refers to a mechanism on the load balancer that can identify the correlation between interactions of a client and a server, and ensures, while performing load balancing, that a series of related access requests are distributed to the same machine. In plain terms: multiple requests initiated during one session all land on the same machine.
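A toy sketch of one sticky-routing strategy, hashing a session identifier to a fixed backend; real load balancers typically do this with cookies or source-IP hashing, and the backend names here are made up:

```python
import hashlib

BACKENDS = ["app-1", "app-2", "app-3"]          # illustrative backend names

def backend_for(session_id: str) -> str:
    """Map a session id to the same backend on every request."""
    digest = hashlib.md5(session_id.encode()).hexdigest()
    return BACKENDS[int(digest, 16) % len(BACKENDS)]

# All requests carrying the same session id land on the same machine:
print(backend_for("user-42-session"))   # same result every time it is called
print(backend_for("user-42-session"))
```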
4. Long connection/short connection
This usually refers to long and short TCP connections. A long connection means that after a TCP connection is established, the connection is kept alive; the two sides generally exchange heartbeats to confirm that the peer still exists, multiple business data transfers take place over the connection, and it is normally not closed proactively. A short connection generally means establishing a connection, executing one transaction (such as an HTTP request), and then closing the connection.
5. Flow control/congestion control
Flow control prevents the sender from sending so fast that it exhausts the receiver's resources and leaves the receiver no time to process the data.
Congestion control prevents the sender from sending messages too fast, causing the network to have no time to process and causing congestion, which in turn causes the performance of this part and even the entire network to decline. In severe cases, it may even cause network communication services to come to a standstill.
6. The thundering herd effect
The thundering herd effect is also called the thundering group effect. In short, the thundering herd phenomenon occurs when multiple processes (or threads) are blocked waiting for the same event (in a sleep state). When the awaited event occurs, all of the waiting processes (or threads) are woken up, but in the end only one of them may obtain "control" of the event and handle it; the others fail to obtain control and can only go back to sleep. This phenomenon, and the performance it wastes, is called the thundering herd.
7. NAT
NAT (Network Address Translation) replaces the address information in IP packet headers. NAT is usually deployed at an organization's network egress to provide public network reachability and upper-layer protocol connectivity by translating internal network IP addresses into the egress IP address.
# Faults and Exceptions
1. Downtime
Downtime generally refers to a computer host crashing due to an unexpected failure. More loosely, situations such as a database deadlock, or certain services on a server hanging, can also be described as downtime.
2. coredump
When a program errors and is interrupted abnormally, the OS stores the program's current working state in a core dump file. A core dump usually contains the memory contents, register state, stack pointer, memory management information, and so on of the program at the time it was running.
3. Cache Penetration/Breakdown/Avalanche
Cache penetration: Cache penetration refers to querying data that definitely does not exist. Because the cache misses, the query goes to the database; because the data cannot be found there either, nothing is written back to the cache. As a result, every request for this non-existent data goes to the database, putting pressure on it.
Cache breakdown: Cache breakdown refers to a hot key expiring at a certain point in time while there are a large number of concurrent requests for that key, causing a flood of requests to hit the database.
Cache avalanche: Cache avalanche refers to a large batch of cached data expiring at the same time, so that the huge volume of queries puts excessive pressure on the database or even brings it down.
The difference from cache breakdown is that breakdown is the expiry of a single hot key, while an avalanche is the simultaneous expiry of a large number of keys.
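A small sketch of two common mitigations, assuming a dict-like cache and a `load_from_db` stand-in for the real query (both illustrative): caching a "not found" marker guards against penetration, a lock ensures only one request rebuilds an expired hot key (breakdown), and randomized expiry times help avoid avalanches.

```python
import random
import threading
import time

cache = {}                      # key -> (value, expires_at); stand-in for Redis etc.
rebuild_lock = threading.Lock() # only one request rebuilds an expired key
MISSING = object()              # sentinel cached for non-existent data

def load_from_db(key):
    """Stand-in for the real database query; returns None if the data is absent."""
    return None

def get(key):
    now = time.time()
    entry = cache.get(key)
    if entry and entry[1] > now:                      # cache hit, still valid
        return None if entry[0] is MISSING else entry[0]

    with rebuild_lock:                                # breakdown: one rebuilder at a time
        entry = cache.get(key)                        # re-check after acquiring the lock
        if entry and entry[1] > time.time():
            return None if entry[0] is MISSING else entry[0]
        value = load_from_db(key)
        ttl = 300 + random.randint(0, 60)             # jitter the TTL to avoid avalanches
        if value is None:
            cache[key] = (MISSING, time.time() + 60)  # penetration: cache the miss briefly
            return None
        cache[key] = (value, time.time() + ttl)
        return value
```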
4. 500/501/502/503/504/505
500 Internal Server Error: Internal service error. Generally, the server encounters an unexpected situation and cannot complete the request. Possible reasons:
1. Program error, such as ASP or PHP syntax error;
2. Hitting system resource limits under high concurrency, for example by opening too many files.
501 Not implemented: The server does not understand or support the requested HTTP request.
502 Bad Gateway: A fault at the web server, possibly due to an insufficient number of backend processes. The request reached php-fpm and started executing, but for some reason did not finish, eventually causing the php-fpm process to terminate. Possible reasons:
1. On an Nginx server, the number of php-cgi processes is insufficient;
2. PHP execution time is too long;
3. The php-cgi process died;
503 Service Unavailable: The server is currently unavailable, for example because it is under maintenance and temporarily unable to process the client's request. This is only a temporary state; you can contact the server provider.
504 Gateway Timeout: A 504 error indicates a timeout, meaning that the request sent by the client did not get through the gateway to a php-fpm worker that could execute it in time. This is generally related to the nginx.conf configuration.
505 HTTP Version Not Supported: The server does not support the HTTP protocol version used in the request.
Except for the 500 error, which may be a programming language error, the rest of the errors can probably be understood as problems with the server or server configuration.
5. Memory overflow/memory leak
Memory overflow: Memory overflow (Out Of Memory, OOM) means that when a program requests memory, there is not enough memory available for it. For example, if you are given storage space sized for an int but store a long in it, the space is insufficient and an OOM error is reported; this is what is meant by memory overflow.
Memory leak: A memory leak refers to dynamically allocated heap memory in a program that is not released, or cannot be released, for some reason, resulting in wasted system memory, slowing the program down, and in severe cases even causing the system to crash.
6. Handle leakage
Handle leakage means that a process does not release file handles after opening files through system calls. Typical symptoms of a handle leak are the machine slowing down, CPU usage surging, and increased CPU usage of the cgi or server process in which the leak occurs.
7. Deadlock
Deadlock refers to a blocking phenomenon in which two or more threads compete for resources or wait on each other during execution; without external intervention, they all remain blocked and cannot proceed. The system is then said to be in a deadlock state, or to have a deadlock.
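The classic illustration is two threads acquiring the same two locks in opposite order. In this sketch the two worker threads deadlock deliberately, and the main thread reports it after a timeout; the usual fix is to always acquire locks in the same global order.

```python
import threading
import time

lock_a = threading.Lock()
lock_b = threading.Lock()

def worker_1():
    with lock_a:                 # holds A ...
        time.sleep(0.1)
        with lock_b:             # ... and waits for B
            print("worker 1 done")

def worker_2():
    with lock_b:                 # holds B ...
        time.sleep(0.1)
        with lock_a:             # ... and waits for A -> deadlock
            print("worker 2 done")

# Both threads block forever, each waiting for the lock the other one holds.
t1 = threading.Thread(target=worker_1, daemon=True)
t2 = threading.Thread(target=worker_2, daemon=True)
t1.start(); t2.start()
t1.join(timeout=2); t2.join(timeout=2)
print("still deadlocked:", t1.is_alive() and t2.is_alive())
```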
8. Soft interrupt/hard interrupt
Hard interrupt: What we usually call an interrupt is a hard interrupt (hardirq), used mainly to notify the operating system of changes in the state of system peripherals.
Soft interrupt: Soft interrupts are usually requested by the kernel from within a hard-interrupt service routine. To meet real-time requirements, interrupt handling should be as fast as possible. To achieve this, Linux splits the work: when an interrupt occurs, the hard interrupt handles only the work that can be completed quickly, and the longer-running work of processing the event is deferred until after the interrupt, to be completed by a soft interrupt (softirq).
9. Glitch
A glitch is a brief moment in which server performance indicators (such as traffic, disk IO, or CPU usage) are far higher than in the periods immediately before and after. Glitches indicate that server resources are being used unevenly and insufficiently, which can easily lead to other, more serious problems.
10. Replay attack
A replay attack is one in which the attacker sends a packet that the destination host has already received, in order to deceive the system. It mainly targets the identity-authentication process and undermines the correctness of authentication. It is a type of attack that continuously, maliciously, or fraudulently repeats a valid data transmission; it can be carried out by the initiator or by an adversary who intercepts and retransmits the data. Attackers use network sniffing or other means to steal authentication credentials and then resend them to the authentication server.
11. Network Island
A network island refers to a situation in a cluster environment in which some machines lose network connectivity with the rest of the cluster, splitting off into a small cluster of their own and causing data inconsistency.
12. Data skew
In a cluster system, the cache is generally distributed, i.e. different nodes are responsible for a certain range of cached data. If the cached data is not dispersed evenly enough, a large amount of it ends up concentrated on one or a few service nodes; this is called data skew. Generally speaking, data skew is caused by a poor load balancing implementation.
13. Split-brain
Split-brain refers to a system split caused by network unreachability between some of the nodes in a cluster. The resulting small clusters each provide service according to their own state, so the original cluster gives inconsistent responses at the same time, leading to nodes competing for resources, system chaos, and even data corruption.
# Monitoring and Alerting
1. Service monitoring
The main purpose of service monitoring is to detect problems accurately and quickly when they occur or are about to occur, and to keep the scope of impact small. Service monitoring can be done in many ways, which can be divided by level:
- System layer (CPU, network status, IO, machine load, etc.)
- Application layer (process status, error log, throughput, etc.)
- Business layer (service/interface error code, response time)
- User layer (user behavior, public opinion monitoring, front-end tracking)
2. Full link monitoring
Service dial testing: Service dial testing is a monitoring method for detecting the availability of services (applications). Dial-testing nodes probe the target service periodically, measuring mainly availability and response time. There are usually multiple dial-testing nodes in different locations.
Node detection: Node detection is a monitoring method for discovering and tracking the availability and smoothness of the network between nodes in different machine rooms (data centers). It is measured mainly by response time, packet loss rate, and hop count, and the detection method is generally ping, mtr, or another private protocol.
Alarm filtering: Filter out certain predictable alarms and exclude their data from alarm statistics, such as HTTP 500 errors caused by a small number of crawler visits, or custom exception information from the business system.
Alarm deduplication: When an alarm is notified to the person in charge, the same alarm will not be received again until the alarm is restored.
Alarm suppression: To reduce interference caused by system jitter, suppression is also needed. For example, a momentary high load on a server may be normal; only a sustained high load over a period of time needs attention.
Alarm recovery: Development/operation and maintenance personnel not only need to receive alarm notifications, but also need to receive notifications that the fault has been eliminated and the alarm has returned to normal.
Alarm merging: Merge multiple identical alarms generated at the same time. For example, if several sub-service load alarms appear in a microservice cluster at the same time, they need to be merged into a single alarm.
Alarm convergence: Sometimes when an alarm occurs, it is often accompanied by other alarms. At this time, you can only generate an alarm for the root cause, and other alarms will be converged into sub-alarms and notifications will be sent together. For example, when a cloud server has a CPU load alarm, it is often accompanied by availability alarms for all systems it carries.
Fault self-healing: Real-time detection of alarms, pre-diagnosis and analysis, automatic fault recovery, and opening up surrounding systems to achieve a closed loop of the entire process.
# Service Governance
1. Microservices
Microservice architecture is an architectural pattern that advocates dividing a single application into a set of small services that coordinate and cooperate with each other to deliver ultimate value to users. Each service runs in its own independent process, and services communicate with each other using a lightweight mechanism (usually a RESTful API over HTTP). Each service is built around a specific business capability and can be deployed independently to production, production-like environments, and so on.
2. Service discovery
Service discovery means using a registration center to record information about all the services in a distributed system so that other services can quickly find these registered services. Service discovery is the core module supporting large-scale SOA and microservice architectures, and it should be as highly available as possible.
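A minimal in-memory sketch of the registry idea; real systems such as Consul, etcd, ZooKeeper, or Eureka add health checks, replication, and watch notifications, and the service names and addresses below are made up.

```python
import random
from collections import defaultdict

class Registry:
    """Toy service registry: services register instances, clients look them up."""

    def __init__(self):
        self._services = defaultdict(set)

    def register(self, name: str, address: str) -> None:
        self._services[name].add(address)          # called by a service instance on startup

    def deregister(self, name: str, address: str) -> None:
        self._services[name].discard(address)      # called on shutdown or failed health check

    def lookup(self, name: str) -> str:
        instances = list(self._services[name])
        if not instances:
            raise LookupError(f"no registered instance of {name}")
        return random.choice(instances)            # simple client-side load balancing

registry = Registry()
registry.register("order-service", "10.0.1.5:9000")
registry.register("order-service", "10.0.1.6:9000")
print("calling", registry.lookup("order-service"))
```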
3. Traffic Peak Shaving
If you look at the request-monitoring curve of a lottery or flash-sale system, you will find that such systems show a sharp peak while the activity is open; when it is not open, the system's request volume and machine load are relatively stable. To save machine resources, we cannot permanently provision the maximum capacity needed to support short-lived peaks. Therefore, technical means are needed to weaken the instantaneous request peak and keep the system's throughput controllable under peak load. Peak shaving can also be used to eliminate glitches and make server resource utilization more balanced and fuller. Common peak-shaving strategies include queuing, rate limiting, hierarchical filtering, and multi-level caching.
4. Version compatibility
When upgrading a version, you need to consider whether the upgraded version's new data structures can understand and parse the old data, and whether the newly modified protocol can understand the old protocol and make appropriate, expected decisions. This requires designing for version compatibility during service design.
5. Overload protection
Overload means the current load has exceeded the system's maximum processing capacity. Overload causes some services to become unavailable and, if handled poorly, is very likely to make the service completely unavailable or even trigger an avalanche. Overload protection is a measure taken against this abnormal situation to prevent the service from becoming completely unavailable.
6. Service fuse
A service fuse works much like a household fuse: when a service becomes unavailable or its responses time out, calls to that service are temporarily stopped in order to prevent the whole system from avalanching.
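A simplified circuit-breaker sketch of this idea (the states, thresholds, and timeouts are illustrative choices, not a specific library's API): after enough consecutive failures the breaker "opens" and calls fail fast; after a cooldown it lets a trial call through.

```python
import time

class CircuitBreaker:
    """Toy circuit breaker: open after N consecutive failures, retry after a cooldown."""

    def __init__(self, max_failures: int = 5, reset_timeout: float = 30.0):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None                      # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")   # stop calling the broken service
            self.opened_at = None                  # half-open: allow one trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.time()       # trip the breaker
            raise
        self.failures = 0                          # a success closes the circuit again
        return result
```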
7. Service downgrade
Service downgrade means that when server pressure rises sharply, some services and pages are strategically degraded based on current business conditions and traffic, in order to free server resources and ensure that core tasks keep running normally. Downgrading is often divided into levels, with different handling for different severity levels.
Depending on the service method: you can refuse service, you can delay service, and sometimes you can provide random service.
Depending on the service scope: a certain function can be cut off, or certain modules can be cut off.
In short, service downgrade requires different downgrade strategies for different business needs; the main point is that a degraded service, though impaired, is better than nothing.
8. Service fuse vs. service downgrade
Similarities: the goals are the same; both start from availability and reliability and aim to prevent the system from crashing, and the user experience is similar, in that users ultimately find certain functions temporarily unavailable.
Differences: the triggering causes are different. A service fuse is generally triggered by the failure of a particular (downstream) service, while service downgrade is generally decided based on the overall load.
9. Service rate limiting
Rate limiting can be considered a form of service downgrade. It limits the system's input and output traffic in order to protect the system. Generally, a system's throughput can be measured; to keep the system running stably, once the configured threshold is reached, the traffic must be restricted by taking measures such as delaying processing, rejecting processing, or rejecting part of the processing.
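One common way to implement this is a token bucket; a minimal sketch, with the rate and capacity chosen only for illustration:

```python
import time

class TokenBucket:
    """Allow roughly `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens in proportion to the time elapsed since the last call.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1           # consume a token: request accepted
            return True
        return False                   # no tokens left: reject or delay the request

bucket = TokenBucket(rate=5, capacity=10)    # ~5 requests/second, bursts of up to 10
for i in range(12):
    print(i, "accepted" if bucket.allow() else "rejected")
```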
10. Fault masking
Remove the faulty machine from the cluster to ensure that new requests are not distributed to the faulty machine.
# Testing Methods
1. Black box/white box testing
Black-box testing does not consider the internal structure or logic of the program; it is mainly used to test whether the system's functionality meets the requirement specification. Generally there is an input value, an output value, and an expected value to compare against.
White-box testing is mainly used in the unit-testing phase, mainly for code-level testing of the program's internal logical structure. Testing methods include statement coverage, decision coverage, condition coverage, path coverage, and condition-combination coverage.
2. Unit/Integration/System/Acceptance Testing
Software testing is generally divided into 4 stages: unit testing, integration testing, system testing, and acceptance testing.
Unit testing: Unit testing is to check and verify the smallest verifiable unit in the software, such as a module, a process, a method, etc. Unit testing has the smallest granularity and is generally tested by the development team using a white-box approach. It mainly tests whether the unit conforms to the "design".
Integration testing: Integration testing is also called assembly testing. It usually tests all program modules in an orderly and incremental manner based on unit testing. Integration testing is between unit testing and system testing, playing a "bridge role". Generally, the development team uses a white box plus black box approach to test, which not only verifies the "design" but also verifies the "requirements".
System testing: During system testing, the software that has passed integration testing is treated as part of the computer system and combined with the other parts of the system for a series of strict and effective tests in the actual operating environment, in order to discover potential problems with the software and ensure the normal operation of the system. System testing has the largest granularity and is generally performed by an independent testing team using a black-box approach; it mainly tests whether the system meets the "requirement specification".
Acceptance testing: Acceptance testing, also called delivery testing, is a formal test conducted against user needs and business processes to determine whether the system satisfies the acceptance criteria; users, customers, or another authorized body decide whether to accept the system. Acceptance testing is similar to system testing; the main difference is who performs it, as acceptance testing is performed by users.
3. Regression testing
After defects are found and corrected, or new functions are added to the software, the software is retested. Regression testing checks whether the discovered defects have indeed been corrected and whether the modifications have introduced new problems.
4. Smoke testing
This term comes from the hardware industry. After making changes or repairs to a piece of hardware or hardware components, apply power directly to the device. If there is no smoke, the component passes the test. In software, the term "smoke testing" describes the process of validating code changes before embedding them into the product's source tree.
Smoke testing is a rapid basic function verification strategy for software version packages during the software development process. It is a means of confirming and verifying the basic functions of the software, and is not an in-depth test of the software version package.
For example, in a smoke test of a login system, we only need to test a correct username and password to verify the core function point of logging in; things like the input box and special characters can be tested after the smoke test.
5. Performance testing
Performance testing is to test various performance indicators of the system by simulating a variety of normal, peak and abnormal load conditions through automated testing tools. Both load testing and stress testing are performance tests, and the two can be combined.
Through load testing, the performance of the system under various workloads is determined. The goal is to test the changes in various performance indicators of the system when the load gradually increases.
Stress testing is a test that determines the bottlenecks or unacceptable performance points of a system to obtain the maximum service level that the system can provide.
6. Benchmark test
A benchmark (Benchmark) is also a kind of performance test, used to measure the maximum actual operating performance of the machine's hardware and the performance gains from software optimization. It can also be used to identify CPU or memory efficiency problems in a particular piece of code. Many developers use benchmarks to test different concurrency patterns, or to help configure the size of worker pools so as to maximize the system's throughput.
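As a small illustration, Python's built-in `timeit` can serve as a micro-benchmark for comparing two implementations of the same operation; the two string-building functions below are only examples.

```python
import timeit

def concat_with_plus(n: int = 1000) -> str:
    s = ""
    for i in range(n):
        s += str(i)                               # repeated string concatenation
    return s

def concat_with_join(n: int = 1000) -> str:
    return "".join(str(i) for i in range(n))      # usually faster

# Run each candidate many times and compare the measured wall-clock time.
print("+=  :", timeit.timeit(concat_with_plus, number=2000))
print("join:", timeit.timeit(concat_with_join, number=2000))
```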
7. A/B test
An A/B test compares two or more randomly assigned sample groups of similar size. If the difference between the experimental group and the control group is statistically significant on the target metric, it shows that the experimental group's feature leads to the result you want, thereby helping you verify your hypothesis or make a product decision.
8. Code coverage testing
Code coverage is a measurement in software testing that describes the proportion and extent of a program's source code exercised by the tests; the resulting proportion is called code coverage. In unit testing, code coverage is often used as an indicator of test quality, and is sometimes even used to assess the completion of testing tasks, for example requiring coverage to reach 80% or 90%. As a result, testers sometimes go to great lengths to design test cases just to cover code.
# Release and Deployment
1. DEV/PRO/FAT/UAT
DEV (Development environment): the development environment, used by developers for debugging; versions change frequently.
FAT (Feature Acceptance Test environment): Functional acceptance test environment, used by software testers for testing.
UAT (User Acceptance Test environment): User acceptance test environment, used for functional verification in the production environment, and can be used as a pre-release environment.
PRO (Production environment): Production environment, formal online environment.
2. Grayscale release
Grayscale release refers to upgrading product features for some users first during a version upgrade, via partition control, whitelist control, and so on, while the remaining users stay on the old version. If the users with the upgraded features report no problems after a period of time, the scope is gradually expanded until the new version's features are eventually rolled out to all users. Grayscale release helps keep the overall system stable: problems can be discovered and fixed during the initial grayscale phase, which limits their impact.
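A tiny sketch of the partition/whitelist idea: whether a given user sees the new feature is decided by a whitelist plus a stable hash of the user id against the current rollout percentage. All names and numbers here are illustrative, not from any particular release system.

```python
import hashlib

WHITELIST = {"qa-account", "internal-tester"}   # always get the new version
ROLLOUT_PERCENT = 10                            # gradually raised toward 100

def use_new_version(user_id: str) -> bool:
    if user_id in WHITELIST:
        return True
    # Stable hash: the same user always falls into the same bucket (0-99),
    # so raising ROLLOUT_PERCENT only adds users and never flip-flops them.
    bucket = int(hashlib.sha1(user_id.encode()).hexdigest(), 16) % 100
    return bucket < ROLLOUT_PERCENT

for uid in ["qa-account", "user-123", "user-456"]:
    print(uid, "->", "new" if use_new_version(uid) else "old")
```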
3. Rollback
Rollback refers to the act of restoring a program or data to the last known-correct state (or the previous stable version) when a program error or data-processing error occurs.