Optimize disaster recovery deployment and remove operation and maintenance responsibilities

WBOY
2024-01-03 22:36:20
Local load balancing has largely solved the high-availability problem for server clusters. However, power outages, optical cables cut during construction, natural disasters, and similar events can still take an entire data center offline. In addition, China's network is made up of multiple operators, and the quality of interconnection between operators is widely acknowledged to be poor. As a result, large Internet companies are no longer satisfied with serving their websites from a single data center or an active-active pair. More and more of them are considering deploying multiple data center clusters across different regions and operators to achieve nearby user access, load balancing, and fault tolerance.

Multi-data-center deployment inevitably raises the following three problems.

1. How to distribute the traffic of multiple data centers?

2. How to detect network faults in time through monitoring?

3. How to provide disaster recovery for multiple data center services?

If these three problems are not solved effectively, the result is poor access quality, service black holes, and customer complaints. The operations staff behind the website are then challenged repeatedly by sales, PMs, and management, and end up taking the blame. Fortunately, Alibaba Cloud's Cloud Resolution DNS now helps small and medium-sized enterprises balance traffic across multiple data centers, serve users from the nearest site, detect faults promptly, and switch over for disaster recovery in real time.

Solving data center traffic load balancing

When deploying services in multiple data centers, you have to account for factors such as the different access bandwidth of each data center, the different load capacity of each server cluster, and operating costs. The traffic allocation ratio therefore has to be designed around these factors. So how can access traffic be allocated accurately? Cloud Resolution DNS offers some reference solutions.

Cloud Resolution DNS is a purpose-built intelligent DNS system. It can quickly identify the location information of an IP address (country, province, city, operator, and so on) and return different IP addresses for DNS queries from different sources, meeting enterprise needs for nearby access, reduced cross-network traffic, and grayscale releases. In addition, for data center clusters with different service capacities in the same location, the overall traffic distribution can be set through WRR (Weighted Round Robin).
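
As a rough illustration of source-based resolution, the sketch below maps the IP address of a querying DNS server to an access line. It is not how Cloud Resolution DNS is implemented: the line names, the `classify_line` function, and the address blocks (documentation ranges from RFC 5737) are all assumptions made for the example.

```python
import ipaddress

# Hypothetical mapping from resolver networks to access lines. The CIDR blocks
# are RFC 5737 documentation ranges, not real operator address space.
LINE_NETWORKS = {
    "north-china-telecom": [ipaddress.ip_network("192.0.2.0/24")],
    "east-china-unicom": [ipaddress.ip_network("198.51.100.0/24")],
}
DEFAULT_LINE = "default"


def classify_line(resolver_ip: str) -> str:
    """Return the access line whose networks contain resolver_ip."""
    addr = ipaddress.ip_address(resolver_ip)
    for line, networks in LINE_NETWORKS.items():
        if any(addr in net for net in networks):
            return line
    # Sources that match no configured line fall back to the default line.
    return DEFAULT_LINE


print(classify_line("192.0.2.10"))   # matches the first block
print(classify_line("203.0.113.5"))  # matches nothing, so the default line
```

A production system would use a full IP geolocation database rather than a handful of prefixes; the point here is only that the answer depends on where the query comes from.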

For example, the www website of example.com is served from six data centers: two on North China Telecom (3.3.3.3 and 4.4.4.4), two on East China Unicom (5.5.5.5 and 6.6.6.6), and two hosted in Alibaba Cloud BGP data centers (1.1.1.1 and 2.2.2.2).

1. The bandwidth ratio of the two East China Unicom data centers is 3:7. When setting up intra-line load balancing in Cloud Resolution DNS, set the weights of the two data centers' service IP addresses to 3 and 7 respectively, so that East China Unicom access traffic is split 30% and 70%;
2. The bandwidth ratio of the two North China Telecom data centers is 1:1. When setting up intra-line load balancing in Cloud Resolution DNS, set the weight of each data center's service IP address to 1, so that each receives 50% of North China Telecom access traffic;
3. The ratio of ECS instances in the two Alibaba Cloud BGP Regions is 8:2. When setting up intra-line load balancing in Cloud Resolution DNS, set the weights of the two Regions' public elastic IP addresses to 8 and 2 respectively, so that access traffic is split 80% and 20%;
4. Network monitoring checks the service IP of each data center in real time;
5. Network monitoring periodically reports the results to Cloud Resolution DNS;
6. A user sends a DNS query for www.example.com to the North China Telecom DNS;
7. If the North China Telecom DNS has no cached result for the domain name, it queries Cloud Resolution DNS;
8. When Cloud Resolution DNS receives the query from the North China Telecom DNS, it answers with 3.3.3.3 and 4.4.4.4 in rotation, so half of the answers the North China Telecom DNS obtains are 3.3.3.3 and the other half are 4.4.4.4. Likewise, when Cloud Resolution DNS receives queries from the East China Unicom DNS, it returns 5.5.5.5 three times in a row, then 6.6.6.6 seven times in a row, and repeats. As a result, 30% of the answers obtained by the East China Unicom DNS are 5.5.5.5 and the remaining 70% are 6.6.6.6;
9. After receiving the response from Cloud Resolution DNS, the North China Telecom DNS caches the result and returns it to the querying user;
10. In the end, 50% of North China Telecom users reach the website at 3.3.3.3 and the other 50% reach it at 4.4.4.4.
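
The weighted rotation described in step 8 can be sketched in a few lines. This is a minimal illustration that assumes the simple "repeat each address `weight` times per cycle" behavior the step describes; the function name `weighted_round_robin` is hypothetical and not part of any Alibaba Cloud API.

```python
from collections import Counter
from itertools import cycle, islice


def weighted_round_robin(weighted_ips):
    """Return an endless iterator that repeats each IP `weight` times per cycle."""
    schedule = [ip for ip, weight in weighted_ips for _ in range(weight)]
    return cycle(schedule)


# East China Unicom line from the example: weights 3 and 7.
east_china_unicom = [("5.5.5.5", 3), ("6.6.6.6", 7)]

# Simulate 100 consecutive answers (10 full cycles) and count each address.
answers = list(islice(weighted_round_robin(east_china_unicom), 100))
share = Counter(answers)
print(share["5.5.5.5"], share["6.6.6.6"])  # 30 70
```

Because recursive DNS servers cache answers, the split seen by end users only approaches the configured ratio over many queries and cache expirations.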

Network monitoring detects faults in time

1. Cloud Resolution DNS helps small and medium-sized enterprises achieve nearby access and traffic distribution through intelligent resolution and WRR. It also integrates Alibaba Cloud distributed monitoring, using dial test probes across the network to monitor the website's resolution records in real time.
2. Network monitoring in Cloud Resolution DNS currently supports HTTP/HTTPS and custom URLs. In addition to 5 of Alibaba's own dial test nodes, it uses 15 selected high-quality dial test nodes on the three major operators. Up to 50 monitoring tasks can be configured, which is well ahead of competitors, so that outages are discovered promptly and monitoring coverage is increased.
3. The monitoring interval can be as short as 1 minute. With 20 dial test nodes (5 + 15) each probing once per minute, this works out to a health check on your website roughly every 3 seconds. A fault can be detected within 3 minutes of the outage at the earliest, and failover is then completed through the global load balancing function.
4. To prevent false alarms, the outage threshold is set to 50%: a target is judged to be down only when 50% of the monitoring nodes report it as abnormal.
5. Of course, how quickly a DNS change takes effect also depends on the TTL cached by the operator's DNS. It is recommended to set the host record TTL to 60 seconds.
6. If you are a mobile developer, it is recommended to use Cloud Resolution DNS together with the Alibaba Cloud HTTPDNS service to make failover more responsive.
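
The threshold rule and the probe arithmetic in the list above can be expressed as a short sketch. It assumes that "50% of the nodes" means "at least 50%" and that the 20 nodes (5 + 15) probe evenly within each one-minute interval; `is_down` and `effective_check_interval` are illustrative names, not product functions.

```python
def is_down(probe_results, threshold=0.5):
    """Judge an outage from per-node results (True means the node saw a healthy site).

    Assumption: the target is down when the failed share reaches the threshold.
    """
    if not probe_results:
        return False  # no data: do not declare an outage
    failed = sum(1 for ok in probe_results if not ok)
    return failed / len(probe_results) >= threshold


def effective_check_interval(node_count, interval_seconds=60):
    """Average time between probes when nodes are spread evenly over the interval."""
    return interval_seconds / node_count


# 20 nodes probing once per minute give one probe roughly every 3 seconds.
print(effective_check_interval(20))             # 3.0
print(is_down([False] * 10 + [True] * 10))      # True: 50% of nodes failed
print(is_down([False] * 9 + [True] * 11))       # False: only 45% failed
```

Requiring agreement from half of the nodes trades a little detection speed for far fewer false alarms caused by a single probe's local network trouble.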

Switching between lines to achieve fault isolation

Fault Isolation
Failures will inevitably occur while a website is in service. So how can faults be isolated? Cloud Resolution DNS supports the following practice, which small and medium-sized enterprises can adopt.

1. One North China Telecom data center cluster, 4.4.4.4, suffers a large-scale failure. The website service is interrupted and user access fails;
2. Website monitoring detects the failure of the 4.4.4.4 cluster within 2 minutes and notifies Cloud Resolution DNS to suspend resolution of the North China Telecom IP address 4.4.4.4;
3. After suspending resolution of the faulty IP, Cloud Resolution DNS answers queries from the North China Telecom DNS only with 3.3.3.3. At the same time, the Cloud Resolution DNS resolution log records the failure time, IP address, and suspension operation, and your operations engineers are notified by SMS and email;
4. In the end, all user access traffic is shifted to the North China Telecom data center at 3.3.3.3.

Recovery
When the website is restored to service, how to easily migrate the traffic?
1. Once all North China Telecom user traffic has been migrated to 3.3.3.3, 4.4.4.4 is effectively offline, and you can have your engineers repair the faulty cluster;
2. After the repair is complete and testing passes, the monitoring system automatically detects that the website service at the North China Telecom data center 4.4.4.4 is back to normal and notifies Cloud Resolution DNS to resume resolution of 4.4.4.4;
3. When Cloud Resolution DNS receives queries from the North China Telecom DNS, it again answers with 3.3.3.3 and 4.4.4.4 in rotation. After a while, half of the answers obtained by the North China Telecom DNS are 3.3.3.3 and the other half are 4.4.4.4;
4. End-user traffic gradually returns to the originally configured 50% split, so traffic is restored smoothly without users noticing.
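
The fault isolation and recovery steps above amount to suspending a record and later restoring it, with the traffic split following the records that remain active. The sketch below models that under the assumption that traffic is shared in proportion to weight; the `LinePool` class and its methods are hypothetical and only illustrate the idea.

```python
class LinePool:
    """Weighted records for one access line, with suspend and restore."""

    def __init__(self, records):
        self.records = dict(records)  # ip -> weight
        self.suspended = set()

    def suspend(self, ip):
        # Called when monitoring reports the IP as down.
        if ip in self.records:
            self.suspended.add(ip)

    def restore(self, ip):
        # Called when monitoring reports the IP as healthy again.
        self.suspended.discard(ip)

    def traffic_share(self):
        """Expected share of answers per active IP, proportional to weight."""
        active = {ip: w for ip, w in self.records.items() if ip not in self.suspended}
        total = sum(active.values())
        return {ip: w / total for ip, w in active.items()} if total else {}


pool = LinePool([("3.3.3.3", 1), ("4.4.4.4", 1)])
pool.suspend("4.4.4.4")
print(pool.traffic_share())  # {'3.3.3.3': 1.0}
pool.restore("4.4.4.4")
print(pool.traffic_share())  # {'3.3.3.3': 0.5, '4.4.4.4': 0.5}
```

In practice the shift is gradual rather than instant, because recursive DNS servers keep serving cached answers until the record TTL expires.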

Off-site disaster recovery

For large Internet companies, one question that must be considered is how to keep user access working when a catastrophic failure occurs.
1. Due to force majeure, both North China Telecom access IP addresses, 3.3.3.3 and 4.4.4.4, fail and cannot be restored promptly;
2. Website monitoring detects the faults promptly and notifies Cloud Resolution DNS to suspend resolution of all IP addresses on the North China Telecom line;
3. After the suspension, Cloud Resolution DNS enables the inter-line load balancing policy, and DNS queries from North China Telecom users return the Alibaba Cloud BGP Region addresses 1.1.1.1 and 2.2.2.2;
4. In the end, the access traffic of all North China Telecom users is scheduled to the default line, the Alibaba Cloud BGP Regions at 1.1.1.1 and 2.2.2.2, so that service can still be provided to North China Telecom users even in extreme circumstances.
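
The inter-line fallback in the steps above can be sketched as follows. The IP addresses and line assignments come from the example in this article, while the `resolve` function, the line keys, and the rule "fall back only when a line has no healthy IP left" are assumptions made for illustration.

```python
LINES = {
    "north-china-telecom": ["3.3.3.3", "4.4.4.4"],
    "east-china-unicom": ["5.5.5.5", "6.6.6.6"],
    "default": ["1.1.1.1", "2.2.2.2"],  # Alibaba Cloud BGP Regions
}


def resolve(line, suspended, lines=LINES, default_line="default"):
    """Return the healthy IPs for a line, or the default line's IPs if none remain."""
    healthy = [ip for ip in lines.get(line, []) if ip not in suspended]
    if healthy:
        return healthy
    # Every IP on the requested line is suspended: switch to the default line.
    return [ip for ip in lines[default_line] if ip not in suspended]


# One cluster down: stay on the same line.
print(resolve("north-china-telecom", {"4.4.4.4"}))             # ['3.3.3.3']
# Both clusters down: fall back to the BGP default line.
print(resolve("north-china-telecom", {"3.3.3.3", "4.4.4.4"}))  # ['1.1.1.1', '2.2.2.2']
```

Keeping traffic on its own line for as long as one cluster is healthy avoids cross-operator paths, and the default line is used only as a last resort.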

Summary

Cloud Resolution DNS is a highly available, highly scalable authoritative DNS and DNS management service. It provides a variety of global load balancing strategies to help small and medium-sized enterprises route user requests to their data centers quickly and accurately. It also provides highly available disaster recovery switching, so that website services remain accessible even when some data centers fail.

In the future, Cloud Resolution DNS will be integrated with more Alibaba Cloud products, such as SLB, ECS, CDN, and Cloud Shield, forming a multi-layered high-availability website solution from the access entry point to back-end services and helping small and medium-sized enterprises achieve full-link load balancing.

Statement:
This article is reproduced from linuxprobe.com. If there is any infringement, please contact admin@php.cn to have it removed.