Review of Pinduoduo's technical accident, what should programmers learn?

青灯夜游 (repost), 2019-01-22 13:28:03

Every accident forces the technical team to grow. No one can guarantee that they will never write a bug or make a mistake. What we must do after an accident is find the root cause and plug the holes quickly to stop the losses.

In the early morning of January 20, 2019, some netizens reported a major bug in Pinduoduo: users could claim and spend 100-yuan no-threshold coupons at will. Word spread quickly, and people got up in the middle of the night to collect coupons. Some users claimed thousands of them. Quick-witted users spent the coupons as fast as possible, for example by topping up China Mobile phone credit.

Pinduoduo responded early that morning: "A gray- and black-market gang exploited a loophole in an expired coupon to steal tens of millions of yuan in platform coupons and profit illegally. In response, the platform repaired the loophole as soon as possible and is tracing the orders involved. We have also reported the case to the public security authorities and will actively cooperate with the relevant departments to crack down on the gangs involved."

Subsequently, a Pinduoduo spokesperson said that the actual final asset loss may be less than 10 million yuan.

The incident set the technology community abuzz, perhaps because of rumors that "a single bug may cost the company 20 billion."

As a programmer, what concerns me more is where this bug came from. Market rumors offer the rough clues below. Their authenticity has yet to be verified, and they may not reflect what really happened, but that does not prevent us from using them to explore the lessons of this accident.

● Many people have made hundreds of thousands of dollars;

● This coupon is a test coupon;

● The system automatically launches the test coupon in the early morning;

● Operations staff found that system traffic had surged past its alert threshold;

● The person involved took the test coupon offline manually;

● The test coupon that had been taken offline manually came back online at 8 a.m.

These clues seem to explain reasonably well how a major bug was deployed and kept running. Beyond the design problems of the coupon itself, they also reveal chaotic operation and maintenance. How could a test coupon go online? Once the system was out of control, why were there no follow-up risk prevention measures? How could a test coupon that had been taken offline manually come back online? Why are coupons brought online and taken offline so carelessly? If a software system is operated and maintained at this level, problems will occur sooner or later. If none have occurred yet, that is only luck.

Coupon design issues

The first clue that catches our attention is "Many people have made hundreds of thousands of dollars." This means that one person could claim thousands of coupons. That many people did so shows that the technical barrier to claiming no-threshold coupons was extremely low.

Ordinary coupons, like discount vouchers, come with a usage threshold, such as 20 yuan off a purchase of 100 yuan. No-threshold coupons, as the name suggests, have no usage threshold. A 100-yuan coupon buys 100 yuan of goods, which makes it almost equivalent to cash. Because no-threshold coupons are so close to cash, they differ significantly from ordinary coupons.

Set aside the deliberate coupon abusers for a moment. Pinduoduo claims to have 300 million users. If each user legitimately claimed one 100-yuan no-threshold coupon, that alone would cost 30 billion yuan. Account registration has almost no barrier. If WeChat's 1 billion users each legitimately registered, claimed a no-threshold coupon, and topped up their mobile phones, the cost would be 100 billion yuan.

As of yesterday, Pinduoduo's market capitalization was close to US$23 billion, or roughly RMB 160 billion. A 100-yuan no-threshold coupon that anyone can claim at will looks like liquidating the company and handing the proceeds out to everyone. That is certainly not what a no-threshold coupon is meant to do.
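The arithmetic above can be checked with a short sketch. It uses only the figures quoted in the article; the function name `exposure_yuan` is made up for illustration.

```python
# Back-of-the-envelope exposure, using only figures quoted in the article.
COUPON_VALUE_YUAN = 100


def exposure_yuan(accounts: int, coupons_per_account: int = 1) -> int:
    """Worst-case payout if every account redeems its coupons in full."""
    return accounts * coupons_per_account * COUPON_VALUE_YUAN


pinduoduo_users = 300_000_000        # "300 million users"
wechat_users = 1_000_000_000         # "1 billion users"
market_cap_yuan = 160_000_000_000    # "RMB 160 billion"

print(exposure_yuan(pinduoduo_users))                  # 30000000000
print(exposure_yuan(wechat_users))                     # 100000000000
print(exposure_yuan(wechat_users) / market_cap_yuan)   # 0.625
```

Even at one coupon per account, the worst case is more than half of the quoted market capitalization, and the incident reportedly involved thousands of coupons per account.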

A no-threshold coupon that can be claimed freely is not a sound commercial design, however you look at it. At the very least, there should be an upper limit on quantity: once the coupons have all been grabbed, they are gone. Giving away a few million yuan is an acceptable cost. Giving away tens of billions does not fit any normal business logic.

Generally speaking, even ordinary coupons come with many conditions. For example, an account can claim one only once, or only new accounts can claim it. Account authentication also needs many security measures, such as binding a mobile phone number, binding a device, and using secure connections. Account authentication and management are the most basic functions of a service website. If one person can claim thousands of coupons, the site has failed at the basics of account management. And if account management is not up to standard, the site's risks are worse than we imagine.
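The two limits discussed above, a total quantity cap and a per-account cap, can be sketched in a few lines. This is a minimal in-memory illustration, not Pinduoduo's implementation; the class `CouponCampaign` and its parameters are hypothetical, and a real system would enforce the same rules atomically in a database or cache.

```python
import threading


class CouponCampaign:
    """Hypothetical campaign enforcing a total budget and a per-account limit."""

    def __init__(self, total_budget: int, per_account_limit: int = 1):
        self._remaining = total_budget
        self._per_account_limit = per_account_limit
        self._claimed = {}              # account_id -> coupons already claimed
        self._lock = threading.Lock()   # keeps check-and-update atomic

    @property
    def remaining(self) -> int:
        return self._remaining

    def claim(self, account_id: str) -> bool:
        """Return True if a coupon was issued, False if either limit is hit."""
        with self._lock:
            if self._remaining <= 0:
                return False            # campaign budget exhausted
            if self._claimed.get(account_id, 0) >= self._per_account_limit:
                return False            # this account already has its share
            self._claimed[account_id] = self._claimed.get(account_id, 0) + 1
            self._remaining -= 1
            return True


campaign = CouponCampaign(total_budget=2, per_account_limit=1)
print(campaign.claim("alice"))   # True
print(campaign.claim("alice"))   # False: per-account limit reached
print(campaign.claim("bob"))     # True
print(campaign.claim("carol"))   # False: total budget exhausted
```

With both checks in place, the worst-case loss is bounded by the budget no matter how many accounts show up.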

Account management this weak and business design this poor are so far from normal logic that they are hard to believe. I am therefore inclined to accept that this was only a test coupon that should never have appeared in a production system. And if test coupons really were the problem, that exposes chaos in software development and in operation and maintenance.

Chaotic R&D and Operation and Maintenance

A test coupon went online automatically and, after being taken offline manually, went online again automatically. This launch process is hard to believe. Releasing a feature must go through design, implementation, review, testing, and approval before it reaches the production system. If even one of these links had worked, the test coupon would never have gone online, let alone automatically.
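Two of the rumored failures, a test coupon reaching production and a manually withdrawn coupon being republished by an automatic job, can each be blocked by a simple guard. The sketch below is an illustration under assumptions: the `Coupon` fields, the `"production"` environment label, and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Coupon:
    coupon_id: str
    is_test: bool = False
    manually_disabled: bool = False   # sticky flag set by a human takedown
    online: bool = False


def can_publish(coupon: Coupon, environment: str) -> bool:
    """Guard that every publish path, manual or scheduled, must pass."""
    if coupon.manually_disabled:
        return False                  # a human takedown outranks any scheduler
    if coupon.is_test and environment == "production":
        return False                  # test coupons never reach production
    return True


def take_offline(coupon: Coupon) -> None:
    """Manual takedown: remember the decision so automation cannot undo it."""
    coupon.online = False
    coupon.manually_disabled = True


def scheduled_publish(coupon: Coupon, environment: str) -> bool:
    """What a timed job would call; it only publishes if the guard allows."""
    if can_publish(coupon, environment):
        coupon.online = True
    return coupon.online


promo = Coupon("promo-1")
print(scheduled_publish(promo, "production"))   # True
take_offline(promo)
print(scheduled_publish(promo, "production"))   # False: stays offline
```

The key design choice is that the takedown is recorded as state rather than as a one-off action, so a later scheduled run sees it and backs off.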

After launch, there must be continuous risk monitoring. If system traffic surges past its threshold, operation and maintenance should immediately receive the details of the anomaly, such as accounts being unusually active in the middle of the night or the coupon business suddenly booming, and should be able to cut off the risk in time.
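One simple form of such monitoring is a sliding-window counter that halts coupon issuance when the claim rate exceeds a threshold. The sketch below is a minimal illustration; the class `ClaimRateMonitor` and its thresholds are hypothetical, and timestamps are passed in explicitly so the behavior is easy to follow.

```python
from collections import deque


class ClaimRateMonitor:
    """Hypothetical circuit breaker: trips when claims in a window exceed a cap."""

    def __init__(self, max_claims: int, window_seconds: float):
        self.max_claims = max_claims
        self.window_seconds = window_seconds
        self._timestamps = deque()    # claim times inside the current window
        self.tripped = False          # once True, issuance stays halted

    def record_claim(self, now: float) -> bool:
        """Record a claim at time `now` (seconds); True means issuance may go on."""
        if self.tripped:
            return False
        self._timestamps.append(now)
        # Drop claims that have fallen out of the window.
        while now - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) > self.max_claims:
            self.tripped = True       # a real system would also page on-call staff
            return False
        return True


monitor = ClaimRateMonitor(max_claims=3, window_seconds=60)
print([monitor.record_claim(t) for t in (0, 1, 2, 3)])   # [True, True, True, False]
```

Once tripped, the breaker stays open until a person resets it, which matches the idea that a human should confirm the anomaly before issuance resumes.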

Clearly, putting reasonable constraints on operation and maintenance staff is something commercial companies must treat with care. How can an arbitrary test coupon be put straight into the production system? Software operation and maintenance demands special attention to risk management and control, especially when software quality has not kept up. Operation and maintenance accidents can sometimes even upend an entire industry.

In 2011, a digital certificate authority issued multiple digital certificates in Google's name. Google had never applied to this organization for a certificate. In other words, the holder of those certificates was not Google. The holder could impersonate Google's websites and steal user login information, including usernames and passwords. This meant that either the company's technical level was inadequate (a hacker attack) or its operation and maintenance was inadequate (certificates issued carelessly). The security issue was exposed in August 2011, and almost all the software giants immediately announced that they would block this organization's certificates. In September, this organization, which had had great market influence, declared bankruptcy.

But that was not the end. The troubles of the certificate issuance industry had only begun. People seemed to realize suddenly that information security cannot rely on certificate authorities. The security of a company with a market capitalization of US$400 billion cannot depend on the operational competence of a company with a market capitalization of US$4 billion. As a result, various new technologies emerged in the following years. If they are widely adopted, they will wipe out the entire certificate issuance industry, and that day is not far off. Today, life for certificate authorities is fairly bleak. Some have been sold and some have shut down.

Ride-sharing services, where safety incidents have occurred repeatedly, are similar in nature in the long run. By comparison, if all that is lost in an accident is money, the impact may be bearable. I hope money is all that will be lost here, but I am not optimistic.

Tracing the problem to its source, chaotic operation and maintenance is usually inseparable from software quality. For a serious software system, how it is built, deployed, operated, and authorized, and how crises are handled, must all be designed, implemented, and planned. Here, someone designed a no-threshold test coupon that could run in the production system and be claimed without limit. This exposes the poor quality of the software behind it and a loose, disorderly development process. We often say that excellent software comes from an excellent process. A chaotic R&D process can hardly produce high-quality products.

What should we do?

The ignorant are fearless. Security problems are special because, until something goes wrong, we may never know the problem exists, let alone how serious it is. No security crisis should go to waste. If we were the party involved, how could we avoid similar accidents?

The most urgent task is to pay off the account security debt as soon as possible. Otherwise, after this much publicity, many eyes will be drawn to this juicy target. The next wave of attacks may already be on its way.

Next, consider the following things as soon as possible.

The first thing to do is to standardize the R&D process. Under a first-class software development process, even the worst product cannot turn out too badly. In such a process, programmers cannot act alone. Requirements need to be discussed, designs need to be reviewed, features need to be reviewed, code needs to be reviewed, software needs to be tested, and the system needs a trial run. Everyone makes mistakes. A few more pairs of eyes on every step greatly reduce the chance of error. Programmers also grow quickly through such a process, which reduces errors further. A good system helps people succeed; a bad system can ruin them.

The second thing to do is to pay attention to code security. Code security is not only about hacker intrusion. The abuse of this no-threshold coupon is itself a serious security incident. At the code level, the causes may be insecure account management, unverified business logic, lax authorization management in operation and maintenance, and a failure to warn of abnormal risks in time.

The third thing to do is to walk through the business design in advance. Even without hackers, RMB 100 billion in no-threshold coupons is not a cost any commercial organization is willing to bear. This is a kindergarten-level mistake. If the business logic does not hold up, the software requirements are defective, and software built on defective requirements will be defective too.

The fourth thing to do is to improve the mechanism for product launch. Set checkpoints and pipelines for product launch. If any checkpoint fails, the pipeline will be interrupted and the product will not be launched online. These checkpoints include product trial operation, function approval, supporting monitoring measures, etc.
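A launch pipeline of this kind can be sketched as an ordered list of checkpoints that stops at the first failure. The checkpoint names mirror the ones listed above; the dictionary keys (`trial_run_passed`, `approved_by`, `monitoring_configured`) and the function `run_pipeline` are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

Release = Dict[str, object]
Checkpoint = Tuple[str, Callable[[Release], bool]]

# Ordered checkpoints; each inspects the release metadata.
CHECKPOINTS: List[Checkpoint] = [
    ("trial run passed", lambda r: r.get("trial_run_passed") is True),
    ("launch approved", lambda r: bool(r.get("approved_by"))),
    ("monitoring in place", lambda r: r.get("monitoring_configured") is True),
]


def run_pipeline(
    release: Release, checkpoints: List[Checkpoint] = CHECKPOINTS
) -> Tuple[bool, List[str], Optional[str]]:
    """Return (launched, passed checkpoint names, first failed name or None)."""
    passed: List[str] = []
    for name, check in checkpoints:
        if not check(release):
            return False, passed, name   # interrupt: nothing after this runs
        passed.append(name)
    return True, passed, None


test_coupon = {"trial_run_passed": True, "approved_by": "", "monitoring_configured": True}
print(run_pipeline(test_coupon))
# (False, ['trial run passed'], 'launch approved')
```

Because the pipeline is the only path to production, a release with no approver simply cannot be launched, whoever or whatever triggers it.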

The fifth thing to do is to improve the risk prevention capabilities of operation and maintenance. An e-commerce company with 300 million users and a market value of US$23 billion is a natural target for hackers. User privacy data and cash flow in particular are matters of life and death for the company. For a company of this size, excellent risk prevention is the most basic requirement, not a nice-to-have. Active risk detection, timely warning, and rapid response must all keep pace.

If the business design of the no-threshold coupon had been walked through, the software functions discussed, the implementation code reviewed, the launch rehearsed, and the anomaly flagged in time, any one of these links would have been enough to prevent the accident, or at least to keep it from being so painful.

I understand the competitive pressure that pushes a company to charge ahead at full speed and to trade quality for speed. But while we pursue speed today, we must also look a little further ahead. Otherwise, the debt that piles up behind us becomes a long tail we can never shake off.

Statement: This article is reproduced from the WeChat official account InfoQ. In case of infringement, please contact admin@php.cn for removal.
