
A Different Look at the Seventh National Census from a Technical Perspective!

青灯夜游
2021-05-13 11:52:37 | 13941 views


After reading the Seventh National Census Bulletin, you will find that the whole process closely resembles the data analysis process in an enterprise. This article first looks at what the two have in common, and then at which information in the seventh census Internet practitioners should pay attention to.

The explanation below quotes passages from the "Seventh National Census Bulletin" and comments on each in turn.

Innovate the census content and census methods. The electronic data collection method is fully adopted, and the enumerators use electronic equipment to collect and report the data directly in real time

Collecting and reporting correspond, in an enterprise, to buried point reporting (what is usually called event tracking).

How should we understand buried points? A "point" is a specific location in an app or on a website. Burying a point at that location means that after the user performs some action there, the user's current information is recorded. The action can be browsing, clicking, swiping, and so on. For example, in an e-commerce app, when a user clicks the order button, the time of the order, the amount, the product ID, the mobile network status, the mobile operating system, and other information are recorded. This is a buried point.

A buried point can record any information, but three pieces are essential: time, place, and person. Time is when the behavior occurred and is used to analyze the sequence of user actions; place is the specific location on the current page where the behavior occurred; person is the user identifier, which is generally generated from the device information of the phone or PC. Other information is collected selectively according to the needs of data analysis.
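As a rough illustration of the record described above, here is a minimal Python sketch of a single buried point. The class name `TrackingEvent`, its field names, the helper `make_device_user_id`, and the sample values are all assumptions made for this example, not a real tracking SDK.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class TrackingEvent:
    """One buried point: time, place, and person, plus optional extras."""
    user_id: str   # person: identifier derived from device information
    page: str      # place: the page where the action happened
    element: str   # place: the position on that page
    action: str    # e.g. "browse", "click", "swipe"
    timestamp: float = field(default_factory=time.time)  # time of the action
    extra: dict = field(default_factory=dict)  # collected as analysis requires


def make_device_user_id(device_info: str) -> str:
    """Derive a stable identifier from device info (hypothetical scheme)."""
    return uuid.uuid5(uuid.NAMESPACE_DNS, device_info).hex


# The order-button example: all values below are made up.
event = TrackingEvent(
    user_id=make_device_user_id("android-13|pixel-7|a1b2c3"),
    page="product_detail",
    element="order_button",
    action="click",
    extra={"amount": 199.0, "product_id": "sku-1001",
           "network": "wifi", "os": "Android"},
)

# Serialized form that a client could send to the server.
payload = json.dumps(asdict(event))
```

The three essential fields are required by the constructor, while everything else goes into `extra`, which mirrors the idea that other information is collected selectively.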

After the information is collected, it is usually reported to the enterprise's server in real time for subsequent analysis. Based on these buried points, we can analyze what content the user browsed at what time, what they clicked on in the end, how long they viewed the clicked content, what they finally purchased, how much they spent, and so on. From this we can further infer what content the user prefers and what their spending power is, and then make personalized recommendations.
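To make this kind of analysis concrete, the following sketch summarizes a handful of reported events per user. The event layout (`user_id`, `ts`, `action`, `item`, `amount`) and the sample data are invented for illustration; a real pipeline would read from storage rather than an in-memory list.

```python
from collections import defaultdict

# Hypothetical reported events; "ts" is a timestamp in seconds.
events = [
    {"user_id": "u1", "ts": 100, "action": "browse", "item": "phone"},
    {"user_id": "u1", "ts": 130, "action": "click", "item": "phone"},
    {"user_id": "u1", "ts": 190, "action": "order", "item": "phone", "amount": 2999.0},
    {"user_id": "u2", "ts": 105, "action": "browse", "item": "book"},
    {"user_id": "u2", "ts": 140, "action": "order", "item": "book", "amount": 45.0},
]


def summarize(events):
    """Per user: what was browsed, the last item clicked, and total spend."""
    summary = defaultdict(lambda: {"browsed": [], "last_click": None, "spent": 0.0})
    # Process events in time order so "last_click" really is the latest click.
    for e in sorted(events, key=lambda e: e["ts"]):
        s = summary[e["user_id"]]
        if e["action"] == "browse":
            s["browsed"].append(e["item"])
        elif e["action"] == "click":
            s["last_click"] = e["item"]
        elif e["action"] == "order":
            s["spent"] += e.get("amount", 0.0)
    return dict(summary)


profile = summarize(events)
```

A summary like `profile` is the sort of input a recommendation step could start from, for example by treating total spend as a crude proxy for spending power.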

Make full use of Internet cloud technology, cloud services and cloud applications to complete data processing work

Buried point data is large in volume and needs to be stored for a long time. Therefore, after an enterprise's buried points are reported, they are generally stored on distributed storage, and subsequent data analysis is mostly done with a distributed computing framework. Distributed storage and computing are now mostly consumed as cloud services. A company I worked for originally bought its own servers to build distributed services. Because the operation and maintenance costs were unbearably high, it eventually moved to Alibaba Cloud, which saved a large part of those costs.

Distributed storage and computing frameworks can be open source, such as Hadoop, Hive, Spark, etc., or self-developed by enterprises, such as Alibaba Cloud's MaxCompute.

Carry out secure management of census data collection, transmission, and storage in accordance with the national network security level three protection standards to ensure the security of citizens’ personal information

This part is about personal information protection. In an enterprise, confidential user information such as ID numbers is desensitized: the ID number is encoded into a unique identifier, so that the data remains usable without leaking private information.
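One common way to turn an ID number into a unique identifier is a keyed hash. The sketch below uses HMAC-SHA256 from the Python standard library; this is an assumed technique chosen for illustration, not necessarily what any particular company uses, and the key and the ID number are placeholders.

```python
import hashlib
import hmac

# Placeholder key: in practice it would come from a secrets manager.
SECRET_KEY = b"replace-with-a-real-secret"


def desensitize(id_number: str, key: bytes = SECRET_KEY) -> str:
    """Map an ID number to a stable token that cannot be read back directly."""
    return hmac.new(key, id_number.encode("utf-8"), hashlib.sha256).hexdigest()


# A made-up ID number, not a real one.
token = desensitize("110101199001010000")
```

Because the same input always yields the same token, tables can still be joined and users counted on the token, while the original ID number never appears in the analysis data.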

In addition to desensitization, data must also be classified by confidentiality level, with a corresponding permission review mechanism. Anyone who uses data of a given confidentiality level must apply for the corresponding permission, and the access is recorded so that any information leak can be traced.
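A toy version of such a mechanism might look like the sketch below. The three level names, the `read_dataset` function, and the in-memory `audit_log` are assumptions for illustration only; a production system would rely on its platform's own access control and durable audit storage.

```python
from datetime import datetime, timezone

# Hypothetical confidentiality levels, ordered from least to most sensitive.
LEVELS = {"public": 0, "internal": 1, "confidential": 2}

# In-memory audit trail; every access attempt is appended here.
audit_log = []


def read_dataset(user, granted_level, dataset, dataset_level):
    """Allow the read only if the user's approved level covers the dataset."""
    allowed = LEVELS[granted_level] >= LEVELS[dataset_level]
    # Record both successful and denied attempts so leaks can be traced.
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "dataset": dataset,
        "allowed": allowed,
    })
    if not allowed:
        raise PermissionError(f"{user} may not read {dataset}")
    return f"rows of {dataset}"


rows = read_dataset("alice", "confidential", "orders_with_id_tokens", "confidential")
```

Logging the attempt before raising the error is a deliberate choice here: denied requests are often the most interesting entries when tracing a leak.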

Census agencies at all levels strictly implement quality control requirements and carefully carry out quality inspection to ensure the quality of work at all stages of the census

This part is about data quality monitoring. In an enterprise, monitoring the quality of buried points is also a key step. If the reported buried points are all wrong and unusable, they are obviously meaningless.

Enterprises generally monitor buried point quality in two ways. First, validate individual buried points: check whether each field of a reported buried point has the correct format, monitor the null rate of core fields, and so on. Second, monitor traffic: use year-on-year comparison to determine whether the volume of reported buried points is abnormal.
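Both kinds of check can be sketched in a few lines of Python. The field names, the `sku-<digits>` product ID format, and the 30% tolerance are assumptions picked for this example rather than recommended values.

```python
import re

# Assumed product ID format for this example.
PRODUCT_ID = re.compile(r"^sku-\d+$")


def validate_event(event):
    """Single-event check: return the names of fields with a bad format."""
    errors = []
    if not isinstance(event.get("timestamp"), (int, float)):
        errors.append("timestamp")
    if not event.get("user_id"):
        errors.append("user_id")
    pid = event.get("product_id")
    if pid is not None and not PRODUCT_ID.match(pid):
        errors.append("product_id")
    return errors


def null_rate(events, field_name):
    """Share of events whose core field is missing or empty."""
    if not events:
        return 0.0
    missing = sum(1 for e in events if e.get(field_name) in (None, ""))
    return missing / len(events)


def traffic_is_abnormal(current_count, baseline_count, tolerance=0.3):
    """Traffic check: flag a relative change beyond the tolerance."""
    if baseline_count == 0:
        return current_count > 0
    return abs(current_count - baseline_count) / baseline_count > tolerance
```

Here `baseline_count` stands for the volume from the comparable earlier period, such as the same day a year before; the function only compares two numbers and does not decide which period is the right baseline.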

The seventh national census comprehensively investigated the number, structure, distribution and other aspects of China's population, and grasped the trend characteristics of population changes. It provides accurate statistical information support for improving China's population development strategy and policy system, formulating economic and social development plans, and promoting high-quality economic development.

This part is the data analysis we are familiar with. In an enterprise, this means analyzing user behavior, drawing valuable conclusions, and providing decision support for iterating on the app or website.

Data analysis generally falls into two parts. One is numerical analysis, which can be simple numerical statistics or fitting, classification, and so on with machine learning in Python. When the data volume is large, distributed computing frameworks such as Hadoop and Spark are used. The other is text analysis, which relies more on machine learning and deep learning methods to uncover what numerical analysis cannot reveal.
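As a small taste of the numerical side, the sketch below computes simple statistics and a least-squares line using only the standard library. The order amounts and the daily order counts are invented numbers, and a real project would more likely reach for a dedicated numerical or machine learning library.

```python
from statistics import mean, median

# Invented order amounts for simple numerical statistics.
order_amounts = [45.0, 199.0, 2999.0, 89.0, 120.0]
average_amount = mean(order_amounts)
typical_amount = median(order_amounts)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


# Invented data: day number versus number of orders on that day.
days = [1, 2, 3, 4]
orders = [2, 4, 6, 8]
slope, intercept = fit_line(days, orders)
```

The median is included alongside the mean because a single large order, like the 2999.0 above, pulls the mean far from what a typical order looks like.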

One more thing to add. The age, gender, education, and similar information we see in the census is generally called a user portrait (user profile) in companies. This information cannot be collected through buried points, yet it is very important data for enterprises. It often has to be predicted from user behavior using machine learning and deep learning algorithms.

That concludes the first part, in which we used the census as an example to introduce the enterprise data analysis process and the technologies involved. Next, let's briefly discuss what we should pay attention to as Internet practitioners.

The quality of the population continues to improve, and new advantages in talent dividends will gradually emerge. At the same time, the employment pressure of college students is increasing, and the pace of industrial transformation and upgrading needs to be accelerated.

The white-collar workforce has long been in oversupply, and the "996" involution (cut-throat competition under long working hours) will remain intense. As a result, talent costs for high-tech enterprises fall, which is how "new advantages in talent dividends will gradually emerge."

As the saying goes, to forge iron you must be strong yourself: you still need to keep improving your real skills and knowledge.

The accelerated population agglomeration not only reflects the trend changes in urbanization and economic agglomeration, but also puts forward new requirements for improving the quality of urbanization and promoting coordinated regional development.

The inflow of population into big cities is accelerating, while the loss of rural population is accelerating.

China's urbanization process is not yet complete. For students who have not yet graduated, choosing a first-tier or new first-tier city is a wise choice. For those already working in big cities, buying a home in a central location is likewise a sensible move.

The proportion of the elderly population is rising rapidly, and aging will be China's basic national condition for some time to come. At the same time, the growing elderly population will also allow its accumulated wisdom to be passed on and will expand demand.

Be prepared for delayed retirement. It seems we have to worry not only about a mid-life crisis but also about an old-age crisis.

No company idly analyzes piles of useless data all day, and the same goes for the census. Finding the information that is useful to you and working out where to go next is what everyone should do most.


Statement:
This article is reproduced from WeChat (weixin). If there is any infringement, please contact admin@php.cn to have it deleted.