What does big data desensitization mean?
Big data desensitization, also known as data bleaching, data deprivatization, or data deformation, refers to transforming certain sensitive information according to desensitization rules so that sensitive private data is reliably protected. The desensitized data set, derived from real data, can then be used safely in development, testing, other non-production environments, and outsourcing environments.
Privacy data desensitization technology
On big data platforms, data is usually stored in a structured format: each table is composed of many rows, and each row is composed of many columns. According to their data attributes, columns can usually be divided into the following types:
Identifying columns: columns that can accurately locate a person, such as ID number, address, and name.
Semi-identifying columns: a single such column cannot locate an individual, but several of them combined can potentially identify a person. Examples are postal code, birthday, and gender. A research paper in the United States stated that 87% of Americans can be identified using only zip code, birthday, and gender[3].
Sensitive information columns: columns containing sensitive user information, such as transaction amounts, illnesses, and income.
Other columns: columns that do not contain sensitive user information.
Avoiding privacy data leakage means preventing the people who use the data (data analysts, BI engineers, and so on) from identifying a given row as a particular person's information. Data desensitization technology processes the data, for example by removing identifying columns and converting semi-identifying columns, so that data users can still analyze the converted semi-identifying columns, the sensitive information columns, and the other columns, while being prevented, to a certain extent, from reverse-identifying users from the data. This strikes a balance between ensuring data security and maximizing the value of the data.
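As a minimal sketch of the idea above, assuming a hypothetical table held as a list of dictionaries, the following Python drops the identifying columns and coarsens two of the semi-identifying ones. The column names, the three-digit zip prefix, and the year-only birthday are illustrative assumptions, not rules taken from any particular platform.

```python
# Hypothetical column classification; the names are illustrative only.
IDENTIFYING = {"name", "id_number", "address"}


def generalize_zip(zip_code: str) -> str:
    """Keep the first three characters and mask the rest."""
    return zip_code[:3] + "*" * (len(zip_code) - 3)


def generalize_birthday(birthday: str) -> str:
    """Reduce an ISO date (YYYY-MM-DD) to its year."""
    return birthday[:4]


# Semi-identifying columns mapped to the conversion applied to each.
SEMI_IDENTIFYING = {"zip": generalize_zip, "birthday": generalize_birthday}


def desensitize(rows):
    """Return new rows with identifying columns removed and the listed
    semi-identifying columns generalized; all other columns are unchanged."""
    result = []
    for row in rows:
        clean = {}
        for column, value in row.items():
            if column in IDENTIFYING:
                continue  # drop identifying columns entirely
            convert = SEMI_IDENTIFYING.get(column)
            clean[column] = convert(value) if convert else value
        result.append(clean)
    return result


rows = [
    {"name": "Alice", "id_number": "A123", "zip": "10027",
     "birthday": "1985-04-12", "gender": "F", "illness": "flu"},
]
print(desensitize(rows))
```

In this sketch gender is left as-is; whether that is acceptable depends on how many people still share each combination of the converted values.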
Privacy data leakage types
Privacy data leakage can be divided into several types. Depending on the type, different risk models can be used to measure and guard against the risk of leakage, and different desensitization algorithms can be applied to the data. Generally speaking, the types of privacy data leakage include:
Personal identity leakage. When a data user confirms, by any means, that a row in a data table belongs to a particular person, this is called personal identity leakage. It is the most serious type, because once it occurs the data user can obtain sensitive information about a specific individual.
Attribute leakage. When a data user learns new attribute information about a person from the data table they access, this is called attribute leakage. Personal identity leakage always leads to attribute leakage, but attribute leakage can also occur on its own.
Membership leakage. When a data user can confirm that a person's data exists in a data table, this is called membership leakage. Its risk is relatively small. Personal identity leakage and attribute leakage always imply membership leakage, but membership leakage can also occur on its own.
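To make personal identity leakage concrete, here is a sketch of how a data user might link a released table with an outside source on the semi-identifying columns. Both tables, the column names, and the `link_records` helper are made up for illustration; the point is only that a unique match on zip code, birthday, and gender ties a sensitive value to a name.

```python
def link_records(released, external, keys):
    """For each released row, look for external rows with identical values
    in the key columns. Exactly one match re-identifies the row."""
    matches = []
    for row in released:
        candidates = [
            person for person in external
            if all(person[k] == row[k] for k in keys)
        ]
        if len(candidates) == 1:
            matches.append((candidates[0]["name"], row))
    return matches


# Released table: identifying columns removed, semi-identifying columns untouched.
released = [
    {"zip": "10027", "birthday": "1985-04-12", "gender": "F", "illness": "flu"},
    {"zip": "10027", "birthday": "1990-01-30", "gender": "M", "illness": "asthma"},
]

# Hypothetical outside source that still carries names.
external = [
    {"name": "Alice", "zip": "10027", "birthday": "1985-04-12", "gender": "F"},
    {"name": "Bob", "zip": "10027", "birthday": "1990-01-30", "gender": "M"},
    {"name": "Carl", "zip": "10027", "birthday": "1990-01-30", "gender": "M"},
]

reidentified = link_records(released, external, ["zip", "birthday", "gender"])
for name, row in reidentified:
    print(name, "->", row["illness"])
```

The first row matches only Alice, so her identity and her illness both leak. The second row matches two people, so no single identity is confirmed, although the data user still learns that one of the two has that illness.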
Privacy data leakage risk model
Opening data to data analysts also introduces the risk of privacy data leakage. The ultimate goal of data desensitization technology is to maximize the potential of data analysis and mining while keeping the risk of privacy data leakage within a certain range. In the field of privacy data desensitization, several different models can be used to measure, from different angles, the leakage risk a data set may carry.
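The text does not name a specific model at this point. As one possible illustration, the sketch below counts how many rows share each combination of semi-identifying values and reports the smallest group. This is the quantity that the well-known k-anonymity model bounds from below; the function name, column names, and sample rows are assumptions made for the example.

```python
from collections import Counter


def smallest_group_size(rows, quasi_identifiers):
    """Return the size of the smallest group of rows that share the same
    values in the given semi-identifying columns (0 for an empty table)."""
    counts = Counter(
        tuple(row[column] for column in quasi_identifiers) for row in rows
    )
    return min(counts.values()) if counts else 0


# Hypothetical desensitized rows.
table = [
    {"zip": "100**", "birth_year": "1985", "gender": "F", "illness": "flu"},
    {"zip": "100**", "birth_year": "1985", "gender": "F", "illness": "asthma"},
    {"zip": "100**", "birth_year": "1990", "gender": "M", "illness": "flu"},
]

k = smallest_group_size(table, ["zip", "birth_year", "gender"])
print("smallest group size:", k)
```

A result of 1 means at least one row is unique on its semi-identifying columns and is therefore the easiest to link back to a person; a larger value means every row hides among at least that many others.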