


Many people do not know which parts of the mathematics they have learned are actually useful. R&D staff at IT companies often feel they need to study some mathematics before moving into big-data roles. But mathematics is vast, so where does the part that data technology needs begin and end?
When data technology comes up, many people think of mathematics first, probably because numbers hold such a central place in mathematics. That association is natural. This article discusses the mathematical foundations of data technology.
Mathematics has three major branches: algebra, geometry, and analysis. Each has split into many sub-branches as research has developed. Within this system, the foundations most closely related to big data technology fall into the following categories. (For the application of these methods in big data technology, see the book "Internet Big Data Processing Technology and Application", 2017, Tsinghua University Press.)
(1) Probability Theory and Mathematical Statistics
This area is very closely tied to the development of big data technology. Basic concepts such as conditional probability and independence, random variables and their distributions, multidimensional random variables and their distributions, analysis of variance and regression analysis, stochastic processes (especially Markov processes), parameter estimation, and Bayesian theory are all very important in big data modeling and mining. Big data is naturally high-dimensional, so designing and analyzing data models in high-dimensional space requires some grounding in multidimensional random variables and their distributions. Bayes' theorem is one of the foundations of classifier construction. Beyond these basics, conditional random fields (CRF), hidden Markov models, and n-gram models can be used to analyze vocabulary and text in big data analysis and to build predictive classification models.
Information theory, which builds on probability theory, also plays a role in big data analysis. Measures used for feature analysis, such as information gain and mutual information, are information-theoretic concepts.
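As a small illustration of how Bayes' theorem underlies a classifier, the sketch below computes the probability that a message is spam given that it contains a particular word. The function name `posterior` and all the probabilities are hypothetical values chosen for the example, not figures from any real dataset.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """Return P(class | evidence) for a two-class problem via Bayes' theorem.

    prior                -- P(class)
    likelihood           -- P(evidence | class)
    likelihood_given_not -- P(evidence | not class)
    """
    # Total probability of seeing the evidence at all.
    evidence = likelihood * prior + likelihood_given_not * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 20% of messages are spam; the word "free"
# appears in 60% of spam messages and in 5% of other messages.
p_spam_given_free = posterior(prior=0.2, likelihood=0.6, likelihood_given_not=0.05)
print(round(p_spam_given_free, 3))  # prints 0.75
```

A naive Bayes classifier applies the same rule to many words at once, under the simplifying assumption that the words are conditionally independent given the class.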
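To make information gain concrete, here is a minimal sketch that scores a discrete feature by how much it reduces the entropy of the class labels. The helper names `entropy` and `information_gain` and the toy data are assumptions made for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy after splitting on the feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the labels inside each feature group.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


labels = ["spam", "spam", "ham", "ham"]
useful_feature = [1, 1, 0, 0]    # separates the two classes perfectly
useless_feature = [1, 0, 1, 0]   # tells us nothing about the class

print(information_gain(useful_feature, labels))
print(information_gain(useless_feature, labels))
```

For discrete variables, this quantity is the same as the mutual information between the feature and the label, which is why both terms show up in feature selection.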
(2) Linear Algebra
This area of mathematics is also closely tied to the development of data technology. Matrices, transposes, rank, block matrices, vectors, orthogonal matrices, vector spaces, and eigenvalues and eigenvectors are all commonly used in big data modeling and analysis.
In Internet big data, the objects analyzed in many application scenarios can be abstracted as matrices: a large collection of Web pages and the links among them, Weibo users and their relationships, or the relationship between documents and vocabulary in a text collection. For example, when Web pages and their links are represented by a matrix, each element describes the relationship between a page a and another page b. This can be a link relationship: 1 means there is a hyperlink from a to b, and 0 means there is none. The famous PageRank algorithm uses this matrix to quantify the importance of pages and to prove that the computation converges.
Matrix operations such as matrix decomposition are ways of extracting features of the objects being analyzed. Because a matrix represents a transformation or mapping, the matrices obtained by decomposition describe new features of the objects in a new space. This is why singular value decomposition (SVD), principal component analysis (PCA), non-negative matrix factorization (NMF), and matrix factorization (MF) are widely used in big data analysis.
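The link-matrix idea can be sketched with a simplified PageRank computed by power iteration. This is an illustration under stated assumptions rather than the algorithm as deployed by any search engine: the entry `adj[i][j] == 1` is read as a hyperlink from page i to page j, the damping factor of 0.85 is a commonly quoted default that the text above does not specify, and the function name `pagerank` and the three-page graph are hypothetical.

```python
def pagerank(adj, damping=0.85, tol=1e-10, max_iter=1000):
    """Simplified PageRank by power iteration on a 0/1 link matrix.

    adj[i][j] == 1 is taken to mean that page i links to page j.
    """
    n = len(adj)
    out_degree = [sum(row) for row in adj]
    rank = [1.0 / n] * n  # start from a uniform distribution
    for _ in range(max_iter):
        # Rank held by pages without outgoing links is spread evenly.
        dangling = sum(rank[i] for i in range(n) if out_degree[i] == 0)
        new_rank = []
        for j in range(n):
            # Each page i passes an equal share of its rank to every page it links to.
            incoming = sum(rank[i] * adj[i][j] / out_degree[i]
                           for i in range(n) if out_degree[i] > 0)
            new_rank.append((1.0 - damping) / n
                            + damping * (incoming + dangling / n))
        change = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if change < tol:  # stop once the ranks have stabilized
            break
    return rank


# Hypothetical graph: page 0 links to 1 and 2, page 1 links to 2, page 2 links to 0.
links = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
]
ranks = pagerank(links)
print(ranks)
```

In this small graph, page 2 receives links from both other pages and ends up with the highest score. The damping term is what makes each iteration a contraction, so the scores settle regardless of the starting vector.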
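As a hedged sketch of what a decomposition extracts, the code below estimates the largest singular value of a small document-term matrix and its singular vectors by power iteration on AᵀA. It is a teaching illustration only; in practice a library routine such as `numpy.linalg.svd` would normally be used. The function name `top_singular_triple` and the toy matrix are assumptions, and the method assumes the starting vector is not orthogonal to the leading singular vector.

```python
import math


def top_singular_triple(A, iterations=200):
    """Estimate the largest singular value of A and its left/right
    singular vectors by power iteration on A^T A."""
    m, n = len(A), len(A[0])
    v = [1.0 / math.sqrt(n)] * n  # unit-length starting vector
    for _ in range(iterations):
        # u = A v
        u = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
        # w = A^T u, which equals (A^T A) v
        w = [sum(A[i][j] * u[i] for i in range(m)) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    u = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
    sigma = math.sqrt(sum(x * x for x in u))
    u = [x / sigma for x in u]
    return sigma, u, v


# Hypothetical document-term matrix: rows are documents, columns are words.
# Documents 0 and 1 share the same two words; document 2 uses a third word.
doc_term = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]
sigma, doc_weights, word_weights = top_singular_triple(doc_term)
print(sigma, doc_weights, word_weights)
```

The leading singular vectors put their weight on the first two documents and the first two words, which is the "new feature in a new space" the paragraph above describes: a single direction that captures the dominant shared topic.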
(3) Optimization Methods
Many analysis and mining models determine their parameters through learning and training. The basic problem is: given a function f: A→R, find an element a0∈A such that f(a0)≤f(a) for all a in A (minimization), or f(a0)≥f(a) for all a in A (maximization). The choice of optimization method depends on the form of the function. At present, the methods in common use are mostly based on differentials and derivatives, such as gradient descent, hill climbing, least squares, and the conjugate gradient method.
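The minimization problem can be illustrated with gradient descent on a least-squares objective: fitting a line y ≈ w·x + b by repeatedly stepping against the gradient of the mean squared error. This is a minimal sketch; the function name `fit_line`, the learning rate, the step count, and the data are all assumptions picked so that the example converges.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=5000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of (1/n) * sum(error^2) with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Move a small step in the direction that lowers the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical data generated from y = 2x + 1 with no noise.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = fit_line(xs, ys)
print(w, b)
```

Here the parameter pair (w, b) plays the role of the element a in the definition above, and the mean squared error is the function f being minimized. A learning rate that is too large makes the iteration diverge, so the value used here is specific to this data.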
(4) Discrete Mathematics
The importance of discrete mathematics is self-evident. It is the foundation of every branch of computer science, and naturally an important foundation for data technology as well. It is not covered further here.
Finally, many people believe that because they are not good at mathematics they cannot do well in developing and applying data technology. This is not the case. Think clearly about which role you play in big data development and applications, and at which entry point into big data technology research and application you are working. The mathematics above matters mainly at the data mining and modeling layer, where this knowledge and these methods do need to be mastered.
Of course, these mathematical methods are also valuable at other layers for improving algorithms. For example, at the data acquisition layer, a probability model can estimate the value of the pages a crawler collects, so that better collection decisions can be made. At the big data computing and storage layer, block matrix computation is used to achieve parallel computing.
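The block-matrix idea can be sketched as follows: the left matrix is split into horizontal blocks of rows, each block is multiplied by the right matrix as an independent task, and the partial results are stacked back together. The function names and the block size are hypothetical. A thread pool is used only to show that the tasks are independent; in CPython it will not speed up pure-Python arithmetic, and a real system would distribute the blocks across processes or machines.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def block_rows(M, size):
    """Split M into consecutive blocks of at most `size` rows."""
    return [M[i:i + size] for i in range(0, len(M), size)]


def block_matmul(A, B, size=2):
    """Compute A @ B one row-block of A at a time.

    Each block product depends only on its own rows of A and on B,
    so the blocks can be processed as independent tasks.
    """
    blocks = block_rows(A, size)
    with ThreadPoolExecutor() as pool:
        # pool.map returns results in the same order as the input blocks.
        partial = list(pool.map(lambda blk: matmul(blk, B), blocks))
    # Stack the partial results back into one matrix.
    return [row for blk in partial for row in blk]


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
B = [[1, 0], [0, 1], [1, 1]]
print(block_matmul(A, B))
```

Splitting by rows is the simplest partition; splitting both matrices into square blocks follows the same principle but also requires summing partial products.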
