What programming language to use to learn big data

little bottle
2019-05-14 13:22:22

Python is a good language to choose for learning big data. It has libraries that specialize in processing large data sets, and combined with the xlrd library, which reads Excel spreadsheets, it makes statistical work on that data, such as analyzing performance test results, very convenient.
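As a rough illustration of the kind of statistical work mentioned above, the sketch below summarizes a handful of made-up response times from a performance test, using only the statistics module from Python's standard library. The sample values and the summarize helper are hypothetical and not from the original article; in practice the numbers would more likely be loaded from a file, for example a spreadsheet column read with xlrd.

```python
import statistics

# Hypothetical response times (milliseconds) from a performance test run.
# In a real project these might be read from a spreadsheet or a log file.
response_times_ms = [120, 135, 128, 410, 131, 126, 140, 133]


def summarize(samples):
    """Return basic summary statistics for a list of numeric samples."""
    return {
        "count": len(samples),
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


summary = summarize(response_times_ms)
print(summary)
```

The median is much less affected by the single slow request (410 ms) than the mean, which is one reason performance reports often include both.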

Big data has become a popular term in recent years, and many people have been drawn into studying it. But do you know which programming languages are actually used for big data?

1. Python language

For more than ten years, Python has been very popular in academia, especially in fields such as natural language processing (NLP). If you have a project that requires NLP, you face a dizzying number of choices, including the classic NLTK, topic modeling with Gensim, or the ultra-fast and accurate spaCy. Python is equally at home with neural networks, through Theano and TensorFlow. Beyond that, there is scikit-learn for machine learning, and NumPy and Pandas for data analysis.

There's also Jupyter/IPython, a web-based notebook server that lets you mix code, graphics, and almost any object in a shareable log format. This has long been one of Python's killer features, but the concept has proven so useful that it now appears in almost every language that has a read-eval-print loop (REPL), including Scala and R.

Python is usually supported in big data processing frameworks, but it is often not a "first-class citizen". For example, new features in Spark almost always appear first in the Scala/Java bindings, and it may take several minor versions before those updates reach PySpark (this is especially true for development on the Spark Streaming/MLlib side).

Unlike R, Python is a traditional object-oriented language, so most developers will be quite comfortable using it, while first contact with R or Scala can be intimidating. One small problem is that whitespace is significant in Python code. This divides people into two camps: those who think "this is very helpful for ensuring readability", and those who think we should not have to fight the interpreter to get a program up and running just because one character in a line is out of place.
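To make the whitespace point concrete, here is a minimal sketch (not from the original article; the function names are made up for illustration). The two functions differ only in the indentation of one line, and the last part uses the built-in compile() to show that a stray extra indent is rejected before the program runs at all.

```python
def total_inside_loop(values):
    total = 0
    for v in values:
        total += v  # indented under the for: runs once per element
    return total


def total_outside_loop(values):
    total = 0
    v = 0
    for v in values:
        pass
    total += v  # dedented: runs once, after the loop has finished
    return total


print(total_inside_loop([1, 2, 3]))   # sums every element
print(total_outside_loop([1, 2, 3]))  # adds only the last element

# A line indented further than its neighbours, with nothing opening a new
# block, is a syntax-level error: Python refuses to compile the source.
bad_source = "def f():\n    x = 1\n      return x\n"
try:
    compile(bad_source, "<example>", "exec")
    compiled = True
except IndentationError:
    compiled = False

print("bad source compiled:", compiled)
```

Both camps described above can point to this example: the block structure is visible at a glance, but a single misplaced space changes the result or stops the program from starting.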

2. R language

In the past few years, R has become the darling of data science, a field that is no longer the preserve of nerdy statisticians but is now well known to Wall Street traders, biologists, and Silicon Valley developers. Companies in a variety of industries, such as Google, Facebook, Bank of America, and the New York Times, all use R, and its commercial use continues to spread.

The appeal of R is simple and obvious. With just a few lines of R code, you can sift through complex data sets, process data with advanced modeling functions, and create clean graphics to represent the numbers. It has been compared to a hyperactive version of Excel.

The greatest asset of R is the vibrant ecosystem that has developed around it: the R community is constantly adding new packages and features to an already rich feature set. It is estimated that more than 2 million people use R, and a recent poll showed that R is by far the most popular language for data science, used by 61% of respondents (followed by Python at 39%).

3. Java language

Java and Java-based frameworks have become the backbone of the largest high-tech companies in Silicon Valley. "If you look at Twitter, LinkedIn and Facebook, Java is the underlying language for all of their data engineering infrastructure," Driscoll said.

Java does not provide the same quality of visualization as R and Python, and it is not the best choice for statistical modeling. However, if you're moving past prototyping and need to build large systems, Java is often your best choice.
