Analyze commonly used machine learning libraries in Python
Python is widely used in scientific computing: computer vision, artificial intelligence, mathematics, astronomy, etc. It’s no surprise that it also applies to machine learning.
This article lists and describes the most useful machine learning tools and libraries in Python. We do not require the libraries in this list to be written in Python, as long as they have a Python interface.
Our intention is not to list every machine learning library in Python (a search for "machine learning" on the Python Package Index (PyPI) returned 139 results), but to list the ones we know to be useful and well maintained.
In addition, although some modules can be used for a variety of machine learning tasks, we only list libraries whose main focus is machine learning. For example, although Scipy includes some clustering algorithms, its main focus is not machine learning but a comprehensive scientific computing toolset. Therefore we exclude Scipy (although we use it too!).
We also evaluate these libraries on how well they integrate with other scientific computing libraries, because machine learning (supervised or unsupervised) is only one part of a data processing pipeline. If the library you use does not fit the rest of that pipeline, you will spend a lot of time building an intermediate layer between the different libraries. It's important to have a great library in your toolset, but it's equally important that the library integrates well with other libraries.
If you are proficient in another language but want to use Python packages, we also briefly describe how to access the libraries listed in this article from that language.
Scikit-learn is our machine learning tool of choice at CB Insights. We use it for classification, feature selection, feature extraction and clustering.
What we love most is that it has a consistent, easy-to-use API and provides **many** evaluation, diagnostic and cross-validation methods out of the box (sound familiar? Python also takes a "batteries included" approach). The icing on the cake is that it uses Scipy data structures under the hood, which fits well with the rest of the Python scientific computing stack: Scipy, Numpy, Pandas and Matplotlib. So, if you want to visualize the performance of your classifier (for example, with a precision-recall chart or a Receiver Operating Characteristic (ROC) curve), Matplotlib can help you make quick visualizations.
Considering the time spent cleaning and structuring data, using this library can be very convenient because it can be tightly integrated with other scientific computing packages.
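As a rough illustration of the classifier evaluation described above, the following sketch trains a classifier on synthetic data and computes a precision-recall curve and ROC AUC. It assumes a recent Scikit-learn release; the synthetic dataset, the choice of logistic regression and the helper name `plot_precision_recall` are illustrative assumptions, not part of the original workflow.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (an assumption for illustration).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Fit a simple classifier and score the held-out samples.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = clf.predict_proba(X_test)[:, 1]  # probability of the positive class

# Metrics are returned as plain Numpy arrays, ready for plotting.
precision, recall, thresholds = precision_recall_curve(y_test, scores)
auc = roc_auc_score(y_test, scores)


def plot_precision_recall(precision, recall, path="pr_curve.png"):
    """Hypothetical helper: save a precision-recall chart with Matplotlib."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display required
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(recall, precision)
    ax.set_xlabel("Recall")
    ax.set_ylabel("Precision")
    fig.savefig(path)
    plt.close(fig)
```

Because the metric functions return ordinary Numpy arrays, they can be passed straight to Matplotlib with no conversion step, which is the kind of integration the section praises.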
It also offers limited feature extraction for natural language processing: bag of words, tf-idf (Term Frequency Inverse Document Frequency) and preprocessing (stop words, custom preprocessing, analyzers). If you want to run quick benchmarks on small (toy) datasets, its datasets module provides common and useful ones. You can also build your own small datasets from these, so that you can check whether a model behaves as expected before applying it to real-world data. For parameter optimization and tuning, it provides grid search and random search. None of these features would be possible without strong community support and good maintenance. We look forward to its first stable release.
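The features mentioned above can be sketched in a few lines. This is a minimal example assuming a recent Scikit-learn release; the sample sentences, the SVM classifier and the parameter grid are arbitrary choices made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# 1) Bag of words with tf-idf weighting, dropping English stop words.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "machine learning with python",
]
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(docs)  # Scipy sparse matrix: (n_docs, n_terms)

# 2) A built-in toy dataset plus grid search with 5-fold cross-validation.
iris = load_iris()
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(iris.data, iris.target)

best_params = search.best_params_  # best combination found in the grid
best_score = search.best_score_    # mean cross-validated accuracy for it
print(best_params, round(best_score, 3))
```

For larger parameter spaces, `RandomizedSearchCV` from the same module samples combinations instead of trying every one.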
Statsmodels
If you are an R or S user, it also provides R syntax for certain statistical models. Its models also accept Numpy arrays and Pandas data frames, making intermediate data structures a thing of the past!
PyMC
Shogun is a machine learning toolbox focusing on Support Vector Machines (SVM), written in C++. It is under active development and maintenance, and it provides a Python interface, which is also its best-documented interface. However, compared to Scikit-learn, we found its API more difficult to use. Furthermore, it offers few diagnostic and evaluation algorithms out of the box. Speed, however, is a big advantage.
Gensim describes itself as "topic modeling for humans". As stated on its home page, its focus is Latent Dirichlet Allocation (LDA) and its variants. Unlike other packages, it supports natural language processing, which makes it easier to combine NLP with other machine learning algorithms.
If you work in NLP and want to do clustering and basic classification, it is worth a look. It has recently added word2vec, Google's neural-network-based text representation. This library is written purely in Python.
[...] modularity and configurability to Theano. You can create the neural network from different configuration files, which makes it easier to try different parameters. Arguably, separating the parameters and properties of the neural network into a configuration file makes it even more modular.
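The idea of keeping network parameters in a configuration file, separate from the code, can be sketched as follows. This is only an illustration: the JSON format, the key names and the function `build_layer_specs` are hypothetical and do not reflect the configuration schema of any library mentioned in this article.

```python
import json

# Hypothetical configuration: key names are illustrative only.
CONFIG_TEXT = """
{
  "input_size": 784,
  "layers": [
    {"units": 256, "activation": "relu"},
    {"units": 10, "activation": "softmax"}
  ],
  "learning_rate": 0.01
}
"""


def build_layer_specs(config):
    """Turn a parsed config into (n_in, n_out, activation) tuples, one per layer."""
    specs = []
    n_in = config["input_size"]
    for layer in config["layers"]:
        specs.append((n_in, layer["units"], layer["activation"]))
        n_in = layer["units"]  # this layer's output feeds the next layer
    return specs


config = json.loads(CONFIG_TEXT)
layer_specs = build_layer_specs(config)
for n_in, n_out, activation in layer_specs:
    print(f"{n_in} -> {n_out} ({activation})")
```

Trying a different architecture then means editing the configuration text only; the code that builds the network stays unchanged.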
Decaf
Decaf is a deep learning library recently released by UC Berkeley. It was tested in the ImageNet classification challenge, where its neural network implementation proved to be state of the art.
Nolearn
If you want to use the excellent Scikit-learn API for deep learning, Nolearn makes that easier. It is a wrapper around Decaf that is (mostly) compatible with Scikit-learn, which makes Decaf even more useful.
OverFeat
OverFeat is the recent winner of the Cats vs. Dogs Kaggle challenge. It is written in C++ and includes a Python wrapper (along with Matlab and Lua). It uses the GPU via the Torch library, so it's fast. It also won the ImageNet classification, detection and localization challenge. If your field is computer vision, you might want to take a look.
Hebel
Hebel is another neural network library with GPU support out of the box. You can define the properties of the neural network in YAML files (similar to Pylearn2), which is a friendly way to separate the network definition from the code and lets you run models quickly. Since it has only been under development for a short time, the documentation lacks depth and breadth. It is also limited in terms of models, because it supports only one type of neural network (feed-forward). However, it is written in pure Python and is a very friendly library because it contains many practical functions, such as schedulers and monitors, which we have not found in other libraries.
Neurolab
NeuroLab is another neural network library with a friendly API (similar to Matlab's). Unlike other libraries, it contains different variants of Recurrent Neural Network (RNN) implementations. If you want to use RNNs, this library is one of the best choices among those with a simple API.
You don't know Python but are good at other languages? Don't despair! One of Python's strengths (among others) is that it is an excellent glue language: you can access these libraries through Python from your usual programming language. The following packages can be used to combine other languages with Python:
R -> RPython
Matlab -> matpython
Java -> Jython
Lua -> Lunatic Python
Julia -> PyCall.jl
These libraries have not released any updates for more than a year. We list them because you may find them useful, but they are unlikely to receive bug fixes, let alone future enhancements.
MDP
MlPy
FFnet
PyBrain