
pandas skills: sorting and summarizing methods in DataFrame

coldplay.xixi
2020-09-17 16:53:22



Today is the sixth article in our pandas data processing series. Let's talk about the sorting and summary operations of DataFrame.

In the previous article, we mainly introduced the apply method in DataFrame and how to perform broadcast operations on each row or column of a DataFrame, so that we can process the entire dataset in a very short time. Today we will talk about how to sort a DataFrame according to our needs and how to use some summary operations.

Sort

Sorting is a very basic need. In pandas, this requirement is further subdivided into sorting based on index and sorting based on value. Let's first take a look at the sorting methods in Series.

There are two sorting methods in Series. One is sort_index which, as the name implies, sorts the values according to the index of the Series. The other is sort_values, which sorts according to the values in the Series. Both methods return a new Series.
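As a minimal sketch of the two Series methods described above (the data and variable names are made up for illustration):

```python
import pandas as pd

# Hypothetical data: four values with a deliberately shuffled index.
s = pd.Series([4, 1, 3, 2], index=["b", "d", "a", "c"])

by_index = s.sort_index()    # rows reordered by index label: a, b, c, d
by_value = s.sort_values()   # rows reordered by value: 1, 2, 3, 4

print(by_index)
print(by_value)
print(s)  # the original Series keeps its original order
```

Both calls return a new Series, so the result has to be assigned to a name if we want to keep it.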

Index sort

The same is true for DataFrame, which also supports sorting by value and sorting by index. But since a DataFrame is two-dimensional, there are some differences in usage. The simplest difference is that a Series has only one column, so the object being sorted is unambiguous, whereas a DataFrame has two kinds of index: the row index and the column index. So when we sort, we need to specify the axis we want to sort on, via the axis parameter.

By default, we sort based on the row index. If we want to specify sorting based on the column index, we need to pass in the parameter axis=1.

We can also pass in the ascending parameter to specify whether we want the result in ascending or descending order.
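The axis and ascending parameters described above can be sketched as follows, assuming a small made-up DataFrame whose row and column labels are both out of order:

```python
import pandas as pd

# Hypothetical data with unsorted row labels and unsorted column labels.
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=["c", "a", "b"],
    columns=["z", "x", "y"],
)

rows_sorted = df.sort_index()               # default axis=0: rows become a, b, c
cols_sorted = df.sort_index(axis=1)         # axis=1: columns become x, y, z
rows_desc = df.sort_index(ascending=False)  # descending: rows become c, b, a

print(rows_sorted)
print(cols_sorted)
print(rows_desc)
```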

Value sorting

Value sorting in DataFrame works a little differently: in the usual case we reorder the rows according to the values in one or more columns. We pass in the column we want to sort by via the by parameter, which can be a single column or a list of columns.
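A short sketch of sorting by one column and by several columns; the column names group and score are hypothetical:

```python
import pandas as pd

# Hypothetical data: two groups with one score per row.
df = pd.DataFrame({
    "group": ["b", "a", "b", "a"],
    "score": [3, 4, 1, 2],
})

# Sort rows by a single column.
by_score = df.sort_values(by="score")

# Sort by group first, then by score within each group (highest first).
by_both = df.sort_values(by=["group", "score"], ascending=[True, False])

print(by_score)
print(by_both)
```

When by is a list, ascending can also be a list of the same length, giving one direction per column.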

Ranking

Sometimes we want to get the ranking of an element, that is, we would like to know where the current element ranks among the whole. This function is also provided in pandas, via the rank method.

Suppose a Series of seven numbers contains two 7s, and 7 is the largest value. By default, rank gives both of them a rank of 6.5 rather than 6 or 7. Why?

The reason is simple: 7 appears twice, and in sorted order the two occurrences take the 6th and 7th positions. By default the ranks of tied elements are averaged, so both get 6.5. If we don't want the average, and would rather assign ranks based on order of appearance, we can use the method parameter, for example method='first'.
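The tie behaviour described above can be reproduced with a small sketch. The seven numbers here are made up for illustration and are not the original article's data:

```python
import pandas as pd

# Hypothetical data: seven numbers, with the largest value 7 appearing twice.
s = pd.Series([3, 7, 1, 7, 5, 2, 4])

# Default method="average": tied values share the mean of their positions.
avg_rank = s.rank()

# method="first": ties are broken by order of appearance.
first_rank = s.rank(method="first")

print(avg_rank.tolist())    # [3.0, 6.5, 1.0, 6.5, 5.0, 2.0, 4.0]
print(first_rank.tolist())  # [3.0, 6.0, 1.0, 7.0, 5.0, 2.0, 4.0]
```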

The legal values of method are not limited to first. The other options, which are used a little less often, are average (the default), min, max, and dense.
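To compare the tie-breaking options side by side, here is a small sketch on made-up data with a single tie:

```python
import pandas as pd

# Hypothetical data: the value 20 appears twice.
s = pd.Series([10, 20, 20, 30])

# One column per value of the method parameter.
ranks = pd.DataFrame({
    method: s.rank(method=method)
    for method in ["average", "min", "max", "first", "dense"]
})

print(ranks)
```

With this data, average gives the tied values 2.5, min gives 2, max gives 3, first gives 2 and 3 in order of appearance, and dense gives 2 and then continues with 3 for the next value instead of 4.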

For a DataFrame, rank by default (axis=0) ranks the elements within each column. We can rank the elements within each row instead by passing axis=1.
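A minimal sketch of the two directions, using a made-up two-column DataFrame:

```python
import pandas as pd

# Hypothetical data without ties, to keep the ranks easy to read.
df = pd.DataFrame({"a": [3, 1, 2], "b": [1, 3, 5]})

col_ranks = df.rank()         # default axis=0: each column is ranked on its own
row_ranks = df.rank(axis=1)   # axis=1: each row is ranked on its own

print(col_ranks)
print(row_ranks)
```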

Summary operation

Finally, let's introduce the summary operations in DataFrame. A summary operation is an aggregation operation, such as the familiar sum method, which aggregates a batch of data into a total. DataFrame has several similar methods; let's look at them one by one.

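The first is sum. We can use sum to add up the values in a DataFrame. If no parameters are passed, the default (axis=0) adds up the rows of each column, giving one total per column; passing axis=1 gives one total per row instead.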
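A short sketch of sum along both axes, on made-up data:

```python
import pandas as pd

# Hypothetical data: two numeric columns.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

col_totals = df.sum()         # default axis=0: one total per column
row_totals = df.sum(axis=1)   # axis=1: one total per row

print(col_totals)  # a -> 6, b -> 60
print(row_totals)  # 11, 22, 33
```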

In addition to sum, another commonly used method is mean, which computes the average along the rows or along the columns.

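Since there are often NA elements in a DataFrame, the skipna parameter controls how they are handled. It is True by default, so missing values are excluded before the average is calculated; with skipna=False, any row or column that contains a missing value yields NaN.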
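The effect of skipna on mean can be sketched like this, assuming a made-up DataFrame with one missing value:

```python
import numpy as np
import pandas as pd

# Hypothetical data: column "a" has one missing value.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [2.0, 4.0, 6.0]})

col_mean = df.mean()                  # skipna=True by default: NaN is ignored
strict_mean = df.mean(skipna=False)   # NaN propagates into the result
row_mean = df.mean(axis=1)            # average across the columns of each row

print(col_mean)     # a -> 2.0, b -> 4.0
print(strict_mean)  # a -> NaN, b -> 4.0
print(row_mean)     # 1.5, 4.0, 4.5
```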

Another method that I personally find very useful is describe, which returns summary information about the DataFrame: for example, the sample size, mean, standard deviation, minimum, and maximum of each numeric column. It is a commonly used statistical method for understanding the distribution of data in a DataFrame.
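A minimal sketch of describe on made-up numeric data:

```python
import pandas as pd

# Hypothetical data: two numeric columns with four rows each.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [10.0, 20.0, 30.0, 40.0],
})

stats = df.describe()

print(stats)
print(stats.loc["mean", "a"])  # individual statistics can be looked up by label
```

The result is itself a DataFrame, with one row per statistic (count, mean, std, min, the quartiles, and max) and one column per numeric column of the input.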

In addition to the methods introduced here, DataFrame has many similar summary methods, such as idxmax, idxmin, var, and std. If you are interested, you can check the relevant documentation, although in my experience they are not used very often.
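For reference, a brief sketch of the four methods just mentioned, on a made-up single-column DataFrame:

```python
import pandas as pd

# Hypothetical data: one column with labelled rows.
df = pd.DataFrame({"a": [1.0, 5.0, 3.0]}, index=["x", "y", "z"])

max_label = df.idxmax()   # index label of the largest value in each column
min_label = df.idxmin()   # index label of the smallest value in each column
variance = df.var()       # sample variance of each column
std_dev = df.std()        # sample standard deviation of each column

print(max_label["a"], min_label["a"])
print(variance["a"], std_dev["a"])
```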


Statement:
This article is reproduced from juejin.im. If there is any infringement, please contact admin@php.cn to have it deleted.