Calculate a histogram of a set of data using NumPy in Python
A histogram is a graphical representation of the distribution of a data set. It displays the data as a series of bars, where each bar covers a range of data values (a bin) and the height of the bar represents the number of data values that fall within that range.
Histograms are mainly used to represent the distribution of numerical data, such as the distribution of grades in a class, a population distribution, or an employee income distribution.
In a histogram, the x-axis represents the range of data values, divided into intervals (bins), and the y-axis represents the frequency of data values within each bin. Histograms can be normalized by dividing the frequency of each bin by the total number of data values, which results in a relative frequency histogram where the y-axis represents the proportion of data values in each bin.
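The normalization described above can be sketched in a few lines. This is only an illustration: the sample values and the choice of 4 bins are assumptions made for the example.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration
data = np.array([10, 20, 25, 40, 35, 23])

# Raw counts per bin (4 equal-width bins between the minimum and maximum)
counts, edges = np.histogram(data, bins=4)

# Relative frequency: each count divided by the total number of values
relative = counts / counts.sum()

print("Counts:", counts)
print("Relative frequencies:", relative)
```

The relative frequencies always sum to 1, whatever the number of bins.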
In Python, the NumPy, Matplotlib, and seaborn libraries can be used to create histograms. NumPy provides the histogram() function to compute histogram data.
Following is the syntax of the histogram() function.
numpy.histogram(arr, bins=10, range=None, density=None, weights=None)
Where,
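arr is the input array; the histogram is computed over the flattened array
bins is the number of equal-width bins (10 by default), or a sequence of bin edges
range defines the lower and upper limits of the bins; values outside the range are ignored. By default it is (arr.min(), arr.max())
normed is a legacy parameter from older NumPy versions that was deprecated in favor of density; recent NumPy releases no longer accept it
weights is an optional array of the same shape as arr, giving the weight that each data value contributes to its bin
density is an optional boolean; if True, the result is normalized to a probability density, so that the integral over the range is 1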
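As a rough sketch of how the weights and density parameters behave, the snippet below uses explicit bin edges so the results are easy to check by hand. The values, weights, and edges are hypothetical and chosen only for illustration.

```python
import numpy as np

values = np.array([1.0, 2.0, 2.5, 4.0])    # hypothetical data
weights = np.array([1.0, 1.0, 2.0, 0.5])   # hypothetical weight for each value

# Explicit bin edges: [1, 2), [2, 3), [3, 4] (the last bin includes its right edge)
edges = [1, 2, 3, 4]

# With weights, each bin holds the sum of the weights of the values in it
weighted, _ = np.histogram(values, bins=edges, weights=weights)

# With density=True, counts are scaled so that the total area equals 1
density, _ = np.histogram(values, bins=edges, density=True)
area = np.sum(density * np.diff(edges))

print("Weighted counts:", weighted)
print("Density:", density)
print("Area under the histogram:", area)
```

Note that density values are not probabilities unless every bin has width 1, as in this sketch; in general it is the area, not the sum of the heights, that equals 1.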
The output of the histogram function will be a tuple containing the histogram counts and bin edges.
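Since the result is a tuple, it can be unpacked directly into two arrays. The following minimal sketch, with variable names chosen for this example, pairs each count with the edges of its bin; for N bins there are N + 1 edges.

```python
import numpy as np

arr = np.array([10, 20, 25, 40, 35, 23])

# Unpack the returned tuple into counts and bin edges
counts, bin_edges = np.histogram(arr, bins=10)

# Pair every count with the left and right edge of its bin.
# All bins are half-open [left, right) except the last one,
# which also includes its right edge.
for left, right, count in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{left:5.1f} to {right:5.1f}: {count}")
```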
In the example below, we create a histogram using NumPy's histogram() function. We pass an array as the input and set bins to 10, so the histogram is created with 10 bins; the remaining parameters are left at their default values.
import numpy as np
# Six values spread between 10 and 40
arr = np.array([10, 20, 25, 40, 35, 23])
hist = np.histogram(arr, bins=10)
print("The histogram created:", hist)
The histogram created: (array([1, 0, 0, 1, 1, 1, 0, 0, 1, 1], dtype=int64), array([10., 13., 16., 19., 22., 25., 28., 31., 34., 37., 40.]))
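Let us look at another example of the histogram() function. This time the input is a 2D array, which is flattened before the values are counted, and the data is divided into 20 bins.
import numpy as np
# A 3x3 array; histogram() treats it as nine separate values
arr = np.array([[20, 20, 25], [40, 35, 23], [34, 22, 1]])
hist = np.histogram(arr, bins=20)
print("The histogram created:", hist)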
The histogram created: (array([1, 0, 0, 0, 0, 0, 0, 0, 0, 2, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1], dtype=int64), array([ 1. , 2.95, 4.9 , 6.85, 8.8 , 10.75, 12.7 , 14.65, 16.6 , 18.55, 20.5 , 22.45, 24.4 , 26.35, 28.3 , 30.25, 32.2 , 34.15, 36.1 , 38.05, 40. ]))
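A multi-dimensional input is handled as if it were one long sequence of values. A quick way to check this, sketched below with illustrative variable names, is to compare the result for the 2D array with the result for an explicitly flattened copy.

```python
import numpy as np

arr = np.array([[20, 20, 25], [40, 35, 23], [34, 22, 1]])

# Histogram of the 2D array
counts_2d, edges_2d = np.histogram(arr, bins=20)

# Histogram of the same values flattened to 1D by hand
counts_flat, edges_flat = np.histogram(arr.ravel(), bins=20)

# Both calls should produce identical counts and bin edges
print(np.array_equal(counts_2d, counts_flat))
print(np.array_equal(edges_2d, edges_flat))
```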
In this example, we create a histogram by specifying both the number of bins and the data range to use. Because the range is (1, 10), only the value 1 falls inside it; all other values in the array are ignored.
import numpy as np
arr = np.array([[20, 20, 25], [40, 35, 23], [34, 22, 1]])
# Only values between 1 and 10 (inclusive) are counted
hist = np.histogram(arr, bins=20, range=(1, 10))
print("The histogram created:", hist)
The histogram created: (array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int64), array([ 1. , 1.45, 1.9 , 2.35, 2.8 , 3.25, 3.7 , 4.15, 4.6 , 5.05, 5.5 , 5.95, 6.4 , 6.85, 7.3 , 7.75, 8.2 , 8.65, 9.1 , 9.55, 10. ]))
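Because values outside the given range are dropped without any warning, it can help to check how many values were actually counted. The sketch below does this by comparing the sum of the counts with the size of the array; the variable names are chosen only for this example.

```python
import numpy as np

arr = np.array([[20, 20, 25], [40, 35, 23], [34, 22, 1]])
low, high = 1, 10

counts, edges = np.histogram(arr, bins=20, range=(low, high))

# Values inside the range are the ones that ended up in some bin
inside = int(counts.sum())
# Everything else was ignored by histogram()
outside = arr.size - inside

print("Values counted:", inside)
print("Values ignored:", outside)
```

If many values are ignored, the range is probably too narrow for the data.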