How to calculate the P value of two sets of data based on python

coldplay.xixi
2020-07-16 17:40:46

We need a p-value when evaluating A/B tests. This article records how to use Python to test whether two sets of data differ significantly.

1. Code

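# TTest.py
# -*- coding: utf-8 -*-
'''
Created on 2020-05-20 20:36
@author: huiwenhua
'''

# Import the packages
import numpy as np
from scipy import stats


def get_p_value(arrA, arrB):
    """Return the two-sided p-value of an independent two-sample t-test."""
    a = np.array(arrA)
    b = np.array(arrB)

    # ttest_ind assumes equal population variances by default (equal_var=True)
    t, p = stats.ttest_ind(a, b)

    return p


if __name__ == "__main__":
    # Print the p-value for two small example samples
    print(get_p_value([1, 2, 3, 5], [6, 7, 8, 9, 10]))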

2. T-test: two-sample T-test

The two-sample t-test checks whether the means of the two populations represented by the two samples differ significantly. Besides requiring that both samples come from normally distributed populations, it also requires that the two population variances be equal, which is called "homogeneity of variance."
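
The normality requirement can be checked roughly before running the t-test. The sketch below uses the Shapiro-Wilk test (stats.shapiro), which the original article does not mention; the helper name looks_normal and the 0.05 threshold are assumptions made for illustration. With samples this small the test has little power, so a passing result is weak evidence at best.

```python
from scipy import stats


def looks_normal(sample, alpha=0.05):
    """Hypothetical helper: True if Shapiro-Wilk does not reject normality."""
    # Null hypothesis of Shapiro-Wilk: the sample comes from a normal distribution
    statistic, p = stats.shapiro(sample)
    return bool(p > alpha)


group_a = [1, 2, 3, 5]
group_b = [6, 7, 8, 9, 10]

print("group_a looks normal:", looks_normal(group_a))
print("group_b looks normal:", looks_normal(group_b))
```

A return value of True only means that normality was not rejected, not that it was confirmed.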

Null hypothesis: the two population means are equal (μ1 = μ2)

Python command: stats.ttest_ind(data1, data2)
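
To make clear what this command computes under the equal-variance assumption, here is a minimal sketch that rebuilds the pooled-variance t statistic by hand and compares it with stats.ttest_ind. The function name pooled_t_statistic is an assumption for illustration, and data1 and data2 reuse the sample values from the code in section 1.

```python
import math

import numpy as np
from scipy import stats


def pooled_t_statistic(sample1, sample2):
    """Hypothetical helper: equal-variance two-sample t statistic and two-sided p."""
    x1 = np.asarray(sample1, dtype=float)
    x2 = np.asarray(sample2, dtype=float)
    n1, n2 = len(x1), len(x2)

    # Unbiased sample variances (ddof=1)
    s1_sq = x1.var(ddof=1)
    s2_sq = x2.var(ddof=1)

    # Pooled variance, weighted by each sample's degrees of freedom
    df = n1 + n2 - 2
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df

    # Difference of the means divided by its standard error
    t = (x1.mean() - x2.mean()) / math.sqrt(sp_sq * (1.0 / n1 + 1.0 / n2))

    # Two-sided p-value from the t distribution with df degrees of freedom
    p = 2.0 * stats.t.sf(abs(t), df)
    return t, p


data1 = [1, 2, 3, 5]
data2 = [6, 7, 8, 9, 10]

t_manual, p_manual = pooled_t_statistic(data1, data2)
t_scipy, p_scipy = stats.ttest_ind(data1, data2)

print("manual:", t_manual, p_manual)
print("scipy: ", t_scipy, p_scipy)
```

The two lines should agree up to floating-point rounding, which confirms that the default call is the pooled (equal-variance) test.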

When it is uncertain whether the variances of the two populations are equal, first run the Levene test for homogeneity of variance: stats.levene(data1, data2). If the p-value of the returned result is greater than 0.05, the test gives no evidence against equal variances, and we treat the two populations as having homogeneous variances. If the two populations do not have homogeneous variances, add the parameter equal_var and set it to False, as follows.

stats.ttest_ind(data1, data2, equal_var=False)  # By default, ttest_ind assumes homogeneity of variance (equal_var=True)
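
The two steps above can be chained in one function. This is a minimal sketch: the function name t_test_with_variance_check and the 0.05 cutoff for the Levene test are assumptions, not part of the original code.

```python
from scipy import stats


def t_test_with_variance_check(data1, data2, alpha=0.05):
    """Hypothetical helper: choose equal_var based on the Levene test."""
    # Levene test: the null hypothesis is that the population variances are equal
    levene_stat, levene_p = stats.levene(data1, data2)
    equal_var = bool(levene_p > alpha)

    # equal_var=False switches ttest_ind to Welch's t-test
    t, p = stats.ttest_ind(data1, data2, equal_var=equal_var)
    return t, p, equal_var


t, p, equal_var = t_test_with_variance_check([1, 2, 3, 5], [6, 7, 8, 9, 10])
print("equal variances assumed:", equal_var)
print("t =", t, "p =", p)
```

Some practitioners skip the pre-test and always pass equal_var=False, since Welch's test stays valid when the variances happen to be equal; which approach fits is a judgment call.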

3. Interpretation of results

When the p value is less than a chosen significance level α (such as 0.05), the difference between the two means is considered statistically significant. How the result is read depends on whether a two-sided or a one-sided hypothesis (less than or greater than) is being tested. Note that stats.ttest_ind performs a two-sided test by default.
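
For a one-sided hypothesis, recent SciPy releases accept an alternative argument in stats.ttest_ind. The sketch below assumes SciPy 1.6 or later, where that parameter is available, and reuses the sample values from section 1.

```python
from scipy import stats

data1 = [1, 2, 3, 5]
data2 = [6, 7, 8, 9, 10]

# Two-sided (default): are the two means different?
t_two, p_two = stats.ttest_ind(data1, data2)

# One-sided: is the mean of data1 less than the mean of data2?
t_less, p_less = stats.ttest_ind(data1, data2, alternative="less")

# One-sided: is the mean of data1 greater than the mean of data2?
t_greater, p_greater = stats.ttest_ind(data1, data2, alternative="greater")

print("two-sided p:", p_two)
print("less p:     ", p_less)
print("greater p:  ", p_greater)
```

The t statistic is the same in all three calls; only the p value changes. Because t is negative here, the "less" p value is half of the two-sided one, and the "less" and "greater" p values add up to 1.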

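The sign of t gives the direction of the difference: when the t value is greater than 0, the mean of the first set of data is larger than that of the second. The p value is the probability of observing a difference at least this large if the null hypothesis were true; it is not the probability that the first set of data is better, so (1-p) * 100% should not be read as a confidence level. For example, p=0.05 means that if the two population means were equal, a difference this extreme would appear in about 5% of repeated experiments.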
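
A small helper can combine the sign of t and the p value into a plain-language verdict. This is only a sketch: the function name summarize_t_test, the message wording, and the default α of 0.05 are all assumptions for illustration.

```python
from scipy import stats


def summarize_t_test(data1, data2, alpha=0.05):
    """Hypothetical helper: describe a two-sided t-test result in words."""
    t, p = stats.ttest_ind(data1, data2)

    # Not significant: the data do not let us claim a difference in means
    if p >= alpha:
        return "no significant difference at alpha=%.2f (p=%.4f)" % (alpha, p)

    # Significant: the sign of t tells which sample mean is larger
    direction = "larger" if t > 0 else "smaller"
    return "mean of the first set is significantly %s (p=%.4f)" % (direction, p)


print(summarize_t_test([1, 2, 3, 5], [6, 7, 8, 9, 10]))
print(summarize_t_test([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))
```

The first call compares the article's example samples; the second compares a sample with itself, so t is 0 and no difference is reported.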

Statement:
This article is reproduced from jb51.net. If there is any infringement, please contact admin@php.cn to have it removed.