Tencent releases open source data component Fast-Causal-Inference to support distributed vectorized statistical analysis and causal inference
Tencent announced on its official account "Tencent Open Source" that its open source distributed data science component, Fast-Causal-Inference, has been publicly released on GitHub.
Fast-Causal-Inference is reportedly developed by Tencent WeChat. It uses SQL for interaction and is a statistical analysis and causal inference library built on distributed vectorized computation. According to the project, it aims to "solve the performance bottleneck of existing statistical model libraries (R/Python) under big data, provide causal inference capabilities that can process tens of billions of data records in seconds, and at the same time lower the threshold for using statistical models through SQL, making them easy to use in production environments. It has been applied in multiple internal WeChat businesses such as WeChat Channels (video accounts) and WeChat Search."
Official introduction:
Provides causal inference over massive data, executed in seconds
Built on the vectorized OLAP execution engines ClickHouse/StarRocks, which further improve speed for a better user experience
Minimalist SQL usage
The SQLGateway WebServer lowers the threshold for using statistical models through SQL, provides a minimalist SQL usage method on the upper layer, and transparently performs engine-specific SQL expansion and optimization.
Provides causal inference capabilities as basic operators, high-order operators, and upper-layer application encapsulation
Supports t-test, OLS, Lasso, tree-based models, matching, bootstrap, DML, and more.
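To make "distributed vectorized" concrete, here is a minimal Python sketch of one general reason estimators such as OLS suit distributed engines: the fit depends only on additive sums, so each shard can aggregate its own rows and the partial results can be merged. This illustrates the general idea only, not the project's implementation. The names `Moments`, `shard_moments`, and `ols_fit`, and the sample data, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Moments:
    """Additive sufficient statistics for simple linear regression y = a + b*x."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def merge(self, other: "Moments") -> "Moments":
        # Partial sums from different shards combine by plain addition.
        return Moments(self.n + other.n, self.sx + other.sx, self.sy + other.sy,
                       self.sxx + other.sxx, self.sxy + other.sxy)


def shard_moments(rows):
    """Aggregate one shard's (x, y) rows into its partial sums."""
    m = Moments()
    for x, y in rows:
        m = m.merge(Moments(1, x, y, x * x, x * y))
    return m


def ols_fit(m: Moments):
    """Solve the normal equations from the merged sums; returns (intercept, slope)."""
    denom = m.n * m.sxx - m.sx ** 2
    slope = (m.n * m.sxy - m.sx * m.sy) / denom
    intercept = (m.sy - slope * m.sx) / m.n
    return intercept, slope


# Hypothetical data split over two shards; every row satisfies y = 1 + 2x.
shard_a = [(0.0, 1.0), (1.0, 3.0)]
shard_b = [(2.0, 5.0), (3.0, 7.0)]
total = shard_moments(shard_a).merge(shard_moments(shard_b))
intercept, slope = ols_fit(total)
```

Because `merge` is just addition, the same pattern maps naturally onto a SQL `SUM` aggregation executed per node and combined at the end.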
According to the official description, the first version already supports the following features:
Basic causal inference tools
- t-test based on the delta method, with CUPED support
- OLS: sub-second on a billion rows of data
Advanced causal inference tools
- OLS-based IV, WLS, and other GLS, DID, synthetic control, CUPED, and mediation are incubating
- uplift: minute-level computation on tens of millions of rows
- bootstrap / permutation and other data simulation frameworks, to solve variance estimation problems that have no closed-form solution
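As a rough illustration of two ideas in this list, the sketch below estimates the variance of a ratio metric (for example, clicks per page view) in two ways: with the delta method's closed-form approximation, and with a bootstrap that resamples users. It uses only the Python standard library and is not the project's SQL interface. The function names and simulated data are assumptions made for the example.

```python
import random
import statistics


def ratio_delta_var(xs, ys):
    """Delta-method variance of mean(ys) / mean(xs) for paired samples."""
    n = len(xs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    r = my / mx
    vx, vy = statistics.variance(xs), statistics.variance(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # First-order Taylor expansion of the ratio around (mean(xs), mean(ys)).
    return (vy - 2 * r * cov + r * r * vx) / (n * mx * mx)


def ratio_bootstrap_var(xs, ys, reps, rng):
    """Bootstrap variance of the same ratio: resample pairs with replacement."""
    n = len(xs)
    estimates = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(sum(ys[i] for i in idx) / sum(xs[i] for i in idx))
    return statistics.variance(estimates)


rng = random.Random(0)
# Hypothetical per-user data: xs = page views, ys = clicks.
xs = [rng.randint(1, 20) for _ in range(2000)]
ys = [sum(rng.random() < 0.3 for _ in range(x)) for x in xs]

delta_var = ratio_delta_var(xs, ys)
boot_var = ratio_bootstrap_var(xs, ys, reps=300, rng=rng)
```

The two estimates should be of similar size on data like this. The delta method needs only a handful of sums, while the bootstrap applies to any statistic, including ones with no closed-form variance, at the cost of repeated resampling.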