Tencent releases open source data component Fast-Causal-Inference to support distributed vectorized statistical analysis and causal inference

WBOY (reposted)
2023-09-18 16:57:02

Tencent announced on its "Tencent Open Source" official account that Fast-Causal-Inference, its open source distributed data science component, has been publicly released on GitHub.

▲ Image source: "Tencent Open Source" official account

According to the announcement, Fast-Causal-Inference was developed by Tencent's WeChat team. It is a statistical analysis and causal inference library built on distributed, vectorized computation, and users interact with it through SQL. The project is said to "solve the performance bottleneck of existing statistical model libraries (R/Python) on big data, provide causal inference capabilities that can process tens of billions of rows in seconds, and lower the barrier to using statistical models through SQL, making them easy to use in production environments. It has been applied in multiple internal WeChat businesses such as WeChat Channels (video accounts) and WeChat Search."

Official introduction:

Causal inference on massive data in seconds

Built on the vectorized OLAP execution engines ClickHouse and StarRocks, the library aims to push execution speed, and with it the user experience, as far as possible.

Minimalist SQL usage

The SQLGateway WebServer lowers the barrier to using statistical models by exposing them through SQL. It offers a minimalist SQL interface at the top layer and transparently handles engine-specific SQL expansion and optimization.

Causal inference capabilities across basic operators, higher-order operators, and upper-layer application wrappers

Supports t-test, OLS, Lasso, tree-based models, matching, bootstrap, DML, and more.
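To illustrate why an operator such as OLS suits a distributed engine, here is a minimal sketch in plain Python. It relies on a standard property of least squares: the coefficients depend on the data only through the sums X'X and X'y, so each partition can compute its own partial sums and the results can be added together. The function names (`partial_sums`, `merge`, `solve`) are hypothetical, and this is not taken from the Fast-Causal-Inference code base; the article does not describe how the library implements OLS internally.

```python
def partial_sums(rows):
    """Sufficient statistics (X'X, X'y) for one partition.

    rows: list of (features, y); include a constant 1.0 feature for the intercept.
    """
    k = len(rows[0][0])
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    for x, y in rows:
        for i in range(k):
            xty[i] += x[i] * y
            for j in range(k):
                xtx[i][j] += x[i] * x[j]
    return xtx, xty


def merge(a, b):
    """Combine the statistics of two partitions by element-wise addition."""
    xtx = [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a[0], b[0])]
    xty = [p + q for p, q in zip(a[1], b[1])]
    return xtx, xty


def solve(xtx, xty):
    """Solve (X'X) beta = X'y by Gaussian elimination with partial pivoting."""
    k = len(xty)
    m = [row[:] + [rhs] for row, rhs in zip(xtx, xty)]  # augmented matrix
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            factor = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        tail = sum(m[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = (m[i][k] - tail) / m[i][i]
    return beta


# Two partitions of data generated from y = 1 + 2x (toy example).
part_1 = [([1.0, 0.0], 1.0), ([1.0, 1.0], 3.0)]
part_2 = [([1.0, 2.0], 5.0), ([1.0, 3.0], 7.0)]
beta = solve(*merge(partial_sums(part_1), partial_sums(part_2)))
```

Only the small k-by-k summaries travel between partitions, which is why the cost of the final solve does not grow with the number of rows.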

According to the project team, the first version already supports the following features:

Basic causal inference tools

  • t-test based on the delta method, with CUPED support
  • OLS: sub-second on a billion rows of data
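As background on the first item, the delta method gives an approximate variance for a ratio metric such as clicks per view, where both numerator and denominator vary by user. The sketch below is a generic textbook version in plain Python, not the library's implementation. The function names are hypothetical, the p-value uses a normal approximation, and CUPED adjustment is not covered.

```python
import math
from statistics import mean


def ratio_with_delta_var(num, den):
    """Return (ratio of means, delta-method variance of that ratio).

    num, den: per-user numerator and denominator values, same length.
    """
    n = len(num)
    my, mx = mean(num), mean(den)
    # Sample variances and covariance with the n - 1 denominator.
    var_y = sum((y - my) ** 2 for y in num) / (n - 1)
    var_x = sum((x - mx) ** 2 for x in den) / (n - 1)
    cov_xy = sum((x - mx) * (y - my) for x, y in zip(den, num)) / (n - 1)
    ratio = my / mx
    # First-order Taylor expansion of mean(num) / mean(den).
    var = (var_y - 2 * ratio * cov_xy + ratio ** 2 * var_x) / (n * mx ** 2)
    return ratio, var


def delta_ttest(num_a, den_a, num_b, den_b):
    """Compare ratio metrics of two independent groups A and B."""
    r_a, v_a = ratio_with_delta_var(num_a, den_a)
    r_b, v_b = ratio_with_delta_var(num_b, den_b)
    diff = r_b - r_a
    se = math.sqrt(v_a + v_b)
    z = diff / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided, normal approximation
    return diff, se, z, p_value


# Toy per-user data: clicks (numerator) and views (denominator).
clicks_a, views_a = [1, 0, 2, 1, 3], [4, 2, 5, 3, 6]
clicks_b, views_b = [2, 1, 3, 2, 4], [4, 3, 5, 3, 6]
result = delta_ttest(clicks_a, views_a, clicks_b, views_b)
```

When every denominator equals 1, the formula reduces to the usual variance of a sample mean, which is a quick sanity check on the expression.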

Advanced causal inference tools

  • OLS-based IV, WLS, and other GLS, DID, synthetic control, CUPED, mediation are incubating
  • Uplift: minute-level computation on tens of millions of rows
  • Bootstrap, permutation, and other data simulation frameworks, for variance estimation problems that have no closed-form solution
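The idea behind the last item can be sketched in a few lines: when a statistic such as the median has no convenient variance formula, resampling the data with replacement and recomputing the statistic gives an empirical standard error. The following is a generic, single-machine illustration in plain Python; `bootstrap_se` and its parameters are hypothetical names, and it does not reflect how Fast-Causal-Inference distributes the computation.

```python
import random
from statistics import median, stdev


def bootstrap_se(data, statistic, n_resamples=1000, seed=0):
    """Estimate the standard error of `statistic` by nonparametric bootstrap."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = len(data)
    estimates = []
    for _ in range(n_resamples):
        resample = rng.choices(data, k=n)  # draw n items with replacement
        estimates.append(statistic(resample))
    # The spread of the resampled estimates approximates the standard error.
    return stdev(estimates)


# Toy data: standard error of the median.
values = [3, 7, 8, 12, 13, 14, 18, 21, 25, 40]
se_median = bootstrap_se(values, median)
```

A permutation test follows the same pattern, except that group labels are shuffled instead of rows being resampled.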

Statement:
This article is reproduced from 51cto.com. If there is any infringement, please contact admin@php.cn to have it removed.