1. Problems faced by experiments in new user scenarios
1. UG panorama
This is a panoramic view of UG.
UG acquires users and drives traffic to the app through channels such as paid ads, ASO, and SEO. Next, onboarding operations and guidance for new users activate them and bring them into the maturity stage. Later, some users gradually become inactive, entering a decline period or even churning. During this period, we run churn early warnings and recall campaigns to reactivate users, and finally win-back campaigns for users who have already churned.
This lifecycle can be summarized by the formula in the figure above: DAU = DNU × LT, where DNU is the number of daily new users and LT is the user lifetime, the average number of days a new user stays active. All work in the UG scenario can be decomposed along this formula.
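The Python sketch below is a rough illustration of the formula. It rests on two assumptions the article does not state. First, DNU is constant. Second, LT is the expected number of active days per new user, which is the sum of the day-k retention rates over a truncated horizon. The retention curve and the DNU value are hypothetical.

```python
from typing import Sequence


def lifetime(retention: Sequence[float]) -> float:
    """LT as expected active days per new user: the sum of day-k retention rates.

    retention[0] is the install day (normally 1.0). Users are assumed to be
    inactive after the last entry of this truncated, hypothetical curve.
    """
    return sum(retention)


def dau_on_day(daily_new_users: Sequence[float], retention: Sequence[float], day: int) -> float:
    """DAU on `day`: each earlier install cohort contributes its size x its retention at that age."""
    return sum(
        daily_new_users[day - age] * rate
        for age, rate in enumerate(retention)
        if day - age >= 0
    )


# Hypothetical retention curve: day 0 = 100%, day 1 = 40%, day 2 = 25%, day 3 = 10%.
curve = [1.0, 0.4, 0.25, 0.1]
# Hypothetical constant DNU of 1,000 new devices per day over 30 days.
dnu = [1000.0] * 30

# Once every cohort age is populated (day >= 3), DAU settles at DNU x LT (about 1,750 here).
print(dau_on_day(dnu, curve, day=29), 1000.0 * lifetime(curve))
```

Under these assumptions, raising either factor raises DAU. This is why UG work can be split into two parts: bringing in more new users and extending how long they stay.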
2. Principles of AB experiments
An AB experiment randomly assigns traffic to groups and applies different strategies to the experimental group and the control group. Decisions are then made scientifically by combining statistical methods with the experimental hypotheses; together, these form the framework of the whole experiment. Two traffic-splitting approaches are currently common: splitting by an experimentation platform and local splitting on the client.
Platform splitting has a prerequisite: the device must obtain a stable ID once initialization completes. Using this ID, the client calls the experimentation platform, which runs the splitting logic and returns the assignment to the client. The client then applies the corresponding strategy. The advantage is that the platform guarantees uniform and stable assignment. The disadvantage is that the device must finish initialization before it can be assigned to an experiment.
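Platforms commonly turn a stable ID into a group with deterministic salted hashing. The sketch below is a generic version of that idea and is an assumption, not the internal platform's actual logic. The salt `nuj_login_test`, the 1,000-bucket layout, and the 50/50 split are all hypothetical.

```python
import hashlib

NUM_BUCKETS = 1000  # hypothetical bucket layout


def bucket_of(device_id: str, experiment_salt: str) -> int:
    """Map an ID to a bucket in [0, NUM_BUCKETS) with a salted SHA-256 hash."""
    digest = hashlib.sha256(f"{experiment_salt}:{device_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then take the remainder.
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


def assign_group(device_id: str, experiment_salt: str, treatment_share: float = 0.5) -> str:
    """Buckets below the threshold form the experimental group; the rest are control."""
    threshold = int(treatment_share * NUM_BUCKETS)
    return "treatment" if bucket_of(device_id, experiment_salt) < threshold else "control"


# Hypothetical experiment salt; the same ID always maps to the same group.
print(assign_group("device-42", "nuj_login_test"))
```

The result depends only on the ID and the salt, so the assignment is stable exactly as long as the ID is. This is why platform splitting has to wait until the device has a stable ID.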
The other approach is local splitting on the client. It is relatively niche and mainly suits certain UG scenarios, splash-screen ad scenarios, and performance/initialization scenarios. All splitting logic runs when the client initializes. Its advantage is obvious: there is no delay, and users can be assigned as soon as the app first launches. In principle, its uniformity can also be guaranteed. In real business scenarios, however, uniformity is often a problem, for reasons explained next.
3. Problems faced by AB experiments in new user scenarios
The first problem in the UG scenario is that traffic must be split as early as possible.
Take the traffic acceptance page shown here as an example. The product manager believes that optimizing its UI could improve core metrics. In such a scenario, we want the experiment to split traffic as early as possible.
While page 1 is shown, the device is still initializing and obtaining its ID, and 18.62% of users do not manage to generate an ID. With traditional platform splitting, these 18.62% of users cannot be assigned to any group, which introduces an inherent selection bias.
In addition, new user traffic is very valuable. Leaving 18.62% of new users out of the experiment also makes the experiment run considerably longer and lowers traffic utilization.
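The duration cost can be estimated with simple arithmetic. The sketch assumes a fixed required sample size and a constant arrival rate of new users; both values below are hypothetical. Under those assumptions, excluding 18.62% of new users stretches the experiment by a factor of 1 / (1 - 0.1862) ≈ 1.23.

```python
import math


def days_needed(required_users: int, daily_new_users: int, eligible_share: float) -> int:
    """Days until `required_users` eligible new users have entered the experiment."""
    return math.ceil(required_users / (daily_new_users * eligible_share))


required = 200_000  # hypothetical total sample size from a power analysis
daily = 20_000      # hypothetical number of new users per day

full = days_needed(required, daily, eligible_share=1.0)
# With platform splitting, only users who obtain an ID in time can be assigned.
platform_only = days_needed(required, daily, eligible_share=1 - 0.1862)
print(full, platform_only)  # 10 13
```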
We therefore turned to local splitting on the client, which completes assignment while the device initializes. It works like this: during initialization, the client generates a random number itself, hashes it, and assigns a group in the same way the platform does, producing an experimental group and a control group. In principle, this should keep the traffic evenly distributed. However, the data in the figure above show that more than 21% of users were repeatedly assigned to different groups.
One cause is that users of very popular, habit-forming products such as Honor of Kings or Douyin often uninstall and reinstall the app several times during the experiment period. Under the local splitting logic just described, each installation generates a new random number and is split again. The same user can therefore end up in different groups, and the splitting ID no longer maps one-to-one to the statistical ID. This is what causes the uneven distribution.
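The reinstall effect can be reproduced with a small, hypothetical simulation. It makes three assumptions:

- each installation draws a fresh random token, hashes it, and applies a 50/50 split;
- the metric ID identifies the same user across reinstalls;
- the install-count distribution below is made up.

Under these assumptions, a user with k installs lands in more than one group with probability 1 - 0.5^(k - 1), for example 50% for two installs.

```python
import hashlib
import random


def local_assign(rng: random.Random) -> str:
    """Client-side split at first launch: random token -> SHA-256 -> 50/50 bucket."""
    token = rng.getrandbits(128).to_bytes(16, "big")
    bucket = int.from_bytes(hashlib.sha256(token).digest()[:8], "big") % 100
    return "treatment" if bucket < 50 else "control"


def mixed_group_share(installs_per_user: list[int], seed: int = 7) -> float:
    """Share of users (keyed by metric ID) who were assigned to more than one group."""
    rng = random.Random(seed)
    mixed = 0
    for installs in installs_per_user:
        groups = {local_assign(rng) for _ in range(installs)}  # a new token per install
        mixed += len(groups) > 1
    return mixed / len(installs_per_user)


def expected_mixed_share(installs: int) -> float:
    """Closed form for a 50/50 split: 1 - P(all installs land in the same group)."""
    return 1 - 0.5 ** (installs - 1)


# Hypothetical population: 60% install once, 30% twice, 10% three times.
population = [1] * 6000 + [2] * 3000 + [3] * 1000
print(round(mixed_group_share(population), 3))
```

The core issue is that the splitting unit (one install) is finer than the analysis unit (one user). Randomization therefore no longer holds at the level where metrics are computed.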
New user scenarios also raise a problem with the evaluation criteria of experiments.
We laid out the timeline of the new user traffic scenario. Traffic is split at app startup; assume the split is uniform and the strategy takes effect at that same moment. The metric (statistical) ID, however, is generated later than the moment the strategy takes effect, and data can only be observed once that ID exists. Because observation lags far behind the strategy's effect, only users who remain until the metric ID is generated are counted, which leads to survivorship bias.
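A toy calculation shows how this lag can mislead. All numbers are hypothetical and only illustrate the mechanism. Suppose the strategy reduces early drop-off, so more assigned users survive until the metric ID exists, but these extra survivors retain less well than average. Measured only on observed users, retention then falls. Measured per assigned user, the number of retained users rises.

```python
from dataclasses import dataclass


@dataclass
class Arm:
    assigned: int  # users split at app startup (hypothetical)
    observed: int  # users who survive until the metric ID is generated (hypothetical)
    retained: int  # observed users still active on day 1 (hypothetical)


def observed_retention(arm: Arm) -> float:
    # What a metric-ID-based report shows: survivors only.
    return arm.retained / arm.observed


def retention_per_assigned(arm: Arm) -> float:
    # Intent-to-treat view: the denominator is everyone who was split.
    return arm.retained / arm.assigned


control = Arm(assigned=10_000, observed=7_000, retained=2_800)
treatment = Arm(assigned=10_000, observed=8_000, retained=3_000)

print(observed_retention(control), observed_retention(treatment))          # 0.4 0.375
print(retention_per_assigned(control), retention_per_assigned(treatment))  # 0.28 0.3
```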
2. New experimental system and its scientific verification
To solve the problems above, we propose a new experimental system and verify it scientifically.
1. Selecting the splitting ID for new user experiments
As mentioned earlier, new user experiments place high demands on traffic splitting. How, then, should the splitting ID be chosen? We follow these principles:
- Compliance: for both overseas and domestic business, security compliance is the lifeline and must be satisfied first. Otherwise, the impact can be severe, for example if the app is taken down from app stores.
- Timeliness: new user scenarios require the ID to be available immediately, so that the assignment is obtained right after first launch.
- Uniqueness: within a single installation cycle, the splitting ID is stable and maps one-to-one to the metric ID. The data in the figure below show that 99.79% of splitting IDs map one-to-one to metric-calculation IDs, and 99.59% of metric-calculation IDs map one-to-one to splitting IDs. This essentially confirms that a splitting ID chosen by these criteria matches the metric ID one-to-one.
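The two match ratios can be computed from the observed (splitting ID, metric ID) pairs. The article does not spell out the definition, so this sketch assumes a plausible one: the share of splitting IDs that map to exactly one metric ID, and the share of metric IDs that map to exactly one splitting ID. The IDs below are made up.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def one_to_one_ratios(pairs: Iterable[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (share of split IDs with exactly one metric ID,
               share of metric IDs with exactly one split ID)."""
    split_to_metric = defaultdict(set)
    metric_to_split = defaultdict(set)
    for split_id, metric_id in pairs:
        split_to_metric[split_id].add(metric_id)
        metric_to_split[metric_id].add(split_id)
    split_ratio = sum(len(m) == 1 for m in split_to_metric.values()) / len(split_to_metric)
    metric_ratio = sum(len(s) == 1 for s in metric_to_split.values()) / len(metric_to_split)
    return split_ratio, metric_ratio


# Hypothetical pairs: split ID "s2" is seen with two different metric IDs.
observed = [("s1", "m1"), ("s2", "m2"), ("s2", "m3"), ("s3", "m4")]
print(one_to_one_ratios(observed))
```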
2. Scientific verification of the splitting capability
Once the splitting ID is chosen, the split itself is usually done in one of two ways: through the experimentation platform or on the client.
With the splitting ID available, we pass it to the experimentation platform, which performs the split. For a splitting platform, the most basic property to verify is randomness.
The first aspect is uniformity. Within one layer, traffic is divided evenly into many buckets, and each group should receive an even share. Take the simplest case: a single experiment on one layer with two groups, a and b. The control and experimental groups should then contain approximately equal numbers of users, which verifies uniformity.
The second aspect is orthogonality. Experiments on different layers should be independent of and unaffected by one another, so orthogonality between layers must also be verified.
Both uniformity and orthogonality can be verified with statistical tests for categorical data, such as the chi-square test.
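Both checks can be run as chi-square tests on group counts. This plain-Python sketch is illustrative and assumes two groups per layer. Each test then has one degree of freedom, and its tail probability reduces to erfc(sqrt(x / 2)). For more groups, a statistics library would be the usual choice. The counts are hypothetical.

```python
import math


def chi2_sf_df1(stat: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


def uniformity_test(count_a: int, count_b: int, share_a: float = 0.5) -> float:
    """Goodness-of-fit test (sample-ratio check) for a two-group split within one layer."""
    total = count_a + count_b
    expected = (total * share_a, total * (1 - share_a))
    stat = sum((o - e) ** 2 / e for o, e in zip((count_a, count_b), expected))
    return chi2_sf_df1(stat)


def orthogonality_test(table: list[list[int]]) -> float:
    """Independence test on a 2x2 table: rows = layer-1 group, columns = layer-2 group."""
    n = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return chi2_sf_df1(stat)  # no Yates continuity correction


# Hypothetical counts: a near-even split and two nearly independent layers.
print(uniformity_test(50_120, 49_880))
print(orthogonality_test([[25_010, 24_990], [25_030, 24_970]]))
```

A large p-value means the test finds no evidence of imbalance or of dependence between layers. A tiny p-value flags a splitting problem worth investigating.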
Having covered the choice of splitting ID and the splitting capability, we finally need to verify, at the level of metric results, whether the proposed split meets the requirements of AB experiments.
3. Scientific verification of the splitting results
Using the internal platform, we ran many AA simulations and compared whether the control and experimental groups meet the experiment's requirements on the corresponding metrics. Let's look at this set of data.
We sampled a set of metrics evaluated with t-tests. In AA experiments there is no real difference between the groups, so across many runs the share of significant results, the type I error rate, should stay small and close to the chosen significance level. Assume the type I error rate is set to 0.05. With about 1,000 simulations, its 95% confidence interval is approximately 0.05 ± 1.96 × sqrt(0.05 × 0.95 / 1000) ≈ 0.05 ± 0.0135, that is, between 0.0365 and 0.0635. The sampled metrics shown in the first column fall within this range. From the perspective of the type I error rate, the existing experimental system is therefore sound.
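The interval above is the normal approximation to a binomial proportion: alpha ± 1.96 × sqrt(alpha × (1 - alpha) / n). The sketch below computes that interval and runs synthetic AA experiments as an illustration. Several choices here are assumptions, not taken from the article:

- the metric is hypothetical and normally distributed, with 200 users per group;
- the test statistic is Welch-type, compared against a normal reference distribution as a large-sample stand-in for the t-test.

```python
import math
import random


def false_positive_ci(alpha: float = 0.05, runs: int = 1000, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation 95% interval for the share of significant AA runs."""
    half_width = z * math.sqrt(alpha * (1 - alpha) / runs)
    return alpha - half_width, alpha + half_width


def mean_and_variance(xs: list[float]) -> tuple[float, float]:
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)  # sample variance


def two_sided_pvalue(a: list[float], b: list[float]) -> float:
    """Welch-type statistic with a normal reference distribution (large-sample t-test stand-in)."""
    ma, va = mean_and_variance(a)
    mb, vb = mean_and_variance(b)
    stat = (ma - mb) / math.sqrt(va / len(a) + vb / len(b))
    return math.erfc(abs(stat) / math.sqrt(2))


def aa_false_positive_rate(runs: int = 1000, n: int = 200, alpha: float = 0.05, seed: int = 1) -> float:
    """Share of synthetic AA runs that come out significant at level alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        # AA run: both groups are drawn from the same hypothetical metric distribution.
        a = [rng.gauss(10.0, 3.0) for _ in range(n)]
        b = [rng.gauss(10.0, 3.0) for _ in range(n)]
        hits += two_sided_pvalue(a, b) < alpha
    return hits / runs


low, high = false_positive_ci()
print(round(low, 4), round(high, 4))  # 0.0365 0.0635
rate = aa_false_positive_rate()
print(rate, low <= rate <= high)
```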
In addition, because the test is based on the t statistic, with large traffic the t statistics across these AA runs should approximately follow a normal distribution. We therefore also tested the t statistics for normality. The resulting p-value is well above 0.05, so the null hypothesis that the t statistic is approximately normally distributed cannot be rejected.
Likewise, across this many experiments, the p-values of the individual t-tests should be approximately uniformly distributed. A uniformity test on the p-values (pvalue_uniform_test) also gives a result well above 0.05, so the null hypothesis that the p-values are approximately uniform cannot be rejected either.
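Both distribution checks can be sketched with a one-sample Kolmogorov-Smirnov test. The article does not name its normality test or say what `pvalue_uniform_test` computes, so KS is a swapped-in choice here. The p-value uses the standard asymptotic Kolmogorov series. The AA data are synthetic: a hypothetical metric, 200 users per group, 1,000 runs.

```python
import math
import random
from typing import Callable, List, Tuple


def ks_pvalue(samples: List[float], cdf: Callable[[float], float]) -> float:
    """One-sample Kolmogorov-Smirnov test against a fully specified CDF (asymptotic p-value)."""
    xs = sorted(samples)
    n = len(xs)
    d = max(max((i + 1) / n - cdf(x), cdf(x) - i / n) for i, x in enumerate(xs))
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.2:
        return 1.0  # the Kolmogorov tail probability is indistinguishable from 1 here
    # Q(lam) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2)
    p = 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam) for k in range(1, 101))
    return min(max(p, 0.0), 1.0)


def normal_cdf(x: float) -> float:
    return 0.5 * math.erfc(-x / math.sqrt(2))


def uniform_cdf(x: float) -> float:
    return min(max(x, 0.0), 1.0)


def aa_statistics(runs: int = 1000, n: int = 200, seed: int = 3) -> Tuple[List[float], List[float]]:
    """Collect the test statistic and its two-sided p-value from synthetic AA runs."""
    rng = random.Random(seed)
    stats, pvalues = [], []
    for _ in range(runs):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ma, mb = sum(a) / n, sum(b) / n
        va = sum((x - ma) ** 2 for x in a) / (n - 1)
        vb = sum((x - mb) ** 2 for x in b) / (n - 1)
        t = (ma - mb) / math.sqrt(va / n + vb / n)
        stats.append(t)
        pvalues.append(math.erfc(abs(t) / math.sqrt(2)))
    return stats, pvalues


t_stats, p_values = aa_statistics()
print("normality p:", ks_pvalue(t_stats, normal_cdf))
print("uniformity p:", ks_pvalue(p_values, uniform_cdf))
```

As in the article's reading, a p-value well above 0.05 means the data give no reason to reject normality (or uniformity). It does not prove that the distribution is exactly normal or uniform.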
Together, these checks verify the scientific validity of the proposed splitting system from three angles:

- the one-to-one correspondence between the splitting ID and the metric-calculation ID;
- the splitting capability;
- the metric-level results of the split.
3. Application case analysis
Using real application cases from UG scenarios, the following explains in detail how to evaluate experiments so as to solve the problems above, in particular the third problem mentioned earlier.
1. Experimental evaluation in new user scenarios
This is a typical UG traffic acceptance scenario. Many optimizations are made in NUJ (new user guidance) or in new user tasks to improve traffic utilization. The evaluation criterion here is usually the retention rate, which is the current consensus in the industry.
Consider the flow from download to installation to first launch. The PM suspects that this flow sets the entry threshold too high for users, especially those who have never used the product. Should users first get familiar with the product and reach its "aha moment" before being guided to log in?
The PM then proposed a further hypothesis. For users who have never used the product, reduce friction in the new user login and NUJ flows. For users who have already used the product, including users who switched devices, keep the current online flow.
Splitting by the metric ID means obtaining the metric ID first and then assigning groups. This method usually splits uniformly, but the experiment showed little difference between groups in results or retention, which makes an overall decision difficult. Such an experiment also wastes part of the traffic and suffers from selection bias. We therefore ran a local-splitting experiment; the figure below shows its results.
The number of new devices entering the groups differs significantly between groups, and the difference is credible. Retention also improves, yet other core metrics move in the negative direction. This is hard to understand, because those metrics are strongly correlated with retention. With such data, the result is hard to explain or attribute, and an overall decision is difficult.
Looking at users who entered groups more than once, more than 20% of users were repeatedly assigned to different groups. This breaks the randomization of the AB experiment and makes scientific comparison and decision-making difficult.
Finally, look at the results of the experiment using the proposed new splitting method.
Traffic can be split as soon as the app first launches. The splitting capability is guaranteed by the internal platform, which largely ensures uniform and stable assignment. The experimental data show that the group sizes are nearly equal, and a chi-square test confirms that the split fully meets the requirements. At the same time, the number of effective new devices rose significantly, by 1%, and retention also improved. Looking at the control and experimental groups separately, the conversion rate from splitting-ID traffic to the new devices finally generated is 1% higher in the experimental group than in the control group. This is because the experimental group widened the entry points in NUJ and NUT, making it easier for more users to come in, experience the product, and stay.
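Whether a conversion lift like this is significant can be checked with a two-proportion z-test on the conversion from splitting ID to effective new device. The sketch is illustrative only, and the article does not say which test it used for this metric. The counts are hypothetical, and the sketch treats the 1% as an absolute difference.

```python
import math


def two_proportion_ztest(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (absolute lift of B over A, two-sided p-value) using a pooled standard error."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return p_b - p_a, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical: splitting IDs per group (n) and how many became effective new devices (conv).
lift, p = two_proportion_ztest(conv_a=80_000, n_a=100_000, conv_b=81_000, n_b=100_000)
print(f"lift={lift:.4f}, p={p:.2g}")
```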
Splitting the experimental data into logged-in and non-logged-in users shows that, in the experimental group, more users chose to experience the product without logging in, and their retention also improved. This result is in line with expectations.
Looking at the metrics day by day, the number of users entering the groups grows steadily, and the retention metric also improves. Compared with the control group, the experimental group improved both the number of effective devices and retention.
In new user traffic acceptance scenarios, evaluation has mostly focused on retention or short-term LT, so optimization takes place only in the one-dimensional LT space.
The new experimental system turns this one-dimensional optimization into a two-dimensional one: what improves is DNU × LT as a whole, so the strategy space expands from one dimension to two. As a result, a partial loss in LT can be acceptable in some scenarios.
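A small comparison shows the shift from one dimension to two. The numbers are hypothetical. The treatment brings in more effective new devices but with slightly lower LT. The decision compares the steady-state contribution DNU × LT instead of LT alone.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    dnu: float  # effective new devices per day (hypothetical)
    lt: float   # expected active days per new device (hypothetical short-term LT)

    @property
    def active_days(self) -> float:
        # Steady-state contribution to DAU: DAU = DNU x LT.
        return self.dnu * self.lt


control = Variant("control", dnu=10_000, lt=3.00)
treatment = Variant("treatment", dnu=10_500, lt=2.95)

by_lt = max((control, treatment), key=lambda v: v.lt).name
by_dnu_x_lt = max((control, treatment), key=lambda v: v.active_days).name
print(by_lt, by_dnu_x_lt)  # control treatment
```

Judged on LT alone, the control wins. Judged on DNU × LT, the treatment generates more active days (30,975 versus 30,000), which is the kind of trade-off the two-dimensional view permits.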
4. Summary
Finally, let’s summarize the experimental capability building and experimental evaluation standards in new user scenarios.
- In UG new user scenarios, the existing experimental system cannot fully address the problems of evaluating new user traffic acceptance strategies, so a new experimental system is needed.
- There are several criteria for selecting the splitting ID: first, security compliance; second, availability at first launch; third, stability within a single installation cycle and an injective (one-to-one) relationship with the metric ID.
- Experimental evaluation in new user scenarios is a multi-dimensional optimization. The gains come from both the number of effective new devices and device retention, unlike the previous evaluation, which looked only at device retention.
- Handling “new” user traffic well often brings large business benefits. “New” here covers not only first-time users but also users who uninstalled and reinstalled.
The above is the full content of “How to build an AB experiment system in user growth scenarios?”. For more information, see other related articles on the PHP Chinese website.

