I have a table, click_tracking, that tracks all user emails, IPs, dates/times and events (registrations and purchases).
Right now, I'm trying to get daily statistics on a) signups and b) conversions (purchases that occur within 7 days of signup, attributed to the original signup date for that email/IP, not the purchase date).
I could easily figure out a) registrations, but querying conversions within 7 days and then assigning each conversion to the registration date (instead of the conversion date, which is easy) is turning out to be quite a challenge.
This is my query so far:
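    SELECT DATE(timestamp) AS date,
           SUM(CASE WHEN event = 'registration' THEN 1 ELSE 0 END) AS registrations,
           SUM(CASE WHEN event = 'purchase' THEN 1 ELSE 0 END) AS conversions
    FROM click_tracking
    WHERE DATE(timestamp) <= '2021-07-31'
      AND DATE(timestamp) >= '2021-07-01'
    GROUP BY date
    ORDER BY date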
This gives me the following results:
What I ideally need is something like this (3 purchase events associated with 3 registration events on the 15th, hence why 3 conversions are assigned to the 15th and none to the 16th):
Does it make sense?
Keep in mind that this click_tracking table holds a million or two records, and the JOINs I've tried on it have crashed it multiple times, so not just any query will work.
Any idea how to solve this problem efficiently and change my query to accomplish this task?
P粉884667022 2023-09-12 17:09:57
You need window functions to perform this kind of query:
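    WITH combined AS (
        SELECT DATE(timestamp) AS date0,
               email,
               -- event and date of the current row and of the next row for the same email
               FIRST_VALUE(event) OVER (PARTITION BY email ORDER BY timestamp ROWS BETWEEN CURRENT ROW AND 0 FOLLOWING) AS event1,
               NTH_VALUE(event, 2) OVER (PARTITION BY email ORDER BY timestamp ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) AS event2,
               FIRST_VALUE(DATE(timestamp)) OVER (PARTITION BY email ORDER BY timestamp ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS date1,
               NTH_VALUE(DATE(timestamp), 2) OVER (PARTITION BY email ORDER BY timestamp ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS date2
        FROM click_tracking
        -- end of range set to July 31 to match the date range in the question
        WHERE timestamp BETWEEN '2021-07-01 00:00:00' AND '2021-07-31 23:59:59'
    )
    SELECT date0 AS date,
           SUM(CASE WHEN event1 = 'registration' THEN 1 ELSE 0 END) AS registrations,
           SUM(CASE WHEN event1 = 'registration' AND event2 = 'purchase' AND DATEDIFF(date2, date1) < 8 THEN 1 ELSE 0 END) AS conversions
    FROM combined
    GROUP BY 1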
Assuming that for each email the first record is always a registration and the second record (if any) is always a purchase, this pulls the event type and date of that email's first 2 records into a single row. You can then easily count sign-ups and purchases separately, while applying an additional filter so that there are no more than 7 days between the 2 events.
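To see the pairing idea on a few rows, here is a minimal sketch that runs against an in-memory SQLite database from Python. Everything in it is an assumption for illustration: the sample emails and timestamps are made up, the event values 'registration' and 'purchase' are guesses at how events are stored, LEAD is used as a simpler stand-in for the FIRST_VALUE/NTH_VALUE pair, and the day difference uses SQLite's julianday because DATEDIFF is MySQL-specific. It needs SQLite 3.25 or later for window functions.

```python
import sqlite3

# Hypothetical sample rows: (timestamp, email, event).
rows = [
    ("2021-07-15 09:00:00", "a@example.com", "registration"),
    ("2021-07-15 10:00:00", "b@example.com", "registration"),
    ("2021-07-15 11:00:00", "c@example.com", "registration"),
    ("2021-07-16 08:00:00", "a@example.com", "purchase"),      # 1 day after signup
    ("2021-07-20 12:00:00", "b@example.com", "purchase"),      # 5 days after signup
    ("2021-07-30 12:00:00", "c@example.com", "purchase"),      # 15 days after: too late
    ("2021-07-16 13:00:00", "d@example.com", "registration"),  # never purchases
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE click_tracking (timestamp TEXT, email TEXT, event TEXT)")
conn.executemany("INSERT INTO click_tracking VALUES (?, ?, ?)", rows)

query = """
WITH paired AS (
    -- Put each row next to the following row for the same email.
    SELECT DATE(timestamp) AS date0,
           event AS event1,
           LEAD(event) OVER (PARTITION BY email ORDER BY timestamp) AS event2,
           LEAD(DATE(timestamp)) OVER (PARTITION BY email ORDER BY timestamp) AS date2
    FROM click_tracking
)
SELECT date0,
       COUNT(*) AS registrations,
       -- A conversion is counted on the registration date, not the purchase date.
       SUM(CASE WHEN event2 = 'purchase'
                 AND julianday(date2) - julianday(date0) < 8
                THEN 1 ELSE 0 END) AS conversions
FROM paired
WHERE event1 = 'registration'
GROUP BY date0
ORDER BY date0
"""

result = conn.execute(query).fetchall()
print(result)  # [('2021-07-15', 3, 2), ('2021-07-16', 1, 0)]
```

The two purchases made within 7 days are counted on the 15th, the day of signup, and the late purchase is ignored. Like the query above, this only looks at the record immediately after each registration, so it relies on the same assumption about record order.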
If you have a key on timestamp, then the query should be fast enough even with 1 million rows.