I have a list of drivers, orders and dates for the period January 1, 2022 to January 15, 2022 (15 days) in a table named all_data
as shown below:
driver_id | order_id | Order Date |
---|---|---|
1 | one | 2022-01-01 |
1 | b | 2022-01-02 |
2 | c | 2022-01-01 |
2 | d | 2022-01-03 |
Within these 15 days, how do I find the number of continuously active drivers who have completed at least one order per day as of that date? The output should be a table like this:
Order Date | active_drivers |
---|---|
2022-01-01 | 30 |
2022-01-02 | 27 |
2022-01-03 | 25 |
For example, on January 1, 2022, 30 different drivers completed at least one order that day. On January 2, 2022, we have to find the number of unique drivers who fulfilled at least one order on January 1, 2022, and January 2, 2022. On January 3, 2022, we must count drivers who completed at least one order on January 1, 2022, January 2, 2022, and January 3, 2022.
What I tried
I found a similar solution in MySQL (below), but it is not allowed in bigquery because of the error "Unsupported subquery with table in join predicate".
MySQL SELECT order_date, (SELECT COUNT(distinct s1.driver_id) as num_hackers FROM all_data s2 join all_data s1 on s2. order_date = s1. order_date and (SELECT COUNT(distinct s3. order_date) FROM all_data s3 WHERE s3.driver_id = s2.driver_id AND s3. order_date < s1. order_date) = datediff(s1. order_date, date('2022-01-01'), day) )) from all_data
I've also read this Google BigQuery: Rolling Count Distinct question, but that's for a fixed 45 days, whereas the number of days here is a date-based variable. How to write a query in BigQuerySQL to find the rolling number of drivers that are continuously active per day?
P粉7575563552024-03-31 18:46:13
First find all combinations of date and driver, then get the count of all drivers for each date. Try this:
select order_date, count(*) from( select order_date, driver_id, count(*) from all_data ad group by order_date, driver_id) group by order_date
P粉4398045142024-03-31 00:40:04
Consider the following
select order_date, count(distinct if(flag, driver_id, null)) active_drivers from ( select order_date, driver_id, row_number() over(partition by driver_id order by order_date) - date_diff(order_date, min(order_date) over(), day) = 1 as flag from (select distinct order_date, driver_id from all_data) ) group by order_date