search

Home  >  Q&A  >  body text

The number of daily active drivers has continued to increase since its inception

I have a list of drivers, orders and dates for the period January 1, 2022 to January 15, 2022 (15 days) in a table named all_data as shown below:

driver_id order_id Order Date
1 one 2022-01-01
1 b 2022-01-02
2 c 2022-01-01
2 d 2022-01-03

Within these 15 days, how do I find the number of continuously active drivers who have completed at least one order per day as of that date? The output should be a table like this:

Order Date active_drivers
2022-01-01 30
2022-01-02 27
2022-01-03 25

For example, on January 1, 2022, 30 different drivers completed at least one order that day. On January 2, 2022, we have to find the number of unique drivers who fulfilled at least one order on January 1, 2022, and January 2, 2022. On January 3, 2022, we must count drivers who completed at least one order on January 1, 2022, January 2, 2022, and January 3, 2022.

What I tried

I found a similar solution in MySQL (below), but it is not allowed in bigquery because of the error "Unsupported subquery with table in join predicate".

MySQL
SELECT order_date, 
 (SELECT COUNT(distinct s1.driver_id) as num_hackers 
   FROM all_data s2 
   join all_data s1 
   on s2. order_date = s1. order_date and     
     (SELECT COUNT(distinct s3. order_date) 
      FROM all_data s3 
      WHERE s3.driver_id = s2.driver_id 
        AND s3. order_date < s1. order_date) 
     = datediff(s1. order_date, date('2022-01-01'), day)
))
from all_data

I've also read this Google BigQuery: Rolling Count Distinct question, but that's for a fixed 45 days, whereas the number of days here is a date-based variable. How to write a query in BigQuerySQL to find the rolling number of drivers that are continuously active per day?

P粉434996845P粉434996845265 days ago431

reply all(2)I'll reply

  • P粉757556355

    P粉7575563552024-03-31 18:46:13

    First find all combinations of date and driver, then get the count of all drivers for each date. Try this:

    select order_date, count(*)
    from(
        select order_date, driver_id, count(*)
        from all_data ad
        group by order_date, driver_id)
    group by order_date

    reply
    0
  • P粉439804514

    P粉4398045142024-03-31 00:40:04

    Consider the following

    select order_date, count(distinct if(flag, driver_id, null)) active_drivers
    from (
      select order_date, driver_id, 
        row_number() over(partition by driver_id order by order_date) - 
        date_diff(order_date, min(order_date) over(), day) = 1 as flag
      from (select distinct order_date, driver_id from all_data)
    )
    group by order_date

    reply
    0
  • Cancelreply