search

Home  >  Q&A  >  body text

Efficiently find the last timestamp within a given time range

The problem is with a hypothetical online store; I will simplify the data for my specific problem.

Suppose there are two tables: one holds the registration date of each customer

CREATE TABLE customers (
    customerID int,
    register DATE,
);

, the second one saves all the customer’s online visits

CREATE TABLE visits (
    customerID int,
    visit DATE,
);

For each user, I want to find the last day the user visited the online store within 7 days of registration, in the most efficient way. If the user has not visited the store since registration, I should return NULL as the last visit date within 7 days of registration.

Of course I can LEFT JOIN customers access by Customer ID and filter the access date to between the registration day and Add seven, and finally get the maximum date of each customer. However, the initial join results in a huge table and I want to avoid this computationally expensive operation.

Thank you very much for your thoughts!

P粉232409069P粉232409069244 days ago627

reply all(1)I'll reply

  • P粉514458863

    P粉5144588632024-03-23 10:28:00

    This is probably the most efficient:

    SELECT  c.customerID, c.register,
            ( SELECT MAX(v.visit)
                FROM visits AS v
                WHERE v.customerID = c.customerID
                  AND v.visit < c.register + INTERVAL 7 DAY
            ) AS last_date
        FROM customers AS c;

    index:

    customers:  PRIMARY KEY(customerId)  -- is this what  you have?
    visits:     INDEX(customerID, visit)  -- or perhaps PRIMARY KEY

    Your concerns about "computationally expensive operations":

    • The cost of retrieving rows is higher than the cost of evaluating expressions.
    • My recommendation for accessing the index is to "overwrite" it, so it only looks at the index.

    reply
    0
  • Cancelreply