search

Home  >  Q&A  >  body text

Efficient way to fill missing values ​​in unordered map from SQL table (SQL/C++)

question There is currently a system that reads unique identifiers (id) and their attached data sets from a SQL table into an unordered map. These datasets used to start with id 1, but adding and removing datasets took about 10 milliseconds. Note that not all datasets are always loaded into RAM.

When the program starts, it reads SELECT MAX(id) from the database and proceeds to add 1 to the counter variable which will be used as the id# for any added dataset ##. The ids of the deleted dataset are no longer used anywhere. This will inevitably lead to gaps in the id sequence, which will one day overflow.

I'm looking for a performant way to fill the lowest gap in

id values. Also consistent with only partially loaded SQL tables and program memory. The data in memory may take several minutes or immediately to be saved to the SQL table.

idea One solution I came up with was to do an expensive query on each created dataset at runtime to find the smallest gap in the SQL table, checking if this id exists as a value in the unordered map , and then again use the variable from the counter as a backup to avoid endless queries for the free id. This works perfectly for quantities of 1 id. Then it remains free in SQL, but is fetched in the unordered map until memory is saved again.

I also brainstormed querying the list of free IDs into a vector and using them as

ids for new datasets until the vector is empty and then (or often) doing new queries for more IDs . But I can't think of a query to find X number of gaps in a table that may or may not have an id column starting with 1.

I came across How to find "gaps" when running a counter using SQL? , but I have two problems with the top answer: Apparently, it only finds one gap, whereas I need a lot of gaps, and I can't understand its use of

mo and mi .

Suppose I have a table called

userdata that contains the id and dataset columns, both 32-bit signed INTs. How to find a series of gaps in the id column? For example, when the ids in the table are 1,6,7,9, I want the query to return 2,3,4,5,8.

Any pointers to possible solutions would also be appreciated.

P粉476547076P粉476547076262 days ago514

reply all(1)I'll reply

  • P粉252423906

    P粉2524239062024-04-05 00:12:20

    If there is a database change every 10 milliseconds, then there are 100 changes per second. A signed int can hold approximately 2,147,483,648 values, or 21,474,846 seconds, which is approximately 8 months. After this, it is not possible to have a new ID available.

    The first solution is to use the 64bit type instead of int. This gives you about 13,600 years (for signed 64b), which seems enough :)


    Other solution is to have a vector containing all possible IDs. Vector storage bool(ID used/unused). Requesting a new ID is done by moving the vector to the first position marked as unused.
    This vector uses a lot of RAM, although there is a version of std::vector specifically for bool that requires less RAM.


    The third solution is to deal with storing a linked list (possibly doubly linked) of deleted (read: reusable) IDs.

    When a new ID is requested, the list provides its header, or the size of the table if the list is empty.
    When a dataset is deleted, its ID is correctly inserted into the list, so the list is always sorted.
    When an ID is reused, it is removed from the list.
    Deleting the last record in the table may also delete the last nodes in the list because they are useless (case ID > table size). That's why I recommend using a doubly linked list so that the last node can be removed quickly.

    So the list uses "new" and "delete" on its nodes quickly, and also runs up and down (for dual links) frequently to insert new nodes.
    This is a bit slow, but I hope the list isn't too big and then the time required isn't bad.

    Also note that this list gives you the array of gaps you need.

    reply
    0
  • Cancelreply