

Firestore method to get random documents in collection

It is crucial for my application to be able to randomly select multiple documents from a collection in Firebase.

Since there is no built-in function in Firebase (that I know of) to run a query that does this, my first thought was to use query cursors to select a random start and end index, provided I knew the number of documents in the collection.

This approach would work, but only in a limited way, since each document would always be served in sequence with its neighboring documents. However, if I could select a document by its index in its parent collection, I could implement a random document query. The problem is that I can't find any documentation describing how to do this, or even whether it is possible.

This is what I want to do. Consider the following Firestore structure:

root/
  posts/
     docA
     docB
     docC
     docD

Then on my client side (I'm in a Swift environment) I want to write a query that does this:

db.collection("posts")[0, 1, 3] // would return: docA, docB, docD

Can I do something similar? Alternatively, is there any other way to select random documents in a similar way?

Please help.

P粉277305212 · 444 days ago · 869


    P粉985686557 · 2023-10-20 09:50:51

    Using a randomly generated index and a simple query, you can randomly select documents from a collection or collection group in Cloud Firestore.

    This answer is divided into four parts, each with different options:

    1. How to generate random index
    2. How to query random index
    3. Select multiple random documents
    4. Reseed for consistent randomness

    How to generate random index

    The basis of this answer is creating an indexed field that, when ordered ascending or descending, results in all the documents being randomly ordered. There are several ways to create this, so let's look at two, starting with the most readily available.

    Auto-ID version

    If you are using the randomly generated auto-IDs provided in our client libraries, you can use this same system to randomly select a document. In this case, the randomly ordered index is the document ID.

    Later in the query section, the random value you generate is a new auto-ID (iOS, Android, Web), the field you query is the __name__ field, and the "low value" mentioned later is an empty string. This is by far the easiest way to generate the random index, and it works regardless of language and platform.

    By default, the document name (__name__) is only indexed in ascending order, and you cannot rename an existing document except by deleting and recreating it. If you need to get around either of these restrictions, you can still use this method: just store the auto-ID in an actual field called random rather than overloading the document name for this purpose.
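    To make the auto-ID idea concrete, here is a minimal sketch of a generator that mimics the shape of Firestore auto-IDs, assumed here to be 20 characters drawn from A-Z, a-z and 0-9. The function `makeAutoID` is hypothetical and for illustration only; in an app you would normally let the SDK generate the value, e.g. via `db.collection("posts").document().documentID`. The last lines show why the empty string works as the "low value": every ID sorts at or after it.

```swift
// Assumed auto-ID alphabet: the 62 ASCII letters and digits.
let autoIDAlphabet = Array("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789")

// Hypothetical helper that builds a random ID in the auto-ID style.
// For illustration only; in an app, let the Firestore SDK generate auto-IDs.
func makeAutoID(length: Int = 20) -> String {
    var id = ""
    for _ in 0..<length {
        // randomElement() cannot return nil here because the alphabet is non-empty.
        id.append(autoIDAlphabet.randomElement()!)
    }
    return id
}

// The fallback "low value" for queries on __name__ is the empty string,
// since every ID sorts at or after "".
let lowValue = ""
let randomID = makeAutoID()
print(randomID, randomID >= lowValue)
```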

    Random integer version

    When you write a document, first generate a random integer in a bounded range and set it as a field called random. Depending on the number of documents you expect, you can use a different bounded range to save space or to reduce the risk of collisions (which reduce the effectiveness of this technique).

    You should consider which languages you need, as the considerations differ. Swift is straightforward, but JavaScript has a notable gotcha:

    • 32-bit integer: great for small datasets (~10K documents are unlikely to have a collision)
    • 64-bit integer: for large datasets (note: JavaScript numbers cannot represent every 64-bit integer exactly)

    This creates an index with the documents randomly ordered. Later in the query section, the random value you generate will be another one of these values, and the "low value" mentioned later is any value at or below the smallest possible random value, e.g. -1 or 0 for non-negative integers.
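    A minimal sketch of the random-integer version, assuming unsigned 32-bit values widened to Int64 for storage (Firestore integers are signed 64-bit, so every UInt32 fits). `makeRandomIndex` and `collisionProbability` are hypothetical names. The second function uses the standard birthday-problem approximation, p ≈ 1 - exp(-n(n-1)/(2R)), to show why ~10K documents in a 32-bit range rarely collide.

```swift
import Foundation

// Hypothetical helper: a value for the "random" field, drawn uniformly from
// the full unsigned 32-bit range and widened to Int64 for storage.
func makeRandomIndex() -> Int64 {
    Int64(UInt32.random(in: UInt32.min...UInt32.max))
}

// Birthday-problem approximation of the chance that at least two of
// `n` documents share a value drawn from `r` equally likely possibilities.
func collisionProbability(documentCount n: Double, rangeSize r: Double) -> Double {
    1 - exp(-n * (n - 1) / (2 * r))
}

let range32 = pow(2.0, 32.0)
print(makeRandomIndex())
print(collisionProbability(documentCount: 10_000, rangeSize: range32)) // about 0.0116 (≈1.2%)
```

    The same function can be used to decide whether a dataset has outgrown the 32-bit range and needs 64-bit values.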

    How to query random index

    Now that you have a random index, you'll want to query it. Below are some simple variants for selecting one random document, as well as options for selecting more than one.

    For all of these options, you need to generate a new random value of the same form as the indexed values you created when writing the document, denoted by the variable random below. We'll use this value to find a random spot on the index.

    Wrap-around

    Now that you have a random value, you can query for a single document:

    let postsRef = db.collection("posts")
    // First attempt: the first document at or after the random point.
    let queryRef = postsRef.whereField("random", isGreaterThanOrEqualTo: random)
                           .order(by: "random")
                           .limit(to: 1)

    Check that this returned a document. If it didn't, query again, but use the "low value" of your random index. For example, if you used non-negative random integers, lowValue is 0:

    let postsRef = db.collection("posts")
    // Fallback: wrap around to the start of the index.
    let queryRef = postsRef.whereField("random", isGreaterThanOrEqualTo: lowValue)
                           .order(by: "random")
                           .limit(to: 1)

    As long as the collection contains at least one document, this second query is guaranteed to return one.
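    The wrap-around logic is easy to check locally. The sketch below simulates the two queries against an in-memory, ascending-sorted array of index values instead of Firestore (an assumption made so it runs without the SDK); `pickWrapAround` is a hypothetical name.

```swift
// Simulates the wrap-around query pair on a sorted array of "random" values.
// Query 1: first value >= random (whereField >= random, order by, limit 1).
// Query 2 (fallback): first value >= lowValue, i.e. the smallest value.
func pickWrapAround(sortedValues: [Int64], random: Int64, lowValue: Int64 = 0) -> Int64? {
    if let hit = sortedValues.first(where: { $0 >= random }) {
        return hit
    }
    // Nothing at or above `random`: wrap around to the low end of the index.
    return sortedValues.first(where: { $0 >= lowValue })
}

let values: [Int64] = [10, 20, 30]
if let picked = pickWrapAround(sortedValues: values, random: 35) {
    print(picked) // wraps around to the smallest value
}
```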

    Both directions

    The wrap-around method is simple to implement and lets you optimize storage by enabling only an ascending index. One downside is that some values can be unfairly shielded. For example, if the first 3 documents (A, B, C) out of 10K have random index values A:409496, B:436496, C:818992, then A and C each have just under a 1/10K chance of being selected, whereas B is effectively shielded by its proximity to A and has only roughly a 1/160K chance.
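    To see where these figures come from, the sketch below computes each document's chance of being picked by the wrap-around query, assuming values drawn uniformly from 0..<rangeSize with rangeSize = 2^32 (an assumption consistent with the example values). A document is returned when the random value falls in the gap between its predecessor and itself; the first document also collects everything above the last value. `selectionProbabilities` is a hypothetical name.

```swift
// For wrap-around selection with random values uniform over 0..<rangeSize,
// returns each document's selection probability (same order as `sortedValues`).
func selectionProbabilities(sortedValues: [Int64], rangeSize: Int64) -> [Double] {
    guard let first = sortedValues.first, let last = sortedValues.last else { return [] }
    var probabilities: [Double] = []
    for (i, value) in sortedValues.enumerated() {
        let width: Int64
        if i == 0 {
            // Random values 0...first, plus the wrap-around region last+1..<rangeSize.
            width = (first + 1) + (rangeSize - 1 - last)
        } else {
            // Random values in previous+1...value.
            width = value - sortedValues[i - 1]
        }
        probabilities.append(Double(width) / Double(rangeSize))
    }
    return probabilities
}

let example: [Int64] = [409_496, 436_496, 818_992] // A, B, C from the text
let odds = selectionProbabilities(sortedValues: example, rangeSize: Int64(1) << 32)
print(1 / odds[1], 1 / odds[2]) // B ≈ 1 in 159K, C ≈ 1 in 11.2K
```

    With only three values, A also absorbs the whole wrap-around region, so only B and C line up with the 10K-document scenario described above.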

    Rather than querying in a single direction and wrapping around if no value is found, you can randomly choose between >= and <=. This halves the probability of values being unfairly shielded, at the cost of doubling the index storage.

    If no result is returned in one direction, switch to the other direction:

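    // Descending: the nearest document at or below the random point.
    let descendingQuery = postsRef.whereField("random", isLessThanOrEqualTo: random)
                                  .order(by: "random", descending: true)
                                  .limit(to: 1)

    // Ascending: the nearest document at or above the random point.
    let ascendingQuery = postsRef.whereField("random", isGreaterThanOrEqualTo: random)
                                 .order(by: "random")
                                 .limit(to: 1)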
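    The direction switch can be sketched locally as well. This in-memory stand-in for the two queries (not the Firestore API) takes the coin flip as the parameter `descendingFirst` so it can be tested; `pickBidirectional` is a hypothetical name, and both directions are computed eagerly here only for brevity.

```swift
// Simulates the bi-directional variant on an ascending-sorted array of values.
// `descendingFirst` stands in for the coin flip that picks the query direction.
func pickBidirectional(sortedValues: [Int64], random: Int64, descendingFirst: Bool) -> Int64? {
    // whereField <= random, order descending, limit 1 -> largest value <= random
    let below = sortedValues.last(where: { $0 <= random })
    // whereField >= random, order ascending, limit 1 -> smallest value >= random
    let above = sortedValues.first(where: { $0 >= random })
    // Try the chosen direction first; if it is empty, switch to the other one.
    return descendingFirst ? (below ?? above) : (above ?? below)
}

// Randomly choose the direction for each pick.
if let picked = pickBidirectional(sortedValues: [10, 20, 30], random: 25, descendingFirst: Bool.random()) {
    print(picked)
}
```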

    Select multiple random documents

    Typically, you need to select multiple random documents at once. There are two different ways to adapt the above techniques depending on the trade-offs you want.

    Rinse and repeat

    This method is very simple. Just repeat the process, including choosing a new random integer each time.

    This method will give you a random sequence of documents without having to worry about seeing the same pattern repeatedly.

    The trade-off is that it will be slower than the next method since it requires a separate round trip to serve each document.
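    A minimal in-memory sketch of "rinse and repeat", reusing the wrap-around pick and drawing a fresh random value per document. The names `pickOne` and `rinseAndRepeat` are hypothetical, and the sorted array stands in for the Firestore index; in a real app each loop iteration would be its own query round trip. Nothing here removes duplicates, so the same document can be returned more than once.

```swift
// One wrap-around pick (in-memory stand-in for the two single-document queries).
func pickOne(sortedValues: [Int64], random: Int64) -> Int64? {
    sortedValues.first(where: { $0 >= random }) ?? sortedValues.first
}

// "Rinse and repeat": draw a new random value for every document requested.
// Each iteration would be a separate Firestore round trip; duplicates are possible.
func rinseAndRepeat(sortedValues: [Int64], count: Int, rangeSize: Int64) -> [Int64] {
    var results: [Int64] = []
    for _ in 0..<count {
        let random = Int64.random(in: 0..<rangeSize)
        if let value = pickOne(sortedValues: sortedValues, random: random) {
            results.append(value)
        }
    }
    return results
}

print(rinseAndRepeat(sortedValues: [5, 50, 500], count: 4, rangeSize: 1000))
```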

    Keep it coming

    In this approach, simply increase the limit to the number of documents you want. This is a little more complex, because a single call may return anywhere from 0 to limit documents. You then need to fetch the missing documents in the same manner, but with the limit reduced to just the difference. If you know there are more documents in total than the number you are asking for, you can optimize by ignoring the edge case of never getting back enough documents on the second call (but not the first).

    The trade-off with this solution is the repeating sequence. Although the documents are sorted randomly, if you end up with overlapping ranges, you'll see the same pattern you saw before. There are ways to alleviate this concern, which we will discuss in the next section on reseeding.

    This method is faster than "rinse and repeat" because you will request all documents in one call in the best case or two calls in the worst case.
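    The two-call flow can be sketched against an in-memory sorted array standing in for the index (an assumption; this is not the Firestore API). `keepItComing` is a hypothetical name, and the sketch assumes you ask for no more documents than exist, which is exactly the edge case the text suggests ignoring; otherwise the two batches can overlap.

```swift
// "Keep it coming": one query with limit = count, then a second query from the
// low end of the index for whatever is still missing (in-memory stand-in).
// Assumes count <= sortedValues.count; otherwise the two batches can overlap.
func keepItComing(sortedValues: [Int64], random: Int64, count: Int) -> [Int64] {
    // Call 1: whereField >= random, order by, limit(count)
    let firstBatch = Array(sortedValues.filter { $0 >= random }.prefix(count))
    let missing = count - firstBatch.count
    guard missing > 0 else { return firstBatch }
    // Call 2: from the low value, limit reduced to just the difference.
    let secondBatch = Array(sortedValues.prefix(missing))
    return firstBatch + secondBatch
}

print(keepItComing(sortedValues: [10, 20, 30, 40, 50], random: 35, count: 3)) // [40, 50, 10]
```

    Note how the result is a contiguous run of the index: this is the repeating-pattern trade-off described above.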

    Reseed for consistent randomness

    While this method gives you documents at random, if the document set is static, the probability of each document being returned is static as well. This is a problem because some values may have unfairly low or high probabilities based on the initial random values they got. In many use cases this is fine, but in some you may want to increase long-term randomness so that every document has a more uniform chance of being returned.

    Note that inserted documents will end up woven in between existing ones, gradually changing the probabilities, as will deleting documents. If the insert/delete rate is too low for the number of documents, there are a few strategies to address this.

    Multi-random

    Rather than worrying about reseeding, you can always create multiple random indexes per document and randomly select one of them each time. For example, let the field random be a map with subfields 1 to 3:

    {'random': {'1': 32456, '2':3904515723, '3': 766958445}}

    Now you randomly query against random.1, random.2, or random.3, creating a greater spread of randomness. This essentially trades increased storage for the extra compute (document writes) that reseeding would require.
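    A sketch of the multi-random idea, again in memory rather than against Firestore: each hypothetical `Post` carries three independent random values keyed "1" to "3", and each pick first chooses which subfield (random.1, random.2 or random.3) to order by. `makePost` and `pickPost` are hypothetical names; the key is passed explicitly so the selection step can be tested.

```swift
// Hypothetical document model with a map of independent random values,
// mirroring {'random': {'1': ..., '2': ..., '3': ...}}.
struct Post {
    let id: String
    let random: [String: Int64]
}

let randomKeys = ["1", "2", "3"]

// Builds a post with a fresh 32-bit value for every subfield.
func makePost(id: String) -> Post {
    var map: [String: Int64] = [:]
    for key in randomKeys {
        map[key] = Int64(UInt32.random(in: UInt32.min...UInt32.max))
    }
    return Post(id: id, random: map)
}

// Wrap-around pick ordered by random.<key>.
func pickPost(_ posts: [Post], key: String, random: Int64) -> Post? {
    let sorted = posts.sorted { ($0.random[key] ?? 0) < ($1.random[key] ?? 0) }
    return sorted.first(where: { ($0.random[key] ?? 0) >= random }) ?? sorted.first
}

let posts = ["docA", "docB", "docC", "docD"].map { makePost(id: $0) }
if let key = randomKeys.randomElement(),
   let picked = pickPost(posts, key: key, random: Int64(UInt32.random(in: UInt32.min...UInt32.max))) {
    print("ordered by random.\(key): \(picked.id)")
}
```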

    Reseed on writes

    Each time you update a document, regenerate the value of its random field. This moves the document around in the random index.
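    A minimal sketch of reseeding on writes, assuming the 32-bit integer version and a document represented as a plain dictionary. `reseededOnWrite` is a hypothetical helper; in an app, its result would be written with the same setData/updateData call that performs the update.

```swift
// Hypothetical helper applied to a document's data right before each update:
// it overwrites "random" with a fresh value so the document moves in the index.
func reseededOnWrite(_ data: [String: Any]) -> [String: Any] {
    var updated = data
    updated["random"] = Int64(UInt32.random(in: UInt32.min...UInt32.max))
    return updated
}

let updated = reseededOnWrite(["title": "Hello", "random": Int64(42)])
print(updated["random"] ?? "missing")
```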

    Reseed on reads

    If the generated random values are not uniformly distributed (they're random, so this is to be expected), the same document might be picked a disproportionate amount of the time. This is easily counteracted by updating a randomly selected document with new random values after it is read.

    Since writes are more expensive and can create hotspots, you can choose to update only on a subset of reads (e.g. if random(0,100) === 0) update;).
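    The sampling rule can be sketched as a small predicate, assuming "random(0,100) === 0" means one draw from 0..<100, i.e. roughly 1% of reads trigger a reseed. `shouldReseedAfterRead` is a hypothetical name, and the `roll` parameter exists only so the decision can be tested deterministically.

```swift
// Decides whether a read should trigger a reseed, sampling roughly 1 in
// `oneIn` reads. `roll` is injectable so the decision can be tested.
func shouldReseedAfterRead(oneIn: Int = 100,
                           roll: (Int) -> Int = { Int.random(in: 0..<$0) }) -> Bool {
    roll(oneIn) == 0
}

// After reading a randomly selected document:
if shouldReseedAfterRead() {
    print("write a new random value for this document")
}
```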
