
php - When importing Excel into Laravel with a large amount of data, what is the best way to deal with duplicate data?

I use Laravel Excel to import users from an uploaded Excel file.
A single file can be expected to contain thousands to tens of thousands of rows.
Duplicate users must not be imported. For example, if the file holds 10,000 people and 100 of them already exist in the database, those 100 need to be detected and given extra business-logic handling, while the other 9,900 are imported. The insertion status of those 9,900 also has to be recorded, e.g. 8,000 successes and 1,900 failures. I have thought of several approaches and am not sure which is most reasonable:
1. Traditional batch insertion. It cannot accurately tell whether each row is a duplicate or record a per-row status, but it should be relatively efficient.
2. Loop over the rows, check for existence before each insert, and record the status after inserting. This fits the business logic best, but it is very slow and will probably cause 500 errors or timeouts.
3. Improve on 2: push the whole process onto a queue in chunks and handle it in the background (see the sketch below). The downside is a poorer user experience, since the result is not available immediately and a notification system is needed on top.
4. From what I have read, there is also the option of using transactions and committing in blocks. I don't know how well this works and haven't tested it yet; advice on this is welcome too.

Do you have any better ideas that balance efficiency and the business logic? Thanks.
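For reference, a minimal sketch of option 3 on Laravel Excel 3.x could look like the following. The `id_number` column, the `ImportLog` model / `import_logs` table and the 1000-row chunk size are assumptions for illustration; each chunk becomes its own queued job, duplicates are found with one query per chunk, and every row's outcome is logged:

```php
<?php

use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Support\Collection;
use Maatwebsite\Excel\Concerns\ToCollection;
use Maatwebsite\Excel\Concerns\WithChunkReading;
use Maatwebsite\Excel\Concerns\WithHeadingRow;

class UsersImport implements ToCollection, WithChunkReading, WithHeadingRow, ShouldQueue
{
    public function collection(Collection $rows)
    {
        // One query per chunk: which id_numbers already exist in the users table?
        $existing = \App\Models\User::whereIn('id_number', $rows->pluck('id_number'))
            ->pluck('id_number')
            ->flip();

        foreach ($rows as $row) {
            if (isset($existing[$row['id_number']])) {
                // Duplicate: run the extra business logic here, record it as skipped.
                \App\Models\ImportLog::create(['id_number' => $row['id_number'], 'status' => 'duplicate']);
                continue;
            }

            try {
                \App\Models\User::create($row->toArray());
                \App\Models\ImportLog::create(['id_number' => $row['id_number'], 'status' => 'success']);
            } catch (\Throwable $e) {
                \App\Models\ImportLog::create(['id_number' => $row['id_number'], 'status' => 'failed']);
            }
        }
    }

    public function chunkSize(): int
    {
        return 1000; // every 1000 rows become one queued job
    }
}
```

The upload request would then only call `Excel::import(new UsersImport, $request->file('file'))` and return, and the notification mentioned in option 3 could be built from the `import_logs` totals once the queue drains.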

世界只因有你 · 2753 days ago · 1138

All replies (2)

  • 过去多啦不再A梦 · 2017-05-16 13:00:18

    As I read the question, the process is:
    1. Upload the Excel file
    2. Parse the Excel file and store the data
    3. Return the result to the user

    The whole process will be quite time-consuming. You can show the user a [Processing] status, handle the import in the background, and output the result once it has finished.

    As for detecting duplicate users and recording success/failure status, consider saving that to Redis first, which is faster, and then writing all the results to the database at the end (see the sketch below).
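    A minimal sketch of the Redis idea, assuming Laravel's Redis facade; the key names and the $importId parameter are made up for illustration:

    ```php
    <?php

    use Illuminate\Support\Facades\Redis;

    // Record one row's outcome. SADD returns 0 when the id_number was already in
    // the set, which is how duplicates are counted.
    function recordRow(string $importId, string $idNumber, bool $inserted): void
    {
        $isNew = Redis::sadd("import:{$importId}:seen", $idNumber);

        if (! $isNew) {
            Redis::incr("import:{$importId}:duplicate");
            return;
        }

        Redis::incr($inserted ? "import:{$importId}:success" : "import:{$importId}:failed");
    }

    // Read the counters back, e.g. to persist a summary row in the database at the end.
    function importSummary(string $importId): array
    {
        return [
            'success'   => (int) Redis::get("import:{$importId}:success"),
            'failed'    => (int) Redis::get("import:{$importId}:failed"),
            'duplicate' => (int) Redis::get("import:{$importId}:duplicate"),
        ];
    }
    ```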

  • 大家讲道理 · 2017-05-16 13:00:18

    Here is my approach; take it or leave it.

    Assume the unique identifier in both the database and the Excel file is the user's ID number:

    • Query the existing ID numbers from the database into an array.

    • After the Excel file is uploaded, traverse each row and check whether that row's ID number is already in the array.

    • If it is not, write the row to the database and add its ID number to the array.

    • If it is, skip the row.

    In my production environment I can upload an Excel file with 50,000+ rows at a time, and the wait is only about 6 seconds.
    Of course, every environment differs, as do the row and column counts of each Excel file,
    so this is only for the poster's reference. A rough sketch of the approach follows below.
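    One way to code the steps above, assuming a `users` table with a unique `id_number` column and rows already parsed out of the Excel file; I batch the inserts in groups of 1000 rather than writing row by row, which is one way to stay near that 6-second figure:

    ```php
    <?php

    use Illuminate\Support\Facades\DB;

    /** @param iterable<array<string, mixed>> $rows  parsed Excel rows */
    function importRows(iterable $rows): array
    {
        // flip() turns the id_numbers into array keys, so each lookup is an
        // isset() instead of an in_array() scan.
        $seen = DB::table('users')->pluck('id_number')->flip()->all();

        $inserted = $skipped = 0;
        $buffer = [];

        foreach ($rows as $row) {
            if (isset($seen[$row['id_number']])) {
                $skipped++;   // already in the database (or earlier in the file): skip
                continue;
            }

            $seen[$row['id_number']] = true;  // also catches duplicates inside the file itself
            $buffer[] = $row;
            $inserted++;

            if (count($buffer) >= 1000) {
                DB::table('users')->insert($buffer);
                $buffer = [];
            }
        }

        if ($buffer) {
            DB::table('users')->insert($buffer);
        }

        return ['inserted' => $inserted, 'skipped' => $skipped];
    }
    ```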
