Home  >  Q&A  >  body text

java - mysql single table pipeline 500W data multi-dimensional statistics solution

Now there is a flow table structure in the project as follows
id sdkVersion jarVersion countryCode imei createTime

The previous requirement was to find the total number by grouping sdkVersion, jarVersion, countryCode and the total number after imei sorting. The approximate sql is as follows:

select sdkVersion,jarVersion,countryCode,count(*),count(distinct imei) from xxx 
where createTime = 'xxxx-xx-xx'
group by sdkVersion,jarVersion,countryCode

Find out all the data of the previous day and summarize it into a table. The structure is roughly as follows

id sdkVersion jarVersion countryCode count(*) count(distinct imei) createTime

The current requirement is to query the combination of any latitude, that is,
group by sdkVersion
group by jarVersion
group by countryCode
group by sdkVersion, countryCode
and so on. Combination, if we follow the previous daily summary plan, we will have to create many tables for different latitude combinations. Is there any good solution to solve this problem? Or can it be solved using a specialized statistical framework?

黄舟黄舟2711 days ago859

reply all(4)I'll reply

  • 仅有的幸福

    仅有的幸福2017-05-19 10:09:03

    You can check out the PipelineDB streaming database

    reply
    0
  • 黄舟

    黄舟2017-05-19 10:09:03

    apache kylin, sub-second olap

    reply
    0
  • 曾经蜡笔没有小新

    曾经蜡笔没有小新2017-05-19 10:09:03

    For daily summary, real-time requirements are not high, and 500W records are still within the processing range. View + scheduled plan can meet the requirements, and there is no need to build multiple tables.
    It is best for the questioner to explain what bottlenecks or pain points there are. After all, mysql is a mature product, and there are certain risks in switching to cutting-edge technology.

    reply
    0
  • 黄舟

    黄舟2017-05-19 10:09:03

    Write the stored procedure and run it regularly every day

    reply
    0
  • Cancelreply