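In a microservice architecture, each service has its own independent database.
Now there is a requirement to generate a real-time report that contains data from two services,
say service A and service B. B stores only A's primary key id as the association.
The report's search conditions include fields of A's entity as well as fields of B's entity.
Existing options
1. If the search conditions include fields of A, first search in service A to get the primary keys of all matching rows, then query service B again with where A.id IN (ids).
Thoughts: when the number of A.id values is huge, this query is extremely slow! And cases with a huge number of A.id values are common.
2. Use a search engine.
Thoughts: this feels like using a sledgehammer to crack a nut.
Does anyone have a better approach?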
迷茫2017-04-18 10:57:55
Thanks for the invite.
If this is online business data (OLTP), then option 1 is the standard practice in microservices. But if such cross-service queries have to run frequently online, the two services (and their two databases) are very tightly coupled, so why split them in the first place?
If this is an analytical report, it falls into the OLAP category, and option 2 is indeed a reasonable solution. If a search engine feels like overkill, you could run the report queries on a replica instead. For example, synchronize the online A database and B database into one read-only database in real time, then do the JOIN there in a single query.
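To make the read-only database idea concrete, here is a minimal sketch. It assumes both services' tables have already been synchronized into one database; an in-memory SQLite database stands in for that read-only copy, and the table and column names (service_a_items, service_b_records, region, status) are made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for the read-only reporting database.
# All table and column names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE service_a_items (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE service_b_records (
        id INTEGER PRIMARY KEY, a_id INTEGER, status TEXT, amount REAL
    );
    INSERT INTO service_a_items VALUES
        (1, 'alpha', 'north'), (2, 'beta', 'south'), (3, 'gamma', 'north');
    INSERT INTO service_b_records VALUES
        (10, 1, 'paid', 20.0), (11, 1, 'open', 5.0),
        (12, 2, 'paid', 7.5), (13, 3, 'paid', 12.0);
""")

def report(conn, region, status):
    """Filter on a field of A and a field of B in a single JOIN."""
    sql = """
        SELECT a.id, a.name, b.id, b.amount
        FROM service_b_records AS b
        JOIN service_a_items AS a ON a.id = b.a_id
        WHERE a.region = ? AND b.status = ?
        ORDER BY b.id
    """
    return conn.execute(sql, (region, status)).fetchall()

rows = report(conn, "north", "paid")
```

Because both filters run inside one database, there is no need to ship a large list of ids between services, and paging can be added with an ordinary LIMIT/OFFSET.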
天蓬老师2017-04-18 10:57:55
One of the design principles of microservices is to split functionality that is unrelated at the business level into separate services. Your two services overlap in business terms.
PHP中文网2017-04-18 10:57:55
In fact, this kind of problem is very common in microservices. For example, you may need to query orders by some attribute of the product, while orders and products belong to two different microservices. Besides your own two options, there are also:
3. Put the aggregation into a data warehouse: aggregate the data of A and B in real time into another database (not necessarily MySQL, it could also be HBase), and have the report read from the data warehouse.
4. Add some redundant fields when designing the tables. As in your case, the fields of A that you can foresee needing can be stored redundantly in B.
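A rough sketch of the redundant-field idea, under some assumptions: B's table carries a copy of one field of A (called a_region here), and B updates that copy whenever A reports a change. The schema, the handler on_a_region_changed, and the way A notifies B are all hypothetical; SQLite is used only to keep the example self-contained.

```python
import sqlite3

# Hypothetical schema: service B keeps a redundant copy of A's "region" field.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE b_records (
        id INTEGER PRIMARY KEY,
        a_id INTEGER NOT NULL,
        a_region TEXT,          -- redundant copy of A.region
        status TEXT
    );
    CREATE INDEX idx_b_region_status ON b_records (a_region, status);
    INSERT INTO b_records VALUES
        (1, 100, 'north', 'paid'), (2, 100, 'north', 'open'),
        (3, 200, 'south', 'paid'), (4, 300, 'north', 'paid');
""")

def on_a_region_changed(conn, a_id, new_region):
    """Hypothetical handler B runs when A announces that a row's region changed."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE b_records SET a_region = ? WHERE a_id = ?", (new_region, a_id)
        )

def search(conn, region, status, page, page_size):
    """Both filters and the paging run inside B's own database."""
    return conn.execute(
        "SELECT id, a_id FROM b_records WHERE a_region = ? AND status = ? "
        "ORDER BY id LIMIT ? OFFSET ?",
        (region, status, page_size, page * page_size),
    ).fetchall()

first_page = search(conn, "north", "paid", page=0, page_size=10)
```

The trade-off is that every change to the copied field in A has to be propagated to B, so this fits fields that change rarely.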
Method 1 has a fatal shortcoming: once paging is involved, it is no longer feasible. Which option to choose depends on the order of magnitude of your data. If the data volume is not very large, you can use method 1; if it is slow, use several threads to fetch the data in batches (when there are too many IDs, splitting them into batches and querying batch by batch is an effective way to reduce timeouts and latency). If the data volume is large, a data warehouse is recommended. Its main benefit is that it puts no pressure on the primary database, because the aggregate tables can be built from the binlog. A report still counts as offline data; if it really has to be as real-time and efficient as an order query, has to reflect the current state of the table, and has many search conditions, then a search engine is a good choice.
So you can choose between method 1 and method 3 depending on the actual situation.
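The batched, multi-threaded variant of method 1 could look roughly like this. It is only a sketch: query_b is a hypothetical callable that wraps service B's "WHERE a_id IN (...)" lookup, and fake_query_b with its in-memory FAKE_B table merely stands in for a real call to service B. The batch size and worker count are arbitrary defaults.

```python
from concurrent.futures import ThreadPoolExecutor

def chunked(ids, size):
    """Split a list of ids into consecutive batches of at most `size`."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]

def fetch_b_by_a_ids(a_ids, query_b, batch_size=500, workers=4):
    """Run query_b(batch) for each batch in parallel and merge the results.

    query_b is a hypothetical callable wrapping service B's
    "WHERE a_id IN (...)" lookup; it returns a list of rows.
    """
    batches = chunked(a_ids, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in batch order, so the merged output is deterministic.
        results = pool.map(query_b, batches)
        return [row for rows in results for row in rows]

# Stand-in for service B: an in-memory table keyed by a_id.
FAKE_B = {a_id: [{"a_id": a_id, "b_id": a_id * 10}] for a_id in range(1, 2001)}

def fake_query_b(batch):
    """Pretend to query service B for one batch of A ids."""
    return [row for a_id in batch for row in FAKE_B.get(a_id, [])]

rows = fetch_b_by_a_ids(list(range(1, 1201)), fake_query_b)
```

Batching keeps each IN list short, which helps with timeouts, but it does not solve the paging problem mentioned above: the full id list from A still has to be fetched first.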
黄舟2017-04-18 10:57:55
Requirements such as report generation should not be served from the business database systems. You can set up an aggregation database on the back end with otter and synchronize the data of the services into it in real time. Then you can query that aggregation database however you like.
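As an illustration of what keeping such an aggregation database in sync involves, here is a small sketch that applies row-level change events to local copies of A's and B's tables. The event format ({"table", "op", "row"}), the table names, and apply_change are all invented for this example; this is not otter's API or the binlog format, and a real synchronization tool handles ordering, retries, and schema changes that are ignored here.

```python
import sqlite3

# In-memory SQLite stands in for the aggregation database.
agg = sqlite3.connect(":memory:")
agg.executescript("""
    CREATE TABLE a_copy (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE b_copy (id INTEGER PRIMARY KEY, a_id INTEGER, status TEXT);
""")

# Whitelist of source tables and their columns (hypothetical schema).
TABLES = {
    "a": ("a_copy", ("id", "name")),
    "b": ("b_copy", ("id", "a_id", "status")),
}

def apply_change(conn, event):
    """Apply one change event; the event shape is made up for this sketch."""
    table, cols = TABLES[event["table"]]
    row = event["row"]
    with conn:  # one transaction per event
        if event["op"] == "delete":
            conn.execute(f"DELETE FROM {table} WHERE id = ?", (row["id"],))
        else:
            # Insert or update: replace the row with its latest version.
            marks = ", ".join("?" for _ in cols)
            conn.execute(
                f"INSERT OR REPLACE INTO {table} ({', '.join(cols)}) VALUES ({marks})",
                tuple(row[c] for c in cols),
            )

events = [
    {"table": "a", "op": "insert", "row": {"id": 1, "name": "alpha"}},
    {"table": "b", "op": "insert", "row": {"id": 10, "a_id": 1, "status": "open"}},
    {"table": "b", "op": "update", "row": {"id": 10, "a_id": 1, "status": "paid"}},
    {"table": "a", "op": "insert", "row": {"id": 2, "name": "beta"}},
    {"table": "a", "op": "delete", "row": {"id": 2}},
]
for event in events:
    apply_change(agg, event)

joined = agg.execute(
    "SELECT a.name, b.status FROM b_copy AS b JOIN a_copy AS a ON a.id = b.a_id"
).fetchall()
```

Once the copies are kept up to date this way, the report can join them freely without touching either service's own database.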