I’m honestly speechless. My second submission was rejected as well. The reason given: “This content belongs to technical discussion. It is recommended to briefly talk about your thoughts on this issue in order to better have a technical exchange with others.” If the first rejection was because of the layout, I can accept that. But why is it so hard to post a question? Fine, I’ll write it the way they ask. But if you genuinely have no idea, are you supposed to make something up? I believe the website team means well, but newcomers are likely to give up because of this. This is my last attempt; if it fails, I won’t use this forum anymore. It’s really difficult to ask questions here.
The following is the original text, and some of my own thoughts will be attached at the end.
I’m new here, and my first post was probably too messily formatted, which is why it failed the review. I have since learned some Markdown syntax specifically for this purpose. Without further ado, here is the post.
A large listed state-owned enterprise has branches in various provinces and cities. To manage the business processes and financial data of its branches and subsidiaries across the country in a uniform way, it developed a management platform and distributed it to each province and city for secondary development based on local needs, replacing the separate financial systems, business management systems, and so on that each city used before.
The frontend uses AngularJS, Bootstrap, and HTML; the backend uses Spring MVC and MyBatis; the databases are Oracle and MySQL; the RPC framework is Dubbo; the service registry is ZooKeeper; and the cache is Redis. The overall system architecture is a distributed cluster. The system consists of multiple business modules, five of them major: "Project Management", "Contract Management", "Procurement Management", "Sales Management" and "Accounting Management".
Build 100 reports based on the existing business. The reporting module will have its own independent database and application.
How to build a large table for use in all reports?
Understanding the business is of course the most basic step, but the modules only call each other's services through interfaces.
How to extract large amounts of data from distributed databases?
Each module has its own database; some use Oracle, some use MySQL, and the data volume is more than 10 million records.
What is a more reasonable way to synchronize the data, and with what technology?
Incremental synchronization is difficult: the business modules offer no good way to guarantee that no incremental data is missed. Full synchronization, on the other hand, moves far too much data each time. In addition, how can we ensure performance when displaying reports and exporting data?
It’s my first time working on such a big project, so I’m really confused, and I may not have expressed myself clearly. If you need more details, please leave a comment. I hope the seniors on the forum can give me some advice; I would be very grateful.
P.S.:
I still don’t understand many parts of Markdown. For example, when ordered and unordered lists are nested, why do the solid black bullets become hollow?
Why does a line break in an unordered list only take effect on the current line, and once it does, the line breaks on the lines above stop working?
I don’t understand. Could it be that different editors use different syntax?
How can I see the Markdown source of other people's posts? If I could see it, I could learn from it.
Analyze the statistical dimensions and common fields of all reports, create a new Oracle user for the report module, split all the fields into tables by module, and use these as the base tables for the reports.
The data extraction method needs to be considered. So far I can think of two:
Have each module provide interfaces, and insert the data they return into the report module's base tables. Advantages: the extraction rules are easy to maintain. Disadvantages: poor performance.
Connect the Oracle and MySQL databases through DBLink, and use stored procedures to insert data directly into the base tables. Advantages: better performance. Disadvantages: the rules become hard for other people to maintain.
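To make the first method more concrete, here is a minimal sketch of pulling data through a module interface page by page and inserting each page in one batch, which is one common way to soften the performance cost. Everything named here is hypothetical: `fetch_page` stands in for whatever paged interface a business module might expose (for example a Dubbo service), `report_base` is an invented base table, and an in-memory SQLite database stands in for the report database.

```python
import sqlite3
from typing import Callable, List, Tuple

Row = Tuple[int, str, float]  # (id, module, amount): a hypothetical row shape


def load_via_interface(
    fetch_page: Callable[[int, int], List[Row]],
    dst: sqlite3.Connection,
    page_size: int = 1000,
) -> int:
    """Pull rows page by page from a module interface and batch-insert them."""
    total = 0
    page = 0
    while True:
        rows = fetch_page(page, page_size)
        if not rows:
            break  # an empty page means the source is exhausted
        dst.executemany(
            "INSERT INTO report_base (id, module, amount) VALUES (?, ?, ?)",
            rows,
        )
        dst.commit()  # one commit per page instead of one per row
        total += len(rows)
        page += 1
    return total


# Stand-in for a business module: 2500 fake contract rows served in pages.
fake_rows: List[Row] = [(i, "contract", float(i)) for i in range(2500)]


def fetch_page(page: int, size: int) -> List[Row]:
    return fake_rows[page * size:(page + 1) * size]


dst = sqlite3.connect(":memory:")
dst.execute("CREATE TABLE report_base (id INTEGER PRIMARY KEY, module TEXT, amount REAL)")
loaded = load_via_interface(fetch_page, dst)
```

The page size is a tuning knob: larger pages mean fewer remote calls and commits but more memory per call.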
Synchronize the full data set every time. Advantages: the logic is simple. Disadvantages: the volume of data to synchronize is too large and it takes too long.
The above are some personal opinions. I hope you seniors can give me some advice.
过去多啦不再A梦2017-06-20 10:07:40
The requirement you describe is essentially building a data warehouse. The basic idea is:
1. The data warehouse database is independent of the business system databases. Data warehouse modeling generally calls for a layered design, not simply building one large table.
It is generally divided into a buffer layer, a base layer, an aggregation layer, a report layer, and so on, each with a different focus. The base layer still follows a normalized model, the aggregation layer generally introduces redundant data, and the report layer is generally a wide-table design with many columns.
2. Data synchronization. When the data volume is large, there must be an incremental mechanism. If there is none, the business systems need to be modified to provide one.
3. There are several ideas for synchronization methods:
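a. Connect the databases with DBLink and write the stored procedures by hand.
b. Use an ETL tool such as Informatica PowerCenter or Kettle.
c. Use dedicated database-level synchronization software, such as Oracle's OGG (Oracle GoldenGate).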
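As an illustration of the incremental mechanism in point 2, here is a minimal sketch of watermark-based synchronization. It assumes each source table carries a last-modified timestamp column, which the business systems may have to add. The table names `contract` and `base_contract`, the `updated_at` column, and the in-memory SQLite databases are all hypothetical stand-ins, not part of the system described above.

```python
import sqlite3


def sync_increment(src: sqlite3.Connection, dst: sqlite3.Connection, last_watermark: str) -> str:
    """Copy rows changed after last_watermark into the base layer; return the new watermark."""
    rows = src.execute(
        "SELECT id, amount, updated_at FROM contract "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    # INSERT OR REPLACE keyed on the primary key, so updated rows overwrite
    # their old copies and re-running the same window is harmless.
    dst.executemany(
        "INSERT OR REPLACE INTO base_contract (id, amount, updated_at) VALUES (?, ?, ?)",
        rows,
    )
    dst.commit()
    # The newest timestamp seen becomes the starting point of the next run.
    return rows[-1][2] if rows else last_watermark


src = sqlite3.connect(":memory:")  # stands in for a business database
dst = sqlite3.connect(":memory:")  # stands in for the warehouse base layer
src.execute("CREATE TABLE contract (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)")
dst.execute("CREATE TABLE base_contract (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)")
src.executemany(
    "INSERT INTO contract VALUES (?, ?, ?)",
    [(1, 100.0, "2017-06-01 09:00:00"), (2, 250.0, "2017-06-02 09:00:00")],
)
src.commit()

# First run: everything is newer than the initial watermark.
watermark = sync_increment(src, dst, "1970-01-01 00:00:00")

# Later, the business system updates one row and adds another.
src.execute("UPDATE contract SET amount = 120.0, updated_at = '2017-06-03 09:00:00' WHERE id = 1")
src.execute("INSERT INTO contract VALUES (3, 75.0, '2017-06-03 10:00:00')")
src.commit()

# Second run: only the two changed rows are transferred.
watermark = sync_increment(src, dst, watermark)
```

This sketch has known gaps. Rows deleted in the source are never noticed, and a row committed later with a timestamp equal to the current watermark would be skipped by the strict `>` comparison. Log-based tools such as those in option c avoid both problems by reading the database's change log instead of querying a column.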