What is the difference between a data warehouse and a database?

青灯夜游

Oct 19, 2020 pm 04:37 PM

The differences between a data warehouse and a database: 1. A database stores raw operational data with no additional processing, whereas a data warehouse is designed for data analysis: the source data goes through an ETL process (extraction, cleaning, transformation, and loading) before it is stored. 2. A data warehouse holds far more data than a database.

1. Data warehouse

What is a data warehouse?
A data warehouse (abbreviated DW or DWH) is a strategic collection of data of all types that supports decision-making and planning at every level of an enterprise. It is built for analytical reporting and decision support. Enterprises that need business intelligence use it to guide business process improvement and to monitor time, cost, quality, and control.
What can a data warehouse do? (A few examples)
1. Setting annual sales targets: targets should be based on historical reports rather than set arbitrarily.
2. Optimizing business processes.
  For example: for a certain brand of mobile phone on an e-commerce platform, which age groups made up the main buyers over the past five years, and in which season were purchases highest? With these answers you can identify the main needs of the target group and adjust production volume and warehouse inventory dynamically.
Characteristics of a data warehouse
1. The data warehouse is subject-oriented.
  1. Unlike traditional databases, data warehouses are organized by subject. So what is a subject? A subject is a high-level concept: it is the object around which data in the enterprise information system is synthesized, classified, and analyzed. Logically, it corresponds to the object of analysis in some macro-level analysis domain of the enterprise. (In plain terms: a subject is a key aspect that users care about when they use the data warehouse to make decisions. A subject usually relates to several operational information systems, whereas data in operational databases is organized around transaction-processing tasks that are isolated from one another.)
2. The data warehouse is integrated.
  1. The data in the data warehouse is extracted from originally scattered databases (MySQL and other relational databases). Operational databases differ greatly from DSS (decision support system) analytical databases. First, the source data for each subject is spread across the scattered databases with many duplicates and inconsistencies, and the data from each online system is tied to that system's application logic. Second, the summarized data in the data warehouse cannot be obtained directly from the original database systems. Therefore, data must be unified and summarized before it enters the data warehouse. This is the most critical and most complex step in building a data warehouse. The work to be done is:
    1. Resolve all inconsistencies in the source data, such as fields with the same name but different meanings, different names for the same thing, inconsistent units, and inconsistent field lengths.
    2. Summarize and compute the data. Summaries can be produced while data is extracted from the original databases, but many are produced inside the data warehouse, that is, after the data has been loaded.
3. The data in the data warehouse changes with time.
  1. From the application's point of view, the data in the data warehouse is not updatable: users do not update data while performing analysis. This does not mean, however, that the data never changes over its whole life cycle, from integration into the warehouse until final deletion.
  2. Change over time is one of the defining characteristics of a data warehouse. It shows up in three ways:
    1. The data warehouse keeps adding new data over time. The system must continuously capture changed data in the OLTP databases and append it to the warehouse. In other words, it keeps generating snapshots of the OLTP databases and adds them after unified integration. Existing snapshots are never modified: when changed data is captured, a new snapshot is generated and added alongside the old ones.
    2. The data warehouse keeps deleting old data over time. Data in the warehouse also has a retention period, and expired data is deleted. The retention period in the warehouse is simply much longer than in the operational environment: operational systems generally keep only 60 to 90 days of data, while a data warehouse keeps a much longer history (for example, 5 to 10 years) to support DSS trend analysis.
    3. The data warehouse contains a large amount of summarized data, much of which is time-related. For example, data is often summarized by time period or sampled at fixed intervals, and these summaries must be recomputed as time passes. For this reason, warehouse data always carries a time attribute that identifies the historical period it belongs to.
4. The data in the data warehouse cannot be modified.
  1. The data in the data warehouse is mainly used for enterprise decision analysis. The operations involved are mainly queries, and modifications are generally not performed. The data reflects history over a long period: it is a collection of database snapshots taken at different points in time, plus derived data produced by aggregating, summarizing, and reorganizing those snapshots, rather than data from online processing. Data from the online processing databases is integrated and then loaded into the data warehouse. Once stored data exceeds the warehouse's retention period, it is deleted from the current warehouse. Because the warehouse mostly serves queries, its management system is much simpler than that of an operational database: many hard problems in database management systems, such as integrity protection and concurrency control, are largely avoided. However, because queries in a data warehouse often touch large amounts of data, the demands on query performance are higher and call for a variety of sophisticated indexing techniques. In addition, the data warehouse serves the senior management of commercial enterprises, who place higher demands on the friendliness of the query interface and on how the data is presented.
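
The integration, append-only snapshot, and retention ideas above can be sketched in a few lines. This is only a minimal illustration, not how any particular warehouse product works: it uses Python's built-in sqlite3 module as a stand-in for a real warehouse, and the source systems (`SHOP_A`, `SHOP_B`), their field names, the `price_snapshot` table, and all helper functions are hypothetical names invented for the example.

```python
import sqlite3
from datetime import date

# Hypothetical rows from two operational systems. They describe the same
# product with different field names and different units (assumed data).
SHOP_A = [{"sku": "P1", "price_yuan": 1999.0}]
SHOP_B = [{"product_code": "P1", "price_fen": 189900}]  # 100 fen = 1 yuan


def unify(row):
    """Map source-specific names and units onto one warehouse convention."""
    if "sku" in row:
        return (row["sku"], row["price_yuan"])
    return (row["product_code"], row["price_fen"] / 100)


def load_snapshot(conn, snapshot_date, rows):
    """Append one dated snapshot; earlier snapshots are never updated."""
    conn.executemany(
        "INSERT INTO price_snapshot (snapshot_date, product, price_yuan) "
        "VALUES (?, ?, ?)",
        [(snapshot_date.isoformat(),) + unify(r) for r in rows],
    )


def purge_expired(conn, cutoff):
    """Delete snapshots that are older than the retention cutoff."""
    conn.execute(
        "DELETE FROM price_snapshot WHERE snapshot_date < ?",
        (cutoff.isoformat(),),
    )


def build_demo():
    """Create an in-memory warehouse table and load three snapshots."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE price_snapshot "
        "(snapshot_date TEXT, product TEXT, price_yuan REAL)"
    )
    load_snapshot(conn, date(2010, 1, 1), SHOP_A)
    load_snapshot(conn, date(2020, 1, 1), SHOP_A)
    load_snapshot(conn, date(2020, 1, 2), SHOP_B)
    return conn
```

Note that the price change on the last day shows up as a new row rather than as an update of an old one, and that the time column is what makes both trend queries and the retention purge possible.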

2. The difference between data warehouse and database

Before looking at the differences, we need to understand three concepts: database software, databases, and data warehouses.
1. Database software: software (not a graphical client that connects to a database) that implements the database's logical operations; it belongs to the physical layer. Common products include Oracle, DB2, MySQL, Sybase, and MS SQL Server.
2. Database: a logical concept, a repository for storing data, implemented by database software. A database consists of many tables. Tables are two-dimensional: each table has a number of fields arranged as columns, and data is written into the table row by row. Two-dimensional tables can nevertheless express multidimensional relationships.
3. Data warehouse: an extension of the database concept. Logically, there is no difference between a database and a data warehouse: both are places where data is stored through database software. In terms of data volume, however, a data warehouse is much larger than a database. A data warehouse is mainly used for data mining and data analysis to help leaders make decisions.
4. In any IT architecture a database must exist, because there has to be a place to store data. Take online shopping and other e-commerce: item inventory, item prices, user account balances, and so on are all stored in a back-end database. Or, more simply, think of our accounts and passwords for WeChat, Weibo, and QQ. The back-end database must have a user table with at least two fields, username and password, and our data is stored in that table row by row. When we log in, we fill in the username and password, and the data is sent to the back end and matched against the table. If the match succeeds, we are logged in; if it fails, an error is reported. This is the database, and it does its work in the production environment. Every business application uses a database.
5. The data warehouse is one of the technologies under BI. Since a database is tied to business applications, no single database can contain all of a company's data. Database tables are usually designed for a particular application. For example, in the login feature just described, the user table has only those two fields and nothing else. That table does its job perfectly well, but it cannot support analysis. For example: during which time period is the number of users highest? Which user purchased the most in a year? To answer questions like these, we would need to redesign the table structure. The data warehouse concept was introduced for data analysis and data mining: the table structure of a data warehouse is designed around analysis requirements, analysis dimensions, and analysis metrics.
6. The difference between a database and a data warehouse is actually the difference between OLTP and OLAP.
  1. Operational processing, called online transaction processing or OLTP (On-Line Transaction Processing), can also be called transaction-oriented processing. It is the day-to-day online operation of a specific business against the database, usually querying and modifying a small number of records. Users care most about operation response time, data security, integrity, and the number of concurrent users supported. As the main means of data management, traditional database systems are mainly used for operational processing.
  2. Analytical processing, called online analytical processing or OLAP (On-Line Analytical Processing), generally analyzes historical data on certain subjects to support management decisions.
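
To make the OLTP/OLAP contrast a little more concrete, here is a rough sketch based on the login and "who purchased the most in a year" examples above. It assumes Python's built-in sqlite3 module in place of MySQL, and the `users` and `purchase` tables, the sample rows, and the helper functions are all made up for illustration. The password hashing is deliberately simplified.

```python
import hashlib
import sqlite3


def hash_password(password):
    # Illustration only: a real system should use a salted, slow hash
    # (for example hashlib.scrypt) instead of bare SHA-256.
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def make_db():
    """Create a hypothetical user table and purchase history in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (username TEXT PRIMARY KEY, password_hash TEXT NOT NULL)"
    )
    conn.execute("CREATE TABLE purchase (username TEXT, year INTEGER, amount REAL)")
    conn.execute(
        "INSERT INTO users VALUES (?, ?)", ("alice", hash_password("s3cret"))
    )
    conn.executemany(
        "INSERT INTO purchase VALUES (?, ?, ?)",
        [
            ("alice", 2019, 120.0),
            ("alice", 2020, 80.0),
            ("bob", 2020, 300.0),
            ("bob", 2020, 50.0),
        ],
    )
    return conn


def login(conn, username, password):
    """OLTP-style operation: fetch a single row by key and compare."""
    row = conn.execute(
        "SELECT password_hash FROM users WHERE username = ?", (username,)
    ).fetchone()
    return row is not None and row[0] == hash_password(password)


def top_buyer(conn, year):
    """OLAP-style operation: scan and aggregate many historical rows."""
    return conn.execute(
        "SELECT username, SUM(amount) AS total FROM purchase "
        "WHERE year = ? GROUP BY username ORDER BY total DESC LIMIT 1",
        (year,),
    ).fetchone()
```

`login` touches one row and must answer quickly for many concurrent users, while `top_buyer` reads a whole year of history to produce one summary row. In practice the second kind of query would run against warehouse tables rather than the production database.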

| Operational processing | Analytical processing |
| --- | --- |
| Detailed | Summarized or refined |
| Entity-relationship (E-R) model | Star model or snowflake model |
| Stores instantaneous data | Stores historical data, excluding recent data |
| Updatable | Read only, append only |
| Operates on one unit at a time | Operates on one set at a time |
| High performance requirements, short response time | Loose performance requirements |
| Transaction-oriented | Analysis-oriented |
| Small amount of data per operation | Large amount of data per operation |
| Supports day-to-day operations | Supports decision-making needs |
| Small amount of data | Large amount of data |
| Customer orders, inventory levels, and bank account inquiries | Customer revenue analysis, market segmentation |
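
The star model on the analytical side of this comparison can be sketched as one fact table surrounded by dimension tables. The example below is only an assumption-laden illustration using Python's built-in sqlite3 module: the table names (`fact_sales`, `dim_customer`, `dim_date`), their columns, and the sample rows are invented, and they loosely follow the earlier question about which age groups buy in which season.

```python
import sqlite3

# Hypothetical star schema: one fact table that references two dimensions.
SCHEMA = """
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, age_group TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, season TEXT);
CREATE TABLE fact_sales (
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    quantity INTEGER
);
"""


def build_star():
    """Create the schema in memory and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO dim_customer VALUES (?, ?)", [(1, "18-25"), (2, "26-35")]
    )
    conn.executemany(
        "INSERT INTO dim_date VALUES (?, ?, ?)",
        [(1, 2019, "summer"), (2, 2019, "winter")],
    )
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?)",
        [(1, 1, 3), (1, 2, 1), (2, 1, 2), (2, 2, 5), (1, 1, 4)],
    )
    return conn


def sales_by_age_and_season(conn):
    """Aggregate the fact table along two analysis dimensions."""
    return conn.execute(
        """
        SELECT c.age_group, d.season, SUM(f.quantity)
        FROM fact_sales AS f
        JOIN dim_customer AS c ON c.customer_id = f.customer_id
        JOIN dim_date AS d ON d.date_id = f.date_id
        GROUP BY c.age_group, d.season
        ORDER BY c.age_group, d.season
        """
    ).fetchall()
```

The dimensions (age group, season) are exactly the "analysis dimensions" the table structure is designed around, and the summed quantity plays the role of the analysis metric.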

3. Closing notes

1. If you find any errors, please point them out in the comments and I will correct them promptly. If anything is unclear, you are also welcome to leave a message so we can discuss it.

2. This may seem trivial to you, but I take it seriously and treat it as my own notes and experience so that I can keep improving.
