What is a Data Warehouse?
A data warehouse is a system that pulls together data from many different sources within an organization for reporting and analysis. The reports created from complex queries within a data warehouse are used to make business decisions.
In more comprehensive terms, a data warehouse is a consolidated view of either a physical or logical data repository collected from various systems. The primary focus of a data warehouse is to correlate data from existing systems, for example, product inventory stored in one system and purchase orders for a specific customer stored in another. Data warehouses are used for online analytical processing (OLAP), which uses complex queries to analyze data rather than process transactions.
What is a Database?
A database contains information organized in columns, rows, and tables and is periodically indexed to make accessing relevant information easier and faster.
Many enterprises and organizations create and manage databases using a database management system (DBMS). Special DBMS software can be used to create and store product inventory and customer information, for example.
Organizations most often use databases for online transaction processing (OLTP). Database software needs to provide easy access to information and fast querying so that transactions can be carried out efficiently. Databases are often referred to as operational systems, meaning they are used to process day-to-day transactions in an organization.
While most reads, writes, and report generation are usually managed by a database administrator, some transactional databases provide atomicity, consistency, isolation, and durability (ACID) compliance to ensure that the information contained in the database is consistent and that any transactions that take place are complete.
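ACID semantics can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table and account names below are illustrative, not from any particular system; a production OLTP database would run on a server-based DBMS, but the transactional guarantees work the same way.

```python
import sqlite3

# In-memory database for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts (id, balance) VALUES (1, 100), (2, 50)")
conn.commit()

try:
    # Atomicity: both updates succeed together or not at all.
    # Using the connection as a context manager opens a transaction,
    # commits on success, and rolls back on error.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
        conn.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")
except sqlite3.Error:
    pass  # on failure, neither update would be visible

balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(balances)  # {1: 70, 2: 80}
```

If either UPDATE raised an error, the rollback would leave both balances unchanged, which is exactly the consistency guarantee a transactional system depends on.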
Data Warehouse vs Database
Data warehouses and databases are both relational data systems, but they were built to serve different purposes. A data warehouse is built to store large quantities of historical data and enable fast, complex queries across all the data, typically using Online Analytical Processing (OLAP). A database is built to store current transactions and enable fast access to specific transactions for ongoing business processes, known as Online Transaction Processing (OLTP).
A database is optimized to maximize the speed and efficiency with which data is updated (added, modified, or deleted) and to enable fast access to specific records. Databases use Online Transaction Processing (OLTP) to handle large numbers of short online transactions that insert, update, and delete data. Other OLTP hallmarks include fast query processing, data integrity under concurrent access, and a high number of transactions processed per second. Databases performing OLTP transactions contain and maintain current, detailed data from a single source. However, because of the number of table joins required, performing analytical queries is difficult: it takes an experienced database administrator or developer familiar with the application to write queries that yield meaningful analysis.
Response times from databases need to be extremely quick for efficient transaction processing. The most important aspect of a database is that the write operation is recorded in the system. A business that sells products online wouldn’t be in business very long if its database didn’t make a record of every purchase.
Data warehouses use Online Analytical Processing (OLAP), which is optimized to handle a low volume of complex queries on large, aggregated historical data sets. Tables are denormalized and transformed to yield summarized data, multidimensional views, and faster query response times. Additionally, query response time is a key measure of an OLAP system’s effectiveness. As a function of business intelligence, OLAP allows managers and analysts to select, extract, view, and analyze corporate data to identify and obtain insights on corporate trends as well as identify potential issues.
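A typical OLAP-style query aggregates historical facts across dimensions such as time and region. The sketch below, again using Python's sqlite3 for illustration, runs one such query against a small denormalized "sales" fact table; the table, column, and product names are hypothetical.

```python
import sqlite3

# A small denormalized fact table, as a data warehouse might store it.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sales (
        sale_date TEXT, region TEXT, product TEXT, revenue REAL
    )
""")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("2023-01-15", "East", "Widget", 120.0),
        ("2023-01-20", "West", "Widget", 80.0),
        ("2023-02-03", "East", "Gadget", 200.0),
        ("2023-02-18", "East", "Widget", 150.0),
    ],
)

# An analytical query: total historical revenue by month and region.
rows = conn.execute("""
    SELECT substr(sale_date, 1, 7) AS month, region, SUM(revenue)
    FROM sales
    GROUP BY month, region
    ORDER BY month, region
""").fetchall()
print(rows)
# [('2023-01', 'East', 120.0), ('2023-01', 'West', 80.0), ('2023-02', 'East', 350.0)]
```

Because the fact table already holds everything the query needs, no joins are required, which is what keeps response times fast even as the history grows.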
Most databases use a normalized data structure. Data normalization means reorganizing data so that it contains no redundant data, and all related data items are stored together, with related data separated into multiple tables. Normalizing data ensures the database takes up minimal disk space and that writes stay fast.
The more normalized your data is, the more complex the queries needed to read the data because a single query combines data from many tables. This puts a huge strain on computing resources.
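The cost of normalization can be sketched in a few lines. In the illustrative schema below (table and customer names are hypothetical), customers and orders live in separate tables linked by a key, so even a simple question such as "total spent per customer" requires a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Normalized: customers and orders are separate tables linked by a key,
# so no customer data is duplicated on each order row.
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (1, 1, 25.0), (2, 1, 40.0), (3, 2, 10.0);
""")

# Even a simple aggregate question needs a join across tables.
rows = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customers c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(rows)  # [('Ada', 65.0), ('Grace', 10.0)]
```

With two tables the join is trivial, but real OLTP schemas often need many joins for one analytical question, which is where the strain on computing resources comes from.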
The data in a data warehouse does not need to be organized for quick transactions. Therefore, data warehouses normally use a denormalized data structure. A denormalized data structure uses fewer tables because it groups data and doesn’t exclude data redundancies. Denormalization offers better performance when reading data for analytical purposes.
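A denormalized layout can be sketched the same way. Here the customer name is copied onto every order row (names are illustrative), trading extra storage for join-free analytical reads.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Denormalized: the customer name is repeated on every order row.
# This uses more space but lets analytical reads skip the join entirely.
conn.executescript("""
    CREATE TABLE orders_flat (customer_name TEXT, amount REAL);
    INSERT INTO orders_flat VALUES
        ('Ada', 25.0), ('Ada', 40.0), ('Grace', 10.0);
""")

# The same "total spent per customer" question, answered from one table.
rows = conn.execute("""
    SELECT customer_name, SUM(amount)
    FROM orders_flat
    GROUP BY customer_name
    ORDER BY customer_name
""").fetchall()
print(rows)  # [('Ada', 65.0), ('Grace', 10.0)]
```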
A database processes day-to-day transactions within an organization. Therefore, databases typically don’t contain historical data—current data is all that matters in a normalized relational database.
Data warehouses are used for analytical purposes and business reporting. Data warehouses typically store historical data by integrating copies of transaction data from disparate sources. Data warehouses can also use real-time data feeds for reports that use the most current, integrated information.
While databases are normally used for transactional purposes, analytical queries can still be performed on the data. The problem is that the complexity of the data’s normalized organization makes analytical queries difficult to carry out, so a skilled developer or analyst is required to create them. The depth of analysis is limited to static, one-time reports because a database gives only a snapshot of the data at a specific time.
The structure of data warehouses makes analytical queries much simpler to perform. No advanced knowledge of database applications is required. Analytics in data warehouses is dynamic, meaning it takes into account data that changes over time.
An OLTP database supports thousands of concurrent users. Many users must be able to interact with the database simultaneously without it affecting the system’s performance.
Data warehouses support a limited number of concurrent users compared to operational systems. The data warehouse is separated from front-end applications, and its reliance on complex queries necessitates limiting how many people can use the system simultaneously.
Database vs. Data Warehouse Applications
With databases, there is a one-to-one relationship with a single application as the source. A credit card processing application is an excellent example of a single data source that can run on an OLTP database. This type of database contains highly detailed data as well as detailed relational views. Tables are normalized to achieve efficient storage and concurrent transaction processing, as well as to return quick query results.
Data warehouses store summarized historical data from many different applications. There is a one-to-many relationship between a data warehouse and the applications that serve as its data sources. Examples of data sources include, but are not limited to, customer relationship management (CRM), enterprise resource planning (ERP), and social media data.
Database Use Cases
Database use cases are related to the day-to-day transactional requirements in an organization. Some examples of database applications include:
- An airline using an online booking system
- A hospital registering a patient
- A bank adding an ATM withdrawal transaction to an account
- A website creating an order for a product it has sold
Data Warehouse Use Cases
Data warehouse use cases focus on providing high-level reporting and analysis that lead to more informed business decisions. Use cases include:
- Carrying out data mining to gain new insights from the information held in many large databases
- Conducting market research by analyzing large volumes of data in-depth
- An online business analyzing user behavior to make business decisions
Database vs. Data Warehouse SLAs
Most SLAs for OLTP databases state that they must meet 99.99% uptime, since any system failure may result in lost revenue and lawsuits. Because the database is directly linked to the front-end application, real-time data is always available.
SLAs for data warehouses have downtime built in to accommodate periodic uploads of new data. To support scalability, data warehouses are separated from frontend applications.
Panoply is a smart cloud data warehouse that automates the ingestion of diverse data sources and uses machine learning to optimize query speed.