Glossary

Analytical database - An analytical database stores historical data and is optimized for read-heavy workloads. Analytical databases are not updated as frequently as transactional databases. Examples of data stored in an analytical database include sales, marketing, and finance data.

API connection to DW - Companies can use an API (Application Programming Interface) to connect their own data warehouse, along with reporting and analytics tools, to another data warehouse. This allows data analysts to access a data warehouse to add other data sources and connect to their reporting or BI tools.

Automation - Data warehouse automation comprises a central repository of design patterns, which encapsulate architectural standards as well as best practices for data design, data management, data integration, and data usage. The goal is a fully automated, fully documented data warehouse environment, including data marts.

Avro - Avro is a data serialization framework developed within Apache’s Hadoop project that uses JSON for defining data types and protocols, and serializes data in a compact binary format.
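
For example, an Avro schema is itself written in JSON. A minimal sketch of a record schema (the record and field names here are hypothetical):

```json
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "email", "type": ["null", "string"], "default": null}
  ]
}
```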

Azure SQL - Microsoft SQL Server has a cloud version, Azure SQL Database, which is a platform as a service (PaaS) for Microsoft Azure.

BI tool - Business Intelligence (BI) tools are technologies used by companies for data analysis. Their functions include analytics, reporting, data mining, process mining, complex event processing, business performance management, online analytical processing, benchmarking, predictive analytics, and prescriptive analytics.

Big data - Big data is the term for a large amount of structured and unstructured data that businesses have to work with on a daily basis.

Cloud-based DW - A cloud-based data warehouse (DW) is a system that collates data from a wide range of sources and is used as a centralized data repository for analytical and reporting purposes. Unlike traditional on-premise DWs, cloud-based DWs use the power of the cloud to store data.

Cluster - A cluster is a group of objects that are grouped together based on shared characteristics.

Concurrency - Concurrency is the ability for multiple processes to access or change shared data at the same time.

Data archive - Data archiving is the process of moving data that is no longer used to a new storage device for long-term retention.

Data extraction - Data extraction is the process of retrieving data from data sources for further processing or storage (data migration). The retrieved data is often unstructured.

Data ingestion - Data ingestion is the process of importing data from the place where it originated into a system where it can be analyzed and stored.

Data integration - Data integration is the process of combining data from different sources into meaningful and valuable information.

Data load - Data load is the process of loading data or data sets from a source file, folder, or application into a database.

Data mining - Data mining is the process of discovering patterns, correlations, and anomalies within large data sets.

Data source - A data source is the location where the data is coming from. It can be a file, a database, or even a live data feed.

Data storage - Data storage is the process of capturing and retaining digital information on electromagnetic, optical, or silicon-based storage media.

Data transfer - Data transfer is the process of moving electronic or analog data from one computer node to another.

Data warehousing - Data warehousing is a process for collecting, storing, and delivering decision-support data for some or all of an enterprise. A data warehouse is a database whose data includes a copy of operational data.

DBMS - A database management system (DBMS) is software that is used to create and manage databases.

DB2 - DB2 is a relational database management system introduced by IBM in 1983.

Disparate data system - In the context of data processing, a disparate data system is one designed to operate as a fundamentally distinct system that does not exchange data or interact with other data processing systems. Many common legacy systems, including heterogeneous database systems, are disparate data systems. A disparate data system is often considered a silo because it is isolated from, or incompatible with, other systems.

Disparate data sets - Disparate data sets exist because different lines of business use different types of sources and cloud services to gather their data. Disparate data sets are a major pain point for data engineers and analysts, who must spend time on coding, modeling, and maintenance to bring disparate data sources together in a format that can be queried simultaneously.

DW control system - An operational or control system processes the daily transactions of a company’s data warehouse. Control systems are designed to process data efficiently while preserving all transactional data.

DW inventory management - Data warehouse inventory management uses technologies, especially automation, to keep tabs on various transactions affecting a data warehouse, including customer requests, performance measures or outcomes, historical data, and real-time data for immediate analytics.

DW staging area - A data warehouse (DW) staging area is an intermediate storage area used for data processing during the extract, transform and load (ETL) process.

DW tools - DW tools are applications that facilitate the process of data warehousing. They have various functions, and fall under these categories: Data Cleansing, Data Transformation and Load, Data Access and Analysis (Query), and Report Writing.

Elastic compute - Elastic compute is a concept in cloud computing in which computing resources can be scaled up and down easily by the cloud service provider.

ELT - Extract, Load, and Transform (ELT) is a data warehousing process that extracts data from all data sources and immediately moves it into a centralized data repository. Cloud-based data warehouse and data lake infrastructure supports large-scale storage and scalable compute, so it is no longer necessary to keep data small and summarized. Many organizations maintain massive data pools in the cloud at low cost, leveraging ELT tools to process the data.
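
As an illustration, once raw data has been loaded, an ELT-style transformation can run inside the warehouse as plain SQL. A minimal sketch, with hypothetical schema, table, and column names:

```sql
-- Raw data is loaded as-is into raw.orders; the transform step
-- then runs inside the warehouse rather than before loading.
CREATE TABLE analytics.daily_orders AS
SELECT
    CAST(order_ts AS DATE) AS order_date,
    customer_id,
    SUM(amount)            AS total_amount
FROM raw.orders
GROUP BY 1, 2;
```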

ETL - Extract, Transform, and Load (ETL) is a data warehousing process that extracts data from sources, transforms it into the target structure, and loads it into the warehouse, typically in batches, so that business users can analyze and report on data relevant to their business focus.

ETL tools - Extract, Transform, and Load (ETL) tools are software that enable organizations to make their data accessible, meaningful, and usable across disparate data systems.

iPaaS - Integration platform as a service (iPaaS) is a set of automated tools for connecting software applications that are deployed in different environments.

JDBC - Java Database Connectivity (JDBC) is an application programming interface (API) for the Java programming language that defines how a client can access a database.

JSON - JavaScript Object Notation (JSON) is an open-standard file format that uses human-readable text to transmit data objects consisting of attribute-value pairs and array data types (or any other serializable value).
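
A small example of a JSON object with attribute-value pairs and an array value (the field names and values are hypothetical):

```json
{
  "name": "Ada",
  "active": true,
  "skills": ["SQL", "Python"]
}
```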

Materialization - Materialization is the process of setting up a materialized view, which is a database object that contains the results of a query. This is a form of caching the results of a query and can be described as a kind of pre-computation. Database users typically use materialized views for performance optimization reasons. Today, this optimization can be automated by technologies like Panoply’s smart data warehouse.
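
A minimal sketch in PostgreSQL syntax, with hypothetical table and column names:

```sql
-- Cache the results of an aggregate query as a materialized view.
CREATE MATERIALIZED VIEW monthly_revenue AS
SELECT DATE_TRUNC('month', order_ts) AS month,
       SUM(amount) AS revenue
FROM orders
GROUP BY 1;

-- Bring the cached results up to date on demand.
REFRESH MATERIALIZED VIEW monthly_revenue;
```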

Metadata - Metadata summarizes basic information about data to make it easier to work with data.

Mongo - MongoDB is an open-source NoSQL distributed database that stores data in flexible, JSON-like documents.

MPP data warehouse - A massively parallel processing (MPP) database is optimized to be processed in parallel, which means many operations are performed by many processing units at a time.

MQL - Model Query Language (MQL) uses a SQL-like syntax to specify a model object class or other data source, its attributes, and a filter expression.

MySQL - MySQL is an open-source relational database management system written in C and C++.

MSSQL - MSSQL is a relational database management system created by Microsoft. It stores and retrieves data for applications across a network.

Nested data - In the context of databases, nested data is data that is separate from, but contained within, a larger data set. For example, a list can be nested within a list, or a table can be nested within a table.
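
For example, in a JSON document an object can hold nested objects and lists (the field names and values here are hypothetical):

```json
{
  "order_id": 1001,
  "customer": {"id": 42, "name": "Ada"},
  "items": [
    {"sku": "A-1", "qty": 2},
    {"sku": "B-7", "qty": 1}
  ]
}
```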

Nested query - A nested query (or subquery) is a query within another SQL query, typically embedded within the WHERE clause.
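
A sketch with hypothetical tables; the inner query runs first, and its result feeds the outer WHERE clause:

```sql
SELECT name
FROM customers
WHERE id IN (
    -- inner (nested) query: customers with at least one large order
    SELECT customer_id
    FROM orders
    WHERE amount > 1000
);
```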

NLP - Panoply uses Natural Language Processing (NLP) to analyze data sets, infer data types, and enrich data, enabling efficient transformations and logical joins in a relational database.

Node - A node is an instance of a database. Data changes are written to the master node and propagated to one or more slave nodes, which are used for backup or as read replicas. Applications can then split read and write traffic between them.

ODBC - Open Database Connectivity (ODBC) is a standard application programming interface (API) for accessing database management systems (DBMS).

OODBMS - An object-oriented database management system (OODBMS) is a database management system that supports the creation and modeling of data as objects.

PostgreSQL - PostgreSQL, often simply Postgres, is an object-relational database management system.

Predictive analytics - Predictive analytics is a method for analyzing current and historical data to make predictions about future or unknown events by using statistical techniques, data mining, predictive modeling, and machine learning.

Query tool - A query tool is an application used to administer databases and execute queries against them. Advanced Query Tool, for example, is a Microsoft Windows-based SQL query tool used to administer and execute queries against the most common database management systems.

Reindexing - Reindexing is an operation that parses content to regenerate index entries and merge data tables, preparing the data so that previously unfiltered data can be searched.

Relational database - A relational database is a set of tables from which data can be accessed or reassembled in many different ways.
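
For example, data from two tables can be reassembled on demand with a join (table and column names here are hypothetical):

```sql
SELECT c.name, o.amount
FROM customers AS c
JOIN orders AS o
  ON o.customer_id = c.id;
```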

Relational data model - The relational model (RM) for database management is an approach to managing data using a structure and language consistent with first-order predicate logic.

Scalability - Scalability enables vast quantities of data to be processed simultaneously, which minimizes the time required for analysts to get their insights. Scalability is especially important for enterprise-grade data warehouses.

Schema - A database schema is a logical description of an entire database.
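
A minimal sketch of defining part of a schema in SQL (the schema, table, and column names are hypothetical):

```sql
CREATE SCHEMA sales;

CREATE TABLE sales.orders (
    id          BIGINT PRIMARY KEY,
    customer_id BIGINT NOT NULL,
    amount      NUMERIC(12, 2),
    order_ts    TIMESTAMP
);
```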

Semi-structured data - Semi-structured data is data that has not been organized into a structured repository but has some associated information, such as tags or metadata, available to facilitate data processing.

SQL - Structured Query Language (SQL) is a domain-specific language designed for managing data held in a relational database management system (RDBMS), or for stream processing in a relational data stream management system (RDSMS).
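
A small example of a SQL query against a hypothetical orders table:

```sql
SELECT product, COUNT(*) AS order_count
FROM orders
GROUP BY product
ORDER BY order_count DESC;
```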

SQL Server - Microsoft SQL Server is a relational database management system that can store and retrieve data as requested by other software applications that run either on the same computer or on another computer across a network (including the Internet).

SQL Workbench - SQL Workbench is a visual database design tool that integrates SQL development, administration, database design, creation, and maintenance into a single environment.

Structured data - Structured data is data that has a high degree of organization. Data resides in fixed fields within records or files according to its data model.

Three-tiered architecture - Traditional data warehouse architecture employs a three-tiered structure composed of bottom, middle, and top tiers. The bottom tier contains the database server used to extract data from many different sources. The middle tier houses an online analytical processing (OLAP) server. The top tier is the client layer.

Time series data - Time series data is a set of data points indexed in time order.

Transactional database - Transactional databases are optimized for recording transactions and are typically row-stores, which means that data is stored on disk as rows rather than as columns.

Transformation - Transformation is the process of converting data from one format or structure into another format or structure.

Unstructured data - Unstructured data is information, in many different forms, that is not organized according to a predefined model, and therefore does not fit into a mainstream relational database.

Vacuum - The VACUUM command in Redshift reclaims space and re-sorts rows in either a specified table or all tables in the current database.
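
For example, in Redshift (the table name here is hypothetical):

```sql
-- Reclaim space and re-sort rows in a single table.
VACUUM FULL sales_orders;
```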

VBA - Visual Basic for Applications (VBA) is an implementation of Microsoft’s event-driven programming language Visual Basic 6.

Visual SQL - Visual SQL tools are used to create SQL queries by dragging and dropping elements into the user interface. In Panoply, this can be done without leaving the data warehouse.

Visualization tool - Visualization tools place data in a visual context in order to get actionable analytical insights.

XML - Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable.
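
A small XML example encoding the same kind of attribute-value data as the JSON example above (element names and values are hypothetical):

```xml
<customer id="42">
  <name>Ada</name>
  <active>true</active>
</customer>
```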