Data warehouses and their tools are moving from on-premises data centers to the cloud. Many large organizations still operate large data warehouses on-premises, but the future of the data warehouse is clearly in the cloud.
New tools like Amazon Redshift and Google BigQuery provide powerful functionality, improved query performance compared to traditional data warehouses, and limitless scalability—with no setup costs and faster time to market. A large ecosystem of cloud databases and tools can help you get started.
These 12 essential data warehouse tools can help you build enterprise data solutions and derive value from your data—easily and inexpensively in the cloud:
Cloud-native data warehouses: three options for moving your mission-critical data warehouse to the cloud.
Cloud-based ETL tools: lightweight services that help upload data or pull it directly from cloud sources, transform it, and pipe it into a data warehouse, without heavyweight planning or infrastructure.
Cloud-based BI tools: platforms that let you connect to data warehouses, create visualizations and dashboards, and share them with collaborators.
Cloud-based data integration tools: services that help you connect to just about any application or data source and define triggers that grab data to feed your data warehouse when a specific event happens.
Cloud-native data warehouses
Amazon Redshift, one of the most popular cloud services from Amazon Web Services, is a fully managed, analytical data warehouse that can handle petabyte-scale data and enable analysts to query it in seconds. With no upfront costs, Redshift offers limitless scalability on Amazon’s architecture. By adding nodes to the Redshift cluster, or adding more clusters, you can support higher data volumes or higher concurrency. Redshift has several alternatives, but it remains the incumbent in the cloud data warehouse market.
Pricing: 2-month free trial; pricing starts from $0.25 per hour for one Redshift node and can fall to roughly $1,000 per terabyte per year for larger deployments
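Because Redshift scales by adding nodes, compute cost grows roughly linearly with cluster size. The back-of-the-envelope sketch below assumes that the $0.25/hour starting rate is charged per node and that the cluster runs around the clock; it ignores node types, regions, and discounts such as reserved pricing, so treat it as an illustration rather than a quote.

```python
# Rough on-demand Redshift compute cost, assuming $0.25/hour per node
# and a cluster that runs 24 hours a day, 365 days a year.
HOURS_PER_YEAR = 24 * 365  # 8,760 hours


def annual_compute_cost(nodes: int, hourly_rate_per_node: float = 0.25) -> float:
    """Return the yearly on-demand cost of a cluster with `nodes` nodes."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return nodes * hourly_rate_per_node * HOURS_PER_YEAR


for n in (1, 2, 4):
    print(f"{n} node(s): ${annual_compute_cost(n):,.2f} per year")
```

Doubling the node count doubles the hourly bill, which is why the per-terabyte figure for large deployments depends heavily on pricing commitments rather than on the starting hourly rate alone.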
Google BigQuery is another enterprise-grade cloud-native data warehouse. Like Redshift, it can run blazing-fast queries on petabyte-scale datasets. Unlike Redshift, it is serverless, so there are no cloud instances to manage. BigQuery also abstracts away clustering, which happens behind the scenes. A newer contender, BigQuery has added many features to achieve parity with Redshift: real-time analytics, flexible data ingestion, data governance, encryption, security and more.
Pricing: First 10 GB of storage free every month; beyond that, flat pricing of $5 per TB of data queried plus $0.02 per GB per month for storage
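Since BigQuery bills separately for data scanned by queries and for data stored, a monthly estimate is a simple sum of two terms. The sketch below uses the rates quoted above and assumes the 10 GB free allowance applies to storage; it ignores any free query tier and any later price changes, so it is a rough illustration only.

```python
# Rough monthly BigQuery on-demand cost using the rates quoted above.
QUERY_PRICE_PER_TB = 5.00    # $ per TB of data scanned by queries
STORAGE_PRICE_PER_GB = 0.02  # $ per GB stored per month
FREE_STORAGE_GB = 10         # assumption: the free allowance applies to storage


def monthly_cost(tb_queried: float, gb_stored: float) -> float:
    """Estimate one month's bill from TB scanned and GB stored."""
    query_cost = tb_queried * QUERY_PRICE_PER_TB
    billable_storage = max(0.0, gb_stored - FREE_STORAGE_GB)
    storage_cost = billable_storage * STORAGE_PRICE_PER_GB
    return round(query_cost + storage_cost, 2)


# 2 TB scanned ($10.00) plus 490 billable GB stored ($9.80)
print(monthly_cost(tb_queried=2, gb_stored=500))
```

Because the query term usually dominates, reducing the amount of data each query scans tends to matter more than trimming storage.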
We may be biased, but we believe Panoply’s automated data warehouse is changing the cloud data warehouse game. Panoply provides a cloud-based data warehouse with scalability, high availability and fast querying built in, just like Redshift and BigQuery. It also provides end-to-end data management, automating all data preparation tasks. Panoply’s self-optimizing architecture uses machine learning and natural language processing (NLP) to model and streamline data from source to analysis—from data to insights in literally minutes.
Cloud-based ETL tools
Stitch, a lightweight ETL (Extract, Transform, Load) tool, pulls together multiple data sources, transforms or cleans the data, and lets you configure the data pipeline through its UI. You can pipe data from pre-built integrations, covering cloud services like Mixpanel, Segment and Intercom as well as databases such as MySQL, MongoDB and MariaDB, and send it to any cloud data warehouse.
Pricing: Free up to 5 million rows / month, paid plans start from $100 per month
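Stitch handles these stages through configuration rather than code, but the three ETL steps it automates can be sketched in plain Python. This is a generic illustration, not Stitch’s API: the source records are made up, and an in-memory SQLite database stands in for the cloud data warehouse.

```python
import sqlite3


def extract():
    """Hypothetical source rows, e.g. pulled from a SaaS API or a MySQL table."""
    return [
        {"id": 1, "email": " Alice@Example.com ", "plan": "pro"},
        {"id": 2, "email": "bob@example.com", "plan": None},
        {"id": 2, "email": "bob@example.com", "plan": None},  # duplicate row
    ]


def transform(rows):
    """Clean the data: normalize emails, fill missing plans, drop duplicate ids."""
    seen, cleaned = set(), []
    for row in rows:
        if row["id"] in seen:
            continue
        seen.add(row["id"])
        cleaned.append((row["id"], row["email"].strip().lower(), row["plan"] or "free"))
    return cleaned


def load(conn, rows):
    """Write the cleaned rows into a warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, email TEXT, plan TEXT)")
    conn.executemany("INSERT OR REPLACE INTO users VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for a cloud data warehouse
load(conn, transform(extract()))
print(conn.execute("SELECT * FROM users ORDER BY id").fetchall())
```

A managed ETL service replaces each of these functions with a pre-built connector or a UI setting, which is where the time savings come from.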
Blendo is a data warehouse tool that allows you to easily connect data sources to a data warehouse. Blendo loads live and historical data from the cloud services you connect, either on demand or on an automated load schedule. It optimizes your data schema and provides a UI to see stats and data loading issues. Blendo pulls data from sources like AdWords, Mailchimp, Salesforce and Magento into destinations including Redshift, PostgreSQL, MS SQL Server and Panoply’s data warehouse.
Pricing: Free 14-day trial, pricing starts from $100 / month for 10 million rows
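A common way to combine a one-time historical load with recurring scheduled loads is a high-water-mark cursor: the first run copies everything, and each later run copies only rows modified since the last recorded timestamp. We don’t know how Blendo implements this internally; the sketch below is a generic illustration with a hypothetical `orders` source, and SQLite stands in for the warehouse. A scheduler such as cron would call `sync()` on each run.

```python
import sqlite3

# Hypothetical source rows: (id, updated_at as ISO-8601 text, amount)
SOURCE = [
    (1, "2024-01-01T09:00:00", 10.0),
    (2, "2024-01-02T09:00:00", 25.0),
]


def fetch_since(last_seen):
    """Return source rows modified after `last_seen` (all rows when it is None)."""
    # ISO-8601 strings in the same format sort chronologically as plain text.
    return [r for r in SOURCE if last_seen is None or r[1] > last_seen]


def sync(conn):
    """Load new or changed rows and advance the stored high-water mark."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, updated_at TEXT, amount REAL)")
    conn.execute("CREATE TABLE IF NOT EXISTS sync_state (name TEXT PRIMARY KEY, last_seen TEXT)")
    row = conn.execute("SELECT last_seen FROM sync_state WHERE name = 'orders'").fetchone()
    new_rows = fetch_since(row[0] if row else None)
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", new_rows)
    if new_rows:
        high_water = max(r[1] for r in new_rows)
        conn.execute("INSERT OR REPLACE INTO sync_state VALUES ('orders', ?)", (high_water,))
    conn.commit()
    return len(new_rows)


conn = sqlite3.connect(":memory:")
print(sync(conn))  # first run: historical backfill loads both rows
SOURCE.append((3, "2024-01-03T09:00:00", 7.5))
print(sync(conn))  # later scheduled run: only the new row is loaded
```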
Fivetran loads multiple data sources into a central data repository, giving you ownership of your data and control over analytics and archiving. The platform offers numerous data connectors for systems like Google BigQuery, MySQL, PostgreSQL, Amazon Redshift, Snowflake, and SQL Server. Fivetran can also transform and normalize data as it loads into your data warehouse.
Pricing: Enterprise pricing on request
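Normalizing, in this context, means turning nested source data such as JSON from an API into flat relational tables. The sketch below shows one common approach, assumed here for illustration rather than taken from Fivetran’s own schema mapping: nested objects become prefixed columns, and arrays become child tables keyed by the parent id. It handles only one level of nesting.

```python
# Hypothetical nested record, e.g. a JSON payload from a SaaS API.
record = {
    "id": 42,
    "customer": {"name": "Acme", "country": "DE"},
    "items": [
        {"sku": "A1", "qty": 2},
        {"sku": "B7", "qty": 1},
    ],
}


def normalize(rec, table="orders"):
    """Split one nested record into a flat parent row and child-table rows."""
    parent, children = {}, {}
    for key, value in rec.items():
        if isinstance(value, dict):
            # Nested object -> prefixed columns on the parent row.
            for sub_key, sub_value in value.items():
                parent[f"{key}_{sub_key}"] = sub_value
        elif isinstance(value, list):
            # Array -> child table whose rows carry the parent's id as a foreign key.
            children[f"{table}_{key}"] = [{f"{table}_id": rec["id"], **item} for item in value]
        else:
            parent[key] = value
    return parent, children


orders_row, child_tables = normalize(record)
print(orders_row)
print(child_tables)
```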
Cloud-based BI tools
7. Tableau Online
While many alternatives exist, Tableau is known for advanced analytics and beautiful dashboards. Its Tableau Online edition provides the same capabilities in the cloud. It connects to big data sources, lets you publish interactive dashboards and share discoveries with your organization. Tableau allows data scientists, analysts and business teams to slice and dice data and create insightful visualizations.
Pricing: $42 / user / month
Qlik Sense connects to data sources and lets you discover insights beyond formal SQL queries. You can freely search and explore data, pivoting the analysis to investigate hypotheses. Its associative engine indexes all possible relationships in the data, letting you slice and dice without being restricted to a partial view of data. Qlik also offers robust visualization and collaboration features.
Pricing: Free for up to 5 collaborators. Pricing for cloud edition starts from $15 / user / month.
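The associative idea can be illustrated with a toy example: selecting a value in one field shows which values in every other field are still associated with the selection and which are excluded. The article describes Qlik’s engine as indexing all possible relationships in advance; this sketch, with made-up data, simply scans the rows on each selection and is only a conceptual stand-in.

```python
# Toy data set: each row links values across fields.
ROWS = [
    {"region": "EU", "product": "Widget", "rep": "Ana"},
    {"region": "EU", "product": "Gadget", "rep": "Ben"},
    {"region": "US", "product": "Widget", "rep": "Cy"},
]


def associated_values(rows, field, selected):
    """For each other field, split values into (associated, excluded) given a selection."""
    matching = [r for r in rows if r[field] in selected]
    result = {}
    for other in rows[0]:
        if other == field:
            continue
        possible = {r[other] for r in matching}
        excluded = {r[other] for r in rows} - possible
        result[other] = (sorted(possible), sorted(excluded))
    return result


print(associated_values(ROWS, "region", {"US"}))
```

Selecting the US region leaves only the Widget product and the rep Cy associated, while the excluded values stay visible instead of being filtered out of view, which is the point of not being restricted to a partial view of the data.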
Chartio lets you explore data and build SQL queries, either with an interactive query builder or in SQL mode. Chartio can transform data with a mini-ETL engine: you can preview the data pipeline and run transformation queries. It helps users turn organizational data into charts and visualizations and set up auto-refreshing live dashboards.
Pricing: Starting at $249 / month with up to 6 users
Looker, a cloud-based BI platform, queries and analyzes large data sets via SQL. Analysts define metrics using LookML, a simple data modeling language. Looker connects to a database or data warehouse directly without the need to extract data, and auto-generates a data model from your schema. Uniquely, it works on fresh data direct from the source rather than on partial or periodic extracts.
Pricing: Custom pricing on request
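The idea behind a modeling layer is that metrics are defined once and then compiled into SQL that runs directly against the warehouse. A minimal Python sketch of that idea follows; the `METRICS` dictionary and `build_query` helper are hypothetical stand-ins for a LookML model (this is not LookML syntax), and SQLite stands in for the warehouse.

```python
import sqlite3

# Hypothetical metric definitions, kept in one place the way a modeling layer keeps them.
METRICS = {
    "total_revenue": "SUM(amount)",
    "order_count": "COUNT(*)",
}


def build_query(table, dimension, metrics):
    """Compile a dimension plus named metrics into a GROUP BY query.

    Identifiers are interpolated directly, so only trusted model definitions
    should ever reach this function.
    """
    select = ", ".join(f"{METRICS[m]} AS {m}" for m in metrics)
    return f"SELECT {dimension}, {select} FROM {table} GROUP BY {dimension} ORDER BY {dimension}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [("EU", 100.0), ("EU", 50.0), ("US", 80.0)])

sql = build_query("orders", "region", ["total_revenue", "order_count"])
print(sql)
print(conn.execute(sql).fetchall())  # runs against the live table; no extract is copied
```

Because every report reuses the same metric definitions, "revenue" means the same thing everywhere, and the query always reflects the current contents of the table.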
Cloud-based data integration tools
Zapier has pre-built integrations between hundreds of systems and applications, so you don’t need to build the integrations yourself. It defines ‘triggers’, events that happen in one application, and ‘actions’, things it can do for you in that application or others. For example, Zapier can detect a new email received in a Gmail account, or a new card on a Trello board, and save that data to a database (it supports MySQL, PostgreSQL and DynamoDB). This creates interesting possibilities for pulling non-traditional data sources into a data warehouse.
Pricing: Free forever for simple integrations, paid plans start from $20 / month for integrations with 3+ steps and access to more applications
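Zapier sets this up without code, but the trigger/action pattern it describes is essentially an event dispatcher. A minimal sketch with hypothetical trigger and action names (this is not Zapier’s API), where an in-memory SQLite table stands in for the MySQL or PostgreSQL database:

```python
import sqlite3
from collections import defaultdict

# Hypothetical registry mapping trigger names to the actions they fire.
HANDLERS = defaultdict(list)


def on(trigger):
    """Register the decorated function as an action for `trigger`."""
    def register(action):
        HANDLERS[trigger].append(action)
        return action
    return register


def fire(trigger, payload):
    """Run every action registered for `trigger`; unknown triggers do nothing."""
    for action in HANDLERS.get(trigger, []):
        action(payload)


conn = sqlite3.connect(":memory:")  # stand-in for MySQL/PostgreSQL
conn.execute("CREATE TABLE emails (sender TEXT, subject TEXT)")


@on("new_email")
def save_email(payload):
    """Action: store the incoming email's metadata in the database."""
    conn.execute("INSERT INTO emails VALUES (?, ?)", (payload["from"], payload["subject"]))
    conn.commit()


fire("new_email", {"from": "alice@example.com", "subject": "Q3 report"})
print(conn.execute("SELECT * FROM emails").fetchall())
```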
IFTTT stands for “If This, Then That”. Similar to Zapier, it creates workflows between hundreds of pre-integrated applications and cloud services. IFTTT is easier to set up and use than Zapier, but has more limited functionality. Unlike Zapier, IFTTT does not integrate with database platforms and is limited to interactions between applications.
A data warehouse is not a million dollar project anymore
These 12 data warehouse tools help data engineers, IT teams and even data analysts set up powerful data infrastructure in the cloud. Many similar cloud tools are inexpensive, easy to use, and let you set up a data pipeline in days, or even hours.
Data warehouses used to be huge enterprise projects with million-dollar budgets. This is still true in large organizations, though they too want to unlock the value of the cloud. But small-to-medium enterprises can now set up a world-class data warehouse quickly and with a smaller investment than before.