Transformations

Transformations

Panoply’s data sources and many online ingestion systems have changed the ingestion process from the classic ETL to an agile ELT process. You can read more about the differences between these two approaches.

Part of Panoply’s vision is to have the data ready for use in its most raw form. This gives the user flexibility when asking questions without relying on pre-ingestion processes. All the aggregation and calculation processes can and should occur post-ingestion. There are multiple ways to manage such post-ingestion processes. In Panoply, the main way to achieve them is by using views.

Views allow you to save DQL (Data Query Language) queries and give them aliases. These queries will return the result set directly to the SQL client or BI tool. With this solution, you can change business logic and questions on the fly without relying on rerunning procedures for historical data.

Views versus pre-processing

In data analytics, agility and flexibility are very important for ongoing and ad hoc analysis. You want to have the ability to ask new questions all the time without being limited in what data was saved and how it was aggregated.

Transformation views have several benefits over pre-processing all of the data:

  • It’s available immediately. As soon as new data comes in, you can query it. There’s no delay due to processing thousands of lines of ETL code.
  • It’s agile. When you modify your views, the changes take place immediately, for all future and past data, with no need to re-run the process over and over again. You can iterate quickly and get the results you need in seconds.
  • It’s fast. Because ad-hoc queries can benefit from filtering and indexing that’s not available for pre-processing all of the data ahead of time, your views can immediately access the raw data you’re searching for.
  • It’s standard, portable SQL. Unlike many ETL tools, you’re not vendor-locked into a specific technology that would make it difficult to port or use outside of Panoply.

How to use views

We’ve established the fact that ELT is the way to go. Currently, in the Panoply platform, we allow you to easily create views. Yes, you will need to know SQL to create them, but we assume that if you plan to ask complex questions you know how to ask them.

In Panoply’s workbench, you can write, run, and save queries.

To get started with views:

  1. Click on the Workbench tab.
  2. Type your SQL query. We support standard ANSI SQL. Currently, all the functions that are supported by AWS Redshift are supported by Panoply as well.
  3. Click Run to execute your query.
  4. View the results directly in the Panoply workbench. The Panoply workbench display is limited to 100 records. The saved query will not have this limit when querying it with your BI tool.
  5. Use the workbench menu to save or export your results.

The workbench menu includes these options:

  • Close - Collapse the list of schemas, tables, and fields to make more room for the query and results areas.
  • New - Start a new query.
  • Open - Open a previously saved view. You can open views that were saved by external workbenches as well.
  • Save/Save as - Save the query as a view.
  • Export - Export the result set. The export is not limited to the 100 records visible in the Panoply workbench, but is limited to 1,000,000 records.

When you save a query as a view you can name the view and place it in a schema. The schema should exist beforehand.

After you have saved the view you can see it in the Tables screen. You can also query it directly in Panoply and in any external tool that you’ve connected to your Panoply account.

View dependencies

The views that you save can rely on tables that you’ve ingested into Panoply (using Panoply data sources or external ingestion tools) and they can also rely on other views that you’ve created. By saving a view that relies on another view you will basically create a chain of views (chain of dependencies). Usually, in these cases, the user will query the last view in that chain (or tree if it’s a more complex configuration) but it’s definitely not a must and you can query each view separately as well.

When working with the classic ETL processes, you create the complete data flow from extraction to how the data will be saved. This entire process is usually saved in the same ingestion tool and is relatively easy to view, examine, and understand. But when you’re working with views, you can easily lose track of the view’s flow and what is in charge of what and which view depends on which table/view.

To overcome this issue, you can identify the dependencies per view and find the actual dependency tree for your specific setup. To see the dependencies, you can use the following query, but change the view_name_or_table_name and schema_name to match the specific view you’re looking for:

SELECT DISTINCT c_p.oid AS tbloid,
 n_p.nspname AS dependee_schema,
 c_p.relname AS dependee,
 n_c.nspname AS dependent_schema,
 c_c.relname AS dependent,
 c_c.oid AS viewoid
FROM pg_class c_p
 JOIN pg_depend d_p ON c_p.relfilenode = d_p.refobjid
 JOIN pg_depend d_c ON d_p.objid = d_c.objid
 JOIN pg_class c_c ON d_c.refobjid = c_c.relfilenode
 LEFT JOIN pg_namespace n_p ON c_p.relnamespace = n_p.oid
 LEFT JOIN pg_namespace n_c ON c_c.relnamespace = n_c.oid
WHERE d_p.deptype = 'i'
 AND dependee_schema + dependee <> dependent_schema + dependent
 AND dependee = 'view_name_or_table_name'
 AND dependee_schema = 'schema_name';
Getting started is easy! Get all your data in one place in minutes.
Try Free