Building an Analytics Stack: A Guide
The Wealth of Information and the Weight of Maintenance
We live in an age of data. Across all industries and sectors, businesses are gaining more and more access to a wealth of information that holds the potential to spark game-changing ideas and illuminate new solutions to old problems. The opportunity is truly limitless. But with that promise comes the very real problem of needing a robust and reliable infrastructure that can make that data available quickly and easily. Think of it this way: if data is an essential resource, like water, then a data infrastructure is the series of pipes that brings it to your faucet. And just like a building needs good plumbing, every business that wants to tap into this wealth of information must grapple with the weight of maintaining the systems that make it available. That’s where an analytics stack comes in.
At its most basic level, an analytics stack is the link between raw data and business intelligence. An analytics stack is an integrated system of applications that collect, combine, analyze, and realize the value of data. This infrastructure lives within a broader business system that encompasses operations, human capital, and even organizational culture. Data-driven businesses place as much importance on having a dependable analytics infrastructure as they do on having the data itself—and they continuously refine their infrastructure to support their analytics efforts and advance their competitive edge.
Having an analytics stack has become an imperative for modern businesses. As the most successful companies continue to set new standards for efficiency and growth, their competitors, no matter their size, must embrace analytics if they want to compete. Luckily, the components of an analytics stack are becoming simpler to set up, easier to manage, and cheaper to scale.
This guide explains how the analytics stack has become the engine of a data-driven organization and how building both an analytics stack and a data-driven company culture go hand in hand.
What is a “stack”?
Not to be confused with the programming data structure called a “stack,” the term “analytics stack” comes from the concept of a technology stack. For example, a web application “stack” describes a collection of technologies like the LAMP stack (Linux, Apache, MySQL, PHP) or the modern MERN stack (MongoDB, Express.js, React, Node).
As software solutions have moved to the cloud, it has become possible to compose several applications into a software or solution stack. Today, entire business functions are refining their processes as they refine their software stacks—which include everything from marketing stacks to customer service stacks to, of course, analytics stacks.
The ability to integrate several applications together provides a benefit known as “composability.” In programming, this refers to “the ability to assemble complex behaviour by aggregating simpler behaviour.” By dealing with the simpler pieces, you get two important benefits: customizability and interchangeability. In other words, you can customize your solution to your own unique needs, taking into account your available time, resources, and budget. And when you need new functionality, you can replace any layer of the stack without replacing the stack entirely. Rather than undergoing a quarter-long sales and engineering process, you only have to connect the new piece into the existing stack.
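To make this concrete, here is a minimal sketch of composability in Python. The function names and sample data are made up for illustration; the point is only that three small, single-purpose steps can be assembled into one pipeline, and any one of them can be swapped out without rewriting the others.

```python
# A minimal illustration of composability: three small, single-purpose
# steps assembled into one pipeline. The function names and sample data
# are hypothetical, not part of any specific product.

def extract(rows):
    """Pull only the fields we care about from raw records."""
    return [{"store": r["store"], "amount": r["amount"]} for r in rows]

def transform(rows):
    """Convert amounts from cents to dollars."""
    return [{**r, "amount": r["amount"] / 100} for r in rows]

def load(rows):
    """Hand the cleaned rows to whatever sits downstream (here, just print)."""
    for r in rows:
        print(r)

# Because each step has a simple, well-defined interface, any one of them
# can be replaced without touching the others.
load(transform(extract([{"store": "A", "amount": 1999, "sku": "X1"}])))
```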
What is an Analytics Stack?
Apply this idea of the “stack” to the weight of data infrastructure maintenance we described above, and it becomes clear that having a well-functioning analytics stack is crucial for any company that wants to cultivate data and extract insights from it. However, as soon as a company’s executives recognize the importance of their data and begin to ask questions about it, they will likely discover that their organization lacks the infrastructure to access their data. Moreover, their teams may be missing the technical know-how needed to even get to the data, or they may not have the ability to analyze this information and effect change with it. Every layer of the analytics stack represents a problem that needs to be solved and a skill set that is required to solve it.
When to Build an Analytics Stack
When companies prepare to build a data stack, they usually fall into one of two groups: The first is small organizations that don’t have any existing infrastructure and are starting with a clean slate. The second is organizations that have a poorly architected or failing system. Both situations justify a fresh approach.
Usually, organizations have done their strategic planning and know how data can help them, but they realize that the systems they’re using—which are usually just the systems they use to run their business—aren’t powerful enough or don’t provide the level of detail their analytics practice requires.
Building an Analytics Stack
Let’s start with the first group: companies starting from zero. If you fall into this category, you might already have some types of reports available. For example, a retailer may have POS information that displays sales trends by brand, product, store, or segment. However, that data is likely delivered in discrete batches, perhaps once every 24 hours or longer. This doesn’t allow you to be nimble throughout the day or modify your sales strategy in real time.
Adding in real-time data can be especially powerful for companies that set daily sales targets. For example, if a retail store manager is able to gain instant insights into what’s selling and what’s not, the manager might redirect a few of the salespeople on the floor to a different area or even move product around the floor. A data stack can fill that information gap, providing near real-time data that can make an immediate impact on the company’s ultimate sales goals.
For other companies looking to build a data stack for the first time, the problem might be data latency, or the need to bring multiple data sets together in one place to get a 360-degree view of the business. These are just a few of the reasons why companies choose to build a data stack.
Rebuilding an Analytics Stack
On the other hand, companies that need to rebuild their data stack often already understand the value of an analytics environment to direct their corporate strategy, product strategy, or marketing strategy, but data management is not their strong point.
The biggest challenge is that they don’t have the skills in-house. As a result, these companies might try to leverage other internal technical resources to write data scripts. For example, a manager might assign a data stack project to software engineers who are building the product, or I.T. folks who can write code. But because these people are working outside their primary roles, the end result is usually far from ideal. This isn’t through any fault of their own, but rather because they simply don’t have the expertise or experience. Unfortunately, they don’t understand how hard it is to keep data jobs running efficiently without failure—or how to recover gracefully from failure so that engineers aren’t up late at night trying to make reports and analytics available the next morning.
Data is messy. It’s structured in very different ways. Understanding what keys are used to join different data sets together is still difficult. These challenges don’t go away even when you throw modern technology at them. And people don’t recognize how challenging it can be. As a result, it ends up becoming a second job for somebody on nights and weekends.
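As a rough illustration of the key-matching problem, the sketch below uses pandas and hypothetical column names: two data sets that clearly describe the same stores still refuse to join until someone reconciles the key names and types.

```python
# A minimal sketch of the "which keys join these data sets?" problem.
# The column names and values are made up. The POS system calls the key
# "store_id" while the CRM export calls it "location_code", and one side
# stores it as text, so a naive merge would silently match nothing.
import pandas as pd

pos = pd.DataFrame({"store_id": [101, 102], "daily_sales": [5400, 3200]})
crm = pd.DataFrame({"location_code": ["101", "103"], "region": ["West", "East"]})

# Normalize the key name and type before joining.
crm = crm.rename(columns={"location_code": "store_id"})
crm["store_id"] = crm["store_id"].astype(int)

combined = pos.merge(crm, on="store_id", how="left")
print(combined)
```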
The good news is that while the fundamental difficulties of dealing with data haven’t gone away, data stack technology, data warehouses, and BI and analytics vendors have improved by leaps and bounds over the past decade.
When Is It Time to Build an Analytics Stack?
When an organization recognizes that it’s time to invest in a more comprehensive data stack, it’s clear to everyone—to the consumers of the data and especially to the executive team, who are frustrated because they can’t access the data in a timely manner. They need daily reports, but instead those reports take several days because the data wasn’t properly loaded into the data warehouse, which causes operational problems.
More and more companies are recognizing that vendors can offer far better solutions than their own teams can create in-house, for the reasons explained above. Next, we’ll explore how things can go wrong if a data stack isn’t built correctly to begin with and why investing in an improved process can make a big difference.
What Are the Fail Points of an Analytics Stack?
There are many different ways that a sub-optimal data stack can fail. Sometimes the schema of the source has changed—a new column has been added or something has changed in the structure of the source data in such a way that the downstream systems are not handling it correctly.
A schema change is easy to understand. If new columns are added somewhere along the way and you haven’t used the right tools, that can break your Extract, Transform, Load (ETL) process. Or, in the best case, that column doesn’t make it to your end user’s hands. Better ETL tools will handle that, and if you architect your stack right, that will happen automatically.
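As an illustration, here is a hedged sketch of how a hand-rolled loader with a hard-coded column list reacts to a schema change. The table and column names are hypothetical, but the failure mode is typical: the job either rejects the new field outright or silently drops it.

```python
# A sketch of how a source schema change can trip up a hand-rolled loader.
# The column list is hypothetical; the point is that a hard-coded
# expectation either fails outright or quietly discards new fields.
EXPECTED_COLUMNS = ["order_id", "store_id", "amount"]

def load_row(row: dict):
    unexpected = set(row) - set(EXPECTED_COLUMNS)
    if unexpected:
        # A rigid pipeline raises here, and the nightly job fails.
        # A "forgiving" one drops the new columns instead, so the
        # analyst never sees them.
        raise ValueError(f"Source schema changed, new columns: {unexpected}")
    return {col: row[col] for col in EXPECTED_COLUMNS}

# The source system starts sending a new "discount" field...
load_row({"order_id": 1, "store_id": 101, "amount": 19.99, "discount": 2.00})
```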
Other times, the issue is that the volume of data has increased. That can stress the system to the point where data jobs will fail or run out of memory because they don’t have enough processing power. As companies scale, their data stacks don’t always scale with them.
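One common mitigation, sketched below with pandas and a hypothetical extract file, is to process data in chunks rather than loading an entire, ever-growing file into memory at once. This is the kind of detail that purpose-built tools and warehouses handle for you, but that hand-written jobs often miss until they start failing.

```python
# A minimal sketch of keeping memory flat as data volume grows:
# aggregate a large extract in fixed-size chunks instead of reading
# the whole file at once. The file name and aggregation are made up.
import pandas as pd

totals = {}
# chunksize makes read_csv yield DataFrames of at most 100k rows each.
for chunk in pd.read_csv("daily_orders.csv", chunksize=100_000):
    for store, amount in chunk.groupby("store_id")["amount"].sum().items():
        totals[store] = totals.get(store, 0) + amount

print(totals)
```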
Should You Spend Money on a Patch or Fix the Process?
At the end of the day, it’s about working smarter, not harder. While you could spend money on quick fixes that will fill the gap for a while, the better answer is to invest in a comprehensive solution, a reliable architecture, and a completely different set of tools.
Take this common scenario, in which an organization doesn’t use ETL tools to build its data pipelines and instead builds them in-house. An engineer is tasked with writing custom Python scripts to extract and transform a large amount of data from external systems so that the company’s analysts can perform complex analytics on it. These custom scripts, which often have to be run manually, might take all day to run because they never received the attention they deserved from an engineer who was focused on other tasks.
This custom script often fails because it takes so long. As a result, it holds up the team. Ideally, an analyst would have access to this data daily, but the engineer was so busy with other things that it might run once a week or even less often.
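To illustrate the gap, the sketch below shows the bare minimum a hand-run script usually lacks: retries and logging. The extract_and_load function here is a stand-in for the team’s custom logic; dedicated ETL tools provide this kind of resilience, plus scheduling, alerting, and recovery, out of the box.

```python
# A hedged sketch of basic retry-and-log behavior around a fragile job.
# extract_and_load is a placeholder for the long-running custom work.
import logging
import time

logging.basicConfig(level=logging.INFO)

def extract_and_load():
    # Placeholder for the team's custom extraction and transform logic.
    raise ConnectionError("source API timed out")

def run_with_retries(job, attempts=3, wait_seconds=60):
    for attempt in range(1, attempts + 1):
        try:
            job()
            logging.info("Job succeeded on attempt %d", attempt)
            return
        except Exception:
            logging.exception("Attempt %d failed", attempt)
            if attempt < attempts:
                time.sleep(wait_seconds)
    logging.error("Job failed after %d attempts; someone gets paged", attempts)

run_with_retries(extract_and_load)
```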
Once organizations realize they need a better solution, the expertise of an outside vendor becomes really valuable. ETL tools can automate the pipeline to run daily and keep it running smoothly. As a result, everyone gets more sleep and the engineer is back to focusing on their primary job.