How to Create RFM Segments for Shopify
When you’re in the business of selling a product or service, any customer is a good customer, right?
Not exactly.
The reality is that while all your customers absolutely deserve tender loving care, they’re not all equally valuable.
Thankfully, RFM analysis enables you to use a simple rubric to discover who your best customers really are and to make them even more of an asset. In this post, we’ll explain what RFM analysis is, show you how to create RFM cells, and explain how that translates into RFM customer segmentation.
This tutorial features SQL queries written in Redshift SQL syntax. If you’re on a Google BigQuery data warehouse, work with one of our data architects to adjust these queries to work in standard SQL syntax.
What is RFM analysis?
An RFM report is a way of segmenting customers using three key metrics: recency (how long ago their last purchase was), frequency (how often they purchase), and monetary value (how much they spent).
RFM analysis is based on the assumption that customers who purchased from you recently are more likely to purchase again. That principle holds true across a variety of industries, as does evidence that frequent buyers and big spenders are more likely to respond to promotions than customers who buy less often or who spend less.
Created in the ‘90s as a way to make direct mail campaigns more effective, RFM leverages the Pareto principle to answer critical questions like:
- Who are my best or most loyal customers?
- Which customers am I in danger of losing?
- Which customers should I focus my marketing efforts on?
- Which customers can I entice to spend more?
This kind of customer analysis can make a huge impact on your business. Whether you’re looking to improve your marketing ROI, build loyalty, reduce churn, or increase CLV, RFM analysis is a step in the right direction.
How to Create an RFM Report for Shopify Data
Because it focuses on just three metrics, RFM reporting is deceptively simple. For example, when doing RFM analysis for Shopify, all you really need is three data points:
- Recency - last order date
- Frequency - count of total orders
- Monetary value - total spend
Together, these metrics reveal who your repeat purchasers are, which customers are the most loyal over time, and which customers spend the most in your shop. An RFM report provides critical information about your customers and while RFM is strongly associated with B2C and retail, it’s an equally valid way for B2B, services, or SaaS businesses to better understand their customers.
To get that information, just sync your Shopify data to Panoply and enter the following query into the workbench:
/* RFM VALUES */
SELECT o.email, -- CUSTOMER
max(o.created_at) AS recent_order, -- RECENCY
count(o.id) AS total_orders, -- FREQUENCY
sum(o.total_line_items_price) AS gross_sales, -- MONETARY (AS GROSS SALES)
sum(o.total_line_items_price)
+ sum(-nvl(o.total_discounts,0))
+ sum(-nvl(r.refunds,0))
+ (sum(nvl(o.total_tax,0)) - sum(nvl(r.tax_returned,0)))
+ sum(nvl(sl.shipping_charge,0)) AS total_sales -- MONETARY (AS TOTAL SALES)
FROM shopify_orders o
LEFT JOIN (
/* REFUNDS TABLE */
SELECT r.shopify_orders_id,
count(r.id) number_of_items_returned,
sum(rli.subtotal) AS refunds,
sum(rli.total_tax) tax_returned
FROM shopify_orders_refunds r
LEFT JOIN shopify_orders_refunds_refund_line_items rli ON r.id = rli.shopify_orders_refunds_id
GROUP BY 1
) r ON o.id = r.shopify_orders_id
LEFT JOIN (
/* SHIPPING LINES TABLE */
SELECT sl.shopify_orders_id,
sum(sl.price) shipping_charge,
sum(sl.discounted_price) shipping_discount
FROM shopify_orders_shipping_lines sl
GROUP BY 1
) sl ON o.id = sl.shopify_orders_id
GROUP BY 1
ORDER BY 1;
The query above provides the values you’ll need to carry out an RFM analysis. The resulting data will look like this:
# | recent_order | total_orders | gross_sales | total_sales | |
---|---|---|---|---|---|
1 | sarah@test.com | 2020-01-22T19:57:59.000Z | 2 | 3030 | 2524.4 |
2 | eli@user.com | 2020-09-01T19:57:59.000Z | 1 | 1250 | 1293.35 |
The table above tells us that while Sarah has greater frequency (2 orders) and monetary value ($3030 in gross sales; $2524.40 net sales), Eli may actually be a better target for an end-of-year promotion because he’s purchased more recently.
Ready to try running an RFM analysis yourself? Get started now with a Free 21-Day Proof of Value of Panoply!
How to go from raw data to actionable RFM analysis
Once you have a table with data on each of your customers’ recency, frequency, and monetary value, it’s time to start segmenting. While you can slice the data as finely as you’d like, using quintiles mirrors the Pareto principle by indicating which of your customers are in the top 20% for all three metrics.
To create the quintiles, we’ll use an ntile
window function that requires just a few lines of additional SQL at the top of your raw RFM Values query. We’ll continue to use Shopify as an example, but the same query can be used for any business just a few adjustments to the table and column names.
/* RFM QUINTILES*/
SELECT email,
ntile(5) OVER ( ORDER BY recent_order ) AS rfm_recency,
ntile(5) OVER ( ORDER BY total_orders ) AS rfm_frequency,
ntile(5) OVER ( ORDER BY total_sales ) AS rfm_monetary
FROM (
/* RFM VALUES */
SELECT o.email, -- CUSTOMER
max(o.created_at) AS recent_order, -- RECENCY
count(o.id) AS total_orders, -- FREQUENCY
sum(o.total_line_items_price) AS gross_sales, -- MONETARY (AS GROSS SALES)
sum(o.total_line_items_price)
+ sum(-nvl(o.total_discounts,0))
+ sum(-nvl(r.refunds,0))
+ (sum(nvl(o.total_tax,0)) - sum(nvl(r.tax_returned,0)))
+ sum(nvl(sl.shipping_charge,0)) AS total_sales -- MONETARY (AS TOTAL SALES)
FROM shopify_orders o
LEFT JOIN (
/* REFUNDS TABLE */
SELECT r.shopify_orders_id,
count(r.id) number_of_items_returned,
sum(rli.subtotal) AS refunds,
sum(rli.total_tax) tax_returned
FROM shopify_orders_refunds r
LEFT JOIN shopify_orders_refunds_refund_line_items rli ON r.id = rli.shopify_orders_refunds_id
GROUP BY 1
) r ON o.id = r.shopify_orders_id
LEFT JOIN (
/* SHIPPING LINES TABLE */
SELECT sl.shopify_orders_id,
sum(sl.price) shipping_charge,
sum(sl.discounted_price) shipping_discount
FROM shopify_orders_shipping_lines sl
GROUP BY 1
) sl ON o.id = sl.shopify_orders_id
GROUP BY 1
)
ORDER BY 2 DESC
This query returns a table with customer emails alongside the quintile that each customer falls into for recency, frequency, and monetary value. Looking at recency, a 5 means the customer’s purchase was within the top 20% most recent, while a 1 means the customer is in the bottom 20%.
rfm_recency | rfm_frequency | rfm_monetary | |
---|---|---|---|
David@msn.com | 5 | 4 | 3 |
Amelia@about.me | 5 | 2 | 5 |
Matthew@google.com | 5 | 2 | 4 |
Preethi@go.com | 5 | 1 | 5 |
Hanna@google.com | 5 | 1 | 4 |
Racquel@earthlink.net | 4 | 5 | 3 |
Jordan@netscape.com | 4 | 5 | 2 |
Juan@example.com | 3 | 5 | 5 |
Nadim@apple.com | 3 | 2 | 5 |
Alys@geocities.jp | 2 | 5 | 5 |
In the table above, we can see that while Hanna is in the top 20% of customers for recency, she’s in the bottom 20% for purchase frequency. Meanwhile, Jordan hasn’t purchased as recently, but he’s bought multiple small orders from you.
Creating RFM cells
Now that you know where your customers fall generally, let’s make the quintile data easier to work with. The most common approach is to concatenate the quintile rank for each metric to create a 3-digit number also known as a “cell.”
When concatenating the data this way, your best customers would have a cell value of 555 because they’re in the top 20% for all three metrics, while a cell value of 111 would be given to customers in the bottom 20% for all three.
/* RFM CELL */
SELECT
email,
rfm_recency,
rfm_frequency,
rfm_monetary,
rfm_recency || rfm_frequency || rfm_monetary AS rfm_cell
FROM
(
SELECT email,
ntile(5) OVER ( ORDER BY recent_order ) AS rfm_recency,
ntile(5) OVER ( ORDER BY total_orders ) AS rfm_frequency,
ntile(5) OVER ( ORDER BY total_sales ) AS rfm_monetary
FROM (
/* RFM VALUES */
SELECT o.email, -- CUSTOMER
max(o.created_at) AS recent_order, -- RECENCY
count(o.id) AS total_orders, -- FREQUENCY
sum(o.total_line_items_price) AS gross_sales, -- MONETARY (AS GROSS SALES)
sum(o.total_line_items_price)
+ sum(-nvl(o.total_discounts,0))
+ sum(-nvl(r.refunds,0))
+ (sum(nvl(o.total_tax,0)) - sum(nvl(r.tax_returned,0)))
+ sum(nvl(sl.shipping_charge,0)) AS total_sales -- MONETARY (AS TOTAL SALES)
FROM shopify_orders o
LEFT JOIN (
/* REFUNDS TABLE */
SELECT r.shopify_orders_id,
count(r.id) number_of_items_returned,
sum(rli.subtotal) AS refunds,
sum(rli.total_tax) tax_returned
FROM shopify_orders_refunds r
LEFT JOIN shopify_orders_refunds_refund_line_items rli ON r.id = rli.shopify_orders_refunds_id
GROUP BY 1
) r ON o.id = r.shopify_orders_id
LEFT JOIN (
/* SHIPPING LINES TABLE */
SELECT sl.shopify_orders_id,
sum(sl.price) shipping_charge,
sum(sl.discounted_price) shipping_discount
FROM shopify_orders_shipping_lines sl
GROUP BY 1
) sl ON o.id = sl.shopify_orders_id
GROUP BY 1
)
)
ORDER BY 5 DESC
rfm_recency | rfm_frequency | rfm_monetary | rfm_cell | |
---|---|---|---|---|
David@msn.com | 5 | 4 | 3 | 543 |
Amelia@about.me | 5 | 2 | 5 | 525 |
Matthew@google.com | 5 | 2 | 4 | 524 |
Preethi@go.com | 5 | 1 | 5 | 515 |
Hanna@google.com | 5 | 1 | 4 | 514 |
Racquel@earthlink.net | 4 | 5 | 3 | 453 |
Jordan@netscape.com | 4 | 5 | 2 | 452 |
Juan@example.com | 3 | 5 | 5 | 355 |
Nadim@apple.com | 3 | 2 | 5 | 325 |
Alys@geocities.jp | 2 | 5 | 5 | 255 |
While this approach is popular, reading the data requires some mental gymnastics. It’s natural to look at the rfm_cell
column and treat it as a 3-digit number. Looking at the data that way would mean customers in “higher” RFM cells are the more valuable ones.
But because this way of segmenting by RFM is just a concatenation of three numbers, that may not be the case. Looking at the example above, Alys might seem like one of your best customers because she’s in the top 20% in two categories: frequency and monetary value. However, she hasn’t bought from you in a while so Racquel could actually be the better target for a marketing campaign.
Creating actionable RFM segmentation
Now that you have clear RFM cells, you need to create meaningful groupings that can help turn the data into actionable insights. With three metrics and five tiers for each, you can create up to 125 (5x5x5) customer segments. If the idea of 125 segments is giving you a headache, consider using quartiles (which would give you 4x4x4 = 64 segments) or even thirds (3x3x3 = 27 segments) to simplify things.
How you draw the lines between segments depends on the nature of your business. If you’re a hardcore Pareto fan who wants to reward your best purchasers, you may want to focus on customers with an RFM cell of 555. However, if you want to open things up a bit to include recent and frequent customers who may have spent a little less, you could create a segment that includes everyone with an RFM cell value of 555 and 554.
The key when creating customer segments is to make the buckets large enough to be actionable—Marketing will lose their minds if they think they need to create 125 separate customer campaigns—but small enough that you can address them as a group. As an example, we’ll outline just five potential segments, but if you’re interested in getting more specific, check out this resource for an even more robust version.
The key when creating customer segments is to make the buckets large enough to be actionable but small enough that you can address them as a group.
Here’s an example of how you might segment RFM cell data:
Cannot lose | Active fans | Promising newbies | Potential churners | |
---|---|---|---|---|
RFM cell values | 355, 255 | 543, 542, 453, 452 | 525, 524, 515, 514 | 335, 334, 325, 324 |
Conditions for inclusion | Low R High F&M |
High R&F Low M |
High R&M Low F (so far) |
Mid R Low F High M |
Description | Big spenders who haven’t purchased lately | Customers who buy often & recently, but at low price points | New customers with large orders | High spending customers who haven’t purchased often or lately |
Once you’ve identified the segments you want to address and the conditions for inclusion—aka, the RFM cell values associated with each—you can create a SQL query that puts customers into the right bucket. To do that, just use CASE
statements to return the segment name based on the RFM cell value:
/* RFM SEGMENTS */
SELECT
email,
rfm_recency,
rfm_frequency,
rfm_monetary,
rfm_recency || rfm_frequency || rfm_monetary AS rfm_cell,
CASE WHEN rfm_cell IN ('355','255') THEN 'Cannot lose'
WHEN rfm_cell IN ('543','542','453','452') THEN 'Active fans'
WHEN rfm_cell IN ('525','524','515','514') THEN 'Promising newbies'
WHEN rfm_cell IN ('335','334','325','324') THEN 'Potential churners'
ELSE 'Other' END AS rfm_segment
FROM
(
SELECT email,
ntile(5) OVER ( ORDER BY recent_order ) AS rfm_recency,
ntile(5) OVER ( ORDER BY total_orders ) AS rfm_frequency,
ntile(5) OVER ( ORDER BY total_sales ) AS rfm_monetary
FROM (
/* RFM VALUES */
SELECT o.email, -- CUSTOMER
max(o.created_at) AS recent_order, -- RECENCY
count(o.id) AS total_orders, -- FREQUENCY
sum(o.total_line_items_price) AS gross_sales, -- MONETARY (AS GROSS SALES)
sum(o.total_line_items_price)
+ sum(-nvl(o.total_discounts,0))
+ sum(-nvl(r.refunds,0))
+ (sum(nvl(o.total_tax,0)) - sum(nvl(r.tax_returned,0)))
+ sum(nvl(sl.shipping_charge,0)) AS total_sales -- MONETARY (AS TOTAL SALES)
FROM shopify_orders o
LEFT JOIN (
/* REFUNDS TABLE */
SELECT r.shopify_orders_id,
count(r.id) number_of_items_returned,
sum(rli.subtotal) AS refunds,
sum(rli.total_tax) tax_returned
FROM shopify_orders_refunds r
LEFT JOIN shopify_orders_refunds_refund_line_items rli ON r.id = rli.shopify_orders_refunds_id
GROUP BY 1
) r ON o.id = r.shopify_orders_id
LEFT JOIN (
/* SHIPPING LINES TABLE */
SELECT sl.shopify_orders_id,
sum(sl.price) shipping_charge,
sum(sl.discounted_price) shipping_discount
FROM shopify_orders_shipping_lines sl
GROUP BY 1
) sl ON o.id = sl.shopify_orders_id
GROUP BY 1
ORDER BY 1
)
)
ORDER BY 6 DESC
rfm_recency | rfm_frequency | rfm_monetary | rfm_cell | rfm_segment | |
---|---|---|---|---|---|
Hanna@google.com | 5 | 1 | 4 | 514 | Promising newbie |
Preethi@go.com | 5 | 1 | 5 | 515 | Promising newbie |
Amelia@about.me | 5 | 2 | 5 | 525 | Promising newbie |
Matthew@google.com | 5 | 2 | 4 | 524 | Promising newbie |
Nadim@apple.com | 3 | 2 | 5 | 325 | Potential churner |
Alys@geocities.jp | 2 | 5 | 5 | 255 | Cannot lose |
Juan@example.com | 3 | 5 | 5 | 355 | Cannot lose |
Racquel@earthlink.net | 4 | 5 | 3 | 453 | Active fan |
Jordan@netscape.com | 4 | 5 | 2 | 452 | Active fan |
David@msn.com | 5 | 4 | 3 | 543 | Active fan |
This query returns a table with each customer and their associated rfm_segment
. In the table above, there’s a lot of good news: You have a mix of active fans and high value customers, and you’re attracting plenty of promising newbies. However, you do have a churn risk (Nadim) that may need attention.
What you do next depends on how Marketing prefers to work with the data: You can upload the tags straight into Shopify or marketing automation platform so they can easily create email lists with personalized offers at scale. For example, those “can’t lose” customers might get an invitation to become brand ambassadors, while those “promising newbies” get follow-up emails with recommended products that complement their previous purchases.
What comes next: Taking RFM to the next level
You may not be launching winback email campaigns, but that doesn’t mean your work here is done. Once you have a solid RFM framework in place, you can make it even more valuable by working with stakeholders to create a dashboard that would make it easy for them to view customers by segment on demand.
You could also consider layering in additional metrics. For example, you could add relationship length to the mix to find LRFM, which adds in the delicious complexity of identifying customers with long term loyalty. Alternatively, you could layer in data around acquisition costs. While those 555s might be unicorns, if they were acquired through expensive ad campaigns, they’ll have to stick around for a while to really be worth it.
Alternatively, you could move away from simplistic cells and toward more sophisticated RFM clustering using a K-means model. With clustering, you’d input values for recency, frequency, and monetary value and the algorithm would identify the optimum number of segments as well as which customers fall in each grouping. It’s a more complex—and, arguably, time-intensive—approach, but it’s also far more sensitive than crudely assigning RFM cells to segments.
And, of course, you should be working to measure the success of the campaigns launched for each segment. That could be as simple as looking at something like engagement, purchase rate, or average order value, but you might also consider measuring RFM migration, which is how customers move from one cell to another over time.
No matter how you slice—er, segment it, using data to understand your customers is always a good idea. By running an RFM analysis, you can easily identify top customers as well as purchasers who could become even more valuable…with the right segmentation, of course.