RFM segmentation turns two years of CPG purchase data into a 40% lift in return on ad spend
Introduction
In e-commerce, acquiring a customer is only half the equation — the real margin lives in understanding which customers are worth keeping. Yet most growing CPG brands apply the same messaging to every buyer, from first-time purchasers to their most loyal advocates. Research from HubSpot and Omnisend shows that targeted and personalized emails account for 58% of total e-commerce revenue — and that email consistently delivers 4–9× the return of paid advertising when it's segment-driven rather than broadcast.
For this CPG e-commerce startup, the challenge wasn't a lack of data — it was a lack of signal. With 82,000 customers and two years of purchase history, the ingredients for a smarter marketing program existed. What was missing was a way to turn raw transaction records into actionable customer intelligence: a segmentation engine that could tell the team which customers to invest in, how much, and through which channel.
Key Challenges
The client's marketing program treated all 82,000 customers identically, regardless of purchase frequency, recency, or spend. Without segment-level visibility, budget could not be weighted toward high-LTV loyalists over one-time buyers. Paid social, the team's primary acquisition channel, was generating the lowest return on ad spend in their mix. The opportunity cost of undifferentiated marketing was real but invisible.
One-Size-Fits-All Marketing
All 82,000 customers received identical messaging regardless of purchase history or lifetime value — wasting spend on low-intent audiences and under-investing in high-value ones.
Weak Paid Social Returns
Paid social received a disproportionate share of the budget despite generating the lowest ROAS in the channel mix — with no segment-level data to justify the allocation.
No Customer Value Visibility
The CRM held transaction records but no behavioral layer. The team had no way to distinguish high-LTV loyalists from one-time buyers or window-shoppers who had never converted.
Static, Unactionable CRM
Customer data was a snapshot, not a system. Without continuous segment updates, any segmentation work would decay the moment it was completed.
Solution Components
We designed and deployed an RFM segmentation engine on Databricks that classified 82,000 customers across recency, frequency, and monetary value using K-means clustering. The model was productionized to refresh segment assignments weekly, turning a static CRM into a living behavioral layer. Insights directly informed channel allocation, email targeting, subscription strategy, and an ongoing test-and-learn framework.
RFM Segmentation Engine
K-means clustering across recency, frequency, and monetary value produced an overall score (0–9) mapped to three actionable customer segments and nine granular sub-segments.
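As a rough illustration of the final mapping step, the sketch below bins an overall score into three segments with pandas. The cut points and the segment names other than "Loyalist" are assumptions made for this example; the actual thresholds used in the engagement are not stated here.

```python
import pandas as pd

# Hypothetical scored customers; IDs and scores are illustrative only.
scores = pd.DataFrame({
    "customer_id": [101, 102, 103, 104],
    "overall_score": [0, 3, 6, 9],
})

# Assumed cut points: 0-2, 3-5, 6-9. The real thresholds are not given in the case study.
bins = [-1, 2, 5, 9]
labels = ["Low-Value", "Mid-Value", "Loyalist"]  # only "Loyalist" appears in the source

# pd.cut uses right-closed intervals by default: (-1, 2], (2, 5], (5, 9].
scores["segment"] = pd.cut(scores["overall_score"], bins=bins, labels=labels)
print(scores)
```

Keeping the cut points in one place makes them easy to revisit as the score distribution shifts over time.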
Weekly Automated Refresh
The segmentation model deployed on Databricks refreshes customer segment assignments weekly — keeping the CRM continuously current without manual intervention.
Segment-Driven Campaigns
Insights operationalized into targeted email programs, subscription offers by product affinity, and a test-and-learn framework that systematized experimentation across all segments.
Impact
Within three months of implementation, return on ad spend increased by 40%. Budget shifted from low-performing paid social toward high-ROI targeted email, driving measurable gains in Loyalist retention and subscription adoption. The weekly-updated segmentation engine became the operational backbone of a more data-driven marketing program — one built to improve continuously as the customer base grows.
Our Process
Data Extraction & Cohort Setup
Two years of purchase history organized into customer cohorts by first purchase date, creating the longitudinal baseline for RFM analysis.
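A minimal sketch of the cohort step, assuming a transaction table with `customer_id`, `order_date`, and `order_value` columns (hypothetical names, not the client's schema) and monthly cohorts; the case study does not state the cohort granularity.

```python
import pandas as pd

# Hypothetical transaction log; values are made up for illustration.
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 3],
    "order_date": pd.to_datetime(
        ["2022-01-15", "2022-03-02", "2022-01-20", "2023-06-11", "2022-03-05"]
    ),
    "order_value": [20.0, 35.0, 15.0, 40.0, 25.0],
})

# Each customer's cohort is the calendar month of their first purchase (an assumption).
first_purchase = transactions.groupby("customer_id")["order_date"].transform("min")
transactions["cohort_month"] = first_purchase.dt.to_period("M")

# Number of distinct customers acquired in each cohort.
cohort_sizes = (
    transactions.groupby("cohort_month")["customer_id"].nunique().rename("customers")
)
print(cohort_sizes)
```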
RFM Clustering
K-means clustering (k=4) applied independently to recency, frequency, and monetary value, giving each customer a score of 0–3 per dimension. Scores merged by customer ID and summed to produce an overall_score (0–9).
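The scoring step can be sketched as follows, using scikit-learn's `KMeans` on synthetic data. The column names, the random data, and the choice to rank clusters by centroid (so that the best cluster scores 3) are assumptions for illustration; the production logic may differ. All three dimensions live in one table here, so the merge by customer ID is implicit.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

def score_dimension(values: pd.Series, higher_is_better: bool, k: int = 4) -> pd.Series:
    """Cluster one RFM dimension with K-means and return ordered scores 0..k-1."""
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    labels = model.fit_predict(values.to_numpy(dtype=float).reshape(-1, 1))
    # K-means labels are arbitrary, so rank the clusters by their centroid.
    order = np.argsort(model.cluster_centers_.ravel())
    if not higher_is_better:
        order = order[::-1]  # e.g. recency: fewer days since purchase is better
    rank_of_label = {int(label): rank for rank, label in enumerate(order)}
    return pd.Series([rank_of_label[int(l)] for l in labels], index=values.index)

# Synthetic customers standing in for the real RFM table.
rng = np.random.default_rng(0)
rfm = pd.DataFrame({
    "customer_id": np.arange(200),
    "recency_days": rng.integers(1, 730, size=200),
    "frequency": rng.poisson(3, size=200) + 1,
    "monetary": rng.gamma(2.0, 40.0, size=200).round(2),
})

rfm["r_score"] = score_dimension(rfm["recency_days"], higher_is_better=False)
rfm["f_score"] = score_dimension(rfm["frequency"], higher_is_better=True)
rfm["m_score"] = score_dimension(rfm["monetary"], higher_is_better=True)

# Three dimensions, each scored 0-3, sum to an overall score of 0-9.
rfm["overall_score"] = rfm[["r_score", "f_score", "m_score"]].sum(axis=1)
print(rfm.head())
```

Clustering each dimension separately keeps the scores interpretable: a customer's recency score never depends on how much they spend.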
Segment Profiling & Insights
Segments analyzed across cohort migration rates, acquisition channel ROAS, and price sensitivity, surfacing the behavioral patterns driving each group.
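One way to compute migration rates is a row-normalized cross-tabulation of each customer's segment in two consecutive snapshots. The sketch below uses made-up data and placeholder segment labels; it shows the calculation and does not reproduce any of the client's figures.

```python
import pandas as pd

# Hypothetical segment assignments from two consecutive snapshots.
snapshots = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5, 6],
    "segment_prev": ["Low", "Low", "Mid", "Mid", "Loyalist", "Loyalist"],
    "segment_curr": ["Low", "Mid", "Mid", "Loyalist", "Loyalist", "Loyalist"],
})

# Each row gives the share of a previous segment that landed in each current segment.
migration = pd.crosstab(
    snapshots["segment_prev"], snapshots["segment_curr"], normalize="index"
)
print(migration)
```

Each row of the result sums to 1, so a cell reads directly as "the share of last period's segment X that is now in segment Y".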
Productionization on Databricks
Segmentation model deployed with a weekly refresh cadence, automatically appending RFM scores and segment labels to all 82,000 customers in the CRM.
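The refresh logic can be sketched in pandas as a function that recomputes RFM features as of a snapshot date and joins them onto the CRM table. Table and column names are hypothetical, the two-year window is an assumption based on the history described above, and job scheduling on Databricks is outside the scope of this sketch.

```python
import pandas as pd

def compute_rfm(transactions: pd.DataFrame, as_of: pd.Timestamp,
                window_days: int = 730) -> pd.DataFrame:
    """Recompute recency, frequency, and monetary value as of a snapshot date."""
    window_start = as_of - pd.Timedelta(days=window_days)
    recent = transactions[
        (transactions["order_date"] > window_start)
        & (transactions["order_date"] <= as_of)
    ]
    rfm = recent.groupby("customer_id").agg(
        last_order=("order_date", "max"),
        frequency=("order_date", "count"),
        monetary=("order_value", "sum"),
    )
    rfm["recency_days"] = (as_of - rfm["last_order"]).dt.days
    return rfm.drop(columns="last_order").reset_index()

# Hypothetical inputs for one weekly run.
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "order_date": pd.to_datetime(["2024-01-01", "2024-02-15", "2023-11-10"]),
    "order_value": [20.0, 30.0, 50.0],
})
crm = pd.DataFrame({"customer_id": [1, 2, 3], "email": ["a@x.test", "b@x.test", "c@x.test"]})

rfm = compute_rfm(transactions, as_of=pd.Timestamp("2024-03-01"))
# Left join keeps every CRM record; customers with no orders in the window get NaN.
refreshed = crm.merge(rfm, on="customer_id", how="left")
print(refreshed)
```

Passing the snapshot date explicitly, rather than reading the current time inside the function, makes each weekly run reproducible and easy to backfill.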