Big Data Architectures: Lambda vs. Kappa
Big Data Architectures: Lambda vs. Kappa
This document provides a side-by-side comparison of the Lambda and Kappa Big Data architectures, followed by practical use case examples for each, and a detailed explanation of the A/B Test Conversion Rate metric often used in conjunction with these systems.
1. Lambda Architecture (Three Layers)
The Lambda Architecture is designed to handle **large, historical data** while also providing **real-time insights**. It achieves high data accuracy by combining the results of two different paths.
| Layer | Purpose | Key Function |
|---|---|---|
| Batch Layer | Accuracy. Stores the master dataset and pre-computes historical, highly accurate views. | Processes all data history to correct errors and ensure a canonical view. |
| Speed Layer | Latency. Processes new data in real-time to provide immediate, approximate results. | Provides a quick view of the most recent data, compensating for the Batch Layer's delay. |
| Serving Layer | Access. Indexes and merges the data from both the Batch and Speed layers. | Provides low-latency queries to the final user reports and applications. |
Example Use Case: E-commerce Analytics
- Batch Layer: Runs overnight to calculate the Lifetime Customer Value (LTV) for every customer based on years of purchase history. (High accuracy, high latency).
- Speed Layer: Processes live clicks and items added to carts to calculate immediate metrics like "current browsing users." (Low accuracy, low latency).
- Serving Layer: The analyst's dashboard shows the accurate LTV (from Batch) combined with real-time traffic numbers (from Speed).
2. Kappa Architecture (One Unified Stream)
The Kappa Architecture simplifies the design by treating **all data as a stream**, eliminating the complexity of managing two separate processing paths (Batch and Speed).
| Layer | Purpose | Key Function |
|---|---|---|
| Streaming Layer | Simplicity & Latency. The single path that handles both real-time processing and historical processing. | All data is ingested and processed in the stream. To recalculate history, the stream simply re-processes past data from the source log. |
| Serving Layer | Access. Indexes the results from the Streaming Layer for queries. | Provides the real-time, unified results to the application. |
Example Use Case: Financial Trading and Fraud Detection
- Streaming Layer: Processes all financial transactions (trades, deposits) **sequentially and in a single stream**. It immediately checks for anomalies and fraud patterns.
- Serving Layer: Provides **immediate, up-to-the-second account balances** and instantly flags suspicious activity for investigation.
3. A/B Test Conversion Rate Explanation
The **A/B Test Conversion Rate** is the core metric used to evaluate which version of a website or app change (the **Control** or the **Variant**) performs better.
Definition and Calculation
- Conversion: The specific, desired action a user completes (e.g., clicking a button, signing up, making a purchase).
- Conversion Rate (CR): The percentage of visitors who complete the desired action.
Example Comparison
| Metric | Version A (Control) | Version B (Variant) |
|---|---|---|
| Total Visitors | 10,000 | 10,000 |
| Number of Sales (Conversions) | 350 | 400 |
| Conversion Rate | 3.5% | 4.0% |
Determining the Winner
In this example, Version B performs better. The improvement is measured by the Lift:
Comments
Post a Comment