iluwatar
diff --git a/map-reduce/README.md b/map-reduce/README.md (+77 -40)
diff --git a/map-reduce/etc/mapreduce-flowchart.png b/map-reduce/etc/mapreduce-flowchart.png (88.4 KB)
@@ -2,20 +2,26 @@
 title: "MapReduce Pattern in Java"
 shortTitle: MapReduce
 description: "Learn the MapReduce pattern in Java with real-world examples, class diagrams, and tutorials. Understand its intent, applicability, benefits, and known uses to enhance your design pattern knowledge."
-category: Structural
+category: Functional
 language: en
 tag:
-    - Delegation
+    - Concurrency
+    - Data processing
+    - Data transformation
+    - Functional decomposition
+    - Immutable
+    - Multithreading
+    - Scalability
 ---
 
 ## Also known as
 
-* Split-Apply-Combine Strategy
-* Scatter-Gather Pattern
+* Map-Reduce
+* Divide and Conquer for Data Processing
 
 ## Intent of Map Reduce Design Pattern
 
-MapReduce aims to process and generate large datasets with a parallel, distributed algorithm on a cluster. It divides the workload into two main phases: Map and Reduce, allowing for efficient parallel processing of data.
+To efficiently process large-scale datasets by dividing computation into two phases: map and reduce, which can be executed in parallel and distributed across multiple nodes.
 
 ## Detailed Explanation of Map Reduce Pattern with Real-World Examples
 
@@ -29,19 +35,22 @@ In plain words
 
 Wikipedia says
 
-> "MapReduce is a programming model and associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster".
-MapReduce consists of two main steps:
-The "Map" step: The master node takes the input, divides it into smaller sub-problems, and distributes them to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. The worker node processes the smaller problem, and passes the answer back to its master node.
-The "Reduce" step: The master node then collects the answers to all the sub-problems and combines them in some way to form the output – the answer to the problem it was originally trying to solve.
-This approach allows for efficient processing of vast amounts of data across multiple machines, making it a fundamental technique in big data analytics and distributed computing.
+> MapReduce is a programming model and associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster. MapReduce consists of two main steps:
+>
+> * The Map step: The master node takes the input, divides it into smaller sub-problems, and distributes them to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. The worker node processes the smaller problem and passes the answer back to its master node.
+> * The Reduce step: The master node then collects the answers to all the sub-problems and combines them in some way to form the output – the answer to the problem it was originally trying to solve.
+>
+> This approach allows for efficient processing of vast amounts of data across multiple machines, making it a fundamental technique in big data analytics and distributed computing.
+
+Flowchart
+
+![MapReduce flowchart](./etc/mapreduce-flowchart.png)
 
 ## Programmatic Example of Map Reduce in Java
 
 ### 1. Map Phase (Splitting & Processing Data)
 
-* The Mapper takes an input string, splits it into words, and counts occurrences.
-* Output: A map {word → count} for each input line.
+* Each input string is split into words, normalized, and counted.
+* Output: A map `{word → count}` for each input string.
+
 #### `Mapper.java`
+
 ```java
 public class Mapper {
     public static Map<String, Integer> map(String input) {
@@ -57,13 +66,17 @@ public class Mapper {
     }
 }
 ```
+
 Example Input: ```"Hello world hello"```
 Output: ```{hello=2, world=1}```
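
The diff elides most of the `Mapper` body, so the following is only a minimal sketch of what the map step could look like. The class name `MapperSketch` and the normalization rule (lower-casing, then splitting on non-letter characters) are assumptions chosen to reproduce the example above, not the repository's actual code.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for the elided Mapper body.
final class MapperSketch {

    // Counts normalized words in a single input string.
    static Map<String, Integer> map(String input) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        // Assumed normalization: lower-case, then split on runs of non-letters.
        for (String word : input.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!word.isEmpty()) { // a leading separator yields one empty token
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(map("Hello world hello")); // prints {hello=2, world=1}
    }
}
```

Because each call depends only on its own input string, separate inputs can be mapped independently of one another.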
 
-### 2. Shuffle Phase (Grouping Data by Key)
+### 2. Shuffle Phase – Grouping Words Across Inputs
+
+* Takes results from all mappers and groups values by word.
+* Output: A map `{word → list of counts}`.
 
-* The Shuffler collects key-value pairs from multiple mappers and groups values by key.
 #### `Shuffler.java`
+
 ```java
 public class Shuffler {
     public static Map<String, List<Integer>> shuffleAndSort(List<Map<String, Integer>> mapped) {
@@ -78,14 +91,18 @@ public class Shuffler {
     }
 }
 ```
+
 Example Input: 
+
 ```
 [
     {"hello": 2, "world": 1},
     {"hello": 1, "java": 1}
 ]
 ```
+
 Output: 
+
 ```
 {
     "hello": [2, 1],
@@ -94,10 +111,13 @@ Output:
 }
 ```
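
Since the `Shuffler` body is also elided in this diff, here is a hedged sketch of the grouping step. `ShufflerSketch` and its `shuffle` method are hypothetical names; the only thing taken from the text is the contract, a list of per-input count maps in and a `{word → list of counts}` map out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the elided Shuffler body.
final class ShufflerSketch {

    // Groups the counts emitted by all mappers under their word.
    static Map<String, List<Integer>> shuffle(List<Map<String, Integer>> mapped) {
        Map<String, List<Integer>> grouped = new LinkedHashMap<>();
        for (Map<String, Integer> partial : mapped) {
            partial.forEach((word, count) ->
                grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(count));
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> mapped = List.of(
            Map.of("hello", 2, "world", 1),
            Map.of("hello", 1, "java", 1));
        // Key order may vary because Map.of has no defined iteration order.
        System.out.println(shuffle(mapped));
    }
}
```

The counts inside each list follow the order of the mapper results, so `hello` collects `[2, 1]` for the input shown above.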
 
-### 3. Reduce Phase (Aggregating Results)
+### 3. Reduce Phase – Aggregating Counts
+
+* Sums the list of counts for each word.
+* Output: A sorted list of word counts in descending order.
 
-* The Reducer sums up occurrences of each word.
 #### `Reducer.java`
+
 ```java
 public class Reducer {
     public static List<Map.Entry<String, Integer>> reduce(Map<String, List<Integer>> grouped) {
@@ -112,15 +132,19 @@ public class Reducer {
     }
 }
 ```
+
 Example Input:
+
 ```
 {
     "hello": [2, 1],
     "world": [1],
     "java": [1]
 }
 ```
+
 Output:
+
 ```
 [
     {"hello": 3},
@@ -129,10 +153,12 @@ Output:
 ]
 ```
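
The `Reducer` body is not visible in this diff either. A minimal sketch of the aggregation step might look like the following; `ReducerSketch` is a hypothetical name, and breaking ties alphabetically is an assumption added here only to make the ordering deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the elided Reducer body.
final class ReducerSketch {

    // Sums each word's counts and sorts the totals in descending order.
    static List<Map.Entry<String, Integer>> reduce(Map<String, List<Integer>> grouped) {
        List<Map.Entry<String, Integer>> totals = new ArrayList<>();
        grouped.forEach((word, counts) -> {
            int sum = counts.stream().mapToInt(Integer::intValue).sum();
            totals.add(Map.entry(word, sum));
        });
        // Highest count first; equal counts fall back to alphabetical order (assumption).
        totals.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return totals;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> grouped = Map.of(
            "hello", List.of(2, 1),
            "world", List.of(1),
            "java", List.of(1));
        System.out.println(reduce(grouped)); // prints [hello=3, java=1, world=1]
    }
}
```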
 
-### 4. Running the Full MapReduce Process
+### 4. MapReduce Coordinator – Running the Whole Pipeline
+
+* Coordinates map, shuffle, and reduce phases.
 
-* The MapReduce class coordinates the three steps.
 #### `MapReduce.java`
+
 ```java
 public class MapReduce {
     public static List<Map.Entry<String, Integer>> mapReduce(List<String> inputs) {
@@ -148,10 +174,12 @@ public class MapReduce {
 }
 ```
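
As a possible variation that is not taken from the repository, the map phase can run on a thread pool because each input is processed independently. The sketch below assumes a hypothetical `ParallelCoordinatorSketch` class with its own `countWords` helper, and it folds the shuffle and reduce steps into a single merge for brevity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical coordinator: parallel map phase, sequential merge.
final class ParallelCoordinatorSketch {

    // Map step (assumed normalization: lower-case, split on non-letters).
    static Map<String, Integer> countWords(String input) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : input.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    static List<Map.Entry<String, Integer>> mapReduce(List<String> inputs, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // Submit one map task per input string.
            List<CompletableFuture<Map<String, Integer>>> futures = new ArrayList<>();
            for (String input : inputs) {
                futures.add(CompletableFuture.supplyAsync(() -> countWords(input), pool));
            }
            // Shuffle and reduce in one pass: merge partial counts by word.
            Map<String, Integer> totals = new HashMap<>();
            for (CompletableFuture<Map<String, Integer>> future : futures) {
                future.join().forEach((word, count) -> totals.merge(word, count, Integer::sum));
            }
            // Sort by count descending, then alphabetically (assumption).
            List<Map.Entry<String, Integer>> result = new ArrayList<>(totals.entrySet());
            result.sort((a, b) -> {
                int byCount = Integer.compare(b.getValue(), a.getValue());
                return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
            });
            return result;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(mapReduce(List.of("Hello world hello", "hello java"), 2));
    }
}
```

Only the map tasks touch the pool; the merge runs on the calling thread, so the shared `totals` map needs no synchronization.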
 
-### 4. Main Execution (Calling MapReduce)
+### 5. Main Execution – Example Usage
+
+* Runs the MapReduce process and prints results.
 
-* The Main class executes the MapReduce pipeline and prints the final word count.
 #### `Main.java`
+
 ```java
   public static void main(String[] args) {
     List<String> inputs = Arrays.asList(
@@ -168,6 +196,7 @@ public class MapReduce {
 ```
 
 Output:
+
 ```
 hello: 4
 world: 2
@@ -183,10 +212,11 @@ fun: 1
 ## When to Use the Map Reduce Pattern in Java
 
 Use MapReduce when:
-* Processing large datasets that don't fit into a single machine's memory
-* Performing computations that can be parallelized
-* Dealing with fault-tolerant and distributed computing scenarios
-* Analyzing log files, web crawl data, or scientific data
+
+* Processing large datasets that can be broken into independent chunks.
+* Data operations divide naturally into map (transformation) and reduce (aggregation) phases.
+* Horizontal scalability and parallelization are essential, especially in distributed or big data environments.
+* Building on JVM-based distributed computing platforms such as Hadoop or Spark.
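
A quick way to check whether a task fits these criteria is to try expressing it with the Java Stream API's `map` and `reduce`. The sketch below uses a hypothetical `StreamMapReduceSketch` class to count words across lines: the map step is independent per line, and the reduce step is an associative sum with identity `0`, which is what lets a parallel stream combine partial results in any grouping.

```java
import java.util.List;

// Hypothetical example of a task that splits cleanly into map and reduce.
final class StreamMapReduceSketch {

    // Map step: number of whitespace-separated tokens in one line.
    static int wordsIn(String line) {
        String trimmed = line.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // Reduce step: integer addition is associative, so partial sums can be combined freely.
    static int totalWords(List<String> lines) {
        return lines.parallelStream()
            .map(StreamMapReduceSketch::wordsIn)
            .reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        System.out.println(totalWords(List.of("Hello world hello", "hello java"))); // prints 5
    }
}
```

If the combining operation were not associative (subtraction, for example), a parallel run could produce different results from a sequential one, which is a sign the task does not fit the pattern as written.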
 
 ## Map Reduce Pattern Java Tutorials
 
@@ -197,32 +227,39 @@ Use MapReduce when:
 
 Benefits:
 
-* Scalability: Can process vast amounts of data across multiple machines
-* Fault-tolerance: Handles machine failures gracefully
-* Simplicity: Abstracts complex distributed computing details
+* Enables massive scalability by distributing processing across nodes.
+* Encourages a functional style, promoting immutability and stateless operations.
+* Simplifies complex data workflows by separating transformation (map) from aggregation (reduce).
+* Fault-tolerant due to isolated, recoverable processing tasks.
 
 Trade-offs:
 
-* Overhead: Not efficient for small datasets due to setup and coordination costs
-* Limited flexibility: Not suitable for all types of computations or algorithms
-* Latency: Batch-oriented nature may not be suitable for real-time processing needs
+* Requires a suitable problem structure — not all tasks fit the map/reduce paradigm.
+* Data shuffling between map and reduce phases can be performance-intensive.
+* Higher complexity in debugging and optimizing distributed jobs.
+* Intermediate I/O can become a bottleneck in large-scale operations.
 
 ## Real-World Applications of Map Reduce Pattern in Java
 
-* Google's original implementation for indexing web pages
-* Hadoop MapReduce for big data processing
-* Log analysis in large-scale systems
-* Genomic sequence analysis in bioinformatics
+* Hadoop MapReduce: Java-based framework for distributed data processing using MapReduce.
+* Apache Spark: Provides similar `map` and `reduce` operations in its RDD and Dataset APIs.
+* Elasticsearch: Uses MapReduce-style aggregation pipelines for querying distributed data.
+* Google Bigtable: Distributed storage system that can serve as an input source and output target for MapReduce jobs.
+* MongoDB Aggregation Framework: Conceptually applies MapReduce in its data pipelines.
 
 ## Related Java Design Patterns
 
-* Chaining Pattern
-* Master-Worker Pattern
-* Pipeline Pattern
+* [Master-Worker](https://java-design-patterns.com/patterns/master-worker/): Similar distribution of tasks among workers, with a master coordinating job execution.
+* [Pipeline](https://java-design-patterns.com/patterns/pipeline/): Can be used to chain multiple MapReduce operations into staged transformations.
+* [Iterator](https://java-design-patterns.com/patterns/iterator/): Often used under the hood to process input streams lazily in map and reduce steps.
 
 ## References and Credits
 
-* [What is MapReduce](https://www.ibm.com/think/topics/mapreduce)
-* [Wy MapReduce is not dead](https://www.codemotion.com/magazine/ai-ml/big-data/mapreduce-not-dead-heres-why-its-still-ruling-in-the-cloud/)
-* [Scalabe Distributed Data Processing Solutions](https://tcpp.cs.gsu.edu/curriculum/?q=system%2Ffiles%2Fch07.pdf)
+* [Big Data: Principles and Paradigms](https://amzn.to/3RJIGPZ)
+* [Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems](https://amzn.to/3E6VhtD)
+* [Hadoop: The Definitive Guide: Storage and Analysis at Internet Scale](https://amzn.to/4ij2y7F)
+* [Java 8 in Action: Lambdas, Streams, and functional-style programming](https://amzn.to/3QCmGXs)
 * [Java Design Patterns: A Hands-On Experience with Real-World Examples](https://amzn.to/3HWNf4U)
+* [Programming Pig: Dataflow Scripting with Hadoop](https://amzn.to/4cAU36K)
+* [What is MapReduce (IBM)](https://www.ibm.com/think/topics/mapreduce)
+* [Why MapReduce is not dead (Codemotion)](https://www.codemotion.com/magazine/ai-ml/big-data/mapreduce-not-dead-heres-why-its-still-ruling-in-the-cloud/)