What's the problem: Functions like map() and filter() can use variables defined outside them in the driver program, but each task running on the cluster gets its own copy of each variable, and updates made to these copies are not propagated back to the driver.
The solution: Spark provides two types of shared variables:
1. Accumulators
2. Broadcast variables
Here we are only interested in accumulators. If you want to read about broadcast variables, you can refer to this blog.
Accumulators provide a simple syntax for aggregating values from worker nodes back to the driver program. One of the most common uses of accumulators is counting events, which can help during debugging.
Example: to understand accumulators better, let's take an example from football.