Introduction to Accumulators : Apache Spark


Whats the Problem  : Function like map() , filter() can use variables defined outside them in the driver program but each task running on the cluster gets a new copy of each variable, and updates from these copies are not propagated back to the driver.

The Solution : Spark provides two type of shared variables.

1.    Accumulators
2.    Broadcast variables

Here we are only interesting in Accumulators . if you wants to read about Broadcast variables then you can refer this Blog

Accumulators provides a simple syntax for aggregating values from worker nodes back to the driver program. One of the most common use of Accumulator is count events that may help in debugging process.

Example :-  to understand Accumulators better lets take a example of football.

Div Date HomeTeam AwayTeam FTHG FTAG FTR
D1 14/08/15 Bayern Munich Hamburg 5 H
D1 15/08/15 Augsburg Hertha 1 A

View original post 337 more words


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s