Spark RDD Operations

Spark RDDs support two types of operations:

  1. Transformation
  2. Action

Transformation in Spark

A Spark transformation is a function that produces a new RDD from existing RDDs. It takes an RDD as input and produces one or more RDDs as output.

Each time we apply a transformation, Spark creates a new RDD. The input RDDs cannot be changed, because RDDs are immutable.

Examples of transformations:

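  • Narrow transformations (each output partition depends on a single input partition, so no shuffle is needed) - map(), mapPartitions(), flatMap(), filter(), union()
  • Wide transformations (an output partition may depend on many input partitions, which typically requires a shuffle) - groupByKey(), aggregateByKey(), join(), repartition(), etc.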
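
The difference between the two kinds of transformation can be sketched in plain Python. This is only a toy model, not Spark code: the list-of-lists `partitions`, the helper names `narrow_map` and `wide_group_by_key`, and the modulo partitioner are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical stand-in for a partitioned dataset: a list of partitions,
# each partition being a list of (key, value) records.
partitions = [
    [(1, "a"), (2, "b")],
    [(1, "c"), (3, "d")],
]


def narrow_map(parts, fn):
    """Narrow: each output partition is built from exactly one input partition."""
    return [[fn(record) for record in part] for part in parts]


def wide_group_by_key(parts, num_partitions):
    """Wide: records with the same key may sit in different input partitions,
    so they are redistributed (shuffled) before they can be grouped."""
    buckets = [defaultdict(list) for _ in range(num_partitions)]
    for part in parts:
        for key, value in part:
            # Simple modulo partitioner, assumed here for illustration only.
            buckets[key % num_partitions][key].append(value)
    return [sorted(bucket.items()) for bucket in buckets]


# Narrow step: partition boundaries are preserved.
mapped = narrow_map(partitions, lambda kv: (kv[0], kv[1].upper()))
# mapped == [[(1, "A"), (2, "B")], [(1, "C"), (3, "D")]]

# Wide step: key 1 appears in both input partitions and ends up in one bucket.
grouped = wide_group_by_key(mapped, num_partitions=2)
# grouped == [[(2, ["B"])], [(1, ["A", "C"]), (3, ["D"])]]
```

In the narrow step every record stays in the partition it started in. In the wide step the values for key 1 come from two different input partitions, which is the kind of data movement a shuffle performs.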

Action in Spark

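Transformations create new RDDs from existing ones, but they are evaluated lazily. When we want to work with the actual data, we perform an action, which triggers the computation and returns a result to the driver program or writes it to external storage.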

Examples of actions:

collect(), count(), first(), top(), aggregate(), etc.
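
The split between transformations and actions can be illustrated with a small plain-Python model. `MiniRDD` is a hypothetical toy class written for this sketch, not Spark's API. It only borrows the method names map(), filter(), collect() and count() to show that transformations record work and return a new object, while actions run that work and return a plain value.

```python
class MiniRDD:
    """Hypothetical toy class, not Spark's API: models lazy transformations."""

    def __init__(self, compute):
        # compute is a zero-argument function returning an iterator of elements.
        self._compute = compute

    @classmethod
    def from_list(cls, items):
        data = tuple(items)  # frozen copy: the source data is never mutated
        return cls(lambda: iter(data))

    # Transformations: return a new MiniRDD and do no work yet.
    def map(self, fn):
        return MiniRDD(lambda: (fn(x) for x in self._compute()))

    def filter(self, pred):
        return MiniRDD(lambda: (x for x in self._compute() if pred(x)))

    # Actions: run the recorded chain and return a plain Python value.
    def collect(self):
        return list(self._compute())

    def count(self):
        return sum(1 for _ in self._compute())


calls = []  # records every element the map function actually touches


def square(x):
    calls.append(x)
    return x * x


numbers = MiniRDD.from_list([1, 2, 3, 4])
evens_squared = numbers.map(square).filter(lambda x: x % 2 == 0)

calls_before_action = len(calls)  # 0: no element has been processed yet
result = evens_squared.collect()  # the action triggers the whole chain
# result == [4, 16]
```

Chaining map() and filter() leaves `numbers` untouched and performs no computation. `square` runs only once collect() is called, which mirrors how an action forces evaluation of the transformations recorded before it.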