Vipul Malhotra
What is PySpark architecture?

What is the architecture of PySpark? How are the driver and worker nodes managed in it?

By Vipul Malhotra in Data Mining on Mar 22 2023
  • Rajkiran Swain
    Apr, 2023 3

    PySpark is the Python API for Apache Spark, which is an open-source distributed computing system used for large-scale data processing. The architecture of PySpark consists of the following components:

    Driver program: The driver program runs the application's main function and creates the SparkContext. It breaks the application into tasks and works with the cluster manager to get those tasks executed on the worker nodes.

    SparkContext: The SparkContext is the entry point for PySpark applications (since Spark 2.0 it is usually obtained through a SparkSession). It connects to the cluster manager to acquire resources, creates RDDs (Resilient Distributed Datasets), and coordinates the execution of tasks on the worker nodes.

    RDD: RDDs are the fundamental data abstraction in PySpark. They are immutable, partitioned collections of objects that can be processed in parallel across the worker nodes. RDDs can be created from data stored in the Hadoop Distributed File System (HDFS), the local file system, or any other storage system supported by Spark.

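    Transformations: Transformations are operations applied to an RDD that produce a new RDD. They are lazy: Spark only records how the new RDD is derived and does not compute anything until an action is called. Examples of transformations include map, filter, and join.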

    Actions: Actions are operations that trigger the computation of an RDD and return a result to the driver program. Examples of actions include count, collect, and reduce.
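The difference between transformations and actions can be sketched with a small pure-Python toy model. This is only an illustration of the idea: `ToyRDD` is a hypothetical class, not part of PySpark, and it runs in a single process. In real PySpark the corresponding calls would be `sc.parallelize(...)`, `rdd.map(...)`, `rdd.filter(...)`, `rdd.collect()` and `rdd.reduce(...)`.

```python
# Toy model of lazy transformations and eager actions.
# ToyRDD is a hypothetical class for illustration, NOT the PySpark API.
import functools


class ToyRDD:
    def __init__(self, data, ops=()):
        self._data = list(data)   # source records
        self._ops = tuple(ops)    # recorded transformations, not yet executed

    # Transformations: return a new ToyRDD and only record the operation.
    def map(self, fn):
        return ToyRDD(self._data, self._ops + (("map", fn),))

    def filter(self, pred):
        return ToyRDD(self._data, self._ops + (("filter", pred),))

    def _compute(self):
        # Replay every recorded transformation over the source data.
        items = iter(self._data)
        for kind, fn in self._ops:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)

    # Actions: run the recorded operations and return a plain Python value.
    def collect(self):
        return self._compute()

    def count(self):
        return len(self._compute())

    def reduce(self, fn):
        return functools.reduce(fn, self._compute())


calls = []  # records every invocation of square() so laziness is observable


def square(x):
    calls.append(x)
    return x * x


numbers = ToyRDD(range(1, 6))
evens_squared = numbers.map(square).filter(lambda x: x % 2 == 0)

calls_before_action = len(calls)                  # nothing has run yet
result = evens_squared.collect()                  # action: runs map + filter
total = evens_squared.reduce(lambda a, b: a + b)  # action: recomputes, then sums
```

Note that the second action replays the transformations from the start, which mirrors how Spark recomputes an uncached RDD for each action.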

    Worker nodes: Worker nodes are the machines in the cluster that execute the tasks. Each worker runs one or more executor processes, which receive tasks from the driver program, process their share of the data, and send the results back to the driver program.
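The driver and worker roles can be sketched in plain Python as a rough analogy. This is not how Spark is implemented: threads stand in for executors, and the function names (`split_into_partitions`, `run_task`, `run_job`) are hypothetical. It only shows the flow of splitting data into partitions, running one task per partition, and combining the partial results on the driver.

```python
# Single-machine analogy for the driver/worker flow.
# All names here are hypothetical; threads stand in for Spark executors.
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Driver side: slice the data set into roughly equal partitions."""
    data = list(data)
    size, extra = divmod(len(data), num_partitions)
    partitions, start = [], 0
    for i in range(num_partitions):
        end = start + size + (1 if i < extra else 0)
        partitions.append(data[start:end])
        start = end
    return partitions


def run_task(partition):
    """Worker side: one task processes one partition and returns a partial result."""
    return sum(x * x for x in partition)


def run_job(data, num_partitions=4, num_workers=2):
    """Driver side: create tasks, hand them to workers, combine the results."""
    partitions = split_into_partitions(data, num_partitions)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_results = list(pool.map(run_task, partitions))
    return partitions, partial_results, sum(partial_results)


partitions, partial_results, total = run_job(range(1, 11))
```

With four partitions and two workers, each worker handles more than one task, which is the usual situation in a Spark job where partitions outnumber executor cores.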

    Cluster manager: The cluster manager allocates cluster resources, such as executors on the worker nodes, to applications; the driver then schedules tasks on those executors. Apache Spark supports several cluster managers, including YARN, Mesos, Kubernetes, and its own standalone cluster manager.
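An application selects its cluster manager through the master URL it is started with. The helper below is a hypothetical sketch (the function `cluster_manager_for` is not part of PySpark) that maps the master URL forms documented by Spark to the cluster manager they select. The host names and ports in the usage lines are placeholders.

```python
# Hypothetical helper, not part of PySpark: maps a Spark master URL
# to the cluster manager it selects.
def cluster_manager_for(master_url):
    if master_url == "local" or master_url.startswith("local["):
        return "local mode (no cluster manager)"
    if master_url.startswith("spark://"):
        return "standalone"
    if master_url == "yarn":
        return "YARN"
    if master_url.startswith("mesos://"):
        return "Mesos"
    if master_url.startswith("k8s://"):
        return "Kubernetes"
    raise ValueError(f"unrecognized master URL: {master_url!r}")


# Placeholder hosts and ports, for illustration only.
examples = {
    url: cluster_manager_for(url)
    for url in ["local[*]", "spark://host:7077", "yarn",
                "mesos://host:5050", "k8s://https://host:6443"]
}
```

In a real application the same string would be passed as the master setting, for example through `SparkConf().setMaster(...)` or the `--master` option of `spark-submit`.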
