Introduction to Big Data


  • Digital data accumulates in many important fields, such as e-commerce, social networks, finance, banking, healthcare, education, the environment, and purchases at department and grocery stores.
  • It is becoming increasingly popular to mine these enormous data sets to gain insights that support business decisions and enable more sophisticated, customized, higher-quality services.
  • A MapReduce program is used to process and aggregate the data needed to answer a request.
  • Processing big data with good performance requires proper scheduling.
  • Scheduling is the technique of assigning jobs to available resources so as to minimize starvation and maximize resource utilization.
  • Large amounts of data are being collected and warehoused, for example:

    • Web data, e-commerce
    • Purchases at department/grocery stores
    • Bank/credit card transactions



  • Big data analytics examines large amounts of data to uncover hidden patterns, correlations, and other insights.
  • Nowadays it is possible to analyze data and get answers from it almost immediately, an effort that is slower and less efficient with more traditional business intelligence solutions.
  • Big data analytics helps organizations harness their data and use it to identify new opportunities.
  • Big data platforms store large amounts of data in databases and make that data, and the information derived from it, easy to retrieve.



  • The EMRSA (Energy-aware MapReduce Scheduling Algorithm) methodology is used for big data applications.
  • It is a popular method for scheduling MapReduce jobs in big data applications.
  • Today, wasted energy is a major problem for many IT companies.
  • Heavier computational workloads lead to higher energy costs.
  • The main purpose is to reduce energy costs by scheduling MapReduce jobs efficiently.
  • To optimize mining results, MapReduce is evaluated using one-step, three-step, and iterative algorithms across various computations, with attention to both mining efficiency and energy use.
  • The incremental processing approach used here is the energy-aware MapReduce scheduling algorithm.
  • EMRSA uses priority-based scheduling to assign map and reduce tasks so that less energy is consumed.
  • It assigns tasks to a schedule based on each job's needs and on resource utilization.
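
The idea of assigning tasks so that less energy is consumed can be illustrated with a small greedy sketch. This is not the published EMRSA algorithm; it is a simplified heuristic written only for illustration. The function name `greedy_energy_schedule`, the task durations, the per-machine energy rates, and the deadline are all hypothetical values chosen for the example.

```python
def greedy_energy_schedule(tasks, machines, deadline):
    """Assign each task to the lowest-energy machine that still has time left.

    tasks:    dict mapping task name -> processing time (time units)
    machines: dict mapping machine name -> energy used per time unit
    deadline: time units available on every machine
    Returns (assignment, total_energy).
    NOTE: an illustrative heuristic, not the EMRSA algorithm itself.
    """
    remaining = {name: deadline for name in machines}
    # Try the most energy-efficient machines first.
    by_energy = sorted(machines, key=machines.get)
    assignment = {}
    total_energy = 0.0
    # Place the longest tasks first so they are not squeezed out later.
    for task, duration in sorted(tasks.items(), key=lambda kv: -kv[1]):
        for name in by_energy:
            if remaining[name] >= duration:
                assignment[task] = name
                remaining[name] -= duration
                total_energy += duration * machines[name]
                break
        else:
            raise ValueError(f"task {task!r} cannot finish before the deadline")
    return assignment, total_energy


# Hypothetical workload: two map tasks and one reduce task.
tasks = {"map1": 4, "map2": 3, "reduce1": 2}
machines = {"low_power": 1.0, "high_power": 2.5}
assignment, total_energy = greedy_energy_schedule(tasks, machines, deadline=6)
print(assignment)
print(total_energy)
```

With these example values, the two tasks that fit on the low-power machine are placed there, and only the task that does not fit is sent to the high-power machine.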



  • Hadoop is an open-source framework created by Doug Cutting and Mike Cafarella; Cutting is also the creator of the open-source search library Apache Lucene.
  • Hadoop makes it possible to store and process big data in a distributed environment, across clusters of computers, using simple programming models.
  • It is designed to scale from a single server to thousands of machines, each offering local computation and storage.
  • It can run applications that involve thousands of nodes and terabytes of data.
  • The failure of a single node does not bring down the rest of the system.
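  • Apache Hive is a data warehouse tool that runs on top of Hadoop and provides SQL-like queries over the data stored in the cluster.
  • Google played an important role in Hadoop's origins: Hadoop was inspired by Google's published papers on the Google File System and MapReduce.
  • A Hadoop deployment involves an operating system, data, queries, and a cluster.
  • In this context, HD stands for Hadoop.
  • SQL Server can be used alongside Hadoop, for example to query data held in a Hadoop cluster.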




  • MapReduce is a programming model for processing large data sets in parallel across a cluster. The "reduce" refers to combining many intermediate key-value pairs into fewer results, not to shrinking the storage size of the database. Scheduling MapReduce tasks well can improve energy efficiency and lower energy costs.
  • A client uses MapReduce to process the data held on the cluster. The reduce step combines a larger number of key-value pairs into a smaller number of key-value pairs.
  • In this example, the Map task emits six key-value pairs. The Reduce task combines them into three key-value pairs.
  • These are then combined into a single key-value pair, which is the final reduced result.
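
The six-to-three-to-one reduction described above can be sketched in plain Python. This is a single-process illustration of the map and reduce steps, not Hadoop code; the input records and the function names `map_phase` and `reduce_phase` are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(records):
    """Map step: emit one (word, 1) key-value pair for every word."""
    pairs = []
    for record in records:
        for word in record.split():
            pairs.append((word, 1))
    return pairs


def reduce_phase(pairs):
    """Reduce step: sum the values of all pairs that share a key."""
    grouped = defaultdict(int)
    for key, value in pairs:
        grouped[key] += value
    return dict(grouped)


# Hypothetical input: three short records containing six words in total.
records = ["big data", "big cluster", "data data"]

mapped = map_phase(records)        # six key-value pairs
reduced = reduce_phase(mapped)     # three key-value pairs, one per distinct word
total = {"words": sum(reduced.values())}  # one final key-value pair

print(len(mapped), mapped)
print(len(reduced), reduced)
print(total)
```

On a real cluster the map and reduce functions run in parallel on many nodes, and the framework handles grouping the pairs by key between the two steps.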




  • The scheduling algorithm first creates a priority queue.
  • The input tasks are then placed in the priority queue.
  • The priority queue orders the tasks according to the priority assigned to each input task.
  • The ordered tasks are then passed to the task allocation process, which allocates each input task for processing.
  • There are two modes of allocation: preemptive and non-preemptive.
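
The priority queue and the two allocation modes can be sketched with Python's standard `heapq` module. This is a minimal single-processor simulation in unit time steps, written for illustration only. The function name `schedule`, the task tuples, and the convention that a lower number means higher priority are assumptions, not details taken from the algorithm described above.

```python
import heapq


def schedule(tasks, preemptive):
    """Simulate priority scheduling on one processor.

    tasks: list of (name, arrival_time, priority, duration);
           a lower priority number is served first (an assumed convention).
    Returns the task names in order of completion.
    """
    pending = sorted(tasks, key=lambda t: t[1])  # ordered by arrival time
    ready = []       # priority queue of (priority, arrival, name, remaining)
    finished = []
    current = None   # the running task as [priority, arrival, name, remaining]
    time = 0
    i = 0
    while i < len(pending) or ready or current:
        # Put every task that has arrived by now into the priority queue.
        while i < len(pending) and pending[i][1] <= time:
            name, arrival, priority, duration = pending[i]
            heapq.heappush(ready, (priority, arrival, name, duration))
            i += 1
        # Preemptive mode: a higher-priority arrival interrupts the running task.
        if preemptive and current and ready and ready[0][0] < current[0]:
            heapq.heappush(ready, tuple(current))
            current = None
        # Allocate the processor to the highest-priority ready task.
        if current is None and ready:
            current = list(heapq.heappop(ready))
        # Run the current task for one time unit.
        if current:
            current[3] -= 1
            if current[3] == 0:
                finished.append(current[2])
                current = None
        time += 1
    return finished


# Hypothetical tasks: (name, arrival_time, priority, duration).
tasks = [("A", 0, 2, 3), ("B", 1, 1, 2), ("C", 2, 3, 1)]
non_preemptive_order = schedule(tasks, preemptive=False)
preemptive_order = schedule(tasks, preemptive=True)
print(non_preemptive_order)  # ['A', 'B', 'C']
print(preemptive_order)      # ['B', 'A', 'C']
```

In the non-preemptive run, task A keeps the processor until it finishes even though the higher-priority task B arrives in the meantime. In the preemptive run, B interrupts A as soon as it arrives, so B completes first.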


Several tools are used in big data applications, such as:

  • Hadoop
  • Storm
  • Cloudera
  • SpaceCurve
  • GridGain


Many techniques are available in big data applications:

  • Data Management
  • Data Mining
  • Hadoop
  • In-Memory Analysis
  • Text Mining
  • Predictive Analysis


  • Lower cost
  • Better sales insight
  • Improved services
  • Very secure
  • Faster
  • Better decision making
  • New strategies are noticed immediately
  • It is authoritative, actionable, and accessible
  • It is timely, relevant, holistic, and trustworthy