Web Scraping Using Scrapy

What is Web Scraping?

Web scraping is the process of extracting large amounts of data from websites. The extracted data can then be saved to a local file on your computer or to a database table.

We can then use this data for analysis. For example, we can scrape product prices from e-commerce websites and analyze them.

Why Web Scraping? 

Data displayed on a website can normally only be viewed in the browser; there is no built-in way to save it. Copying and pasting an entire website by hand is tedious. Instead, we can use a scraper to collect the same information in a fraction of the time.

Scrapy Framework 

Scrapy is a web scraping framework written in Python. It can be used for a variety of purposes, such as data mining, monitoring, and automated testing. Scrapy is open source and supports Python 2.7 and Python 3.4 or above.
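
To give a feel for what Scrapy code looks like before we go step by step, here is a minimal spider sketch. The spider name, the target site (the public quotes.toscrape.com demo site), and the CSS selectors are illustrative assumptions only.

    import scrapy

    class QuotesSpider(scrapy.Spider):
        # Spider name and target site are placeholders for illustration.
        name = "quotes"
        start_urls = ["http://quotes.toscrape.com/"]

        def parse(self, response):
            # Pull each quote's text and author out of the page with CSS selectors.
            for quote in response.css("div.quote"):
                yield {
                    "text": quote.css("span.text::text").get(),
                    "author": quote.css("small.author::text").get(),
                }

Everything in the rest of this article, however, can be done interactively in the Scrapy shell, without writing a spider file.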

Here, we will see how easily we can scrape websites using the Scrapy framework.

Steps 
  1. Requirement
    Install Python 2.7 or Python 3.4 or above. You can download Python from the official website, python.org.

  2. Install Scrapy
    Open your command prompt or terminal and type,

    pip install scrapy
  3. Scrapy Shell
    Scrapy comes with an interactive shell that can be used for testing and debugging your code; you can also scrape URLs directly from it. Once you have successfully installed Scrapy, just type in your command prompt or terminal -

    scrapy shell 

  4. Fetch
    Once the Scrapy shell has started successfully, we can start scraping. fetch() requests the given URL and stores the response that we will scrape data from (we will pull a couple of values out of it after the screenshots below). For now, I am going to take my friend's website, "ugentertainment.in".

    fetch("http://ugentertainment.in/")

  5. View
    view() will open the fetched response in your default browser.

    view(response)

    The scraped copy of the page will open in your default browser, so you can compare the original website with the scraped one, as in the screenshots below.

Scraped Website (screenshot)

Original Website (screenshot)
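
Back in the Scrapy shell, once fetch() has stored a response, you can also pull individual values out of it with selectors. The title selector below works on any HTML page; the link extraction line is only an illustration and its results depend on the page's markup.

    # Text of the <title> tag of the fetched page.
    response.css("title::text").get()

    # All link targets (href attributes) found on the page.
    response.xpath("//a/@href").getall()

In older Scrapy versions, get() and getall() are called extract_first() and extract().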

And you are done with scraping your first website using Scrapy.
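
If you want to go further and save the scraped data to a local file, as mentioned at the beginning, a spider like the sketch shown earlier can be saved to a file and run from the command line; the file and output names here are only examples.

    scrapy runspider quotes_spider.py -o quotes.json

Scrapy infers the export format from the output file's extension, so naming the output quotes.csv would produce a CSV file instead.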