HTML Parsing Using Beautiful Soup In Python

HTML parsing in Python is straightforward, and it lets you fetch data from a website according to your requirements.

Beautiful Soup is a library with good support for parsing HTML and XML, and it provides many ways to filter the data on a web page.

Today I want to show some examples of this library. It is especially useful when your project needs to pull a lot of data continuously from a large website.
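To make the filtering idea concrete, here is a minimal sketch of a few common Beautiful Soup calls (find, find_all, select). The HTML snippet, tag names and class names are made up for illustration and do not come from any real site.

```python
from bs4 import BeautifulSoup

# Small inline document so the example runs without network access.
# The markup is invented for illustration only.
html = """
<html><body>
  <h1 id="title">Sample page</h1>
  <p class="intro">First paragraph.</p>
  <p>Second paragraph with a <a href="/about">link</a>.</p>
</body></html>
"""

# "html.parser" is the parser bundled with Python, so nothing extra is needed.
soup = BeautifulSoup(html, "html.parser")

heading = soup.find("h1").get_text()               # first matching tag
paragraphs = soup.find_all("p")                    # every <p> tag
intro = soup.find("p", class_="intro").get_text()  # filter by CSS class
links = [a["href"] for a in soup.select("p a")]    # filter by CSS selector

print(heading)
print(len(paragraphs))
print(intro)
print(links)
```

find returns the first match (or None), while find_all and select return lists, so the calls can be mixed depending on whether one element or many are expected.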

Beautiful Soup

Now let’s move on to the examples. All of them are run against the www.c-sharpcorner.com website and access its data directly from Python, without going through a browser.

Example 1:

  1. First, install the Beautiful Soup library from https://pypi.python.org/pypi/beautifulsoup4 (for example with pip install beautifulsoup4).

  2. Next, we want to get all the articles shown on the front page of the website, along with their authors’ names.

Now write the code as below:

[Image: code for Example 1]

Output:

[Image: command-line output]
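The listing for this example is only available as an image, so the following is just a minimal sketch of the same idea: collecting article titles together with author names. The sample markup, the class names (article, title, author) and the function name extract_articles are all assumptions. The real front page will use different tags, which you would need to look up in the page source.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the front page; the real site's
# tags and class names may differ and should be checked in the page source.
SAMPLE_HTML = """
<ul class="articles">
  <li class="article">
    <a class="title" href="/article/1">Getting Started With Python</a>
    <span class="author">A. Writer</span>
  </li>
  <li class="article">
    <a class="title" href="/article/2">Parsing HTML</a>
    <span class="author">B. Coder</span>
  </li>
</ul>
"""


def extract_articles(html):
    """Return a list of (title, author) pairs found in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for item in soup.find_all("li", class_="article"):
        title = item.find("a", class_="title")
        author = item.find("span", class_="author")
        # Skip entries that are missing either piece of information.
        if title is None or author is None:
            continue
        results.append(
            (title.get_text(strip=True), author.get_text(strip=True))
        )
    return results


articles = extract_articles(SAMPLE_HTML)
for title, author in articles:
    print(f"{title} - {author}")
```

For a live page, the HTML string could come from the standard library, for instance urllib.request.urlopen(url).read(), and then be passed to the same function.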

Example 2:

Now I want to get all the breaking news on c-sharpcorner. Write the code as below:

[Image: code for Example 2]

Output:

[Image: output of the run]
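Again the original code is only available as an image, so here is a hedged sketch of how headlines inside one section of a page could be collected. The container id breaking-news, the sample markup and the function name extract_news are assumptions made for illustration, not the site's actual structure.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the id "breaking-news" is an assumption.
SAMPLE_HTML = """
<div id="breaking-news">
  <a href="/news/1">  Conference dates announced </a>
  <a href="/news/2">New release available</a>
</div>
<div id="other">
  <a href="/misc">Unrelated link</a>
</div>
"""


def extract_news(html, container_id="breaking-news"):
    """Return (headline, link) pairs from the element with the given id."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find(id=container_id)
    if container is None:
        # The section is not on the page, so there is nothing to report.
        return []
    # href=True keeps only anchors that actually carry a link.
    return [
        (a.get_text(strip=True), a["href"])
        for a in container.find_all("a", href=True)
    ]


news = extract_news(SAMPLE_HTML)
for headline, link in news:
    print(headline, link)
```

Restricting the search to one container first keeps unrelated links elsewhere on the page out of the result.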

Example 3:

  1. This example is better suited to collecting values from multiple pages than from a single page.

  2. In this example I retrieve the points of the top members, together with their names, from the URL http://www.c-sharpcorner.com/members/. Write the code as below:

    [Image: code for Example 3 (names.write)]

    Output:

    [Image: output]
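As a rough sketch of the multi-page idea, the code below parses several pages with one function, combines the member names and points, and writes them out as CSV. Everything specific here is an assumption: the table markup, the class names, the PAGES dictionary that stands in for downloaded pages, and the function names parse_members and collect_members.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical pages keyed by page number. In a real script each value
# would be the HTML downloaded for that page of the members list.
PAGES = {
    1: """<table class="members">
            <tr><td class="name">Member One</td><td class="points">1,200</td></tr>
            <tr><td class="name">Member Two</td><td class="points">950</td></tr>
          </table>""",
    2: """<table class="members">
            <tr><td class="name">Member Three</td><td class="points">700</td></tr>
          </table>""",
}


def parse_members(html):
    """Return (name, points) pairs from one page of the members table."""
    soup = BeautifulSoup(html, "html.parser")
    members = []
    for row in soup.select("table.members tr"):
        name = row.find("td", class_="name")
        points = row.find("td", class_="points")
        if name is None or points is None:
            continue  # header rows or incomplete rows
        # Remove thousands separators before converting to an integer.
        value = int(points.get_text(strip=True).replace(",", ""))
        members.append((name.get_text(strip=True), value))
    return members


def collect_members(pages):
    """Parse every page in page-number order and combine the results."""
    all_members = []
    for page_number in sorted(pages):
        all_members.extend(parse_members(pages[page_number]))
    return all_members


members = collect_members(PAGES)

# Write to an in-memory buffer; a real script could pass an open file instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["name", "points"])
writer.writerows(members)
print(buffer.getvalue())
```

Keeping the per-page parsing in its own function means the same code works whether there is one page or many. Only the loop that supplies the HTML changes.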

We can access data from HTML content using the Python Beautiful Soup library. Python has many libraries, and they have helped me many times.

Seleniumhq

Seleniumhq is also a very important library. It can send data from the console to an HTML page, so we can interact with a website from a script. It triggers events on HTML elements, locating them by tag, class, or ID. This is called remote control.