30 Days Of Python 👨‍💻 - Day 21 - Scripting Basics

This article is part of a 30-day Python challenge series. You can find the links to all the previous posts of this series here.
Today I explored the basics of scripting in Python. Scripting basically means writing programs made up of a collection of commands that are executed from the command line or an interactive shell to perform useful tasks and automate them. A ton of things can be automated using Python scripts: processing various types of files such as PDFs, images, Excel and CSV files, sending emails, creating bots such as a Twitter bot, and numerous other things. As a part of this challenge, I decided to learn the basics of scripting so that I understand the concepts and can explore them in greater depth in the future. My focus today was on basic techniques for processing images and PDF files using Python scripts.
 

Image Processing

 
Image processing, in simple words, is the technique of performing operations on images using programs, either to enhance the images or to extract information from them. There are a lot of popular libraries for image processing in Python, such as:
  • Pillow
  • OpenCV
  • Python Imaging Library (Deprecated)
  • scikit-image
I tried out Pillow, a fork of the Python Imaging Library (PIL). PIL is no longer maintained and does not support the latest versions of Python, so Pillow is the recommended choice.
 
The installation steps and basic usage can be found in the Pillow docs.
 
Pillow can be installed from the command line with pip install Pillow (check the documentation for OS-specific commands).
 
It’s time to create some image processing scripts. The first one I created is a basic image converter that converts all JPEG images in a folder to PNG images and stores them in another folder. I downloaded some JPEG images from https://unsplash.com and stored them in a folder named images. The script needs to read all the JPEG images, convert them to PNG and then place them in a new folder named generated.
 
Here’s the code for the script file, which I named image_converter.py. Here is the GitHub repository link for the project.
 
Since the original images are very large in size, I first resized them into a smaller size and then converted them to improve the performance of the script.
 
image_converter.py
    import os
    from PIL import Image

    # source and output folders
    dirname = 'images'
    output_dirname = 'generated'
    images_list = os.listdir(dirname)

    # create the output folder if it does not exist yet
    os.makedirs(output_dirname, exist_ok=True)

    for image in images_list:
        # split the filename into its name and extension
        name, extension = os.path.splitext(image)

        # skip anything that is not a JPEG
        if extension.lower() not in ('.jpg', '.jpeg'):
            continue

        # os.path.join builds a path that works on any operating system
        original = Image.open(os.path.join(dirname, image))

        # resize image to a standard size to reduce the file size
        size = (1000, 1000)
        # thumbnail maintains the aspect ratio and resizes in place
        original.thumbnail(size)

        # save image in PNG format
        original.save(os.path.join(output_dirname, f'{name}.png'))
The script can be run from the terminal as python image_converter.py. It should automatically convert the images and save them in the generated folder.
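 
To get a feel for what "maintains aspect ratio" means for the resizing step, here is a rough sketch of the arithmetic. The helper fit_within is a hypothetical name of my own, not part of Pillow, and Pillow's own rounding may differ by a pixel in some cases, so treat it as an illustration only.

```python
def fit_within(image_size, max_size):
    """Return the largest size that fits inside max_size with the same aspect ratio.

    Illustrates the idea behind Image.thumbnail: shrink only, never enlarge.
    """
    width, height = image_size
    max_width, max_height = max_size
    # the tighter of the two limits decides the scale; 1.0 means "already small enough"
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within((4000, 3000), (1000, 1000)))  # landscape photo gets scaled down
print(fit_within((800, 600), (1000, 1000)))    # already fits, so the size is unchanged
```

So a 4000 x 3000 photo ends up 1000 pixels wide and 750 pixels high rather than being squashed into a square.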
 
The second script I created is a grayscale converter that converts all the images to black and white. Pillow can apply a lot of filters and conversions to images, grayscale being one of them.
 
grayscale_converter.py
    import os
    from PIL import Image

    # source and output folders
    dirname = 'images'
    output_dirname = 'greyscale'
    images_list = os.listdir(dirname)

    # create the output folder if it does not exist yet
    os.makedirs(output_dirname, exist_ok=True)

    for image in images_list:
        original = Image.open(os.path.join(dirname, image))

        # resize image to a standard size to reduce the file size
        size = (1000, 1000)
        # thumbnail maintains the aspect ratio
        original.thumbnail(size)

        # convert the image to greyscale
        grayscale_image = original.convert('L')  # L mode means greyscale
        # keep the original filename and format
        grayscale_image.save(os.path.join(output_dirname, image))
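 
What does converting to L mode actually do to a pixel? According to the Pillow docs, the conversion uses the ITU-R 601-2 luma transform, L = R * 299/1000 + G * 587/1000 + B * 114/1000. Below is a small sketch of that formula for a single pixel. The function to_luma is my own illustration, and Pillow's internal rounding may differ by one grey level.

```python
def to_luma(r, g, b):
    """Approximate grey level (0-255) for one RGB pixel using the ITU-R 601-2 weights."""
    # green contributes the most and blue the least, roughly matching how bright we see them
    return (r * 299 + g * 587 + b * 114) // 1000


print(to_luma(255, 255, 255))  # white stays at the maximum level
print(to_luma(255, 0, 0))      # pure red becomes a fairly dark grey
print(to_luma(0, 255, 0))      # pure green becomes a much lighter grey
```

This is why a bright green area looks lighter than an equally saturated red or blue area after the conversion.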
Finally, I created another image processing script to apply a logo on all the images. This uses the technique of merging images. This can be pretty useful if we have to apply branding to images. I added a logo.png image file to the root directory.
 
brand_stamp.py
    import os
    from PIL import Image

    # source and output folders
    dirname = 'images'
    output_dirname = 'branded'
    images_list = os.listdir(dirname)
    # convert to RGBA so the logo always has an alpha channel to use as a mask
    logo = Image.open('logo.png').convert('RGBA')

    # create the output folder if it does not exist yet
    os.makedirs(output_dirname, exist_ok=True)

    for image in images_list:
        # split the filename into its name and extension
        name, extension = os.path.splitext(image)

        original = Image.open(os.path.join(dirname, image))

        # resize image to a standard size to reduce the file size
        size = (1000, 1000)
        # thumbnail maintains the aspect ratio
        original.thumbnail(size)

        # create a copy of the image
        image_copy = original.copy()

        # place the logo in the bottom-right corner
        position = (image_copy.width - logo.width,
                    image_copy.height - logo.height)
        # the third argument is a mask: the logo's alpha channel
        # keeps its transparent areas transparent
        image_copy.paste(logo, position, logo)
        image_copy.save(os.path.join(output_dirname, f'{name}.png'))
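 
The position calculation in brand_stamp.py pins the logo flush against the bottom-right corner. If a little breathing room is wanted, the same arithmetic can be wrapped in a small helper. This is only a sketch: bottom_right and its margin parameter are names I made up for illustration, not something Pillow provides.

```python
def bottom_right(image_size, logo_size, margin=0):
    """Top-left coordinate at which to paste the logo so it sits in the bottom-right corner."""
    image_width, image_height = image_size
    logo_width, logo_height = logo_size
    # clamp at 0 so an oversized logo starts at the edge instead of a negative offset
    x = max(image_width - logo_width - margin, 0)
    y = max(image_height - logo_height - margin, 0)
    return x, y


print(bottom_right((1000, 750), (200, 100)))             # flush with the corner
print(bottom_right((1000, 750), (200, 100), margin=20))  # 20 pixels away from both edges
```

The returned tuple can be passed as the second argument to paste in place of position.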
That was quite cool stuff! And this is just scratching the surface of image processing. I suppose this is a nice starting point to explore further in the future while creating projects.
 
Here are some cool resources that I found interesting related to image processing in Python
  • https://auth0.com/blog/image-processing-in-python-with-pillow/
  • https://opensource.com/article/19/3/python-image-manipulation-tools
  • https://stackabuse.com/introduction-to-image-processing-in-python-with-opencv/
  • https://towardsdatascience.com/image-manipulation-tools-for-python-6eb0908ed61f
  • https://github.com/shekkizh/ImageProcessingProjects

Processing PDFs

 
Apart from playing around with images, I also explored manipulating PDF files and the basics of processing PDF files based on some practical use cases. PDFs are one of the most widely used file formats and can store a wide variety of data.
 
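The library I used is PyPDF2 (https://pypi.org/project/PyPDF2/), a very popular library I found on PyPI. It can be installed using the command pip install PyPDF2. Note that the scripts below use the PdfFileReader and PdfFileWriter names, which were removed in PyPDF2 3.0, so they need an older release (for example pip install "PyPDF2<3").
 
I added a sample PDF file to the pdfs directory.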
 
The first script I created is mainly to extract information from a PDF file such as its author, page count, subject, title etc.
 
info_extractor.py
    from PyPDF2 import PdfFileReader

    def extract_information(pdf_path):
        with open(pdf_path, 'rb') as f:
            pdf = PdfFileReader(f)
            information = pdf.getDocumentInfo()
            number_of_pages = pdf.getNumPages()

        txt = f"""
        Information about {pdf_path}:

        Author: {information.author}
        Creator: {information.creator}
        Producer: {information.producer}
        Subject: {information.subject}
        Title: {information.title}
        Number of pages: {number_of_pages}
        """

        print(txt)
        return information

    if __name__ == '__main__':
        path = 'pdfs/sample1.pdf'
        extract_information(path)
The script can be run using python info_extractor.py. It should successfully print all the necessary information about the PDF file.
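 
On PyPDF2 3.x the same information is exposed under different names: PdfReader instead of PdfFileReader, reader.metadata instead of getDocumentInfo() and len(reader.pages) instead of getNumPages(). Here is a hedged sketch of the extractor with those names. The format_info helper and the "not set" placeholder are my own additions to handle PDFs that have no metadata, and the library is imported inside the function so the formatting part can be tried without PyPDF2 installed.

```python
def format_info(pdf_path, metadata, number_of_pages):
    """Build the report text; metadata may be None or lack some fields."""
    lines = [f"Information about {pdf_path}:"]
    for field in ("author", "creator", "producer", "subject", "title"):
        value = getattr(metadata, field, None)
        lines.append(f"{field.capitalize()}: {value or 'not set'}")
    lines.append(f"Number of pages: {number_of_pages}")
    return "\n".join(lines)


def extract_information(pdf_path):
    # Assumption: PyPDF2 3.x is installed, where PdfReader replaces PdfFileReader.
    from PyPDF2 import PdfReader

    reader = PdfReader(pdf_path)
    # reader.metadata can be None when the PDF carries no document information
    return format_info(pdf_path, reader.metadata, len(reader.pages))
```

Calling print(extract_information('pdfs/sample1.pdf')) should then print a report similar to the one from the script above.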
 
Lastly, I worked on another script to add the branding logo to PDFs as a watermark. For that, I created a separate, otherwise blank PDF that contains only the logo as a watermark. This can then be merged with the PDF file to be processed. Creating watermarked PDFs is quite a common requirement and automating this task might be pretty useful.
 
pdf_watermarker.py
    from PyPDF2 import PdfFileWriter, PdfFileReader

    def create_watermark(input_pdf, output, watermark):
        # the watermark PDF has a single page that holds only the logo
        watermark_obj = PdfFileReader(watermark)
        watermark_page = watermark_obj.getPage(0)

        pdf_reader = PdfFileReader(input_pdf)
        pdf_writer = PdfFileWriter()

        # Watermark all the pages
        for page_number in range(pdf_reader.getNumPages()):
            page = pdf_reader.getPage(page_number)
            page.mergePage(watermark_page)
            pdf_writer.addPage(page)

        with open(output, 'wb') as out:
            pdf_writer.write(out)

    if __name__ == '__main__':
        create_watermark(
            input_pdf='pdfs/sample1.pdf',
            output='pdfs/watermarked_sample.pdf',
            watermark='pdfs/watermark.pdf')
On running python pdf_watermarker.py, it should generate the watermarked PDF file.
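 
The script above handles a single file. To watermark every PDF in the folder, one possible approach is to first work out which files to process and then loop over them. The sketch below assumes PyPDF2 3.x names (PdfReader, PdfWriter, merge_page, add_page). The helpers plan_outputs and watermark_all, as well as the watermarked_ prefix, are hypothetical choices of mine.

```python
import os


def plan_outputs(filenames, watermark_name="watermark.pdf", prefix="watermarked_"):
    """Pair each source PDF name with its output name.

    Skips non-PDF files, the watermark itself and files produced by an earlier run.
    """
    pairs = []
    for name in sorted(filenames):
        if not name.lower().endswith(".pdf"):
            continue
        if name == watermark_name or name.startswith(prefix):
            continue
        pairs.append((name, prefix + name))
    return pairs


def watermark_all(pdf_dir, watermark_name="watermark.pdf"):
    # Assumption: PyPDF2 3.x is installed; imported here so plan_outputs works without it.
    from PyPDF2 import PdfReader, PdfWriter

    watermark_page = PdfReader(os.path.join(pdf_dir, watermark_name)).pages[0]
    for source, target in plan_outputs(os.listdir(pdf_dir), watermark_name):
        writer = PdfWriter()
        for page in PdfReader(os.path.join(pdf_dir, source)).pages:
            # draw the watermark page on top of the existing page content
            page.merge_page(watermark_page)
            writer.add_page(page)
        with open(os.path.join(pdf_dir, target), "wb") as out:
            writer.write(out)
```

Running watermark_all('pdfs') would then write a watermarked_ copy next to each source PDF and leave the originals untouched.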
 
There are a lot of things that can be done with PDFs. However, I simply decided to go through the basics to get myself familiar with the process. I am linking some great resources for a deeper dive into PDF processing.
 
Here are some references for processing PDFs in Python
  • https://realpython.com/pdf-python/
  • https://towardsdatascience.com/pdf-preprocessing-with-python-19829752af9f
  • https://www.geeksforgeeks.org/working-with-pdf-files-in-python/
  • https://automatetheboringstuff.com/chapter13/
  • https://medium.com/@umerfarooq_26378/python-for-pdf-ef0fac2808b0
All the associated code can be found in this GitHub repo.
 
That’s all for today. Tomorrow I will be exploring more scripting, such as building automated bots for Twitter, sending emails and other cool stuff.
 
Have a nice one!