30 Days of Python 👨‍💻 - Day 18 - File I/O

This article is part of a 30-day Python challenge series. You can find the links to all the previous posts of this series here.
Today I explored how to work with files in Python. So far in this series, I have been exploring and sharing various Python concepts along with some best practices. However, we haven’t yet interacted with the world outside our Python programs. Programs often need to communicate with the external world for various reasons, such as reading data from Excel, CSV or PDF files, converting and compressing images, extracting data from text files, reading data from a database and countless other things. This interaction with the external world is done using I/O, or input/output, operations.
Files help us store data permanently. When a program manages some data, that data lives temporarily in the RAM of the machine and is lost when the program exits or the computer is turned off. To keep data permanently, it needs to be stored in a database or in a file system so that it can be accessed later.
Files can be broadly classified by their content into two types:
  • Binary (images, audio, executables and other non-text data)
  • Text
If you are interested in knowing more about these two file types, here is a great article to check out.
Python provides a built-in function, open, to open any file. A file must be opened before data can be read from it or written to it. Reading data from a file is simple in Python.
I used REPL as the playground to experiment with all the code-blocks provided in this article.

Opening Files

I created a test.txt file with some dummy content for testing.
    # test.txt
    I am learning python.
Now the contents of this file can be read using Python like this.
    content = open('test.txt')
    output = content.read()
    print(output)  # I am learning python.
We can also specify the mode while opening the file in the open function. By default, the mode is ‘r’ or read mode. We can also specify if the file needs to be opened in a text or binary mode.
    Mode  Description
    r     Opens a file for reading. (default)
    w     Opens a file for writing. Creates a new file if it does not exist or truncates the file if it exists.
    x     Opens a file for exclusive creation. If the file already exists, the operation fails.
    a     Opens a file for appending at the end of the file without truncating it. Creates a new file if it does not exist.
    t     Opens in text mode. (default)
    b     Opens in binary mode.
    +     Opens a file for updating (reading and writing).
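We can also specify the encoding while opening a file in text mode. If no encoding is given, Python falls back to a platform-dependent default (often UTF-8 on Linux and macOS, but not always on Windows), so it is safer to pass encoding='utf-8' explicitly.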
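To get a feel for the modes, here is a small sketch I put together. It assumes a throwaway temporary directory and a hypothetical file name (modes_demo.txt), neither of which is part of the original example. It shows 'x' refusing to overwrite an existing file, and the difference between reading the same file in text and binary mode.

```python
import os
import tempfile

# Work in a throwaway directory so the sketch does not touch real files.
with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, 'modes_demo.txt')  # hypothetical file name

    # 'x' creates the file and fails if it already exists.
    with open(path, mode='x', encoding='utf-8') as f:
        f.write('café')

    try:
        with open(path, mode='x', encoding='utf-8') as f:
            f.write('this is never written')
        created_twice = True
    except FileExistsError:
        created_twice = False

    # Text mode ('t', the default) decodes the bytes into a str.
    with open(path, mode='rt', encoding='utf-8') as f:
        as_text = f.read()

    # Binary mode ('b') returns the raw bytes; no encoding is applied.
    with open(path, mode='rb') as f:
        as_bytes = f.read()

print(created_twice)                # False
print(len(as_text), len(as_bytes))  # 4 5
```

The text has 4 characters but the file holds 5 bytes, because the accented letter takes two bytes in UTF-8. That gap is exactly what the encoding argument controls.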

Closing Files

It is important to close the file after performing operations on it. Closing releases the system resources (such as the file handle) associated with the file and makes sure any buffered data is actually written to disk.
    content = open('test.txt', mode='r')
    output = content.read()
    print(output)
    content.close()
The above block can be placed inside a try-except-finally block. This ensures that the file is closed even if an error occurs while performing the operation.
    content = None
    try:
        content = open('test.txt', mode='r')
        output = content.read()
        print(output)
    except FileNotFoundError as error:
        print(f'file not found {error}')
    finally:
        # content stays None if open() itself failed
        if content is not None:
            content.close()
Python provides a better syntax to open a file and perform operations on it: the with statement. It automatically closes the file once the block finishes, even if an error is raised inside it.
    with open('test.txt', mode='r') as content:
        output = content.read()
        print(output)  # I am learning python.
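To convince myself that with really closes the file, I tried a quick check using the closed attribute of the file object. This is only a sketch: the temporary directory and the simulated ValueError are my own additions for the demonstration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, 'test.txt')
    with open(path, mode='w', encoding='utf-8') as f:
        f.write('I am learning python.')

    # Normal case: the file is closed as soon as the block ends.
    with open(path, mode='r', encoding='utf-8') as content:
        output = content.read()
    closed_after_with = content.closed

    # Error case: the file is still closed when the block raises.
    try:
        with open(path, mode='r', encoding='utf-8') as content:
            raise ValueError('simulated failure')
    except ValueError:
        closed_after_error = content.closed

print(output)                                 # I am learning python.
print(closed_after_with, closed_after_error)  # True True
```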

Writing to Files

Python provides the write method to write data to a file. The file needs to be opened in w mode to write to it. Note that w mode overwrites the existing content of the file. If the content needs to be appended instead, the a mode can be used. In both modes, if the file does not exist, it is created before data is written to it.
    with open('test.txt', mode='w', encoding='utf-8') as my_file:
        my_file.write('This is the first line\n')  # \n is for creating a newline
        my_file.write('This is the second line\n')
        my_file.write('This is the third line')

    with open('test.txt', mode='a', encoding='utf-8') as my_file:
        my_file.write('This text will be appended')
Another way to write is by using the writelines method. It accepts a list (or any iterable) of strings and writes them exactly as they are, without adding newlines between them.
    with open('test.txt', mode='w', encoding='utf-8') as my_file:
        my_file.writelines(['First line', '\n', 'Second Line'])
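One thing that tripped me up is that writelines does not add line breaks on its own. The sketch below, which uses a temporary directory and a hypothetical file name of my choosing, compares writing a plain list with writing the same items with '\n' appended to each.

```python
import os
import tempfile

lines = ['First line', 'Second line', 'Third line']

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, 'lines_demo.txt')  # hypothetical file name

    # writelines writes the strings back to back, with no separator.
    with open(path, mode='w', encoding='utf-8') as my_file:
        my_file.writelines(lines)
    with open(path, mode='r', encoding='utf-8') as my_file:
        without_newlines = my_file.read()

    # Adding '\n' to each item gives one item per line.
    with open(path, mode='w', encoding='utf-8') as my_file:
        my_file.writelines(line + '\n' for line in lines)
    with open(path, mode='r', encoding='utf-8') as my_file:
        with_newlines = my_file.read()

print(repr(without_newlines))  # 'First lineSecond lineThird line'
print(repr(with_newlines))     # 'First line\nSecond line\nThird line\n'
```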

Reading from files

Python provides several methods to read from a file. The file needs to be opened in ‘r’ mode. There is also an ‘r+’ mode if we have to do read and write operations together. The read method accepts an optional size parameter, which is the maximum number of characters to read (bytes in binary mode). If size is not provided, it reads the entire file.
    with open('test.txt', mode='r', encoding='utf-8') as my_file:
        content = my_file.read()
        print(content)
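Here is a small sketch of how the size parameter behaves. I am assuming the same sentence used earlier as the file content, written to a temporary file so the example runs on its own. Each call continues from where the previous one stopped.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, 'test.txt')
    with open(path, mode='w', encoding='utf-8') as my_file:
        my_file.write('I am learning python.')

    with open(path, mode='r', encoding='utf-8') as my_file:
        first = my_file.read(4)   # the first 4 characters
        second = my_file.read(9)  # the next 9 characters
        rest = my_file.read()     # everything that is left
        at_end = my_file.read()   # nothing left to read

print(repr(first))   # 'I am'
print(repr(second))  # ' learning'
print(repr(rest))    # ' python.'
print(repr(at_end))  # ''
```

An empty string from read is how Python signals that the end of the file has been reached.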
The tell method returns the current position of the cursor in the file.
The seek method moves the cursor to a specific position in the file.
    with open('test.txt', mode='r', encoding='utf-8') as my_file:
        my_file.seek(0)  # brings cursor to beginning of file
        print(my_file.tell())  # prints location of cursor
        content = my_file.read()
        print(content)
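A pattern I found useful is to remember a position with tell and jump back to it later with seek. The sketch below assumes a temporary file with the same sentence as before. As far as I understand, in text mode the number returned by tell should be treated as an opaque bookmark rather than a character count, so I only pass it back to seek.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, 'test.txt')
    with open(path, mode='w', encoding='utf-8') as my_file:
        my_file.write('I am learning python.')

    with open(path, mode='r', encoding='utf-8') as my_file:
        start = my_file.tell()        # position right after opening
        first_part = my_file.read(5)  # moves the cursor forward
        bookmark = my_file.tell()     # remember where we are
        rest = my_file.read()         # read to the end

        my_file.seek(bookmark)        # jump back to the bookmark
        rest_again = my_file.read()

        my_file.seek(0)               # back to the very beginning
        everything = my_file.read()

print(start)             # 0
print(repr(first_part))  # 'I am '
print(repr(rest_again))  # 'learning python.'
print(everything)        # I am learning python.
```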
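If there are many lines in the file, a more memory-efficient way is to iterate over the file object in a loop. This reads one line at a time instead of loading the whole file into memory.
    with open('test.txt', mode='r', encoding='utf-8') as my_file:
        for line in my_file:
            print(line, end='')  # each line already ends with \n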
Alternatively, Python provides two other methods, readline and readlines.
readline reads the file until a newline (\n) is reached and returns that single line, including the newline.
readlines reads all the remaining lines and returns them as a list.
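To see the difference between the two, here is a minimal sketch. It assumes a temporary three-line file that I made up for the demonstration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, 'lines.txt')  # hypothetical file name
    with open(path, mode='w', encoding='utf-8') as my_file:
        my_file.write('first\nsecond\nthird\n')

    with open(path, mode='r', encoding='utf-8') as my_file:
        one_line = my_file.readline()     # reads up to and including '\n'
        other_lines = my_file.readlines() # the remaining lines as a list

    # A common idiom: strip the trailing newline from every line.
    with open(path, mode='r', encoding='utf-8') as my_file:
        cleaned = [line.rstrip('\n') for line in my_file]

print(repr(one_line))  # 'first\n'
print(other_lines)     # ['second\n', 'third\n']
print(cleaned)         # ['first', 'second', 'third']
```

Note that readlines continues from the current cursor position, which is why the first line is missing from its result here.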

Python file methods

Here is a list of the file methods available in Python:
    Method                         Description
    close()                        Closes an opened file. It has no effect if the file is already closed.
    detach()                       Separates the underlying binary buffer from the TextIOBase and returns it.
    fileno()                       Returns an integer number (file descriptor) of the file.
    flush()                        Flushes the write buffer of the file stream.
    isatty()                       Returns True if the file stream is interactive.
    read(n)                        Reads at most n characters from the file. Reads till end of file if n is negative or None.
    readable()                     Returns True if the file stream can be read from.
    readline(n=-1)                 Reads and returns one line from the file. Reads in at most n bytes/characters if specified.
    readlines(n=-1)                Reads and returns a list of lines from the file. Reads in at most n bytes/characters if specified.
    seek(offset, whence=SEEK_SET)  Changes the file position to offset, relative to whence (start, current position or end).
    seekable()                     Returns True if the file stream supports random access.
    tell()                         Returns the current file position.
    truncate(size=None)            Resizes the file stream to size bytes. If size is not specified, resizes to the current position.
    writable()                     Returns True if the file stream can be written to.
    write(s)                       Writes the string s to the file and returns the number of characters written.
    writelines(lines)              Writes a list of lines to the file.
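To try out a few of these methods together, I wrote the following sketch. It assumes a temporary file with a hypothetical name and uses the 'r+' mode so the same file object can be both read and modified.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, 'methods_demo.txt')  # hypothetical file name
    with open(path, mode='w', encoding='utf-8') as f:
        chars_written = f.write('Hello, world')  # write returns the character count

    # 'r+' opens an existing file for both reading and writing.
    with open(path, mode='r+', encoding='utf-8') as f:
        can_read = f.readable()
        can_write = f.writable()
        can_seek = f.seekable()
        f.truncate(5)  # keep only the first 5 bytes of the file
        f.seek(0)      # make sure we read from the beginning
        remaining = f.read()

    is_closed = f.closed

print(chars_written, can_read, can_write, can_seek)  # 12 True True True
print(remaining, is_closed)                          # Hello True
```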

A cool exercise

Let’s try building a translator program that can read a file with English content and create a new translated version of the file in a different language.
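For this exercise, we will use an external Python package from PyPI called translate. With the help of this package, we can translate text directly from a script. As far as I can tell, it relies on online translation providers, so an internet connection is needed.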
First, this package needs to be installed. Since I am using REPL, I will add it to the packages section in REPL. It can be installed using pip in the terminal, if using a local project.
I will create a file named quote.txt and fill it with an inspiring quote:
    If you can't make it good, at least make it look good. - Bill Gates
Now let’s generate two translated versions of this quote: one in Spanish with the filename quote-es.txt and another in French with the filename quote-fr.txt.
    from translate import Translator

    spanish_translate = Translator(to_lang="es")
    french_translate = Translator(to_lang="fr")

    try:
        with open('quote.txt', mode='r', encoding='utf-8') as quote_file:
            # read the file
            quote = quote_file.read()
            # do the translations
            quote_spanish = spanish_translate.translate(quote)
            quote_french = french_translate.translate(quote)
            # create the translated files
            try:
                with open('quote-es.txt', mode='w', encoding='utf-8') as quote_es:
                    quote_es.write(quote_spanish)
                with open('quote-fr.txt', mode='w', encoding='utf-8') as quote_fr:
                    quote_fr.write(quote_french)
            except IOError:
                print('An error occurred')
                raise  # re-raise with the original traceback
    except FileNotFoundError:
        print('File not found')
        raise
This will generate two translated files with the quote translated automatically. That was pretty cool!

Built-in module to handle files

Python provides a built-in module called pathlib as part of its standard library. It provides convenient classes representing file system paths, with semantics appropriate for different operating systems. This module was introduced in Python 3.4. It is especially useful when dealing with a lot of directories and paths.
Here are some resources related to the pathlib module with great explanations.
  • https://realpython.com/python-pathlib/
  • https://docs.python.org/3/library/pathlib.html
  • https://www.geeksforgeeks.org/pathlib-module-in-python/
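As a first taste of pathlib, here is a minimal sketch based on what I picked up from the documentation. The directory and file names are hypothetical, and everything happens inside a temporary directory.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as workdir:
    # The / operator joins path segments in an OS-independent way.
    notes_dir = Path(workdir) / 'notes'  # hypothetical directory name
    notes_dir.mkdir()
    quote_path = notes_dir / 'quote.txt'

    # write_text and read_text open, use and close the file in one call.
    quote_path.write_text('I am learning python.', encoding='utf-8')
    text = quote_path.read_text(encoding='utf-8')

    exists = quote_path.exists()
    name, stem, suffix = quote_path.name, quote_path.stem, quote_path.suffix

    # glob finds the files in a directory that match a pattern.
    txt_files = sorted(p.name for p in notes_dir.glob('*.txt'))

print(text)                # I am learning python.
print(exists)              # True
print(name, stem, suffix)  # quote.txt quote .txt
print(txt_files)           # ['quote.txt']
```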
I will be using the pathlib module explicitly in the upcoming days while building projects.
That’s all for today. Tomorrow I plan to explore working with regular expressions in Python and their use cases.
Have a great one!