Learn About Python GCP

In this article, you will learn how to run a Python Cloud Dataflow pipeline on the Google Cloud Platform.

Overview

 
This article covers how to set up a Python development environment, install the Cloud Dataflow SDK for Python, and run an example pipeline on the Google Cloud Platform.
 
Requirements
  • Google Cloud Platform account

Steps

 
Step 1 - Activate Cloud Shell
 
In the GCP Console, click the Activate Cloud Shell icon in the top toolbar.
 
Then click Continue.
 
Step 2 - Create a Cloud Storage bucket
  • In the GCP Console, click Cloud Storage.
  • Click Create Bucket.
  • In the Create Bucket dialog, specify the following attributes:
    Name: a publicly visible, globally unique name
    Storage class: Multi-Regional
    Location: where bucket data will be stored
  • Click Create.
Step 3 - Install pip and the Cloud Dataflow SDK
 
The Cloud Dataflow SDK for Python requires Python version 2.7. Check the version with the following Cloud Shell commands.
 
Check the Python version
 
python --version
 
Check the pip version.
 
pip --version
 
The pip version should be 7.0.0 or newer. To update pip, run the command
 
sudo pip install -U pip
 
If you do not have virtualenv version 13.1.0 or newer, install it by running
 
sudo pip install --upgrade virtualenv
 
A virtual Python environment is its own isolated Python distribution. To create a virtual environment, run the command
 
virtualenv -p python env
 
Then, to activate the virtual environment in Bash, run the command
 
source env/bin/activate
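The environment-setup commands above can be sketched end to end. This assumes a modern system where the stdlib venv module is available; it plays the same role as virtualenv (a sketch, not the exact Python 2.7 setup the SDK required):

```shell
# Create an isolated Python environment (stdlib venv stands in for
# virtualenv here; the directory name "env" matches the article)
python3 -m venv env

# Activate it in the current Bash shell
. env/bin/activate

# Inside the environment, python and pip now resolve to env/bin
which python

# Leave the environment when done
deactivate
```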
 
Step 4 - Install Apache Beam
 
Install the latest version of Apache beam.
 
pip install apache-beam[gcp]
 
Step 5 - Run wordcount locally
 
Run the wordcount.py example locally with the following command.
 
python -m apache_beam.examples.wordcount --output OUTPUT_FILE
 
The command prints pipeline log messages as it runs.
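What the wordcount pipeline computes can be sketched in plain Python (a simplified illustration of the transformation, not the Beam implementation; the sample text is made up):

```python
import collections
import re

# Tokenize a line of text and count word occurrences, the same
# transformation the Beam wordcount example applies to its input
text = "the quick brown fox jumps over the lazy dog"
words = re.findall(r"[A-Za-z']+", text)
counts = collections.Counter(words)

# The real pipeline writes one "word: count" line per word
for word, n in sorted(counts.items()):
    print("%s: %d" % (word, n))
```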
 
Step 6 - Inspect the output
 
List the files in your Cloud Shell environment to find the name of the OUTPUT_FILE.
 
ls
 
Copy the name of the OUTPUT_FILE and view its contents.
 
cat <OUTPUT_FILE>
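Each line of the output file has the form `word: count`. A small sketch of reading such results back in Python, using hypothetical sample lines:

```python
# Hypothetical sample of wordcount output lines; a real OUTPUT_FILE
# would be read with open(path) instead of this in-memory list
sample = ["king: 311", "lear: 230", "the: 786"]

counts = {}
for line in sample:
    word, _, count = line.partition(": ")
    counts[word] = int(count)

print(counts["the"])  # 786
```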
 
Step 7 - Set the bucket variable
 
To run the example pipeline remotely, first point a BUCKET variable at the bucket you created earlier.
 
BUCKET=gs://<bucket name provided earlier>
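The staging and temp paths used in the next step are derived from this variable. A quick sketch with a hypothetical bucket name:

```shell
# Hypothetical bucket name; substitute the bucket created in Step 2
BUCKET=gs://my-dataflow-bucket

# The remote run builds its Cloud Storage paths from the variable
echo "$BUCKET/staging"   # gs://my-dataflow-bucket/staging
echo "$BUCKET/temp"      # gs://my-dataflow-bucket/temp
```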
 
Step 8 - Run wordcount remotely
 
Run the wordcount.py example remotely.
 
python -m apache_beam.examples.wordcount --project $DEVSHELL_PROJECT_ID \
  --runner DataflowRunner \
  --staging_location $BUCKET/staging \
  --temp_location $BUCKET/temp \
  --output $BUCKET/results/output
 
Output:
 
JOB_MESSAGE_DETAILED: Workers have started successfully.
 
Check whether your job has succeeded.
 
Click Navigation Menu->Cloud Dataflow
 
You should see your wordcount job with a status of Running.
 
Click Navigation Menu->Storage
 
Click on the name of your bucket. In your bucket, you should see the results and staging directories.
 
Then, click on the results folder and you should see the output files that your job created.
 
 

Conclusion

 
That’s all. We set up a Python environment with the Cloud Dataflow SDK and ran the wordcount example both locally and on the Google Cloud Platform. I hope this helped you understand how to create a Python Dataflow environment on GCP.