How To Clone Git Repository In AWS Notebook Instances

In this article, we’ll learn how to clone a repository from GitHub with git clone so that we can run it on a notebook instance created in AWS. This is useful for Machine Learning Engineers and Data Scientists who want to explore notebooks from other creators in a managed cloud environment, without installing every package and library on their local machine and without dealing with dependency complications and resource limitations.

Amazon Web Services provides a wide range of features for Machine Learning, with Amazon SageMaker at the forefront. We learned to create notebook instances in our last article, Creating Notebook Instance in Amazon SageMaker, and discussed Amazon SageMaker in detail in the previous article, Introduction to AWS SageMaker. The notebook is the primary tool for interacting with the SageMaker ecosystem; there are other ways to reach SageMaker’s functionality, but this approach is widely used. By the end of this article, we’ll be able to clone public repositories from GitHub and explore them with ease in a notebook instance in Amazon SageMaker.

Step 1 - Create Notebook Instance in Amazon SageMaker

Follow the instructions from the previous article, Creating Notebook Instance in Amazon SageMaker, to get a notebook instance running. Here, our notebook instance, ojash-deployment-notebook, is running.

Step 2

Click on Open Jupyter.

Step 3

You’ll be taken to the Jupyter home page, with the Files tab selected.

Step 4

Click New and choose Terminal, listed at the bottom of the menu.

Step 5

You are now taken to the terminal, a black screen showing the sh-4.2$ prompt.

Step 6

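Change the directory to SageMaker with the command below. Jupyter’s file browser is rooted at this directory, so anything cloned here shows up on the Jupyter home page.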

cd SageMaker
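If you want the directory change to fail loudly when the folder is missing, a small guard can wrap it. This is only a sketch: the function name enter_workdir is hypothetical, and the default path assumes the terminal opens in a home directory that contains the SageMaker folder, as the cd command above implies.

```shell
# enter_workdir is a hypothetical helper, not part of SageMaker or Jupyter.
# It changes into the given directory only if that directory exists.
enter_workdir() {
  workdir="${1:-$HOME/SageMaker}"   # assumed default location
  if [ -d "$workdir" ]; then
    cd "$workdir" && pwd            # print where we ended up
  else
    echo "not found: $workdir" >&2
    return 1
  fi
}

# Usage: fall back to a hint when the folder is absent
enter_workdir "$HOME/SageMaker" || echo "Run this inside the notebook instance terminal."
```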

Step 7

Clone the git repo with the command git clone followed by the repo’s HTTPS link.

Here, I’m cloning a repo from my GitHub: an app written in Python that uses pandas and plotly to help visualize the real-time location of the International Space Station.

git clone https://github.com/ojashshrestha1/international-space-station-realtime-location.git

Note that complex Machine Learning applications that depend on numerous large libraries, many of which are provided by the SageMaker service itself, can be cloned into Amazon SageMaker with the same procedure. Thereafter, you can train the models and test them on the platform itself without much worry about library installation complexities.
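Running git clone a second time in the same directory fails because the target folder already exists. The clone step can be made safe to re-run with a small wrapper, sketched below. The function names repo_dir_name and clone_once are hypothetical; the sketch relies on git’s default behavior of naming the folder after the last part of the URL with the .git suffix removed.

```shell
# Hypothetical helpers for illustration; only git clone and basename are real commands.

# Derive the folder name git clone creates by default from a repo URL.
repo_dir_name() {
  basename "$1" .git
}

# Clone the repo unless a folder with that name is already present.
clone_once() {
  repo_url="$1"
  repo_dir="$(repo_dir_name "$repo_url")"
  if [ -d "$repo_dir" ]; then
    echo "skip: $repo_dir already exists"
  else
    git clone "$repo_url" "$repo_dir"
  fi
}
```

From inside the SageMaker directory, you would call clone_once with the same HTTPS link used in this step.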

Step 8

Git will unpack the repository and print its progress in the terminal, starting with a Cloning into 'international-space-station-realtime-location'... line.

Step 9

Once you are done with the cloning, you can close the terminal using the command exit.

Step 10

The [CLOSED] message confirms that the terminal has now been closed.

Step 11

You can now close the tab and return to the Jupyter home page. The new directory, international-space-station-realtime-location, can now be viewed and explored.

As you can see, the files of the GitHub repo have been properly cloned.
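The same check can be done from a terminal instead of the file browser. The sketch below treats the presence of a .git folder as evidence of a completed clone and then lists the files. The function name verify_clone is hypothetical, and the path in the usage line assumes the repo was cloned into the SageMaker directory under the home directory, as in Step 6 and Step 7.

```shell
# verify_clone is a hypothetical helper for illustration.
# It reports whether the given folder looks like a cloned git repository.
verify_clone() {
  clone_dir="$1"
  if [ -d "$clone_dir/.git" ]; then
    echo "ok: $clone_dir is a git working tree"
    ls -A "$clone_dir"              # list the cloned files, including dotfiles
  else
    echo "missing: $clone_dir has no .git directory" >&2
    return 1
  fi
}

# Usage (assumed path): print a hint if the clone is not there
verify_clone "$HOME/SageMaker/international-space-station-realtime-location" \
  || echo "Clone not found; repeat Step 7."
```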

Conclusion

In this article, we learned how to clone a GitHub repository into a notebook instance in Amazon SageMaker. Amazon Web Services supports Machine Learning work end to end, from building and training to deploying. Cloning into the notebook instance helps machine learning engineers and data scientists explore the enormous number of machine learning projects with ease, extend that work, contribute to open-source projects, and learn and grow in this domain.