Netflix Announces Open Sourcing Of Metaflow

Metaflow is a human-centric framework for data science from Netflix.

Netflix recently announced that it has open-sourced Metaflow, the framework its data scientists use to build and manage data science projects.
 
For the last two years, Netflix has been using Metaflow internally to build and manage hundreds of data science projects, from NLP to operations research. By design, Metaflow is a deceptively simple Python library.
 
The company said that its data warehouse contains hundreds of petabytes of data, and that Metaflow is designed to leverage the elasticity of the cloud for both compute and storage. For the open-source release, Netflix partnered with AWS to provide seamless integration between Metaflow and various AWS services.
 
[Image: Netflix Metaflow. Source: Netflix]
 
Metaflow allows you to structure your workflow as a directed acyclic graph (DAG) of steps, as depicted above. Each step can contain arbitrary Python code. In the hypothetical example above, the flow trains two versions of a model in parallel and chooses the one with the higher score.
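The branch-and-join shape described here can be sketched in plain Python. This is only an illustration of the idea, not Metaflow's own API: the function names, the toy dataset, and the use of a thread pool for the parallel branches are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def start():
    # Hypothetical toy dataset: points lying exactly on y = 2x + 1.
    return [(x, 2 * x + 1) for x in range(10)]


def fit_a(data):
    # Candidate A: least-squares line forced through the origin.
    slope = sum(x * y for x, y in data) / sum(x * x for x, _ in data)
    return {"name": "A", "slope": slope, "intercept": 0.0}


def fit_b(data):
    # Candidate B: ordinary least squares with an intercept.
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in data)
    var = sum((x - mean_x) ** 2 for x, _ in data)
    slope = cov / var
    return {"name": "B", "slope": slope, "intercept": mean_y - slope * mean_x}


def score(model, data):
    # Negative mean squared error, so a higher score means a better fit.
    sq_err = sum(
        (model["slope"] * x + model["intercept"] - y) ** 2 for x, y in data
    )
    return -sq_err / len(data)


def run_flow():
    data = start()
    # Branch: fit both candidates in parallel.
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(fit, data) for fit in (fit_a, fit_b)]
        models = [future.result() for future in futures]
    # Join: keep the candidate with the higher score.
    return max(models, key=lambda model: score(model, data))
```

Calling `run_flow()` returns the dictionary for whichever candidate scores higher. In a real Metaflow flow, the framework itself would handle the scheduling of the branches.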
 
In this example, data and models are stored as normal Python instance variables, yet they remain available even if the code is executed on a distributed compute platform. Metaflow supports this by default through its built-in content-addressed artifact store.
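As a rough illustration of what "content-addressed" means, the sketch below stores each object under a key derived from a hash of its serialized bytes. It is a toy in-memory version written for this article, and the class name, the use of pickle, and the choice of SHA-256 are assumptions; it does not describe how Metaflow's store is actually implemented.

```python
import hashlib
import pickle


class ArtifactStore:
    """Toy in-memory store whose keys are hashes of the stored content."""

    def __init__(self):
        self._blobs = {}

    def put(self, obj):
        # Serialize the object and use the digest of the bytes as its key.
        blob = pickle.dumps(obj)
        key = hashlib.sha256(blob).hexdigest()
        # Identical content maps to the same key, so it is stored only once.
        self._blobs.setdefault(key, blob)
        return key

    def get(self, key):
        # Any step that knows the key can load the artifact back.
        return pickle.loads(self._blobs[key])

    def __len__(self):
        return len(self._blobs)


store = ArtifactStore()
key_1 = store.put({"weights": [0.1, 0.2]})
key_2 = store.put({"weights": [0.1, 0.2]})  # same content, same key
key_3 = store.put({"weights": [0.3, 0.4]})  # different content, new key
```

Because the key depends only on the content, saving the same artifact twice does not duplicate it, and a step running elsewhere needs only the key to retrieve the value.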
 
For a full overview of Metaflow's features, visit the documentation at docs.metaflow.org.