Google Announces New TensorFlow Runtime: TFRT

TFRT will replace the existing TensorFlow runtime.

Recently, Google announced TFRT, a new runtime that will replace the existing TensorFlow runtime. TFRT aims to offer a unified, extensible infrastructure layer with best-in-class performance across a wide variety of domain-specific hardware.
According to Google, TFRT makes efficient use of multithreaded host CPUs, supports fully asynchronous programming models, and focuses on low-level efficiency. It provides efficient execution of kernels (low-level device-specific primitives) on targeted hardware.
Google said that TFRT plays a critical role in both eager and graph execution.
Though the existing TensorFlow runtime was initially built for graph execution and training workloads, the new runtime will make eager execution and inference first-class citizens, while putting special emphasis on architecture extensibility and modularity.
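The distinction between the two execution modes can be sketched in plain Python. This is a toy illustration of the general idea, not TFRT's or TensorFlow's actual API: in eager mode an op runs immediately and returns a value, while in graph mode ops are recorded into a program that the runtime can inspect and optimize before executing it, possibly many times with different inputs.

```python
from typing import Callable, Dict, List

# Toy contrast of eager vs. graph execution (illustration only;
# not TFRT's real interface).

def eager_add(a: float, b: float) -> float:
    """Eager style: the op executes immediately and returns a concrete value."""
    return a + b

class Graph:
    """Graph style: ops are recorded first, then executed as a whole program."""
    def __init__(self) -> None:
        self.ops: List[Callable[[Dict[str, float]], None]] = []

    def add(self, out: str, x: str, y: str) -> None:
        # Record the op instead of running it right away.
        self.ops.append(lambda env: env.__setitem__(out, env[x] + env[y]))

    def run(self, feeds: Dict[str, float]) -> Dict[str, float]:
        # The runtime sees the whole program up front, so it could
        # reorder or optimize ops before execution.
        env = dict(feeds)
        for op in self.ops:
            op(env)
        return env

# Eager: immediate result.
print(eager_add(2.0, 3.0))                 # 5.0

# Graph: build once, run repeatedly with different inputs.
g = Graph()
g.add("z", "x", "y")
print(g.run({"x": 2.0, "y": 3.0})["z"])    # 5.0
print(g.run({"x": 10.0, "y": 1.0})["z"])   # 11.0
```

The graph form is what makes whole-program optimization possible; the eager form is what makes interactive development and inference-style workloads convenient, which is why TFRT treats both as first-class.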
Design highlights of TFRT include:
  • For higher performance, TFRT comes with a lock-free graph executor that supports concurrent op execution with low synchronization overhead, and a thin eager op dispatch stack so that eager API calls will be asynchronous and more efficient.
  • In order to make extending the TF stack easier, Google has decoupled device runtimes from the host runtime, the core TFRT component that drives host CPU and I/O work.
  • For consistent behavior, TFRT takes advantage of common abstractions, such as shape functions and kernels, across both eager and graph.
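The asynchronous, dependency-driven execution the first bullet describes can be sketched with standard-library futures. This is a conceptual illustration of the dataflow model, not TFRT's implementation (which is lock-free C++): each op is submitted as soon as it is defined, independent ops may run concurrently, and a dependent op blocks only on its own inputs rather than on a global lock.

```python
import concurrent.futures

# Conceptual sketch of asynchronous op dispatch (illustration only).
# Independent ops can execute in parallel; a dependent op waits
# only for the futures it consumes.

def run_graph() -> int:
    with concurrent.futures.ThreadPoolExecutor() as pool:
        # Two independent ops: the executor may run these concurrently.
        a = pool.submit(lambda: 2 * 3)     # -> 6
        b = pool.submit(lambda: 10 + 4)    # -> 14
        # Dependent op: synchronizes only on its own inputs a and b.
        c = pool.submit(lambda: a.result() + b.result())
        return c.result()

print(run_graph())  # 20
```

Keeping synchronization local to each op's inputs, rather than serializing the whole graph, is the property that lets a runtime exploit multithreaded host CPUs with low overhead.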
The new runtime is also tightly integrated with MLIR. It makes use of MLIR's compiler infrastructure to generate an optimized, target-specific representation of the computational graph that the runtime executes. It also uses MLIR's extensible type system to support arbitrary C++ types in the runtime, which removes tensor-specific limitations.