Operator Annotations

Introduction

Cellulose has integrated NVIDIA TensorRT into the dashboard. For each operator in a machine learning model, it identifies whether that operator is compatible with, and convertible to, a particular TensorRT version.

This feature is only enabled for users on the Professional / Enterprise plan. You can read more about our pricing here.

How It Works

For example, if TensorRT version X.Y.Z is selected as the runtime target, we’ll add annotations to each operator in the ONNX graph that indicate whether it is convertible.

There are several ML framework entry points to TensorRT. This page covers ONNX, the only path we currently support. We plan to support other frameworks such as PyTorch (and Torch-TRT) in the future.
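The per-operator annotations described above can be pictured as a small record attached to every node. The following is a minimal sketch, not Cellulose's implementation: the `Annotation` class, the `annotate` helper, the node names, and the set of convertible operator types are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    """Hypothetical record attached to one graph node."""
    node_name: str
    op_type: str
    runtime: str
    convertible: bool


def annotate(nodes, convertible_ops, runtime):
    """Build one annotation per (name, op_type) pair.

    `convertible_ops` stands in for whatever verdict the runtime
    checker produced; here it is just a set of operator type names.
    """
    return [
        Annotation(name, op_type, runtime, op_type in convertible_ops)
        for name, op_type in nodes
    ]


# Made-up graph and made-up verdicts, for illustration only.
nodes = [("conv_0", "Conv"), ("reshape_0", "Reshape"), ("custom_0", "MyCustomOp")]
annotations = annotate(
    nodes,
    convertible_ops={"Conv", "Reshape"},
    runtime="TensorRT X.Y.Z",
)

for a in annotations:
    status = "convertible" if a.convertible else "not convertible"
    print(f"{a.node_name:<10} {a.op_type:<12} {a.runtime}: {status}")
```

Keeping the verdict as plain data on each node is what lets a visualizer draw a badge per operator without re-running the check.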

Selecting a Runtime and Version

Navigate to a tracked model and look for this runtime selector at the top right corner of the page.

Runtime selector at the top right of a model visualizer page

Pick a Runtime Type and Desired Precision. We’ll go with TensorRT 8.6.1 and FP16 respectively in this example.

You’ll note that many of the nodes now have a TRT badge with a green checkmark.

This means that these nodes are compatible with, and convertible to, TensorRT 8.6.1!

Phew! That’s a good sign. Now we know that the model can be converted to a TensorRT FP16 engine. Let’s try something else. What about INT8?

Go ahead and select INT8 in the runtime selector at the top right corner again.

Uh oh, it seems like many of these nodes won’t work. That Reshape node is the only convertible one in this subgraph. Note that INT8 precision also requires calibration dataset integration. We’ll cover this in detail under the quantization section.

Some nodes here may be marked convertible by TensorRT, but implicit downcasts are applied so that engines can be exported successfully. While this is fine for most workflows, we’d ideally know what has been done to the model before it is shipped as a production asset. Cellulose plans to fill this gap over time by providing even more insights than we already have here.
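To see why a silent downcast is worth knowing about, here is a small sketch of what rounding a value to FP16 does numerically. It uses only Python's standard `struct` module (the `e` format is IEEE 754 half precision) and does not reproduce TensorRT's behavior; the `to_fp16` helper and the sample values are made up for illustration.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest FP16 value and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Values that are exactly representable survive; others are rounded.
for value in (1.0, 0.1, 1234.5678):
    rounded = to_fp16(value)
    print(f"{value!r:>12} -> {rounded!r:<22} error={abs(value - rounded):.3e}")

# FP16 has a much smaller range than FP32: the largest finite value is 65504.
try:
    to_fp16(70000.0)
except OverflowError as exc:
    print("70000.0 does not fit in FP16:", exc)
```

Small per-value errors like these are usually harmless, but they can accumulate across layers, which is why it helps to know where a downcast happened.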

Understanding TensorRT Convertibility for a Given Node

Let’s dig a little deeper into that Reshape node. Click on the node to open the drawer. Navigate to the Supported Runtimes tab:

Supported TensorRT 8.6.1 precisions for Reshape

We now see that FP16, FP32, INT32, INT8 and BOOL are supported precisions for Reshape in TensorRT 8.6.1. Let’s look at another node like BatchNormalization.

Supported TensorRT 8.6.1 precisions for BatchNormalization

In contrast, only FP16 and FP32 are supported for BatchNormalization.
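The two examples above amount to a lookup from operator type to supported precisions. A minimal sketch of that lookup follows, containing only the two TensorRT 8.6.1 entries quoted on this page. The `SUPPORTED_PRECISIONS` table and the `supports` helper are hypothetical names, and treating an unlisted operator as unsupported is a choice made for this sketch, not documented behavior.

```python
# Supported precisions for TensorRT 8.6.1, as shown in the drawer above.
SUPPORTED_PRECISIONS = {
    "Reshape": {"FP16", "FP32", "INT32", "INT8", "BOOL"},
    "BatchNormalization": {"FP16", "FP32"},
}


def supports(op_type: str, precision: str) -> bool:
    """Return True if the table lists `precision` for `op_type`.

    Operators missing from the table are reported as unsupported.
    """
    return precision in SUPPORTED_PRECISIONS.get(op_type, set())


for op in ("Reshape", "BatchNormalization"):
    for precision in ("FP16", "INT8"):
        print(f"{op:<20} {precision:<5} {supports(op, precision)}")
```

This matches what the walkthrough showed: switching the selector from FP16 to INT8 keeps Reshape convertible but not BatchNormalization.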

Model Specific Information

ONNX Models

We currently use the ONNX-TensorRT tool to evaluate TensorRT compatibility. You can read more about it here.
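If you want to run a similar check locally, the sketch below shows one possible approach. It assumes the TensorRT Python bindings are installed and that `OnnxParser.supports_model` is available in your version; how Cellulose itself invokes ONNX-TensorRT is not described here, so treat this as an illustration only. The function names and the `model.onnx` path are hypothetical.

```python
def unsupported_node_indices(subgraphs):
    """Flatten (node_indices, supported) pairs into a sorted list of
    the node indices that the parser reported as unsupported."""
    return sorted(i for indices, ok in subgraphs if not ok for i in indices)


def check_onnx_model(path):
    """Ask TensorRT's ONNX parser which nodes of a model it supports.

    Assumption: requires the `tensorrt` package (and a machine where a
    TensorRT builder can be created). Imported lazily so the helper
    above can be used without it.
    """
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    flags = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    network = builder.create_network(flags)
    parser = trt.OnnxParser(network, logger)

    with open(path, "rb") as f:
        fully_supported, subgraphs = parser.supports_model(f.read())
    return fully_supported, unsupported_node_indices(subgraphs)


# Example usage (hypothetical path):
# ok, bad_nodes = check_onnx_model("model.onnx")
# print("fully supported:", ok, "unsupported node indices:", bad_nodes)
```

The returned indices refer to positions in the ONNX graph's node list, so they can be mapped back to operator names when reviewing the result.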

Have questions / need help?

Please reach out to [email protected], and we’ll get back to you as soon as possible.
