Cellulose has integrated NVIDIA TensorRT into the dashboard: for each
operator in a machine learning model, it identifies whether that operator is
compatible with / convertible to a particular TensorRT version.
This feature is only enabled for users on the Professional / Enterprise plan.
You can read more about our pricing here.
For example, if TensorRT version X.Y.Z is selected as the runtime target,
we'll add annotations to each operator in the ONNX graph indicating whether
it can be converted.
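To make the per-operator granularity concrete, here's a minimal sketch (using the `onnx` Python package, outside of Cellulose) that lists the operator types in an ONNX graph. These are the operators the dashboard annotates; the `model.onnx` path is a placeholder:

```python
from collections import Counter

import onnx

# Load the model and count the operator types in its graph. Each of these
# op types is what gets checked against a particular TensorRT version.
model = onnx.load("model.onnx")  # placeholder path
op_counts = Counter(node.op_type for node in model.graph.node)
for op_type, count in sorted(op_counts.items()):
    print(f"{op_type}: {count}")
```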
There are several ML framework entry points to TensorRT. We will cover the
only path Cellulose supports today: ONNX. We plan to support other frameworks
such as PyTorch (via Torch-TensorRT) in the future.
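If you want to reproduce the ONNX path outside the dashboard, the sketch below shows the conventional TensorRT 8.x Python flow: parse an ONNX file with `trt.OnnxParser`, then build an engine. This is not Cellulose's internal implementation; `build_engine` and its `fp16` flag are names made up for this example.

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path: str, fp16: bool = False):
    """Parse an ONNX model and build a TensorRT engine."""
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    )
    parser = trt.OnnxParser(network, TRT_LOGGER)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            # Unsupported or malformed nodes surface here as parser errors,
            # a coarser version of the per-node badges shown in the dashboard.
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("ONNX model is not fully convertible")

    config = builder.create_builder_config()
    if fp16:
        config.set_flag(trt.BuilderFlag.FP16)  # allow FP16 kernels where supported

    serialized = builder.build_serialized_network(network, config)
    return trt.Runtime(TRT_LOGGER).deserialize_cuda_engine(serialized)
```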
Navigate to a tracked model and look for this runtime selector at the top right
corner of the page.
Runtime selector at the top right of a model visualizer page
Pick a Runtime Type and Desired Precision. We’ll go with
TensorRT 8.6.1 and FP16 respectively in this example.
You’ll note that many of the nodes now have a TRT badge with a green checkmark.
This means that these nodes are compatible with, and convertible to, TensorRT 8.6.1!
Phew! That's a good sign. Now we know that the model can be converted to a
TensorRT FP16 engine.
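If you'd like to verify that conversion yourself, the hypothetical `build_engine` sketch from earlier can be reused; passing `fp16=True` sets `trt.BuilderFlag.FP16` so the builder may pick FP16 kernels:

```python
# Reusing the build_engine sketch above (not a Cellulose API):
engine = build_engine("model.onnx", fp16=True)
print("Built a TensorRT FP16 engine with", engine.num_layers, "layers")
```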
Let's try something else. What about INT8? Go ahead and select INT8 in the
runtime selector at the top right corner again.
Uh oh, seems like many of these nodes won’t work.
That Reshape node is the only convertible one in this subgraph. Note that
INT8 precision also requires calibration dataset integration. We'll cover this
in detail in the quantization section.
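Until then, here is a rough idea of what calibration dataset integration looks like with TensorRT's Python API, assuming `pycuda` for device buffers. The class below subclasses `trt.IInt8EntropyCalibrator2`; the batch list and cache path are placeholders for this sketch:

```python
import numpy as np
import pycuda.autoinit  # noqa: F401  (creates a CUDA context on import)
import pycuda.driver as cuda
import tensorrt as trt

class EntropyCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds batches of representative data to TensorRT during INT8 calibration."""

    def __init__(self, batches, cache_path="calib.cache"):
        super().__init__()  # required when subclassing TensorRT calibrators
        self.batch_size = batches[0].shape[0]
        self.device_mem = cuda.mem_alloc(batches[0].nbytes)
        self.batch_iter = iter(batches)
        self.cache_path = cache_path

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        try:
            batch = next(self.batch_iter)
        except StopIteration:
            return None  # no more data: calibration is finished
        cuda.memcpy_htod(self.device_mem, np.ascontiguousarray(batch))
        return [int(self.device_mem)]  # one device pointer per network input

    def read_calibration_cache(self):
        # Reuse the ranges from a previous run if a cache file exists.
        try:
            with open(self.cache_path, "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def write_calibration_cache(self, cache):
        with open(self.cache_path, "wb") as f:
            f.write(cache)

# Wiring it into a builder config (see the build_engine sketch above):
#   config.set_flag(trt.BuilderFlag.INT8)
#   config.int8_calibrator = EntropyCalibrator(calibration_batches)
```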
Some nodes here may be marked convertible by TensorRT, but only because
implicit downcasts are applied so that engines can still be exported
successfully. While this is fine for most workflows, you'd ideally want to
know exactly what has been done to the model before it ships as a production
asset. Cellulose plans to fill this gap over time by providing even more
insights than we already offer here.
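If you're building engines yourself, one way to surface those implicit downcasts today is TensorRT's precision-constraint flags (available since 8.2): pin the precision on the layers you care about and tell the builder to fail instead of silently falling back. A sketch, continuing from the earlier `build_engine` example:

```python
# Continuing the build_engine sketch: `network` and `config` are the
# INetworkDefinition and IBuilderConfig created there.
config.set_flag(trt.BuilderFlag.INT8)
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)

for i in range(network.num_layers):
    layer = network.get_layer(i)
    # ONNX Reshape lowers to a Shuffle layer in TensorRT.
    if layer.type == trt.LayerType.SHUFFLE:
        layer.precision = trt.DataType.INT8

# With OBEY_PRECISION_CONSTRAINTS set, build_serialized_network() now fails
# loudly if TensorRT cannot honor the requested precision, rather than
# inserting an implicit cast behind your back.
```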
Understanding TensorRT Convertibility for a given node
Let's dig a little deeper into that Reshape node. Click on the node to open
the drawer, then navigate to the Supported Runtimes tab:
We now see that FP16, FP32, INT32, INT8, and BOOL are the supported precisions
for Reshape in TensorRT 8.6.1.

Let's look at another node, like BatchNormalization. In contrast, only FP16
and FP32 are supported for BatchNormalization.