Cellulose has integrated NVIDIA TensorRT into the dashboard and, for each operator in a machine learning model, identifies whether it is compatible with / convertible to a particular TensorRT version.
This feature is only enabled for users on the Professional / Enterprise plan. You can read more about our pricing here.
How It Works
For example, if TensorRT version X.Y.Z is selected as the runtime target, we’ll add annotations to each operator in the ONNX graph that indicate whether it is convertible or not.
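To make the idea concrete, here is a minimal Python sketch of per-operator annotation. It is not Cellulose's implementation: the `SUPPORTED_PRECISIONS` table, the `Annotation` class, the `annotate` function, and the node names are all hypothetical. The table holds only the two TensorRT 8.6.1 entries quoted later on this page (Reshape and BatchNormalization), and treating unknown operators as not convertible is an assumption of the sketch.

```python
from dataclasses import dataclass

# Hypothetical support table for TensorRT 8.6.1. Only these two entries come
# from this page; a real table would cover every ONNX operator.
SUPPORTED_PRECISIONS = {
    "Reshape": {"FP16", "FP32", "INT32", "INT8", "BOOL"},
    "BatchNormalization": {"FP16", "FP32"},
}


@dataclass
class Annotation:
    node_name: str
    op_type: str
    convertible: bool


def annotate(nodes, precision):
    """Mark each (name, op_type) pair as convertible or not at `precision`."""
    annotations = []
    for name, op_type in nodes:
        # Assumption: an operator absent from the table is not convertible.
        supported = SUPPORTED_PRECISIONS.get(op_type, set())
        annotations.append(Annotation(name, op_type, precision in supported))
    return annotations


# Hypothetical three-node graph, listed as (node name, ONNX op type).
graph = [
    ("reshape_0", "Reshape"),
    ("bn_0", "BatchNormalization"),
    ("custom_0", "MyCustomOp"),
]

for precision in ("FP16", "INT8"):
    print(precision, [(a.node_name, a.convertible) for a in annotate(graph, precision)])
```

Keeping the table keyed by operator type, with a set of precisions per entry, makes it cheap to re-annotate the same graph when a different precision is selected.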
There are several ML framework entry points to TensorRT today. We will cover the only path we currently support: ONNX. We plan to also support other frameworks such as PyTorch (and Torch-TRT) in the future.
Selecting a Runtime and Version
Navigate to a tracked model and look for the runtime selector in the top-right corner of the page.
Runtime selector at the top right of a model visualizer page
Pick a Runtime Type and Desired Precision. We’ll go with TensorRT 8.6.1 and FP16 respectively in this example.
You’ll notice that many of the nodes now have a TRT badge with a green checkmark.
This means that these nodes are compatible with, and convertible to, TensorRT 8.6.1.
Phew! That’s a good sign. Now we know that the model can be converted to a TensorRT FP16 engine.
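Outside the dashboard, the FP16 conversion itself might look like the sketch below, which uses the TensorRT 8.x Python API. The function name `build_fp16_engine` and any file paths passed to it are placeholders. Running it assumes the `tensorrt` package and an NVIDIA GPU are available, and it is not necessarily how Cellulose performs its own check.

```python
def build_fp16_engine(onnx_path: str, engine_path: str) -> None:
    """Parse an ONNX file and write a serialized TensorRT FP16 engine."""
    # Imported lazily so this module can be loaded without TensorRT installed.
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    # The ONNX parser requires an explicit-batch network in TensorRT 8.x.
    flags = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    network = builder.create_network(flags)
    parser = trt.OnnxParser(network, logger)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            # Collect parser errors, which usually name the failing node.
            errors = [str(parser.get_error(i)) for i in range(parser.num_errors)]
            raise RuntimeError("ONNX parse failed:\n" + "\n".join(errors))

    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)  # allow FP16 kernels

    serialized = builder.build_serialized_network(network, config)
    if serialized is None:
        raise RuntimeError("TensorRT engine build failed")
    with open(engine_path, "wb") as f:
        f.write(serialized)
```

Calling `build_fp16_engine("model.onnx", "model.engine")` with your own paths would either write the engine file or raise with the parser's error messages.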
Let’s try something else. What about INT8?
Go ahead and select INT8 in the runtime selector in the top-right corner again.
Uh oh, it seems that many of these nodes won’t work. That Reshape node is the only convertible one in this subgraph. Note that INT8 precision also requires calibration dataset integration. We’ll cover this in detail under the quantization section.
Some nodes here may be marked as convertible by TensorRT, but implicit downcasts are applied so that engines can be exported successfully.
While this is fine for most workflows, we’d ideally know what has been done to the model before it is shipped as a production asset. Cellulose plans to fill this gap over time by providing even more insights than we already have here.
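As a rough illustration of why an implicit downcast is worth knowing about, the sketch below rounds a few values to half precision using only Python's standard library. `to_fp16` is a hypothetical helper, and Python floats are 64-bit rather than FP32, so this only approximates the effect and does not reproduce TensorRT's own conversion logic.

```python
import struct


def to_fp16(value: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    # The "e" format code packs a float as a 2-byte half-precision number.
    return struct.unpack("<e", struct.pack("<e", value))[0]


# Compare each original value with what survives the round trip through FP16.
for weight in (1.0, 0.1, 1e-8):
    print(weight, "->", to_fp16(weight))
```

Values that are exactly representable survive unchanged, others pick up a small rounding error, and very small magnitudes can flush to zero, which is the kind of silent change you may want to review before shipping a model.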
Understanding TensorRT Convertibility for a Given Node
Let’s dig a little deeper into that Reshape node. Click on the node to open the drawer, then navigate to the Supported Runtimes tab.
We now see that FP16, FP32, INT32, INT8 and BOOL are supported precisions for Reshape in TensorRT 8.6.1.
Let’s look at another node like BatchNormalization.
In contrast, only FP16 and FP32 are supported for BatchNormalization.
Model Specific Information
Have questions / need help?
Please reach out to firstname.lastname@example.org, and we’ll get back to you as soon as possible.