At the time of this writing, most modern ML inference workloads,
especially those in production, run on NVIDIA GPUs for performance reasons.
NVIDIA TensorRT is one such tool: it takes in a PyTorch or ONNX model,
applies model optimizations, and generates TensorRT binaries to be run on
compatible GPUs. Some of these model optimization techniques include:
- Weight and activation precision calibration (quantization)
- Operation (op) / layer fusion
- Auto-tuning of kernels, so only the best algorithms are selected to run on
  your specific target device (GPU)
- Multi-stream execution
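To build intuition for the first technique, here is a minimal, self-contained sketch of symmetric per-tensor INT8 quantization, the core idea behind precision calibration. This is illustrative NumPy code, not TensorRT's actual implementation; the function names are hypothetical.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map float32 weights to int8 with a symmetric per-tensor scale.

    Illustrative only -- TensorRT's calibrator chooses scales from
    activation statistics, not just the max weight magnitude.
    """
    scale = np.abs(weights).max() / 127.0   # largest magnitude maps to +/-127
    q = np.clip(np.round(weights / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 values from int8 codes."""
    return q.astype(np.float32) * scale

w = np.array([0.02, -1.5, 0.7, 1.5], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize_int8(q, s)
# Each reconstructed weight is within half a quantization step (s / 2)
# of the original -- the precision cost traded for 4x smaller weights.
```

Storing weights as INT8 (and running INT8 math on tensor cores) is where the speed and memory wins come from; the calibration step exists to pick scales that keep this reconstruction error from degrading accuracy.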
Cellulose currently integrates basic TensorRT features within our dashboard.
These include: