Quantization Workflows (coming soon)

Have questions / need help?

TensorRT supports quantized floating point, where floating-point values are linearly compressed and rounded to 8-bit integers. This significantly increases arithmetic throughput while reducing storage requirements and memory bandwidth. When quantizing a floating-point tensor, TensorRT must know its dynamic range - that is, what range of values is important to represent - values outside this range are clamped when quantizing.

Dynamic range information can be calculated by the builder (this is called calibration) based on representative input data. Or you can perform quantization-aware training in a framework and import the model to TensorRT with the necessary dynamic range information.

from NVIDIA’s documentation

We have plans to support NVIDIA TensorRT quantization workflows, and that means a Cellulose SDK that allows you to define what your calibration dataset will be like if you want to deploy in lower precision formats such as INT8. Read more about NVIDIA TensorRT quantization workflows here

Have questions / need help?

Please reach out to support@cellulose.ai, and we’ll get back to you as soon as possible.

Operator Annotations Operator Fusion (coming soon)

⌘I

Getting Started

Dashboard Features

PyTorch

Runtime Support

Organizations

Pricing

Release Notes

Roadmap

Quantization Workflows (coming soon)

Have questions / need help?

Getting Started

Dashboard Features

PyTorch

Runtime Support

Organizations

Pricing

Release Notes

Roadmap

​Have questions / need help?

Have questions / need help?