Maziar Raissi put together a great paper on some open problems in applying deep learning:

There are times when we are looking not only for the most performant model, but also for one that is as memory- and compute-efficient as possible. This is an important stepping stone toward democratizing artificial intelligence in anticipation of the Internet of Things, where many of our devices (e.g., cellphones, cars, security cameras, refrigerators, air conditioners) will be intelligent. Such devices usually have less compute capability and memory capacity than the computers in our data centers or on the cloud. To make them intelligent, we need to take their constraints into consideration.

There’s also a whole YouTube playlist on some of these techniques.

Based on personal experience, even “low hanging fruit” optimizations can halve a model’s total inference latency, which can also cut your GPU cloud costs. Unfortunately, many of these deep learning techniques aren’t widely adopted today. They’re probably too cumbersome to add to an already complex AI workflow.

Cellulose wants to encourage machine learning engineers to adopt these techniques while keeping the process as close to “one click” as possible. We’ll start by adding NVIDIA quantization support, then add ways to specify calibration datasets. Further out, we’ll look into more advanced methods like compression / pruning and Neural Architecture Search (NAS), all configurable and tunable from the dashboard.
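To give a feel for why calibration datasets matter, here’s a minimal NumPy sketch of the core idea behind post-training static quantization: run a small sample of representative inputs through the model, record the value ranges observed, and use them to pick an int8 scale. This is a conceptual illustration only, not Cellulose’s or NVIDIA’s actual API, and the helper names are hypothetical.

```python
import numpy as np

def calibrate_scale(calibration_batches):
    """Pick a symmetric int8 scale from the largest absolute value
    observed across all calibration batches."""
    max_abs = max(np.abs(batch).max() for batch in calibration_batches)
    return max_abs / 127.0  # symmetric int8 uses the range [-127, 127]

def quantize(x, scale):
    # Round to the nearest int8 step and clip out-of-range values.
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8)

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# Stand-in for a small sample of real model inputs or activations.
rng = np.random.default_rng(0)
calibration = [rng.standard_normal(1024).astype(np.float32) for _ in range(8)]

scale = calibrate_scale(calibration)

# Quantize fresh data and measure the round-trip error.
x = rng.standard_normal(1024).astype(np.float32)
x_hat = dequantize(quantize(x, scale), scale)
max_error = np.abs(x - x_hat).max()
```

A better calibration dataset (one that matches the real input distribution) yields tighter ranges, smaller scales, and lower quantization error, which is why tooling like TensorRT asks you to supply one for int8 inference.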

Have questions / need help?

Please reach out, and we’ll get back to you as soon as possible.