Inference

Memory-aware and context-aware multi-DNN inference on the edge

Masa is a memory-aware multi-DNN scheduling framework for edge devices that keeps response times low without modifying models. It leverages inter- and intra-network dependencies and context to cut latency by up to 90% on low-memory devices.

Code Paper

Masa: Responsive multi-DNN inference on the edge

Masa is a responsive, memory-aware multi-DNN execution framework: an on-device middleware that models inter- and intra-network dependencies and leverages the complementary memory usage of each layer.

Code Paper