
Memory-aware and context-aware multi-DNN inference on the edge

Masa is a memory-aware multi-DNN scheduling framework for edge devices that keeps response times low without modifying the models. It exploits inter- and intra-network dependencies, together with context, to cut latency by up to 90% on low-memory devices.

Pervasive and Mobile Computing · July 2022 · Bart Cox, Robert Birke, Lydia Y. Chen