LOW-LATENCY DEEP NEURAL NETWORK COMPRESSION FOR REAL-TIME IOT APPLICATIONS
Keywords: DNN Compression, IoT, Low-Latency Computing, Model Pruning, Quantization, Edge AI, Real-Time Processing

Abstract
The rapid growth of Internet of Things (IoT) systems has emphasized the need for efficient deep neural network (DNN) processing under stringent latency and resource constraints. Conventional DNNs require significant computational power, making them unsuitable for real-time IoT deployments with limited memory, bandwidth, and processing capability. This paper proposes a low-latency DNN compression framework that combines structured pruning, quantization-aware training, and lightweight model reparameterization. The proposed method reduces computational complexity while maintaining competitive accuracy, enabling faster inference on edge IoT devices. Experimental evaluations demonstrate up to a 62% reduction in model size and a 48% improvement in inference speed. The approach provides a scalable and energy-efficient solution for real-time IoT applications.
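Two of the techniques the abstract names, structured pruning and low-bit quantization, can be illustrated with a minimal sketch. This is not the paper's framework (which additionally uses quantization-aware training and reparameterization); it is a generic, hypothetical example in plain Python, assuming a weight matrix stored as a list of rows, where structured pruning keeps the rows with the largest L1 norms and quantization maps weights symmetrically to 8-bit integers.

```python
def prune_channels(weights, keep_ratio):
    """Structured pruning sketch: keep the output channels (rows)
    with the largest L1 norms. `weights` is a list of rows of floats."""
    norms = [sum(abs(x) for x in row) for row in weights]
    k = max(1, int(len(weights) * keep_ratio))
    # Indices of the k highest-norm rows, restored to original order.
    keep = sorted(range(len(weights)), key=lambda i: norms[i], reverse=True)[:k]
    return [weights[i] for i in sorted(keep)]

def quantize_int8(weights):
    """Symmetric per-tensor quantization to the int8 range [-127, 127].
    Returns the quantized rows and the float scale for dequantization."""
    max_abs = max(abs(x) for row in weights for x in row)
    scale = max_abs / 127.0 if max_abs else 1.0
    q = [[max(-127, min(127, round(x / scale))) for x in row] for row in weights]
    return q, scale

# Toy 4x3 weight matrix: pruning at keep_ratio=0.5 retains the two
# highest-norm rows, which are then quantized to int8.
w = [[0.5, -1.0, 0.2],
     [0.1, 0.05, -0.02],
     [2.0, 1.5, -0.7],
     [0.3, -0.4, 0.6]]
pruned = prune_channels(w, keep_ratio=0.5)
q, scale = quantize_int8(pruned)
```

Storing `q` plus a single float `scale` instead of full-precision weights is what yields the kind of model-size reduction the abstract reports; the pruned rows also shrink the matrix multiplications performed at inference time.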
License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.