LOW-LATENCY DEEP NEURAL NETWORK COMPRESSION FOR REAL-TIME IOT APPLICATIONS

Authors

  • Sakshi

Keywords:

DNN Compression, Low Latency, IoT, Edge Computing, Quantization, Structured Pruning, Real-Time Inference.

Abstract

Real-time IoT applications demand rapid inference from deep neural networks (DNNs), yet conventional models are too computationally heavy for resource-constrained edge devices. This paper presents a low-latency neural network compression framework that integrates structured pruning, quantization-aware training, and lightweight reparameterization to significantly reduce model size and execution time. The approach minimizes latency while preserving accuracy, enabling on-device intelligence without reliance on cloud services. Experimental evaluation on multiple IoT platforms demonstrates up to 55% reduction in inference time, 60% reduction in memory usage, and minimal accuracy drop. The proposed framework ensures efficient deployment of deep learning models in latency-critical IoT scenarios such as anomaly detection, sensing, and autonomous monitoring.
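The abstract names two of the framework's core ingredients: structured (channel-level) pruning and quantization-aware training. As an illustration only, the sketch below shows the generic form of both techniques in NumPy; the function names, the L1-norm channel-ranking criterion, and the 8-bit symmetric quantization scheme are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def prune_channels(weight, keep_ratio=0.5):
    """Structured-pruning sketch (assumed criterion): keep the output
    channels (first axis) of a weight tensor with the largest L1 norms,
    so the remaining tensor is dense and hardware-friendly."""
    norms = np.abs(weight).reshape(weight.shape[0], -1).sum(axis=1)
    n_keep = max(1, int(round(weight.shape[0] * keep_ratio)))
    keep = np.sort(np.argsort(norms)[::-1][:n_keep])  # preserve channel order
    return weight[keep], keep

def fake_quantize(x, num_bits=8):
    """Symmetric per-tensor fake quantization, the forward-pass operation
    used inside quantization-aware training: quantize to a signed integer
    grid, then immediately dequantize back to float."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

# Toy usage: prune a 8x16 weight matrix to 25% of its output channels,
# then simulate 8-bit quantization of the surviving weights.
w = np.random.randn(8, 16).astype(np.float32)
w_pruned, kept = prune_channels(w, keep_ratio=0.25)
print(w_pruned.shape)  # (2, 16)
w_q = fake_quantize(w_pruned)
```

In a quantization-aware training loop, `fake_quantize` would be applied to weights (and activations) in the forward pass while gradients flow through unchanged (the straight-through estimator), so the network learns to tolerate the rounding error it will see at int8 inference time.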

Published

2023-03-22

How to Cite

Sakshi. (2023). LOW-LATENCY DEEP NEURAL NETWORK COMPRESSION FOR REAL-TIME IOT APPLICATIONS. International Journal of Economic Social Science and Management LAW, 4(1), 13-16. https://ijeml.com/journal/index.php/ijeml/article/view/36
