LOW-LATENCY DEEP NEURAL NETWORK COMPRESSION FOR REAL-TIME IOT APPLICATIONS
Keywords: DNN Compression, Low Latency, IoT, Edge Computing, Quantization, Structured Pruning, Real-Time Inference

Abstract
Real-time IoT applications demand rapid inference from deep neural networks (DNNs), yet conventional models are too computationally heavy for resource-constrained edge devices. This paper presents a low-latency neural network compression framework that integrates structured pruning, quantization-aware training, and lightweight reparameterization to significantly reduce model size and execution time. The approach minimizes latency while preserving accuracy, enabling on-device intelligence without reliance on cloud services. Experimental evaluation on multiple IoT platforms demonstrates up to 55% reduction in inference time, 60% reduction in memory usage, and minimal accuracy drop. The proposed framework ensures efficient deployment of deep learning models in latency-critical IoT scenarios such as anomaly detection, sensing, and autonomous monitoring.
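The two core compression steps named in the abstract, structured pruning and quantization, can be illustrated in miniature. The sketch below is not the paper's implementation; the function names, the L2-norm pruning criterion, the `keep_ratio` parameter, and the symmetric per-tensor int8 scheme are illustrative assumptions showing how whole output channels can be dropped and the surviving weights stored in 8-bit form.

```python
import numpy as np

def prune_rows(weights, keep_ratio=0.5):
    """Structured pruning (illustrative): drop whole rows (output
    channels) with the smallest L2 norms, keeping a keep_ratio
    fraction of rows so the layer stays a dense matrix."""
    norms = np.linalg.norm(weights, axis=1)
    n_keep = max(1, int(round(keep_ratio * weights.shape[0])))
    keep_idx = np.sort(np.argsort(norms)[-n_keep:])
    return weights[keep_idx], keep_idx

def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization (illustrative):
    returns the quantized tensor plus the scale for dequantizing."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 128)).astype(np.float32)

pruned, kept_idx = prune_rows(w, keep_ratio=0.5)  # 64 -> 32 rows
q, scale = quantize_int8(pruned)                  # float32 -> int8

# int8 storage is 4x smaller than float32, and pruning halved the
# rows, so this toy layer now occupies 1/8 of its original memory.
```

Because the pruning is structured (entire rows removed), the result remains a smaller dense matrix that ordinary kernels can execute directly, which is what makes this style of compression latency-friendly on edge hardware; unstructured sparsity would need specialized sparse kernels to yield a speedup.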
License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.