Srota AI: A Voice-Controlled Intelligent Web Automation Platform Using Large Language Models and Multi-Agent Orchestration

Mr. Shuvendu Samal; Mr. Biswajit Swain; Prof. Biswajit Sahoo

doi:10.64751/vyfrez55

Abstract

The rapid proliferation of complex, multi-step web-based applications across government, healthcare, education, and enterprise domains has introduced significant usability barriers for diverse user populations. Traditional web automation tools rely on brittle, static, rule-based scripts that are inaccessible to non-technical users and fail to adapt to dynamic web interfaces. This paper presents Srota AI, a voice-controlled intelligent web automation platform that combines large language models (LLMs), multi-agent orchestration via LangGraph, and real-time voice transcription through the Deepgram API so that users can complete complex digital workflows using typed natural language or spoken commands. The system architecture integrates a Planner Agent for goal decomposition, a Navigator Agent for semantic DOM interaction, a Voice Assistant Module for hands-free operation, and a Semantic DOM Processing Module for context-aware UI navigation. The backend, developed in Python with FastAPI, supports asynchronous, concurrent session management and uses Google Gemini for AI reasoning. A plug-and-play React SDK allows the platform to be embedded into third-party web applications. Evaluation results demonstrate significant reductions in task completion time and error rates across real-world automation scenarios, including government form submission, healthcare appointment scheduling, and enterprise onboarding. The system establishes a replicable, extensible framework for applying agentic AI and voice recognition to accessible, intelligent web automation.
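
To make the division of labor between the Planner Agent and the Navigator Agent concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes a fixed lookup table in place of the LLM planner, a word-overlap score in place of semantic DOM processing, and a DOM reduced to a list of dictionaries; the names `Step`, `plan`, `find_element`, and `navigate`, as well as the sample goal and field values, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Step:
    action: str       # "fill" or "click"
    target: str       # plain-language description of the UI element
    value: str = ""   # text to enter for "fill" steps


def plan(goal: str) -> List[Step]:
    """Stand-in for the Planner Agent: a lookup table instead of an LLM call."""
    plans = {
        "book appointment": [
            Step("fill", "patient name", "Jane Doe"),        # placeholder value
            Step("fill", "appointment date", "2025-01-15"),  # placeholder value
            Step("click", "submit"),
        ],
    }
    return plans.get(goal.lower().strip(), [])


def find_element(dom: List[Dict[str, str]], target: str) -> Optional[Dict[str, str]]:
    """Stand-in for semantic DOM matching: pick the element whose label
    shares the most words with the target description."""
    words = set(target.lower().split())
    best, best_score = None, 0
    for element in dom:
        score = len(words & set(element["label"].lower().split()))
        if score > best_score:
            best, best_score = element, score
    return best


def navigate(dom: List[Dict[str, str]], steps: List[Step]) -> List[str]:
    """Stand-in for the Navigator Agent: apply each step to the matched element."""
    log = []
    for step in steps:
        element = find_element(dom, step.target)
        if element is None:
            log.append(f"skip: no element for '{step.target}'")
        elif step.action == "fill":
            element["value"] = step.value
            log.append(f"fill #{element['id']}")
        else:
            log.append(f"click #{element['id']}")
    return log


if __name__ == "__main__":
    page = [
        {"id": "name", "label": "Patient name"},
        {"id": "date", "label": "Preferred appointment date"},
        {"id": "go", "label": "Submit form"},
    ]
    for line in navigate(page, plan("Book appointment")):
        print(line)
```

In a system of the kind described above, `plan` would presumably be backed by an LLM call and `find_element` by a richer semantic representation of the live page; the sketch only shows how a decomposed goal can be resolved step by step against labeled elements.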