# Fonada Voice Assistant
A complete voice assistant pipeline integrating:
- Custom ASR (Automatic Speech Recognition)
- Custom Turn detection with ReplyOnPause handler
- LLM for conversational responses
- Custom Fonada TTS for high-quality voice synthesis
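Conceptually, each user turn flows through these stages in order. The sketch below is illustrative only; the stub functions are placeholders, not this project's actual API (see `app.py` for the real implementation):

```python
import numpy as np

def transcribe(audio: tuple[int, np.ndarray]) -> str:
    return "user utterance"        # stand-in for the custom ASR model

def generate_response(text: str) -> str:
    return "assistant reply"       # stand-in for the LLM call

def synthesize(text: str):
    # stand-in for Fonada TTS: yields (sample_rate, samples) audio chunks
    yield 24000, np.zeros(2400, dtype=np.int16)

def respond(audio: tuple[int, np.ndarray]):
    """Handler the ReplyOnPause turn detector invokes once the caller stops speaking."""
    text = transcribe(audio)            # 1. speech -> text
    reply = generate_response(text)     # 2. text -> response
    yield from synthesize(reply)        # 3. response -> streamed audio back to the user
```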
## Prerequisites
- Python 3.8+
- 4 CUDA-capable GPUs
- 50 GB+ disk space
- Microphone and speakers
## Setup
1. Install the required dependencies (NeMo must be installed from GitHub; see the note after these steps):

   ```bash
   pip install -r requirements.txt
   ```
2. Run the LLM server:

   ```bash
   lmdeploy serve api_server hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4 --server-port 23333 --quant-policy 4
   ```

   or

   ```bash
   export CUDA_VISIBLE_DEVICES=2
   lmdeploy serve api_server sarvamai/sarvam-m \
       --server-port 8000 \
       --tp 1 \
       --backend turbomind \
       --quant-policy 4 \
       --cache-max-entry-count 0.9
   ```
3. Run the TTS server from the `models/` folder:

   ```bash
   export CUDA_VISIBLE_DEVICES=1
   lmdeploy serve api_server tts_hindi --server-port 23334 --quant-policy 4
   ```
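Step 1 mentions installing NeMo from GitHub. One common way to do this (the `[asr]` extra is an assumption based on this pipeline's needs; see the NeMo repository for its current installation instructions):

```bash
git clone https://github.com/NVIDIA/NeMo.git
cd NeMo
pip install '.[asr]'
```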
## Running the Voice Assistant
Run the assistant with:
```bash
export LD_LIBRARY_PATH=/workspace/TensorRT-10.10.0.31/lib:$LD_LIBRARY_PATH
export OPENAI_API_ASR_KEY=
export SARVAM_API_KEY=
export DEEPGRAM_API_KEY=
export OPENAI_API_LLM_KEY=
export GROQ_API_LLM_KEY=
python app.py
```

Adjust the TensorRT library path and the API keys to match your environment. This will start a web server and open a browser interface where you can interact with the voice assistant.
## Usage
- Click the microphone button to start speaking
- The assistant will automatically detect when you've finished speaking
- It will transcribe your speech, generate a response with the configured LLM, and speak the response using Fonada TTS
- You can interrupt the assistant by speaking while it's responding
## Customization

### Voice Selection
To change the voice used by Fonada TTS, modify the `options` dictionary in the `text_to_speech_sync` method:

```python
options = {"voice_id": "Ananya"}  # Change to your preferred voice
```
Available voices: "Rahul", "Vikram", "Arjun", "Dev", "Sanjay", "Jaya", "Meera", "Priya", "Ananya", "Divya"
### System Prompt
To change how the LLM responds, customize the system prompt when initializing the `VoiceAssistant`:

```python
assistant = VoiceAssistant(
    llm_model_path=llm_model_path,
    tts_model_path=tts_model_path,
    system_prompt="You are a helpful voice assistant. Keep your responses short and friendly."
)
```
### Turn Detection Sensitivity
Adjust the turn detection parameters in the `create_voice_assistant_stream()` function to change how the assistant detects when you've finished speaking:

```python
algo_options=AlgoOptions(
    audio_chunk_duration=0.5,        # Duration of audio chunks
    started_talking_threshold=0.2,   # Threshold to detect start of speech
    speech_threshold=0.1             # General speech detection threshold
)
```
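These options are consumed by the `ReplyOnPause` handler. A minimal sketch of how they are typically wired into a stream, assuming the fastrtc-style `Stream`/`ReplyOnPause` API that the `stream.mount(app)` call below suggests (the echo handler is a placeholder):

```python
from fastrtc import AlgoOptions, ReplyOnPause, Stream

def respond(audio):
    # Placeholder handler: the real one runs ASR -> LLM -> TTS.
    yield audio

stream = Stream(
    handler=ReplyOnPause(
        respond,
        algo_options=AlgoOptions(
            audio_chunk_duration=0.5,       # Duration of audio chunks
            started_talking_threshold=0.2,  # Threshold to detect start of speech
            speech_threshold=0.1,           # General speech detection threshold
        ),
    ),
    modality="audio",
    mode="send-receive",
)
```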
## Integration with FastAPI
To integrate the voice assistant with a FastAPI app:
```python
from fastapi import FastAPI

from voice_assistant.app import create_voice_assistant_stream

app = FastAPI()
stream = create_voice_assistant_stream()
stream.mount(app)
```
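With the stream mounted, the combined app can be served by any ASGI server; for example (the module name `main` here is hypothetical):

```bash
uvicorn main:app --host 0.0.0.0 --port 8000
```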
## Troubleshooting
- Issue: Models fail to load. Solution: Verify the paths to your model files and ensure they are accessible.
- Issue: Speech recognition is inaccurate. Solution: Speak clearly and make sure your microphone is configured correctly.
- Issue: High latency in responses. Solution: Use a more powerful GPU or a model with fewer parameters.
## License
This project uses the same license as the Fonada TTS system.
# Voice Assistant Monitoring
This document describes how to set up monitoring for the Voice Assistant application. There are two options available:
## Option 1: Streamlit Dashboard (Lightweight)
A lightweight, real-time monitoring dashboard built with Streamlit.
### Installation
Install the required packages:

```bash
pip install streamlit pandas plotly
```

Run the monitoring dashboard:

```bash
streamlit run monitor.py
```
The dashboard will be available at http://localhost:8501 and includes:
- Real-time log viewing
- Request timeline visualization
- Log level distribution
- Filtering by request ID and log level
- Auto-refresh functionality
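The dashboard works by reading the application's log file and turning it into a DataFrame for filtering and plotting. A minimal sketch of that pattern (the file name and the `timestamp level request_id message` line format are assumptions; the bundled `monitor.py` may parse differently):

```python
import pandas as pd
import streamlit as st

LOG_FILE = "voice_assistant.log"  # assumption: point this at the app's actual log file

@st.cache_data(ttl=5)  # re-read the log at most every 5 seconds
def load_logs() -> pd.DataFrame:
    # Assumed line format: "2024-01-01 12:00:00,123 INFO req-abc message ..."
    rows = []
    with open(LOG_FILE) as f:
        for line in f:
            parts = line.rstrip("\n").split(" ", 4)
            if len(parts) == 5:
                rows.append({"timestamp": f"{parts[0]} {parts[1]}",
                             "level": parts[2],
                             "request_id": parts[3],
                             "message": parts[4]})
    return pd.DataFrame(rows)

df = load_logs()
if df.empty:
    st.warning("No log lines found yet.")
else:
    level = st.selectbox("Log level", ["ALL"] + sorted(df["level"].unique()))
    st.dataframe(df if level == "ALL" else df[df["level"] == level])
```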
## Option 2: Graylog (Enterprise-grade)
A more comprehensive logging and monitoring solution.
### Installation
Install the Graylog prerequisites (MongoDB and Elasticsearch):

```bash
sudo apt-get install mongodb-org elasticsearch
```
Download and install Graylog:

```bash
wget https://packages.graylog2.org/repo/packages/graylog-4.0-repository_latest.deb
sudo dpkg -i graylog-4.0-repository_latest.deb
sudo apt-get update
sudo apt-get install graylog-server
```
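Graylog will not start until `password_secret` and `root_password_sha2` are set in `/etc/graylog/server/server.conf`. A typical post-install sequence looks like this (adjust the password and service names to your environment):

```bash
# Value for password_secret (pwgen: sudo apt-get install pwgen)
pwgen -N 1 -s 96

# Value for root_password_sha2 (the web-interface admin password)
echo -n 'your-admin-password' | sha256sum

# After editing /etc/graylog/server/server.conf, start the stack
sudo systemctl enable --now mongod elasticsearch graylog-server
```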
## Features

### Streamlit Dashboard
- Real-time log viewing
- Interactive visualizations
- Request timeline
- Log level distribution
- Filter by request ID and log level
- Auto-refresh capability
- Lightweight and easy to set up
### Graylog
- Enterprise-grade log management
- Advanced search capabilities
- Custom dashboards
- Alerts and notifications
- Log retention policies
- Role-based access control
## Usage
Start your voice assistant application:

```bash
python app.py
```
Choose your preferred monitoring solution.

For the Streamlit dashboard:

```bash
streamlit run monitor.py
```
For Graylog:
- Access the Graylog web interface at http://your-server:9000
- Default credentials: admin/admin (change on first login)
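Graylog only shows what the application forwards to it. One common approach is a GELF logging handler; the sketch below assumes the third-party `graypy` package and a GELF UDP input created in Graylog on port 12201 (neither is part of this repo):

```python
import logging

import graypy  # pip install graypy

logger = logging.getLogger("voice_assistant")
logger.setLevel(logging.INFO)
logger.addHandler(graypy.GELFUDPHandler("your-server", 12201))  # Graylog GELF UDP input

# Extra fields become searchable fields in Graylog
logger.info("TTS synthesis finished", extra={"request_id": "req-1234"})
```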
## Monitoring Metrics
The monitoring solutions track:
- Total number of requests
- Active requests (last 5 minutes)
- Error rates
- Log levels distribution
- Request timelines
- Detailed log messages
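Most of these metrics rely on every log line carrying a timestamp, a level, and a request ID. A minimal sketch of emitting logs in that shape (the file name and format string are assumptions; adapt them to the app's actual logging setup):

```python
import logging
import uuid

logging.basicConfig(
    filename="voice_assistant.log",   # assumption: the file the dashboard reads
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(request_id)s %(message)s",
)

def request_logger() -> logging.LoggerAdapter:
    # Tag every record with a per-request ID so requests can be counted and timed.
    return logging.LoggerAdapter(logging.getLogger("voice_assistant"),
                                 {"request_id": uuid.uuid4().hex[:8]})

log = request_logger()
log.info("ASR transcription started")
log.info("LLM response generated")
```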
## Troubleshooting
If you encounter issues:
- Streamlit Dashboard:
  - Ensure the log file exists and is readable
  - Check that the required packages are installed
  - Verify the correct Python version
- Graylog:
  - Verify MongoDB and Elasticsearch are running
  - Check the Graylog service status (see the commands below)
  - Review system logs for errors
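On a systemd-based install, service status and recent errors can be checked with:

```bash
sudo systemctl status graylog-server mongod elasticsearch
sudo journalctl -u graylog-server --since "10 minutes ago"
sudo tail -n 100 /var/log/graylog-server/server.log
```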