# Fonada Voice Assistant
A complete voice assistant pipeline integrating:
- Custom ASR (Automatic Speech Recognition)
- Custom Turn detection with ReplyOnPause handler
- LLM for conversational responses
- Custom Fonada TTS for high-quality voice synthesis
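Conceptually, each user turn flows through these stages in order. The sketch below is illustrative only; the stub functions are placeholders, not this project's actual API (see `app.py` for the real implementation):

```python
import numpy as np

def transcribe(audio: tuple[int, np.ndarray]) -> str:
    return "user utterance"        # stand-in for the custom ASR model

def generate_response(text: str) -> str:
    return "assistant reply"       # stand-in for the LLM call

def synthesize(text: str):
    # stand-in for Fonada TTS: yields (sample_rate, samples) audio chunks
    yield 24000, np.zeros(2400, dtype=np.int16)

def respond(audio: tuple[int, np.ndarray]):
    """Handler the ReplyOnPause turn detector invokes once the caller stops speaking."""
    text = transcribe(audio)            # 1. speech -> text
    reply = generate_response(text)     # 2. text -> response
    yield from synthesize(reply)        # 3. response -> streamed audio back to the user
```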
## Prerequisites
- Python 3.8+
- 4 CUDA-capable GPUs
- 50 GB+ disk space
- Microphone and speakers
## Setup
1. Install the required dependencies (NeMo must be installed from GitHub; see the note after these steps):

   ```bash
   pip install -r requirements.txt
   ```
2. Run the LLM server:

   ```bash
   lmdeploy serve api_server hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4 --server-port 23333 --quant-policy 4
   ```

   or

   ```bash
   export CUDA_VISIBLE_DEVICES=2
   lmdeploy serve api_server sarvamai/sarvam-m \
       --server-port 8000 \
       --tp 1 \
       --backend turbomind \
       --quant-policy 4 \
       --cache-max-entry-count 0.9
   ```
3. Run the TTS server from the `models/` folder:

   ```bash
   export CUDA_VISIBLE_DEVICES=1
   lmdeploy serve api_server tts_hindi --server-port 23334 --quant-policy 4
   ```
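Step 1 mentions installing NeMo from GitHub. One common way to do this (the `[asr]` extra is an assumption based on this pipeline's needs; see the NeMo repository for its current installation instructions):

```bash
git clone https://github.com/NVIDIA/NeMo.git
cd NeMo
pip install '.[asr]'
```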
## Running the Voice Assistant
Run the assistant with:
```bash
export LD_LIBRARY_PATH=/workspace/TensorRT-10.10.0.31/lib:$LD_LIBRARY_PATH
export OPENAI_API_ASR_KEY=
export SARVAM_API_KEY=
export DEEPGRAM_API_KEY=
export OPENAI_API_LLM_KEY=
export GROQ_API_LLM_KEY=
python app.py
```

Adjust the TensorRT library path and the API keys to match your environment. This will start a web server and open a browser interface where you can interact with the voice assistant.
## Usage
- Click the microphone button to start speaking
- The assistant will automatically detect when you've finished speaking
- It will transcribe your speech, generate a response with the configured LLM, and speak the response using Fonada TTS
- You can interrupt the assistant by speaking while it's responding
## Customization

### Voice Selection
To change the voice used by Fonada TTS, modify the `options` dictionary in the `text_to_speech_sync` method:

```python
options = {"voice_id": "Ananya"}  # Change to your preferred voice
```
Available voices: "Rahul", "Vikram", "Arjun", "Dev", "Sanjay", "Jaya", "Meera", "Priya", "Ananya", "Divya"
### System Prompt
To change how the LLM responds, customize the system prompt when initializing the `VoiceAssistant`:

```python
assistant = VoiceAssistant(
    llm_model_path=llm_model_path,
    tts_model_path=tts_model_path,
    system_prompt="You are a helpful voice assistant. Keep your responses short and friendly."
)
```
### Turn Detection Sensitivity
Adjust the turn detection parameters in the `create_voice_assistant_stream()` function to change how the assistant detects when you've finished speaking:

```python
algo_options=AlgoOptions(
    audio_chunk_duration=0.5,        # Duration of audio chunks
    started_talking_threshold=0.2,   # Threshold to detect start of speech
    speech_threshold=0.1             # General speech detection threshold
)
```
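These options are consumed by the `ReplyOnPause` handler. A minimal sketch of how they are typically wired into a stream, assuming the fastrtc-style `Stream`/`ReplyOnPause` API that the `stream.mount(app)` call below suggests (the echo handler is a placeholder):

```python
from fastrtc import AlgoOptions, ReplyOnPause, Stream

def respond(audio):
    # Placeholder handler: the real one runs ASR -> LLM -> TTS.
    yield audio

stream = Stream(
    handler=ReplyOnPause(
        respond,
        algo_options=AlgoOptions(
            audio_chunk_duration=0.5,       # Duration of audio chunks
            started_talking_threshold=0.2,  # Threshold to detect start of speech
            speech_threshold=0.1,           # General speech detection threshold
        ),
    ),
    modality="audio",
    mode="send-receive",
)
```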
## Integration with FastAPI
To integrate the voice assistant with a FastAPI app:
```python
from fastapi import FastAPI

from voice_assistant.app import create_voice_assistant_stream

app = FastAPI()
stream = create_voice_assistant_stream()
stream.mount(app)
```
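With the stream mounted, the combined app can be served by any ASGI server; for example (the module name `main` here is hypothetical):

```bash
uvicorn main:app --host 0.0.0.0 --port 8000
```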
## Troubleshooting
- Issue: Models fail to load. Solution: Verify the paths to your model files and ensure they are accessible.
- Issue: Speech recognition is inaccurate. Solution: Speak clearly and make sure your microphone is configured correctly.
- Issue: High latency in responses. Solution: Use a more powerful GPU or a model with fewer parameters.
## License
This project uses the same license as the Fonada TTS system.
# Voice Assistant Monitoring
This document describes how to set up monitoring for the Voice Assistant application. There are two options available:
## Option 1: Streamlit Dashboard (Lightweight)
A lightweight, real-time monitoring dashboard built with Streamlit.
### Installation
Install the required packages:

```bash
pip install streamlit pandas plotly
```

Run the monitoring dashboard:

```bash
streamlit run monitor.py
```
The dashboard will be available at http://localhost:8501 and includes:
- Real-time log viewing
- Request timeline visualization
- Log level distribution
- Filtering by request ID and log level
- Auto-refresh functionality
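The dashboard works by reading the application's log file and turning it into a DataFrame for filtering and plotting. A minimal sketch of that pattern (the file name and the `timestamp level request_id message` line format are assumptions; the bundled `monitor.py` may parse differently):

```python
import pandas as pd
import streamlit as st

LOG_FILE = "voice_assistant.log"  # assumption: point this at the app's actual log file

@st.cache_data(ttl=5)  # re-read the log at most every 5 seconds
def load_logs() -> pd.DataFrame:
    # Assumed line format: "2024-01-01 12:00:00,123 INFO req-abc message ..."
    rows = []
    with open(LOG_FILE) as f:
        for line in f:
            parts = line.rstrip("\n").split(" ", 4)
            if len(parts) == 5:
                rows.append({"timestamp": f"{parts[0]} {parts[1]}",
                             "level": parts[2],
                             "request_id": parts[3],
                             "message": parts[4]})
    return pd.DataFrame(rows)

df = load_logs()
if df.empty:
    st.warning("No log lines found yet.")
else:
    level = st.selectbox("Log level", ["ALL"] + sorted(df["level"].unique()))
    st.dataframe(df if level == "ALL" else df[df["level"] == level])
```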
## Option 2: Graylog (Enterprise-grade)
A more comprehensive logging and monitoring solution.
### Installation
Install the Graylog prerequisites (MongoDB and Elasticsearch):

```bash
sudo apt-get install mongodb-org elasticsearch
```
Download and install Graylog:

```bash
wget https://packages.graylog2.org/repo/packages/graylog-4.0-repository_latest.deb
sudo dpkg -i graylog-4.0-repository_latest.deb
sudo apt-get update
sudo apt-get install graylog-server
```
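Graylog will not start until `password_secret` and `root_password_sha2` are set in `/etc/graylog/server/server.conf`. A typical post-install sequence looks like this (adjust the password and service names to your environment):

```bash
# Value for password_secret (pwgen: sudo apt-get install pwgen)
pwgen -N 1 -s 96

# Value for root_password_sha2 (the web-interface admin password)
echo -n 'your-admin-password' | sha256sum

# After editing /etc/graylog/server/server.conf, start the stack
sudo systemctl enable --now mongod elasticsearch graylog-server
```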
## Features

### Streamlit Dashboard
- Real-time log viewing
- Interactive visualizations
- Request timeline
- Log level distribution
- Filter by request ID and log level
- Auto-refresh capability
- Lightweight and easy to set up
### Graylog
- Enterprise-grade log management
- Advanced search capabilities
- Custom dashboards
- Alerts and notifications
- Log retention policies
- Role-based access control
## Usage
Start your voice assistant application:

```bash
python app.py
```
Choose your preferred monitoring solution.

For the Streamlit dashboard:

```bash
streamlit run monitor.py
```
For Graylog:
- Access the Graylog web interface at http://your-server:9000
- Default credentials: admin/admin (change on first login)
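Graylog only shows what the application forwards to it. One common approach is a GELF logging handler; the sketch below assumes the third-party `graypy` package and a GELF UDP input created in Graylog on port 12201 (neither is part of this repo):

```python
import logging

import graypy  # pip install graypy

logger = logging.getLogger("voice_assistant")
logger.setLevel(logging.INFO)
logger.addHandler(graypy.GELFUDPHandler("your-server", 12201))  # Graylog GELF UDP input

# Extra fields become searchable fields in Graylog
logger.info("TTS synthesis finished", extra={"request_id": "req-1234"})
```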
## Monitoring Metrics
The monitoring solutions track:
- Total number of requests
- Active requests (last 5 minutes)
- Error rates
- Log levels distribution
- Request timelines
- Detailed log messages
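Most of these metrics rely on every log line carrying a timestamp, a level, and a request ID. A minimal sketch of emitting logs in that shape (the file name and format string are assumptions; adapt them to the app's actual logging setup):

```python
import logging
import uuid

logging.basicConfig(
    filename="voice_assistant.log",   # assumption: the file the dashboard reads
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(request_id)s %(message)s",
)

def request_logger() -> logging.LoggerAdapter:
    # Tag every record with a per-request ID so requests can be counted and timed.
    return logging.LoggerAdapter(logging.getLogger("voice_assistant"),
                                 {"request_id": uuid.uuid4().hex[:8]})

log = request_logger()
log.info("ASR transcription started")
log.info("LLM response generated")
```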
## Troubleshooting
If you encounter issues:
- Streamlit Dashboard:
  - Ensure the log file exists and is readable
  - Check that the required packages are installed
  - Verify the correct Python version
- Graylog:
  - Verify MongoDB and Elasticsearch are running
  - Check the Graylog service status (see the commands below)
  - Review system logs for errors
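On a systemd-based install, service status and recent errors can be checked with:

```bash
sudo systemctl status graylog-server mongod elasticsearch
sudo journalctl -u graylog-server --since "10 minutes ago"
sudo tail -n 100 /var/log/graylog-server/server.log
```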