Sign Language Recognition

A real-time, word-level sign language recognition system built with Flask, MediaPipe, and deep learning. It recognizes word-level sign language gestures in real time through a web interface and includes tools for managing a dataset of sign language videos for training.
- GitHub
- 📸 Demo
- ▶️ Inference
- Dataset
- Features
- 🔬 Features in Detail
- 🚀 How It Works
- 🔑 Tips for Better Recognition
- 🧠 Model Architecture
- Tech Stack
- 🔧 System Requirements
- 🛠️ Installation
- 📁 Project Structure
- Database Schema
- API Endpoints
- ⚠️ Troubleshooting
- Acknowledgments
GitHub
https://github.com/metehanozdeniz/sign-language-recognition
📸 Demo
Real-time webcam feed with live sign prediction.
▶️ Inference
python real_time_inference.py
Edit cam_id in the script to match the camera ID of your device (e.g., cam_id = 0).
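If you're unsure which ID your camera has, a quick generic probe with OpenCV (not part of the repository) can help:

```python
import cv2

# Try the first few indices and report which cameras open and deliver frames
for cam_id in range(4):
    cap = cv2.VideoCapture(cam_id)
    if cap.isOpened():
        ok, _ = cap.read()
        print(f"Camera {cam_id}:", "working" if ok else "opens but returns no frames")
    cap.release()
```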
Dataset
Features
- Real-time Recognition: Live sign language gesture recognition through webcam
- Modern Web Interface: Clean, responsive dashboard with real-time predictions
- Dataset Management: Tools for recording, importing, and managing sign language videos
- MediaPipe Integration: Hand and pose landmark extraction via MediaPipe
- Video Processing: Automatic landmark extraction and feature generation
- Training Pipeline: Complete pipeline for training sign language recognition models
- LSTM-based Prediction: Sequence classification over sliding windows of landmark features
🔬 Features in Detail
Real-time Recognition
- Uses MediaPipe for hand and pose landmark detection
- LSTM model for sequence-based gesture recognition
- Live video streaming with real-time predictions
- Top-3 prediction display with confidence scores
Dataset Management
- Record videos directly through the web interface
- Import existing videos with custom labels
- Automatic landmark extraction and feature generation
- Mirror augmentation support (see the sketch after this list)
- Video segmentation for precise gesture isolation
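As a rough illustration of mirror augmentation: each video can be flipped horizontally to double the data with left/right-swapped signs. A minimal sketch with OpenCV (the project's actual augmentation may operate on the extracted landmarks instead; mirror_video is a hypothetical helper):

```python
import cv2

def mirror_video(src_path, dst_path):
    """Write a horizontally flipped copy of a video (hypothetical helper)."""
    cap = cv2.VideoCapture(src_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
            int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    out = cv2.VideoWriter(dst_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, size)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        out.write(cv2.flip(frame, 1))  # flip around the vertical axis
    cap.release()
    out.release()
```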
Training Pipeline
- Automated feature extraction from video segments
- Standardized preprocessing pipeline
- LSTM-based deep learning model
- Performance visualization and evaluation tools
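Condensed, the preprocessing side of this pipeline amounts to something like the following (variable names, shapes, and the use of joblib for the .pkl files are assumptions; see train.ipynb for the actual code):

```python
import json
import joblib
import numpy as np
from sklearn.preprocessing import StandardScaler, LabelEncoder

def prepare_training_data(X: np.ndarray, y: np.ndarray, feature_names: list):
    """Scale features, encode labels, and persist the artifacts the app loads.

    X: (num_samples, 5, N) windowed features; y: string sign labels.
    """
    # Fit the scaler on per-time-step feature vectors of length N
    scaler = StandardScaler().fit(X.reshape(-1, X.shape[-1]))
    X_scaled = scaler.transform(X.reshape(-1, X.shape[-1])).reshape(X.shape)

    encoder = LabelEncoder().fit(y)
    y_encoded = encoder.transform(y)

    joblib.dump(scaler, "app/model/scaler.pkl")
    joblib.dump(encoder, "app/model/label_encoder.pkl")
    with open("app/model/feature_order.json", "w") as f:
        json.dump(list(feature_names), f)
    return X_scaled, y_encoded
```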
🚀 How It Works
🔹 Step 1: Capture Input
- Uses webcam feed via OpenCV.
- MediaPipe detects:
  - Pose landmarks: shoulders, elbows, and wrists (6 points)
  - Hand landmarks: fingertips only (5 per hand)
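As a rough sketch, the capture step looks something like this with MediaPipe's Holistic solution (the project may use the separate Hands and Pose solutions instead; the landmark indices are standard MediaPipe ones):

```python
import cv2
import mediapipe as mp

FINGERTIPS = [4, 8, 12, 16, 20]        # thumb, index, middle, ring, pinky tips
UPPER_BODY = [11, 12, 13, 14, 15, 16]  # shoulders, elbows, wrists

holistic = mp.solutions.holistic.Holistic(min_detection_confidence=0.5,
                                          min_tracking_confidence=0.5)
cap = cv2.VideoCapture(0)

ok, frame = cap.read()
if ok:
    # MediaPipe expects RGB; OpenCV delivers BGR
    results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.pose_landmarks:
        pose = [(lm.x, lm.y, lm.z)
                for i, lm in enumerate(results.pose_landmarks.landmark)
                if i in UPPER_BODY]
    if results.left_hand_landmarks:
        tips = [(results.left_hand_landmarks.landmark[i].x,
                 results.left_hand_landmarks.landmark[i].y,
                 results.left_hand_landmarks.landmark[i].z)
                for i in FINGERTIPS]

cap.release()
holistic.close()
```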
🔹 Step 2: Extract Features
For each 0.5s window (~7–8 frames):
- Calculates the mean and std of X/Y/Z positions for:
  - Left and right hand fingertips
  - Upper body joints
These features are stacked across 5 consecutive windows to form an LSTM input.
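In code, the per-window statistics boil down to something like this (a sketch; the group boundaries and feature order are assumptions, since the authoritative order lives in app/model/feature_order.json):

```python
import numpy as np

def group_stats(points: np.ndarray) -> np.ndarray:
    """Mean and std of X/Y/Z over all frames and points in one landmark group.

    points: (num_frames, num_points, 3) landmarks for one ~0.5 s window
    returns: 6 values (mean_x, mean_y, mean_z, std_x, std_y, std_z)
    """
    flat = points.reshape(-1, 3)
    return np.concatenate([flat.mean(axis=0), flat.std(axis=0)])

# One window: left fingertips, right fingertips, upper body -> 3 * 6 = 18 features
rng = np.random.default_rng(0)  # dummy landmarks standing in for MediaPipe output
left, right, body = rng.random((8, 5, 3)), rng.random((8, 5, 3)), rng.random((8, 6, 3))
window = np.concatenate([group_stats(g) for g in (left, right, body)])

# Stack 5 consecutive windows into one LSTM input of shape (5, 18)
lstm_input = np.stack([window] * 5)  # in practice, 5 different consecutive windows
```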
🔹 Step 3: Predict Word
- Preprocessed input is passed to an LSTM model.
- Top-3 predictions are shown on screen in real-time.
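Putting it together with the artifacts shipped in app/model/, the prediction step looks roughly like this (whether the .pkl files were written with pickle or joblib, and what shape the scaler was fit on, are assumptions):

```python
import pickle
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("app/model/sign_language_recognition.keras")
with open("app/model/scaler.pkl", "rb") as f:
    scaler = pickle.load(f)
with open("app/model/label_encoder.pkl", "rb") as f:
    label_encoder = pickle.load(f)

lstm_input = np.zeros((5, 18))    # stand-in for the (5, N) window stack built above
x = scaler.transform(lstm_input)  # assumes the scaler was fit on per-window vectors

probs = model.predict(x[np.newaxis])[0]  # softmax over the sign vocabulary

# Top-3 predictions with confidence scores, as shown in the UI
for idx in np.argsort(probs)[::-1][:3]:
    print(label_encoder.inverse_transform([idx])[0], f"{probs[idx]:.2f}")
```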
🔑 Tips for Better Recognition
- Lighting: Ensure good lighting conditions
- Background: Use a plain background for better landmark detection
- Distance: Stay at an appropriate distance from the camera
- Movement: Perform signs clearly and at a moderate speed
- Framing: Keep your hands and upper body visible in the frame
🧠 Model Architecture
The script uses:
- app/model/sign_language_recognition.keras: LSTM model trained on sign language data
  - Input shape: (5, N), where 5 = time steps and N = number of extracted features (e.g., 18–36)
  - Architecture: LSTM + Dense layers
  - Output: Softmax over the sign vocabulary
- app/model/scaler.pkl: StandardScaler for feature normalization
- app/model/label_encoder.pkl: LabelEncoder for class labels
- app/model/feature_order.json: Ensures features are fed in the correct order
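A hedged Keras reconstruction of such an architecture (layer sizes are illustrative; the shipped .keras file is the source of truth):

```python
import tensorflow as tf

N = 18            # features per time step (defined by feature_order.json)
num_classes = 20  # size of the sign vocabulary (illustrative)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(5, N)),  # 5 time steps of N features
    tf.keras.layers.LSTM(64, return_sequences=True),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```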
Tech Stack
- Backend: Flask, Celery, Redis
- Frontend: Bootstrap 5, JavaScript
- Computer Vision: OpenCV, MediaPipe
- Machine Learning: TensorFlow, scikit-learn
- Database: SQLite with SQLAlchemy
🔧 System Requirements
- Python 3.10.12
- Redis Server
- Webcam for real-time recognition
- Modern web browser with JavaScript enabled
🛠️ Installation
- Clone the repository:
git clone https://github.com/metehanozdeniz/sign-language-recognition.git
cd sign-language-recognition
- Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
- 📦 Install dependencies:
pip install -r requirements.txt
- Start Redis server:
redis-server
- Run the application:
# Terminal 1: Start Celery worker
celery -A celery_worker.celery worker --loglevel=info
# Terminal 2: Start Flask application
python run.py
The application will be available at http://localhost:5000
📁 Project Structure
sign-language-recognition/
├── app/
│   ├── model/            # Trained models and preprocessing files
│   ├── static/           # Static files (JS, CSS, images)
│   ├── templates/        # HTML templates
│   ├── utils/            # Utility modules
│   ├── videos/           # Stored video files
│   ├── __init__.py       # App initialization
│   ├── config.py         # Configuration settings
│   ├── models.py         # Database models
│   ├── routes.py         # Route handlers
│   └── tasks.py          # Celery tasks
├── venv/                 # Virtual environment
├── celery_worker.py      # Celery worker configuration
├── requirements.txt      # Project dependencies
├── run.py                # Application entry point
└── train.ipynb           # Model training notebook
Database Schema
Video
- Stores video metadata and file paths
- Tracks video duration and creation time
- Links to landmarks and features
FrameLandmark
- Stores extracted MediaPipe landmarks
- Supports both hand and pose landmarks
- Maintains frame-level temporal information
VideoFeature
- Stores processed features for model training
- Supports windowed feature extraction
- Links features to source videos
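A minimal sketch of how these tables might look as Flask-SQLAlchemy models (column names are illustrative, not copied from app/models.py):

```python
from datetime import datetime
from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy()

class Video(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    label = db.Column(db.String(64), nullable=False)
    file_path = db.Column(db.String(256), nullable=False)
    duration = db.Column(db.Float)                        # video duration in seconds
    created_at = db.Column(db.DateTime, default=datetime.utcnow)
    landmarks = db.relationship("FrameLandmark", backref="video", lazy=True)
    features = db.relationship("VideoFeature", backref="video", lazy=True)

class FrameLandmark(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    video_id = db.Column(db.Integer, db.ForeignKey("video.id"), nullable=False)
    frame_index = db.Column(db.Integer, nullable=False)   # frame-level temporal info
    landmark_type = db.Column(db.String(16))              # "hand" or "pose"
    data = db.Column(db.JSON)                             # raw landmark coordinates

class VideoFeature(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    video_id = db.Column(db.Integer, db.ForeignKey("video.id"), nullable=False)
    window_index = db.Column(db.Integer)                  # windowed feature extraction
    features = db.Column(db.JSON)                         # processed feature vector
```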
API Endpoints
Recognition
- /video_feed: Live video stream
- /current_predictions: Real-time prediction results
- /toggle_recognition: Start/stop recognition
- /recognition_status: Current recognition state
Dataset Management
- /record: Video recording interface
- /import: Video import interface
- /gallery: Video gallery and management
- /process_video: Landmark extraction endpoint
- /task_status/<task_id>: Processing status endpoint
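For example, driving the recognition endpoints from a script (assumes the server is running locally, the endpoints return JSON, and /toggle_recognition accepts POST; check app/routes.py for the actual methods and response shapes):

```python
import requests

BASE = "http://localhost:5000"

requests.post(f"{BASE}/toggle_recognition")               # start (or stop) recognition
print(requests.get(f"{BASE}/recognition_status").json())  # current recognition state
print(requests.get(f"{BASE}/current_predictions").json()) # latest top-3 predictions
```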
⚠️ Troubleshooting
If you encounter issues:
- Check that your webcam is working and accessible
- Verify that all model files are present in app/model/
- Ensure MediaPipe can detect your hands and pose
- Try adjusting the detection confidence thresholds in the code
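The confidence thresholds mentioned in the last point are MediaPipe solution parameters, for example (assuming the Holistic solution):

```python
import mediapipe as mp

# Lower values detect landmarks more readily but may track less reliably (default 0.5)
holistic = mp.solutions.holistic.Holistic(
    min_detection_confidence=0.4,
    min_tracking_confidence=0.4,
)
```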
Acknowledgments
- MediaPipe for providing the pose and hand landmark detection models
- TensorFlow and Keras for the deep learning framework
- Flask team for the excellent web framework