Project Page
AI Sign Language Translation for Brazilian Healthcare - Omdena Project
Introduction
This project was done as part of an Omdena local chapter challenge with the São Paulo chapter. These challenges are open source collaborations between volunteer developers around the world, working to create AI applications that solve social issues for local communities.
Brazil has a significant deaf community, and Brazilian Sign Language (LIBRAS) is recognized as an official language. However, communication barriers often exist in healthcare settings, which can lead to misdiagnoses, treatment errors, or a compromised experience for deaf patients.
This project aimed to address this by developing a sign language translation model for Brazilian Sign Language (LIBRAS), and a demo web app to showcase the model.
This page contains an overview of the project work, and links to the other deliverables. You can navigate to the various sections of this “Project Page”, and the more detailed “Report Page” using the side bar on the left.
Other pages
The deliverables for this project consisted of a live demo application, the open source code, and a report of our work and results.
Demo Application: You can see the model in action and test it on your own videos in the Demo App.
Open Source Code: You can find the project code and instructions for setup/usage in our Github Repo
Detailed Report: This page has an overview of the project. For more details, jump to the Report Page
Domain - Sign Language Processing (SLP)
What is SLP?
Sign Language Processing (SLP) is an AI field that combines Natural Language Processing and Computer Vision to process and analyze sign language content.
Unlike most spoken languages, sign languages lack a standard written representation, and they use a combination of hand movements, facial expressions, and body gestures to communicate meaning. These characteristics, along with other details of sign language linguistics, make SLP an interesting but challenging field.
There are a variety of SLP tasks, like sign language recognition, sign language translation, sign language production, sign language detection, and sign language segmentation. In this project we focused on sign language recognition.
Approaches to SLP
The best approach to modelling sign language depends on the specific task, and has evolved over time as the field incorporated the latest advances in Natural Language Processing and Computer Vision.
In this project, we used pose estimation for feature extraction, and modelled the resulting landmark sequences with RNN, LSTM, and Transformer models.
Recommended Reads
For a comprehensive and up-to-date resource covering the field of SLP research and the dataset landscape, see research.sign.mt
For a recent paper covering the current state of the art in SLP, data availability, and the challenges of the field, see A review of deep learning-based approaches to sign language processing - 2024
Project Summary
Goal
To address the problem of communication barriers in healthcare settings for the deaf community in Brazil, we set out to develop a sign language translation model for Brazilian Sign Language (LIBRAS). The deliverables were a report of our work and results, and a demo web app to showcase the model and make it accessible to the public.
Results
We used a combination of 4 public LIBRAS datasets to create a target dataset of 25 words related to the healthcare domain, with 6 videos each.
We developed a detailed preprocessing pipeline to standardize the conditions between each data source, and a training pipeline that allowed us to experiment with different feature engineering, data augmentation, model architectures and training strategies.
With 6 videos per word, we used 4 videos for training, 1 for validation, and 1 for testing. We kept an even distribution of data sources in each split to avoid bias.
Details about the best performing model (see the code sketch after this list):
- LSTM architecture:
- 2 layers
- 256 hidden units
- 20 frames per sample:
- Randomly sampled from the preprocessed frame series
- 189 input features per frame:
- Dropped the 150 landmark position coordinates
- Engineered 181 features capturing distances & angles between landmarks, and movements across consecutive frames
- Additional 8 features representing various metadata
- Data augmentation:
- Rotation (±10 degrees, applied to an entire series)
- Gaussian Noise on Landmark coordinates (0.05 std, applied to each landmark on each frame)
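For concreteness, here is a minimal PyTorch sketch of a classifier matching the hyperparameters above (2 LSTM layers, 256 hidden units, 189 features per frame, 20 frames, 25 classes). The class name and dropout value are our own illustration choices, not the project's exact code; see the GitHub repo for the actual implementation.

```python
import torch
import torch.nn as nn

class SignLSTM(nn.Module):
    """2-layer LSTM classifier over engineered per-frame features."""

    def __init__(self, n_features=189, hidden=256, n_layers=2,
                 n_classes=25, dropout=0.3):  # dropout value is an assumption
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=n_layers,
                            batch_first=True, dropout=dropout)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):
        # x: (batch, 20 frames, 189 features per frame)
        out, _ = self.lstm(x)
        # Classify from the hidden state at the last frame
        return self.head(out[:, -1])

model = SignLSTM()
logits = model(torch.randn(8, 20, 189))  # -> (8, 25) class logits
```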
Best model performance:
| Metric | Value |
|---|---|
| Accuracy | 66.37% |
| Top-2 Accuracy | 77.88% |
| Top-3 Accuracy | 84.07% |
| Top-4 Accuracy | 89.38% |
| Top-5 Accuracy | 93.81% |
| Loss | 2.66401 |
The training loss and validation loss for this model were 0.02336 and 0.00660 respectively. These are significantly lower than the test loss, indicating overfitting.
We took measures to prevent overfitting, such as randomly sampling frames, applying data augmentation, and stratifying the data sources in each split. The data augmentation (applied only during training) is why the training loss is higher than the validation loss, but the model still overfits in the end.
We expect that the small dataset size is a major reason for the overfitting. A larger dataset would of course help, but in lieu of that, we could apply more aggressive data augmentation to help the model generalize better to unseen data.
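As an illustration of the augmentations described above, here is a minimal NumPy sketch, assuming each sample is a (frames, landmarks, 2) array of x/y coordinates centered at the origin. The rotation pivot and the exact frame-sampling scheme are our assumptions.

```python
import numpy as np

rng = np.random.default_rng()

def augment(series, max_deg=10.0, noise_std=0.05, n_frames=20):
    """series: (frames, landmarks, 2) array of x/y landmark coordinates."""
    # One rotation angle per series, so the whole sign rotates coherently
    theta = np.deg2rad(rng.uniform(-max_deg, max_deg))
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    out = series @ rot.T
    # Independent Gaussian noise on every landmark of every frame
    out = out + rng.normal(0.0, noise_std, size=out.shape)
    # Randomly sample a fixed number of frames, preserving temporal order
    idx = np.sort(rng.choice(len(out), size=min(n_frames, len(out)),
                             replace=False))
    return out[idx]
```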
Contributors
The main work for this project took place over 4 months, from February to June 2025. Below is a list of the people that contributed to the project. Feel free to reach out to them if you have questions about any aspect of the project. Some members have also made additional changes & improvements since the end of the main project period.
Project Leader
- Tasks: Research Resources, Data Scraping, Data Preprocessing, Model Development, Demo App Development
- Omdena Role: Project Leader & Lead ML Engineer
Task Leaders
- Tasks: Research Resources, Data Scraping
- Omdena Role: Lead ML Engineer
- Tasks: Data Scraping, Data Cleaning & Organisation
- Omdena Role: Lead ML Engineer
- Tasks: Model Development
- Omdena Role: Lead ML Engineer
- Tasks: Demo App Development
- Omdena Role: Lead ML Engineer
Task Contributors
- Tasks: Data Scraping, Model Development
- Omdena Role: ML Engineer
- Tasks: Data Review & Cleaning
- Omdena Role: Junior ML Engineer
- Tasks: Data Scraping
- Omdena Role: Junior ML Engineer
- Tasks: Data Review & Cleaning
- Omdena Role: Junior ML Engineer
- Tasks: Model Development
- Omdena Role: ML Engineer
- Tasks: Model Development
- Omdena Role: Junior ML Engineer
- Tasks: Data Review & Cleaning
- Omdena Role: Junior ML Engineer
Project Tasks
The project work was divided into separate tasks, some of which could be done in parallel. A task leader was assigned to each task to lead the work, coordinate with task contributors, and align weekly with the project leader about progress and direction.
1. Research & Planning
To decide on the scope of the project and our plan for development, we conducted research and shared findings with each other. We investigated the SLP domain generally, the datasets available for LIBRAS, and the existing literature about SLP for LIBRAS.
2. Data Collection
We used Selenium and BeautifulSoup to scrape videos and metadata from 4 different public sources, giving a large dataset of ~8500 videos corresponding to ~2100 labels. After cleaning and excluding labels with fewer than 5 videos, we had 170 candidate labels for our target dataset. After reviewing the videos and removing labels that were not relevant to the healthcare domain, we arrived at our target dataset of 25 words, with 6 videos per word.
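For illustration, a minimal sketch of the scraping approach. The URL, CSS selector, and file layout here are placeholders; each of the 4 sources needed its own logic.

```python
import requests
from bs4 import BeautifulSoup
from selenium import webdriver

# Render the page with Selenium, then parse it with BeautifulSoup
driver = webdriver.Chrome()
driver.get("https://example.com/libras-dictionary")  # placeholder URL
soup = BeautifulSoup(driver.page_source, "html.parser")

# "a.sign-entry" is an assumed selector; each source has its own structure
for entry in soup.select("a.sign-entry"):
    label = entry.text.strip()
    video = requests.get(entry["href"], timeout=30)
    with open(f"videos/{label}.mp4", "wb") as f:
        f.write(video.content)

driver.quit()
```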

GIF showing all sign videos in the dataset for the word 'banana'
3. Data Preprocessing
The videos from each data source varied widely in format (framerate, dimensions, etc.) and in recording conditions (position of the signer, lighting, scale, etc.). We used Google's open source MediaPipe Holistic model to extract pose estimation landmarks.
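A minimal sketch of the landmark extraction step using MediaPipe's Holistic solution API; the video path is a placeholder.

```python
import cv2
import mediapipe as mp

holistic = mp.solutions.holistic.Holistic(static_image_mode=False)
cap = cv2.VideoCapture("sign_video.mp4")  # placeholder path

series = []
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # MediaPipe expects RGB input; OpenCV decodes frames as BGR
    results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    series.append((results.pose_landmarks,
                   results.left_hand_landmarks,
                   results.right_hand_landmarks))

cap.release()
holistic.close()
```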

GIF showing raw pose estimation landmarks for the sign 'casa'
We developed a detailed preprocessing pipeline with OpenCV and NumPy to standardize conditions across data sources. The pipeline trims each series to the start and end of the sign, adjusts signers to the same scale, aligns them to the same central position, and interpolates frames where landmark detection failed.
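Below is a simplified NumPy sketch of the centering, scaling, and interpolation steps (trimming omitted), assuming a (frames, landmarks, 2) array with NaN where detection failed. The anchor landmark and scale measure are our assumptions.

```python
import numpy as np

def standardize_series(series, anchor=0, ref_scale=1.0):
    """series: (frames, landmarks, 2) array, NaN where detection failed."""
    series = series.copy()
    # Fill failed detections by linear interpolation along the time axis
    for l in range(series.shape[1]):
        for c in range(2):
            vals = series[:, l, c]
            nans = np.isnan(vals)
            if nans.any() and not nans.all():
                vals[nans] = np.interp(np.flatnonzero(nans),
                                       np.flatnonzero(~nans), vals[~nans])
    # Shift so the anchor landmark's mean position sits at the origin,
    # preserving the signer's motion within the series
    series -= series[:, anchor:anchor + 1, :].mean(axis=0, keepdims=True)
    # Rescale so all signers occupy a comparable extent
    extent = np.nanmax(np.abs(series))
    return series * (ref_scale / extent) if extent > 0 else series
```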

GIF showing pose estimation landmarks before and after preprocessing for the sign 'casa'
4. Model Development
We used PyTorch to code the models (RNN, LSTM & Transformer) and the training pipeline. Since we were collaborating internationally, the training pipeline was engineered to be platform agnostic and resumable. We used Hydra and TensorBoard for managing and tracking model experiments, and Google Colab for collaboratively training and testing models on both GPUs and CPUs.
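A minimal sketch of how a resumable, Hydra-configured training entry point can be structured; `build_model`, `train_one_epoch`, and the config field names are hypothetical stand-ins for the project's actual code.

```python
import hydra
import torch
from omegaconf import DictConfig
from torch.utils.tensorboard import SummaryWriter

@hydra.main(config_path="conf", config_name="train", version_base=None)
def main(cfg: DictConfig) -> None:
    model = build_model(cfg)  # hypothetical model factory
    opt = torch.optim.Adam(model.parameters(), lr=cfg.lr)
    writer = SummaryWriter(cfg.log_dir)

    start_epoch = 0
    if cfg.get("resume_from"):
        # Restore model/optimizer state so runs can move between machines
        ckpt = torch.load(cfg.resume_from, map_location="cpu")
        model.load_state_dict(ckpt["model"])
        opt.load_state_dict(ckpt["optimizer"])
        start_epoch = ckpt["epoch"] + 1

    for epoch in range(start_epoch, cfg.epochs):
        loss = train_one_epoch(model, opt, cfg)  # hypothetical helper
        writer.add_scalar("train/loss", loss, epoch)
        torch.save({"model": model.state_dict(),
                    "optimizer": opt.state_dict(),
                    "epoch": epoch}, cfg.checkpoint_path)

if __name__ == "__main__":
    main()
```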
5. Report
The first deliverable for this project was a report of our work and results. We used Seaborn and Matplotlib to create all the plots, visualizations, and GIFs in the report, and Jekyll and GitHub Pages to host it as a static website.
6. Demo App
The second deliverable for this project was a demo application. Users can select from sample videos, or upload their own videos, and see the model’s prediction & pose estimation result. We developed the Backend with Python & FastAPI, and deployed it with Hugging Face Spaces. We developed the Frontend with Next.js & React, and deployed it with Vercel.
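As a rough sketch, the backend's prediction endpoint can be structured like this with FastAPI; the route name and the `run_inference` helper are hypothetical.

```python
from fastapi import FastAPI, UploadFile

app = FastAPI()

@app.post("/predict")  # hypothetical route
async def predict(video: UploadFile):
    raw = await video.read()
    # Pose estimation, preprocessing, and model inference happen here
    label, confidence = run_inference(raw)  # hypothetical helper
    return {"label": label, "confidence": confidence}
```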