HLDC-BailNLP: Hindi Legal Bail Prediction

Tags: Python, PyTorch, Transformers, FP16, NLP

Domain: NLP / Legal AI / Machine Learning
Type: Research / Academic / Personal ML
Dataset: HLDC — 900K+ Hindi legal documents from UP district courts
Components: Data preprocessing, text classification, transformer fine-tuning, FP16 optimization, evaluation pipeline
Year: 2025

Accuracy: 83.68%
F1-Score: 0.826
Eval Loss: 0.777
Throughput: 77.08 samples/sec

This project fine-tunes a transformer model on the HLDC (Hindi Legal Documents Corpus), a dataset of over 900K Hindi legal documents collected from Uttar Pradesh district courts. The task is binary bail prediction: given the raw Hindi text of a court order, classify whether the bail application was granted or rejected.

The implementation improves on a prior baseline by switching to a stronger transformer architecture and applying FP16 mixed-precision training via PyTorch, which reduced training time and enabled faster iteration cycles. The pipeline covers preprocessing of Hindi legal text, tokenization, training a sequence classification head, learning rate scheduling, and evaluation.
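The write-up does not say which learning rate schedule was used. Purely as an illustration, here is a minimal sketch of linear warmup followed by linear decay, a common choice when fine-tuning transformers. The function name and every numeric value (peak rate, step counts) are hypothetical and not taken from the project.

```python
def linear_warmup_decay(step: int, total_steps: int, warmup_steps: int, peak_lr: float) -> float:
    """Learning rate at a given optimizer step (hypothetical helper).

    Ramps linearly from 0 to peak_lr over warmup_steps, then decays
    linearly back to 0 at total_steps.
    """
    if step < warmup_steps:
        # Warmup phase: fraction of the way to the peak.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: fraction of post-warmup steps still remaining.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Illustrative values only (assumptions, not the project's settings).
PEAK_LR = 2e-5
TOTAL_STEPS = 1000
WARMUP_STEPS = 100

schedule = [linear_warmup_decay(s, TOTAL_STEPS, WARMUP_STEPS, PEAK_LR)
            for s in range(TOTAL_STEPS + 1)]
```

With these assumed values the rate peaks at step 100 and returns to zero at step 1000. A short warmup is often used so that the randomly initialized classification head does not push large early updates into the pretrained weights.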

Final evaluation results: 83.68% accuracy, 0.826 F1-score, 0.777 eval loss, and 77.08 samples/sec evaluation throughput. These results suggest that a transformer model can learn signals predictive of bail outcomes directly from raw Hindi court text, without manual feature engineering.
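For readers unfamiliar with how these figures are derived, the sketch below computes accuracy, F1, and throughput from a list of predictions. It is not the project's evaluation code: the labels are toy data, the function names are hypothetical, and it assumes F1 is computed for the positive class (label 1, taken here to mean "granted"). The source does not state which F1 averaging was used.

```python
from typing import Dict, List


def binary_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def throughput(num_samples: int, elapsed_seconds: float) -> float:
    """Samples processed per second over an evaluation run."""
    return num_samples / elapsed_seconds


# Toy labels (assumption: 1 = bail granted, 0 = bail rejected).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = binary_metrics(y_true, y_pred)
rate = throughput(num_samples=len(y_true), elapsed_seconds=0.5)
```

On this toy data there are 3 true positives, 3 true negatives, 1 false positive, and 1 false negative, so accuracy, precision, recall, and F1 all come out to 0.75.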

Key Highlights

Fine-tuned a transformer-based NLP model on the HLDC Hindi Legal Documents Corpus for bail decision classification.
Improved a previous baseline using FP16 mixed-precision training and a stronger architecture for faster experimentation.
Achieved 83.68% accuracy, 0.826 F1-score, and 77.08 samples/sec evaluation throughput.
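FP16 mixed-precision training is usually paired with loss scaling, because small gradient values can underflow to zero in half precision. The sketch below illustrates that effect using only Python's standard library. It is not PyTorch and not the project's training code, and the gradient value and scale factor are illustrative assumptions.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value, returned as a Python float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Assumed values for illustration only.
LOSS_SCALE = 65536.0   # 2**16, a typical starting scale
tiny_grad = 1e-8       # a gradient too small for FP16 to represent

# Stored directly in FP16, the gradient underflows to zero and the update is lost.
unscaled_fp16 = to_fp16(tiny_grad)

# Scaling the loss scales every gradient by the same factor, which lifts
# the value into FP16's representable range.
scaled_fp16 = to_fp16(tiny_grad * LOSS_SCALE)

# Dividing by the scale in higher precision recovers the gradient
# before the optimizer step.
recovered = scaled_fp16 / LOSS_SCALE

print(f"unscaled in FP16: {unscaled_fp16}")
print(f"recovered after scaling: {recovered:.3e}")
```

In PyTorch, automatic mixed precision applies the same idea with a gradient scaler that adjusts the scale factor dynamically, so this arithmetic does not have to be written by hand.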