HLDC-BailNLP: Hindi Legal Bail Prediction

Python, PyTorch, Transformers, FP16, NLP
Domain: NLP / Legal AI / Machine Learning
Type: Research / Academic / Personal ML
Dataset: HLDC, 900K+ Hindi legal documents from UP district courts
Components: Data preprocessing, text classification, transformer fine-tuning, FP16 optimization, evaluation pipeline
Year: 2025

Accuracy: 83.68%
F1-Score: 0.826
Eval Loss: 0.777
Throughput: 77.08 samples/s

This project fine-tunes a transformer model on the HLDC (Hindi Legal Documents Corpus), a dataset of over 900K Hindi legal documents collected from Uttar Pradesh district courts. The task is binary bail prediction: given the raw Hindi text of a court order, classify whether the bail application was granted or rejected.

The implementation improves on a prior baseline by switching to a stronger transformer architecture and applying FP16 mixed-precision training via PyTorch, which reduced training time and enabled faster iteration cycles. The pipeline covers data preprocessing for Hindi legal text, tokenization, sequence classification head training, learning rate scheduling, and evaluation.
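The mixed-precision and learning-rate-scheduling steps mentioned above can be sketched roughly as follows. This is a minimal illustration, not the project's actual training code: the small feed-forward model stands in for the transformer and its classification head, the inputs are random tensors rather than tokenized HLDC text, and the function name `train_steps`, the linear warmup/decay schedule, and all hyperparameters are assumptions made for the example. FP16 autocast is enabled only when a CUDA device is available; on CPU the loop falls back to plain FP32.

```python
import contextlib

import torch
from torch import nn


def train_steps(num_steps=20, warmup_steps=5, seed=0):
    """Run a few (optionally FP16) training steps on a toy binary classifier.

    Returns the list of per-step loss values.
    """
    torch.manual_seed(seed)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    use_amp = device.type == "cuda"  # FP16 autocast on GPU only

    # Toy stand-in for "transformer encoder + 2-class classification head".
    model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2)).to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

    # Assumed schedule: linear warmup, then linear decay toward zero.
    def lr_lambda(step):
        if step < warmup_steps:
            return (step + 1) / warmup_steps
        return max(0.0, (num_steps - step) / max(1, num_steps - warmup_steps))

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

    # GradScaler rescales the loss so small FP16 gradients do not underflow.
    # With enabled=False it is a transparent pass-through.
    scaler = torch.cuda.amp.GradScaler(enabled=use_amp)
    loss_fn = nn.CrossEntropyLoss()

    # Random features and labels in place of tokenized court orders.
    features = torch.randn(128, 32, device=device)
    labels = torch.randint(0, 2, (128,), device=device)

    losses = []
    for _ in range(num_steps):
        optimizer.zero_grad(set_to_none=True)
        if use_amp:
            ctx = torch.autocast(device_type="cuda", dtype=torch.float16)
        else:
            ctx = contextlib.nullcontext()
        with ctx:  # forward pass and loss run in FP16 where it is safe
            loss = loss_fn(model(features), labels)
        scaler.scale(loss).backward()  # backward on the scaled loss
        scaler.step(optimizer)         # unscales gradients, then steps
        scaler.update()                # adjusts the scale factor
        scheduler.step()
        losses.append(loss.item())
    return losses
```

Calling `train_steps()` runs the loop end to end. With a real model, the same pattern applies with the toy network replaced by the fine-tuned transformer and the random tensors replaced by tokenized batches.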

Final evaluation results: 83.68% accuracy, 0.826 F1-score, 0.777 eval loss, and a throughput of 77.08 samples/sec. These results suggest that transformer models can pick up signals predictive of bail outcomes directly from raw Hindi court text, without manual feature engineering or domain-specific preprocessing beyond the text preprocessing described above.
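For reference, the reported metric types can be computed along these lines. This is a sketch under assumptions: the source does not say whether the F1-score is binary, macro, or weighted, so the example uses binary F1 with "granted" encoded as label 1, and the helper names `binary_metrics` and `throughput` are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and binary F1 for two equal-length label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def throughput(num_samples, seconds):
    """Evaluation throughput in samples per second."""
    return num_samples / seconds
```

Throughput here is simply the number of evaluated samples divided by wall-clock evaluation time, which is how a figure in samples/sec is usually read.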

Key Highlights

  • Fine-tuned a transformer-based NLP model on the HLDC Hindi Legal Documents Corpus for bail decision classification.
  • Improved a previous baseline using FP16 mixed-precision training and a stronger architecture for faster experimentation.
  • Achieved 83.68% accuracy, 0.826 F1-score, and 77.08 samples/sec evaluation throughput.