Detecting Unattended Luggage in Trains: A Deep Learning Approach

In today's fast-paced world, the need for secure public transportation systems has never been more critical. As millions of passengers travel daily, ensuring the safety of these systems is paramount. One common challenge is detecting unattended luggage, which can be a security threat in crowded environments like trains. To address this, we developed a real-time unattended luggage detection system for train environments as part of our thesis, in collaboration with Televic.

This blog post dives deep into the technologies, tools, and challenges encountered while building this system, with a focus on AI and deep learning techniques used to solve complex real-world problems.

Technologies and Tools Used

Programming Languages: Python
Machine Learning Libraries: PyTorch, YOLO, OC-SORT
Algorithms and Models: YOLOv8, OSNet for Re-Identification, Spatio-Temporal Detection
Techniques: Object Detection, Tracking, Re-Identification, Data Augmentation, Fine-Tuning, Spatio-Temporal Ownership Detection
Data Analysis Tools: NumPy, Pandas, OpenCV for video processing

Project Overview

The core objective of this project was to develop a robust AI-powered system capable of identifying passengers and their luggage, tracking both across multiple frames, and detecting abandoned luggage in real-time. Using a combination of object detection models, tracking algorithms, and spatio-temporal analysis, the system could detect ownership and abandonment scenarios, ensuring timely alerts for potential security risks.

Key System Components

Object Detection: YOLOv8 was chosen for real-time object detection due to its balance between speed and accuracy, particularly in crowded environments. The model was fine-tuned on a custom dataset comprising images from public datasets like COCO and manually labeled video footage from trains.

Object Tracking: OC-SORT, a state-of-the-art tracking algorithm, was implemented to track luggage and passengers across frames. It ensures that even when objects are occluded or move rapidly, the system maintains a consistent identification.

Re-Identification: OSNet was used to re-identify luggage and passengers when they reappeared in later frames. This step was crucial in ensuring that the system matched luggage with the right passenger, even when the viewing angle or lighting had changed.

Data Preparation

One of the first challenges was assembling a comprehensive dataset that could capture the nuances of train environments. For this, we combined the publicly available COCO dataset with manually annotated footage provided by our industry partner, Televic.

COCO Subset: 12,479 images containing handbags, suitcases, and backpacks.
Manually Labeled: 5,658 images annotated from self-captured and provided videos.
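
Together the two sources give 12,479 + 5,658 = 18,137 images. As a rough illustration of how a luggage subset could be carved out of COCO, here is a minimal sketch that works on an annotation dictionary in the standard COCO layout. The function name luggage_subset and the choice to keep "person" boxes alongside luggage are our assumptions for this example, not a description of the exact script used in the project.

```python
LUGGAGE = {"backpack", "handbag", "suitcase"}  # COCO category names


def luggage_subset(coco, extra_classes=("person",)):
    """Keep images with at least one luggage box, plus boxes of wanted classes.

    `coco` is a dict with "images", "annotations" and "categories" lists,
    as found in a COCO instances JSON file. Hypothetical helper.
    """
    name_by_id = {c["id"]: c["name"] for c in coco["categories"]}
    # Images that contain at least one luggage annotation.
    luggage_images = {
        a["image_id"]
        for a in coco["annotations"]
        if name_by_id[a["category_id"]] in LUGGAGE
    }
    wanted = LUGGAGE | set(extra_classes)
    return {
        "images": [im for im in coco["images"] if im["id"] in luggage_images],
        "annotations": [
            a
            for a in coco["annotations"]
            if a["image_id"] in luggage_images
            and name_by_id[a["category_id"]] in wanted
        ],
        "categories": [c for c in coco["categories"] if c["name"] in wanted],
    }


# Tiny in-memory example: image 1 has a suitcase, image 2 only a person.
example = {
    "images": [{"id": 1, "file_name": "a.jpg"}, {"id": 2, "file_name": "b.jpg"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 33, "bbox": [5, 5, 40, 60]},
        {"id": 11, "image_id": 1, "category_id": 1, "bbox": [50, 0, 30, 90]},
        {"id": 12, "image_id": 2, "category_id": 1, "bbox": [0, 0, 20, 50]},
    ],
    "categories": [{"id": 1, "name": "person"}, {"id": 33, "name": "suitcase"}],
}
subset = luggage_subset(example)
print([im["file_name"] for im in subset["images"]])
```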

Object Detection and Tracking Pipeline

The primary pipeline combined YOLOv8 for detection and OC-SORT for tracking. This enabled the system to track passengers and their luggage even in complex, crowded environments.
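
To make the detect-then-track idea concrete, here is a deliberately simplified sketch of the association step: each new detection box is matched to the existing track it overlaps most. This is not OC-SORT, which adds motion modelling and recovery after occlusion. The class name GreedyIoUTracker and the 0.3 overlap threshold are assumptions made for this example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class GreedyIoUTracker:
    """Toy tracker: greedy IoU matching, no motion model, no occlusion memory."""

    def __init__(self, iou_threshold=0.3):
        self.iou_threshold = iou_threshold  # assumed value, tune on real footage
        self.tracks = {}  # track_id -> last seen box
        self._next_id = 1

    def update(self, detections):
        """Return a list of (track_id, box) for the boxes of one frame."""
        # Score every track/detection pair, best overlap first.
        pairs = sorted(
            (
                (iou(box, det), tid, i)
                for tid, box in self.tracks.items()
                for i, det in enumerate(detections)
            ),
            reverse=True,
        )
        assigned, used_tracks = {}, set()
        for score, tid, i in pairs:
            if score < self.iou_threshold:
                break
            if tid in used_tracks or i in assigned:
                continue
            assigned[i] = tid
            used_tracks.add(tid)
        results = []
        for i, det in enumerate(detections):
            tid = assigned.get(i)
            if tid is None:  # unmatched detection starts a new track
                tid = self._next_id
                self._next_id += 1
            results.append((tid, det))
        # Unmatched tracks are dropped here; a real tracker keeps them alive.
        self.tracks = {tid: box for tid, box in results}
        return results


tracker = GreedyIoUTracker()
frame1 = tracker.update([(10, 10, 50, 90), (200, 40, 260, 120)])
# Same two objects, slightly moved and listed in the opposite order.
frame2 = tracker.update([(205, 42, 265, 122), (12, 11, 52, 91)])
print([tid for tid, _ in frame1], [tid for tid, _ in frame2])
```

In the real pipeline the boxes would come from the fine-tuned YOLOv8 model on each video frame, and the toy tracker would be replaced by OC-SORT.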

[Figure: Application of the ownership detection and abandonment detection algorithm]

YOLOv8 Fine-Tuning Process

The YOLOv8 model was trained for 100 epochs using a mix of COCO and manually labeled data. Several loss functions were employed to optimize performance:

CIoU Loss: Used for bounding box regression, improving the localization of luggage and passengers.
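DFL Loss: Distribution Focal Loss, a second box-regression term that models each box edge as a distribution over offsets, which sharpens boundary localization.
VFL Loss: Varifocal Loss, used for classification. It addressed imbalances in the data and improved classification precision, helping the model differentiate between the various luggage types.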
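
For readers who want to see what the box-regression term looks like, the snippet below is a plain-Python sketch of the standard Complete-IoU (CIoU) loss for a single pair of boxes. It penalizes poor overlap, the distance between box centres, and a mismatch in aspect ratio. It is written for illustration only; during training the loss is computed on batched tensors inside the framework, and the function name ciou_loss is our own.

```python
import math


def ciou_loss(pred, target, eps=1e-9):
    """CIoU loss for two boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Plain IoU.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = pw * ph + tw * th - inter + eps
    iou = inter / union

    # Squared distance between the two box centres.
    rho2 = ((px1 + px2) - (tx1 + tx2)) ** 2 / 4 + ((py1 + py2) - (ty1 + ty2)) ** 2 / 4
    # Squared diagonal of the smallest box enclosing both.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term and its weight.
    v = (4 / math.pi ** 2) * (math.atan(tw / (th + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - (iou - rho2 / c2 - alpha * v)


perfect = ciou_loss((0, 0, 10, 10), (0, 0, 10, 10))
shifted = ciou_loss((0, 0, 10, 10), (5, 0, 15, 10))
disjoint = ciou_loss((0, 0, 10, 10), (20, 20, 30, 30))
print(perfect, shifted, disjoint)
```

Unlike plain IoU, the loss keeps growing as non-overlapping boxes move further apart, which gives the optimizer a useful signal even when a prediction misses the object entirely.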

Training Results:

Precision: Started at 0.41 and increased to 0.61 by epoch 91.
Recall: Improved from 0.36 to 0.47 by epoch 59.
Mean Average Precision (mAP): Grew from 0.31 to 0.48 by epoch 53.

Key Challenges

Occlusions and Overlapping Objects

One of the biggest hurdles was managing occlusions, where passengers or objects blocked the camera's view of luggage. This was particularly common in crowded train environments.

Solution: Fine-tuning YOLOv8 on augmented data that simulated occlusions improved detection accuracy. Additionally, the OC-SORT tracker maintained object identities even when occluded for short periods, allowing the system to "remember" objects as they reappeared.
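
One common way to simulate occlusion during training is to paint a random rectangle over part of an annotated object, in the spirit of cutout or random erasing. The sketch below shows the idea on a tiny image stored as nested lists. The post does not say which augmentation the project used, so the function name add_random_occluder and the size limits are assumptions for illustration.

```python
import random


def add_random_occluder(image, box, max_cover=0.5, fill=0, rng=random):
    """Return a copy of `image` with a filled rectangle inside `box`.

    image: list of rows of pixel values.
    box:   (x1, y1, x2, y2) in pixels, x2 and y2 exclusive.
    The occluder spans between 10% and `max_cover` of each box side.
    """
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    ow = max(1, int(w * rng.uniform(0.1, max_cover)))
    oh = max(1, int(h * rng.uniform(0.1, max_cover)))
    # Top-left corner chosen so the occluder stays inside the box.
    ox = rng.randint(x1, x2 - ow)
    oy = rng.randint(y1, y2 - oh)
    out = [row[:] for row in image]  # leave the input untouched
    for y in range(oy, oy + oh):
        for x in range(ox, ox + ow):
            out[y][x] = fill
    return out


# 20x20 white image with a "suitcase" annotated at (4, 4, 16, 16).
image = [[255] * 20 for _ in range(20)]
occluded = add_random_occluder(image, (4, 4, 16, 16), rng=random.Random(0))
covered = sum(row.count(0) for row in occluded)
print(f"{covered} pixels hidden")
```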

[Figure: Typical challenges in a railway environment]

Luggage Handovers and Appearance Changes

Passengers often exchanged items or changed their appearance (e.g., removing a jacket), which disrupted the consistency of tracking.

Solution: By using OSNet for Re-Identification, we ensured that the system could match passengers and luggage across frames even after a visual change. The system could differentiate between passengers based on their overall appearance, rather than just focusing on clothing.
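
Re-identification usually boils down to comparing embedding vectors: the network turns each image crop into a feature vector, and two crops are treated as the same identity when their vectors point in nearly the same direction. The sketch below shows that matching step with cosine similarity. The three-dimensional vectors, the 0.7 threshold, and the function name reidentify are made up for this example; real embeddings have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def reidentify(query, gallery, min_similarity=0.7):
    """Return the gallery ID that best matches `query`, or None if none is close.

    gallery: dict mapping identity -> stored embedding.
    min_similarity is an assumed threshold, to be tuned on real data.
    """
    best_id, best_score = None, min_similarity
    for identity, embedding in gallery.items():
        score = cosine_similarity(query, embedding)
        if score >= best_score:
            best_id, best_score = identity, score
    return best_id


gallery = {
    "passenger_1": [0.9, 0.1, 0.0],
    "passenger_2": [0.0, 0.2, 0.9],
}
returning = reidentify([0.8, 0.2, 0.1], gallery)  # close to passenger_1
stranger = reidentify([0.0, 1.0, 0.0], gallery)   # close to nobody
print(returning, stranger)
```

Returning None for a poor match matters in practice: it lets the system open a new identity instead of forcing a wrong association.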

Handling Complex Abandonment Scenarios

Detecting whether luggage was truly abandoned or simply left temporarily was challenging, particularly when passengers moved briefly away from their belongings.

Solution: A spatio-temporal algorithm tracked the distance between passengers and their luggage over time. If a passenger stayed farther than a distance threshold from their luggage for longer than a set time window, the system triggered an abandonment alert. Both the distance threshold and the time window were calibrated on real-world train footage.
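
The rule described above can be sketched in a few lines. The class below keeps, for each piece of luggage, the moment its owner first went out of range, and raises the flag once that absence lasts too long. The class name AbandonmentMonitor, the pixel-based distance, and the threshold values in the example are assumptions; the calibrated values from the project are not given in this post.

```python
import math


class AbandonmentMonitor:
    """Flags luggage whose owner stays too far away for too long."""

    def __init__(self, max_distance, max_seconds):
        self.max_distance = max_distance  # e.g. pixels in the image plane
        self.max_seconds = max_seconds
        self._away_since = {}  # luggage_id -> time the owner went out of range

    def update(self, luggage_id, luggage_pos, owner_pos, timestamp):
        """Return True if the luggage counts as abandoned at `timestamp`.

        owner_pos is None when the owner is not visible in the frame.
        """
        if owner_pos is not None:
            distance = math.hypot(
                luggage_pos[0] - owner_pos[0], luggage_pos[1] - owner_pos[1]
            )
            if distance <= self.max_distance:
                # Owner is back in range: reset the timer.
                self._away_since.pop(luggage_id, None)
                return False
        start = self._away_since.setdefault(luggage_id, timestamp)
        return timestamp - start > self.max_seconds


monitor = AbandonmentMonitor(max_distance=100, max_seconds=10)
bag = (50, 50)
alerts = [
    monitor.update("bag_7", bag, (60, 55), timestamp=0),    # owner next to bag
    monitor.update("bag_7", bag, (400, 50), timestamp=5),   # owner walks away
    monitor.update("bag_7", bag, (450, 50), timestamp=12),  # away for 7 s
    monitor.update("bag_7", bag, None, timestamp=16),       # away for 11 s
    monitor.update("bag_7", bag, (70, 50), timestamp=17),   # owner returns
]
print(alerts)
```

Resetting the timer whenever the owner comes back in range is what separates a short trip down the aisle from a genuine abandonment.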

Results and Metrics

The system was tested on a dedicated test set, yielding the following metrics:

Class	Precision	Recall	mAP50	mAP50-95
Suitcase	0.62	0.70	0.70	0.55
Backpack	0.59	0.42	0.50	0.32
Handbag	0.47	0.44	0.46	0.31
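
To compare the classes with a single number, one option is to derive an F1 score (the harmonic mean of precision and recall) from the table above. The short script below does that arithmetic and also averages mAP50 over the three luggage classes. These derived figures are not part of the original evaluation; they are simply computed from the reported values.

```python
# (precision, recall, mAP50) per class, copied from the results table.
results = {
    "Suitcase": (0.62, 0.70, 0.70),
    "Backpack": (0.59, 0.42, 0.50),
    "Handbag": (0.47, 0.44, 0.46),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


f1_by_class = {name: f1_score(p, r) for name, (p, r, _) in results.items()}
mean_map50 = sum(m for _, _, m in results.values()) / len(results)

for name, f1 in f1_by_class.items():
    print(f"{name}: F1 = {f1:.2f}")
print(f"Mean mAP50 over the three luggage classes: {mean_map50:.2f}")
```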

Future Work and Improvements

While the system demonstrated promising performance, especially in detecting abandoned luggage, several areas remain for future improvement:

Enhanced Re-Identification: Retraining OSNet on a more diverse dataset could improve the system’s ability to differentiate between passengers and their luggage.
Increased Dataset Variety: Adding more real-world scenarios, such as nighttime footage or different camera angles, would help the system generalize better to new environments.
Gait Recognition: Incorporating Gait Energy Images (GEI) for passenger identification could enhance tracking performance.
Multiple Camera Feeds: Integrating additional camera views would eliminate blind spots, ensuring complete coverage of the train carriage and entrances.

Conclusion

This project demonstrates the power of combining state-of-the-art object detection, tracking, and re-identification techniques to solve real-world challenges in public transportation security. The system successfully identifies luggage ownership and abandonment in real time, even in complex, crowded environments.

With further fine-tuning and additional data, this approach can be scaled and adapted to other public spaces, providing an extra layer of security in areas prone to high traffic and potential threats.

Stay tuned for further updates on enhancing this system, and feel free to explore the codebase for implementation details on GitHub.

GitHub Repository