
Warehouse Image Similarity Detection

Developed a computer vision application that identifies warehouse items by image similarity, addressing the problem of lost or unlabeled products. The solution made processing of tickets for such items ten times faster for warehouse staff at NCOC.

Using pre-trained vision and vision-language models, the solution supports inventory management by quickly matching unidentified items with existing records.

Similarity detection in a warehouse

Technologies and Tools

  • Programming Languages: Python
  • Machine Learning Libraries: PyTorch, Hugging Face Transformers
  • Algorithms and Models: ResNet-50, CLIP, DINOv2, GPT4-mini
  • Techniques: Computer Vision, NLP, Image Similarity Detection, Data Augmentation, Synthetic Data Generation, Fine-tuning Pre-trained Models, Triplet Loss
  • Data Analysis Tools: NumPy, Pandas

Background

In warehouse operations, products often lose their labels or identification during handling, leading to significant time spent on relabeling and tracking. A solution was needed to automate the identification process of returned or unlabeled materials to improve efficiency.

Methodology

The project involved creating a computer vision system capable of identifying objects based on image similarity. Key steps included:

  • Data Collection and Augmentation: Started with a small dataset of 558 images. Enhanced the dataset through data augmentation techniques such as rotation, flipping, brightness and contrast adjustments, zooming, and random distortion, achieving an 11-fold increase in data.
  • Synthetic Data Generation: Generated synthetic images to diversify the dataset, addressing challenges with limited and industrial-specific data.
  • Model Selection and Fine-tuning: Employed pre-trained models like ResNet-50, CLIP, and DINOv2. Fine-tuned these models using techniques like triplet loss to optimize the embedding space for similarity detection.
  • Feature Extraction: Modified model architectures to extract rich feature embeddings, enabling more accurate similarity comparisons.
  • Caption Generation: Used the GPT API to turn the long material descriptions in large SAP output files into short image captions.
  • Similarity Matching: Used cosine similarity between embeddings to compare each query image with a gallery of known product images, retrieving the top K matches.
  • Project Pipeline: Connected data collection, preprocessing, model training, and inference into a single end-to-end workflow for image similarity detection.

Main Project Pipeline
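The similarity-matching step can be illustrated with a short NumPy sketch. It assumes image embeddings have already been extracted by one of the models (512 dimensions is used here only as an example size). `top_k_matches` is a hypothetical helper, not the project's actual code, and random vectors stand in for real embeddings.

```python
import numpy as np


def top_k_matches(query_emb: np.ndarray, gallery_embs: np.ndarray, k: int = 5):
    """Return indices and cosine similarities of the k gallery items closest to the query.

    query_emb: shape (d,) embedding of the query image.
    gallery_embs: shape (n, d) embeddings of known product images.
    """
    # L2-normalise so that a dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    sims = g @ q  # shape (n,): cosine similarity of every gallery item to the query
    k = min(k, len(sims))
    # Sort by descending similarity and keep the first k indices.
    idx = np.argsort(-sims)[:k]
    return idx, sims[idx]


# Toy usage: random vectors stand in for model embeddings (hypothetical data).
rng = np.random.default_rng(0)
gallery = rng.normal(size=(100, 512))
query = gallery[42] + 0.05 * rng.normal(size=512)  # a noisy copy of gallery item 42
indices, scores = top_k_matches(query, gallery, k=5)
```

Normalising once and using a single matrix product keeps the search to one pass over the gallery; for much larger galleries, an approximate nearest-neighbour index would be the usual next step.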

Challenges and Solutions

Limited Dataset: The initial dataset was small and contained industrial images that were difficult to interpret. Addressed this by extensive data augmentation and synthetic data generation to increase the dataset's size and diversity.
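The write-up does not say which augmentation library was used, so the following NumPy sketch is only an illustration. It covers flips, 90-degree rotations, and brightness/contrast changes (zoom and random distortion are omitted), and it assumes "11-fold" means each original image plus ten augmented variants, which would turn 558 images into 6,138. The function names and parameter ranges are hypothetical.

```python
import numpy as np


def augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Apply one random combination of simple augmentations to an HxWxC float image in [0, 1]."""
    out = image
    if rng.random() < 0.5:
        out = out[:, ::-1]  # horizontal flip (reverse the width axis)
    out = np.rot90(out, k=int(rng.integers(0, 4)))  # rotate by 0, 90, 180 or 270 degrees
    contrast = rng.uniform(0.8, 1.2)  # assumed contrast range
    brightness = rng.uniform(-0.1, 0.1)  # assumed brightness shift range
    mean = out.mean()
    # Scale contrast around the image mean, then shift brightness.
    out = (out - mean) * contrast + mean + brightness
    return np.clip(out, 0.0, 1.0)


def expand_dataset(images, copies_per_image=10, seed=0):
    """Return each original image followed by `copies_per_image` augmented variants."""
    rng = np.random.default_rng(seed)
    expanded = []
    for img in images:
        expanded.append(img)
        expanded.extend(augment(img, rng) for _ in range(copies_per_image))
    return expanded
```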

Image Quality Issues: Dealt with poor lighting, obstructed views, and multiple items in one image. Improved data preprocessing and implemented advanced image processing techniques to enhance image quality.

Model Adaptation: Off-the-shelf pre-trained models did not perform well enough in the industrial context. Fine-tuned the models on warehouse data, adding domain-specific knowledge through triplet-loss training and captions derived from SAP material descriptions.
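To make the triplet-loss objective concrete, here is a NumPy sketch of the loss value for a batch of (anchor, positive, negative) embeddings. The margin of 0.2 and the use of Euclidean distance on L2-normalised embeddings are assumptions for illustration, not the project's settings. In actual PyTorch training, `torch.nn.TripletMarginLoss` applied to normalised embeddings computes essentially the same quantity, and autograd back-propagates it into the backbone.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Mean triplet margin loss over a batch of embeddings, each of shape (batch, d)."""
    a, p, n = l2_normalize(anchor), l2_normalize(positive), l2_normalize(negative)
    d_ap = np.linalg.norm(a - p, axis=1)  # anchor-positive distance
    d_an = np.linalg.norm(a - n, axis=1)  # anchor-negative distance
    # Penalise triplets where the negative is not at least `margin` farther than the positive.
    return float(np.mean(np.maximum(d_ap - d_an + margin, 0.0)))
```

Normalising first ties the training objective to the cosine similarity used at retrieval time, because for unit vectors the squared Euclidean distance equals 2 - 2 * cosine similarity.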

Results

The fine-tuned CLIP model with GPT captions achieved the best performance, reaching a Top-5 accuracy of 77.6% and a Top-10 accuracy of 82.4%. Key findings include:

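  • Fine-tuning alone did not guarantee gains: the fine-tuned ResNet and DINOv2 models scored below their base versions on Top-5 accuracy, ResNet also dropped on Top-10, and DINOv2's Top-10 accuracy was unchanged.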
  • Models that integrated textual captions (e.g., CLIP with GPT-generated captions) outperformed those that relied solely on image data.
  • Data quality and quantity were critical factors in achieving high accuracy.

Model                    Top-5 Accuracy    Top-10 Accuracy
ResNet Base              50.0%             54.0%
ResNet Fine-Tuned        47.0%             53.0%
DINOv2 Base              64.8%             66.4%
DINOv2 Fine-Tuned        49.8%             66.4%
CLIP Fine-Tuned (GPT)    77.6%             82.4%
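The write-up does not define exactly how Top-5 and Top-10 accuracy were measured. A common definition, assumed here, counts a query as correct when its true gallery item appears among the k most similar results. The sketch below implements that definition in NumPy; `top_k_accuracy` is a hypothetical helper. If the gallery holds several images per SAP code, one would compare retrieved codes rather than gallery indices.

```python
import numpy as np


def top_k_accuracy(similarity: np.ndarray, true_index: np.ndarray, k: int) -> float:
    """Fraction of queries whose correct gallery item is among the k most similar.

    similarity: shape (num_queries, gallery_size) cosine similarities.
    true_index: shape (num_queries,) index of the correct gallery item for each query.
    """
    top_k = np.argsort(-similarity, axis=1)[:, :k]  # k best gallery indices per query
    hits = (top_k == true_index[:, None]).any(axis=1)  # True where the correct item was retrieved
    return float(hits.mean())
```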

Main Page of MM SAP Code Finder

The main page of the MM SAP Code Finder application provides a user-friendly interface for searching and retrieving Material Management (MM) SAP codes. With this application, users can easily find the relevant codes for various materials, simplifying the process of managing and organizing inventory.

An Azure App Service was created to host the FastAPI application. Azure Pipelines handled the build steps, ensuring that dependencies such as torch, FastAPI, and clip were properly installed and configured. Environment variables and other configuration for the application (such as paths to model weights, API keys, and database connections) were set in the Azure portal's configuration settings.
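Azure App Service exposes the application settings configured in the portal to the app as environment variables, so the FastAPI application can read them at startup. The standard-library sketch below shows one way to do that; the variable names (`MODEL_WEIGHTS_PATH`, `OPENAI_API_KEY`, `DATABASE_URL`, `TOP_K`) and the `Settings`/`load_settings` names are hypothetical, not taken from the project.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED = ("MODEL_WEIGHTS_PATH", "OPENAI_API_KEY", "DATABASE_URL")  # hypothetical names


@dataclass(frozen=True)
class Settings:
    model_weights_path: str
    openai_api_key: str
    database_url: str
    top_k: int = 5


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables, failing fast if any required one is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if name not in env]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    return Settings(
        model_weights_path=env["MODEL_WEIGHTS_PATH"],
        openai_api_key=env["OPENAI_API_KEY"],
        database_url=env["DATABASE_URL"],
        top_k=int(env.get("TOP_K", "5")),  # optional, defaults to 5
    )
```

In the FastAPI app, `load_settings()` would typically be called once when the module loads, so a misconfigured deployment fails at startup rather than on the first request.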

Conclusion

This project showcased my expertise in computer vision and machine learning, applied to a real-world problem in warehouse management. The application identifies unlabeled products, reducing manual workload and improving operational efficiency. Future improvements could include integrating segmentation models to isolate individual items when several appear in one image, improving image preprocessing, and using stronger natural language processing to further boost accuracy.

Year

August 2023