Data Engineering: The MRL Eye Dataset
In AI, a model is only as good as the data it is trained on. For this project, we used the MRL Eye Dataset, a public benchmark collection for eye-state classification.
📦 The Dataset Concept
Unlike object detection (where the model must draw a bounding box around each eye), this project uses Image Classification. The model doesn’t need to know where the eye is in the image; it only needs to know what state the eye is in: open or closed.
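One way to see the difference: a classifier’s entire output is a single score per class for the whole image, with no box coordinates at all. The sketch below shows a softmax head turning two raw scores into open/closed probabilities; the logit values are made up for illustration.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities (the final head of a classifier)."""
    # Subtract the max for numerical stability before exponentiating.
    exps = [math.exp(x - max(logits)) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for [open, closed] from a binary eye-state classifier:
probs = softmax([2.0, -1.0])
print(probs)  # roughly [0.95, 0.05] -> the model says "open"
```

A detector would additionally have to predict four coordinates per eye; dropping that requirement is what makes the folder-based setup below possible.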
Data Flow Diagram
```mermaid
graph TD
    Raw[MRL Raw Images] --> Split[Dataset Splitting]
    Split --> Train[Train Set: 70%]
    Split --> Val[Validation Set: 20%]
    Split --> Test[Test Set: 10%]
    Train --> L1[Folder: open_eyes]
    Train --> L2[Folder: closed_eyes]
    L1 --> Model[YOLOv11 Model]
    L2 --> Model
    Model --> Weights[Trained .pt Weights]
```
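The 70/20/10 split in the diagram can be sketched in a few lines: shuffle the raw image filenames with a fixed seed (so the split is reproducible), then slice by ratio. The filenames and ratios below are illustrative, not part of the MRL distribution itself.

```python
import random

def split_dataset(filenames, train=0.70, val=0.20, seed=42):
    """Shuffle and split a list of image filenames into train/val/test.

    The remaining fraction (here 10%) becomes the test set.
    """
    files = list(filenames)
    random.Random(seed).shuffle(files)  # fixed seed -> reproducible split
    n_train = int(len(files) * train)
    n_val = int(len(files) * val)
    return (files[:n_train],
            files[n_train:n_train + n_val],
            files[n_train + n_val:])

# Example with 100 placeholder filenames:
train_set, val_set, test_set = split_dataset([f"eye_{i}.png" for i in range(100)])
print(len(train_set), len(val_set), len(test_set))  # 70 20 10
```

In practice each of the three lists would then be copied into the `train/`, `valid/`, and `test/` folders described below.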
📁 Folder-Based Labeling (The “Zero-Annotation” Method)
We used a technique called Folder-Based Labeling. Instead of creating a separate .txt or .xml file for every image, we simply put the images into folders named after their class.
The Structure
```
dataset/
├── train/
│   ├── open_eyes/     # [Image 1, Image 2, ...] → Label: 0
│   └── closed_eyes/   # [Image 1, Image 2, ...] → Label: 1
├── valid/
│   ├── open_eyes/
│   └── closed_eyes/
└── test/
    ├── open_eyes/
    └── closed_eyes/
```
How the AI reads this:
When the training script points to the dataset/ folder, it automatically scans the subdirectories. It says: “Everything in the ‘open_eyes’ folder is Class 0, and everything in the ‘closed_eyes’ folder is Class 1.”
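A minimal sketch of that scan, using only the standard library: walk each class subfolder and pair every image with the label implied by its parent folder’s name. Note one caveat: the explicit 0/1 mapping here follows the tree above, but many real loaders (e.g. torchvision’s `ImageFolder`) assign indices by sorting folder names alphabetically, which would make `closed_eyes` class 0.

```python
import tempfile
from pathlib import Path

# Explicit mapping matching the folder tree above. Many frameworks instead
# derive indices by sorting folder names alphabetically, so always check
# which convention your loader uses.
CLASS_TO_INDEX = {"open_eyes": 0, "closed_eyes": 1}

def scan_split(split_dir):
    """Return (image_path, label) pairs inferred from parent folder names."""
    samples = []
    for class_name, label in CLASS_TO_INDEX.items():
        for img in sorted((Path(split_dir) / class_name).glob("*.png")):
            samples.append((img, label))
    return samples

# Demo on a throwaway directory structure with one empty file per class:
with tempfile.TemporaryDirectory() as root:
    for cls in CLASS_TO_INDEX:
        d = Path(root) / "train" / cls
        d.mkdir(parents=True)
        (d / "img_0001.png").touch()
    pairs = scan_split(Path(root) / "train")
    print([(p.parent.name, label) for p, label in pairs])
    # [('open_eyes', 0), ('closed_eyes', 1)]
```

This is the entire "annotation" step: no `.txt` or `.xml` files, just a directory layout.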
🛠️ Data Quality Challenges
To ensure the model is robust, the dataset must account for:
- Lighting Variations: Images taken in bright light vs. dim light.
- Demographics: Different eye shapes, colors, and ethnicities.
- Occlusions: Glasses or eyelashes partially covering the eye.
By training on the MRL dataset, the model is exposed to thousands of these variations, which helps prevent it from becoming biased toward one specific type of eye.
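When the dataset still under-represents a condition such as lighting, it is common to simulate it with augmentation. The sketch below scales grayscale pixel values by a random factor to mimic bright and dim scenes; the `[0.6, 1.4]` range is an illustrative assumption, not an MRL parameter.

```python
import random

def jitter_brightness(pixels, low=0.6, high=1.4, seed=None):
    """Scale grayscale pixel values (0-255) by a random factor to mimic
    bright vs. dim capture conditions, clamping back into valid range."""
    factor = random.Random(seed).uniform(low, high)
    return [min(255, max(0, int(p * factor))) for p in pixels]

original = [0, 64, 128, 200, 255]
# Pin low == high to force a deterministic 0.5x (dim) factor for the demo:
dim = jitter_brightness(original, low=0.5, high=0.5)
print(dim)  # [0, 32, 64, 100, 127]
```

The same pattern extends to the other challenges listed above, e.g. randomly occluding a corner of the crop to approximate glasses frames.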
Last Updated: 2026-05-03