Dataset processing and analysis

Importing and verifying the dataset

ImageHash is a perceptual hashing library that allows us to detect whether there are any very similar images in the dataset, regardless of their dimensions and minor color-related differences.
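Under the hood, ImageHash reduces each image to a tiny fingerprint and compares fingerprints by Hamming distance, so near-duplicates score low no matter their resolution. A minimal numpy-only sketch of the simplest variant, average hashing (`average_hash` and `hamming` are illustrative helpers, not part of the ImageHash API):

```python
import numpy as np

def average_hash(gray, hash_size=8):
    # Downscale by block-averaging (a crude stand-in for a proper resize),
    # then threshold each block against the mean to get a 64-bit fingerprint.
    h, w = gray.shape
    gray = gray[:h - h % hash_size, :w - w % hash_size]
    hh, ww = gray.shape
    blocks = gray.reshape(hash_size, hh // hash_size,
                          hash_size, ww // hash_size).mean(axis=(1, 3))
    return (blocks > blocks.mean()).ravel()

def hamming(a, b):
    # Number of differing bits; near-duplicate images score close to 0.
    return int(np.count_nonzero(a != b))

rng = np.random.default_rng(0)
img = rng.integers(0, 256, (200, 200)).astype(float)
noisy = img + rng.normal(0, 2, img.shape)  # a slightly perturbed copy
print(hamming(average_hash(img), average_hash(noisy)))  # small distance
```

ImageHash's `phash` uses a DCT instead of plain block averages, which makes it more robust to brightness and compression changes, but the comparison step is the same.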

Out[6]:
   Class  Duplicate Count  Total Images  Proportion
0   full                0         23086         0.0
1  Total                0         23086         0.0
Running on 16 workers
Total images: 23086
Processing images: 100%|██████████| 23086/23086 [05:18<00:00, 72.47it/s]
Total processing time: 320.58 seconds

Image color summary

There seem to be no grayscale images and all images have 3 color channels.
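The check behind this is simple: an image saved with 3 channels can still be effectively grayscale if the channels are identical, so comparing the channels directly catches both cases (a sketch; `is_grayscale` is a hypothetical helper, not the notebook's actual code):

```python
import numpy as np

def is_grayscale(img, tol=0):
    # A 2-D array is grayscale by definition; a 3-channel image is
    # effectively grayscale when all three channels agree (within tol).
    if img.ndim == 2:
        return True
    r, g, b = (img[..., i].astype(int) for i in range(3))
    return bool(np.abs(r - g).max() <= tol and np.abs(g - b).max() <= tol)

rgb = np.dstack([np.full((4, 4), v) for v in (10, 20, 30)])
gray_as_rgb = np.dstack([np.full((4, 4), 7)] * 3)
print(is_grayscale(rgb), is_grayscale(gray_as_rgb))  # False True
```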

Out[17]:
image_path variance unique_colors entropy brisque_score laplacian_variance fft_blur_score luminance luminance_bin skin_tone age gender age_group age_bin_raw Images
0 ../dataset/full/10_0_0_20170110220033115.jpg.c... 1477.253495 6267 6.857390 33.980056 256.865812 2.312982 203.8070 3 12.5248 10 0 0-18 0-10 0
1 ../dataset/full/10_0_0_20170110224406532.jpg.c... 2452.172032 8298 7.718125 33.397515 244.865678 2.826604 141.7135 2 23.6788 10 0 0-18 0-10 0
2 ../dataset/full/10_0_0_20170110220255346.jpg.c... 2980.936287 8942 7.736862 44.824772 123.788397 2.063477 158.8874 3 25.4196 10 0 0-18 0-10 0
3 ../dataset/full/10_0_0_20170110220251986.jpg.c... 3365.068846 6339 7.209920 24.517992 657.658092 3.654595 130.6373 2 20.3080 10 0 0-18 0-10 0
4 ../dataset/full/10_0_0_20170110220403810.jpg.c... 4118.893420 8065 7.896404 52.822707 74.278110 2.606479 122.2249 2 21.0340 10 0 0-18 0-10 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
23081 ../dataset/full/9_1_2_20170104020210475.jpg.ch... 1676.861665 5791 7.260590 37.093527 176.641079 2.029646 138.5114 2 23.7608 9 1 0-18 0-10 0
23082 ../dataset/full/9_1_2_20161219204347420.jpg.ch... 1255.620365 7693 7.232986 42.996096 49.222689 1.349526 83.3686 1 23.2144 9 1 0-18 0-10 0
23083 ../dataset/full/9_1_4_20170103200814791.jpg.ch... 3325.250201 8696 7.875873 11.624793 914.503642 3.635523 145.6209 2 7.2624 9 1 0-18 0-10 0
23084 ../dataset/full/9_1_3_20161219225144784.jpg.ch... 1996.379638 6084 7.345491 55.754715 46.323105 1.265786 86.5876 1 42.3900 9 1 0-18 0-10 0
23085 ../dataset/full/9_1_4_20170103213057382.jpg.ch... 2170.575589 8720 7.753345 31.614764 439.525016 2.402710 157.0024 3 0.4552 9 1 0-18 0-10 0

23086 rows × 15 columns

Age and Gender Distribution

  sns.kdeplot(data=image_entropy_summary, x='age', fill=True)

[figure: KDE of the age distribution]

The distribution of ages in the dataset doesn't seem to be in line with general demographic trends in most countries:

  • Newborns, young children, and working-age adults between 20 and 40 are disproportionately overrepresented.
  • There are relatively few samples of teenagers and of people above 50-60.
[figures: age and gender distribution histograms]

The uneven distribution will likely impact the model's performance and generalization across age groups, so we'll need to pay attention to it and, if necessary, find ways to compensate.
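If it does turn out to be a problem, a common mitigation is to oversample rare age groups during training via inverse-frequency sample weights. A toy sketch (the bin edges and labels here are made up for illustration):

```python
import numpy as np

ages = np.array([1, 1, 25, 25, 25, 25, 70])   # toy age labels
bins = np.digitize(ages, [18, 50])            # 3 coarse age groups
counts = np.bincount(bins, minlength=3)       # samples per group
weights = 1.0 / counts[bins]                  # inverse-frequency weight per sample
weights /= weights.sum()                      # normalize into sampling probabilities
print(np.round(weights, 3))                   # rare groups get larger weights
```

These probabilities can then be fed to e.g. a weighted random sampler so that each batch sees a more balanced age mix.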

Gender Balance and Distribution

While the balance between male and female samples is relatively acceptable (52:48), we can see that their distributions across age groups are quite different:

Out[20]:
[figure: age distributions by gender]

(1 = Female)

Out[22]:
Male Female All
Count 12069.00 11017.00 23086.00
Prop. 0.52 0.48 1.00
Mean 35.65 30.62 33.25
Median 34.00 26.00 29.00
Mode 26.00 26.00 26.00
Std Dev 19.72 19.69 19.86
IQR 25.00 16.00 22.00
5th Percentile 1.00 2.00 2.00
25th Percentile 25.00 21.00 23.00
75th Percentile 50.00 37.00 45.00
95th Percentile 70.00 72.00 71.00
Minimum 1.00 1.00 1.00
Maximum 110.00 116.00 116.00
Skewness 0.28 1.03 0.62
Kurtosis -0.19 1.32 0.32

On average, males in the photographs seem to be significantly older, at least through the middle of the range (25th-50th percentiles). Above a certain age (~70), the proportion of females increases significantly. This, again, raises potential issues and is something we'll need to pay close attention to when evaluating our model.

Out[32]:
Male Female Total
age_bin_raw
0-10 1509 1638 3147
10-20 672 952 1624
20-30 3223 4339 7562
30-40 2408 1828 4236
40-50 1417 640 2057
50-60 1500 650 2150
60-70 754 378 1132
70-80 406 247 653
80-90 168 274 442
90-inf 12 71 83
[figure: age bin counts by gender]
[figure: Distribution of Age Groups by Gender (Fem = 1)]

Image Analysis

We'll perform an in-depth analysis of some key characteristics, like:

  • Luminance distribution
  • Color variance and distribution
  • Image entropy
  • Image quality (using BRISQUE, FFT, Laplacian variance)

We want to make sure that we have a comprehensive understanding of our dataset since that will impact our preprocessing (selection of transformation and augmentation techniques) and other decisions.

Additionally, we'll use a combination of these metrics to improve the robustness of our evaluation pipeline:

  • Luminance and color information is used to assess the model's performance over different skin tone ranges.
  • Image quality analysis will allow us to eliminate, or at least identify, invalid images (i.e. extremely blurry or badly cropped ones) and measure their impact on overall performance.

Color Variance and Entropy

  • Average variance of the color channels across all images:

    • Variance = 0: All pixels in the image have the same color.
    • High Variance: Indicates images with diverse color pixels.
  • Number of unique colors in each image

  • Entropy (shannon_entropy).

    • Scale: 0 to log2(N) bits, where N is the number of possible pixel values (log2(256) = 8 for 8-bit grayscale).
      • Min Entropy = 0: Perfectly uniform image (single color).
      • High Entropy: Indicates images with a wide variety of colors and patterns.
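For a single image, the three metrics above can be sketched as follows (assuming an H x W x 3 uint8 array; `color_stats` is an illustrative helper, not the notebook's actual code):

```python
import numpy as np

def color_stats(img):
    # img: H x W x 3 uint8. Returns (mean channel variance,
    # number of unique colors, Shannon entropy of the gray histogram in bits).
    variance = float(img.reshape(-1, 3).var(axis=0).mean())
    unique_colors = len(np.unique(img.reshape(-1, 3), axis=0))
    gray = img.mean(axis=-1).astype(np.uint8)
    p = np.bincount(gray.ravel(), minlength=256) / gray.size
    p = p[p > 0]                                   # ignore empty bins
    entropy = float(-(p * np.log2(p)).sum())
    return variance, unique_colors, entropy

flat = np.zeros((8, 8, 3), dtype=np.uint8)   # single color: zero variance/entropy
print(color_stats(flat))
```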
Out[45]: (entropy summary by gender)
Male Female All
Count 12069.00 11017.00 23086.00
Prop. 0.52 0.48 1.00
Mean 7.52 7.59 7.55
Median 7.57 7.64 7.61
Mode 4.28 5.60 4.28
Std Dev 0.27 0.25 0.26
IQR 0.33 0.30 0.32
5th Percentile 7.02 7.11 7.06
25th Percentile 7.38 7.47 7.42
75th Percentile 7.71 7.77 7.74
95th Percentile 7.86 7.89 7.87
Minimum 4.28 5.60 4.28
Maximum 7.97 7.97 7.97
Skewness -1.43 -1.37 -1.40
Kurtosis 5.10 3.11 4.31
Out[44]: (color variance summary by gender)
Male Female All
Count 12069.00 11017.00 23086.00
Prop. 0.52 0.48 1.00
Mean 2548.27 3013.95 2770.50
Median 2338.08 2781.21 2540.88
Mode 201.43 293.90 201.43
Std Dev 1195.63 1397.19 1316.42
IQR 1525.39 1851.76 1701.91
5th Percentile 1006.69 1139.91 1063.60
25th Percentile 1672.27 1973.63 1803.40
75th Percentile 3197.67 3825.39 3505.31
95th Percentile 4815.17 5662.61 5273.66
Minimum 201.43 293.90 201.43
Maximum 9816.23 10944.40 10944.40
Skewness 1.05 0.85 0.98
Kurtosis 1.55 0.82 1.20

While male and female images have comparable overall color complexity or information content (entropy), the higher variance in female images indicates that their colors are more spread out from the mean color.

e.g. a female image might contain a wide range of colors (high variance) in a balanced, evenly distributed manner (similar entropy to male images). For instance, a colorful floral dress with many different hues, well distributed throughout the image.

This raises a few questions that could influence our preprocessing pipeline and the model itself:

  • The difference in color variance between male and female images could become a strong predictive feature for gender classification. However, the model might become overly reliant on color variance, potentially misclassifying males with high color variance or females with low color variance.
  • While this effect won't be noticeable when testing on a sample of the same dataset (it's even likely to improve the model's measured performance), the model might perform worse in real-world conditions or on different datasets, because part of its decision-making would be based not on core facial attributes but on clothing, cosmetics, and other external factors (assuming our hypothesis is correct).

We'll try to handle this by including augmentation techniques that add color jitter to individual samples, or even remove all color information from the images (however, we'd need a different dataset to fully verify this).
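Both augmentations can be sketched in plain numpy (`color_jitter` and `to_grayscale` are illustrative helpers; a real pipeline would more likely use torchvision's `ColorJitter` and `Grayscale` transforms):

```python
import numpy as np

def color_jitter(img, rng, strength=0.2):
    # Rescale each channel by a random factor: perturbs the color
    # statistics while leaving spatial structure intact.
    factors = 1 + rng.uniform(-strength, strength, size=3)
    return np.clip(img.astype(float) * factors, 0, 255)

def to_grayscale(img):
    # Collapse to luma and replicate across channels: removes all
    # color information the model might otherwise latch onto.
    luma = img.astype(float) @ np.array([0.299, 0.587, 0.114])
    return np.repeat(luma[..., None], 3, axis=-1)

img = np.random.default_rng(0).integers(0, 256, (8, 8, 3), dtype=np.uint8)
gray = to_grayscale(img)
print(np.allclose(gray[..., 0], gray[..., 1]))  # True: channels identical
```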

[figure: color variance distributions]
[figure: color variance distributions by age group]

We can see similar differences when comparing different age groups as well.

Skin Color Estimation

Additionally, we'll try to determine the skin color of the subjects so that we could later measure whether that has an impact on the performance of our model.

We've attempted various heuristics (and combinations of them) for this, but we've found that using luminance directly provides the most predictable and reasonably useful results:

thresh: 60.112435, filtered_df: 231
[figure: sample images near the low-luminance threshold]
thresh: 194.07173500000016, filtered_df: 231
[figure: sample images near the high-luminance threshold]
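The heuristic itself is just a weighted channel sum (sketched here with the standard Rec. 601 luma weights; `mean_luminance` is an illustrative helper, not necessarily the notebook's exact code):

```python
import numpy as np

def mean_luminance(rgb):
    # Rec. 601 luma: a weighted sum of the channels that approximates
    # perceived brightness, used here as a crude skin-tone proxy.
    weights = np.array([0.299, 0.587, 0.114])
    return float((rgb[..., :3].astype(float) * weights).sum(axis=-1).mean())

dark = np.full((2, 2, 3), 40.0)
light = np.full((2, 2, 3), 220.0)
print(mean_luminance(dark) < mean_luminance(light))  # True
```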

Measuring Image Quality

The quality and validity of the data we're using also has a significant effect (even if it's not necessarily easy to estimate when using the same dataset for evaluation).

While the UTK dataset is of relatively high quality, it still contains some invalid images (and some probably mislabeled ones, but we'll get to that later).

BRISQUE (Blind/Referenceless Image Spatial Quality Evaluator)

A no-reference image quality assessment method. Uses scene statistics of locally normalized luminance coefficients to quantify possible losses of "naturalness" in the image due to distortions. Operates in the spatial domain.

Basically, it allows us to detect very blurry images:

thresh: 66.22156698932034, filtered_df: 35
[figure: example images above the BRISQUE threshold]

While these images seem mostly valid (i.e. they contain human faces), we can see that BRISQUE would allow us to filter out images of very poor quality that would be too hard to classify. Depending on the production use case, it could also be used simply to flag for the user which images are worth classifying.
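The "locally normalized luminance coefficients" mentioned above (MSCN coefficients) can be sketched as follows, assuming scipy is available; this is only the first stage of BRISQUE, which then fits generalized Gaussian statistics to these coefficients:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def mscn(gray, sigma=7 / 6, c=1.0):
    # Mean-subtracted, contrast-normalized coefficients: the locally
    # normalized luminance whose scene statistics BRISQUE models.
    gray = gray.astype(float)
    mu = gaussian_filter(gray, sigma)                    # local mean
    var = gaussian_filter(gray * gray, sigma) - mu * mu  # local variance
    sd = np.sqrt(np.maximum(var, 0))
    return (gray - mu) / (sd + c)

flat = np.full((16, 16), 128.0)
print(float(np.abs(mscn(flat)).max()))  # effectively 0: no local structure
```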

Examples of High BRISQUE Images

thresh: -3.152889781340932, filtered_df: 35
[figure: example images at this BRISQUE threshold]

Laplacian Variance

A measure of image sharpness/blurriness. It applies the Laplacian operator to compute the second derivative of the image and takes the variance of the filtered result.

thresh: 17.4395013321, filtered_df: 35
[figure: example images below the Laplacian variance threshold]

Laplacian variance seems to correlate very highly with BRISQUE, essentially allowing us to filter out a very similar set of images.
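The metric is simple enough to sketch directly (illustrative; in practice this is typically computed as `cv2.Laplacian(img, cv2.CV_64F).var()`):

```python
import numpy as np

def laplacian_variance(gray):
    # Variance of the discrete Laplacian (second derivative);
    # blurry images have weak edges and therefore low variance.
    g = gray.astype(float)
    lap = (-4 * g[1:-1, 1:-1]
           + g[:-2, 1:-1] + g[2:, 1:-1]
           + g[1:-1, :-2] + g[1:-1, 2:])
    return float(lap.var())

edges = (np.indices((32, 32)).sum(axis=0) % 2) * 255.0  # checkerboard: all edges
flat = np.full((32, 32), 128.0)                         # flat: no edges
print(laplacian_variance(edges) > laplacian_variance(flat))  # True
```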

FFT-based Blur Detection

thresh: 0.8144524239654852, filtered_df: 35
[figure: example images below the FFT blur score threshold]

FFT seems to be somewhat too aggressive for our purposes: it assigns very low scores even to images with reasonably discernible faces.
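A common form of this detector can be sketched as follows: zero out a low-frequency window around the spectrum's center, then average the remaining log magnitude (`fft_blur_score` is a hypothetical helper; the exact scoring used above may differ):

```python
import numpy as np

def fft_blur_score(gray, size=8):
    # Shift the 2-D spectrum so low frequencies sit at the center,
    # suppress them, and average the remaining (high-frequency) energy.
    f = np.fft.fftshift(np.fft.fft2(gray.astype(float)))
    h, w = gray.shape
    cy, cx = h // 2, w // 2
    f[cy - size:cy + size, cx - size:cx + size] = 0  # drop low frequencies
    return float(np.log1p(np.abs(f)).mean())

sharp = (np.indices((64, 64)).sum(axis=0) % 2) * 255.0  # high-frequency pattern
blurry = np.full((64, 64), 128.0)                       # no detail at all
print(fft_blur_score(sharp) > fft_blur_score(blurry))  # True
```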

Feature Correlation

[figure: correlation matrix of image metrics]

All three new metrics are strongly correlated with each other, confirming that they more or less measure the same thing (blurriness and amount of detail).

Color Channel Distribution by Class

These plots show the normalized intensity (0-255) distributions of each color channel by class. The Y axis shows the normalized frequency (density) relative to all color channels (scaled by the highest individual value of any channel).

The charts are made by generating a histogram for each image and normalizing it (normalization maintains the shape of the histogram, so the relative distribution of pixel intensities is preserved). All histograms in the class are then averaged.
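That procedure can be sketched as follows (illustrative; here each channel histogram is normalized to a density rather than by the single highest channel value described above, which changes only the vertical scale, not the shape):

```python
import numpy as np

def class_channel_histograms(images):
    # Per-image, per-channel 256-bin histograms normalized to densities,
    # then averaged over all images in the class. Returns a 3 x 256 array.
    hists = []
    for img in images:
        per_channel = np.array(
            [np.bincount(img[..., c].ravel(), minlength=256) for c in range(3)],
            dtype=float,
        )
        per_channel /= per_channel.sum(axis=1, keepdims=True)  # density per channel
        hists.append(per_channel)
    return np.mean(hists, axis=0)

imgs = [np.random.default_rng(i).integers(0, 256, (16, 16, 3), dtype=np.uint8)
        for i in range(3)]
avg = class_channel_histograms(imgs)
print(avg.shape)  # (3, 256)
```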

[figure: averaged color channel histograms by class]