ImageHash is a perceptual hashing library that lets us detect very similar images in the dataset regardless of their physical size and minor color-related differences; a sketch of the approach follows the processing log below.
| Class | Duplicate Count | Total Images | Proportion |
|---|---|---|---|
| full | 0 | 23086 | 0.0 |
| Total | 0 | 23086 | 0.0 |
Running on 16 workers. Total images: 23086
Processing images: 100%|██████████| 23086/23086 [05:18<00:00, 72.47it/s]
Total processing time: 320.58 seconds
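For reference, a minimal sketch of the duplicate check, assuming the imagehash and Pillow packages; the glob pattern and the default hash size are illustrative:

```python
from pathlib import Path

import imagehash
from PIL import Image

hashes = {}       # hash -> first path seen with that hash
duplicates = []   # (duplicate_path, original_path) pairs

for path in sorted(Path("../dataset/full").glob("*.jpg")):
    with Image.open(path) as img:
        # phash is robust to resizing and minor color differences
        h = imagehash.phash(img)
    if h in hashes:
        duplicates.append((path, hashes[h]))
    else:
        hashes[h] = path

print(f"Duplicates found: {len(duplicates)}")
```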
Image color summary
There seem to be no grayscale images and all images have 3 color channels.
| | image_path | variance | unique_colors | entropy | brisque_score | laplacian_variance | fft_blur_score | luminance | luminance_bin | skin_tone | age | gender | age_group | age_bin_raw | Images |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ../dataset/full/10_0_0_20170110220033115.jpg.c... | 1477.253495 | 6267 | 6.857390 | 33.980056 | 256.865812 | 2.312982 | 203.8070 | 3 | 12.5248 | 10 | 0 | 0-18 | 0-10 | 0 |
| 1 | ../dataset/full/10_0_0_20170110224406532.jpg.c... | 2452.172032 | 8298 | 7.718125 | 33.397515 | 244.865678 | 2.826604 | 141.7135 | 2 | 23.6788 | 10 | 0 | 0-18 | 0-10 | 0 |
| 2 | ../dataset/full/10_0_0_20170110220255346.jpg.c... | 2980.936287 | 8942 | 7.736862 | 44.824772 | 123.788397 | 2.063477 | 158.8874 | 3 | 25.4196 | 10 | 0 | 0-18 | 0-10 | 0 |
| 3 | ../dataset/full/10_0_0_20170110220251986.jpg.c... | 3365.068846 | 6339 | 7.209920 | 24.517992 | 657.658092 | 3.654595 | 130.6373 | 2 | 20.3080 | 10 | 0 | 0-18 | 0-10 | 0 |
| 4 | ../dataset/full/10_0_0_20170110220403810.jpg.c... | 4118.893420 | 8065 | 7.896404 | 52.822707 | 74.278110 | 2.606479 | 122.2249 | 2 | 21.0340 | 10 | 0 | 0-18 | 0-10 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 23081 | ../dataset/full/9_1_2_20170104020210475.jpg.ch... | 1676.861665 | 5791 | 7.260590 | 37.093527 | 176.641079 | 2.029646 | 138.5114 | 2 | 23.7608 | 9 | 1 | 0-18 | 0-10 | 0 |
| 23082 | ../dataset/full/9_1_2_20161219204347420.jpg.ch... | 1255.620365 | 7693 | 7.232986 | 42.996096 | 49.222689 | 1.349526 | 83.3686 | 1 | 23.2144 | 9 | 1 | 0-18 | 0-10 | 0 |
| 23083 | ../dataset/full/9_1_4_20170103200814791.jpg.ch... | 3325.250201 | 8696 | 7.875873 | 11.624793 | 914.503642 | 3.635523 | 145.6209 | 2 | 7.2624 | 9 | 1 | 0-18 | 0-10 | 0 |
| 23084 | ../dataset/full/9_1_3_20161219225144784.jpg.ch... | 1996.379638 | 6084 | 7.345491 | 55.754715 | 46.323105 | 1.265786 | 86.5876 | 1 | 42.3900 | 9 | 1 | 0-18 | 0-10 | 0 |
| 23085 | ../dataset/full/9_1_4_20170103213057382.jpg.ch... | 2170.575589 | 8720 | 7.753345 | 31.614764 | 439.525016 | 2.402710 | 157.0024 | 3 | 0.4552 | 9 | 1 | 0-18 | 0-10 | 0 |

23086 rows × 15 columns
Age and Gender Distribution
The distribution of ages in the dataset doesn't seem to be in line with general demographic trends in most countries:
- Newborns, young children, and working-age adults between 20 and 40 are disproportionately overrepresented.
- There are relatively few samples of teenagers and of people above 50-60.

This uneven distribution will likely impact the model's performance and generalization across age groups, so we'll need to pay attention to it and find ways to handle it if that turns out to be the case.
Gender Balance and Distribution
While the balance between male and female samples is relatively acceptable (roughly 52:48), we can see that their distributions across different age groups are quite different:
(Gender encoding: 1 = female.)
| | Male | Female | All |
|---|---|---|---|
| Count | 12069.00 | 11017.00 | 23086.00 |
| Prop. | 0.52 | 0.48 | 1.00 |
| Mean | 35.65 | 30.62 | 33.25 |
| Median | 34.00 | 26.00 | 29.00 |
| Mode | 26.00 | 26.00 | 26.00 |
| Std Dev | 19.72 | 19.69 | 19.86 |
| IQR | 25.00 | 16.00 | 22.00 |
| 5th Percentile | 1.00 | 2.00 | 2.00 |
| 25th Percentile | 25.00 | 21.00 | 23.00 |
| 75th Percentile | 50.00 | 37.00 | 45.00 |
| 95th Percentile | 70.00 | 72.00 | 71.00 |
| Minimum | 1.00 | 1.00 | 1.00 |
| Maximum | 110.00 | 116.00 | 116.00 |
| Skewness | 0.28 | 1.03 | 0.62 |
| Kurtosis | -0.19 | 1.32 | 0.32 |
On average, the males in the photographs seem to be significantly older, at least in the middle of the range (25th-50th percentiles). Above a certain age (~70), the proportion of females increases significantly. This, again, raises certain issues and is something we'll need to pay close attention to when evaluating our model.
| age_bin_raw | Male | Female | Total |
|---|---|---|---|
| 0-10 | 1509 | 1638 | 3147 |
| 10-20 | 672 | 952 | 1624 |
| 20-30 | 3223 | 4339 | 7562 |
| 30-40 | 2408 | 1828 | 4236 |
| 40-50 | 1417 | 640 | 2057 |
| 50-60 | 1500 | 650 | 2150 |
| 60-70 | 754 | 378 | 1132 |
| 70-80 | 406 | 247 | 653 |
| 80-90 | 168 | 274 | 442 |
| 90-inf | 12 | 71 | 83 |
Figure: Distribution of Age Groups by Gender (Fem = 1)
Image Analysis
We'll perform an in-depth analysis of some key characteristics, like:
- Luminance distribution
- Color variance and distribution
- Image entropy
- Image quality (using BRISQUE, FFT, and Laplacian variance)
We want to make sure that we have a comprehensive understanding of our dataset since that will impact our preprocessing (selection of transformation and augmentation techniques) and other decisions.
Additionally, we'll use a combination of these metrics to improve the robustness of our evaluation pipeline:
- Luminance and color information is used to assess the model's performance across different skin tone ranges.
- Image quality analysis will allow us to eliminate, or at least identify, invalid images (e.g. extremely blurry or badly cropped ones) and measure their impact on overall performance.
Color Variance and Entropy
Average variance of the color channels across all images:
- Variance = 0: all pixels in the image have the same color.
- High variance: the image contains a diverse spread of pixel colors.

Number of unique colors in each image.

Entropy (shannon_entropy):
- Scale: 0 to log2(N), where N is the number of possible pixel values (log2(256) = 8 for 8-bit grayscale).
- Min entropy = 0: perfectly uniform image (single color).
- High entropy: the image contains a wide variety of colors and patterns.
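For reference, a minimal sketch of how these three metrics can be computed per image, assuming OpenCV, NumPy, and scikit-image; the helper name color_stats is ours:

```python
import cv2
import numpy as np
from skimage.measure import shannon_entropy

def color_stats(path: str) -> dict:
    img = cv2.imread(path)  # BGR, uint8
    # Per-channel intensity variance, averaged over the three channels
    variance = float(np.mean([img[..., c].var() for c in range(3)]))
    # Number of distinct (B, G, R) triples present in the image
    unique_colors = int(np.unique(img.reshape(-1, 3), axis=0).shape[0])
    # Shannon entropy (base 2) of the grayscale image; at most 8 bits for uint8
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    entropy = float(shannon_entropy(gray))
    return {"variance": variance, "unique_colors": unique_colors, "entropy": entropy}
```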
Entropy statistics by gender:

| | Male | Female | All |
|---|---|---|---|
| Count | 12069.00 | 11017.00 | 23086.00 |
| Prop. | 0.52 | 0.48 | 1.00 |
| Mean | 7.52 | 7.59 | 7.55 |
| Median | 7.57 | 7.64 | 7.61 |
| Mode | 4.28 | 5.60 | 4.28 |
| Std Dev | 0.27 | 0.25 | 0.26 |
| IQR | 0.33 | 0.30 | 0.32 |
| 5th Percentile | 7.02 | 7.11 | 7.06 |
| 25th Percentile | 7.38 | 7.47 | 7.42 |
| 75th Percentile | 7.71 | 7.77 | 7.74 |
| 95th Percentile | 7.86 | 7.89 | 7.87 |
| Minimum | 4.28 | 5.60 | 4.28 |
| Maximum | 7.97 | 7.97 | 7.97 |
| Skewness | -1.43 | -1.37 | -1.40 |
| Kurtosis | 5.10 | 3.11 | 4.31 |
Color variance statistics by gender:

| | Male | Female | All |
|---|---|---|---|
| Count | 12069.00 | 11017.00 | 23086.00 |
| Prop. | 0.52 | 0.48 | 1.00 |
| Mean | 2548.27 | 3013.95 | 2770.50 |
| Median | 2338.08 | 2781.21 | 2540.88 |
| Mode | 201.43 | 293.90 | 201.43 |
| Std Dev | 1195.63 | 1397.19 | 1316.42 |
| IQR | 1525.39 | 1851.76 | 1701.91 |
| 5th Percentile | 1006.69 | 1139.91 | 1063.60 |
| 25th Percentile | 1672.27 | 1973.63 | 1803.40 |
| 75th Percentile | 3197.67 | 3825.39 | 3505.31 |
| 95th Percentile | 4815.17 | 5662.61 | 5273.66 |
| Minimum | 201.43 | 293.90 | 201.43 |
| Maximum | 9816.23 | 10944.40 | 10944.40 |
| Skewness | 1.05 | 0.85 | 0.98 |
| Kurtosis | 1.55 | 0.82 | 1.20 |
While male and female images have comparable overall color complexity or information content (entropy), the higher variance in female images indicates that the colors in those images are more spread out from the mean color.

E.g. a female image might contain a wide range of colors (high variance) in a balanced, evenly distributed manner (similar entropy to male images); for instance, a colorful floral dress with many different hues, well distributed throughout the image.

This raises a few questions that could influence our preprocessing pipeline and the model itself:
- The difference in color variance between male and female images could become a strong predictive feature for gender classification. However, the model might become overly reliant on color variance, potentially misclassifying males with high color variance or females with low color variance.
- While this effect won't be noticeable when testing on a sample of the same dataset (it's even likely to improve the model's measured performance there), it might mean the model performs worse in real-world conditions or on different datasets, because part of its decision-making would be based not on core facial attributes but on clothing, cosmetics, and other external factors (assuming our hypothesis is correct).

We'll try to handle this by including augmentation techniques that add color jitter to individual samples or even remove all color information from images, as sketched below (though we'd need a different dataset to fully verify the effect).
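A minimal sketch of such an augmentation stage, assuming a torchvision-based pipeline; the exact parameter values are illustrative:

```python
from torchvision import transforms

train_transforms = transforms.Compose([
    # Perturb brightness/contrast/saturation/hue so the model can't rely on
    # absolute color statistics such as per-image color variance
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3, hue=0.1),
    # Occasionally drop color information entirely
    transforms.RandomGrayscale(p=0.1),
    transforms.ToTensor(),
])
```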
We can see similar differences when comparing different age groups as well.
Skin Color Estimation
Additionally, we'll try to estimate the skin color of the subjects so that we can later measure whether it has an impact on the performance of our model.

We've attempted various heuristics (and combinations of them) for this, but we've found that using luminance directly provides the most predictable and reasonably useful results:
thresh: 60.112435, filtered_df: 231
thresh: 194.07173500000016, filtered_df: 231
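The thresholds above cut off the extremes of the luminance distribution. For reference, a minimal sketch of the per-image luminance estimate using the standard Rec. 601 weights; whether the notebook uses exactly these coefficients is an assumption:

```python
import cv2
import numpy as np

def mean_luminance(path: str) -> float:
    img = cv2.imread(path).astype(np.float64)  # BGR, 0-255
    b, g, r = img[..., 0], img[..., 1], img[..., 2]
    # Rec. 601 perceptual luminance, averaged over all pixels (assumed weights)
    return float(np.mean(0.299 * r + 0.587 * g + 0.114 * b))
```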
Measuring Image Quality
The quality and validity of the data we're using also has a significant effect (even if it's not necessarily easy to estimate when using the same dataset for evaluation).
While the UTK dataset is of relatively high quality, it still contains some invalid images (and probably some mislabeled ones, but we'll get to that later).
BRISQUE (Blind/Referenceless Image Spatial Quality Evaluator)
A no-reference image quality assessment method. It uses scene statistics of locally normalized luminance coefficients to quantify possible losses of "naturalness" in the image due to distortions, and it operates in the spatial domain.
Basically, it allows us to detect very blurry images:
thresh: 66.22156698932034, filtered_df: 35
While these images seem mostly valid (i.e. they contain human faces), we can see that BRISQUE would allow us to filter out images of very poor quality that would be too hard to classify. Depending on the production use case, it could also be possible to simply flag such images to the user instead of classifying them.
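The output above doesn't show which BRISQUE implementation the notebook uses; below is a minimal sketch assuming the brisque PyPI package (higher scores indicate stronger distortion):

```python
import numpy as np
from brisque import BRISQUE  # assumption: the `brisque` PyPI package
from PIL import Image

scorer = BRISQUE(url=False)

def brisque_score(path: str) -> float:
    img = np.asarray(Image.open(path).convert("RGB"))
    return scorer.score(img)

# Images scoring above a high-percentile threshold (as in the output above)
# can be dropped or flagged to the user instead of being classified.
```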
Examples of High BRISQUE Images
thresh: -3.152889781340932, filtered_df: 35
Laplacian Variance
A measure of image sharpness/blurriness. It uses the Laplacian operator to compute the second derivative of the image and measures the variance of the Laplacian-filtered result.
thresh: 17.4395013321, filtered_df: 35
Laplacian variance seems to correlate very highly with BRISQUE, basically allowing us to filter out a very similar set of images.
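For reference, a minimal sketch of the metric using OpenCV; the helper name is ours:

```python
import cv2

def laplacian_variance(path: str) -> float:
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    # Variance of the second derivative: sharp, detailed images produce a
    # wide Laplacian response, very blurry ones a narrow one
    return float(cv2.Laplacian(gray, cv2.CV_64F).var())
```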
FFT-based Blur Detection
thresh: 0.8144524239654852, filtered_df: 35
FFT seems to be somewhat too aggressive for our purposes; it assigns very low scores even to images with reasonably discernible faces.
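The exact FFT score isn't defined in the text; a common formulation, sketched here as an assumption, zeroes out the low-frequency center of the shifted spectrum and averages the remaining high-frequency log-magnitude (the 60-pixel center size is illustrative):

```python
import cv2
import numpy as np

def fft_blur_score(path: str, size: int = 60) -> float:
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    h, w = gray.shape
    cy, cx = h // 2, w // 2
    # Shift the spectrum so low frequencies sit in the center, then zero them
    fft = np.fft.fftshift(np.fft.fft2(gray))
    fft[cy - size:cy + size, cx - size:cx + size] = 0
    # Reconstruct and measure the mean log-magnitude of what remains;
    # blurry images have little high-frequency energy, hence low scores
    recon = np.fft.ifft2(np.fft.ifftshift(fft))
    magnitude = 20 * np.log(np.abs(recon) + 1e-8)
    return float(np.mean(magnitude))
```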
Feature Correlation
All three new metrics are strongly correlated with each other, confirming that they more or less measure the same thing (blurriness and the amount of detail).
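A minimal sketch of the check, assuming the summary dataframe is named image_entropy_summary as in the notebook's plotting cells:

```python
# Pairwise correlation between the three quality metrics; note the sign:
# BRISQUE grows with distortion while the other two grow with sharpness.
quality_cols = ["brisque_score", "laplacian_variance", "fft_blur_score"]
print(image_entropy_summary[quality_cols].corr())
```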
Color Channel Distribution by Class
These plots show the normalized intensity (0-255) distributions of each color channel by class. The Y axis shows the normalized frequency (density) relative to all color channels (based on the highest individual value for any channel).
The charts are made by generating a histogram for each image and normalizing it (normalization preserves the shape of the histogram, i.e. the relative distribution of pixel intensities is maintained); all histograms in the class are then averaged, as sketched below.
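A sketch of the averaging procedure described above, assuming OpenCV; the helper name and the (3, 256) layout are ours:

```python
import cv2
import numpy as np

def class_channel_histograms(paths: list) -> np.ndarray:
    """Average of per-image channel histograms, each image normalized by
    the highest bin value across any of its channels."""
    totals = np.zeros((3, 256))
    for path in paths:
        img = cv2.imread(path)
        hists = np.stack([
            cv2.calcHist([img], [c], None, [256], [0, 256]).ravel()
            for c in range(3)  # B, G, R channels
        ])
        totals += hists / hists.max()  # shape of each histogram is preserved
    return totals / len(paths)
```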