Exploring Vision — Automatic Face Detection and Cropping for Profile Pictures (Swift)
Practical Application of Vision
[2024/08/13 Update]
- Refer to the new article and API: “iOS Vision framework x WWDC 24 Discover Swift enhancements in the Vision framework Session”
Without further ado, here is a comparison image:
Before Optimization vs. After Optimization — Marry Me APP
With the recent iOS 12 update, I noticed the new CoreML machine learning framework and found it quite interesting. I began to think about how to incorporate it into our current products.
The article on trying out CoreML is now available: Automatically Predict Article Categories Using Machine Learning, Even Train the Model Yourself
CoreML lets an app integrate and run machine learning models for text and images. My first thought was to use it for face detection to fix the head-cropping problem shown on the left of the comparison image above: when a face sits near the edge of a photo, scaling and cropping can easily cut it off.
After some research online, I realized how little I knew — this capability has been available since iOS 11 through the “Vision” framework, which supports text detection, face detection, image comparison, QR code detection, object tracking, and more.
In this case, I used Vision’s face detection and produced the optimized result shown on the right of the image: find the face, then crop around it.
Let’s get started with the practical implementation:
First, let’s create a feature that can mark the position of faces and get familiar with how to use Vision.
Demo APP
As shown in the completed image above, it can mark the positions of faces in the photo.
P.S. It can only mark “faces,” not the entire head including hair 😅
This program mainly consists of two parts. The first part deals with the white space that appears when the original image is resized to fit the ImageView; in short, we want the ImageView’s size to match the image’s size. Assigning the image directly causes the misalignment shown below.
You might think of changing the contentMode to fill, fit, or redraw, but those can distort or crop the image.
let ratio = UIScreen.main.bounds.size.width
// My UIImageView is pinned to both edges with a 1:1 aspect ratio, so its width equals the screen width
let sourceImage = UIImage(named: "Demo2")?.kf.resize(to: CGSize(width: ratio, height: CGFloat.leastNonzeroMagnitude), for: .aspectFill)
// Use Kingfisher's resize, constrained by width, letting the height follow the aspect ratio
imageView.contentMode = .redraw
// Use redraw so the content fills the view
imageView.image = sourceImage
// Assign the resized image
imageViewConstraints.constant = (ratio - (sourceImage?.size.height ?? 0))
imageView.layoutIfNeeded()
imageView.sizeToFit()
// Adjust the imageView's constraint to match the image height; see the complete example at the end for details
That covers the image processing.
The resizing and cropping rely on Kingfisher, but you could just as well use another library or roll your own method, as in the sketch below.
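As a side note, here is a minimal, Kingfisher-free sketch of the same “resize to fill, then center-crop” idea using UIGraphicsImageRenderer; the extension method name and its center-crop behavior are my own assumptions for illustration, not part of the original project.
import UIKit

extension UIImage {
    // A minimal sketch (not from the original project): scale the image so it fills
    // targetSize, then center-crop whatever overflows.
    func aspectFillCropped(to targetSize: CGSize) -> UIImage {
        let scale = max(targetSize.width / size.width, targetSize.height / size.height)
        let scaledSize = CGSize(width: size.width * scale, height: size.height * scale)
        // Offset the drawing so the scaled image's center lands in the middle of the canvas
        let origin = CGPoint(x: (targetSize.width - scaledSize.width) / 2,
                             y: (targetSize.height - scaledSize.height) / 2)
        return UIGraphicsImageRenderer(size: targetSize).image { _ in
            draw(in: CGRect(origin: origin, size: scaledSize))
        }
    }
}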
Next, let’s focus on the code directly.
if #available(iOS 11.0, *) {
    // Vision is available from iOS 11 onward
    let completionHandle: VNRequestCompletionHandler = { request, error in
        if let faceObservations = request.results as? [VNFaceObservation] {
            // Faces were detected
            DispatchQueue.main.async {
                // UIView work must happen on the main thread
                let size = self.imageView.frame.size
                faceObservations.forEach({ (faceObservation) in
                    // Convert from Vision's normalized, bottom-left coordinates to the imageView's coordinates
                    let translate = CGAffineTransform.identity.scaledBy(x: size.width, y: size.height)
                    let transform = CGAffineTransform(scaleX: 1, y: -1).translatedBy(x: 0, y: -size.height)
                    let transRect = faceObservation.boundingBox.applying(translate).applying(transform)
                    // Overlay a translucent green marker on the detected face
                    let markerView = UIView(frame: transRect)
                    markerView.backgroundColor = UIColor(red: 0/255, green: 255/255, blue: 0/255, alpha: 0.3)
                    self.imageView.addSubview(markerView)
                })
            }
        } else {
            print("No faces detected")
        }
    }
    // Build the detection request; ciImage is a CIImage created from the source photo (see the complete example)
    let baseRequest = VNDetectFaceRectanglesRequest(completionHandler: completionHandle)
    let faceHandle = VNImageRequestHandler(ciImage: ciImage, options: [:])
    DispatchQueue.global().async {
        // Detection takes time, so run it on a background queue to avoid blocking the UI
        do {
            try faceHandle.perform([baseRequest])
        } catch {
            print("Throws: \(error)")
        }
    }
} else {
    print("Vision is not supported on this iOS version")
}
The main thing to watch is the coordinate conversion: Vision returns results in the image's normalized coordinate space with the origin at the bottom-left, so we have to scale and flip them into the ImageView's actual coordinates before we can use them.
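Incidentally, Vision also provides VNImageRectForNormalizedRect to handle the scaling step. Here is a small sketch that combines it with the y-axis flip; the helper name uikitRect(for:in:) is my own, not from the original project.
import UIKit
import Vision

// A minimal sketch: convert a Vision boundingBox (normalized, origin at the bottom-left)
// into a UIKit rect for a view or image of the given size.
func uikitRect(for boundingBox: CGRect, in size: CGSize) -> CGRect {
    // Scale the normalized rect up to the target size
    let scaled = VNImageRectForNormalizedRect(boundingBox, Int(size.width), Int(size.height))
    // Vision's origin is at the bottom-left, UIKit's at the top-left, so flip the y-axis
    return CGRect(x: scaled.origin.x,
                  y: size.height - scaled.origin.y - scaled.height,
                  width: scaled.width,
                  height: scaled.height)
}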
Next, let’s get to today’s highlight: cropping the profile picture at the correct position based on where the face is.
let ratio = UIScreen.main.bounds.size.width
// My UIImageView is pinned to both edges with a 1:1 aspect ratio, so its width equals the screen width; see the complete example at the end
let sourceImage = UIImage(named: "Demo")
imageView.contentMode = .scaleAspectFill
// Use scaleAspectFill so the view is filled
imageView.image = sourceImage
// Assign the original image first; we will replace it after processing
if let image = sourceImage, #available(iOS 11.0, *), let ciImage = CIImage(image: image) {
    let completionHandle: VNRequestCompletionHandler = { request, error in
        if request.results?.count == 1, let faceObservation = request.results?.first as? VNFaceObservation {
            // Exactly one face was detected
            let size = CGSize(width: ratio, height: ratio)
            // Convert from Vision's normalized, bottom-left coordinates to UIKit coordinates
            let translate = CGAffineTransform.identity.scaledBy(x: size.width, y: size.height)
            let transform = CGAffineTransform(scaleX: 1, y: -1).translatedBy(x: 0, y: -size.height)
            let finalRect = faceObservation.boundingBox.applying(translate).applying(transform)
            // Offset of the face's center from the image's center, used as the crop anchor
            let center = CGPoint(x: (finalRect.origin.x + finalRect.width/2 - size.width/2), y: (finalRect.origin.y + finalRect.height/2 - size.height/2))
            // Crop the image around that center point
            let newImage = image.kf.resize(to: size, for: .aspectFill).kf.crop(to: size, anchorOn: center)
            DispatchQueue.main.async {
                // UIView work must happen on the main thread
                self.imageView.image = newImage
            }
        } else {
            print("Multiple faces or no face detected")
        }
    }
    // Build the detection request and run it on a background queue to avoid blocking the UI
    let baseRequest = VNDetectFaceRectanglesRequest(completionHandler: completionHandle)
    let faceHandle = VNImageRequestHandler(ciImage: ciImage, options: [:])
    DispatchQueue.global().async {
        do {
            try faceHandle.perform([baseRequest])
        } catch {
            print("Throws: \(error)")
        }
    }
} else {
    print("Vision is not supported on this iOS version")
}
The logic is the same as marking the face position. The difference is that the avatar has a fixed size (e.g. 300x300), so we can skip the first step of making the image fit the ImageView.
The other difference is that we compute the center point of the face area and use it as the anchor for cropping the image, as in the worked example below.
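To make the anchor math concrete, here is a hypothetical worked example with made-up numbers; the 300x300 size and the face rect are assumptions for illustration, not values from the original project.
import UIKit

// Hypothetical numbers for illustration only
let cropSize = CGSize(width: 300, height: 300)              // image already resized to 300x300
let faceRect = CGRect(x: 150, y: 60, width: 90, height: 90) // face rect after coordinate conversion

// Offset of the face's center from the image's center
let anchor = CGPoint(x: faceRect.midX - cropSize.width / 2,   // 195 - 150 = 45
                     y: faceRect.midY - cropSize.height / 2)  // 105 - 150 = -45

// The face sits 45 pt to the right of and 45 pt above the image's center,
// so the crop window shifts accordingly when the anchor is passed to kf.crop(to:anchorOn:).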
The red dot is the center point of the face area.
Final effect image:
The split second before the transition shows the original image position.
Complete app example:
The code has been uploaded to GitHub: Click here
For any questions or suggestions, feel free to contact me.
===
This article was first published in Traditional Chinese on Medium ➡️ View Here