Evaluating Face Mesh for the Face Search Problem

This article evaluates how well Face Mesh works for face search. The comparison focuses on accuracy and consistency under different conditions. We also look at the practical challenges of using Face Mesh, such as distracting faces in the same scene or partial occlusion of the face.

Theoretical background

Introduction to Face Mesh

Face Mesh is one of the solutions provided by MediaPipe, a library developed by Google. It locates and analyzes faces in images or video and returns 468 detailed 3D landmarks for each face.

Comparing two Face Meshes

The Hausdorff distance is a common measure of similarity between two point sets in space. For Face Mesh, it is used to compare two sets of facial landmarks. It measures how well the two sets match by finding, in each set, the point that lies farthest from the other set, where the distance from a point to a set is the distance to the nearest point of that set. The larger of the two values is the Hausdorff distance.

A large Hausdorff distance indicates that the two point sets differ significantly, while a small distance indicates high similarity.
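
To make the definition concrete, the sketch below computes the distance by brute force in plain Python, without any dependencies. The function names `directed_hausdorff_bruteforce` and `hausdorff_bruteforce` and the toy point sets are illustrative assumptions, not part of the original code. It is meant to show the definition, not to be efficient.

```python
import math

def directed_hausdorff_bruteforce(points_a, points_b):
    """Largest distance from a point in points_a to its nearest point in points_b."""
    return max(
        min(math.sqrt(sum((a - b) ** 2 for a, b in zip(pa, pb))) for pb in points_b)
        for pa in points_a
    )

def hausdorff_bruteforce(points_a, points_b):
    """Symmetric Hausdorff distance: the larger of the two directed distances."""
    return max(
        directed_hausdorff_bruteforce(points_a, points_b),
        directed_hausdorff_bruteforce(points_b, points_a),
    )

# Hypothetical toy data: set_b is set_a plus one outlier
set_a = [(0.0, 0.0), (1.0, 0.0)]
set_b = [(0.0, 0.0), (1.0, 0.0), (1.0, 3.0)]

print(directed_hausdorff_bruteforce(set_a, set_b))  # 0.0: every point of set_a also lies in set_b
print(hausdorff_bruteforce(set_a, set_b))           # 3.0: the outlier (1, 3) is 3 away from set_a
```

The directed distance is not symmetric, which is why the two directions are combined with `max`.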

Advantages and disadvantages of the method

Advantages:

Quantifies how much two Face Meshes differ.
Does not require an explicit point-to-point mapping between the two sets.

Disadvantages:

Requires the Face Meshes to be normalized beforehand so that position and scale are consistent.
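
As a rough illustration of why this matters, the sketch below normalizes position and scale for a generic point set. The function `normalize_position_and_scale`, the choice of the centroid as the center, and the RMS radius as the scale are assumptions made for this example. The article's own code centers on the nose landmark instead and does not rescale.

```python
import numpy as np

def normalize_position_and_scale(points):
    """Centers a point set on its centroid and scales it to unit RMS radius."""
    points = np.asarray(points, dtype=float)
    centered = points - points.mean(axis=0)
    # Root-mean-square distance of the points from the centroid
    scale = np.sqrt((centered ** 2).sum(axis=1).mean())
    return centered / scale if scale > 0 else centered

# Hypothetical toy shape and a shifted, enlarged copy of it
shape = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
copy = 2.5 * shape + np.array([10.0, -3.0, 4.0])

same = np.allclose(normalize_position_and_scale(shape), normalize_position_and_scale(copy))
print(same)  # True: after normalization the two point sets coincide
```

Without this step, the Hausdorff distance between `shape` and `copy` would be dominated by the offset and the size difference rather than by any difference in shape.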

Comparing two faces means comparing their Face Meshes

Comparing two faces comes down to comparing two Face Meshes (sets of characteristic facial landmarks). To do this, processing techniques typically use geometric information and measurements between landmarks to assess similarity or difference.

Steps:

Extract the Face Mesh of each face.
Normalize the Face Meshes.
Compute the distance between the two Face Meshes.

Simulating face search based on Face Mesh

Goal

Given a face, extract its Face Mesh, scan a folder of images, and return the images that contain that face.

Preparation

Use Python 3.6 or later on Windows 10 or Ubuntu 18.04 (other operating systems have not been tested).
pip install mediapipe opencv-python scipy

About the dataset

The input is a folder of images, including:

Images that contain the target face: these may show the person from different angles, with different expressions, or with the face partially occluded.
Images that do not contain the target face: images of other people's faces, or images with no face at all.

Data normalization

a. The align_face_mesh function

Goal: Align the 3D facial landmarks to a standard coordinate origin.

How it works: Translate all landmarks so that the chosen center (the nose landmark, index 1) sits at the origin.

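import numpy as np

def align_face_mesh(face_mesh):
    """Translates the landmarks so that the nose landmark sits at the origin."""
    face_mesh = np.array(face_mesh, dtype=float)
    # Use the nose landmark (index 1) as the center
    center = face_mesh[1]
    # Alternative: use the centroid of all landmarks
    # center = np.mean(face_mesh, axis=0)
    aligned_face_mesh = face_mesh - center  # Translate to the origin
    return aligned_face_mesh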

b. The rotate_face_mesh function

Goal: Rotate the 3D Face Mesh onto the coordinate axes to remove head rotation and tilt.

How it works:

Align the nose axis (landmark 1 to landmark 4): rotate the face so that the nose axis lines up with the Y-axis.
Align the eye axis (landmark 33 to landmark 263): rotate the face so that the horizontal eye axis lines up with the X-axis.
Align with the Z-axis: rotate so that the midline from the nose tip (landmark 1) to the chin (landmark 152) lines up with the Z-axis, so that the face has no remaining tilt.

def rotate_face_mesh(face_mesh):
    """Rotates a centered 3D face mesh so that its reference axes line up with X, Y and Z."""
    face_mesh = np.asarray(face_mesh, dtype=float)

    def compute_vector(points, idx1, idx2):
        """Computes the vector from landmark idx1 to landmark idx2."""
        return points[idx2] - points[idx1]

    def compute_rotation_angle_to_align_with_axis(vector, target_axis):
        """Computes the angle (in radians) between a vector and a target axis."""
        vector = vector / np.linalg.norm(vector)
        target_axis = target_axis / np.linalg.norm(target_axis)
        return np.arccos(np.clip(np.dot(vector, target_axis), -1.0, 1.0))

    def rotate_about_axis(points, axis, angle):
        """Rotates 3D points around a given axis by a specified angle (Rodrigues' formula)."""
        norm = np.linalg.norm(axis)
        if norm < 1e-12:
            # The vector is already parallel to the target, so there is no axis to rotate around
            return points
        ux, uy, uz = axis / norm
        cos_theta = np.cos(angle)
        sin_theta = np.sin(angle)
        R = np.array([
            [cos_theta + ux**2 * (1 - cos_theta), ux * uy * (1 - cos_theta) - uz * sin_theta, ux * uz * (1 - cos_theta) + uy * sin_theta],
            [uy * ux * (1 - cos_theta) + uz * sin_theta, cos_theta + uy**2 * (1 - cos_theta), uy * uz * (1 - cos_theta) - ux * sin_theta],
            [uz * ux * (1 - cos_theta) - uy * sin_theta, uz * uy * (1 - cos_theta) + ux * sin_theta, cos_theta + uz**2 * (1 - cos_theta)]
        ])
        return np.dot(points, R.T)

    def align_with_axis(points, idx1, idx2, target_axis):
        """Rotates the points so that the vector idx1 -> idx2 lines up with target_axis."""
        vector = compute_vector(points, idx1, idx2)
        angle = compute_rotation_angle_to_align_with_axis(vector, target_axis)
        # Rotate around the normal of the plane spanned by the vector and the target axis
        rotation_axis = np.cross(vector, target_axis)
        return rotate_about_axis(points, rotation_axis, angle)

    # Step 1: Align the nose axis (landmarks 1 to 4) with the Y-axis
    rotated_face_mesh = align_with_axis(face_mesh, 1, 4, np.array([0.0, 1.0, 0.0]))

    # Step 2: Align the eye axis (landmarks 33 to 263) with the X-axis
    rotated_face_mesh = align_with_axis(rotated_face_mesh, 33, 263, np.array([1.0, 0.0, 0.0]))

    # Step 3: Align the midline (nose tip 1 to chin 152) with the Z-axis (no tilt)
    fully_rotated_face_mesh = align_with_axis(rotated_face_mesh, 1, 152, np.array([0.0, 0.0, 1.0]))

    return fully_rotated_face_mesh
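
Each step in rotate_face_mesh relies on the same building block: rotating one vector onto a target axis. The standalone sketch below isolates that step so it can be checked on a simple vector. The helper name `rotation_matrix_aligning` and the test vector are assumptions for illustration, and returning the identity for parallel or anti-parallel vectors is a simplification of this sketch.

```python
import numpy as np

def rotation_matrix_aligning(vector, target_axis):
    """Rotation matrix that turns `vector` onto `target_axis` (Rodrigues' formula)."""
    v = np.asarray(vector, dtype=float) / np.linalg.norm(vector)
    t = np.asarray(target_axis, dtype=float) / np.linalg.norm(target_axis)
    axis = np.cross(v, t)
    sin_theta = np.linalg.norm(axis)
    cos_theta = np.dot(v, t)
    if sin_theta < 1e-12:
        # Parallel or anti-parallel: no unique rotation axis, so this sketch returns identity
        return np.eye(3)
    k = axis / sin_theta
    # Cross-product (skew-symmetric) matrix of the unit rotation axis
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + sin_theta * K + (1 - cos_theta) * (K @ K)

R = rotation_matrix_aligning([1.0, 1.0, 0.0], [0.0, 1.0, 0.0])
rotated = R @ (np.array([1.0, 1.0, 0.0]) / np.sqrt(2))
print(np.allclose(rotated, [0.0, 1.0, 0.0]))  # True: the vector now points along Y
print(np.allclose(R @ R.T, np.eye(3)))        # True: R is orthogonal, so distances are preserved
```

Because the rotation preserves distances between points, it changes the pose of the mesh but not its shape, which is what the later Hausdorff comparison depends on.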

Running the search

Extract and normalize the Face Mesh of every image in the data folder.

import os

import cv2
import mediapipe as mp
import numpy as np

# Face detector files (OpenCV DNN, Caffe format); both must be available locally
prototxt = "deploy.prototxt.txt"
model = "res10_300x300_ssd_iter_140000.caffemodel"
image = "family_photo.jpg"
conf = 0.5  # minimum detection confidence

# load our serialized model from disk
net = cv2.dnn.readNetFromCaffe(prototxt, model)

def CropFace(image):
    """Detects faces in a BGR image and returns a list of square face crops."""
    (h, w) = image.shape[:2]
    blob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)), 1.0,
        (300, 300), (104.0, 177.0, 123.0))

    # pass the blob through the network and obtain the detections and
    # predictions
    net.setInput(blob)
    detections = net.forward()
    # loop over the detections
    cropped_faces = []  # List to store cropped images
    for i in range(0, detections.shape[2]):
        # extract the confidence (i.e., probability) associated with the
        # prediction
        confidence = detections[0, 0, i, 2]

        # filter out weak detections by ensuring the `confidence` is
        # greater than the minimum confidence
        if confidence > conf:
            # compute the (x, y)-coordinates of the bounding box for the
            # object
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            (startX, startY, endX, endY) = box.astype("int")

            # Compute the center of the bounding box
            centerX = (startX + endX) // 2
            centerY = (startY + endY) // 2

            # Calculate the side length of the square bounding box
            box_size = max(endX - startX, endY - startY)  # Max dimension

            # Recalculate start and end coordinates to make the bounding box square
            startX = max(0, centerX - box_size // 2)
            startY = max(0, centerY - box_size // 2)
            endX = min(w, centerX + box_size // 2)
            endY = min(h, centerY + box_size // 2)

            # Crop the detected face; skip boxes that collapse to an empty region
            cropped_face = image[startY:endY, startX:endX]
            if cropped_face.size == 0:
                continue
            cropped_faces.append(cropped_face.copy())

            # draw the bounding box of the face along with the associated
            # probability (this modifies the input image in place)
            text = "{:.2f}%".format(confidence * 100)
            y = startY - 10 if startY - 10 > 10 else startY + 10
            cv2.rectangle(image, (startX, startY), (endX, endY),
                (0, 0, 255), 2)
            cv2.putText(image, text, (startX, y),
                cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 0, 255), 2)
    return cropped_faces

def ExtractLandMark(face):
    """Returns the MediaPipe landmark list of the face in a BGR crop, or None."""
    mp_face_mesh = mp.solutions.face_mesh

    # Create a face mesh object
    # refine_landmarks=True adds iris landmarks (478 points in total instead of 468)
    with mp_face_mesh.FaceMesh(
            static_image_mode=True,
            max_num_faces=1,
            refine_landmarks=True,
            min_detection_confidence=0.5) as face_mesh:

        # Convert the crop from BGR (OpenCV) to RGB (MediaPipe) before processing
        results = face_mesh.process(cv2.cvtColor(face, cv2.COLOR_BGR2RGB))

        if results.multi_face_landmarks and len(results.multi_face_landmarks) > 0:
            return results.multi_face_landmarks[0]
        return None

def ProcessFolder(folder_path):
    """Maps each image file name to the list of Face Meshes found in that image."""
    image_face_mesh = {}

    for file_name in os.listdir(folder_path):
        file_path = os.path.join(folder_path, file_name)
        if os.path.isfile(file_path) and file_name.lower().endswith(('.png', '.jpg', '.jpeg')):
            image = cv2.imread(file_path)
            if image is None:
                # cv2.imread returns None when the file cannot be decoded
                continue

            cropped_faces = CropFace(image)
            face_mesh_list = []
            for face in cropped_faces:
                face_mesh = ExtractLandMark(face)
                if face_mesh is not None:
                    face_mesh_list.append(face_mesh)
            if face_mesh_list:
                image_face_mesh[file_name] = face_mesh_list

    return image_face_mesh
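
The square-crop arithmetic inside CropFace can be checked in isolation. The sketch below repeats that calculation as a small pure function. The name `square_box` and the sample coordinates are assumptions made for this example.

```python
def square_box(start_x, start_y, end_x, end_y, image_w, image_h):
    """Expands a box to a square around its center, clamped to the image bounds."""
    center_x = (start_x + end_x) // 2
    center_y = (start_y + end_y) // 2
    # Half of the longer side of the original box
    half = max(end_x - start_x, end_y - start_y) // 2
    return (
        max(0, center_x - half),
        max(0, center_y - half),
        min(image_w, center_x + half),
        min(image_h, center_y + half),
    )

print(square_box(100, 50, 200, 250, 640, 480))  # (50, 50, 250, 250)
print(square_box(10, 20, 210, 120, 640, 480))   # (10, 0, 210, 170)
```

The second call shows that a box near the image border is clamped and therefore no longer square, so crops taken at the edge of a photo can have a different aspect ratio from the others.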

Compute the Hausdorff distance to the Face Mesh of the target face.

import numpy as np
from scipy.spatial.distance import directed_hausdorff

def hausdorff_distance(points_a, points_b):
    # Convert to NumPy arrays if not already
    points_a = np.array(points_a)
    points_b = np.array(points_b)
    
    # Compute the directed Hausdorff distances
    forward_distance = directed_hausdorff(points_a, points_b)[0]
    backward_distance = directed_hausdorff(points_b, points_a)[0]
    
    # Return the maximum of forward and backward distances
    return max(forward_distance, backward_distance)

Sort and return the list of the most similar files.

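def face_search(query_face_mesh, folder, threshold=0.03):
    """Returns (file_name, distance) pairs below the threshold, closest match first.

    query_face_mesh is expected to be normalized the same way as the candidates
    (align_face_mesh followed by rotate_face_mesh).
    """
    distances = []
    for file_name, faces in folder.items():
        for face_mesh in faces:
            face_mesh = np.array([(lm.x, lm.y, lm.z) for lm in face_mesh.landmark])
            aligned_face_mesh = align_face_mesh(face_mesh)
            rotated_face_mesh = rotate_face_mesh(aligned_face_mesh)
            distance = hausdorff_distance(query_face_mesh, rotated_face_mesh)
            if distance < threshold:
                distances.append((file_name, distance))

    # list.sort() sorts in place and returns None, so return a sorted copy instead
    return sorted(distances, key=lambda x: x[1])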
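
The filtering and ranking step can be tried without any images. The sketch below applies the same threshold-and-sort logic to a dictionary of precomputed distances. The function name `rank_matches`, the file names, and the distance values are hypothetical and serve only as an illustration.

```python
def rank_matches(distances_by_file, threshold=0.03):
    """Keeps files whose distance is below the threshold, closest first."""
    matches = [(name, d) for name, d in distances_by_file.items() if d < threshold]
    return sorted(matches, key=lambda item: item[1])

# Hypothetical distances, for illustration only
example = {"a.jpg": 0.021, "b.jpg": 0.250, "c.jpg": 0.008}
print(rank_matches(example))  # [('c.jpg', 0.008), ('a.jpg', 0.021)]
```

The threshold of 0.03 is the value used in the article's code. A suitable value depends on how the landmarks are normalized, so it would likely need tuning on a given dataset.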

Evaluation

Effectiveness:

Fast and accurate on frontal faces.
Normalization and comparison help keep the search results reliable.

Limitations:

Effectiveness drops for faces shot at an angle or not facing the camera.
Struggles with partially occluded faces.