Face Pose Estimation Task
A Face Pose Estimation Task is a Preprocessing Task for a Face Recognition Task (and other face analysis tasks).
- AKA: Head Pose Estimation Task.
- Context:
- It can estimate the gaze direction or head posture from 2D face images.
- Example(s):
- an M2DL-based face pose estimation system (Hong & Yu, 2017).
- a JFA-based joint head pose estimation and face alignment system (Xu & Kakadiaris, 2017).
- Counter-Example(s):
- See: Person Face, Articulated Body Pose Estimation, Face Perception.
References
2017a
- (Hong & Yu, 2017) ⇒ Hong, C., & Yu, J. (2017). "Multi-modal Face Pose Estimation with Multi-task Manifold Deep Learning." arXiv preprint arXiv:1712.06467.
- ABSTRACT: Human face pose estimation aims at estimating the gazing direction or head postures with 2D images. It gives some very important information such as communicative gestures, saliency detection and so on, which attracts plenty of attention recently. However, it is challenging because of complex backgrounds, various orientations and face appearance visibility. Therefore, a descriptive representation of face images and mapping it to poses are critical. In this paper, we make use of multi-modal data and propose a novel face pose estimation method that uses a novel deep learning framework named Multi-task Manifold Deep Learning (M2DL). It is based on feature extraction with improved deep neural networks and multi-modal mapping relationship with multi-task learning. In the proposed deep learning based framework, Manifold Regularized Convolutional Layers (MRCL) improve traditional convolutional layers by learning the relationship among outputs of neurons. Besides, in the proposed mapping relationship learning method, different modals of face representations are naturally combined to learn the mapping function from face images to poses. In this way, the computed mapping model with multiple tasks is improved. Experimental results on three challenging benchmark datasets (DPOSE, HPID and BKHPD) demonstrate the outstanding performance of M2DL.
2017b
- (Xu & Kakadiaris, 2017) ⇒ Xu, X., & Kakadiaris, I. A. (2017, May). "Joint head pose estimation and face alignment framework using global and local CNN features". In Automatic Face & Gesture Recognition (FG 2017), 2017 12th IEEE International Conference on (pp. 642-649). DOI: 10.1109/FG.2017.81.
- ABSTRACT: In this paper, we explore global and local features obtained from Convolutional Neural Networks (CNN) for learning to estimate head pose and localize landmarks jointly. Because there is a high correlation between head pose and landmark locations, the head pose distributions from a reference database and learned local deep patch features are used to reduce the error in the head pose estimation and face alignment tasks. First, we train GNet on the detected face region to obtain a rough estimate of the pose and to localize the seven primary landmarks. The most similar shape is selected for initialization from a reference shape pool constructed from the training samples according to the estimated head pose. Starting from the initial pose and shape, LNet is used to learn local CNN features and predict the shape and pose residuals. We demonstrate that our algorithm, named JFA, improves both the head pose estimation and face alignment. To the best of our knowledge, this is the first system that explores the use of the global and local CNN features to solve head pose estimation and landmark detection tasks jointly.
2016
- (Saeed et al., 2016) ⇒ Saeed, A., Al-Hamadi, A., & Handrich, S. (2016, December). "Advancement in the head pose estimation via depth-based face spotting." In Computational Intelligence (SSCI), 2016 IEEE Symposium Series on (pp. 1-6). IEEE.
- ABSTRACT: Head pose estimation is not only a crucial preprocessing task in applications such as facial expression and face recognition, but also the core task for many others, e.g. gaze; driver focus of attention; head gesture recognitions. In real scenarios, the fine location and scale of a processed face patch should be consistently and automatically obtained. To this end, we propose a depth-based face spotting technique in which the face is cropped with respect to its depth data, and is modeled by its appearance features. By employing this technique, the localization rate was improved. Additionally, by building a head pose estimator on top of it, we achieved more accurate pose estimates and better generalization capability. To estimate the head pose, we exploit Support Vector (SV) regressors to map Histogram of oriented Gradient (HoG) features extracted from the spotted face patches in both depth and RGB images to the head rotation angles. The developed pose estimator compared favorably to state-of-the-art approaches on two challenging DRGB databases.
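The mapping this abstract describes — HoG descriptors of a face patch regressed to head rotation angles — can be sketched in a few lines. This is a hedged illustration, not the paper's method: `tiny_hog` is a toy single-histogram stand-in for real cell-structured HoG, ordinary least squares replaces the Support Vector regressors, and synthetic vectors stand in for real face patches.

```python
import numpy as np

def tiny_hog(patch, n_bins=9):
    # Minimal HoG-style descriptor: one orientation histogram over the
    # whole patch (real HoG adds cells, blocks and block normalization).
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)                  # unsigned orientation
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    return hist / (np.linalg.norm(hist) + 1e-8)              # unit-norm descriptor

# Synthetic stand-ins: 50 descriptors X and their (pitch, yaw, roll) angles y,
# generated by a known linear mapping so the fit can be checked exactly.
rng = np.random.default_rng(0)
X = rng.random((50, 9))
y = X @ rng.random((9, 3))

# Least-squares regressor in place of the paper's SV regressors.
W, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(X @ W, y, atol=1e-6))   # True: the linear mapping is recovered
```

In the paper's setting, X would hold HoG descriptors from depth and RGB face patches, and one regressor per rotation angle would be trained.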
2015
- (Saeed et al., 2015) ⇒ Saeed, A., Al-Hamadi, A., & Ghoneim, A. (2015). "Head Pose Estimation on Top of Haar-like Face Detection: A Study Using the Kinect Sensor." Sensors, 15(9), 20945-20966. DOI: 10.3390/s150920945.
- ABSTRACT: Head pose estimation is a crucial initial task for human face analysis, which is employed in several computer vision systems, such as: facial expression recognition, head gesture recognition, yawn detection, etc. In this work, we propose a frame-based approach to estimate the head pose on top of the Viola and Jones (VJ) Haar-like face detector. Several appearance and depth-based feature types are employed for the pose estimation, where comparisons between them in terms of accuracy and speed are presented. It is clearly shown through this work that using the depth data, we improve the accuracy of the head pose estimation. Additionally, we can spot positive detections, faces in profile views detected by the frontal model, that are wrongly cropped due to background disturbances. We introduce a new depth-based feature descriptor that provides competitive estimation results with a lower computation time. Evaluation on a benchmark Kinect database shows that the histogram of oriented gradients and the developed depth-based features are more distinctive for the head pose estimation, where they compare favorably to the current state-of-the-art approaches. Using a concatenation of the aforementioned feature types, we achieved a head pose estimation with average errors not exceeding 5.1°, 4.6° and 4.2° for pitch, yaw and roll angles, respectively.
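The depth-based face spotting idea referenced in the Saeed et al. abstracts (crop the face with respect to its depth data) can be illustrated with a minimal NumPy sketch. The depth frame, the 150 mm band, and the helper name are all hypothetical; on a real head-and-shoulders Kinect frame the nearest valid surface is usually the face.

```python
import numpy as np

def spot_face_by_depth(depth_mm, band_mm=150):
    # Keep pixels within band_mm of the closest valid depth reading,
    # then return their bounding box as the face crop (hypothetical helper).
    valid = depth_mm > 0                       # 0 marks missing Kinect readings
    near = depth_mm[valid].min()
    mask = valid & (depth_mm <= near + band_mm)
    ys, xs = np.nonzero(mask)
    return (int(ys.min()), int(ys.max()),      # crop box (y0, y1, x0, x1)
            int(xs.min()), int(xs.max()))

# Toy frame: a background wall at 2000 mm and a "face" block at 800 mm.
frame = np.full((120, 160), 2000, dtype=int)
frame[30:80, 60:110] = 800
print(spot_face_by_depth(frame))   # (30, 79, 60, 109)
```

The appearance features (e.g. HoG) would then be computed on this depth-consistent crop rather than on the raw detector output.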
2012
- (Zhu & Ramanan, 2012) ⇒ Zhu, X., & Ramanan, D. (2012, June). "Face detection, pose estimation, and landmark localization in the wild." In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on (pp. 2879-2886). DOI: 10.1109/CVPR.2012.6248014.
- ABSTRACT: We present a unified model for face detection, pose estimation, and landmark estimation in real-world, cluttered images. Our model is based on a mixture of trees with a shared pool of parts; we model every facial landmark as a part and use global mixtures to capture topological changes due to viewpoint. We show that tree-structured models are surprisingly effective at capturing global elastic deformation, while being easy to optimize unlike dense graph structures. We present extensive results on standard face benchmarks, as well as a new “in the wild” annotated dataset, that suggests our system advances the state-of-the-art, sometimes considerably, for all three tasks. Though our model is modestly trained with hundreds of faces, it compares favorably to commercial systems trained with billions of examples (such as Google Picasa and face.com).
2008
- (Xiang et al., 2008) ⇒ Shiming Xiang, Feiping Nie, and Changshui Zhang. (2008). “Learning a Mahalanobis Distance Metric for Data Clustering and Classification.” In: Pattern Recognition 41.
- ABSTRACT: Distance metric is a key issue in many machine learning algorithms. This paper considers a general problem of learning from pairwise constraints in the form of must-links and cannot-links. As one kind of side information, a must-link indicates the pair of the two data points must be in a same class, while a cannot-link indicates that the two data points must be in two different classes. Given must-link and cannot-link information, our goal is to learn a Mahalanobis distance metric. Under this metric, we hope the distances of point pairs in must-links are as small as possible and those of point pairs in cannot-links are as large as possible. This task is formulated as a constrained optimization problem, in which the global optimum can be obtained effectively and efficiently. Finally, some applications in data clustering, interactive natural image segmentation and face pose estimation are given in this paper. Experimental results illustrate the effectiveness of our algorithm.
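The abstract's goal — make must-link pairs close and cannot-link pairs far under a Mahalanobis metric — can be illustrated with the distance itself. A hedged sketch: the matrix M below is hand-picked (a diagonal metric that down-weights a noisy feature), not the optimum of the paper's constrained optimization.

```python
import numpy as np

def mahalanobis(x, y, M):
    # d_M(x, y) = sqrt((x - y)^T M (x - y)); M must be positive semidefinite.
    d = np.asarray(x, float) - np.asarray(y, float)
    return float(np.sqrt(d @ M @ d))

# A pair that should be a must-link: the points agree on the informative
# feature 0 but differ on a noisy feature 1.
a, b = np.array([1.0, 0.0]), np.array([1.0, 3.0])

euclid = mahalanobis(a, b, np.eye(2))              # M = I is plain Euclidean
# Hand-picked (not learned) metric that down-weights the noisy feature:
learned_like = mahalanobis(a, b, np.diag([1.0, 0.01]))

# The metric pulls the must-link pair an order of magnitude closer.
print(euclid, learned_like)
```

The paper learns M from the must-link/cannot-link constraints by solving a constrained optimization; this sketch only shows how a suitable M reshapes the distances.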