In the previous post, we introduced Linear Blend Skinning (LBS), a widely used technique in graphics and animation. However, LBS is so simple that it suffers from well-known artifacts such as volume collapse and the candy-wrapper effect (see Figure 4 in this post), which lead to unrealistic body deformation. Furthermore, in many practical problems we want a realistic body model that can represent a wide range of real human bodies without being too complicated to use. In this article, we look at a human body model called the Skinned Multi-Person Linear Model (SMPL) [1], which uses an effective data-driven strategy to address these concerns.
Linear Blend Skinning Recall
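To recap, the linear blend skinning parameters are:
- Mesh vertices in rest pose: $\bar{\mathbf{T}} \in \mathbb{R}^{3N}$
- Joint locations: $\mathbf{J} \in \mathbb{R}^{3K}$
- Joint rotations (pose parameters): $\theta \in \mathbb{R}^{3K}$
- Blend skinning weights: $\mathcal{W} \in \mathbb{R}^{N \times K}$
where $N$ is the number of vertices and $K$ is the number of articulated body joints. The local bone transformation matrix at joint $k$ depends on the joint rotation $\theta_k$ and the joint location $\mathbf{j}_k$, and can be computed as:
$$G_k^{\text{local}}(\theta_k, \mathbf{J}) = \begin{bmatrix} \exp(\theta_k) & \Delta\mathbf{j}_k \\ \mathbf{0}^\top & 1 \end{bmatrix}$$
The function $\exp(\cdot)$ (the Rodrigues formula) transforms the joint rotation vector into a rotation matrix, and $\Delta\mathbf{j}_k$ is the offset of the current joint from its parent ($\Delta\mathbf{j}_k = \mathbf{j}_k - \mathbf{j}_{p(k)}$; for the root joint, the offset is the joint location itself). To obtain the world transformation at joint $k$, we simply accumulate the local transformation matrices along the kinematic tree, starting from the root joint: $G_k(\theta, \mathbf{J}) = \prod_{j \in A(k)} G_j^{\text{local}}(\theta_j, \mathbf{J})$, where $A(k)$ is the ordered list of joints on the path from the root to joint $k$. Finally, the position of vertex $i$ after LBS is calculated as
$$\mathbf{t}'_i = \sum_{k=1}^{K} w_{k,i}\, G_k(\theta, \mathbf{J})\, G_k(\theta^*, \mathbf{J})^{-1}\, \bar{\mathbf{t}}_i \tag{1}$$
where $w_{k,i}$ is the blend weight of joint $k$ for vertex $i$; $G_k(\theta^*, \mathbf{J})$ is the world transformation of joint $k$ according to the rest pose $\theta^*$; $\bar{\mathbf{t}}_i$ is the vertex position in the rest pose; and $\mathbf{t}'_i$ is the transformed vertex corresponding to the desired pose $\theta$.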
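The recap above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function names (`rodrigues`, `world_transforms`, `linear_blend_skinning`) and the `parents` array encoding the kinematic tree (with `-1` for the root and parents listed before children) are assumptions made for this example.

```python
import numpy as np

def rodrigues(rotvec):
    """Convert an axis-angle vector (3,) into a 3x3 rotation matrix."""
    angle = np.linalg.norm(rotvec)
    if angle < 1e-8:
        return np.eye(3)
    x, y, z = rotvec / angle
    skew = np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])  # cross-product matrix
    return np.eye(3) + np.sin(angle) * skew + (1.0 - np.cos(angle)) * (skew @ skew)

def world_transforms(theta, joints, parents):
    """Accumulate local 4x4 transforms from the root down the kinematic tree.

    theta: (K, 3) axis-angle rotations; joints: (K, 3) rest joint locations;
    parents: parent index per joint, -1 for the root (assumes parents[k] < k).
    """
    num_joints = len(parents)
    G = np.zeros((num_joints, 4, 4))
    for k in range(num_joints):
        local = np.eye(4)
        local[:3, :3] = rodrigues(theta[k])
        if parents[k] < 0:
            local[:3, 3] = joints[k]                       # root: offset is its location
            G[k] = local
        else:
            local[:3, 3] = joints[k] - joints[parents[k]]  # offset from the parent
            G[k] = G[parents[k]] @ local
    return G

def linear_blend_skinning(verts, joints, theta, weights, parents):
    """Equation 1: verts (N, 3), weights (N, K) -> posed vertices (N, 3)."""
    G = world_transforms(theta, joints, parents)
    G_rest = world_transforms(np.zeros_like(theta), joints, parents)
    G_rel = G @ np.linalg.inv(G_rest)                      # remove the rest pose
    verts_h = np.hstack([verts, np.ones((len(verts), 1))]) # homogeneous coordinates
    blended = np.einsum("nk,kij->nij", weights, G_rel)     # per-vertex blended transform
    return np.einsum("nij,nj->ni", blended, verts_h)[:, :3]

# Toy two-bone "arm": the second joint bends 90 degrees about the z axis.
joints = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
verts = np.array([[0.5, 0.0, 0.0], [1.5, 0.0, 0.0]])
theta = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, np.pi / 2]])
posed = linear_blend_skinning(verts, joints, theta, np.eye(2), parents=[-1, 0])
```

In this toy setup each vertex is rigidly attached to one bone, so the first vertex stays in place and the second one rotates around the elbow to approximately (1, 0.5, 0).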
Pose corrective blendshapes
To deal with the errors of linear blend skinning, artists in practice often sculpt corrective blendshapes by hand and add them to the template mesh to correct the posed mesh. A blendshape is a vector of vertex displacements relative to the original template mesh. As shown in Figure 1, after adding the blendshape to the original mesh, the artifacts around the elbow and the hip go away, and the posed mesh looks correct.
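However, this is a tedious and expensive process. Inspired by this, the SMPL authors addressed the problems of LBS by learning the corrective blendshapes from a large number of scans of real humans. Specifically, in the SMPL formulation, the pose corrective blendshapes are added to the original LBS (Equation 1) as follows:
$$\mathbf{t}'_i = \sum_{k=1}^{K} w_{k,i}\, G_k(\theta, \mathbf{J})\, G_k(\theta^*, \mathbf{J})^{-1} \left(\bar{\mathbf{t}}_i + \mathbf{b}_{P,i}(\theta)\right) \tag{2}$$
where $\mathbf{b}_{P,i}(\theta)$ is the $i$-th vertex of the pose correctives $B_P(\theta)$. Let $R(\theta)$ be some function of $\theta$ and $R_n(\theta)$ denote its $n$-th element. The pose correctives depend on the current pose and can be calculated as a combination of the learned blendshapes:
$$B_P(\theta; \mathcal{P}) = \sum_{n=1}^{9K} \left(R_n(\theta) - R_n(\theta^*)\right) \mathbf{P}_n \tag{3}$$
The authors of SMPL designed $R$ to be the function converting $\theta$ into a vectorized version of the concatenated joint rotation matrices, $R: \mathbb{R}^{|\theta|} \to \mathbb{R}^{9K}$. Subtracting $R_n(\theta^*)$ makes the correctives vanish in the rest pose. Note that although $B_P$ is non-linear in the pose $\theta$, it is linear in the elements of the rotation matrices (the authors also tried to make the model linear in the pose, i.e., $R(\theta) = \theta$, but it did not work). Therefore, SMPL remains a simple additive model, yet a very effective one. The blendshape vectors $\mathbf{P}_n$ are the model's parameters and are learned from real-world data.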
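To make the pose correctives concrete, here is a small NumPy sketch of the weighted sum of blendshapes described above. The function names and the array layout (blendshapes stored as a `(9K, N, 3)` array) are assumptions for illustration, and the blendshapes here are random placeholders rather than learned values.

```python
import numpy as np

def pose_feature(rotmats):
    """R(theta) - R(theta*): flattened joint rotation matrices minus the
    rest pose, where every joint rotation is the identity. rotmats: (K, 3, 3)."""
    return (rotmats - np.eye(3)).reshape(-1)                 # (9K,)

def pose_correctives(rotmats, pose_blendshapes):
    """Weighted sum of blendshapes; pose_blendshapes: (9K, N, 3) -> (N, 3)."""
    feature = pose_feature(rotmats)
    return np.tensordot(feature, pose_blendshapes, axes=1)

# Toy example: K = 2 joints, N = 4 vertices, placeholder blendshapes.
rng = np.random.default_rng(0)
P = rng.normal(scale=0.01, size=(18, 4, 3))
rest = np.stack([np.eye(3), np.eye(3)])
bent = rest.copy()
bent[1] = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]  # 90 degrees about z
offsets_rest = pose_correctives(rest, P)   # all zeros: no correction in the rest pose
offsets_bent = pose_correctives(bent, P)   # per-vertex displacements for the bent pose
```

Because the feature is the difference to the rest-pose rotations, the correction is exactly zero in the rest pose and scales linearly with the rotation-matrix entries.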
Training of pose-related parameters
The pose-related parameters of the model are trained on a dataset of 3D scans of people in different poses (which we call the multi-pose dataset). The SMPL pose-related parameters in this stage include:
- $\mathcal{J}$: joint regressor matrix, which is used to regress the joint locations from the surrounding vertices (Figure 2): $\mathbf{J} = \mathcal{J}\,\bar{\mathbf{T}}$
- $\mathcal{W}$: blend skinning weights matrix.
- $\mathcal{P}$: pose corrective blendshapes.
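As a rough illustration of the joint regressor, the sketch below computes a joint location as a weighted combination of nearby rest-pose vertices. The function name `regress_joints` and the toy weights (a single row that averages four vertices and sums to one) are assumptions for this example, not learned values.

```python
import numpy as np

def regress_joints(joint_regressor, rest_verts):
    """Joint locations as a linear function of the rest-pose vertices.

    joint_regressor: (K, N) matrix, one row of vertex weights per joint;
    rest_verts: (N, 3) -> joints: (K, 3).
    """
    return joint_regressor @ rest_verts

# Four vertices forming a small ring around an "elbow" at x = 1.
ring = np.array([[1.0, 0.1, 0.0], [1.0, -0.1, 0.0],
                 [1.0, 0.0, 0.1], [1.0, 0.0, -0.1]])
regressor = np.full((1, 4), 0.25)          # hypothetical weights: plain average
elbow = regress_joints(regressor, ring)    # lands at the center of the ring
```

Since the joints are a linear function of the vertices, changing the rest mesh automatically moves the joints with it.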
To train these parameters, we minimize the surface reconstruction error (squared Euclidean distance) between the ground-truth mesh vertices (the template aligned to the scanned human) and the output vertices of the corrective skinning function (Equation 2):
$$E_D = \sum_{i=1}^{N} \left\lVert \mathbf{v}_i - \mathbf{t}'_i \right\rVert^2 \tag{4}$$
where $\mathbf{V}$ is the ground-truth aligned mesh and $\mathbf{v}_i$ denotes its $i$-th vertex position. In addition, the dataset contains many subjects, and each subject has a different body configuration (e.g., fat, skinny, tall, ...). Therefore, we also need to compute the rest template mesh $\hat{\mathbf{T}}_s$ and the joint locations $\hat{\mathbf{J}}_s$ for each subject $s$. This results in an alternating optimization scheme in which we alternate between updating the pose parameters $\theta$, the subject-specific parameters $\hat{\mathbf{T}}_s$, $\hat{\mathbf{J}}_s$, and the global parameters $\mathcal{W}$, $\mathcal{P}$.
Since the model consists of a large number of parameters, we also apply several regularizations to prevent overfitting:
- $\mathcal{P}$: the blendshapes are regularized towards zero.
- $\mathcal{J}$: regularized to be sparse and to predict joints near the boundaries between the body parts.
- $\mathcal{W}$: regularized towards the initial artist-designed LBS blend weights.
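The sketch below shows how a data term and two of these regularizers might be combined into one scalar objective. It is only an illustration under stated assumptions: the squared-norm form of the penalties, the weights `lambda_P` and `lambda_W`, and the function name are choices made for this example, and the joint regressor term is left out because its exact form is not given here.

```python
import numpy as np

def training_objective(target_verts, posed_verts, P, W, W_init,
                       lambda_P=1.0, lambda_W=1.0):
    """Reconstruction error plus two simple regularizers (assumed forms).

    target_verts, posed_verts: (N, 3); P: pose blendshapes (any shape);
    W, W_init: (N, K) current and initial blend weights.
    """
    data = np.sum((target_verts - posed_verts) ** 2)   # squared Euclidean distance
    reg_P = np.sum(P ** 2)                             # pull blendshapes towards zero
    reg_W = np.sum((W - W_init) ** 2)                  # stay close to the initial weights
    return data + lambda_P * reg_P + lambda_W * reg_W

# Toy values: two vertices, one blendshape, weights equal to their initialization.
target = np.zeros((2, 3))
posed = np.ones((2, 3))
W0 = np.array([[1.0, 0.0], [0.0, 1.0]])
loss = training_objective(target, posed, np.zeros((1, 2, 3)), W0, W0)
```

In an alternating scheme, an objective of this kind would be minimized with respect to one group of parameters at a time while the others are held fixed.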
Identity-dependent (Shape) blendshapes
Next, we want the body model to be able to represent a wide variety of human body shapes in the population. To do so, the authors also built a dataset for shape training, called the multi-shape dataset. This dataset contains several template meshes with a high variation in the body shapes of real humans (mostly from the US and Europe). In order to learn the shape space properly (the shape must be completely independent of the pose), we also need to factor out the pose (pose normalization) so that every person in this dataset is in the same rest pose.
Our goal is to learn a statistical model of the body shape space. Note that each body can be considered as a vector in a $3N$-dimensional space (the 3D positions of $N$ vertices). However, this space is impractical to work with directly, since $N$ is typically large and not all vectors in this space correspond to a valid body shape (valid bodies occupy only a tiny subspace). Therefore, we reduce the dimensionality to compress the shape space into a low-dimensional space.
We start by computing the mean body shape $\bar{\mathbf{T}}$ from the training data and subtracting it from each mesh in the training dataset (Figure 3). After that, we stack the centered mesh vertices into a matrix and perform principal component analysis (PCA). This results in a set of eigenvalues and eigenvectors that describe the major directions of variation in the body shape space. The top eigenvectors, ordered by eigenvalue, give us a low-dimensional linear subspace (typically 10-300D) that captures most of the variance in the $3N$-D space. Consequently, the body shapes of different people are approximated by a linear combination of the shape-dependent blendshapes $\mathbf{S}_n$, weighted by the shape parameters $\beta_n$, plus the mean body $\bar{\mathbf{T}}$:
$$\mathbf{T}(\beta) = \bar{\mathbf{T}} + B_S(\beta; \mathcal{S}), \qquad B_S(\beta; \mathcal{S}) = \sum_{n=1}^{|\beta|} \beta_n \mathbf{S}_n \tag{5}$$
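The PCA step can be sketched with a plain SVD in NumPy. This is a minimal illustration on synthetic data, assuming the meshes are already pose-normalized and share the same topology; the function names and the toy dataset are made up for this example.

```python
import numpy as np

def fit_shape_space(meshes, num_components):
    """PCA over pose-normalized meshes.

    meshes: (M, N, 3). Returns the mean mesh (N, 3), the shape blendshapes
    (num_components, N, 3), and the shape coefficients of each training mesh.
    """
    num_meshes, num_verts, _ = meshes.shape
    X = meshes.reshape(num_meshes, 3 * num_verts)   # one 3N-vector per body
    mean = X.mean(axis=0)
    centered = X - mean
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    S = Vt[:num_components]
    betas = centered @ S.T                          # project onto the subspace
    return mean.reshape(num_verts, 3), S.reshape(num_components, num_verts, 3), betas

def shape_from_beta(mean, S, beta):
    """Mean body plus a linear combination of the shape blendshapes."""
    return mean + np.tensordot(beta, S, axes=1)

# Synthetic "bodies" that vary along two hidden directions only.
rng = np.random.default_rng(0)
base = rng.normal(size=(5, 3))
directions = rng.normal(size=(2, 5, 3))
coeffs = rng.normal(size=(20, 2))
meshes = base + np.tensordot(coeffs, directions, axes=1)   # (20, 5, 3)

mean, S, betas = fit_shape_space(meshes, num_components=2)
rebuilt = shape_from_beta(mean, S, betas[0])               # matches meshes[0]
```

Because the synthetic data has only two degrees of freedom, two components reconstruct every mesh up to numerical precision; real scans would need more components and would only be approximated.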
Figure 5 below shows the variation along the first principal components of the shape PCA subspace. Note that this procedure works because the non-linearities of the pose are factored out, so that the body meshes live in a linear Euclidean space of points. Moreover, the joint locations $\mathbf{J}(\beta)$ are now a function of the shape parameters $\beta$, since they are regressed from the body mesh, which varies with the shape.
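Putting it all together, we obtain the final skinning equation of SMPL:
$$\mathbf{t}'_i = \sum_{k=1}^{K} w_{k,i}\, G_k(\theta, \mathbf{J}(\beta))\, G_k(\theta^*, \mathbf{J}(\beta))^{-1} \left(\bar{\mathbf{t}}_i + \mathbf{b}_{S,i}(\beta) + \mathbf{b}_{P,i}(\theta)\right) \tag{6}$$
where
- $\bar{\mathbf{t}}_i$ is the $i$-th vertex of the mean body template mesh (in the rest pose).
- $\mathbf{b}_{S,i}(\beta)$ is the $i$-th vertex of the shape blendshapes (Equation 5).
- $\mathbf{b}_{P,i}(\theta)$ is the $i$-th vertex of the pose blendshapes (Equation 3).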
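The complete forward pass can be sketched as a single NumPy function: shape blendshapes, joint regression, pose blendshapes, then skinning. This is an illustrative toy version, not the released model: all array names and layouts are assumptions, the parameters are random placeholders, the pose feature uses all $K$ joints for simplicity, and `parents` is assumed to list each parent before its children with `-1` for the root.

```python
import numpy as np

def rodrigues(rotvec):
    """Axis-angle vector (3,) -> 3x3 rotation matrix."""
    angle = np.linalg.norm(rotvec)
    if angle < 1e-8:
        return np.eye(3)
    x, y, z = rotvec / angle
    skew = np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])
    return np.eye(3) + np.sin(angle) * skew + (1.0 - np.cos(angle)) * (skew @ skew)

def smpl_forward(beta, theta, template, S, P, J_reg, W, parents):
    """beta: (B,), theta: (K, 3), template: (N, 3), S: (B, N, 3),
    P: (9K, N, 3), J_reg: (K, N), W: (N, K) -> posed vertices (N, 3)."""
    num_joints = len(parents)
    shaped = template + np.tensordot(beta, S, axes=1)       # mean body + shape blendshapes
    joints = J_reg @ shaped                                  # joints depend on the shape
    rotmats = np.stack([rodrigues(r) for r in theta])        # (K, 3, 3)
    feature = (rotmats - np.eye(3)).reshape(-1)              # R(theta) - R(theta*)
    rest = shaped + np.tensordot(feature, P, axes=1)         # add pose blendshapes

    # World transforms accumulated along the kinematic tree.
    G = np.zeros((num_joints, 4, 4))
    for k in range(num_joints):
        local = np.eye(4)
        local[:3, :3] = rotmats[k]
        if parents[k] < 0:
            local[:3, 3] = joints[k]
            G[k] = local
        else:
            local[:3, 3] = joints[k] - joints[parents[k]]
            G[k] = G[parents[k]] @ local

    # Remove the rest pose: the rest-pose world transform of joint k is a pure
    # translation by joints[k], so its inverse is a translation by -joints[k].
    G_rel = G.copy()
    G_rel[:, :3, 3] -= np.einsum("kij,kj->ki", G[:, :3, :3], joints)

    blended = np.einsum("nk,kij->nij", W, G_rel)             # per-vertex transform
    rest_h = np.hstack([rest, np.ones((len(rest), 1))])
    return np.einsum("nij,nj->ni", blended, rest_h)[:, :3]

# Toy model with random placeholder parameters (N = 6, K = 2, B = 3).
rng = np.random.default_rng(0)
template = rng.normal(size=(6, 3))
S = rng.normal(scale=0.1, size=(3, 6, 3))
P = rng.normal(scale=0.01, size=(18, 6, 3))
J_reg = rng.random((2, 6)); J_reg /= J_reg.sum(axis=1, keepdims=True)
W = rng.random((6, 2)); W /= W.sum(axis=1, keepdims=True)
beta = rng.standard_normal(3)                                # random body shape
theta = np.array([[0.0, 0.0, 0.3], [0.2, 0.0, 0.0]])
verts = smpl_forward(beta, theta, template, S, P, J_reg, W, parents=[-1, 0])
```

With `beta` and `theta` both set to zero, the function returns the template unchanged, which is a quick sanity check for any implementation along these lines.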
The SMPL model can represent "any" human body in different poses effectively, thanks to the disentanglement of shape and pose. This means we can transfer a pose from a person with one body shape to a person with a different body shape simply by copying the pose parameters. SMPL is also a simple additive model built upon Linear Blend Skinning, so it is compatible with existing graphics engines. Another advantage is that it is trained on real data, which makes it accurate and lets it resolve many problems of the traditional skinning method.
This model can be used not only to synthesize artificial characters but also to estimate the pose and shape of humans, even from a single image, thanks to the differentiability and linearity of the formulation. We can generate a new body shape by randomly sampling the shape vector from a zero-mean Gaussian distribution. Moreover, the emergence of this model has pushed forward many interesting research directions such as modeling clothing and garments, human reconstruction and motion dynamics, virtual avatars, and scene interaction. Since SMPL is simple, it may not fully describe all features of the human body. There are several extensions of SMPL that can be useful in specific tasks: DMPL [1] to model the body and dynamic soft-tissue movements, SMPL-H [2] to model the body and hand/finger movements, and SMPL-X [3] to model the body, hands, and facial expressions.
References
[1] Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., & Black, M. J. SMPL: A skinned multi-person linear model. ACM Transactions on Graphics (TOG), 2015.
[2] Romero, J., Tzionas, D., & Black, M. J. Embodied hands: Modeling and capturing hands and bodies together. ACM Transactions on Graphics (TOG), 2017.
[3] Pavlakos, G., Choutas, V., Ghorbani, N., Bolkart, T., Osman, A. A., Tzionas, D., & Black, M. J. Expressive body capture: 3d hands, face, and body from a single image. In CVPR 2019.