Parallel human posture estimation patent: Microsoft AR/VR technology realizes virtual representation
(Nweon November 13, 2023) Information about a human user's pose can be mapped to a virtual articulated representation. For example, when a user participates in a virtual reality environment, the user's avatar in that environment can assume poses similar to the user's real-world poses. The user's real-world pose can be converted into a pose of the virtual articulated representation by a previously trained model, where the model is trained to output poses for the same articulated representation that is ultimately rendered.
Sometimes, however, the system needs to display a non-realistic avatar. For example, users can choose cartoon characters with different body proportions, skeletons, or other aspects.
Figure 1 shows a human user 100 in a real-world environment 102. The human user's pose is applied to the articulated representation 104. In other words, when the human user moves in the real-world environment, the movements are translated into corresponding movements of the articulated representation 104 in the virtual environment 106.
However, the virtual articulated representation that is displayed may differ from the one used to train the model. To address this problem, the Microsoft patent application "Concurrent human pose estimates for virtual representation" describes a technique that simultaneously estimates the pose of a model articulated representation and the pose of a target articulated representation.
Specifically, the computing system receives positioning data detailing parameters of one or more body parts of a human user, based at least in part on input from one or more sensors. The sensor input can include the output of a headset's inertial measurement unit as well as the output of a suitable camera.
The system also maintains one or more mapping constraints, such as joint mapping constraints, that relate the model articulated representation to the target articulated representation. The pose optimization machine uses the positioning data and the mapping constraints to simultaneously estimate a model pose of the model articulated representation and a target pose of the target articulated representation. Once the estimation is complete, the system can display the target articulated representation with the target pose as a virtual representation of the human user.
The pose optimization machine can be trained using training positioning data with ground truth labels for the model articulated representation. However, the training positioning data may lack ground truth labels for the target articulated representation.
With this approach, real-world poses can be reproduced accurately without expensive training computation for each potential target. The technology described in the invention can therefore have a positive impact on human users.
When users participate in a virtual environment, they can choose among different avatars to represent themselves and can change their appearance at any time during a session. New target articulated representations can be added to the menu of representations available to the user without retraining the model for each specific representation, which saves computational expense.
The technology described in the invention can provide the technical advantage of reducing computational resource consumption while accurately recreating the real-world pose of a human user and allowing that pose to be applied to any of multiple different target articulated representations. It does so by estimating the target pose and the model pose simultaneously.
Figure 2 shows an example method 200 for virtually representing a human pose.
At 202, positioning data detailing parameters of one or more body parts of the human user is received, based at least in part on input from one or more sensors. At 204, the system maintains one or more mapping constraints that relate the model articulated representation to the target articulated representation. Figure 4 shows an example model articulated representation 400.
As mentioned above, the target articulated representation is rendered for display in the virtual environment with the target pose output by the pose optimization machine. The target articulated representation may have any suitable appearance and proportions, and may have any suitable number of limbs, joints, and/or other movable body parts.
The target articulated representation can represent a non-human animal, a fictional character, or any other suitable avatar. The model articulated representation and the target articulated representation are related through one or more mapping constraints 402.
The one or more mapping constraints may include joint mapping constraints 404. For a joint of the target articulated representation, a joint mapping constraint specifies a set of one or more joints of the model articulated representation that map to it. For example, model articulated representation 400 includes a plurality of joints, two of which are labeled 403A and 403B and correspond to a shoulder joint and an elbow joint.
Target articulated representation 104 includes similar joints 405A and 405B. Joints 405A and 405B of the target representation may therefore have respective joint mapping constraints indicating that they map to joints 403A and 403B of the model representation.
A joint mapping constraint can further specify a weight for each model joint that maps to the target joint. For example, when only one joint of the model articulated representation maps to a specific joint of the target articulated representation, the weight of that model joint may be 100%. When two model joints map to a target joint, their weights can be 50% and 50%, 30% and 70%, 10% and 90%, and so on.
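To make the weighting concrete, here is a minimal Python sketch. It is not taken from the patent: the `JointMappingConstraint` class, the `blend_angle` helper, the use of a single angle per joint, and the choice to apply the weights as a weighted average are all assumptions made for illustration. The joint labels reuse the reference numerals 403A/403B and 405A/405B from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JointMappingConstraint:
    """Maps one target joint to weighted model joints (hypothetical structure)."""
    target_joint: str
    model_weights: dict  # model joint label -> weight, expected to sum to 1.0

    def __post_init__(self):
        total = sum(self.model_weights.values())
        if abs(total - 1.0) > 1e-9:
            raise ValueError(
                f"weights for {self.target_joint} sum to {total}, expected 1.0"
            )


def blend_angle(constraint, model_angles):
    """Weighted average of model joint angles (degrees) for one target joint.

    Assumption: one angle per joint and a linear blend; a real system would
    blend full 3D rotations.
    """
    return sum(
        weight * model_angles[name]
        for name, weight in constraint.model_weights.items()
    )


# One model joint mapped at 100%, and two model joints mapped at 30% / 70%.
shoulder = JointMappingConstraint("405A", {"403A": 1.0})
elbow = JointMappingConstraint("405B", {"403A": 0.3, "403B": 0.7})

model_angles = {"403A": 40.0, "403B": 90.0}
shoulder_angle = blend_angle(shoulder, model_angles)
elbow_angle = blend_angle(elbow, model_angles)
```

With these inputs the shoulder of the target follows model joint 403A directly, while the elbow of the target lands between the two model joints, closer to 403B because of its larger weight.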
Returning to Figure 2, method 200 includes simultaneously estimating, via the previously trained pose optimization machine, a model pose of the model articulated representation and a target pose of the target articulated representation. The model pose and the target pose are estimated based at least in part on the positioning data.
Figure 5A schematically shows an example pose optimization machine 500, which may be implemented as any suitable combination of computer logic components. As a non-limiting example, the pose optimization machine 500 may be implemented as the logic subsystem 602 shown in Figure 6.
As shown in Figure 5A, the pose optimization machine simultaneously estimates the model pose 502A of the model articulated representation and the target pose 502B of the target articulated representation, based at least in part on positioning data 504 and one or more mapping constraints 506.
Pose estimation may also be based, at least in part, on one or more previous model poses and previous target poses estimated for one or more previous time frames. To that end, the pose optimization machine 500 stores multiple previous poses 506, which can be represented as multiple local rotations for each model joint.
The one or more mapping constraints may include a pose continuity constraint, which limits how much the local rotation of a given joint can change from one frame to the next.
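One simple way to picture a frame-to-frame limit of this kind is a clamp on the per-frame change in a joint's rotation. The sketch below is an illustration only, under the assumption that a joint's local rotation is a single angle in degrees; the function name `clamp_rotation_change` and the `max_delta_deg` parameter are hypothetical, and the article does not say how the constraint is actually enforced.

```python
def clamp_rotation_change(previous_deg, proposed_deg, max_delta_deg):
    """Limit how far a joint's local rotation may move in a single frame.

    Assumption: one angle per joint, in degrees. A real system would apply
    the same idea to full 3D rotations.
    """
    delta = proposed_deg - previous_deg
    # Wrap into [-180, 180) so the joint takes the shorter way around.
    delta = (delta + 180.0) % 360.0 - 180.0
    # Clamp the change to the allowed per-frame range.
    delta = max(-max_delta_deg, min(max_delta_deg, delta))
    return previous_deg + delta


# A 40-degree jump is cut down to the 15-degree per-frame limit.
limited = clamp_rotation_change(10.0, 50.0, 15.0)
# A small change passes through unchanged.
unchanged = clamp_rotation_change(10.0, 12.0, 15.0)
```

A hard clamp is only one option; the same limit could instead be expressed as a penalty term in an optimization, which is closer to how the article describes pose estimation as a search for poses that satisfy a set of constraints.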
Figure 5B schematically illustrates applying the estimated model and target poses to the model and target articulated representations. Specifically, Figure 5B again shows the model articulated representation 400 and the target articulated representation 104 in their default poses 407A and 407B. By changing the orientations of the joints, the model articulated representation 400 assumes the model pose 502A, and the target articulated representation 104 assumes the target pose 502B.
The pose optimization machine estimates the model pose and the target pose simultaneously. In other words, unlike other approaches, it does not first output a pose for the model articulated representation and then convert it into a pose for the target articulated representation. Instead, pose estimation is a process of simultaneously finding a model pose and a target pose that satisfy a set of constraints.
For example, the pose of the model articulated representation can be constrained by the prior training of the pose optimization machine, which is trained to output plausible human poses given a set of positioning data, and the pose of the target articulated representation can be constrained by the one or more mapping constraints that relate the target articulated representation to the model articulated representation.
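The difference between estimating both poses at once and converting one into the other can be sketched with a toy optimization. Everything in the following Python example is an assumption made for illustration: the patent's trained pose optimization machine is replaced here by plain gradient descent on a hand-written quadratic cost, each joint is reduced to a single angle, and the function name `estimate_poses` and all of its parameters are hypothetical.

```python
def estimate_poses(observed, weights, prev_model, prev_target,
                   map_weight=1.0, continuity_weight=0.1,
                   step=0.1, iterations=500):
    """Jointly minimise one cost over model and target joint angles.

    observed:    {model joint index: measured angle} (positioning data)
    weights:     weights[j][i] = weight of model joint i for target joint j
    prev_model / prev_target: previous-frame angles, also the starting point
    """
    model = list(prev_model)
    target = list(prev_target)
    for _ in range(iterations):
        # Continuity term: stay close to the previous frame.
        grad_model = [2 * continuity_weight * (m - p)
                      for m, p in zip(model, prev_model)]
        grad_target = [2 * continuity_weight * (t - p)
                       for t, p in zip(target, prev_target)]
        # Data term: observed model joints should match the sensor readings.
        for i, angle in observed.items():
            grad_model[i] += 2 * (model[i] - angle)
        # Mapping term: each target joint should equal its weighted model blend.
        for j, row in enumerate(weights):
            residual = target[j] - sum(w * m for w, m in zip(row, model))
            grad_target[j] += 2 * map_weight * residual
            for i, w in enumerate(row):
                grad_model[i] -= 2 * map_weight * residual * w
        # Both poses are updated in the same step, not one after the other.
        model = [m - step * g for m, g in zip(model, grad_model)]
        target = [t - step * g for t, g in zip(target, grad_target)]
    return model, target


# Two model joints, two target joints; continuity is switched off so the
# result depends only on the data and mapping terms.
model_pose, target_pose = estimate_poses(
    observed={0: 40.0, 1: 90.0},
    weights=[[1.0, 0.0], [0.3, 0.7]],
    prev_model=[0.0, 0.0],
    prev_target=[0.0, 0.0],
    continuity_weight=0.0,
)
```

Because the mapping term appears in the gradients of both poses, a mismatch between the two representations can adjust the model pose as well as the target pose, which a two-stage "estimate, then retarget" pipeline cannot do.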
In addition, pose estimation can be performed by a previously trained machine learning model 508 implemented by the pose optimization machine. In one example, the pose optimization machine may be configured to output a pose based on sparse input positioning data. In other words, the pose optimization machine can be trained with more input parameters than it receives at runtime and still output accurate pose estimates.
For example, the positioning data received by the pose optimization machine at runtime may contain rotation parameters for n joints of the human user, while in the previous training the pose optimization machine received rotation parameters for n + m joints as input, where m is greater than 1. The model pose can then be estimated by estimating rotation parameters for n + m model joints of the model articulated representation, based at least on the rotation parameters for the n joints and without rotation parameters for the m joints.
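The article does not say how the missing m joints are presented to the model at runtime. One common convention, shown here purely as an assumption, is to keep a fixed-length input covering all n + m joints and pair it with a mask marking which entries were actually observed. The function `pack_sparse_input`, the joint names, and the zero-fill-plus-mask encoding are all hypothetical.

```python
def pack_sparse_input(observed, joint_names):
    """Build a fixed-length feature list plus a mask of observed joints.

    observed:    {joint name: rotation parameter} for the n tracked joints
    joint_names: all n + m joints the model was trained on, in a fixed order
    Assumption: unobserved joints are zero-filled and flagged by the mask.
    """
    features, mask = [], []
    for name in joint_names:
        if name in observed:
            features.append(observed[name])
            mask.append(1.0)
        else:
            features.append(0.0)
            mask.append(0.0)
    return features, mask


# Hypothetical joint set: n = 3 tracked joints (headset and two hands),
# m = 2 joints with no sensor data at runtime.
ALL_JOINTS = ["head", "left_hand", "right_hand", "left_elbow", "right_elbow"]
runtime_data = {"head": 10.0, "left_hand": -5.0, "right_hand": 20.0}

features, mask = pack_sparse_input(runtime_data, ALL_JOINTS)
```

A model fed this input would still be expected to output rotation parameters for all five joints, filling in the two elbows from what it learned during training.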
In addition, the training of the pose optimization machine does not need to include ground truth labels for the target articulated representation. Instead, the target articulated representation is related to the model articulated representation through the one or more mapping constraints, which typically constrain the target pose to be substantially similar to the model pose.
Microsoft notes that the technique can speed up the process by two orders of magnitude. This enables real-time simultaneous estimation of model and target poses without specialized hardware acceleration.
Returning to Figure 2, at 208, method 200 includes outputting the target articulated representation with the target pose for display as a virtual representation of the human user. For example, in Figure 1, target articulated representation 104 is displayed via electronic display device 108. The display device used to display the target articulated representation may take any suitable form and may use any suitable underlying display technology.
Related Patents: Microsoft Patent | Concurrent human pose estimates for virtual representation
The Microsoft patent application titled "Concurrent human pose estimates for virtual representation" was originally submitted in April 2022 and was recently published by the US Patent and Trademark Office.