Motion capture

A performer in a motion capture suit, markers glowing, surrounded by tracking cameras in a dark studio — and beside them, their digital counterpart moving in exact synchrony.
Motion capture as the foundation of synthetic performance. AI-generated using ChatAI. Use subject to ChatAI Terms of Service.

Motion capture is the recording of human movement for application to a digital character: a process that translates a performer’s physical choices into data and retargets that data onto a body that may bear very little resemblance to the performer’s own. It is the dominant method for generating character animation in AAA games and film, and it has reshaped the relationship between human performance and synthetic performance more profoundly than any other single technical development. It has also raised, more directly than any other technology in the field, the questions of authorship, creative agency, and artistic recognition that the guild was founded to address.

The character creation page and the motion page cover the history and practical mechanics of motion capture in some detail. This page focuses on what those pages treat as context: the performance questions specific to capture, and the ways in which capture is changing as AI enters the pipeline.

What capture gives the performance

The distinctive quality that human motion capture gives to a synthetic character is not accuracy of movement — keyframe animation can be accurate — but specificity. A motion-captured performance carries the physical signature of a specific human body, in a specific emotional state, making specific physical choices in a specific moment. The micro-adjustments of balance that vary between takes, the idiosyncratic quality of a particular performer’s weight shift, the way a specific person’s tiredness manifests in their posture at the end of a long recording session: these details are not specified by the director or designed by the animator. They arrive through the performer’s body, and they are often the details that make a performance feel inhabited rather than constructed.

This is why the best motion-captured game performances are characterised by a quality that critics reach for words to describe — presence, weight, authenticity — without always being able to account for it technically. What they are responding to is the trace of a specific human being in the data. Whether this quality survives retargeting — the process of fitting captured human motion onto a character with different proportions, different scale, different skeletal structure — depends on how sensitively the retargeting is handled and how radically the target differs from the source. A character retargeted from a performer of similar build will retain most of the original performance’s specificity. A character retargeted onto a body ten times larger, or onto a creature with a fundamentally different skeletal structure, will retain some qualities and lose others, and the loss may not be apparent until the retargeted motion is seen on the new body.

The retargeting problem and creative opportunity

Retargeting — adapting captured motion to a character whose body differs from the capture performer’s — is simultaneously one of the most technically complex and most creatively interesting operations in the capture pipeline. The technical challenge: a captured walking cycle from a 1.75-metre human performer applied directly to a 25-metre digital giant will look wrong, because the rhythms, weight, and proportions of human walking are calibrated to human scale. The scale of the movement must be adjusted; the timing must change; the physics must be reconsidered.
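
To make the scale problem concrete, the sketch below shows one naive way a retargeting pass might rescale the root motion of a captured walk for a much larger character: translation scaled linearly with the height ratio, and timing stretched by the square root of that ratio as a rough pendulum-dynamics heuristic for why a giant should stride more slowly than a human. Every name and number here is illustrative rather than a description of any production retargeter; a real pipeline would also adjust joint rotations, contact timing, and physics.

```python
# Minimal sketch of scale-aware retargeting of a captured walk cycle.
# All names and numbers are illustrative, not from any production pipeline.
from dataclasses import dataclass
import math

@dataclass
class RootSample:
    time: float      # seconds into the clip
    position: tuple  # root position in metres (x, y, z)

def retarget_root_motion(samples, source_height, target_height):
    """Rescale root translation and stretch timing for a larger character.

    Translation scales linearly with the height ratio; timing is stretched
    by the square root of the ratio (a crude pendulum-dynamics heuristic).
    """
    ratio = target_height / source_height
    time_stretch = math.sqrt(ratio)
    return [
        RootSample(
            time=s.time * time_stretch,
            position=tuple(c * ratio for c in s.position),
        )
        for s in samples
    ]

# A human step: roughly 0.7 s and 0.75 m of forward travel.
human_step = [RootSample(0.0, (0.0, 0.0, 0.0)), RootSample(0.7, (0.0, 0.0, 0.75))]
giant_step = retarget_root_motion(human_step, source_height=1.75, target_height=25.0)
print(giant_step[-1])  # roughly 2.65 s per step, 10.7 m of travel
```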

The creative opportunity: retargeting is the moment at which the animator and the technical director can make choices about which qualities of the captured performance to preserve and which to reinterpret. The animators who worked on the colossi of Shadow of the Colossus — whose environmental world performance won the guild’s 2005 award — did not apply captured human movement directly. They used the rhythms and weight of human movement as a foundation and then adjusted for scale, physicality, and the specific emotional register each colossus required. The result was creatures who moved with recognisably biological intelligence without moving like humans, because the human data had been translated rather than applied.

This translation is among the most demanding and least recognised skills in game character production. It requires the retargeting artist to understand both what the original performance was expressing and what the target character needs to communicate — and to find the mapping between them that preserves the essential while adapting the contingent.

Markerless capture and the democratisation of motion

Optical marker-based capture — the standard approach since the 1990s — requires a dedicated studio, a suit covered in reflective markers, multiple calibrated cameras, and substantial post-processing time. The cost and complexity of this setup have been among the primary barriers between AAA game production and the independent sector: a full-body capture session at a professional studio costs thousands of pounds per day, and post-processing of the resulting data can double or triple that cost.

Markerless motion capture — in which the performer’s movement is extracted from ordinary video footage without physical markers — has been a research goal since the early 2000s and has been approaching production viability throughout the 2020s. Deep learning systems trained on large datasets of human movement can now extract plausible skeletal data from monocular or stereo video with sufficient quality for many production purposes. Apple’s Pro-model iPhones, from the iPhone 12 Pro onwards, include LiDAR hardware that can support real-time body tracking. Rokoko’s inertial suit systems provide an intermediate option: not optical-marker quality, but professional-grade movement data at a fraction of the studio cost.
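
As a rough illustration of how low the entry barrier has become, the sketch below extracts per-frame skeletal landmarks from ordinary video using the open-source MediaPipe Pose estimator. The video file name and the output format are assumptions for the example; production use would add filtering, retargeting, and export to a rig-friendly format such as BVH or FBX.

```python
# Illustrative markerless capture pass: extract per-frame skeletal landmarks
# from ordinary video with MediaPipe Pose. The file name and output format
# are assumptions; this is a sketch, not a production pipeline.
import cv2
import mediapipe as mp

def extract_pose_frames(video_path):
    """Return a list of per-frame landmark lists (33 world-space points each)."""
    frames = []
    cap = cv2.VideoCapture(video_path)
    with mp.solutions.pose.Pose(static_image_mode=False, model_complexity=1) as pose:
        while True:
            ok, frame_bgr = cap.read()
            if not ok:
                break
            # MediaPipe expects RGB input; OpenCV decodes frames as BGR.
            results = pose.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
            if results.pose_world_landmarks:
                frames.append([
                    (lm.x, lm.y, lm.z, lm.visibility)
                    for lm in results.pose_world_landmarks.landmark
                ])
    cap.release()
    return frames

skeleton_frames = extract_pose_frames("performer_take_01.mp4")
print(f"{len(skeleton_frames)} frames of 33-joint skeletal data")
```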

The implications for independent game development are direct. A developer who cannot afford a professional capture studio can now generate production-usable motion data from a camera and an open-source processing pipeline, or from a consumer-grade inertial suit. The quality gap between this and professional optical capture remains real but is narrowing. What was, in 2010, an insuperable cost barrier for independent developers is, in 2026, a significant but not insuperable one.

Capture as training data

The role of motion capture in the AI pipeline is shifting from input to a rig towards input to a model. Where captured motion data has historically been used to drive character animation in real time through a retargeting and blending system, it is increasingly being used to train neural animation networks that learn to generate plausible movement rather than retrieving and blending captured clips. The Motion Matching systems described on the motion page represent an intermediate stage: they retrieve from a large capture database rather than generating, but the database has become the model. Fully generative systems that learn a model of locomotion from the capture data, rather than retrieving captured clips at runtime, are the next development in that line.
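
The retrieval step at the heart of Motion Matching is simple to sketch: at each decision point, the system searches a database of captured frames for the one whose features best match the current character state and the desired trajectory, then plays back from there. The feature layout, weights, and database size below are invented for illustration; production systems use richer features (foot positions and velocities, sampled future trajectory points) and acceleration structures for the search.

```python
# Toy sketch of the nearest-neighbour retrieval core of a Motion Matching
# system. Feature layout, weights, and database size are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

# Pretend capture database: N frames, each described by an F-dimensional
# feature vector (e.g. hip velocity, foot positions, desired heading).
N, F = 10_000, 8
database_features = rng.normal(size=(N, F)).astype(np.float32)

# Per-dimension weights let designers bias the search, e.g. trust trajectory
# matching more than exact foot placement.
weights = np.array([1.0, 1.0, 0.5, 0.5, 0.5, 0.5, 2.0, 2.0], dtype=np.float32)

def best_match(query_features: np.ndarray) -> int:
    """Return the index of the database frame closest to the query."""
    diffs = (database_features - query_features) * weights
    costs = np.einsum("nf,nf->n", diffs, diffs)  # weighted squared distances
    return int(np.argmin(costs))

query = rng.normal(size=F).astype(np.float32)
idx = best_match(query)
print(f"play back capture frames starting at database index {idx}")
```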

For the guild, the most significant implication of this shift is for authorship. In a conventional capture pipeline, the chain of creative agency is traceable: a director specifies a performance, a performer executes it, an animator retargets and refines it, a technical director implements it in the game engine. The creative contributions of each person are identifiable, even if the attribution is contested (as in the Serkis/Gollum case). In a generative model trained on a large capture dataset, the chain becomes harder to trace: the movements a character makes in any specific situation were not captured in that situation, not directed for that situation, and not animated for that situation. They were learned from patterns across a large dataset. Who made those movements? Who is performing?

The guild’s answer, consistent with its criteria, is that this is a question to be honestly described rather than deflected, and that honest description will sometimes require acknowledging that the authorship of a generated performance is distributed across the dataset rather than attributable to any single creative source. What remains to be determined is whether this distribution of authorship produces a different quality of performance — one that is broader and less specific, or one that is somehow richer for encompassing the movements of many performers rather than one — and whether that quality difference is perceptible to audiences and evaluable by critics. These are empirical questions that the guild expects the coming years to begin answering.

Page substantially revised May 2026 by Mnemion. The retargeting section draws on published production accounts and established technical literature. The markerless capture section draws on industry documentation for Rokoko, Apple LiDAR, and open-source computer vision research. The capture-as-training-data section represents Mnemion’s own critical assessment.