With Oscillations’ connection to the movement arts, it made sense to experiment with existing motion capture technology to find accurate, consistent, and scalable ways to obtain three-dimensional motion data for purposes such as animation or machine learning to augment performances in virtual reality.
An additional motivation to learn more about motion capture was connected to our early experiments with spatial audio (read more about them in our spatial audio blog post). Apart from using ambisonic recordings, we attempted to sharpen the effect of audio-visual synchrony by connecting sounds to movements in post-production, but this proved to be difficult using only two-dimensional motion tracking software.
Although we did not have time to return to our spatial audio prototype with 3D data, below we describe the results of having played with five different hardware and software options for motion capture. Each description is formatted similarly to a tech review you would see online and at the end, we summarize our findings in a chart. Our experiments are by no means scientifically sound nor rigorous, but we hope to give readers an intuition of each technology’s efficacy and even how to improve upon them for motion capture purposes.
Mocha is tracking software by Boris FX known for its planar tracking capabilities. The company has a VR version (by which they really mean 360) that works specifically with equirectangular footage. Ranging from $995 to $1,695, this option is on the relatively cheap side of motion capture technology and benefits from the fact that, except for a computer, Mocha does not need any additional hardware to work.
However, right off the bat we noticed how tracking a 360 video is still just motion tracking in two dimensions. This is a limitation especially if Oscillations were to incorporate depth into their virtual reality experiences. This was the software we initially used to attempt a prototype with spatialized audio, and while we could visually fake spatialization we could not spatialize audio without Z-axis information.
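To make the missing Z-axis concrete, here is a minimal sketch of what a tracked pixel in equirectangular footage can and cannot tell us. The function name, the axis convention (center of the frame is "forward"), and the frame size are our own assumptions for illustration and have nothing to do with Mocha's internals. A pixel gives a direction from the camera, but every distance along that direction lands on the same pixel.

```python
import math

def equirect_pixel_to_direction(x, y, width, height):
    """Map a pixel in an equirectangular frame to azimuth/elevation (degrees)
    and a unit direction vector. Distance is NOT recoverable from the pixel."""
    azimuth = (x / width - 0.5) * 360.0     # -180 at left edge, +180 at right edge
    elevation = (0.5 - y / height) * 180.0  # +90 at top edge, -90 at bottom edge
    az, el = math.radians(azimuth), math.radians(elevation)
    direction = (
        math.cos(el) * math.sin(az),  # right (assumed convention)
        math.sin(el),                 # up
        math.cos(el) * math.cos(az),  # forward
    )
    return azimuth, elevation, direction

# A tracked point three quarters of the way across a hypothetical 3840x1920 frame.
azimuth, elevation, direction = equirect_pixel_to_direction(2880, 960, 3840, 1920)

# Two very different positions that would project onto that same pixel:
near = tuple(1.0 * c for c in direction)  # dancer 1 m from the camera
far = tuple(5.0 * c for c in direction)   # dancer 5 m from the camera
```

A spatial audio engine needs to choose between `near` and `far` (and everything in between), and that is exactly the information 2D tracking does not provide.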
We moved on to another motion capture technique after reaching this result, but the jury is still out on whether Mocha would be a completely useless solution for Oscillations’ purposes. Consider applications with artificial intelligence to augment a VR performance. Could a machine use only 2D tracked data to learn phrases of movement that it could then react to, or use to generate its own dance? We hope to answer questions like these through more research with this software.
After Mocha, we tried out Microsoft’s Kinect (Version 2) sensor with an application called Kinect Animation Studio (KAS). Out of the box, the Kinect can be used with Processing and some libraries to access the depth sensor and do things like track facial expressions and create skeletons from tracked bodies. The cost of all this is what it would cost for a strong computer, a Kinect, and a Kinect to Windows adapter, if you can still get your hands on one now that they are out of production.
KAS takes this tracking a step further, providing an interface for recording 3D skeleton data that is saved as a .fbx file which can then be edited and altered in motion graphics software like Autodesk’s MotionBuilder. After downloading the software, we just plugged in the Kinect and hit record. From there, it was also easy to transfer the .fbx file to MotionBuilder; the developers had some helpful video tutorials on their documentation page that detailed the workflow.
Then, we analyzed and played with the data. The results were decent. The Kinect is less accurate when a limb is obscured, like an arm going behind a person’s back. This makes sense considering the Kinect can only see one side of a body at any given time. The Kinect collected data at thirty frames/second with a fairly smooth and consistent result, although there was a quirky error in which the feet would glide around instead of staying steadily grounded.
A common thing to do with motion capture data is retargeting: taking the recorded animation data and applying it to an avatar rig that is different from the original one. In this way, you can have many characters doing the same dance, or simply change the appearance of the dancer virtually without having to re-record another animated sequence. Naturally, we tried this and got some uncanny dancing characters. The avatars did not take the retargeting well: their limbs ended up twisted and even more glitchy than the originally recorded skeleton, and our lack of experience with MotionBuilder meant we could not fix these errors (if they were even fixable in the first place).
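As a rough illustration of the idea behind retargeting (not of how MotionBuilder implements it), the sketch below applies the same recorded joint rotations to two arms with different bone lengths using simple planar forward kinematics. The function name, the bone lengths, and the pose values are all hypothetical.

```python
import math

def forward_kinematics(bone_lengths, joint_angles_deg, root=(0.0, 0.0)):
    """Return the joint positions of a planar bone chain.

    Each angle is a rotation relative to the previous bone, which is roughly
    how skeletal animation data is stored (rotations, not positions)."""
    x, y = root
    heading = 0.0
    positions = [(x, y)]
    for length, angle in zip(bone_lengths, joint_angles_deg):
        heading += math.radians(angle)
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        positions.append((x, y))
    return positions

# One recorded pose: shoulder rotated up 90 degrees, elbow bent back 90 degrees.
pose = [90.0, -90.0]

source_arm = [0.30, 0.25]  # hypothetical upper arm / forearm lengths in metres
target_arm = [0.45, 0.20]  # an avatar with different proportions

source_joints = forward_kinematics(source_arm, pose)
target_joints = forward_kinematics(target_arm, pose)
```

The rotations are identical, yet the hand ends up at (0.25, 0.30) on the source arm and (0.20, 0.45) on the target. Differences like this between the recorded body and the avatar are one plausible source of the odd limb placement we saw, although we cannot say for certain that it explains our results.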
The Kinect is far from perfect in regards to motion capturing, but it does have an abundance of data points, and our research showed that using more than one Kinect might improve the quality of the capture, so we followed up on these leads and explored it further.
iPi Recorder and Mocap Studio
Even though Kinect Animation Studio was very appealing for being free and easy-to-use, it only seemed fair to test the Kinect using premium software, considering every other mocap technique we tested also involved premium software. The best paid option for Kinect mocap that we found seemed to be iPi Recorder and Mocap Studio. iPi Recorder is free, capturing proprietary sensor footage with the Kinect for use with Mocap Studio, which costs between $95 and $1,195 a year. The cheapest version that works with two Kinects is $345 a year.
Setting up single Kinect recording is still incredibly easy. There is a one-step calibration that scans the background, but then it is ready to record. Since recording is done with iPi Recorder, one cannot see real-time motion capture. Recordings must be done first, and then ported over to iPi Mocap Studio, which is an unfortunate limitation. Still, single Kinect recordings did end up looking better than what we got with Kinect Animation Studio. It couldn’t surmount the 180 degree problem, but it did more reliably track what it could see. iPi Mocap Studio also makes it easy to make manual adjustments. This turned out to be very necessary, since there seemed to be a certain point during recordings where tracking would break down completely, requiring a manual fix. It would also be necessary for manually adjusting limbs out of the Kinect’s view.
The iPi software can get around this issue by accepting up to four Kinects for tracking. The major limitation with multi-sensor solutions is the fact that each Kinect requires its own computer given the nature of the Kinect SDK. And the iPi software is fairly processor-intensive, meaning the computers need to be powerful too. iPi does make dual-Kinect recording somewhat easier by allowing one computer to be the master, as long as both computers are on the same network. Pressing record on the master starts recording from both computers, and when the recording is done, the files automatically transfer over from the other computer to the master. We tried a couple times to get a multi-Kinect solution working properly using two gaming computers, but we weren’t able to get the setup working well enough to capture usable data.
There are two reasons for this. The first is that the calibration procedure for dual Kinects requires a very particular setup. Even though the maximum tracking area is 7 by 7 feet (for single or dual Kinect recording, which is admittedly small), the Kinects need to be farther apart than that. Calibration involves a bright light that can constantly be seen by both Kinects, or a large board that needs to be moved throughout the tracking volume. We had two gaming desktop computers that we tried to use for the dual Kinects, but space limitations meant we could not attain the proper setup. This meant our calibration efforts failed, and the data we tried to capture was an unusable mess.
We were able to try in a larger space that could accommodate a proper setup by utilizing a gaming laptop. But the software was intensive enough that even a Razer Blade laptop, with a quad-core i7 processor and GTX 1060, couldn’t keep it running at a stable FPS. The software refused to run without stable FPS from both computers, and we couldn’t move the desktops around to achieve the same setup. Even if we were able to, these very particular setup and calibration requirements for dual Kinects show that it is not the most feasible mocap technique. Still, from the quality of data we’ve seen online, we think exploring dual Kinect mocap could still be a worthwhile future direction for later iterations of this project. The ability to capture markerless 360 mocap data is too alluring to give up completely on this option.
HTC Vive Trackers
Using the HTC Vive was our first motion capture attempt with markers or trackers. The Vive Trackers are accessories to the headset sold as being able to “bring real-world objects into your virtual world”, but we attached them to a dancer’s body to capture their movement. We were inspired by previous attempts at using Vive Trackers for motion capture such as the work by Chen Chen, a Northwestern PhD student who has used them for motion capture art installations, and wanted to try out accompanying software that would make the capturing process easier.
We came across a paid option called IKinema Orion, which conveniently forms a rigged model that we can view in real time as we record the dancer. It requires a minimum of six tracked devices, for example one tracker on the hip, two on the feet, two hand controllers, and the headset. The software then interpolates the intermediary joints between trackers to form a full skeleton.
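To give an intuition for how a joint that carries no tracker can be estimated, here is a minimal two-bone inverse kinematics sketch using the law of cosines. This is a textbook technique, not Orion's actual solver (which we have no insight into), and the function name and measurements are hypothetical. Given a hip tracker, a foot tracker, and known thigh and shin lengths, the bend of the knee follows from the hip-to-foot distance.

```python
import math

def knee_angle_deg(hip, foot, thigh_len, shin_len):
    """Interior knee angle in degrees (180 = fully straight leg).

    hip and foot are 3D tracker positions; the knee itself has no tracker."""
    dist = math.dist(hip, foot)
    # Clamp to the reachable range so noisy tracker data cannot break acos.
    dist = max(abs(thigh_len - shin_len), min(dist, thigh_len + shin_len))
    cos_knee = (thigh_len**2 + shin_len**2 - dist**2) / (2 * thigh_len * shin_len)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_knee))))

# Standing straight: hip 1 m above the foot, with 0.5 m thigh and shin.
standing = knee_angle_deg((0.0, 1.0, 0.0), (0.0, 0.0, 0.0), 0.5, 0.5)

# Crouching: the hip drops to 0.5 m above the foot.
crouching = knee_angle_deg((0.0, 0.5, 0.0), (0.0, 0.0, 0.0), 0.5, 0.5)
```

The angle alone does not say which way the knee points. A real solver presumably resolves that using the rotation data the trackers also report, which is part of why good IK software matters so much with so few tracking points.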
The setup for this technique is relatively straightforward as long as you have the equipment, which could easily cost upwards of $1000. You’ll need the HTC Vive, which costs $600, for its base stations and hand controllers. Although the headset isn’t used during motion capture, it still needs to be on because the hand controllers connect to the computer through the headset. You also have to consider the size of the space you are recording in, and if there are any light sources or objects that might interfere with the sensors. The tracking area is limited to the size of your Vive playspace; in other words, how far apart you can place the base stations that come with the Vive. Our recording space was 14 by 14 feet. Another logistical issue to consider is comfort for the performer. The trackers have to be strapped on, the controllers must be held, and they stick out in odd ways that may throw a dancer off balance. However, with all the pieces in place, you can get some decent results from the trackers.
While the Kinect had an abundance of data points, the Vive has at least 6 and no more than 9, which means a loss of resolution of motion; any nuances are lost on this technique. However, the trackers have a higher capture rate of 200 Hz per device, which is most likely why the result from the Vive is comparable to the Kinect results; the Vive has a finer temporal resolution, while the Kinect has richer spatial data. All in all, the Vive seemed to have a higher quality capture than the Kinect, and we can attribute this improvement to the robust SteamVR laser tracking system (called Lighthouse) that provides additional directional data that the Kinect cannot collect, as well as the 360 degree data capture that the Vive setup allows. Lighthouse affords incredibly fluid motion capture that almost never loses the tracking points.
Still, this technique leaves us wanting. Is there another option that does not need to be confined to a space? Can we get more tracking points on the body? Can we get even more accurate results? A faster frame rate for higher resolution capture? We examine two more techniques that explore these questions, and they take the form of motion capture body suits.
Note: Setting up 3 or more Vive Trackers on a single computer is not easy. These specific instructions are what finally worked for us.
The Perception Neuron is the cheaper of the two “motion capture suit” technologies that we tried. It isn’t really a suit; it’s rather a series of straps that connect together at a central core that buckles around the hip. Straps go around the arms, legs, feet, and forehead, and the user wears gloves that are covered in further sensors. This design means it is not nearly as constricting as a skin-tight suit like the Xsens, but it does take a few minutes to put on properly. Overall, it was the more comfortable of the two mocap suits. One thing to note is that Xsens comes with a set of straps that we didn’t use this time, and the Perception Neuron team is said to be working on a lycra suit to increase accuracy of the data and simplify setup, so this was not a significant consideration in our comparison.
On paper, the Neuron immediately has benefits over the other mocap solutions mentioned so far. The sensors are incredibly tiny, so they don’t impede much movement, and with a maximum of 32 sensors accepted by the system, the Neuron can track more points on the body than any of the other solutions we tested. This is the only mocap solution we tested that has individual finger tracking, for example. For these mocap suits, no external trackers are necessary, and they capture full 360 degree data.
We tested the most expensive Neuron kit (32 sensors), which is $1,500 for the hardware and comes with free software called Axis Neuron. The Neuron is nice because it can accept any number of sensors up to that maximum. For example, one could go without the hand tracking for easier setup and less constriction on the wearer.
In practice, the benefits of the Neuron are not as clear. Setup was difficult, and the software often did not work as intended. The Neuron has a beta wireless feature, but we could not get it working, so we were stuck with the default tethered mode to view live data. This is obviously very disadvantageous for any mocap, especially of dancers. There is another mode where data can be saved locally to an SD card using a slot on the Neuron’s core, but it’s impossible to know if the suit is working properly in this mode.
Reliability was a major problem with the Neuron, even as we ran it in tethered mode. There always seemed to be at least one or two random sensors that would not track on the suit, which could only be fixed by switching them out with unused sensors. The software was generally buggy, making it difficult to calibrate the suit and even recognize it when it was plugged in. The most frustrating issue was constantly losing leg tracking, rendering a skeleton that would simply pivot on a fixed floor point.
One major limitation with the Neuron is its sensitivity to magnetic interference. Each sensor has a magnetometer, and Perception says that the suit needs to be kept at least three feet away from any strong magnetic field, including most electronic devices. They also recommend scanning the area to be used with the Neuron with a magnetometer (or magnetometer smartphone app), and when in transport, the sensors need to be placed in special containers. This severely limits the kinds of spaces that the Neuron can be used in, and our inability to ensure complete shielding from magnetic interference could also be a reason the Neuron did not do so well in our tests.
Even when we did get the Neuron working, we were not impressed with the quality of the data. The tracking was rough and somewhat jittery, even compared to the Vive Trackers. In fact, even with all the additional tracking points, the skeleton did not seem to have better inverse kinematics than IKinema Orion. Seeing the hand data was definitely impressive, and the additional body joints and rotational information definitely helped make a more convincing skeleton. Still, the software lacked the refinement that would make the skeleton look more natural, and the glitches we experienced with the software and hardware made the Neuron less than ideal to work with. We do want to clarify that we had less time than we would have liked to learn the setup process and test the Neuron to its full ability (such as wirelessly), so we do think it has much more potential if one is dedicated to putting more time into it.
The Xsens Link showed dramatic improvements over the Neuron, coupled with a dramatic price increase. The kit costs $12,000, and the required software, MVN Animate, is an additional cost, sold either as an annual subscription or at a lifetime price. While this is a premium price, it’s still on the cheaper end of high-end motion capture techniques, especially compared to purpose-built optical tracking solutions such as OptiTrack. These require careful setup of expensive external equipment, a controlled environment, and often invasive markers, and costs can run into the hundreds of thousands of dollars.
The Xsens Link is a skin-tight suit that doesn’t have these limitations. It transmits to a computer using Wifi, meaning the Wifi signal is the only limitation for tracking space. There is no magnetic sensitivity or need for external tracking sensors, and it works indoors and outdoors. Many of the dancers did complain that the suit was incredibly tight, but it was faster to change into compared to the Neuron.
However, the Xsens Link is the hardest mocap solution we tested to set up. This is largely because properly arranging the sensors within the suit is difficult and often non-intuitive. Whereas the Neuron has clearly labeled components and fewer ways of routing wires, the Xsens had some unclear labels and lots of wires to deal with. This meant changing suit sizes was a time consuming process, taking around 20 minutes. This made it impractical to switch between mocap sessions for people of very different sizes. While this could be worked around by running sessions on people of one size before moving to another size, it’s less than ideal. Once the suit was set up, it did look very seamless thanks to zippers that hide all the components.
After initial setup, running the software was painless. We didn’t encounter any bugs with calibration, unlike the Neuron, and the recordings were incredibly impressive in quality. Data from the Xsens Link was by far the best out of all the mocap solutions we tested. The tracking pretty much never cut or glitched out, keeping up with very complex dance moves. Even when the dancers were completely upside down or spinning around on the floor, the skeleton remained faithful to their movements. And the suit captures data at 240 frames per second, meaning the dances were very smooth. This is compared with the 60 fps capture from the Neuron when using all the sensors (which can go up to 120 fps when using 17 or fewer of the 32 sensors).
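A quick back-of-the-envelope calculation shows why frame rate matters for dance. The sketch below computes how far a limb travels between two consecutive frames at the capture rates mentioned in this post. The hand speed of 6 m/s is a hypothetical figure we picked for a fast arm movement, not something we measured.

```python
def travel_per_frame_cm(speed_m_per_s, fps):
    """Distance (in cm) a point moves between two consecutive captured frames."""
    return speed_m_per_s * 100.0 / fps

# Capture rates as reported in this post.
capture_rates = {
    "Kinect": 30,
    "Neuron (all 32 sensors)": 60,
    "Neuron (17 or fewer sensors)": 120,
    "Xsens Link": 240,
}

HAND_SPEED = 6.0  # m/s, hypothetical fast arm swing

gaps = {name: travel_per_frame_cm(HAND_SPEED, fps) for name, fps in capture_rates.items()}

for name, gap in gaps.items():
    print(f"{name}: {gap:.1f} cm between frames")
```

Under that assumption, the hand moves 20 cm between Kinect frames but only 2.5 cm between Xsens frames, so fast gestures are sampled far more densely at 240 fps.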
Our only complaint is that the IK of the skeleton was a little confused by some of the more complicated dance moves, resulting in slightly unnatural limb movements at times. This seems to be a limitation of the tracking points; with 17 trackers, it offers more detailed joint data than most of the other solutions we tested, but still not enough to always know what the body is doing. Still, as the Neuron shows, more sensors don’t necessarily mean higher quality data. The Xsens Link may still not match the quality of truly premium solutions like OptiTrack, but the increase in data quality seems marginal given how much more expensive the next-best solutions would be.
MotionBuilder was our weapon of choice for viewing and toying with .fbx data, first from the Kinect, and later from the body suits. Our knowledge of 3D motion graphics software is limited, but what MotionBuilder seems to be optimized for is editing 3D motion data. You can add keyframes and adjust certain parts of the animation, and you can easily retarget animations. We had some difficulty defining skeletons when our subject did not start in a standard “T-Pose”, but there may have been a way around that issue that we just didn’t know about.
MotionBuilder is an Autodesk software and is free for students for three years. It is a pricey bit of software without the discount.
Nuke is a film industry standard 3D video editing software for creating and altering three-dimensional models and environments for screen; think Industrial Light & Magic. With the “3D” part of the software in mind, we dabbled in it to see if it might be useful for us, maybe more so than Mocha or even MotionBuilder. However, no one on the team had experience with it, and familiarizing ourselves with the interface was taking a long time. So we stopped looking into it for now, but there is still a lot to explore with Nuke, and hopefully future research can revisit this software to examine its potential.
All of these solutions do require a powerful computer to run them.
We divide price into the hardware and software cost, excluding the cost of the computer.
We divide ease into two categories. Ease of setup is how difficult it is to set up for the first time, and ease of use is how easy it is to use for doing mocap recordings, after everything is set up.
We divide results into two categories, both on a 1 to 10 scale where 10 is best. Consistency is how infrequently the capture technique loses data. For example, if a tracking point is often lost, then the consistency score goes down. Accuracy is how well the mocap data models the actual dancer’s movements. When nuance is lost, the score suffers.
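For readers who want to put a number on consistency in their own tests, here is a minimal sketch of one possible measure: the fraction of frames in which every joint was tracked. The data layout (a list of frames, each mapping a joint name to a position, or to `None` when tracking was lost) and the function name are assumptions made for illustration; real exports such as .fbx would need to be converted into this shape first.

```python
def tracked_fraction(frames):
    """Fraction of frames in which every joint has a position (none lost)."""
    if not frames:
        return 0.0
    fully_tracked = sum(
        1 for frame in frames if all(pos is not None for pos in frame.values())
    )
    return fully_tracked / len(frames)

# A tiny hypothetical recording: the left hand is lost in the third frame,
# for example because the arm went behind the dancer's back.
recording = [
    {"hip": (0.0, 1.0, 0.0), "left_hand": (0.3, 1.2, 0.1)},
    {"hip": (0.0, 1.0, 0.0), "left_hand": (0.3, 1.3, 0.1)},
    {"hip": (0.0, 1.0, 0.0), "left_hand": None},
    {"hip": (0.0, 1.0, 0.0), "left_hand": (0.2, 1.3, 0.0)},
]

score = tracked_fraction(recording)  # 3 of 4 frames fully tracked
```

A measure like this only captures dropped data. It says nothing about accuracy, since a joint can be tracked in every frame and still be in the wrong place.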
Recording limitations include the maximum size that the technique can capture data within, the type of data it collects (180 or 360, 2D or 3D), the frames per second of the method, and the comfort for the dancer (or whoever is recorded).
| | Mocha VR | Kinect | HTC Vive Trackers | Perception Neuron | Xsens Link |
|---|---|---|---|---|---|
| Price (hardware / software) | Hardware: $0; Software: $995 to $1,695 | Hardware: ~$250; Software: Free to $1,195/year | Hardware: ~$1,000; Software: Free to ~$500 | Hardware: $1,500; Software: Included | Hardware: $12,000; Software: $15,000/year |
| Ease (setup and use) | Setup: None; Use: Learning curve, but easy with experience | Setup: Easy; Use: Could be hard if you code yourself, but existing software makes it easy | Setup: Medium; Use: Could be hard if you code yourself, but existing software makes it easy | Setup: Hard; Use: Easy | Setup: Hardest; Use: Easy |
| Results (consistency, accuracy) | Consistency and accuracy: 10 for 2D motion tracking; 0 for 3D motion capture | Consistency: 3; Accuracy: 5 | Consistency: 8; Accuracy: 7 | Consistency: 6; Accuracy: 6 | Consistency: 9; Accuracy: 8.5 |
| Recording limitations (maximum space, depth data, frame rate, comfort) | Space: Unlimited; Depth: None; Slow on a weak computer; FPS: Depends on footage; Comfort: Not a concern | Space: 7 by 7 feet; Depth: 180 to 360; FPS: 30; Comfort: Not a concern, but motion may be limited to 180 degrees | Space: ~20 by 20 feet; Depth: 360; FPS: 100; Comfort: OK | Space: USB tethered or Wifi-limited; Depth: 360; FPS: 60; Comfort: Very good | Space: Wifi-limited; Depth: 360; FPS: 240; Comfort: Good |
Our findings reflect that generally speaking, the more expensive the solution, the better the motion capture data. The Xsens was by far the most expensive solution we tested, but it was also by far the most refined hardware and software, and it produced the highest quality mocap data. This is certainly not a surprise, but the more interesting question is: which solution(s) are good enough? In other words, what’s the best price to quality ratio? This is hard to answer without more inquiry into the mocap solutions we weren’t able to try. For example, single Kinect recordings do not seem sufficient because of the heavy manual editing that needs to be done to account for the loss of tracking whenever a dancer turns. However, dual Kinects may be enough to rival the more expensive solutions we tested.
The software is another important limitation. The Kinect’s experience varies wildly depending on the software used. The Vive Trackers ended up being so much more impressive than we expected because the Orion software had superb IK. Mocha VR is not a very viable option because of the time-intensive task of motion tracking, and the lack of 3D data.
The total costs for the Vive Trackers and the Perception Neuron each add up to about $1,500. They were both lacking in their quality of mocap data, but the Vive Trackers did have the edge for ease-of-use and tracking reliability. They would make a fairly easy way of collecting lots of mocap data of dancing, as long as the dancers aren’t too limited by their physical implementation. The Neuron seemed much too unrefined for collecting large amounts of mocap data, as we spent vastly more time troubleshooting its problems than actually recording dances during our time with it.
The Xsens really was the best solution we tested by far, and while it is no doubt expensive, its costs may be worth it if high-quality mocap data is the end goal. This is especially true considering the next best options are so much more expensive. Still, if you just want to try out mocap with the least additional cost, and you already have an HTC Vive, the extra cost of four Vive Trackers ($400) is worth it considering the quality of data. Otherwise, maybe the Kinect is the best place to start.