A Deep Dive into AI-based Animal Body Part Detection

Janell Richardson, PhD

Friday, December 21st, 2018

Photo courtesy of Dr. Alexander Mathis

Animal behavior can provide valuable scientific insight, from assessing the validity of a disease model to understanding the function of complex neural circuits⁶,¹². Accurately tracking an animal's body parts (such as limbs, digits, ears, snout, and tail) is especially relevant in neurodegeneration research¹¹, but is difficult without highly specialized and demanding equipment.

Many of the available commercial systems and open source software packages can record and analyze complex animal body movements with high throughput through a variety of user-friendly interfaces⁷⁻¹⁰,¹³. However, they are often rigid in their use, rely on expensive licenses, can require invasive markers, and/or have specific hardware requirements.

Researchers need systems which can precisely observe animal behavior, but with greater flexibility and reduced overhead requirements. This is the niche that machine learning systems, such as DeepLabCut, are attempting to fill.

Observing Animal Behavior with Machine Learning

Mathis and colleagues published their technical report on DeepLabCut in the September 2018 issue of Nature Neuroscience¹. DeepLabCut is an open source Python toolbox that uses user-labeled images to train tailored networks, which then automatically label novel data²,³.

DeepLabCut provides a robust and powerful machine learning platform (based on DeeperCut, pretrained on ImageNet) to predict the location of a body part without the need for markers, providing a flexible detection system¹,⁴,⁵.
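To make "predicting the location of a body part" concrete, here is a minimal sketch. It assumes the network produces a per-pixel score map for each body part and that the predicted location is simply the highest-scoring cell. The function name `peak_location` and the toy values are hypothetical, and this is an illustration of the idea rather than DeepLabCut's actual code.

```python
def peak_location(score_map):
    """Return (row, col, score) of the highest-scoring cell in a 2D grid.

    Assumption: score_map is a list of rows, each a list of scores in [0, 1],
    where a higher score means the body part is more likely at that cell.
    """
    best = None
    for r, row in enumerate(score_map):
        for c, score in enumerate(row):
            if best is None or score > best[2]:
                best = (r, c, score)
    if best is None:
        raise ValueError("score_map is empty")
    return best


# Hypothetical 3 x 4 score map for a mouse snout (made-up values).
toy_snout_map = [
    [0.01, 0.02, 0.01, 0.00],
    [0.03, 0.10, 0.85, 0.05],
    [0.02, 0.07, 0.20, 0.01],
]

row, col, score = peak_location(toy_snout_map)
print(f"snout predicted at row={row}, col={col} (score {score:.2f})")
```

Because the location comes from the image content itself, no physical marker on the animal is involved, which is the point the paragraph above makes.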

Challenges of Observing Animal Models

In current video analysis of hand movements, markers are often difficult to use and can inhibit full articulation of the joints¹. Furthermore, in a real-world laboratory, video tracking introduces a multitude of confounding factors, such as changing light conditions, lens distortions, shadows, and distortions of the animal's body parts caused by interactions with the environment.

Commercial systems and computational video analysis can address many of these factors in preprocessing steps, e.g. camera calibration. Yet preprocessing can be tedious, and it cannot account for every variable that can distort video detection during an experiment.

To test its ability to detect over a variable visual landscape, DeepLabCut was evaluated without the use of preprocessing¹. The team wanted to test DeepLabCut's ability to:

Locate and track body parts without predefined markers
Evolve its ability to do so as its data set grows
Function in a variety of behavioral and environmental setups
Support maximum user flexibility and transparency
Detect and observe behavior faster and more accurately

Mathis and colleagues had DeepLabCut analyze three different behaviors across two species (mice and Drosophila), all without any preprocessing, to test its labeling accuracy against that of a human researcher.

DeepLabCut Performance

After training, DeepLabCut approached human-level detection quality¹. The authors also varied the size of the training set to relate the number of training images to the resulting error. This allowed them to determine how many training frames were required for "excellent" generalization when locating a given body part during a given behavior¹.
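One simple way to express such a comparison is the average pixel distance between the network's predictions and a human's labels for the same frames. The sketch below assumes paired (x, y) coordinates in pixels. The function name `mean_pixel_error` and the coordinates are hypothetical, and the exact error metric reported in the paper may differ.

```python
import math


def mean_pixel_error(predicted, human):
    """Mean Euclidean distance, in pixels, between paired (x, y) points."""
    if len(predicted) != len(human):
        raise ValueError("predicted and human must have the same length")
    if not predicted:
        raise ValueError("at least one labeled frame is required")
    distances = [
        math.hypot(px - hx, py - hy)
        for (px, py), (hx, hy) in zip(predicted, human)
    ]
    return sum(distances) / len(distances)


# Hypothetical snout positions for three frames (made-up values).
human_labels = [(100.0, 50.0), (120.0, 60.0), (140.0, 75.0)]
network_predictions = [(103.0, 54.0), (120.0, 60.0), (134.0, 83.0)]

error = mean_pixel_error(network_predictions, human_labels)
print(f"mean error: {error:.2f} px")  # per-frame distances are 5, 0, and 10 px
```

Repeating this calculation for networks trained on different numbers of frames gives the kind of error-versus-training-set-size comparison described above.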

The algorithm is also highly data-efficient, in that data augmentation alone does not provide any further reduction in error. However, increasing the number of images that capture behavioral variability did produce a performance gain. Interestingly, DeepLabCut performed significantly better when trained on all body parts of a given animal than when trained with a more focused approach, such as observing only a single part. This holds true even when both networks (all body parts vs. a specific body part) have the same amount of data on the specific body part in question¹.

DeepLabCut provided excellent detection, across a range of training set sizes, in three tasks: odor detection in a mouse (requiring snout, ear, and tail tracking); body tracking in freely behaving Drosophila (with notable background and orientation challenges); and a skilled reaching task in a mouse (pulling a joystick, requiring tracking of individual digits of the hand)¹. The more complex the behavior or setting, the more training images were needed. Even so, DeepLabCut achieved detailed hand pose estimation in a mouse with only 141 training frames representing 5 different mice¹.

Harnessing Pre-Trained Data Sets

A key difference between DeepLabCut and established detection platforms is its ability to demonstrate markerless generalization and transfer learning of a trained network¹. A network trained on only one mouse's body parts could recognize the body parts of novel mice with different body sizes. The result was not error-free, but it made the case that body part detection trained on a single animal can extend to multiple animals within the same image, which could be useful in social paradigms¹.

The power of DeepLabCut lies in its ability to harness the full neural network of DeeperCut, which is used to learn and predict human body postures in a robust way⁴,⁵. Furthermore, the ability of DeepLabCut to demonstrate transfer learning (applying pretrained models to new tasks) can reduce the time and data necessary to accurately detect a given behavior in a laboratory setting.

Importance of Training Sets

DeepLabCut's highly flexible algorithm allows for quick application to highly diverse behaviors with equally diverse video quality, e.g. hand movements in a mouse vs. freely moving winged insects. It is a welcome addition to the available animal detection and tracking platforms. However, DeepLabCut is only as good as its training set: the set's volume and diversity, the accuracy of the user's labeling, and how well it reflects the experimental (unlabeled) data.

The flexibility of the system requires users to adequately define their training set and its diversity based on the behavior they want to analyze. The ability of the algorithm to generalize can also lead to suboptimal sampling if given a noisy or sparse behavior lacking adequate representation within the training set¹. This can be addressed by the user in post hoc fine-tuning of the network, but requires knowledge of how the system arrives at its estimations.
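As a sketch of how a user might look for under-represented behavior, the example below assumes each predicted location comes with a confidence value between 0 and 1 and flags the frames that fall below a cutoff as candidates for manual labeling. The function name `frames_to_review`, the 0.9 threshold, and the values are all hypothetical. This illustrates the idea and is not DeepLabCut's own refinement procedure.

```python
def frames_to_review(confidences, threshold=0.9):
    """Return indices of frames whose prediction confidence is below threshold.

    Assumption: confidences holds one value in [0, 1] per video frame for a
    single body part. The default threshold is arbitrary and needs tuning.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return [i for i, p in enumerate(confidences) if p < threshold]


# Hypothetical per-frame confidences for one body part (made-up values).
per_frame_confidence = [0.99, 0.97, 0.42, 0.95, 0.10, 0.98]

candidates = frames_to_review(per_frame_confidence)
print(f"frames to inspect and possibly relabel: {candidates}")
```

Frames flagged this way could be labeled by hand and added to the training set, so that rare or noisy behavior is better represented the next time the network is trained.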

Accessibility for Researchers

DeepLabCut is not a plug-and-play option. Users must be familiar with coding in Python (although Mathis and colleagues provide a step-by-step guide) and with the visual requirements of the network in order to reliably capture their experimental behavior³. This is in contrast to many commercial systems, which often have user-friendly interfaces, customer support, automated functions, and established, preprogrammed computational methods for capturing key behaviors. The trade-off is often the rigidity and cost of the plug-and-play system.

Summary

DeepLabCut provides an open source software tool that harnesses the power of the DeeperCut platform⁴,⁵. The investigators showed that it accurately extracted image frames from video and could match human labeling accuracy after exposure to a sufficient training set. The trained network could then be used to label body part locations in unlabeled (novel) data with high accuracy.

The flexibility of DeepLabCut makes it a powerful addition to the growing toolbox of detection software available to investigators.


References:

1. Mathis A., Mamidanna P., Cury K.M., Abe T., Murthy V.N., Weygandt Mathis M., and Bethge M. DeepLabCut: Markerless Pose Estimation of User-Defined Body Parts with Deep Learning. Nat Neurosci. 2018, 21, 1281-1289.

2. Adaptive Motor Control Lab. DeepLabCut. (accessed Dec 5, 2018).

3. Nath T., Mathis A., Chi Chen A., Patel A., Bethge M., Weygandt Mathis M. Using DeepLabCut for 3D markerless pose estimation across species and behaviors. (accessed Dec 5, 2018).

4. Insafutdinov E., Pishchulin L., Andres B., Andriluka M., Schiele B. DeeperCut: a deeper, stronger, and faster multi-person pose estimation model. European Conference on Computer Vision (Springer, NY, USA, 2016) 34-50.

5. Andriluka M., Pishchulin L., Gehler P., Schiele B. 2D human pose estimation: new benchmark and state of the art analysis. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (IEEE, Piscataway, NJ, USA, 2014) 3686-3693.

6. Sousa N., Almeida O.F., Wotjak C.T. A Hitchhiker's Guide to Behavioral Analysis in Laboratory Rodents. Genes Brain Behav. 2006, 5 (Suppl. 2), 5-24.

7. Maghsoudi O.H., Tabrizi A.V., Robertson B., Spence A. Superpixels Based Marker Tracking vs. Hue Thresholding in Rodent Biomechanics Application. (2018).

8. Kabra M., Robie A.A., Rivera-Alba M., Branson S., Branson K. JAABA: Interactive Machine Learning for Automatic Annotation of Animal Behavior. Nat Methods. 2012, 10, 64-67.

9. Drai D. and Golani I. SEE: A Tool for the Visualization and Analysis of Rodent Exploratory Behavior. Neurosci Biobehav Rev. 2001, 25(5), 409-426.

10. Gomez-Marin A., Partoune N., Stephens G.J., Louis M., Brembs B. Automated Tracking of Animal Posture and Movement during Exploration and Sensory Orientation Behaviors. PLoS One. 2012, 7(8), e41642.

11. Dawson T.M., Golde T.E., Lagier-Tourenne C. Animal Models of Neurodegenerative Diseases. Nat Neurosci. 2018, 21, 1370-1379.

12. Krakauer J.W., Ghazanfar A.A., Gomez-Marin A., MacIver M.A., Poeppel D. Neuroscience Needs Behavior: Correcting a Reductionist Bias. Neuron. 2017, 93, 480-490.

13. OmicX. Animal Behavior Tracking Software Tools. (accessed Dec 5, 2018).
