Rapid Autism Classification for Public Health

Leveraging algorithms to review medical records and reduce the labor of estimating the prevalence of autism in the US.

Executive Summary

There is currently no definitive biomarker for autism, and professionals use a variety of standards and tools to diagnose the condition. A major challenge for autism prevalence studies is establishing a reliable definition of autism that can be applied consistently to each child.

Through the Autism and Developmental Disabilities Monitoring Network (Network), the CDC supports population-based studies to estimate the prevalence of autism spectrum disorder among children living in different areas of the US. Trained clinicians manually review and annotate developmental evaluations collected from health and education records to determine whether a particular child has the behavioral features needed to meet the criteria for autism. The Network reviewers must read many varied paper documents, make notations, and finally decide whether to count the child as having autism for the prevalence study. This process takes approximately 45 minutes for each child’s record.

As a pilot study, the team trained a computer algorithm to use the words contained in a child’s records to predict whether the Network clinician would have classified the child as meeting the autism criteria. The algorithm’s predictions agreed with the Network clinicians about 87% of the time. In comparison, two human Network clinicians agreed with each other about 91% of the time. The algorithm also produced a prevalence estimate of 1.5%, compared with 1.6% from manual review. The notable difference was time: clinicians needed about 1,200 hours to reach that number, while the computer needed only 1 second!
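To make the idea concrete, here is a minimal sketch of a word-based classifier, followed by the two summary measures mentioned above (percent agreement and prevalence). It is an illustration only: the source does not say which algorithm the pilot used, so a simple multinomial naive Bayes model stands in for it, and the records, labels, and function names are all hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    # Simplest possible tokenizer: lowercase and split on whitespace.
    return text.lower().split()

def train_naive_bayes(records, labels):
    """Fit a multinomial naive Bayes model on word counts (stand-in method)."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter(labels)
    for text, label in zip(records, labels):
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab

def predict(model, text):
    """Return 1 if the record looks like it meets the criteria, else 0."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label in (0, 1):
        score = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # ignore words never seen in training
                # Add-one smoothing avoids log(0) for words unseen in a class.
                score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

def percent_agreement(a, b):
    """Share of records on which two sets of labels match, in percent."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)

def prevalence(labels):
    """Share of records classified as meeting the criteria, in percent."""
    return 100.0 * sum(labels) / len(labels)

# Hypothetical toy data; real evaluations are far longer and more varied.
train_records = [
    "limited eye contact repetitive hand flapping delayed speech",
    "does not respond to name lines up toys repetitive play",
    "friendly child good eye contact age appropriate speech",
    "plays well with peers typical language development",
]
train_labels = [1, 1, 0, 0]  # 1 = clinician counted the child as a case

test_records = [
    "repetitive play and limited eye contact",
    "typical speech plays with peers",
]
clinician_labels = [1, 0]

model = train_naive_bayes(train_records, train_labels)
predicted = [predict(model, text) for text in test_records]
agreement = percent_agreement(predicted, clinician_labels)
```

On real data, `percent_agreement(predicted, clinician_labels)` corresponds to the kind of figure reported above, and comparing `prevalence(predicted)` with `prevalence(clinician_labels)` corresponds to the comparison of the two prevalence estimates.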

The team will work toward several major goals:

  1. Develop more refined classification algorithms using recently developed machine-learning and natural language processing tools, such as paragraph vectors and convolutional neural networks.
  2. Evaluate the accuracy of the classification algorithms through testing at multiple Network sites and across time to establish how well an algorithm can perform using evaluations from different parts of the country or written in different years.
  3. Make the tools and processes scalable across the agency.
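
One way to picture the second goal is a leave-one-group-out evaluation: hold out all records from one site (or one year), train on the rest, and measure agreement on the held-out group. The sketch below is an assumption about how such a test could be organized, not the team's actual protocol. The field names (`site`, `text`, `label`), the site names, and the trivial majority-label classifier are hypothetical placeholders.

```python
from collections import defaultdict

def leave_one_group_out(records, key="site"):
    """Yield (held_out_group, train_rows, test_rows) for each group value."""
    by_group = defaultdict(list)
    for row in records:
        by_group[row[key]].append(row)
    for group in sorted(by_group):
        train = [r for g, rows in by_group.items() if g != group for r in rows]
        yield group, train, by_group[group]

def evaluate_across_groups(records, train_fn, predict_fn, key="site"):
    """Percent agreement with clinician labels on each held-out group."""
    results = {}
    for group, train, test in leave_one_group_out(records, key):
        model = train_fn(train)
        hits = sum(predict_fn(model, r["text"]) == r["label"] for r in test)
        results[group] = 100.0 * hits / len(test)
    return results

# Placeholder classifier: always predicts the most common training label.
# A real text classifier would be plugged in through the same two functions.
def train_majority(rows):
    labels = [r["label"] for r in rows]
    return max(set(labels), key=labels.count)

def predict_majority(model, text):
    return model

# Hypothetical toy records from two made-up sites.
records = [
    {"site": "A", "text": "record one", "label": 1},
    {"site": "A", "text": "record two", "label": 0},
    {"site": "A", "text": "record three", "label": 0},
    {"site": "B", "text": "record four", "label": 0},
    {"site": "B", "text": "record five", "label": 0},
]

results = evaluate_across_groups(records, train_majority, predict_majority)
```

Passing `key="year"` (with a `year` field on each record) would run the same check across time instead of across sites.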

A project supported by: HHS Secretary's Ventures Fund

Team Members

Matthew Maenner (Project Lead), CDC
Nicole Dowling, CDC
Chad Heilig, CDC
Scott Lee, CDC
Laura Schieve, CDC
Maureen Durkin, University of Wisconsin-Madison


June 2015: Project begins receiving mentoring and funding from the HHS Secretary's Ventures Fund
June 2017: Support from HHS Secretary's Ventures Fund ends

Project Sponsors

Coleen Boyle, Director, National Center on Birth Defects and Developmental Disabilities, Centers for Disease Control and Prevention

Bill MacKenzie, Deputy Director for Science, Center for Surveillance, Epidemiology, and Laboratory Services, Centers for Disease Control and Prevention

Additional Information