Natural Language Processing Portfolio Analysis

Leveraging natural language processing to analyze grant portfolio data.

Executive Summary

With portfolios that change every year and range from 10 to more than 100 projects, program officers and their analysts spend a lot of time keeping track of their funded projects. What species does a portfolio include? What is the breakdown of grant mechanisms? How has funding for mice (or any other model, species, etc.) changed this year compared with last year and two years ago? What is the gender distribution of Principal Investigators? Answering these questions takes days, if not weeks, of applying formulas and searches to the National Institutes of Health’s grant application system or manually reading through abstracts, specific aims, and summary statements.

This team would like to create a tool that uses natural language processing (NLP) to search through the text of grant applications and answer a program officer’s questions. The team plans to start with answers to predefined, popular questions and then give users the ability to ask their own. This should save a great deal of time, freeing program officers and analysts to do other work. What gets measured gets managed: with a better overview of what is happening in their portfolios, program officers can set goals based on the newly available information and work to achieve them.
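
To make the idea concrete, here is a minimal sketch of how one predefined question ("how much funding mentions a given species, by year?") might be answered from abstract text. It is an illustration only, not the team's tool: it uses simple keyword matching rather than a full NLP pipeline, and the term list, the record fields (`fiscal_year`, `amount`, `abstract`), and the sample grants are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical term list; a real tool would need a curated vocabulary.
SPECIES_TERMS = {
    "mouse": ["mouse", "mice", "murine"],
    "rat": ["rat", "rats"],
    "zebrafish": ["zebrafish"],
}


def species_in_text(text):
    """Return the set of species whose terms appear as whole words in text."""
    lowered = text.lower()
    found = set()
    for species, terms in SPECIES_TERMS.items():
        # \b word boundaries keep "rat" from matching inside "regeneration".
        pattern = r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b"
        if re.search(pattern, lowered):
            found.add(species)
    return found


def funding_by_species_and_year(grants):
    """Sum grant amounts per (species, fiscal year) based on abstract mentions.

    A grant that mentions several species counts toward each of them.
    """
    totals = defaultdict(float)
    for grant in grants:
        for species in species_in_text(grant["abstract"]):
            totals[(species, grant["fiscal_year"])] += grant["amount"]
    return dict(totals)


# Made-up records for illustration; field names are assumptions.
SAMPLE_GRANTS = [
    {"fiscal_year": 2015, "amount": 100.0,
     "abstract": "We study tumor growth in mice."},
    {"fiscal_year": 2016, "amount": 250.0,
     "abstract": "A murine model and a rat model of sepsis."},
    {"fiscal_year": 2016, "amount": 50.0,
     "abstract": "Zebrafish heart regeneration."},
]

if __name__ == "__main__":
    totals = funding_by_species_and_year(SAMPLE_GRANTS)
    for (species, year), amount in sorted(totals.items()):
        print(species, year, amount)
```

Keyword matching of this kind misses synonyms and context (for example, an abstract that mentions mice only in passing), which is part of why a fuller NLP approach is attractive for user-defined questions.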

A project supported by the HHS Ignite Accelerator

Team Members

Bishen Singh (Project Lead), NIH
Brian McClanahan, NIH


January 2017: Project selected into the HHS Ignite Accelerator
February 2017: Time in the Accelerator began