NIH Toolbox®

NIH Toolbox contains two types of measures:

  • Performance-based tests of function (also known as “objective measures”)
  • Self-report and parent-proxy measures of functions, symptoms, and feelings

Scores for Performance Tests of Function

NIH Toolbox performance tests of function have three types of scores: Age-Corrected Standard Scores (commonly referred to simply as Standard Scores), with a normative mean of 100 and a standard deviation (SD) of 15; Uncorrected Standard Scores, also with a mean of 100 and an SD of 15; and Fully Corrected T-Scores. Fully Corrected T-Scores are intended primarily for neuropsychological applications. They correct for age and for other demographic characteristics (education, gender, and race/ethnicity) that may affect performance in the general population. To distinguish them from Age-Corrected Standard Scores, Fully Corrected Scores use a T-Score metric, with a normative mean of 50 and an SD of 10.

A standard score of 85 indicates performance one SD below the mean of the referent population; a standard score of 115 indicates performance one SD above that mean. Analogously, a T-Score of 40 indicates performance one SD below the referent population mean, and a T-Score of 60 indicates performance one SD above it.
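Because both metrics express distance from a reference mean in SD units, a score on one scale can be re-expressed on the other through a z-score. The Python sketch below is a minimal illustration that assumes only the means and SDs stated above; the function names are hypothetical and not part of any NIH Toolbox software. Converting the metric does not change the reference group: an Age-Corrected Standard Score and a Fully Corrected T-Score are normed against different comparison groups, so this arithmetic only re-expresses a value on the other scale.

```python
# Metric parameters stated in the text:
# standard scores (mean 100, SD 15) and T-scores (mean 50, SD 10).
# All function names here are hypothetical, for illustration only.
STANDARD_MEAN, STANDARD_SD = 100.0, 15.0
T_MEAN, T_SD = 50.0, 10.0


def standard_to_z(score: float) -> float:
    """Distance of a standard score from its normative mean, in SD units."""
    return (score - STANDARD_MEAN) / STANDARD_SD


def t_to_z(score: float) -> float:
    """Distance of a T-score from its normative mean, in SD units."""
    return (score - T_MEAN) / T_SD


def z_to_standard(z: float) -> float:
    """Express a z-score on the standard-score metric."""
    return STANDARD_MEAN + z * STANDARD_SD


def z_to_t(z: float) -> float:
    """Express a z-score on the T-score metric."""
    return T_MEAN + z * T_SD


if __name__ == "__main__":
    for s in (85, 100, 115):
        z = standard_to_z(s)
        print(f"standard {s} -> z {z:+.1f} -> T-metric {z_to_t(z):.0f}")
```

For example, a standard score of 85 and a T-score of 40 both sit exactly one SD below their respective means.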

Uncorrected Standard Score:

  • This score compares the test-taker's score with the scores of the nationally representative NIH Toolbox normative sample, regardless of age or any other variable
  • The Uncorrected Standard Score provides a glimpse of the given participant’s overall performance when compared with the general U.S. population

Age-Corrected Standard Score:

  • This score compares the test-taker's score with the scores of others of the same age
  • A score of 100 indicates performance at the national average for the participant's age. A score of 115 or 85, for example, indicates performance 1 SD above or below the national average, respectively, when compared with like-aged participants
  • Age-corrected standard scores were derived separately for children (ages 3-17) and adults (ages 18-85)
  • Age bands:
      ◦ Adults: 18-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80-85
      ◦ Children (ages 3-17): normative scores are provided separately for each year of age to account for expected developmental changes
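The like-aged comparison group described by these age ranges can be sketched as a small lookup. This is an illustration only, assuming whole-year ages; the function name and label format are hypothetical, and actual scoring is performed by the NIH Toolbox application, not by code like this.

```python
# Hypothetical helper: labels the like-aged normative group used for an
# Age-Corrected Standard Score, following the age ranges listed above.

ADULT_BANDS = [(18, 29), (30, 39), (40, 49), (50, 59), (60, 69), (70, 79), (80, 85)]


def age_comparison_group(age: int) -> str:
    """Return a label for the normative comparison group for a whole-year age."""
    if 3 <= age <= 17:
        # Children: a separate norm for each single year of age.
        return f"children, age {age}"
    for low, high in ADULT_BANDS:
        if low <= age <= high:
            # Adults: one norm per age band.
            return f"adults, {low}-{high}"
    raise ValueError(f"age {age} is outside the normed range of 3-85")
```

For instance, a 42-year-old is compared with the 40-49 band, while a 14-year-old is compared only with other 14-year-olds.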


National Percentiles:

  • A percentile is the percentage of people in the national normative sample whose scores fall below the participant's score (the comparison group depends on which normative score is used)
  • A percentile is simply a transformation of the participant's normative score (Age-Corrected Standard Score, Fully Corrected T-Score, or Uncorrected Standard Score) into a format that many people find easier to understand
  • A 14-year-old who attains a national percentile of 84 on a given NIH Toolbox measure performs better than 84 percent of 14-year-olds in the NIH Toolbox national norming study
  • Percentiles for any NIH Toolbox normative score can be looked up in Appendix A of the NIH Scoring and Interpretation Manual for the iPad
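The official percentiles come from the Appendix A tables. As a rough cross-check only, and assuming the normative scores are approximately normally distributed (an assumption this section does not state), a percentile can be approximated from the normal cumulative distribution function. The helper name below is hypothetical; it uses Python's standard `statistics.NormalDist`.

```python
from statistics import NormalDist


def approx_percentile(score: float, mean: float, sd: float) -> float:
    """Approximate national percentile, assuming normally distributed scores.

    Not a substitute for the Appendix A lookup tables, which may differ.
    """
    # Share of a normal(mean, sd) population scoring below `score`, as a percent.
    return 100.0 * NormalDist(mu=mean, sigma=sd).cdf(score)


if __name__ == "__main__":
    # Standard score 115 (one SD above the mean): roughly the 84th percentile.
    print(round(approx_percentile(115, mean=100, sd=15)))
    # T-score 40 (one SD below the mean): roughly the 16th percentile.
    print(round(approx_percentile(40, mean=50, sd=10)))
```

The same function serves both metrics; only the mean and SD arguments change.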

Scores for Emotion and Other Self-Report Measures

NIH Toolbox Emotion measures are self-report (or, for young children, parent-report) assessments. Like PROMIS® measures, their scores are reported as T-scores, where 50 is the mean of a referent population and 10 is the standard deviation. Higher scores indicate more of the trait being measured. The types of scores available for NIH Toolbox Emotion and other self-report measures are:

Uncorrected T-Score:

  • This score, provided for participants of all ages, compares the performance of the test-taker to those in the entire NIH Toolbox nationally representative normative pediatric or adult (as appropriate) sample, regardless of age or any other variable
  • The uncorrected T-score provides a glimpse of the given participant’s overall performance when compared with the general U.S. population. For adults, this score is on a T-score metric.
  • This score may be most useful for gauging a participant's overall level of functioning independent of age, gender, or other demographic factors. It may also be of interest when monitoring performance over time.

Age and Gender Corrected T-Score for Children:

  • These scores are for children only (ages 3-17) and are corrected for both age and gender. They are considered “fully corrected” because, based on analyses of the NIH Toolbox normative study data, age and gender are the factors that lead to significant and meaningful score differences at these ages.
  • There are two main reasons for providing age- and gender-corrected scores for children on the NIH Toolbox Emotion battery: 1) somewhat different instruments are used at different ages; specifically, scores are based only on parent report for ages 3-7, on both parent report and self-report for ages 8-12, and only on self-report for ages 13-17; and 2) it is generally not considered appropriate or desirable to apply the same normative standards and expectations to boys and girls of different ages (as an extreme example, to a 3-year-old boy and a 17-year-old girl).
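The age-dependent report sources in the first reason can be summarized as a small lookup, for example when organizing data from a mixed-age pediatric sample. This is a sketch for illustration; the function name and labels are hypothetical and not part of NIH Toolbox software.

```python
# Hypothetical helper: which report source(s) feed the pediatric
# NIH Toolbox Emotion scores at a given whole-year age, per the text above.


def emotion_report_sources(age: int) -> tuple[str, ...]:
    """Return the report type(s) used for a child aged 3-17."""
    if 3 <= age <= 7:
        return ("parent report",)
    if 8 <= age <= 12:
        return ("parent report", "self-report")
    if 13 <= age <= 17:
        return ("self-report",)
    raise ValueError(f"age {age} is outside the pediatric range of 3-17")
```

Keeping the mapping in one place makes it easy to check that each child's data come from the expected informant.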

Scores for Translations

Spanish NIH Toolbox measures also have three types of scores: Age-Corrected Standard Scores, Uncorrected Standard Scores, and Fully Corrected T-Scores. The same guidance used for selecting the appropriate score in English applies to Spanish translations.

Swahili and Dholuo
It is strongly recommended that examiners and administrators use only raw scores when interpreting data from the NIH Toolbox African Languages tests. All standard scores, T-scores, and percentiles are based on the NIH Toolbox normative sample, which was collected from an English-speaking population in the United States. It would therefore not be appropriate to compare participants in Africa who take the translated instruments against these normative data.