National Institutes of Health Stroke Scale (NIHSS)

Overview

The NIHSS has established reliability and validity for use in
prospective clinical research, and predictive validity for long-term
stroke outcome (Adams et al., 1999; Brott et al., 1989; Lyden et al.,
1994). For the purposes of this review, we conducted a literature search
to identify all relevant publications on the psychometric properties of
the NIHSS.

Reliability

Original.
Brott et al. (1989) designed the NIHSS and assessed the scale’s
reliability in 24 patients with stroke. Inter-rater reliability
for the scale was adequate (mean kappa = 0.69). Agreement was excellent
for six items: pupillary response (kappa = 0.95), best
motor arm performance (kappa = 0.85), best motor leg performance (kappa
= 0.83), best gaze (kappa = 0.82), and level of consciousness questions
(kappa = 0.80). The lowest agreement was for the qualitative assessment
of level of consciousness (kappa = 0.49). Of the 15 test items, the most
inter-rater reliable item was pupillary response. Less reliable items
were upper or lower extremity motor function. Test-retest reliability
was adequate to excellent (mean kappa = 0.66 to 0.77). The
correlation between the first examination scores and the second
examination scores (within 24 hours) was excellent (r = 0.98).
Test-retest reliability did not differ significantly when administered
by different health care professionals such that the correlation of one
examiner’s score for the first exam with a different examiner’s score
for the second examination was excellent; for example, a first
examination by the neurologist of an individual patient correlated with
a second examination of that patient by the emergency department nurse
with Spearman’s correlation = 0.98. These results suggest that the NIHSS
can be reliably administered to patients with stroke.
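
The kappa statistics reported above correct the raw agreement between
two raters for the agreement expected by chance. As a minimal sketch
(not the analysis code of any cited study), unweighted Cohen's kappa for
two raters could be computed as follows. The function name and the
example scores are hypothetical and are not data from Brott et al.
(1989).

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same patients."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of patients given the same score by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions,
    # summed over score categories.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical scores (0-2) on one NIHSS item for ten patients; not study data.
neurologist = [0, 1, 2, 0, 1, 1, 0, 2, 2, 0]
nurse = [0, 1, 2, 0, 1, 0, 0, 2, 1, 0]
kappa = cohen_kappa(neurologist, nurse)  # about 0.69 for these made-up scores
```

In this made-up example the raters agree on 8 of 10 patients (80%), but
35% agreement would be expected by chance, so kappa is about 0.69 rather
than 0.80.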

Meyer et al. (2002) examined the inter-rater reliability of
the NIHSS and the mNIHSS in 45 patients with a history of stroke. Two
neurologists tested each patient. Dysarthria was the only item of the
NIHSS found to have poor inter-rater reliability (kappa = 0.289), and
four items were found to have adequate reliability. Ten items were found
to have excellent inter-rater reliability. Kappa scores ranged from
0.289 to 0.975. The kappa value for the total NIHSS score was excellent
(kappa = 0.969). The results of this study suggest that the NIHSS has
high inter-rater reliability.

Similarly, Goldstein, Bertels and Davis (1989) examined the
inter-rater reliability of the NIHSS in 20 patients with
stroke. A pair of clinical stroke fellows rated each patient.
Inter-rater reliability ranged from adequate to excellent for 9 out of
13 items.

Goldstein and Samsa (1997) examined the reliability of the NIHSS when
administered by non-neurologists in the setting of a clinical trial.
Thirty physician investigators (30% non-neurologists) and 29
non-physician study coordinators were trained to administer the NIHSS.
Four patients were rated initially, and the same four patients were
re-rated after 3 months to provide a measure of intra-rater
reliability. Four new patients were also rated after 3 months, and
these ratings were compared with the initial four ratings to assess
inter-rater reliability. The intraclass correlation coefficients (ICCs)
were excellent for the initial four cases (ICC = 0.94) and for the four
new cases rated 3 months later (ICC = 0.92). The overall ICC based on
the ratings of these 8 cases was excellent (ICC = 0.95), suggesting that
NIHSS administration by non-neurologists has a high level of
inter-rater reliability. The ICC was also excellent for the cases
rated during the initial training session and re-rated after 3 months
had elapsed (ICC = 0.93), suggesting that NIHSS administration by
non-neurologists also has a high level of intra-rater reliability.

Lyden et al. (1994) trained raters to administer the NIHSS to 11
patients using a training video. The inter-rater reliability of
this method was then calculated. Moderate to excellent agreement was
established on most NIHSS items (unweighted kappa > 0.60). Only two
items, ataxia and facial paresis, showed poor agreement (unweighted
kappa < 0.40). The results of this study demonstrate the strong
reliability of the NIHSS when raters are trained by a standardized
video.

Shafqat et al. (1999) evaluated the reliability of administering the
NIHSS remotely (by telemedicine link) by obtaining one bedside and one
remote NIHSS score independently for 20 patients with stroke. Kappa
coefficients were calculated for inter-rater reliability between
bedside and remote administration scores. Excellent agreement was
achieved for four items (orientation, kappa = 0.75; motor arm, kappa =
0.82; motor leg, kappa = 0.83; neglect, kappa = 0.77). Six items
displayed adequate agreement (language, kappa = 0.65; dysarthria, kappa
= 0.55; sensation, kappa = 0.48; visual fields, kappa = 0.60; facial
palsy, kappa = 0.40; gaze, kappa = 0.41). Two items achieved poor
agreement (commands, kappa = 0.29; ataxia, kappa = -0.07). Total NIHSS
scores obtained by bedside and remote methods of administration were
highly correlated (r = 0.97). These results suggest that the NIHSS can
be reliably administered by telemedicine.
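
The item-level labels in the paragraph above are consistent with
treating kappa below 0.40 as poor, 0.40 to 0.74 as adequate, and 0.75 or
higher as excellent. The sketch below encodes that reading. The
cut-offs are an assumption inferred from the values quoted here, the
function name is hypothetical, and not every study summarized in this
review appears to apply the same bands.

```python
def agreement_band(kappa, adequate_from=0.40, excellent_from=0.75):
    """Label a kappa value as poor, adequate, or excellent.

    The default cut-offs are assumptions inferred from the item labels
    quoted for Shafqat et al. (1999); they are not stated by the source.
    """
    if kappa >= excellent_from:
        return "excellent"
    if kappa >= adequate_from:
        return "adequate"
    return "poor"


# Item kappas as quoted in the text for Shafqat et al. (1999).
shafqat_items = {
    "orientation": 0.75,
    "motor arm": 0.82,
    "language": 0.65,
    "facial palsy": 0.40,
    "commands": 0.29,
    "ataxia": -0.07,
}
bands = {item: agreement_band(k) for item, k in shafqat_items.items()}
```

Keeping the cut-offs as parameters makes it easy to check how the labels
would change under a different banding convention.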

Similar to the study by Shafqat et al. (1999), Meyer et al. (2005)
examined the reliability of NIHSS administration by wireless and
site-independent telemedicine in 25 patients with stroke. Patients were
evaluated by both remote and bedside examination. Inter-rater
reliability between remote and bedside examiners for the NIHSS was found
to be poor for two items (facial palsy, kappa = 0.22; limb ataxia, kappa
= 0.34) and adequate for three items (left leg motor, kappa = 0.74;
language, kappa = 0.73; dysarthria, kappa = 0.61). Ten items showed
excellent agreement (kappas ranged from 0.80 to 1.00). The ICC was
excellent for the total NIHSS score (ICC = 0.94). Taken together with
the results of Shafqat et al. (1999), these findings suggest that the
NIHSS can be reliably administered by wireless and site-independent
telemedicine.

Dewey et al. (1999) examined the reliability of the NIHSS in a
community-based sample of 31 patients with stroke. Two neurologists and
one of two research nurses assessed the patients. Inter-rater
reliability was excellent, as there was a high level of agreement for
total scores between the two neurologists (ICC = 0.95) and between each
neurologist and the research nurse (ICC = 0.92 and 0.96). While there
was adequate to excellent agreement among the neurologists and research
nurse (weighted kappa > 0.4) for the majority of the NIHSS items, there
was poor agreement for the ‘limb ataxia’ item. The results of this study
suggest that the NIHSS can be reliably administered to a community-based
sample.

Schmülling et al. (1998) examined
the reliability of the NIHSS when administered by untrained raters in 22
patients with stroke. All diagnoses were confirmed by computed
tomography. Four neurologists assessed the patients. Two raters were
video trained and experienced in administering the NIHSS, and the other
two were inexperienced and were given no training in administering the
NIHSS. Excellent inter-rater reliability (kappa = 0.61) was achieved
among the trained raters; however, only adequate inter-rater reliability
(kappa = 0.33) was achieved among the untrained raters.
Between trained and untrained raters, the unweighted kappa was adequate
(kappa = 0.45). The reliability of individual items also differed
between trained and untrained raters. Among trained raters, only two
items had adequate agreement (ataxia, kappa = 0.34; neglect, kappa =
0.32), and the rest were excellent. Among the untrained raters, 6 items
had adequate reliability, and 4 items had poor reliability (ataxia,
kappa = -0.03; gaze, kappa = 0.06; visual fields, kappa = -0.02;
dysarthria, kappa = 0.18). The results of this study suggest that the
NIHSS has excellent inter-rater reliability only when raters are trained
and knowledgeable on how to correctly administer the NIHSS.

Kasner et al. (1999) examined whether NIHSS scores could be
retrospectively estimated from medical records. NIHSS scores of 39
patients with acute stroke were estimated from medical record notes by
6 raters. These scores were compared to the patients’ actual NIHSS
scores, to which the raters had been blinded. Overall inter-rater
reliability was excellent (ICC = 0.82). Agreement between pairs of
raters ranged from good to excellent (ICCs ranged from 0.70 to 0.89).
Over 90% of the estimated NIHSS scores were
within 5 points at both admission and discharge for all pairs of raters.
The results of this study suggest that the NIHSS can be reliably
abstracted from medical records for retrospective studies on acute
stroke outcome.

Williams et al. (2000) developed an algorithm for retrospective NIHSS
scoring from chart documentation. One investigator prospectively scored
the admission NIHSS in 32 patients with stroke. Two raters
retrospectively scored the NIHSS by applying the algorithm to
photocopied admission notes. Linear regression was used to assess
inter-rater reliability and agreement between prospective and
retrospective NIHSS scores. Weighted kappa statistics were calculated to
assess the level of agreement of individual NIHSS items. Inter-rater
reliability was excellent (r = 0.98), as was agreement between
prospective and retrospective NIHSS scores (r = 0.94). Agreement for
individual items ranged from adequate (response to commands, kappa =
0.54; visual, kappa = 0.64; ataxia, kappa = 0.66; sensory, kappa = 0.60;
dysarthria, kappa = 0.69; extinction/inattention, kappa = 0.57) to
excellent (response to questions, kappa = 0.87; best gaze, kappa = 0.94;
facial palsy, kappa = 0.76; left arm, kappa = 0.85; left leg, kappa =
0.87; right arm, kappa = 0.79; right leg, kappa = 0.75; best language,
kappa = 0.80). Only one item, level of consciousness, had poor agreement
(kappa = -0.10). The results of this study suggest that retrospective
NIHSS scoring with the developed algorithm is reliable and unbiased even
if information is missing from chart documentation.

Bushnell et al. (2001) looked at the retrospective scoring of both the
Canadian Neurological Scale and the NIHSS. They compared data from
academic medical centers to community hospitals with neurologists and
community hospitals without neurologists. More data were missing for
the NIHSS than for the Canadian Neurological Scale. Almost perfect
levels of inter-rater agreement were found for NIHSS scores obtained
retrospectively at the academic medical centers (ICC = 0.93) and at
community hospitals with neurologists (ICC = 0.89); however, only
adequate agreement was found at community hospitals without neurologists
(ICC = 0.48). These results suggest that scoring the NIHSS
retrospectively may not be reliable unless the medical record contains
evaluation material from a neurologist.

Modified.
Lyden et al. (2001) developed the mNIHSS and assessed the scale’s
reliability using the certification data originally collected to assess
the reliability of investigators in the National Institute of
Neurological Disorders and Stroke rtPA (recombinant tissue plasminogen
activator) Trial. Inter-rater reliability was improved with the
mNIHSS in comparison to the original NIHSS. The number of scale items
with poor kappa coefficients decreased from 8 (20%) to 3 (14%): level of
consciousness commands, gaze, and language. The mNIHSS remains to be
tested prospectively, as the original NIHSS may be more appropriate for
clinical monitoring of patients.

Meyer et al. (2002) also examined the reliability of the mNIHSS in 45
patients with a history of stroke. Two neurologists tested each patient.
Ten out of eleven mNIHSS kappa scores showed excellent inter-rater
reliability (ranging from kappa = 0.841 to kappa = 0.975). Only gaze
had an adequate kappa score of 0.661. The total mNIHSS kappa was
excellent (kappa = 0.988). In this study, the mNIHSS was found to be
more reliable than the original NIHSS.

Meyer et al. (2005) examined the reliability of mNIHSS administration
by wireless and site-independent telemedicine in 25 patients with
stroke. Patients were evaluated by both remote and bedside examination.
Inter-rater reliability between remote and bedside examiners for
the mNIHSS was found to be adequate for two items (left leg motor, kappa
= 0.74; language, kappa = 0.69). Nine items showed excellent inter-rater
reliability (kappas ranged from 0.80 to 1.00). The ICC was excellent for
the total mNIHSS score (ICC = 0.95). The results of this study suggest
that the mNIHSS can be reliably administered by wireless and
site-independent telemedicine.

Validity

Construct:

Original.
N/A

Modified.
Meyer et al. (2002) tested the construct validity of the NIHSS
and mNIHSS in 45 patients with a history of stroke. Two neurologists
tested each patient. The Spearman correlation coefficient between NIHSS
and mNIHSS (for both examiners) was excellent (r = 0.947 and r = 0.941),
with an overall average correlation of r = 0.944. Construct validity of
the mNIHSS was demonstrated in this study as the scale was found to
perform similarly to the NIHSS.

Criterion:

Concurrent:

Original.
Meyer et al. (2002) examined the concurrent validity of the
NIHSS and mNIHSS by comparing the scales with the Barthel Index and the
Modified Rankin Scale. The coefficients for the examiners combined for
NIHSS versus Barthel Index and Modified Rankin Scale were -0.165 (the
correlation is negative because a high score on the NIHSS indicates
severe neurological impairment, whereas a high score on the BI indicates
functional independence) and 0.219 respectively. The authors suggest
that the poor relationships observed may be due to the fact that
patients in this study had only mild deficits, rendering it difficult to
determine concurrent validity, especially at the higher end of the
scale.

Brott et al. (1989) assessed the concurrent validity of the
NIHSS by comparing the scale scores obtained prospectively on 65
patients with acute stroke to the patients’ infarction size as measured
by computed tomography at 1 week. The Spearman’s correlation between the
total NIHSS score at 7 days and the computed tomography scan lesion
volume at 7 days was excellent (r = 0.74). The patients’ initial
neurologic deficit as measured by the scale also correlated with the
7-10 day computed tomography lesion volume (r = 0.78). The
scale-computed tomography correlation at 7 days for patients with left
hemisphere infarctions was 0.72, while this correlation for patients
with right hemisphere infarctions was 0.74. The results of this study
demonstrate that the NIHSS has excellent concurrent validity with
infarct volumes using computed tomography.
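
The scale-versus-lesion-volume comparisons in this section are reported
as Spearman correlations, which are Pearson correlations computed on
ranks. The following is a minimal pure-Python sketch of that
calculation. All function names and the example values are hypothetical
and are not data from Brott et al. (1989).

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical NIHSS totals and lesion volumes (cm^3) for six patients.
nihss_scores = [2, 5, 8, 12, 15, 20]
lesion_volumes = [3.0, 10.0, 8.0, 40.0, 35.0, 90.0]
rho = spearman(nihss_scores, lesion_volumes)
```

Because only the ordering of the values matters, Spearman's coefficient
is unaffected by the strongly skewed distribution that lesion volumes
typically have.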

Schiemanck, Post, Witkamp, Kappelle and Prevo (2005) examined the
concurrent validity of infarct volumes in 94 patients with stroke
as assessed by magnetic resonance imaging (MRI) with stroke severity as
measured by the NIHSS at 2 weeks post-stroke. A strong correlation
between lesion volume and NIHSS score was found (r = 0.61), suggesting
that the NIHSS has excellent concurrent validity with infarct volumes
using MRI.

However, Saver et al. (1999) also investigated the concurrent validity
of infarct volumes with 3-month NIHSS scores in 191
patients with acute stroke. In this study, computed tomography scans at
days 6 to 11 were only adequately correlated with 3-month NIHSS scores
(r=0.54).

Similarly, Lyden, Claesson, Havstad, Ashwood, and Lu (2004) examined
the concurrent validity of baseline NIHSS scores with 30-day
infarct volumes using computed tomography in patients with acute stroke
seen within 12 hours of stroke onset. Baseline NIHSS scores and lesion
volumes were also found to be only adequately correlated (r = 0.37).

Derex et al. (2004) examined the concurrent validity of the NIHSS
with lesion volumes in 49 patients with stroke. Patients underwent MRI
prior to thrombolysis and were then administered the NIHSS at day one.
Baseline NIHSS scores were highly correlated with baseline
diffusion-weighted imaging lesion volumes (r = 0.71), and correlated
adequately with perfusion-weighted imaging abnormality volumes (r =
0.58) and time to peak delays (r = 0.41). The NIHSS score also
correlated with the site of arterial occlusion.

Fink et al. (2002) examined the concurrent validity of the NIHSS with
lesion volumes measured by diffusion weighted imaging within 24 hours of
stroke in 153 patients with acute stroke. Acute NIHSS scores were
adequately correlated with acute diffusion weighted imaging lesion
volumes (r = 0.48, right; r = 0.58, left) and with perfusion-weighted
imaging hypoperfusion volumes (r = 0.62, right; r = 0.60, left).
However, a difference was observed in left- versus right-sided stroke.
Among patients with diffusion weighted imaging lesions larger than the
median volume, 8/37 with right-sided stroke had an NIHSS score of 0 – 5
compared with 1/39 patients with left-sided stroke. In addition,
multiple linear regression analysis revealed a significantly lower
acute NIHSS on the right compared with the left side
when adjusted for stroke volume, suggesting that patients with a
right-sided stroke may have a low NIHSS score despite substantial lesion
volume.

Woo et al. (1999) reported findings consistent with those of Fink et
al. (2002). They used the placebo arm of the
National Institute of Neurological Disorders and Stroke rtPA
(recombinant tissue plasminogen activator) Trial to examine whether the
total volume of cerebral infarction in patients with right hemisphere
strokes would be greater than the volume of cerebral infarction in
patients with left hemisphere strokes who have similar NIHSS scores. The
results of this study suggested that the volume for right hemisphere
strokes was statistically greater than the volume for left hemisphere
strokes when adjusted for the baseline NIHSS score. For each 5-point
category of the NIHSS score (e.g., from 16-20), the median volume of right
hemisphere strokes was approximately double the median volume of left
hemisphere strokes. The Spearman rank correlation between the 24-hour
NIHSS score and 3-month lesion volume was 0.72 for patients with left
hemisphere stroke and 0.71 for patients with right hemisphere stroke.
The results of this study show that for a given NIHSS score, the median
volume of right hemisphere strokes is consistently larger than the
median volume of left hemisphere strokes. Therefore, care must be taken
when infarction size is being predicted from NIHSS score.

Modified.
In a retrospective analysis, Lyden et al. (2001) measured the
concurrent validity of the mNIHSS by comparing the correlation of
mNIHSS with the other neurological scales (the Barthel Index, the
Modified Rankin Scale, and the Glasgow Outcome Scale) measured at 3
months. The mNIHSS showed an excellent correlation with these scales at all
time points, with correlations being strongest at 90 days (r = -0.82 for
Barthel Index; r = 0.83 for modified Rankin Scale; r = 0.82 for Glasgow
Outcome Scale). Correlation with the Barthel Index is negative because a
high score on the Barthel Index indicates functional independence
whereas a high score on the mNIHSS indicates neurological deficit.

In a prospective analysis, Meyer et al. (2002) found that the mNIHSS
demonstrated poor concurrent validity with the Barthel Index and
the Modified Rankin Scale. The coefficients for mNIHSS versus Barthel
Index and modified Rankin Scale were -0.238 (the correlation is negative
because a high score on the NIHSS indicates severe neurological
impairment, whereas a high score on the Barthel Index indicates
functional independence) and 0.296, respectively. The absolute Spearman
correlations were higher with the use of the mNIHSS in comparison to the
original NIHSS; however, values were not statistically significant. The
weak relationships observed with the mNIHSS and the other scales may be
due to the fact that patients in this study had only mild deficits,
rendering it difficult to determine concurrent validity, especially at
the higher end of the scale.

Predictive.

Original.
Lyden et al. (1999) used data from the National Institute of
Neurological Disorders and Stroke (NINDS) tPA Stroke Trial to determine
whether the NIHSS was valid in patients treated with tissue plasminogen
activator. To assess the predictive validity of the NIHSS, the scale
was compared over time with the
3-month outcome of the Barthel Index, the Rankin Scale, and the Glasgow
Outcome Scale. The correlations between the NIHSS and the other clinical
outcomes were significant but adequate at baseline (Placebo group:
Barthel Index, r = -0.48; Rankin Scale, r = 0.51; Glasgow Outcomes
Scale, r = 0.49; Treatment group: Barthel Index, r = -0.51, Rankin
Scale, r = 0.56; Glasgow Outcomes Scale, r = 0.56) and at 2 hours
(Placebo group: Barthel Index, r = -0.58; Rankin Scale, r = 0.61;
Glasgow Outcomes Scale, r = 0.60; Treatment group: Barthel Index, r =
-0.65; Rankin Scale, r = 0.70; Glasgow Outcomes Scale, r = 0.68) after
stroke. The correlations were greater for the measurements later in time
(24 hours, 7-10 days, 90 days post-stroke), which suggests that after 2
hours from stroke, the NIHSS may have greater predictive validity in
terms of the 3-month outcome.

Schlegel et al. (2003) tested
whether the NIHSS in the first 24 hours after stroke onset could predict
the next level of care after acute hospitalization in a retrospective
study of 94 patients with stroke. From medical records it was determined
that 59% of patients were discharged home, 30% to rehabilitation, and
11% to a long-term nursing facility. For each 1-point increase in NIHSS
score, the likelihood of going home was significantly reduced (OR =
0.79). The category of NIHSS score also predicted the next level of
care. An NIHSS score ≤ 5 was strongly associated with discharge
home. When compared with patients with an NIHSS ≤ 5, patients with a
score from 6 to 13 were nearly 5 times more likely to be discharged to
rehabilitation (OR = 4.8). Patients who scored >13 were nearly 10
times more likely to require rehabilitation (OR = 9.5) and more than
100-fold more likely to be placed in a long-term nursing facility (OR =
310). The results of this study suggest that the NIHSS, administered in
the first 24 hours after stroke onset, can retrospectively predict the
next level of care after acute hospitalization.
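
To illustrate what a per-point odds ratio such as OR = 0.79 implies
over a larger score difference, the sketch below compounds it across
several points and converts the result back to a probability. It
assumes a logistic model in which each additional NIHSS point
multiplies the odds by the same factor. This calculation is not
reported in the source, the function names are hypothetical, and the
80% starting probability is an invented illustration.

```python
def odds_ratio_for_increase(or_per_point, points):
    """Odds ratio for a multi-point increase, assuming a constant
    multiplicative effect per point (log-linear logistic model)."""
    return or_per_point ** points


def shift_probability(p_start, odds_ratio):
    """Apply an odds ratio to a starting probability and return the
    resulting probability."""
    if not 0.0 < p_start < 1.0:
        raise ValueError("p_start must be strictly between 0 and 1")
    odds = p_start / (1.0 - p_start)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# OR = 0.79 per NIHSS point is the figure quoted in the text.
or_five_points = odds_ratio_for_increase(0.79, 5)  # about 0.31

# Hypothetical: a patient group with an 80% chance of discharge home.
p_home_after = shift_probability(0.80, or_five_points)  # about 0.55
```

Under these assumptions a 5-point higher score cuts the odds of
discharge home to roughly one third, which would move a hypothetical
80% probability down to about 55%.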

Schlegel et al. (2004) examined whether the NIHSS could predict the
next level of care in 46 patients with acute stroke treated with
thrombolysis (recombinant tissue plasminogen activator). In a
multinomial regression analysis, increasing NIHSS score was a strong
independent predictor of discharge to rehabilitation or nursing
facilities, roughly doubling for each 5-point increment (score 6 – 10:
rehabilitation OR = 1.78, nursing facility OR = 2.31; score 11 – 15:
rehabilitation OR = 2.66, nursing facility OR = 5.05; score 16 – 20:
rehabilitation OR = 5.31, nursing facility OR = 16.30; score > 20:
rehabilitation OR = 8.36, nursing facility OR = 27.40). The results of
this study suggest that stroke severity as determined by the admission
NIHSS score is a major independent predictor of the next level of care
following hospitalization and treatment with thrombolysis for acute
stroke.

Demchuk et al. (2001) examined factors that were independently
predictive of good outcome among 1,205 patients with acute stroke who
were treated with alteplase (a type of thrombolytic therapy). Using
multivariable logistic regression modeling, the most important predictor
of outcome identified was found to be baseline stroke severity as
measured by the NIHSS score. The higher the NIHSS score, the worse the
odds were of having a good outcome (OR good outcome = 1.00 for NIHSS
score ≤ 5; OR good outcome = 0.05 for NIHSS > 20).

Muir et al. (1996) compared the NIHSS, the Canadian Neurological
Scale, and the Middle Cerebral Artery Neurological Score to see
which scale best predicted good (alive at home) or poor (alive in care
or dead) outcome in 408 patients with acute stroke. Predictive accuracy
of the variables was compared by ROC curves and stepwise logistic
regression. Logistic regression showed that the NIHSS added
significantly to the predictive value of all other scores. The NIHSS
overall accuracy was excellent (0.83). A cutoff point of 13 on the NIHSS
best predicted 3-month outcome.

Adams et al. (1999) found that the NIHSS strongly predicts the
likelihood of a patient’s recovery after
stroke in a post-hoc analysis by stroke subtype of 1,268 patients
enrolled in an acute stroke trial. NIHSS scores were taken at baseline,
7 days, and 3 months after stroke. A score of ≥ 16 forecasted a high
probability of death or severe disability whereas a score of ≤ 6
forecasted a good recovery. The baseline NIHSS score strongly predicted
outcome at 7 days and at 3 months. By 7 days, 2/3 of the patients
scoring ≤ 3 at baseline had an excellent outcome. One additional point
on the NIHSS decreased the likelihood of excellent outcomes at 7 days by
24% and at 3 months by 17%. Patients with lacunar infarcts had
significantly higher likelihood of an excellent outcome at 7 days and 3
months than did patients with non-lacunar strokes, but odds were poorer
compared with patients with other types of stroke when scores were 10 or
more. At 3 months, excellent outcomes were noted in 46% of patients with
NIHSS scores of 7 – 10 and in 23% of patients with scores of 11 – 15.
Very few patients with baseline scores of > 15 had excellent outcomes
after 3 months.

Albers, Bates, Clark, Bell, Verro, and Hamilton (2000) examined 389
patients administered intravenous tissue-type plasminogen activator for
treatment of acute stroke. A multivariate analysis found that
a less severe baseline NIHSS score (≤ 10) was a predictor of favorable
outcome. For every 5-point increase in baseline NIHSS score, patients
had a 22% decrease in the odds of recovery (OR = 0.78), and patients
with baseline NIHSS scores greater than 10 had a 75% decrease in the
odds of recovery (OR = 0.25).

DeGraba et al. (1999) administered the NIHSS serially to 127 patients
with stroke for the first 48 hours of admission to the neuroscience
intensive care unit and found that a 3-point or greater increase on the
NIHSS indicated stroke progression. A significant cutoff that allowed
for the greatest likelihood of predicting patient progression occurred
when NIHSS scores were stratified as ≤ 7 and > 7. Patients with an
initial NIHSS score of ≤ 7 experienced a 14.8% worsening rate and were
more likely to be functionally normal (45% were functionally normal at
48 hours). Patients with an initial NIHSS score of > 7 had a 65.9%
worsening rate and were less likely to be functionally normal at 48
hours (only 2.4% were functionally normal). These results demonstrate
the predictive validity of the NIHSS.

Frankel et al. (2000) examined whether a practical method for
predicting a poor outcome after acute ischemic stroke could be
developed. Using data from the placebo arm of Parts 1 and 2 of the
National Institute of Neurological Disorders and Stroke rt-PA
(recombinant tissue plasminogen activator) Stroke Trial, the authors
found that, at baseline, an NIHSS score > 17 combined with atrial
fibrillation yielded a positive predictive value of 96% for poor
outcome. At 24 hours, the best predictor was an NIHSS
score > 22, yielding a positive predictive value of 98%. At 7 – 10
days, the best predictor was an NIHSS score > 16, yielding a positive
predictive value of 92%. The results of this study suggest that patients
with a severe neurologic deficit after acute ischemic stroke, as
measured by the NIHSS, have a poor prognosis and that during the first
week after acute stroke, it is possible to identify a subset of patients
who are highly likely to have a poor outcome.
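
A positive predictive value such as those quoted above is the share of
patients flagged by a cut-off who go on to have the outcome. As a
minimal sketch of that calculation, the function below applies a score
cut-off and computes the positive predictive value for poor outcome.
The function name and the ten example patients are hypothetical and are
not data from Frankel et al. (2000).

```python
def ppv_for_cutoff(scores, poor_outcome, cutoff):
    """Positive predictive value of the rule 'score > cutoff' for a
    poor outcome.

    scores: NIHSS totals; poor_outcome: matching booleans, True when the
    patient had a poor outcome.
    """
    if len(scores) != len(poor_outcome):
        raise ValueError("scores and outcomes must have equal length")
    # Outcomes of only those patients the rule flags as high risk.
    flagged = [bad for s, bad in zip(scores, poor_outcome) if s > cutoff]
    if not flagged:
        raise ValueError("no patient exceeds the cutoff")
    # True positives divided by all flagged patients.
    return sum(flagged) / len(flagged)


# Hypothetical cohort of ten patients (illustration only).
example_scores = [4, 9, 18, 19, 21, 23, 25, 12, 30, 17]
example_poor = [False, False, True, True, True,
                False, True, False, True, False]
ppv = ppv_for_cutoff(example_scores, example_poor, cutoff=17)
```

In this invented cohort six patients score above 17 and five of them
have a poor outcome, giving a positive predictive value of 5/6. The
value says nothing about patients below the cut-off, so a high positive
predictive value can coexist with many missed poor outcomes.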

Rundek et al. (2000) examined predictors of discharge destinations
following acute care hospitalization in 893 patients who survived acute
care hospitalization for a first stroke, followed prospectively.
Polytomous logistic regression was used to determine predictors for
rehabilitation and nursing home placement versus returning home. Among
the survivors of acute stroke care hospitalization, 611 patients were
discharged to their homes, 168 to rehabilitation, and 114 to nursing
homes. Patients with moderate neurological deficits (NIHSS score from 6
– 13; rehabilitation OR = 8.0, nursing home OR = 3.8) and severe
neurological deficits (NIHSS score ≥ 14; rehabilitation OR = 17.9,
nursing home OR = 27.9) had more than a threefold increased risk of
being sent to a nursing home and more than an eightfold increased risk
of being sent to rehabilitation, demonstrating the clinical
predictive validity of the NIHSS.

Bohannon, Lee, and Maljanian (2002) examined which variables predicted
three hospital outcomes
(hospital length of stay, hospital charges, and hospital discharge
destination). NIHSS scores and Barthel Index scores correlated with all
three outcomes. The correlations between NIHSS scores and hospital
length of stay and hospital charges (ranging from r = 0.276 to r =
0.381) were positive, indicating that patients with more severe
strokes had a longer hospital length of stay and higher hospital
charges. The correlations between NIHSS scores and discharge destination
were negative (r = -0.344 and r = -0.355), meaning that patients
with more severe strokes were less likely to be discharged home.
Regression analysis showed that once post-admission Barthel Index scores
were accounted for, no other variable added to the prediction of
hospital length of stay or discharge destination; however, the NIHSS
score added to the explanation of hospital charges provided by
post-admission Barthel Index scores.

Derex et al. (2003) examined whether pre-treatment MRI parameters
predicted clinical outcome in 49 patients with acute stroke treated by
intravenous recombinant tissue plasminogen activator. Univariate and
multivariate logistic regression analyses were used to identify the
predictors of clinical outcome. The results of these analyses suggested
that baseline NIHSS score was the best independent predictor of clinical
outcome at day 60 (OR = 1.28).

Baird et al. (2001) used logistic regression to develop a 3-item scale
for predicting good stroke recovery, which was tested in 63 patients. By
combining the NIHSS with the time from onset and lesion volume (as
detected by diffusion weighted imaging) a score could be obtained to
accurately predict stroke recovery. Scores of 0 to 2 indicate low
probability of recovery, 3 to 4 medium, and 5 to 7 high. This score can
help early decision-making regarding aggressiveness of care, discharge
planning, and rehabilitation options.

Briggs, Felberg, Malkoff, Bratina, and Grotta (2001) examined the
NIHSS scores of 138 patients admitted within 24 hours of stroke to help
determine if patients with mild stroke fared better by admission to a
general ward or to the intensive care unit. They found a general
positive correlation between baseline NIHSS score and discharge Rankin
score in adequate patients regardless of whether they were admitted to
the intensive care unit or the ward (R² = 0.273 and 0.09, respectively).
Patients with mild stroke (NIHSS score < 8) admitted to a general
ward had fewer complications and more favorable discharge Rankin Scale
scores than similar patients admitted to the intensive care unit. There
was no obvious cutoff baseline NIHSS score that was predictive of better
outcome (lower Rankin) in intensive care unit patients. There was no
statistical difference in length of stay. Routinely admitting patients
with NIHSS scores < 8 to intensive care appears to have no cost or
outcomes benefit.

Di Legge, Saposnik, Nilanont, and
Hachinski (2006) identified a subset of variables that were
independently associated with major neurological improvement at 24 hours
and good outcome at 3 months after treatment for 219 patients with
stroke who received intravenous recombinant tissue plasminogen activator
in the emergency department. Using logistic regression, the results of
this study suggested that among other predictors, pre-treatment NIHSS
score was an excellent negative predictor of good outcome at 3 months
(OR = 0.83).

Chang, Tseng, Tan, and Liou (2006) examined factors related to
3-month mortality at admission in 360 patients with first-ever acute
stroke. Multivariate logistic regression analysis was used to identify
the main predictors of 3-month stroke-related mortality. Admission NIHSS
score (OR = 1.17), history of cardiac disease (OR = 2.73), and posterior
circulation stroke (OR = 5.25) were significant risk factors for 3-month
mortality.
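
To illustrate how an odds ratio such as the one reported for admission NIHSS score might be read, the sketch below assumes that the OR applies per 1-point increase in score and compounds multiplicatively, as in a standard logistic regression model. The review does not state the unit of the OR, so this is an assumption, and the function name is hypothetical.

```python
def odds_ratio_for_difference(or_per_point: float, points: float) -> float:
    """Odds ratio implied by a `points`-point difference in score.

    Assumes a logistic model in which each 1-point increase multiplies
    the odds of the outcome by `or_per_point` (an assumption; the unit
    of the reported OR is not given in this review).
    """
    if or_per_point <= 0:
        raise ValueError("an odds ratio must be positive")
    return or_per_point ** points


if __name__ == "__main__":
    # Admission NIHSS score, OR = 1.17 (Chang et al., 2006),
    # applied to a hypothetical 5-point difference between two patients.
    print(round(odds_ratio_for_difference(1.17, 5), 2))
```

Under these assumptions, a patient whose admission score is 5 points higher would have roughly 2.2 times the odds of 3-month mortality, other predictors being equal.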

Fischer et al. (2005) examined the admission NIHSS scores of 226
patients with stroke who underwent arteriography. Patients with NIHSS
scores ≥ 10 had positive predictive values to show arterial occlusions
in 97% of carotid and 96% of vertebrobasilar strokes. With an NIHSS
score ≥ 12, the positive predictive value to find a central occlusion
was 91%. In a multivariate analysis, NIHSS subitems such as level of
consciousness questions (OR = 4.0), gaze (OR = 2.9), motor leg (OR =
4.2), and neglect (OR = 3.2) were predictors of central occlusions.
There was a significant association between NIHSS scores and the
presence and location of a vessel occlusion. With an NIHSS score ≥ 10, a
vessel occlusion would likely be seen on arteriography, and with a score
≥ 12, its location would probably be central.
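
The two cutoffs reported by Fischer et al. (2005) can be restated as a small rule, alongside the usual definition of positive predictive value (true positives divided by all positive calls). This is a sketch for illustration only, not a diagnostic tool; the function names and the wording of the returned labels are hypothetical.

```python
def expected_arteriography_finding(nihss: int) -> str:
    """Restate the reported cutoffs: a score of 10 or more suggests a
    vessel occlusion; a score of 12 or more suggests it is probably central."""
    if nihss < 0:
        raise ValueError("NIHSS score cannot be negative")
    if nihss >= 12:
        return "occlusion likely, probably central"
    if nihss >= 10:
        return "occlusion likely"
    return "below the reported cutoffs"


def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """PPV = TP / (TP + FP): the share of positive calls that are correct."""
    total = true_positives + false_positives
    if total == 0:
        raise ValueError("no positive calls to evaluate")
    return true_positives / total


if __name__ == "__main__":
    for score in (4, 10, 12):
        print(score, expected_arteriography_finding(score))
```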

Modified.
Lyden et al. (2001) examined the predictive validity of the mNIHSS
using the outcome results of
the National Institute of Neurological Disorders and Stroke recombinant
tissue plasminogen activator Stroke Trial. Using the mNIHSS to test for
treatment effect on improvement at 24 hours and treatment effect on
minimal or no disability at 3 months after stroke, the scale scores
differentiated the two treatment groups at 24 hours and at 3 months. The
proportion of patients who improved ≥ 4 points within 24 hours after
treatment was significantly increased by recombinant tissue plasminogen
activator (OR = 1.3). Likewise, the odds ratio for complete/nearly
complete resolution of stroke symptoms 3 months after treatment was
significant (OR = 1.7) with the mNIHSS.

Content validity

Original.
Lyden et al. (1999) used data from the National Institute of
Neurological Disorders and Stroke recombinant tissue plasminogen
activator Trial to determine whether the NIHSS was valid in patients
treated with tissue plasminogen activator. To assess the content validity
of the scale, an exploratory factor analysis was performed on NIHSS data
from the first 24 hours after stroke to derive an
underlying factor structure. The results from this analysis suggested
that there were two factors, representing left and right brain function,
underlying the NIHSS. The internal scale structure remained consistent
in placebo and treated groups and when administered successively over
time, confirming the content validity of the scale.

Modified.
Lyden et al. (2001) developed and assessed the validity of the
mNIHSS. Content validity was determined using factor analysis,
and the goodness of fit was recalculated on the basis of a 4-factor
solution restricted to the 11 NIHSS items involved in the mNIHSS. To
prevent the confounding effects of time or treatment, the goodness of
fit was calculated for data collected at 2 hours, 24 hours, 7 to 10
days, and 3 months after recombinant tissue plasminogen activator or
placebo treatment. The results suggested that the internal structure of
the mNIHSS was identical to that of the NIHSS. The goodness of fit
(comparative fit index = 0.96) was equal to that of the NIHSS. When used
over time, and in placebo-treated versus active-treated groups, the
mNIHSS values ranged from 0.93 to 0.96 and were as strong as those of
the NIHSS.

Responsiveness

Original.
Brott et al. (1989) assessed the responsiveness of the NIHSS
by comparing the scale scores obtained prospectively on 65 patients with
acute stroke to the patients’ infarction size as measured by computed
tomography at 1 week. Although most patients improved clinically, 4/15
items changed only minimally: facial palsy (-2% improvement for item
score at 1 week), plantar reflex (7% improvement for item score at 1
week), dysarthria (-1% improvement for item score at 1 week), and
language (6% improvement for item score at 1 week). Also, change in limb
ataxia (59% improvement) and best gaze (52% improvement) may have been
overstated, based on infarction size observed. The other 10 items
changed an average of 25% over 7 days. Raters in this study also had to
conclude whether patients changed neurologically from the previous
examination and from baseline. This was defined as “Same” (a change of
0-1 scale point), “Better” (an improvement of ≥ 2 scale points), and
“Worse” (a deterioration of ≥ 2 scale points). Based on these
definitions, from baseline to 7-10 days, the quantitative criteria for
patient change agreed with the investigator’s judgment of patient change
for 40 of the 63 patients surviving at 7-10 days (63%). The results of
this study demonstrate that the NIHSS is
responsive to change.
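
The change categories and the agreement figure described above can be sketched as follows. The sketch assumes that a lower NIHSS score means less deficit, so an improvement is a fall in score; the function names are hypothetical and are not taken from Brott et al. (1989).

```python
def classify_change(previous: int, current: int) -> str:
    """Classify change between two examinations using the study's criteria:
    "Same" for a change of 0-1 point, "Better" for an improvement of
    2 or more points, "Worse" for a deterioration of 2 or more points.
    """
    delta = current - previous  # negative delta: score fell, patient improved
    if delta <= -2:
        return "Better"
    if delta >= 2:
        return "Worse"
    return "Same"


def percent_agreement(agreed: int, total: int) -> float:
    """Share of patients, in percent, for whom two classifications agree."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * agreed / total


if __name__ == "__main__":
    print(classify_change(12, 9))            # a 3-point fall in score
    print(round(percent_agreement(40, 63)))  # the 40/63 figure from the text
```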

Modified.
Lyden et al. (2001) examined the responsiveness of the mNIHSS
in a retrospective analysis. The mNIHSS performed like the original NIHSS in
the predictive models, which can be taken as an indicator of
responsiveness. That is, the mNIHSS tends to predict response of
patients to recombinant tissue plasminogen activator as well as the
original scale, when used in the multivariable model. Likewise, the
mNIHSS predicts likelihood of hemorrhage after recombinant tissue
plasminogen activator treatment as well as the original in the
multivariable model of symptomatic hemorrhage. Further, the power to
detect a 4-point or greater improvement by 24 hours was increased from
24% with the NIHSS to 51% with the mNIHSS. Within-patient responsiveness
could not be assessed in this study.

Floor and Ceiling Effects

Muir et al. (1996) suggested that
a potential shortcoming of the NIHSS is that because many scale items
cannot be tested in patients with very severe stroke, there may be a
ceiling effect below the theoretical limit.

Williams, Weinberger, Harris, Clark, and Biller (1999) administered
the NIHSS to patients 1 and 3 months post-stroke. A ceiling effect
of the NIHSS was observed in the upper extremity domain:
although 62% of patients reported upper extremity dysfunction 1 month
after stroke, only 11% had an NIHSS arm score > 1.

Pickard, Johnson, and Feeny (2005) compared five health-related
quality of life measures administered at baseline and at 6 months. A
notable ceiling effect was observed with the NIHSS at 6 months
(20% of patients).

References
  • Adams, H. P., Davis, P. H., Leira, E. C., Chang, K-C., Bendixen,
    B. H., Clarke, W. R., Woolson, R. F., Hansen, M. D. (1999). Baseline
    NIH Stroke Scale score strongly predicts outcome after stroke: a
    report of the Trial of Org 10172 in Acute Stroke Treatment (TOAST).
    Neurology, 53, 126-31.
  • Albanese, M. A., Clarke, W. R., Adams, H. P., Woolson, R. F.,
    and TOAST Investigators. (1994). Ensuring reliability of outcome
    measures in multicenter clinical trials of treatments for acute
    ischemic stroke. Stroke, 25, 1746-1751.
  • Albers, G. W., Bates, V. E., Clark, W. M., Bell, R., Verro, P.,
    Hamilton, S. A. (2000). Intravenous tissue-type plasminogen activator
    for treatment of acute stroke: the Standard Treatment with Alteplase
    to Reverse Stroke (STARS) study. JAMA, 283, 1145-1150.
  • Baird, A. E., Dambrosia, J., Janket, S., Eichbaum, Q., Chaves,
    C., Silver, B., Barber, P., Parsons, M., Darby, D., Davis, S. (2001).
    A three-item scale for the early prediction of stroke recovery.
    Lancet, 357, 2095-2099.
  • Berger, K., Weltermann, B., Kolominsky-Rabas, P., Meves, S.,
    Heuschmann, P., Bohner, J., Neundorfer, B., Hense, H. W., Buttner, T.
    (1999). The reliability of stroke scales. The German version of NIHSS,
    ESS and Rankin scales [German]. Fortschr Neurol Psychiatr,
    67(2), 81-93.
  • Bohannon, R. W., Lee, N., Maljanian, R. (2002). Postadmission
    function best predicts acute hospital outcomes after stroke. Am J
    Phys Med Rehabil, 81, 726-730.
  • Briggs, D. E., Felberg, R. A., Malkoff, M. D., Bratina, P.,
    Grotta, J. C. (2001). Should mild or moderate stroke patients be
    admitted to an intensive care unit? Stroke, 32, 871-876.
  • Brott, T. G., Haley, Jr., E. C., Levy, D. E., Barsan, W.,
    Broderick, J., Sheppard, G. L., Spilker, J., Kongable, G. L., Massey,
    S., Reed, R. (1992). Urgent therapy for stroke I: Pilot study of
    tissue plasminogen activator administered within 90 minutes.
    Stroke, 23, 632-640.
  • Brott, T. G., Adams, H. P., Olinger, C. P., Marler, J. R.,
    Barsan, W. G., Biller, J., Spilker, J., Holleran, R., Eberle, R.,
    Hertzberg, V., Rorick, M., Moomaw, C. J., Walker, M. (1989).
    Measurements of acute cerebral infarction: a clinical examination
    scale. Stroke, 20, 864-70.
  • Bushnell, C. D., Johnston, D. C. C., Goldstein, L. B. (2001).
    Retrospective assessment of initial stroke severity: comparison of the
    NIH Stroke Scale and the Canadian Neurological Scale. Stroke,
    32, 656-60.
  • Chang, K-C., Tseng, M-C., Tan, T-Y., Liou, C-W. (2006).
    Predicting 3-month mortality among patients hospitalized for
    first-ever acute ischemic stroke. Journal of the Formosan Medical
    Association, 105(4), 310-7.
  • DeGraba, T. J., Hallenbeck, J. M., Pettigrew, K. D., Dutka, A.
    J., Kelly, B. J. (1999). Progression in acute stroke. Value of initial
    NIH Stroke Scale on patient stratification in future trials.
    Stroke, 30, 1208-1212.
  • Demchuk, A. M., Tanne, D., Hill, M. D., Kasner, S. E., Hanson,
    S., Grond, M., Levine, S. R., The Multicentre tPA Stroke Survey Group.
    (2001). Predictors of good outcome after intravenous tPA for acute
    ischemic stroke. Neurology, 57, 474-480.
  • Dewey, H. M., Donnan, G. A., Freeman, E. J., Sharples, C. M.,
    Macdonell, R. A. L., McNeil, J. J., Thrift, A. G. (1999). Interrater
    Reliability of the National Institutes of Health Stroke Scale: Rating
    by Neurologists and Nurses in a Community-Based Stroke Incidence
    Study. Cerebrovascular Diseases, 9, 323-327.
  • Dominguez, R., Vila, J. F., Augustovski, F., Irazola, V.,
    Castillo, P. R., Escalante, R. R., Brott, T. G., Meschia, J. F.
    (2006). Spanish cross-cultural adaptation and validation of the
    National Institutes of Health Stroke Scale.
  • Derex, L., Nighoghossian, N., Hermier, M., Adeleine, P.,
    Berthezene, Y., Philippeau, F., Honnorat, J., Froment, J. C.,
    Trouillas, P. (2004). Influence of pretreatment MRI parameters on
    clinical outcome, recanalization and infarct size in 49 stroke
    patients treated by intravenous tissue plasminogen activator. J
    Neurol Sci, 225, 3-9.
  • Fink, J. N., Selim, M. H., Kumar, S., Silver, B., Linfante, I.,
    Caplan, L. R., Schlaug, G. (2002). Is the association of National
    Institutes of Health Stroke Scale scores and acute magnetic resonance
    imaging stroke volume equal for patients with right- and
    left-hemisphere ischemic stroke? Stroke, 33, 954-958.
  • Fischer, U., Arnold, M., Nedeltchev, K., Brekenfeld, C.,
    Ballinari, P., Remonda, L., Schroth, G., Mattle, H. (2005). NIHSS
    score and arteriographic findings in acute ischemic stroke. Stroke,
    36, 2121-2125.
  • Goldstein, L., Bertels, C., Davis, J. (1989). Interrater
    reliability of the NIH Stroke Scale. Arch Neurol, 46,
    660-662.
  • Goldstein, L. B., Samsa, G. P. (1997). Reliability of the
    National Institutes of Health stroke scale: Extension to
    non-neurologists in the context of a clinical trial. Stroke,
    28, 307-310.
  • Haley, E. C., Levy, D. E., Brott, T. G., Sheppard, G. L., Wong,
    M. C., Kongable, G. L., Torner, J. C., Marler, J. R. (1992). Urgent
    therapy for stroke. II: Pilot study of tissue plasminogen activator
    administered 91-180 minutes from onset. Stroke, 23,
    641-645.
  • Kasner, S. E., Chalela, J. A., Luciano, J. M., Cucchiara, B. L.,
    Raps, E. C., McGarvey, M. L., Conroy, M. B., Localio, A. R. (1999).
    Reliability and validity of estimating the NIH Stroke Scale score from
    medical records. Stroke, 30, 1534-37.
  • Kasner, S. E., Cucchiara, B. L., McGarvey, M. L., Luciano, J.
    M., Liebeskind, D. S., Chalela, J. A. (2003). Modified National
    Institutes of Health Stroke Scale can be estimated from medical
    records. Stroke, 34, 568-70.
  • Lai, S. M., Duncan, P. W., Keighley, J. (1998). Prediction of
    functional outcome after stroke. Comparison of the Orpington
    Prognostic Scale and the NIH Stroke Scale. Stroke, 29,
    1838-1842.
  • Lyden, P., Raman, R., Liu, L., Grotta, J., Broderick, J., Olson,
    S., Shaw, S., Spilker, J., Meyer, B., Emr, M., Warren, M., Marler, J.
    (2005). NIHSS training and certification using a new digital video
    disk is reliable. Stroke, 36, 2446-2449.
  • Lyden, P., Lau, G. T. (1991). A critical appraisal of stroke
    evaluation and rating scales. Stroke, 22, 1345-1352.
  • Lyden, P., Brott, T., Tilley, B., Welch, K. M., Mascha, E. J.,
    Levine, S., Haley, E. C., Grotta, J., Marler, J. (1994). Improved
    reliability of the NIH Stroke Scale using video training.
    Stroke, 25, 2220-2226.
  • Lyden, P., Lu, M., Jackson, C., Marler, J., Kothari, R., Brott,
    T., Zivin, J. (1999). Underlying structure of the National Institutes
    of Health Stroke Scale: Results of a factor analysis. Stroke,
    30, 2347-2354.
  • Lyden, P. D., Lu, M., Levine, S. R., Brott, T. G., Broderick, J.
    (2001). NINDS rtPA Stroke Study Group. A modified National Institutes
    of Health Stroke Scale for use in stroke clinical trials: preliminary
    reliability and validity. Stroke, 32, 1310-17.
  • Lyden, P., Claesson, L., Havstad, S., Ashwood, T., Lu, M.
    (2004). Factor analysis of the National Institutes of Health Stroke
    Scale in patients with large strokes. Arch Neurol, 61,
    1677-80.
  • Meyer, B. C., Lyden, P. D., Al-Khoury, L., Cheng, Y., Raman, R.,
    Fellman, R., Beer, J., Rao, R., Zivin, J. A. (2005). Prospective
    reliability of the STRokE DOC wireless/site independent telemedicine
    system. Neurology, 64, 1058-60.
  • Meyer, B. C., Hemmen, T. M., Jackson, C. M., Lyden, P. D.
    (2002). Modified National Institutes of Health stroke scale for use in
    stroke clinical trials: prospective reliability and validity.
    Stroke, 33, 1261-66.
  • Muir, K. W., Weir, C. J., Murray, G. D., Povey, C., Lees, K. R.
    (1996). Comparison of neurological scales and scoring systems for
    acute stroke prognosis. Stroke, 27, 1817-1820.
  • National Institute of Neurological Disorders and Stroke rt-PA
    Stroke Study Group (1995). Tissue plasminogen activator for acute
    ischemic stroke. N Engl J Med, 333, 1581-1587.
  • Nighoghossian, N., Hermier, M., et al. (2004). Influence of
    pretreatment MRI parameters on clinical outcome, recanalization and
    infarct size in 49 stroke patients treated by intravenous tissue
    plasminogen activator. J Neurol Sci, 225, 3-9.
  • Olinger, C. P., Adams, H. P., Brott, T. G., Biller, J., Barsan,
    W. G., Toffol, G. J., Eberle, R. W., Marler, J. R. (1990). High-dose
    intravenous naloxone for the treatment of acute ischemic stroke.
    Stroke, 21, 721-725.
  • Pickard, A. S., Johnson, J. A., Feeny, D. H. (2005).
    Responsiveness of generic health-related quality of life measures in
    stroke. Qual Life Res, 14, 207-219.
  • Rundek, T., Mast, H., Hartmann, A., Boden-Albala, B., Lennihan,
    L., Lin, I.-F., Paik, M. C., Sacco, R. L. (2000). Predictors of
    resource use after acute hospitalization: the Northern Manhattan
    Stroke Study. Neurology, 55, 1180-87.
  • Saver, J. L., Johnston, K. C., Homer, D., et al. (1999). Infarct
    volume as a surrogate or auxiliary outcome measure in ischemic stroke
    clinical trials. Stroke, 30, 293-98.
  • Schiemanck, S. K., Post, M. W. M., Witkamp, T. D., Kappelle, L.
    J., Prevo, A. J. H. (2005). Relationship between ischemic lesion
    volume and functional status in the 2nd week after middle cerebral
    artery stroke. Neurorehabil Neural Repair, 19, 133-38.
  • Schlegel, D., Kolb, S. J., Luciano, J. M., Tovar, J. M.,
    Cucchiara, B. L., Liebeskind, D. S., Kasner, S. E. (2003). Utility of
    the NIH Stroke Scale as a predictor of hospital disposition.
    Stroke, 34, 134-37.
  • Schlegel, D. J., Tanne, D., Demchuk, A. M., Levine, S. R.,
    Kasner, S. E. (2004). Multicenter rt-PA Stroke Survey Group.
    Prediction of hospital disposition after thrombolysis for acute
    ischemic stroke using the National Institutes of Health Stroke Scale.
    Arch Neurol, 61, 1061-64.
  • Schmülling, S., Grond, M., Rudolf, J., Kiencke, P. (1998).
    Training as a prerequisite for reliable use of NIH Stroke Scale
    [letter]. Stroke,