Fixing the Bias in Current State K-12 Education Rankings

Published at 2018-11-13 10:00:00

Stan Liebowitz and Matthew L. Kelly

State education rankings published by U.S. News & World Report, Education Week, and others play a prominent role in legislative debate and public discourse concerning education. These rankings are based partly on achievement tests, which measure student learning, and partly on other factors not directly related to student learning. When achievement tests are used as measures of learning in these conventional rankings, they are aggregated in a way that provides misleading results. To overcome these deficiencies, we create a new ranking of state education systems using demographically disaggregated achievement data and excluding less informative factors that are not directly related to learning. Using our methodology changes the order of state rankings considerably. Many states in New England and the Upper Midwest fall in the rankings, whereas many states in the South and Southwest score much higher than they do in conventional rankings. Furthermore, we create another set of rankings of the efficiency of education spending. In these efficiency rankings, achieving successful outcomes while economizing on education expenditures is considered better than doing so through lavish spending. These efficiency rankings cause a further increase in the rankings of southern and western states and a decline in the rankings of northern states. Finally, our regression results indicate that unionization has a powerful negative influence on educational outcomes and that, given current spending levels, additional spending has little effect. We also find no evidence of a relationship between student performance and teacher-pupil ratios or private school enrollment, but some evidence that charter school enrollment has a positive effect.
Introduction

Which states have the best K-12 education systems? What set of government policies and education spending levels is needed to achieve targeted outcomes in an efficient manner? Answers to these important questions are fundamental to the performance of our economy and country. Local workforce education and quality of schools are key determinants in business and residential location decisions. Determining which education policies are most cost-effective is also crucial for state and local politicians as they allocate limited taxpayer resources.
Several organizations rank state K-12 education systems, and these rankings play a prominent role in both legislative debate and public discourse concerning education. The most prominent are arguably those of U.S. News & World Report (U.S. News).1 It is common for activists and pundits (whether in favor of homeschooling, stronger teacher unions, core standards, etc.) to use these rankings to support their arguments for changes in policy or spending priorities. As shown by the recent competition for Amazon’s HQ2 (second headquarters), politicians and business leaders will also frequently cite education rankings to highlight their states’ advantages.2 Recent teacher strikes across the country have likewise drawn renewed attention to education policy, and journalists inevitably mention state rankings when these topics arise.3 It is therefore important to ensure that such rankings accurately reflect performance.
Though well-intentioned, most existing rankings of state K-12 education are unreliable and misleading. The most prominent and influential state education rankings fail to provide an “apples to apples” comparison between states.4 By treating states as though they had identical students, they ignore the substantial variation present in student populations across states. Conventional rankings also include data that are inappropriate or irrelevant to the educational performance of schools. Finally, these analyses disregard government budgetary constraints. Not surprisingly, using disaggregated measures of student learning, removing inappropriate or irrelevant variables, and examining the efficiency of educational spending reorders state rankings in fundamental ways. As we show in this report, employing our improved ranking methodology overturns the apparent consensus that schools in the South and Southwest perform less well than those in the Northeast and Upper Midwest. It also puts to rest the claim that more spending necessarily improves student performance.5
Many rankings, including those of U.S. News, provide average scores on tests administered by the National Assessment of Education Progress (NAEP), sometimes referred to as “the nation’s report card.”6 The NAEP reports provide average scores for various subjects, such as math, reading, or science, for students at various grade levels.7 These scores are supposed to measure the degree to which students understand these subjects. While U.S. News includes other measures of education quality, such as graduation rates and SAT and ACT college entrance exam scores, direct measures of the entire student population’s understanding of academic subject matter, such as those from the NAEP, are the most appropriate measures of success for an educational system.8 Whereas graduation is not necessarily an indication of actual learning, and only those students wishing to pursue a college degree tend to take standardized tests like the SAT and ACT, NAEP scores provide standardized measures of learning covering the entire student population. Focusing on NAEP data thus avoids selection bias while more closely measuring a school system’s ability to improve actual student performance.
However, student heterogeneity is ignored by U.S. News and most other state rankings that use NAEP data as a component of their rankings. Students from different socioeconomic and ethnic backgrounds tend to perform differently (regardless of the state they are in). As this report will show, such aggregation often renders conventional state rankings little more than a proxy for a jurisdiction’s demography. This problem is all the more unfortunate because it is so easily avoided. NAEP provides demographic breakdowns of student scores by state. This oversight substantially skews the current rankings.
Perhaps just as problematic, some education rankings conflate inputs and outputs. For instance, Education Week uses per pupil expenditures as a component in its annual rankings.9 When direct measures of student achievement are used, such as NAEP scores, it is a mistake to include inputs, such as educational expenditures, as a separate factor.10 Doing so gives additional credit to states that spend excessively to achieve the same level of success others achieve with fewer resources, when that wasteful additional spending should instead be penalized in the rankings.
Our main goal in this report is to provide a ranking of public school systems in U.S. states that more accurately reflects the learning that is taking place. We attempt to move closer to a “value added” approach, as explained in the following hypothetical. Consider one school system where every student knows how to read upon entering kindergarten. Compare this to a second school system where students don’t have this skill upon entering kindergarten. It should come as no surprise if, by the end of first grade, the first school’s students have better reading scores than the second school’s. But if the second school’s students improved more, relative to their initial situation, a value-added approach would conclude that the second system actually did a better job. The value-added approach tries to capture this by measuring improvement rather than absolute levels of education achievement. Although the ranking presented here does not directly measure value added, it captures the concept more closely than do previous rankings by accounting for the heterogeneity of students, who presumably enter the school system with different skills. Our approach is thus a better way to gauge performance.

Moreover, this report will consider the importance of efficiency in a world of scarce resources. Our final rankings will rate states according to how much learning similar students have relative to the amount of resources used to achieve it.
The Impact of Heterogeneity

Students arrive to class on the first day of school with different backgrounds, skills, and life experiences, often related to socioeconomic status. Assuming away these differences, as most state rankings implicitly do, may lead analysts to attribute too much of the variation in state educational outcomes to school systems instead of to student characteristics. Taking student characteristics into account is one of the fundamental improvements made by our state rankings.
An example drawn from NAEP data illustrates how failing to account for student heterogeneity can lead to grossly misleading results. (For a more general demonstration of how heterogeneity affects results, see the Appendix.) According to U.S. News, Iowa ranks 8th and Texas ranks 33rd in terms of pre-K-12 quality. U.S. News includes only NAEP eighth-grade math and reading scores as components in its ranking, and Iowa leads Texas in both. By further including fourth grade scores and the NAEP science tests, the comparison between Iowa and Texas remains largely unchanged. Iowa students still do better than Texas students, but now in all six tests reported for those states (math, reading, and science in fourth and eighth grades). To use a baseball metaphor, this looks like a shut-out in Iowa’s favor.
But this is not an apples-to-apples comparison. The characteristics of Texas students are very different from those of Iowa students; Iowa’s student population is predominantly white, while Texas’s is much more ethnically diverse. NAEP data include average test scores for various ethnic groups. Using the four most populous ethnic groups (white, black, Hispanic, and Asian),11 at two grade levels (fourth and eighth), and three subject-area tests (math, reading, science), there are 24 disaggregated scores that could, in principle, be compared between the two states in 2017. This is much more than just the two comparisons—eighth grade reading and math—that U.S. News considers.12
Given that Iowa students outscore their Texas counterparts on each of the three tests in both fourth and eighth grades, one might reasonably expect that most of the disaggregated groups of Iowa students would also outscore their Texas counterparts in most of the twenty exams given in both states.13 But the exact opposite is the case. In fact, Texas students outscore their Iowa counterparts in all but one of the disaggregated comparisons. The only instance where Iowa students beat their Texas counterparts is the reading test for eighth grade Hispanic students. This is indeed a near shut-out, but one in Texas’s favor, not Iowa’s.
Let that sink in. Texas whites do better than Iowa whites in each subject test for each grade level. Similarly, Texas blacks do better than Iowa blacks in each subject test and grade level. Texas Hispanics do better than Iowa Hispanics in all but one test in one grade level. Texas Asians do better than Iowa Asians in all tests that both states report in common. In what sense could we possibly conclude that Iowa does a better job educating its students than does Texas?14 We think it obvious that the aggregated data here are misleading. The only reason for Iowa’s higher overall average scores is that, compared to Texas, its student population is disproportionately composed of whites. Iowa’s high ranking is merely a statistical artifact of a flawed measurement system. When student heterogeneity is considered, Texas schools clearly do a better job educating students, at least as indicated by the performance of students as measured by NAEP data.
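A small numerical sketch makes the mechanism concrete. The scores and population shares below are hypothetical (the actual NAEP values differ), but they reproduce the Iowa/Texas pattern: one state wins every subgroup comparison while losing the aggregate.

# Hypothetical subgroup scores and shares illustrating the aggregation
# fallacy; these numbers are invented, not actual NAEP data.
subgroup_scores = {
    "Iowa":  {"white": 288, "black": 255, "Hispanic": 264, "Asian": 290},
    "Texas": {"white": 292, "black": 262, "Hispanic": 269, "Asian": 306},
}
population_shares = {
    "Iowa":  {"white": 0.88, "black": 0.04, "Hispanic": 0.06, "Asian": 0.02},
    "Texas": {"white": 0.40, "black": 0.13, "Hispanic": 0.40, "Asian": 0.07},
}

for state, scores in subgroup_scores.items():
    aggregate = sum(scores[g] * population_shares[state][g] for g in scores)
    print(f"{state}: aggregate = {aggregate:.1f}")
# Texas outscores Iowa in every subgroup, yet its aggregate (279.9)
# trails Iowa's (285.3) because Iowa's population is weighted toward
# the highest-scoring group.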
This discrepancy in scores between these two states is no fluke either. In numerous instances, state education rankings change substantially when we take student heterogeneity into account.15 The makers of the NAEP, to their credit, allow comparisons to be made for heterogeneous subgroups of the student population. However, nearly all the rankings fail to utilize these useful data to correct for this problem. This methodological oversight skews previous rankings in favor of homogeneously white states. In constructing our ranking, we will use these same NAEP data, but break down scores into the aforementioned 24 categories by test subject, grade, and ethnic group to more properly account for heterogeneity.
Importantly, we wish to make clear that our use of these four racial categories does not imply that differences between groups are in any way fixed or would not change under different circumstances. Using these categories to disaggregate students has the benefit of simplicity while also largely capturing the effects of other important socioeconomic variables that differ markedly between ethnic groups (and also between students within these groups).16 Such socioeconomic factors are related to race in complex ways, and controlling for race is common in the economic literature. In addition, by giving equal weight to each racial category, our procedure puts a greater emphasis on how well states teach each category of students than do traditional rankings, paying somewhat greater attention to how groups that have historically suffered from discrimination are faring.
A State Ranking of Learning That Accounts for Student Heterogeneity

Our methodology is to compare state scores for each of three subjects (math, reading, and science), four major ethnic groups (whites, blacks, Hispanics, and Asian/Pacific Islanders), and two grades (fourth and eighth),17 for a total of 24 potential observations in each state and the District of Columbia. We exclude factors such as graduation rates and pre-K enrollment that do not measure how much students have learned.
We give each of the 24 tests18 equal weight and base our ranking on the average of the test scores.19 This ranking is thus limited to measuring learning and does so in a way that avoids the aggregation fallacy. We refer to this as the “quality” rank.
From left to right, Table 1 shows our ranking using disaggregated NAEP scores (“quality ranking”), then how rankings would look if based solely on aggregate state NAEP test scores (“aggregated rank”), and finally the U.S. News rankings.
Table 1

State rankings using disaggregated NAEP scores



*Controls for heterogeneity; **Does not control for
heterogeneity.

Source: National Center for Education Statistics, 2017 NAEP Mathematics and Reading Assessments, https://www.nationsreportcard.gov/reading_math_2017_highlights/.
The difference between the aggregated rankings and the U.S. News rankings shows the effect of U.S. News’ use of only partial NAEP data—no fourth grade or science scores—and the inclusion of factors unrelated to learning (e.g., graduation rates). The effects are substantial.

The difference between the disaggregated quality rank (first column) and the aggregated rank (third column) shows the effects of controlling for heterogeneity—our focus in this report—which are also substantial. States with small minority population shares (defined as Hispanic or black) tend to fall in the rankings when the data are disaggregated, and states with high shares of minority populations tend to rise when the data are disaggregated.
There are substantial differences between our quality rankings and the U.S. News rankings. For example, Maine drops from 6th in the U.S. News ranking to 49th in the quality ranking. Florida, which ranks 40th in U.S. News’, jumps to 3rd in our quality ranking. Maine apparently does very well in the nonlearning components of U.S. News’ rankings; its aggregated NAEP scores would put it in 24th place, 18 positions lower than its U.S. News rank. But the aggregated NAEP scores overstate what its students have learned; Maine’s quality ranking is a full 25 positions below that. On the 10 achievement tests reported for Maine, its rankings on those tests are 46th, 45th, 48th, 37th, 41st, 40th, 34th, 40th, 41st, and 23rd. It is astounding that U.S. News could rank Maine as high as 6th, given the poor performance of both its black and white students (the only two groups reported for Maine) relative to black and white students in other states. But since Maine’s student population is approximately 90 percent white, the aggregated scores bias the results upward.
On the other hand, Florida apparently scores poorly on U.S. News’ nonlearning attributes, since its aggregated NAEP scores (ranked 16th) are much better than its U.S. News score (ranked 40th). Florida’s student population is approximately 60 percent nonwhite, meaning that the aggregate scores are likely to underestimate Florida’s education quality, which is borne out by the quality ranking. In fact, Florida gets considerably above-average scores on all but one of its 24 reported tests, with student performance on half of its tests among the top five states, which is how it is able to earn a rank of 3rd in our quality rankings.20
The decline in Maine’s ranking is representative of some other New England and midwestern states, such as Vermont, New Hampshire, and Minnesota, which tend to have largely white populations, leading to misleadingly high positions in typical rankings such as U.S. News’. The increase in Florida’s ranking mirrors gains in the rankings of other southern and southwestern states with large minority populations, such as Texas and Georgia. This leads to a serious distortion of beliefs about which parts of the country do a better job educating their students.
We should note that the District of Columbia, which is not ranked at all by U.S. News, does very well in our quality rankings. It is not surprising that D.C.’s disaggregated ranking is quite different from the aggregated ranking, given that D.C.’s population is approximately 85 percent minority. Nevertheless, we suspect that the very large change in rank is something of an aberration. D.C.’s high ranking is driven by the unusually outstanding scores of its white students, who come from disproportionately affluent and educated families,21 and whose scores were more than four standard deviations above the national white mean in each test subject they participated in (a greater difference than for any other single ethnic group in any state). Were it not for these scores, D.C. would be somewhat below average (with D.C. blacks slightly below the national black average and Hispanics considerably below their average).
Massachusetts and New Jersey, which are highly ranked by U.S. News, are also highly ranked by our methodology, indicating that they deserve their high rankings based on the performance of all their student groups. Other states have similar placements in both rankings. Overall, however, the correlation between our rankings and U.S. News’ rankings is only 0.35, which, while positive, does not evince a terribly strong relationship.
Failing to disaggregate student-performance data and inserting factors not related to learning distorts results. By construction, our measure better reflects the relative performance of each group of students in each state, as measured by the NAEP data. We believe the differences between our rankings and the conventional rankings warrant a serious reevaluation of which state education systems are doing the best jobs for their students; we hope the conventional ranking organizations will be prompted to make changes that more closely follow our methodology.
Examining the Efficiency of Education Expenditures

The overall quality of a school system is obviously of interest to educators, parents, and politicians. However, it’s also important to consider, on behalf of taxpayers, the amount of government expenditure undertaken to achieve a given level of success. For example, New York spends the most money per student ($22,232), nearly twice as much as the typical state. Yet that massive expenditure results in a rank of only 31 in Table 1. Tennessee, on the other hand, achieves a similar level of success (ranked 30th) and spends only $8,739 per student. Although the two states appear to have education systems of similar quality, the citizens of Tennessee are getting far more bang for the buck.
To show the spending efficiency of a state’s school system, Figure 1 plots per student expenditures on the horizontal axis against student performance on the vertical axis. Notice that New York and Tennessee are at approximately the same height but that New York is much farther to the right.
Figure 1

Scatterplot of per pupil expenditures and average
normalized NAEP test scores



Source: National Center for Education Statistics, 2017 NAEP Mathematics and Reading Assessments, https://www.nationsreportcard.gov/reading_math_2017_highlights/.
The most efficient educational systems are seen in the upper-left corner of Figure 1, where systems are high quality and inexpensive. The least efficient systems are found in the lower right. From casual examination of Figure 1, it appears likely that some states are not using education funds efficiently.
Because spending values are nominal—that is, not adjusted for cost-of-living differences across states—using unadjusted spending figures might disadvantage high-cost states, in which above-average education costs may reflect price differences rather than more extravagant spending. For this reason, we also calculate a ranking based on education quality per adjusted dollar of expenditure, where the adjustment controls for statewide differences in the cost of living (COL).22
The COL-adjusted rankings are probably the rankings that best reflect how efficiently states are providing education. Adjusting for COL has a large effect on high-cost states such as Hawaii, California, and D.C. Table 2 presents two spending-efficiency rankings of states that capture how well their heterogeneous students do on NAEP exams in comparison to how much the state spends to achieve those rankings. These rankings are calculated by taking a slightly revised version of the state’s z-score and dividing it by the nominal dollar amount of educational expenditure or by the COL-adjusted educational expenditure made by the state.23 These adjustments lower the rank of states like New York, which spends a great deal for mediocre performance, and increase the rank of states like Tennessee, which achieves similar performance at a much lower cost. Massachusetts and New Jersey, which impart a great deal of knowledge to their students, do so in such a costly manner using nominal values that they fall out of the top 20, although Massachusetts, having a higher cost of living, remains in the top 20 when the cost of living adjustment is made. States like Idaho and Utah, which achieve only mediocre success in imparting knowledge to students, do it so inexpensively that they move up near the top 10.
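Continuing the earlier quality-rank sketch, this calculation could look like the code below. The report divides a “slightly revised” z-score by expenditure (note 23 holds the exact revision), so as a stand-in the code simply shifts the average z-scores to be positive before dividing; that shift is our assumption, not the authors’ formula.

from typing import Optional
import pandas as pd

def efficiency_rank(state_z: pd.Series,
                    spending_per_pupil: pd.Series,
                    col_index: Optional[pd.Series] = None) -> pd.Series:
    # Shift average z-scores to be positive so the ratio is monotone
    # (an assumed stand-in for the report's "slightly revised" z-score).
    shifted = state_z - state_z.min() + 1.0
    # Optionally deflate nominal dollars by a cost-of-living index, such
    # as the Missouri Economic Research and Information Center series.
    dollars = spending_per_pupil if col_index is None else spending_per_pupil / col_index
    # More learning per dollar ranks higher (1 = most efficient).
    return (shifted / dollars).rank(ascending=False)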
Table 2

State rankings adjusted for student heterogeneity and
expenditures



*COL = cost of living.

**Using nominal dollars.

Sources: National Center for Education Statistics, 2017 NAEP Mathematics and Reading Assessments, https://www.nationsreportcard.gov/reading_math_2017_highlights/; and Missouri Economic Research and Information Center, Cost of Living Data Series 2017 Annual Average, https://www.missourieconomy.org/indicators/cost_of_living.
The top of the efficiency ranking is dominated by states in the South and Southwest. This result is quite a change from the traditional rankings.
The correlation between these spending-efficiency rankings and the U.S. News rankings drops to -0.14 and -0.06 for the nominal and COL-adjusted efficiency rankings, respectively. This drop is not surprising, since the rankings in Table 2 treat expenditures as something to be economized on, whereas the U.S. News rankings don’t consider K-12 expenditures at all (and other rankings consider higher expenditures purely as a plus factor). The correlations of the Table 1 quality rankings and Table 2 efficiency rankings, with nominal and adjusted expenditures, are 0.54 and 0.65, respectively. This indicates that accounting for the efficiency of expenditures substantially alters the rankings, although somewhat less so when the cost of living is adjusted for. This higher correlation for the COL rankings makes sense because high-cost states devoting the same share of resources as the typical state would be expected to spend above-average nominal dollars, and the COL adjustment reflects that.
Other Factors Possibly Related to Student Performance

Our data allow us to make a brief analysis of some factors that might be related to student performance in states. Our candidate factors are expenditure per student (either nominal or COL adjusted), student-teacher ratios, the strength of teacher unions, the share of students in private schools, and the share in charter schools.24 The expenditure-per-student variable is considered in a quadratic form, since diminishing marginal returns is a common expectation in economic theory.

Table 3 presents the summary statistics for these variables. The average z-score is close to zero, which is to be expected.25 Nominal expenditure per student ranges from $6,837 to $22,232, with the COL-adjusted values having a somewhat smaller range. The union strength variable is merely a ranking from 1 to 51, with 51 being the state with the most powerful union effect. The number of students per teacher ranges from a low of 10.54 to a high of 23.63. The other variables are self-explanatory.
Table 3

Summary statistics



Sources: National Center for Education Statistics, 2017 NAEP Mathematics and Reading Assessments, https://www.nationsreportcard.gov/reading_math_2017_highlights/; Missouri Economic Research and Information Center, Cost of Living Data Series 2017 Annual Average, https://www.missourieconomy.org/indicators/cost_of_living; National Center for Education Statistics, Digest of Education Statistics: 2017, Table 236.65, https://nces.ed.gov/programs/digest/d17/tables/dt17_236.65.asp?current=yes; Amber M. Winkler, Janie Scull, and Dara Zeehandelaar, “How Strong Are Teacher Unions? A State-By-State Comparison,” Thomas B. Fordham Institute and Education Reform Now, 2012; Digest of Education Statistics: 2017, Table 208.40, https://nces.ed.gov/programs/digest/d17/tables/dt17_208.40.asp?current=yes; National Center for Education Statistics, Private School Universe Survey, https://nces.ed.gov/surveys/pss/; charter school share determined by dividing the total enrollment in charter schools by the total enrollment in all public schools for each state, Digest of Education Statistics: 2017, Table 216.90, https://nces.ed.gov/programs/digest/d17/tables/dt17_216.90.asp?current=yes, and Digest of Education Statistics: 2016, Table 203.20, https://nces.ed.gov/programs/digest/d16/tables/dt16_203.20.asp; and Education Commission of the States, “50-State Comparison: Vouchers,” March 6, 2017, http://www.ecs.org/50-state-comparison-vouchers/.
We use multiple regression analysis to measure the relationship between these variables and our (dependent) variable—the average z-scores drawn from state NAEP test scores in the 24 categories mentioned above. Regression analysis can show how variables are related to one another but cannot demonstrate whether there is causality between a pair of variables, where changes in one variable lead to changes in another variable.
Table 4 provides the regression results using COL expenditures (results on the left) or using nominal expenditures (results on the right). To save space, we only include the coefficients and p-values, the latter of which, when subtracted from one, provides statistical confidence levels. Those coefficients for variables that were statistically significant are marked with asterisks (one asterisk indicates a 90 percent confidence level and two a level of 95 percent).
Table 4

Multiple regression results explaining quality of
education



Sources: National Center for Education Statistics, 2017 NAEP Mathematics and Reading Assessments, https://www.nationsreportcard.gov/reading_math_2017_highlights/; Missouri Economic Research and Information Center, Cost of Living Data Series 2017 Annual Average, https://www.missourieconomy.org/indicators/cost_of_living; National Center for Education Statistics, Digest of Education Statistics: 2017, Table 236.65, https://nces.ed.gov/programs/digest/d17/tables/dt17_236.65.asp?current=yes; Amber M. Winkler, Janie Scull, and Dara Zeehandelaar, “How Strong Are Teacher Unions? A State-By-State Comparison,” Thomas B. Fordham Institute and Education Reform Now, 2012; Digest of Education Statistics: 2017, Table 208.40, https://nces.ed.gov/programs/digest/d17/tables/dt17_208.40.asp?current=yes; National Center for Education Statistics, Private School Universe Survey, https://nces.ed.gov/surveys/pss/; charter school share determined by dividing the total enrollment in charter schools by the total enrollment in all public schools for each state, Digest of Education Statistics: 2017, Table 216.90, https://nces.ed.gov/programs/digest/d17/tables/dt17_216.90.asp?current=yes, and Digest of Education Statistics: 2016, Table 203.20, https://nces.ed.gov/programs/digest/d16/tables/dt16_203.20.asp; and Education Commission of the States, “50-State Comparison: Vouchers,” March 6, 2017, http://www.ecs.org/50-state-comparison-vouchers/.
The choice of nominal vs. COL expenditures leads to a large difference in the results. The COL-adjusted results are likely to lead to a greater number of correct conclusions.
Nominal expenditures per student are related in a positive and statistically significant manner to student performance up to a point, but the positive effect of expenditures per student declines as expenditures per student increase. The coefficients on the two expenditure-per-student variables indicate that additional nominal spending is no longer related to performance when nominal spending gets to a level of $18,500 per student, a level that is exceeded by only a handful of states.26
The predicted decline in student performance for the few states exceeding the $18,500 limit, assuming causality from spending to performance, is quite small (approximately two rank positions for the state with the largest expenditure),27 so this evidence is best interpreted as supporting a view that the states with the highest spending have reached a saturation point beyond which no more gains can be made.28
Using COL-adjusted values, however, starkly changes the results. With COL values, no significant relationship is found between spending and student performance, either in magnitude or statistical significance. This does not necessarily imply that spending overall has no effect on outcomes (assuming causality), but merely that most states have reached a sufficient level of spending such that additional spending does not appear to be related to achievement as measured by these test scores. This is a different conclusion from that based on nominal expenditures. These different results imply that care must be taken, not just to ensure that achievement test scores are disaggregated in analyses of educational performance, but also that if expenditures are used in such analyses, they are adjusted for cost-of-living differentials.
The union strength variable in Table 4 has a substantial and statistically significant negative relationship with student achievement. The coefficient in the nominal expenditure regressions suggests a relationship such that if a state went from having the weakest unions to the strongest unions, holding the other education factors constant, that state would have a decrease in its z-score of over 1.22 (0.024 × 51). To put this in perspective, note in Table 3 that the z-scores vary from a high of 1.22 to a low of -1.51, a range of 2.73. Thus, the shift from weakest to strongest unions would move a state approximately 45 percent of the way through this total range or, equivalently, alter the rank of the state by approximately 23 positions.29 This is a dramatic result. The COL regressions also show a large relationship, but it is only approximately half the magnitude of the coefficient in the nominal expenditure regressions. This negative relationship suggests an obvious interpretation. It is well known that teachers’ unions aim to increase wages for their members, which may increase student performance if higher quality teachers are drawn to the higher salaries. Such a hypothesis is inconsistent with the finding here, which is instead consistent with the view that unions are negatively related to student performance, presumably by opposing the removal of underperforming teachers, opposing merit-based pay, or because of union work rules. While much of the empirical literature finds positive relationships between unionization and student performance, studies that most effectively control for heterogeneous student populations, as we have, tend to find more negative relationships, such as those found here.30
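The back-of-the-envelope arithmetic behind those figures, using only the numbers quoted above:

\Delta z = 0.024 \times 51 \approx 1.22, \qquad \text{range of } z = 1.22 - (-1.51) = 2.73,

\frac{1.22}{2.73} \approx 0.45, \qquad 0.45 \times 51 \approx 23 \text{ rank positions}.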
Our results also indicate that having a greater share of students in charter schools is positively related to student achievement, with the result being statistically significant in the COL regressions but not in the nominal expenditure regressions. The size of the relationship is quite small, however, indicating, if the relationship were causal, that when a state increases its share of students in charter schools from 0 to 50 percent (slightly above the level of the highest observation), it would be expected to have an increase in rank of only 0.9 positions (0.5 × 1.8) in the COL regression and approximately half of that in the nominal expenditure regressions (where the coefficient is not statistically significant).31 Given that there is considerable heterogeneity in charter schools both within and between states, it is not surprising that our rather simple statistical approach does not find much of a relationship.
We also find that the share of students in private schools has a small negative relationship with the performance of students in public schools, but the level of statistical confidence is far too low for these results to be given any credence. (Although private school students take the NAEP exam, the NAEP data we use are based only on public school students.) Similarly, the existence of vouchers appears to have a negative relationship to achievement, but the high p-values tell us we cannot have confidence in those results.
There is some slight evidence, based on the COL regression, that higher student-teacher ratios have a small negative relationship with student performance, but the level of statistical confidence is below normally accepted levels. Though having more students per teacher is theorized to be negatively related to student performance, the empirical literature largely fails to find consistent effects of student-teacher ratios and class size on student performance.32 We should not be too surprised that student-teacher ratios do not appear to have a clear relationship with learning, since the student-teacher ratios used here are aggregated for entire states, merging together many different classrooms in elementary, middle, and high schools.
Some Limitations

Although this study constitutes a significant improvement on leading state education rankings, it retains some of their limitations. If the makers of state education rankings were to be frank, they would acknowledge that the entire enterprise of ranking state-level systems is only a blunt instrument for judging school quality. There exists substantial variation in educational quality within states. Schools differ from district to district and within districts. We generally dislike the idea of painting the performance of all schools in a given state with the same brush. However, state-level rankings do provide an intuitively pleasing basis for lawmakers and interested citizens to compare state education policies. Because state rankings currently play such a prominent role in the public debate on education policy, their more glaring methodological defects detailed above demand rectification. Any state ranking is nonetheless limited by the aggregation inherent in the state-level unit of analysis.
Another limitation to our study, common to virtually all state education rankings, is that we treat the result of education as a one-dimensional variable. Of course, educational results are multifaceted and more complex than a single measure could capture. A standardized test may not pick up potentially important qualities such as creativity, critical thinking, or grit. Part of the problem is that there is no accepted measurement of those attributes.

We also are using a data snapshot that reflects measures of learning at a particular moment in time. However, the performance of students at any grade level depends on their education at all prior grade levels. A ranking of states based on student performance is the culmination of learning over a lengthy time period. An implicit assumption in creating such rankings is that the quality of various school systems changes slowly enough for a snapshot in one year to convey meaningful information about the school system as it exists over the entire interval in which learning occurred. This assumption allows us to attribute current or recent student performance, which is largely based on past years of teaching, to the teaching quality currently found in these schools. This assumption is present in most state rankings but may obscure sudden and significant improvement, or deterioration, in student knowledge that occurs in discrete years.
Conclusions

While the state level may be too aggregated a unit of analysis for the optimal examination of educational outcomes, state rankings are frequently used and discussed. Whether based appropriately on learning outcomes or inappropriately on nonlearning factors, comparisons between states greatly influence the public discourse on education. When these rankings fail to account for the heterogeneity of student populations, however, they skew results in favor of states with fewer socioeconomically challenged students.
Our ranking corrects these problems by focusing on outputs and the value added to each of the demographic groups the state education system serves. Furthermore, we consider the cost-effectiveness of education spending in U.S. states. States that spend efficiently should be recognized as more successful than states paying larger sums for similar or worse outcomes.
Adjusting for the heterogeneity of students has a powerful effect on the assessments of how well states educate their students. Certain southern and western states, such as Florida and Texas, have much better student performances than appears to be the case when student heterogeneity is not taken into account. Other states, such as Maine and Rhode Island in New England, fall substantially. These results run counter to the conventional wisdom that the best education is found in northern and eastern states with powerful unions and high expenditures.
This difference is even more pronounced when spending efficiency, a factor generally neglected in conventional rankings, is taken into account. Florida, Texas, and Virginia are seen to be the most efficient in terms of quality achieved per COL-adjusted dollar spent. Conversely, West Virginia, Alabama, and Maine are the least efficient. Some states that do an excellent job educating students, such as Massachusetts and New Jersey, also spend quite lavishly and thus fall considerably when spending efficiency is considered.
Finally, we examine some factors thought to influence student performance. We find evidence that state spending appears to have reached a point of zero returns and that unionization is negatively related to student performance, and some evidence that charter schools may have a small positive relationship to student achievement. We find little evidence that class size, vouchers, or the share of students in private schools have measurable effects on state performance.
Which state education systems are worth emulating and which are not? The conventional answer to this question deserves to be reevaluated in light of the results presented in this report. We hope that our rankings will better inform pundits, policymakers, and activists as they seek to improve K-12 education.
Appendix

Conventional education-ranking methodologies based on NAEP achievement tests are likely to skew results. In this Appendix, we provide a simple example of how and why that happens.

Our example assumes two types of students and three types of schools (or state school systems). The two columns on the right in appendix Table 1 denote different types of student, and each row represents a different school. School B is assumed to be 10 percent better than School A, and School C is assumed to be 20 percent better than School A, regardless of the student type being educated.
There are two types of students; S2 students are better prepared than S1 students. Students of the same type score differently on standard exams depending on which school they are in, but the two student types also perform differently from each other no matter which school they attend. Depending on the proportions of each type of student in a given school, a school’s rank may vary substantially if the erroneous methodology is used.
Table 1

Example of students and scores



Source: Author calculations.
An informative ranking should reflect each school’s relative performance, and the scores on which the rankings are based should reflect the 10 percent difference between School A and School B, and the 20 percent difference between School A and School C. Obviously, a dependable ranking mechanism should place School A in 3rd place, B in 2nd, and C in 1st.
However, problems arise for the typical ranking procedure when schools have different proportions of student types. The appendix Table 2 shows results from a typical ranking procedure under two different population scenarios.

School ranking 1 shows what happens when 75 percent of School A’s students are type S2 and 25 percent are type S1; School B’s students are split 50-50 between types S1 and S2; and School C’s students are 75 percent type S1 and 25 percent type S2.33
Because School A has a disproportionately large share of the stronger S2 students, it scores above the other two schools even though School A is the weakest school. Ranking 1 totally inverts the correct ranking of schools. This example, detailed in appendix Table 2, demonstrates how rankings that do not take the heterogeneity of students and the proportions of each type of student in each school into account can give entirely misleading results.
Table 2

Rankings not accounting for heterogeneity




Source: Author calculations.
Conversely, school ranking 2 reverses the student populations of schools A and C. School C now also has more of the strongest students. The rankings are correctly ordered, but the underlying data used for the rankings greatly exaggerate the superiority of School C. Comparing the scores of the three schools, School B appears to be 32 percent better than School A and School C appears to be 68 percent better than School A, even though we know (by construction) that the true values are 10 percent and 20 percent, respectively. School ranking 2 only happens to get the order right because there are no intermediary schools whose rankings would be improperly altered by the exaggerated scores of schools A and C in ranking 2.
The ranking methodology used in this paper, by contrast, compares each school for each type of student separately. It measures quality by looking at the numbers in appendix Table 1 and noting that each type of student at School B scores 10 percent higher than the same type of student at School A, and each type of student at School C scores 20 percent higher than the same type of student at School A. That is what makes our methodology conceptually superior to prior methodologies. If all schools happened to have the same share of different types of students, a possibility not shown in appendix Table 2, the conventional ranking methodology used by U.S. News would work as well as our rankings. But our analysis in this paper has shown that schools and school systems in the real world have very different student populations, which is why our rankings differ so much from previous rankings. Our general methodology isn’t just hypothetically better under certain demographic assumptions; rather, it is better under any and all demographic circumstances.
Notes

1. “Pre-K-12 Education Rankings: Measuring How Well States Are Preparing Students for College,” U.S. News & World Report, May 18, 2018, https://www.usnews.com/news/best-states/rankings/education/preK-12. Others include those by Wallet Hub, Education Week, and the American Legislative Exchange Council.

2. Govs. Phil Murphy of New Jersey and Greg Abbott of Texas recently sparred over the virtues and vices of their state business climates, including their education systems, in a pair of newspaper articles. Greg Abbott, “Hey, Jersey, Don’t Move to Fla. to Avoid High Taxes, Come to Texas. Love, Gov. Abbott,” Star-Ledger, April 17, 2018, http://www.nj.com/opinion/index.ssf/2018/04/hey_jersey_dont_move_to_fla_to_avoid_high_taxes_co.html; and Phil Murphy, “NJ Gov. Murphy to Texas Gov. Abbott: Back Off from Our People and Companies,” Dallas Morning News, April 18, 2018, https://www.dallasnews.com/opinion/commentary/2018/04/18/nj-gov-murphy-texas-gov-abbott-back-people-companies.

3. Bryce Covert, “Oklahoma Teachers Strike for a 4th Day to Protest Rock-Bottom Education Funding,” Nation, April 5, 2018.

4. We are aware of an earlier discussion by Dave Burge in a March 2, 2011, posting on his “Iowahawk” blog, discussing the mismatch between state K-12 rankings with and without accounting for heterogeneous student populations, http://iowahawk.typepad.com/iowahawk/2011/03/longhorns-17-badgers-1.html. A 2015 report by Matthew M. Chingos, “Breaking the Curve,” https://www.urban.org/research/publication/breaking-curve-promises-and-pitfalls-using-naep-data-assess-state-role-student-achievement, published by the Urban Institute, is a more complete discussion of the problems of aggregation and presents on a separate webpage updated rankings of states that are similar to ours, but it does not discuss the nature of the differences between its rankings and the more traditional rankings. Chingos uses more controls than just ethnicity, but the additional controls have only minor effects on the rankings. He also uses the more complete “restricted use” data set from the National Assessment of Education Progress (NAEP), whereas we use the less complete but more readily available public NAEP data. One advantage of our analysis, in a society obsessed with STEM proficiency, is that we use the science test in addition to math and reading, whereas Chingos only uses math and reading.

5. For a recent example of the spending hypothesis, see Paul Krugman, “We Don’t Need No Education,” New York Times, April 23, 2018. Krugman approvingly cites California and New York as positive examples of states that have considerably raised teacher pay over the last two decades, implying that such states would do a better job educating students. As noted in this paper, both states rank below average in educating their students.

6. We assume, as do other rankings that use NAEP data, that the NAEP tests assess student performance on material that students should be learning and therefore reflect the success of a school system in educating its students. It is of course possible that standardized tests do not correctly measure educational success. This would be a particular problem if some schools alter their teaching to focus on doing well on those tests while other schools do not. We think this is less of a problem for NAEP tests because most grades and most teachers are not included in the sample, meaning that when teacher pay and school funding are tied to performance on standardized tests, they will be tied to tests other than NAEP.

7. Since 1969, the NAEP test has been administered by the National Center for Education Statistics within the U.S. Department of Education. Results are released annually as “the nation’s report card.” Tests in several subjects are administered to 4th, 8th, and sometimes 12th graders. Not every state is given every test in every year, but all states take the math and reading tests at least every two years. The National Assessment Governing Board determines which test subjects will be administered each year. In the analysis below, we use the most recent data for math and reading tests, from 2017; the science test is from 2015. NAEP tests are not given to every student in every state; rather, results are drawn from a sample. Tests are given to a sample of students within each jurisdiction, selected at random from schools chosen so as to reflect the overall demographic and socioeconomic characteristics of the jurisdiction. Roughly 20-40 students are tested from each selected school. In a combined national and state sample, there are approximately 3,000 students per participating jurisdiction from approximately 100 schools. NAEP 8th grade test scores are a component of U.S. News’ state K-12 education rankings, but are highly aggregated.

8. As direct measures of student learning for the entire student body, NAEP scores should form the basis of any state rankings of education. Nevertheless, rankings such as U.S. News’ include not only NAEP scores, but other variables that do not measure learning, such as graduation rates and pre-K education quality/enrollment, and ACT/SAT scores, which measure learning but are not, in many cases, taken by all students in a state and are likely to be highly correlated with NAEP scores. We believe that these other measures do not belong in a ranking of state education quality.

9. “Quality Counts 2018: Grading the States,” Education Week, January 2018, https://www.edweek.org/ew/collections/quality-counts-2018-state-grades/index.html. The three broad components used in this ranking include “chance for success,” “state finances,” and “K-12 achievement.”

10. Informed by such rankings, it’s no wonder the public debate on education typically assumes more spending is always better, even in the absence of corresponding improvements in student outcomes.

11. NAEP data also include scores for the ethnic categories “American Indian/Native Alaskan,

Warning: Unknown: write failed: No space left on device (28) in Unknown on line 0 Warning: Unknown: Failed to write session data (files). Please verify that the current setting of session.save_path is correct (/tmp) in Unknown on line 0