https://nickouellet.github.io

Nick Ouellet and Garrett Gilliom

Investigating K-12 Academic Performance in Louisiana and New Jersey

Project Goals

The team, Nick Ouellet and Garrett Gilliom, will be investigating data sets related to public school districts that measure student, faculty, and staff demographics, school district academic performance and finances, and more. So far, the team has come across two education-based data sets for Louisiana's parishes and many more for New Jersey school districts. Collectively, the data measure different variables and attributes of each state's school districts across the same year; the data are also well-populated and taken from each state's Department of Education, which is a reputable source.

The following tutorial will consider and investigate the following questions:

Louisiana: An Initial Investigation

First, the team will bring in, clean up, review, and explore data related to public school districts in Louisiana. Financial information, demographics, and academic performance metrics will be the primary focus of the team's analysis.

Louisiana Data: Finances

This first table is from the U.S. Department of Education's National Center for Education Statistics Common Core of Data (CCD). The website lets users query the database for customized tables; the team opted to focus on 2016-2017 data on Louisiana public school districts, each of which is its own observation and row, and requested nearly all available variables for each district, including diversity levels within the district, pupil/teacher ratio, total revenue from different sources, and expenditures. The team chose this data set because it provides data on many points and variables that may be relevant upon further investigation; the years chosen were also found to be relatively well-populated and up-to-date.

This data table includes not only parish (the equivalent of a district) information but also information about each school. Because we only wish to study parish data, we will drop the school-level columns.

We are only studying Louisiana schools, so we will drop the columns that give us redundant data (namely 'State Name [District] Latest available year' and 'State Name [District] 2016-17'). We will also drop the 'Agency Name [District]' column because that information is already represented, and 'Agency Type [District] 2016-17' because we know we are only dealing with districts.
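As a rough sketch of this step, the four columns named above can be dropped in one call. The frame below is a tiny invented stand-in for the CCD export; the name `la_finances`, the 'Parish' and 'Total Revenue' columns, and all values are assumptions made for illustration.

```python
import pandas as pd

# Invented stand-in for the CCD export (values are not real data).
la_finances = pd.DataFrame({
    "Parish": ["ACADIA PARISH", "ALLEN PARISH"],
    "Agency Name [District]": ["ACADIA PARISH", "ALLEN PARISH"],
    "State Name [District] Latest available year": ["Louisiana", "Louisiana"],
    "State Name [District] 2016-17": ["LOUISIANA", "LOUISIANA"],
    "Agency Type [District] 2016-17": ["Regular district", "Regular district"],
    "Total Revenue": [100.0, 50.0],
})

# Columns the text identifies as redundant for a Louisiana-only, district-only study.
redundant = [
    "State Name [District] Latest available year",
    "State Name [District] 2016-17",
    "Agency Name [District]",
    "Agency Type [District] 2016-17",
]
la_finances = la_finances.drop(columns=redundant)
print(la_finances.columns.tolist())
```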

Now we will rename certain columns to remove excess wording.

Now, we will remove the "PARISH" values from each row entry under the Parish column and also return the entries as lower case.
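One way this cleanup might look with pandas string methods is sketched below; the frame name `la_finances` and the sample parish names are assumptions for illustration.

```python
import pandas as pd

# Invented sample rows standing in for the real 'Parish' column.
la_finances = pd.DataFrame({"Parish": ["ACADIA PARISH", "EAST BATON ROUGE PARISH"]})

la_finances["Parish"] = (
    la_finances["Parish"]
    .str.replace("PARISH", "", regex=False)  # drop the literal suffix
    .str.strip()                             # remove the leftover trailing space
    .str.lower()                             # normalize case for later merging
)
print(la_finances["Parish"].tolist())
```

Normalizing the names this way makes the parish column usable as a join key against other tables that spell parish names differently.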

Let's now replace every entry that marks missing data with NaN.
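A minimal sketch of this replacement follows. The placeholder symbols in `placeholders` are an assumption about how the export marks missing values, and the column and its values are invented; the actual markers should be checked against the downloaded file.

```python
import numpy as np
import pandas as pd

# Invented sample column; the symbols below are assumed missing-data markers.
df = pd.DataFrame({"Pupil/Teacher Ratio": ["15.2", "†", "–"]})

placeholders = ["†", "–", "‡"]  # assumption: adjust to match the real export
df = df.replace(placeholders, np.nan)
print(df["Pupil/Teacher Ratio"].isna().tolist())
```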

Checking the dtypes, we find that several columns are not stored with the appropriate type, so we convert them.
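The conversion could be sketched as below, assuming the numeric columns were read in as strings. The column names and values are hypothetical.

```python
import pandas as pd

# Invented sample: numeric columns that arrived as text.
df = pd.DataFrame({
    "Parish": ["acadia", "allen"],
    "Total Revenue": ["1000", "2500"],
    "Enrollment": ["900", None],
})

numeric_cols = ["Total Revenue", "Enrollment"]  # hypothetical column names
# errors="coerce" turns anything unparseable into NaN instead of raising.
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric, errors="coerce")
print(df.dtypes)
```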


Louisiana Data: Academic Performance

The second dataset comes from the Louisiana Department of Education and includes data relating to measurements of student performance, including a “Letter Grade,” Assessment Index/Index ACT, strength of diploma, cohort graduation rate index, and progress points. The following data can be downloaded by clicking on the "2017 District Performance Scores" link under "District Performance Scores."

Each school district is, again, its own observation, which makes school districts the unit of observation across both this and the former dataset. Again, this dataset comes from a reputable source, the Louisiana Department of Education; the data set also flags measurements that may be incomplete or unreliable. Nearly every cell is filled in, so little other cleaning will be needed. This dataset may provide insight into the following questions:

Rename the 'District' column to 'Parish', lowercase all parish names, and remove "Parish" from the values.

We have too much information right now; let's drop some unnecessary columns to declutter the dataframe.

As of now, the ACT assessment index does not mean much on its own. We will translate the ACT index score to an average score reflective of the actual ACT scores achieved by students. The scale is as follows (from louisianabelieves.com):

ACT Score -> ACT Index Score
0-17 -> 0.0
18 -> 70.0
19 -> 80.0
20 -> 90.0
21 -> 100.0
22 -> 103.4
23 -> 106.8
24 -> 110.2

Since our highest score is a 106.0, we won't need to convert to a score higher than a 22 according to the table, and we will convert the lowest score to a 70.
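The notebook's exact conversion is not shown here, so the following is only one plausible reading of the table: each district's index is mapped to the highest ACT score whose index value it meets or exceeds, and anything below 70.0 is treated as 70.0. The function name `act_from_index` and the round-down rule are assumptions.

```python
from bisect import bisect_right

# Index thresholds and matching ACT scores, copied from the table above
# (only up to 22, since the highest observed index is 106.0).
INDEX_BREAKS = [70.0, 80.0, 90.0, 100.0, 103.4]
ACT_SCORES = [18, 19, 20, 21, 22]

def act_from_index(index_score):
    """Map an ACT index score to an ACT score by rounding down to a table row."""
    clamped = max(index_score, INDEX_BREAKS[0])        # floor low values at 70.0
    pos = bisect_right(INDEX_BREAKS, clamped) - 1      # last threshold <= clamped
    return ACT_SCORES[pos]

print(act_from_index(106.0), act_from_index(85.0), act_from_index(65.0))
```

Rounding down is a conservative choice; interpolating between rows would give fractional ACT scores instead.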

The Final Louisiana Data set:

At this point, the data from both sets have been collected, loaded, and cleaned up. To perform an analysis on these data, the sets will be merged for future ease of use and processing. This results in one large data set that contains demographic, financial, and academic performance information on every parish (school district) in the state.
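A minimal sketch of the merge, assuming both tables share a cleaned 'Parish' column. The frame names `la_finances` and `la_scores` and all values are invented; `la_df` is the name the later text uses for the combined Louisiana data.

```python
import pandas as pd

# Invented stand-ins for the two cleaned Louisiana tables.
la_finances = pd.DataFrame({"Parish": ["acadia", "allen"],
                            "Total Revenue": [100.0, 50.0]})
la_scores = pd.DataFrame({"Parish": ["acadia", "allen"],
                          "Letter Grade": ["B", "C"]})

# validate="one_to_one" raises if a parish appears twice in either table.
la_df = la_finances.merge(la_scores, on="Parish", how="inner",
                          validate="one_to_one")
print(la_df)
```

An inner join silently drops parishes missing from either table, so comparing row counts before and after the merge is a useful sanity check.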

Louisiana Exploration and Analysis: Finances, ACT Score, and Graduation Rate

Our intuition would tell us that the higher-ranked schools would have a higher Revenue per Pupil, but upon further inspection we don't always see this to be true. While the median is highest in the highest-ranked schools, we can see that the "B" rank schools actually have the lowest Revenue per Pupil. Let's look at data that would indicate a successful education, namely ACT score.
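The per-grade medians discussed above can be computed with a group-by. This is a sketch on invented numbers; the column names 'Letter Grade' and 'Revenue per Pupil' are assumed to match the merged Louisiana table.

```python
import pandas as pd

# Invented values chosen only to illustrate the computation.
la_df = pd.DataFrame({
    "Letter Grade": ["A", "A", "B", "B", "C"],
    "Revenue per Pupil": [14000.0, 12000.0, 10000.0, 11000.0, 12500.0],
})

median_by_grade = la_df.groupby("Letter Grade")["Revenue per Pupil"].median()
print(median_by_grade)
```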

It seems that the highly rated schools produce higher ACT scores in general. Let's plot this information, along with graduation rates and total revenues.

These tables show that bringing in more revenue does not always mean a better education. For example, a school district exists with a C ranking and low ACT score that also brings in the most revenue in the entire state. Furthermore, we cannot say much in terms of Revenue per Pupil and ACT score, either, as they do not seem to correlate much at all.

Let's now explore a different academic performance metric, graduation rate, and a more successful school system, the New Jersey public school system, to determine whether other variables reveal any relationships.


New Jersey: a Basis for Comparison

Let's take a look at education data from another state, New Jersey, which is consistently ranked highly by reputable sources for its K-12 public education system. By investigating the relative differences between New Jersey's and Louisiana's education systems, explanations for why Louisiana underperforms may be found.

Similar to the Louisiana data sets, we'll look into district demographics, financial information, and performance statistics to examine the attributes across school districts.

2016-2017 financial data for New Jersey public schools will be used for analysis and taken from the website of New Jersey's Department of Education. The data sets, along with a data layout file, can be found here.

New Jersey Data Collection: Revenue

Let's first take a look at school district revenue, or how much money each school district in New Jersey and Louisiana is bringing in from the local, state, and federal levels.

We would expect that, on average, New Jersey public schools have more funding, as its schools are consistently rated highly compared to other states'.
We will first clean the data, as the revenue for each district is split among many rows.
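When revenue is spread across several rows per district, one way to collapse it is to group by district and sum. The sketch below uses invented rows; the names `nj_revenue_rows`, 'District', 'Source', and 'Amount' are assumptions, not the actual layout of the New Jersey file.

```python
import pandas as pd

# Invented long-format rows: one row per district and revenue source.
nj_revenue_rows = pd.DataFrame({
    "District": ["Absecon", "Absecon", "Newark", "Newark"],
    "Source": ["Local", "State", "Local", "State"],
    "Amount": [10.0, 5.0, 100.0, 300.0],
})

# One row per district with the summed revenue.
nj_revenue = (
    nj_revenue_rows.groupby("District", as_index=False)["Amount"].sum()
    .rename(columns={"Amount": "Total Revenue"})
)
print(nj_revenue)
```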

Much more neat and meaningful.

From this, we can determine both the total revenue of New Jersey public schools and the average revenue across its school districts. Let's look at these numbers next to the same statistics for Louisiana and its school districts to see how they line up.

From this, we can see that although New Jersey receives nearly 4x more revenue in total than Louisiana does, Louisiana school districts receive more funding, on average, than those in New Jersey. This is because New Jersey has far more school districts than Louisiana does, nearly 10x as many.

When we view histograms of each, we see that both states have a few outliers receiving a lot of funding, with most receiving a much lower portion:

By examining these distributions as a box plot, we can get a better sense of how the data are distributed:

From this, we can tell that Louisiana's funding is spread out better than New Jersey's, as New Jersey's box plot reveals many outliers. Let's look at revenue per pupil, next:

Here we see that New Jersey brings in far more revenue per pupil than Louisiana does, about 6.5x more. To better view this distribution, let's perform a log transform:
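A log transform compresses a long right tail so that the bulk of the districts is easier to see. The sketch below uses base 10, which is an assumption since the text does not say which base was used; the values are invented.

```python
import numpy as np
import pandas as pd

# Invented revenue-per-pupil values spanning several orders of magnitude.
revenue_per_pupil = pd.Series([1_000.0, 10_000.0, 100_000.0],
                              name="Revenue per Pupil")

# Base-10 log: each unit step is a tenfold change in revenue per pupil.
log_revenue = np.log10(revenue_per_pupil)
print(log_revenue.tolist())
```

Note that the transform is undefined for zero or negative values, so such rows would need to be dropped or handled before plotting.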

From the above, we can see that the average revenue per pupil is much higher for New Jersey school districts than for Louisiana school districts. The two metrics we've looked at, average total revenue and average total revenue per pupil, indicate opposing results. Let's incorporate a metric of academic performance or success, such as graduation rate, to determine whether either of these correlates with academic performance.

New Jersey vs. Louisiana Exploration and Analysis: Revenue and Graduation Rate

New Jersey enrollment and graduation rate data, as published by New Jersey's Department of Education, can be found here under "2016 Adjusted Cohort 4 year Graduation rates (Excel)"

We can see that the average 4 year graduation rate for New Jersey school districts is about 10% higher than that of Louisiana school districts. We'll create scatter plots to test whether a correlation between graduation rate and revenue statistics exists for both New Jersey and Louisiana schools.

From these two tables, we'll be able to plot points based on graduation rates and total revenue, enrollment, and revenue per pupil. To get a better sense of the distribution, we'll be limiting the New Jersey data set based on total revenue, as we've seen that a few isolated outliers exist that make it difficult to differentiate between the rest of the points.

From the above, we can see that not much of a correlation exists between revenue and 4-year graduation rates for school districts in either state. The size of each point corresponds to the number of enrolled students within each school district, which was added for greater context when looking at the distribution of school districts.
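A visual impression like this can be supplemented with a Pearson correlation coefficient. This is an optional check rather than something the analysis above reports; the column names and values below are invented.

```python
import pandas as pd

# Invented district rows; real values would come from the merged state tables.
districts = pd.DataFrame({
    "Total Revenue": [10.0, 20.0, 30.0, 40.0, 50.0],
    "Graduation Rate": [88.0, 91.0, 85.0, 90.0, 87.0],
})

# Pearson r lies in [-1, 1]; values near 0 suggest little linear relationship.
r = districts["Total Revenue"].corr(districts["Graduation Rate"])
print(round(r, 3))
```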

Next, let's test revenue per pupil:

These plots, too, produce little to no evidence of a correlation between graduation rates and revenues per pupil. Similar to before, the size of each point corresponds to the size of that school district's student population.

New Jersey Data Collection: Appropriations, Expenditures, and Spending

So far, the team has found no relevant relationship between funding and academic performance of school districts. Next, the team will investigate whether appropriations and spending, rather than funding, have any sort of relationship with academic performance. Instruction materials and resources, teacher salaries, and other forms of education expenditures will be looked into.

By looking at this data closely, it can be seen that a few observations exist without an account. These can be attributed to subtotals that should not be considered a part of the overall summation of monies. Let's filter our data to only work with observations with an actual account number related to each expense.
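Filtering out the subtotal rows might look like the sketch below, which keeps only rows with a non-missing account. The frame name `nj_approp_rows`, the column names, and the account codes are hypothetical.

```python
import pandas as pd

# Invented rows: the middle row is a subtotal with no account number.
nj_approp_rows = pd.DataFrame({
    "District": ["Absecon", "Absecon", "Absecon"],
    "Account": ["A-100", None, "A-200"],  # hypothetical account codes
    "Amount": [200.0, 500.0, 300.0],
})

# Keep only line items tied to an account so subtotals are not double counted.
with_account = nj_approp_rows[nj_approp_rows["Account"].notna()]
print(with_account["Amount"].sum())
```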

Now that a proper, well-organized data set of appropriations measurements for each school district in New Jersey exists, it can be added to the team's primary data set, nj_finances, for future analysis.

Finally, before the team can proceed with analysis across both states' data, a column in the Louisiana data set, la_df, must be created to tally total appropriations, or expenditures, as a combination of salary and instruction spending.
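A sketch of that tally is shown below. The two expenditure column names are assumptions, and the values are invented; `min_count` is used so that a district missing either component gets NaN rather than a misleading partial total.

```python
import pandas as pd

# Invented Louisiana rows; the second parish is missing instruction spending.
la_df = pd.DataFrame({
    "Parish": ["acadia", "allen"],
    "Salary Expenditures": [100.0, 80.0],
    "Instruction Expenditures": [50.0, None],
})

parts = ["Salary Expenditures", "Instruction Expenditures"]  # hypothetical names
# Require both components to be present; otherwise the total is NaN.
la_df["Total Appropriations"] = la_df[parts].sum(axis=1, min_count=len(parts))
print(la_df)
```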

New Jersey vs. Louisiana Exploration and Analysis: Appropriations and Graduation Rate

At this point, we've successfully loaded in and processed total appropriations numbers for the New Jersey and Louisiana school districts that had available data. Next, let's take a look at how this feature compares across the two states' school districts.

From this data, we can see that NJ spends much more money in total, which makes sense considering how many more school districts it has. However, what's interesting is that LA school districts spend, on average, about twice as much as NJ school districts do. This reflects what we found earlier: that LA school districts bring in about 3 times more revenue per school district, on average, than those in NJ. These differences between the statistics are likely a result of the presence of massive outliers within the New Jersey school districts, as seen in the scatter plots throughout this analysis.

To explore further, let's investigate appropriations and revenue at the same time by computing their difference for each school district; this will be represented by a new "Monetary Difference" value. We'll be working with a slightly smaller data set for NJ, as not every school district could be accounted for during data work and processing.
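The new value can be sketched as revenue minus appropriations, so that a positive number means more money came in than was spent. The column names other than "Monetary Difference" and all values are assumptions.

```python
import pandas as pd

# Invented district rows for illustration.
districts = pd.DataFrame({
    "District": ["d1", "d2"],
    "Total Revenue": [120.0, 90.0],
    "Total Appropriations": [100.0, 110.0],
})

# Positive: under budget. Negative: spending exceeded revenue.
districts["Monetary Difference"] = (
    districts["Total Revenue"] - districts["Total Appropriations"]
)
print(districts["Monetary Difference"].tolist())
```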

First, let's use a boxplot to get a sense of how this new value is distributed across each state. A positive value would indicate that more money is being brought in than being spent while a negative value means the reverse, where more money is being spent by the school district than is being brought in (a.k.a. going over budget).

In New Jersey, it appears that most school districts are over-budget, with a good portion being outliers that are well over-budget. In Louisiana, on the other hand, the reverse is true: most school districts are just a bit under-budget, with a few leaving lots of room to spare in terms of how much money they could spend.

Next, let's look at the relationship, if any, between graduation rate and appropriations for each state. To do so, we'll plot each variable combination on a scatter plot, similar to our methodology for investigating revenue and graduation rate:

Before the investigation, the team expected to find evidence for a positive correlation between appropriations and graduation rate; it would make sense that the school districts that spend the most money on learning supplies and resources would garner the best performance metrics. However, from these plots, little evidence exists to support a strong argument for this relationship between the two variables. In fact, they appear to be negatively correlated, if anything, which is contradictory to what the team expected and similar to the results of the revenue vs. graduation rate plots.

It's also important to look into appropriations and spending per pupil, which takes the enrollment size of each school district into account alongside financial information. To do this, a new column will be added to store this information; scatter plots will also be created to aid observational analysis in the event that a correlation exists.
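The per-pupil column could be computed as below. Column names and values are invented, and replacing zero enrollment with NaN before dividing is an added safeguard rather than something the original analysis states.

```python
import numpy as np
import pandas as pd

# Invented rows; the last district reports zero enrollment.
districts = pd.DataFrame({
    "Total Appropriations": [1000.0, 500.0, 300.0],
    "Enrollment": [100, 25, 0],
})

# Avoid division by zero producing inf: treat zero enrollment as missing.
enrollment = districts["Enrollment"].replace(0, np.nan)
districts["Spending per Pupil"] = districts["Total Appropriations"] / enrollment
print(districts["Spending per Pupil"].tolist())
```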

It appears that, on average, New Jersey public school districts spend about 10x more per student than Louisiana districts do. The gap between these averages is greater than the gap in funding per pupil; New Jersey school districts spend more on average per student than they receive in revenue per student, while Louisiana districts spend less per student than they receive.

To further investigate, let's plot a distribution of spending per pupil for both states as well as a plot of how that metric relates to graduation rate.

As expected, spending per pupil in New Jersey school districts is far greater than that in Louisiana.

These plots provide little evidence for a relationship between spending per pupil and graduation rate. If anything, each state's correlation appears to be opposite, as Louisiana has a slight, yet scattered, trend upward while New Jersey has a downward trend. This reflects the lack of correlation found for revenue per pupil that was investigated earlier.

Monetary Difference of New Jersey and Louisiana Schools: a Brief Analysis

Finally, we'll look into whether a relationship exists between monetary difference, or how much a school district goes over/under budget, and graduation rate:

A few observations can be made from the plots above. First, the vast majority of school districts are huddled around the 0 value for monetary difference in both plots. This makes sense, considering school districts likely aim to stay on budget year after year. Those districts also feature a variety of graduation rates, from the 70% range to nearly 95-100%. As the monetary values grow further from that center point, though, the graduation rate tends to move downward in both plots, albeit in different directions.

It should be noted that the presence of school districts that are so far over or under budget may be an indicator of error, as the filtering performed on the data to calculate the total amounts of revenue and expenditures may either over-account or under-account funds. Furthermore, these plots only represent data from a single year; further analysis would require examining how these graphs change over time to determine whether the data are accurate or whether a consistent trend is found for a school district.

The Final Model: Predicting Louisiana Graduation Rates

To test the reliability of the relationships between the investigated school district features, the team will create a regression model based on scikit-learn's K-Nearest Neighbors regressor. This model will be trained on the New Jersey data and will predict the graduation rate of Louisiana school districts based on their input features.
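A minimal sketch of such a model follows, using scikit-learn's `KNeighborsRegressor`. The feature names, the choice of `n_neighbors=3`, the standardization step, and every number below are assumptions for illustration; the team's actual features and settings are not shown in this text.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

features = ["Revenue per Pupil", "Spending per Pupil", "Enrollment"]  # hypothetical

# Invented New Jersey training rows and Louisiana rows to predict.
nj_train = pd.DataFrame({
    "Revenue per Pupil": [20000.0, 22000.0, 18000.0, 25000.0, 21000.0, 19000.0],
    "Spending per Pupil": [21000.0, 23000.0, 17500.0, 26000.0, 20000.0, 19500.0],
    "Enrollment": [1200, 800, 3000, 500, 1500, 2200],
    "Graduation Rate": [92.0, 95.0, 85.0, 96.0, 90.0, 88.0],
})
la_test = pd.DataFrame({
    "Revenue per Pupil": [12000.0, 13000.0],
    "Spending per Pupil": [11000.0, 12500.0],
    "Enrollment": [9000, 4000],
})

# KNN is distance-based, so features are standardized first (an assumption here).
model = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=3))
model.fit(nj_train[features], nj_train["Graduation Rate"])
la_test["Predicted Graduation Rate"] = model.predict(la_test[features])
print(la_test["Predicted Graduation Rate"].tolist())
```

Because a KNN regressor averages the labels of its nearest training rows, its predictions can never fall outside the range of graduation rates seen in the training state.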

If the model is accurate, meaning that a relationship exists between the feature variables in both data sets, then the team would expect the predicted graduation rates for Louisiana to closely resemble their true graduation rates. If the model fits the New Jersey data well but the predicted labels for Louisiana are off, the existence of some other unmeasured feature or factor that's influencing the real-world graduation rates is possible or even likely. However, because the team observed little evidence for a relationship between the variables measured, an inaccurate prediction is expected.

The above bar chart illustrates the model's predicted graduation rate for each Louisiana school district, which is consistently lower than each district's actual graduation rate. The team expected the predicted labels to be off but was surprised by the consistency in the model's error, considering how the previously shown scatter plots illustrated that New Jersey school district graduation rates varied across schools of differing financial and enrollment features.

Now that it has been shown that these measured features are not all that affects graduation rates, the following question is raised: which factors do have an influence? Let's plot each parish, along with its graduation rate, to see if any patterns emerge.

Even though some Louisiana data are missing (shown in white), the data that are represented do not seem to show any relation between graduation rate and location.

This is just one example of another method of searching for variables and features related to academic performance. Future exploration and data analysis could focus on the search for the factors that are most influential to the quality or success of K-12 education within school districts.


Closing Thoughts and Future Goals:

The team found little evidence for a correlation between the funding or spending of school districts and academic performance for the school districts within Louisiana and New Jersey. This was surprising, as the team expected school districts with higher funding or spending to yield higher academic performance results. This lack of evidence held even, and possibly more importantly, when considering revenue per pupil, which takes both enrollment size and financial information into account and adds a layer of standardization across each state's districts. Although strange, this information could be used to pinpoint school districts that may be collecting a great amount of money, only to spend it in an irresponsible way.

More investigations should be completed to determine why school districts with high funding amounts are underperforming and how school districts with low funding levels are performing well in both states. Furthermore, the lack of a found relationship between variables that were expected to correlate may suggest that other factors beyond the financial, enrollment, and performance metrics used here contribute to the difference in education standards and quality between the two states. Further research on the variables investigated here, as well as others, is recommended.

As a result of this exploratory data analysis completed on data sets relating to financial information and academic performance of both New Jersey and Louisiana school districts, the team has determined new questions and considerations to be used in future analyses:

Validity of data and results

Other relationships, features, and labels to investigate

It's vital that future investigations are completed to test new features of school districts in search of relevant relationships. If found, the relationships present may lead to projects or initiatives by policymakers that are focused on improving education standards based on discrepancies found within under-performing school districts. If none continue to be found, it's possible that factors outside of school district-specific data are influencing the differing levels of academic performance across communities. The team's results showed little relationship between the features overall despite the expectation that one would be found. To validate the team's results, more analysis or other investigations should be completed in the future.