PyCitySchools

Analysis

Based on the data analyzed, we have observed:

  • Charter schools have higher average scores and also higher percentage of students passing in both subjects.
  • Schools with a size of 2000 or less students have higher average scores and also higher percentage of students passing in both subjects.
  • Schools with a budget of $615 or less also have higher average scores and higher percentage passing.
In [1]:
# Dependencies and Setup
import pandas as pd

# File to Load (Remember to Change These)
school_data_to_load = "Resources/schools_complete.csv"
student_data_to_load = "Resources/students_complete.csv"

# Read School and Student Data File and store into Pandas Data Frames
school_data = pd.read_csv(school_data_to_load)
student_data = pd.read_csv(student_data_to_load)

# Combine the data into a single dataset
school_data_complete = pd.merge(student_data, school_data, how="left", on=["school_name", "school_name"])

District Summary

  • Calculate the total number of schools

  • Calculate the total number of students

  • Calculate the total budget

  • Calculate the average math score

  • Calculate the average reading score

  • Calculate the overall passing rate (overall average score), i.e. (avg. math score + avg. reading score)/2

  • Calculate the percentage of students with a passing math score (70 or greater)

  • Calculate the percentage of students with a passing reading score (70 or greater)

  • Create a dataframe to hold the above results

  • Optional: give the displayed data cleaner formatting

In [2]:
# Total number of schools
total_schools = len(school_data_complete.school_name.unique())

# Total number of students
total_students = school_data_complete.student_name.count()

# Total budget
total_budget = sum(school_data_complete.budget.unique())

# Average math score
average_math_score = school_data_complete.math_score.mean()

# Average reading score
average_reading_score = school_data_complete.reading_score.mean()

# Percentage of students with passing math score
passing_math_pct = (school_data_complete[school_data_complete['math_score'] >= 70].math_score.count()) / total_students

# Percentage of students with passing reading score
passing_reading_pct = (school_data_complete[school_data_complete['reading_score'] >= 70].reading_score.count()) / total_students

# Overall passing rate
overall_passing_rate = (passing_math_pct + passing_reading_pct)/2
In [3]:
# Store all calculations into dataframe
district_summary = pd.DataFrame({"Total Schools":[total_schools],
                                 "Total Students":[total_students],
                                 "Total Budget":[total_budget],
                                 "Average Math Score":[average_math_score],
                                 "Average Reading Score":[average_reading_score],
                                 "% Passing Math":[passing_math_pct*100],
                                 "% Passing Reading":[passing_reading_pct*100],
                                 "% Overall Passing Rate":[overall_passing_rate*100]})

# Reformat Total Students and Total Budget
district_summary['Total Students'] = district_summary.apply(lambda x: "{:,.0f}".format(x['Total Students']), axis=1)
district_summary['Total Budget'] = district_summary.apply(lambda x: "${:,.2f}".format(x['Total Budget']), axis=1)

# Print Summary Dataframe
district_summary
Out[3]:
Total Schools Total Students Total Budget Average Math Score Average Reading Score % Passing Math % Passing Reading % Overall Passing Rate
0 15 39,170 $24,649,428.00 78.985371 81.87784 74.980853 85.805463 80.393158

School Summary

  • Create an overview table that summarizes key metrics about each school, including:

    • School Name
    • School Type
    • Total Students
    • Total School Budget
    • Per Student Budget
    • Average Math Score
    • Average Reading Score
    • % Passing Math
    • % Passing Reading
    • Overall Passing Rate (Average of the above two)
  • Create a dataframe to hold the above results

In [4]:
school_summary = school_data_complete.groupby(['school_name'])
school_names = school_data_complete.school_name.sort_values().unique()
school_types = school_data.sort_values(by="school_name").type
school_total_students = list(school_summary.student_name.count())
school_budget = list(school_summary.budget.mean())
school_per_student_budget = [i/j for i,j in zip(school_budget,school_total_students)]
school_avg_math_score = list(school_summary.math_score.mean())
school_avg_reading_score = list(school_summary.reading_score.mean())

# Calculating passing percentages
school_summary = school_data_complete[school_data_complete['math_score'] >= 70].groupby(['school_name'])
school_pct_passing_math = [(i/j)*100 for i,j in zip(school_summary.math_score.count(),school_total_students)]
school_summary = school_data_complete[school_data_complete['reading_score'] >= 70].groupby(['school_name'])
school_pct_passing_reading = [(i/j)*100 for i,j in zip(school_summary.reading_score.count(),school_total_students)]
school_overall_passing = [(i+j)/2 for i,j in zip(school_pct_passing_math,school_pct_passing_reading)]

# Compile all calculations into dataframe
school_summary_df = pd.DataFrame({"School Names":school_names,
                                  "School Type":school_types,
                                  "Total Students":school_total_students,
                                  "Total School Budget":school_budget,
                                  "Per Student Budget":school_per_student_budget,
                                  "Average Math Score":school_avg_math_score,
                                  "Average Reading Score":school_avg_reading_score,
                                  "% Passing Math":school_pct_passing_math,
                                  "% Passing Reading":school_pct_passing_reading,
                                  "Overall Passing Rate":school_overall_passing})

school_summary_df = school_summary_df.reset_index(drop=True)
school_summary_df
Out[4]:
School Names School Type Total Students Total School Budget Per Student Budget Average Math Score Average Reading Score % Passing Math % Passing Reading Overall Passing Rate
0 Bailey High School District 4976 3124928 628.0 77.048432 81.033963 66.680064 81.933280 74.306672
1 Cabrera High School Charter 1858 1081356 582.0 83.061895 83.975780 94.133477 97.039828 95.586652
2 Figueroa High School District 2949 1884411 639.0 76.711767 81.158020 65.988471 80.739234 73.363852
3 Ford High School District 2739 1763916 644.0 77.102592 80.746258 68.309602 79.299014 73.804308
4 Griffin High School Charter 1468 917500 625.0 83.351499 83.816757 93.392371 97.138965 95.265668
5 Hernandez High School District 4635 3022020 652.0 77.289752 80.934412 66.752967 80.862999 73.807983
6 Holden High School Charter 427 248087 581.0 83.803279 83.814988 92.505855 96.252927 94.379391
7 Huang High School District 2917 1910635 655.0 76.629414 81.182722 65.683922 81.316421 73.500171
8 Johnson High School District 4761 3094650 650.0 77.072464 80.966394 66.057551 81.222432 73.639992
9 Pena High School Charter 962 585858 609.0 83.839917 84.044699 94.594595 95.945946 95.270270
10 Rodriguez High School District 3999 2547363 637.0 76.842711 80.744686 66.366592 80.220055 73.293323
11 Shelton High School Charter 1761 1056600 600.0 83.359455 83.725724 93.867121 95.854628 94.860875
12 Thomas High School Charter 1635 1043130 638.0 83.418349 83.848930 93.272171 97.308869 95.290520
13 Wilson High School Charter 2283 1319574 578.0 83.274201 83.989488 93.867718 96.539641 95.203679
14 Wright High School Charter 1800 1049400 583.0 83.682222 83.955000 93.333333 96.611111 94.972222

Top Performing Schools (By Passing Rate)

  • Sort and display the top five schools in overall passing rate
In [5]:
top_5 = school_summary_df.sort_values(by='Overall Passing Rate', ascending=False).head(5).reset_index(drop=True)
top_5
Out[5]:
School Names School Type Total Students Total School Budget Per Student Budget Average Math Score Average Reading Score % Passing Math % Passing Reading Overall Passing Rate
0 Cabrera High School Charter 1858 1081356 582.0 83.061895 83.975780 94.133477 97.039828 95.586652
1 Thomas High School Charter 1635 1043130 638.0 83.418349 83.848930 93.272171 97.308869 95.290520
2 Pena High School Charter 962 585858 609.0 83.839917 84.044699 94.594595 95.945946 95.270270
3 Griffin High School Charter 1468 917500 625.0 83.351499 83.816757 93.392371 97.138965 95.265668
4 Wilson High School Charter 2283 1319574 578.0 83.274201 83.989488 93.867718 96.539641 95.203679

Bottom Performing Schools (By Passing Rate)

  • Sort and display the five worst-performing schools
In [6]:
bot_5 = school_summary_df.sort_values(by='Overall Passing Rate', ascending=True).head(5).reset_index(drop=True)
bot_5
Out[6]:
School Names School Type Total Students Total School Budget Per Student Budget Average Math Score Average Reading Score % Passing Math % Passing Reading Overall Passing Rate
0 Rodriguez High School District 3999 2547363 637.0 76.842711 80.744686 66.366592 80.220055 73.293323
1 Figueroa High School District 2949 1884411 639.0 76.711767 81.158020 65.988471 80.739234 73.363852
2 Huang High School District 2917 1910635 655.0 76.629414 81.182722 65.683922 81.316421 73.500171
3 Johnson High School District 4761 3094650 650.0 77.072464 80.966394 66.057551 81.222432 73.639992
4 Ford High School District 2739 1763916 644.0 77.102592 80.746258 68.309602 79.299014 73.804308

Math Scores by Grade

  • Create a table that lists the average Reading Score for students of each grade level (9th, 10th, 11th, 12th) at each school.

    • Create a pandas series for each grade. Hint: use a conditional statement.

    • Group each series by school

    • Combine the series into a dataframe

    • Optional: give the displayed data cleaner formatting

In [7]:
# Function to create series for average math score by grade
def average_math_by_grade(grade):
    school_summary = school_data_complete.loc[school_data_complete.grade == grade].groupby(['school_name'])
    school_names = school_data_complete.school_name.sort_values().unique()
    school_avg_math_score = list(school_summary.math_score.mean())

    # Compile all calculations into dataframe
    average_math_df = pd.DataFrame({"School Names":school_names,
                                    f"{grade} Avg Math Score":school_avg_math_score})

    average_math_df = average_math_df.reset_index(drop=True)
    return average_math_df

# Function to create series for average reading score by grade
def average_reading_by_grade(grade):
    school_summary = school_data_complete.loc[school_data_complete.grade == grade].groupby(['school_name'])
    school_names = school_data_complete.school_name.sort_values().unique()
    school_avg_reading_score = list(school_summary.reading_score.mean())

    # Compile all calculations into dataframe
    average_reading_df = pd.DataFrame({"School Names":school_names,
                                    f"{grade} Avg Reading Score":school_avg_reading_score})

    average_reading_df = average_reading_df.reset_index(drop=True)
    return average_reading_df
In [8]:
# Create series for each grade and group by school
grade_9 = average_math_by_grade('9th')
grade_10 = average_math_by_grade('10th')
grade_11 = average_math_by_grade('11th')
grade_12 = average_math_by_grade('12th')
avg_math_score_by_grade = pd.merge(grade_9,grade_10,how='inner',suffixes=('',''))
avg_math_score_by_grade = pd.merge(avg_math_score_by_grade,grade_11,how='inner',suffixes=('',''))
avg_math_score_by_grade = pd.merge(avg_math_score_by_grade,grade_12,how='inner',suffixes=('',''))
avg_math_score_by_grade
Out[8]:
School Names 9th Avg Math Score 10th Avg Math Score 11th Avg Math Score 12th Avg Math Score
0 Bailey High School 77.083676 76.996772 77.515588 76.492218
1 Cabrera High School 83.094697 83.154506 82.765560 83.277487
2 Figueroa High School 76.403037 76.539974 76.884344 77.151369
3 Ford High School 77.361345 77.672316 76.918058 76.179963
4 Griffin High School 82.044010 84.229064 83.842105 83.356164
5 Hernandez High School 77.438495 77.337408 77.136029 77.186567
6 Holden High School 83.787402 83.429825 85.000000 82.855422
7 Huang High School 77.027251 75.908735 76.446602 77.225641
8 Johnson High School 77.187857 76.691117 77.491653 76.863248
9 Pena High School 83.625455 83.372000 84.328125 84.121547
10 Rodriguez High School 76.859966 76.612500 76.395626 77.690748
11 Shelton High School 83.420755 82.917411 83.383495 83.778976
12 Thomas High School 83.590022 83.087886 83.498795 83.497041
13 Wilson High School 83.085578 83.724422 83.195326 83.035794
14 Wright High School 83.264706 84.010288 83.836782 83.644986

Reading Score by Grade

  • Perform the same operations as above for reading scores
In [9]:
# Create series for each grade and group by school
grade_9 = average_reading_by_grade('9th')
grade_10 = average_reading_by_grade('10th')
grade_11 = average_reading_by_grade('11th')
grade_12 = average_reading_by_grade('12th')
avg_reading_score_by_grade = pd.merge(grade_9,grade_10,how='inner',suffixes=('',''))
avg_reading_score_by_grade = pd.merge(avg_reading_score_by_grade,grade_11,how='inner',suffixes=('',''))
avg_reading_score_by_grade = pd.merge(avg_reading_score_by_grade,grade_12,how='inner',suffixes=('',''))
avg_reading_score_by_grade
Out[9]:
School Names 9th Avg Reading Score 10th Avg Reading Score 11th Avg Reading Score 12th Avg Reading Score
0 Bailey High School 81.303155 80.907183 80.945643 80.912451
1 Cabrera High School 83.676136 84.253219 83.788382 84.287958
2 Figueroa High School 81.198598 81.408912 80.640339 81.384863
3 Ford High School 80.632653 81.262712 80.403642 80.662338
4 Griffin High School 83.369193 83.706897 84.288089 84.013699
5 Hernandez High School 80.866860 80.660147 81.396140 80.857143
6 Holden High School 83.677165 83.324561 83.815534 84.698795
7 Huang High School 81.290284 81.512386 81.417476 80.305983
8 Johnson High School 81.260714 80.773431 80.616027 81.227564
9 Pena High School 83.807273 83.612000 84.335938 84.591160
10 Rodriguez High School 80.993127 80.629808 80.864811 80.376426
11 Shelton High School 84.122642 83.441964 84.373786 82.781671
12 Thomas High School 83.728850 84.254157 83.585542 83.831361
13 Wilson High School 83.939778 84.021452 83.764608 84.317673
14 Wright High School 83.833333 83.812757 84.156322 84.073171

Scores by School Spending

  • Create a table that breaks down school performances based on average Spending Ranges (Per Student). Use 4 reasonable bins to group school spending. Include in the table each of the following:
    • Average Math Score
    • Average Reading Score
    • % Passing Math
    • % Passing Reading
    • Overall Passing Rate (Average of the above two)
In [10]:
# Sample bins. Feel free to create your own bins.
spending_bins = [0, 585, 615, 645, 675]
group_names = ["<$585", "$585-615", "$615-645", "$645-675"]

scores_by_spending = school_summary_df[["School Names",
                                    "Average Math Score",
                                    "Average Reading Score",
                                    "% Passing Math",
                                    "% Passing Reading",
                                    "Overall Passing Rate"]]
scores_by_spending["Spending Summary"] = pd.cut(school_summary_df["Per Student Budget"], spending_bins, labels=group_names)
scores_by_spending = scores_by_spending.groupby(["Spending Summary"])
scores_by_spending.mean()
C:\Users\David\Anaconda3\lib\site-packages\ipykernel_launcher.py:11: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  # This is added back by InteractiveShellApp.init_path()
Out[10]:
Average Math Score Average Reading Score % Passing Math % Passing Reading Overall Passing Rate
Spending Summary
<$585 83.455399 83.933814 93.460096 96.610877 95.035486
$585-615 83.599686 83.885211 94.230858 95.900287 95.065572
$615-645 79.079225 81.891436 75.668212 86.106569 80.887391
$645-675 76.997210 81.027843 66.164813 81.133951 73.649382

Scores by School Size

  • Perform the same operations as above, based on school size.
In [11]:
# Sample bins. Feel free to create your own bins.
size_bins = [0, 1000, 2000, 5000]
group_names = ["Small (<1000)", "Medium (1000-2000)", "Large (2000-5000)"]

scores_by_school_size = school_summary_df[["School Names",
                                    "Average Math Score",
                                    "Average Reading Score",
                                    "% Passing Math",
                                    "% Passing Reading",
                                    "Overall Passing Rate"]]
scores_by_school_size["School Size Summary"] = pd.cut(school_summary_df["Total Students"], size_bins, labels=group_names)
scores_by_school_size = scores_by_school_size.groupby(["School Size Summary"])
scores_by_school_size.mean()
C:\Users\David\Anaconda3\lib\site-packages\ipykernel_launcher.py:11: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  # This is added back by InteractiveShellApp.init_path()
Out[11]:
Average Math Score Average Reading Score % Passing Math % Passing Reading Overall Passing Rate
School Size Summary
Small (<1000) 83.821598 83.929843 93.550225 96.099437 94.824831
Medium (1000-2000) 83.374684 83.864438 93.599695 96.790680 95.195187
Large (2000-5000) 77.746417 81.344493 69.963361 82.766634 76.364998

Scores by School Type

  • Perform the same operations as above, based on school type.
In [12]:
scores_by_type = school_summary_df[["School Names",
                                    "School Type",
                                    "Average Math Score",
                                    "Average Reading Score",
                                    "% Passing Math",
                                    "% Passing Reading",
                                    "Overall Passing Rate"]]
scores_by_type = scores_by_type.groupby("School Type")
scores_by_type.mean()
Out[12]:
Average Math Score Average Reading Score % Passing Math % Passing Reading Overall Passing Rate
School Type
Charter 83.473852 83.896421 93.620830 96.586489 95.103660
District 76.956733 80.966636 66.548453 80.799062 73.673757
In [ ]: