PyCitySchools¶

Analysis¶

Based on the data analyzed, we have observed:

Charter schools have higher average scores and also higher percentage of students passing in both subjects.
Schools with a size of 2000 or less students have higher average scores and also higher percentage of students passing in both subjects.
Schools with a budget of $615 or less also have higher average scores and higher percentage passing.

# Dependencies and Setup
import pandas as pd

# File to Load (Remember to Change These)
school_data_to_load = "Resources/schools_complete.csv"
student_data_to_load = "Resources/students_complete.csv"

# Read School and Student Data File and store into Pandas Data Frames
school_data = pd.read_csv(school_data_to_load)
student_data = pd.read_csv(student_data_to_load)

# Combine the data into a single dataset
school_data_complete = pd.merge(student_data, school_data, how="left", on=["school_name", "school_name"])

District Summary¶

Calculate the total number of schools
Calculate the total number of students
Calculate the total budget
Calculate the average math score
Calculate the average reading score
Calculate the overall passing rate (overall average score), i.e. (avg. math score + avg. reading score)/2
Calculate the percentage of students with a passing math score (70 or greater)
Calculate the percentage of students with a passing reading score (70 or greater)
Create a dataframe to hold the above results
Optional: give the displayed data cleaner formatting

# Total number of schools
total_schools = len(school_data_complete.school_name.unique())

# Total number of students
total_students = school_data_complete.student_name.count()

# Total budget
total_budget = sum(school_data_complete.budget.unique())

# Average math score
average_math_score = school_data_complete.math_score.mean()

# Average reading score
average_reading_score = school_data_complete.reading_score.mean()

# Percentage of students with passing math score
passing_math_pct = (school_data_complete[school_data_complete['math_score'] >= 70].math_score.count()) / total_students

# Percentage of students with passing reading score
passing_reading_pct = (school_data_complete[school_data_complete['reading_score'] >= 70].reading_score.count()) / total_students

# Overall passing rate
overall_passing_rate = (passing_math_pct + passing_reading_pct)/2

# Store all calculations into dataframe
district_summary = pd.DataFrame({"Total Schools":[total_schools],
                                 "Total Students":[total_students],
                                 "Total Budget":[total_budget],
                                 "Average Math Score":[average_math_score],
                                 "Average Reading Score":[average_reading_score],
                                 "% Passing Math":[passing_math_pct*100],
                                 "% Passing Reading":[passing_reading_pct*100],
                                 "% Overall Passing Rate":[overall_passing_rate*100]})

# Reformat Total Students and Total Budget
district_summary['Total Students'] = district_summary.apply(lambda x: "{:,.0f}".format(x['Total Students']), axis=1)
district_summary['Total Budget'] = district_summary.apply(lambda x: "${:,.2f}".format(x['Total Budget']), axis=1)

# Print Summary Dataframe
district_summary

School Summary¶

Create an overview table that summarizes key metrics about each school, including:
- School Name
- School Type
- Total Students
- Total School Budget
- Per Student Budget
- Average Math Score
- Average Reading Score
- % Passing Math
- % Passing Reading
- Overall Passing Rate (Average of the above two)
Create a dataframe to hold the above results

school_summary = school_data_complete.groupby(['school_name'])
school_names = school_data_complete.school_name.sort_values().unique()
school_types = school_data.sort_values(by="school_name").type
school_total_students = list(school_summary.student_name.count())
school_budget = list(school_summary.budget.mean())
school_per_student_budget = [i/j for i,j in zip(school_budget,school_total_students)]
school_avg_math_score = list(school_summary.math_score.mean())
school_avg_reading_score = list(school_summary.reading_score.mean())

# Calculating passing percentages
school_summary = school_data_complete[school_data_complete['math_score'] >= 70].groupby(['school_name'])
school_pct_passing_math = [(i/j)*100 for i,j in zip(school_summary.math_score.count(),school_total_students)]
school_summary = school_data_complete[school_data_complete['reading_score'] >= 70].groupby(['school_name'])
school_pct_passing_reading = [(i/j)*100 for i,j in zip(school_summary.reading_score.count(),school_total_students)]
school_overall_passing = [(i+j)/2 for i,j in zip(school_pct_passing_math,school_pct_passing_reading)]

# Compile all calculations into dataframe
school_summary_df = pd.DataFrame({"School Names":school_names,
                                  "School Type":school_types,
                                  "Total Students":school_total_students,
                                  "Total School Budget":school_budget,
                                  "Per Student Budget":school_per_student_budget,
                                  "Average Math Score":school_avg_math_score,
                                  "Average Reading Score":school_avg_reading_score,
                                  "% Passing Math":school_pct_passing_math,
                                  "% Passing Reading":school_pct_passing_reading,
                                  "Overall Passing Rate":school_overall_passing})

school_summary_df = school_summary_df.reset_index(drop=True)
school_summary_df

Top Performing Schools (By Passing Rate)¶

Sort and display the top five schools in overall passing rate

top_5 = school_summary_df.sort_values(by='Overall Passing Rate', ascending=False).head(5).reset_index(drop=True)
top_5

Bottom Performing Schools (By Passing Rate)¶

Sort and display the five worst-performing schools

bot_5 = school_summary_df.sort_values(by='Overall Passing Rate', ascending=True).head(5).reset_index(drop=True)
bot_5

Math Scores by Grade¶

Create a table that lists the average Reading Score for students of each grade level (9th, 10th, 11th, 12th) at each school.
- Create a pandas series for each grade. Hint: use a conditional statement.
- Group each series by school
- Combine the series into a dataframe
- Optional: give the displayed data cleaner formatting

# Function to create series for average math score by grade
def average_math_by_grade(grade):
    school_summary = school_data_complete.loc[school_data_complete.grade == grade].groupby(['school_name'])
    school_names = school_data_complete.school_name.sort_values().unique()
    school_avg_math_score = list(school_summary.math_score.mean())

    # Compile all calculations into dataframe
    average_math_df = pd.DataFrame({"School Names":school_names,
                                    f"{grade} Avg Math Score":school_avg_math_score})

    average_math_df = average_math_df.reset_index(drop=True)
    return average_math_df

# Function to create series for average reading score by grade
def average_reading_by_grade(grade):
    school_summary = school_data_complete.loc[school_data_complete.grade == grade].groupby(['school_name'])
    school_names = school_data_complete.school_name.sort_values().unique()
    school_avg_reading_score = list(school_summary.reading_score.mean())

    # Compile all calculations into dataframe
    average_reading_df = pd.DataFrame({"School Names":school_names,
                                    f"{grade} Avg Reading Score":school_avg_reading_score})

    average_reading_df = average_reading_df.reset_index(drop=True)
    return average_reading_df

# Create series for each grade and group by school
grade_9 = average_math_by_grade('9th')
grade_10 = average_math_by_grade('10th')
grade_11 = average_math_by_grade('11th')
grade_12 = average_math_by_grade('12th')
avg_math_score_by_grade = pd.merge(grade_9,grade_10,how='inner',suffixes=('',''))
avg_math_score_by_grade = pd.merge(avg_math_score_by_grade,grade_11,how='inner',suffixes=('',''))
avg_math_score_by_grade = pd.merge(avg_math_score_by_grade,grade_12,how='inner',suffixes=('',''))
avg_math_score_by_grade

Reading Score by Grade¶

Perform the same operations as above for reading scores

# Create series for each grade and group by school
grade_9 = average_reading_by_grade('9th')
grade_10 = average_reading_by_grade('10th')
grade_11 = average_reading_by_grade('11th')
grade_12 = average_reading_by_grade('12th')
avg_reading_score_by_grade = pd.merge(grade_9,grade_10,how='inner',suffixes=('',''))
avg_reading_score_by_grade = pd.merge(avg_reading_score_by_grade,grade_11,how='inner',suffixes=('',''))
avg_reading_score_by_grade = pd.merge(avg_reading_score_by_grade,grade_12,how='inner',suffixes=('',''))
avg_reading_score_by_grade

Scores by School Spending¶

Create a table that breaks down school performances based on average Spending Ranges (Per Student). Use 4 reasonable bins to group school spending. Include in the table each of the following:
- Average Math Score
- Average Reading Score
- % Passing Math
- % Passing Reading
- Overall Passing Rate (Average of the above two)

# Sample bins. Feel free to create your own bins.
spending_bins = [0, 585, 615, 645, 675]
group_names = ["<$585", "$585-615", "$615-645", "$645-675"]

scores_by_spending = school_summary_df[["School Names",
                                    "Average Math Score",
                                    "Average Reading Score",
                                    "% Passing Math",
                                    "% Passing Reading",
                                    "Overall Passing Rate"]]
scores_by_spending["Spending Summary"] = pd.cut(school_summary_df["Per Student Budget"], spending_bins, labels=group_names)
scores_by_spending = scores_by_spending.groupby(["Spending Summary"])
scores_by_spending.mean()

C:\Users\David\Anaconda3\lib\site-packages\ipykernel_launcher.py:11: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  # This is added back by InteractiveShellApp.init_path()

Scores by School Size¶

Perform the same operations as above, based on school size.

# Sample bins. Feel free to create your own bins.
size_bins = [0, 1000, 2000, 5000]
group_names = ["Small (<1000)", "Medium (1000-2000)", "Large (2000-5000)"]

scores_by_school_size = school_summary_df[["School Names",
                                    "Average Math Score",
                                    "Average Reading Score",
                                    "% Passing Math",
                                    "% Passing Reading",
                                    "Overall Passing Rate"]]
scores_by_school_size["School Size Summary"] = pd.cut(school_summary_df["Total Students"], size_bins, labels=group_names)
scores_by_school_size = scores_by_school_size.groupby(["School Size Summary"])
scores_by_school_size.mean()

C:\Users\David\Anaconda3\lib\site-packages\ipykernel_launcher.py:11: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  # This is added back by InteractiveShellApp.init_path()

Scores by School Type¶

Perform the same operations as above, based on school type.

scores_by_type = school_summary_df[["School Names",
                                    "School Type",
                                    "Average Math Score",
                                    "Average Reading Score",
                                    "% Passing Math",
                                    "% Passing Reading",
                                    "Overall Passing Rate"]]
scores_by_type = scores_by_type.groupby("School Type")
scores_by_type.mean()

	School Names	School Type	Total Students	Total School Budget	Per Student Budget	Average Math Score	Average Reading Score	% Passing Math	% Passing Reading	Overall Passing Rate
0	Bailey High School	District	4976	3124928	628.0	77.048432	81.033963	66.680064	81.933280	74.306672
1	Cabrera High School	Charter	1858	1081356	582.0	83.061895	83.975780	94.133477	97.039828	95.586652
2	Figueroa High School	District	2949	1884411	639.0	76.711767	81.158020	65.988471	80.739234	73.363852
3	Ford High School	District	2739	1763916	644.0	77.102592	80.746258	68.309602	79.299014	73.804308
4	Griffin High School	Charter	1468	917500	625.0	83.351499	83.816757	93.392371	97.138965	95.265668
5	Hernandez High School	District	4635	3022020	652.0	77.289752	80.934412	66.752967	80.862999	73.807983
6	Holden High School	Charter	427	248087	581.0	83.803279	83.814988	92.505855	96.252927	94.379391
7	Huang High School	District	2917	1910635	655.0	76.629414	81.182722	65.683922	81.316421	73.500171
8	Johnson High School	District	4761	3094650	650.0	77.072464	80.966394	66.057551	81.222432	73.639992
9	Pena High School	Charter	962	585858	609.0	83.839917	84.044699	94.594595	95.945946	95.270270
10	Rodriguez High School	District	3999	2547363	637.0	76.842711	80.744686	66.366592	80.220055	73.293323
11	Shelton High School	Charter	1761	1056600	600.0	83.359455	83.725724	93.867121	95.854628	94.860875
12	Thomas High School	Charter	1635	1043130	638.0	83.418349	83.848930	93.272171	97.308869	95.290520
13	Wilson High School	Charter	2283	1319574	578.0	83.274201	83.989488	93.867718	96.539641	95.203679
14	Wright High School	Charter	1800	1049400	583.0	83.682222	83.955000	93.333333	96.611111	94.972222

	School Names	School Type	Total Students	Total School Budget	Per Student Budget	Average Math Score	Average Reading Score	% Passing Math	% Passing Reading	Overall Passing Rate
0	Cabrera High School	Charter	1858	1081356	582.0	83.061895	83.975780	94.133477	97.039828	95.586652
1	Thomas High School	Charter	1635	1043130	638.0	83.418349	83.848930	93.272171	97.308869	95.290520
2	Pena High School	Charter	962	585858	609.0	83.839917	84.044699	94.594595	95.945946	95.270270
3	Griffin High School	Charter	1468	917500	625.0	83.351499	83.816757	93.392371	97.138965	95.265668
4	Wilson High School	Charter	2283	1319574	578.0	83.274201	83.989488	93.867718	96.539641	95.203679

	School Names	School Type	Total Students	Total School Budget	Per Student Budget	Average Math Score	Average Reading Score	% Passing Math	% Passing Reading	Overall Passing Rate
0	Rodriguez High School	District	3999	2547363	637.0	76.842711	80.744686	66.366592	80.220055	73.293323
1	Figueroa High School	District	2949	1884411	639.0	76.711767	81.158020	65.988471	80.739234	73.363852
2	Huang High School	District	2917	1910635	655.0	76.629414	81.182722	65.683922	81.316421	73.500171
3	Johnson High School	District	4761	3094650	650.0	77.072464	80.966394	66.057551	81.222432	73.639992
4	Ford High School	District	2739	1763916	644.0	77.102592	80.746258	68.309602	79.299014	73.804308

	School Names	9th Avg Math Score	10th Avg Math Score	11th Avg Math Score	12th Avg Math Score
0	Bailey High School	77.083676	76.996772	77.515588	76.492218
1	Cabrera High School	83.094697	83.154506	82.765560	83.277487
2	Figueroa High School	76.403037	76.539974	76.884344	77.151369
3	Ford High School	77.361345	77.672316	76.918058	76.179963
4	Griffin High School	82.044010	84.229064	83.842105	83.356164
5	Hernandez High School	77.438495	77.337408	77.136029	77.186567
6	Holden High School	83.787402	83.429825	85.000000	82.855422
7	Huang High School	77.027251	75.908735	76.446602	77.225641
8	Johnson High School	77.187857	76.691117	77.491653	76.863248
9	Pena High School	83.625455	83.372000	84.328125	84.121547
10	Rodriguez High School	76.859966	76.612500	76.395626	77.690748
11	Shelton High School	83.420755	82.917411	83.383495	83.778976
12	Thomas High School	83.590022	83.087886	83.498795	83.497041
13	Wilson High School	83.085578	83.724422	83.195326	83.035794
14	Wright High School	83.264706	84.010288	83.836782	83.644986

	School Names	9th Avg Reading Score	10th Avg Reading Score	11th Avg Reading Score	12th Avg Reading Score
0	Bailey High School	81.303155	80.907183	80.945643	80.912451
1	Cabrera High School	83.676136	84.253219	83.788382	84.287958
2	Figueroa High School	81.198598	81.408912	80.640339	81.384863
3	Ford High School	80.632653	81.262712	80.403642	80.662338
4	Griffin High School	83.369193	83.706897	84.288089	84.013699
5	Hernandez High School	80.866860	80.660147	81.396140	80.857143
6	Holden High School	83.677165	83.324561	83.815534	84.698795
7	Huang High School	81.290284	81.512386	81.417476	80.305983
8	Johnson High School	81.260714	80.773431	80.616027	81.227564
9	Pena High School	83.807273	83.612000	84.335938	84.591160
10	Rodriguez High School	80.993127	80.629808	80.864811	80.376426
11	Shelton High School	84.122642	83.441964	84.373786	82.781671
12	Thomas High School	83.728850	84.254157	83.585542	83.831361
13	Wilson High School	83.939778	84.021452	83.764608	84.317673
14	Wright High School	83.833333	83.812757	84.156322	84.073171

	Average Math Score	Average Reading Score	% Passing Math	% Passing Reading	Overall Passing Rate
Spending Summary
<$585	83.455399	83.933814	93.460096	96.610877	95.035486
$585-615	83.599686	83.885211	94.230858	95.900287	95.065572
$615-645	79.079225	81.891436	75.668212	86.106569	80.887391
$645-675	76.997210	81.027843	66.164813	81.133951	73.649382

	Average Math Score	Average Reading Score	% Passing Math	% Passing Reading	Overall Passing Rate
School Size Summary
Small (<1000)	83.821598	83.929843	93.550225	96.099437	94.824831
Medium (1000-2000)	83.374684	83.864438	93.599695	96.790680	95.195187
Large (2000-5000)	77.746417	81.344493	69.963361	82.766634	76.364998

	Average Math Score	Average Reading Score	% Passing Math	% Passing Reading	Overall Passing Rate
School Type
Charter	83.473852	83.896421	93.620830	96.586489	95.103660
District	76.956733	80.966636	66.548453	80.799062	73.673757