We would like to learn more about the demographic breakdown of suspects and victims. The analysis covers three aspects: race, age, and sex. We break the results down by borough, as this is a representative way of looking at NYC.
susp_race=demo|>
select(boro_nm,susp_race)|>
group_by(boro_nm,susp_race)|>
summarise(count=n())|>
group_by(boro_nm)|>
mutate(total=sum(count),
percent=(count/total)*100)|>
select(-count,-total)
colnames(susp_race)=c("boro_nm","race","percent_susp")
vic_race=demo|>
select(boro_nm,vic_race)|>
group_by(boro_nm,vic_race)|>
summarise(count=n())|>
group_by(boro_nm)|>
mutate(total=sum(count),
percent=(count/total)*100)|>
select(-count,-total)
colnames(vic_race)=c("boro_nm","race","percent_vic")
combined_race=vic_race|>
full_join(susp_race,by=c("boro_nm","race"))|>
pivot_longer(
cols=starts_with("percent"),
names_to="category",
values_to="percent"
)|>
mutate(
category=ifelse(category=="percent_susp","SUSPECT","VICTIM")
)
race_borough_plot=combined_race|>
ggplot(aes(x=boro_nm,y=percent,fill=race))+
geom_bar(stat="identity")+
facet_grid(~category)+
labs(
title="Race Distribution in Suspects and Victims",
x="NYC boroughs",
y="Percentage(%)"
)+
theme_minimal()+
scale_fill_viridis_d()+
#scale_fill_brewer(palette = "Spectral")+
theme(axis.text.x=element_text(angle=45,hjust=1),legend.position="bottom")
plot(race_borough_plot)
This figure shows the percentage of each race among suspects and victims in every NYC borough. Black, White, and White Hispanic are the three largest groups among both suspects and victims across boroughs. Within each borough, the race distribution is similar between suspects and victims.
susp_age=demo|>
select(boro_nm,susp_age_group)|>
group_by(boro_nm,susp_age_group)|>
summarise(count=n())|>
filter(count>10)|>
group_by(boro_nm)|>
mutate(total=sum(count),
percent=(count/total)*100)|>
select(-count,-total)
colnames(susp_age)=c("boro_nm","age","percent_susp")
vic_age=demo|>
select(boro_nm,vic_age_group)|>
group_by(boro_nm,vic_age_group)|>
summarise(count=n())|>
filter(count>10)|>
group_by(boro_nm)|>
mutate(total=sum(count),
percent=(count/total)*100)|>
select(-count,-total)
colnames(vic_age)=c("boro_nm","age","percent_vic")
combined_age=vic_age|>
full_join(susp_age,by=c("boro_nm","age"))|>
pivot_longer(
cols=starts_with("percent"),
names_to="category",
values_to="percent"
)|>
mutate(
category=ifelse(category=="percent_susp","SUSPECT","VICTIM")
)
age_borough_plot=combined_age|>
ggplot(aes(x=boro_nm,y=percent,fill=age))+
geom_bar(stat="identity")+
facet_grid(~category)+
labs(
title="Age Distribution in Suspects and Victims",
x="NYC boroughs",
y="Percentage(%)"
)+
theme_minimal()+
scale_fill_viridis_d()+
#scale_fill_brewer(palette = "Dark2")+
theme(axis.text.x=element_text(angle=45,hjust=1),legend.position="bottom")
plot(age_borough_plot)
This figure shows the percentage of each age group among suspects and victims in all NYC boroughs. The majority of suspects and victims are in the 25-44 age group. The age distribution is similar between suspects and victims within each borough, as well as across boroughs.
data=df_nypd|>
select(law_cat_cd,susp_age_group,susp_race,susp_sex,vic_age_group,vic_race,vic_sex)|>
mutate(across(everything(),~na_if(.x,"UNKNOWN")))|>
na.omit()
susp_sex=demo|>
select(boro_nm,susp_sex)|>
group_by(boro_nm,susp_sex)|>
summarise(count=n())|>
filter(susp_sex!="U")|>
group_by(boro_nm)|>
mutate(total=sum(count),
percent=(count/total)*100)|>
select(-count,-total)
colnames(susp_sex)=c("boro_nm","sex","percent_susp")
vic_sex=demo|>
select(boro_nm,vic_sex)|>
group_by(boro_nm,vic_sex)|>
summarise(count=n())|>
filter(vic_sex%in%c("F","M"))|>
group_by(boro_nm)|>
mutate(total=sum(count),
percent=(count/total)*100)|>
select(-count,-total)
colnames(vic_sex)=c("boro_nm","sex","percent_vic")
combined_sex=vic_sex|>
full_join(susp_sex,by=c("boro_nm","sex"))|>
pivot_longer(
cols=starts_with("percent"),
names_to="category",
values_to="percent"
)|>
mutate(
category=ifelse(category=="percent_susp","SUSPECT","VICTIM")
)
sex_borough_plot=combined_sex|>
ggplot(aes(x=boro_nm,y=percent,fill=sex))+
geom_bar(stat="identity")+
facet_grid(~category)+
labs(
title="Sex Distribution in Suspects and Victims",
x="NYC boroughs",
y="Percentage(%)"
)+
theme_minimal()+
scale_fill_viridis_d()+
#scale_fill_brewer(palette = "Dark2")+
theme(axis.text.x=element_text(angle=45,hjust=1),legend.position="bottom")
plot(sex_borough_plot)
This figure shows the percentage of each sex among suspects and victims in all NYC boroughs. In every borough, around 75% of suspects are male and around 25% are female, while around 70% of victims are female and around 30% are male.
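The race, age, and sex summaries above all follow the same steps: count rows per borough and category, then divide by the borough total. As a minimal sketch of how that shared step could be factored out, the helper below computes within-borough percentages for any column. The function name `borough_share()` and the `toy_demo` rows are hypothetical and only illustrate the calculation; they are not part of the original analysis.

```r
library(dplyr)

# Hypothetical helper: percentage share of each level of `var` within a borough.
# `value_name` is the name given to the resulting percentage column.
borough_share <- function(df, var, value_name) {
  df |>
    count(boro_nm, {{ var }}, name = "count") |>
    group_by(boro_nm) |>
    mutate("{value_name}" := count / sum(count) * 100) |>
    ungroup() |>
    select(-count)
}

# Toy rows standing in for `demo`; the values are made up for illustration.
toy_demo <- tibble(
  boro_nm  = c("BRONX", "BRONX", "BRONX", "BRONX", "QUEENS", "QUEENS"),
  susp_sex = c("M", "M", "M", "F", "M", "F")
)

toy_share <- borough_share(toy_demo, susp_sex, "percent_susp")
print(toy_share)
```

In this toy input, three of the four BRONX rows are male, so the BRONX shares are 75% male and 25% female, and the percentages within each borough sum to 100.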
The following charts break down suspect and victim demographics by the severity of the reported crime. Using the three offense levels (violation, misdemeanor, and felony), they show counts of suspects and victims by race, age, and sex, respectively.
race_combined <- gather(data, key = "variable", value = "race", susp_race, vic_race)
race_plot <- ggplot(race_combined, aes(x = law_cat_cd, fill = race)) +
geom_bar(position = "dodge") +
labs(title = "Distribution of suspect and victim race by level of offense") +
guides(fill = guide_legend(title = "Race")) +
facet_wrap(~variable, scales = "free_x", ncol = 2) +
theme_minimal()+
scale_fill_viridis_d()+
#scale_fill_brewer(palette = "Dark2")+
theme(axis.text.x=element_text(angle=45,hjust=1),legend.position="bottom")
print(race_plot)
This chart shows counts by race across the levels of offense for both suspects and victims. Black individuals have the highest counts at every level of offense, among both suspects and victims.
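The charts in this part reshape the suspect and victim columns into one long column with `gather()`, which tidyr documents as superseded by `pivot_longer()`. A sketch of the equivalent reshaping is shown below on a small made-up data frame; `toy_data` and `race_long` are hypothetical names, and the rows come out in a different order than with `gather()`, which does not affect the bar counts.

```r
library(dplyr)
library(tidyr)

# Toy rows standing in for `data`; the values are made up for illustration.
toy_data <- tibble(
  law_cat_cd = c("FELONY", "MISDEMEANOR"),
  susp_race  = c("BLACK", "WHITE"),
  vic_race   = c("WHITE", "BLACK")
)

# Stack the suspect and victim columns into one `race` column, keeping
# the source column name in `variable` (used for faceting in the plots).
race_long <- toy_data |>
  pivot_longer(
    cols = c(susp_race, vic_race),
    names_to = "variable",
    values_to = "race"
  )
print(race_long)
```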
age_combined <- gather(data, key = "variable", value = "age", susp_age_group, vic_age_group)
age_plot <- data |>
filter(susp_age_group %in% c('<18', '18-24', '25-44', '45-64', '65+') & vic_age_group %in% c('<18', '18-24', '25-44', '45-64', '65+')) |>
gather(key = "variable", value = "age", susp_age_group, vic_age_group) |>
ggplot(aes(x = law_cat_cd, fill = age)) +
geom_bar(position = "dodge") +
labs(title = "Distribution of suspect and victim age by level of offense") +
guides(fill = guide_legend(title = "Age")) +
facet_wrap(~variable, scales = "free_x", ncol = 2) +
theme_minimal()+
scale_fill_viridis_d()+
#scale_fill_brewer(palette = "Dark2")+
theme(axis.text.x=element_text(angle=45,hjust=1),legend.position="bottom")
print(age_plot)
This chart shows counts of individuals in each age group across levels of offense for both suspects and victims. The 25-44 age group is the largest for both suspects and victims at every level of offense.
sex_combined <- gather(data, key = "variable", value = "sex", susp_sex, vic_sex)
sex_plot <- data |>
filter(susp_sex %in% c('F', 'M') & vic_sex %in% c('F', 'M')) |>
gather(key = "variable", value = "sex", susp_sex, vic_sex) |>
ggplot(aes(x = law_cat_cd, fill = sex)) +
geom_bar(position = "dodge") +
labs(title = "Distribution of suspect and victim sex by level of offense") +
guides(fill = guide_legend(title = "Sex")) +
facet_wrap(~variable, scales = "free_x", ncol = 2) +
theme_minimal()+
scale_fill_viridis_d()+
#scale_fill_brewer(palette = "Dark2")+
theme(axis.text.x=element_text(angle=45,hjust=1),legend.position="bottom")
print(sex_plot)
This chart shows counts of individuals of each sex across levels of offense for both suspects and victims. Women make up a greater number of victims, while men make up a greater number of suspects.
Throughout our public health education, we have learned about the importance of intersectionality. While these snapshots give us information about individual demographic identities, they may miss the bigger picture. So we now investigate which combinations of identities are most prevalent among suspects and victims.
Most frequent victim combinations:

| vic_age_group | vic_race | vic_sex | count |
|---|---|---|---|
| 25-44 | BLACK | F | 116922 |
| 25-44 | WHITE HISPANIC | F | 75828 |
| 25-44 | BLACK | M | 48217 |
| 45-64 | BLACK | F | 46294 |
| 25-44 | WHITE | F | 39104 |

Most frequent suspect combinations:

| susp_age_group | susp_race | susp_sex | count |
|---|---|---|---|
| 25-44 | BLACK | M | 161899 |
| 25-44 | WHITE HISPANIC | M | 84750 |
| 25-44 | BLACK | F | 55501 |
| 45-64 | BLACK | M | 52222 |
| 18-24 | BLACK | M | 50762 |
These tables show that Black women aged 25-44 make up the highest count of victims, and Black men aged 25-44 make up the highest count of suspects. Policies aimed at reducing crime and protecting civilians should take into consideration the unique socio-ecological factors surrounding these groups.
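The code behind the combination tables is not shown here. One way such a table could be produced is to count rows for each age, race, and sex combination and keep the most frequent ones. The sketch below does this for victims on a small made-up data frame; `toy_victims` and `top_victims` are hypothetical names, and the same pattern would apply to the `susp_*` columns.

```r
library(dplyr)

# Toy rows standing in for the victim columns of `data`;
# the values are made up for illustration.
toy_victims <- tibble(
  vic_age_group = c("25-44", "25-44", "25-44", "45-64", "25-44", "18-24"),
  vic_race      = c("BLACK", "BLACK", "WHITE", "BLACK", "BLACK", "WHITE"),
  vic_sex       = c("F", "F", "F", "M", "F", "M")
)

# Count each identity combination, sort by descending count,
# and keep at most the five most frequent combinations.
top_victims <- toy_victims |>
  count(vic_age_group, vic_race, vic_sex, name = "count", sort = TRUE) |>
  slice_head(n = 5)
print(top_victims)
```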