library(skimr)  # get overview of data
library(tidyverse)  # data management + ggplot2 graphics
library(readxl)  # import excel data
library(ggplot2)  # plots
library(gtsummary)  # summary statistics and tests
library(here)  # helps with file management
library(knitr)
library(janitor)  # for data cleaning, making tables
library(wesanderson)  # ggplot2 palettes
library(paletteer)  # extra ggplot2 palettes
library(dplyr)
library(broom)
library(stringr)
library(viridisLite)
library(glue)  # this was added so gt() would show
library(gt)

The Growth of Death + Black Metal in the U.S.

This dataset, titled “Metal Bands by Nation,” is sourced from mrpantherson on Kaggle.

This data analysis project was done solely by Saffron Evergreen, aside from the maker of the dataset obtained on Kaggle. There is no contribution from classmates or other peers.

1. Define research question

(10 points)

Research question:

Between the time brackets of 1980-1989 and 1990-1999, which decade had the most significant growth in death/black metal band formation in the United States?

Reasoning:

My hypothesis is that death/black metal band formation and fan counts increased significantly over the course of those two decades because of the Satanic Panic, which was happening primarily in the U.S., and the effect it had on various types of pop culture. I have chosen to look only at the death and black subtypes of metal since those two seem to be the most closely connected to modern-day Satanism and use shock value as part of their stage presence.

2. Load the data

(10 points)

metal_bands <- read_excel("Metal_Bands_BSTA.xlsx", col_types = c("numeric", "text",
    "numeric", "numeric", "text", "numeric", "text"))  # setting col_types here reads the numeric columns as numeric, while keeping NA values as NA
gt(head(metal_bands))  # previewed the data to get a peek of how it's formatted
...1  band_name    fans  formed  origin          split  style
1     Iron Maiden  4195  1975    United Kingdom  NA     New wave of british heavy,Heavy
2     Opeth        4147  1990    Sweden          1990   Extreme progressive,Progressive rock,Progressive
3     Metallica    3712  1981    USA             NA     Heavy,Bay area thrash
4     Megadeth     3105  1983    USA             1983   Thrash,Heavy,Hard rock
5     Amon Amarth  3054  1988    Sweden          NA     Melodic death
6     Slayer       2955  1981    USA             1981   Thrash
skim_metal <- skim(metal_bands) %>%
    as_tibble() %>%
    print()  # saved skim() to keep as a reference point
## # A tibble: 7 x 17
##   skim_type skim_variable n_missing complete_rate character.min character.max
##   <chr>     <chr>             <int>         <dbl>         <int>         <int>
## 1 character band_name             0         1                 1            48
## 2 character origin                8         0.998             3            31
## 3 character style                 0         1                 2            82
## 4 numeric   ...1                  0         1                NA            NA
## 5 numeric   fans                  0         1                NA            NA
## 6 numeric   formed                4         0.999            NA            NA
## 7 numeric   split              2215         0.557            NA            NA
## # ... with 11 more variables: character.empty <int>, character.n_unique <int>,
## #   character.whitespace <int>, numeric.mean <dbl>, numeric.sd <dbl>,
## #   numeric.p0 <dbl>, numeric.p25 <dbl>, numeric.p50 <dbl>, numeric.p75 <dbl>,
## #   numeric.p100 <dbl>, numeric.hist <chr>

I used the gt() and head() functions to get a clean and quick look at how this data is structured and organized. This table shows the six most popular metal bands (all styles included), how many fans they have, what year they formed, the country of origin, what year the band split, and descriptors of the style of metal the bands play.

I saved a skim() table as a reference point to make sure my data types were correct and to be able to look back at the completion rates and missing values of the variables.
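
As a rough illustration of what the n_missing and complete_rate columns in the skim() table represent, the same two quantities can be computed by hand in base R. The data frame `toy` below is a made-up stand-in, not the real dataset, and this is only a sketch of the idea rather than how skimr computes them internally.

```r
# Toy data frame with a few missing values (hypothetical rows, not the real data)
toy <- data.frame(
  band_name = c("A", "B", "C", "D"),
  formed    = c(1985, NA, 1993, 1990),
  split     = c(NA, NA, 2001, NA)
)

n_missing     <- colSums(is.na(toy))    # number of NA values in each column
complete_rate <- colMeans(!is.na(toy))  # share of non-missing values in each column

print(data.frame(n_missing, complete_rate))
```

A complete_rate of 1 means the column has no missing values, which is the condition checked again after the transformation step.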

What needs to be done for transformation:

  • keep only the bands that originate from the U.S.
  • remove the “...1” (popularity rank) and “split” columns since those are not relevant to my analysis; I am keeping the “fans” column because it might be fun for visualization later
  • mutate and create a factor for the years 1980-1989 and 1990-1999
  • keep only the bands that have any variation of “black” or “death” in the style variable
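
The four steps above can be sketched as one dplyr pipeline. This is a minimal illustration on made-up rows, assuming a column named `rank` in place of the unnamed “...1” column; the object names `toy_bands` and `toy_result` are hypothetical and the real transformation is done step by step in the next section.

```r
library(dplyr)

# Made-up rows with the same kinds of columns as the real sheet (not real data);
# `rank` stands in for the unnamed "...1" column
toy_bands <- tibble::tibble(
  rank      = 1:5,
  band_name = c("A", "B", "C", "D", "E"),
  fans      = c(100, 50, 20, 10, 5),
  formed    = c(1985, 1993, 1975, 1991, 1988),
  origin    = c("USA", "USA", "USA", "Sweden", "USA"),
  split     = c(NA, 2001, NA, NA, 1995),
  style     = c("Death", "Atmospheric black", "Heavy", "Melodic death", "Thrash")
)

toy_result <- toy_bands %>%
  filter(origin == "USA") %>%               # keep U.S. bands only
  select(-rank, -split) %>%                 # drop the rank and split columns
  mutate(formed_category = case_when(
    formed >= 1980 & formed < 1990 ~ "1980-1989",
    formed >= 1990 & formed < 2000 ~ "1990-1999"
  )) %>%
  filter(!is.na(formed_category)) %>%       # drop bands formed outside both decades
  mutate(formed_category = factor(formed_category)) %>%
  filter(grepl("death|black", style, ignore.case = TRUE))  # death/black styles only

print(toy_result)
```

Selecting columns by name rather than by position keeps the pipeline working even if the column order of the sheet changes.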

3. Transform the data

(15 points)

# filter, make data table showing only U.S. bands

metal_filtered <- metal_bands %>%
    filter(origin %in% ("USA"))

# remove the column '...1' (ranked in popularity) and 'split' since it is not
# important in this analysis and takes up space

metal_filtered <- metal_filtered %>%
    select(-c("...1", "split"))  # drop by name instead of by position

# this allows me to minimize unnecessary bulk so I can transform and analyze
# easier
# mutate and create new table with a factored column labeling which bands
# formed between the years 1980-1989 and 1990-1999

metal_new <- metal_filtered %>%
    mutate(formed_category = case_when(formed >= 1980 & formed < 1990 ~ "1980-1989",
        formed >= 1990 & formed < 2000 ~ "1990-1999")) %>%
    mutate(formed_category = factor(formed_category))

metal_new <- na.omit(metal_new)  # this gets rid of all rows containing N/A

# na.omit shows only bands (all styles) that formed within the two decades

skim_metal_new <- skim(metal_new) %>%
    as_tibble() %>%
    print()
## # A tibble: 6 x 20
##   skim_type skim_variable   n_missing complete_rate character.min character.max
##   <chr>     <chr>               <int>         <dbl>         <int>         <int>
## 1 character band_name               0             1             2            27
## 2 character origin                  0             1             3             3
## 3 character style                   0             1             2            65
## 4 factor    formed_category         0             1            NA            NA
## 5 numeric   fans                    0             1            NA            NA
## 6 numeric   formed                  0             1            NA            NA
## # ... with 14 more variables: character.empty <int>, character.n_unique <int>,
## #   character.whitespace <int>, factor.ordered <lgl>, factor.n_unique <int>,
## #   factor.top_counts <chr>, numeric.mean <dbl>, numeric.sd <dbl>,
## #   numeric.p0 <dbl>, numeric.p25 <dbl>, numeric.p50 <dbl>, numeric.p75 <dbl>,
## #   numeric.p100 <dbl>, numeric.hist <chr>
# saved this skim table as another reference point, making sure there are no
# missing values and the completion rate is 1 for all columns


# 454 out of 5,000 bands (all styles) that formed in the USA between 1980-1999
# Reminder: objective is to find out the growth/loss rate of bands that are death and/or black metal in the USA, between the two decades

metal_complete <- dplyr::filter( # dplyr::filter because this wouldn't work without specifically calling the package
  metal_new, grepl("death|black", style, ignore.case = TRUE)) %>% #grepl(keywords, column)
  filter(!duplicated(band_name)) # use !duplicated to make extra sure there is no overlap in band_names

BONUS POINTS (not applicable)

This dataset needed reduction and organizing; no merging was needed, at least at this point of the project.

Beyond this point, the data represented in the various tables meet these parameters: the band formed between 1980-1999, originated from the U.S., and has a metal style labeled with any variation of “death” or “black”.
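
To make the style filter concrete, here is a small sketch of how grepl() with the pattern "death|black" behaves on a few style strings that appear in the tables of this report. The vector `styles` is only an illustration, and "Blackened thrash" is a hypothetical label added to show a caveat.

```r
# Style strings as they appear in this report's preview tables
styles <- c("Death", "Atmospheric black,Neofolk", "Heavy,Bay area thrash",
            "Melodic death", "Thrash")

# TRUE wherever "death" or "black" appears anywhere in the string, ignoring case
is_death_or_black <- grepl("death|black", styles, ignore.case = TRUE)
print(data.frame(style = styles, keep = is_death_or_black))

# Caveat: this is substring matching, so a hypothetical label such as
# "Blackened thrash" would also be kept
blackened_kept <- grepl("death|black", "Blackened thrash", ignore.case = TRUE)
print(blackened_kept)
```

Because the match is on substrings, "any variation" includes styles where death or black is only one of several comma-separated descriptors.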

Transformed data shown in tables

(head(metal_complete)) %>%
    print()
## # A tibble: 6 x 6
##   band_name        fans formed origin style                      formed_category
##   <chr>           <dbl>  <dbl> <chr>  <chr>                      <fct>          
## 1 Death            2690   1983 USA    Progressive death,Death,P~ 1980-1989      
## 2 Agalloch         1881   1995 USA    Atmospheric black,Neofolk  1990-1999      
## 3 Nile             1189   1993 USA    Brutal death,Technical de~ 1990-1999      
## 4 Cannibal Corpse  1162   1988 USA    Death                      1980-1989      
## 5 Morbid Angel      975   1984 USA    Death                      1980-1989      
## 6 Deicide           628   1987 USA    Death                      1980-1989
skim_metal_complete <- skim(metal_complete) %>%
    as_tibble() %>%
    view()  # saved for another reference point, looking at this table I'm able to see the band count for either black or death metal style has dropped to 122 over the 2 decades

Are the values what you expected for the variables? Why or Why not?

After reviewing my transformed data frame, I am able to see that there are 122 death/black metal bands out of 422 metal bands formed in the U.S. between 1980-1999. I looked at the two histograms provided in the skim table and am surprised that the fan count is negatively skewed and the band formation count is mostly unimodal, maybe bimodal, with a faint positive skew. I wasn’t expecting this because I thought there would have been a takeoff in death/black metal band formations and fan counts, followed by a leveling out over time.

4. Visualizing and Summarizing the Data

(15 points)

# Reminder: objective is to find out the growth/loss rate of bands that are death and/or black metal in the USA, between the two decades

# shows/organizes how many bands were formed, between the two decades
decade_count <- metal_complete %>%
  group_by(formed_category) %>% 
  summarize(count = n())
gt(decade_count)
formed_category  count
1980-1989           43
1990-1999           79
# saved this to compare for analysis and visualization, n = 122, 80s = 43, 90s = 79


# shows/organizes total fans between the two decades  
fan_count <- metal_complete %>% # shows decade sum of fans
  group_by(formed_category) %>% 
  summarize(sum_fans = sum(fans)) %>% print()  
## # A tibble: 2 x 2
##   formed_category sum_fans
##   <fct>              <dbl>
## 1 1980-1989          10408
## 2 1990-1999           7709
# n = 18,117, 80s = 10,408, 90s = 7709
N_fan_count <- sum(fan_count$sum_fans) # used this to pull for later if needed

# shows/organizes the average # of fans per band between the two decades  
mean_fan_group <- metal_complete %>%
  group_by(formed_category) %>%
  summarize(mean_fans = mean(fans)) %>% print()
## # A tibble: 2 x 2
##   formed_category mean_fans
##   <fct>               <dbl>
## 1 1980-1989           242. 
## 2 1990-1999            97.6
# avg 242 fans/band in 80s, avg 97.6 fans/band in 90s

count_and_mean_fans <- bind_cols(fan_count, mean_fan_group[2])

gt(count_and_mean_fans)
formed_category  sum_fans  mean_fans
1980-1989           10408  242.04651
1990-1999            7709   97.58228
# noted- 80s = n(43), 90s = n(79) - the data shows that although fewer death/black metal bands were formed in the 80's, the bands formed during this period had far more fans on average than those formed in the 90s
# breaking these transformations down into separate tables below makes it
# easier for me to understand what I'm looking at; after looking at them
# individually, I will bind them into one data frame and then look at
# the whole picture

# shows/organizes band formation count per year 1980-1999
i_fan_count <- metal_complete %>%
    group_by(formed) %>%
    summarize(count = n()) %>%
    print()
## # A tibble: 17 x 2
##    formed count
##     <dbl> <int>
##  1   1983     5
##  2   1984     6
##  3   1985     4
##  4   1986     5
##  5   1987     9
##  6   1988     7
##  7   1989     7
##  8   1990     7
##  9   1991     4
## 10   1992     7
## 11   1993    11
## 12   1994     3
## 13   1995    11
## 14   1996    10
## 15   1997    10
## 16   1998     7
## 17   1999     9
# shows/organizes total amount of fans per year 1980-1999
i_fan_sum <- metal_complete %>%
    group_by(formed) %>%
    summarize(sum_fans = sum(fans)) %>%
    print()
## # A tibble: 17 x 2
##    formed sum_fans
##     <dbl>    <dbl>
##  1   1983     3005
##  2   1984     2187
##  3   1985      107
##  4   1986      604
##  5   1987     1688
##  6   1988     1271
##  7   1989     1546
##  8   1990      663
##  9   1991      620
## 10   1992      394
## 11   1993     1628
## 12   1994       80
## 13   1995     2329
## 14   1996      934
## 15   1997      512
## 16   1998      221
## 17   1999      328
# shows/organizes average fans per year 1980-1999
mean_i_fan_count <- metal_complete %>%
    group_by(formed) %>%
    summarize(mean_fans = mean(fans)) %>%
    print()
## # A tibble: 17 x 2
##    formed mean_fans
##     <dbl>     <dbl>
##  1   1983     601  
##  2   1984     364. 
##  3   1985      26.8
##  4   1986     121. 
##  5   1987     188. 
##  6   1988     182. 
##  7   1989     221. 
##  8   1990      94.7
##  9   1991     155  
## 10   1992      56.3
## 11   1993     148  
## 12   1994      26.7
## 13   1995     212. 
## 14   1996      93.4
## 15   1997      51.2
## 16   1998      31.6
## 17   1999      36.4
# combined 3 data frames from above, [] extracts the column wanted, avoiding
# duplicate columns
summary_i_fans <- bind_cols(i_fan_count, i_fan_sum[2], mean_i_fan_count[2]) %>%
    print()
## # A tibble: 17 x 4
##    formed count sum_fans mean_fans
##     <dbl> <int>    <dbl>     <dbl>
##  1   1983     5     3005     601  
##  2   1984     6     2187     364. 
##  3   1985     4      107      26.8
##  4   1986     5      604     121. 
##  5   1987     9     1688     188. 
##  6   1988     7     1271     182. 
##  7   1989     7     1546     221. 
##  8   1990     7      663      94.7
##  9   1991     4      620     155  
## 10   1992     7      394      56.3
## 11   1993    11     1628     148  
## 12   1994     3       80      26.7
## 13   1995    11     2329     212. 
## 14   1996    10      934      93.4
## 15   1997    10      512      51.2
## 16   1998     7      221      31.6
## 17   1999     9      328      36.4
# same process as last time I used case_when() but I'm just adding this to the new data frame for making it easier on me when I start making data visualizations (this is the only character variable besides USA, which is all the same)
use_summary_i_fans <- summary_i_fans %>%
  mutate(
    formed_category = case_when(
      formed >= 1980 &
        formed < 1990 ~ "1980-1989",
      formed >= 1990 & 
        formed < 2000 ~ "1990-1999")) %>%
  mutate(formed_category = factor(formed_category))

# rearranged columns so the years could be together for easier interpretations
use_summary_i_fans <- use_summary_i_fans[, c(1,5,2,3,4)] %>% print()
## # A tibble: 17 x 5
##    formed formed_category count sum_fans mean_fans
##     <dbl> <fct>           <int>    <dbl>     <dbl>
##  1   1983 1980-1989           5     3005     601  
##  2   1984 1980-1989           6     2187     364. 
##  3   1985 1980-1989           4      107      26.8
##  4   1986 1980-1989           5      604     121. 
##  5   1987 1980-1989           9     1688     188. 
##  6   1988 1980-1989           7     1271     182. 
##  7   1989 1980-1989           7     1546     221. 
##  8   1990 1990-1999           7      663      94.7
##  9   1991 1990-1999           4      620     155  
## 10   1992 1990-1999           7      394      56.3
## 11   1993 1990-1999          11     1628     148  
## 12   1994 1990-1999           3       80      26.7
## 13   1995 1990-1999          11     2329     212. 
## 14   1996 1990-1999          10      934      93.4
## 15   1997 1990-1999          10      512      51.2
## 16   1998 1990-1999           7      221      31.6
## 17   1999 1990-1999           9      328      36.4
# this was a function I spent way too much time on, trying to figure out the differences and changes (rate) in fan and band counts per year, shout out to stack exchange
formation_rate <- use_summary_i_fans %>%
  # first sort by year; the data most likely already was, but this makes sure it's all uniform
  arrange(formed) %>%
  mutate(Diff_year = formed - lag(formed),  # difference in time (just in case there are gaps)
         Diff_fans = sum_fans - lag(sum_fans),  # difference in total fans between years
         # note: each change is divided by the current year's value, not the previous year's
         fan_rate_percent = (Diff_fans / Diff_year) / sum_fans * 100,
         Diff_count = count - lag(count),  # difference in band count between years
         count_rate_percent = (Diff_count / Diff_year) / count * 100) %>%
  print()
## # A tibble: 17 x 10
##    formed formed_category count sum_fans mean_fans Diff_year Diff_fans
##     <dbl> <fct>           <int>    <dbl>     <dbl>     <dbl>     <dbl>
##  1   1983 1980-1989           5     3005     601          NA        NA
##  2   1984 1980-1989           6     2187     364.          1      -818
##  3   1985 1980-1989           4      107      26.8         1     -2080
##  4   1986 1980-1989           5      604     121.          1       497
##  5   1987 1980-1989           9     1688     188.          1      1084
##  6   1988 1980-1989           7     1271     182.          1      -417
##  7   1989 1980-1989           7     1546     221.          1       275
##  8   1990 1990-1999           7      663      94.7         1      -883
##  9   1991 1990-1999           4      620     155           1       -43
## 10   1992 1990-1999           7      394      56.3         1      -226
## 11   1993 1990-1999          11     1628     148           1      1234
## 12   1994 1990-1999           3       80      26.7         1     -1548
## 13   1995 1990-1999          11     2329     212.          1      2249
## 14   1996 1990-1999          10      934      93.4         1     -1395
## 15   1997 1990-1999          10      512      51.2         1      -422
## 16   1998 1990-1999           7      221      31.6         1      -291
## 17   1999 1990-1999           9      328      36.4         1       107
## # ... with 3 more variables: fan_rate_percent <dbl>, Diff_count <int>,
## #   count_rate_percent <dbl>
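
The three per-year tables above (count, total fans, average fans) could also be produced in a single summarize() call, which avoids relying on bind_cols() lining the rows up in the same order. This is a sketch on made-up rows; `toy_complete` and `toy_summary` are hypothetical names, not objects from the analysis.

```r
library(dplyr)

# Made-up band rows (not the real data): formation year and fan count
toy_complete <- tibble::tibble(
  formed = c(1983, 1983, 1984, 1990, 1990, 1990),
  fans   = c(100, 300, 50, 10, 20, 30)
)

# One pass over the data: bands per year, total fans, and average fans per band
toy_summary <- toy_complete %>%
  group_by(formed) %>%
  summarize(count = n(), sum_fans = sum(fans), mean_fans = mean(fans))

print(toy_summary)
```

The resulting columns match those of summary_i_fans, so the later steps would not need to change.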

Assumptions made so far

Looking through the data frame formation_rate, I was able to quickly see that there are large year-to-year fluctuations in both the fan rate and the band formation rate.
The two years with the most death/black metal band formations were 1993 and 1995 (11 bands each), and both show positive changes in band count and total fans. Although other years show larger positive rates for fans or band formations, based on the data I analyzed prior to making graphs, I would conclude that the 90’s were more successful overall, despite the 80’s larger fan base. However, I will check these assumptions against the graphs.
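
The rate columns in formation_rate divide each yearly change by the current year's value. A more common convention for percent change divides by the previous year's value instead. The sketch below applies that alternative convention to the yearly band counts from the table above; it is an illustration for comparison, not the calculation used for the plots in this report, and the names `prev_counts`, `pct_change` and `yoy` are made up here.

```r
# Yearly band formation counts, copied from the per-year table above
years  <- 1983:1999
counts <- c(5, 6, 4, 5, 9, 7, 7, 7, 4, 7, 11, 3, 11, 10, 10, 7, 9)

# Previous year's count (a base-R equivalent of dplyr::lag)
prev_counts <- c(NA, head(counts, -1))

# Percent change relative to the previous year
pct_change <- (counts - prev_counts) / prev_counts * 100

yoy <- data.frame(formed = years, count = counts,
                  pct_change = round(pct_change, 1))
print(yoy)
```

The two conventions can differ a lot when counts are small: the jump from 3 bands in 1994 to 11 in 1995 is a much larger percentage relative to 3 than relative to 11.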

Plot 1: Density Plot of Total Fans

# Reminder: objective is to find out the growth/loss rate of bands that are
# death and/or black metal in the USA, between the two decades


density_sum_fans <- ggplot(data = formation_rate,
                           aes(x = sum_fans, fill = formed_category)) +  # fill must always be categorical
    geom_density(alpha = 0.3) +  # transparency
    scale_fill_discrete(name = "Decade") +  # renames the legend
    labs(x = "Fans per Year", y = "Density", title = "Density Plot of Total Fans Accumulated per Year (1980-1999)")
density_sum_fans

Plot 2: Density Plot of Average Fans

library(ggridges)

ridgeline_mean_fans <- ggplot(data = use_summary_i_fans, aes(x = mean_fans, y = formed_category)) +
    geom_density_ridges_gradient() + aes(fill = formed_category) + scale_fill_discrete(name = "Decade") +
    labs(x = "Average Fans Accumulated per Year", y = "Decade", title = "Ridgeline Density Plot of Average Fans Accumulated per Year (1980-1999)") +
    theme(axis.text.x = element_text(angle = -30, hjust = 0))
ridgeline_mean_fans

Plot 3: Change in Rate of Band Formations per Year

I decided to make extra graphs because, despite meeting the minimum of 2, I realized that my graphs did not represent my research question very well, especially with regard to analyzing the rate of change in bands formed and fans gained per year.

count_rate_diff <- ggplot(data = formation_rate, aes(x = formed, y = count_rate_percent)) +
    geom_point() + geom_line() + geom_hline(yintercept = 0) + theme_bw() + labs(x = "Year",
    y = "Band Formation Rate (%) Per Year (1980-1999)", title = "Change in Rate of Bands Formed per Year",
    subtitle = "to detect patterns of band formation growth or reduction")

count_rate_diff

Plot 3 shows that the 1980’s group had fluctuations in the number of death/black metal bands that were formed; however, the rate changes were far more drastic in the 1990’s group. I included a horizontal line at y = 0 for readability, as I think it helps separate the positive and negative rates.

Plot 4: Change in Rate of Fans per Year

count_diff <- 
  ggplot(data = formation_rate,
         aes(x = formed,
             y = fan_rate_percent)) +
  geom_line()+ # left out hline and geom_point since it seemed unnecessary 
      theme_bw() +
  labs(
    x = "Year",
    y = "Fan Rate (%) Per Year (1980-1999)",
    title = "Change in Rate of Fans per Year",
    subtitle = "to detect patterns of fan growth or reduction")
  
count_diff  

Plot 4 shows a fairly similar trend in fan rates between the 80’s and 90’s groups. By looking at this, I would make the assumption that the negative and positive rate changes in fans per year are weighted similarly between the two decades and that fan growth rate (+/-) should not be a sole indicator of death/black metal band formation.

Plot 5: Combined Graph of Bands Formed + Net Losses/Gains per Year

combined_graph_count <- ggplot(formation_rate) +
    geom_line(mapping = aes(x = formed, y = count), color = "purple") +  # bands formed per year; color set outside aes() so the line is actually purple
    geom_line(mapping = aes(x = formed, y = Diff_count), color = "orange") +  # net change from the previous year
    geom_hline(yintercept = 0, size = 0.5) + geom_vline(xintercept = 1990) +
    theme(legend.position = "none") + labs(x = "Year Bands Formed", y = "Band Formation Count +/-",
    title = "Line Graph of Bands Formed + Net Losses/Gains per Year")
combined_graph_count

5. Final Summary

(10 points)

My goal for this data analysis project was to determine which time bracket, 1980-1989 or 1990-1999, had the most significant growth in death/black metal band formation in the United States. The broadest conclusion I have is that several variables are not accounted for in this analysis, such as how frequently bands played concerts, albums created vs. albums sold, age ranges of fans, and ways to adjust for fans in rural areas or marginalized communities who may not have been counted; I’m sure there are more.

Given the data I have been working with, and looking strictly at the variables within my dataset, I did notice some prominent differences in the number and growth of death/black metal bands formed between the two decades, 1980-1989 and 1990-1999. One of them is that 43 death/black metal bands were formed in 1980-1989, while 79 were formed in 1990-1999. However, the bands formed in 1980-1989 had more fans (total = 10,408, mean = 242 per band) than those formed in 1990-1999 (total = 7,709, mean = 98 per band).
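
The decade-level figures quoted above can be cross-checked with a few lines of arithmetic. This is a descriptive sketch only, using the counts and fan totals reported earlier; the helper `pct` and the other names are made up for illustration, and no significance test is involved.

```r
# Decade totals as reported in the summary tables of this report
decade <- data.frame(
  formed_category = c("1980-1989", "1990-1999"),
  count    = c(43, 79),        # death/black metal bands formed
  sum_fans = c(10408, 7709)    # total fans of those bands
)

# Average fans per band in each decade
decade$mean_fans <- decade$sum_fans / decade$count

# Percent change from the first decade to the second (hypothetical helper)
pct <- function(old, new) (new - old) / old * 100
count_growth <- pct(decade$count[1], decade$count[2])
fans_growth  <- pct(decade$sum_fans[1], decade$sum_fans[2])

print(decade)
cat(sprintf("Change in bands formed: %.1f%%\n", count_growth))
cat(sprintf("Change in total fans:   %.1f%%\n", fans_growth))
```

The per-band means reproduce the 242 and 98 quoted above, and the two percent changes point in opposite directions: more bands formed in the 90s, but with fewer total fans.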

My conclusion is that the overall rates of band formation and fan counts per year appear mostly non-linear and unpredictable, and that there was no significant growth in death/black metal bands around the era of the Satanic Panic. Black/death metal did start to gain more popularity beginning in the 1980’s, which would be a research project for another time; however, over the course of 19 years during the height of the Satanic Panic, there was no significant growth.