Please submit your .Rmd
and .html
files in Sakai. If you are working together, both people should submit the files.
60 / 60 points total
The goal of the midterm project is to showcase skills that you have learned in class so far. The midterm is open note, but if you use someone else’s code, you must attribute them.
#data
channel on Slack..csv
file into your data
folder.Define a research question, involving at least one categorical variable. You may schedule a time with Jessica or Colin to discuss your dataset and research question, or you may message it to one of us in slack or email. Please do one of the two options pretty early on. We just want to look at the data and make sure that it is appropriate for your question.
You must use each of the following functions at least once:
mutate()
group_by()
summarize()
ggplot()
case_when()
across()
*_join()
(i.e. left_join()
)pivot_*()
(i.e. pivot_longer()
)The code chunks below are guides, please add more code chunks to do what you need.
If you do not want your final project posted on the public website, please let Jessica know. We can also keep it anonymous if you’d like to remove your name from the Rmd and html, or use a pseudonym.
I am fine with having my final project posted on the public website :)
You may remove these instructions from your final Rmd if you like
If you’d like to work together, that is encouraged, but you must divide the work equitably and you must note who worked on what. This is probably easiest as notes in the text. Please let Colin or Jessica know that you’ll be working together.
No acknowledgements of contributions = -10 points overall.
dplyr::glimpse()
or skimr::skim()
on the data. You should upload the data file into the data
directory.
coffee_ratings <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-07-07/coffee_ratings.csv')
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## .default = col_character(),
## total_cup_points = col_double(),
## number_of_bags = col_double(),
## aroma = col_double(),
## flavor = col_double(),
## aftertaste = col_double(),
## acidity = col_double(),
## body = col_double(),
## balance = col_double(),
## uniformity = col_double(),
## clean_cup = col_double(),
## sweetness = col_double(),
## cupper_points = col_double(),
## moisture = col_double(),
## category_one_defects = col_double(),
## quakers = col_double(),
## category_two_defects = col_double(),
## altitude_low_meters = col_double(),
## altitude_high_meters = col_double(),
## altitude_mean_meters = col_double()
## )
## ℹ Use `spec()` for the full column specifications.
dim(coffee_ratings)
## [1] 1339 43
dplyr::glimpse(coffee_ratings)
## Rows: 1,339
## Columns: 43
## $ total_cup_points <dbl> 90.58, 89.92, 89.75, 89.00, 88.83, 88.83, 88.75,…
## $ species <chr> "Arabica", "Arabica", "Arabica", "Arabica", "Ara…
## $ owner <chr> "metad plc", "metad plc", "grounds for health ad…
## $ country_of_origin <chr> "Ethiopia", "Ethiopia", "Guatemala", "Ethiopia",…
## $ farm_name <chr> "metad plc", "metad plc", "san marcos barrancas …
## $ lot_number <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ mill <chr> "metad plc", "metad plc", NA, "wolensu", "metad …
## $ ico_number <chr> "2014/2015", "2014/2015", NA, NA, "2014/2015", N…
## $ company <chr> "metad agricultural developmet plc", "metad agri…
## $ altitude <chr> "1950-2200", "1950-2200", "1600 - 1800 m", "1800…
## $ region <chr> "guji-hambela", "guji-hambela", NA, "oromia", "g…
## $ producer <chr> "METAD PLC", "METAD PLC", NA, "Yidnekachew Dabes…
## $ number_of_bags <dbl> 300, 300, 5, 320, 300, 100, 100, 300, 300, 50, 3…
## $ bag_weight <chr> "60 kg", "60 kg", "1", "60 kg", "60 kg", "30 kg"…
## $ in_country_partner <chr> "METAD Agricultural Development plc", "METAD Agr…
## $ harvest_year <chr> "2014", "2014", NA, "2014", "2014", "2013", "201…
## $ grading_date <chr> "April 4th, 2015", "April 4th, 2015", "May 31st,…
## $ owner_1 <chr> "metad plc", "metad plc", "Grounds for Health Ad…
## $ variety <chr> NA, "Other", "Bourbon", NA, "Other", NA, "Other"…
## $ processing_method <chr> "Washed / Wet", "Washed / Wet", NA, "Natural / D…
## $ aroma <dbl> 8.67, 8.75, 8.42, 8.17, 8.25, 8.58, 8.42, 8.25, …
## $ flavor <dbl> 8.83, 8.67, 8.50, 8.58, 8.50, 8.42, 8.50, 8.33, …
## $ aftertaste <dbl> 8.67, 8.50, 8.42, 8.42, 8.25, 8.42, 8.33, 8.50, …
## $ acidity <dbl> 8.75, 8.58, 8.42, 8.42, 8.50, 8.50, 8.50, 8.42, …
## $ body <dbl> 8.50, 8.42, 8.33, 8.50, 8.42, 8.25, 8.25, 8.33, …
## $ balance <dbl> 8.42, 8.42, 8.42, 8.25, 8.33, 8.33, 8.25, 8.50, …
## $ uniformity <dbl> 10.00, 10.00, 10.00, 10.00, 10.00, 10.00, 10.00,…
## $ clean_cup <dbl> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, …
## $ sweetness <dbl> 10.00, 10.00, 10.00, 10.00, 10.00, 10.00, 10.00,…
## $ cupper_points <dbl> 8.75, 8.58, 9.25, 8.67, 8.58, 8.33, 8.50, 9.00, …
## $ moisture <dbl> 0.12, 0.12, 0.00, 0.11, 0.12, 0.11, 0.11, 0.03, …
## $ category_one_defects <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ quakers <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ color <chr> "Green", "Green", NA, "Green", "Green", "Bluish-…
## $ category_two_defects <dbl> 0, 1, 0, 2, 2, 1, 0, 0, 0, 4, 1, 0, 0, 2, 2, 0, …
## $ expiration <chr> "April 3rd, 2016", "April 3rd, 2016", "May 31st,…
## $ certification_body <chr> "METAD Agricultural Development plc", "METAD Agr…
## $ certification_address <chr> "309fcf77415a3661ae83e027f7e5f05dad786e44", "309…
## $ certification_contact <chr> "19fef5a731de2db57d16da10287413f5f99bc2dd", "19f…
## $ unit_of_measurement <chr> "m", "m", "m", "m", "m", "m", "m", "m", "m", "m"…
## $ altitude_low_meters <dbl> 1950.0, 1950.0, 1600.0, 1800.0, 1950.0, NA, NA, …
## $ altitude_high_meters <dbl> 2200.0, 2200.0, 1800.0, 2200.0, 2200.0, NA, NA, …
## $ altitude_mean_meters <dbl> 2075.0, 2075.0, 1700.0, 2000.0, 2075.0, NA, NA, …
NA
coded as something else, or it is multiple tables, please make some notes here about what you need to do before you start transforming the data in the next section.
case_when()
, etc.# Making species into a binary variable with Arabica as the referent group
coffee_ratings <- coffee_ratings %>%
mutate(species_binary = ifelse(species == "Arabica", 0, 1))
# Making variable for continent
coffee_ratings <- coffee_ratings %>%
mutate(continent = fct_collapse(country_of_origin,
"South_America" = c("Brazil", "Colombia", "Ecuador", "Peru"),
"North_America" = c("Costa Rica", "Nicaragua", "United States (Puerto Rico)",
"El Salvador", "Haiti", "Mexico", "Honduras", "United States",
"Guatemala", "Panama", "United States (Hawaii)"),
"Asia" = c("India", "Laos", "Vietnam", "Indonesia", "Japan", "Myanmar", "Taiwan",
"Philippines", "China", "Thailand", "Papua New Guinea"),
"Africa" = c("Burundi", "Ethiopia", "Malawi", "Cote d?Ivoire", "Kenya", "Mauritius",
"Uganda", "Tanzania, United Republic Of", "Rwanda", "Zambia")
))
max(coffee_ratings$altitude_mean_meters, na.rm = TRUE)
## [1] 190164
coffee_ratings %>% filter(altitude_mean_meters > 8000) %>% nrow()
## [1] 4
# Removing impossibly large altitudes from the data set
coffee_ratings <- coffee_ratings %>%
mutate(altitude_mean_meters = ifelse(altitude_mean_meters > 8000,
NA, altitude_mean_meters))
# Making a variable for harvest year
coffee_ratings$harvest_year <- gsub("March", "", coffee_ratings$harvest_year)
coffee_ratings$harvest_year <- gsub("September", "", coffee_ratings$harvest_year)
coffee_ratings$harvest_year <- gsub("Sept", "", coffee_ratings$harvest_year)
coffee_ratings$harvest_year <- gsub("April", "", coffee_ratings$harvest_year)
coffee_ratings$harvest_year <- gsub("May", "", coffee_ratings$harvest_year)
coffee_ratings$harvest_year <- gsub("June", "", coffee_ratings$harvest_year)
coffee_ratings$harvest_year <- gsub("July", "", coffee_ratings$harvest_year)
coffee_ratings$harvest_year <- gsub("August", "", coffee_ratings$harvest_year)
coffee_ratings$harvest_year <- gsub("December", "", coffee_ratings$harvest_year)
coffee_ratings$harvest_year <- gsub("January", "", coffee_ratings$harvest_year)
coffee_ratings$harvest_year <- gsub("Abril", "", coffee_ratings$harvest_year)
coffee_ratings$harvest_year <- gsub("Julio", "", coffee_ratings$harvest_year)
coffee_ratings$harvest_year <- gsub("Through", "", coffee_ratings$harvest_year)
coffee_ratings$harvest_year <- gsub("Spring", "", coffee_ratings$harvest_year)
coffee_ratings$harvest_year <- gsub("/", "", coffee_ratings$harvest_year)
coffee_ratings$harvest_year <- gsub("-", "", coffee_ratings$harvest_year)
coffee_ratings$harvest_year <- gsub("to", "", coffee_ratings$harvest_year)
coffee_ratings$harvest_year <- gsub(" ", "", coffee_ratings$harvest_year)
coffee_ratings$harvest_year <- gsub("inColombia", "", coffee_ratings$harvest_year)
coffee_ratings$harvest_year <- gsub("Fall", "", coffee_ratings$harvest_year)
coffee_ratings$harvest_year <- gsub("4T10", "2010", coffee_ratings$harvest_year)
coffee_ratings$harvest_year <- gsub("mmm", "", coffee_ratings$harvest_year)
coffee_ratings$harvest_year <- gsub("0809crop", "2008", coffee_ratings$harvest_year)
coffee_ratings$harvest_year <- gsub("1t", "", coffee_ratings$harvest_year)
coffee_ratings$harvest_year <- gsub("1T", "", coffee_ratings$harvest_year)
coffee_ratings$harvest_year <- gsub("2t", "", coffee_ratings$harvest_year)
coffee_ratings$harvest_year <- gsub("2T", "", coffee_ratings$harvest_year)
coffee_ratings$harvest_year <- gsub("3t", "", coffee_ratings$harvest_year)
coffee_ratings$harvest_year <- gsub("3T", "", coffee_ratings$harvest_year)
coffee_ratings$harvest_year <- gsub("4t", "", coffee_ratings$harvest_year)
coffee_ratings$harvest_year <- gsub("4T", "", coffee_ratings$harvest_year)
coffee_ratings$harvest_year <- gsub("4T7", "", coffee_ratings$harvest_year)
coffee_ratings$harvest_year <- gsub("472010", "2010", coffee_ratings$harvest_year)
coffee_ratings$harvest_year <- gsub("232010", "2010", coffee_ratings$harvest_year)
coffee_ratings$harvest_year <- gsub("oa", "", coffee_ratings$harvest_year)
coffee_ratings$harvest_year <- gsub("TEST", "", coffee_ratings$harvest_year)
coffee_ratings$harvest_year <- gsub("72010", "2010", coffee_ratings$harvest_year)
coffee_ratings$harvest_year <- as.numeric(substr(coffee_ratings$harvest_year, 1, 4))
left_join
, inner_join
, or right_join
on these tables. No credit will be provided if you don’t.
coffee_ratings1 <- data.frame(coffee_ratings[,1], coffee_ratings[,5], coffee_ratings[,2:4], coffee_ratings[,6:10])
coffee_ratings2 <- data.frame(coffee_ratings[,1], coffee_ratings[,5], coffee_ratings[,(11:45)])
coffee_ratings <- coffee_ratings1 %>%
left_join(coffee_ratings2,
by = c("total_cup_points", "farm_name"))
glimpse()
, skim()
or head()
to illustrate your point.glimpse(coffee_ratings)
## Rows: 3,187
## Columns: 45
## $ total_cup_points <dbl> 90.58, 89.92, 89.75, 89.00, 88.83, 88.83, 88.75,…
## $ farm_name <chr> "metad plc", "metad plc", "san marcos barrancas …
## $ species <chr> "Arabica", "Arabica", "Arabica", "Arabica", "Ara…
## $ owner <chr> "metad plc", "metad plc", "grounds for health ad…
## $ country_of_origin <chr> "Ethiopia", "Ethiopia", "Guatemala", "Ethiopia",…
## $ lot_number <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ mill <chr> "metad plc", "metad plc", NA, "wolensu", "metad …
## $ ico_number <chr> "2014/2015", "2014/2015", NA, NA, "2014/2015", N…
## $ company <chr> "metad agricultural developmet plc", "metad agri…
## $ altitude <chr> "1950-2200", "1950-2200", "1600 - 1800 m", "1800…
## $ region <chr> "guji-hambela", "guji-hambela", NA, "oromia", "g…
## $ producer <chr> "METAD PLC", "METAD PLC", NA, "Yidnekachew Dabes…
## $ number_of_bags <dbl> 300, 300, 5, 320, 300, 100, 100, 300, 300, 50, 3…
## $ bag_weight <chr> "60 kg", "60 kg", "1", "60 kg", "60 kg", "30 kg"…
## $ in_country_partner <chr> "METAD Agricultural Development plc", "METAD Agr…
## $ harvest_year <dbl> 2014, 2014, NA, 2014, 2014, 2013, 2012, 2010, 20…
## $ grading_date <chr> "April 4th, 2015", "April 4th, 2015", "May 31st,…
## $ owner_1 <chr> "metad plc", "metad plc", "Grounds for Health Ad…
## $ variety <chr> NA, "Other", "Bourbon", NA, "Other", NA, "Other"…
## $ processing_method <chr> "Washed / Wet", "Washed / Wet", NA, "Natural / D…
## $ aroma <dbl> 8.67, 8.75, 8.42, 8.17, 8.25, 8.58, 8.42, 8.25, …
## $ flavor <dbl> 8.83, 8.67, 8.50, 8.58, 8.50, 8.42, 8.50, 8.33, …
## $ aftertaste <dbl> 8.67, 8.50, 8.42, 8.42, 8.25, 8.42, 8.33, 8.50, …
## $ acidity <dbl> 8.75, 8.58, 8.42, 8.42, 8.50, 8.50, 8.50, 8.42, …
## $ body <dbl> 8.50, 8.42, 8.33, 8.50, 8.42, 8.25, 8.25, 8.33, …
## $ balance <dbl> 8.42, 8.42, 8.42, 8.25, 8.33, 8.33, 8.25, 8.50, …
## $ uniformity <dbl> 10.00, 10.00, 10.00, 10.00, 10.00, 10.00, 10.00,…
## $ clean_cup <dbl> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, …
## $ sweetness <dbl> 10.00, 10.00, 10.00, 10.00, 10.00, 10.00, 10.00,…
## $ cupper_points <dbl> 8.75, 8.58, 9.25, 8.67, 8.58, 8.33, 8.50, 9.00, …
## $ moisture <dbl> 0.12, 0.12, 0.00, 0.11, 0.12, 0.11, 0.11, 0.03, …
## $ category_one_defects <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ quakers <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ color <chr> "Green", "Green", NA, "Green", "Green", "Bluish-…
## $ category_two_defects <dbl> 0, 1, 0, 2, 2, 1, 0, 0, 0, 4, 1, 0, 0, 2, 2, 0, …
## $ expiration <chr> "April 3rd, 2016", "April 3rd, 2016", "May 31st,…
## $ certification_body <chr> "METAD Agricultural Development plc", "METAD Agr…
## $ certification_address <chr> "309fcf77415a3661ae83e027f7e5f05dad786e44", "309…
## $ certification_contact <chr> "19fef5a731de2db57d16da10287413f5f99bc2dd", "19f…
## $ unit_of_measurement <chr> "m", "m", "m", "m", "m", "m", "m", "m", "m", "m"…
## $ altitude_low_meters <dbl> 1950.0, 1950.0, 1600.0, 1800.0, 1950.0, NA, NA, …
## $ altitude_high_meters <dbl> 2200.0, 2200.0, 1800.0, 2200.0, 2200.0, NA, NA, …
## $ altitude_mean_meters <dbl> 2075.0, 2075.0, 1700.0, 2000.0, 2075.0, NA, NA, …
## $ species_binary <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ continent <fct> Africa, Africa, North_America, Africa, Africa, S…
group_by()
and summarize()
to make a summary of the data here. The summary should be relevant to your research questioncoffee_ratings %>%
group_by(species) %>%
summarise(Obs = n(),
Mean_cup_points = mean(total_cup_points),
SE_cup_points = round(sd(total_cup_points)/Obs, 5))
## # A tibble: 2 × 4
## species Obs Mean_cup_points SE_cup_points
## <chr> <int> <dbl> <dbl>
## 1 Arabica 3149 82.7 0.00078
## 2 Robusta 38 81.2 0.0589
coffee_ratings$continent <- factor(coffee_ratings$continent,
levels = c("North_America", "South_America",
"Asia", "Africa", NA))
coffee_ratings %>%
group_by(continent) %>%
summarise(Obs = n(),
Mean_cup_points = mean(total_cup_points),
SE_cup_points = round(sd(total_cup_points)/Obs, 5))
## # A tibble: 5 × 4
## continent Obs Mean_cup_points SE_cup_points
## <fct> <int> <dbl> <dbl>
## 1 North_America 1140 81.9 0.00312
## 2 South_America 1492 83.1 0.00077
## 3 Asia 251 82.2 0.00723
## 4 Africa 303 83.7 0.00575
## 5 <NA> 1 79.1 NA
coffee_ratings$processing_method <- factor(coffee_ratings$processing_method,
levels = c("Washed / Wet", "Natural / Dry",
"Semi-washed / Semi-pulped", "Pulped natural / honey",
"Other"))
coffee_ratings %>%
group_by(processing_method) %>%
summarise(Obs = n(),
Mean_cup_points = mean(total_cup_points),
SE_cup_points = round(sd(total_cup_points)/Obs, 5))
## # A tibble: 6 × 4
## processing_method Obs Mean_cup_points SE_cup_points
## <fct> <int> <dbl> <dbl>
## 1 Washed / Wet 1852 82.6 0.00112
## 2 Natural / Dry 663 82.8 0.00299
## 3 Semi-washed / Semi-pulped 76 82.6 0.0216
## 4 Pulped natural / honey 25 82.4 0.0601
## 5 Other 27 81.4 0.146
## 6 <NA> 544 82.8 0.00716
coffee_ratings$harvest_year <- factor(as.factor(coffee_ratings$harvest_year),
levels = c("2012", "2008", "2009", "2010", "2011", "2013",
"2014", "2015", "2016", "2017", "2018"))
coffee_ratings %>%
group_by(harvest_year) %>%
summarise(Obs = n(),
Mean_cup_points = mean(total_cup_points),
SE_cup_points = round(sd(total_cup_points)/Obs, 5))
## # A tibble: 12 × 4
## harvest_year Obs Mean_cup_points SE_cup_points
## <fct> <int> <dbl> <dbl>
## 1 2012 788 82.3 0.00295
## 2 2008 2 78.0 0.856
## 3 2009 32 84.6 0.0515
## 4 2010 98 83.2 0.0205
## 5 2011 114 82.9 0.0116
## 6 2013 546 82.7 0.00308
## 7 2014 575 82.7 0.00415
## 8 2015 308 83.1 0.00429
## 9 2016 340 82.9 0.00535
## 10 2017 152 82.1 0.0456
## 11 2018 1 81.6 NA
## 12 <NA> 231 83.0 0.00559
coffee_ratings$color <- factor(coffee_ratings$color,
levels = c("Green", "Bluish-Green", "Blue-Green", "None"))
coffee_ratings %>%
group_by(color) %>%
summarise(Obs = n(),
Mean_cup_points = mean(total_cup_points),
SE_cup_points = round(sd(total_cup_points)/Obs, 5))
## # A tibble: 5 × 4
## color Obs Mean_cup_points SE_cup_points
## <fct> <int> <dbl> <dbl>
## 1 Green 1948 82.6 0.00142
## 2 Bluish-Green 307 83.0 0.00512
## 3 Blue-Green 179 82.9 0.0103
## 4 None 90 81.4 0.0316
## 5 <NA> 663 83.0 0.00278
coffee_model <- glm(total_cup_points ~ species_binary + altitude_mean_meters +
continent + number_of_bags + as.factor(harvest_year) +
processing_method + color,
data = coffee_ratings)
summary(coffee_model)
##
## Call:
## glm(formula = total_cup_points ~ species_binary + altitude_mean_meters +
## continent + number_of_bags + as.factor(harvest_year) + processing_method +
## color, data = coffee_ratings)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -21.4660 -0.6042 0.1833 0.8176 6.7159
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 8.058e+01 1.907e-01 422.570
## species_binary -1.450e+00 7.300e-01 -1.986
## altitude_mean_meters 4.860e-04 9.550e-05 5.089
## continentSouth_America 1.665e+00 1.122e-01 14.845
## continentAsia 7.756e-01 1.875e-01 4.136
## continentAfrica 2.101e+00 1.615e-01 13.006
## number_of_bags 8.429e-05 4.694e-04 0.180
## as.factor(harvest_year)2011 -3.006e-01 3.210e-01 -0.936
## as.factor(harvest_year)2013 1.117e-01 1.302e-01 0.858
## as.factor(harvest_year)2014 1.453e-01 1.328e-01 1.094
## as.factor(harvest_year)2015 6.580e-01 1.528e-01 4.308
## as.factor(harvest_year)2016 3.257e-01 1.859e-01 1.752
## as.factor(harvest_year)2017 5.377e-01 1.935e-01 2.778
## as.factor(harvest_year)2018 -4.789e-02 1.878e+00 -0.025
## processing_methodNatural / Dry 1.645e-01 1.224e-01 1.343
## processing_methodSemi-washed / Semi-pulped 6.385e-01 2.482e-01 2.573
## processing_methodPulped natural / honey -6.160e-01 5.718e-01 -1.077
## processing_methodOther -5.642e-01 4.112e-01 -1.372
## colorBluish-Green 2.648e-01 1.560e-01 1.697
## colorBlue-Green 4.375e-01 1.776e-01 2.464
## colorNone -7.573e-01 2.653e-01 -2.855
## Pr(>|t|)
## (Intercept) < 2e-16 ***
## species_binary 0.04719 *
## altitude_mean_meters 3.96e-07 ***
## continentSouth_America < 2e-16 ***
## continentAsia 3.69e-05 ***
## continentAfrica < 2e-16 ***
## number_of_bags 0.85750
## as.factor(harvest_year)2011 0.34918
## as.factor(harvest_year)2013 0.39106
## as.factor(harvest_year)2014 0.27422
## as.factor(harvest_year)2015 1.74e-05 ***
## as.factor(harvest_year)2016 0.08002 .
## as.factor(harvest_year)2017 0.00552 **
## as.factor(harvest_year)2018 0.97966
## processing_methodNatural / Dry 0.17933
## processing_methodSemi-washed / Semi-pulped 0.01017 *
## processing_methodPulped natural / honey 0.28145
## processing_methodOther 0.17025
## colorBluish-Green 0.08992 .
## colorBlue-Green 0.01384 *
## colorNone 0.00436 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 3.486427)
##
## Null deviance: 8135.4 on 1857 degrees of freedom
## Residual deviance: 6404.6 on 1837 degrees of freedom
## (1329 observations deleted due to missingness)
## AIC: 7616.1
##
## Number of Fisher Scoring iterations: 2
species_beta <- -1*(round(as.numeric(data.frame(coffee_model$coefficients)[2,1]), 2))
altitude_beta <- round(as.numeric(100*data.frame(coffee_model$coefficients)[3,1]), 2)
SA_beta <- round(as.numeric(data.frame(coffee_model$coefficients)[4,1]), 2)
Asia_beta <- round(as.numeric(data.frame(coffee_model$coefficients)[5,1]), 2)
Africa_beta <- round(as.numeric(data.frame(coffee_model$coefficients)[6,1]), 2)
no_bags_beta <- round(as.numeric(10*data.frame(coffee_model$coefficients)[7,1]), 2)
y2011_beta <- round(as.numeric(data.frame(coffee_model$coefficients)[8,1]), 2)
y2013_beta <- round(as.numeric(data.frame(coffee_model$coefficients)[9,1]), 2)
y2014_beta <- round(as.numeric(data.frame(coffee_model$coefficients)[10,1]), 2)
y2015_beta <- round(as.numeric(data.frame(coffee_model$coefficients)[11,1]), 2)
y2016_beta <- round(as.numeric(data.frame(coffee_model$coefficients)[12,1]), 2)
y2017_beta <- round(as.numeric(data.frame(coffee_model$coefficients)[13,1]), 2)
y2018_beta <- round(as.numeric(data.frame(coffee_model$coefficients)[14,1]), 2)
ND_beta <- round(as.numeric(data.frame(coffee_model$coefficients)[15,1]), 2)
Semi_beta <- round(as.numeric(data.frame(coffee_model$coefficients)[16,1]), 2)
Pulp_beta <- round(as.numeric(data.frame(coffee_model$coefficients)[17,1]), 2)
Other_beta <- round(as.numeric(data.frame(coffee_model$coefficients)[18,1]), 2)
BishG_beta <- round(as.numeric(data.frame(coffee_model$coefficients)[19,1]), 2)
BG_beta <- round(as.numeric(data.frame(coffee_model$coefficients)[20,1]), 2)
None_beta <- -round(as.numeric(data.frame(coffee_model$coefficients)[21,1]), 2)
ggplot(data = coffee_ratings,
aes(x = continent,
fill = continent)) +
geom_bar() +
scale_fill_viridis_d() +
labs(
title = "Number of Coffees From Each Continent",
x = "Continent",
y = "Number of Coffees Sampled"
) +
theme_bw() +
theme(legend.position = "none")
ggplot(data = (coffee_ratings %>% filter(total_cup_points > 0)), # I removed an outlier
aes(x = continent,
y = total_cup_points,
fill = continent)) +
geom_boxplot() +
scale_fill_viridis_d() +
labs(
title = "Total Cup Points of Coffees",
subtitle = "By Continent",
x = "Continent",
y = "Total Cup Points"
) +
theme_bw() +
theme(legend.position = "none")
ggplot(data = coffee_ratings,
aes(x = species,
fill = species)) +
geom_bar() +
scale_fill_viridis_d() +
labs(
title = "Number of Coffees From Each Species",
x = "Species",
y = "Number of Coffees Sampled"
) +
theme_bw() +
theme(legend.position = "none")
ggplot(data = (coffee_ratings %>% filter(total_cup_points > 0)), # I removed an outlier
aes(x = species,
y = total_cup_points,
fill = species)) +
geom_boxplot() +
scale_fill_viridis_d() +
labs(
title = "Total Cup Points of Coffees",
subtitle = "By Species",
x = "Species",
y = "Total Cup Points"
) +
theme_bw() +
theme(legend.position = "none")
ggplot(data = (coffee_ratings %>% filter(total_cup_points > 0)), # I removed an outlier
aes(x = altitude_mean_meters,
y = total_cup_points)) +
geom_point() +
geom_smooth(method = "lm") +
labs(
title = "Total Cup Points of Coffees vs Altitude",
x = "Altitude (m)",
y = "Total Cup Points"
) +
theme_bw()
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 860 rows containing non-finite values (stat_smooth).
## Warning: Removed 860 rows containing missing values (geom_point).
ggplot(data = (coffee_ratings %>% filter(total_cup_points > 0)), # I removed an outlier
aes(x = number_of_bags,
y = total_cup_points)) +
geom_point() +
geom_smooth(method = "lm") +
labs(
title = "Total Cup Points of Coffees vs Number of Bags Tested",
x = "Number of Bags Tested",
y = "Total Cup Points"
) +
theme_bw()
## `geom_smooth()` using formula 'y ~ x'
ggplot(data = (coffee_ratings %>% filter(total_cup_points > 0)), # I removed an outlier
aes(x = harvest_year,
y = total_cup_points)) +
geom_point() +
geom_smooth(method = "lm") +
labs(
title = "Total Cup Points of Coffees vs Harvest Year",
x = "Harvest Year",
y = "Total Cup Points"
) +
theme_bw()
## `geom_smooth()` using formula 'y ~ x'
ggplot(data = coffee_ratings,
aes(x = as.factor(harvest_year),
fill = as.factor(harvest_year))) +
geom_bar() +
scale_fill_viridis_d() +
labs(
title = "Number of Coffees From Each Harvest Year",
x = "Harvest Year",
y = "Number of Coffees Sampled"
) +
theme_bw() +
theme(legend.position = "none")
ggplot(data = (coffee_ratings %>% filter(total_cup_points > 0)), # I removed an outlier
aes(x = as.factor(harvest_year),
y = total_cup_points,
fill = as.factor(harvest_year))) +
geom_boxplot() +
scale_fill_viridis_d() +
labs(
title = "Total Cup Points of Coffees",
subtitle = "By Harvest Year",
x = "Harvest Year",
y = "Total Cup Points"
) +
theme_bw() +
theme(legend.position = "none")
ggplot(data = coffee_ratings,
aes(x = processing_method,
fill = processing_method)) +
geom_bar() +
scale_fill_viridis_d() +
labs(
title = "Number of Coffees From Each Processing Method",
x = "Processing Method",
y = "Number of Coffees Sampled"
) +
theme_bw() +
theme(legend.position = "none",
axis.text.x = element_text(angle = 20,hjust=1))
ggplot(data = (coffee_ratings %>% filter(total_cup_points > 0)), # I removed an outlier
aes(x = processing_method,
y = total_cup_points,
fill = processing_method)) +
geom_boxplot() +
scale_fill_viridis_d() +
labs(
title = "Total Cup Points of Coffees",
subtitle = "By Processing Method",
x = "Processing Method",
y = "Total Cup Points"
) +
theme_bw() +
theme(legend.position = "none",
axis.text.x = element_text(angle = 20,hjust=1))
ggplot(data = coffee_ratings,
aes(x = color,
fill = color)) +
geom_bar() +
scale_fill_viridis_d() +
labs(
title = "Number of Coffees From Each Coffee Bean Color",
x = "Coffee Bean Color",
y = "Number of Coffees Sampled"
) +
theme_bw() +
theme(legend.position = "none")
ggplot(data = (coffee_ratings %>% filter(total_cup_points > 0)), # I removed an outlier
aes(x = color,
y = total_cup_points,
fill = color)) +
geom_boxplot() +
scale_fill_viridis_d() +
labs(
title = "Total Cup Points of Coffees",
subtitle = "By Coffee Bean Color",
x = "Coffee Bean Color",
y = "Total Cup Points"
) +
theme_bw() +
theme(legend.position = "none")