Midterm (Due Sunday 2/13/2022 at 11:55 pm)

Please submit your .Rmd and .html files in Sakai. If you are working together, both people should submit the files.

Define Your Research Question (10 points)

Define your research question below. What about the data interests you? What is a specific question you want to find out about the data?

Research Question: Do killers earn significantly more points than survivors in Dead by Daylight (DBD)?

Data Interest: This data is taken from a DBD tournament that I help facilitate every month. One of the consistent complaints we deal with after every tournament is that one particular role (i.e., killer) consistently earns more points than the survivors. Because some participants don’t want to play both roles in the tournament, the complaint is that the scoring is unfair to them.

Given your question, what is your expectation about the data?

As this is my primary data, I suspect that the answer to this question is yes. The nature of the game is that it is a team of 4 versus 1 match up, so there should be some incentive for the 1 person if they are able to beat the team of 4. Though a certain role may earn more points on average, I will be curious if there is a significant difference between the two.

Loading the Data (10 points)

#Notes on loading data.  This took me some back and forth with code manipulation.  As we've talked about not typing out the same code multiple times, I figured there must be a better way to load multiple sheets from one Excel workbook.  I ended up creating a function to do just that.  While it would be easy to just add 3 lines of code, I will be working with multiple sheets inside of workbooks in the very near future, so I wanted a function that will be useful for me.



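library(readxl)    # excel_sheets(), read_excel()
library(openxlsx)  # write.xlsx()
library(skimr)     # skim()
library(dplyr)     # glimpse(), bind_rows(), mutate(), filter(), %>%
library(ggplot2)   # plots at the end of the report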

read_excel_allsheets <- function(filename, tibble = FALSE) {
    sheets <- readxl::excel_sheets(filename)
    x <- lapply(sheets, function(X) readxl::read_excel(filename, sheet = X))
    if(!tibble) x <- lapply(x, as.data.frame)
    names(x) <- sheets
    x
}

SCORES_FOR_RUBIAN_S_TOURNEY_in_R <- read_excel_allsheets("C:/Users/Brian/Downloads/SCORES FOR RUBIAN'S TOURNEY in R.xlsx")

glimpse(SCORES_FOR_RUBIAN_S_TOURNEY_in_R)

## List of 3
##  $ Round1:'data.frame':  175 obs. of  11 variables:
##   ..$ player_ID                     : num [1:175] 1 2 3 4 5 1 2 3 4 5 ...
##   ..$ role (1=killer, 2=survivor)   : num [1:175] 1 2 2 2 2 2 1 2 2 2 ...
##   ..$ loadout1                      : chr [1:175] "iron will" "iron will" "borrowed time" "smash hit" ...
##   ..$ loadout2                      : chr [1:175] "sprintburst" "pebble" "spinechill" "borrowed time" ...
##   ..$ loadout3                      : chr [1:175] "borrowed time" "bond" "for the people" "adrenaline" ...
##   ..$ loadout4                      : chr [1:175] "spinechill" "dead hard" "iron will" "botany knowledge" ...
##   ..$ map                           : chr [1:175] "macmillan estate" "macmillan estate" "macmillan estate" "macmillan estate" ...
##   ..$ item                          : chr [1:175] "flashlight" "medkit" "toolbox" "medkit" ...
##   ..$ bloodpoints                   : num [1:175] 30010 24189 27102 11813 17762 ...
##   ..$ rank                          : num [1:175] 4 3 5 4 2 8 4 5 4 1 ...
##   ..$ results (1 = escape, 2 = died): chr [1:175] "1" "1" "2" "2" ...
##  $ Round2:'data.frame':  175 obs. of  11 variables:
##   ..$ player_ID                     : num [1:175] 1 2 3 4 5 1 2 3 4 5 ...
##   ..$ role (1=killer, 2=survivor)   : num [1:175] 2 2 2 1 2 2 1 2 2 2 ...
##   ..$ loadout1                      : chr [1:175] "iron will" "iron will" "borrowed time" "smash hit" ...
##   ..$ loadout2                      : chr [1:175] "sprintburst" "pebble" "spinechill" "borrowed time" ...
##   ..$ loadout3                      : chr [1:175] "borrowed time" "bond" "for the people" "adrenaline" ...
##   ..$ loadout4                      : chr [1:175] "spinechill" "dead hard" "iron will" "botany knowledge" ...
##   ..$ map                           : chr [1:175] "macmillan estate" "macmillan estate" "macmillan estate" "macmillan estate" ...
##   ..$ item                          : chr [1:175] "flashlight" "medkit" "toolbox" "medkit" ...
##   ..$ bloodpoints                   : num [1:175] 14940 17484 9987 19709 19276 ...
##   ..$ rank                          : num [1:175] 4 3 5 4 2 8 4 5 4 1 ...
##   ..$ results (1 = escape, 2 = died): chr [1:175] "2" "1" "2" "NA" ...
##  $ Round3:'data.frame':  175 obs. of  11 variables:
##   ..$ player_ID                     : num [1:175] 1 2 3 4 5 1 2 3 4 5 ...
##   ..$ role (1=killer, 2=survivor)   : num [1:175] 2 1 2 2 2 2 1 2 2 2 ...
##   ..$ loadout1                      : chr [1:175] "iron will" "iron will" "borrowed time" "smash hit" ...
##   ..$ loadout2                      : chr [1:175] "sprintburst" "pebble" "spinechill" "borrowed time" ...
##   ..$ loadout3                      : chr [1:175] "borrowed time" "bond" "for the people" "adrenaline" ...
##   ..$ loadout4                      : chr [1:175] "spinechill" "dead hard" "iron will" "botany knowledge" ...
##   ..$ map                           : chr [1:175] "macmillan estate" "macmillan estate" "macmillan estate" "macmillan estate" ...
##   ..$ item                          : chr [1:175] "flashlight" "medkit" "toolbox" "medkit" ...
##   ..$ bloodpoints                   : num [1:175] 19841 23107 19987 11698 20779 ...
##   ..$ rank                          : num [1:175] 4 3 5 4 2 8 4 5 4 1 ...
##   ..$ results (1 = escape, 2 = died): chr [1:175] "1" "NA" "1" "2" ...

write.xlsx(SCORES_FOR_RUBIAN_S_TOURNEY_in_R, "C:/Users/Brian/Downloads/midterm_project_2022/Data/VeraMidtermData.xlsx")

Because I loaded in all sheets, I will likely use a join to make this into one giant table. Also, because this is primary data that I coded myself and all columns match each other across sheets, there shouldn’t be many (if any) alterations I need to make.

summary(SCORES_FOR_RUBIAN_S_TOURNEY_in_R)

##        Length Class      Mode
## Round1 11     data.frame list
## Round2 11     data.frame list
## Round3 11     data.frame list

Whelp! I am first noticing that I created a list of data.frames. That will be something I need to deal with. I am also noticing that my “role” category is treated as a numeric variable, instead of a character variable. So I will need to change that - even though it is a binary variable it still represents categorical data.

Transforming the data (15 points)

#I am using the first part of this code to join all 3 data frames together into one large table.  Then I will use code to change the "role" variable into a character variable.  As I was thinking about joining, I realized that the variable I wanted to connect on, "player_ID", is used in all 3 sheets with the same IDs, even though they represent different players per month.  Additionally, because each player participates in 5 rounds of the tournament, their number appears 5 times in the sheet.  What I have now decided is that, because my question doesn't concern the player identity and really is only about the role that they played, I can instead treat every entry as a unique identity.  Thus, I will create a new column that runs from 1:525, and that will become the new player ID.  Since I started this project, we've been introduced to bind_rows, which made this process much easier.


round1 <- bind_rows(SCORES_FOR_RUBIAN_S_TOURNEY_in_R)

#Reasoning for Binding: Since I was going to treat each row as an individual entry for this project, all I really wanted to do was stack these sheets on top of one another.  Additionally, because my function created a list of 3 elements, I could use something new like bind_rows instead of manipulating the data for a join that did not have unique values.
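One thing stacking loses is which sheet each row came from. A minimal sketch of how that could be kept, using the `.id` argument of `bind_rows()`. The toy list and the column name "round" are made up for illustration and are not part of my data.

```r
library(dplyr)

# Toy stand-in for the list returned by read_excel_allsheets()
sheets <- list(
  Round1 = data.frame(player_ID = 1:2, bloodpoints = c(30010, 24189)),
  Round2 = data.frame(player_ID = 1:2, bloodpoints = c(14940, 17484))
)

# .id adds a column holding the name of the list element each row came from
stacked <- bind_rows(sheets, .id = "round")
stacked
```

This would make it possible to group or filter by sheet later without going back to the Excel file.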

round1v1 <- round1 %>% mutate(NEWplayerID = c(1:525), .after = "player_ID")
  
round1v2 <- round1v1 %>% select(NEWplayerID, `role (1=killer, 2=survivor)`, bloodpoints, rank)

#I have now created a dataframe that only contains the 4 elements that I am really concerned about with this project question; however, that column name 'role (1=killer, 2=survivor)' is cumbersome.  It was my effort to remind myself how I coded things in my primary data, but now it just gets in the way.  Let's rename that.

round1v2 <- rename(round1v2, role = "role (1=killer, 2=survivor)")
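The three steps above (numbering the rows, selecting columns, renaming) could probably be collapsed into one pipeline. This is only a sketch on a toy data frame: `row_number()` avoids hard-coding 525, and `select()` can rename a column as it selects it.

```r
library(dplyr)

# Toy data with the same awkward column name as the real sheets
toy <- data.frame(
  player_ID = c(1, 2, 1, 2),
  `role (1=killer, 2=survivor)` = c(1, 2, 2, 1),
  bloodpoints = c(30010, 24189, 14940, 17484),
  rank = c(4, 3, 4, 3),
  check.names = FALSE
)

tidy <- toy %>%
  mutate(NEWplayerID = row_number()) %>%  # 1, 2, ..., n without typing n
  select(NEWplayerID, role = `role (1=killer, 2=survivor)`, bloodpoints, rank)
tidy
```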

#To see what the histogram for bloodpoints looks like, I want to view data based on role.  This might indicate what my answer is likely to be. Oh but wait, I forgot to transform both the "NEWplayerID" and the "role" variables into categorical ones.  Let's do that now. 

round1v2$NEWplayerID <- as.factor(round1v2$NEWplayerID)
round1v2$role <- as.factor(round1v2$role)
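An alternative I could have used here is to attach the labels while converting, so that output reads "killer" and "survivor" instead of 1 and 2. A small sketch on a made-up vector of role codes:

```r
# Made-up role codes using the same coding as the sheets (1 = killer, 2 = survivor)
role_codes <- c(1, 2, 2, 2, 2, 1)

# levels says which codes to expect; labels says what to call them
role <- factor(role_codes, levels = c(1, 2), labels = c("killer", "survivor"))
table(role)
```

I kept the numeric codes in my own code, so the outputs later in this report still show 1 and 2.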


#In order to get a comparison histogram visualization, I need to break bloodpoints out by role (killer versus survivor).  Let's do that now using group_by on "role".

round1v2 %>% group_by(role) %>% skim()

Data summary
Name	Piped data
Number of rows	525
Number of columns	4
_______________________
Column type frequency:
factor	1
numeric	2
________________________
Group variables	role

Variable type: factor

skim_variable	role	n_missing	complete_rate	ordered	n_unique	top_counts
NEWplayerID	1	0	1	FALSE	105	1: 1, 7: 1, 13: 1, 19: 1
NEWplayerID	2	0	1	FALSE	420	2: 1, 3: 1, 4: 1, 5: 1

Variable type: numeric

skim_variable	role	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
bloodpoints	1	1	22786.10	6693.61	0	19074	23997	27786.0	33018.7	▁▂▃▇▇
bloodpoints	2	1	18308.65	5798.85	0	14456	18563	22371.5	31416.0	▁▃▇▇▂
rank	1	1	5.08	2.74	1	3	4	5.0	14.0	▃▇▂▁▁
rank	2	1	5.08	2.69	1	4	4	5.0	14.0	▃▇▂▁▁

#I can now see two histograms within the bloodpoints category that are grouped by role.  It does certainly look like one category (the role of concern) is negatively skewed, and since we have the mean value we can see that there's about a 4500 point difference in means.  Let's dig a little deeper into these numbers.


#I remember we had some people disconnect during the tournament, giving them a score of '0' or 'NA' - I want to check for those values in "bloodpoints" category.  Those values should be dropped from this analysis in order to avoid potentially skewing the data.

which(is.na(round1v2$bloodpoints))

## integer(0)

#No NA values identified - which is good. Time to identify the number 0 values.

round1v2 %>% count(bloodpoints)

##     bloodpoints n
## 1           0.0 7
## 2        4829.0 1
## 3        4920.0 1
## 4        5770.0 1
## 5        5785.0 1
## 6        6048.0 1
## 7        6458.0 1
## 8        6984.0 1
## 9        7126.0 1
## 10       7177.0 1
## 11       7262.0 1
## 12       7297.0 1
## 13       7448.0 1
## 14       7767.0 1
## 15       8038.0 1
## 16       8050.0 1
## 17       8066.0 1
## 18       8185.0 1
## 19       8346.0 1
## 20       8401.0 2
## 21       8551.0 1
## 22       8555.0 1
## 23       8563.0 1
## 24       8664.0 1
## 25       8683.0 1
## 26       8758.0 1
## 27       8790.0 1
## 28       9033.0 1
## 29       9132.0 1
## 30       9524.0 1
## 31       9660.0 1
## 32       9750.0 1
## 33       9792.0 1
## 34       9987.0 1
## 35       9998.0 1
## 36      10067.0 1
## 37      10276.0 1
## 38      10278.0 1
## 39      10305.0 1
## 40      10310.0 1
## 41      10459.0 1
## 42      10472.0 1
## 43      10594.0 1
## 44      10660.0 1
## 45      10752.0 1
## 46      10902.0 1
## 47      10942.0 1
## 48      10973.0 1
## 49      11044.0 1
## 50      11056.0 1
## 51      11298.0 1
## 52      11673.0 1
## 53      11681.0 1
## 54      11698.0 1
## 55      11727.0 1
## 56      11728.0 1
## 57      11813.0 1
## 58      12076.0 1
## 59      12078.0 1
## 60      12197.0 1
## 61      12199.0 1
## 62      12238.0 1
## 63      12315.0 1
## 64      12351.0 1
## 65      12416.0 1
## 66      12473.0 1
## 67      12587.0 1
## 68      12702.0 1
## 69      12731.0 1
## 70      12739.0 1
## 71      12809.0 1
## 72      12830.0 1
## 73      12832.0 1
## 74      12886.0 1
## 75      12890.0 1
## 76      12923.0 1
## 77      12946.0 1
## 78      13051.0 1
## 79      13143.0 1
## 80      13233.0 1
## 81      13248.0 1
## 82      13261.0 1
## 83      13398.0 1
## 84      13423.0 1
## 85      13511.0 1
## 86      13531.0 1
## 87      13561.0 1
## 88      13709.0 1
## 89      13710.0 1
## 90      13869.0 1
## 91      13872.0 1
## 92      13900.0 1
## 93      13933.0 1
## 94      14018.0 1
## 95      14029.0 1
## 96      14035.0 1
## 97      14045.0 1
## 98      14053.0 1
## 99      14062.0 1
## 100     14222.0 1
## 101     14274.0 1
## 102     14277.0 1
## 103     14313.0 1
## 104     14331.0 1
## 105     14337.0 1
## 106     14346.0 1
## 107     14347.0 1
## 108     14355.0 1
## 109     14357.0 1
## 110     14489.0 1
## 111     14662.0 1
## 112     14679.0 1
## 113     14717.0 1
## 114     14757.0 1
## 115     14771.0 1
## 116     14790.0 1
## 117     14833.0 1
## 118     14940.0 1
## 119     15141.0 1
## 120     15149.0 1
## 121     15160.0 1
## 122     15167.0 1
## 123     15194.0 1
## 124     15216.0 1
## 125     15247.0 1
## 126     15287.0 1
## 127     15319.0 2
## 128     15325.0 1
## 129     15339.0 1
## 130     15403.0 1
## 131     15437.0 1
## 132     15523.0 1
## 133     15535.0 1
## 134     15555.0 1
## 135     15647.0 1
## 136     15652.0 1
## 137     15666.0 1
## 138     15699.0 1
## 139     15715.0 2
## 140     15728.0 1
## 141     15761.0 1
## 142     15938.0 1
## 143     15975.0 1
## 144     16047.0 1
## 145     16157.0 1
## 146     16192.0 1
## 147     16285.0 1
## 148     16309.0 1
## 149     16345.0 1
## 150     16380.0 1
## 151     16408.0 1
## 152     16413.0 1
## 153     16435.0 1
## 154     16471.0 1
## 155     16520.0 1
## 156     16532.0 1
## 157     16536.0 1
## 158     16547.0 1
## 159     16576.0 1
## 160     16600.0 1
## 161     16608.0 1
## 162     16624.0 1
## 163     16675.0 1
## 164     16718.0 1
## 165     16724.0 1
## 166     16725.0 1
## 167     16763.0 1
## 168     16921.0 1
## 169     17085.0 1
## 170     17086.0 1
## 171     17100.0 1
## 172     17117.0 1
## 173     17133.0 1
## 174     17136.0 1
## 175     17156.0 1
## 176     17167.0 1
## 177     17178.0 1
## 178     17190.0 1
## 179     17213.0 1
## 180     17375.0 1
## 181     17386.0 1
## 182     17484.0 1
## 183     17529.0 1
## 184     17555.0 2
## 185     17564.0 1
## 186     17599.0 1
## 187     17606.0 1
## 188     17622.0 1
## 189     17631.0 1
## 190     17700.0 1
## 191     17712.0 1
## 192     17716.0 1
## 193     17733.0 1
## 194     17762.0 1
## 195     17781.0 1
## 196     17801.0 1
## 197     17806.1 1
## 198     17909.0 1
## 199     17927.0 1
## 200     17941.0 1
## 201     17953.0 1
## 202     17991.0 1
## 203     18018.0 1
## 204     18054.0 1
## 205     18059.0 1
## 206     18068.0 1
## 207     18164.0 1
## 208     18231.0 1
## 209     18247.0 2
## 210     18256.0 1
## 211     18262.0 1
## 212     18265.0 1
## 213     18269.0 1
## 214     18329.0 1
## 215     18348.0 1
## 216     18370.0 1
## 217     18411.0 1
## 218     18462.0 1
## 219     18475.0 1
## 220     18551.0 1
## 221     18554.0 1
## 222     18572.0 1
## 223     18581.0 1
## 224     18603.0 1
## 225     18612.0 1
## 226     18614.0 1
## 227     18639.0 1
## 228     18677.0 1
## 229     18680.0 1
## 230     18714.0 1
## 231     18741.0 1
## 232     18771.0 1
## 233     18820.0 1
## 234     18832.0 1
## 235     18895.0 1
## 236     18969.0 1
## 237     18970.0 1
## 238     18991.0 1
## 239     19021.0 1
## 240     19024.0 1
## 241     19060.0 1
## 242     19074.0 1
## 243     19101.0 1
## 244     19108.0 1
## 245     19110.0 1
## 246     19115.0 1
## 247     19153.0 1
## 248     19276.0 1
## 249     19336.0 1
## 250     19350.0 1
## 251     19375.0 1
## 252     19438.0 1
## 253     19511.0 1
## 254     19544.0 1
## 255     19621.0 1
## 256     19647.0 1
## 257     19670.0 1
## 258     19674.0 1
## 259     19675.0 1
## 260     19685.0 1
## 261     19709.0 1
## 262     19719.0 1
## 263     19743.0 1
## 264     19763.0 1
## 265     19824.0 1
## 266     19841.0 1
## 267     19853.0 1
## 268     19854.0 1
## 269     19957.0 1
## 270     19969.0 1
## 271     19987.0 1
## 272     20039.0 1
## 273     20041.0 1
## 274     20186.0 1
## 275     20196.0 1
## 276     20200.0 1
## 277     20258.0 1
## 278     20296.0 1
## 279     20316.0 1
## 280     20347.0 1
## 281     20370.0 1
## 282     20373.0 1
## 283     20455.0 1
## 284     20491.0 1
## 285     20512.0 2
## 286     20517.0 1
## 287     20557.0 1
## 288     20569.0 1
## 289     20619.0 1
## 290     20647.0 1
## 291     20665.0 1
## 292     20714.0 1
## 293     20728.0 1
## 294     20779.0 1
## 295     20793.0 1
## 296     20796.0 1
## 297     20804.0 1
## 298     20879.0 1
## 299     20893.0 1
## 300     20904.0 1
## 301     20929.0 1
## 302     20948.0 1
## 303     20970.0 1
## 304     21014.0 1
## 305     21029.0 1
## 306     21135.0 1
## 307     21171.0 1
## 308     21181.0 1
## 309     21182.0 1
## 310     21203.0 1
## 311     21207.0 1
## 312     21237.0 1
## 313     21315.0 1
## 314     21330.0 1
## 315     21360.0 1
## 316     21415.0 1
## 317     21467.0 1
## 318     21564.0 1
## 319     21581.0 1
## 320     21592.0 1
## 321     21661.0 1
## 322     21662.0 1
## 323     21729.0 1
## 324     21741.0 1
## 325     21798.0 1
## 326     21816.0 1
## 327     21825.0 1
## 328     21890.0 1
## 329     21992.0 1
## 330     22015.0 1
## 331     22027.0 1
## 332     22037.0 1
## 333     22107.0 1
## 334     22122.0 1
## 335     22129.0 1
## 336     22164.0 1
## 337     22183.0 1
## 338     22197.0 1
## 339     22202.0 1
## 340     22207.0 1
## 341     22246.0 1
## 342     22271.0 1
## 343     22277.0 1
## 344     22324.0 1
## 345     22345.0 1
## 346     22369.0 1
## 347     22379.0 1
## 348     22436.0 1
## 349     22512.0 1
## 350     22537.0 1
## 351     22553.0 1
## 352     22591.0 1
## 353     22610.0 1
## 354     22801.0 1
## 355     22821.5 1
## 356     22829.0 1
## 357     22861.0 1
## 358     22866.0 1
## 359     22872.0 1
## 360     22910.0 1
## 361     22975.0 1
## 362     23007.0 1
## 363     23042.0 1
## 364     23107.0 1
## 365     23144.0 2
## 366     23175.0 1
## 367     23262.0 1
## 368     23276.0 1
## 369     23279.0 1
## 370     23326.0 1
## 371     23468.0 1
## 372     23471.0 1
## 373     23542.0 1
## 374     23585.0 1
## 375     23593.0 1
## 376     23616.0 1
## 377     23649.0 1
## 378     23662.0 1
## 379     23663.0 1
## 380     23679.0 1
## 381     23694.0 1
## 382     23696.0 1
## 383     23768.0 1
## 384     23780.0 1
## 385     23830.0 1
## 386     23874.0 1
## 387     23877.0 1
## 388     23968.0 1
## 389     23997.0 1
## 390     24018.0 1
## 391     24034.0 1
## 392     24057.0 1
## 393     24084.0 1
## 394     24103.0 1
## 395     24119.0 1
## 396     24189.0 1
## 397     24218.0 1
## 398     24226.0 1
## 399     24275.0 1
## 400     24400.0 1
## 401     24404.0 1
## 402     24430.0 1
## 403     24449.0 1
## 404     24499.0 1
## 405     24524.0 1
## 406     24610.0 1
## 407     24615.0 1
## 408     24654.0 1
## 409     24662.0 1
## 410     24750.0 1
## 411     24768.0 1
## 412     24880.0 1
## 413     24910.0 1
## 414     24914.0 1
## 415     24946.0 1
## 416     25003.0 1
## 417     25027.0 1
## 418     25075.0 1
## 419     25099.0 1
## 420     25152.0 1
## 421     25211.0 1
## 422     25222.0 1
## 423     25306.0 1
## 424     25315.0 1
## 425     25356.5 1
## 426     25445.0 1
## 427     25507.0 1
## 428     25574.0 1
## 429     25591.0 1
## 430     25706.0 1
## 431     25712.0 1
## 432     25751.0 1
## 433     25828.0 1
## 434     25909.0 1
## 435     25927.0 1
## 436     25967.0 1
## 437     25991.0 1
## 438     26021.0 1
## 439     26025.0 1
## 440     26030.0 1
## 441     26138.0 1
## 442     26234.0 1
## 443     26257.0 1
## 444     26266.0 1
## 445     26281.0 1
## 446     26310.0 1
## 447     26327.0 1
## 448     26404.0 1
## 449     26437.0 1
## 450     26471.0 1
## 451     26554.0 1
## 452     26614.0 1
## 453     26643.0 1
## 454     26911.0 1
## 455     26915.0 1
## 456     27016.6 1
## 457     27102.0 1
## 458     27280.0 1
## 459     27327.0 1
## 460     27329.0 1
## 461     27467.0 1
## 462     27469.0 1
## 463     27487.0 1
## 464     27621.0 1
## 465     27627.6 1
## 466     27786.0 1
## 467     27816.0 1
## 468     27825.0 1
## 469     27938.0 1
## 470     27986.0 1
## 471     28062.0 1
## 472     28069.0 1
## 473     28088.0 1
## 474     28096.0 1
## 475     28144.0 1
## 476     28170.0 1
## 477     28179.0 1
## 478     28277.0 1
## 479     28355.0 1
## 480     28382.0 1
## 481     28480.0 1
## 482     28486.0 1
## 483     28699.0 1
## 484     28726.0 1
## 485     28730.0 1
## 486     28798.0 1
## 487     29133.0 1
## 488     29270.0 1
## 489     29311.0 1
## 490     29319.0 1
## 491     29339.0 1
## 492     29383.0 1
## 493     29418.0 1
## 494     29428.0 1
## 495     29435.0 1
## 496     29650.0 1
## 497     29801.0 1
## 498     30010.0 1
## 499     30151.0 1
## 500     30184.0 1
## 501     30567.0 1
## 502     30647.0 1
## 503     30722.0 1
## 504     30740.0 1
## 505     30800.0 1
## 506     30873.0 1
## 507     30970.0 1
## 508     31025.0 1
## 509     31416.0 1
## 510     31700.0 2
## 511     33018.7 1
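The full count table works, but it is long. A shorter check, sketched here on a made-up vector, is to sum the logical tests directly, which gives one number for the NAs and one for the zeros.

```r
# Made-up scores: two disconnects recorded as 0 and one recorded as NA
bloodpoints <- c(30010, 0, 27102, NA, 17762, 0)

n_missing <- sum(is.na(bloodpoints))               # how many NA values
n_zero    <- sum(bloodpoints == 0, na.rm = TRUE)   # how many exact zeros
c(missing = n_missing, zero = n_zero)
```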

#Great! Now I've identified that there are 7 "0" values in this data.  Now let's filter those out, and check to make sure that I have a new dataframe that only has 518 observations.  But before I do that...I just remembered that some people said that scores were map dependent as well, since certain maps favor certain roles.  Let's add that column back into our new dataset.

round1v3 <- cbind(round1v2, map = round1v1$map)

#Now we can remove those 7 "0" values and not have to worry about matching differences.

round1v4 <- filter(round1v3, !(bloodpoints %in% c(0)))

#The names of the maps aren't that important right now, so let's transform them to simple categories and worry about a codebook later.  We will use case_when to transform all of that category at once.

round1v5 <- round1v4 %>% mutate(map = case_when(map == "macmillan estate" ~ 1,
                                                map == "coldwind farm" ~2,
                                                map == "backwater swamp" ~3,
                                                map == "red forest" ~4,
                                                map == "ormond" ~5))
round1v5$map <- as.factor(round1v5$map)
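One caution about this recoding: `case_when()` returns NA for any map name that is not listed (for example a typo in the sheet), so it may be worth checking for NAs afterwards. A possible alternative, sketched below on made-up values, is to keep the names and only fix the level order with `factor()`; the integer codes are still there underneath.

```r
# Made-up map column; the level order matches the 1-5 coding used above
map_raw <- c("macmillan estate", "coldwind farm", "red forest",
             "ormond", "backwater swamp")
map_levels <- c("macmillan estate", "coldwind farm", "backwater swamp",
                "red forest", "ormond")

map <- factor(map_raw, levels = map_levels)
as.integer(map)   # the underlying codes
anyNA(map)        # TRUE would mean a name did not match any level
```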

round1v5 %>% group_by(role) %>% skim()

Data summary
Name	Piped data
Number of rows	518
Number of columns	5
_______________________
Column type frequency:
factor	2
numeric	2
________________________
Group variables	role

Variable type: factor

skim_variable	role	complete_rate	ordered	n_unique	top_counts
NEWplayerID	1	1	FALSE	103	1: 1, 7: 1, 13: 1, 19: 1
NEWplayerID	2	1	FALSE	415	2: 1, 3: 1, 4: 1, 5: 1
map	1	1	FALSE	5	2: 21, 3: 21, 4: 21, 1: 20
map	2	1	FALSE	5	3: 84, 4: 84, 5: 83, 1: 82

Variable type: numeric

skim_variable	role	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
bloodpoints	1	1	23228.55	5941.78	8038	19697.0	24057	27801.0	33018.7	▂▂▆▇▆
bloodpoints	2	1	18529.24	5471.34	4829	14737.0	18612	22407.5	31416.0	▂▅▇▆▂
rank	1	1	5.11	2.75	1	3.5	4	5.0	14.0	▃▇▂▁▁
rank	2	1	5.09	2.70	1	4.0	4	5.0	14.0	▃▇▂▁▁

#Since I've done this extra data manipulation, let's see if that made any difference.

Bonus points (5 points) for datasets that require merging of tables, but only if you reason through whether you should use left_join, inner_join, or right_join on these tables. No credit will be provided if you don’t.

I used bind_rows and cbind instead of join because it was more efficient for my purposes. That reasoning can be found in the above code notes.
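To illustrate the reasoning, here is a small sketch with made-up data showing what a join on a repeated key does compared with stacking. Because the same player_ID appears several times in each table, `inner_join()` matches every row with every row that shares the ID, while `bind_rows()` simply stacks. The column names ending in _a and _b are invented for the example.

```r
library(dplyr)

# Two made-up sheets where the same player_ID appears three times each
round_a <- data.frame(player_ID = c(1, 1, 1), bloodpoints_a = c(100, 200, 300))
round_b <- data.frame(player_ID = c(1, 1, 1), bloodpoints_b = c(400, 500, 600))

# Newer dplyr versions warn about many-to-many matches; silenced for the demo
joined <- suppressWarnings(inner_join(round_a, round_b, by = "player_ID"))

stacked <- bind_rows(
  rename(round_a, bloodpoints = bloodpoints_a),
  rename(round_b, bloodpoints = bloodpoints_b)
)

c(joined = nrow(joined), stacked = nrow(stacked))
```

The join produces every pairing of rows with the same ID, which is not what I wanted; stacking keeps one row per original entry.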

Show your transformed table here. Use tools such as glimpse(), skim() or head() to illustrate your point.

head(round1v5)

##   NEWplayerID role bloodpoints rank map
## 1           1    1       30010    4   1
## 2           2    2       24189    3   1
## 3           3    2       27102    5   1
## 4           4    2       11813    4   1
## 5           5    2       17762    2   1
## 6           6    2       18572    8   2

#I only used head because I really only want to see the first few rows and how the data is classified.

Are the values what you expected for the variables? Why or Why not?

The values and the type of data are what I expected, as this is still my primary data. What I was surprised by was how 7 “0” values could skew the data that much. There is still skewness in the “killer” role, but removing those values has pulled it closer to center.

Visualizing and Summarizing the Data (15 points)

Use group_by() and summarize() to make a summary of the data here. The summary should be relevant to your research question

round1v5 %>% group_by(role) %>% summarize(BPS = mean(bloodpoints))

## # A tibble: 2 x 2
##   role     BPS
##   <fct>  <dbl>
## 1 1     23229.
## 2 2     18529.
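A slightly fuller summary might also report the group size and spread next to the mean, since the two roles have very different numbers of rows. A sketch on made-up numbers (the column names n, mean_bp and sd_bp are my own choice):

```r
library(dplyr)

# Made-up scores: two killer rows and three survivor rows
toy <- data.frame(
  role = factor(c(1, 1, 2, 2, 2)),
  bloodpoints = c(20000, 30000, 15000, 18000, 21000)
)

summary_tbl <- toy %>%
  group_by(role) %>%
  summarize(
    n = n(),                       # rows per role
    mean_bp = mean(bloodpoints),   # average score
    sd_bp = sd(bloodpoints)        # spread of the scores
  )
summary_tbl
```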

#While the summarize() call answers the very basic question (on average, do the scores of the two roles differ?), it doesn't quite answer my real question: are they significantly different?  A t-test should do the trick.

t.test(round1v5$bloodpoints ~ round1v5$role)

## 
##  Welch Two Sample t-test
## 
## data:  round1v5$bloodpoints by round1v5$role
## t = 7.2956, df = 147.84, p-value = 1.681e-11
## alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
## 95 percent confidence interval:
##  3426.422 5972.190
## sample estimates:
## mean in group 1 mean in group 2 
##        23228.55        18529.24

#This code will help me run a simple independent 2-group t-test that compares the bloodpoints (my numeric variable) by a binary variable (the role of the player)
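Since my research question is directional (do killers earn more?), a one-sided version of the same Welch test could also be considered. The sketch below uses simulated data, not my tournament data, and passes the data frame through the `data` argument. With a formula, `alternative = "greater"` tests whether the first factor level (here role 1) has the larger mean. One caveat I am assuming rather than testing: the t-test treats rows as independent, while each player contributes several rows.

```r
set.seed(2022)

# Simulated scores, loosely shaped like the two roles (not the real data)
toy <- data.frame(
  role = factor(rep(c(1, 2), times = c(100, 400))),
  bloodpoints = c(rnorm(100, mean = 23000, sd = 6000),
                  rnorm(400, mean = 18500, sd = 5500))
)

# Welch test (unequal variances is the default), one-sided: role 1 > role 2
res <- t.test(bloodpoints ~ role, data = toy, alternative = "greater")
res$p.value
```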

What are your findings about the summary? Are they what you expected?

Overall Finding: There is a significant difference in bloodpoints earned between killers and survivors in Dead by Daylight.

The t-test was illuminating. I did not expect the differences between scores to be that significant. Though I suppose because this tournament includes people of all different play types there are likely several other factors that come into play (e.g., their level of experience with the game, the map played on, etc.). Still, this very basic question gives evidence that there is a difference between roles in the game that we need to pay attention to when setting up our tournament.

Make at least two plots that help you answer your question on the transformed or summarized data. Use scales and/or labels to make each plot informative.

ggplot(data = round1v5, aes(x = map, y = bloodpoints, color = role)) +
  geom_boxplot() +
  labs(title = "Bloodpoints versus Map by Role in Game",
       x = "Map Used for Round",
       y = "Bloodpoints Earned",
       color = "Role") +
  scale_x_discrete(labels = c("1" = "Macmillan Estate", "2" = "Coldwind Farm",
                              "3" = "Backwater Swamp", "4" = "Red Forest",
                              "5" = "Ormond")) +
  scale_color_discrete(labels = c("1" = "Killer", "2" = "Survivor")) +
  theme(axis.text.x = element_text(angle = 45))
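The points a boxplot draws separately are, by default, values more than 1.5 times the interquartile range beyond the quartiles. To list such values rather than read them off the plot, something like the helper below could be used. `flag_outliers()` is a name I made up for this sketch, and the scores are invented.

```r
# Hypothetical helper: TRUE for values beyond 1.5 * IQR from the quartiles
flag_outliers <- function(x) {
  q <- unname(quantile(x, c(0.25, 0.75), na.rm = TRUE))
  fence <- 1.5 * (q[2] - q[1])
  x < q[1] - fence | x > q[2] + fence
}

# Invented scores with one obviously extreme value
scores <- c(10, 12, 11, 13, 12, 11, 50)
scores[flag_outliers(scores)]
```

On the real data this would be applied within each map and role, for example inside a grouped `mutate()`.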

ggplot(data = round1v5, aes(x = map, y = bloodpoints, fill = role)) +
  geom_bar(position = "dodge", stat = "identity", width = 0.5) +
  labs(title = "Bloodpoints versus Map by Role in Game",
       x = "Map Used for Round",
       y = "Bloodpoints Earned",
       fill = "Role") +
  scale_x_discrete(labels = c("1" = "Macmillan Estate", "2" = "Coldwind Farm",
                              "3" = "Backwater Swamp", "4" = "Red Forest",
                              "5" = "Ormond")) +
  scale_fill_discrete(labels = c("1" = "Killer", "2" = "Survivor")) +
  theme(axis.text.x = element_text(angle = 45))
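A note on reading this bar chart: with `stat = "identity"` and many rows per map and role, the bars for one group are drawn on top of each other, so the visible height is likely the largest score in that group rather than the average. If the goal is to compare averages, one option is to compute the means first and plot those with `geom_col()`. The sketch below uses made-up data, and the name `mean_bloodpoints` is my own.

```r
library(dplyr)
library(ggplot2)

# Made-up data: two maps, two roles, two scores each
toy <- data.frame(
  map = factor(c(1, 1, 1, 1, 2, 2, 2, 2)),
  role = factor(c(1, 1, 2, 2, 1, 1, 2, 2)),
  bloodpoints = c(20000, 30000, 15000, 19000, 22000, 26000, 17000, 21000)
)

# One row per map and role, holding the mean score
mean_bp <- toy %>%
  group_by(map, role) %>%
  summarize(mean_bloodpoints = mean(bloodpoints), .groups = "drop")

# geom_col() draws bar heights straight from the summarized values
p <- ggplot(mean_bp, aes(x = map, y = mean_bloodpoints, fill = role)) +
  geom_col(position = "dodge", width = 0.5) +
  labs(title = "Mean Bloodpoints by Map and Role",
       x = "Map Used for Round",
       y = "Mean Bloodpoints Earned",
       fill = "Role")
p
```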

Final Summary (10 points)

Summarize your research question and findings below.

Research Question Recap: Do killers earn significantly more points than survivors in Dead by Daylight (DBD)?

We learned that yes, killers do appear to earn significantly more points than survivors in DBD. Further, we learned that when we stratified by map, these point differentials become more pronounced. If we look at the bar chart, Red Forest and Ormond appear to favor killers earning more bloodpoints, on average. When looking at the boxplots, we can identify the fluctuation in median values for the killer role. Whereas survivor median values stayed relatively consistent, the killer median values appear to fluctuate by nearly 5000 points depending on the map. The boxplots also identified 3 outlier values (2 for survivors and 1 for killers).

Are your findings what you expected? Why or Why not?

The findings generally track with what my expectation was, though I didn’t think the difference would be that pronounced. I also didn’t anticipate that maps would play such a factor in scoring. Dead by Daylight has become hugely popular since the start of the pandemic, possibly because of its ability to build community in a time when people had to build virtual ones. Part of my job in this tournament is to collect data and to listen to people’s concerns, since we want to foster a specific type of game experience. I heard for months that killers were earning more bloodpoints and that certain maps favored them, but it is easy to brush those concerns aside as coming from people who are mad because they lost the tournament. This small study gives support to those concerns. It will be interesting to think about this during the next tournament planning phase, as we clearly need to reimagine our scoring system.

Midterm

Joseph W. Vera

2022-03-15