Please submit your .Rmd and .html files in Sakai. If you are working together, both people should submit the files.
Define your research question below. What about the data interests you? What is a specific question you want to find out about the data?
Research Question: Do killers earn significantly more points than survivors in Dead by Daylight (DBD)?
Data Interest: This data is taken from a DBD tournament that I help facilitate every month. One of the consistent complaints we deal with after every tournament is that this particular role (i.e., killer) consistently earns more points than the survivors. Because certain people don’t want to play both roles in the tournament, they feel it is unfair to those people.
Given your question, what is your expectation about the data?
As this is my primary data, I suspect that the answer to this question is yes. The nature of the game is that it is a team of 4 versus 1 match up, so there should be some incentive for the 1 person if they are able to beat the team of 4. Though a certain role may earn more points on average, I will be curious if there is a significant difference between the two.
#Notes on loading data. This took me some back and forth with code manipulation. As we've talked about not typing the same code multiple times, I figured there must be a better way to load multiple sheets from one Excel workbook. I ended up creating a function to do just that. While it would be easy to just add 3 lines of code, I will be working with multiple sheets inside of workbooks in the very near future, so I wanted a function that will be useful for me.
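library(readxl)
library(openxlsx)
library(skimr)
library(dplyr)    # glimpse(), bind_rows(), mutate(), filter(), case_when(), %>%
library(ggplot2)  # plots at the end of the report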
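read_excel_allsheets <- function(filename, tibble = FALSE) {
  # Read every sheet of the workbook into a list named by sheet.
  sheets <- readxl::excel_sheets(filename)
  x <- lapply(sheets, function(X) readxl::read_excel(filename, sheet = X))
  # read_excel() returns tibbles; convert to plain data frames unless asked otherwise.
  if (!tibble) x <- lapply(x, as.data.frame)
  names(x) <- sheets
  x
}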
SCORES_FOR_RUBIAN_S_TOURNEY_in_R <- read_excel_allsheets("C:/Users/Brian/Downloads/SCORES FOR RUBIAN'S TOURNEY in R.xlsx")
glimpse(SCORES_FOR_RUBIAN_S_TOURNEY_in_R)
## List of 3
## $ Round1:'data.frame': 175 obs. of 11 variables:
## ..$ player_ID : num [1:175] 1 2 3 4 5 1 2 3 4 5 ...
## ..$ role (1=killer, 2=survivor) : num [1:175] 1 2 2 2 2 2 1 2 2 2 ...
## ..$ loadout1 : chr [1:175] "iron will" "iron will" "borrowed time" "smash hit" ...
## ..$ loadout2 : chr [1:175] "sprintburst" "pebble" "spinechill" "borrowed time" ...
## ..$ loadout3 : chr [1:175] "borrowed time" "bond" "for the people" "adrenaline" ...
## ..$ loadout4 : chr [1:175] "spinechill" "dead hard" "iron will" "botany knowledge" ...
## ..$ map : chr [1:175] "macmillan estate" "macmillan estate" "macmillan estate" "macmillan estate" ...
## ..$ item : chr [1:175] "flashlight" "medkit" "toolbox" "medkit" ...
## ..$ bloodpoints : num [1:175] 30010 24189 27102 11813 17762 ...
## ..$ rank : num [1:175] 4 3 5 4 2 8 4 5 4 1 ...
## ..$ results (1 = escape, 2 = died): chr [1:175] "1" "1" "2" "2" ...
## $ Round2:'data.frame': 175 obs. of 11 variables:
## ..$ player_ID : num [1:175] 1 2 3 4 5 1 2 3 4 5 ...
## ..$ role (1=killer, 2=survivor) : num [1:175] 2 2 2 1 2 2 1 2 2 2 ...
## ..$ loadout1 : chr [1:175] "iron will" "iron will" "borrowed time" "smash hit" ...
## ..$ loadout2 : chr [1:175] "sprintburst" "pebble" "spinechill" "borrowed time" ...
## ..$ loadout3 : chr [1:175] "borrowed time" "bond" "for the people" "adrenaline" ...
## ..$ loadout4 : chr [1:175] "spinechill" "dead hard" "iron will" "botany knowledge" ...
## ..$ map : chr [1:175] "macmillan estate" "macmillan estate" "macmillan estate" "macmillan estate" ...
## ..$ item : chr [1:175] "flashlight" "medkit" "toolbox" "medkit" ...
## ..$ bloodpoints : num [1:175] 14940 17484 9987 19709 19276 ...
## ..$ rank : num [1:175] 4 3 5 4 2 8 4 5 4 1 ...
## ..$ results (1 = escape, 2 = died): chr [1:175] "2" "1" "2" "NA" ...
## $ Round3:'data.frame': 175 obs. of 11 variables:
## ..$ player_ID : num [1:175] 1 2 3 4 5 1 2 3 4 5 ...
## ..$ role (1=killer, 2=survivor) : num [1:175] 2 1 2 2 2 2 1 2 2 2 ...
## ..$ loadout1 : chr [1:175] "iron will" "iron will" "borrowed time" "smash hit" ...
## ..$ loadout2 : chr [1:175] "sprintburst" "pebble" "spinechill" "borrowed time" ...
## ..$ loadout3 : chr [1:175] "borrowed time" "bond" "for the people" "adrenaline" ...
## ..$ loadout4 : chr [1:175] "spinechill" "dead hard" "iron will" "botany knowledge" ...
## ..$ map : chr [1:175] "macmillan estate" "macmillan estate" "macmillan estate" "macmillan estate" ...
## ..$ item : chr [1:175] "flashlight" "medkit" "toolbox" "medkit" ...
## ..$ bloodpoints : num [1:175] 19841 23107 19987 11698 20779 ...
## ..$ rank : num [1:175] 4 3 5 4 2 8 4 5 4 1 ...
## ..$ results (1 = escape, 2 = died): chr [1:175] "1" "NA" "1" "2" ...
write.xlsx(SCORES_FOR_RUBIAN_S_TOURNEY_in_R, "C:/Users/Brian/Downloads/midterm_project_2022/Data/VeraMidtermData.xlsx")
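The sheet-loading helper above can be tried out on a small throwaway workbook before pointing it at real data. This is only a sketch: `toy_rounds`, `toy_path`, and `toy_loaded` are made-up names, the two sheets are invented, and the helper is repeated so the block runs on its own.

```r
library(readxl)
library(openxlsx)

# Same helper as above, repeated so this sketch is self-contained.
read_excel_allsheets <- function(filename, tibble = FALSE) {
  sheets <- readxl::excel_sheets(filename)
  x <- lapply(sheets, function(sheet) readxl::read_excel(filename, sheet = sheet))
  if (!tibble) x <- lapply(x, as.data.frame)
  names(x) <- sheets
  x
}

# Hypothetical toy workbook: two sheets with matching columns.
toy_rounds <- list(
  Round1 = data.frame(player_ID = 1:3, bloodpoints = c(30010, 24189, 27102)),
  Round2 = data.frame(player_ID = 1:3, bloodpoints = c(14940, 17484, 9987))
)
toy_path <- tempfile(fileext = ".xlsx")
openxlsx::write.xlsx(toy_rounds, toy_path)  # a named list writes one sheet per element

# Read every sheet back and check the names and row counts.
toy_loaded <- read_excel_allsheets(toy_path)
names(toy_loaded)
sapply(toy_loaded, nrow)
```

Writing to `tempfile()` keeps the example independent of any particular folder on disk, which is also a reasonable habit for the real file paths if the report needs to run on another machine.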
Because I loaded in all sheets, I will likely use a join to make this into one giant table. Also, because this is primary data that I coded myself and all columns match each other across sheets, there shouldn’t be many (if any) alterations I need to make.
summary(SCORES_FOR_RUBIAN_S_TOURNEY_in_R)
## Length Class Mode
## Round1 11 data.frame list
## Round2 11 data.frame list
## Round3 11 data.frame list
Whelp! I am first noticing that I created a list of data.frames. That will be something I need to deal with. I am also noticing that my “role” category is treated as a numeric variable, instead of a character variable. So I will need to change that - even though it is a binary variable it still represents categorical data.
#I am using the first part of this code to join all 3 data frames together into one large table. Then I will change the "role" variable into a categorical variable. As I was thinking about joining, I realized that the variable I wanted to connect on, "player_ID", is used in all 3 sheets with the same IDs, even though they represent different players each month. Additionally, because each player participates in 5 rounds of the tournament, their number appears 5 times in each sheet. Because my question doesn't concern player identity and is really only about the role played, I have decided to treat every entry as a unique identity. Thus, I will create a new column that runs from 1:525, and that will become the new player ID. Since I started this project, we've been introduced to bind_rows, which made this process much easier.
round1 <- bind_rows(SCORES_FOR_RUBIAN_S_TOURNEY_in_R)
#Reasoning for Binding: Since I was going to treat each row as an individual entry for this project, all I really wanted to do was stack these sheets on top of one another. Additionally, because my function created a list of 3 elements, I could use something new like bind_rows instead of manipulating the data for a join that did not have unique values.
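One thing stacking loses is which sheet each row came from. A minimal sketch of keeping that information, assuming the list elements are named by sheet as they are here: `bind_rows()` accepts an `.id` argument that stores each element's name in a new column. The `toy_sheets` data and the `round` and `entry_ID` column names are made up for illustration.

```r
library(dplyr)

# Hypothetical stand-in for the list returned by read_excel_allsheets().
toy_sheets <- list(
  Round1 = data.frame(player_ID = 1:2, bloodpoints = c(30010, 24189)),
  Round2 = data.frame(player_ID = 1:2, bloodpoints = c(14940, 17484)),
  Round3 = data.frame(player_ID = 1:2, bloodpoints = c(19841, 23107))
)

# .id records the source sheet; row_number() gives every stacked row its own ID.
toy_stacked <- bind_rows(toy_sheets, .id = "round") %>%
  mutate(entry_ID = row_number(), .before = 1)

toy_stacked
```

With a `round` column in place, repeated `player_ID` values stay distinguishable even after the sheets are combined.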
round1v1 <- round1 %>% mutate(NEWplayerID = row_number(), .after = "player_ID")
round1v2 <- round1v1 %>% select(NEWplayerID, `role (1=killer, 2=survivor)`, bloodpoints, rank)
#I have now created a dataframe that only contains the 4 elements that I am really concerned about with this project question; however, that column name 'role (1=killer, 2=survivor)' is cumbersome. It was my effort to remind myself how I coded things in my primary data, but now it just gets in the way. Let's rename that.
round1v2 <- rename(round1v2, role = "role (1=killer, 2=survivor)")
#To see what the histogram for bloodpoints looks like, I want to view data based on role. This might indicate what my answer is likely to be. Oh but wait, I forgot to transform both the "NEWplayerID" and the "role" variables into categorical ones. Let's do that now.
round1v2$NEWplayerID <- as.factor(round1v2$NEWplayerID)
round1v2$role <- as.factor(round1v2$role)
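A possible variation on the conversion above, shown on made-up values rather than the tournament data: `factor()` can attach readable labels at the same time, so later summaries print "killer" and "survivor" instead of 1 and 2. The vector `toy_role` is hypothetical.

```r
# Hypothetical role codes in the same 1/2 scheme as the tournament sheet.
toy_role <- c(1, 2, 2, 2, 2, 1)

# levels gives the codes in the data; labels gives the names to display.
toy_role_factor <- factor(toy_role, levels = c(1, 2), labels = c("killer", "survivor"))

levels(toy_role_factor)
table(toy_role_factor)
```

The report keeps the numeric codes and relabels them in the plots instead, which works just as well. Labeling up front simply removes the need to remember the coding.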
#To get a comparison histogram, I need to break bloodpoints out by role (killer versus survivor). Let's do that now, using group_by to group on "role".
round1v2 %>% group_by(role) %>% skim()
Name | Piped data |
Number of rows | 525 |
Number of columns | 4 |
_______________________ | |
Column type frequency: | |
factor | 1 |
numeric | 2 |
________________________ | |
Group variables | role |
Variable type: factor
skim_variable | role | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|---|
NEWplayerID | 1 | 0 | 1 | FALSE | 105 | 1: 1, 7: 1, 13: 1, 19: 1 |
NEWplayerID | 2 | 0 | 1 | FALSE | 420 | 2: 1, 3: 1, 4: 1, 5: 1 |
Variable type: numeric
skim_variable | role | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|---|
bloodpoints | 1 | 0 | 1 | 22786.10 | 6693.61 | 0 | 19074 | 23997 | 27786.0 | 33018.7 | ▁▂▃▇▇ |
bloodpoints | 2 | 0 | 1 | 18308.65 | 5798.85 | 0 | 14456 | 18563 | 22371.5 | 31416.0 | ▁▃▇▇▂ |
rank | 1 | 0 | 1 | 5.08 | 2.74 | 1 | 3 | 4 | 5.0 | 14.0 | ▃▇▂▁▁ |
rank | 2 | 0 | 1 | 5.08 | 2.69 | 1 | 4 | 4 | 5.0 | 14.0 | ▃▇▂▁▁ |
#I can now see two histograms within the bloodpoints category that are grouped by role. It certainly looks like one category (the role of concern) is negatively skewed, and from the mean values we can see that there's about a 4,500-point difference in means. Let's dig a little deeper into these numbers.
#I remember we had some people disconnect during the tournament, giving them a score of '0' or 'NA' - I want to check for those values in "bloodpoints" category. Those values should be dropped from this analysis in order to avoid potentially skewing the data.
which(is.na(round1v2$bloodpoints))
## integer(0)
#No NA values identified - which is good. Time to identify the number 0 values.
round1v2 %>% count(bloodpoints)
## bloodpoints n
## 1 0.0 7
## 2 4829.0 1
## 3 4920.0 1
## 4 5770.0 1
## 5 5785.0 1
## 6 6048.0 1
## 7 6458.0 1
## 8 6984.0 1
## 9 7126.0 1
## 10 7177.0 1
## 11 7262.0 1
## 12 7297.0 1
## 13 7448.0 1
## 14 7767.0 1
## 15 8038.0 1
## 16 8050.0 1
## 17 8066.0 1
## 18 8185.0 1
## 19 8346.0 1
## 20 8401.0 2
## 21 8551.0 1
## 22 8555.0 1
## 23 8563.0 1
## 24 8664.0 1
## 25 8683.0 1
## 26 8758.0 1
## 27 8790.0 1
## 28 9033.0 1
## 29 9132.0 1
## 30 9524.0 1
## 31 9660.0 1
## 32 9750.0 1
## 33 9792.0 1
## 34 9987.0 1
## 35 9998.0 1
## 36 10067.0 1
## 37 10276.0 1
## 38 10278.0 1
## 39 10305.0 1
## 40 10310.0 1
## 41 10459.0 1
## 42 10472.0 1
## 43 10594.0 1
## 44 10660.0 1
## 45 10752.0 1
## 46 10902.0 1
## 47 10942.0 1
## 48 10973.0 1
## 49 11044.0 1
## 50 11056.0 1
## 51 11298.0 1
## 52 11673.0 1
## 53 11681.0 1
## 54 11698.0 1
## 55 11727.0 1
## 56 11728.0 1
## 57 11813.0 1
## 58 12076.0 1
## 59 12078.0 1
## 60 12197.0 1
## 61 12199.0 1
## 62 12238.0 1
## 63 12315.0 1
## 64 12351.0 1
## 65 12416.0 1
## 66 12473.0 1
## 67 12587.0 1
## 68 12702.0 1
## 69 12731.0 1
## 70 12739.0 1
## 71 12809.0 1
## 72 12830.0 1
## 73 12832.0 1
## 74 12886.0 1
## 75 12890.0 1
## 76 12923.0 1
## 77 12946.0 1
## 78 13051.0 1
## 79 13143.0 1
## 80 13233.0 1
## 81 13248.0 1
## 82 13261.0 1
## 83 13398.0 1
## 84 13423.0 1
## 85 13511.0 1
## 86 13531.0 1
## 87 13561.0 1
## 88 13709.0 1
## 89 13710.0 1
## 90 13869.0 1
## 91 13872.0 1
## 92 13900.0 1
## 93 13933.0 1
## 94 14018.0 1
## 95 14029.0 1
## 96 14035.0 1
## 97 14045.0 1
## 98 14053.0 1
## 99 14062.0 1
## 100 14222.0 1
## 101 14274.0 1
## 102 14277.0 1
## 103 14313.0 1
## 104 14331.0 1
## 105 14337.0 1
## 106 14346.0 1
## 107 14347.0 1
## 108 14355.0 1
## 109 14357.0 1
## 110 14489.0 1
## 111 14662.0 1
## 112 14679.0 1
## 113 14717.0 1
## 114 14757.0 1
## 115 14771.0 1
## 116 14790.0 1
## 117 14833.0 1
## 118 14940.0 1
## 119 15141.0 1
## 120 15149.0 1
## 121 15160.0 1
## 122 15167.0 1
## 123 15194.0 1
## 124 15216.0 1
## 125 15247.0 1
## 126 15287.0 1
## 127 15319.0 2
## 128 15325.0 1
## 129 15339.0 1
## 130 15403.0 1
## 131 15437.0 1
## 132 15523.0 1
## 133 15535.0 1
## 134 15555.0 1
## 135 15647.0 1
## 136 15652.0 1
## 137 15666.0 1
## 138 15699.0 1
## 139 15715.0 2
## 140 15728.0 1
## 141 15761.0 1
## 142 15938.0 1
## 143 15975.0 1
## 144 16047.0 1
## 145 16157.0 1
## 146 16192.0 1
## 147 16285.0 1
## 148 16309.0 1
## 149 16345.0 1
## 150 16380.0 1
## 151 16408.0 1
## 152 16413.0 1
## 153 16435.0 1
## 154 16471.0 1
## 155 16520.0 1
## 156 16532.0 1
## 157 16536.0 1
## 158 16547.0 1
## 159 16576.0 1
## 160 16600.0 1
## 161 16608.0 1
## 162 16624.0 1
## 163 16675.0 1
## 164 16718.0 1
## 165 16724.0 1
## 166 16725.0 1
## 167 16763.0 1
## 168 16921.0 1
## 169 17085.0 1
## 170 17086.0 1
## 171 17100.0 1
## 172 17117.0 1
## 173 17133.0 1
## 174 17136.0 1
## 175 17156.0 1
## 176 17167.0 1
## 177 17178.0 1
## 178 17190.0 1
## 179 17213.0 1
## 180 17375.0 1
## 181 17386.0 1
## 182 17484.0 1
## 183 17529.0 1
## 184 17555.0 2
## 185 17564.0 1
## 186 17599.0 1
## 187 17606.0 1
## 188 17622.0 1
## 189 17631.0 1
## 190 17700.0 1
## 191 17712.0 1
## 192 17716.0 1
## 193 17733.0 1
## 194 17762.0 1
## 195 17781.0 1
## 196 17801.0 1
## 197 17806.1 1
## 198 17909.0 1
## 199 17927.0 1
## 200 17941.0 1
## 201 17953.0 1
## 202 17991.0 1
## 203 18018.0 1
## 204 18054.0 1
## 205 18059.0 1
## 206 18068.0 1
## 207 18164.0 1
## 208 18231.0 1
## 209 18247.0 2
## 210 18256.0 1
## 211 18262.0 1
## 212 18265.0 1
## 213 18269.0 1
## 214 18329.0 1
## 215 18348.0 1
## 216 18370.0 1
## 217 18411.0 1
## 218 18462.0 1
## 219 18475.0 1
## 220 18551.0 1
## 221 18554.0 1
## 222 18572.0 1
## 223 18581.0 1
## 224 18603.0 1
## 225 18612.0 1
## 226 18614.0 1
## 227 18639.0 1
## 228 18677.0 1
## 229 18680.0 1
## 230 18714.0 1
## 231 18741.0 1
## 232 18771.0 1
## 233 18820.0 1
## 234 18832.0 1
## 235 18895.0 1
## 236 18969.0 1
## 237 18970.0 1
## 238 18991.0 1
## 239 19021.0 1
## 240 19024.0 1
## 241 19060.0 1
## 242 19074.0 1
## 243 19101.0 1
## 244 19108.0 1
## 245 19110.0 1
## 246 19115.0 1
## 247 19153.0 1
## 248 19276.0 1
## 249 19336.0 1
## 250 19350.0 1
## 251 19375.0 1
## 252 19438.0 1
## 253 19511.0 1
## 254 19544.0 1
## 255 19621.0 1
## 256 19647.0 1
## 257 19670.0 1
## 258 19674.0 1
## 259 19675.0 1
## 260 19685.0 1
## 261 19709.0 1
## 262 19719.0 1
## 263 19743.0 1
## 264 19763.0 1
## 265 19824.0 1
## 266 19841.0 1
## 267 19853.0 1
## 268 19854.0 1
## 269 19957.0 1
## 270 19969.0 1
## 271 19987.0 1
## 272 20039.0 1
## 273 20041.0 1
## 274 20186.0 1
## 275 20196.0 1
## 276 20200.0 1
## 277 20258.0 1
## 278 20296.0 1
## 279 20316.0 1
## 280 20347.0 1
## 281 20370.0 1
## 282 20373.0 1
## 283 20455.0 1
## 284 20491.0 1
## 285 20512.0 2
## 286 20517.0 1
## 287 20557.0 1
## 288 20569.0 1
## 289 20619.0 1
## 290 20647.0 1
## 291 20665.0 1
## 292 20714.0 1
## 293 20728.0 1
## 294 20779.0 1
## 295 20793.0 1
## 296 20796.0 1
## 297 20804.0 1
## 298 20879.0 1
## 299 20893.0 1
## 300 20904.0 1
## 301 20929.0 1
## 302 20948.0 1
## 303 20970.0 1
## 304 21014.0 1
## 305 21029.0 1
## 306 21135.0 1
## 307 21171.0 1
## 308 21181.0 1
## 309 21182.0 1
## 310 21203.0 1
## 311 21207.0 1
## 312 21237.0 1
## 313 21315.0 1
## 314 21330.0 1
## 315 21360.0 1
## 316 21415.0 1
## 317 21467.0 1
## 318 21564.0 1
## 319 21581.0 1
## 320 21592.0 1
## 321 21661.0 1
## 322 21662.0 1
## 323 21729.0 1
## 324 21741.0 1
## 325 21798.0 1
## 326 21816.0 1
## 327 21825.0 1
## 328 21890.0 1
## 329 21992.0 1
## 330 22015.0 1
## 331 22027.0 1
## 332 22037.0 1
## 333 22107.0 1
## 334 22122.0 1
## 335 22129.0 1
## 336 22164.0 1
## 337 22183.0 1
## 338 22197.0 1
## 339 22202.0 1
## 340 22207.0 1
## 341 22246.0 1
## 342 22271.0 1
## 343 22277.0 1
## 344 22324.0 1
## 345 22345.0 1
## 346 22369.0 1
## 347 22379.0 1
## 348 22436.0 1
## 349 22512.0 1
## 350 22537.0 1
## 351 22553.0 1
## 352 22591.0 1
## 353 22610.0 1
## 354 22801.0 1
## 355 22821.5 1
## 356 22829.0 1
## 357 22861.0 1
## 358 22866.0 1
## 359 22872.0 1
## 360 22910.0 1
## 361 22975.0 1
## 362 23007.0 1
## 363 23042.0 1
## 364 23107.0 1
## 365 23144.0 2
## 366 23175.0 1
## 367 23262.0 1
## 368 23276.0 1
## 369 23279.0 1
## 370 23326.0 1
## 371 23468.0 1
## 372 23471.0 1
## 373 23542.0 1
## 374 23585.0 1
## 375 23593.0 1
## 376 23616.0 1
## 377 23649.0 1
## 378 23662.0 1
## 379 23663.0 1
## 380 23679.0 1
## 381 23694.0 1
## 382 23696.0 1
## 383 23768.0 1
## 384 23780.0 1
## 385 23830.0 1
## 386 23874.0 1
## 387 23877.0 1
## 388 23968.0 1
## 389 23997.0 1
## 390 24018.0 1
## 391 24034.0 1
## 392 24057.0 1
## 393 24084.0 1
## 394 24103.0 1
## 395 24119.0 1
## 396 24189.0 1
## 397 24218.0 1
## 398 24226.0 1
## 399 24275.0 1
## 400 24400.0 1
## 401 24404.0 1
## 402 24430.0 1
## 403 24449.0 1
## 404 24499.0 1
## 405 24524.0 1
## 406 24610.0 1
## 407 24615.0 1
## 408 24654.0 1
## 409 24662.0 1
## 410 24750.0 1
## 411 24768.0 1
## 412 24880.0 1
## 413 24910.0 1
## 414 24914.0 1
## 415 24946.0 1
## 416 25003.0 1
## 417 25027.0 1
## 418 25075.0 1
## 419 25099.0 1
## 420 25152.0 1
## 421 25211.0 1
## 422 25222.0 1
## 423 25306.0 1
## 424 25315.0 1
## 425 25356.5 1
## 426 25445.0 1
## 427 25507.0 1
## 428 25574.0 1
## 429 25591.0 1
## 430 25706.0 1
## 431 25712.0 1
## 432 25751.0 1
## 433 25828.0 1
## 434 25909.0 1
## 435 25927.0 1
## 436 25967.0 1
## 437 25991.0 1
## 438 26021.0 1
## 439 26025.0 1
## 440 26030.0 1
## 441 26138.0 1
## 442 26234.0 1
## 443 26257.0 1
## 444 26266.0 1
## 445 26281.0 1
## 446 26310.0 1
## 447 26327.0 1
## 448 26404.0 1
## 449 26437.0 1
## 450 26471.0 1
## 451 26554.0 1
## 452 26614.0 1
## 453 26643.0 1
## 454 26911.0 1
## 455 26915.0 1
## 456 27016.6 1
## 457 27102.0 1
## 458 27280.0 1
## 459 27327.0 1
## 460 27329.0 1
## 461 27467.0 1
## 462 27469.0 1
## 463 27487.0 1
## 464 27621.0 1
## 465 27627.6 1
## 466 27786.0 1
## 467 27816.0 1
## 468 27825.0 1
## 469 27938.0 1
## 470 27986.0 1
## 471 28062.0 1
## 472 28069.0 1
## 473 28088.0 1
## 474 28096.0 1
## 475 28144.0 1
## 476 28170.0 1
## 477 28179.0 1
## 478 28277.0 1
## 479 28355.0 1
## 480 28382.0 1
## 481 28480.0 1
## 482 28486.0 1
## 483 28699.0 1
## 484 28726.0 1
## 485 28730.0 1
## 486 28798.0 1
## 487 29133.0 1
## 488 29270.0 1
## 489 29311.0 1
## 490 29319.0 1
## 491 29339.0 1
## 492 29383.0 1
## 493 29418.0 1
## 494 29428.0 1
## 495 29435.0 1
## 496 29650.0 1
## 497 29801.0 1
## 498 30010.0 1
## 499 30151.0 1
## 500 30184.0 1
## 501 30567.0 1
## 502 30647.0 1
## 503 30722.0 1
## 504 30740.0 1
## 505 30800.0 1
## 506 30873.0 1
## 507 30970.0 1
## 508 31025.0 1
## 509 31416.0 1
## 510 31700.0 2
## 511 33018.7 1
#Great! Now I've identified that there are 7 "0" values in this data. Now let's filter those out, and check to make sure that I have a new dataframe that only has 518 observations. But before I do that...I just remembered that some people said that scores were map dependent as well, since certain maps favor certain roles. Let's add that column back into our new dataset.
round1v3 <- cbind(round1v2, map = round1v1$map)
#Now we can remove those 7 "0" values and not have to worry about matching differences.
round1v4 <- filter(round1v3, !(bloodpoints %in% c(0)))
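Scanning a full `count()` table works, but the zeros can also be counted directly. The sketch below uses a small invented data frame (`toy_scores`) in place of the real one. It assumes, as was checked above, that the column has no NA values, since `!=` would drop NA rows along with the zeros.

```r
library(dplyr)

# Hypothetical scores; 0 marks a disconnect, as in the tournament data.
toy_scores <- data.frame(
  role = factor(c(1, 2, 2, 2, 2, 1, 2)),
  bloodpoints = c(30010, 24189, 0, 11813, 17762, 0, 18572)
)

# Count the zero scores overall, then per role.
sum(toy_scores$bloodpoints == 0)
toy_scores %>% group_by(role) %>% summarize(n_zero = sum(bloodpoints == 0))

# Drop the zero rows and confirm how many rows remain.
toy_played <- toy_scores %>% filter(bloodpoints != 0)
nrow(toy_played)
```

The per-role count is worth a look because it shows whether disconnects fall mostly on one role, which would affect the two group means unequally.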
#The names of the maps aren't that important right now, so let's transform them to simple categories and worry about a codebook later. We will use case_when to transform all of that category at once.
round1v5 <- round1v4 %>% mutate(map = case_when(map == "macmillan estate" ~ 1,
                                                map == "coldwind farm" ~ 2,
                                                map == "backwater swamp" ~ 3,
                                                map == "red forest" ~ 4,
                                                map == "ormond" ~ 5))
round1v5$map <- as.factor(round1v5$map)
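Since the codebook is being left for later, here is one way it could be sketched. Setting the factor levels in the same order as the `case_when()` coding gives matching integer codes while keeping the map names attached. `toy_map` is a made-up sample and `codebook` is a hypothetical name.

```r
# Map names in the coding order used above (1 = macmillan estate, ..., 5 = ormond).
map_levels <- c("macmillan estate", "coldwind farm", "backwater swamp",
                "red forest", "ormond")

# Hypothetical sample of map values.
toy_map <- c("ormond", "macmillan estate", "red forest", "coldwind farm")

# The level order fixes the integer code behind each name.
toy_map_factor <- factor(toy_map, levels = map_levels)
as.integer(toy_map_factor)

# A simple codebook table pairing each code with its map name.
codebook <- data.frame(code = seq_along(map_levels), map = map_levels)
codebook
```

A value that is not listed in `map_levels` becomes NA under this approach, just as an unmatched value does in `case_when()`, so a misspelled map name would show up as a missing value either way.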
round1v5 %>% group_by(role) %>% skim()
Name | Piped data |
Number of rows | 518 |
Number of columns | 5 |
_______________________ | |
Column type frequency: | |
factor | 2 |
numeric | 2 |
________________________ | |
Group variables | role |
Variable type: factor
skim_variable | role | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|---|
NEWplayerID | 1 | 0 | 1 | FALSE | 103 | 1: 1, 7: 1, 13: 1, 19: 1 |
NEWplayerID | 2 | 0 | 1 | FALSE | 415 | 2: 1, 3: 1, 4: 1, 5: 1 |
map | 1 | 0 | 1 | FALSE | 5 | 2: 21, 3: 21, 4: 21, 1: 20 |
map | 2 | 0 | 1 | FALSE | 5 | 3: 84, 4: 84, 5: 83, 1: 82 |
Variable type: numeric
skim_variable | role | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|---|
bloodpoints | 1 | 0 | 1 | 23228.55 | 5941.78 | 8038 | 19697.0 | 24057 | 27801.0 | 33018.7 | ▂▂▆▇▆ |
bloodpoints | 2 | 0 | 1 | 18529.24 | 5471.34 | 4829 | 14737.0 | 18612 | 22407.5 | 31416.0 | ▂▅▇▆▂ |
rank | 1 | 0 | 1 | 5.11 | 2.75 | 1 | 3.5 | 4 | 5.0 | 14.0 | ▃▇▂▁▁ |
rank | 2 | 0 | 1 | 5.09 | 2.70 | 1 | 4.0 | 4 | 5.0 | 14.0 | ▃▇▂▁▁ |
#since I've done this extra data manipulation, let's see if that made any difference.
Bonus points (5 points) for datasets that require merging of tables, but only if you reason through whether you should use left_join, inner_join, or right_join on these tables. No credit will be provided if you don’t.
I used bind_rows and cbind instead of join because it was more efficient for my purposes. That reasoning can be found in the above code notes.
Show your transformed table here. Use tools such as glimpse(), skim() or head() to illustrate your point.
head(round1v5)
## NEWplayerID role bloodpoints rank map
## 1 1 1 30010 4 1
## 2 2 2 24189 3 1
## 3 3 2 27102 5 1
## 4 4 2 11813 4 1
## 5 5 2 17762 2 1
## 6 6 2 18572 8 2
#I only used head because I really only want to see the first few rows and how the data is classified.
Are the values what you expected for the variables? Why or Why not?
The values and the types of data are what I expected, as this is still my primary data. What surprised me was how much 7 “0” values could skew the data. There is still skewness in the “killer” role, but removing those values has pulled the distribution closer to center.
Use group_by() and summarize() to make a summary of the data here. The summary should be relevant to your research question.
round1v5 %>% group_by(role) %>% summarize(BPS = mean(bloodpoints))
## # A tibble: 2 x 2
## role BPS
## <fct> <dbl>
## 1 1 23229.
## 2 2 18529.
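A slightly fuller version of this summary might report the group size and spread alongside the mean, since those are what a significance test draws on. This is a sketch on invented rows. `toy_scores`, `mean_bp`, and `sd_bp` are hypothetical names.

```r
library(dplyr)

# Hypothetical scores for two killers and four survivors.
toy_scores <- data.frame(
  role = factor(c(1, 1, 2, 2, 2, 2), labels = c("killer", "survivor")),
  bloodpoints = c(30010, 19709, 24189, 27102, 11813, 17762)
)

# One row per role: number of entries, mean, and standard deviation.
toy_summary <- toy_scores %>%
  group_by(role) %>%
  summarize(n = n(), mean_bp = mean(bloodpoints), sd_bp = sd(bloodpoints))

toy_summary
```

Seeing `n` next to each mean also makes the 1-versus-4 imbalance between roles visible at a glance.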
#While this code produces an answer to the very basic question, (on average) do the scores between the two roles differ, it doesn't quite answer my real question - which is, are they significantly different...a t-test should do the trick
t.test(round1v5$bloodpoints ~ round1v5$role)
##
## Welch Two Sample t-test
##
## data: round1v5$bloodpoints by round1v5$role
## t = 7.2956, df = 147.84, p-value = 1.681e-11
## alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
## 95 percent confidence interval:
## 3426.422 5972.190
## sample estimates:
## mean in group 1 mean in group 2
## 23228.55 18529.24
#This code will help me run a simple independent 2-group t-test that compares the bloodpoints (my numeric variable) by a binary variable (the role of the player)
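To see where the numbers in the Welch output come from, the test statistic can be rebuilt by hand from the group summaries in the second skim() table. The means and standard deviations below are copied from that table, and the group sizes (103 killers, 415 survivors) are taken from its `n_unique` counts for `NEWplayerID`. Because those inputs are rounded, this is a sketch that should land close to the t.test() output rather than match it digit for digit.

```r
# Group summaries copied from the skim() output for the filtered data.
mean_killer <- 23228.55; sd_killer <- 5941.78; n_killer <- 103
mean_survivor <- 18529.24; sd_survivor <- 5471.34; n_survivor <- 415

# Squared standard error contributed by each group.
se2_killer <- sd_killer^2 / n_killer
se2_survivor <- sd_survivor^2 / n_survivor
se_diff <- sqrt(se2_killer + se2_survivor)

# Welch t statistic and Welch-Satterthwaite degrees of freedom.
t_stat <- (mean_killer - mean_survivor) / se_diff
df_welch <- (se2_killer + se2_survivor)^2 /
  (se2_killer^2 / (n_killer - 1) + se2_survivor^2 / (n_survivor - 1))

# 95% confidence interval for the difference in means.
ci <- (mean_killer - mean_survivor) + c(-1, 1) * qt(0.975, df_welch) * se_diff

# One-sided p-value for the directional question "do killers earn more?".
p_one_sided <- pt(t_stat, df_welch, lower.tail = FALSE)

round(c(t = t_stat, df = df_welch), 2)
round(ci, 0)
p_one_sided
```

The one-sided p-value is included because the research question asks whether killers earn more, while t.test() defaults to a two-sided alternative. With a difference this large the conclusion is the same either way.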
What are your findings about the summary? Are they what you expected?
Overall Finding: Killers earned significantly more bloodpoints than survivors in Dead by Daylight (Welch two-sample t-test, t = 7.30, p < 0.001), with a mean difference of roughly 4,700 points (95% CI: 3,426 to 5,972).
The t-test was illuminating. I did not expect the differences between scores to be that significant. Though I suppose because this tournament includes people of all different play types there are likely several other factors that come into play (e.g., their level of experience with the game, the map played on, etc.). Still, this very basic question gives evidence that there is a difference between roles in the game that we need to pay attention to when setting up our tournament.
Make at least two plots that help you answer your question on the transformed or summarized data. Use scales and/or labels to make each plot informative.
ggplot(data = round1v5) +
  aes(x = map, y = bloodpoints, color = role) +
  geom_boxplot() +
  labs(title = "Bloodpoints versus Map by Role in Game",
       x = "Map Used for Round",
       y = "Bloodpoints Earned", color = "Role") +
  scale_x_discrete(labels = c("1" = "Macmillan Estate", "2" = "Coldwind Farm", "3" = "Backwater Swamp", "4" = "Red Forest", "5" = "Ormond")) +
  theme(axis.text.x = element_text(angle = 45)) +
  scale_color_discrete(labels = c("1" = "Killer", "2" = "Survivor"))
# Note: with stat = "identity" every row is drawn, so the bars within each map/role group overlap
# and the visible height is that group's largest score, not its mean.
ggplot(data = round1v5, aes(x = map, y = bloodpoints, fill = role)) +
  geom_bar(position = "dodge", stat = "identity", width = 0.5) +
  labs(title = "Bloodpoints versus Map by Role in Game",
       x = "Map Used for Round",
       y = "Bloodpoints Earned", fill = "Role") +
  scale_x_discrete(labels = c("1" = "Macmillan Estate", "2" = "Coldwind Farm", "3" = "Backwater Swamp", "4" = "Red Forest", "5" = "Ormond")) +
  theme(axis.text.x = element_text(angle = 45)) +
  scale_fill_discrete(labels = c("1" = "Killer", "2" = "Survivor"))
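When a bar chart draws every row with `stat = "identity"`, bars for the same map and role sit on top of one another, so the visible height reflects the largest score in each group. If the goal is to compare averages, one option is to summarize first and then plot one bar per group. The sketch below does this on invented data. `toy_scores`, `toy_means`, `mean_bp`, and `mean_plot` are hypothetical names, and the values are not from the tournament.

```r
library(dplyr)
library(ggplot2)

# Hypothetical scores: two rows per map/role combination.
toy_scores <- data.frame(
  map = factor(rep(c("1", "2"), each = 4)),
  role = factor(rep(c("1", "1", "2", "2"), times = 2)),
  bloodpoints = c(30000, 20000, 18000, 16000, 24000, 22000, 19000, 15000)
)

# Collapse to one mean per map and role before plotting.
toy_means <- toy_scores %>%
  group_by(map, role) %>%
  summarize(mean_bp = mean(bloodpoints), .groups = "drop")

# geom_col() draws one bar per row, so each bar's height is a group mean.
mean_plot <- ggplot(toy_means, aes(x = map, y = mean_bp, fill = role)) +
  geom_col(position = "dodge", width = 0.5) +
  labs(x = "Map Used for Round", y = "Mean Bloodpoints Earned", fill = "Role") +
  scale_fill_discrete(labels = c("1" = "Killer", "2" = "Survivor"))

mean_plot
```

Summarizing before plotting also leaves a small table of group means that can be quoted directly in the write-up.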
Summarize your research question and findings below.
Research Question Recap: Do killers earn significantly more points than survivors in Dead by Daylight (DBD)?
We learned that yes, killers do appear to earn significantly more points than survivors in DBD. Further, we learned that when we stratify by map, these point differentials become more pronounced. If we look at the bar chart, Red Forest and Ormond appear to favor killers earning more bloodpoints, on average. Looking at the boxplots, we can see how the killer role's median values fluctuate: whereas survivor median values stayed relatively consistent, the killer median values appear to vary by nearly 5,000 points depending on the map. The boxplots also identified 3 outlier values (2 for survivors and 1 for killers).
Are your findings what you expected? Why or Why not?
The findings generally track with my expectation, though I didn’t think the difference would be that pronounced. I also didn’t anticipate that maps would play such a role in scoring. Dead by Daylight has become hugely popular since the start of the pandemic, possibly because of its ability to build community at a time when people had to build virtual ones. Part of my job in this tournament is to collect data and to listen to people’s concerns, since we want to foster a specific type of game experience. I heard for months that killers were earning more bloodpoints and that certain maps favored them, but it is easy to brush those concerns aside as coming from people who are mad because they lost the tournament. This small study gives support to those concerns. It will be interesting to think about this during the next tournament planning phase, as we clearly need to reimagine our scoring system.