Loading packages
Please submit your .Rmd and .html files in Sakai. If you are working together, both people should submit the files.
You must use each of the following functions at least once:
mutate()
group_by()
summarize()
ggplot()
and at least one of the following:
case_when()
across()
*_join() (i.e. left_join())
pivot_*() (i.e. pivot_longer())
I selected the Premise Women’s Health COVID-19 Health Services Disruption Survey 2021. This survey is from the Institute for Health Metrics and Evaluation (IHME) and was developed to assess the level of disruption to a range of health services resulting from the COVID-19 pandemic. I selected the women’s health specific survey because I am interested in sexual and reproductive health matters.
This survey was conducted in 51 countries using a smartphone-based platform. There were 4,319 respondents from the general population, aged 16-49 years, who identified as women. The survey focused on the level of disruption to family planning and reproductive health services and on changes in the risk of gender-based violence. Data are available here: http://ghdx.healthdata.org/record/ihme-data/premise-womens-health-covid-19-health-services-disruption-survey-2021
Define your research question below. What about the data interests you? What is a specific question you want to find out about the data?
I am interested in this data set for several reasons. My dissertation research focuses on self-managed abortion in a global context, so I know the sexual and reproductive health literature well. This specific data set and research question are related to that work but new to me. I am interested in the complex intersection of contraceptive use, fertility, safety, and partners’ alcohol use. Additionally, the data have a wide geographic spread, and it may be interesting to look at differences by country-level income.
My research question is: How has the COVID-19 pandemic altered pregnancy desires and contraceptive use among women and are there differences by country-level income?
I will also look at whether there are perceived changes in domestic safety, including a perceived change in partners’ alcohol use.
Given your question, what is your expectation about the data?
My expectation is that the COVID-19 pandemic has altered pregnancy desires. The pandemic has been a multi-year major disruption to the lives of many. Pregnancy desires have many contributing factors, so I suspect that the disruption of many life patterns may accumulate to affect pregnancy desire. I also expect to see some disruption or changes in contraceptive use. I expect disruption to clinic-based access to be more pronounced in lower-income countries. In wealthier nations such as the US, in-person visits and telehealth services allow access to contraceptives at nearly the same efficiency as before the pandemic. In lower-income nations, contraceptive access was a challenge before the pandemic. With regard to perceived domestic safety, I do not have a hypothesis; this portion of the question will be exploratory.
Load the data below and use dplyr::glimpse() or skimr::skim() on the data. You should upload the data file into the data directory.
#Loading the data
IHME_data <- read.csv("data/IHME_PREM_WMN_HEALTH_2021_Y2021M09D14.CSV")
#checking out the data
glimpse(IHME_data)
## Rows: 4,319
## Columns: 103
## $ country <chr> "Afghanistan", "Afghanistan", "Afgha…
## $ location_id <int> 160, 160, 160, 160, 160, 160, 160, 1…
## $ observation_id <chr> "wmn_4944344565153792", "wmn_5848008…
## $ user_id <chr> "wmn_5978217477570560", "wmn_5912238…
## $ submission_time <chr> "2021-05-28 22:51:28.502 UTC", "2021…
## $ age <int> 2, 3, 2, 2, 2, 2, 3, 3, 2, 3, 3, 2, …
## $ gender <int> 2, 2, 2, 2, 2, 2, 99, 2, 2, 2, 99, 9…
## $ geography <int> 1, 2, 1, 2, 3, 2, 2, 2, 2, 1, 2, 2, …
## $ financial_situation <int> 7, 5, 2, 6, 2, 99, 2, 5, 99, 2, 99, …
## $ education <int> 10, 13, 10, 13, 9, 10, 11, 8, 10, 14…
## $ employment_status <int> 14, 11, 15, 15, 17, 15, 15, 21, 15, …
## $ ethnicity <chr> "Prefer not to answer", "Pashtun", "…
## $ religion <chr> "Sunni (Muslim)", "Sunni (Muslim)", …
## $ wmn_con_access_difficulty <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ wmn_partner_alcohol_change <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ wmn_con_missed_why <chr> "", "", "", "", "", "", "", "", "", …
## $ wmn_con_type_prepandemic <chr> "", "male sterilization", "male ster…
## $ wmn_con_type <chr> "", "female sterilization", "", "fem…
## $ wmn_con_use_change <int> 5, 2, 2, 3, 2, 3, 1, 1, 2, 2, 2, 2, …
## $ wmn_help_unmet_need_why <int> NA, 2, NA, NA, NA, NA, NA, NA, NA, N…
## $ wmn_pregnancy_desire <int> 3, NA, 2, 1, 1, 2, 2, 1, 1, 1, 1, 3,…
## $ wmn_pregnancy_desire_another <int> NA, 1, NA, NA, NA, NA, NA, NA, NA, N…
## $ wmn_pregnancy_change_how <int> NA, NA, 2, 1, 2, NA, 2, NA, NA, NA, …
## $ wmn_pregnancy_change_how_another <int> NA, 1, NA, NA, NA, NA, NA, NA, NA, N…
## $ wmn_how_safe_change <int> 1, 3, NA, NA, NA, NA, NA, NA, 1, NA,…
## $ wmn_safe_place_howoften <int> NA, 1, NA, NA, NA, NA, NA, NA, NA, N…
## $ wmn_safe_place_no_access_why <int> NA, 2, NA, NA, NA, NA, NA, NA, NA, N…
## $ wmn_how_safe <int> 3, 4, NA, NA, NA, NA, NA, NA, 2, NA,…
## $ wmn_how_safe_community <int> 2, 4, NA, NA, NA, NA, NA, NA, 77, NA…
## $ wmn_con_needed <int> 0, 1, 2, 3, 5, 1, 0, 3, 2, 1, 1, 1, …
## $ wmn_con_accessed <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ wmn_hh <int> 9, 1, 1, 1, 2, 3, 4, 2, 2, 6, 3, 1, …
## $ wmn_partner_violence_change <int> 3, 2, NA, NA, NA, NA, NA, NA, 3, NA,…
## $ wmn_partner_violence <int> 4, 4, NA, NA, NA, NA, NA, NA, 4, NA,…
## $ wmn_pregnant <int> 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, …
## $ wmn_married <int> 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, …
## $ wmn_pregnancy_change_another <int> NA, 1, NA, NA, NA, NA, NA, NA, NA, N…
## $ wmn_birth_ever <int> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ wmn_con <int> 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, …
## $ wmn_pregnancy_change <int> 0, NA, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0,…
## $ wmn_sexually_active <int> 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, …
## $ wmn_pregnancy_wait_2years_another <int> NA, 0, NA, NA, NA, NA, NA, NA, NA, N…
## $ wmn_pregnancy_wait_2years <int> NA, NA, 0, 0, 0, 0, 1, 1, 0, 0, 0, N…
## $ wmn_safe_place <int> 77, 1, NA, NA, NA, NA, NA, NA, 0, NA…
## $ wmn_employment_loss <int> 0, 0, NA, NA, NA, NA, NA, NA, 77, NA…
## $ wmn_help <int> 0, 1, NA, NA, NA, NA, NA, NA, 0, NA,…
## $ wmn_partner_alcohol <int> NA, 77, NA, NA, NA, NA, NA, NA, NA, …
## $ wmn_safe_place_no_access <int> NA, 1, NA, NA, NA, NA, NA, NA, NA, N…
## $ wmn_help_unmet_need <int> 0, 1, NA, NA, NA, NA, NA, NA, 0, NA,…
## $ wmn_safe_place_need <int> NA, 1, NA, NA, NA, NA, NA, NA, NA, N…
## $ wmn_alone <int> 1, 1, 0, 88, 0, 88, 88, 88, 1, 0, 88…
## $ wmn_implant_missed <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ wmn_iud_missed <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ wmn_missed_dose_pills <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ wmn_injectable_missed <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ wmn_con_type_1 <int> NA, 1, NA, 1, NA, NA, NA, 1, 0, 1, N…
## $ wmn_con_type_2 <int> NA, 1, NA, 1, NA, NA, NA, 1, 1, 1, N…
## $ wmn_con_type_3 <int> NA, 0, NA, 0, NA, NA, NA, 0, 0, 1, N…
## $ wmn_con_type_4 <int> NA, 0, NA, 0, NA, NA, NA, 0, 0, 0, N…
## $ wmn_con_type_5 <int> NA, 0, NA, 0, NA, NA, NA, 0, 0, 0, N…
## $ wmn_con_type_6 <int> NA, 0, NA, 0, NA, NA, NA, 0, 0, 0, N…
## $ wmn_con_type_14 <int> NA, 0, NA, 0, NA, NA, NA, 0, 0, 0, N…
## $ wmn_con_type_10 <int> NA, 0, NA, 0, NA, NA, NA, 0, 0, 0, N…
## $ wmn_con_type_12 <int> NA, 0, NA, 0, NA, NA, NA, 0, 0, 0, N…
## $ wmn_con_type_13 <int> NA, 0, NA, 0, NA, NA, NA, 0, 0, 0, N…
## $ wmn_con_type_11 <int> NA, 0, NA, 0, NA, NA, NA, 0, 0, 0, N…
## $ wmn_con_type_15 <int> NA, 0, NA, 0, NA, NA, NA, 0, 0, 0, N…
## $ wmn_con_type_16 <int> NA, 0, NA, 0, NA, NA, NA, 0, 0, 0, N…
## $ wmn_con_type_99 <int> NA, 0, NA, 0, NA, NA, NA, 0, 0, 0, N…
## $ wmn_con_type_8 <int> NA, 0, NA, 0, NA, NA, NA, 0, 0, 0, N…
## $ wmn_con_type_9 <int> NA, 0, NA, 0, NA, NA, NA, 0, 0, 0, N…
## $ wmn_con_type_77 <int> NA, 0, NA, 0, NA, NA, NA, 0, 0, 0, N…
## $ wmn_con_type_88 <int> NA, 0, NA, 0, NA, NA, NA, 0, 0, 0, N…
## $ wmn_con_type_7 <int> NA, 0, NA, 0, NA, NA, NA, 0, 0, 0, N…
## $ wmn_con_missed_why_4 <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ wmn_con_missed_why_1 <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ wmn_con_missed_why_2 <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ wmn_con_missed_why_3 <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ wmn_con_missed_why_7 <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ wmn_con_missed_why_8 <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ wmn_con_missed_why_99 <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ wmn_con_missed_why_88 <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ wmn_con_missed_why_6 <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ wmn_con_missed_why_5 <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ wmn_con_type_prepandemic_1 <int> NA, 0, 0, NA, 1, NA, 0, 1, 0, 1, 1, …
## $ wmn_con_type_prepandemic_2 <int> NA, 1, 1, NA, 1, NA, 0, 1, 1, 1, 1, …
## $ wmn_con_type_prepandemic_3 <int> NA, 0, 0, NA, 0, NA, 0, 0, 0, 1, 0, …
## $ wmn_con_type_prepandemic_4 <int> NA, 0, 0, NA, 0, NA, 0, 0, 0, 0, 0, …
## $ wmn_con_type_prepandemic_5 <int> NA, 0, 0, NA, 0, NA, 0, 0, 0, 0, 0, …
## $ wmn_con_type_prepandemic_6 <int> NA, 0, 0, NA, 0, NA, 0, 0, 0, 0, 0, …
## $ wmn_con_type_prepandemic_14 <int> NA, 0, 0, NA, 0, NA, 0, 0, 0, 0, 0, …
## $ wmn_con_type_prepandemic_10 <int> NA, 0, 0, NA, 0, NA, 0, 0, 0, 0, 0, …
## $ wmn_con_type_prepandemic_12 <int> NA, 0, 0, NA, 0, NA, 0, 0, 0, 0, 0, …
## $ wmn_con_type_prepandemic_13 <int> NA, 0, 0, NA, 0, NA, 0, 0, 0, 0, 0, …
## $ wmn_con_type_prepandemic_11 <int> NA, 0, 0, NA, 0, NA, 0, 0, 0, 0, 0, …
## $ wmn_con_type_prepandemic_15 <int> NA, 0, 0, NA, 0, NA, 0, 0, 0, 0, 0, …
## $ wmn_con_type_prepandemic_16 <int> NA, 0, 0, NA, 0, NA, 0, 0, 0, 0, 0, …
## $ wmn_con_type_prepandemic_99 <int> NA, 0, 0, NA, 0, NA, 0, 0, 0, 0, 0, …
## $ wmn_con_type_prepandemic_8 <int> NA, 0, 0, NA, 0, NA, 0, 0, 0, 0, 0, …
## $ wmn_con_type_prepandemic_9 <int> NA, 0, 0, NA, 0, NA, 0, 0, 0, 0, 0, …
## $ wmn_con_type_prepandemic_77 <int> NA, 0, 0, NA, 0, NA, 0, 0, 0, 0, 0, …
## $ wmn_con_type_prepandemic_88 <int> NA, 0, 0, NA, 0, NA, 1, 0, 0, 0, 0, …
## $ wmn_con_type_prepandemic_7 <int> NA, 0, 0, NA, 0, NA, 0, 0, 0, 0, 0, …
#Selecting columns I'm interested in to reduce the size of the data and creating a new data set to clean
IHME_data_clean <- IHME_data %>%
  select(country, age, gender, geography, financial_situation, education,
         employment_status, religion, wmn_con_access_difficulty,
         wmn_partner_alcohol_change, wmn_con_use_change, wmn_how_safe_change,
         wmn_partner_violence, wmn_pregnancy_desire, wmn_pregnancy_change_how)
If there are any quirks that you have to deal with, such as NA coded as something else or multiple tables, please make some notes here about what you need to do before you start transforming the data in the next section.
I can see from glimpse() that there is a lot of missing data, which may affect my analyses. Most of it is displayed as NA, but some missing values are entered in the survey responses as the code 99. These will be recoded in the data transformation section below.
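As a minimal sketch of the recoding idea, the example below replaces the code 99 with NA in several columns at once using across() and na_if(). The toy data frame and its values are made up for illustration, and it assumes that 99 means "missing" in every selected column, which should be checked against the codebook for each variable.

```r
library(dplyr)

# Toy data standing in for a few survey columns (values are made up)
toy <- data.frame(
  gender = c(2L, 2L, 99L, 2L),
  age = c(2L, 99L, 3L, 4L),
  wmn_con_use_change = c(5L, 2L, 1L, 3L)
)

# Replace the code 99 with NA in the selected columns in one step
# (assumes 99 means "missing" in each of these columns)
toy_clean <- toy %>%
  mutate(across(c(gender, age), ~ na_if(.x, 99L)))

# Count missing values per column to confirm the recode
toy_clean %>%
  summarise(across(everything(), ~ sum(is.na(.x))))
```

Doing this once up front would make a separate na_if() step after each case_when() unnecessary, at the cost of overwriting the original integer codes.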
Make sure your data types are correct!
I want to convert the country variable from a character to a factor and then collapse it into 3 groups (low-, middle-, and high-income countries) based on the World Bank classifications. This will make comparisons by national income possible. Note that “low-income” contains both low- and lower-middle-income countries. Source: https://blogs.worldbank.org/opendata/new-world-bank-country-classifications-income-level-2021-2022
#converting the country variable into a factor
IHME_data_clean <- IHME_data_clean %>% mutate(country = factor(country))
#collapsing country variable into 3 groups by income
IHME_data_clean <- IHME_data_clean %>%
mutate(country_income = fct_collapse(country,
low_income = c("Afghanistan", "Algeria","Bangladesh","Benin", "Burkina Faso", "Cambodia", "Colombia", "Democratic Republic of the Congo", "Egypt", "El Salvador", "Ethiopia","Ghana", "Guatemala", "Honduras", "Indonesia", "Kenya", "Liberia", "Mali", "Mozambique", "Nicaragua", "Nigeria", "Philippines", "Rwanda", "Senegal", "Sierra Leone", "Somalia", "Tunisia", "Uganda", "Ukraine", "United Republic of Tanzania", "Viet Nam", "Yemen", "Zambia", "Zimbabwe"),
middle_income = c("Albania", "Argentina","Bosnia and Herzegovina", "Brazil", "Costa Rica", "Dominican Republic", "Ecuador", "India", "Iraq", "Jamaica", "Jordan", "Kazakhstan", "Lebanon", "Malaysia", "Mexico", "Panama", "Peru", "Serbia", "South Africa", "Thailand", "Turkey" ),
high_income = c("Bahrain", "Chile", "France", "Hong Kong", "Italy", "Ivory Coast", "United States","Lithuania", "Morocco", "Oman", "Poland", "Republic of Korea", "Saudi Arabia", "Spain", "Taiwan (Province of China)", "Trinidad and Tobago", "United Arab Emirates", "United Kingdom", "Uruguay", "Venezuela (Bolivarian Republic of)")
))
#viewing in a table to check this worked
IHME_data_clean %>% tabyl(country_income) %>% adorn_totals() %>% adorn_pct_formatting()
## country_income n percent
## low_income 2150 49.8%
## middle_income 1044 24.2%
## high_income 1125 26.0%
## Total 4319 100.0%
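fct_collapse() leaves any level that is not listed unchanged, so a misspelled or omitted country would show up as its own row in the table above. An alternative way to make the same check explicit is to keep the classification in a lookup table and join it on. The sketch below assumes a hypothetical lookup table; the country names and groupings are placeholders, not World Bank data. left_join() is used because every respondent should be kept even when the lookup has no match, and anti_join() then lists the countries that were not classified.

```r
library(dplyr)

# Hypothetical respondent rows and a hypothetical country-to-income lookup;
# names and groupings are placeholders for illustration only
respondents <- data.frame(
  country = c("Country A", "Country A", "Country B", "Country C"),
  age = c(2L, 3L, 2L, 4L)
)
income_lookup <- data.frame(
  country = c("Country A", "Country B"),
  country_income = c("low_income", "high_income")
)

# left_join() keeps every respondent, even when the lookup has no match
joined <- respondents %>%
  left_join(income_lookup, by = "country")

# anti_join() lists countries that did not receive an income group
unmatched <- respondents %>%
  anti_join(income_lookup, by = "country") %>%
  distinct(country)

joined
unmatched
```

An inner_join() would silently drop respondents from unclassified countries, which is why left_join() is the safer choice for this kind of check.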
If the data needs to be transformed in any way (values recoded, pivoted, etc), do it here. Examples include transforming a continuous variable into a categorical using case_when(), etc.
I am making several data transformations to recode the integer variables into characters so that they are interpretable. For example, gender is currently stored as integer codes (2 = woman, 1 = man); these will be recoded into a new variable, gender_name. Note that all participants in this specific survey identified as women. All code definitions are available in the IHME codebook.
#recoding gender variable
IHME_data_clean <- IHME_data_clean %>% mutate(
gender_name = case_when(gender == 2 ~ "woman",
gender ==1 ~ "man",
gender == 99 ~ "NA"))
#Change the NAs so R codes them as missing
IHME_data_clean <- IHME_data_clean %>% mutate(
gender_name = na_if(gender_name, "NA"))
#making a tabyl to check that it worked, as expected all participants are women (3.6% missing)
IHME_data_clean %>% tabyl(gender_name)
## gender_name n percent valid_percent
## woman 4161 0.96341746 1
## <NA> 158 0.03658254 NA
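The two-step pattern above (label 99 as the string "NA", then convert it with na_if()) can be shortened, because case_when() returns NA for any value that matches none of its conditions. A minimal sketch on made-up codes:

```r
library(dplyr)

# Made-up gender codes for illustration: 2 = woman, 1 = man, 99 = missing
toy <- data.frame(gender = c(2L, 99L, 2L, 1L))

toy <- toy %>%
  mutate(
    # Codes not matched by any condition (here 99) become NA automatically
    gender_name = case_when(
      gender == 2 ~ "woman",
      gender == 1 ~ "man"
    )
  )

toy$gender_name
```

The trade-off is that unexpected codes are also turned into NA without any warning, so a tabyl() check on the original integer column is still worthwhile.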
#making age categories out of the survey responses
IHME_data_clean <- IHME_data_clean %>% mutate(
age_category = case_when(age == 1 ~ "<16",
age == 2 ~ "16-25",
age == 3 ~ "26-35",
age == 4 ~ "36-45",
age == 5 ~ "46+",
age == 99 ~ "NA"
))
#Change the NAs so R codes them as missing
IHME_data_clean <- IHME_data_clean %>% mutate(
age_category = na_if(age_category, "NA"))
#making a tabyl to check that it worked
IHME_data_clean %>% tabyl(age_category)
## age_category n percent
## 16-25 1446 0.33479972
## 26-35 1755 0.40634406
## 36-45 920 0.21301227
## 46+ 198 0.04584395
#defining the geography
IHME_data_clean <- IHME_data_clean %>% mutate(
geography_name = case_when(geography == 1 ~ "urban",
geography == 2 ~ "peri-urban",
geography == 3 ~ "rural",
geography == 99 ~ "NA"
))
#Change the NAs so R codes them as missing
IHME_data_clean <- IHME_data_clean %>% mutate(
geography_name = na_if(geography_name, "NA"))
#making a tabyl to check that it worked
IHME_data_clean %>% tabyl(geography_name)
## geography_name n percent valid_percent
## peri-urban 1286 0.2977541097 0.2978231
## rural 1000 0.2315350776 0.2315887
## urban 2032 0.4704792776 0.4705882
## <NA> 1 0.0002315351 NA
#recoding the response to: Since the start of the global COVID-19 pandemic (March 2020-April 2020), how has your partner's alcohol use changed?
IHME_data_clean <- IHME_data_clean %>% mutate(
partner_alcohol_change = case_when(wmn_partner_alcohol_change == 1 ~ "no change",
wmn_partner_alcohol_change == 2 ~ "drinks more",
wmn_partner_alcohol_change == 3 ~ "drinks less",
wmn_partner_alcohol_change == 88 ~ "declines to say",
wmn_partner_alcohol_change == 77 ~ "don't know"
))
#making a tabyl to check that it worked, a lot of missing data to this question (unfortunately)
IHME_data_clean %>% tabyl(partner_alcohol_change)
## partner_alcohol_change n percent valid_percent
## declines to say 1 0.0002315351 0.005780347
## don't know 5 0.0011576754 0.028901734
## drinks less 71 0.0164389905 0.410404624
## drinks more 30 0.0069460523 0.173410405
## no change 66 0.0152813151 0.381502890
## <NA> 4146 0.9599444316 NA
#recoding the response to: Since the start of the global COVID-19 pandemic in March 2020-April 2020, how safe have you felt in your home?
IHME_data_clean <- IHME_data_clean %>% mutate(
safety_change = case_when(wmn_how_safe_change == 1 ~ "less safe",
wmn_how_safe_change == 2 ~ "no change",
wmn_how_safe_change == 3 ~ "more safe",
wmn_how_safe_change == 88 ~ "declined to say",
wmn_how_safe_change == 77 ~ "don't know"
))
#making a tabyl to check that it worked
IHME_data_clean %>% tabyl(safety_change)
## safety_change n percent valid_percent
## declined to say 23 0.005325307 0.01381381
## don't know 116 0.026858069 0.06966967
## less safe 360 0.083352628 0.21621622
## more safe 373 0.086362584 0.22402402
## no change 793 0.183607317 0.47627628
## <NA> 2654 0.614494096 NA
#recoding the response to: Has your use of contraception changed since the start of the global COVID-19 pandemic in March 2020-April 2020?
IHME_data_clean <- IHME_data_clean %>% mutate(
contraceptive_change = case_when(wmn_con_use_change == 2 ~ "stopped contraceptives",
wmn_con_use_change == 4 ~ "continued same method",
wmn_con_use_change == 1 ~ "switched methods",
wmn_con_use_change == 3 ~ "started using a method",
wmn_con_use_change == 5 ~ "continued not using"
))
#checking that I didn't miss any missing-data codes; this variable seems complete
IHME_data_clean <- IHME_data_clean %>% mutate(
contraceptive_change = na_if(contraceptive_change, "NA"))
#making a tabyl to check that it worked
IHME_data_clean %>% tabyl(contraceptive_change)
## contraceptive_change n percent
## continued not using 1644 0.38064367
## continued same method 1266 0.29312341
## started using a method 405 0.09377171
## stopped contraceptives 662 0.15327622
## switched methods 342 0.07918500
#recoding the response to: Would you like to have a child? How long would you like to wait from now before the birth of a child?
IHME_data_clean <- IHME_data_clean %>% mutate(
pregnancy_desire = case_when(wmn_pregnancy_desire == 1 ~ "want child soon",
wmn_pregnancy_desire == 2 ~ "want child later",
wmn_pregnancy_desire == 3 ~ "undecided",
wmn_pregnancy_desire == 4 ~ "don't want children",
wmn_pregnancy_desire == 5 ~ "unable to have children",
wmn_pregnancy_desire == 88 ~ "declined to say"
))
#making a tabyl to check that it worked
IHME_data_clean %>% tabyl(pregnancy_desire)
## pregnancy_desire n percent valid_percent
## declined to say 229 0.05302153 0.08858801
## don't want children 354 0.08196342 0.13694391
## unable to have children 93 0.02153276 0.03597679
## undecided 482 0.11159991 0.18646035
## want child later 775 0.17943969 0.29980658
## want child soon 652 0.15096087 0.25222437
## <NA> 1734 0.40148182 NA
#recoding the response to: How has your desire for a child changed since March 2020-April 2020?
IHME_data_clean <- IHME_data_clean %>% mutate(
pregnancy_desire_change = case_when(wmn_pregnancy_change_how == 1 ~ "no longer want a child",
wmn_pregnancy_change_how == 3 ~ "now want a child",
wmn_pregnancy_change_how == 2 ~ "now undecided",
wmn_pregnancy_change_how == 5 ~ "want a child sooner",
wmn_pregnancy_change_how == 4 ~ "want to delay having child",
wmn_pregnancy_change_how == 88 ~ "declined say"
))
#making a tabyl to check that it worked
IHME_data_clean %>% tabyl(pregnancy_desire_change)
## pregnancy_desire_change n percent valid_percent
## declined say 20 0.004630702 0.02877698
## no longer want a child 112 0.025931929 0.16115108
## now undecided 261 0.060430655 0.37553957
## now want a child 111 0.025700394 0.15971223
## want a child sooner 35 0.008103728 0.05035971
## want to delay having child 156 0.036119472 0.22446043
## <NA> 3624 0.839083121 NA
Bonus points (5 points) for datasets that require merging of tables, but only if you reason through whether you should use left_join, inner_join, or right_join on these tables. No credit will be provided if you don’t.
Show your transformed table here. Use tools such as glimpse(), skim() or head() to illustrate your point.
#Transformed data
glimpse(IHME_data_clean)
## Rows: 4,319
## Columns: 24
## $ country <fct> Afghanistan, Afghanistan, Afghanistan, Afgh…
## $ age <int> 2, 3, 2, 2, 2, 2, 3, 3, 2, 3, 3, 2, 2, 4, 3…
## $ gender <int> 2, 2, 2, 2, 2, 2, 99, 2, 2, 2, 99, 99, 2, 2…
## $ geography <int> 1, 2, 1, 2, 3, 2, 2, 2, 2, 1, 2, 2, 1, 1, 2…
## $ financial_situation <int> 7, 5, 2, 6, 2, 99, 2, 5, 99, 2, 99, 99, 3, …
## $ education <int> 10, 13, 10, 13, 9, 10, 11, 8, 10, 14, 99, 9…
## $ employment_status <int> 14, 11, 15, 15, 17, 15, 15, 21, 15, 17, 14,…
## $ religion <chr> "Sunni (Muslim)", "Sunni (Muslim)", "Muslim…
## $ wmn_con_access_difficulty <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, 0, NA, …
## $ wmn_partner_alcohol_change <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ wmn_con_use_change <int> 5, 2, 2, 3, 2, 3, 1, 1, 2, 2, 2, 2, 2, 2, 3…
## $ wmn_how_safe_change <int> 1, 3, NA, NA, NA, NA, NA, NA, 1, NA, NA, NA…
## $ wmn_partner_violence <int> 4, 4, NA, NA, NA, NA, NA, NA, 4, NA, NA, NA…
## $ wmn_pregnancy_desire <int> 3, NA, 2, 1, 1, 2, 2, 1, 1, 1, 1, 3, 3, NA,…
## $ wmn_pregnancy_change_how <int> NA, NA, 2, 1, 2, NA, 2, NA, NA, NA, 3, NA, …
## $ country_income <fct> low_income, low_income, low_income, low_inc…
## $ gender_name <chr> "woman", "woman", "woman", "woman", "woman"…
## $ age_category <chr> "16-25", "26-35", "16-25", "16-25", "16-25"…
## $ geography_name <chr> "urban", "peri-urban", "urban", "peri-urban…
## $ partner_alcohol_change <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ safety_change <chr> "less safe", "more safe", NA, NA, NA, NA, N…
## $ contraceptive_change <chr> "continued not using", "stopped contracepti…
## $ pregnancy_desire <chr> "undecided", NA, "want child later", "want …
## $ pregnancy_desire_change <chr> NA, NA, "now undecided", "no longer want a …
Are the values what you expected for the variables? Why or Why not?
Yes. The data that I began with had numerical codes for survey answers, so I needed to recode most of them to character labels. The country_income variable is the country variable collapsed into 3 levels, as previously described.
Use group_by() and summarize() to make a summary of the data here. The summary should be relevant to your research question.
#First I will group by country income and count the distinct values of each character variable; this tells me the number of categories in each one
IHME_data_clean %>% group_by(country_income) %>%
summarise(across(where(is.character), n_distinct))
## # A tibble: 3 × 10
## country_income religion gender_name age_category geography_name
## <fct> <int> <int> <int> <int>
## 1 low_income 40 2 4 4
## 2 middle_income 42 2 4 3
## 3 high_income 37 2 4 3
## # … with 5 more variables: partner_alcohol_change <int>, safety_change <int>,
## # contraceptive_change <int>, pregnancy_desire <int>,
## # pregnancy_desire_change <int>
#Next I will group by contraceptive change and summarize count and percent
IHME_data_clean %>% group_by(contraceptive_change) %>%
summarize(count = n()) %>%
mutate(percent = round((count / sum(count))*100, 2)) %>%
arrange(desc(percent)) %>%
adorn_totals()
## contraceptive_change count percent
## continued not using 1644 38.06
## continued same method 1266 29.31
## stopped contraceptives 662 15.33
## started using a method 405 9.38
## switched methods 342 7.92
## Total 4319 100.00
#Next, I will group by pregnancy desire change and summarize count and percent; 83% missing data so hard to draw conclusions but will sum up below
IHME_data_clean %>% group_by(pregnancy_desire_change) %>%
summarize(count = n()) %>%
mutate(percent = round((count / sum(count))*100, 2)) %>%
adorn_totals()
## pregnancy_desire_change count percent
## declined say 20 0.46
## no longer want a child 112 2.59
## now undecided 261 6.04
## now want a child 111 2.57
## want a child sooner 35 0.81
## want to delay having child 156 3.61
## <NA> 3624 83.91
## Total 4319 99.99
# I also want to look at this without the missing values
IHME_data_clean %>% group_by(pregnancy_desire_change) %>%
drop_na(pregnancy_desire_change) %>%
summarize(count = n()) %>%
mutate(percent = round((count / sum(count))*100, 2)) %>%
adorn_totals()
## pregnancy_desire_change count percent
## declined say 20 2.88
## no longer want a child 112 16.12
## now undecided 261 37.55
## now want a child 111 15.97
## want a child sooner 35 5.04
## want to delay having child 156 22.45
## Total 695 100.01
#Next, I will group by partner alcohol use and summarize count and percent
IHME_data_clean %>% group_by(partner_alcohol_change) %>%
drop_na(partner_alcohol_change) %>%
summarize(count = n()) %>%
mutate(percent = round((count / sum(count))*100, 2)) %>%
adorn_totals()
## partner_alcohol_change count percent
## declines to say 1 0.58
## don't know 5 2.89
## drinks less 71 41.04
## drinks more 30 17.34
## no change 66 38.15
## Total 173 100.00
#Next, I will group by safety change and summarize count and percent
IHME_data_clean %>% group_by(safety_change) %>%
drop_na(safety_change) %>%
summarize(count = n()) %>%
mutate(percent = round((count / sum(count))*100, 2)) %>%
adorn_totals()
## safety_change count percent
## declined to say 23 1.38
## don't know 116 6.97
## less safe 360 21.62
## more safe 373 22.40
## no change 793 47.63
## Total 1665 100.00
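The count-and-percent summaries above repeat the same steps for each variable. One way to reduce the repetition is a small helper function; count_pct() below is a hypothetical name, not part of dplyr or janitor, and the toy responses are made up. It uses count() in place of group_by() plus summarize(count = n()), which gives the same counts.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: count and percent for one categorical column,
# optionally dropping missing values first
count_pct <- function(data, var, drop_missing = FALSE) {
  if (drop_missing) {
    data <- data %>% drop_na({{ var }})
  }
  data %>%
    count({{ var }}, name = "count") %>%
    mutate(percent = round(count / sum(count) * 100, 2)) %>%
    arrange(desc(percent))
}

# Made-up responses for illustration
toy <- data.frame(
  safety_change = c("less safe", "no change", "no change", NA, "more safe")
)

with_missing <- count_pct(toy, safety_change)
without_missing <- count_pct(toy, safety_change, drop_missing = TRUE)

with_missing
without_missing
```

Keeping drop_missing as an explicit argument makes it clear in each call whether the percentages are out of all respondents or only out of those who answered.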
Crosstabs
Here I am looking at crosstabs to see if there are any interesting patterns and to decide what to plot. Given that there is so much missing data, I will drop much of the missing data in the plots. The contraceptive change variable has the largest sample, so I plan to look at it by country income.
#Pregnancy desire changes by country income
IHME_data_clean %>%
filter(!is.na(pregnancy_desire_change)) %>%
tabyl(pregnancy_desire_change, country_income) %>%
adorn_totals(where = c("row", "col")) %>% adorn_percentages() %>%
adorn_pct_formatting(digits = 1) %>% adorn_ns(position = "front") %>% adorn_title()
## country_income
## pregnancy_desire_change low_income middle_income high_income
## declined say 8 (40.0%) 8 (40.0%) 4 (20.0%)
## no longer want a child 56 (50.0%) 36 (32.1%) 20 (17.9%)
## now undecided 139 (53.3%) 61 (23.4%) 61 (23.4%)
## now want a child 67 (60.4%) 27 (24.3%) 17 (15.3%)
## want a child sooner 19 (54.3%) 13 (37.1%) 3 (8.6%)
## want to delay having child 71 (45.5%) 51 (32.7%) 34 (21.8%)
## Total 360 (51.8%) 196 (28.2%) 139 (20.0%)
##
## Total
## 20 (100.0%)
## 112 (100.0%)
## 261 (100.0%)
## 111 (100.0%)
## 35 (100.0%)
## 156 (100.0%)
## 695 (100.0%)
#Pregnancy desire changes by age categories
IHME_data_clean %>%
filter(!is.na(pregnancy_desire_change)) %>%
tabyl(pregnancy_desire_change, age_category) %>% adorn_totals(where = c("row", "col")) %>%
adorn_percentages() %>% adorn_pct_formatting(digits = 1) %>% adorn_ns(position = "front") %>% adorn_title()
## age_category
## pregnancy_desire_change 16-25 26-35 36-45 46+
## declined say 11 (55.0%) 5 (25.0%) 4 (20.0%) 0 (0.0%)
## no longer want a child 51 (45.5%) 39 (34.8%) 17 (15.2%) 5 (4.5%)
## now undecided 115 (44.1%) 108 (41.4%) 35 (13.4%) 3 (1.1%)
## now want a child 49 (44.1%) 51 (45.9%) 9 (8.1%) 2 (1.8%)
## want a child sooner 9 (25.7%) 21 (60.0%) 5 (14.3%) 0 (0.0%)
## want to delay having child 70 (44.9%) 73 (46.8%) 12 (7.7%) 1 (0.6%)
## Total 305 (43.9%) 297 (42.7%) 82 (11.8%) 11 (1.6%)
##
## Total
## 20 (100.0%)
## 112 (100.0%)
## 261 (100.0%)
## 111 (100.0%)
## 35 (100.0%)
## 156 (100.0%)
## 695 (100.0%)
#Contraceptive changes by country income
IHME_data_clean %>%
filter(!is.na(contraceptive_change)) %>%
tabyl(contraceptive_change, country_income) %>% adorn_totals(where = c("row", "col")) %>%
adorn_percentages() %>% adorn_pct_formatting(digits = 1) %>% adorn_ns(position = "front") %>% adorn_title()
## country_income
## contraceptive_change low_income middle_income high_income Total
## continued not using 831 (50.5%) 360 (21.9%) 453 (27.6%) 1644 (100.0%)
## continued same method 538 (42.5%) 336 (26.5%) 392 (31.0%) 1266 (100.0%)
## started using a method 222 (54.8%) 98 (24.2%) 85 (21.0%) 405 (100.0%)
## stopped contraceptives 367 (55.4%) 167 (25.2%) 128 (19.3%) 662 (100.0%)
## switched methods 192 (56.1%) 83 (24.3%) 67 (19.6%) 342 (100.0%)
## Total 2150 (49.8%) 1044 (24.2%) 1125 (26.0%) 4319 (100.0%)
#Contraceptive changes by age categories
IHME_data_clean %>%
filter(!is.na(contraceptive_change)) %>%
tabyl(contraceptive_change, age_category) %>% adorn_totals(where = c("row", "col")) %>%
adorn_percentages() %>% adorn_pct_formatting(digits = 1) %>% adorn_ns(position = "front") %>% adorn_title()
## age_category
## contraceptive_change 16-25 26-35 36-45 46+
## continued not using 549 (33.4%) 605 (36.8%) 387 (23.5%) 103 (6.3%)
## continued same method 349 (27.6%) 590 (46.6%) 282 (22.3%) 45 (3.6%)
## started using a method 190 (46.9%) 148 (36.5%) 58 (14.3%) 9 (2.2%)
## stopped contraceptives 256 (38.7%) 257 (38.8%) 119 (18.0%) 30 (4.5%)
## switched methods 102 (29.8%) 155 (45.3%) 74 (21.6%) 11 (3.2%)
## Total 1446 (33.5%) 1755 (40.6%) 920 (21.3%) 198 (4.6%)
##
## Total
## 1644 (100.0%)
## 1266 (100.0%)
## 405 (100.0%)
## 662 (100.0%)
## 342 (100.0%)
## 4319 (100.0%)
#Partner alcohol use changes by country income
IHME_data_clean %>%
filter(!is.na(partner_alcohol_change)) %>%
tabyl(partner_alcohol_change, country_income) %>% adorn_totals(where = c("row", "col")) %>%
adorn_percentages() %>% adorn_pct_formatting(digits = 1) %>% adorn_ns(position = "front") %>% adorn_title()
## country_income
## partner_alcohol_change low_income middle_income high_income Total
## declines to say 1 (100.0%) 0 (0.0%) 0 (0.0%) 1 (100.0%)
## don't know 3 (60.0%) 2 (40.0%) 0 (0.0%) 5 (100.0%)
## drinks less 43 (60.6%) 18 (25.4%) 10 (14.1%) 71 (100.0%)
## drinks more 14 (46.7%) 6 (20.0%) 10 (33.3%) 30 (100.0%)
## no change 22 (33.3%) 17 (25.8%) 27 (40.9%) 66 (100.0%)
## Total 83 (48.0%) 43 (24.9%) 47 (27.2%) 173 (100.0%)
#Partner alcohol use changes by age categories
IHME_data_clean %>%
filter(!is.na(partner_alcohol_change)) %>%
tabyl(partner_alcohol_change, age_category) %>% adorn_totals(where = c("row", "col")) %>%
adorn_percentages() %>% adorn_pct_formatting(digits = 1) %>% adorn_ns(position = "front") %>% adorn_title()
## age_category
## partner_alcohol_change 16-25 26-35 36-45 46+
## declines to say 0 (0.0%) 0 (0.0%) 1 (100.0%) 0 (0.0%)
## don't know 1 (20.0%) 2 (40.0%) 1 (20.0%) 1 (20.0%)
## drinks less 11 (15.5%) 38 (53.5%) 19 (26.8%) 3 (4.2%)
## drinks more 9 (30.0%) 6 (20.0%) 15 (50.0%) 0 (0.0%)
## no change 8 (12.1%) 28 (42.4%) 27 (40.9%) 3 (4.5%)
## Total 29 (16.8%) 74 (42.8%) 63 (36.4%) 7 (4.0%)
##
## Total
## 1 (100.0%)
## 5 (100.0%)
## 71 (100.0%)
## 30 (100.0%)
## 66 (100.0%)
## 173 (100.0%)
#Perceived safety changes by country income
IHME_data_clean %>%
filter(!is.na(safety_change)) %>%
tabyl(safety_change, country_income) %>% adorn_totals(where = c("row", "col")) %>%
adorn_percentages() %>% adorn_pct_formatting(digits = 1) %>% adorn_ns(position = "front") %>% adorn_title()
## country_income
## safety_change low_income middle_income high_income Total
## declined to say 12 (52.2%) 6 (26.1%) 5 (21.7%) 23 (100.0%)
## don't know 56 (48.3%) 25 (21.6%) 35 (30.2%) 116 (100.0%)
## less safe 185 (51.4%) 96 (26.7%) 79 (21.9%) 360 (100.0%)
## more safe 227 (60.9%) 88 (23.6%) 58 (15.5%) 373 (100.0%)
## no change 351 (44.3%) 184 (23.2%) 258 (32.5%) 793 (100.0%)
## Total 831 (49.9%) 399 (24.0%) 435 (26.1%) 1665 (100.0%)
#Perceived safety changes by rurality
IHME_data_clean %>%
filter(!is.na(safety_change)) %>%
tabyl(safety_change, geography_name) %>% adorn_totals(where = c("row", "col")) %>%
adorn_percentages() %>% adorn_pct_formatting(digits = 1) %>% adorn_ns(position = "front") %>% adorn_title()
## geography_name
## safety_change peri-urban rural urban Total
## declined to say 8 (34.8%) 7 (30.4%) 8 (34.8%) 23 (100.0%)
## don't know 39 (33.6%) 26 (22.4%) 51 (44.0%) 116 (100.0%)
## less safe 109 (30.3%) 78 (21.7%) 173 (48.1%) 360 (100.0%)
## more safe 111 (29.8%) 96 (25.7%) 166 (44.5%) 373 (100.0%)
## no change 256 (32.3%) 141 (17.8%) 396 (49.9%) 793 (100.0%)
## Total 523 (31.4%) 348 (20.9%) 794 (47.7%) 1665 (100.0%)
Summary Table
I want to practice making a summary table.
#new object with the variables I want in my "Table 1"
Summary_data <- IHME_data_clean %>% select(country_income, gender_name, age_category, geography_name, partner_alcohol_change, safety_change, contraceptive_change, pregnancy_desire_change)
Table_1 <- Summary_data %>% tbl_summary(by = country_income)
Table_1 %>% modify_caption("**Table 1. Participant Characteristics**") %>%
bold_labels()
Characteristic | low_income, N = 2,150¹ | middle_income, N = 1,044¹ | high_income, N = 1,125¹ |
---|---|---|---|
gender_name | | | |
woman | 2,025 (100%) | 1,037 (100%) | 1,099 (100%) |
Unknown | 125 | 7 | 26 |
age_category | | | |
16-25 | 877 (41%) | 344 (33%) | 225 (20%) |
26-35 | 915 (43%) | 412 (39%) | 428 (38%) |
36-45 | 295 (14%) | 240 (23%) | 385 (34%) |
46+ | 63 (2.9%) | 48 (4.6%) | 87 (7.7%) |
geography_name | | | |
peri-urban | 608 (28%) | 280 (27%) | 398 (35%) |
rural | 604 (28%) | 147 (14%) | 249 (22%) |
urban | 937 (44%) | 617 (59%) | 478 (42%) |
Unknown | 1 | 0 | 0 |
partner_alcohol_change | | | |
declines to say | 1 (1.2%) | 0 (0%) | 0 (0%) |
don't know | 3 (3.6%) | 2 (4.7%) | 0 (0%) |
drinks less | 43 (52%) | 18 (42%) | 10 (21%) |
drinks more | 14 (17%) | 6 (14%) | 10 (21%) |
no change | 22 (27%) | 17 (40%) | 27 (57%) |
Unknown | 2,067 | 1,001 | 1,078 |
safety_change | | | |
declined to say | 12 (1.4%) | 6 (1.5%) | 5 (1.1%) |
don't know | 56 (6.7%) | 25 (6.3%) | 35 (8.0%) |
less safe | 185 (22%) | 96 (24%) | 79 (18%) |
more safe | 227 (27%) | 88 (22%) | 58 (13%) |
no change | 351 (42%) | 184 (46%) | 258 (59%) |
Unknown | 1,319 | 645 | 690 |
contraceptive_change | | | |
continued not using | 831 (39%) | 360 (34%) | 453 (40%) |
continued same method | 538 (25%) | 336 (32%) | 392 (35%) |
started using a method | 222 (10%) | 98 (9.4%) | 85 (7.6%) |
stopped contraceptives | 367 (17%) | 167 (16%) | 128 (11%) |
switched methods | 192 (8.9%) | 83 (8.0%) | 67 (6.0%) |
pregnancy_desire_change | | | |
declined say | 8 (2.2%) | 8 (4.1%) | 4 (2.9%) |
no longer want a child | 56 (16%) | 36 (18%) | 20 (14%) |
now undecided | 139 (39%) | 61 (31%) | 61 (44%) |
now want a child | 67 (19%) | 27 (14%) | 17 (12%) |
want a child sooner | 19 (5.3%) | 13 (6.6%) | 3 (2.2%) |
want to delay having child | 71 (20%) | 51 (26%) | 34 (24%) |
Unknown | 1,790 | 848 | 986 |

¹ n (%)
What are your findings about the summary? Are they what you expected?
There is a lot of missing data, but overall pregnancy desires did change, and more so in low-income countries and among younger people, where there were many responses of “now undecided” and “wants to delay pregnancy.” This is expected given the disruption that the pandemic has caused to people's health, income, stress levels, etc. Younger people also have more time to consider pregnancy intentions than older people, and older people are more likely to have children already, so their desires could be more stable.
For the entire sample, contraceptive use has changed somewhat since the start of the pandemic; however, the largest response groups were “continued not using” (38.06%) and “continued same method” (29.31%). I expected this might be the case because of access to services via telehealth in higher-income countries. Looking by country income level, there are more changes in use in low-income countries than in middle- and high-income countries, but more statistical tests are needed to determine whether these differences are significant.
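As a rough sketch of the kind of follow-up test this would need, a chi-square test of independence can be run directly on the contraceptive_change counts from Table 1. The matrix below is typed in by hand from that table (an assumption for illustration; in the analysis itself it would be built from IHME_data_clean), and the object names are hypothetical. Because the survey is not a representative sample and respondents are clustered within countries, the result should be read as descriptive only.

```r
# Counts copied by hand from the contraceptive_change rows of Table 1
contraceptive_counts <- matrix(
  c(831, 360, 453,
    538, 336, 392,
    222,  98,  85,
    367, 167, 128,
    192,  83,  67),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    contraceptive_change = c("continued not using", "continued same method",
                             "started using a method", "stopped contraceptives",
                             "switched methods"),
    country_income = c("low_income", "middle_income", "high_income")
  )
)

# Pearson chi-square test of independence between
# contraceptive change and country income group
chisq_result <- chisq.test(contraceptive_counts)
chisq_result

# Expected counts under independence, to check that no cell is very small
chisq_result$expected
```

The test only says whether the distribution of responses differs across income groups at all; it does not say which categories drive the difference or adjust for age or other confounders.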
Most respondents report that their partners drink less (41.04%) or have not changed their drinking (38.15%), although only n=173 people answered this question. Of the 1,665 people who responded to the question about their perceived safety since the start of the pandemic, the largest share perceived no change (47.63%), compared with less safe (21.62%) and more safe (22.40%). There are many potential confounders that are not controlled for.
Make at least two plots that help you answer your question on the transformed or summarized data. Use scales and/or labels to make each plot informative.
My first plot looks at changes in pregnancy desires since the start of the pandemic. Since pregnancy desire could be a function of age, I also want to look at this by age category, so I will use a stacked bar graph scaled to proportions. I removed all missing data.
# pregnancy desires by age categories
plot_preg_age <- IHME_data_clean %>%
drop_na(pregnancy_desire_change) %>% #dropping missing values
ggplot() +
aes(x = age_category, fill = pregnancy_desire_change) +
geom_bar(position = "fill") +
labs(title = "Have pregnancy desires changed since the start of the pandemic by age categories?",
x = "Age Categories",
y = "Proportion") +
scale_fill_manual(values = wes_palette("IsleofDogs1"))
plot_preg_age
I also want to look at changes in contraceptive use and see whether there are any differences by country-level income, which I hypothesized could influence access.
#Contraceptive change, with facet wrap country income
plot_cont <- ggplot(IHME_data_clean) +
aes(x = contraceptive_change, fill = contraceptive_change) +
geom_bar() +
facet_wrap(vars(country_income)) +
labs(title = "Has contraceptive use changed since the start of the pandemic by country-level income?",
x = "Contraceptive Use Change",
y = "Count") +
theme(axis.text.x=element_blank(),
axis.ticks.x=element_blank()) +
scale_fill_manual(values = wes_palette("IsleofDogs1"))
plot_cont
Finally, I want to look at perceived safety changes and perceived changes in partner alcohol use overall, using bar graphs.
#Safety change plot
plot_safety <- IHME_data_clean %>%
drop_na(safety_change) %>% #dropping missing values
ggplot() +
aes(x = safety_change, fill = safety_change) +
geom_bar() +
labs(title = "Perceived safety change, n=1,665",
x = "Perceived safety change",
y = "Count") +
scale_fill_manual(values = wes_palette("IsleofDogs1"))
plot_safety
#Partner alcohol change plot
plot_alcohol <- IHME_data_clean %>%
drop_na(partner_alcohol_change) %>% #dropping missing values
ggplot() +
aes(x = partner_alcohol_change, fill = partner_alcohol_change) +
geom_bar() +
labs(title = "Perceived partner alcohol use change, n=173",
x = "Perceived partner alcohol use change",
y = "Count") +
scale_fill_manual(values = wes_palette("IsleofDogs1"))
plot_alcohol
Summarize your research question and findings below.
Pregnancy desires did change since the beginning of the pandemic, and contraceptive use changed for some but was stable for most. For the entire sample, the largest groups were “continued not using” (38.06%) and “continued same method” (29.31%), compared with stopped using (15.33%) and started using (9.38%). When I look at contraceptive changes by country-level income, more people in low-income countries continued not using (19.2% of the full sample) or continued the same method (12.5%) than people in middle-income (continued not using 8.3%; continued same 7.8%) and high-income countries (continued not using 10.5%; continued same 9.1%). These are shares of all 4,319 respondents, so they partly reflect the larger low-income sample; within each income group (Table 1), 39%, 34%, and 40% continued not using and 25%, 32%, and 35% continued the same method. I expected more disruption of contraceptive access in low-income countries than in higher-income countries; however, access was already a challenge before the pandemic, so it could be that access was not affected specifically by the pandemic. Also, this is a small dataset and these data may not be representative.
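The by-income percentages depend on which denominator is used. The sketch below, with counts and group sizes typed in from Table 1 (hand-entered for illustration, with hypothetical object names), compares dividing by the full sample with dividing by each income group's own size.

```r
# Group sizes from the Table 1 header
group_n <- c(low_income = 2150, middle_income = 1044, high_income = 1125)

# Counts for the two "stable" contraceptive categories, from Table 1
stable_counts <- rbind(
  "continued not using"   = c(831, 360, 453),
  "continued same method" = c(538, 336, 392)
)
colnames(stable_counts) <- names(group_n)

# Share of the full sample (all respondents): this is where 19.2%, 12.5%, etc. come from
pct_of_full_sample <- round(100 * stable_counts / sum(group_n), 1)

# Share within each income group: divide each column by that group's size
pct_within_income <- round(100 * sweep(stable_counts, 2, group_n, "/"), 1)

pct_of_full_sample
pct_within_income
```

The within-group version is the one that matches the column percentages in Table 1 and is the fairer basis for comparing income groups of unequal size.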
Pregnancy desires overall did change since the start of the pandemic. Among the respondents who answered this question (n=695), the largest groups were “now undecided” (37.55%) and “wants to delay having a child” (22.45%). These responses (undecided and delay pregnancy) were more numerous in low-income countries than in middle- and high-income countries, and also more common among younger than older respondents. This is consistent with what I expected. However, there is a lot of missing data, which is a major limitation of this analysis. It would be useful to consider how much contraceptive use and pregnancy desires shift absent a pandemic.
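To put a number on the missing data, the “Unknown” rows of Table 1 can be summed across income groups and compared with the 4,319 total respondents. This is a minimal sketch with the counts entered by hand from Table 1 (object names are hypothetical); it does not say why the answers are missing, which may simply reflect skip patterns in the survey.

```r
total_n <- 4319  # total respondents

# "Unknown" counts from Table 1, summed over low, middle, and high income
unknown <- c(
  partner_alcohol_change  = 2067 + 1001 + 1078,
  safety_change           = 1319 + 645 + 690,
  pregnancy_desire_change = 1790 + 848 + 986
)

# Number who answered each question
answered <- total_n - unknown

missing_summary <- data.frame(
  question    = names(unknown),
  answered    = unname(answered),
  missing     = unname(unknown),
  pct_missing = round(100 * unname(unknown) / total_n, 1)
)
missing_summary
```

The answered counts line up with the denominators used elsewhere in this report (n=173 for partner alcohol use and n=1,665 for perceived safety).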
Are your findings what you expected? Why or Why not?
As expected, I did find some changes in pregnancy desires. I also saw some shifts in contraceptive use, but it was mostly stable. I believe there are several reasons why the picture is a bit messy. Desires and contraceptive use likely shift anyway, and there are many confounding factors (age, income, education, etc.); I did not attempt to control for any of them. Also, there was a lot of missing data, and this survey was not designed to be a representative sample.