Who Decides What We Measure in Health Tech?

Several problems in women’s health remain poorly characterized and understudied. In my research on one such issue, premenstrual syndrome (PMS), it has become clear that one of the largest challenges is for studies to capture the complexity of women’s and cycling people’s experiences, a challenge that science has so far struggled to resolve. [1]

Scientific Research Treats Women’s Bodies as “Imperfect Men”

The exclusion of women and cycling people from scientific research is nothing new. Justifications for this exclusion typically fall into two familiar narratives, and the result is ignorance about the health of women and cycling people, which in turn leads to health inequity and stigmatization. First, menstruating people have been excluded due to their “complicated” hormonal fluctuations, which result in more “variable” data that are feared to complicate and muddy any findings. Second, the historically patriarchal nature of scientific research predominantly conceptualizes the female body as a vessel for childbirth. After tragedies such as the thalidomide and DES disasters, which led to birth defects in thousands of children and reproductive cancers in mothers, protective regulations focused on women were implemented by the Food and Drug Administration (FDA)[2]. These regulations recommended that premenopausal women be excluded from early-phase drug trials, and eventually led to women being excluded from most pharmaceutical research[3]. The same attitude applies in studies on non-human test subjects, which creates a sampling gap between males and females throughout the research and development process and denies the cycling body its deserved place in research. Considering menstruating bodies as “overly complicated” echoes long-held Aristotelian ideas that female bodies are simply “imperfect” male ones.

In the United States, the National Institutes of Health Revitalization Act of 1993 mandated that clinical studies include women and minorities. Today the gender gap in clinical trial participation is closing, but the speed of change is faster in some fields than in others. For example, women are still underrepresented in heart disease and cancer research. The imbalance of sexes in trials privileges men, specifically white men, and it is important to recognize that this dynamic is not only gendered but racialized too. Minority communities often face discrimination in healthcare and treatment, and their inclusion in research has a history of marginalization and exploitation. This history is epitomized by unethical research such as the infamous Tuskegee syphilis experiment and by the case of Henrietta Lacks’ stolen cell tissue[4] [5].

When women are included, the data are not always analyzed separately. Sex-disaggregated data (male and female data analyzed separately) are not always made available in papers, which prevents subsequent analyses from properly accounting for cycling bodies. This has a real, tangible effect on people’s lives. For instance, in 2013 the FDA announced new labels for medication containing zolpidem, used in sleeping pills, after evidence emerged that women remained under the influence of these pills hours after the effects should have worn off, leading to next-morning impairment in tasks such as driving. The FDA recommended halving the dose for women[6].

The effects of exclusion accumulate for people who belong to several intersecting minority or underrepresented groups, including women of color, intersex people, and those assigned female at birth. An example is the huge disparity in pregnancy outcomes in the US, where African American women are 3.3 times more likely, and Native American women 2.5 times more likely, than their White American counterparts to die from pregnancy-related causes[7].

When women are included in research, and even when sex-disaggregated data are available, crucial questions about contraception, menstruation, and cycles are often omitted. During the current COVID-19 pandemic, questions about menstruation have been absent from most large-scale studies, and menstrual disturbances have not been measured, despite many people reporting them. Studies that do disaggregate by sex seldom report whether participants were using contraception or describe the nature of their cycles, making it hard to draw out actionable insights. This lack of research into menstrual impact persists even though menstrual cycle characteristics are being recognized as vital signs of health that provide indicators of wellbeing[8]. Treating menstrual cycle characteristics as a vital sign is motivated by the link between the immune system and the menstrual cycle. Over the course of the menstrual cycle, naturally cycling people exhibit variation in immune function, which makes it crucial not only that women and cycling people are included in studies but also that specific questions about their cycles are asked. There is clearly a representation problem in medical trials, but what about conditions that are unique to those who menstruate?

The Missing Dimension: Self-Definition

One such under-researched condition is premenstrual syndrome (PMS). Worldwide, almost half of women and cycling people of reproductive age suffer from PMS, a chronic condition characterized by a huge variety of physical and emotional symptoms before a menstrual bleed. In the UK, it is estimated that as many as 30% of women suffer from PMS[9]. While many think of the premenstrual phase negatively, some report positive experiences such as heightened creativity or increased libido[10].

As PMS lacks a biological marker to diagnose it (there are no lateral flow tests for PMS!), people are diagnosed using measurement tools, such as questionnaires, given by GPs. The tools most often used to diagnose PMS focus on negative experiences and omit positive ones. In the US, diagnostic criteria for PMS were standardized by the National Institute of Mental Health in 1983 and included daily tracking of symptoms for two months. Today, there are 60 instruments (questionnaires) for collecting data about women’s menstrual experiences. Aside from Chrisler’s Menstrual Joy Questionnaire (MJQ), none include positive symptoms, nor do any measure positive and negative experiences together.

These tools are designed by researchers and a small pool of volunteers, who read through the questionnaires before they are piloted with wider audiences. But if the categories and tick boxes are pre-defined, how can new or unknown symptoms emerge? Measurement tools have typically consisted of Likert-style questions, whose results are scored to give a diagnosis[11]. This allows for fast diagnosis and easy comparison between survey responses over time. However, by not providing space for women and cycling people to talk about their unique experiences, these tools lack granularity, deny people the opportunity to have less common experiences noted, and assume that the most common symptoms are the ones in the surveys. On the other hand, interviewing millions of women about their individual symptoms just isn’t feasible. We are constantly trading off accurate measurement of individual experiences against big data that can easily be analyzed.

Extracting Knowledge from Experience

We live in an age in which large quantities of data matter, and data are being generated at an ever faster pace. We are used to dealing with numeric data that we can add up and score, but this omits the nuances of individual experiences. One way to capture a person’s unique experience is to give them space to explain it in their own words, alongside the typical symptom scoring, for example by including an unstructured text box in surveys. In my Ph.D., I use text analysis, mainly the tidytext package in R, to analyze and visualize topics coming from free-text survey responses on menstruation[12]. Text mining is often defined as the “extraction of useful knowledge from textual data.” It involves an automated process of cleaning and processing text to provide thematic analysis. It can also create polarity scores through sentiment analysis, in which one mines a body of text to understand the main opinion or sentiment it holds. These scores depict responses as overall “negative” or “positive”; the larger the score’s magnitude, the more extreme the polarity (e.g. very positive). A typical text analysis workflow involves taking the text, cleaning it by removing stop words (words like “and” or “the”), applying sentiment analysis to score the text as negative or positive or to group it into sentiment themes, and then visualizing the results. Using these techniques, we can look at the most common words (Figure 1) and the themes that emerge in the text (Figure 2), and we can also score sentences on their overall polarity. It is also possible to find the most frequently used “rare” words (word counts adjusted for how commonly each word is used across all responses).
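As a rough illustration of the cleaning and counting steps in this workflow, here is a minimal sketch in Python using only the standard library. It is not the R tidytext code behind the figures: the example responses, the tiny stop-word list, and the function names are all invented for illustration. The "rare" word weighting is assumed here to be a tf-idf style score (term frequency multiplied by the natural log of the number of responses divided by the number of responses containing the word).

```python
import math
import re
from collections import Counter

# Deliberately tiny stop-word list for illustration; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "are", "before", "i", "is", "my", "of", "the", "to"}


def tokenize(text):
    """Lowercase a response, split it into words, and drop stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def word_counts(responses):
    """Count how often each remaining word appears across all responses."""
    counts = Counter()
    for response in responses:
        counts.update(tokenize(response))
    return counts


def tf_idf(responses):
    """Weight each word by term frequency times ln(n_responses / n_containing)."""
    docs = [Counter(tokenize(r)) for r in responses]
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(doc.keys())  # one count per response containing the word
    weighted = []
    for doc in docs:
        total = sum(doc.values())
        weighted.append({
            word: (n / total) * math.log(len(docs) / doc_freq[word])
            for word, n in doc.items()
        })
    return weighted


# Invented example responses, not real survey data.
responses = [
    "I feel tired and my breasts are tender",
    "Tired, bloated and irritable before my period",
    "I feel creative and full of energy before my period",
]

counts = word_counts(responses)
scores = tf_idf(responses)
print(counts.most_common(3))  # the most common words overall
print(scores[0])              # distinctiveness weights for the first response
```

In this toy example “tired” appears in two of the three responses, so within the first response it receives a lower weight than “tender”, which appears in only one. That is the sense in which a word can be both frequently used and “rare”.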

Figure 1. Word cloud visualization of the most common words in people’s experiences of PMS (Created by Author)

Figure 2. Sentiment analysis of people’s experiences of PMS (Created by Author)

Text mining has long been used in advertising, for example, to understand how consumers talk about certain products. It could equally prove a useful way to allow women to express their unique symptoms and experiences while saving analysis time. A major caveat, though, is that these methods require annotated texts for comparison. Sentence-level polarity assigns positive, negative, and neutral points to sentences by comparing the text to a predefined list of scored words. Sentences are scored by adding the scores of each word together. In addition, the points given depend on the number of negators, shifters, and intensifiers: words that alter the polarity of the sentence, such as “don’t” or “barely”. However, these lists of predefined scored words must themselves be built from annotated data, which is a major challenge because there is a lack of available datasets that can be used to train models for this type of analysis. For example, a standard list might include “tender” as a positive word and score it as such. A list specific to women’s health, however, might not score “tender” as positive because, in this context, “tender” is mainly used negatively, to report “tender breasts”. What is desperately missing is a list of scored words built using women’s experiences, for women’s symptoms.
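The scoring described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not an established sentiment library or the method used in my research: the word scores, the negator and modifier lists, and the function name score_sentence are all hypothetical, and a modifier is simply applied to the next scored word that follows it.

```python
import re

# Hypothetical general-purpose lexicon: word -> polarity score.
GENERAL_LEXICON = {
    "tender": 1.0, "happy": 1.0, "creative": 1.0,
    "tired": -1.0, "bloated": -1.0, "pain": -1.0,
}

# Hypothetical overrides for a women's-health context, where "tender"
# usually reports a symptom rather than something pleasant.
DOMAIN_OVERRIDES = {"tender": -1.0}
DOMAIN_LEXICON = {**GENERAL_LEXICON, **DOMAIN_OVERRIDES}

NEGATORS = {"not", "never", "don't"}  # flip the sign of the next scored word
MODIFIERS = {"very": 1.5, "extremely": 2.0, "barely": 0.5}  # scale its strength


def score_sentence(sentence, lexicon):
    """Sum word scores, letting negators and modifiers alter the next scored word."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    total, sign, weight = 0.0, 1.0, 1.0
    for token in tokens:
        if token in NEGATORS:
            sign = -sign
        elif token in MODIFIERS:
            weight *= MODIFIERS[token]
        elif token in lexicon:
            total += sign * weight * lexicon[token]
            sign, weight = 1.0, 1.0  # reset once a scored word has been reached
    return total


sentence = "My breasts are tender"
print(score_sentence(sentence, GENERAL_LEXICON))  # scored with the general list
print(score_sentence(sentence, DOMAIN_LEXICON))   # scored with the domain list
print(score_sentence("I don't feel very happy", DOMAIN_LEXICON))
```

With the general list the first sentence comes out positive, and with the domain-specific overrides it comes out negative, even though the text is identical. The only thing that changed is the word list, which is why a list built from women’s own descriptions of their symptoms matters.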

Over the centuries, women and cycling people have been excluded from research because of their “complicated hormones.” This has resulted in dismissed symptoms, underdiagnosis, and incorrect treatment, which has sometimes proved fatal. Now that the tide is turning, and research is at last including them, we must ensure that their lived experiences are used to build and continuously develop the tools we use. The male-centered approaches used to date to design the very boxes and categories used to diagnose women cannot continue to be the standard. Tools such as text analysis can bridge the gap between lived experience and big data collection. We can use text analysis to quickly analyze women’s responses that document their unique experiences, preserving both the quantity and the quality of the data, or at least differently configuring our relationship to these two categories of data. By combining large amounts of quantitative data, such as polarity scores, with a mixed-methods approach that also examines the qualitative information, we may finally be able to capture and understand the real variation in cycling people’s experiences.

Notes

[1] I strive to use inclusive language in this blog where possible. Not all women menstruate, and some people who have cycles are not women, so it is important to be specific with language. For this reason, I have chosen to use the terms “women and cycling people” or “menstruating people” to include transgender men and gender non-conforming people. However, when I refer to specific scientific studies that only use the words “women” or “mothers”, I use the same terminology as that used in the papers.

[2] Thalidomide, a drug used in the 1950s to treat nausea during pregnancy, resulted in severe birth defects in tens of thousands of children; it was formally withdrawn in November 1961. Diethylstilbestrol (DES) was a synthetic hormone given during pregnancy between the 1940s and 1970s to prevent miscarriages. A relationship was later established between exposure to DES and breast cancer in mothers, as well as an increased risk of cervical cancer and reproductive issues in daughters exposed in utero. The Centers for Disease Control and Prevention estimates that more than ten million people were exposed.

[3] Lippman, A. The inclusion of women in clinical trials: are we asking the right questions?. Women and Health Protection. 2006.

[4] The Tuskegee Study, begun in 1932, aimed to study the effects of untreated syphilis. The study of 399 African American men with syphilis continued until 1972, even though treatment for syphilis became available in the 1940s. The researchers did not obtain participants’ informed consent and lied about the treatment protocol. In exchange for their participation in the study, the men received free medical exams, free meals, and burial insurance.

[5] https://www.nature.com/articles/d41586-020-02494-z

[6] https://www.nytimes.com/2013/01/11/health/fda-requires-cuts-to-dosages-of-ambien-and-other-sleep-drugs.html

[7] https://www.cdc.gov/mmwr/volumes/68/wr/mm6818e1.htm?s_cid=mm6818e1_w

[8] ACOG Committee Opinion No. 651: Menstruation in Girls and Adolescents: Using the Menstrual Cycle as a Vital Sign. Obstet Gynecol. 2015 Dec;126(6):e143–e146.

[9] https://www.pms.org.uk/about-pms/

[10] King, M. & Ussher, J.M., 2013, ‘It’s not all bad: Women’s construction and lived experience of positive premenstrual change’, Feminism & Psychology, 23(3), 399–417.

[11] Likert scales are survey scales with a series of responses to choose from, ranging from one extreme (“not at all”) to the other (“all the time”).

[12] Silge, J. & Robinson, D. 2016. tidytext: Text Mining and Analysis Using Tidy Data Principles in R. Journal of Open Source Software 1(3): 37. https://CRAN.R-project.org/package=tidytext.

Citations

ACOG Committee Opinion No. 651: Menstruation in Girls and Adolescents: Using the Menstrual Cycle as a Vital Sign. Obstet Gynecol. 2015 Dec;126(6):e143–e146.

King, M. & Ussher, J.M. 2013. “It’s not all bad: Women’s construction and lived experience of positive premenstrual change.” Feminism & Psychology 23(3): 399–417.

Lippman, A. 2006. “The inclusion of women in clinical trials: are we asking the right questions?” Women and Health Protection.

Petersen EE, Davis NL, Goodman D, et al. 2019. Vital Signs: Pregnancy-Related Deaths, United States, 2011–2015, and Strategies for Prevention, 13 States, 2013–2017. MMWR Morb Mortal Wkly Rep 68:423–429.

Tavernise, S. 2013. “F.D.A. Requires Cuts to Dosages of Ambien and Other Sleep Drugs.” New York Times.