Impact of community masking on COVID-19: A cluster-randomized trial in Bangladesh

Persuading people to mask
Even in places where it is obligatory, people tend to optimistically overstate their compliance for mask wearing. How then can we persuade more of the population at large to act for the greater good? Abaluck et al. undertook a large, cluster-randomized trial in Bangladesh involving hundreds of thousands of people (although mostly men) over a 2-month period. Colored masks of various construction were handed out free of charge, accompanied by a range of mask-wearing promotional activities inspired by marketing research. Using a grassroots network of volunteers to help conduct the study and gather data, the authors discovered that mask wearing averaged 13.3% in villages where no interventions took place but increased to 42.3% in villages where in-person interventions were introduced. Villages where in-person reinforcement of mask wearing occurred also showed a reduction in reporting COVID-like illness, particularly in high-risk individuals. —CA


A List of Supplementary Materials Appendix Figures and Tables
What do you think was the increase in mask-wearing as a result of household mask distribution and mask promotion in the community?
Table S33. What do you think was the additional effect of mask promoters reminding people to wear masks?
Table S34. Do you think text message reminders to wear masks further increased mask-wearing?
Table S35. How do you think mask distribution and promotion affected physical distancing?
Table S36. Do you think incentive payments to village leaders further increased mask-wearing?
Table S37. Do you think verbal commitments and signage to wearing masks further increased mask-wearing?
Figure S1: Schematic of Cross-Randomizations
Notes: Each color represents a village-level or household-level randomization, with inner circles representing village-level randomizations and outer circles representing household-level randomization. Different tones of the same hue represent the control or treatment status for each randomization. The fraction of villages receiving each randomization is proportional to the area of each concentric circle. The "Colors" box in the bottom left shows the color of masks used to denote households that were assigned to the control or treatment status of the household-level randomization.
Standard errors are in parentheses. *** Significant at the 1 percent level. ** Significant at the 5 percent level. * Significant at the 10 percent level. The baseline rate of mask-wearing was measured through observation over a 1 week period, defined as the rate of those observed who wear a mask or face covering that covers the nose and mouth. All regressions include an indicator for each control-intervention pair. §We report the mean rate of proper mask-wearing among the control villages during the baseline observation.
This is not equivalent to the coefficient on the constant due to the inclusion of the pair indicators as controls. "No Active Promotion" refers to any time that surveillance was conducted while promotion was not actively occurring (regardless of the week of the intervention). This excludes surveillance during the Friday Jumma Prayers in the mosque, when promoters were present and actively encouraged mask wearing. "Other Locations" include tea stalls, at the entrance of the restaurant as patrons enter, and the main road to enter the village. "Surgical Villages" refer to all treatment villages which received surgical masks as part of the intervention, and their control pairs. "Cloth Villages" refer to all treatment villages which received cloth masks as part of the intervention, and their control pairs. These samples include surveillance from all available locations, equivalent to the column labeled "Full", but run separately for each subgroup. Of the 572 villages included in the analysis sample, we exclude an additional village and its pair in the mosque and market sub-samples, and two villages and their pairs in the other location sub-sample because we did not observe them in the baseline period prior to the intervention. There are 190 treatment villages which received surgical masks as part of the intervention and 96 treatment villages which received cloth masks.
The regressions "with baseline controls" include controls for the number of people observed in the baseline visit. §We report the average number of people observed among the control villages during the baseline observation. This is not equivalent to the coefficient on the constant due to the inclusion of the pair indicators as controls.
The regressions "with baseline controls" include controls for baseline rates of proper mask wearing and baseline symptom rates. Baseline Symptom Rate is defined as the rate of surveyed individuals in a village who report symptoms coinciding with the WHO definition of a probable COVID-19 case. We assume that (1) all reported symptoms were acute onset, (2) all people live or work in an area with high risk of transmission of virus and (3) all people have been a contact of a probable or confirmed case of COVID-19 or are linked to a COVID-19 cluster. §We report the mean rate of symptomatic seroprevalence at endline. This is not equivalent to the coefficient on the constant due to the inclusion of the pair indicators as controls.
The analysis includes all people surveyed in the baseline household visits, excluding individuals that we did not collect midline or endline symptoms for, symptomatic individuals that we did not collect blood from, and individuals that we drew blood from but did not test their blood. In other tables, the analysis includes all people surveyed in the baseline household visits, excluding only individuals that we did not collect midline or endline symptoms for.
Confidence intervals are in brackets, computed using wild bootstrap. The regressions "with baseline controls" include controls for baseline rates of mask-wearing. The first column reports the results of our main intervention, equivalent to the results in Table 1, using full surveillance data. §We report the mean rate of mask-wearing among the control villages during the baseline observation. This is not equivalent to the coefficient on the constant due to the inclusion of the pair indicators as controls.
The baseline control regressions include controls for baseline rates of mask-wearing and baseline symptom rates. For the gender subgroup analyses, the baseline symptom rate and baseline mask-wearing rate were defined across all individuals, not just those among females and males, respectively.

The sex-specific subgroup is run on all locations except mosques because no females were observed at mosques. The sex-specific samples exclude 6 villages because of lack of data. The above-median and below-median samples include 85 singleton observations, which were dropped.

B Sample Size
To determine the necessary sample size for a cluster randomized trial, we used equation 5 for binary outcomes from Rutterford et al. (79). We rearranged the equation to solve for delta, the clinically relevant difference in reduction of symptomatic seropositivity. This allowed us to determine the optimum power we could achieve within budget and logistical constraints. By enrolling 600 villages with an anticipated number of 250 households per village and two eligible persons per household, we estimated our per-arm sample size would be 150,000 adults. We assumed P1=P2 and conservatively estimated the proportion of seropositivity at endline to be 9% with 4% attributed to the study period. We estimated an intracluster correlation coefficient of 0.02. This gave us a delta of 8.29E-03. Dividing by P gave us a minimum detectable effect of 9.2%. To determine the number of blood tests needed, we estimated that 12% of people enrolled would develop COVID-like symptoms over the study period and that one-third of these individuals would have a SARS-CoV-2 infection. This gave us a target of 36,000 blood tests.
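The arithmetic in this paragraph can be checked with a short sketch. It only reproduces the quoted figures (per-arm sample size, relative minimum detectable effect, and blood-test target); it does not re-derive delta from equation 5 of Rutterford et al., and the variable names are illustrative rather than taken from the study.

```python
# Inputs quoted in the text above; variable names are illustrative.
villages_per_arm = 300        # 600 villages split evenly across two arms
households_per_village = 250  # anticipated households per village
adults_per_household = 2      # eligible persons per household
delta = 8.29e-3               # absolute detectable difference reported in the text
p_endline = 0.09              # assumed proportion seropositive at endline
symptomatic_share = 0.12      # expected share developing COVID-like symptoms

# Per-arm sample size: villages x households x adults.
per_arm = villages_per_arm * households_per_village * adults_per_household

# Relative minimum detectable effect: delta divided by P.
relative_mde = delta / p_endline

# Blood-test target: symptomatic share of everyone enrolled in both arms.
blood_tests = round(symptomatic_share * 2 * per_arm)

# One-third of symptomatic individuals were expected to be infected.
expected_infections = round(blood_tests / 3)

print(per_arm, round(relative_mde, 3), blood_tests, expected_infections)
```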

C Pairwise Randomization Procedure
To develop the sample frame, Innovations for Poverty Action (IPA) Bangladesh selected 1,000 rural and peri-urban unions out of 4,500 unions in Bangladesh. We excluded Dhaka district, because of high initial seroprevalence, and three hill districts, because of the logistical difficulties in accessing the region. We also dropped remote coastal districts where population density is low. The final sampling frame of 1,000 unions was located in 40 different districts (zillas) (out of 64) and 144 sub-districts (upazilas) (out of 485).
We used a pairwise randomization to select 300 intervention and 300 control unions within the same sub-districts. This randomization procedure was designed to pair unions that were similar in terms of (limited) COVID-19 case data, population size, and population density. Each union consists of roughly 80,000 people, or around 80 villages. Surveyors blind to treatment assignment followed a scoping protocol (Appendix E) to identify the union's largest market and co-located village. Field staff sought consent for a baseline survey in every household in every selected village; in intervention villages every adult in consenting households was given a mask. Because some unions are very small, we selected only one village per union and ensured that selected villages were at least 2 km apart to avoid spillover effects. Treatment and control unions were scattered throughout the country (Figure 1).
Villages were assigned to strata as follows:
1. We began with 1,000 villages in 1,000 separate unions to ensure sufficient geographic distance to prevent spillovers (Bangladesh is divided into 4,562 unions).
2. We collected these unions into "units", defined as the intersection of upazila x (above/below) median population x case trajectory, where above/below median population was a 0-1 indicator for whether the union had above-median population for that upazila and case trajectory takes the values -1, 0, 1 depending on whether the cases per 1,000 are decreasing, flat or increasing. We assessed cases per person using data provided to us from the Bangladeshi government for the periods June 27th-July 10th and July 11th-July 24th, 2020.
3. If a unit contained an odd number of unions, we randomly dropped one union.
4. We then sorted unions by "cases per person" based on data from July 11-24, 2020 and created pairs using adjacent unions in this sort order. We randomly kept 300 such pairs.
5. We randomly assigned one union in each pair to be the intervention union.
6. We then tested for balance with respect to cases, cases per population, and density.
7. Finally, we repeated this entire procedure 50 times, selecting the seed that minimized the maximum of the absolute value of the t-stat of the balance tests with respect to case trajectory and cases per person.
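The steps above can be sketched in code. This is a minimal illustration, not the study's implementation: the field names (`upazila`, `above_median_pop`, `trajectory`, `cases_pp`), the Welch t-statistic used as the balance test, and the synthetic data are all assumptions made for the example.

```python
import random
import statistics
from collections import defaultdict

def t_stat(a, b):
    """Welch two-sample t-statistic; the exact form of the balance test is assumed."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return 0.0 if se == 0 else (statistics.mean(a) - statistics.mean(b)) / se

def assign_pairs(unions, n_pairs, seed):
    """One run of steps 2-5: build units, pair adjacent unions, assign treatment."""
    rng = random.Random(seed)
    units = defaultdict(list)
    for u in unions:
        units[(u["upazila"], u["above_median_pop"], u["trajectory"])].append(u)
    pairs = []
    for key in sorted(units):
        members = list(units[key])
        if len(members) % 2 == 1:                  # step 3: drop one union at random
            members.pop(rng.randrange(len(members)))
        members.sort(key=lambda u: u["cases_pp"])  # step 4: sort, then pair neighbours
        pairs += [(members[i], members[i + 1]) for i in range(0, len(members), 2)]
    pairs = rng.sample(pairs, min(n_pairs, len(pairs)))  # keep n_pairs pairs at random
    treated, control = [], []
    for a, b in pairs:                             # step 5: coin flip within each pair
        if rng.random() < 0.5:
            a, b = b, a
        treated.append(a)
        control.append(b)
    return treated, control

def best_assignment(unions, n_pairs=300, n_seeds=50):
    """Step 7: keep the seed whose worst balance-test |t| is smallest."""
    best = None
    for seed in range(n_seeds):
        treated, control = assign_pairs(unions, n_pairs, seed)
        worst = max(
            abs(t_stat([u[k] for u in treated], [u[k] for u in control]))
            for k in ("trajectory", "cases_pp")
        )
        if best is None or worst < best[0]:
            best = (worst, seed, treated, control)
    return best

# Synthetic demo data (entirely made up; not the study's data).
demo_rng = random.Random(1)
unions = [
    {"id": i, "upazila": demo_rng.randrange(40),
     "above_median_pop": demo_rng.randrange(2),
     "trajectory": demo_rng.choice([-1, 0, 1]),
     "cases_pp": demo_rng.random() / 1000}
    for i in range(1000)
]
worst_t, seed, treated, control = best_assignment(unions)
print(f"seed={seed}, max |t|={worst_t:.3f}, pairs={len(treated)}")
```

Because pairs are formed within units, case trajectory is balanced by construction in this sketch, so the seed search is driven by cases per person.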

D Cross-Randomization Procedure
Villages were assigned to village-level cross-randomizations as follows:
1. We began with the 300 union-pairs (600 villages total) identified in the pairwise randomization procedure, and limited to only the villages in the intervention group.
2. Using a random number generator, we ordered the villages, and assigned the first 1/3 of the intervention villages to be distributed cloth masks and 2/3 to be distributed surgical masks.
3. Within the mask-type randomization, we randomly reordered the unions, then assigned the first 1/2 of villages to hang signage on their door as a visual commitment to mask-wearing, and 1/2 of villages to not have signage on their door.
4. Within the previous two randomizations, we randomly assigned 1/4 of villages to receive no incentive, 1/4 to receive a monetary award, and 1/2 to receive a certificate incentive. If there was an odd number of villages within this randomization, then we broke the difference by rounding the number of villages in the randomization to the nearest whole number.
5. In villages without signage, we randomly ordered the villages and assigned the first 2/3 to receive texts encouraging mask-wearing, and the remaining 1/3 to receive no such messages. If the number of villages was not divisible by thirds, then we broke the difference by rounding the number of villages to the nearest whole number.
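The village-level steps above can be sketched as nested shuffles and cuts. This is an illustrative sketch under stated assumptions, not the study's code: the helper `split`, the arm labels, and the tie-breaking rule (Python's `round`, which rounds ties to the nearest even number) are choices made for the example.

```python
import random

def split(items, fractions, rng):
    """Shuffle items, then cut them into consecutive groups sized by fractions.

    Cut points are rounded to whole numbers; round() sends ties to the
    nearest even number, which is an assumption about tie-breaking.
    """
    items = list(items)
    rng.shuffle(items)
    groups, start, cumulative = [], 0, 0.0
    for fraction in fractions[:-1]:
        cumulative += fraction
        end = round(cumulative * len(items))
        groups.append(items[start:end])
        start = end
    groups.append(items[start:])
    return groups

def cross_randomize(villages, seed=0):
    """Assign mask type, signage, incentive, and village-level texts."""
    rng = random.Random(seed)
    arms = {v: {} for v in villages}
    # Step 2: first 1/3 cloth masks, remaining 2/3 surgical masks.
    masks = dict(zip(("cloth", "surgical"), split(villages, [1/3, 2/3], rng)))
    for mask, mask_group in masks.items():
        # Step 3: within mask type, half of villages get door signage.
        signs = dict(zip((True, False), split(mask_group, [1/2, 1/2], rng)))
        for signage, sign_group in signs.items():
            # Step 4: within each cell, 1/4 none, 1/4 monetary, 1/2 certificate.
            incentives = dict(zip(("none", "monetary", "certificate"),
                                  split(sign_group, [1/4, 1/4, 1/2], rng)))
            for incentive, group in incentives.items():
                for v in group:
                    arms[v].update(mask=mask, signage=signage, incentive=incentive)
    # Step 5: among villages without signage, 2/3 receive text messages.
    no_signage = [v for v in villages if not arms[v]["signage"]]
    texts, _ = split(no_signage, [2/3, 1/3], rng)
    texted = set(texts)
    for v in no_signage:
        arms[v]["village_texts"] = v in texted
    return arms

arms = cross_randomize(list(range(300)))  # 300 hypothetical intervention villages
print(sum(a["mask"] == "cloth" for a in arms.values()), "cloth villages")
```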
Unions were assigned to household-level cross-randomizations using the following procedure.
Note that each village was assigned to one and only one household-level randomization.
1. In villages with the signage randomization, we assigned 2/3 of villages to receive messages emphasizing the self-protection benefits of masks, and the remaining 1/3 to receive altruistic messages about the benefits of mask-wearing in addition to the self-protection messages. If the number of villages was not divisible by thirds, we broke the difference by rounding to the nearest whole number.
2. In villages without the signage randomization, we assigned 2/3 of villages to receive messages emphasizing the self-protection benefits of masks, and the remaining 1/3 to receive messages emphasizing the altruistic reasons to wear masks in addition to the self-protection messages.
3. In the villages without the signage randomization and no household-level altruism randomization, we asked some households to make a verbal commitment to be a mask-wearing household while the remaining were not asked to make a commitment.
4. In villages with the signage randomization and no household-level altruism randomization (and by definition, no village-level text message randomization), we assigned 1/4 of villages to receive no household-level text-message randomization, 1/2 of villages to have 50% of their households receive text-message reminders, and the remaining 1/4 of villages to have 100% of their households receive texts.

E Scoping and Recruitment
All households in selected villages were eligible for participation in the study. At each household, field staff sought consent to participate in respiratory symptom surveys from the adult who answered the door. The scoping staff that mapped enrolled villages were blind to study arm assignment. However, the implementation staff that consented households was not blind to study arm assignment. 93.3% of households consented to participate in the study and completed a baseline symptom survey. Of the households that were surveyed in the baseline household visit, 83.2% of households provided a response to the week 5 symptom survey. 94.4% of households provided a response to the week 9 symptom survey. 98.1% of households provided a response to the week 5 or week 9 symptom survey. There were no statistically significant differences between response rates in the treatment and control groups.
Individuals who reported symptoms any time during the 8-week study period were sought out for collection of a blood sample; blood sample collection was conducted only after additional informed, written consent was provided. 39.7% of symptomatic participants agreed to blood collection. Although blood consent rates are not significantly different between the treatment and control groups and are comparable across all demographic groups, we cannot rule out that the composition of consenters differed between the treatment and control groups. If we assume that consenters and non-consenters have similar seroprevalence rates, then we would expect true symptomatic seroprevalence to be about 2.5 times the rates we report (1/0.397 ≈ 2.5).
Surveillance staff were instructed to record details about the mask-wearing behavior of every person they saw while stationed at public places throughout the community: in mosques, at (predominantly open-air) markets, at outdoor tea stalls, on the main road, and outside restaurants. In other words, they conducted a census of all individuals within their field of view during surveillance activities. Surveillance staff were provided an example schedule for surveillance that suggested visiting 9 locations over the course of the day, spending one hour at each location. This included 2 hours outside of a restaurant or at a tea stall, 2 hours on the main road near the entrance to the village and transportation stations, 3 hours at a mosque, and 1 hour at a market. However, staff were free to vary the timing and location of their surveillance activities to maximize surveillance in crowded locations or locations with relatively higher numbers of people.
This observed sample is representative of the rural Bangladeshi population that is present in crowded public places during the day; this population is largely men, who have more social contacts outside the home than women. This is reflected in our surveillance at mosques, markets, tea stalls, restaurants, and on the main road, in which men constituted 88.2% of all observed adults in these areas. (Men constituted 100% of all observed adults at mosques and 87-89% of all observed adults in each of the other locations.) There was no difference in the number of people observed in public areas between treatment and control groups. The distinct appearance of project-associated masks and elevated mask-wearing in treatment villages made it impossible to blind surveillance staff to study arm assignment. However, study staff were not informed about the exact purpose of the study.

F Details on Mask Materials and Design
In focus groups conducted prior to the study, participants said they preferred cloth over surgical masks because they perceived surgical masks to be single-use only and cloth masks to be more durable. Focus group participants also provided feedback on different cloth mask designs and sizes. Both types of masks were manufactured in Bangladesh. The cloth masks were produced by Bangladeshi garment factories within 6 weeks after ordering.
The cloth mask had an exterior layer of 100% non-woven polypropylene (70 grams/square meter [gsm]), two interior layers of 60% cotton / 40% polyester interlocking knit (190 gsm), an elastic loop that goes around the head above and below the ears, and a nose bridge. The surgical mask had three layers of 100% non-woven polypropylene, elastic ear loops, and a nose bridge.
The filtration efficiency was 37% (standard deviation [SD] = 6%) for the cloth masks, and 95% (SD = 1%) for the surgical masks. The filtration efficiency test was conducted using a Fluke 985 particle counter that has a volumetric sampling rate of 2.83 liters per minute. The measurement was taken of particles 0.3-0.5 µm in diameter flowing through the material with a face velocity of 8.5 cm/s. In our internal testing, we found that cloth masks with an external layer made of Pellon 931 polyester fusible interface ironed onto interlocking knit with a middle layer of interlocking knit could achieve a 60% filtration efficiency. Upon discussions with the manufacturers, we learned that those materials could not be procured. Using materials that were available, the highest filtration efficiency possible was 37%.

G Details on Surveillance
The mask distribution and promotion were conducted by the Bangladeshi NGO GreenVoice, a grassroots organization with a network of volunteers across the country. Household surveys and surveillance were performed independently by Innovations for Poverty Action (IPA). The same staff member conducted surveillance at paired intervention and control villages at baseline and then once per week on weeks 1, 2, 4, 6, 8, and 10 after the intervention. The 10-week observation was conducted two weeks after all intervention activities had ceased. We also collected longer-term data on mask-wearing behavior 20-27 weeks after the launch of interventions. Each village was observed on two alternating days of the week. Across all villages, observations took place on all seven days of the week, with observation in 150 villages occurring on Friday to over-sample days when mosques were most crowded. Observations generally took place from 9 am to 7 pm.
In 10 unions we conducted audits to assess the validity of surveillance data by pairing one monitoring officer with surveillance staff; in all cases the difference in their results was <10%, our pre-determined threshold.
Surveillance staff observed a single individual and recorded that person as practicing physical distancing if s/he was at least one arm's length away from all other people. This is consistent with the WHO guideline that defines physical distancing as one meter of separation (https://www.who.int/westernpacific/emergencies/covid-19/information/physical-distancing, accessed January 30, 2021). Note that compliance with WHO guidelines does not require physical distancing; for example, members of the same household need not remain physically distant (and presumably would not change their distancing behavior as a result of our intervention).
After 5 weeks of surveillance in wave 1, it was clarified that surveillance staff should only record mask-wearing behavior of people who appear to be 18 years or older. Prior to this, some surveyors included children (especially older children) in their counts. Since the same staff member conducted surveillance in paired intervention and control villages, this change affected the treatment and control groups equally.

H Antibody Testing
Serum samples were diluted 1:100 with sample dilution buffer. 50 microliters of diluted specimens were added to the SCoV-2 antigen-coated microtiter strip plates. After one hour of incubation at 37°C, the plate was washed six times with wash buffer, and conjugate solution was added to each well. The plate was incubated for another 30 minutes at 37°C and washed six times with wash buffer. 75 microliters of liquid TMB substrate were added to all wells followed by 20 minutes of incubation in the dark at room temperature before the reaction was stopped. The absorbance was read on a microplate reader at 450 nm (GloMax® Microplate Reader, Promega Corporation, Madison, WI). After calibration according to positive, negative, and cut-off controls, the immunological status ratio (ISR) was calculated as the ratio of optical density divided by the cut-off value.
Samples were considered positive if the ISR value was determined to be at least 1.1. Samples with an ISR value 0.9 or below were considered negative. Samples with equivocal ISR values were retested in duplicate, and resulting ISR values were averaged. Individuals were coded as symptomatic seropositive if they reported symptoms consistent with the WHO COVID-19 case definition, their blood was collected, and the antibody test was positive.
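The classification rule above can be written as a short sketch. The function names are illustrative, and the handling of a retested sample (classifying the averaged ISR with the same thresholds) is an assumption; the text states only that equivocal samples were retested in duplicate and the resulting values averaged.

```python
def isr(optical_density, cutoff):
    """Immunological status ratio: optical density divided by the cut-off value."""
    return optical_density / cutoff

def classify(isr_value):
    """Apply the thresholds in the text: >= 1.1 positive, <= 0.9 negative."""
    if isr_value >= 1.1:
        return "positive"
    if isr_value <= 0.9:
        return "negative"
    return "equivocal"

def final_call(first_isr, retest_isrs=None):
    """Classify a sample, averaging duplicate retests when the first run is equivocal.

    Assumption: the averaged retest ISR is classified with the same thresholds.
    """
    status = classify(first_isr)
    if status != "equivocal" or not retest_isrs:
        return status
    return classify(sum(retest_isrs) / len(retest_isrs))

def symptomatic_seropositive(who_symptoms, blood_collected, antibody_status):
    """Coding rule from the text: symptoms, blood collected, and a positive test."""
    return who_symptoms and blood_collected and antibody_status == "positive"

print(final_call(1.0, retest_isrs=[1.2, 1.3]))
```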

I Impact of Masks on Symptoms, Seroprevalence, and Seroconversions
Our primary outcome measures symptomatic seroprevalence: this is the fraction of individuals who are symptomatic during our intervention period and seropositive at endline. Some of these individuals may have antibodies from infections occurring prior to our intervention. If so, the impact of our intervention on symptomatic seroprevalence may understate the impact on symptomatic seroconversions occurring during our intervention (i.e. the fraction of symptomatic infections prevented by masks). In this section, we discuss the relationship between these two quantities.
Let $SC$, the symptomatic seroconversion rate, denote the probability that an individual is SARS-CoV-2 antibody-positive during our intervention and symptomatic. Then the symptomatic seroprevalence is $SS = SC + P_{\text{prior}}$, where $P_{\text{prior}}$ denotes the probability that an individual was infected prior to our intervention and is symptomatic during our intervention for some non-COVID reason.
The change in seroconversions between the treatment and control group is given by $\Delta SC = SC(1) - SC(0)$, where the notation $SC(T_i)$ denotes the potential outcome of seroconversions as a function of treatment status. Our goal is to estimate $\Delta SC / SC(0)$, the percentage change in seroconversions as a result of our intervention.
More generally, if the intervention both alleviates symptoms and reduces infections, then the relative impact on symptomatic seroconversions and symptomatic seroprevalence will depend on whether masks are more effective at preventing COVID-19 or other respiratory diseases (with a larger proportional reduction in symptomatic seroconversions in the former case). The magnitude of the difference between symptomatic seroconversions and symptomatic seropositives will depend on the fraction of symptomatic seropositives which are pre-existing at baseline.
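The relationship between the two proportional effects can be written out as a short worked decomposition. This is a sketch under stated assumptions rather than the authors' own derivation: it treats $P_{\text{prior}}$ as a potential outcome $P_{\text{prior}}(T)$, writes $\Delta P_{\text{prior}} = P_{\text{prior}}(1) - P_{\text{prior}}(0)$ and $\Delta SS = SS(1) - SS(0)$ (symbols introduced here for illustration), and takes $P_{\text{prior}}(0) > 0$ in the last line.

```latex
\begin{align*}
SS(T) &= SC(T) + P_{\text{prior}}(T), \qquad T \in \{0, 1\} \\[4pt]
\frac{\Delta SS}{SS(0)}
  &= \frac{\Delta SC + \Delta P_{\text{prior}}}{SC(0) + P_{\text{prior}}(0)} \\[4pt]
  &= \frac{\Delta SC}{SC(0)} \cdot \frac{SC(0)}{SS(0)}
   + \frac{\Delta P_{\text{prior}}}{P_{\text{prior}}(0)} \cdot \frac{P_{\text{prior}}(0)}{SS(0)}
\end{align*}
```

Read this way, the proportional effect on symptomatic seroprevalence is a weighted average of the proportional effects on the two components, with weights equal to their shares of $SS(0)$. If masks leave $P_{\text{prior}}$ unchanged ($\Delta P_{\text{prior}} = 0$), the effect on seroprevalence is the effect on seroconversions scaled down by $SC(0)/SS(0)$; if the two proportional effects are equal, or if $P_{\text{prior}}(0) = 0$, the two quantities coincide.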

J Behavioral Mechanisms
Our intervention combines multiple distinct elements: we provide people with free masks; we provide information about why mask-wearing is important; we conduct mask promotion in the form of monitors encouraging people to wear masks and stopping non-mask-wearing individuals on roads and public places to remind them about the importance of masks; we partner with local public officials to encourage mask-wearing at mosques and markets; and in some villages, we provide a variety of reminders and commitment devices as well as incentives for village leaders.
In this section, we attempt to decompose which elements were most critical to increase mask use. We first report results from several cross-randomizations, and then we report non-randomized evidence based on changes over time as our intervention details changed between the rounds of piloting, launch of the full project, and thereafter.

J.1 Village-level Cross-randomizations
Results from the same regression specification as our primary analysis, adding indicators for each village-level cross-randomization, are reported in Figure S3 and Table S16. None of the village-level cross-randomizations had any statistically significant impact on mask-wearing behavior beyond our basic intervention package. These null effects are fairly precise (with standard errors ranging from 2.5 to 3.9 percentage points). Text message reminders, incentives for village leaders, and explicit commitment signals explain little of the mask increase we document.
Notes: The figure corresponds to the regressions in Table S16, upper panel, among the full surveillance data. Villages were assigned to the treatment or control arms of one of the following four village-level randomizations: Texts: 0% or 100% of households in a village receive text reminders on the importance of mask-wearing; Incentives: villages received no incentive, a certificate, or a monetary reward for meeting a mask-wearing threshold; Public Signage: all or none of the households in a village are asked to publicly declare that they are a mask-wearing household; Mask Type: villages receive either cloth or surgical masks. For a more detailed description of the village-level cross randomizations, see Section .

J.2 Household-level Cross-randomizations
We analyzed the effects of household-specific randomized treatments (e.g., verbal commitments or not) by regressing the probability of wearing a mask color corresponding to the treatment on indicators for each household-level randomization, as well as controls for color and surgical masks (recall that the mask-color corresponding to treatment varied across villages).
Results of the household-level cross-randomizations are reported in Figure S4 and Table S17.
The coefficients indicate the impact of each cross-randomization relative to the core intervention (identified since some villages had no household randomization other than mask color). Once again, we saw no significant effects of any of the household-level cross-randomizations: compared to self-protection messaging alone, altruistic messaging had no greater impact on mask-wearing, and twice-weekly text messages and a verbal commitment had no significant effects.
We did see an impact of mask color on mask adoption. In villages where surgical masks were distributed, blue surgical masks were 2.7 percentage points more likely than green surgical masks to be observed. In villages where cloth masks were distributed, purple masks were 2.2 percentage points less likely than red masks to be observed.
Notes: Standard errors are in parentheses. *** Significant at the 1 percent level. ** Significant at the 5 percent level. * Significant at the 10 percent level. All regressions include an indicator for each control-intervention pair. The regressions "with baseline control" include controls for the number of people observed in the baseline visit. Baseline symptom rate is defined as the rate of surveyed individuals in a village who report symptoms coinciding with the WHO definition of a probable COVID-19 case.
Table S17.
Notes: Villages were assigned to the treatment or control arms of one of the following four household-level randomizations: Texts: 0%, 50%, or 100% of households in a village receive text reminders on the importance of mask-wearing; Messaging: households receive messaging emphasizing the altruistic or self-protective benefits of mask-wearing; Verbal Commitment: households were asked to verbally commit to mask-wearing; Mask Colors: surgical masks distributed to households were blue or green, and cloth masks distributed to households were purple or red. For a more detailed description of the household-level cross-randomizations, see Section .
Notes: The regression includes a control for the mask type to separate the effect of mask colors.

J.3 Mask Promotion
As noted above, we ran two pilots prior to launching the full project. Both pilots were conducted in Naogaon and Joypurhat districts, but in different unions. While the unions were not selected at random, there was no systematic difference in the selection process between the two pilots. In both cases, unions were selected based on convenience and proximity to existing Greenvoice personnel.
Both pilots included elements 1, 2, 3, and 5 enumerated in Section : masks were distributed at households, markets, and mosques, and there was role-modeling and advocacy by local leaders, including Imams. The second pilot added to these elements explicit mask promotion: mask promoters patrolled public areas a few times a week and asked those not wearing masks to put on a mask. The full intervention also included mask promotion.
The comparison between the two pilots is thus instructive about the impact of active mask promotion. This comparison is shown in Table S10. The difference is striking. The first pilot

K Statistical Analysis
This section describes details of our statistical analyses.

Mask-Wearing
We created a data set with an observation for each village $j$. We defined proper mask use as wearing either a project mask or an alternative face covering that covered the mouth and nose. We considered two definitions of the proportion of observed individuals wearing masks ($p_j$). In our primary specification, we defined $p_j$ using all observed adults. In a secondary specification, we considered adults observed only in locations where there was no simultaneous mask distribution. The purpose of this second specification was to investigate separately whether the intervention increased mask-wearing in places where we did not have promoters on site.
Our goal was to estimate the impact of the intervention on the probability of mask-wearing by regressing $p_j$ on $T_j$ and $x_j$, where $T_j$ is an indicator for whether village $j$ was treated and $x_j$ is a vector of village-level covariates, including the prevalence of baseline mask-wearing in each village (constructed analogously to $p_j$), baseline respiratory symptom rates, and indicators for each pair of villages from our pairwise stratification method.
We estimated this regression at the village level by ordinary least squares, using analytic weights proportional to the number of observed individuals (the denominator of $p_j$) and heteroskedasticity-robust standard errors. In this specification, the dependent variable was $p_j$, the independent variable of interest was $T_j$, and controls were included for the $x_j$ covariates.
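To make the estimation step concrete, the following Python sketch runs a weighted least squares regression of a village-level mask-wearing proportion on a treatment indicator, a baseline covariate, and pair indicators, with a robust sandwich variance. Everything here is an assumption for illustration: the data are simulated, the variable names are hypothetical, and the HC1 small-sample correction is one common choice of heteroskedasticity-robust estimator, which the text does not specify.

```python
import numpy as np


def weighted_ols_robust(y, X, w):
    """Weighted least squares with HC1 heteroskedasticity-robust standard errors."""
    Xw = X * w[:, None]
    bread = np.linalg.inv(X.T @ Xw)        # (X'WX)^{-1}
    beta = bread @ (Xw.T @ y)              # (X'WX)^{-1} X'Wy
    resid = y - X @ beta
    scores = Xw * resid[:, None]           # row j is w_j * e_j * x_j
    meat = scores.T @ scores               # sum of w_j^2 e_j^2 x_j x_j'
    n, k = X.shape
    cov = bread @ meat @ bread * n / (n - k)
    return beta, np.sqrt(np.diag(cov))


# Simulated data (hypothetical values): 100 pairs, one treated village per pair.
rng = np.random.default_rng(0)
n_pairs = 100
pair = np.repeat(np.arange(n_pairs), 2)
T = np.tile([0.0, 1.0], n_pairs)
baseline = rng.uniform(0.05, 0.25, size=2 * n_pairs)       # baseline mask-wearing
n_obs = rng.integers(200, 2000, size=2 * n_pairs).astype(float)
true_effect = 0.29
p = (0.13 + true_effect * T + 0.5 * (baseline - 0.15)
     + rng.normal(0.0, 0.03, size=2 * n_pairs))

# Design matrix: treatment, baseline covariate, and one indicator per pair
# (the pair indicators absorb the constant).
pair_dummies = (pair[:, None] == np.arange(n_pairs)[None, :]).astype(float)
X = np.column_stack([T, baseline, pair_dummies])

beta, se = weighted_ols_robust(p, X, n_obs)
print(f"estimated treatment effect: {beta[0]:.3f} (robust SE {se[0]:.3f})")
```

The weights enter both the point estimate and the variance, so villages with more observed individuals count for more, which is the role analytic weights play in the specification described above.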

Physical Distancing
Using analogous methods, we estimated the impact of the intervention on the probability of physical distancing (being at least one arm's length away from all other people at the time of observation).

K.1 Estimating Effects of Village-level Cross-randomizations
We analyze all four village-level cross-randomizations jointly via a linear regression that adds an indicator $D_k$ for each cross-randomization, where $D_k = 1$ if the village has been assigned to the intervention group of the village-level cross-randomization denoted by letter $k$, and 0 otherwise. This specification is otherwise identical to our estimating equation for the impact of intervention on mask-wearing, with the addition of the $D_k$ terms.

K.2 Estimating Effects of Household-level Cross-randomizations
To evaluate the effect of household-level cross-randomizations, we constructed a regression with an observation for each village in which we ask whether masks of the color representing the treatment were more common than masks of the color representing the control. In each village, we computed $\Delta_j$, the difference in the fraction of individuals wearing treatment mask colors vs. control mask colors. We alternated across villages which color corresponds to intervention, so we can control directly for whether specific colors are more popular (denote these by $d_{jc}$; $d_{jc} = 1$ if treated masks in village $j$ are color $c$). We index the various household randomizations by $m$. Our estimate for each household randomization is $\alpha_{0m}$, from a regression of $\Delta_j$ on the color indicators $d_{jc}$ and $\text{surgical}_j$, a dummy for whether surgical masks were distributed in village $j$. $\alpha_{0m}$ tells us how much more likely individuals are to wear masks of the treated color than masks of the control color. We estimate this regression at the village level by ordinary least squares, using analytic weights proportional to the number of observed individuals (the denominator of $\Delta_j$) and heteroskedasticity-robust standard errors.
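The logic of alternating colors can be illustrated with a small simulation. This is a sketch under assumptions that go beyond the text: it uses a single pair of colors, codes the color control as +1/-1 (so that the constant is the treatment effect averaged over colors), omits the surgical-mask dummy, and uses simulated data with hypothetical parameter values.

```python
import numpy as np

rng = np.random.default_rng(1)
n_villages = 200
n_obs = rng.integers(300, 1500, size=n_villages).astype(float)  # weights

# +1 if treated households in village j received blue masks, -1 if green.
# The +1/-1 coding is an assumption; the source only says color is controlled for.
color = np.where(np.arange(n_villages) % 2 == 0, 1.0, -1.0)

alpha_true = 0.0   # hypothetical effect of the household treatment on Delta_j
gamma_true = 0.02  # hypothetical popularity advantage of blue over green

# Delta_j: share wearing the treated color minus share wearing the control color.
delta = alpha_true + gamma_true * color + rng.normal(0.0, 0.02, n_villages)

# Weighted least squares of Delta_j on a constant and the color control.
X = np.column_stack([np.ones(n_villages), color])
XtWX = X.T @ (X * n_obs[:, None])
XtWy = X.T @ (n_obs * delta)
alpha_hat, gamma_hat = np.linalg.solve(XtWX, XtWy)
print(f"treatment effect: {alpha_hat:.4f}, color preference: {gamma_hat:.4f}")
```

Because the treated color alternates across villages, a general preference for one color loads on the color control rather than on the constant, which is why a color effect can coexist with a null effect of the household treatment.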

L Additional Balance Tests
While our stratification procedure should have achieved balance with respect to variables observed at the time of randomization, given the many possible opportunities for errors in implementation, we nonetheless confirm in this Appendix L that our control and treatment villages resemble each other at baseline with respect to key variables of interest. This assessment was not preregistered.
We find that the control and treatment groups are balanced with respect to our primary outcomes of interest: mask-wearing, symptoms and symptomatic seropositivity. In this Appendix we investigate several other covariates and find a few small imbalances, and conduct robustness checks.
In Table S4 we present balance test results for our mask-wearing specification (at the village level). In our main specification, this is a regression of mask-wearing on a constant, an intervention indicator, and indicators for each control-intervention pair, with analytic weights proportional to the number of adults recorded in the baseline household survey and heteroskedasticity-robust standard errors. For the balance tests, we replace the dependent variable with several variables measured at baseline: symptomatic seroprevalence, WHO-defined COVID-19 symptoms, and baseline mask-wearing rate. We find that all of these variables appear balanced.
In Table S18, we report results from analogous balance tests based on the specification used for our primary biological outcome. We replace the dependent variable (symptomatic seroprevalence) with baseline covariates of interest to assess balance. We also report a bottom-line F-test which again fails to reject balance.
Notes: The baseline rate of mask-wearing was measured through observation over a 1-week period, defined as the rate of those observed who wear a mask or face covering that covers the nose and mouth. The analysis includes all people surveyed in the baseline household visits, excluding individuals for whom we did not collect midline or endline symptoms, symptomatic individuals from whom we did not collect blood, and individuals whose blood we drew but did not test.
We also ran balance tests with respect to several other covariates and detected a few balance failures. Although these are small in magnitude, we investigate them further in order to understand the severity of the underlying problem. We believe the imbalances with respect to age and household size likely arose because households in the treatment group were more likely to report teenagers as being over 18 in order to receive additional masks. We believe the imbalance with respect to the number of households likely occurred for a similar reason, with implementers in the treatment group including more "borderline" households as part of the village in order to distribute masks to those households.
To check for these mechanisms, we drop from the sample individuals under 30 and villages with over 350 households; the latter restriction only very coarsely targets "extra" households that lie on the border of villages. After imposing these restrictions, we find in Table S20 that the imbalances with respect to age and household size disappear entirely (this also occurs with the age restriction alone), and the imbalance with respect to household count shrinks by 25% but remains significant.
In Table S21, we repeat our primary specification in this restricted sample with better balance and find that our results are qualitatively unchanged.
Notes: All regressions include an indicator for each control-intervention pair. The regression in the top panel includes controls for baseline rates of mask wearing, baseline symptom rates, number of households in a village, and sex. The regression in the bottom panel controls for baseline rates of mask wearing and baseline symptom rates. §We report the mean rate of symptomatic seroprevalence at endline. This is not equivalent to the coefficient on the constant due to the inclusion of the pair indicators as controls. The analysis includes all people surveyed in the baseline household visits, excluding individuals for whom we did not collect midline or endline symptoms, symptomatic individuals from whom we did not collect blood, and individuals whose blood we drew but did not test. The bottom panel sample excludes an additional 107,111 individuals up to the age of 30 and 9 villages that have more than 350 households.

M Persistence of Mask-Wearing Behavior
In Table S22, we report estimates of our primary specification separately by week of surveillance.
Week 10 is especially interesting, as it was two weeks after intervention activities ceased. This analysis was not preregistered.
We find no evidence that the impact of the intervention attenuates over the 10 weeks. In the 414 villages for which we have 10 weeks of surveillance, the point estimates are slightly smaller in week 10 (a 23.3 percentage point increase) than week 1 (30.4 percentage points), although this difference is not statistically significant. This is consistent with social norms around mask-wearing taking hold, where adoption by some in the community has a demonstration effect that encourages subsequent adoption by others. If mask-wearing was driven by a "novelty factor" associated with our mask promotion campaign, we would have instead expected some attenuation over the course of the 8 weeks of intervention. The point estimates of the impact of intervention by week for the panel of 414 villages for which we have data in all weeks are plotted in Figure S5.
We additionally conducted a follow-up surveillance 5 months after the start of the intervention
Notes: All regressions include an indicator for each control-intervention pair, baseline rates of mask-wearing and baseline symptom rates. Baseline symptom rate is defined as the rate of surveyed individuals in a village who report symptoms coinciding with the WHO definition of a probable COVID-19 case.

N Imputed Symptomatic Seroprevalence
In this section, we analyze the results of our intervention by wave, as well as assessing the sensitivity of our analysis to alternative methods of imputing missing values. These analyses turn out to be related, as there is one wave with imbalanced consent rates, meaning that dropping non-consenters distorts the treatment and control comparison.
In Table S23, we analyze the impact of our intervention on mask-wearing and physical distancing by wave. In all waves, mask-wearing increased by between 24.2 and 35.7 percentage points, and physical distancing increased by between 4.2 and 7.7 percentage points.
When we analyze our second stage results by wave, reported in the top panel of Table S24, we find that most waves have comparable effect sizes with two exceptions: in wave 7 we find an especially large impact of masks on symptomatic seropositivity, and in wave 2 we find an opposite-signed impact on symptomatic seropositivity.
Probing further, we believe the opposite-signed result in wave 2 is due to imbalanced consent rates in that wave. In Table S25, we show consent rates for control and treatment groups by wave.
Since we drop individuals who do not consent in our primary specification, lower consent rates appear as lower rates of symptomatic seropositivity. In Table S1, we found that consent rates were balanced across treatment and control groups, as well as by age and gender. In Table S25, we find that consent rates are generally comparable across waves, although they appear notably lower in the control group of wave 2 relative to the treatment group. Consent rates are somewhat lower in the control group of wave 5 and the treatment group of wave 7, although differences are not as stark.
To check whether our results are driven by differential consent, we consider an alternative method of dealing with missing data. Instead of dropping symptomatic individuals who did not consent to blood collection, we impute for those individuals the mean (conditional) seropositivity observed among all individuals in the data.
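The difference between the two ways of handling missing blood samples can be sketched in Python as follows. The record layout, function names, and the toy counts are hypothetical; the sketch assumes a single pooled conditional seropositivity rate computed over all symptomatic individuals who were tested.

```python
def conditional_seropositivity(people):
    """Share seropositive among symptomatic individuals with a test result."""
    tested = [p for p in people if p["symptomatic"] and p["seropositive"] is not None]
    return sum(p["seropositive"] for p in tested) / len(tested)


def rate_dropping_nonconsenters(people):
    """Symptomatic seroprevalence after dropping symptomatic people with no test."""
    kept = [p for p in people if not (p["symptomatic"] and p["seropositive"] is None)]
    positives = sum(1 for p in kept if p["symptomatic"] and p["seropositive"])
    return positives / len(kept)


def rate_with_imputation(people):
    """Symptomatic seroprevalence, imputing the pooled conditional rate
    for symptomatic people with no test result."""
    cond = conditional_seropositivity(people)
    total = 0.0
    for p in people:
        if p["symptomatic"]:
            total += cond if p["seropositive"] is None else float(p["seropositive"])
    return total / len(people)


# Toy sample: 1,000 people, 100 symptomatic, 40 of whom were tested (10 positive).
people = (
    [{"symptomatic": False, "seropositive": None}] * 900
    + [{"symptomatic": True, "seropositive": True}] * 10
    + [{"symptomatic": True, "seropositive": False}] * 30
    + [{"symptomatic": True, "seropositive": None}] * 60
)

print(rate_dropping_nonconsenters(people))  # 10 positives out of 940 kept
print(rate_with_imputation(people))         # (10 + 60 * 0.25) out of 1,000
```

In this toy sample the measured rate rises when non-consenters are imputed rather than dropped, because each untested symptomatic person now contributes the conditional rate instead of leaving the sample; a group with a lower consent rate therefore no longer looks mechanically healthier.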
These results are shown in the bottom panel of Table S24. Several points are worth noting.
First, the point estimate for the main effect of masks on seropositivity becomes substantially larger.
There is a mechanical effect due to the fact that rates of symptomatic seropositivity in the data now increase by a factor of 2.5 since we previously dropped the 60% of symptomatic individuals who did not consent to blood samples. Scaling our original estimate by this factor would give an effect size of -0.0018. The effect size with seropositivity imputed is slightly larger, at -0.0022, and much more precisely estimated than the main effect in our data. Additionally, with this imputation method, the anomalous Wave 2 result disappears. In Table 4 in the main text, we further disaggregate the results by cloth and surgical masks.
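The mechanical scaling described above can be checked with a line of arithmetic. The symbol $\hat{\beta}_{\text{orig}}$ for the original estimate is introduced here for illustration, and its value below is only what the two quoted figures imply, not a number quoted in this section.

```latex
\[
\frac{1}{1 - 0.60} = 2.5, \qquad
2.5 \times \hat{\beta}_{\text{orig}} = -0.0018
\;\Longrightarrow\;
\hat{\beta}_{\text{orig}} \approx -0.0007, \qquad
\frac{-0.0022}{-0.0018} \approx 1.22
\]
```

So the imputed estimate of -0.0022 is roughly 22% larger in magnitude than pure rescaling of the original estimate would produce.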
Two points are notable in these results. First, when we drop non-consenters (in the original specification), cloth masks appear to impact symptomatic seropositivity only in the 40-50 age group (and have an insignificant effect pooling all ages). However, if we instead impute seropositivity at the average value for non-consenters, cloth masks appear most effective at older ages, and about as effective as surgical masks.
Tables S28 and S6 report analogous results with respect to gender for symptoms and symptomatic seropositivity. We see a similar pattern to the age results: effects are similar for both genders for symptoms and symptomatic seropositivity when we impute seropositivity at the average value for non-consenters. If we instead drop non-consenters, the symptomatic seropositivity estimates for men become less precise and are no longer significantly different from zero.
Notes: Standard errors are in parentheses; where confidence intervals are reported instead, they are in brackets. All regressions include an indicator for each control-intervention pair. The regressions (in some tables, those labeled "with baseline controls") include controls for baseline rates of proper mask-wearing and baseline symptom rates. Baseline Symptom Rate is defined as the rate of surveyed individuals in a village who report symptoms coinciding with the WHO definition of a probable COVID-19 case. We assume that (1) all reported symptoms were acute onset, (2) all people live or work in an area with high risk of transmission of virus and (3) all people have been a contact of a probable or confirmed case of COVID-19 or are linked to a COVID-19 cluster. §We report the mean rate of symptomatic seroprevalence at endline. This is not equivalent to the coefficient on the constant due to the inclusion of the pair indicators as controls.
The analysis in the top panel utilizes the pre-registered sample, equivalent to Table S7 (or, in one table, Table 2); it includes all people surveyed in the baseline household visits, excluding individuals for whom we did not collect midline or endline symptoms, symptomatic individuals from whom we did not collect blood, and individuals whose blood we drew but did not test. The analysis in the bottom panel replicates the regression in the top panel, but imputes the seropositivity of individuals for whom we did not draw blood. For symptomatic individuals we did not draw blood from, we simulate their symptomatic-seroprevalence status by using the average rate of conditional seropositivity among all symptomatic individuals. This analysis includes all people surveyed in the baseline household visits, excluding individuals for whom we did not collect midline or endline symptoms.
"Rate of COVID symptoms" reports the proportion of individuals that report WHO-defined COVID symptoms in the midline or endline surveys. "Blood Draw Consent Rate" reports the proportion of individuals that consented to a blood draw in the endline, conditional on being symptomatic.

O Variation of Effects
In this Appendix, we investigate how WHO-Defined COVID symptoms and symptomatic seropositivity relate cross-sectionally to changes in mask-wearing and changes in physical distancing relative to baseline. This comparison should be interpreted with caution, since the observational variation across villages in mask-wearing and measured physical distancing is not random. For example, within the treatment or control group, some villages might have more mask-wearing precisely because people were observed with COVID-19 symptoms. Were this the case, even if masks reduced COVID-19, we might see a positive relationship between mask-wearing and biological outcomes; a similar bias could be present for physical distancing.
With these caveats in mind, Figure S6 shows the relationship between each biological outcome variable and the changes in mask-wearing and physical distancing graphically. Table S29 shows coefficients from a regression of each outcome on the respective change, controlling for the same covariates as our baseline regression, except for pair fixed effects (omitting these effects is necessary if we want to study cross-sectional variation across villages, rather than only pairwise comparisons). We report these results for each covariate separately, as well as both together (note that the latter specification makes sense as a causal model only if mask-wearing does not directly cause physical distancing).
We find clear evidence of a negative relationship between mask-wearing and both symptoms and symptomatic seropositivity. Once we control for mask-wearing, we see no significant relationship between physical distancing and symptomatic seropositivity. The standard deviation of the change in mask-wearing across villages is also considerably larger than that of the change in physical distancing, at 0.21 vs. 0.13 respectively, so even were the coefficients the same, the change in mask-wearing in the causal interpretation would account for more of the variation in outcomes.
Notes: All regressions include controls for baseline rates of mask-wearing and baseline symptom rates. Baseline Symptom Rate is defined as the rate of surveyed individuals in a village who report symptoms coinciding with the WHO definition of a probable COVID-19 case. The analysis in the bottom panel includes all people surveyed in the baseline household visits, excluding individuals for whom we did not collect midline or endline symptoms, symptomatic individuals from whom we did not collect blood, and individuals whose blood we drew but did not test.

P Additional Preregistered Specifications
In this section, we discuss additional preregistered specifications not reported in the text. For reference, our pre-analysis plan is available at: https://osf.io/vzdh6/. Our initial intention had been to collect blood from a single high-risk individual within each household at endline. When we failed to collect as many baseline bloodspots as hoped, we decided to test all symptomatic individuals at endline rather than a single high-risk individual in each household. The only data observed at the time of this decision was the count of total baseline bloodspots collected.
We had initially planned to conduct only telephone surveys at Weeks 5 and 9. Near the start of our Week 9 activities, we switched to in-person household surveys for Week 9 in order to increase the survey response rate. At the time this decision was made, analysts were still blinded to treatment and control villages. In total, we surveyed 104,063 households in Week 5 (all phone surveys) and 118,018 households in Week 9. Of the Week 9 surveys, 102,871 were in-person household surveys.
Our pre-registration document suggests that we can compute the impact of our intervention on seroconversions by comparing our effect size to the difference between endline and baseline seropositives among individuals symptomatic during our intervention. As the analysis in Appendix I makes clear, this is not quite correct. If P_prior, the fraction of symptomatic seropositives due to infections prior to baseline, is zero, then the estimated impact on symptomatic seropositives equals the impact on symptomatic seroconversions and no further adjustment is needed. More generally, the impact on symptomatic seropositives incorporates both seroconversions and reductions in symptomatic seroconversions due to non-COVID respiratory diseases. We cannot determine the impact on seroconversions without knowing both P_prior(0) and the relative impact of masks on COVID-19 and non-COVID respiratory diseases. If the latter two quantities are equal in proportion, the impact on symptomatic seropositives again equals the impact on symptomatic seroconversions with no further adjustment needed.
Given that we find no evidence of an impact of any of the cross-randomizations, we did not estimate the specification flexibly interacting them.
We did not proceed with the "individual intervention" described in the pre-registration document, which was designed to test the protective benefits of masks to the wearer, because we were unable to entice a sufficient number of markets and vendors to participate in that trial and switch mask-wearing behavior.
We did not collect the intended pharmacy data to use as an auxiliary outcome, and we did not collect follow-up hospitalization and mortality data due to the expense of revisiting households.
We also do not yet have data on distance to nearby city or estimated average village-wealth.
In Table S30, we report our pre-specified instrumental variable regressions. If we assume that the entire impact of our intervention is via proper mask-wearing, then we estimate that going from zero percent to one hundred percent of villagers wearing masks would change symptomatic seroprevalence by -0.0024, a 32% reduction. Essentially, this specification scales our "intent-to-treat" estimates by a factor of 3.33, the reciprocal of the first stage.
Notes: Standard errors are in parentheses. *** Significant at the 1 percent level. ** Significant at the 5 percent level. * Significant at the 10 percent level. All regressions include an indicator for each control-intervention pair. The regressions "with baseline controls" include controls for baseline rates of proper mask-wearing and baseline symptom rates. Baseline Symptom Rate is defined as the rate of surveyed individuals in a village who report symptoms coinciding with the WHO definition of a probable COVID-19 case. We assume that (1) all reported symptoms were acute onset, [...]
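The scaling behind the instrumental variable estimate described for Table S30 can be illustrated with a short sketch. It uses only the three figures quoted in the text (-0.0024, 3.33, and 32%); the first stage, the intent-to-treat estimate, and the baseline prevalence below are back-calculated from those figures as an illustration and are not values read from Table S30. The function name `wald_iv` is our own label for the ratio form of the estimator.

```python
# Figures quoted in the text.
iv_estimate = -0.0024       # change in symptomatic seroprevalence, 0% -> 100% masking
scaling_factor = 3.33       # reciprocal of the first stage
relative_reduction = 0.32   # the IV estimate expressed as a relative reduction

# Quantities implied by those figures (back-calculated, not quoted from Table S30).
first_stage = 1 / scaling_factor                      # about 0.30
itt_estimate = iv_estimate / scaling_factor           # about -0.00072
implied_baseline = -iv_estimate / relative_reduction  # about 0.0075


def wald_iv(itt: float, first_stage: float) -> float:
    """Ratio (Wald) form of the IV estimate: reduced form over first stage."""
    return itt / first_stage


print(f"implied first stage: {first_stage:.3f}")
print(f"implied ITT estimate: {itt_estimate:.5f}")
print(f"IV estimate recovered: {wald_iv(itt_estimate, first_stage):.4f}")
print(f"implied baseline prevalence: {implied_baseline:.4f}")
```

The sketch only reproduces the arithmetic of rescaling; it does not reproduce the regression itself, which also includes pair indicators and, in some specifications, baseline controls.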

Q Intervention Cost and Benefit Estimates
The average person-day of staff time in our intervention cost $20 of wages plus $0.50 of communication costs. All management salaries, benefits, support, internal monitoring, and equipment cost $71,696. We exclude these from the calculation below, as they will vary from setting to setting. As reported in the main text, we estimate that we induced 51,660 people to regularly wear masks, or 173 people per intervention village.
[...] attributable to COVID-19. We projected the impact of the surgical mask intervention on deaths over the four months following one month of intervention. We calculated the absolute risk reduction as the difference in the death rate over the intervening period with and without the surgical mask intervention. We applied a 35% reduction in deaths among those aged 60 and older and a 23% reduction in deaths among those aged 50-60, based on the study findings and age-adjusted COVID-19 mortality rates for Bangladesh (81). We assumed no change in deaths for those under age 50.
We determined the number needed to treat by taking the inverse of the absolute risk reduction.
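The calculation described above can be sketched as follows. The age-specific reductions (35%, 23%, and 0%) are those stated in the text; the baseline death counts and the population size are purely hypothetical placeholders chosen to make the arithmetic visible, and they are not the inputs used for Table S31.

```python
# Reductions in deaths by age group, as stated in the text.
REDUCTION_BY_AGE = {"under_50": 0.00, "50_to_60": 0.23, "60_plus": 0.35}

# HYPOTHETICAL inputs for illustration only (not from the study).
population = 100_000
baseline_deaths = {"under_50": 10.0, "50_to_60": 20.0, "60_plus": 40.0}


def deaths_with_intervention(deaths: dict, reductions: dict) -> float:
    """Projected deaths after applying each age group's relative reduction."""
    return sum(d * (1 - reductions[age]) for age, d in deaths.items())


def absolute_risk_reduction(without: float, with_: float, pop: int) -> float:
    """Difference in death rates without and with the intervention."""
    return (without - with_) / pop


def number_needed_to_treat(arr: float) -> float:
    """Inverse of the absolute risk reduction."""
    if arr <= 0:
        raise ValueError("absolute risk reduction must be positive")
    return 1 / arr


deaths_without = sum(baseline_deaths.values())
deaths_with = deaths_with_intervention(baseline_deaths, REDUCTION_BY_AGE)
arr = absolute_risk_reduction(deaths_without, deaths_with, population)
nnt = number_needed_to_treat(arr)

print(f"deaths averted: {deaths_without - deaths_with:.1f}")
print(f"absolute risk reduction: {arr:.6f}")
print(f"number needed to treat: {nnt:.0f}")
```

With different baseline mortality assumptions the number needed to treat changes proportionally, which is one reason a range of estimates rather than a single figure is natural for this kind of projection.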
As shown in [...] Many cost elements can be brought down further through at-scale implementation. This is because some of our information campaigns and promotion activities had to be individualized for the purposes of conducting a trial with a control group, whereas at scale the government could use mass-media and social-media dissemination strategies more cost-effectively. Additionally, surgical masks are about 8 times cheaper than cloth masks, and factory production costs can be brought down at scale. Based on our current at-scale activities, we calculate that conducting the intervention for one month for the entire country of Bangladesh would cost US$1.50 per person.
Following the effects out for four months after one month of intervention, this translates to substantially lower costs per life saved: $10,022-$52,502 (Table S31).
For context, (51) estimate that the value of a statistical life is $205,000 in Bangladesh, implying that our intervention at scale is 4-20 times more cost-effective than what the typical Bangladeshi would be willing to pay to reduce mortality risk, and therefore a "very good buy" for policymakers.
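The "4-20 times" range follows directly from dividing the value of a statistical life by the two ends of the at-scale cost-per-life-saved range. A minimal check, using only the figures quoted in the text (the variable names are our own):

```python
# Figures quoted in the text.
value_of_statistical_life = 205_000   # USD, Bangladesh, from reference (51)
cost_per_life_saved_low = 10_022      # USD, lower end of the at-scale range
cost_per_life_saved_high = 52_502     # USD, upper end of the at-scale range

# Ratio of willingness to pay to cost, at each end of the cost range.
ratio_at_high_cost = value_of_statistical_life / cost_per_life_saved_high
ratio_at_low_cost = value_of_statistical_life / cost_per_life_saved_low

print(f"ratio at the highest cost estimate: {ratio_at_high_cost:.1f}")
print(f"ratio at the lowest cost estimate: {ratio_at_low_cost:.1f}")
```

The two ratios are roughly 3.9 and 20.5, which round to the 4-20 range reported above.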
This cost-effectiveness analysis was not pre-specified.

R.1 Polling and Policy-Maker Priors
To assess how our findings compared to the priors of relevant policy makers, we polled participants during presentations to the World Health Organization, the World Bank, and the National Council of Applied Economic Research in Delhi, India. In total, more than 100 audience members with expertise and specific interest in public health and mask-wearing were surveyed and asked to make predictions about the impact of our various interventions on mask-wearing and physical distancing, just before we showed them our empirical results (at the time, our biological outcomes were unavailable).
There are three main takeaways from this polling exercise. First, only a small fraction of policy-makers correctly predicted the impact of our core intervention on mask-wearing and physical distancing. Second, policy-maker predictions varied widely for the effects of the intervention on both mask-wearing and physical distancing. Third, policy-makers systematically underestimated the overall impact of our intervention, and especially the impact of in-person reinforcement on mask-wearing.
When asked if they thought the intervention would increase mask-wearing by 5, 10, 20, 30, or 40 percentage points, only 21% of respondents correctly predicted that the intervention increased mask-wearing by 30 percentage points (about what we would expect if they guessed randomly).
The expected value of the predicted increase in mask-wearing was 22 percentage points whether we described the intervention with or without mask promotion included. The difference in mask-wearing observed in our two pilot studies suggests that in-person reinforcement increased mask-wearing by 18 percentage points. In other words, policy-makers believed that in-person reinforcement would have no additional impact, despite our piloting suggesting it is the single most important element of our intervention. With regard to behavioral adjustments, 64% of respondents predicted that physical distancing would either decrease or remain unchanged as a result of the mask-promotion interventions, when in fact it increased.
Policy-makers consistently believed that our cross-randomizations would increase mask-wearing, when in fact, we find that none of them had a significant effect (often with fairly precise zeros).
68% of respondents believed that text messages would help (they didn't), 62% of respondents believed that incentives for village leaders would help (they didn't), and 77% of respondents believed that verbal commitments or commitments made using signs on one's door would increase mask-wearing (they didn't). More results from this polling exercise are presented in the tables below.
Notes: The results in each table were collected from audience participants during live presentations to the World Health Organization (WHO), the National Council of Applied Economic Research (NCAER) in Delhi, and the World Bank. The polls were taken in response to the following prompts.
"We provided free masks to all households and promoted mask-wearing in mosques and markets with community leaders and imams. What do you think happened to mask-wearing relative to the 13% proper mask usage rate in the control villages without any interventions?"
"In addition to the mask distribution and promotion activities described previously, we had mask promoters periodically monitor passers-by and remind them to wear masks. What do you think happened to mask-wearing relative to the 13% proper mask usage rate in the control villages without any interventions?"
"How did mask distribution and promotion affect individuals' physical distancing?"
"We promised the village and leaders an incentive payment if we saw increases in mask-wearing. Do you think this increased mask-wearing further?"
"We had households verbally committing to wear masks and putting up signs to display to others that they were a mask-wearing household. Do you think this increased mask-wearing further?"