Involving men and boys in family planning: A systematic review of the effective components and characteristics of complex interventions in low‐ and middle‐income countries

Abstract

Background: Involving men and boys as both users and supporters of Family Planning (FP) is now considered essential for optimising maternal and child health outcomes. Evidence on how to engage men and boys to meet FP needs is therefore important.

Objectives: The main objective of this review was to assess the strength of evidence in the area and uncover the effective components and critical process- and system-level characteristics of successful interventions.

Search Methods: We searched nine electronic databases, seven grey literature databases, organisational websites, and the reference lists of systematic reviews relating to FP. To identify process evaluations and qualitative papers associated with the included experimental studies, we used Connected Papers and hand searches of reference lists.

Selection Criteria: Experimental and quasi-experimental studies of behavioural and service-level interventions involving males aged 10 years or over in low- and middle-income countries to increase uptake of FP methods were included in this review.

Data Collection and Analysis: The methodology was a causal chain analysis involving the development and testing of a logic model of intervention components based on stakeholder consultation and prior research. Qualitative and quantitative data relating to the evaluation studies and interventions were extracted based on the principles of 'effectiveness-plus' reviews. Quantitative analysis was undertaken using R with robust variance estimation (RVE), meta-analysis and meta-regression. Qualitative analysis involved 'best fit' framework synthesis.

Results: We identified 8885 potentially relevant records and included 127 in the review. Fifty-nine (46%) of these were randomised trials; the remainder were quasi-experimental studies with a comparison group. Fifty-four percent of the included studies were assessed as having a high risk of bias. A meta-analysis of 72 studies (k = 265) showed that the included group of interventions had statistically significantly higher odds of improving contraceptive use when compared to comparison groups (odds ratio = 1.38, confidence interval = 1.21 to 1.57, prediction interval = 0.36 to 5.31, p < 0.0001), but there were substantial variations in the effect sizes of the studies (Q = 40,647, df = 264, p < 0.0001; I² = 98%) and 73% of this variation was within cluster/study. Multi-variate meta-regression revealed several significant intervention delivery characteristics that moderate contraceptive use. These included community-based educational FP interventions, interventions delivered to women as well as men, and interventions delivered by trained facilitators, professionals, or peers in community, home and community, or school settings. None of the eight identified intervention components or 33 combinations of components were significant moderators of effects on contraceptive use. Qualitative analysis highlighted some of the barriers and facilitators of effective models of FP that should be considered in future practice and research.

Authors' Conclusions: FP interventions that involve men and boys alongside women and girls are effective in improving uptake and use of contraceptives. The evidence suggests that policy should continue to promote the involvement of men and boys in FP in ways that also promote gender equality. Recommendations for research include the need for evaluations during conflict and disease outbreaks, and evaluation of gender transformative interventions which engage men and boys as contraceptive users and supporters in helping to achieve desired family size, fertility promotion, safe conception, as well as promoting equitable family planning decision-making for women and girls.

Engaging men and boys in enhancing gender equality for women and girls as part of family planning programming was highlighted as a key strategy, but this remains an under-used strategy.

| What is the review about?
This systematic review of intervention evaluation studies is about how to enhance future programming with men and boys to meet needs for family planning for women and men in low- and middle-income countries (LMICs).
Addressing unmet needs for family planning is a major challenge in LMICs. Addressing male involvement in family planning is also a challenge, as it is in these countries that men's control over family planning decision-making for women and girls is known to be greatest.
It is important to involve men and boys in ways that support women's and girls' choices, as well as men's own family planning needs.
We used a novel method called causal chain analysis to focus on the content of interventions that may work better than others. This involved developing a picture of important programming components with stakeholders and testing how these components affect the impact of different interventions on family planning outcomes.

What is the aim of this systematic review?
This review assesses the strength of evidence of involving men and boys as users and supporters of family planning.
The review also aims to uncover the effective components and critical process- and system-level characteristics of successful interventions.

1.3 | What studies are included?
We included 127 papers which examined the effectiveness of interventions that included men and/or boys in LMICs as programme participants using experimental or quasi-experimental methods.
We also included 23 qualitative studies and process evaluations which reported why and how some programmes might have been effective.
The studies were conducted worldwide in LMICs, over half in Africa. A third of the studies were conducted on programmes that made a special effort to engage males. Less than a quarter of the studies addressed gender inequality as part of the programme.
1.4 | What are the main findings of this review?
When considered together, the interventions included in this review were effective in increasing contraceptive use. The most effective interventions are community-based educational programmes offered in schools, communities and homes or community facilities, and interventions involving multiple components, delivered by professionals, trained facilitators or peers to both males and females for over seven months.
Brief programmes of less than three months are also effective.
Added to this, related implementation studies identified the importance of promoting gender-equitable attitudes and social norms for women and girls among men and women at the individual, wider family, community, health service and societal level as part of family planning programming.
Some studies also emphasised structural factors such as the importance of widening women's access to education and labour markets.

1.5 | What do these findings mean?
A wide range of family planning interventions which involve men and boys in LMICs have shown efficacy in increasing contraceptive use.
The success of family planning programmes that involve men and boys is most often measured by contraceptive use, to the relative neglect of other outcomes such as met need for family planning, equitable family planning decision-making, or gender equality. Our analysis indicates some promising intervention characteristics, which are more effective in promoting contraceptive use than other characteristics.
Our qualitative analysis also highlights the under-used strategy of addressing gender equality attitudes and norms, from the individual to the structural level.
The findings of this review will be of interest to programme designers wanting to increase male engagement in family planning in gender-equitable ways. The review can also help in measuring programme efficacy beyond contraceptive use, to also include gender equality and met family planning needs.
1.6 | How up to date is this review?
The review authors searched for experimental evaluations in August 2020 and 'connected' process evaluations and qualitative studies in June 2021.

| The problem
The World Health Organisation estimates that there are approximately 300,000 deaths per year, or 800 every day, among women and girls during childbirth or arising from pregnancy-related complications, including unsafe abortion. Almost all (94%) of these preventable female deaths occur in low- and middle-income countries (LMICs) (World Health Organisation & Press, 2019). The problem is especially acute among adolescent girls. Complications during pregnancy and childbirth are the leading cause of death for 15-19-year-old girls globally, with the vast majority of these occurring in LMICs (World Health Organization, 2020). Unintended and mistimed pregnancies also contribute to the burden of high infant morbidity and mortality (Kozuki et al., 2013; Say et al., 2014; A. Singh et al., 2013). Around 2.7 million new-borns die every year in LMICs and many more suffer from diseases relating to preterm birth, being small for gestational age or malnutrition (Guttmacher, 2017).
The importance of sexual and reproductive health and rights (SRHR) as the bedrock to maternal and child health, economic growth, and the wellbeing of humanity was recognised 25 years ago in the international agreement of the International Conference on Population and Development (Starrs et al., 2018). As part of the contemporary global agenda to attain the sustainable development goals (SDGs), SRHR constitutes two targets (3.7 and 5.6), interlinking the SDGs of health and gender equality (United Nations & UN General Assembly, 2015). Family planning (FP) is a central tenet of SRHR, enabling people to avoid unintended pregnancy, attain their desired number of children, and/or determine the spacing of pregnancies. Effective FP is achieved through the use of contraceptive methods, provision of safe abortion, and prevention and treatment of infertility. Worldwide, however, more than 200 million women have an unmet need for family planning (wanting to avoid pregnancy but not using modern contraception), and each year 25 million unsafe abortions take place (Starrs et al., 2018).
Involving men and boys in FP is increasingly recognised as essential to addressing unmet FP needs and in turn transforming maternal and child health outcomes (Croce-Galis et al., 2014; Hardee et al., 2017; Lohan et al., 2022; Phiri et al., 2015a; Sahay et al., 2021), with programmes that adopt a focus on transforming gender inequalities for women and girls showing particular promise (Barker et al., 2007; Phiri et al., 2015b; Ruane-McAteer et al., 2020).
The underpinning rationale for involving men in FP recognises that, in many countries, men are the primary decision-makers on family size and may control or inhibit women's use of FP, and acknowledges that men themselves may have unmet needs in relation to FP (Nzioka & Press, 2002). In practice, 'involving' men and boys in FP can range from encouraging men to be supporters of autonomous FP decision-making among women and girls, to more inclusive conceptualisations of men and boys as both supporters and users of contraceptive methods, leading change in relation to addressing unmet FP needs in their families and communities as well as meeting their own reproductive health needs (Hardee et al., 2017; Lohan, 2015; Sahay et al., 2021).
International policy debates on SRHR, and FP specifically, have therefore moved beyond the polemic of whether to involve men and boys towards the important question of how to involve men and boys (Ruane-McAteer et al., 2020). The how question relates to how to involve men and boys in LMICs in ways that challenge patriarchal control over women and girls' use of FP and how to involve men as users and co-users of FP. The question is further to address what characteristics or components of FP interventions allow men to engage with FP alongside women in ways which enhance health and gender equality for all.

| The intervention
The review reported here included behavioural and service-level interventions aiming to improve the uptake of FP and involve men or boys in LMICs as intervention recipients. Eligible interventions included those that aimed to increase the uptake of FP (male and/or female contraception; safe abortion and safe post-abortion care) in order to ensure decreased unmet need for FP; avoidance of unintended or unwanted pregnancies; birth spacing (i.e., choice in relation to time period between pregnancies); and/or birth limiting (i.e., choice in relation to limiting family size). The review focuses on 'complex' interventions. While some interventions, such as those with only one component, may be considered 'simple', following UK Medical Research Council guidelines (Craig et al., 2008) we recognise that even interventions with one component may be considered complex when they target a number of different behaviours or a variety of outcomes, or may affect behaviours via a number of different pathways.
While FP methods also include medical, surgical, and behavioural (lifestyle) interventions for addressing infertility, we did not examine these in the current review. The majority of fertility-focused interventions are medical or surgical in nature (Ruane-McAteer et al., 2019), and those that target behavioural determinants are generally focused on lifestyle changes such as reducing smoking and obesity and increasing exercise (Lan et al., 2017). In consultation with our study's Expert Advisory Group (https://www.qub.ac.uk/sites/involve-fp/ExpertAdvisoryGroup/), we agreed that because the theoretical basis, components, and characteristics of such interventions differ greatly from those aiming to prevent unintended pregnancy, they were outside the scope of the current study. We also agreed that a study addressing infertility alongside any of the other FP outcomes would be eligible for inclusion, but no such studies were identified.
Eligible interventions include those delivered in education, health or community settings aiming to increase capability (knowledge, skills), opportunity (access, social support) and motivation (attitudes, norms) to use FP methods via mass, small or social media information; face-to-face communication; health service enhancements; monetary and other incentives; and access to FP methods. The intervention approaches were grouped under the following categories:
• Theoretical approach (e.g., behaviour change theory; gender theory);
• Approach to intervention design (e.g., co-design or co-production);
• Materials & procedures (including approach to engaging men and type of contraceptive method);
• Who provides (e.g., health or education professionals, peers, trained facilitators);
• Who receives (e.g., adolescents/youth/adults; males only; males and females);
• Modes of delivery (e.g., face-to-face, online; individuals/couples/community);
• Delivery setting (e.g., home, community, educational);
• Dose and intensity (how much, how often, how long); and
• Tailoring, modifications, adherence or fidelity.
Interventions that vary on whether and how they address unequal gender norms in FP were also included. The modification of gender norms can be categorised on a continuum from 'gender-unequal/neutral' approaches, which reinforce or ignore unequal norms, roles and relations, thereby perpetuating gender-based discrimination; to 'gender-sensitive/specific' approaches, which do consider gender norms, roles and relations and/or men and women's specific needs or roles but do not seek to change gender inequalities; to 'gender transformative' approaches, which are inclusive of gender-sensitive and gender-specific strategies, but also challenge gender inequalities by transforming harmful gender norms, roles and relations through programmatic strategies that foster progressive changes in power relationships between women and men (Interagency Gender Working Group, 2017; World Health Organisation, 2011).

| How the interventions might work
This review draws upon a Causal Chain Analysis (CCA) (Kneale et al., 2015), the first step of which is to use a logic model to encapsulate how an intervention might work. The logic model is used to frame data extraction and subsequent analysis of intervention characteristics and outcomes presented (see Section 5.3). This approach addresses a common criticism of systematic reviews and meta-analyses on the need to go beyond effectiveness analyses towards a more nuanced identification of the active ingredients of effective interventions (Pawson et al., 2005), testing of causal pathways, and identification of system- and process-level barriers and facilitators to effective intervention.
The initial review logic model (Supporting Information: Appendix 1.0) was built based on: (a) a consultation with our expert advisory group; (b) a rapid review of programme theories used in FP interventions involving men and boys; and (c) the research team members' own expertise of intervention design and evaluation in SRHR and involvement in prior systematic reviews conducted for the WHO on male engagement interventions in SRHR (Ruane-McAteer et al., 2019). It provides a visual representation of how, and under what circumstances, FP interventions might work to increase uptake of FP, help people attain their desired family size and ultimately result in improvements in SRHR, maternal and child health, gender equality, quality of life and livelihoods for all. Informed by realist interpretations of causality (Pawson et al., 2005), the logic model sets out the multiple possible pathways through which each intervention component, or combination of components, would bring about positive outcomes and change. In essence, we hypothesise that in order to positively impact maternal and child mortality and morbidity indicators, FP interventions involving men and boys first need to effect change in one or more outcomes at proximal (individual), intermediate (interpersonal, community, organisational/service) and distal (structural) levels. As illustrated in the model, changes in these outcomes follow from exposure to an intervention, although different combinations of intervention characteristics are possible, may have differential impact, and may also be influenced by the characteristics of the participants and the context in which the intervention takes place.
Each FP intervention will include core components as well as a set of resources and theory underlying its implementation. Further, the logic model recognises that interventions can fail to produce change because of issues relating to design or implementation processes (e.g., the intervention may not be well implemented, implementation may not trigger mechanisms or mechanisms may not generate outcomes) and, therefore, incorporates ways of understanding the success of the implementation. It also recognises that potential negative outcomes are possible for every intervention and incorporates potential indicators of these.

| Why is it important to do this review?
To the best of our knowledge, this is the first systematic review in the field which focuses on understanding the effective characteristics and components of interventions involving men and boys in FP using causal chain analysis. Our review builds upon prior research in the field of male engagement and SRHR, which includes two WHO evidence and gap maps (EGMs) (https://srhr.org/masculinities/rhoutcomes/ and https://srhr.org/masculinities/wbincome/) and a systematic review of reviews of male engagement interventions across all SRHR outcomes (Ruane-McAteer et al., 2019). There are also two previous systematic reviews of male engagement in relation to gender-transformative SRHR interventions (Barker et al., 2007; Ruane-McAteer et al., 2020) focusing on intervention evaluation as well as the characteristics of effective interventions.
Specifically in the field of FP, three previous reviews focus on an analysis of the characteristics and components of FP interventions, including an analysis of male involvement (Lopez et al., 2009; Mwaikambo et al., 2013; Phiri et al., 2015a). A further relevant review specifically on male engagement in FP and examining programme components was published while we were conducting the current systematic review (Sahay et al., 2021).
While our review analysis is based upon quantitative experimental evaluations of interventions, the review also includes an analysis of the available qualitative process evaluations of the interventions under study. The qualitative analysis helped to inform hypotheses of effective characteristics and components as well as our interpretation of review findings. Our review also benefits, as noted above, from consultations held with a multi-disciplinary international advisory group based and/or working in LMICs in relation to SRHR. The findings of this review will be of benefit to programme planners and policy makers in family planning because of the wide policy interest in male engagement and the specific focus on effective programming components of interventions involving men and boys in FP. The review will also help to inform the WHO's Research Priority Setting Exercise on Masculinities and SRHR (https://masculinities.srhr.org/).

3 | OBJECTIVES
The primary aim of this review was to uncover the effective components and characteristics of complex FP interventions involving men and boys in LMICs. In addressing this, we examined the following questions:
(1) What is the nature and extent of experimental evidence on engaging men and boys in FP and what gaps in research knowledge exist?
(2) What are the impacts of FP interventions involving men and boys on FP-related outcomes?
(3) What are the effective components of interventions that achieve positive change in intended FP outcomes?
(4) What characteristics and combinations of characteristics are associated with positive FP-related outcomes?
(5) Do outcomes vary by context and participant characteristics?
(6) Are there any unintended or adverse outcomes?
(7) What are the system- and process-level barriers to and facilitators of effective models of FP involving men and boys?

4 | METHODS

4.1 | Criteria for considering studies for this review

4.1.1 | Types of study designs
As per our protocol, included studies were randomised trials (individual or cluster) and quasi-experimental studies, including quasi-randomised trials (groups allocated using non-random methods) and pre- and post-test studies with a comparison group and, where available, their associated qualitative/mixed methods studies (e.g., formative qualitative research, process evaluations, and qualitative research exploring accounts of how the interventions work). Nonexperimental pre- and post-test studies (i.e., those without a comparison group) were excluded. Mixed methods evaluations were included when the quantitative design satisfied the criteria mentioned above.
Included studies must have reported interventions or programmes implemented in countries categorised as Low Income, Lower-Middle Income, or Upper-Middle Income by the World Bank (World Bank, 2019) at the time the search was conducted. Studies that reported on multicountry interventions were eligible if the intervention was implemented in at least one LMIC.

4.1.2 | Types of participants
The review focuses on FP interventions delivered in LMICs, which involved men or boys as recipients. Included studies must therefore have involved males of any age, of any sexual orientation and gender identity. While we considered outcomes for both women and men, studies were only included if boys or men received the intervention.
Studies or interventions that included girls or women only were excluded.
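The design, setting and participant criteria described above can be read as a single screening rule. The following is a minimal sketch in Python under stated assumptions: the `StudyRecord` fields, the design labels and the income-group labels are hypothetical names chosen for illustration, not the coding scheme used by the review team, and intervention-focus and comparator criteria are not covered.

```python
from dataclasses import dataclass

# Hypothetical shorthand labels for the eligible designs named in the text.
ELIGIBLE_DESIGNS = {
    "randomised_trial",
    "cluster_randomised_trial",
    "quasi_randomised_trial",
    "pre_post_with_comparison",
}

# Hypothetical labels for the three World Bank groups counted as LMIC.
LMIC_INCOME_GROUPS = {"low", "lower_middle", "upper_middle"}


@dataclass
class StudyRecord:
    design: str
    country_income_groups: list  # one income-group label per study country
    males_received_intervention: bool


def is_eligible(study: StudyRecord) -> bool:
    """Apply the design, setting and participant criteria together."""
    design_ok = study.design in ELIGIBLE_DESIGNS
    # Multicountry studies qualify if at least one country is an LMIC.
    setting_ok = any(
        group in LMIC_INCOME_GROUPS for group in study.country_income_groups
    )
    return design_ok and setting_ok and study.males_received_intervention
```

Under these assumptions, a pre- and post-test study without a comparison group, or a study in which only women or girls received the intervention, would be screened out regardless of setting.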

4.1.3 | Types of interventions
Included interventions were FP-focused behavioural and service-level interventions, directly targeting or involving men or boys in LMICs. The interventions were delivered in health, education, and community settings in LMICs. Comparators included alternative interventions, usual standard care and no intervention.

4.1.4 | Types of outcome measures
The outcomes for this review were selected in a stakeholder-informed logic model development phase. We consulted with FP experts to develop a review logic model (see Aventin et al., 2021) which illustrated relevant proximal and distal outcomes relating to maternal and child health and FP. While we anticipated that some outcomes featured in the review logic model, such as community, organisational and structural level outcomes and distal impacts, may not have been measured in the included studies, we aimed to examine any combination of outcomes provided.
Examples of eligible primary outcomes included: sexual and reproductive health behaviours (e.g., male and female contraceptive uptake and sustained use, reductions in unprotected sex, birth spacing, birth limiting); gender equitable attitudes and behaviours (e.g., changed attitudes and norms, decreased male-dominated FP decision-making); FP service use and engagement (e.g., knowledge and use of FP services, use of safe abortion, support for partner engagement and increased trust in FP services); and fertility (e.g., adolescent/early pregnancy and unintended pregnancy rates). Finally, we included met need for FP as a key rights-based primary outcome.
Examples of eligible secondary outcomes included: psychosocial determinants of FP such as knowledge, attitudes and social norms; factors relating to relationship quality and discordance such as couple communication and intimate partner violence; attitudes towards FP services including more positive attitudes towards help-seeking in relation to FP; and community, organisational and structural level outcomes including gender equitable attitudes and support for FP in wider social contexts.

| Search methods for identification of studies
As we sought to include both quantitative studies and qualitative studies in the review, the search had two phases. The first phase was a comprehensive search for randomised trials and quasi-experimental studies. The second phase was a search for qualitative studies, limited to those associated with the specific experimental evaluation studies identified in phase one for inclusion in the causal chain analysis. We used EndNote X9 software to remove duplicates in the search. We used EPPI Reviewer 4 software for data management, screening, extraction and appraisal, and for further identification of duplicates with its more sensitive and configurable duplicate identification tool.

Evaluation studies
The Phase 1 search was conducted in August 2020 using the databases, grey literature sources and other approaches detailed below. The search included any available studies up until that date.

Connected papers
The Phase 2 search was conducted using the Connected Papers resource (Eitan et al., 2021) to identify relevant papers by searching prior and derivative work. This resource generates 'citation maps' from similar or related publications based on co-citation and text similarity assessed by machine learning across Scopus Databases.
A Connected Papers graph was generated for each of the included studies in the review. The titles and abstracts of all linked results provided by the mapping tool were hand searched for relevance.

Evaluation studies
The search was not limited by publication status, date, or language of publication.

Connected papers
To keep the number of studies manageable, previous research by study authors not directly related to the intervention of interest and secondary analyses of data conducted outside the intervention study were not eligible for inclusion. Searches were tested and adjusted as necessary to account for the unique indexing, field codes and truncation for each database.

Interventions
Given the very broad range of potential interventions, we did not limit our searches by intervention terms in the initial stages. However, we subsequently developed an intervention search string as follows:
(1) We searched for the combination of the terms for population AND family planning AND study design AND LMIC in two databases (PsycInfo and Medline).
(2) We scanned the first 200 records retrieved in each database to quickly identify studies that appeared to meet our eligibility criteria (400 records screened).
(3) We used this selection of studies to develop and test a comprehensive list of intervention terms.
(4) We then screened a further selection of 200 records in each database to identify a new set of potentially eligible studies. This new set was then used to verify that the newly developed string captured the second set of potentially eligible studies and did not exclude any potentially relevant study.
(5) The first set of intervention terms failed to capture one potentially relevant study identified in step 4. The intervention term list was expanded to capture the relevant term (in this case 'training') and the process above was repeated once more. All relevant records were identified in the next round. We were therefore satisfied that adding intervention terms improved search specificity without adversely affecting sensitivity.
We recognise that the strategy combines five search strings, which can result in a less sensitive search. However, given the breadth of the interventions of interest, this was necessary to maximise the specificity of the search and reduce the number of irrelevant records retrieved.
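The string-building and checking steps above can be sketched as follows. This is an illustrative Python sketch only: the terms, record identifiers and titles are hypothetical and are not the review's actual search strings or records, and the simple substring matching stands in for database-specific indexing and truncation.

```python
def build_query(concept_blocks):
    """Join synonyms with OR inside each block and join blocks with AND."""
    return " AND ".join(
        "(" + " OR ".join(terms) + ")" for terms in concept_blocks.values()
    )


def missed_records(intervention_terms, known_eligible):
    """Return ids of known-eligible records matched by none of the terms."""
    lowered = [term.lower() for term in intervention_terms]
    return [
        record_id
        for record_id, text in known_eligible.items()
        if not any(term in text.lower() for term in lowered)
    ]


# Hypothetical concept blocks and records, for illustration only.
blocks = {
    "population": ["men", "boys"],
    "family_planning": ["family planning", "contraception"],
}
known = {
    "rec1": "Peer education on condom use among young men",
    "rec2": "Training of community health workers in family planning",
}

query = build_query(blocks)
first_pass = missed_records(["education", "counselling"], known)
second_pass = missed_records(["education", "counselling", "training"], known)
```

In this toy example the first term list misses the record that is only retrievable through 'training', mirroring the situation described in step 5, and the expanded list misses nothing. Adding a further AND block can only narrow the result set, which is why each new block needs a check of this kind against records already known to be eligible.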

| Data collection and analysis
To ensure the most effective use of finite time and resources, subsets of the data were used for different review questions (see Table 1). While all 127 studies were included, a subset of studies reporting contraceptive use outcomes (72 studies) was used in the meta-analysis, and a further subset of 33 studies, which included interventions with a male engagement component (see Table 1 for definition) and reported contraceptive use outcomes, was used to examine impacts on intermediate outcomes. The decision to focus the bulk of the quantitative analyses on studies that reported contraceptive use outcomes was driven, firstly, by contraceptive use being the most reported FP outcome and thus yielding the most data for further analysis. Other outcomes (such as FP service use or birth spacing) were less frequently reported, limiting the potential for adequately powered analysis. Secondly, resource limitations prevented dual extraction of all outcome data for all 127 studies.
The decision to focus on the male engagement studies for elements of the CCA was informed by discussions among the review team and the International Advisory Group to focus attention on interventions that involved active and intentional male engagement.

Evaluation studies
Records identified in the searches were entered into EndNote v9 and duplicates removed. Two review authors independently screened titles and abstracts to exclude studies that were obviously irrelevant.
To ensure quality control, Cohen's kappa was calculated between three reviewers on the first 100 records, selected at random, and disagreements about eligibility were discussed and resolved. This process was repeated until Cohen's kappa reached 0.41 or above and we were satisfied that the screeners were making consistent decisions. We then retrieved in full text the studies considered potentially eligible. Dual independent screening of all full texts was undertaken by two review authors. The screening and quality control process outlined above was repeated with a smaller sample of 10 full texts, with independent dual screening of records thereafter. Any disagreements were discussed with a third review author until consensus was reached. Cohen's kappa was once again calculated for this initial full-text screening and for the completed full-text screening process, ensuring adequate inter-rater reliability (McHugh, 2012).

Connected papers
A citation map was generated for a subset of included evaluation studies (33 studies with a male engagement component) and the connected publications were examined to identify eligible process evaluations and qualitative studies ('connected papers'). These included investigations of the programme under evaluation that were conducted during intervention piloting and refinement, simultaneously with delivery, or following implementation, and that assessed aspects of its design and delivery. This led to the identification of 8 qualitative studies and 15 process evaluations for analysis in this review. These studies related to 14 of the 33 male engagement studies.

| Data extraction and management
Evaluation studies
included studies only. We evaluated the reliability of this approach and concluded that it was acceptable in accordance with accepted standards (Landis & Koch, 1977; McHugh, 2012); thus, the extraction of study characteristics, intervention characteristics, and risk of bias appraisal by one review author was implemented for the remaining studies.
As the characteristics and components of interventions were a central feature of this review, care was taken to extract and code this

Interventions which may be inclusive of gender-sensitive and gender-aware education, but also include discussion of gendered norms or gender power and the challenging of gender inequalities.

Information & Education
Providing information and education about FP methods, practices and outcomes.
Information provision in clinics; educational programme; informational materials dissemination.

(2) reflections of the original authors on how specific elements of an intervention worked/might have worked; and (3) statements on how specific mediators, moderators, and system- and process-level barriers and facilitators impacted/may have impacted on outcomes.

Evaluation studies
Assessment of methodological quality and risk of bias in randomised trials was conducted using the Cochrane Risk of Bias tool for Randomised Controlled Trials (RoB 1) (Higgins et al., 2011). This is a standard tool, which takes the form of a series of questions about the randomisation procedures and blinding. Non-randomised studies were coded using ROBINS-I (Sterne et al., 2016). As noted above, dual risk of bias appraisal was conducted for 50% (n = 64) of included evaluation studies. We evaluated the reliability of this approach and concluded that it was acceptable in accordance with accepted standards (Landis & Koch, 1977; McHugh, 2012), and risk of bias appraisal by one review author was implemented for the remaining studies.

Connected papers
Qualitative studies were coded by one review author using the Jimenez and colleagues (Jimenez et al., 2018)

| Measures of treatment effect
Outcomes were typically reported as dichotomous data, so meta-analysis was conducted using odds ratios (ORs) with a random effects model. We focused our analysis on contraceptive use because this was the most frequently measured outcome across all studies.
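To illustrate how a dichotomous outcome becomes an effect size, the following is a minimal Python sketch of the standard log odds ratio and its sampling variance computed from a two-by-two table. It is not the review's own code (the analyses were run in R); the function name, the 0.5 continuity correction and the example counts are assumptions made for illustration only.

```python
import math

def log_odds_ratio(events_t, n_t, events_c, n_c, correction=0.5):
    """Return (log OR, sampling variance) for one two-arm comparison."""
    a, b = events_t, n_t - events_t   # intervention arm: users, non-users
    c, d = events_c, n_c - events_c   # comparison arm: users, non-users
    if 0 in (a, b, c, d):
        # Assumed convention: add 0.5 to every cell when any cell is empty
        a, b, c, d = (x + correction for x in (a, b, c, d))
    yi = math.log((a * d) / (b * c))
    vi = 1 / a + 1 / b + 1 / c + 1 / d
    return yi, vi

# Hypothetical counts: 60 of 100 contraceptive users in the intervention arm,
# 45 of 100 in the comparison arm
yi, vi = log_odds_ratio(60, 100, 45, 100)
se = math.sqrt(vi)
odds_ratio = math.exp(yi)
ci_low, ci_high = math.exp(yi - 1.96 * se), math.exp(yi + 1.96 * se)
```

Pooling is done on the log scale and the result is exponentiated back to an OR for reporting, which is why the confidence limits are computed before applying `math.exp`.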

Multiple intervention groups
We used RVE to account for dependencies in the data and to allow us to make use of multiple effect sizes reported in single studies.

Multiple interventions per individual
We coded each study according to intervention components. We used meta-regression to assess the effectiveness of individual and combined intervention components.

| Dealing with missing data
Of the 127 included studies, 12 study reports did not contain sufficient data to allow calculation of effect size estimates for the primary outcome of our analyses, contraceptive use. When appropriate, we contacted the original authors to request necessary summary data, such as means and standard deviations or standard errors.
Where no information was provided, the study was not included in the meta-analysis and was included in the narrative synthesis only.
We were unable to retrieve information for 40 effect sizes across 12 included studies. These studies were included in the review but excluded from the meta-analysis.

| Assessment of heterogeneity
Heterogeneity was assessed first through visual inspection of forest plots and checking for overlap of confidence intervals, and second through the Q, I² and Tau² statistics. Investigation of the source of heterogeneity is addressed in the data synthesis section.
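As a rough illustration of the three heterogeneity statistics, the sketch below computes Cochran's Q, I² and a DerSimonian-Laird estimate of Tau² in plain Python. The DerSimonian-Laird estimator is chosen here only because it has a closed form; it is an assumption and not necessarily the estimator used in the review's R models. The function name and the example values are hypothetical.

```python
def heterogeneity(yi, vi):
    """Cochran's Q, I^2 (in %) and DerSimonian-Laird tau^2.

    yi: effect sizes (e.g. log odds ratios); vi: their sampling variances.
    """
    w = [1 / v for v in vi]                      # inverse-variance weights
    sw = sum(w)
    pooled = sum(wi * y for wi, y in zip(w, yi)) / sw
    q = sum(wi * (y - pooled) ** 2 for wi, y in zip(w, yi))
    df = len(yi) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)                # truncated at zero
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2, tau2

# Hypothetical log odds ratios with equal sampling variances
q, i2, tau2 = heterogeneity([0.1, 0.5, 0.9], [0.04, 0.04, 0.04])
```

Q tests whether the observed spread exceeds what sampling error alone would produce, I² expresses the excess as a share of total variability, and Tau² gives the between-study variance on the effect-size scale.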
| Assessment of reporting biases
We assessed small study bias (such as publication bias) using a regression test for funnel plot asymmetry (Egger et al., 1997). The model used was a weighted regression with multiplicative dispersion, using sampling variance as the predictor.
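A regression test of this kind can be sketched as a weighted least squares fit of the effect sizes on their sampling variances, with weights equal to the inverse variances and the residual variance estimated from the data (the multiplicative dispersion term). The Python below is a simplified illustration under those assumptions and is not the review's R code; the function name and the data are hypothetical.

```python
import math

def variance_regression_test(yi, vi):
    """Weighted regression of effect sizes (yi) on sampling variances (vi).

    Returns the slope, its standard error, the t statistic and the
    residual degrees of freedom (k - 2).
    """
    k = len(yi)
    w = [1 / v for v in vi]
    sw = sum(w)
    swx = sum(wi * x for wi, x in zip(w, vi))
    swy = sum(wi * y for wi, y in zip(w, yi))
    swxx = sum(wi * x * x for wi, x in zip(w, vi))
    swxy = sum(wi * x * y for wi, x, y in zip(w, vi, yi))
    det = sw * swxx - swx ** 2
    slope = (sw * swxy - swx * swy) / det
    intercept = (swxx * swy - swx * swxy) / det
    rss = sum(wi * (y - intercept - slope * x) ** 2
              for wi, x, y in zip(w, vi, yi))
    dispersion = rss / (k - 2)                   # estimated, not fixed at 1
    se_slope = math.sqrt(dispersion * sw / det)
    return slope, se_slope, slope / se_slope, k - 2

# Hypothetical data in which less precise studies report larger effects
log_or = [0.321, 0.339, 0.401, 0.499]
variances = [0.01, 0.02, 0.05, 0.10]
slope, se_slope, t_stat, df = variance_regression_test(log_or, variances)
```

The t statistic is compared with a t distribution on `df` degrees of freedom; a slope that differs clearly from zero indicates funnel plot asymmetry, for which publication bias is one possible explanation among several.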
To ensure robustness of the review and to account for individual studies that appear to exert an undue influence on findings, process sensitivity analysis was carried out on domains relating to the quality of the included studies (Cooper, 2016).
The logic model was tested using appropriate meta-analytic techniques combined with findings from narrative synthesis of evaluation study findings and qualitative analysis of connected papers. The process involved the following: (1) Multivariate pairwise meta-analysis to assess the overall effectiveness of the interventions on reported FP outcomes; (2) Meta-regression to assess the impact of multiple intervention components and characteristics on FP outcomes; and (3) Narrative synthesis involving the identification of characteristics and components of included interventions and 'best-fit' framework synthesis of connected qualitative studies and process evaluations to identify barriers and facilitators to effective models of FP.
As noted, different subsets of the data were used for the review questions (see Table 2). All 127 studies were included in the narrative synthesis relating to review questions 1 and 6. The subset of 72 studies comprises those that reported contraceptive use outcome data, with outliers removed; this subset was used for questions 2-5.

3. What are the effective components of interventions that achieve positive change in intended FP outcomes?
Meta-regression to estimate variance accounted for by the identified intervention components and combinations of components for 72 studies.

4. What characteristics and combinations of characteristics are associated with positive FP-related outcomes?
Meta-regression on extrinsic (year of publication); methodological (study design); and substantive (intervention design, dosage, intervention setting; intervention theory of change; who delivers) variables for 72 studies.

5. Do outcomes vary by context and participant characteristics?
Multivariate meta-analysis of dependent effect sizes with robust variance estimation on characteristics of context (region) and participants (age and sex) for 72 studies.

6. What adverse effects were reported?
Narrative synthesis of any reported adverse effects in 127 studies and qualitative synthesis of 23 connected papers (see Supporting Information: Appendix 7.3).

7. What are the system- and process-level barriers to and enablers of effective models of FP involving men and boys?
Qualitative synthesis using a 'best-fit' framework synthesis approach for 23 connected papers (11 connected qualitative studies and 12 connected process evaluations).

| Approach to meta-analysis
Given the diverse range of interventions included in this review, random effects models, using RVE, were used as the basis for meta-analysis. The analyses were conducted in R using externally developed packages for meta-analysis, including metafor and clubSandwich (Megha Joshi, 2022; Michael Kossmeier, 2020; Pustejovsky & Tipton, 2018; Viechtbauer, 2010).

| Main effects
The main effects analysis, synthesising the evidence on the effects of the interventions, was undertaken using the multivariate pairwise meta-analysis outlined above for each outcome in turn.

| Sensitivity analysis
For each outcome, the following sensitivity analyses were undertaken to assess whether there were potential influences relating to studies that appear to exert an undue influence on findings. We used meta-regression to assess the impact of:
• Year study was conducted
• Study design (cluster-RCT, RCT, quasi-experimental)
We did not conduct sensitivity analysis on study risk of bias due to the mixture of RCTs and non-RCTs.

| Subgroup analysis and investigation of heterogeneity
The complexity of the logic model means that there were many possible subgroup analyses and meta-regressions to assess the differential effects in relation to the components of interventions, characteristics of the intervention delivery, population of interest and context. Using robust variance estimates, we conducted analysis for the following:
• Geographical region

| Treatment of qualitative data
Qualitative data extracted from the 23 connected papers (15 process evaluations and 8 qualitative studies) were analysed using a 'best-fit' framework synthesis approach (Booth & Carroll, 2015; Carroll et al., 2013). Where possible, qualitative data were also extracted from the subset of 33 male engagement studies. The a priori framework used to code the data constituted categories from the review logic model
Duplicate extraction and appraisal were subject to evaluation by the review team to ensure consistent decision-making by a single reviewer.
To assess inter-rater agreement and provide a measure of internal validity, we present the kappa statistic, κ. Generally, Cohen's kappa is used most often as it determines agreement between reviewer A and reviewer B (Landis & Koch, 1977) but the Fleiss kappa statistic may be used where there are multiple reviewers extracting the same data (Fleiss, 1971). The kappa statistic is preferable to reporting percent agreement, as the possibility of agreement occurring by chance is included in the equation. We used this to establish internal consistency across the team. This measure was checked using the irr package in R.
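The difference between percent agreement and a chance-corrected statistic can be seen in a small worked example. The following is a minimal Python sketch of Cohen's kappa for two raters; it is not the review's code (the irr package in R was used there), and the function name and the screening decisions are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Return (percent agreement, Cohen's kappa) for two raters.

    Undefined (division by zero) when chance agreement equals 1,
    i.e. both raters use a single identical category throughout.
    """
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance, from each rater's marginal proportions
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    return observed, (observed - expected) / (1 - expected)

# Hypothetical screening decisions on ten records
reviewer_a = ["include"] * 3 + ["exclude"] * 7
reviewer_b = ["include"] * 2 + ["exclude"] * 7 + ["include"]
agreement, kappa = cohens_kappa(reviewer_a, reviewer_b)
```

In this example the reviewers agree on 8 of 10 records, yet more than half of that agreement would be expected by chance given how often each reviewer excludes, so kappa is considerably lower than the raw agreement.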
Reliability of data extraction was deemed acceptable in accordance with accepted standards (Landis & Koch, 1977; McHugh, 2012). Finally, in a deviation from our per-protocol analysis, we did not conduct analysis separately for different follow-up times as planned.
Instead, we used RVE to allow us to combine multiple effect sizes on the same outcome from each study while accounting for dependency in the data. We did not conduct separate analysis where the same outcome construct was measured but across multiple time domains, such as through the collection of both post-test and further follow-up data.

These criteria were applied in sequential order for the purposes of exclusion and inclusion of records in title and abstract screening and led to the following exclusions:
• not related to a psycho-social or behavioural FP intervention (n = 2864, 80.5%) (e.g., surveys of family planning attitudes or practices, commentary on family planning, an intervention unrelated to family planning behaviours),
• ineligible study design (n = 633, 17.8%) (e.g., pre-post-intervention designs, lack of a comparison group, intervention protocol or development paper, review of interventions),
• did not involve men or boys in intervention delivery (n = 55, 1.6%),
• not conducted in a LMIC (n = 5, 0.1%),
• unavailable publication abstract or full text, thus awaiting classification (n = 5, 0.1%).
Following title and abstract screening of identified studies, 280 records were subject to full-text screening. In assessing studies for eligibility at this stage, the same four criteria were applied to the records marked for inclusion at the title and abstract screening stage.
This led to the exclusion of a further 147 records for the following reasons:
• Did not evaluate an intervention (n = 40, 27.2%)
• Did not evaluate a relevant intervention (n = 32, 21.8%)
• Ineligible study design, i.e., no comparison group (n = 48, 32.7%)
• Did not deliver intervention to men or boys (n = 21, 14.3%)
• Was not conducted in a LMIC (n = 6, 4.1%)
Five records were removed following closer examination during data extraction for the following reasons, which are in line with the eligibility criteria for this review: lacking a comparison group exposed to a different or no intervention (Baochang et al., 1998; Nabaggala et al., 2019); intervention content related to HIV prevention exclusively (Harvey et al., 2000; Vernon et al., 1990); intervention delivered to females only despite appearing to encourage male involvement (Jahanfar et al., 2005). The review team was unable to acquire abstract or full-text resources for a total of 19 records, meaning these were labelled as 'Awaiting Classification' and did not advance to eligibility assessment or inclusion.
the 127 studies included in the review. Tables 3 and 5 provide summary statistics for all included studies and male engagement studies, respectively.

| Included studies
Study characteristics: all evaluation studies (n = 127).
Year of publication, participants and study design. Among the most common study sites were Kenya (n = 10), South Africa (n = 7) and Nigeria (n = 6). This was followed by Asia (n = 37), with China (
• Explicit targeting of husbands for counselling to increase acceptance of female family planning methods (Amatya et al., 1994; Fisek & Sumbuloglu, 1978; Ha et al., 2005a).
• Male promoters used to disseminate information to males and increase acceptability of male family planning methods and participation (Shattuck et al., 2011).
• Intervention objectives specifically and exclusively targeting men (Exner et al., 2009; Sahip & Turan, 2007; Shattuck et al., 2011).
delayed first pregnancy, intentions to limit family size, total fertility rate) was assessed chiefly in interventions delivered to adults or groups inclusive of all individuals of reproductive age (n = 36).
Interpersonal-level outcomes were assessed in 38 studies. Of these, the most common were communication (n = 29) and joint decision-making around FP (n = 12). Perhaps unsurprisingly, interventions attempting to address joint decision-making mostly involved both males and females in delivery (n = 10 out of 12). The remaining two interventions were delivered to males exclusively; however, these emphasised building communication skills and the promotion of joint FP decision-making.
Organisation-level outcomes were assessed in n = 8 studies and chiefly addressed increasing service engagement and accessibility for all, not necessarily specifically for males. A small number of studies, however, did consider enhancing gender equitable beliefs among service providers (n = 3) (Khatun et al., 2011;Timol et al., 2016;Vernon & Dura, 2004).

| Connected papers
We appraised qualitative studies using Jimenez et al. (2018) Table 6). A full breakdown of risk of bias judgements for all domains of each tool is included in Supporting Information: Appendix 5.0. Three process evaluations (Daniele, 2017;Mantell et al., 2006) were judged to have low risk of bias, while the remainder were judged to have moderate risk of bias. One qualitative study (McCarthy, 2019) was judged to have a high risk of bias, because of lack of full reporting on several of the domains.

| Synthesis of results-Causal chain analysis
| Review question 2: What are the impacts of FP interventions involving men and boys on FP outcomes?
The effects of FP interventions on 'contraceptive use' outcomes
The meta-analysis of 72 studies (k = 265) revealed that groups receiving the FP interventions had statistically significantly higher odds of improved contraceptive use than comparison groups (OR = 1.38, CI = 1.21 to 1.57, PI = 0.36 to 5.31, p < 0.0001).
That is, the odds of improved contraceptive use among groups who received the FP interventions were about 1.4 times those of the comparison groups.
As there were substantial variations between the studies in terms of their effect sizes (heterogeneity Q = 40,647, df = 264, p < 0.0001; I² = 98%), we investigated I² further and found that 25% of heterogeneity was between cluster/study and 73% was within cluster/study. The multilevel model contains two variance components (sigma^2_1 and sigma^2_2), for the between-cluster heterogeneity and the within-cluster heterogeneity, respectively. About 25% of the total variance is estimated to be due to between-cluster heterogeneity, 70% due to within-cluster heterogeneity, and the remaining 5% due to sampling variance. This is an investigation of the total remaining variance after outliers were removed following the process outlined by Viechtbauer (2010).
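A decomposition of this kind is commonly obtained by dividing each variance component by the sum of both components and a 'typical' sampling variance. The Python sketch below shows that calculation under stated assumptions: the typical sampling variance follows the Higgins and Thompson formula, and the sampling variances and variance components are hypothetical values chosen only so that the shares come out at 25%, 70% and 5%. They are not the review's estimates.

```python
def variance_shares(vi, sigma2_between, sigma2_within):
    """Percentage of total variance at each level of a three-level model."""
    k = len(vi)
    w = [1 / v for v in vi]
    # Typical sampling variance (Higgins & Thompson formula)
    typical_v = (k - 1) * sum(w) / (sum(w) ** 2 - sum(x ** 2 for x in w))
    total = sigma2_between + sigma2_within + typical_v
    return {
        "between": 100 * sigma2_between / total,
        "within": 100 * sigma2_within / total,
        "sampling": 100 * typical_v / total,
    }

# Hypothetical inputs: four effect sizes with equal sampling variances and
# made-up variance components
shares = variance_shares([0.04, 0.04, 0.04, 0.04],
                         sigma2_between=0.20, sigma2_within=0.56)
```

The 'between' and 'within' shares together correspond to the overall I², and the 'sampling' share is the remainder.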
To test for publication bias, a weighted regression with multiplicative dispersion using sampling variance as a predictor was utilised. This test found no evidence of publication bias (p = 0.48) (see Figure 16), suggesting that the included studies are a reasonable representation of the literature of interest.
and this indicates that the explained variance across this data is significantly greater than the unexplained variance, overall.

TABLE 6 Impact of male engagement FP interventions on intermediate outcomes
As highlighted in

| Review questions 4 and 5:
What characteristics and combinations of characteristics are associated with positive FP-related outcomes? Do outcomes vary by context and participant characteristics?
All included studies (n = 127)
In Table 10, we present ten potential moderators of contraceptive use using robust variance estimates. This exploratory analysis used a single-variable, no-intercept model. Estimates presented are ORs.
TABLE 7 Summary of correlated effects meta-regression results linking intervention components to contraceptive use

| Review question 6: What adverse impacts were reported?
None of the evaluation studies reported adverse outcomes, although one study (Harrington et al., 2019) did report potential negative consequences: some women were concerned that male partners might suspect them of engaging in covert contraceptive use, and that factual information about potential bleeding and other side effects of a LARC method might discourage male acceptance of these methods.
Four connected papers mentioned adverse consequences relating to involving men and boys in FP. Only two studies directly indicated evidence relating to a lack of adverse effects on family life and FP decision-making (Daniele, 2017;Turan et al., 2001).
While not directly implicated as an adverse outcome, one study (Harrington et al., 2016)
Two studies mentioned the importance of migrant status, relating the negative impact of men working away from the household for periods of time as a barrier to FP uptake (Daniele, 2017). One study reported the advantage of urban versus rural residence when considering FP intervention implementation.

Individual attitudes, values and beliefs about FP, including attitudes about FP services, were indicated as important in eight studies (Daniele, 2017; Doyle et al., 2014; Hartmann et al., 2012; Khan et al., 2008; Nair et al., 2019). Some reported that increased perceptions of risk caused by delayed initiation of contraception or beliefs that FP use would have economic advantages (Hartmann et al., 2012) were associated with positive impacts, while misconceptions that contraception causes infertility (Khan et al., 2008) and negative attitudes about condom use within marriage (Ghule et al., 2015) had opposite effects. Two studies noted that attitudes about reduced sexual pleasure acted as a barrier to condom use (Ghule et al., 2015; Khan et al., 2008). One study noted the facilitating effects of positive past FP behaviours and experiences, indicating that a history of safe sexual practice was predictive of continued FP use (Daniele, 2017).
Two studies reported that knowledge played an important intermediary role in contributing to increased couple communication (Daniele, 2017;Hartmann et al., 2012).
Eight studies discussed the influence of perceived gender and cultural norms on acceptance and use of FP (M. Daniele, 2017; Ghule et al., 2015; Harrington et al., 2016; Jewkes et al., 2010; O. L. McCarthy et al., 2018b, 2019). These studies noted the inequalities that favoured men as household decision-makers and stigmatised sex outside of marriage. Male consent or 'permission' for women's use of FP emerged as a sub-theme of perceived gender and cultural norms (Daniele, 2017; Harrington, 2017a; Harrington et al., 2016, 2017). Some studies noted that women's acceptance of gender norms relating to FP was common (Daniele, 2017; Doyle et al., 2014; Jewkes et al., 2010), while one study highlighted women's responses to inequalities. These included 'sweet talk' with sexual partners or concealed use of contraception when they were experiencing a lack of congruence with cultural expectations on childbearing or thought joint decision-making about FP unattainable (Harrington et al., 2016). One study reported an adverse impact of men 'dominating conversations' in couple counselling sessions (Daniele, 2017). A central barrier to couple communication about FP, and to promoting joint or female-led decision-making, was perceived gender and cultural norms that saw women as responsible for family planning and that stigmatised men's move away from dominance as the head of household decision-making (Doyle et al., 2014; Ghule et al., 2015; Harrington et al., 2016).
Communication about FP and decision-making norms and preferences emerged as an important theme that was not highlighted in the a priori framework. Ten studies (Daniele, 2017; K. Doyle et al., 2014; Ghule et al., 2015; Harrington, 2017a; Harrington et al., 2016; Hartmann et al., 2012; Jewkes et al., 2010; McCarthy et al., 2018b; Nair et al., 2019; Turan et al., 2001) referred to this. While some studies reported that male decision-making about FP remained an accepted norm and preference for both women and men (Daniele, 2017; Ghule et al., 2015; Harrington et al., 2016), there were also reports of the positive influence of improved spousal communication and joint decision-making about FP (Daniele, 2017; Doyle et al., 2014; Harrington et al., 2017; Hartmann et al., 2012; Nair et al., 2019; Turan et al., 2001) or female-led decision-making on the contraceptive method used (Daniele, 2017). One study (Jewkes et al., 2010) reported
Relatedly, three studies mentioned the influence of marital status/type on FP use (Daniele, 2017; Khan et al., 2008; McCarthy et al., 2018b). As noted, newly married couples were often subject to social expectations for early pregnancy (Khan et al., 2008; O. L. McCarthy et al., 2018b). One study (Daniele, 2017) alluded to the potential differences in men's willingness to engage with FP when they were in a monogamous versus polygamous marriage, with the latter proposed as leading to less investment in the healthcare of each wife. HIV status was mentioned as a key factor in two studies, with both noting that HIV-positive status was associated with increased contraceptive use (Mantell et al., 2014; Ngure et al., 2012).
Reproductive history and intentions for future childbearing and the sex of existing children emerged as key influences on FP use (Ghule et al., 2015;Harrington et al., 2016;Nair et al., 2019;J. A. Ross & Bang, 1966). Two studies noted preferences for sons (Ghule et al., 2015;Nair et al., 2019) and three (Ghule et al., 2015;Harrington et al., 2016;J. A. Ross & Bang, 1966) reported the cultural significance of childbirth early in marriage. All noted that the absence of either would result in limited use of FP. Further, birth spacing norms were highlighted as important in one study (Khan et al., 2008).
While co-residence with extended family (an a priori category) was McCarthy, 2019). 'Mothers of husbands' were noted as particularly influential (Khan et al., 2008;O. L. McCarthy et al., 2018b). This theme appeared to be linked to the broader concept of community knowledge about FP.
Barriers and facilitators at the external system level. Four key themes relating to external systems emerged from the connected papers.
Three of these were a priori categories (gender, cultural and religious norms; health systems and services; FP supply chain) and one additional category emerged from the thematic analysis (social network influences).
The positive influence of social networks beyond the family on
Health systems and services was noted as an important factor by four studies (Baqui et al., 2018; Daniele, 2017; Doyle et al., 2014). Four studies discussed the impacts of incorporating FP services within existing maternal and child health (MCH) services (Baqui et al., 2018; Daniele, 2017; Doyle et al., 2014).
One of these (Baqui et al., 2018) reported that the addition of FP services that engaged men and boys did not have adverse effects on existing services, while the others noted that men were concerned that they would not be welcome to attend MCH settings (Daniele, 2017) or did in fact experience barriers including overcrowded delivery rooms, as well as biased, undermining and negative attitudes from healthcare workers (Doyle et al., 2014;Nair et al., 2019).
Two studies indicated the importance of FP supply chain, the availability of contraceptives and services, in encouraging FP uptake and use (Ahmed et al., 2013;J. A. Ross & Bang, 1966).
Barriers and facilitators at the process level. Six a priori categories (intervention acceptability; intervention costs, sustainability, and replicability; quality of delivery; provider-preparedness; participant recruitment, retention, and representativeness; and study design and characteristics) emerged as potentially important influencing factors.
Two additional categories (reach and favourability of contraceptive method) also emerged as relevant.
Four studies (Akhter et al., 1993; Ghule et al., 2015; Harrington, 2017a) noted the importance of intervention acceptability. Two studies noted the facilitating effects of culturally acceptable interventions (Harrington, 2017a). Satisfaction with contraceptive methods was noted by two studies (Akhter et al., 1993; Ghule et al., 2015), with physical side effects presented as key barriers to female use of FP. Another (O. L. McCarthy et al., 2018b) noted the negative impact of intervention costs. Two studies (Ahmed et al., 2013; Nair et al., 2019) indicated the importance of quality of delivery for intervention outcomes. Five studies commented on provider-preparedness to deliver FP (including provider characteristics), with the trustworthiness, knowledge, and flexibility of providers highlighted as key (Daniele, 2017; Khan et al., 2008; McCarthy et al., 2018b). One study noted challenges in engaging men when healthcare providers were female (Daniele, 2017).
Four studies mentioned participant recruitment and retention as potential influences on programme effectiveness, with issues around engaging men in couple-focused sessions or with MCH service settings highlighted as particularly challenging (Daniele, 2017; Harrington, 2017a; Nair et al., 2019; Turan et al., 2001). Further, five studies (Daniele, 2017; Jewkes et al., 2010; McCarthy et al., 2018b) highlighted specific characteristics of the study design as important. One study (Harrington, 2017a) noted contamination across intervention and control communities as a barrier to impact, while the others implicated particular aspects of their programme design (e.g., communications, assertiveness skills session, instant messages) as key. Relatedly, two studies (Ghule et al., 2015) discussed the importance of reach. These related to how study processes might ensure that they are able to reach those in the most rural or hard-to-reach areas. Finally, two studies (Akhter et al., 1993; Ghule et al., 2015) noted the importance of favourable attitudes towards, or satisfaction with, the contraceptive method being used.

Sensitivity analysis
Only one of the connected papers was deemed to have a high risk of bias (McCarthy, 2019). This study contributed data to two of the themes (co-residence with extended family and gender norms
Our analysis revealed that the high heterogeneity of effects among included studies was mostly due to within-study variability. We therefore sought to uncover the effective characteristics and
The evidence also supported approaches targeting adolescents or adults alone, as well as those that targeted both age groups. In contrast to the findings of previous research (Lopez et al., 2009)

| Revised logic model
Based on the available evidence and the input collected during our stakeholder meeting, we revised the initial review logic model in the following ways (see Figure 17):
• All information that was not evidenced (i.e., not significant or not included) in the included evaluation studies and connected papers was changed from black to grey font to highlight areas for future research to consider.
• Intervention component headings were changed to reflect more appropriately the terms used in the literature. In particular, 'gender dialogue' was changed to 'gender transformative'; 'information' was changed to 'information and education'; 'skills-building' and 'problem-solving' were combined; 'social support' was changed to 'social/peer mentor support'; 'incentivisation' was changed to 'subsidisation and incentivisation'; 'communication' was changed to 'communication about FP'; and 'male involvement' was changed to 'male engagement'. Additionally, 'free contraceptives' was added to 'subsidisation and incentivisation' and 'subsidised or free FP methods' was removed from 'health service enhancement'.
• Under intervention characteristics only evidence-based characteristics after the sub-headings were left in place. The 'why' and 'tailoring & modifications' headings were removed. The remaining elements that were not reported or evidenced were changed to grey coloured font.
• Under potential negative outcomes 'male resistance to FP leading to covert use and unmet need' was added and the remaining items which were not reported were changed to grey font.
• Under process metrics, reach and favourability of contraceptives were added.
• For the remaining sections, all information not reported was greyed out and information reported left in place.

| Overall completeness and applicability of evidence
We followed a pre-registered peer-reviewed protocol that was
interventions that involve men and boys with those that do not. We also noted that most of the included studies targeted older adolescents and men, so these findings cannot be reliably applied to younger adolescents or children. Country range in the included evaluation studies was narrow within regions, with fewer studies from the Americas and Asia than from Africa. Further, as we did not conduct analyses regarding urban and rural settings for intervention delivery, it was not possible to conclude whether findings from rural areas are applicable to urban areas and vice versa. This is an important consideration given potential differences in these settings and the implications for intervention implementation; for example, some interventions may require the availability of facilities only present in urban areas or community networks only evident in rural areas.
We also found that there was an absence of comparable

| Limitations and potential biases in the review process
Our inclusion criteria led to a larger than expected number of included studies. Our decision, due to resource constraints, to focus some of the analysis on a subset of studies (namely first those that […] We included all contraception use in the contraceptive use outcome. This included the very small number of studies that included withdrawal as a method of contraception. We used systematic review process methods to minimise bias during the review process. As noted, a deviation from protocol was implemented in relation to dual extraction of data relating to study characteristics, intervention characteristics, and risk of bias assessments. Although this process introduces the possibility of bias, we are confident that the reliability of this approach is in line with accepted standards (Landis & Koch, 1977; McHugh, 2012). While information was available in most studies to calculate effect sizes, this was not always the case. The study team contacted authors to request additional data to facilitate this; however, no authors responded with the required information. It is possible that this resulted in biased findings. Also, we did not extract data on the urban/rural breakdown, which is a limitation of our review considering that there was some evidence that this distinction is important. Moderator analyses are exploratory in nature and should always be interpreted with caution (Borenstein et al., 2009). Additionally, these types of analyses generally have low statistical power owing to missing data in the primary research due to the incomplete reporting of many of the variables of interest. This issue considerably restricts the analyses and limits the robustness of the conclusions that can be drawn from them.

| Agreements and disagreements with other studies or reviews
Overall, the findings of this review reinforce and expand the findings from prior research in this field. Our findings relating to the effectiveness of FP interventions involving men and boys confirm and expand on those of a review by Phiri and colleagues (2015a), which involved a narrative synthesis of findings from ten randomised controlled trials.
Building on findings of prior reviews conducted by members of the current review team ( ). In addition, however, our review has identified that brief interventions of less than three months in the field of family planning also demonstrate effectiveness. In support of the findings of a recent review from Sahay and colleagues (2021) and an analysis of the FP2020 commitments made by several LMICs in relation to involving males in FP programmes (Hook et al., 2021), this review confirms the importance of improving knowledge and attitudes related to contraception as a means of increasing its uptake and use. While the incorporation of explicit theoretical grounding may serve to advance the field, this may not be sufficient in isolation, with calls for evidence-led programme development also (Raj et al., 2016). These results indicate that successful programme development and implementation may therefore be theory- or data-driven, and prompt the recommendation that both approaches be incorporated.

| AUTHORS' CONCLUSIONS
Family planning interventions that involve men and boys alongside women and girls are effective in improving uptake and use of contraceptives. Programmers across the world have developed and evaluated a wide range of interventions, as rich and varied as the contexts in which they are delivered. This variability, while necessary to some degree, also has implications for evidence synthesis.
Heterogeneity of components, characteristics and outcomes meant that some meta-analyses were not possible with the current data set.
This review did, however, unravel some parts of the causal chain, […]
5. Carefully consider proximal and distal outcomes. The evidence presented in this review revealed a gap in interventions that move beyond the interpersonal level to impact community, organisation/service, and structural level outcomes. There is also a need to consider and measure the longer-term impacts of FP interventions and to adopt more uniform methods of outcome measurement. This will facilitate drawing conclusions on the effectiveness of FP approaches in future.

| Implications for research
The analysis identified some gaps in evidence in relation to our review questions that have implications for future research.
First, in relation to the populations under study, few studies were available from South and Central America, the Middle East, and Northern Africa. Within regions, research tended to be focused on particular countries, with only 17 LMICs represented in the review.
Given the importance of local cultural norms as barriers or facilitators of uptake of FP, much more evaluation research is needed internationally, with research funds targeted at countries in which there is unmet need for FP and robust evidence is lacking. It would also be valuable for future reviews to collate data on whether studies are conducted in urban or rural settings. Data collection for this study occurred during the COVID-19 pandemic, so we examined the data for studies that took place during disease outbreaks. We found none.
Equally, we found no studies that took place in conflict, disaster, or climate-stressed contexts. Given the continued impact of these factors across the world and their potential implications for increasing unmet need for FP, further research in these settings is urgently needed.
In relation to intervention and study characteristics, we found that reporting within studies was variable, with many studies not following recommended reporting guidelines. We extracted PROGRESS-Plus criteria (O'Neill et al., 2014) when they were available, but these details were too sporadically reported to include in the analysis.
Similarly, some studies provided insufficient or unclear information on intervention characteristics, with a variety of terms used for the same components. This made it very difficult to code and categorise data. Future intervention evaluation studies should use recognised behaviour change terminology such as that proposed by Michie et al. (2011) and should follow appropriate intervention reporting guidelines such as TIDieR.
At the outcome level, there were few studies that examined interventions delivered beyond the individual or interpersonal levels.
There is much room for programme planners and evaluators to intervene at these levels, as recommended elsewhere (Ruane-McAteer et al., 2020). Further, none of the included studies reported the use of participatory designs, an approach that is recommended for future work to ensure the relevance of intervention and study designs for particular contexts. More research is also needed on intervention designs that integrate male involvement in FP with maternal and child health programmes. Some included studies reported promising results, but the studies were too few to conduct meaningful analysis.
A related recommendation for researchers concerns how outcomes are measured across included studies. We found high heterogeneity in how outcomes were measured across studies, which used different assessment methods, outcome measures, timings and methods of reporting. Very few studies distinguished between primary and secondary outcomes. This makes synthesis and meta-analysis of results challenging. Research using standardised measures is highly recommended, and reporting of experimental studies should follow the CONSORT checklists.
A further recommendation relates to the scarcity of economic evaluations of these interventions. Only three of the included studies ([…] Diop et al., 2004; Townsend et al., 1987) examined cost-effectiveness, so research exploring this important factor is urgently needed. Similarly, all interventions included in this review adhered to binary and cis-normative concepts of gender identity and sexuality.
Family planning remains a pertinent issue for those identifying as LGBTQI+, with authors noting that even those who have transitioned socially or hormonally are in need of support to ensure they can achieve their desired family size (Francis et al., 2018). The experiences of transgender individuals remain critically underinvestigated in relation to family planning. Given the novel and unmet need in this group, further research is called for on involving transgender men in family planning.
Finally, notably absent from the interventions included in this review were behavioural interventions that support those who do not ever wish to become parents. Given the reported pressures placed on young couples to engage in childbearing noted in this review and increasing trends of individuals deciding to delay or avoid parenthood (Mauceri & Valentini, 2010; Nomaguchi & Milkie, 2020; Umberson et al., 2010), it is likely that this subgroup of people represents a significant yet neglected population that deserves the attention of future research.

PRELIMINARY TIMEFRAME
The preliminary timeframe for submission of the completed review was one year following protocol publication. This was delayed by two months (the protocol was published in January 2021 and the completed review submitted in March 2022).

PLANS TO UPDATE THIS SYSTEMATIC REVIEW
The authors seek support to update the results of this review in line with emerging evidence in the field. We anticipate that the need for the next update will be considered in 5 years.

DIFFERENCES BETWEEN PROTOCOL AND REVIEW
Deviations from the review protocol are presented in Section 4.2.1.