Prioritization of Outcomes in Efficacy and Effectiveness of Alcohol Brief Intervention Trials: International Multi-Stakeholder e-Delphi Consensus Study to Inform a Core Outcome Set

Objective: Outcomes used in alcohol brief intervention (ABI) trials vary considerably. Achieving consensus about key outcomes can enhance evidence synthesis and improve healthcare guidelines. This was an international e-Delphi study to prioritize outcomes for ABI trials as one step in a larger effort to develop an ABI core outcome set (COS). Method: A total of 150 registrants from 19 countries, representing researchers, policymakers, and patients, participated in a two-round e-Delphi study. In Round 1, participants (n=137) rated 86 outcomes, derived from a review of the literature and a patient and public involvement panel, by importance. In Round 2, participants (n=114) received feedback on importance ratings for each outcome and a reminder of their personal rating before rating the outcomes for importance a second time. Seven additional outcomes suggested in Round 1 were added to the Round 2 questionnaire. We defined consensus a priori as 70% agreement across all stakeholder groups. Results: Seven consumption outcomes met inclusion criteria: typical frequency, typical quantity, frequency of heavy drinking, alcohol-related problems, weekly drinks, at-risk drinking, and combined consumption measures. Others meeting the threshold were: alcohol-related injury; quality of life; readiness to change; and intervention fidelity. Conclusions: This is the first international e-Delphi study to identify and prioritize outcomes for use in ABI trials. The use and reporting of these outcomes in future ABI trials should improve evidence synthesis in systematic reviews and meta-analyses. Further work is required to refine these outcomes into a COS that includes guidance for measurement of outcomes.


Introduction
Alcohol brief interventions (ABIs) have emerged as the main approach to addressing hazardous and harmful alcohol use in a range of settings, including primary care, emergency departments, hospitals, online, criminal justice, workplaces, probation, and universities.
According to NICE guidance PH24 (National Institute for Health and Clinical Excellence, 2010), ABIs are suitable for non-treatment-seeking alcohol users aged 16 or over who are currently experiencing, or are at risk of experiencing, problems from their alcohol use. ABIs are behavioral and/or motivational interventions designed to help drinkers reduce their alcohol consumption. They typically consist of a short, single session of feedback and tailored advice (brief advice), or longer, motivationally-based interventions that explore motivations for drinking and personal barriers to change (extended BI) (Cunningham et al., 2017). Essential components of ABIs are defined here as the assessment of personal alcohol use and tailored feedback provided directly to the drinker. Systematic reviews of ABI trials do not always agree on the efficacy and effectiveness of ABIs to change alcohol use (e.g., Davoren et al., 2016; Kaner et al., 2018; Khadjesari et al., 2011; White et al., 2010). There are many possible reasons for this disagreement, such as changes over time in the populations studied, changes in baseline drinking, variability in ABI content and reporting, and variation in inclusion and exclusion criteria. An avoidable source of disagreement in the literature, however, arises from the wide variation in outcomes used and the difficulty in combining diverse outcomes in meta-analyses (Cumming, 2013; Kaner et al., 2018). This can be variation in 'what' outcome is measured or, for a given outcome, variation in 'how' the outcome is measured.
Given the increasing role of systematic reviews and meta-analyses in determining health policy and given the potential for outcome heterogeneity to compromise these reviews and analyses, there is a growing effort across a wide range of disciplines and disease categories to standardize trial outcomes. The importance of standardizing trial outcome measurement is recognized by the Standard Protocol Items: Recommendations for Interventional Trials (SPIRIT; Chan et al., 2013) and Consolidated Standards of Reporting Trials (CONSORT; Moher et al., 2010) statements; both statements recommend the use of a well-designed core outcome set (COS). A formal process for defining a COS has been established by the Core Outcome Measures in Effectiveness Trials (COMET) Initiative (Williamson et al., 2017), and numerous studies using this process have been undertaken (Gargon et al., 2017). Given the current lack of a COS for ABI trials and the increasing importance of ABIs in alcohol policies worldwide, an ABI COS is urgently needed.
The selection and application of a COS is relevant to all ABI stakeholders, including beneficiaries of ABIs (service users), practitioners, and policymakers. A COS ensures that outcomes meaningful to service users are routinely considered in clinical trials and that policymakers' perspectives are reflected in trial outcomes. Without a COS, the selection of trial outcomes remains at the sole discretion of the involved researchers, whose decisions about which outcomes to include may be influenced by implicit biases and may cause unnecessary heterogeneity in the outcomes used across trials. The systematic review which informed this work assessed what outcomes were used and how they were measured in all ABI trials since 2000 across all settings. Briefly, in 405 eligible trials (out of 33,134 studies screened), 2,641 outcomes were reported, measured in approximately 1,560 different ways (Shorter et al., under review). As every researcher has the opportunity to select from a range of outcomes, better standardization of the minimum requirement to measure change will maximize the potential of ABI research to influence decision-making, as it has in other research areas such as eczema or rheumatoid arthritis (Boers, 1994; Schmitt et al., 2011).
To achieve improved standardization of the outcomes used in ABI trials, the Outcome Reporting in Brief Intervention Trials: Alcohol (ORBITAL) project is working to establish a COS for ABIs using COMET procedures. Endorsed by the Group (RMS-SIG), ORBITAL undertook three inter-related efforts to establish an ABI COS. The first was a comprehensive systematic review to determine what outcomes are reported in ABI trials (Shorter et al., under review). The second, and the focus of this paper, was an international, multi-perspective e-Delphi consensus study to prioritize outcomes for use as a minimum set of reported outcomes in all ABI efficacy and effectiveness trials. The third step in the COMET process is to recommend a final set of specific measures using criteria recommended by the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) initiative (Mokkink et al., 2016).
Although ORBITAL is developing the COS as a minimum data standard for ABI trials, trials can use other measures alongside the COS as appropriate. Based on the COMET methodology, ORBITAL places importance on involving a wide range of stakeholders in the development of the COS, and in particular the client group considered to benefit most from an efficacious or effective intervention. As reported in this paper, we used an e-Delphi approach to understand what outcomes are priorities for ABI stakeholders in addition to prioritizing outcome domains for use in ABI trials.

Method
We conducted an international, multi-stakeholder e-Delphi consensus study by generating a list of relevant outcomes which participants were asked to rate in two successive rounds. The e-Delphi approach is an iterative consensus technique which presents a series of sequential questionnaires asking individuals to rank outcomes in terms of priority for inclusion in a COS for ABI efficacy or effectiveness trials. Ethical approval was granted by the School of Health and Social Care Ethics Committee at Teesside University (reference: 018/17).
The e-Delphi is an online implementation of the Delphi approach for consensus building (Hasson & Keeney, 2011). The e-Delphi approach solicits the opinions of thought leaders and experts on a particular topic in successive rounds, with each round providing input into the next (Sinha et al., 2011). All Delphi studies use at least two rounds, but some use three or more. E-Delphi panelists are informed of the results of prior rounds and allowed to revise their opinion based on those results. The goal is to achieve some predefined threshold of consensus. Key to the e-Delphi approach is the anonymity of panelists.
By ensuring panel members remain anonymous throughout the process, panelists are free to revise their opinion without fear of reputational harm or to refuse to revise their opinion without pressure from the group to do so (Hasson, Keeney & McKenna, 2000).

Participants
There are no accepted guidelines for panel size to achieve stable consensus in an e-Delphi study. As such, we were guided by practicality, scope, and time available (Blackwood et al., 2015). Consistent with the purposive or criterion sampling approach used by many Delphi studies (Hasson et al., 2000), our sampling strategy focused on identifying electronic forums used by the relevant stakeholders and then allowing the sample to evolve organically as stakeholders shared the invitation to participate. Our approach is best described as purposive snowball sampling. Participants were recruited in the following ways: emails to relevant mailing lists of researchers, practitioners, and policymakers in the field, such as the INEBRIA Google group; emails to corresponding authors of ABI trials identified by the systematic review; a tweet circulated from the @teamalphatees account at Teesside University; and emails forwarded by recruited participants to additional contacts with relevant expertise.
Participants were recruited between July 4th and August 1st, 2017. Those recruited included trial investigators, INEBRIA RMS-SIG members, executive leadership of scientific organizations, Cochrane Review Group on Drugs and Alcohol members, NICE alcohol use disorder prevention PH24 membership group, trialists, statisticians, COS developers, service users/patient and public representatives, practitioners, groups involved in developing ABI clinical guidance, funders, and research ethics committee members.
Participants often held multiple roles. Consistent with COMET methodology, we included researchers, policymakers, and service users/patients in our sample of experts to ensure a broad representation of opinions. The patient perspective is particularly important to the COMET methodology since user input into this kind of study adds the lived experience of the alcohol consumer. Outcomes perceived to be relevant by stakeholders further removed from the user experience can appear less relevant to users (Henihan et al., 2015; Henihan et al., 2016). Also, users can suggest outcomes not immediately apparent as important to researchers. Recruitment text and round instructions are available from the corresponding author. To minimize attrition, recruitment text stressed the importance of completing both rounds.
Given the organic and evolving nature of our recruitment procedure, it is impossible to say how many individuals received an invitation to participate during the window of recruitment. For example, the INEBRIA Google group has 653 members, but not all members actively monitor the group. Among those members who do monitor the group, we cannot determine how many forwarded the invitation to colleagues. Similarly, we have no way of tracking how many of the 458 @teamalphatees Twitter followers saw the e-Delphi invitation. Furthermore, the anonymous nature of the e-Delphi made it impossible to track, in any systematic way, the acceptance rate of the approximately 250 invitation emails we sent. Thus, we cannot provide a response rate in a traditional sense. The relevance of a traditional response rate is unclear, however, given the purposive, anonymous, snowball sampling approach we used. Therefore, rather than focus on a response rate per se, we monitored the number of panelists in each of the three basic stakeholder types: researcher, policymaker, and patient.
We did not attempt to "balance" the participants across types but rather tried to ensure a sufficient number of each type.

The Delphi questionnaire and rounds
The e-Delphi used a bespoke online e-management system, 'DelphiManager', maintained by the COMET Initiative to facilitate core outcome set development. In both rounds, participants scored each outcome using the Grading of Recommendations Assessment, Development, and Evaluations (GRADE) scale of 1-9, with 1-3 labelled 'not important for inclusion', 4-6 labelled 'important but not critical', and 7-9 labelled 'critical for inclusion' (Guyatt et al., 2011). Outcomes were derived from the first 100 papers in a systematic review of existing ABI effectiveness and efficacy trials (Shorter et al., under review; registered at PROSPERO, CRD42016047185; Shorter et al., 2016). These papers were not randomly selected but did represent a range of ABI settings and the full spectrum of years from 2000 to 2016. Current or former hazardous drinkers (n=9) formed a patient and public involvement panel, some from an established service user representative group (Belfast Experts by Experience), and others known to the lead author as drinking hazardously or above (and not researchers, clinicians, or members of other professional groups related to drinking or other addictive behaviors). The hazardous drinking individuals on this panel (n=5) were recruited through personal invitation from the lead author, and were verified as hazardous drinkers by an AUDIT score of eight or more. The patient and public involvement panel added further outcomes to the questionnaire at Round 1. Every outcome was given a descriptor. The outcomes were discussed and refined for clarity by the patient and public involvement panel and the authors.
In Round 1, there were 86 outcomes presented to participants. Participants could add additional outcomes and comment on the reason for their outcome ranking. Suggested outcomes from Round 1 were reviewed and coded to determine their novelty (i.e. that they were not covered by existing outcomes in the questionnaire). The additional outcomes and decisions made can be seen in supplementary material A. Round 2 included the 86 original outcomes, the seven additional outcomes, the individual's personal ranking, and rankings grouped by stakeholder group (researchers, healthcare and other professionals, and service users/representatives). Round 1 and Round 2 both used the same GRADE ranking system.
All those who registered in Round 1 were invited to take part, with Round 2 closing on September 12th, 2017. Consensus was defined a priori as 70% or more of the respondents scoring an outcome from seven to nine and fewer than 15% scoring it one to three (Blackwood et al., 2015; Eleftheriadou et al., 2015). An outcome meeting this definition is judged critically important by the majority and of little or no importance by only a small minority. Although there is no formal guidance for the reporting of e-Delphi studies, we followed recommendations by Sinha et al. (2011). Participants received no financial incentive to participate.
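The a priori consensus definition can be written out as a short check. The following is a minimal Python sketch for illustration only; it is not the DelphiManager implementation. The function names, the example ratings, and the choice to drop missing or 'unable to score' responses (represented as None) from the denominator are all assumptions.

```python
from typing import Iterable, Optional


def grade_band(score: int) -> str:
    """Map a 1-9 GRADE rating to its labelled importance band."""
    if not 1 <= score <= 9:
        raise ValueError(f"GRADE ratings run from 1 to 9, got {score}")
    if score <= 3:
        return "not important for inclusion"
    if score <= 6:
        return "important but not critical"
    return "critical for inclusion"


def meets_consensus(
    ratings: Iterable[Optional[int]],
    critical_threshold: float = 70.0,      # at least 70% scoring 7-9
    not_important_ceiling: float = 15.0,   # fewer than 15% scoring 1-3
) -> bool:
    """Return True when one outcome meets the a priori consensus rule.

    Assumption: None marks a missing or 'unable to score' response and is
    excluded from the denominator.
    """
    valid = [r for r in ratings if r is not None]
    if not valid:
        return False
    bands = [grade_band(r) for r in valid]
    pct_critical = 100.0 * bands.count("critical for inclusion") / len(valid)
    pct_not_important = 100.0 * bands.count("not important for inclusion") / len(valid)
    return pct_critical >= critical_threshold and pct_not_important < not_important_ceiling


# Hypothetical ratings from 20 panelists: 75% critical, 10% not important.
example = [8] * 15 + [5] * 3 + [2] * 2
print(meets_consensus(example))  # True
```

The same function could be applied to the pooled panel or separately to each stakeholder group; which of these a study uses is a design decision that the sketch does not settle.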

Results
There were 150 total registrants. Overall, 137 answered at least one question in Round 1 (including five partial completions) and 114 answered at least one question in Round 2 (including 10 partial completions); these respondents are referred to as participants. In total, 107 took part in at least one question in both rounds, 30 completed Round 1 only, seven completed Round 2 only, and seven registered but did not complete either round. A single person's response contributed between 0.7% and 1.1% (Round 1), and between 0.9% and 1.4% (Round 2), to a percentage total (the range reflects missing data or 'prefer not to answer' responses). As noted in a systematic review (Boulkedid et al., 2011), few Delphi studies report response rates for all rounds, so it is difficult to determine if our rate of attrition from Round 1 to Round 2 is typical. This same review found the median number of invited participants in Delphi studies was only 17. Thus, we conclude that our sample size is more than sufficient and, since our sample is intentionally purposive rather than representative, we also conclude that attrition from Round 1 to Round 2 is not problematic.
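The per-response contribution quoted above follows from simple arithmetic: one response out of n valid responses moves a percentage total by 100/n. The minimal Python sketch below (the function name is hypothetical) shows that the lower ends of the reported ranges, 0.7% and 0.9%, are consistent with the full Round 1 and Round 2 denominators of 137 and 114. The upper ends would arise on items with fewer valid responses; those item-level denominators are not reported here, so they are not reproduced.

```python
def share_per_response(n_valid: int) -> float:
    """Percentage-point contribution of one response among n_valid responses."""
    if n_valid <= 0:
        raise ValueError("n_valid must be a positive integer")
    return 100.0 / n_valid


# Full denominators for each round, as reported in the text.
print(round(share_per_response(137), 1))  # 0.7 (Round 1)
print(round(share_per_response(114), 1))  # 0.9 (Round 2)
```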
Details of participants/registrants are given in Table 1. The largest proportions of respondents were researchers, female, and from the UK or USA. Because ABI "patients" are most often hazardous drinkers who are not treatment seeking, and often do not consider themselves to be alcohol patients, this group was the most problematic to identify and recruit, and consequently had the lowest representation across participant types. In total, participants were from 19 countries (several noted 'other' but without stating country name). The majority had been involved in at least one ABI trial (70.7%) and around a quarter had been involved in four or more. Most participants had no experience of reviews of ABIs or of developing measurement instruments (59.3% and 60.7% respectively). In addition, the majority had no experience with core outcome set development (71.3%). Of those with previous experience, most had been involved in developing one core outcome set. The majority of ABI trial experience was in a healthcare setting: 38.0% had experience in alcohol or drug treatment settings; 36.7% in primary care; outpatient and inpatient care both had 33.3% each; and 31.3% in emergency care settings.
The ranking of consumption measures is given in Table 2. Based on Round 1 ranking, four met the 70% threshold. On review by participants in Round 2, seven met this criterion. These were: typical frequency; frequency of heavy drinking; number of drinks in a week; hazardous or harmful drinking; alcohol-related problems; combined consumption measure; and typical quantity. There was least change in views on alcohol-related problems, with an increase of 0.8% in those ranking this outcome 'critical for inclusion' between rounds. By contrast, the largest increase in those ranking 'critical for inclusion' was in the typical quantity outcome which increased by 15%.
Rankings of the remaining domains are given in Table 3. Biomarkers were typically under-ranked, with a higher proportion selecting 'unable to score' than in any other domain. Of those that were ranked, the highest ranked were levels of phosphatidylethanol and alanine aminotransferase, but none met the threshold for scores in either the 'critical for inclusion' range or the 'not important for inclusion' range. In the resource use and economic factors domain, none met the 70% threshold for those in the 'critical for inclusion' range. However, four met a lowered 60% threshold. These were: alcohol-related injury; use of alcohol or drug treatment; emergency healthcare; and hospitalization. In the life impact domain, the highest ranked outcome was quality of life. This outcome was ranked 79.4% in the 'critical for inclusion' range at the end of Round 2, reflecting an increase of 16% from the corresponding range in Round 1. Only one of the health domain outcomes met the lowered criterion of 60% in the 'critical for inclusion' range: psychological or mental health (64.7% in the 'critical for inclusion' range at Round 2). Only one item from the psychological factors domain met the 70% threshold of 'critical for inclusion' range.
This was 'interest in making changes around alcohol use'. Finally, only one item in the intervention factors domain was ranked as 'critical for inclusion'. This related to whether the intervention was delivered as planned. The ranking for this item was 81.4% in the 'critical for inclusion' range in Round 2.

Discussion
Given that ABIs are a key component of alcohol policies worldwide, it is vital that policy makers, service commissioners, and practitioners are able to access and synthesize robust, consistent evidence to inform their implementation (Babor et al., 2007; Bernstein et al., 2010). A key factor currently impeding existing evidence synthesis efforts is a lack of standardized outcomes used in ABI trials. As seen in other fields, standardization of outcomes will improve the ability of others to synthesize and evaluate the literature. Thus, the COMET Initiative has developed a formal, multi-phase methodology (Williamson et al., 2017) that researchers can use to establish a core outcome set (COS).
As part of the larger, multi-phase ORBITAL project endorsed by INEBRIA, this study is one step in establishing a COS for ABI trials using the COMET methodology. ORBITAL aims to simplify and inform future ABI trial decision-making (Daykin et al., 2016; Daykin et al., 2017) and move beyond individual trial researcher preference as the primary vehicle by which outcomes are chosen to one of consensus between stakeholders.
This study presents the results of the ORBITAL e-Delphi study and is the first attempt to seek international, multi-stakeholder perspectives on which outcomes should be prioritized for ABI trials.
The results of our e-Delphi study suggest that considerable standardization of outcomes used in ABI trials is possible. A systematic review conducted as part of the larger ORBITAL effort (Shorter et al., under review) found that 2,641 outcomes, measured in approximately 1,560 different ways, were reported in ABI trials, suggesting enormous variability in the outcomes that the ABI research community prioritizes. Yet our e-Delphi study found only nine outcomes met our a priori consensus threshold, seven of which were related to alcohol consumption. Relaxing our a priori threshold resulted in an additional five outcomes, four of which were related to healthcare use. Thus, our e-Delphi study suggests that much of the variation in the outcomes used in the ABI literature is driven by idiosyncratic decisions by individual researchers regarding the specific outcomes for any given trial rather than by a fundamental diversity of relevant outcome domains. If this conclusion is correct, then the ORBITAL effort to develop an ABI COS will greatly improve the ability of ABI researchers to provide consistent, policy-relevant evidence across studies on the outcomes they view as most important.
The validity of this conclusion, however, depends on the composition of our e-Delphi panel. As recommended by current best practice guidelines, we included a diverse set of panelists in our e-Delphi (Blackwood et al., 2015). Participants were from a range of countries (19 countries across six continents) and stakeholder groups (researchers, policymakers, and service users/patients) in order to capture a broad range of perspectives.
Most were from the UK or the USA, however, and participants from South America and Asia were under-represented. We must therefore be cautious about the cultural relevance of prioritized outcomes in these locations (Hula et al., 2014). Furthermore, most panelists were researchers, which may have over-represented the consensus views of ABI researchers compared to other vital perspectives such as those of healthcare professionals, policymakers, and patient or public representatives. Diverse perspectives are likely to result in wider acceptance of the prioritized outcomes deemed critical to include in ABI studies, although we note priorities may differ in different participant groups and ABI settings (Hula et al., 2014).
For example, despite a wide range of critical outcomes identified by the panel, no critical outcomes were identified in the biomarkers domain. We can only speculate as to why no biomarker measure made it to the critical measure threshold. It may be that biomarker measures were less well understood by our online Delphi participants. However, it is also important to note they are less commonly reported in ABI trials (Kypri, 2007). This may be because biomarkers are generally considered more relevant to dependence, have poor sensitivity and/or specificity, or are inconvenient to use in comparison to self-report (Allen & Litten, 2003; Babor et al., 2000). Despite the lack of biomarker measures, our Delphi study identified outcomes across six domains, broadening the types of outcomes typically considered by any given ABI trial, while at the same time offering the possibility of standardizing outcomes across studies. This broadening highlights the importance of selecting outcome measures based on a consensus of the field rather than simply relying on what has been measured in prior research (Sinha et al., 2011).
Although our online Delphi study is the most rigorous attempt to identify the appropriate outcome measures for ABI trials thus far, it is subject to some limitations. There is ambiguity as to what constitutes consensus (Sinha et al., 2011) and so our a priori choice of 70% agreement is subject to possible criticism. Although we attempted to balance perspectives within our Delphi panel, difficulties in recruiting some participant types, particularly policymakers and patients, may have skewed the overall panel recommendations towards a researcher perspective. Similarly, the predominance of English-speaking countries among our panelists, especially the UK and the USA, may also have influenced our results and suggests caution with regard to the generalizability of our findings to non-English-speaking and to low- or middle-income countries. Finally, given the nature of recruitment into our Delphi panel, it is not possible to determine the true response rate to our Round 1 invitation. This, combined with attrition between Rounds 1 and 2, may limit the validity of our results. Our use of anonymous voting and the diverse composition of our panel, however, adds to what is known about outcome priorities in the ABI field.
This study is the first attempt to identify outcomes using consensus methods for consideration in a core outcome set in ABI trials. It prioritizes outcomes that are most important to a range of key stakeholders in the field and will help guide researchers in choosing outcomes in future trials. The items prioritized here will be useful to improve evidence synthesis in future systematic reviews in the field. However, the prioritization of these outcomes is a dynamic rather than fixed process. More research is needed to: a) further prioritize these outcomes into a core outcome set for all trials of ABIs; b) replicate this priority list over time and in under-represented groups; c) identify the best measures to represent these outcomes; and d) determine if the adoption of these recommended outcomes improves standards in the field. The ORBITAL project, with oversight from the INEBRIA RMS-SIG, is pursuing these next steps to fulfill its charge of developing a consensus-based ABI COS to help drive the future of ABI research.

Tables
Suggested outcome | Outcome wording (descriptor) | Decision
EtG - Ethyl glucuronide on head hair (7); Biomarkers: Hair analysis for ethyl-glucuronide (4) | Analyzing hair for ethyl-glucuronide (ethyl-glucuronide is present in hair up to 3/4 days after alcohol is consumed; it shows if someone has had an alcoholic drink recently) | Not currently present
Clinician Satisfaction (7) | Clinician satisfaction (Clinician satisfaction with the alcohol brief intervention) | Not currently present
Broader measures of harm to others, e.g. domestic violence for economic outcomes (7) | Alcohol causing harm to other people, often called 'harm to others' (The general impact that a person's alcohol has on other people than the drinker) | This was partially covered by other issues (such as alcohol-related offences, or role and relationship factors) but was considered a distinct composite outcome
Improved social aspects - finance (6) | Improvement in finances (Changes in the amount of money a person has (either more or less) as a result of change in how much is being spent on alcohol) | Not currently present
WBAA (6) | Levels of whole blood-associated acetaldehyde (whole blood-associated acetaldehyde tests can detect heavy alcohol consumption through the presence of acetaldehyde (a by-product of alcohol consumption) for up to three weeks following use) | Not currently present

Suggested outcomes covered by existing outcomes
Concomitant use of Tobacco; Marijuana; Other Drug (8) | The use of alcohol with another drug | Was covered by existing outcome
Self-reported general health (8)
(7) | Use of drug or alcohol treatment services provided in a healthcare setting or by a healthcare professional | Was covered by existing outcome; timing is a measurement issue to be covered by guidance on how to measure an outcome if selected for core outcome set
Likelihood of seeking to change alcohol use in the future (6) | Interest in making changes around alcohol use | Was covered by existing outcome; timing is a measurement issue to be covered by guidance on how to measure an outcome if selected for core outcome set
Other sources of help (current) (6) | Seeking help for alcohol or drugs not from a healthcare provider | Was covered by existing outcome; timing is a measurement issue to be covered by guidance on how to measure an outcome if selected for core outcome set
Improved social aspects - employability or education (general) (6) | Workplace or college productivity | Was covered by existing outcome
Number of arrests (6) | General criminal justice costs | Was covered by existing outcome