A novel decision-making model for selecting a construction project delivery system

It is crucial for the owner of a construction project to select an appropriate project delivery system (PDS) during the early decision-making stages of the project. Due to project uncertainty or a lack of project information, the parameters of a PDS are difficult to measure and quantify. Therefore, there are still major challenges to the objective selection of PDSs. This research proposes a novel systematic decision-making model to select the appropriate PDS by combining case-based reasoning (CBR) with a robust nonparametric production frontier method. The Z-order-m method, supported by Bayesian structural equation modeling (SEM), is integrated into the case retrieve process of the traditional CBR method in order to eliminate deteriorative internal and external influences on PDS selection. A case study based on a questionnaire survey conducted in China was used to test the validity of the proposed model. The findings reveal that the systematic decision-making model can overcome some problems of the traditional methods and improve the accuracy of PDS selection. As a result, this research has both theoretical and practical implications for the construction industry.


Introduction
Project delivery system (PDS) is one of the crucial factors that influence the success of a construction project. It stipulates the project owner's management functions during the project and reflects the roles, responsibilities and risks of project parties (American Society of Civil Engineers, 2012). Once a PDS is selected, it cannot be changed during the implementation of a project. The management functions of a project owner can work effectively only when the appropriate PDS and contract strategy are selected (Anderson & Oyetunji, 2003). Project performance problems, such as schedule delays, cost overruns, and quality defects, are often attributed to an inappropriate selection of PDS at the beginning of a project (Mostafavi & Karamouz, 2010; Minchin et al., 2013; Khanzadi et al., 2016).
Due to the uncertain situation and the multiple criteria to be considered, it is difficult for the owner of a construction project to select an appropriate PDS in the early stage of the project (Mafakheri et al., 2007). To solve this well-defined problem, multi-criteria decision-making methods, including the analytical hierarchy process (AHP), the multi-attribute utility method, case-based reasoning (CBR), and data envelopment analysis (DEA), as well as other methods such as artificial neural networks (ANN) and their variants, have been extensively applied to select the most appropriate PDS (Khanzadi et al., 2016). The literature on PDS selection methods is listed in Table 1. These previous studies evaluated different PDSs by relying mainly on experts' experience rather than on the characteristics of the project itself; in particular, they ignored the interference of the external environment.
AHP, designed by Saaty (1972), streamlines a complex problem into a hierarchy structure and elicits preferences by converting subjective comparisons of relative importance into overall scores or weights. Several studies have used AHP either to select the PDS (Alhazmi & McCaffer, 2000; Oyetunji & Anderson, 2006) or to analyse the indicators of PDS selection (Al Khalil, 2002; Mahdi & Alreshaid, 2005). To address the imprecision or uncertainty in this decision problem, rough sets (Mafakheri et al., 2007) and fuzzy sets (Khanzadi et al., 2016) were combined with AHP in order to increase the accuracy of selection. However, subjectivity still exists to a certain extent due to the preference structure of AHP and its strong dependence on the evaluating experts' experience (Chang & Ive, 2002; Chen et al., 2010).
To overcome the shortcomings of AHP, many scholars, such as Chan et al. (2001) and An et al. (2018), have tried to avoid the hierarchical structure of AHP and directly applied the multi-attribute utility method to the selection of PDSs. The overall utility of different PDSs is calculated by multiplying the weights by the utility of the indicators. This method still relies on the experience of experts to determine the weights of the indicators, which makes it challenging to ensure the validity and reliability of the whole selection model. Statistical approaches, including descriptive statistics (Ojo et al., 2011), the t-test (Minchin et al., 2013), and principal component analysis (Qiang et al., 2015), were also used to make the selection through objective description or comparison. However, it is difficult to capture the general statistical features of each PDS because construction projects are characterized by diversity in project nature and client objectives (Luu et al., 2005).
To avoid subjective influence (e.g. assigning weights or values) and simplify calculations, DEA and ANN methods were applied separately or integrated into the process of PDS selection. As a nonparametric method to study the production efficiency of inputs and outputs (Charnes et al., 1978), various DEA models were adopted to measure the efficiency of different PDSs as a pre-processing step (Lo et al., 2007; Chen et al., 2010, 2011; Shi et al., 2014). Based on the results, fuzzy logic (Shi et al., 2014) or ANN (Chen et al., 2011) was then used to choose the appropriate PDS. In addition, without any prior knowledge, ANN can automatically generate identifying characteristics (output) from the learning data sets (input) that it processes. Taking advantage of this characteristic, Gazder et al. (2018) utilized ANN independently to determine the PDS. However, these studies mainly focus on the input and output of the model, ignoring the environmental factors and the intermediate process that may have a large effect on the result.
CBR is a method based on rule-based reasoning and does not depend on the judgment of experts. CBR offers a paradigm similar to the methods that a decision maker adopts in problem solving. Through this technique, the PDS of a new project can be determined from the PDS of previous similar projects by calculating the similarity between the new project (target case) and historic projects (source cases) (Kumaraswamy & Dissanayaka, 2001; Ribeiro, 2001; Luu et al., 2005). However, these researchers mainly focus on the construction of the indicator framework and the similarity calculation; little attention has been given to the quality of the cases themselves. If the cases are not screened and processed, especially when random noise, measurement errors, extreme points or abnormal values exist in the case base, the accuracy and precision of the solution will deteriorate.
Therefore, it is necessary to develop new methods or techniques to address the shortcomings of the existing methods. This research proposes a hybrid method combining CBR and the robust nonparametric production frontier method to build a novel systematic decision-making model of PDS selection. By utilizing CBR techniques, the owner of a new project (target case) can retrieve potential historic projects (source cases) from a database according to the similarity of the PDS selection preference. The improved nonparametric production frontier method, supported by Bayesian structural equation modeling (SEM), is subsequently used to measure the efficiency of each potential historic project and then to determine the PDS for the new project according to the PDS of the identified historic project with optimal efficiency. It is expected that the accuracy of PDS selection can be improved. This research has four objectives: (1) identifying the indicators and criteria that influence PDS selection for case retrieval; (2) investigating the feasibility of adapting the nonparametric method for case reuse and revise; (3) building the systematic model of PDS selection; and (4) testing the validity of the proposed model through the case study. Each of them is discussed in the following sections.

The types of PDS
A PDS is defined as the contractual arrangement of design, procurement and construction (Khanzadi et al., 2016). Various PDSs are available in the construction industry, including Design-Bid-Build (DBB), Design-Build (DB), Engineering-Procurement-Construction (EPC), Design-Build-Operate, Design-Build-Operate-Manage, etc. (Rowlinson & McDermott, 2005). The extended PDSs refer to those with the nature of financing, such as Design-Build-Finance-Operate and Design-Build-Finance-Maintain (Merna & Al-Thani, 2018), and those with the nature of coordination/management, such as Construction Management (CM) and Management Contracting (MC) (Hughes et al., 2015).

Project performance indicators
PDS selection plays an important role in achieving project success, and many construction studies have analyzed project performance indicators under different PDSs. Konchar and Sanvido (1998) conducted a performance analysis of 351 construction projects that adopted different PDSs (e.g. DB, DBB, and CM at risk) using a series of performance indicators. The majority of these indicators are objective and quantitative, such as cost growth, construction speed, progress growth, etc. Some qualitative performance indicators, such as the difficulty of equipment startup and the cost of operation and maintenance, are also included in Konchar and Sanvido (1998). Thomas et al. (2002) analyzed the impact of PDS selection on project performance in terms of cost, schedule, safety, rework and change. Ibbs et al. (2003) compared DBB and DB in terms of construction period, cost and production efficiency. Ling et al. (2004) and Ojo et al. (2011) predicted time performance, cost performance, quality performance and client satisfaction in DBB and DB projects. A. P. C. Chan and A. P. L. Chan (2004) pointed out that key performance indicators to measure project success include time, cost, quality, safety, commercial value, stakeholder satisfaction, and environmental performance. Based on these studies, particularly Chan et al. (2002), Ling et al. (2004) and Chen et al. (2010), a framework of project performance is outlined in Table 2, in which cost, schedule, quality, safety, contract/business, and others are considered as six main performance categories. Each category consists of 1-4 performance indicators.

PDS selection criteria
PDS selection relies on criteria for decision-making. A considerable number of construction researchers have investigated PDS selection criteria. For example, Oyetunji and Anderson (2006) provided a series of PDS selection criteria, including facilitating control of time growth, ensuring shortest reasonable schedule, facilitating control of cost growth, ensuring lowest reasonable cost, minimizing rate of expenditure, facilitating accurate early cost estimates, promoting early design and purchase of long lead materials and equipment, capitalizing on expected low levels of changes, etc. In this research, Table 3 is adopted and modified from Chen et al. (2010) by adding updated references, in which 20 PDS selection criteria identified from a literature review constitute a framework and are grouped into five categories: schedule, cost, owners and contractors, project, and external environment.
Quality and health/safety are essential requirements for any construction project; however, it is difficult to select a PDS based on quality and health/safety because their importance is equivalent for every project. Therefore, they are not considered as two categories in the framework.

Order-m and Z-order-m methods
As a nonparametric production frontier method that does not rely on restrictive hypotheses about the data source, DEA can measure the production efficiency of inputs and outputs that are continuous and has become increasingly favored (Oh & Shin, 2015). However, the general DEA method must satisfy the assumption of convexity for the production possibility set $p$: if $(x_1, y_1) \in p$ and $(x_2, y_2) \in p$, then $\lambda (x_1, y_1) + (1 - \lambda)(x_2, y_2) \in p$ for any $\lambda \in [0, 1]$, where $(x_1, y_1)$ and $(x_2, y_2)$ denote the input and output of DMU 1 (Case 1) and those of DMU 2 (Case 2), respectively, and DMU stands for decision making unit (Kneip et al., 1998). When input and output variables are discrete, however, this assumption is no longer valid. In addition, the DEA method may deviate to some extent in the parameter estimation. This makes the production frontier constructed from DEA highly sensitive to changes in variable values (Simar & Wilson, 1998).
In comparison to the convex set of DEA that includes all data points (cases or projects), the partial production possibility set of the order-m method proposed by Simar (2003) relaxes the requirement of convexity and excludes extreme points or abnormal values. This makes the order-m method more suitable for construction projects, in which extreme points and abnormal values are inevitable. For a construction project represented by $(x, y)$, where $x$ and $y$ denote its input and output respectively, when it is compared with m random cases rather than with all source cases in the case base of construction projects, the order-m efficiency value can be steadier under exceptional circumstances, such as a schedule extension for a suspended contract or a schedule reduction for a terminated contract.
Besides the above internal influences on a construction project, external environmental factors should not be ignored, since they may influence the production efficiency but are neither inputs nor outputs under the control of the producer (Daraio & Simar, 2005). However, neither DEA nor the order-m method considers environmental factors. Based on the order-m method, Daraio and Simar (2005) further proposed a nonparametric production frontier method, namely Z-order-m, which introduces environmental factors and can therefore analyze their impact on production efficiency values. These robust nonparametric production frontier methods, i.e. order-m and Z-order-m, have been extensively used in different applications, including production risk (Serra & Lansink, 2014), wastewater treatment plants (Guerrini et al., 2016) and health care (Gearhart & Michieka, 2018).
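The idea of benchmarking a project against m randomly drawn cases can be illustrated with a small Monte Carlo routine. The following is a minimal sketch of an input-oriented order-m estimator in the commonly described resampling form, not the implementation used in this research; the function name, the default number of draws and the toy data are assumptions made only for illustration.

```python
import random
from statistics import mean


def order_m_input_efficiency(x0, y0, inputs, outputs, m, n_draws=200, rng=None):
    """Monte Carlo estimate of the input-oriented order-m efficiency of (x0, y0).

    inputs/outputs are parallel lists of tuples, one entry per source case.
    All names and defaults here are illustrative assumptions.
    """
    rng = rng or random.Random(0)
    # Only cases that produce at least as much output as (x0, y0) are comparable.
    peers = [x for x, y in zip(inputs, outputs)
             if all(yk >= y0k for yk, y0k in zip(y, y0))]
    if not peers:
        raise ValueError("no case produces at least y0")
    draws = []
    for _ in range(n_draws):
        sample = rng.choices(peers, k=m)  # m peers drawn with replacement
        # Input contraction factor needed to match the best peer in this sample.
        draws.append(min(max(xj / x0j for xj, x0j in zip(x, x0)) for x in sample))
    return mean(draws)


# Toy case base: two inputs and one output per project (hypothetical values).
toy_inputs = [(2.0, 2.0), (4.0, 4.0), (3.0, 5.0)]
toy_outputs = [(1.0,), (1.0,), (1.0,)]
print(order_m_input_efficiency((4.0, 4.0), (1.0,), toy_inputs, toy_outputs, m=2))
```

Because each draw compares the project with only m peers, a single extreme case affects the estimate only in the draws where it happens to be sampled, and a project can obtain a value above 1 when no sampled peer uses fewer inputs.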

Bayesian-structural equation modeling (SEM)
Regarding production frontier methods, a challenge of dimensionality may exist when there are a large number of inputs and outputs. Some studies, such as Sueyoshi and Goto (2009), have applied the principal component analysis approach to reduce the computational burden of multi-dimensional data when adopting nonparametric production frontier methods in the construction industry. Unlike principal component analysis, which makes no hypotheses about the number of latent factors or the relationship between latent factors and observed variables, structural equation modeling (SEM) tests the structural relationship between latent factors and observed variables (Gupta & Kim, 2008; De Carvalho & Chima, 2014). SEM is divided into a structural model and a measurement model. It integrates confirmatory factor analysis, path analysis and multiple linear regression analysis, among which confirmatory factor analysis and multiple linear regression analysis can be used for dimension reduction (Timothy, 2015). In particular, when there is only one dependent variable, SEM, unlike multiple linear regression analysis, does not need to assume that the observed variables are independent of each other.

A hybrid Z-order-m based CBR
CBR is a paradigm of artificial intelligence and cognitive science. Essential processes to build a typical CBR system include: (1) case retrieve: retrieving similar source cases in the case base according to the target case; (2) case reuse and revise: adapting to the target case by revising former solutions of the retrieved source cases if necessary; and (3) case retain: storing the target case as a new case for future retrievals (Aamodt & Plaza, 1994; Juan, 2009). Case retrieve is the critical process of the CBR method, as its result significantly affects the accuracy and performance of the entire CBR system, and it generally includes case representation and similarity assessment (Zhao et al., 2017). Traditional information systems rely on accurate input in order to produce meaningful outputs (Brock & Khan, 2017). Construction projects are unrepeatable and mutually independent, and the characteristics and external environment of each project are diverse. In such heterogeneous environments, calculating the similarity between construction projects is difficult, and data-cleaning techniques are required to solve the "garbage in, garbage out" problem. Therefore, a two-stage decision support model is developed in this research by utilizing a hybrid Z-order-m based CBR method to select the optimal PDS. Within this model, Z-order-m is integrated into the retrieve process of CBR, which can quantitatively exclude the influence of extreme points, abnormal values and the external environment on the source cases, and therefore enhance the robustness of the CBR operation.

Methodology
This research attempts to integrate Z-order-m into the CBR method. As shown in Figure 1, the CBR method adopted in this research includes two steps, namely case retrieve and case retain. The case retrieve process is divided into three phases: (1) case representation; (2) similarity calculation; and (3) Z-order-m. The three phases are discussed in detail in this section.

Case representation
Case representation is important in CBR, as the rule-based reasoning capability of CBR depends primarily on the structure and content of cases (El-Sappagh & Elmogy, 2015). Cases can be represented in diverse ways, either as simple attribute vectors or as complex object-oriented and tree representations (Richter & Weber, 2013). The choice of a specific representation is predominantly determined by the information stored within a case. According to the above analysis of previous studies on PDS selection, project information, PDS selection criteria and project performance indicators, all of which are collected by a questionnaire survey, can be adopted to represent the source case.
The questionnaire in this research is divided into three parts. In the first part, a respondent is required to choose a completed project he/she has experienced and provide the background information of the project, including duration, budget and PDS type, some of which can be used as inputs of the Z-order-m method. The second part measures the performance of the project using the 14 indicators shown in Table 2. Project performance is measured according to a five-point Likert scale from 1 (poor performance or out of control) to 5 (good performance or well controlled). The performance indicators are used in Z-order-m as outputs. The third part refers to PDS selection criteria in the project, including a total of 20 PDS selection criteria listed in Table 3. Eight PDS selection criteria (marked with "★") may change in two directions. They can be split into pairs of forward and backward criteria. As a result, the total number of PDS selection criteria increases from 20 to 28. In the third part, a respondent is required to rank PDS selection criteria according to their relative importance.

[Figure 1. Diagram labels: Case retrieve; Case retain; Updated case base; Environmental factor]

Similarity calculation
Both the attributes of the PDSs themselves and the time at which the owner chooses the PDS mean that PDS selection is mainly based on the PDS selection criteria rather than on project performance indicators. Although the preferences of the owner of each construction project are not exactly the same, there are still common patterns for a particular PDS, such as risk sharing, cash flow requirements, etc. Therefore, case retrieve can be achieved by calculating the similarity of PDS selection criteria between the new project (target case) and the historic project (source case). Since the PDS selection criteria are linear and simple in structure, the nearest neighbor approach, a popular case retrieve method, is suitable in such circumstances (de Mántaras & Plaza, 1997). A missing value can be filled in with the average value. Eqns (1) and (2) show the calculation of similarity between construction projects for PDS selection when using the nearest neighbor approach.
$$S(T, S) = \frac{1}{n} \sum_{i=1}^{n} \mathrm{sim}(f_i^T, f_i^S) \quad (1)$$
where $S(T, S)$ is the similarity between a target case $T$ (a new project) and a source case $S$ (a historic project) in the case base represented by the revised reference set, $\mathrm{sim}(f_i^T, f_i^S)$ denotes the similarity between Case $T$ and Case $S$ for Criterion $i$ ($i = 1, \ldots, n$) of PDS selection, $n$ is the total number of PDS selection criteria ($n = 28$), and $f_i^T$ and $f_i^S$ stand for the values of Criterion $i$ of PDS selection for Case $T$ and Case $S$, respectively. Since each criterion of PDS selection has a rank number, $f$ takes that rank number; if any criterion of PDS selection is not ranked during the questionnaire survey for any project, 0 is assigned to $f$. Eqn (2) is used to determine $\mathrm{sim}(f_i^T, f_i^S)$:
$$\mathrm{sim}(f_i^T, f_i^S) = \ldots \quad (2)$$
Both $S(T, S)$ and $\mathrm{sim}(f_i^T, f_i^S)$ have a value range of 0-1. The larger the value is, the higher the similarity between Case $T$ and Case $S$. A similarity value of 1 means complete matching between the two cases, whereas a value of 0 means complete non-matching. As a result of the similarity analysis, historic projects that have the highest similarity to a new project are retrieved from the case library to determine the appropriate type of PDS for the new project.
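The retrieval step can be sketched in a few lines of code. The sketch below is illustrative only: it assumes that the overall similarity is the unweighted mean of the per-criterion similarities, and, because the per-criterion formula of Eqn (2) is not reproduced here, it substitutes a hypothetical rank-gap similarity, 1 - |f_T - f_S| / n. The function names, project labels and toy rankings are likewise assumptions.

```python
def criterion_similarity(f_t, f_s, n_criteria):
    """Hypothetical stand-in for Eqn (2): 1 minus the normalised rank gap.

    Ranks run from 1 to n_criteria, with 0 meaning "not ranked", so the
    result always lies between 0 and 1.
    """
    return 1.0 - abs(f_t - f_s) / n_criteria


def case_similarity(target, source):
    """Assumed form of the overall similarity: mean per-criterion similarity."""
    if len(target) != len(source):
        raise ValueError("cases must rank the same number of criteria")
    n = len(target)
    return sum(criterion_similarity(a, b, n) for a, b in zip(target, source)) / n


def retrieve(target, case_base, top_k=3):
    """Return the top_k (name, similarity) pairs, most similar first."""
    scored = [(name, case_similarity(target, ranks)) for name, ranks in case_base.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy example with four criteria instead of 28 (hypothetical rankings).
case_base = {
    "project_A": [1, 2, 3, 4],
    "project_B": [4, 3, 2, 1],
    "project_C": [1, 2, 4, 3],
}
print(retrieve([1, 2, 3, 4], case_base, top_k=2))
```

Any other per-criterion similarity with values between 0 and 1 could be substituted for `criterion_similarity` without changing the rest of the retrieval logic.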

Z-order-m
Construction projects are unrepeatable and mutually independent. From a statistical perspective, outliers (extreme points/abnormal values), which are often seen in construction projects, are detrimental to establishing the production frontier with DEA. Under such circumstances, the order-m method can be treated as a better solution for PDS selection. Moreover, existing studies, such as Smith et al. (2014), show that construction projects are generally influenced by various environmental factors. Therefore, it is necessary to analyze the impact of environmental factors on the production efficiency of construction projects. The Z-order-m model is adopted in this research because it takes environmental factors into consideration. Consequently, the source cases that have optimal efficiency values in the Z-order-m method can be deemed to be cases that are not affected by either outliers or environmental factors.
The Z-order-m method, in this model, is subsequently used to identify the source cases with optimal efficiency values. The potential cases of CBR for PDS selection consist of all the source cases collected from the questionnaire survey. Based on the results of the Z-order-m analysis, the source cases without optimal efficiency values are removed, and the remaining construction projects constitute the optimal source cases of CBR, whose PDS selection criteria are useful to guide PDS selection decision-making in a new project.

Input and output variables of Z-order-m
The nonparametric production frontier methods, including the Z-order-m model, require appropriate input and output variables. According to Lo et al. (2007), duration and budget can be used as the inputs that are necessary to complete a construction project. Similarly, Chen et al. (2011) considered project objectives in terms of duration and budget, as well as personnel, as input variables in a construction project. Duration and budget may vary significantly from one project to another, depending on project scale. At the beginning of a construction project, duration and budget are planned so that project parties can control time and cost performance during the project. On the other hand, the owner of a construction project generally sets a maximum acceptable schedule and a maximum acceptable cost. Rather than using duration and budget directly, this research defines the acceptable schedule variance rate and the acceptable cost variance rate as the schedule and cost inputs.
The project performance indicators are adopted in this research as output variables. Table 2 shows a total of 14 performance indicators. If two inputs (see Eqns (3) and (4)) and 14 outputs are included in the Z-order-m model, there will be a 16-dimensional space. Since the number of inputs and outputs has a key role in determining the number of DMUs required, the curse of dimensionality is a non-negligible challenge when using the nonparametric production frontier methods. To avoid the large variances caused by high dimensions, a reasonable dimensionality has to be kept in the Z-order-m model. Dyson et al. (2001) indicated that the number of observations must be at least two times the product of the number of inputs and the number of outputs. Raab and Lichty (2002) proposed that the number of DMUs must be at least three times the sum of the number of inputs and the number of outputs. Furthermore, Simar and Wilson (2008) revealed that, to maintain the same order of estimation error, the required number of DMUs increases exponentially (not linearly) with the number of inputs and outputs.
As analyzed above, SEM can be used in this research to reduce the dimensionality of project performance indicators. There are different estimation methods for SEM, e.g. the maximum likelihood method and the Bayesian estimation method. The maximum likelihood method requires variables to follow a certain distribution, such as a uniform distribution or a trapeziform distribution. Unlike the maximum likelihood method, the Bayesian estimation method does not have any distribution requirements. In this research, project performance indicators are represented by ordinal variables, and their distribution is unknown. Therefore, this research adopts the Bayesian estimation method for SEM to reduce the dimensionality of project performance indicators.
This research builds a linear structural model that reduces and aggregates the 14 performance indicators into one variable, termed $F_{output}$, which can be used as the output variable for each DMU within the Z-order-m method. The value of the convergence statistic (C.S.) is calculated to test the above-mentioned SEM model. According to Gelman et al. (2014), the model is acceptable if C.S. satisfies 1 ≤ C.S. ≤ 1.10. If the SEM model is effective, Eqn (5) is used to calculate $F_{output}$:
$$F_{output} = \sum_{k=1}^{K} a_k x_k \quad (5)$$
where $a_k$ is the regression coefficient obtained from the Bayesian estimation method for Performance Indicator $k$, $x_k$ is the score of each performance indicator in a project collected from the questionnaire survey, and $K$ is the total number of project performance indicators ($K = 14$). As a result, $F_{output}$ is the aggregated output variable for use in nonparametric production frontier methods, especially the Z-order-m method.
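The two rules of thumb quoted above are simple arithmetic and can be checked directly. The snippet below is a small illustration; the function names are assumptions, and only the two linear rules are encoded (the exponential relationship reported by Simar and Wilson (2008) is not).

```python
def dyson_min_dmus(n_inputs, n_outputs):
    """Dyson et al. (2001) rule: at least 2 x (inputs x outputs) observations."""
    return 2 * n_inputs * n_outputs


def raab_lichty_min_dmus(n_inputs, n_outputs):
    """Raab and Lichty (2002) rule: at least 3 x (inputs + outputs) DMUs."""
    return 3 * (n_inputs + n_outputs)


# Compare the full 16-dimensional setting with a single aggregated output.
for n_in, n_out in [(2, 14), (2, 1)]:
    print(n_in, n_out, dyson_min_dmus(n_in, n_out), raab_lichty_min_dmus(n_in, n_out))
```

With 2 inputs and 14 outputs the two thresholds are 56 and 48 DMUs; with 2 inputs and a single aggregated output they fall to 4 and 9.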
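The aggregation step can be sketched as follows, assuming that Eqn (5) is the weighted sum of indicator scores that the symbol definitions suggest. The coefficient and score values are hypothetical placeholders, not estimates from this research, and the function names are illustrative.

```python
def convergence_ok(cs, upper=1.10):
    """Acceptance check on the convergence statistic: 1 <= C.S. <= 1.10."""
    return 1.0 <= cs <= upper


def aggregate_output(coefficients, scores):
    """Assumed form of Eqn (5): F_output = sum over k of a_k * x_k."""
    if len(coefficients) != len(scores):
        raise ValueError("one coefficient is needed per performance indicator")
    return sum(a * x for a, x in zip(coefficients, scores))


# Hypothetical example with three indicators instead of 14.
a = [0.5, 0.25, 0.25]   # placeholder regression coefficients
x = [4, 2, 5]           # five-point Likert scores for one project
if convergence_ok(1.0240):
    print(aggregate_output(a, x))
```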

Environmental variables of Z-order-m
Whereas the order-m method ignores external variables, the Z-order-m method takes environmental factors into account; it is therefore applied in this research to analyze the impact of environmental factors on the production frontier for PDS selection. Some construction studies, such as Konchar and Sanvido (1998) and Chan and Park (2005), consider intensity as an environmental factor when investigating project determinants. Intensity in these studies refers to unit cost divided by total time. It represents an environmental factor that influences the production process. This research replaces unit cost with project budget and defines the ratio of project budget to project duration as the environmental factor. The ratio represents a new intensity in the project environment.
Based on the input, output and environmental variables, the efficiency value for each source case can be calculated according to the procedures proposed by Daraio and Simar (2005).
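A conditional (Z-order-m) estimate differs from the unconditional order-m estimate mainly in how the m peers are drawn: cases whose environmental value is close to that of the evaluated project are sampled with higher probability. The following is a minimal sketch of that idea under stated assumptions, not the procedure as implemented in this research: it uses an Epanechnikov kernel as one possible compact-support kernel, and the function names, the toy data and the bandwidth value are illustrative.

```python
import random
from statistics import mean


def epanechnikov(u):
    """Compact-support kernel: positive only for |u| < 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def intensity(budget, duration):
    """Environmental factor used here: project budget divided by duration."""
    return budget / duration


def z_order_m_input_efficiency(x0, y0, z0, cases, m, h, n_draws=200, rng=None):
    """Monte Carlo sketch of the conditional input-oriented order-m efficiency.

    cases is a list of (inputs, outputs, z) tuples; h is the kernel bandwidth.
    """
    rng = rng or random.Random(0)
    peers, weights = [], []
    for x, y, z in cases:
        w = epanechnikov((z0 - z) / h)
        # Keep cases with a similar environment that produce at least y0.
        if w > 0.0 and all(yk >= y0k for yk, y0k in zip(y, y0)):
            peers.append(x)
            weights.append(w)
    if not peers:
        raise ValueError("no comparable case within the bandwidth")
    draws = []
    for _ in range(n_draws):
        sample = rng.choices(peers, weights=weights, k=m)  # kernel-weighted draw
        draws.append(min(max(xj / x0j for xj, x0j in zip(x, x0)) for x in sample))
    return mean(draws)


# Hypothetical cases: the third operates in a very different environment.
toy_cases = [((2.0, 2.0), (1.0,), 10.0),
             ((4.0, 4.0), (1.0,), 10.0),
             ((1.0, 1.0), (1.0,), 500.0)]
print(z_order_m_input_efficiency((4.0, 4.0), (1.0,), 10.0, toy_cases, m=20, h=60.0))
```

In the toy data the third case lies outside the bandwidth, so it never serves as a benchmark for the evaluated project even though it uses the fewest inputs.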

Data collection
The questionnaire was distributed to 200 reputable organizations in the Chinese construction industry, such as construction contractors, project owners, management consultants, and suppliers of materials and equipment.
Respondents who had a minimum of ten years of occupational experience were eligible for the survey. In total, 99 responses were returned, among which 96 responses were valid for Parts 1 and 2 of the questionnaire while 67 responses were valid for Part 3. This means that 29 participants did not complete Part 3 and therefore their responses were excluded. As a result, there were 67 valid responses for all three parts. Of the valid responses, more than 70% were from contractors while less than 30% were from owners, consultants and suppliers. The analysis of the valid responses reports the application of different PDSs in the surveyed projects. The PDSs with more than 5% of responses are: DBB (including DBB + MC) (18.48%), DB/EPC (46.74%), Multi-stage DB/EPC (16.30%), and EP + C (8.70%). By comparison, DBB, DB, EPC and their derivative systems are more commonly used in the Chinese construction industry.

Reliability analysis and statistical perspective of questionnaire results
Reliability analysis is used in this research to test the consistency and stability of the project performance indicators collected from the questionnaire survey. The Cronbach's alpha coefficient based on the original data is 0.719, and the coefficient based on standardized data is 0.721. Both coefficients are greater than 0.7, indicating that the surveyed projects' performance information is valid and credible for statistical analysis. PDS selection criteria are then ranked through the analysis of questionnaire responses. The average ranking of PDS selection criteria is listed in Table 4 for the 67 valid questionnaire responses.
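For reference, the raw (unstandardized) Cronbach's alpha can be computed from the item variances and the variance of the total score. The snippet below is a generic sketch of that standard formula with hypothetical Likert responses; it is not the statistical software used in this research, and the function name is an assumption.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Raw Cronbach's alpha; responses is a list of rows, one row per respondent."""
    k = len(responses[0])                      # number of items
    items = list(zip(*responses))              # transpose to one tuple per item
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1.0 - item_variance_sum / total_variance)


# Hypothetical responses from four respondents on three five-point items.
sample = [[4, 5, 4], [3, 3, 2], [5, 5, 5], [2, 3, 3]]
print(round(cronbach_alpha(sample), 3))
```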

The result of Bayesian-SEM
A total of 67 responses are valid for all three parts of the questionnaire, which may not be sufficient for the 16-dimensional space when using the Z-order-m model. For this reason, Bayesian-SEM is used in this research for dimension reduction. The test result of C.S. is 1.0240 after 500 + 48562 iterations using the Bayesian estimation method in AMOS. Consequently, the SEM model is deemed effective. The result can be seen in Table 5. Eqn (5) is then used to compress the 14 performance indicators into one aggregated output variable $F_{output}$.

Analysis results of order-m model
As discussed above, the acceptable schedule variance rate and the acceptable cost variance rate are used in this research as two input variables, and the aggregated project performance indicator $F_{output}$ is treated as the output variable. To exclude the influence of outliers (extreme points, abnormal values), the input and output variables are first substituted into the DEA and order-m models to obtain the input-oriented efficiency values of each source case. The parameter m refers to the order in the order-m and Z-order-m models. The order m of the production frontier can be interpreted as benchmarking against m competitors (Daraio & Simar, 2007). According to Simar and Wilson (2008), the value of m is somewhat less than the size of the sample (the reference set), although there is no standard for the selection of m when using the order-m and Z-order-m methods. Since the sample size in this research is 67 and it is possible to assume that each case (each project) has more than 50% of the sample as competitors, m = 35 is adopted. The results of input-oriented efficiency values are presented in Table 6. Table 6 shows that the order-m efficiency values are greater than or equal to the DEA values in terms of estimated input-oriented efficiency. The production frontier of DEA envelops all the data points (i.e. DMUs), including extreme/abnormal data points. By contrast, the production frontier of order-m envelops neither extreme points nor abnormal ones. Therefore, of the two frontiers, the order-m frontier envelops the smaller area. The distance between a DMU and the order-m frontier is the shorter one, and therefore the order-m efficiency value is the greater one. By comparison, the order-m frontier is not only insensitive to outliers (extreme points/abnormal values) but also easy to interpret in practical terms.

Analysis results of Z-order-m model
Based on the order-m analysis, this case study further explores the influence of the budget/duration ratio as an environmental factor. Daraio and Simar (2007) highlighted the importance of selecting the bandwidth h of the kernel that is used for smoothing in the Z-order-m method. Following the data-oriented procedure adopted by Simar and Wilson (2008), h is determined in this research with the budget/duration ratio as the environmental factor. With h = 60, the estimated efficiency values deviate least (see Figure 2).
As outlined in 3.3, the Z-order-m method introduces environmental factors based on the order-m method. Q_z is the ratio of the estimated efficiency value with an environmental factor (Z-order-m) to that without an environmental factor (order-m). Z is the value of the environmental factor. The budget/duration ratio is adopted in this research as the environmental factor. Figure 3 describes the relationship between Q_z and Z, showing that Q_z decreases as Z increases. In other words, the efficiency under the environmental condition is always smaller than the unconditional efficiency. This is consistent with Simar and Wilson (2008), confirming its correctness in the construction research field.
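A minimal sketch of how the ratio Q_z could be computed is given here. It assumes a conditional order-m estimator in which competitors are drawn with probabilities proportional to a kernel weight on the environmental factor; the Epanechnikov kernel, the synthetic data and all function names are assumptions for illustration, since the kernel and implementation used in this research are not specified here.

```python
import numpy as np

def epanechnikov(u):
    """Kernel weight; zero outside the bandwidth window |u| <= 1."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def order_m_input(X, Y, x0, y0, m, probs=None, n_draws=200, seed=0):
    """Order-m input efficiency; `probs` (optional) weights the competitor draws."""
    rng = np.random.default_rng(seed)
    dom = np.all(Y >= y0, axis=1)                # cases producing at least y0
    ratios = np.max(X[dom] / x0, axis=1)
    p = None
    if probs is not None:
        p = probs[dom] / probs[dom].sum()        # renormalise over dominating cases
    draws = rng.choice(ratios, size=(n_draws, m), replace=True, p=p)
    return draws.min(axis=1).mean()

def q_ratio(X, Y, Z, i, m, h):
    """Q_z for case i: conditional (Z-order-m) over unconditional (order-m)."""
    kernel_w = epanechnikov((Z - Z[i]) / h)      # weight cases with similar Z
    cond = order_m_input(X, Y, X[i], Y[i], m, probs=kernel_w)
    uncond = order_m_input(X, Y, X[i], Y[i], m)
    return cond / uncond

# Synthetic stand-in data (assumption): 67 cases, 2 inputs, 1 output, 1 factor Z.
data_rng = np.random.default_rng(1)
X = data_rng.uniform(0.5, 1.5, size=(67, 2))
Y = data_rng.uniform(0.5, 1.5, size=(67, 1))
Z = data_rng.uniform(0.0, 300.0, size=67)

q = np.array([q_ratio(X, Y, Z, i, m=35, h=60.0) for i in range(67)])
```

Plotting q against Z, with a smoothed trend line, would give the kind of diagnostic shown in Figure 3; the synthetic data here carry no real environmental effect, so no particular trend should be expected from this sketch.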
Based on the 67 valid questionnaire responses, the Z-order-m method is used in this research to calculate input-oriented and output-oriented efficiency values for each source case, considering the budget/duration ratio as an environmental factor. The results are presented in Table 7. Projects with both an input-oriented efficiency value ≥1 and an output-oriented efficiency value ≤1 are regarded as optimal cases. PDS selection for optimal cases is also shown in Table 7. Non-optimal cases are ignored because they do not contribute to the updated CBR case base.
The CBR method is chosen for PDS selection in the current study. The initial reference set of CBR is based on the 67 valid questionnaire responses. After removing the 24 cases (projects) that fail to meet both requirements (input-oriented efficiency value ≥1 and output-oriented efficiency value ≤1), the 43 remaining cases (projects) in Table 7 constitute the revised reference set of CBR, namely the revised case library, which covers six types of PDSs in construction projects: DBB, DBB (early procurement), DBB + CM (at agent), DB/EPC, Multi-stage DB/EPC and EP + C. When using the CBR method, the similarity between construction projects is calculated according to Eqns (1) and (2).
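The filtering rule that produces the revised case library can be sketched as follows. The `Case` record, its field names and the four example rows are hypothetical and serve only to illustrate the two thresholds; they are not values from Table 7.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Case:
    case_id: int
    pds: str
    eff_input: float    # input-oriented Z-order-m efficiency
    eff_output: float   # output-oriented Z-order-m efficiency

def is_optimal(case: Case) -> bool:
    """A case is retained only if it meets both efficiency requirements."""
    return case.eff_input >= 1.0 and case.eff_output <= 1.0

def revise_case_base(cases: List[Case]) -> List[Case]:
    """Drop non-optimal cases to form the revised CBR case library."""
    return [c for c in cases if is_optimal(c)]

# Made-up example rows (assumption), one per possible outcome of the two tests.
cases = [
    Case(1, "DBB", 1.02, 0.97),      # passes both tests: retained
    Case(2, "DB/EPC", 0.91, 0.95),   # input-oriented value below 1: removed
    Case(3, "EP + C", 1.00, 1.00),   # on the frontier: retained
    Case(4, "DBB", 1.05, 1.08),      # output-oriented value above 1: removed
]
retained = revise_case_base(cases)
```

Applied to the 67 source cases, the same rule would leave the 43 optimal cases described above.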

Similarity assessment and accuracy of the PDS selection model
To validate the proposed hybrid Z-order-m based CBR model, this research presents a comparative analysis between the proposed method and the traditional CBR model that does not consider Z-order-m. The case base of traditional CBR includes all 67 initial cases, whereas the case base of the hybrid Z-order-m based CBR model involves only the 43 optimal cases. Each case in the two case bases is selected in turn as the target case, and its similarity to the other cases is calculated using Eqns (1) and (2).
Except for the target case itself, each target case is labelled with the PDS of the source case which has the greatest similarity with the target case. By comparing the labelled and the original type of each case, the accuracy of the model can be calculated. Through iterative calculation using Python 3.7.5, the overall accuracy rates are 89.55% and 95.34% for the initial and updated case base, respectively (see Table 8). Table 8 shows that for the PDSs with small frequency (i.e. DBB (early procurement) and DBB + CM (at agent)), the performance of both case libraries is not ideal. However, for the accuracy of the remaining PDSs and the overall accuracy, the updated case base gives a better result than the initial case base. As a result, the accuracy is improved significantly after the Z-order-m method is used to remove non-optimal cases from the case base.
Note (Table 7): a. ▲ represents a non-optimal project according to the Z-order-m production frontier; b. PDS 1: DBB; PDS 2: DBB (early procurement); PDS 3: DBB + CM (at agent); PDS 4: DB/EPC; PDS 6: Multi-stage DB/EPC; PDS 7: EP + C.
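The leave-one-out validation described above can be sketched as follows. The similarity measure is an assumption standing in for Eqns (1) and (2), which are not reproduced in this section: a weighted sum of range-normalised feature similarities. The feature values, weights and function names are likewise made up for illustration.

```python
import numpy as np

def similarity_matrix(F, weights):
    """Weighted similarity between all pairs of cases (assumed form, see above)."""
    span = F.max(axis=0) - F.min(axis=0)
    span[span == 0] = 1.0                                  # guard constant features
    diff = np.abs(F[:, None, :] - F[None, :, :]) / span    # shape (n, n, k)
    return (1.0 - diff) @ weights                          # shape (n, n)

def leave_one_out_accuracy(F, labels, weights):
    """Label each target case with the PDS of its most similar other case."""
    sim = similarity_matrix(F, weights)
    np.fill_diagonal(sim, -np.inf)                         # exclude the target itself
    predicted = labels[sim.argmax(axis=1)]
    return float(np.mean(predicted == labels))

# Toy case base (assumption): two well-populated PDS types and one singleton.
F = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0], [0.5, 0.4]])
labels = np.array(["DBB", "DBB", "DB/EPC", "DB/EPC", "EP + C"])
weights = np.array([0.5, 0.5])

accuracy = leave_one_out_accuracy(F, labels, weights)
```

In this toy example the single EP + C case can never be retrieved correctly, because no other case of the same type exists once it is held out. This is the same mechanism that limits accuracy for the low-frequency PDS types in Table 8.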

Conclusions
To address the problems within existing studies on PDS selection, this research combines CBR with a robust nonparametric production frontier method to build a novel model. It demonstrates the feasibility of integrating the nonparametric production frontier method into the CBR retrieval process for PDS selection in construction projects. It also compares DEA-type methods, on the basis of which the Z-order-m method is chosen to establish the partial production frontier for PDS selection. A case study based on the questionnaire survey is conducted to validate the proposed model. The comparative analysis between the traditional CBR and the hybrid Z-order-m based CBR reveals that the latter can effectively improve the accuracy of PDS selection. This research overcomes the shortcomings of the traditional DEA-type methods, such as sensitivity to outliers and neglect of environmental factors. It utilizes the Z-order-m method to better estimate the efficiency values and establish a more stable production frontier. To meet the dimension limit during the Z-order-m estimation process, it employs Bayesian estimation-based SEM to reduce the dimension of the project performance indicators. It confirms that environmental factors have an impact on the efficiency estimation of construction projects. Improving the accuracy rate of PDS selection by the hybrid Z-order-m based CBR method is an important contribution of this research to the body of knowledge. This research also provides industry practitioners with a good example of using quantitative analysis for PDS selection in their projects.
An updated case base of CBR, based on nonparametric production frontier theory, is developed in this research. In the development process, the optimal cases obtained from the Z-order-m efficiency estimation are retained in the updated case base while the non-optimal cases are removed. This greatly reduces the number of cases (projects) collected from the questionnaire survey that remain in the case base, which may be a limitation. Another limitation of this research is that the bias for the minority PDS groups is high. Therefore, future research is recommended to collect more questionnaire responses from different types of projects to improve the case library. In this research, the budget/duration ratio is used as the only environmental factor. More environmental factors, such as weather, geographical conditions and changes in legislation, could be considered in future research to enhance the efficiency estimation.

Figure 1. Structure of the proposed research based on the robust nonparametric production frontier theory

Figure 2. Determination of bandwidth when using Z-order-m

Table 1. Methods of PDS selection

Table 2. Framework of project performance indicators

Table 4. Ranking of PDS selection criteria

Table 5. Results of Bayesian estimation

Table 6. Input-oriented efficiency values of production frontier

Table 7. Z-order-m efficiency values with budget/duration ratio as environmental factor

Table 8. The partial and overall accuracy rate for the PDS selection model