Statistical Model for Schedule Prediction: Validation in a Housing-Cooperative Construction Database

There are often considerable differences between the planned schedule for a construction project and what later develops during actual construction. This paper introduces an innovative approach that uses Markov chain models to support predictions during earned value analyses. A statistical model was developed to predict possible deviations in a project schedule and the future progress of a project. This model, based on Markov chains, uses data from the past to adjust future predictions. A case study was built from a database of 90 housing cooperative construction projects and was validated in 12 more projects. A cross validation of three iterations was also carried out, obtaining an error of 2.38% in the prediction of future progress and an error of 4.29% in the prediction of construction timing. The innovative prediction model presented in this paper contributes to the management body of knowledge by introducing a new tool for the management and control of construction timing. The method presented improves construction management because it predicts future deviations in schedules with reduced errors and determines total deviation from a construction schedule with great precision. This allows better control over work timing and represents important input in determining strategies and future actions. DOI: 10.1061/(ASCE)CO.1943-7862.0001396. © 2017


Introduction
The problem of project schedule overruns in the construction industry is a global one (Sambasivan and Soon 2007). Alkhathami (2004) defined a schedule overrun as extra time required to finish a given construction project beyond its original planned duration. Assaf and Al-Hejji (2006) defined a schedule overrun as time either beyond the completion date specified in a contract or beyond the date that the parties agreed on for delivery of a project. It is basically a project slipping over its planned schedule and is considered a common problem in construction projects worldwide. Delivery of a project within the contract-stipulated time is one of the yardsticks of a successful project. Despite the proven importance of timely delivery, however, construction projects that fail to achieve their objectives are not uncommon (Memon et al. 2012).
The construction process can be divided into three important phases: conception, design, and construction. Usually, the vast majority of project delays occur during the construction phase, where many unforeseen factors are always involved (Chan and Kumaraswamy 1997). This is why study of the construction phase and development of a delay prediction tool are of relevance.
One of the most popular and accepted tools for controlling and monitoring construction timing is the schedule (Aristondo 2003). This paper presents a statistical model for predicting schedule deviation and validates it in a Uruguayan case study using a database of 90 housing cooperative construction projects plus 12 more projects for cross validation. The model allows managers to take into account historic behaviors for the prediction of schedule deviations in order to make corrections and take management decisions during the construction phase.

Literature Review
Earned value analysis (EVA) is an accepted theoretical technique advocated for the control of projects. This section describes the literature relevant to EVA and the state of the art relating to Markov chains and their use in construction schedule prediction.

Earned Value Management
Earned value management (EVM) is a methodology used to measure and communicate the actual physical progress of a project that integrates the three critical elements of project management: scope, time, and cost. It takes into account work done, time taken, and costs incurred to complete the project, and it helps to evaluate and control project risks (Fleming and Koppelman 2010). According to Lipke (2003), EVM is a technique for controlling the performance of a project by comparing the amount of work up to a certain moment with estimates made before the project was started. In this way, there is a measure of the amount of work finished and the amount of work remaining. Moreover, EVM allows the efficiency of the original schedule to be determined.
The concept of earned schedule (ES) is an extension of the concept of EVM, and it refers to the amount of additional time needed to reach the established progress goals when construction is behind schedule. If construction is ahead of schedule, it can be defined as the range of time in which construction can make no progress without producing a delay in the schedule. This concept is shown in Fig. 1. Therefore, the indicator used for this work, denoted $ES_{total}$ and measured as the percentage of expected progress, indicates the amount of additional time needed to reach 100% progress (i.e., project completion). If construction is ahead of the original schedule, $ES_{total}$ is measured as the percentage of time (in terms of expected progress) that construction may take without falling behind schedule.
Earned schedule is used to determine the efficiency of schedules. For example, Lipke (2003) carried out research with real data from 16 building projects in the United States, comparing the efficiency of some EVM methods and ES in estimating and predicting schedules. Anbari (2003), Fleming and Koppelman (2010), and Vanhoucke (2010, 2014) studied EVM and ES, showing their applicability in measuring scheduling efficiency. Liberatore et al. (2001) and De Marco and Narbaev (2013) used EVM indicators to evaluate construction project performance. Forbes and Ahmed (2010) validated the use of EVM indicators for several studies in civil construction.
Given the importance of some EVM indicators in the control and management of construction, this paper presents a statistical model that predicts and adjusts construction schedules, making it possible to predict, for future construction, the total deviation from the schedule, $ES_{total}$, and the deviation percentage at a specific time of construction, $SV_t$ (Fig. 1). The model is based on Markov chains, which not only make it possible to adjust predictions while construction is taking place but also use data collected during construction to adjust future predictions.

Stochastic Process and Markov Chains
A series of observations $X_1, X_2, \ldots, X_n$ is called a stochastic process if the values of these observations cannot be exactly predicted but the probabilities for different possible values can be specified at any time (Manjia et al. 2013). For example, $X_1$ defines the initial state and $X_n$ defines the state at time $n$. For each possible value of the initial state $s_1$, and for each succeeding value of the states $X_n$ ($n = 2, 3, \ldots, k$), it can be said that in a stochastic process the probability of an event at moment $n + 1$ can be defined as in Eq. (1) if what happened moments before is known
$$P(X_{n+1} = s_{n+1} \mid X_1 = s_1, X_2 = s_2, \ldots, X_n = s_n) \tag{1}$$
A Markov chain corresponds to a specific class of stochastic process in the field of probabilistic models (Bertsekas and Tsitsiklis 2000; Ross 2003) in which, when the current state $X_n$ and the previous states $X_1, \ldots, X_{n-1}$ are known, the probability of the next state depends only on the current state. Therefore, for $n = 1, 2, \ldots, k$ and for any succession of states $s_1, \ldots, s_{n+1}$, Eq. (1) can be redefined as
$$P(X_{n+1} = s_{n+1} \mid X_1 = s_1, X_2 = s_2, \ldots, X_n = s_n) = P(X_{n+1} = s_{n+1} \mid X_n = s_n) \tag{2}$$
Markov chains have stationary transition probabilities if for each pair of states $s_i$ and $s_j$ there is a probability of transition $p_{ij}$, such that what is established in Eq. (3) holds
$$P(X_{n+1} = s_j \mid X_n = s_i) = p_{ij} \quad \text{for } n = 1, 2, \ldots \tag{3}$$
For a stochastic process with $k$ possible states $s_1, \ldots, s_k$, denoted $p_{ij}^{(m)} = P(X_{n+1} = s_j \mid X_n = s_i)$, where $p_{ij}^{(m)}$ is the element in row $i$ and column $j$ of the transition matrix $P^{(m)}$ with ($m = 2, 3, \ldots, k - 1$), the $m$th transition matrix $P^{(m)}$ of the Markov chain is defined as
$$P^{(m)} = \begin{bmatrix} p_{11}^{(m)} & \cdots & p_{1k}^{(m)} \\ \vdots & \ddots & \vdots \\ p_{k1}^{(m)} & \cdots & p_{kk}^{(m)} \end{bmatrix} \tag{4}$$
There are several cases in which Markov chains have been used for predictions. Yuan (1999) used Markov chains for predicting protein subcellular locations; Logofet and Lesnaya (2000) used them for an ecological prediction model; and Zhu et al. (2002) used them for prediction in adaptive websites. However, in the literature no evidence of Markov chains being used in civil construction management or earned value analysis has been found, and for this reason the present research can be considered valuable and innovative.

Statistical Model for Forecasting Schedule Deviations Based on Markov Chains
For the construction of the prediction model, some formal aspects had to be taken into account. Because projects have different construction times, the values of estimated completion time needed to be discretized in order to standardize and compare data. Thus the expected construction time was divided into 10 equal parts, or tenths (Fig. 2), so that two projects with different construction times could be compared based on the percentage of progress up to a certain time.
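The discretization into tenths can be sketched as follows. This is a minimal illustration rather than the procedure used for the database: it assumes progress is recorded as cumulative percentages at the end of each month and that progress between two readings can be interpolated linearly. The function name `progress_per_tenth` is hypothetical.

```python
def progress_per_tenth(cumulative, planned_months):
    """Incremental progress (%) in each tenth of the planned duration.

    cumulative[i] is the cumulative progress (%) at the end of month i + 1;
    progress at month 0 is taken as 0. Linear interpolation between monthly
    readings is an assumption of this sketch.
    """
    points = [0.0] + list(cumulative)

    def at(month):
        # Cumulative progress at a (possibly fractional) month, clamped to
        # the last available reading.
        month = min(month, len(points) - 1)
        lo = int(month)
        hi = min(lo + 1, len(points) - 1)
        return points[lo] + (points[hi] - points[lo]) * (month - lo)

    # Cumulative progress at the 11 boundaries of the 10 tenths.
    marks = [at(planned_months * t / 10) for t in range(11)]
    return [marks[t + 1] - marks[t] for t in range(10)]


# A project planned for 20 months that advances 5% every month
# progresses 10% in each tenth of its planned duration.
steady = [5.0 * (i + 1) for i in range(20)]
print(progress_per_tenth(steady, 20))
```

Expressing every project on the same ten-step axis is what allows projects of 12 and 30 months to share one set of transition matrices.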
In the model, $x_t$ indicates the state of a construction project at moment $t$. This defines a stochastic process that corresponds to the sequence $x_1, x_2, x_3, \ldots, x_{10}$, which represents the gradual progress percentage of a construction project from one state to another, where the value of $x_t$ usually relies on the previous value in the sequence, $x_{t-1}$. As time goes on, the state changes in probabilistic terms, represented by state transition probabilities. The hypotheses contemplated in the Markovian model for this stochastic process are as follows:
• There are a finite number of states describing the behavior of construction projects; for the prediction model, there are 9 transition states among the 10 tenths of a construction project and an additional transition state to predict $ES_{total}$ when the gradual progress percentage of the last tenth is known, totaling 10 transition matrices;
• There is a known distribution of probabilities at the beginning of the study's projection ($t = 1$); and
• The transition from a current state to a future one depends on the current state (Markovian property), which means that progress in tenth $x_{n+1}$ is influenced by the progress made in tenth $x_n$.
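The model for predicting the gradual percentage of construction progress using Markov chains consists of building 10 transition matrices between states $t$ and $t + 1$ according to the selected database. In the matrix's rows and columns, incremental progress of 1% is considered from one period of study to the next. Thus there is a probability that in tenth $m$, construction will progress between 2 and 3% if it progressed between 0 and 1% in tenth $m - 1$, and that probability is in place $p^{(m)}_{2\text{-}3\%/0\text{-}1\%}$ in the $P^{(m-1)}$ transition matrix, as shown in Eq. (5), where rows correspond to the progress interval in tenth $m - 1$ and columns to the progress interval in tenth $m$
$$P^{(m-1)} = \begin{bmatrix} p^{(m)}_{0\text{-}1\%/0\text{-}1\%} & p^{(m)}_{1\text{-}2\%/0\text{-}1\%} & p^{(m)}_{2\text{-}3\%/0\text{-}1\%} & \cdots & p^{(m)}_{15\text{-}16\%/0\text{-}1\%} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ p^{(m)}_{0\text{-}1\%/15\text{-}16\%} & p^{(m)}_{1\text{-}2\%/15\text{-}16\%} & p^{(m)}_{2\text{-}3\%/15\text{-}16\%} & \cdots & p^{(m)}_{15\text{-}16\%/15\text{-}16\%} \end{bmatrix} \tag{5}$$
In this way, the transition matrices $P^{(m)}$ (16 × 16) take into account a gradual progress of 1% from one period to another (varying between 0 and 16%). These values are calculated taking into consideration previous experiences, where at no time did the progress of a project exceed 15% between periods of study.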
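One way such a matrix could be estimated from historic data is by counting observed transitions and normalizing each row. The sketch below assumes this frequency-based estimate; the paper does not spell out the estimator, and the function name `transition_matrix`, the clamping of out-of-range values, and the all-zero rows for unobserved states are choices made only for illustration.

```python
def transition_matrix(prev_progress, next_progress, n_states=16):
    """Row-normalised transition counts between 1%-wide progress bins.

    prev_progress[i] and next_progress[i] are the gradual progress (%) of
    project i in two consecutive tenths. Row = bin in the earlier tenth,
    column = bin in the later tenth.
    """
    def state(p):
        # Bin 0 holds 0-1%, bin 1 holds 1-2%, ...; out-of-range values are
        # clamped to the first or last bin (an assumption of this sketch).
        return min(max(int(p), 0), n_states - 1)

    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(prev_progress, next_progress):
        counts[state(a)][state(b)] += 1

    matrix = []
    for row in counts:
        total = sum(row)
        # States never observed in the database keep an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Two hypothetical projects that both progressed 2-3% in one tenth and then
# 2-3% and 6-7% in the next give a row with 0.5 in columns 3 and 7.
m = transition_matrix([2.5, 2.7], [2.1, 6.9])
print(m[2])
```

With 90 projects spread over 16 possible rows, many rows will rest on very few observations, which is worth keeping in mind when reading individual probabilities.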
Subsequently, in order to predict the behavior of a new construction project, knowing the gradual progress percentage in one of the tenths is enough to enable the prediction of the following tenth with the corresponding transition matrix. The progress prediction is obtained as the weighted mean between the row of the corresponding transition matrix that contains the advance of the previous tenth and the vector (0.5, 1.5, 2.5, 3.5, ..., 15.5%), which contains the midpoint of gradual progress for each of the $k$ intervals considered in the method. If it is necessary to determine $ES_{total}$ from the $P^{(10)}$ transition matrix, the row related to the progress of Tenth 10 has to be multiplied by the vector (4, 12, 20, 28, ..., 124%).
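The prediction step can be sketched in a few lines. The function and constant names are hypothetical, the matrix is a toy example with a single populated row, and returning `None` for a state with no historic observations is an assumption of this sketch rather than something the paper specifies.

```python
# Midpoints of the 16 progress intervals: 0.5, 1.5, ..., 15.5 (%).
MIDPOINTS = [i + 0.5 for i in range(16)]
# Values used with the last matrix to obtain ES total: 4, 12, ..., 124 (%).
ES_MIDPOINTS = [4 + 8 * i for i in range(16)]


def predict(matrix, previous_progress, values=MIDPOINTS):
    """Weighted mean of `values`, weighted by the row of `matrix` that
    corresponds to the progress interval observed in the previous tenth."""
    row = matrix[min(int(previous_progress), len(matrix) - 1)]
    total = sum(row)
    if total == 0:
        return None  # no historic project was observed in this state
    return sum(p * v for p, v in zip(row, values)) / total


# Toy 16 x 16 matrix: after 2-3% progress, half of the historic projects
# progressed 2-3% and the other half 6-7% in the following tenth.
toy = [[0.0] * 16 for _ in range(16)]
toy[2][2] = 0.5
toy[2][6] = 0.5

print(predict(toy, 2.4))                 # 4.5
print(predict(toy, 2.4, ES_MIDPOINTS))   # 36.0
```

Because each row of a transition matrix sums to 1 (or to 0 when the state was never observed), the weighted mean reduces to a dot product of the row and the midpoint vector.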

Sample Characterization
The indicator $ES_{total}$ refers to the additional amount of time needed to complete construction as a proportion of the expected progress. However, if progress is greater than expected, it is measured as the percentage of time construction may advance without any delay (and compared with expected progress). For example, the expected construction time for the construction project in Fig. 2 is 24 months, which represents progress up to Tenth 10. The actual construction duration was 29 months, 5 months more than expected, which is 20.83% more than the initially expected time. Consequently, the actual 100% progress value on the x-axis shown in tenths of expected progress is an additional 20.83% from the last expected tenth, that is, 12.08.
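The arithmetic of this example can be checked with a short sketch. The function names are hypothetical, and treating an early finish as a negative value is a sign convention assumed here for illustration.

```python
def es_total(planned_months, actual_months):
    """Extra time as a percentage of the planned duration.

    A negative result (an assumed convention) means the project finished
    ahead of schedule.
    """
    return (actual_months - planned_months) / planned_months * 100


def completion_in_tenths(planned_months, actual_months):
    """Position of actual completion on an axis measured in planned tenths."""
    return 10 * actual_months / planned_months


print(round(es_total(24, 29), 2))              # 20.83
print(round(completion_in_tenths(24, 29), 2))  # 12.08
```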
The database of 90 housing construction projects includes both planned and executed schedules. The Markovian matrices presented in the Supplemental Data are constructed with the executed schedules, taking into account the procedure described in the section "Statistical Model for Forecasting Schedule Deviations Based on Markov Chains." In this way, data were collected for $ES_{total}$ and the actual progress for each tenth of the database. Table 1 summarizes mean progress for each tenth in the database, mean $ES_{total}$, and calculated standard deviation for the data.
Fig. 2. Example of real progress and expected progress for a construction project
The database was also characterized by measuring certain management indicators that are provided by EVM and comparing them against the $ES_{total}$ indicator. In EVM the planned value (PV) is defined as the amount of time expected to be spent on the completion of construction at any point in the original schedule's timing (in this case, the percentage of construction progress). Earned value (EV), then, is the sum of construction progress up to a certain point, also measured as a percentage of construction progress.
All schedule performance indicators used in this research refer to the percentage of construction progress and not to monetary amounts. The database is made up of turnkey projects, where it makes more sense to evaluate the construction completion percentage (Alsakini et al. 2004; Flatscher 2015; Assaf and Al-Hejji 2006). Regardless of the methodology, the results are still valid for use with the monetary measurement of EVM indicators depending on the database selected. They are valid when the database refers to schedules expressed as a percentage of work progress and schedules expressed as the cost of construction.
The two indicators used to characterize progress, schedule variance (SV) and the schedule performance index (SPI), are defined by Eqs. (6) and (7), respectively
$$SV = EV - PV \tag{6}$$
$$SPI = \frac{EV}{PV} \tag{7}$$
The SV indicator (the difference in the percentage of progress against the expected plan) and the ES indicator are shown in Fig. 1. An SV indicator below zero indicates that construction is behind the original schedule, whereas an SV indicator above zero indicates that construction has advanced beyond the original schedule. As for $ES_{total}$, a value above zero represents the delay percentage according to the expected timing (Fig. 1). Finally, the SPI indicator expresses actual progress as a ratio of the progress expected in the plan. For construction to proceed efficiently, this indicator must be higher than 1; a value lower than 1 indicates a delay relative to the planned value.
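As a small worked example, the two indicators can be computed as below, assuming the standard EVM definitions SV = EV − PV and SPI = EV / PV with both values expressed as percentages of construction progress. The function names and the sample figures are hypothetical.

```python
def schedule_variance(ev, pv):
    """SV: earned value minus planned value, in percentage points."""
    return ev - pv


def schedule_performance_index(ev, pv):
    """SPI: earned value divided by planned value (dimensionless)."""
    return ev / pv


# Hypothetical project: 50% progress was planned at this point,
# but only 40% has been achieved.
print(schedule_variance(40, 50))           # -10
print(schedule_performance_index(40, 50))  # 0.8
```

A negative SV and an SPI below 1 describe the same situation from two angles: the first as an absolute shortfall, the second as a ratio that is comparable across projects.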
The results of analysis of the 90 construction projects in the database are provided in Table 2: mean values for the three previously defined indicators (SV, SPI, and $ES_{total}$) and both actual and initially expected timing. For the three management indicators, the standard deviation and upper and lower limits for a confidence interval (CI) of 99% are also provided.
From the case study, it can be deduced that the SV indicator is −9.73%; in other words, the construction of the housing cooperatives under study fell about 10% behind the initially expected schedule. The SPI indicator is 0.83, which reaffirms the hypothesis that cooperative construction tends to fall behind expected timing. Regarding $ES_{total}$, the case study shows that in general these projects need 34.68% more time (measured as a percentage of expected construction time) to reach 100% progress.

Research Method
Ten transition matrices were constructed for the database in order to predict future schedule deviations and validate the forecasting method presented. The proposed method allows prediction of the gradual progress percentage in one tenth if the advance of the previous tenth and the corresponding transition matrix are known. For instance, knowing the gradual progress percentage of construction in Tenth 4 allows the gradual progress percentage for Tenth 5 to be predicted using the $P^{(4)}$ transition matrix and the known progress. If the gradual progress percentage of a construction project at Tenth 4 is between 2 and 3%, the progress prediction for Tenth 5 is obtained as the weighted mean between Row 3 of the $P^{(4)}$ transition matrix and Vector (1 × 16). Then, if the $P^{(4)}$ matrix shown in Table 3 is considered, predicting progress for the fifth tenth of the project only requires calculating the weighted mean of the matrix's third row [Eq. (8)] and Vector (1 × 16), which contains the considered averages for gradual progress [Eq. (9)]
$$\begin{bmatrix} 0.0 & 0.0 & 0.5 & 0.0 & 0.0 & 0.0 & 0.5 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \end{bmatrix} \tag{8}$$
$$\begin{bmatrix} 0.5 & 1.5 & 2.5 & 3.5 & 4.5 & 5.5 & 6.5 & 7.5 & 8.5 & 9.5 & 10.5 & 11.5 & 12.5 & 13.5 & 14.5 & 15.5 \end{bmatrix} \tag{9}$$
For this example, $X_5$ is calculated in Eq. (10)
$$X_5 = 0.5 \times 2.5\% + 0.5 \times 6.5\% = 4.5\% \tag{10}$$
As mentioned previously, the methodology to determine future progress by knowing past monthly progress is based on building 10 transition matrices between states. For the database, transition matrices between the states $x_1, x_2, x_3, \ldots, x_{10}$ have been built; these are presented in the Supplemental Data. In each of these matrices, the probability of transition from one state to the next based on the behavior of the 90 construction projects is shown. Following that, another 12 projects were taken into account in order to verify and validate the methodology. There was an expected schedule for these 12 projects and real percentage values for gradual progress (in each tenth). The expected monthly prediction was made with these data and using Markov matrices for each state.

Results
For illustrative purposes, the validation is presented for one of the 12 verifying projects using the $P^{(4)}$ transition matrix between the gradual progress percentage values for the fourth and fifth tenths. The $P^{(4)}$ transition matrix is then considered. Real progress values obtained for the 12 verifying projects, O1, O2, O3, ..., O12, in the fourth tenth of each project (Table 4) should also be considered.
To predict progress in the fifth tenth of Cooperative O1, progress at the fourth tenth must be known and the weighted mean of Row 10 of the $P^{(4)}$ matrix [Eq. (11)] and Vector (1 × 16) [Eq. (9)] must be calculated. State $X_5$ is obtained through Eq. (12)
$$\begin{bmatrix} 0.0 & 0.0 & 0.0 & 0.0 & 0.14 & 0.0 & 0.29 & 0.0 & 0.0 & 0.29 & 0.0 & 0.14 & 0.14 & 0.0 & 0.0 & 0.0 \end{bmatrix} \tag{11}$$
$$X_5 = 0.14 \times 4.5\% + 0.29 \times 6.5\% + 0.29 \times 9.5\% + 0.14 \times 11.5\% + 0.14 \times 12.5\% \approx 8.63\% \tag{12}$$
By knowing the gradual progress percentage values for states $x_1, x_2, x_3, \ldots, x_{10}$ (Table 5) and using the 10 transition matrices included in the Supplemental Data, the expected progress for each of the 12 verifying cooperatives (Table 6) is determined.
To discover the precision of the method in predicting monthly progress and $ES_{total}$, percentage differences between real progress values for each verifying construction project (for each state studied) and expected progress values were determined, that is, the percentage differences between the values in Table 5 and the values in Table 6. Table 7 lists these percentage differences; its Means row gives, for each construction project, the mean over the nine transition states that predict monthly advances. The average between mean values in the case of monthly advances is 2.74%, and the average between $ES_{total}$ values is 4.75%.
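A per-project error of this kind could be computed as below. The sketch assumes that the differences are absolute differences in percentage points, which the text does not state explicitly; the function name and the sample values are hypothetical and are not taken from Tables 5 to 7.

```python
def mean_abs_difference(real, predicted):
    """Mean absolute difference between real and predicted progress values,
    both given in percent of construction progress."""
    pairs = list(zip(real, predicted))
    return sum(abs(r - p) for r, p in pairs) / len(pairs)


# Hypothetical real and predicted progress for three tenths of one project.
real = [8.0, 10.0, 12.0]
predicted = [9.0, 10.5, 10.0]
print(mean_abs_difference(real, predicted))
```

Averaging the per-project values then gives a single figure for the whole verification set, in the same way the Means row is averaged in the text.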
A cross validation was carried out taking into account the same criteria. The error or deviation estimation could vary depending on the data in the selected database and on the data used for validating the methodology. Cross validation is used to assess the results of the statistical analysis and to ensure that they are independent of the partition between training and testing data. It consists of repeating and calculating the arithmetic mean obtained from evaluation measures over different partitions. It is used when the main aim is the prediction and estimation of a model's precision (Garcia 2005).
A k-fold cross validation was used for this study, where the sample data, comprising the 12 projects used for verification and the 90 projects from the database, were divided k times into subgroups, with $k = 3$. In each division, the randomly obtained subgroup with the greatest amount of data was used as training data and the other 12 construction projects were used as testing data. The cross-validation process was repeated during the three iterations, and finally the arithmetic mean of the results was calculated in order to obtain a single result.
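The repetition scheme can be sketched as follows. This is a repeated random hold-out rather than a textbook k-fold partition, and it assumes that each repetition sets 12 projects aside for testing and fits the model on the remaining ones. The callables `fit` and `evaluate` are hypothetical placeholders for building the transition matrices and measuring the prediction error.

```python
import random


def repeated_holdout(projects, fit, evaluate, n_test=12, repeats=3, seed=0):
    """Average error over several random train/test partitions.

    fit(train) returns a model; evaluate(model, test) returns an error value.
    Both are supplied by the caller (hypothetical in this sketch).
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(repeats):
        shuffled = list(projects)
        rng.shuffle(shuffled)
        # Hold out n_test projects; fit on the rest.
        test, train = shuffled[:n_test], shuffled[n_test:]
        model = fit(train)
        errors.append(evaluate(model, test))
    # Arithmetic mean of the errors of all repetitions.
    return sum(errors) / len(errors)


# Toy usage: the "model" is the mean of the training values and the error
# is the mean absolute distance of the test values from that mean.
data = [float(i % 10) for i in range(102)]
fit = lambda train: sum(train) / len(train)
evaluate = lambda model, test: sum(abs(x - model) for x in test) / len(test)
print(repeated_holdout(data, fit, evaluate))
```

Fixing the seed makes the partitions reproducible, which helps when the reported error is to be compared across variants of the model.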

Discussion
After carrying out the validation, it is shown that the proposed methodology predicts progress with a deviation of 2.74%. Furthermore, this methodology allows for the prediction of $ES_{total}$ with a deviation of 4.75%.
Using Markov chains, it is possible to predict the deviation of 9.73% with an error of 2.74% in the first tenth, but this error is reduced when more is known about the behavior of the work. The estimation error for construction progress ($ES_{total}$) reaches a value of 4.75%. This method of determining progress in construction and schedule deviations is a helpful tool for controlling the scheduling of a construction project in that it enables the anticipation of possible delays with a considerably lower error rate than the rate initially estimated. Furthermore, cross-validation analysis shows that the statistical prediction method resulted in a mean difference of 2.38%. Additionally, a mean deviation of 4.29% was obtained in the prediction of $ES_{total}$ from the three cross-validation iterations. This also represents a considerable improvement. The possibility of making more accurate predictions about future progress and the $ES_{total}$ indicator allows a construction project to be managed in a more effective way, providing the opportunity to correct deviations in advance. Although the case study was constructed from a data set of 90 construction projects (and validated for 12 more), the prediction method can be used for any similar database. For this study, the Uruguayan database was selected because of the large amount of data it contains, but the applicability of the prediction method is independent of the selected database.

Conclusions
A statistical prediction model has been presented that makes it possible to predict future progress for a construction project by knowing its past behavior and using Markov chains. The model was validated using data from 90 housing construction projects. From this, it could be deduced that the tool makes it possible to anticipate future progress with a mean error of 2.38% and total deviation in construction time with an error of 4.29%. Moreover, the model was validated using 12 other construction projects. More important, a cross validation of three iterations was carried out and similar results were obtained.
The innovative prediction model presented in this paper contributes to the body of project management knowledge by introducing a new tool for managing and controlling construction timing. The method presented allows construction management to be improved because it predicts future deviations in schedules with reduced errors and determines total deviation in construction timing with great accuracy. Moreover, the model is the first of its kind to use Markov chains to predict deviations in construction projects.
Although the case study was constructed from a database of 90 housing projects (and validated for 12 more), the prediction method can be used for any similar database.

Table 1. Summary of Real Progress and $ES_{total}$ Data Measured in Tenths of Expected Progress for the 90 Construction Projects Studied

Table 3. Transition Matrix between States $X_4$ and $X_5$: Gradual Progress Percentage P

Table 4. Actual Gradual Progress as a Percentage at Tenth 4 of Construction for the 12 Verifying Construction Projects

Table 6. Predictions of Gradual Progress Percentages for the 12 Verifying Construction Projects for States $x_2, x_3, x_4, \ldots, x_{10}$

Table 7. Summary of Differences between Real and Expected Progress for the 12 Verifying Construction Projects