Quality Engineering, ISSN 0898-2112 (Print), 1532-4222 (Online). Journal homepage: https://www.tandfonline.com/loi/lqen20

Control charting methods for autocorrelated cyber vulnerability data

Anthony Afful-Dadzie (Business School, University of Ghana, Accra, Ghana) and Theodore T. Allen (Integrated Systems Engineering, The Ohio State University, Columbus, Ohio)

To cite this article: Anthony Afful-Dadzie & Theodore T. Allen (2016) Control charting methods for autocorrelated cyber vulnerability data, Quality Engineering, 28:3, 313-328, DOI: 10.1080/08982112.2015.1125926. Link: https://doi.org/10.1080/08982112.2015.1125926. Published online: 31 Mar 2016.

ABSTRACT: Control charting cyber vulnerabilities is challenging because the same vulnerabilities can remain from period to period. Also, hosts (personal computers, servers, printers, etc.) are often scanned infrequently and can be unavailable during scanning. To address these challenges, control charting of the period-to-period demerits per host using a hybrid moving centerline residual-based and adjusted demerit (MCRAD) chart is proposed. The intent is to direct limited administrator resources to unusual cases when automatic patching is insufficient. The proposed chart is shown to offer superior average run length performance compared with three alternative methods from the literature. The methods are illustrated using three datasets.

KEYWORDS: autocorrelation; average run length (ARL); control charts; EWMA control charts; statistical control
Introduction

Cyber attacks are on the increase and many organizations are losing substantial amounts of money as a result. A study of the financial impact, customer turnover, and actions taken by 51 companies in the United States concluded that, on average, the cost of a successful attack in 2010 increased to $7.2 million, up 7% from $6.8 million in 2009 (Ponemon Institute 2011). Cyber vulnerabilities are ways that hosts such as personal computers, servers, and printers can be exploited. Examples of vulnerabilities include weak passwords, weak authentication processes, unsupported operating systems, information disclosures, and the use of software with known exploitable bugs. Reportedly, over 90% of successful attacks exploit known vulnerabilities for which a patch exists but has not been applied by the system administrators (Legard 2002). Therefore, while new technology to identify and patch vulnerabilities is important, securing and focusing human resources to eliminate known vulnerabilities is also important.

The objective of this article is to propose control charting methods for cyber vulnerabilities to direct the attention of system administrators to unusual occurrences that correspond to assignable causes that they can address. As noted in Afful-Dadzie and Allen (2014), a substantial fraction of vulnerabilities are repaired each month by automatic patching without local intervention. Typically, only a tiny fraction of vulnerabilities are repaired manually because of automatic patching and limited resources. As a result, it may be of interest for administrators to intervene only when there is something unusual occurring (i.e., an assignable cause) or, alternatively, when a major threat is clear (e.g., an on-going attack). Therefore, this article focuses on a statistical process control approach designed to signal the presence of assignable causes.

Previous authors have developed monitoring techniques relating to cyber vulnerabilities. Yet, some have used data that are not available in vulnerability reports. For example, Dowdy (2012) discusses the challenges in integrating data from many sources to summarize risks. Abedin et al. (2006) also use traffic volumes as part of a comprehensive network evaluation approach. Further, Abedin et al. (2006) introduce exponential functions in their formulations, which potentially complicate the interpretation. Other authors have based their metrics on forecasted quantities without invoking the concepts of statistical process control (Ahmed et al. 2008). In this article, a relatively simple monitoring technique based on readily available data and statistical process control is proposed.

Cyber vulnerability data are often provided monthly with reference to the Common Vulnerability Scoring System (CVSS) described in Mell et al. (2007). CVSS scores range from 0.0, meaning no vulnerability, to 10.0, indicating that the program evaluating the system (scanner) is in a position to take over the host system.
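Scanners bucket these CVSS scores into severity bands. The following minimal sketch maps a score to a band; the band edges are the categories cited later in this article, and the function itself is an illustrative assumption rather than part of CVSS:

```python
# Map a CVSS score to a severity band. The band edges follow the
# categories cited in this article (low 0.0-3.9, medium 4.0-6.9,
# high 7.0-9.9, critical 10.0); the function name is illustrative.
def severity_band(score: float) -> str:
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS scores lie in [0.0, 10.0]")
    if score == 10.0:
        return "critical"
    if score >= 7.0:
        return "high"
    if score >= 4.0:
        return "medium"
    if score > 0.0:
        return "low"
    return "none"  # a score of 0.0 means no vulnerability
```

For example, a host with scores of 5.0, 5.0, and 10.0 would count as 2 mediums and 1 critical under this banding.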
Yet, Runger and Willemain (1995) noted the poor average run length performance of residual charts, given their diminished capacity to identify shifts after the first subgroup following the shift. MCD charts are also based on residuals and can be expected to have similar performance. This deficiency motivates two new charts that are proposed in this article. These are the "adjusted demerit" (AD) and hybrid moving centerline residual-based and adjusted demerit (MCRAD) charts for monitoring cyber vulnerabilities. An average run length (ARL) comparison is also described to confirm the benefits of the proposed methods.

The remainder of this article is organized as follows. First, alternative statistical process control (SPC) charts relevant to cyber vulnerability data are described. Because of the repeat nature of cyber vulnerabilities, the focus is on procedures specifically addressing autocorrelated data. The reviewed procedures include moving centerline demerit (MCD) charts from Nembhard and Nembhard (2000) and moving centerline charts based on AR(1) residuals. Issues with residual-based charting are used to motivate the proposed adjusted demerit (AD) and moving centerline residual-based and adjusted demerit (MCRAD) methods. The average run lengths (ARLs) of the alternative methods are then compared. Next, the application of the proposed methods is illustrated using three cyber vulnerability datasets from different organizations. Finally, conclusions are presented and opportunities for future work are described.

Common scanning technology divides vulnerabilities into categories based on the CVSS score: low (0.0–3.9), medium (4.0–6.9), high (7.0–9.9), and critical (10.0). A given host could have multiple vulnerabilities, e.g., 2 mediums and 1 critical. Therefore, the situation is somewhat analogous to manufacturing with nonconformity counts of different levels of severity. Weightings of these counts, or demerits, and the associated demerit charting techniques are potentially relevant in this case (e.g., see Nembhard and Nembhard 2000). Also, because of the infrequent (often monthly) nature of the relevant data, charting without subgroups or "individuals" control charting of demerits is relevant.

However, unlike in manufacturing, the hosts (or units) with the vulnerabilities (or nonconformities) are not shipped each period. Instead, these hosts might be personal computers which are used for multiple months and might likely have the same vulnerabilities for an extended period. On-going "patching" eliminates a fraction of the vulnerabilities each month, but far fewer than 100%. The accumulation of vulnerabilities almost unavoidably induces autocorrelation, or correlation in period-to-period nonconformity counts. Autocorrelation is a major issue related to control chart performance (Alwan and Roberts 1988; Montgomery and Mastrangelo 1991; Runger and Willemain 1995; Loredo et al. 2002; Nembhard and Nembhard 2000). An additional complication is that local vulnerabilities are influenced by external causes, including continual discoveries of new vulnerabilities for the software in use. These phenomena could cause a constant increase in vulnerability counts over time on many systems (Alhazmi and Malaiya 2005).

Charting based on autoregressive (AR) moving average modeling promises to eliminate the adverse effects of autocorrelation and trending because the model residuals are generally uncorrelated and de-trended (Montgomery and Mastrangelo 1991; Runger and Willemain 1995).

Statistical process control charting

In this section, four alternative methods are described. As mentioned previously, the carryover of vulnerabilities from one period to the next causes a high degree of autocorrelation in related vulnerability data. Therefore, the focus here is on methods specifically addressing autocorrelation rather than general techniques such as exponentially weighted moving average (EWMA) charts.
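To see how month-to-month carryover of the same vulnerabilities induces autocorrelation, consider the following small simulation sketch; the carryover fraction and new-demerit rate are invented for illustration, not estimated from the article's data:

```python
import random

random.seed(1)

# Each month, a fraction of the previous month's demerits survives
# patching and new demerits are discovered (illustrative parameters).
carryover = 0.8      # fraction of demerits that survive patching
y = [25.0]           # monthly demerit totals
for _ in range(999):
    y.append(carryover * y[-1] + random.gauss(5.0, 2.0))

def lag1_autocorr(series):
    """Sample lag-1 autocorrelation of a series."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[i] - mean) * (series[i + 1] - mean)
              for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in series)
    return num / den

r1 = lag1_autocorr(y)   # lands near the carryover fraction of 0.8
```

Even though the month-to-month "shocks" are independent, the carryover alone produces a strongly autocorrelated demerit series.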
Also, the charting methods can be applied both retrospectively as an analysis technique and also built into scanning software for active monitoring.

Perhaps the simplest of the relevant schemes is based on the first-order autoregressive or AR(1) model. Authors have noted the ability of such approaches to address autocorrelation as well as underlying trends (Runger and Willemain 1995). Another relevant approach is moving centerline demerit (MCD) charts, which offer the advantage that the charted quantity is intuitive, i.e., it is the demerits per unit (Nembhard and Nembhard 2000).

The first alternative method explored here is moving centerline demerit (MCD) charting from Nembhard and Nembhard (2000). The second is a trivial combination of the residual charting methods from Runger and Willemain (1995) and the moving centerline concept from Nembhard and Nembhard (2000). Next, an adjusted demerit chart and a hybrid moving centerline residual-based and adjusted demerit (MCRAD) method are proposed. The motivations for the proposed methods relate to the objectives of improved average run length performance and interpretability.

Residual-based charts

In general, the residuals of a defensible time series model are approximately independent and identically distributed from a normal distribution, assuming that the process is under control (for example, when there are no shifts). The properties of the residuals can be evaluated using autocorrelation function (ACF) and partial autocorrelation function (PACF) residual plots. The charting of residuals from time series models such as AR(1) was described in Runger and Willemain (1995). For the AR(1) process, the model prediction can be written:

ŷ_i = μ + ϕy_{i−1} [1]

and the model residual is simply:

ε̂_i = y_i − ŷ_i [2]

where y_i is the dependent variable (or the demerits per unit in our model) at time period i, ε̂_i is white noise with zero mean and constant variance, and μ and −1 < ϕ < 1 are constants to be determined. The symbol "^" denotes the estimated or predicted value based on the data, y_i, for time periods i = 1, …, p. In a residual chart, the charted quantity is ε̂_i in Eq. [2].

Nembhard and Nembhard (2000) examined charts based on the residuals in Eq. [2] and proposed two modifications. First, they argued that charting of residuals is not intuitive for decision-makers in that they are generally more interested in the process mean than in the model residuals. Instead of residual charting, they proposed using a moving centerline based on model predictions and moving limits based on the standard deviation of the residuals. Their proposed approach is in accord with the insights in Alwan and Roberts (1988), who had argued that residual charting was insufficient, while offering the simplicity of a single chart. The Nembhard and Nembhard (2000) moving centerline approach is functionally identical to residual charting in that the charts would deliver the same out-of-control signals in identical situations, and yet the charted quantity is the demerits per unit. Second, Nembhard and Nembhard (2000) argued that time series modeling might be too complicated for many possible users and that the exponentially weighted moving average (EWMA) offers similar predictions with only a single adjustable parameter, λ. Therefore, they based the centerline (CL) of their moving centerline demerit (MCD) chart on the following EWMA formula:

CL_{i+1} = ŷ_{i+1} = λy_i + (1 − λ)ŷ_i [3]

where λ is the weight given to the most recent weighted value and must satisfy 0 < λ ≤ 1. Then, the MCD upper control limit, UCL_{i+1}, and lower control limit, LCL_{i+1}, are:

UCL_{i+1} = ŷ_{i+1} + Mσ̂
LCL_{i+1} = ŷ_{i+1} − Mσ̂ [4]

where M is a potentially adjustable parameter given in Nembhard and Nembhard (2000), and usually M = 3.0. The parameter σ̂ is the standard deviation of the one-step-ahead prediction errors e_i = y_i − ŷ_i, which are independent and uncorrelated with mean zero. Nembhard and Nembhard (2000) proposed two procedures for estimating λ and σ̂, the first of which is used for illustration here and involves selecting λ to minimize the sum of squared residuals and taking σ̂ as the root mean squared residual.

A trivial variant of the MCD charts is to simply base the predictions on the time series model in Eq. [1] instead of the EWMA model in Eq. [3]. This approach offers the benefit of MCD charts in that the charted quantity is the intuitive demerits per unit. Also, the predictions are based on the likely more accurate time series model instead of the EWMA model. The proposed variant is referred to as moving centerline residual-based demerit (MCRD) charting. The MCRD chart is slightly different than a residual chart because, unlike the residual chart, the MCRD chart will adjust the lower limit to zero in situations when the calculated lower control limit is negative.

As mentioned previously, MCD and MCRD charts are approximately equivalent to residual charts in the signals generated. Also, Runger and Willemain (1995) documented the average run length (ARL) properties of residual charts with two notable findings. First, residual charts offer run-length performance that may be considered poor based on the tables provided by Runger and Willemain (1995) compared with alternatives for cases without autocorrelation and with EWMA charts. Second, the poor performance relates to the fact that residual charts offer a relatively high probability of generating an out-of-control signal in the first subgroup after a shift. After the first subgroup, the chance of detecting the shift is greatly diminished. In the next section, two charting techniques are proposed with the objective of offering improved ARL performance compared with the MCD and MCRD charting procedures.

Adjusted demerit charts

Standard demerit charts are generally considered to be inapplicable to cases involving significant autocorrelation (e.g., see Montgomery, 2012). These charts are based on the assumption of independently Poisson distributed demerits. While the Poisson distribution seems approximately appropriate for weighted vulnerability counts, the assumption of independence from period to period does not apply because of the significant autocorrelation. In what follows, we first present the standard demerit control chart model, point out its limitations for charting demerits per unit of cyber vulnerability data, and propose an adjusted demerit control chart for overcoming such limitations.

Then, the center line (CL) of the demerit control chart is:

CL = Σ_{k=1}^{m} w_k·D̄_k [7]

The upper and lower control limits for period i are:

UCL_i = CL + Mσ̂_i
LCL_i = max[(CL − Mσ̂_i), 0] [8]

where

σ̂_i = √( Σ_{k=1}^{m} w_k²·D̄_k / n_i ) [9]

and where M is a potentially adjustable parameter which, in standard demerit charts, is 3.0.

The standard demerit chart given above is likely to foster high false alarm rates if applied to charting cyber vulnerabilities for the following reasons. The estimated standard deviation in Eq. [9] is based on the assumption that the demerit counts of different levels of severity are uncorrelated. For the cyber vulnerabilities in the case studies shown later, at least two counts of vulnerabilities are significantly correlated for all three of the organizations considered.
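The standard demerit chart quantities in Eqs. [7]–[9] can be sketched as follows, with hypothetical weights, per-period severity counts, and sample sizes:

```python
import math

# Hypothetical inputs: w_k weights for the m = 4 severity classes,
# c_ik counts per period by class, and n_i hosts scanned per period.
weights = [2.0, 5.5, 8.5, 10.0]
counts = [
    [4, 2, 1, 0],
    [3, 2, 1, 1],
    [5, 1, 2, 0],
]
n = [120, 118, 121]
p, m = len(counts), len(weights)

# D-bar_k: class-k nonconformities per unit over all p periods (Eq. [6])
dbar = [sum(counts[i][k] for i in range(p)) / sum(n) for k in range(m)]

# Center line (Eq. [7])
CL = sum(weights[k] * dbar[k] for k in range(m))

# Limits for period i (Eqs. [8]-[9]) with the usual M = 3.0
M = 3.0
def limits(i):
    sigma_i = math.sqrt(sum(weights[k] ** 2 * dbar[k]
                            for k in range(m)) / n[i])
    return CL + M * sigma_i, max(CL - M * sigma_i, 0.0)

UCL0, LCL0 = limits(0)
```

Note that the lower limit is truncated at zero, since negative demerits per host are impossible.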
the anonymous organizations are assigned labels corre- The standard demerit control chart formulas from sponding to their size, so that organization #1 had the Dodge (1928) are derived as follows. Let di be the most hosts. For example, the correlation between the weighted total number of demerits in period i, ni be the high and critical counts for organization #1 in Table 1 sample size, and cik, be the number of class k noncon- is 0.97 which is significant with a p-value less than formities, k = 1, 2, . . . ,m. If wk is the weight of non- 0.001. Also, Dodge and Romig (1928) assumed that the conformity class k, the weighted demerits di, and the charted quantities (demerits per host), yi, exhibit no demerit per unit Di (which is the charted quantity and autocorrelation if the system is under statistical con- referred to in this article as demerit per host) in period trol. As noted in Table 4 later, the autocorrelation coef- i are: ficients are significant for all three organizations. ∑m It was the violations of assumptions of control charts di = wkcik that motivated new methods such as those in Runger k=1 and Willemain (1995) and Nembhard and Nembard (2000). Runger and Willemain (1995) evaluated resid- and ual charts and determined that applying individuals d D = i . [5] control charts (e.g., see Montgomery, 2012) to batchedi ni Table . The estimated AR() parameters for the three organiza- The average number of demerits, D̄k, across all the p tions (cases). periods, for nonconformity class k is: Case  Case  Case  ∑p Coefficient (ϕ̂) . . . = ∑i=1 cik Mean ( μμ̂ =D̄ [6] 1− ) . . .ϕk p . Sigma (σ̂ ) . . . i=1 ni QUALITY ENGINEERING 317 observations offered relatively desirable average run A user might seek even greater average run length per- lengths. Then, instead of charting the demerits per host formance by optimizing simultaneously over M1 and for each period, yi, one would chart the average of M2. 
It is also possible, the desired in-control average m subgroups. Runger and Willemain (1995) recom- run length cannot be attained using the default value mended batch sizes to reduce the autocorrelations to ofM1 = 3.0. Then, bothM1 andM2 should be adjusted less than 0.1. For cases such as organization #1 with simultaneously to achieve desired in-control average autocorrelation coefficients greater than 0.98, the rec- run length with, again, the in control model being the ommended batch size was m = 58. With each period estimated time series model. lasting a single month, there would be a single sub- group every 4.8 years, which is impractical for cyber vulnerability charting. Comparison of average run lengths With the goal of providing desirable average run In this section, the four charting procedures are com- length performance with an intuitive charted quantity, pared using average run lengths (ARLs). While ARL the following adjusted demerit charting procedure is calculations are skewed by rare long run lengths, we proposed. include them to provide a direct comparison with pre- Step 1: Apply time series modeling from Box and Jenk- vious research on charts for autocorrelated data. The ins (1994) to develop a time series model of the derived ARL values are based on a simulated demerit demerits per host. For example, Table 1 shows the per host data from an autoregressive model. The four coefficients for the AR(1) models derived in the case charting procedures to be compared are: moving cen- studies. terline demerit (MCD) from Nembhard and Nemb- Step 2: Obtain a value of M in Eq. [7] such that the hard (2000), moving centerline residual-based demerit average run length (ARL) with the process in con- (MCRD) which is an extension of residual charts from trol achieves a desired value, e.g., ARL(in control)= Runger and Willemain (1995), adjusted demerit (AD), 200.0. 
The value ofM can be determined using sim- and moving centerline residual and adjusted demerit ulation based on the model derived in Step 1. Apply (MCRAD) charts. The ARL values are estimated using charting to the demerits per host using the derived 20,000 simulations in which the shift (δ) occurs on the M values. first subgroupwith the initial subgroup being subgroup zero following the procedure in Runger andWillemain The above adjusted demerit procedure is facilitated (1995). Therefore, all the ARL estimates have standard by modern computing. Using this procedure, there is deviations less than 1% (0.007× standard deviation) of no assumption about the autocorrelation or cross cor- the estimated ARL values making virtually all compar- relation other than that it can bemodeled appropriately isons significant simultaneously. Therefore also, after in Step 1. the first subgroup all responses derive from Eq. [10] with δ (in increment of 0.5) added. In each case, the simulated demerits per host derived from the standard Moving centerline residual-based and AR(1) model of the demerits per host (yi) with a singleadjusted demerit charts lag can be written for period i: An alternative approach is to chart the demerits per y = μ+ ϕy + ε , [10] host using limits frombothmoving centerline residual- i i−1 i based (MCR) and adjusted demerit (AD) charts. If where the εi are assumed to be independent identically the demerits per host cross any of the control limits, distributed (IID) N(0, σ 2). The coefficients, μ and ϕ, an out-of-control signal is generated by the derived can be estimated through least squares regression using hybrid moving centerline residual-based and adjusted a lag variable, which is available in standard software demerit (MCRAD) chart. Let M1 refer to the param- under the time series menus. eter in Eq. [4] associated with MCD limits and M2 to Table 1 contains the three sets of parameters needed refer to the parameter in Eq. 
[7] associated with AD for simulating the demerit per host data. These were limits. As a default and for simplicity, we set M1 = 3.0 obtained from the three case study datasets described and then findM2 using a two-step procedure similar to later. The related ARL results are shown in Tables 2–4, the one for determining adjusted demerit chart limits. where values under M = 3 are presented to show the 318 A. AFFUL-DADZIE AND T. T. ALLEN Table . Average run length values for an AR() process with estimated parametersϕ= .,μ= ., and σ = . based on data from Case . MCD MCRD AD MCRAD δ/σ M= . M= . M= . M= . M= . M= . M = . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Table . Average run length values for an AR() process with estimated parameters ϕ = .,μ= ., and σ = . for Case . MCD MCRD AD MCRAD δ/σ M= . M= . M= . M= . M= . M= . M = . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . relatively arbitrary performance levels if the standard to directly address the test cases such that its residu- choices are used. For example, theM= 3 in-control run als are IID N(0, σ 2). 
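The Step 2 calibration by simulation can be sketched as follows. The AR(1) parameters and the in-control target are illustrative, and the chart simulated is a fixed-limit adjusted-demerit-style chart around the stationary mean, not the authors' exact implementation:

```python
import random
import statistics

random.seed(42)

# Illustrative AR(1) parameters for Eq. [10]: y_i = mu + phi*y_{i-1} + eps_i
mu, phi, sigma = 1.0, 0.8, 0.3
mean = mu / (1.0 - phi)                    # stationary mean
sd_y = sigma / (1.0 - phi ** 2) ** 0.5     # stationary standard deviation

def run_length(M, max_len=10_000):
    """Periods until the simulated series first leaves mean +/- M*sd_y."""
    y = mean
    for t in range(1, max_len + 1):
        y = mu + phi * y + random.gauss(0.0, sigma)
        if abs(y - mean) > M * sd_y:
            return t
    return max_len

def arl(M, reps=500):
    """Monte Carlo estimate of the average run length for a given M."""
    return statistics.fmean(run_length(M) for _ in range(reps))

arl_3 = arl(3.0)   # increase or decrease M until ARL(in control) ~ 200
```

In practice one would repeat the `arl` call over a grid of M values (or bisect) until the in-control ARL reaches the desired target, then chart the demerits per host with the derived M.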
For the MCRAD chart, the simulations involve values of M1 = 3.0 in Eq. [4] and M2 in Eq. [8] that generate ARL in-control (with no assignable causes active, δ/σ = 0.0) values approximately equal to 200. In all cases where the ARL in-control value is approximately 200, the ARL values for the MCD chart exceed those of the MCRD chart. This is explained by the fact that, in using the same AR(1) model internalized within the MCRD chart, an advantage is conferred to the MCRD chart. In other words, the MCRD charting method is designed to directly address the test cases such that its residuals are IID N(0, σ²). Similarly, the ARLs for the MCD and MCRD exceed those for the AD and MCRAD charts. The exception is for the largest shifts (δ/σ = 4.0) under the assumptions in Table 2. Then, the MCRD chart offers a lower average run length than the AD chart. Yet, the MCRAD chart dominates the MCD and MCRD charts in all cases. Therefore, it is concluded that the use of MCD or MCRD charting in the context of cyber vulnerabilities is generally inadvisable, since the AD and MCRAD methods offer generally superior ARL performance. This assumes that the ability to perform time series modeling and simulation is within the capabilities of the practitioners. The authors have Excel-based software available upon request for generating the M or M2 values required by the AD and MCRAD charts, respectively.

Table 4. Average run length values for an AR(1) process with estimated parameters ϕ, μ, and σ for Case 3, for the MCD, MCRD, AD, and MCRAD charts over shift sizes δ/σ and limit parameters M.

Further, the AD chart dominates the MCRAD chart for small shifts (δ/σ < 3.0) while the MCRAD dominates for large shifts (δ/σ > 3.0). This corroborates Runger and Willemain (1995). Residual-based charts offer a relatively high probability of identifying a large shift on the first subgroup, but adjusted demerit charts offer improved detection probabilities in other cases. The differences are larger for the assumptions in Table 2, which is attributed here to the higher degree of autocorrelation. Runger and Willemain (1995) also found larger differences in ARL values among alternative charts when the degree of autocorrelation was relatively high. The MCRAD charts are recommended for cyber vulnerability modeling because we feel that the ability to quickly detect large shifts is likely relevant in applications. Yet, the AD charts also offer relative simplicity and competitive ARL performance, making them a viable alternative.

Case studies

In this section, the case studies that motivated our research are described. The section begins with the steps taken to prepare the data for attribute charting and the report from the local system administrator about assignable causes. Next, the applications of Box-Jenkins time series modeling are described. Finally, results illustrate possible insights gained using the proposed adjusted demerit (AD) and moving centerline residual-based and adjusted demerit (MCRAD) charting procedures.

Vulnerability data preprocessing

The organizations under study had (altogether) 498 hosts over a 28-month period, using data from the monthly Nessus scans. Nessus is a vulnerability scanning software developed by Tenable Network Security, widely regarded as a world leader in vulnerability scanning. One of the main challenges during a scan is that, if a host is turned off or its firewall is turned on, it will not appear in the final vulnerability report even if it has vulnerabilities.

The steps to generate attribute data were as follows.

Step 1. Identify all distinct vulnerabilities across all hosts and all 28 months. For example, host 1 might have vulnerability 23 (out-of-date operating system) and vulnerability 35 (weak password), while host 2 might have vulnerability 35 only. Combining results from all three of our case studies results in 183 distinct vulnerabilities.

Step 2. List the specific hosts known to have each of the observed vulnerabilities in each month. Table 5 shows a portion of this listing. The numbers in the table are the CVSS scores for the specific vulnerabilities. If an item is blank, it implies that either the host did not have the vulnerability or the host was unavailable during the scanning period. It was found that only 36 of the 498 hosts exhibited any vulnerability during the 28 months. Therefore, the vulnerabilities were concentrated on approximately 7% of the hosts.

Table 5. Excerpt of the data of vulnerabilities and their CVSS scores for a single month (rows: hosts; columns: distinct vulnerabilities; entries: CVSS scores).

Step 3. Tabulate the counts of low, medium, high, and critical vulnerabilities on all hosts for each month. Table 6 shows the counts of vulnerabilities of different levels of severity for two of the hosts. The instances in which hosts were unavailable are marked with borders and bolding.

Step 4. Impute the missing data using the sample averages of the counts from the months before and after each instance (possibly including multiple-month gaps), i.e., mean-based imputation was applied (Enders 2010). For missing data in the first or last months, counts were inserted from the closest month in time for which there were data. Note that such imputations are likely necessary, as missing data in vulnerability datasets are the common result of hosts being unavailable during the system scans. The results are shown in Table 6 in bold font.

Step 5. Tabulate the total number of sampled hosts (n_i) successfully scanned in each period i and the total counts (c_ik) for severity levels k = 1, …, 4 for low, medium, high, and critical vulnerabilities.

Step 6. Calculate the demerits (d_i) per period i using:

d_i = Σ_{k=1}^{4} w_k·c_ik [11]

with weights w_1 = 2.0, w_2 = 5.5, w_3 = 8.5, and w_4 = 10.0 for low, medium, high, and critical, respectively. These weights were determined with reference to the Common Vulnerability Scoring System (CVSS; Mell et al. 2007). Also, the demerits per host, y_i, were derived using y_i = D_i = d_i/n_i. The resulting data are shown in Table 7 for the largest of the three organizations.
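Steps 4–6 can be sketched as follows; the host data are hypothetical, the imputation mirrors the mean-based rule of Step 4, and the weights are those given in Step 6:

```python
# Severity weights from Step 6 (CVSS-referenced)
weights = {"low": 2.0, "medium": 5.5, "high": 8.5, "critical": 10.0}

# Monthly medium-severity counts for one host; None = host unavailable
medium = [0, 1, None, 2, 2, None, None, 1]

def mean_impute(series):
    """Replace each None with the average of the nearest observed counts
    before and after the gap (or the single closest one at the edges)."""
    out = list(series)
    for i, v in enumerate(series):
        if v is None:
            before = next((series[j] for j in range(i - 1, -1, -1)
                           if series[j] is not None), None)
            after = next((series[j] for j in range(i + 1, len(series))
                          if series[j] is not None), None)
            known = [x for x in (before, after) if x is not None]
            out[i] = sum(known) / len(known)
    return out

imputed = mean_impute(medium)   # [0, 1, 1.5, 2, 2, 1.5, 1.5, 1]

# Demerits and demerits per host for one period (Eq. [11] and y_i = d_i/n_i)
counts = {"low": 4, "medium": 2, "high": 1, "critical": 0}
n_hosts = 120
d = sum(weights[k] * c for k, c in counts.items())   # 27.5
y = d / n_hosts                                      # demerits per host
```

Note that a two-month gap receives the same imputed value for both months, since both are averages of the same surrounding observed counts.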
This was accomplished through was motivated by our inspection of the available data removing host permissions and vulnerable software for the 36 hosts for which there were vulnerabili- manually. There was no awareness of any actions taken ties. Table A3 in the appendix provides the CVSS with respect to hosts in organizations #2 and #3 during score for the most severe vulnerability on each host the 28 months. by CVSS score. The data indicate a high degree of Despite the actions taken by the administrator to constancy in the vulnerabilities on the hosts. The 462 patch a select number of hosts, the administrator per- hosts not shown in Table A3 are believed to have ceived only a single unusual occurrence or assignable had no known vulnerabilities during the entire 28 cause during the 28 months. The remaining variation months. was perceived to be typical or associated with com- The system administrator was also interviewed for mon causes only. The assignable cause occurred dur- organization #1, which was the largest of the three. ing month 19 and began to affect counts on month The administrator reported taking 16 manual actions 20. During month 19, there was a major organi- during the 28 month period following direct requests zational change and the administrator lost respon- from the host users. These included manually apply- sibility for approximately 200 hosts. This change ing patches for hosts with automatic patching turned included none of the hosts having vulnerabilities in off, identifying false positives (vulnerabilities reported Table A4. Table . Vulnerability counts for two hosts with imputed data bolded. 
[Table body: monthly Low, Medium, High, Critical counts for Host 1 and Host 2; values lost in extraction.]

Table 7. Tabulation of the hosts scanned successfully and numbers of vulnerabilities of different levels of severity for Case 1. Also included are demerit and demerit-per-host data. Columns: Month, ni, Low, Medium, High, Critical, di, yi = di/ni. [Table values lost in extraction.]

Time series models and autocorrelation

All of the charting methods considered here involve two-step approaches. In the first step, a procedure such as standard time-series modeling (Box and Jenkins 1994) is applied (MCRD, AD, and MCRAD). If the time-series model provides a good fit, the residuals are uncorrelated, approximately normally distributed, and provide useful inputs for charting.

In this section, the application of time series models to the data from the three organizations (three cases) is described. By examining the autocorrelation function (ACF) and partial autocorrelation function (PACF) for the demerits per host for the three cases (Table 7; Tables A1 and A2), it was determined that AR models offer an appropriate choice for all three cases. Figure 1 shows the ACF and PACF for organization #1 (case 1). The results for the three cases indicate that AR(1) models are a good fit for the vulnerability data at hand. These choices are confirmed by studying the ACF and PACF plots of the AR(1) model residuals. In all three cases, the residuals show no evidence of autocorrelation, and normal probability plots (not shown) indicate approximate normality (with the exception of the outlier associated with an assignable cause described above). The residuals for the case #1 data are shown in Figure 2. The corresponding ACF and PACF plots are excluded for cases #2 and #3 since they appear similar to those of case #1.

The relevance of AR(1) models for cyber vulnerability demerits is likely a general phenomenon because it relates to the carryover of the same vulnerabilities and associated demerits from period to period. With the complete data represented by Table 5, it was possible to identify vulnerabilities that appeared in one month but not the next month. Some of the missing vulnerabilities were presumably the result of the host being turned off or its firewall turned on. Yet, by assuming that all of the appearing and disappearing vulnerabilities were patched, an upper estimate of the average patching rate is obtained. For example, for organization #1, 966 total vulnerabilities (not distinct) were identified over 28 months, along with a total of 265 instances in which vulnerabilities were present one month and absent the next. Therefore, the upper bound on the average monthly patching rate is 100% × 265/966 = 27.4%. Table A4 in the appendix contains the upper-bound percentages of vulnerabilities that were patched from one month to the next for organization #1.

Figure 1. Case #1: Demerits per host (a) autocorrelation function and (b) partial autocorrelation function.

Figure 2. Case #1: AR(1) residuals (a) autocorrelation function and (b) partial autocorrelation function.

The AR model form was given in Eq. [10]. While the same AR(1) model form in Eq. [10] applied to all three cases, the degree of autocorrelation represented by the coefficient (ϕ), the mean (μ), and the standard deviation (σ) varied. Table 1 summarizes the coefficients for the three cases from AR(1) modeling. Therefore, organization #1 had tens of carry-over vulnerabilities from period to period (high ϕ), while organization #2 had many demerits per host (over 5) overall (high μ). Organization #3 had relatively lower carryover and good quality levels (low ϕ and low μ).

The application of the proposed methods

Upon consultation with the relevant system administrator and inspection of the data in Table 5, an assignable cause relating to an unusual shift in responsibility was identified. Through a re-organization, approximately 200 hosts were shifted to a different organization in period 20. Therefore, this detection can indeed be considered an assignable cause. The system administrator commented that no other occurrences during the 28 months seemed unusual.

In applying the adjusted demerit (AD) charting, the derived values of M for the three organizations and datasets are M = 20.90, 7.06, and 2.405, respectively. Note that the value 20.90 is much larger than 3.0 because of the relatively high degree of autocorrelation for organization #1. The adjusted demerit chart for the data from organization #1 in Table 7 is given in Figure 3. It is noteworthy, perhaps, that the adjusted demerit chart failed to identify the period 20 shift that both the MCD and MCRD charts (not shown) identified. It is conjectured that this failure highlights the relative strength of residual-based charts relating to immediate identification of shifts in the underlying process. Yet, the adjusted demerit chart has the potential advantage of being better able to identify causes in periods following a shift. The MCRAD chart combines the strengths of residual-based and adjusted demerit charts.

Figure 3. Adjusted demerit (AD) chart for the data from organization #1.

Based on the data in Table 7 from organization #1, the value M2 = 22.9 was used to achieve an approximate in-control average run length equal to 200. The derived MCRAD control chart is given in Figure 4. The chart generates the desired signal on subgroup 20 relating to the assignable shift of 200 hosts that were moved outside the relevant organization. The absence of a lower control limit from the adjusted demerit-related limits is due to the value M2 = 22.79. Larger values of M2 may generally be expected if the degree of autocorrelation is high. The value ϕ = 0.920 (Table 1) associated with the organization #1 dataset indicates a relatively high carryover of vulnerabilities from period to period.

Figure 4. MCRAD chart for the data from organization #1.

The MCRAD charts for the data from organization #2 (Table A1) and organization #3 (Table A2) are given in Figures 5 and 6, respectively. Concerning organization #2, the chart in Figure 5 shows no out-of-control signals. For organization #3, the chart in Figure 6 shows an out-of-control signal in month three. This is a result of the demerit per host for month 3 exceeding the residual-based chart control limit. From the perspective of the authors, this signal appears to be a false alarm.

Figure 5. MCRAD chart for the data from organization #2.

Figure 6. MCRAD chart for the data from organization #3.

Conclusions

This article addresses the important problem of monitoring cyber vulnerabilities using statistical process control (SPC) methods. The problem is important because of the high and growing threat level associated with cyber-attacks and the widespread use of personal computers and other hosts. A process is proposed to convert vulnerability data into demerits per host based on the common vulnerability scoring system (Mell et al. 2007). The application of standard time series models to cyber vulnerability data from three organizations is then described.

Application of two residual-based methods taken from the literature is then investigated. The application involves moving centerline demerit (MCD) charting from Nembhard and Nembhard (2000) and a slight extension of the residual charts from Runger and Willemain (1995). The MCD chart offers the advantage of charting the relatively intuitive demerits per host instead of residuals, which motivated the extension to create moving centerline residual-based demerit (MCRD) charts. The proposed adjusted demerit (AD) and hybrid moving centerline residual-based and adjusted demerit (MCRAD) charting methods are based on using simulation to determine the control limits. Average run length (ARL) comparisons were based on assumptions relevant to the three case studies. From this it is concluded that the proposed AD and MCRAD charts offer improved ARL performance compared with MCD and MCRD charts. The MCRAD charts in particular are recommended as a dashboard for monitoring cyber vulnerabilities. Also, the concepts of AD and MCRAD charts have applicability beyond cyber vulnerabilities and demerit charts to many charting situations involving autocorrelation.
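The two-step approach described above — fit an AR(1) model to the demerits-per-host series, then chart each observation against moving, residual-based limits — can be sketched as follows. This is a simplified illustration, assuming a lag-1 Yule-Walker-style coefficient estimate and symmetric ±3σ residual limits; it does not reproduce the article's simulation-based MCRAD limit constants (M1, M2), and the simulated series and all names are illustrative.

```python
import random
import statistics

def fit_ar1(y):
    """Estimate the mean, AR(1) coefficient, and residual standard deviation
    of a series, using the lag-1 sample autocorrelation as the coefficient."""
    mu = statistics.fmean(y)
    dev = [v - mu for v in y]
    phi = sum(a * b for a, b in zip(dev, dev[1:])) / sum(d * d for d in dev)
    resid = [dev[t] - phi * dev[t - 1] for t in range(1, len(dev))]
    return mu, phi, statistics.stdev(resid)

def moving_centerline_limits(y, mu, phi, sigma, m=3.0):
    """For t = 1, 2, ..., the one-step-ahead AR(1) forecast is the moving
    centerline; y_t is charted against centerline +/- m * sigma."""
    limits = []
    for t in range(1, len(y)):
        center = mu + phi * (y[t - 1] - mu)
        limits.append((center - m * sigma, center, center + m * sigma))
    return limits

# Simulate an autocorrelated demerits-per-host series (high carryover)
random.seed(1)
true_mu, true_phi, noise_sd = 5.0, 0.9, 0.3
series, value = [], true_mu
for _ in range(60):
    value = true_mu + true_phi * (value - true_mu) + random.gauss(0.0, noise_sd)
    series.append(value)

mu, phi, sigma = fit_ar1(series)
limits = moving_centerline_limits(series, mu, phi, sigma)
signals = [t for t, (lo, _, hi) in enumerate(limits, start=1)
           if not lo <= series[t] <= hi]
print(f"phi = {phi:.3f}, signals at t = {signals}")
```

A point far outside its moving limits, like the period-20 shift discussed in the text, would appear in `signals`; the article's MCRAD chart additionally overlays adjusted-demerit limits on the same statistic.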
The conclusion is that AR models with a single lag, i.e., AR(1) processes, accurately model the three datasets and are possibly relevant for many other vulnerability modeling problems. The motivation for this choice relates to the carryover of the same unpatched vulnerabilities from one period to the next. Since the hosts are monitored instead of parts, one does not have new units each period.

About the authors

Anthony Afful-Dadzie is a lecturer at the University of Ghana Business School. He received his Ph.D. from The Ohio State University in Industrial & Systems Engineering and his MPhil from Cambridge. His research and teaching interests include quality engineering, cyber security, and economic models.

Theodore T. Allen is an associate professor of Integrated Systems Engineering at The Ohio State University. He is a fellow of the American Society for Quality and the author of over 50 peer-reviewed publications including 2 textbooks. His research interests focus on optimization with parametric uncertainty including optimal experimental design and cyber security maintenance plan design.

Funding

This work was partially supported by National Science Foundation (NSF) grant #1409214.

References

Abedin, M., S. Nessa, E. Al-Shaer, and L. Khan. 2006. Vulnerability analysis for evaluating quality of protection of security policies. Proceedings of the 2nd ACM Workshop on Quality of Protection, Alexandria, Virginia, pp. 49–52.

Ahmed, M. S., E. Al-Shaer, and L. Khan. 2008. A novel quantitative approach for measuring network security. Proceedings of the 27th IEEE INFOCOM 2008 Mini-Conference, Phoenix, Arizona, pp. 1957–1965.

Alwan, L. C., and H. V. Roberts. 1988. Time series modeling for statistical process control. Journal of Business and Economic Statistics 6 (1): 87–95.

Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. 1994. Time series analysis: Forecasting and control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall.

Cox, D. R. 1961. Prediction by exponentially weighted moving averages and related methods. Journal of the Royal Statistical Society, Series B 23 (2): 414–422.

Dodge, H. F. 1928. A method of rating a manufactured product. Bell System Technical Journal 7: 350–368.

Dowdy, J. 2012. The cybersecurity threat to US growth and prosperity. In Securing cyber space: A new domain for national security, eds. N. Burns and J. Price. Washington, DC: Aspen Institute.

Enders, C. K. 2010. Applied missing data analysis. New York: Guilford Press.

Loredo, E. N., D. Jearkpaporn, and C. M. Borror. 2002. Model-based control chart for autoregressive and correlated data. Quality and Reliability Engineering International 18: 489–496.

Mell, P., K. Scarfone, and S. Romanosky. 2007. CVSS: A complete guide to the common vulnerability scoring system version 2.0. In Forum of Incident Response and Security Teams.

Montgomery, D. C. 2012. Introduction to statistical quality control. 7th ed. Hoboken, NJ: Wiley.

Montgomery, D. C., and C. M. Mastrangelo. 1991. Some statistical process control methods for autocorrelated data. Journal of Quality Technology 23 (3): 179–193.

Nembhard, D. A., and H. B. Nembhard. 2000. A demerit control chart for autocorrelated data. Quality Engineering 13 (2): 179–190.

Ponemon, L. 2010. Fifth annual US cost of data breach study: Understanding financial impact, customer turnover and preventive solutions. Traverse City, MI: Ponemon Institute.

Runger, G. C., and T. R. Willemain. 1995. Model-based and model independent control of autocorrelated processes. Journal of Quality Technology 27: 283–292.

Appendix

This appendix includes additional data about vulnerabilities from our case studies. Table A1 describes the demerits for organization #2 and Table A2 describes the demerits for organization #3.
Table A3 describes the CVSS score for the most severe vulnerability on each host for the 36 hosts which had vulnerabilities (out of 498) for organization #1. Table A4 provides data on the worldwide known vulnerability counts and local patching percentages during the 28-month period for organization #1.

Table A1. Tabulation of the hosts scanned successfully and numbers of vulnerabilities of different levels of severity for organization #2. Also included are demerit sums based on the counts. Columns: Month, Number of Hosts, Low, Medium, High, Critical, Demerits, Demerits per Host. [Table values lost in extraction.]

Table A2. Tabulation of the hosts scanned successfully and numbers of vulnerabilities of different levels of severity for organization #3. Also included are demerit sums based on the counts. Columns as in Table A1. [Table values lost in extraction.]

Table A3. Complete listing of CVSS scores for the most severe vulnerabilities on each host over the 28 months. Hosts with no vulnerabilities were omitted from the list. Empty cells are missing data.
[Table A3 body: per-host monthly CVSS scores; values lost in extraction.]

Table A4. Monthly cumulative number of worldwide vulnerabilities and the local percentage of new vulnerabilities that is patched each month, i.e., the number of vulnerabilities present in one month's scan but missing in the next month's scan, divided by the total number of vulnerabilities in the first month.
Columns: Month; Cumulative Count of Distinct, Known Vulnerabilities Worldwide; Vulnerability Patching Percentage. [Table values lost in extraction.]
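The patching percentage defined in the Table A4 caption — vulnerabilities present in one month's scan but absent from the next, divided by the first month's total — can be computed from per-month scan results as sketched below. The host and vulnerability identifiers are invented for illustration.

```python
def patching_percentage(this_month, next_month):
    """Percent of this month's (host, vulnerability) pairs absent from the
    next month's scan. As noted in the text, this is only an upper bound on
    the true patching rate, since a host may simply have been unreachable."""
    if not this_month:
        return 0.0
    disappeared = this_month - next_month
    return 100.0 * len(disappeared) / len(this_month)

# Hypothetical scans as sets of (host, vulnerability) pairs
month_1 = {("h1", "vulnA"), ("h1", "vulnB"), ("h2", "vulnC"), ("h3", "vulnD")}
month_2 = {("h1", "vulnA"), ("h2", "vulnC"), ("h3", "vulnE")}
print(patching_percentage(month_1, month_2))  # 2 of 4 disappeared -> 50.0

# The text's organization #1 aggregate: 265 of 966 instances -> 27.4%
print(round(100 * 265 / 966, 1))  # 27.4
```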