# Estimation Of SARS Daily New Cases

H Liang

###### Keywords

aic, aicc, auto-regression, model selection, sars transmission, time series

###### Citation

H Liang. *Estimation Of SARS Daily New Cases*. The Internet Journal of Infectious Diseases. 2005 Volume 5 Number 2.

###### Abstract

We used an auto-regression model to fit the daily number of new cases from the 2003 severe acute respiratory syndrome outbreak in Beijing, and demonstrated that the conventional model selection criteria are inappropriate for selecting the model order. An improved AIC procedure is suggested for overcoming the deficiency of these criteria. The resulting model indicates that the case numbers of the previous 15 days may be used to estimate the number of new cases on the current day. The conclusions of our modeling may give insights into ongoing outbreaks and facilitate public health responses.

### Introduction

In the past two years, mathematicians, statisticians, and biologists have proposed a variety of models to analyze severe acute respiratory syndrome (SARS) cases since the disease occurred in the Southeast Asian countries in 2003. Riley et al. (1) and Lipsitch et al. (2) used general dynamic models to study the transmission dynamics of SARS in Hong Kong and Singapore, respectively. Their models may be too complicated to be used in practice, as pointed out by Hsieh, Chen, and Hsu (3). The latter authors (3) used a linear system of equations and applied three-stage least squares to estimate the parameter, which can delineate the rapid epidemic growth. Zhou and Yan (4) used the Richards model (5), a logistic-type model, to fit the cumulative number of SARS cases reported daily in Singapore, Hong Kong, and Beijing, and confirmed that the epidemic might be brought under control if the current intervention measures were continued. Hsieh and Cheng (6) further used a variation of the single-equation Richards model to fit the daily cumulative case data from the 2003 SARS outbreak in Toronto, and estimated the turning points and case numbers during the two phases of this outbreak. Cauchemez et al. (7) proposed a Bayesian statistical framework for estimating the reproduction number early in an epidemic, and applied their approach to the SARS epidemic that started in February 2003 in Hong Kong. Intuitively, the number of SARS cases on the current day is strongly related to the situation of the previous days. In this article, we use an auto-regression (AR) model (8) to fit the SARS cases from April 21 to June 7, 2003, in Beijing.

On April 21, 2003, the World Health Organization (WHO) reported 3,861 probable SARS cases with 217 deaths globally (9). At that time, Beijing already had 588 probable cases, including 143 new cases, the peak of transmission. The number of daily new cases then gradually decreased and reached zero on June 2, 2003. Although two further cases appeared on June 7 and 11, 2003, SARS transmission in Beijing was basically under control.

### Method

The AR model of order p has the form

$$X(t) = b_{1}X(t-1) + b_{2}X(t-2) + \cdots + b_{p}X(t-p) + \varepsilon_{t},$$

where X(t), X(t-1), ..., X(t-p) denote the observations of the series on days t, t-1, ..., t-p; b_{1}, ..., b_{p} are the unknown auto-regression coefficients; and ε_{t} is the measurement error. This model means that the current term of the series can be estimated by a linear weighted sum of previous terms in the series. The weights are the auto-regression coefficients. The ultimate goal is to derive an appropriate model that may be used to forecast the case number of the following day from the case numbers of the prior days. We present the SARS case numbers in Beijing from April 21 to June 7, 2003 in Figure 1.
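The model above can be fitted by ordinary least squares on lagged copies of the series. The following is a minimal sketch under that assumption; the article does not state which estimator was used, and the function name `fit_ar` is hypothetical.

```python
import numpy as np

def fit_ar(x, p):
    """Least-squares fit of X(t) = b_1 X(t-1) + ... + b_p X(t-p) + e_t.

    No intercept is included, matching the form of the model in the text.
    Returns the coefficient vector (b_1, ..., b_p) and the residuals.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    if not 0 < p < n:
        raise ValueError("need 0 < p < len(x)")
    # Column k-1 holds X(t-k) for t = p, ..., n-1, so each row lists the
    # p values that precede the corresponding target X(t).
    lagged = np.column_stack([x[p - k : n - k] for k in range(1, p + 1)])
    target = x[p:]
    coef, *_ = np.linalg.lstsq(lagged, target, rcond=None)
    residuals = target - lagged @ coef
    return coef, residuals
```

Calling `fit_ar(cases, 15)` on a list of daily counts would return 15 coefficients and one residual for each day after the first 15. Other estimators, such as Yule-Walker, generally give somewhat different values on short series.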

##### Figure 2

Around April 21, the case numbers fluctuated but clearly went down. A total of 48 case numbers are used in our analysis. To assess the degree of dependence in the data, we calculate the sample autocorrelation function (ACF) of the data and show it in Figure 2, in which the vertical bars show the sample ACF at lags 0, 1, ..., 20 and the dotted horizontal lines are the bounds ±1.96/√48. Here 1.96 is the 0.975 quantile of the standard normal distribution.

If the data were independent, we would expect roughly one value (20 × 0.05 = 1) to fall outside the bounds, whereas the plot shows that the first ten values fall outside the bounds ±1.96/√48. This indicates that consecutive observations are dependent and that a time series analysis of these SARS data is worthwhile.
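The dependence check described above can be sketched as follows. This assumes the usual sample ACF estimator (autocovariance at lag k divided by the lag-0 autocovariance); the function names are hypothetical, and the Beijing data themselves are not reproduced here.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations at lags 0, 1, ..., max_lag."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    centered = x - x.mean()
    denom = np.sum(centered ** 2)
    return np.array([
        np.sum(centered[k:] * centered[: n - k]) / denom
        for k in range(max_lag + 1)
    ])

def independence_bound(n, z=1.96):
    """Approximate 95% bound for the sample ACF of independent data."""
    return z / np.sqrt(n)

def count_outside(acf, bound):
    """Number of lags >= 1 whose autocorrelation exceeds the bound in size."""
    # Lag 0 is skipped because its autocorrelation is always exactly 1.
    return int(np.sum(np.abs(acf[1:]) > bound))
```

For n = 48 the bound is 1.96/√48, roughly 0.283, so with 20 lags about one exceedance would be expected by chance under independence.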

How to select an appropriate order p is a key question. Conventional criteria, such as AIC (10) and BIC (11), are widely used for variable selection, and can be easily implemented in common commercial software such as S-PLUS, SAS, and Matlab. Their deficiency in small samples was pointed out by Hurvich and Tsai (12). The authors showed that AIC may be drastically biased for time series, and developed a modified version, denoted AICC, which is nearly unbiased and provides better model choices than AIC and BIC in small samples. In this article, we use the AICC to select the model order and estimate the auto-regression coefficients after identifying an appropriate order. We use the AR model to fit the data by first letting _{0}
_{o}
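To illustrate why the small-sample correction matters, the sketch below computes the three criteria from a residual sum of squares. The formulas are one common form for an AR model of order m fitted to n observations, with the AICC taken in the form given by Hurvich and Tsai; the exact expressions used in this article are not shown in the text, so treat them, the choice of n, and the function name as assumptions.

```python
import math

def information_criteria(rss, n, m):
    """AIC, BIC and AICC for a model of order m fitted to n observations.

    rss is the residual sum of squares; sigma^2 is estimated as rss / n.
    """
    if n - m - 2 <= 0:
        raise ValueError("AICC requires n > m + 2")
    log_sigma2 = math.log(rss / n)
    aic = n * (log_sigma2 + 1) + 2 * (m + 1)
    bic = n * log_sigma2 + m * math.log(n)
    # AICC replaces the AIC penalty by one that grows sharply as m nears n.
    aicc = n * log_sigma2 + n * (1 + m / n) / (1 - (m + 2) / n)
    return {"AIC": aic, "BIC": bic, "AICC": aicc}
```

Under these formulas the AICC equals the AIC plus 2(m+1)(m+2)/(n-m-2). That correction is negligible when n is large relative to m but substantial when, as here, candidate orders up to 20 are compared on only 48 observations.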

### Results

Based on the rule that the smaller the criterion value, the better the model, it was found that both AIC and BIC attain their minimum at

On the basis of the AICC criterion, we obtain the “best” model for the SARS data, of the form

X(t)=0.484X(t-1)+0.377X(t-2)-0.294X(t-3)-0.02X(t-4)+0.599X(t-5) -0.207X(t-6)-0.227X(t-7)+0.308X(t-8)+0.043X(t-9)-0.272X(t-10) +0.142X(t-11)+0.101X(t-12)-0.344X(t-13)-0.045X(t-14)+0.2X(t-15).
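As a usage illustration, the fitted model can be applied to the 15 most recent daily counts to estimate the current day's new cases. The coefficients below are copied from the equation above; the function name and input convention (oldest count first) are assumptions made for this sketch.

```python
# Coefficients b_1, ..., b_15 of the fitted model reported in the text.
COEFFS = [0.484, 0.377, -0.294, -0.02, 0.599, -0.207, -0.227, 0.308,
          0.043, -0.272, 0.142, 0.101, -0.344, -0.045, 0.2]

def predict_today(previous_counts):
    """Estimate X(t) from the daily counts of the preceding days.

    previous_counts lists daily new cases with the oldest day first and
    yesterday last; at least 15 values are required.
    """
    if len(previous_counts) < len(COEFFS):
        raise ValueError("need at least 15 previous daily counts")
    # Reverse so that position 0 is X(t-1), position 1 is X(t-2), and so on.
    recent_first = list(previous_counts)[::-1][: len(COEFFS)]
    return sum(b * x for b, x in zip(COEFFS, recent_first))
```

The result is a real number and can be negative for some inputs, so in practice it would need to be rounded and truncated at zero before being read as a case count.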

The coefficient of determination R^{2}

### Discussion

The derived model suggests that, on average, a SARS patient transmits the infection to other people within 15 days. This result arises partly because (i) most deaths occurred within a fortnight; (ii) Beijing is hot after June 7 and the SARS virus hardly survived; and (iii) the Chinese government made all possible efforts to control transmission. The result may not hold for other cities or other periods. However, this modeling procedure can still be applied to fit the SARS data from other cities or countries. Given the limited data available, the model may not be perfect. With more and better data, a more realistic model may be feasible. In this article, we did not consider patients' demographic factors such as age, gender, profession, and family history. This needs further investigation. Observing the scatter presented in Figure 1, one may see that the variance of

### Acknowledgment

This research was partially supported by two grants from the NIAID/NIH.

### Correspondence to

Hua Liang, Ph.D. Department of Biostatistics and Computational Biology University of Rochester Medical Center 601 Elmwood Avenue, Box 630 Rochester, NY 14642 Fax: 585-273-1031 Email: hliang@bst.rochester.edu