Keywords: AIC, AICC, auto-regression, model selection, SARS transmission, time series
H Liang. Estimation Of SARS Daily New Cases. The Internet Journal of Infectious Diseases. 2005 Volume 5 Number 2.
We used an auto-regression model to fit the daily new case numbers from the 2003 severe acute respiratory syndrome outbreak in Beijing, and demonstrated that the conventional model selection criteria are inappropriate for selecting the model order. An improved AIC procedure is suggested to overcome the deficiency of these criteria. The resulting model indicates that the case numbers of the previous 15 days may be used to estimate the number of new cases on the current day. The conclusions of our modeling may give insights into ongoing outbreaks and so facilitate public health responses.
In the past two years, mathematicians, statisticians, and biologists have proposed a variety of models to analyze severe acute respiratory syndrome (SARS) cases since the disease occurred in Southeast Asian countries in 2003. Riley et al. (1) and Lipsitch et al. (2) used general dynamic models to study the transmission dynamics of SARS in Hong Kong and Singapore, respectively. As Hsieh, Chen, and Hsu (3) pointed out, these models may be too complicated to use in practice. Hsieh, Chen, and Hsu (3) instead used a linear system of equations and applied three-stage least squares to estimate the parameters, which can delineate the rapid epidemic growth. Zhou and Yan (4) used the Richards model (5), a logistic-type model, to fit the cumulative number of SARS cases reported daily in Singapore, Hong Kong, and Beijing, and confirmed that the epidemic might be brought under control if the intervention measures then in place were continued. Hsieh and Cheng (6) further used a variation of the single-equation Richards model to fit the daily cumulative case data from the 2003 SARS outbreak in Toronto, and estimated the turning points and case numbers during the two phases of this outbreak. Cauchemez et al. (7) proposed a Bayesian statistical framework for estimating the reproduction number early in an epidemic, and applied their approach to the SARS epidemic that started in February 2003 in Hong Kong. Intuitively, the number of SARS cases on the current day is strongly related to the situation on the previous days. In this article, we use an auto-regression (AR) model (8) to fit the SARS cases from April 21 to June 7 in Beijing.
On April 21, 2003, the World Health Organization (WHO) reported 3,861 probable SARS cases with 217 deaths globally (9). At that time, Beijing already had 588 probable cases, including 143 new cases, the peak of transmission. The number of daily new cases gradually decreased and reached zero on June 2, 2003. Although two further cases appeared on June 7 and 11, 2003, SARS transmission in Beijing was essentially under control.
The AR model is of form:
where X(t), ..., X(t-p) denote the observations of the
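For reference, a standard zero-mean AR(p) model, consistent with the fitted equation reported later in this article, can be written as follows. The coefficient symbols a_1, ..., a_p and the noise term ε(t) are notation assumed here for illustration and are not taken from the original display.

```latex
X(t) = a_1 X(t-1) + a_2 X(t-2) + \cdots + a_p X(t-p) + \varepsilon(t)
```

Here p is the model order, that is, the number of previous days whose case counts enter the prediction for day t.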
Around April 21, the case numbers fluctuated but clearly declined. A total of 48 daily case numbers are used in our analysis. To assess the degree of dependence in the data, we calculate the sample autocorrelation function (ACF) of the data and show it in Figure 2, in which the vertical bars show the sample ACF at lags 0, 1, ..., 20 and the dotted horizontal lines are the bounds ±1.96/√48, where 1.96 is the 0.975 quantile of the standard normal distribution.
If the data were independent, we would expect roughly one value (20 × 0.05 = 1) to fall outside the bounds, whereas the plot shows that the first ten values fall outside the bounds ±1.96/√48. This feature indicates that the observations are serially dependent and that a time series analysis of these SARS data is worthwhile.
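The dependence check described above can be sketched in Python as follows. This is a minimal illustration, not the code used for the article: the Beijing counts are not reproduced here, so `series` is a synthetic placeholder of length 48, and the function names `sample_acf` and `white_noise_bound` are hypothetical.

```python
import numpy as np

def sample_acf(x, max_lag=20):
    """Sample autocorrelation of x at lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()                      # centre the series
    denom = np.sum(d * d)                 # lag-0 sum of squares
    return np.array([np.sum(d[k:] * d[:n - k]) / denom
                     for k in range(max_lag + 1)])

def white_noise_bound(n, z=1.96):
    """Approximate 95% bound for the sample ACF of independent data."""
    return z / np.sqrt(n)

# Synthetic placeholder (NOT the real SARS data): a declining, noisy series.
rng = np.random.default_rng(0)
series = np.linspace(140, 0, 48) + rng.normal(0, 5, 48)

acf = sample_acf(series, max_lag=20)
bound = white_noise_bound(len(series))
# Lags 1..20 whose sample autocorrelation lies outside the bounds.
outside = np.flatnonzero(np.abs(acf[1:]) > bound) + 1
print(f"bound = {bound:.3f}, lags outside bounds: {outside.tolist()}")
```

For independent data, about one of the 20 lags would be expected to fall outside the bounds; many consecutive low lags outside them point to serial dependence.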
How to select an appropriate
Based on the rule that the smaller the criterion value, the better the model, it was found that both AIC and BIC attain their minimum at
On the basis of the AICC criterion, we obtain the “best” model for the SARS data, of the form
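To make the order-selection comparison concrete, the sketch below fits AR(p) models by least squares for a range of orders and evaluates AIC, BIC, and a small-sample corrected AIC. Several choices are assumptions rather than details taken from this article: the Gaussian approximation n·ln(σ²) for −2 log-likelihood, the parameter count k = p + 1, the Hurvich-Tsai form of the correction (the AICC definition used in the article may differ), and the synthetic AR(2) test series. All function names are hypothetical.

```python
import numpy as np

def fit_ar_ls(x, p, start):
    """Least-squares AR(p) fit without intercept, using targets x[start:].

    Requires start >= p so that every target has p lagged values.
    Returns (coefficients a_1..a_p, residual variance).
    """
    x = np.asarray(x, dtype=float)
    y = x[start:]
    # Column i-1 holds the lag-i values X(t-i) aligned with the targets X(t).
    X = np.column_stack([x[start - i: len(x) - i] for i in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return coef, float(np.mean(resid ** 2))

def criteria(sigma2, n, p):
    """AIC, BIC and corrected AIC under a Gaussian approximation."""
    k = p + 1                                   # assumed: p coefficients + variance
    aic = n * np.log(sigma2) + 2 * k
    bic = n * np.log(sigma2) + k * np.log(n)
    # Small-sample correction; undefined when n - k - 1 <= 0.
    aicc = aic + 2 * k * (k + 1) / (n - k - 1) if n - k - 1 > 0 else np.inf
    return aic, bic, aicc

def select_order(x, max_p):
    """Compare orders 1..max_p on the same targets x[max_p:]."""
    n_eff = len(x) - max_p
    table = {}
    for p in range(1, max_p + 1):
        _, s2 = fit_ar_ls(x, p, start=max_p)
        table[p] = criteria(s2, n_eff, p)
    names = ("AIC", "BIC", "AICC")
    best = {name: min(table, key=lambda p: table[p][j])
            for j, name in enumerate(names)}
    return table, best

# Synthetic AR(2) series for illustration only (NOT the SARS data).
rng = np.random.default_rng(1)
x = np.zeros(200)
for t in range(2, len(x)):
    x[t] = 0.6 * x[t - 1] - 0.3 * x[t - 2] + rng.normal()

table, best = select_order(x, max_p=8)
print(best)
```

All orders are fitted on the same set of target observations so that the criterion values are comparable. The corrected criterion adds a penalty that grows quickly as the number of parameters approaches the sample size, which matters for a series as short as 48 observations.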
X(t)=0.484X(t-1)+0.377X(t-2)-0.294X(t-3)-0.02X(t-4)+0.599X(t-5) -0.207X(t-6)-0.227X(t-7)+0.308X(t-8)+0.043X(t-9)-0.272X(t-10) +0.142X(t-11)+0.101X(t-12)-0.344X(t-13)-0.045X(t-14)+0.2X(t-15).
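As an illustration of how the fitted equation above would be applied, the following sketch computes a one-step estimate from the previous 15 daily counts. The coefficients are copied from the equation. The function name `predict_next` and the example history are hypothetical, and the example counts are made up rather than observed.

```python
import numpy as np

# Coefficients of X(t-1), ..., X(t-15), copied from the fitted model above.
COEF = np.array([0.484, 0.377, -0.294, -0.02, 0.599,
                 -0.207, -0.227, 0.308, 0.043, -0.272,
                 0.142, 0.101, -0.344, -0.045, 0.2])

def predict_next(history, coef=COEF):
    """One-step AR estimate of X(t) from the most recent len(coef) values.

    `history` is ordered oldest to newest; the last element is X(t-1).
    """
    h = np.asarray(history, dtype=float)
    p = len(coef)
    if len(h) < p:
        raise ValueError(f"need at least {p} previous observations")
    lagged = h[-1:-p - 1:-1]      # X(t-1), X(t-2), ..., X(t-p)
    return float(coef @ lagged)

# Made-up daily counts for illustration only (oldest first).
example_history = [120, 115, 110, 100, 98, 90, 85, 80, 72, 70, 66, 60, 55, 50, 48]
print(predict_next(example_history))
```

The estimate is a weighted sum of the previous 15 days, so a negative or non-integer result is possible and would need rounding or truncation before being read as a case count.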
The coefficient of determination
The derived model suggests that, on average, one SARS patient transmits the infection to other people within 15 days. This result arises partly because (i) most deaths occurred within a fortnight; (ii) Beijing is hot after June 7, and the SARS virus could hardly survive; and (iii) the Chinese government made every possible effort to control transmission. The result may not hold for other cities or other periods. However, the modeling procedure can still be applied to SARS data from other cities or countries. Given the limited data available, the model may not be perfect. With more and better data, a more realistic model may be feasible. In this article, we did not consider patients' demographic factors such as age, gender, profession, and family history. This needs further investigation. Observing the scatter presented in Figure 1, one may see that the variance of
This research was partially supported by two grants from the NIAID/NIH.
Hua Liang, Ph.D. Department of Biostatistics and Computational Biology University of Rochester Medical Center 601 Elmwood Avenue, Box 630 Rochester, NY 14642 Fax: 585-273-1031 Email: firstname.lastname@example.org