Prioritization of Malaria endemic zones in Arunachal Pradesh: A novel application of self organizing maps (SOM)
U Muty, N Arora
U Muty, N Arora. Prioritization of Malaria endemic zones in Arunachal Pradesh: A novel application of self organizing maps (SOM). The Internet Journal of Tropical Medicine. 2006 Volume 4 Number 1.
Malaria continues to pose a serious threat to public health in the North-Eastern states of India. Arunachal Pradesh is highly endemic for malaria, predominantly with Plasmodium falciparum infections. Despite continuous efforts by the government, a desirable level of control has not been achieved. The present study describes the application of self organizing maps (Kohonen maps), a data mining tool, to the prioritization of malaria endemic zones in this region. Sixty PHCs (Public Health Centers) were randomly selected from Arunachal Pradesh, and five malariometric parameters, viz. Annual Blood Examination Rate (ABER), Annual Parasite Incidence (API), Slide Positivity Rate (SPR), Annual Falciparum Incidence (AFI) and Slide Falciparum Rate (SFR), which reflect the intensity of malaria transmission in this region, were considered. The self organizing map yielded 9 clusters based on neighborhood distance, which represent zones graded by the intensity of malaria transmission. Such maps would make it possible to target control measures at high-risk areas and greatly increase the cost efficiency of malaria control programmes.
Malaria, the third leading cause of death attributable to an infectious disease worldwide, has plagued mankind for countless generations. Malaria remains a public health problem in 90 countries (1) and causes more than 300 million acute illnesses and at least one million deaths annually (2). The annual malaria burden in India is estimated at nearly 2 to 2.5 million cases. The North-Eastern region of India lies in the Indo-Chinese hill zone of Macdonald's classification of stable malaria (3) and contributes nearly 9% of total malaria cases in India (4). In this region, efficient malaria transmission is maintained during most months of the year; it curbs potential economic growth and is thus a major impediment to the overall development and progress of these areas.
Despite several anti-malaria programmes being implemented under the National Vector Borne Diseases Control Programme, this region has seen little tangible progress in alleviating the burden of malaria (5, 6). Apparently, there are definite inadequacies that continue to dampen the spirit of public health specialists even during the halcyon days of malaria eradication.
On closer scrutiny, it is evident that, while financial and technical constraints are common to all states of India, operational difficulties in particular are hampering effective malaria control in the North-Eastern region (7). These very areas remain inaccessible owing to floods and poor road communication. The major reasons for perennial and persistent malaria transmission are predominance of
Materials and Methods
Data Collection: Raw data were obtained from the Directorate of Health, Govt. of Arunachal Pradesh, covering epidemiological aspects of malaria cases encountered in 2005 at 60 randomly selected Public Health Centers belonging to 12 districts of Arunachal Pradesh. Standard malariometric parameters (ABER, API, SPR, SFR, AFI) were calculated from the raw malaria incidence data for use in this study (Table 1).
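The text does not spell out how the five parameters were computed. The sketch below uses the definitions conventionally applied in malaria surveillance (ABER and the slide rates as percentages, API and AFI per 1000 population); that these exact formulas were used in the study is an assumption. The function name, argument names and example counts are hypothetical and are not taken from Table 1.

```python
def malariometric_indices(population, slides_examined, positives, pf_positives):
    """Return the five indices for one PHC and one year (conventional definitions)."""
    if population <= 0 or slides_examined <= 0:
        raise ValueError("population and slides_examined must be positive")
    return {
        # Percentage of the population whose blood slides were examined
        "ABER": 100.0 * slides_examined / population,
        # Confirmed malaria cases per 1000 population
        "API": 1000.0 * positives / population,
        # Percentage of examined slides positive for any malaria parasite
        "SPR": 100.0 * positives / slides_examined,
        # Percentage of examined slides positive for P. falciparum
        "SFR": 100.0 * pf_positives / slides_examined,
        # P. falciparum cases per 1000 population
        "AFI": 1000.0 * pf_positives / population,
    }


# Hypothetical counts for illustration only (not data from the study)
example = malariometric_indices(
    population=20000, slides_examined=3000, positives=150, pf_positives=90
)
```

With these illustrative counts the PHC would have an ABER of 15.0, an API of 7.5, an SPR of 5.0, an SFR of 3.0 and an AFI of 4.5.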
Data Analysis: Data mining – Self Organizing Maps
In a SOM, neurons compete with each other to earn the right to represent the input data (30, 31). As a result, data in the multidimensional attribute space can be abstracted to a much smaller number of latent dimensions organized according to a predefined geometry in a space of lower dimensionality, usually a regular two-dimensional array of neurons. In this way the network, which is placed in the input space and spans the distribution of the inputs, reveals the structures embedded in the input data. Using a SOM network, it is possible to obtain a map of the input space in which closeness between units or clusters on the map represents closeness of the input data. Each processing unit in the SOM lattice is associated with a weight vector of the same dimension as the input data. Using the weights of each processing unit as a set of coordinates, the lattice can be positioned in the input space. During the learning stage the weights of the units change their position and “move” towards the input points. This “movement” becomes progressively slower, and at the end of the learning stage the network is “frozen” in the input space. After the learning stage each input can be associated with the nearest network unit, so that when the map is visualized, the inputs can be assigned to the cells of the map. One or more cells that clearly contain similar objects can be considered a cluster on the map. These clusters are generated during the learning phase without any other information; it is not necessary to supply cluster prototypes or examples to the network. SOMs cluster the data in a manner similar to cluster analysis, but have the additional benefit of ordering the clusters and enabling the visualization of large numbers of clusters. The clusters are arranged in a low-dimensional topology, usually a grid structure, that preserves the neighborhood relations in the high dimensional data (32, 33).
This technique is particularly useful for the analysis of large datasets where similarity matching plays a very important role (34). The characteristic that distinguishes the SOM from other classification algorithms is that not only are similar inputs associated with the same cell, but neighboring cells also contain similar inputs. This property, together with its easy visualization, makes the SOM a useful tool for the visualization and clustering of large data sets.
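The competitive learning step described above can be illustrated with a minimal sketch. It shows one update: the unit closest to an input wins, and the winner and its lattice neighbours move towards that input. The Gaussian neighbourhood function, the function names and the toy 2x2 lattice are assumptions made for illustration; the paper does not state which neighbourhood function or software was used.

```python
import math


def best_matching_unit(weights, x):
    """Return the lattice position of the unit whose weights are closest to x."""
    best, best_dist = None, float("inf")
    for pos, w in weights.items():
        dist = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
        if dist < best_dist:
            best, best_dist = pos, dist
    return best


def som_step(weights, x, lr, radius):
    """Move the winning unit and its lattice neighbours towards x (in place)."""
    bmu = best_matching_unit(weights, x)
    for pos, w in weights.items():
        # Squared distance between this unit and the winner on the lattice
        grid_d2 = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
        # Gaussian neighbourhood (an assumed choice): 1 at the winner, smaller further away
        h = math.exp(-grid_d2 / (2.0 * radius ** 2))
        for k in range(len(w)):
            w[k] += lr * h * (x[k] - w[k])
    return bmu


# Toy 2x2 lattice with two-dimensional weights (hypothetical values)
weights = {
    (0, 0): [0.0, 0.0],
    (0, 1): [0.0, 1.0],
    (1, 0): [1.0, 0.0],
    (1, 1): [1.0, 1.0],
}
x = [0.9, 0.8]
bmu = som_step(weights, x, lr=0.5, radius=1.0)
```

Here the unit at (1, 1) wins and moves halfway towards the input, while the other units move by smaller amounts that depend on their lattice distance from the winner. Repeating this step over many inputs is what orders the map.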
The normalized data were clustered using a SOM, which yielded 9 clusters on a 3x3 grid (shown in Figure 1). Unsupervised learning was carried out with a learning constant of 0.01 for 10,000 iterations, after which the data were grouped into clusters based on neighborhood distance.
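A rough sketch of this setup follows: a 3x3 map trained for 10,000 iterations with a constant learning rate of 0.01 on normalized data for 60 units with five parameters each. Everything beyond those stated values is an assumption: min-max normalization, random weight initialization, a Gaussian neighbourhood with a linearly shrinking radius, and randomly generated placeholder data in place of the PHC values from Table 1. It is not the authors' implementation.

```python
import math
import random


def min_max_normalize(rows):
    """Scale every column to [0, 1]; a constant column maps to 0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]


def nearest_cell(weights, x):
    """Return the grid cell whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda pos: sum((w - v) ** 2 for w, v in zip(weights[pos], x)),
    )


def train_som(data, rows=3, cols=3, lr=0.01, iterations=10000, seed=0):
    """Train a rows x cols SOM with a constant learning rate."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    start_radius = max(rows, cols) / 2.0
    for t in range(iterations):
        x = rng.choice(data)
        bmu = nearest_cell(weights, x)
        # Assumed schedule: radius shrinks linearly from start_radius towards 0.5
        radius = start_radius + (0.5 - start_radius) * t / iterations
        for pos, w in weights.items():
            d2 = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
            h = math.exp(-d2 / (2.0 * radius ** 2))
            for k in range(dim):
                w[k] += lr * h * (x[k] - w[k])
    return weights


# Placeholder data: 60 hypothetical PHCs x 5 parameters (not the study's data)
data_rng = random.Random(1)
raw = [[data_rng.uniform(0.0, 50.0) for _ in range(5)] for _ in range(60)]
data = min_max_normalize(raw)

weights = train_som(data)

# Assign every PHC to its nearest cell; each occupied cell is one cluster
clusters = {}
for index, x in enumerate(data):
    clusters.setdefault(nearest_cell(weights, x), []).append(index)
```

After training, `clusters` maps each occupied grid cell to the indices of the PHCs assigned to it. With real data, cells that lie close together on the grid would be expected to hold PHCs with similar transmission profiles.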
Legend: ABER = Annual Blood Examination Rate, API = Annual Parasite Incidence, SPR = Slide Positivity Rate, SFR = Slide Falciparum Rate, AFI = Annual Falciparum Incidence
The application of data mining and artificial intelligence in epidemiology is still in its infancy. Although there is ample evidence of artificial intelligence being used as an aid to data analysis in various epidemiological studies, medical entomologists have yet to tap its potential in vector control beyond data acquisition and storage. Information technology in vector control operations has so far been extended to the construction of databases on different aspects of vector borne diseases and to various forecasting systems based on computer simulation models (35). Application of artificial intelligence in combating vector borne diseases can give a completely new dimension to existing control programs. Artificial neural networks such as Kohonen maps have a natural propensity to learn: they learn how to solve problems from data, as opposed to solving problems based on an explicit problem specification (36). Self organizing maps (SOM) are regarded as a highly effective visualization tool for high dimensional, complex data with inherent relationships between the features comprising the data. They have been successfully exploited in medical and health informatics in fields as varied as medical image processing (37), disease diagnosis (38), gene prediction (39), gene sequence analysis (40), expression analysis (41, 42), structural recognition of protein families (43), drug design (44) and drug utilization (45). In the recent past, SOMs have been employed for data exploration in major public health diseases such as diabetes (27) and glaucoma (46). In this paper, we have shown the use of self organizing maps as a valuable tool in the prioritization of malaria endemic zones, which will assist in decision making on the location and deployment of health care services and in the prioritization of intervention strategies.
In areas like Arunachal Pradesh, which suffer from perennial malaria transmission and where difficult terrain and geographical features are big hurdles to effective and timely vector control operations, SOM will be very effective in bridging the gap between policy makers and health workers. Recognizing consistent foci of cases would permit control efforts to be directed at specific geographic areas, reducing costs and increasing effectiveness. In a country like ours, where resources are scarce, reliable methods for stratifying zones on the basis of the prevalence or transmission intensity of malaria are urgently required. Such clustering and data visualization tools are essential for assessing the severity of the problem, and hence the resources needed to combat malaria. This approach will serve as a yardstick for assessing the progress of control and will indicate which geographic areas should be prioritized, so that large amounts of manpower and resources can be saved. Because of its underlying simplicity in data visualization, SOM will prove to be a powerful weapon in the arsenal in the fight against this dreaded disease. This strategy will play a crucial role in bridging research and control, and it is quite likely that, besides reducing the malaria burden, the entire public health system will benefit from such a strategy if it is adopted and extended to other regions across the world and to other vector borne diseases.
The authors are grateful to the Director, IICT, Hyderabad for his continuous support and encouragement. Neelima Arora thanks CSIR for a Senior Research Fellowship.