Abstract. This paper proposes a conceptual model that can be used to facilitate the discovery, integration and analysis of environmental data in cancer-related risk studies. Persistent organic pollutants (POPs) were chosen as a model because of their persistence, bioaccumulation potential and genotoxicity. The part dealing with cancer risk focuses primarily on population-based observations encompassing a wide range of epidemiologic studies, from local investigations to national cancer registries. The proposed model adopts a multilayer hierarchy that works with the characteristics of given entities (POPs and cancer diseases as nomenclature classes) and with "observation - measurement" couples as content-defining classes. The proposal extends the formally used taxonomy by applying a multidimensional set of descriptors, including scores of measurement validity and precision. This solution has the potential to aid multidisciplinary data discovery and knowledge mining. Because the same structure of descriptors is used for the environmental and cancer parts, users can integrate different data sources while recognizing their methodological origin, time and space coordinates, and validity.

Keywords: Persistent organic pollutants, cancer risk, data model, data discovery.
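To make the structure summarized in the abstract more concrete, the following is a minimal sketch of how a shared descriptor set might be attached to observation-measurement pairs from both the environmental and the cancer part. It is an illustration under stated assumptions, not the model defined in this paper: the class names, the field names, the [0, 1] score range and the `select_valid` helper are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass(frozen=True)
class Descriptors:
    """Shared descriptor set; all field names are illustrative assumptions."""
    method: str                    # methodological origin of the record
    observed_on: date              # time coordinate
    location: Tuple[float, float]  # space coordinate as (latitude, longitude)
    validity: float                # hypothetical validity score, assumed in [0, 1]
    precision: float               # hypothetical precision score, assumed in [0, 1]


@dataclass(frozen=True)
class Observation:
    """One observation-measurement pair attached to a nomenclature class."""
    domain: str        # "environment" or "cancer"
    entity: str        # nomenclature class, e.g. a POP or a cancer diagnosis
    measurement: float
    unit: str
    descriptors: Descriptors


def select_valid(records: List[Observation], min_validity: float) -> List[Observation]:
    """Keep records from either domain whose validity score meets the threshold."""
    return [r for r in records if r.descriptors.validity >= min_validity]
```

Because both domains carry the same `Descriptors` type in this sketch, a single filter such as `select_valid` can be applied to environmental and cancer records alike, which is the kind of integration the shared descriptor structure is meant to enable.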
Introducing Problems with Data Accessibility

"Data rich - information poor" is becoming an obligatory phrase, or accepted "professional dialect", associated with environmental monitoring. It also extends to cancer risk assessment, which has recently attracted increasing attention. Most problems can be explained by the heterogeneity of input data, which range from laboratory bio-tests to multilevel epidemiologic observations. Progress increasingly requires standardized access to multidisciplinary information resources, including chemical, geological, meteorological, epidemiologic and demographic data. Each broad-ranging ecological or human risk study must adopt both of the following scenarios [1,2]:

1. retrospective exploitation of data sources and their description in the discovery process
2. prospective arrangement enabling effective electronic data capture in the future

462 L. Dušek et al.

From the viewpoint of informatics, environmental risk assessment can be characterized as the processing of heterogeneous data leading to a probabilistic estimate of a risk event that is either uncertain (prospective approach) or relatively certain (retrospective approach). The main complications that hamper progress in this field are highlighted in the following list:

1. Extremely wide range of data types and structures in environmental studies
2. Insufficient metadata description and standardization
3. Lack of well-established repositories based on standardized protocols, which is in strong contrast to the methodological progress in environmental and medical sciences
4. Variability of technologies, coding and reporting systems used by different research groups
5. Growing number of small studies that are not adequately published or described but nevertheless produce valuable and important data.

The last point deserves special attention. Growing number of s...