Ziegler and Dittrich (nd) report that integration is "becoming more and more indispensable in order not to drown in data while starving for information." The goal of data integration is "to combine data from different sources by applying a global data model and by detecting and resolving schema and data conflicts so that a homogeneous, unified view can be provided" (Ziegler and Dittrich, nd). The authors give two reasons for data integration:
(1) Given a set of existing data sources, an integrated view is created to facilitate data access and reuse through a single data access point;
(2) Given a certain information need, data from different complementing sources is to be combined to gain a more comprehensive basis to satisfy the information need. (Ziegler and Dittrich, nd)
Foundations of the SIRUP Approach
Foundations of the SIRUP approach are stated to include the following principles:
(1) Semantic Perspectives -- "a user defined conceptual model of an application domain with explicit queryable semantics for all entities and relationships appearing in it."
(2) Bipartite Integration Process -- there are generally two primary roles: data providers and data users. Accordingly, the integration process of the SIRUP approach is reported to have two distinct phases: (a) a data provision phase, in which administrators of local data sources explicitly declare the data, and its semantics, that they offer for integration; and (b) a Semantic Perspective modeling phase, in which users who know the application domain for which data is to be integrated define the desired Semantic Perspective.
(3) IConcepts -- an IConcept (short for Intermediate Concept) is a basic conceptual building block that acts as a linking element between data providers and data users seeking data for their information needs. Each IConcept has a queryable link to at least one concept of an ontology, which explicitly defines the semantics of the real-world concept it represents. Data sources provide attributes for the ontological concept represented by a particular IConcept; in this way they declare which attribute data they are able and willing to provide for that IConcept. Additional structural metadata is reported to be provided for each attribute. (Ziegler and Dittrich, nd, paraphrased) IConcepts thus give data providers a way to state precisely the semantics and structure of the data they offer for integration. For data users, an IConcept is "an access point to retrieve data from different data sources referring to the same real-world concept." (Ziegler and Dittrich, nd) IConcepts are additionally reported to conceal technical and structural heterogeneity from data users and to assist in resolving semantic conflicts according to the user's own perception of the application domain.
(4) User Concepts -- a User Concept is a user-specific concept built by selecting and combining user-specific copies of IConcepts.
(5) Semantic Multidatasource Language -- a declarative language is provided both for data provision and for the specification of User Concepts and Semantic Perspectives. This language is reported to support querying the explicit semantics and metadata assigned to User Concepts and IConcepts.
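(6) Ex-ante View Definition -- conventionally, users can specify views only on top of already existing schemas, an approach the authors refer to as 'ex-post view definition' because the view is "created after a schema is defined." (Ziegler and Dittrich, nd) As the name of the principle indicates, SIRUP takes the opposite route: the desired view is specified beforehand.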
(7) Pragmatic Data Integration -- SIRUP is contrasted with approaches that integrate data against one or more global ontologies and assume an ideal world in which data for all ontology concepts is available. (Ziegler and Dittrich, nd)
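The relationship between IConcepts and User Concepts described in principles (3) and (4) can be illustrated with a minimal Python sketch. It is not the SIRUP implementation or its Semantic Multidatasource Language; the class layout, the method names (provide, retrieve, from_iconcepts), the "onto:Person" identifier, and the sample sources are all assumptions made for illustration.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class IConcept:
    """Intermediate Concept: the link between data providers and data users."""
    name: str
    ontology_concept: str  # hypothetical identifier of the linked ontology concept
    # source name -> {attribute name -> structural metadata (here only a type label)}
    offered: dict = field(default_factory=dict)
    # source name -> list of records offered by that source
    data: dict = field(default_factory=dict)

    def provide(self, source, attributes, rows):
        """Data provision phase: a source declares its attributes and offers rows."""
        self.offered[source] = dict(attributes)
        self.data[source] = list(rows)

    def retrieve(self):
        """Single access point: rows from every source, tagged with their origin."""
        return [dict(row, _source=src)
                for src, rows in self.data.items()
                for row in rows]


@dataclass
class UserConcept:
    """User-specific concept built from user-specific copies of IConcepts."""
    name: str
    parts: list

    @classmethod
    def from_iconcepts(cls, name, iconcepts):
        # Deep copies keep user-side changes away from the shared IConcepts.
        return cls(name, [copy.deepcopy(c) for c in iconcepts])

    def retrieve(self):
        return [row for part in self.parts for row in part.retrieve()]


# Hypothetical sources "hr_db" and "crm" both describe the same real-world concept.
person = IConcept("Person", "onto:Person")
person.provide("hr_db", {"name": "string"}, [{"name": "Ada"}])
person.provide("crm", {"name": "string", "email": "string"},
               [{"name": "Bob", "email": "bob@example.org"}])

customer = UserConcept.from_iconcepts("Customer", [person])
rows = customer.retrieve()
```

In this sketch the data user queries only the User Concept and never addresses the individual sources, which mirrors the role of an IConcept as an access point; structural differences between the sources (one offers an email attribute, the other does not) are carried through rather than resolved.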
Bernstein, Halevy, and Pottinger (nd), in "A Vision for Management of Complex Models," report on the challenges met in constructing applications for database management systems (DBMSs) and note that many of these applications involve "the manipulation of models." A model is described as "a complex discrete structure that represents a design artifact, such as an XML DTD, web-site schema, interface definition, relational schema, database transformation script, workflow definition, semantic network, software configuration or complex document." (Bernstein, Halevy, and Pottinger, nd)
Using models involves managing the changes that take place in them and transforming data from one model to another, which is reported to require "an explicit representation of mappings between models." (Bernstein, Halevy, and Pottinger, nd) Bernstein, Halevy, and Pottinger believe that DBMSs could be made easier to use for these applications by "making 'model' and 'mapping' first-class objects with high-level operations that simplify their use…", an approach referred to as "model management." (Bernstein, Halevy, and Pottinger, nd)
Bernstein, Halevy, and Pottinger (nd) state that their paper makes two primary contributions:
(1) It argues that general-purpose model management functions are needed to reduce the amount of programming required to manipulate models; and
(2) It proposes a data model that captures model management functions.
According to Bernstein, Halevy, and Pottinger (nd), the data model consists of "formal structures for representing models and mappings between models and of algebraic operations on those structures." Although relational and OO DBMSs have advanced their functionality, model management applications at present "still include a lot of complex code for navigating graph-like structures. Producing, understanding, tuning, and maintaining navigational code is a serious drag on programmer productivity, making model management applications expensive to build." (Bernstein, Halevy, and Pottinger, nd)
Bernstein, Halevy, and Pottinger propose to raise the level of abstraction beyond that of current DBMSs by introducing "high-level operations on models and model mappings." (Bernstein, Halevy, and Pottinger, nd) Examples are "matching, merging, selection and composition," none of which is a particularly novel operation in itself. (Bernstein, Halevy, and Pottinger, nd) The following examples of models and mappings are stated to illustrate the "pervasiveness and scope of model management." (Bernstein, Halevy, and Pottinger, nd) They are:
(1) mapping an XML schema of one application to that of another in order to guide the exchange of XML instances between the applications;
(2) mapping a web site's content to its page layout in order to drive the generation of web pages;
(3) mapping data sources into data warehouse tables in order to generate programs that transform production data and load it into a data warehouse;
(4) mapping the DB schema of one software release into that of the next release, to guide the migration of DBs;
(5) mapping source make files into target make files in order to drive the transformation of make scripts and thereby help port complex applications from one programming environment to another; and
(6) mapping the components of a complex application to the components of a system where it will be deployed in order to drive the generation of installation, upgrade, and de-installation programs. (Bernstein, Halevy, and Pottinger, nd)
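Two of the high-level operations named above, matching and composition, can be sketched in Python using the schema-migration example. This is a deliberately naive illustration and not the operators defined by Bernstein, Halevy, and Pottinger: models are reduced to lists of element names, mappings to dictionaries, and the function names match and compose as well as the sample releases are assumptions.

```python
def match(model_a, model_b):
    """Naive match: pair elements whose names are equal ignoring case."""
    lowered = {element.lower(): element for element in model_b}
    return {a: lowered[a.lower()] for a in model_a if a.lower() in lowered}


def compose(map_ab, map_bc):
    """Composition: follow A -> B and then B -> C, dropping dead ends."""
    return {a: map_bc[b] for a, b in map_ab.items() if b in map_bc}


# Hypothetical schemas of three successive software releases.
release_1 = ["Name", "Email", "Fax"]
release_2 = ["name", "email", "phone"]
release_2_to_3 = {"name": "full_name", "email": "contact_email"}

release_1_to_2 = match(release_1, release_2)
release_1_to_3 = compose(release_1_to_2, release_2_to_3)
# release_1_to_3 == {"Name": "full_name", "Email": "contact_email"}
```

The point of the sketch is that the migration path from the first to the third release is obtained by operating on whole mappings rather than by writing navigational code over individual schema elements. A realistic match operator would need far more than name equality.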
Building generic functions for creating models and mappings allows them to be manipulated as single objects, which provides a better environment for the tasks listed above. The glue between the systems is reported to be provided by "simple adapters that:
(1) import or export a model in the model management system from or to a schema in the target platform; or
(2) interpret a mapping in the model management system to transform instances of one target model to those of another." (Bernstein, Halevy, and Pottinger, nd) The authors note that identifying sound architectures for coupling these systems poses many challenges.
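The two kinds of adapter can be pictured with a short Python sketch. It assumes a model is a plain dictionary and a mapping is a dictionary from source element names to target element names; the function names import_relational_schema and interpret_mapping, and the employee table, are hypothetical and are not taken from the paper.

```python
def import_relational_schema(table, columns):
    """Adapter of kind (1): import a platform schema as a generic model."""
    return {"name": table, "elements": list(columns)}


def interpret_mapping(mapping, instance):
    """Adapter of kind (2): turn an instance of the source model into an
    instance of the target model by following the mapping."""
    return {target: instance[source]
            for source, target in mapping.items()
            if source in instance}


# Hypothetical source table and element-level mapping.
source_model = import_relational_schema("employee", ["emp_name", "emp_mail"])
mapping = {"emp_name": "fullName", "emp_mail": "email"}

result = interpret_mapping(
    mapping, {"emp_name": "Ada", "emp_mail": "ada@example.org"})
# result == {"fullName": "Ada", "email": "ada@example.org"}
```

Keeping the adapters this thin leaves the model management logic independent of any one platform, which is the design intent the quoted passage suggests.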
The leverage gained by building model management functionality is stated to come from its being "highly generic…[and]…widely applicable." (Bernstein, Halevy, and Pottinger, nd) Model management applications are often described as "metadata management," and it is stated that the primary effort in building such an application lies "in manipulating descriptions of a thing of interest, rather than the thing itself." (Bernstein, Halevy, and Pottinger, nd) The authors ask whether keywords, for example, are data or metadata, and state that model management "takes a different cut at the problem. It focuses attention on a particular kind of metadata, structure and mathematical semantics of descriptive information." (Bernstein, Halevy, and Pottinger, nd)
A primary goal of model management is stated to be support for managing change in models and for mapping data between models. Model mappings must therefore be manipulated as first-class citizens. Key elements underlying the approach of Bernstein, Halevy, and Pottinger (nd) to model mappings include:
(1) the need to manipulate model mappings much as models are manipulated;
(2) a mapping consists of connections between instances of two models, which are often of different types;
(3) there may be more than one mapping between a given pair of models;
(4) a mapping may relate a set of objects in one model to a set of objects in another via a language for building complex expressions;
(5) mappings must be able to nest because this enables the reuse of mappings: a mapping on a model M can be used as a component of a mapping on models that contain M. (Bernstein, Halevy, and Pottinger, nd)
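The elements listed above can be made concrete with a small Python sketch in which a mapping is an ordinary object that relates sets of elements and may contain other mappings. This is one possible reading under stated assumptions, not the formal structures proposed by the authors: the Model and Mapping classes, their fields, and the address and customer models are hypothetical, and the expression language of element (4) is reduced to plain set-to-set pairs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Model:
    name: str
    elements: frozenset


@dataclass
class Mapping:
    """A mapping treated as a first-class object between two models."""
    source: Model
    target: Model
    # Each correspondence relates a set of source elements to a set of target elements.
    correspondences: list = field(default_factory=list)
    # Nested mappings reused as components of this mapping.
    submappings: list = field(default_factory=list)

    def relate(self, source_elems, target_elems):
        src, tgt = frozenset(source_elems), frozenset(target_elems)
        if not src <= self.source.elements or not tgt <= self.target.elements:
            raise ValueError("correspondence refers to elements outside the mapped models")
        self.correspondences.append((src, tgt))
        return self

    def all_correspondences(self):
        """Own correspondences plus those of every nested mapping."""
        result = list(self.correspondences)
        for sub in self.submappings:
            result.extend(sub.all_correspondences())
        return result


# A mapping on the address models (playing the role of M) ...
addr_a = Model("AddressA", frozenset({"street", "city"}))
addr_b = Model("AddressB", frozenset({"line1", "town"}))
addr_map = (Mapping(addr_a, addr_b)
            .relate({"street"}, {"line1"})
            .relate({"city"}, {"town"}))

# ... reused as a component of a mapping on models that contain the address models.
cust_a = Model("CustomerA", frozenset({"first", "last"}) | addr_a.elements)
cust_b = Model("CustomerB", frozenset({"full_name"}) | addr_b.elements)
cust_map = (Mapping(cust_a, cust_b, submappings=[addr_map])
            .relate({"first", "last"}, {"full_name"}))
```

Because Mapping instances are ordinary values, several of them can exist for the same pair of models, and the address mapping is written once and reused wherever the address models occur.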