Integrating Heterogeneous Data Using Web Services dissertation


Excerpt from dissertation:

A solution to the heterogeneous data integration problem is presented, together with an explanation of the criteria to be employed in validating the approach. The tools to be used are also indicated.

The proposed solution is to use semantic web technologies, specifically the Semantic Data Integration Middleware (SIM) architecture, for the initial integration process (Cardoso, 2007) and then couple it with a broker architecture to improve integration and interoperability while solving the problem of multi-level impedance (Kashyap and Sheth, 2002).

For an elaborate diagram, see the figure below.

Integration via the semantic web technologies

According to Barnett and Standing (2001), the rapid developments in business environments caused by the adoption of internet-based technologies have resulted in the need to implement improved business models, improved network systems and alliances, and creative marketing strategies. The strategy developed for integrating heterogeneous data must take into account both organization-specific data and the general information available on the internet. The overall aim is to arrive at a semantic web that is beneficial to individuals and organizations alike.

In efforts geared towards gaining competitive advantage, organizations employ business-mediated channels to create internal and external value. This is done through the formulation of technology-convergent strategies (through heterogeneous data integration) and the organization of resources based on knowledge and the existing relationships within the knowledge base, as pointed out by Rayport and Jaworski (2001). Internal and external value is created on the basis of the information available and the organization of knowledge-related resources and their corresponding relationships. This requires organizations to identify their various data assets, which may take the form of relational databases, plain text files, web pages, XML files, Electronic Data Interchange (EDI) documents, and web services.

The proposed solution for this project should be able to integrate information from autonomous, heterogeneous and distributed (HAD) data schemas. As pointed out by Ouskel and Sheth (1999), three forms of heterogeneity can be distinguished. The first is syntactic heterogeneity, in which the technology used to support the data sources differs (for example, databases versus web pages).
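As a minimal illustration of syntactic heterogeneity (the sample data and function names are hypothetical), the sketch below holds the same logical fact under three different technologies and shows that each needs its own technology-specific parser, even though the underlying record is identical:

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# The same fact (the book "Dune", year 1965) held under three different
# technologies: a CSV database export, an XML document, and JSON from a
# web service. Only the syntax differs, not the content.
CSV_ROW = "title,year\nDune,1965\n"
XML_DOC = "<book><title>Dune</title><year>1965</year></book>"
JSON_DOC = '{"title": "Dune", "year": 1965}'

def from_csv(text):
    row = next(csv.DictReader(io.StringIO(text)))
    return (row["title"], int(row["year"]))

def from_xml(text):
    root = ET.fromstring(text)
    return (root.findtext("title"), int(root.findtext("year")))

def from_json(text):
    obj = json.loads(text)
    return (obj["title"], obj["year"])

# Each source requires a different parser, but all three yield the same
# logical record once the syntactic layer is stripped away.
assert from_csv(CSV_ROW) == from_xml(XML_DOC) == from_json(JSON_DOC) == ("Dune", 1965)
```

The point of the sketch is that syntactic heterogeneity is the shallowest of the three forms: it can be resolved purely with format-specific parsing, before any schema or semantics are considered.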
In order to provide transactional data, it is important to make use of the Extensible Markup Language (XML), since it effectively provides consistent and reliable XML streams and web services (XML, 2005). The second form of heterogeneity to be addressed is schematic heterogeneity, which involves data source schemas that possess different structures. Semantic heterogeneity is the last form to be addressed by the proposed solution. XML is to be used in order to provide syntactic interoperability (Busler, 2003). Its downside is that it lacks the semantics required for the current web environment (Shabo et al., 2006). The proposed solution should be capable of solving the semantic heterogeneity problem by enabling autonomous, heterogeneous and distributed systems to share and exchange information in a semantically viable manner, as pointed out by Sheth (1998). The solution is to employ the capabilities of the semantic web via the concept of a shared ontology. One of the main benefits of semantic web services is their ability to address the organizational need for data integration from semantically dissimilar sources. The fact that semantic web services have been successfully deployed in bioinformatics, digital libraries and other fields is a great motivator for the success of this project. The solution to data integration in this project entails the use of Semantic Data Integration Middleware (SIM) and its subsequent combination with the broker architecture to improve integration and interoperability, as a means of solving multi-level impedance for unified data integration.

Semantic Data Integration Middleware (SIM)

This is a special data integration technique based on a single query. The technique effectively integrates information that resides in different data sources having dissimilar structures, formats, schemas and semantics. A data wrapper, or extractor, is used to transform the data into semantic knowledge. The middleware extractor is ontology-based and multi-sourced, as pointed out by Silva and Cardoso (2006). SIM is made up of two main modules: 1) the Semantic Transformation module and 2) the Syntactic-to-Semantic Transformation module (Cardoso, 2007).

3.2 Semantic Transformation module

The Semantic Transformation module is responsible for integrating data that resides in different data sources possessing dissimilar formats, schemas and structures.

Syntactic-to-Semantic Transformation module

This module maps XML Schema documents to an already available OWL ontology. It is also responsible for the automatic transformation of XML instance documents into instances of the mapped ontology, as pointed out by Rodrigues et al. (2006). This module is critical for the operation of transforming XML-based syntactic data into semantic data by means of OWL.
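To make the syntactic-to-semantic step concrete, the following sketch turns an XML instance document into OWL-style individuals. The element-to-class and element-to-property mapping tables, the namespace, and the sample document are all hypothetical; they stand in for the mapping that, in SIM, comes from the already available ontology:

```python
import xml.etree.ElementTree as ET

# Hypothetical ontology namespace; in SIM the target OWL ontology exists
# before the transformation begins.
ONT = "http://example.org/ontology#"

XML_INSTANCE = "<books><book><title>Semantic Web</title><year>2007</year></book></books>"

# Assumed mapping tables: the XML element 'book' maps to the ontology class
# 'Book', and child elements map to ontology properties.
CLASS_MAP = {"book": "Book"}
PROP_MAP = {"title": "hasTitle", "year": "hasYear"}

def xml_to_owl_individuals(xml_text):
    """Turn each mapped XML element into an OWL individual (N-Triples-like lines)."""
    root = ET.fromstring(xml_text)
    triples = []
    for i, elem in enumerate(root.iter()):
        if elem.tag in CLASS_MAP:
            ind = f"<{ONT}{CLASS_MAP[elem.tag]}_{i}>"
            # Type the individual with its mapped ontology class.
            triples.append(f"{ind} a <{ONT}{CLASS_MAP[elem.tag]}> .")
            # Mapped child elements become property assertions.
            for child in elem:
                if child.tag in PROP_MAP:
                    triples.append(f'{ind} <{ONT}{PROP_MAP[child.tag]}> "{child.text}" .')
    return triples

for triple in xml_to_owl_individuals(XML_INSTANCE):
    print(triple)
```

Once every source's XML is lifted into individuals of the same shared ontology in this way, queries can be posed against the ontology vocabulary rather than against each source's element names.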

3.2.1 The Semantic Data Integration Middleware (SIM) architecture

The Semantic Data Integration Middleware (SIM) architecture is important for the process of integrating heterogeneous information, since it solves the problem of semantics that is inherent in the XML data schema and representation. Our choice of semantic data representation stems from the fact that it represents the most current and most efficient state of data representation (Cardoso, 2007, p.2). The SIM architecture is illustrated in the figure below:

Figure 1: The SIM architecture (Source: Cardoso, 2007).

The Semantic Data Integration Middleware (SIM) architecture has four main layers: the data sources, the schematic transformation layer, the Syntactic-to-Semantic transformation layer, and the ontology layer. The correlation between these layers is indicated in the diagram above.

Sources of data (D)

The data sources dictate the scope of the information integration system. The diversity of the data sources provides an enhanced level of data visibility. The Semantic Data Integration Middleware (SIM) architecture connects to data in unstructured formats (such as plain text and web pages), semi-structured formats (XML), and structured formats (such as relational databases). The data sources can also include other formats not mentioned here.

The schematic transformation

The schematic transformation of the data sources (D) to XML is executed by a module that integrates data from different sources having different structures, formats, schemas and semantics. The module employs a multi-sourced data extractor to transform the available data to XML.
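A minimal sketch of such a multi-sourced extractor is shown below. The two sources and the unified `<people>` XML format are hypothetical examples; the point is only that per-source extractors normalize dissimilar inputs into one record shape, which a single writer then wraps in a common XML document:

```python
import csv
import io
import xml.etree.ElementTree as ET

# Two heterogeneous sources (hypothetical sample data): a CSV export from a
# relational database and a pipe-delimited plain-text file.
CSV_SOURCE = "id,name\n1,Alice\n2,Bob\n"
TEXT_SOURCE = "3|Carol\n4|Dave\n"

def extract_csv(text):
    """Source-specific extractor for the relational/CSV source."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]

def extract_text(text):
    """Source-specific extractor for the plain-text source."""
    return [{"id": ident, "name": name}
            for ident, name in (line.split("|") for line in text.splitlines())]

def to_xml(records):
    """Wrap the unified records in a single common XML format."""
    root = ET.Element("people")
    for rec in records:
        person = ET.SubElement(root, "person", id=rec["id"])
        ET.SubElement(person, "name").text = rec["name"]
    return ET.tostring(root, encoding="unicode")

xml_out = to_xml(extract_csv(CSV_SOURCE) + extract_text(TEXT_SOURCE))
print(xml_out)
```

After this stage every source, whatever its original structure, presents itself as XML, so the following Syntactic-to-Semantic layer only ever has to deal with one input syntax.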

The transformation from Syntactic to Semantic

This process is carried out by a module that employs the JXML2OWL framework to map the XML Schema to the already available OWL ontologies. The module transforms XML instance documents into individuals that are appropriately mapped into the ontology.

The Ontologies (OWL)

The Semantic Data Integration Middleware (SIM) architecture provides the capability of extracting data from various sources having different data types (structured, semi-structured or unstructured) and then wrapping the outcome in the Web Ontology Language (OWL) format (OWL, 2004). The importance of this is that it provides homogeneous access to otherwise heterogeneous data sources. The adoption of the OWL ontology is based on its endorsement by the World Wide Web Consortium (W3C).

The semantic model

NIST (1993) described a semantic data model as a conceptual data model in which semantic information is included. The implication is that the model describes the meaning of its instances. A semantic model is therefore an abstraction that defines how the instance data (stored symbols) correlate to real-world situations. In order to effectively conceptualize a given area in a machine-readable format, an ontology language such as OWL is employed. The function of the ontology is the promotion and facilitation of system interoperability, enhancing intelligent processing and the reuse of available knowledge. The ontology therefore provides a shared understanding of a given domain.
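The "shared understanding" an ontology provides can be sketched in miniature. The toy class hierarchy below (all names hypothetical) mimics one thing an OWL ontology contributes: any system that shares the hierarchy draws the same conclusions about what an instance means:

```python
# A toy semantic model: a subclass hierarchy of the kind an OWL ontology
# would declare. All class names are illustrative.
SUBCLASS_OF = {"Novel": "Book", "Book": "Publication"}

def ancestors(cls):
    """All classes an instance of `cls` also belongs to, via the hierarchy."""
    chain = [cls]
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain

# An instance typed as Novel is also a Book and a Publication. Because the
# hierarchy is shared, every participating system interprets the symbol
# "Novel" identically, which is what makes the model semantic rather than
# merely syntactic.
assert ancestors("Novel") == ["Novel", "Book", "Publication"]
```

Real OWL ontologies add far more (properties, restrictions, disjointness), but the principle is the same: meaning lives in shared, machine-processable structure rather than in each application's private conventions.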

The ontology schema defines both the data structure and the semantics, and the extraction process can proceed even without an XML schema. The ontology is important for creating the mapping between the schema and the data sources, and it also provides the specification of the query. As Rodrigues et al. (2006) pointed out, the framework employed is JXML2OWL, which has two subsystems: the JXML2OWL Mapper and the JXML2OWL API. The JXML2OWL API is a generic, reusable, open-source library used to map XML schemas to OWL ontologies. The Mapper, on the other hand, is a special Java-based application with a graphical user interface (GUI).

The documents that can effectively be mapped by JXML2OWL to an OWL ontology are DTD, XML and XSD. The mapping proceeds in a series of steps. The initial step is the creation of a new mapping project and the loading of the XML schema and the OWL ontology. Should the XML schema be missing, JXML2OWL generates an appropriate schema. This step is followed by the creation of class mappings by the user. The mapping takes place between…
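The steps described above can be sketched in outline. Note that JXML2OWL itself is a Java framework; the Python below is not its actual API, and every class, method and data value here is an illustrative stand-in for the workflow (create project, load or generate schema, create class mappings):

```python
# Illustrative sketch of the JXML2OWL mapping workflow; names and data
# structures are hypothetical, not the real tool's API.
class MappingProject:
    def __init__(self, xml_schema=None, ontology=None):
        # Step 1: create the project and load the XML schema and OWL
        # ontology; if the schema is missing, generate one.
        self.xml_schema = xml_schema or self.generate_schema()
        self.ontology = ontology
        self.class_mappings = {}

    def generate_schema(self):
        # Stand-in for schema inference from a sample XML document.
        return {"book": ["title", "year"]}

    def map_class(self, xml_element, owl_class):
        # Step 2: the user creates class mappings between XML schema
        # elements and ontology classes.
        self.class_mappings[xml_element] = owl_class

# No schema supplied, so one is generated; then a class mapping is created.
project = MappingProject(ontology={"classes": ["Book"]})
project.map_class("book", "Book")
assert project.class_mappings == {"book": "Book"}
```

The output of this workflow, the class (and property) mappings, is exactly what the Syntactic-to-Semantic transformation module consumes when it converts XML instance documents into ontology individuals.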

Cite This Dissertation:

"Integrating Heterogeneous Data Using Web Services" (2011, May 09) Retrieved October 22, 2016, from

