|Title: ||Ontology-based approach for resolving semantic schema conflicts in the integration of semi-structured data sources|
|Authors: ||Hajmoosaei, Abdolreza|
|Keywords: ||Web data sources; Web data integration system; Semantic schema heterogeneities|
|Issue Date: ||Jun-2010 |
|Publisher: ||University of Malaya|
|Abstract: ||The web is a platform for publishing information and the largest source of information of every type. It holds a great deal of valuable business data that organizations and users may need to improve their decision making. It is therefore critical that this information be complete, precise, and available on time. Most web data sources publish data in semi-structured form. Combining semi-structured data from different sources often fails because of syntactic and semantic differences between them. Accessing, retrieving, and using information from different web data sources requires the data to be integrated. Integrating web data is a complex process because of the heterogeneous nature of the data, and it therefore calls for a web data integration system.
Many types of heterogeneity among web data sources make data integration difficult (e.g., different data models, and different syntax and semantics at the schema and data instance levels). Semantic schema heterogeneity, which refers to the misinterpretation of data at the schema level, is one major obstacle that must be overcome in the web data integration process. It has been identified as one of the most important problems in interoperability and cooperation among multiple data sources on the internet.
The major aim of this research is to provide a solution for resolving semantic schema heterogeneities in web data integration. For this purpose, we first propose an approach and a system architecture for web data integration. The proposed system relies on ontology technology to resolve semantic heterogeneity among heterogeneous web data sources, and it covers all abstraction levels of data heterogeneity conflicts between web data sources. The system applies:
• an ontology as a solution for resolving schema heterogeneities;
• a wrapper as a solution for resolving data model heterogeneities;
• a converter as a solution for resolving data value heterogeneities.
In the second part of the work, we focus on the semantic mapping module of the proposed web data integration system and propose an approach and algorithm for resolving semantic conflicts between web ontologies. We use semantic ontology mapping to reconcile semantic schema conflicts between web data. The proposed algorithm uses the query path as a technique to improve the quality of the mapping results and reduce the runtime of the algorithm. The algorithm searches the domain ontology along the query path to find the user's query concept and its query attributes. The query path gives the algorithm two strengths:
1. Reduced runtime for each mapping result: the query path directs the algorithm toward the query concept and its attributes, which reduces the algorithm's search domain.
2. Higher-quality mapping results: the query path contains concepts that are semantically related to the query concept. The algorithm therefore has more information about the meaning of the query concept, which helps it find the term corresponding to the query concept more accurately.|
|Description: ||Thesis (PhD) -- Faculty of Computer Science & Information Technology, University of Malaya, 2010.|
|Appears in Collections:||PhD Theses : Computer Science|
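
The query-path idea described in the abstract can be illustrated with a small sketch. This is not the thesis algorithm, whose details are not given in this record. It only shows, under stated assumptions, how following a path of concepts narrows the search to the concepts on that path. The toy ontology, the function names `follow_query_path` and `match_attributes`, the exact-string matching, and the rule that attributes are inherited along the path are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy domain ontology (not taken from the thesis):
# each concept maps to its direct sub-concepts and to its own attributes.
SUBCONCEPTS: Dict[str, List[str]] = {
    "Publication": ["Book", "Article"],
    "Book": [],
    "Article": [],
}
ATTRIBUTES: Dict[str, Set[str]] = {
    "Publication": {"title", "year"},
    "Book": {"isbn", "publisher"},
    "Article": {"journal", "pages"},
}


def follow_query_path(path: List[str]) -> Optional[str]:
    """Walk the path concept by concept; return the final concept,
    or None if any step is not a sub-concept of the previous one."""
    if not path or path[0] not in SUBCONCEPTS:
        return None
    current = path[0]
    for nxt in path[1:]:
        if nxt not in SUBCONCEPTS.get(current, []):
            return None
        current = nxt
    return current


def match_attributes(path: List[str], query_attrs: List[str]) -> Dict[str, bool]:
    """Check each query attribute against the concepts on the path only.

    Assumption: attributes are inherited along the path, and matching is
    plain case-insensitive string equality (a stand-in for real semantic
    matching). Concepts off the path are never visited.
    """
    if follow_query_path(path) is None:
        return {}
    visible: Set[str] = set()
    for concept in path:
        visible |= ATTRIBUTES.get(concept, set())
    return {attr: attr.lower() in visible for attr in query_attrs}


result = match_attributes(["Publication", "Book"], ["title", "isbn", "journal"])
```

Here `journal` is not matched because `Article` lies off the query path and is never searched, which is the search-domain reduction the abstract refers to. A real mapping module would replace the string comparison with a semantic similarity measure.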