
Please use this identifier to cite or link to this item: http://hdl.handle.net/1812/1019

Title: Learning semantic role labeling via bootstrapping with unlabeled data
Authors: Samad Zadeh Kaljahi, Rasoul
Keywords: Semantic role labeling
SRL
Natural language processing tasks
Statistical learning methods
Self-training
Issue Date: Dec-2010
Publisher: University of Malaya
Abstract: Semantic role labeling (SRL) has recently attracted a considerable body of research due to its utility in several natural language processing tasks. Current state-of-the-art semantic role labeling systems use supervised statistical learning methods, which rely heavily on hand-crafted corpora. Creating these corpora is tedious and costly, and the resulting corpora are not representative of the language due to the extreme diversity of natural language usage. This research investigates self-training and co-training, two semi-supervised algorithms that aim to address this problem by bootstrapping a classifier from a small amount of annotated data together with a larger amount of unannotated data. Owing to the complexity of semantic role labeling and the high number of parameters involved in these algorithms, several problems are associated with this task. One major problem is the propagation of classification noise into successive bootstrapping iterations. The experiments show that the selection balancing and preselection methods proposed here are useful in alleviating this problem for self-training (e.g. a 0.8-point improvement in F1 for the best setting). In co-training, a main concern is the split of the problem into distinct feature views so that classifiers derived from those views can effectively co-train with each other. This work utilizes constituency-based and dependency-based views of semantic role labeling for co-training and verifies three variations of these algorithms with three different feature splits based on these views. Balancing the feature split to eliminate the performance gap between the underlying classifiers proved to be important and effective. Also, co-training with a common training set for both classifiers performed better than with separate training sets for each: the latter degraded the base classifier, while the former improved it by 0.9 F1 points for the best setting.
All the results show that much more unlabeled data is needed for these algorithms to be practically useful for SRL.
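To illustrate the self-training scheme the abstract describes, the following is a minimal, self-contained sketch of a bootstrapping loop with confidence-based selection and per-class selection balancing. All names and the toy nearest-centroid "classifier" are illustrative assumptions for this sketch; the thesis' actual SRL system uses far richer syntactic features and models.

```python
# Hedged toy sketch of self-training with selection balancing.
# The centroid classifier and all names here are illustrative only.

def train(labeled):
    """Toy classifier: per-class mean of 1-D feature values."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(model, x):
    """Return (label, confidence): nearest centroid, inverse-distance score."""
    label = min(model, key=lambda y: abs(x - model[y]))
    return label, 1.0 / (1.0 + abs(x - model[label]))

def self_train(labeled, unlabeled, rounds=3, per_class=2):
    """Bootstrap: label the pool, keep the most confident predictions,
    balanced per predicted class to limit noise propagation."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        model = train(labeled)
        scored = [(x, *predict(model, x)) for x in pool]
        added = []
        for cls in set(model):
            best = sorted((s for s in scored if s[1] == cls),
                          key=lambda s: -s[2])[:per_class]
            added.extend(best)
        labeled += [(x, y) for x, y, _ in added]
        chosen = {x for x, _, _ in added}
        pool = [x for x in pool if x not in chosen]
        if not pool:
            break
    return train(labeled)

model = self_train([(0.0, "A"), (10.0, "B")],
                   [1.0, 2.0, 8.5, 9.0, 4.9, 5.1])
print(predict(model, 1.5)[0])  # -> A
```

The per-class cap (`per_class`) is one simple form of the selection balancing idea: without it, one dominant class can flood the training set with its own (possibly noisy) predictions across iterations.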
Description: Dissertation (M.C.S.) -- Faculty of Computer Science & Information Technology, University of Malaya, 2010.
URI: http://dspace.fsktm.um.edu.my/handle/1812/1019
Appears in Collections:Masters Dissertations: Computer Science

Files in This Item:

File: WGA080008.pdf
Description: Full Thesis
Size: 4.05 MB
Format: Adobe PDF


This item is protected by original copyright



