Please use this identifier to cite or link to this item:
http://hdl.handle.net/1812/1019
| Title: | Learning semantic role labeling via bootstrapping with unlabeled data |
| Authors: | Samad Zadeh Kaljahi, Rasoul |
| Keywords: | Semantic role labeling; SRL; Natural language processing tasks; Statistical learning methods; Self-training |
| Issue Date: | Dec-2010 |
| Publisher: | University of Malaya |
| Abstract: | Semantic role labeling (SRL) has recently attracted a considerable body of research due to its utility in several natural language processing tasks. Current state-of-the-art semantic role labeling systems use supervised statistical learning methods, which rely heavily on hand-crafted corpora. Creating these corpora is tedious and costly, and the resulting corpora are not representative of the language because of the extreme diversity of natural language usage. This research investigates self-training and co-training, two semi-supervised algorithms that aim to address this problem by bootstrapping a classifier from a smaller amount of annotated data via a larger amount of unannotated data. Due to the complexity of semantic role labeling and the high number of parameters involved in these algorithms, several problems are associated with this task. One major problem is the propagation of classification noise into successive bootstrapping iterations. The experiments show that the selection balancing and preselection methods proposed here are useful in alleviating this problem for self-training (e.g. an improvement of 0.8 points for the best setting). In co-training, a main concern is how to split the problem into distinct feature views so that the classifiers derived from those views can effectively co-train with each other. This work utilizes constituency-based and dependency-based views of semantic role labeling for co-training and evaluates three variations of these algorithms with three different feature splits based on these views. Balancing the feature split to eliminate the performance gap between the underlying classifiers proved to be important and effective. Also, co-training with a common training set for both classifiers performed better than co-training with a separate training set for each: the latter degraded the base classifier, while the former improved it by 0.9 for the best setting. All the results show that much more unlabeled data is needed for these algorithms to be practically useful for SRL. |
| Description: | Dissertation (M.C.S.) -- Faculty of Computer Science & Information Technology, University of Malaya, 2010. |
| URI: | http://dspace.fsktm.um.edu.my/handle/1812/1019 |
| Appears in Collections: | Masters Dissertations: Computer Science |
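
The self-training idea summarized in the abstract (bootstrapping a classifier from a small annotated set via a larger unannotated set) can be illustrated with a minimal, generic sketch. This is not the dissertation's SRL system, and it does not implement the selection balancing or preselection methods mentioned there. The toy nearest-centroid classifier, the names `fit_centroids`, `predict`, and `self_train`, the confidence measure, and the 0.75 threshold are all assumptions chosen only for illustration.

```python
from statistics import mean


def fit_centroids(examples):
    """Train a toy nearest-centroid classifier: one mean vector per label."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: [mean(column) for column in zip(*rows)]
        for label, rows in by_label.items()
    }


def predict(centroids, features):
    """Return (label, confidence) for one feature vector.

    Confidence is an illustrative (assumed) measure: the distance to the
    runner-up centroid divided by the sum of the two smallest distances.
    """
    distances = {
        label: sum((a - b) ** 2 for a, b in zip(features, centroid)) ** 0.5
        for label, centroid in centroids.items()
    }
    ranked = sorted(distances, key=distances.get)
    best = ranked[0]
    if len(ranked) == 1:
        return best, 1.0
    total = distances[best] + distances[ranked[1]]
    confidence = 0.5 if total == 0 else distances[ranked[1]] / total
    return best, confidence


def self_train(labeled, unlabeled, threshold=0.75, max_iterations=10):
    """Generic self-training loop (hypothetical parameters).

    Each iteration retrains on the current training set, labels the pool,
    and moves only confidently labeled examples into the training set.
    """
    train = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_iterations):
        model = fit_centroids(train)
        selected, remaining = [], []
        for features in pool:
            label, confidence = predict(model, features)
            if confidence >= threshold:
                selected.append((features, label))
            else:
                remaining.append(features)
        if not selected:
            break  # nothing confident enough; stop bootstrapping
        train.extend(selected)
        pool = remaining
    return fit_centroids(train), train, pool


# Tiny demo with made-up one-dimensional data.
seed = [((0.0,), "A"), ((10.0,), "B")]
unlabeled = [(1.0,), (9.0,), (5.0,)]
model, train, leftover = self_train(seed, unlabeled)
print(len(train), leftover)
```

In this demo the two points near the seed examples are absorbed into the training set, while the ambiguous midpoint stays in the pool. The confidence threshold is the only guard against the noise propagation that the abstract identifies as a major problem, since any mislabeled example that passes it is treated as ground truth in later iterations.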
This item is protected by original copyright