International Journal of Information and Communication Technology Research (IJICT), Volume 7, Issue 3, pages 63-72
English title
Density-Based K-Nearest Neighbor Active Learning for Improving Farsi-English Statistical Machine Translation System
English abstract
Labeled data are useful resources for applications in many fields, such as image processing and natural language processing. Producing labeled data is costly. One efficient way to reduce the cost of annotation is to manage the sampling process: it is better to query for essential samples than for a group of unnecessary ones. Active learning (AL) attempts to overcome the labeling bottleneck by querying an annotator for the labels of selected unlabeled instances. This technique is applied in Natural Language Processing (NLP), especially in Statistical Machine Translation (SMT), which is the focus of this work. In SMT, parallel corpora are scarce resources, and AL is one way of addressing this problem. It reduces the cost of annotation by requesting translations only for the most informative sentences, those that are essential for system improvement. The contribution of our work is a new AL approach that selects sentences through a soft decision-making process. In this algorithm, in addition to scoring sentences according to their information content, the distribution of the space of unlabeled data is also considered. Each sentence (labeled or unlabeled) is converted to a vector of feature scores. Each incoming sentence is then placed in the feature space and assigned two probabilities: the probability that it is labeled and the probability that it is unlabeled. These probabilities are calculated from the position of the new instance relative to its labeled and unlabeled neighbors. We have applied the proposed model to improve the training corpus of an SMT system, using the Farsi-English language pair for the baseline SMT system. We sample the sentences that can best improve the quality of our SMT system and query for their translations. In this way, the cost of building a parallel corpus is reduced.
Finally, our experiments show that sampling sentences by soft decision making yields significant improvements over the random sentence selection strategy.
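The abstract does not give the formulas behind the two probabilities, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes sentences are already represented as numeric feature vectors, that each probability is the inverse-distance-weighted share of the k nearest neighbors belonging to the labeled or unlabeled pool, and that candidates with the highest unlabeled probability are the ones sent for translation. The names `knn_membership` and `select_queries`, the parameter `k`, and the toy vectors are all hypothetical.

```python
import math


def knn_membership(x, labeled, unlabeled, k=3):
    """Return (p_labeled, p_unlabeled) for the feature vector x.

    Assumption: each probability is the inverse-distance-weighted share
    of the k nearest neighbors that belong to the corresponding pool.
    """
    neighbors = [(math.dist(x, v), "labeled") for v in labeled]
    neighbors += [(math.dist(x, v), "unlabeled") for v in unlabeled]
    if not neighbors:
        raise ValueError("at least one neighbor is required")
    nearest = sorted(neighbors, key=lambda pair: pair[0])[:k]

    eps = 1e-9  # avoids division by zero when x coincides with a neighbor
    weight = {"labeled": 0.0, "unlabeled": 0.0}
    for distance, pool in nearest:
        weight[pool] += 1.0 / (distance + eps)
    total = weight["labeled"] + weight["unlabeled"]
    return weight["labeled"] / total, weight["unlabeled"] / total


def select_queries(candidates, labeled, unlabeled, n_queries, k=3):
    """Pick the n_queries candidates with the highest p_unlabeled.

    Assumption: sentences lying in regions dominated by unlabeled data
    are the ones worth sending to the translator.
    """
    ranked = sorted(
        candidates,
        key=lambda c: knn_membership(c, labeled, unlabeled, k)[1],
        reverse=True,
    )
    return ranked[:n_queries]


# Toy two-dimensional feature vectors (hypothetical values).
labeled = [(0.0, 0.0), (0.1, 0.0)]
unlabeled = [(1.0, 1.0), (1.1, 1.0), (0.9, 1.0)]
print(select_queries([(0.05, 0.0), (1.0, 0.9)], labeled, unlabeled, n_queries=1))
```

Because the two probabilities are soft scores rather than a hard in-or-out decision, they could also be combined with a separate informativeness score before ranking; that combination is left out here because the abstract does not specify it.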
Authors
| Somayeh Bakhshaei
| Reza Safabakhsh
| Shahram Khadivi
URL
http://ijict.itrc.ac.ir/browse.php?a_code=A-10-27-68&slc_lang=fa&sid=1
Language of the published article
fa
Topics of the published article
Information Technology
Type of the published article
Research