Iranian Research Information System




Signal and Data Processing, Volume 15, Issue 4, pages 123-130


Persian title (translated)	Part-of-Speech Tagging of Persian Using a Fuzzy Network Model

Persian abstract (translated)	Part-of-speech tagging is one of the open problems in natural language processing. Its goal is to determine the role of each word in a sentence. These tags also determine the grammatical and syntactic features of the words. This paper proposes a statistical method for Persian part-of-speech tagging. The method reduces the limitations of statistical approaches by introducing a fuzzy network model, so that when only a small amount of training data is available, the fuzzy model estimates more reliable parameters. Normalization is first applied as a preprocessing step. The frequency of each word with respect to its tag is then estimated as a fuzzy function, the fuzzy network model is built, and the degree of each edge in the network is determined using a neural network and a membership function. Finally, once the fuzzy network model has been built for a sentence, the Viterbi algorithm is used to find the most probable path through the network. Experiments on the Bijankhan corpus confirm the effectiveness of the method and show that, when less training data is available, the proposed method outperforms similar methods such as the hidden Markov model.

Persian keywords

English title	Part Of Speech Tagging of Persian Language using Fuzzy Network Model

English abstract	Part-of-speech tagging (POS tagging) is an ongoing research topic in natural language processing (NLP). The process of classifying words into their parts of speech and labeling them accordingly is known as part-of-speech tagging, POS tagging, or simply tagging. Parts of speech are also known as word classes or lexical categories. The purpose of POS tagging is to determine the grammatical category of each word in a sentence; the grammatical and syntactic features of words are then determined from these tags. The performance of existing tagging methods depends on the corpus: the methods work well when the training and test data are drawn from the same corpus, but accuracy drops when little training data is available, especially for probabilistic methods. Words in sentences are often ambiguous. For example, the word 'Mahrami' can be a noun or an adjective. This ambiguity can be resolved by using neighboring words and an appropriate tagging method. Methods in this domain fall into several categories, such as memory-based [2], rule-based [5], statistical [6], and neural network [7] methods. Most of these methods reach an average precision of about 95% [1]. In [13], the TnT probabilistic tagger, with smoothing and variations on the estimation of the trigram likelihood function, reaches 96.7% overall on the Penn Treebank and NEGRA corpora. In [14], a dependency network representation, extensive use of lexical features (such as conditioning on sequences of consecutive words), effective use of priors in log-linear models, and fine-grained modeling of unknown words achieve 97.24% accuracy on the Penn Treebank WSJ. The first work on Farsi used word neighborhoods and the similarity distribution between them; the accuracy of that system is 57.5%.
In [19], an open-source Persian tagger called HunPoS was proposed. This tagger uses the same TnT method, based on a hidden Markov model over trigrams, and reaches 96.9% on the "Bi Jen Khan" (Bijankhan) corpus. In this paper, a statistical method is proposed for Persian POS tagging. The limitations of statistical methods are reduced by introducing a fuzzy network model, so that the model can estimate more reliable parameters from a small set of training data. In this method, normalization is performed as a preprocessing step, and the frequency of each word with respect to its tag is then estimated as a fuzzy function. The fuzzy network model is then formed, and the weight of each edge is determined by a neural network and a membership function. Finally, once the fuzzy network model has been constructed for a sentence, the Viterbi algorithm, which is commonly used with hidden Markov models (HMMs), finds the most probable path in the network. The goal of this paper is to address a weakness of probabilistic methods: when data is scarce, the estimates made by these models are unreliable. Tests on the "Bi Jen Khan" corpus show that the proposed method outperforms similar methods, such as the hidden Markov model, when fewer training examples are available. In the experiments, the data is split several times into training and test sets of increasing size: the initial experiments use a reduced training set, and subsequent experiments increase its size, with each result compared against the HMM algorithm. As shown in Figure 4, the error rate decreases as the training set grows, and vice versa. Three criteria are used in the tests: precision, recall, and F1. Table 4 compares the HMM and fuzzy network implementations and reports the results.
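
The final decoding step described in the abstract, finding the most probable path through a per-sentence network with the Viterbi algorithm, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The tag set, the romanized toy words, the numeric scores, and the `FLOOR` value are all hypothetical. In the paper the edge weights come from a neural network and a fuzzy membership function, whereas here they are hand-written numbers used only to show the path search.

```python
import math

FLOOR = 1e-6  # assumed score for pairs missing from the toy tables


def viterbi(words, tags, start, trans, emit):
    """Return the highest-scoring tag sequence for `words` (max-product, in log space).

    start[tag], trans[prev_tag][tag] and emit[tag][word] are scores in (0, 1].
    """
    if not words:
        return []

    def log_score(table, outer, inner):
        return math.log(table.get(outer, {}).get(inner, FLOOR))

    # best[t] = best log-score of any path that ends in tag t at the current word
    best = {
        t: math.log(start.get(t, FLOOR)) + log_score(emit, t, words[0])
        for t in tags
    }
    backpointers = []  # one dict per word after the first: tag -> best previous tag

    for word in words[1:]:
        new_best, pointers = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: best[p] + log_score(trans, p, t))
            new_best[t] = best[prev] + log_score(trans, prev, t) + log_score(emit, t, word)
            pointers[t] = prev
        best = new_best
        backpointers.append(pointers)

    # Follow the back-pointers from the best final tag to recover the path.
    path = [max(tags, key=lambda t: best[t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical toy model: three tags and three romanized words.
TAGS = ["N", "ADJ", "V"]
START = {"N": 0.6, "ADJ": 0.3, "V": 0.1}
TRANS = {
    "N": {"N": 0.2, "ADJ": 0.5, "V": 0.3},
    "ADJ": {"N": 0.3, "ADJ": 0.1, "V": 0.6},
    "V": {"N": 0.5, "ADJ": 0.3, "V": 0.2},
}
EMIT = {
    "N": {"ketab": 0.8, "khoob": 0.1},
    "ADJ": {"khoob": 0.8, "ketab": 0.05},
    "V": {"ast": 0.9},
}

if __name__ == "__main__":
    print(viterbi(["ketab", "khoob", "ast"], TAGS, START, TRANS, EMIT))
```

Working in log space avoids numeric underflow on long sentences. The floor score for unseen pairs is a crude stand-in for the smoothing or fuzzy estimation that a real tagger would need when training data is scarce.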

English keywords

Authors	Mohammad Badpeima (MUT, Malek-Ashtar University) \| Fatemeh Hourali (Esfarayen Higher Education Complex) \| Maryam Hourali (MUT, Malek-Ashtar University)

URL	http://jsdp.rcisp.ac.ir/browse.php?a_code=A-10-941-1&slc_lang=fa&sid=1
Article code (DOI)
Language of the published article	fa
Topics of the published article	Text processing articles
Type of the published article	Research
