An Arabic Language Resource for Computational Morphology Based on the Semitic Model

Author : Alexis Neme
Release : 2020
OCLC : 1225568550

Book Synopsis: An Arabic Language Resource for Computational Morphology Based on the Semitic Model, by Alexis Neme

Written by Alexis Neme and released in 2020.
We developed an original approach to traditional Arabic morphology, involving new concepts in Semitic lexicology, morphology, and grammar for standard written Arabic. This new methodology for handling rich and complex Semitic languages follows good practices in finite-state technologies (FSA/FST) and uses Unitex, a lexicon-based corpus processing suite. For verbs (Neme, 2011), I proposed an inflectional taxonomy that improves the readability of the lexicon and makes it easier for Arabic speakers and linguists to encode, correct, and update it. Traditional grammar defines verbal inflectional classes through verbal pattern classes and root classes. In our taxonomy, the traditional pattern classes are reused, and the root classes are redefined as a simpler system. The verb lexicon covered more than 99% of an evaluation corpus. For nouns and adjectives (Neme, 2013), we went one step further in adapting traditional morphology. First, whereas this tradition is based on derivational rules, we base our description on inflectional ones. Next, we keep the concepts of root and pattern, which are the backbone of the traditional Semitic model. Our breakthrough, however, lies in reversing the traditional root-and-pattern Semitic model into a pattern-and-root model, which keeps the set of pattern classes and root subclasses small and orderly. I developed a taxonomy of broken plurals with 160 inflectional classes, which makes encoding broken plurals ten times simpler. Since then, I have developed comprehensive resources for Arabic, described in Neme and Paumier (2019).
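The pattern-and-root idea can be illustrated with a toy interdigitation routine: each lemma is stored as a pattern plus a root and points to an inflectional class, and the class supplies the plural pattern into which the same root is poured. This is a minimal sketch, assuming a Latin transliteration with doubled letters for long vowels and `C` marking a root-consonant slot; the function names, class ids, and the two-entry class table are hypothetical and are not taken from the resource, which implements its inflectional classes as FSTs.

```python
from dataclasses import dataclass


def interdigitate(pattern: str, root: str) -> str:
    """Fill each 'C' slot of `pattern` with the next radical of `root`."""
    slots = pattern.count("C")
    if slots != len(root):
        raise ValueError(f"pattern {pattern!r} needs {slots} radicals, got {len(root)}")
    radicals = iter(root)
    return "".join(next(radicals) if ch == "C" else ch for ch in pattern)


# Hypothetical broken-plural classes: each class id names the plural pattern
# that the root is poured into. The resource describes 160 such classes;
# these two are illustrative only.
BROKEN_PLURAL_CLASSES = {
    "BP_CuCuC": "CuCuC",
    "BP_aCCaaC": "'aCCaaC",
}


@dataclass(frozen=True)
class Lemma:
    pattern: str       # singular pattern, listed first (pattern-and-root order)
    root: str          # radicals in transliteration, e.g. "ktb"
    plural_class: str  # hypothetical inflectional-class id

    def singular(self) -> str:
        return interdigitate(self.pattern, self.root)

    def broken_plural(self) -> str:
        return interdigitate(BROKEN_PLURAL_CLASSES[self.plural_class], self.root)


lexicon = [
    Lemma("CiCaaC", "ktb", "BP_CuCuC"),   # kitaab 'book' -> kutub
    Lemma("CaCaC", "qlm", "BP_aCCaaC"),   # qalam 'pen'   -> 'aqlaam
]
for lemma in lexicon:
    print(lemma.singular(), "->", lemma.broken_plural())
```

Running it prints `kitaab -> kutub` and `qalam -> 'aqlaam`. The toy handles only sound triliteral roots; weak roots are where the root subclasses mentioned above would come in.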
To account for all aspects of the rich morphology of Arabic, I completed our taxonomy with suffixal inflectional classes for regular plurals, adverbs, and other parts of speech (POS) so that it covers the whole lexicon. In all, I identified around 1,000 Semitic and suffixal inflectional classes, implemented as concatenative and non-concatenative FSTs. I created 76,000 fully vowelized lemmas from scratch, each associated with an inflectional class. Inflecting these lemmas with the 1,000 FSTs produces a fully inflected lexicon of more than 6 million forms. I extended this resource with agglutination grammars that identify words of up to five segments agglutinated around a core inflected verb, noun, adjective, or particle. These grammars extend recognition to more than 500 million valid delimited word forms, partially or fully vowelized. As a flat UTF-16 file, the 6 million forms occupy 340 MB; the file is compressed to 11 MB before being loaded into memory for fast retrieval. Generating, compressing, and minimizing the full-form lexicon takes less than one minute on an ordinary Unix laptop. Lexical coverage exceeds 99%. The tagger processes 5,000 words per second, and more than 200,000 words per second when the resources are preloaded and resident in RAM. The accuracy and speed of our tools result from our systematic linguistic approach and from our choice to adopt best practices in mathematical and computational methods. Lookup is fast because we compress the full-form dictionary into a minimal acyclic deterministic finite automaton (Revuz, 1992), and because the dictionary contains only constant strings, with no embedded rules.
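To make the compression step concrete, here is a small pure-Python sketch of a minimal acyclic DFA over a word list: build a trie, then merge equivalent states bottom-up using a register keyed on each state's finality and outgoing edges. This is one standard way to reach the minimal acyclic automaton; it is not claimed to be Revuz's exact algorithm or the Unitex implementation, the class and function names are hypothetical, and unlike a real full-form dictionary it stores membership only, not grammatical tags.

```python
from __future__ import annotations


class State:
    """A DFA state: a finality flag and labelled outgoing edges."""
    __slots__ = ("final", "edges")

    def __init__(self) -> None:
        self.final = False
        self.edges: dict[str, State] = {}


def build_trie(words: list[str]) -> State:
    """Build a prefix tree: an acyclic DFA, but not yet minimal."""
    root = State()
    for word in words:
        node = root
        for ch in word:
            node = node.edges.setdefault(ch, State())
        node.final = True
    return root


def minimize(root: State) -> State:
    """Merge equivalent states bottom-up (post-order). In an acyclic DFA two
    states are equivalent when they agree on finality and send each label to
    the same already-merged state, so a register keyed on that suffices."""
    register: dict[tuple, State] = {}

    def canon(state: State) -> State:
        for label, child in list(state.edges.items()):
            state.edges[label] = canon(child)
        key = (state.final,
               tuple(sorted((label, id(child)) for label, child in state.edges.items())))
        return register.setdefault(key, state)

    return canon(root)


def count_states(root: State) -> int:
    """Count distinct states reachable from `root`."""
    seen: set[int] = set()
    stack = [root]
    while stack:
        state = stack.pop()
        if id(state) not in seen:
            seen.add(id(state))
            stack.extend(state.edges.values())
    return len(seen)


def accepts(root: State, word: str) -> bool:
    """Constant-string lookup: follow one edge per character."""
    node = root
    for ch in word:
        node = node.edges.get(ch)
        if node is None:
            return False
    return node.final


words = ["kataba", "katabat", "katabuu", "darasa", "darasat", "darasuu"]
trie_states = count_states(build_trie(words))
dfa = minimize(build_trie(words))
print(trie_states, "->", count_states(dfa))  # 19 -> 13
assert all(accepts(dfa, w) for w in words)
assert not accepts(dfa, "katab")
```

Suffix sharing drives the saving: the endings shared by `katab-` and `daras-` collapse into one set of states, which is why an inflected lexicon with regular endings compresses so well.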
The breakthrough of our linguistic approach lies principally in reversing the traditional root-and-pattern Semitic model into a pattern-and-root model. Our computational approach, in turn, follows good practices in finite-state technologies (FSA/FST): all full forms are computed in advance, both for accurate identification and to get the most out of FSA compression for fast, efficient lookups.
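Because every inflected form is precomputed, recognizing an agglutinated word can be reduced to peeling off clitics and looking the remaining core up as a constant string, in the spirit of the agglutination grammars described above. This is a minimal sketch, assuming a toy transliterated lexicon and clitic lists, and assuming a split of at most three proclitics and one enclitic around the core (five segments in all); it ignores the orthographic adjustments real grammars must handle (for example, how li- combines with the article), and all names and data are hypothetical.

```python
# Hypothetical toy data in transliteration; not the resource's inventory.
FULL_FORMS = {"kitaabu", "kitaabi", "kataba", "qalamu"}  # precomputed inflected forms
PROCLITICS = ("wa", "fa", "bi", "li", "al")
ENCLITICS = ("hu", "haa", "hum")
MAX_PROCLITICS = 3  # 3 proclitics + core + 1 enclitic = at most 5 segments


def segment(word: str) -> list[list[str]]:
    """Return every split of `word` into proclitics + core + optional enclitic
    whose core is an exact (constant-string) match in FULL_FORMS."""
    results: list[list[str]] = []

    def peel(rest: str, prefix: list[str]) -> None:
        # Try the remainder as a core, optionally followed by one enclitic
        # ("" stands for "no enclitic"; every string ends with "").
        for enc in ("",) + ENCLITICS:
            if not rest.endswith(enc):
                continue
            core = rest[: len(rest) - len(enc)]
            if core in FULL_FORMS:
                results.append(prefix + [core] + ([enc] if enc else []))
        # Also strip one more proclitic and recurse, so every reading is kept.
        if len(prefix) < MAX_PROCLITICS:
            for pro in PROCLITICS:
                if rest.startswith(pro) and len(rest) > len(pro):
                    peel(rest[len(pro):], prefix + [pro])

    peel(word, [])
    return results


print(segment("wakitaabuhu"))  # [['wa', 'kitaabu', 'hu']]
```

Returning every segmentation rather than the first one keeps ambiguity visible and leaves disambiguation to a later stage.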


An Arabic Language Resource for Computational Morphology Based on the Semitic Model Related Books

Arabic Computational Morphology
Language: en
Pages: 306
Authors: Abdelhadi Soudi
Categories: Language Arts & Disciplines
Type: BOOK - Published: 2007-10-01 - Publisher: Springer Science & Business Media

This is the first comprehensive overview of computational approaches to Arabic morphology. The subtitle aims to reflect that widely different computational …

Introduction to Arabic Natural Language Processing
Language: en
Pages: 170
Authors: Nizar Y. Habash
Categories: Computers
Type: BOOK - Published: 2022-06-01 - Publisher: Springer Nature

This book provides system developers and researchers in natural language processing and computational linguistics with the necessary background information for …

Computational Nonlinear Morphology
Language: en
Pages: 210
Authors: George Anton Kiraz
Categories: Computers
Type: BOOK - Published: 2001-12-17 - Publisher: Cambridge University Press

By the late 1970s phonologists, and later morphologists, had departed from a linear approach for describing morphophonological operations to a nonlinear one. …

Language Processing and Acquisition in Languages of Semitic, Root-Based, Morphology
Language: en
Pages: 400
Authors: Joseph Shimron
Categories: Language Arts & Disciplines
Type: BOOK - Published: 2003-04-28 - Publisher: John Benjamins Publishing

This book puts together contributions of linguists and psycholinguists whose main interest here is the representation of Semitic words in the mental lexicon of …