Sanskrit Compound Processor

Kumar, Anil; Mittal, Vipul; Kulkarni, Amba

doi:10.1007/978-3-642-17528-2_5

Cited by 21 publications

(8 citation statements)

References 9 publications


“…Mittal (2010) used OpenFST and augmented it with sandhi rules and finally validated the segments using optimality theory. Kumar et al (2010) developed a segmenter exclusively for Sanskrit compounds using probabilistic methods and optimality theory. Natarajan and Charniak (2011) proposed a statistical sandhi splitter using a Bayesian approach handling sandhi formations.…”

Section: In Sanskrit (mentioning)

confidence: 99%

Normalized Dataset for Sanskrit Word Segmentation and Morphological Parsing

Krishnan

Kulkarni

Huet

2023

Preprint


Sanskrit processing has seen a surge in the use of data-driven approaches over the past decade. Various tasks such as segmentation, morphological parsing, and dependency analysis have been tackled through the development of state-of-the-art models despite working with relatively limited datasets compared to other languages. However, a significant challenge lies in the availability of annotated datasets that are lexically, morphologically, syntactically, and semantically tagged. While syntactic and semantic tags are preferable for later stages of processing such as sentential parsing and disambiguation, lexical and morphological tags are crucial for low-level tasks of word segmentation and morphological parsing. The Digital Corpus of Sanskrit (DCS) is one notable effort that hosts over 650,000 lexically and morphologically tagged sentences from around 250 texts, but it also has limitations at different levels of a sentence, such as chunk, segment, stem, and morphological analysis. One way to overcome these limitations is to look at alternatives such as the Sanskrit Heritage Segmenter (SH) and Samsaadhanii tools, which provide information complementing DCS's data. This work focuses on enriching the DCS dataset by incorporating analyses from SH, thereby creating a dataset that is rich in lexical and morphological information from both DCS and SH. Furthermore, this work also discusses the impact of such datasets on the performance of existing segmenters, specifically the Sanskrit Heritage Segmenter.


“…Although also ruled by psycholinguistic conditions, the syntagmatic limits to compounding seem to be more flexible than the limits on the number of affixes in affixation (cf. Booij (2016) on Dutch composition, Barz (2016) on German composition, Kumar et al (2010) on Sanskrit composition and Bauer et al (2013:507-508) and Bauer (1983: 69) for English derivational affixation).…”

Section: Some Remarks On Our Objective (mentioning)

confidence: 99%

Limits on the extension of affixal combination: structural restrictions and processing conditions

Rodrigues

2017

suvlin


The study of the mental lexicon has been fostered by the analysis of the way complex words are mentally represented and processed. This paper concerns the syntagmatic extension of multiple affixation; specifically, the processing of complex words that contain four suffixes that operate in word-formation patterns of Portuguese. Although the individual addition of suffixes obeys structural constraints, the multiple combination results in complex words with low frequency and low expectedness by the speaker, which contribute to the lack of semantic transparency and of affixal salience of the combination. Our study demonstrates a relation between these factors and the experience of the speaker with the affixal combination, which determines the pattern character of the combination. We suggest that a suffix exerts the prediction of other suffixes as long as the combination is expected. Non-frequent heterocategorial complex words with a combination of four suffixes are contrasted with non-frequent words containing pleonastic affixation. In the latter type of words, the redundancy of semantic structures increases the semantic transparency of the word, which suggests a prediction effect operating on the semantic level of the affixal combination. Processing of complex words is dependent on the level of expectedness of the speaker towards the affix combination, which constrains the level of word acceptance by speakers.


“…But mere splitting the compound may not give complete meaning all the time. To understand the meaning of a compound, first identify the meaning of components and then the relationship between them [11]. For instance, a compound 'rAmunitOkapirAju' is formed by two words 'rAmunitO + kapirAju'.…”

Section: Process Of Splitting Words (mentioning)

confidence: 99%

Key Issues in Vowel Based Splitting of Telugu Bigrams

Rao

Prasad

2014

Special Issue


Abstract: Splitting compound Telugu words into their components or root words is one of the important, tedious, and yet inaccurate tasks of Natural Language Processing (NLP). Except in a few special cases, at least one vowel is necessarily involved in Telugu conjunctions. As a result, vowels are often repeated as they are or are converted into other vowels or consonants. This paper describes issues involved in vowel-based splitting of a Telugu bigram into proper root words using Telugu grammar conjunction ('sandhi') rules for MT.
