Objective: To develop a system that accepts a sentence spoken in two to four languages and converts it into text in a target language, termed the Cross Language Speech Identification and Text Translation (CLSITT) System.
Methods: A combinatorial model consisting of a Hidden Markov Model, Artificial Neural Networks, Deep Neural Networks and a Gaussian Mixture Model is utilized for direct and indirect speech mapping. A training dataset of a thousand phonemes for each of the Hindi, Telugu, English and Kannada languages was built, initially for the bank and hospital domains; later, the grammatical phonemes of each language were added, and wave files of cross-lingual spoken sentences were recorded. Building this dataset from scratch took six months, as no cross-lingual vocal dataset was available. The Hindi-language dataset Shabdanjali was also consulted. The basic parameters considered in creating the structured dataset are loudness, pause, pitch, tone, noise cancellation, sampling frequency, threshold, etc.
Findings: Comparative analyses of the various techniques, target languages and features are tabulated. The research idea emerged from a comparative analysis of monolingual systems, which revealed a gap in cross-lingual speech-to-text translation. The architecture can be extended in future to other regional languages of India.
Novelty: A new benchmark cross-language dataset was created. This work presents the CLSITT tool for transforming public speeches spoken in multiple languages into a selected target language; the tool is useful for regional news editors, rural and agricultural activities, medical applications, defence, and so on.
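The abstract describes a pipeline in which the language of each spoken segment is first identified and the segment is then mapped to the target language. As a purely illustrative sketch (not the authors' implementation, which uses HMM/GMM acoustic models over phonemes), the word-level flow can be imagined as lexicon-based language identification followed by dictionary lookup; all lexicon entries below are hypothetical toy data:

```python
# Illustrative sketch only: identify each word's source language from a
# (hypothetical) lexicon, then map it to a target language (English here).
# A real system would classify phoneme sequences with HMM/GMM models.

LEXICON = {
    "hindi":   {"paani": "water", "ghar": "house"},
    "telugu":  {"illu": "house", "neeru": "water"},
    "english": {"water": "water", "house": "house"},
}

def identify_language(word):
    """Return the first language whose lexicon contains the word, else None."""
    for lang, vocab in LEXICON.items():
        if word in vocab:
            return lang
    return None

def translate_to_english(sentence):
    """Map each word of a mixed-language sentence to the target language."""
    out = []
    for word in sentence.lower().split():
        lang = identify_language(word)
        # Unknown words are passed through unchanged.
        out.append(LEXICON[lang][word] if lang else word)
    return " ".join(out)

print(translate_to_english("paani illu"))  # -> "water house"
```

In the actual CLSITT system, the per-word language decision would come from acoustic-model likelihoods rather than a lexicon lookup, but the overall identify-then-translate structure is the same.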