Learning finite-state models for machine translation

Casacuberta, Francisco; Vidal, Enrique

doi:10.1007/s10994-006-9612-9

Cited by 19 publications

(18 citation statements)

References 47 publications

“…Only a few techniques to learn SFSTs can be found in the literature (Bangalore and Riccardi, 2002;Knight and Al-Onaizan, 1998;Casacuberta and Vidal, 2007). However, a relation between regular translations generated by SFSTs and regular languages over some alphabet of string pairs was established through morphisms (Berstel, 1979).…”

Section: Model Training With GREAT (mentioning)

confidence: 99%

“…This property was used to propose a method of inference of SFSTs based on the inference of stochastic finite-state automata (SFSAs) (Casacuberta and Vidal, 2004). This method, which has been widely used in SMT applications (Casacuberta and Vidal, 2007;Pérez et al, 2008;González and Casacuberta, 2009), is known as GIATI and is the training framework of GREAT.…”

Section: Model Training With GREAT (mentioning)

confidence: 99%

GREAT: open source software for statistical machine translation

González

Casacuberta

2011

Machine Translation

Self Cite

“…Finite-state transducers are versatile models that count on thoroughly studied efficient implementations for training (Casacuberta and Vidal, 2007) and decoding (Mehryar Mohri and Riley, 2003). Definition and layout for probabilistic finite-state machines (automata and transducers) were comprehensively described in (Vidal et al, 2005a,b), and so we are going to follow that formalism.…”

Section: Stochastic Finite-state Transducers (mentioning)

confidence: 99%

Joining linguistic and statistical methods for Spanish-to-Basque speech translation

Pérez

Torres

Casacuberta

2008

Speech Communication

Self Cite

To cite this version: Alicia Pérez, M. Inés Torres, Francisco Casacuberta. Joining linguistic and statistical methods for Spanish-to-Basque speech translation. Speech Communication, Elsevier: North-Holland, 2008, 50 (11-12)

Abstract: The goal of this work is to develop a text and speech translation system from Spanish to Basque. The two languages differ markedly in both morphology and syntax, which poses attractive challenges for machine translation. Since both languages share official status in the Basque Country, the underlying motivation is practical as well as academic.

Finite-state transducers were adopted as the basic translation models. The main contribution of this work is the study of several techniques to improve probabilistic finite-state transducers by means of additional linguistic knowledge. Two methods that combine linguistics and statistics were proposed. The first performed a morphological analysis in an attempt to benefit from atomic meaningful units when rendering the meaning from one language into the other. The second clustered words according to their syntactic role and used the resulting phrases as the translation unit. From the latter approach, phrase-based finite-state transducers arose as a natural extension of classical ones.

The models were assessed on a restricted-domain task, very repetitive and with a small vocabulary. Experimental results showed that both the morphological and the syntactic approaches outperformed the baseline under different test sets and architectures for speech translation.

“…SFSTs also permit a simple integration with other information sources, which makes it easy to apply SFSTs to more difficult tasks such as speech translation [Casacuberta et al, 2004]. SFSTs and the corresponding training and search techniques have been studied by several authors, in many cases explicitly motivated by MT applications [E. Vidal and Segarra, 1989, Oncina et al, 1993, Knight and Al-Onaizan, 1998, Mäkinen, 1999, Amengual et al, 2000, Alshawi et al, 2000a, Casacuberta, 2000a, Vilar, 2000, Vogel and Ney, 2000, Picó and Casacuberta, 2001, Bangalore and Riccardi, 2003, Kumar and Byrne, 2003, Casacuberta and Vidal, 2004, Tsukada and Nagata, 2004, Casacuberta et al, 2005, Kumar et al, 2006, Casacuberta and Vidal, 2007, Mariño et al, 2006]. There are other statistical models for MT that are based on alignments between words (statistical word-alignment models) or between word sequences (phrase-based models or alignment templates) [Och and Ney, 2004, Zens, 2008].…”

Section: Introduction (mentioning)

confidence: 99%

“…The GIATI technique [Casacuberta and Vidal, 2007] has been applied to machine translation [Casacuberta and Vidal, 2004], speech translation [Casacuberta et al, 2004] and computer-assisted translation [Barrachina et al]. The results obtained using GIATI suggest that, among all the SFST learning techniques tested, GIATI is the only one that can cope with translation tasks under real conditions of vocabulary sizes and amounts of training data available.…”

Section: Introduction (mentioning)

confidence: 99%

Statistical approaches for natural language modelling and monotone statistical machine translation

Ferrer

Acknowledgements

These lines are the hardest of the thesis to write, even though they contain no formulas or mathematical formalisms. If I had to list everyone who has affected the outcome of this thesis positively (or negatively), I would probably have to annotate the whole thesis with comments in the margin. I will therefore name only those whose contributions have been constant and/or important, without meaning to slight the others who have had an influence.

I will start with those closest at hand, thanking the inhabitants of laboratory 101L of the department: not only the current members but also the founders of "Teachings of the final convergence". In particular, I thank Ramón Granell for our discussions; Adrián Giménez for playing "devil" in thinking through some of my problems; and Ricardo Sánchez for being so "colloquial and friendly".

This thesis would not exist without the financial support of the "Generalitat Valenciana", formalized in the FPI grant with reference CTBPRA/2005/004, awarded under the auspices of the "Conselleria d'Empresa, Universitat i Ciència". I am likewise indebted to Alfons Juan Císcar, both for agreeing to be my thesis advisor and for his advice and his tenacity in keeping me from dispersing my efforts across the many interesting topics that research offers. I also owe thanks to Francisco Casacuberta Nolla for his reflections, corrections and guidance. Beyond my thesis advisors, I would like to thank Prof. Hermann Ney for hosting me in Aachen, which let me "enjoy" both the climate of that German region and his fascinating reflections, which allowed me to refine and extend my knowledge.

Special mention goes to all my fellow sufferers at the PRHLT and the ITI in general, and to some in particular: Jorge Civera, for putting up with my reflections on matters not limited to research and for the template of this thesis, which saved me many headaches; Daniel Ortiz and Ismael García Varea, for our collaborations, which are part of this thesis; and all those who regularly attended the minimalist and non-minimalist dinners.

To borrow a quote from the celebrated Adrián Giménez, "Every thesis hides a personal drama", and this one could be no exception. I am therefore indebted to everyone who has allowed me to enjoy life during this arduous and interminable task. I feel bound to begin with the strong friendship I forged with my fellow students Alex, Gabi and Jesús, a friendship that endures to this day.

From my retirement as a former rower, I can only thank the whole rowing team of the Universidad Politécnica de Valencia, both for the miles and the sweat shed at sea and for the beers drunk and spilled on the table of some bar. In particular, all those rowers who have accompanied me from the beginning, not only of this thesis but of my studies in computer science, when I decided to take up such a rewarding sport...
