Covilhã, November 2010
I would like to dedicate this thesis to my loving wife, Adelina Amorim, and my three precious children: David, Tiago, and André.
Acknowledgements
A work like this is almost impossible for a single individual to achieve alone. There are always several very important people involved, more or less directly and at various levels: scientific, personal, or even sentimental and spiritual. In these few lines I would like to express my deepest gratitude to those who were decisive for the conclusion of this long and hard work.
I would like to start by acknowledging my supervisor, Prof. Doctor Gaël Harry Dias, for his constant and relentless support throughout this journey. He was indeed an ever-present supporter, guiding me many times and even motivating me toward new, unexplored scientific and technological territories. My co-supervisor, Professor Pavel Bernard Brazdil, was equally important for the development and conclusion of this work. With his long experience as a leading scientist, every piece of advice received from him was carefully considered and incorporated here. I also want to thank all the teachers I had, spe-
Abstract
The field of Automatic Sentence Reduction has been an active research topic, with several relevant approaches recently proposed. However, in our view many milestones still need to be reached in order to approach human-like quality in sentence simplification.
In this work, we propose a new framework that processes large collections of web news stories and learns sentence reduction rules in a fully automated and unsupervised way. This is our main contribution. Our system is conceptually composed of several modules. In the first module, the system automatically extracts paraphrases from on-line news stories, using new lexically based functions that we have proposed. In the second module, the extracted paraphrases are transformed into aligned paraphrases: the words of the two sentences in each paraphrase pair are aligned through DNA-like sequence alignment algorithms that have been adapted for aligning sequences of words. These alignments are then explored, and specific text structures called bubbles are selected.
Afterwards, these structures are transformed into learning instances and used in the last module, which exploits Inductive Logic Programming techniques to learn the rules for sentence reduction. Results show that this is a good approach for learning automatic sentence reduction, although some pertinent issues still need future investigation.
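To make the alignment step described above more concrete, the following is a minimal sketch of global sequence alignment (in the style of Needleman-Wunsch) applied to words instead of nucleotides. The function name `align_words`, the scoring values, and the example sentences are assumptions chosen for illustration only; they are not taken from this thesis, whose adapted alignment algorithms may differ.

```python
def align_words(s1, s2, match=1, mismatch=-1, gap=-1):
    """Globally align two token lists (hypothetical helper, illustrative scores).

    Returns a list of (word1, word2) pairs; None marks a gap on that side.
    """
    n, m = len(s1), len(s2)

    def sub(i, j):
        # Substitution score for aligning s1[i-1] with s2[j-1].
        return match if s1[i - 1] == s2[j - 1] else mismatch

    # score[i][j] = best score for aligning s1[:i] with s2[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two words
                score[i - 1][j] + gap,            # gap in s2
                score[i][j - 1] + gap,            # gap in s1
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            pairs.append((s1[i - 1], s2[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((s1[i - 1], None))
            i -= 1
        else:
            pairs.append((None, s2[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


# Hypothetical paraphrase pair, not drawn from the thesis corpus.
short = "the president spoke yesterday".split()
long_ = "the president of france spoke yesterday".split()
alignment = align_words(short, long_)
```

In this hypothetical pair, the words "of" and "france" are aligned against gaps, while the remaining words are aligned one to one. Under these assumed scores, the gapped region marks the words that appear in only one of the two sentences.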
Keywords
Sentence reduction, sentence compression, sentence simplification, paraphrase extraction, paraphrase alignment, automatic text summarization, natural language processing, inductive logic programming, machine learning, artificial intelligence.
Palavras-chave
Redução de frases, compressão de frases, simplificação de frases, extracção de paráfrases, alinhamento de paráfrases, sumarização automática de texto, processamento da linguagem natural, programaç...