Workshop

Final meeting of the PARLI project: results achieved and challenges for the future

12 September 2012
Aula Magna del Rettorato, Università di Torino

The final workshop of the PARLI project will be held in Turin on 12 September 2012 in the Aula Magna del Rettorato of the Università degli Studi di Torino (Via Verdi 8). The workshop is devoted to presenting the activities carried out during the project and to discussing future work on NLP for the Italian language.

Workshop program:

09:30 Opening session
Leonardo Lesmo, Università degli Studi di Torino:
The PARLI project.
10:00
Rodolfo Delmonte, Università Ca' Foscari di Venezia:
Dependency Treebank Annotation and Null Elements: an experiment with VIT.
State-of-the-art parsers are currently trained on versions of the Penn Treebank converted into dependency representations, which however do not include null elements. This is done to facilitate structural learning and to prevent the probabilistic engine from postulating deprecated null elements everywhere (see R. Gaizauskas, 1995). As a result, however, the semantics of the representation used and produced at runtime is inconsistent, which dramatically reduces its usefulness in real-life applications such as Information Extraction, Q/A and other semantically driven fields, because it hampers the mapping to a complete logical form. What systems have come up with are “Quasi”-logical forms, or partial logical forms, mapped directly from the surface representation in dependency structure. We show the most common problems arising from the conversion and then describe an algorithm that we have implemented to apply to our converted Italian Treebank, and that can be used on any CoNLL-style treebank or representation to produce an “almost complete”, semantically consistent dependency treebank.
10:30
Maria Simi, Università di Pisa,
Cristina Bosco, Università degli Studi di Torino,
Simonetta Montemagni, Istituto Linguistica Computazionale di Pisa:
Towards harmonization and merging of Italian Dependency Treebanks.
The talk describes a methodology for the construction of a Merged Italian Dependency Treebank (MIDT) starting from the already existing resources TUT and ISST-TANL. In particular, the effort has been oriented to a detailed comparative analysis of the structures annotated in TUT and ISST-TANL and to the harmonization of the annotation schemes of these resources. The issues raised during the comparison of the annotation schemes underlying the two treebanks are discussed with a particular emphasis on the definition of a set of linguistic categories to be used as a bridge between the specific schemes. As a result of this study we have implemented conversion scripts from TUT and ISST-TANL to MIDT, obtained a preliminary version of a shared resource in MIDT format, and measured the performance of the DeSR statistical parser trained on the new MIDT resource.
11:00
Manuela Sanguinetti, Università degli Studi di Torino,
Cristina Bosco, Università degli Studi di Torino,
Leonardo Lesmo, Università degli Studi di Torino:
ParTUT and translation shift study.
Parallel corpora, and parallel treebanks in particular, are currently considered among the crucial resources for a variety of NLP tasks, e.g. machine translation and cross-lingual information extraction, and for research in the fields of translation studies and contrastive linguistics. In this talk we present ParTUT, an ongoing project for the development of a parallel treebank for Italian, English and French annotated in the pure dependency format of the Turin University Treebank. The talk will include a brief discussion of translational divergences and their implications for the development of a tool for aligning parallel parse trees that, benefiting from the linguistic information provided, could properly deal with such divergences. As a final remark, we will discuss whether and to what extent the specific features of the TUT representation format may affect the design and implementation of such a format-dependent alignment system.
11:30
Cristina Bosco, Università degli Studi di Torino,
Anna Corazza, Università Federico II di Napoli,
Anita Alicante, Università Federico II di Napoli,
Alberto Lavelli, Fondazione Bruno Kessler di Trento:
Evaluation methodologies and Italian word order.
The aim of this talk is to describe the methodology applied in the PARLI project for the evaluation task. The contribution focuses in particular on the debate on the issues raised by Morphologically Rich Languages, and more precisely on the investigation, in a cross-paradigm perspective, of the influence of constituent order on the data-driven parsing of one such language (i.e. Italian). It therefore presents new evidence from experiments on Italian, a language characterized by rich verbal inflection, which leads to widespread pro-drop and to a relatively free word order. The experiments are performed using state-of-the-art data-driven parsers (i.e. MaltParser and Berkeley parser) and are based on an Italian treebank available in formats that vary along two dimensions, i.e. the paradigm of representation (dependency vs. constituency) and the level of detail of linguistic information. The aim of this work, however, goes beyond the results obtained; it seeks instead to contribute to the debate by exploring new methodological perspectives in the evaluation field.
12:00
Simona Colombo, Università degli Studi di Torino,
Elisa Corino, Università degli Studi di Torino:
CMC and Corpora: using the web to study language.
The research group of the Università di Torino, under the supervision of Professor Marello and Manuel Barbera, has in recent years created a set of different corpora for studying different aspects of the language. We will survey the corpora developed, illustrating their user interface and their possible applications in the study of different aspects of the language. The resources analysed will include NUNC (www.corpora.unito.it), multilingual corpora, both specialised and general, built by collecting newsgroup material; VALERE (http://www.progettovalere.org), for the study of the different registers of Italian; and RIDIRE (http://lablita.dit.unifi.it/projects/RIDIRE), a web corpus divided into semantic domains, created as a resource for the study of Italian as an L2.
Bibliography
(1) Barbera, M., Corino, E. & Onesti, C. 2008. Corpora e linguistica in rete. Perugia: Guerra Edizioni.
(2) Baroni, M. & Bernardini, S. (eds.) 2006. Wacky! Working papers on the Web as Corpus. Bologna: GEDIT. http://wackybook.sslmit.unibo.it/
12:30 Lunch break
14:00
Giuseppe Attardi, Università di Pisa:
Syntactic Dependencies: learning, adapting, annotating
Dependency annotations represent the syntactic structure of sentences in a way that closely reflects the underlying semantic relations. Dependency trees can be exploited in many text analysis tasks, including entity recognition, sentiment analysis, textual entailment, text classification, relation extraction and machine translation. High parsing accuracy can be achieved by training a statistical parser on an annotated corpus. However, accuracy decreases when the documents to be analyzed come from a different domain than the original training corpus. Therefore the training resources must be extended to cover as large a sample of the language as possible. We explored several ways to achieve this: self-training, active learning and crowdsourcing. We report on our experiments with these techniques. Active learning proved quite effective in several tasks and competitions, like Evalita and SPLeT. In order to exploit crowdsourcing, we built a "game with a purpose", called Phratris, that engages users in composing a dependency tree in a fashion similar to the popular game of Tetris. Finally, we describe how to represent dependencies in an enriched search index that allows very fast search over documents based on the dependency relations present in sentences. Such an index has been built for the Italian and English Wikipedia, allowing for semantic search.
14:30
Alessandro Moschitti, Università di Trento:
Fast Prototyping of Natural Language Systems using Kernel Methods.
Building NLP resources is usually rather expensive in terms of time and human labor. On the one hand, this has led to the definition of methods for cheaply obtaining the desired annotation, e.g., the Phrase Detectives game to gather coreference resolution data in Italian and English. On the other hand, machine learning methods that can limit the use of training data have been studied, especially in the case of highly costly syntactic and semantic annotation. In this talk, we describe both approaches by focusing on the use of syntactic and semantic kernels for the design of Natural Language Processing applications. These kernels, used in Support Vector Machines, enable the design of core NLP systems such as automatic Question Classification, Semantic Role Labeling and Verb Class categorization. This research has been partially developed within the PARLI project in cooperation with the University of Rome Tor Vergata and has led to the design of accurate Italian SRL systems.
15:00
Roberto Basili, Università di Roma Tor Vergata,
Danilo Croce, Università di Roma Tor Vergata:
Semantic Role Labeling for Italian.
In the talk we will present the main aspects of the semantic role labeling (SRL) model for Italian texts developed during the PARLI project. The corresponding system, based on Charles Fillmore's frame semantics paradigm, integrates structured learning methodologies and relies heavily on a semantic tree kernel formulated by the Tor Vergata group in cooperation with the University of Trento during the project activities. The model achieved the best SRL performance in the EvalIta 2011 evaluation campaign, FLaIT task. The talk will present the architecture of the system, its flexibility in the treatment of SRL for Italian and English, and its extensive evaluation.
15:30
Round table: present and future NLP applications
Chairperson: Giuseppe Attardi;
Gian Piero Oggero, Expert System;
Alessio Bosca, CELI
17:00 Closing

Abstract submission and publication

all project participants are invited to submit abstracts on the work they have carried out
the submission deadline is 25 August
abstracts should be sent by email to mazzei@di.unito.it and bosco@di.unito.it
all submitted abstracts will be posted on this site before the workshop
a post-proceedings volume containing extended versions of the submitted abstracts is also planned.

Practical information:

Hotel Amadeus e Teatro - Via Principe Amedeo 41 bis (5 minutes' walk from the workshop venue); for information and reservations contact info@hotelamadeustorino.com, 0118174951; the hotel offers the following rates agreed with the Università di Torino: 78 euros per day for a single room, 99 euros per day for a double room, 88 euros per day for a double room for single use (plus tourist tax).
Edisu residences (reserved for PhD students and students) - for information contact the Edisu Piemonte Booking Office at ospitalita@edisu-piemonte.it, 011 653 1106 - 011 653 1042 - 011 653 1063; for online reservations http://clio.edisu-piemonte.it:8088/Login.aspx; the residences offer students and PhD students the following rates: 11.38 euros per person per day in a single room, 9.66 euros per person per day in a double room.

Last updated 10 August 2012. Contact: bosco[at]di.unito.it