Tools

The following list includes the tools developed for the NLP of the Italian language by all the researchers working in this area. Clicking the name of a tool opens a card with its basic information and the link to its website, if one exists. (To limit terminological ambiguity, the classification of the tools and the information about them are given in English.)
The Link section instead lists tools that, although not specifically developed for Italian, have been applied to it.

Information on the systems and resources that took part in the Evalita 2011 evaluation campaign will be made available very soon after the workshop (Rome, 24-25 January 2012).

All reports and suggestions concerning tools that are present in or missing from the list are welcome and can be sent using the form in the Segnalazioni (Reports) section of this site.

Tokenization
  • Regexp_tokenizer
    Name Regexp_tokenizer
    Author(s) Marco Baroni
    Description It is a tokenizer that splits a text into tokens on the basis of a set of regular expressions that the user specifies in a parameter file. In this way, the tokenizer can be customized for different languages and/or tokenization purposes.
    License and download free
    Link http://sslmit.unibo.it/~baroni/regexp_tokenizer.html
    Contacts marco.baroni[at]unitn.it
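As a rough illustration of the general idea (not Regexp_tokenizer's own code or parameter-file format), the sketch below builds a tokenizer from a user-supplied list of regular expressions. The pattern names and the patterns themselves are hypothetical and only stand in for what a user might put in a parameter file.

```python
import re

# Hypothetical user-defined patterns, tried in this order at each position.
# They stand in for a parameter file; the real file format is not shown here.
PATTERNS = [
    ("ELISION", r"[^\W\d_]+'"),       # elided forms such as "L'" in "L'albero"
    ("NUMBER", r"\d+(?:[.,]\d+)*"),   # integers and decimals such as "12,50"
    ("WORD", r"[^\W\d_]+"),           # runs of letters
    ("PUNCT", r"[^\w\s]"),            # any single punctuation character
]


def build_tokenizer(patterns):
    """Compile the patterns into one alternation and return a tokenize function."""
    master = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in patterns))

    def tokenize(text):
        # Each match is one token; lastgroup names the pattern that matched.
        return [(m.lastgroup, m.group()) for m in master.finditer(text)]

    return tokenize


tokenize = build_tokenizer(PATTERNS)
print(tokenize("L'albero costa 12,50 euro."))
```

Swapping in a different pattern list is enough to change the tokenization for another language or purpose, which is the customization the description refers to.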
Morphological analysis/PoS tagging
  • CORISTagger
    Name CORISTagger
    Author(s) Fabio Tamburini
    Description CORISTagger is a high-performance PoS tagger for Italian. The system is composed of a Hidden Markov Model tagger followed by a Transformation-Based tagger.
    License and download -
    Link -
    Contacts fabio.tamburini[at]unibo.it
  • C4
    Name C4
    Author(s) Simone Romagnoli
    Description C4 is a portable statistical part-of-speech tagger based on a second-order Markov model technique, implemented in C++ using standard template libraries.
    License and download -
    Link -
    Contacts simone.romagnoli3[at]unibo.it
  • Felice-POS-Tagger
    Name Felice-POS-Tagger
    Author(s) Felice Dell'Orletta
    Description The Felice-POS-Tagger is a combination of six component taggers, built from three different algorithms, each of which is used to construct a left-to-right tagger and a right-to-left tagger. The algorithms are TnT and others based on the ILC-UniPi MaxEnt PoS Tagger, used with different learning approaches in order to build the ensemble system.
    License and download -
    Link -
    Contacts felice.dellorletta[at]ilc.cnr.it
  • ILC-UniPi MaxEnt PoS Tagger
    Name ILC-UniPi MaxEnt PoS Tagger
    Author(s) Felice Dell'Orletta, Maria Federico, Simonetta Montemagni, Vito Pirrelli
    Description The ILC-UniPi MaxEnt PoS Tagger is a combination of two Maximum Entropy PoS taggers, operating on the output of MAGIC, an Italian rule-based morphological parser equipped with a general-purpose lexicon of about 100,000 entries.
    License and download -
    Link -
    Contacts felice.dellorletta[at]ilc.cnr.it, maria.federico[at]ilc.cnr.it, simonetta.montemagni[at]ilc.cnr.it, vito.pirrelli[at]ilc.cnr.it, alessandro.lenci[at]ilc.cnr.it
  • TagPro
    Name TagPro
    Author(s) Emanuele Pianta, Roberto Zanoli
    Description TagPro is a system for PoS tagging based on Support Vector Machines. TagPro exploits a rich set of features, including morphological analysis. It scored as the best system in the Italian PoS tagging task at EVALITA 2007.
    License and download -
    Link -
    Contacts pianta[at]fbk.eu, zanoli[at]fbk.eu
  • UniPiSynthema POS tagger
    Name UniPiSynthema POS tagger
    Author(s) Carlo Aliprandi, Nicola Carmignani, Nedjma Deha, Paolo Maria Mancarella, Michele Rubino
    Description The basic assumption of the UniPiSynthema POS tagger is that contextual information affects the environment where the word has to be tagged. In order to tag the word with the most likely PoS, it is necessary to have a high-order representation of the context. This assumption has been consolidated into stochastic methods that are based on a second-order Markov Model.
    License and download -
    Link -
    Contacts aliprandi[at]synthema.it, nicola[at]di.unipi.it, deha[at]di.unipi.it, paolo[at]di.unipi.it, rubino[at]di.unipi.it
  • UniToPOStagger
    Name UniTo POS tagger
    Author(s) Leonardo Lesmo
    Description This rule-based PoS tagger is developed by the NLP Group of the Dipartimento di Informatica of the University of Torino, and it is part of the TULE framework. It takes as input the result of the morphological analysis of a sentence, which may include multiple entries for each word when an ambiguity is present. The output of the tagger is a sequence of single entries, each of which is associated with an input word.
    License and download free download
    Link http://www.tule.di.unito.it/
    Contacts lesmo[at]di.unito.it
  • VEST
    Name VEnice Symbolic Tagger (VEST)
    Author(s) Rodolfo Delmonte
    Description VEST is a symbolic rule tagger that uses little quantitative and statistical information. Most of its computational work is based on tagged lexical information available in datasets produced by previous work in the field. The system also uses a morphological analyzer which is only activated for derivational nouns, cliticized verbs and some adjectives. It is also activated as a guesser for unknown and out-of-vocabulary words, which receive a default classification in case of failure: uppercase words are labeled proper nouns, lowercase words common nouns.
    License and download -
    Link -
    Contacts delmont[at]unive.it
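The fallback rule described for unknown words can be sketched as follows. This is only an illustration of the stated default, not VEST code: the tag names, the `analyze` callable and the toy analyzer are hypothetical, and "uppercase" is assumed here to mean that the first character is an uppercase letter.

```python
def guess_tag(word, analyze):
    """Return a tag for an unknown word.

    `analyze` is a hypothetical stand-in for a morphological analyzer:
    it returns a tag string, or None when it cannot analyze the word.
    """
    tag = analyze(word)
    if tag is not None:
        return tag
    # Default classification on failure, as described in the entry above:
    # capitalized words become proper nouns, all others common nouns.
    return "PROPER_NOUN" if word[:1].isupper() else "COMMON_NOUN"


def toy_analyzer(word):
    # Toy rule for illustration only: treat "-mente" words as adverbs.
    return "ADVERB" if word.endswith("mente") else None


for w in ["velocemente", "Pisa", "zuzzurellone"]:
    print(w, guess_tag(w, toy_analyzer))
```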
Parsing (Syntactic analysis)
  • DeSR
    Name Dependency Shift-Reduce (DeSR)
    Author(s) Giuseppe Attardi
    Description DeSR is a statistical dependency parser for natural language sentences. DeSR is part of the TANL framework, which provides the tools required to completely analyze sentences starting from text. It has been used both for Italian and other languages. DeSR (ex aequo with TULE) scored as the best system in the Italian dependency parsing task at EVALITA 2009.
    License and download free software that can be redistributed and/or modified under GNU General Public License v. 3
    Link http://sites.google.com/site/desrparser/
    Contacts attardi[at]di.unipi.it
  • TUP
    Name Turin University Parser (TUP)
    Author(s) Leonardo Lesmo
    Description TUP is a rule-based dependency parser which is part of the TULE framework. It currently supports Italian and English. Extensions to English, Spanish, Catalan, French and Hindi are under development.
    License and download free download
    Link http://www.tule.di.unito.it/
    Contacts lesmo[at]di.unito.it
Parsing environment (including tokenizer, PoS tagger and parser)
  • CHAOS
    Name CHAOS
    Author(s) Roberto Basili, Maria Teresa Pazienza, Fabio Massimo Zanzotto
    Description A robust syntactic parser for Italian and for English. The system implements a modular and lexicalised approach to the syntactic parsing problem. It is based on the notion of eXtended Dependency Graph (XDG), which has proved a useful representation mechanism in a shallow parsing approach. The system offers a collection of modules for designing parsing architectures.
    License and download free download for research purposes, but protected (send an e-mail to the contact to obtain an account for the protected area)
    Link http://ai-nlp.info.uniroma2.it/external/chaosproject/
    Contacts chaos[at]info.uniroma2.it
  • GraFo
    Name GraFo
    Author(s) Emanuele Pianta
    Description GraFo is a left-corner parser for Italian, based on explicit rules manually coded in a unification formalism. As the linguistic coverage of GraFo is still quite limited, the parser produces complete parse trees for a small percentage of sentences.
    License and download -
    Link -
    Contacts pianta[at]fbk.eu
  • TANL
    Name Tanl Italian Parser (TANL)
    Author(s) Giuseppe Attardi
    Description The Tanl Italian Parser is a Web service for parsing Italian texts and producing parse trees according to the Tanl Dependency Notation. The service uses the DeSR dependency parser and other linguistic tools from the Tanl Suite. The input is plain text; the output is in CoNLL-X format.
    License and download free download
    Link http://paleo.di.unipi.it/parse
    Contacts attardi[at]di.unipi.it
  • TULE
    Name Turin University Linguistic Environment (TULE)
    Author(s) Leonardo Lesmo
    Description TULE is the environment that integrates both the PoS tagger of the University of Torino and the dependency parser TUP. The input of TULE is plain text and the output is in TUT format, since TULE has been developed in parallel with the Turin University Treebank (TUT) and shares its format. TULE scored as the best system in the Italian dependency parsing task at EVALITA 2007 and 2009 (ex aequo with DeSR).
    License and download free download
    Link http://www.tule.di.unito.it/
    Contacts lesmo[at]di.unito.it
Word Sense Disambiguation
  • JIGSAW
    Name JIGSAW
    Author(s) Pierpaolo Basile, Giovanni Semeraro
    Description JIGSAW is a knowledge-based WSD system that attempts to disambiguate all words in a text by exploiting an external lexical knowledge base. The main assumption is that a specific strategy for each Part-Of-Speech (POS) is better than a single strategy.
    License and download -
    Link -
    Contacts basilepp[at]di.uniba.it, semeraro[at]di.uniba.it
Information Retrieval (search engines, voice search, document classification, text categorization)
Information Extraction and text mining
Named Entity Recognition
  • Bidirectional Sequence Classification for NER
    Name Bidirectional Sequence Classification for NER
    Author(s) Andrea Gesmundo
    Description The Bidirectional Sequence Classification is a system for Named Entity Recognition, based on the Perceptron algorithm. In the proposed framework, the order of the inference is not forced into a monotonic behavior (left-to-right), but is learned together with the parameters of the local classifier. It applies a semi-supervised training approach, which extends the Guided Learning framework.
    License and download -
    Link -
    Contacts andrea.gesmundo[at]unige.ch
  • EntityPro
    Name EntityPro
    Author(s) Emanuele Pianta, Roberto Zanoli
    Description EntityPro is a system for NER based on Support Vector Machines, which is part of TextPro, a suite of modular NLP tools developed at FBK. It was trained with a large number of both static and dynamic features.
    License and download free for research purposes from the following link
    Link http://textpro.fbk.eu/demo.php
    Contacts pianta[at]fbk.eu, zanoli[at]fbk.eu, manspera[at]fbk.eu
  • Typhoon
    Name Typhoon
    Author(s) Silvana Marianela Bernaola Biggio, Roberto Zanoli, Claudio Giuliano
    Description Typhoon is a classifier combination system for NER, in which two different classifiers are combined to exploit Data Redundancy and Patterns extracted from a large text corpus. The system consists of two classifiers in cascade: a single classifier can be used alone to make the system faster, while the second classifier in the cascade can be used when more accuracy is needed.
    License and download -
    Link http://textpro.fbk.eu/typhoon.html
    Contacts manspera[at]fbk.eu, zanoli[at]fbk.eu, pianta[at]fbk.eu, giulianog[at]fbk.eu
  • Tanl Named Entity Recognizer
    Name Tanl Named Entity Recognizer
    Author(s) Giuseppe Attardi, Stefano Dei Rossi, Felice Dell'Orletta, Eva Maria Vecchi
    Description The Tanl tagger is a generic, customizable text chunker, which can be applied to tasks such as PoS tagging, Super Sense tagging and Named Entity recognition. The chunker uses a Maximum Entropy classifier for learning how to chunk texts. Maximum Entropy is a more efficient technique than SVM, and by complementing it with dynamic programming it can achieve similar levels of accuracy.
    License and download free download
    Link http://medialab.di.unipi.it/wiki/NE_tagger
    Contacts attardi[at]di.unipi.it, deirossi[at]di.unipi.it, felice.dellorletta[at]ilc.cnr.it, evamaria.vecchi[at]ilc.cnr.it
Local Entity Detection and Recognition
  • FBK-UNiTRN LER system
    Name Fondazione Bruno Kessler and University of Trento Local Entity Recognition system
    Author(s) Silvana Marianela Bernaola Biggio, Claudio Giuliano, Massimo Poesio, Yannick Versley, Olga Uryupina, Roberto Zanoli
    Description This system detects and recognizes local entities for the Italian language. It is divided into two modules: the Entity Mention Detection (EMD) module, which detects all the mentions related to persons, organizations, geo-political entities and locations, and the Coreference Resolution module, which recognizes which mentions refer to the same entity. Here an entity is understood as an object or group of objects in the world, and a mention as the textual reference of an entity.
    License and download -
    Link -
    Contacts bernaola[at]fbk.eu, giuliano[at]fbk.eu, massimo.poesio[at]unitn.it, yversley[at]gmail.com, uryupina[at]gmail.com, zanoli[at]fbk.eu
Temporal Expression Recognition and Normalization
  • ITA-Chronos
    Name ITA-Chronos
    Author(s) Matteo Negri
    Description ITA-Chronos is designed to recognize all the Timed Entities occurring in a text, identify their extension, and normalize them according to the TIMEX2 standard. It adopts a rule-based approach, with different sets of hand-crafted rules specialized to deal with different aspects of the problem. It is the Italian extension of Chronos, a multilingual system written in Lisp, originally developed for English.
    License and download -
    Link http://ontotext.fbk.eu/ita-chronos.html
    Contacts negri[at]fbk.eu
  • UniPg-TERNsystem
    Name UniPg-TERNsystem
    Author(s) Loris Faina, Stefania Spina
    Description This system is an Italian parser that combines two separate levels of parsing: constituent parsing, which entails a category annotation of morphosyntactic constituents, and dependency parsing, which implies a functional annotation of relations such as subject, complement, etc. It has been used as a Temporal Expression Recognizer in the Evalita contest.
    License and download -
    Link -
    Contacts faina[at]unipg.it, sspina[at]unistrapg.it
Semantic analysis
  • GETARUNS
    Name GETARUNS
    Author(s) Rodolfo Delmonte
    Description GETARUNS is a system for semantic analysis which includes a "deep parser" provided with subcategorization information coming from different sources. At the end of the pipeline, it produces a "discourse model" including the discourse entities with their properties and features. The system is composed of the following modules:
    • tokenizer
    • sentence splitter
    • tagger and disambiguator
    • chunker and shallow parser
    • deep parser (strictly top-down) for annotated c-structure
    • semantic lexical mapping for f-structure
    • pronominal binding
    • anaphora resolution
    • topic hierarchy and centering
    • semantic information processing at propositional level
    • logical form
    • temporal reasoning
    • semantic indexing of individuals, sets, locations and events
    • discourse model creation and updating
    • discourse structure with semantic relations and discourse moves
    License and download free download - version Linux Ubuntu 9; for the updated versions send an email to the contact person
    Link http://project.cgm.unive.it/?page_id=194
    Contacts delmont[at]unive.it
  • VENSES
    Name VENSES
    Author(s) Rodolfo Delmonte
    Description VENSES is the scaled version of GETARUNS for creating a semantic analysis system which can include a "partial parser" equipped with subcategorization information coming from different sources. The output of VENSES is a logical form without free variables, in which all pronouns are replaced by their antecedents. The system is composed of the following modules:
    • tokenizer
    • sentence splitter
    • tagger and disambiguator
    • chunker and shallow parser
    • semantic lexical mapping for f-structure
    • pronominal binding
    • anaphora resolution
    • topic hierarchy and centering
    • semantic information processing at propositional level
    • logical form
    • semantic indexing of individuals, sets, locations and events
    License and download demo version available on the web site; for a version of the system send an email to the contact person
    Link http://project.cgm.unive.it/?page_id=196
    Contacts delmont[at]unive.it
Question Answering
  • QALLME
    Name QALL-ME (Question Answering Learning technologies in a multiLingual and Multimodal Environment)
    Author(s) Bernardo Magnini (FBK - Trento, Italy) is the coordinator of the academic and industrial partners
    Description QALL-ME is an EU-funded project in the IST area. The general objective is to establish a shared infrastructure for multilingual and multimodal open-domain Question Answering for mobile phones.
    License and download -
    Link http://qallme.fbk.eu/
    Contacts perenthaler[at]fbk.eu
Summarization
Textual entailment
  • EDITS
    Name EDITS (Edit Distance Textual Entailment Suite)
    Author(s) Milen Kouylekov, Matteo Negri
    Description EDITS is an open source software package aimed at recognizing entailment relations between two portions of text. The system is based on edit distance algorithms, and computes the T-H distance as the cost of the edit operations (i.e. insertion, deletion and substitution) that are necessary to transform T into H.
    License and download GNU Lesser General Public License
    Link https://docs.google.com/Doc?docid=0AV0eoH72QlJeZGNjajdyNHdfMGM1NXo1YzVn&hl=en
    Contacts kouylekov[at]gmail.com, negri[at]fbk.eu
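To make the notion of T-H distance concrete, here is a minimal sketch of a token-level edit distance between a text T and a hypothesis H. It is not the EDITS implementation: whitespace tokenization, exact string matching and the unit costs for insertion, deletion and substitution are all assumptions chosen for illustration.

```python
def th_distance(t, h, ins_cost=1.0, del_cost=1.0, sub_cost=1.0):
    """Cost of the cheapest insertions/deletions/substitutions turning T into H.

    Tokens are whitespace-separated words (an assumption for this sketch).
    """
    t_tok, h_tok = t.split(), h.split()
    # prev[j] = cost of turning the T tokens seen so far into the first j H tokens
    prev = [j * ins_cost for j in range(len(h_tok) + 1)]
    for i, tw in enumerate(t_tok, start=1):
        curr = [i * del_cost]
        for j, hw in enumerate(h_tok, start=1):
            match_cost = 0.0 if tw == hw else sub_cost
            curr.append(min(
                prev[j] + del_cost,        # delete tw from T
                curr[j - 1] + ins_cost,    # insert hw into T
                prev[j - 1] + match_cost,  # keep or substitute
            ))
        prev = curr
    return prev[-1]


print(th_distance("il gatto dorme sul divano", "il gatto dorme"))
print(th_distance("il cane dorme", "il gatto dorme"))
```

A low distance suggests that H can be obtained from T cheaply; how such a distance is turned into an entailment decision is outside the scope of this sketch.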
  • UniAlicante Textual Entailment system
    Name UniAlicante Textual Entailment system
    Author(s) Oscar Ferrandez, Antonio Toral, Rafael Munoz
    Description The system uses a machine learning classifier fed by features derived from lexical distances, part-of-speech information and semantic knowledge from SIMPLE-CLIPS, an Italian Language Resource.
    License and download -
    Link -
    Contacts ofe[at]dlsi.ua.es, rafaelg[at]dlsi.ua.es, antonio.toral[at]ilc.cnr.it
Topic Detection and Tracking
  • OntoTDT
    Name OntoTDT
    Author(s) FBK
    Description OntoTDT is an unsupervised Topic Detection system. A topic is defined as a seminal event or activity, along with all directly related events and activities. A topic is expressed as a chronologically ordered list of "stories". A story is "on topic" whenever it discusses events and activities that are directly connected to that topic's seminal event. The goal of a topic detection system is to group together stories that discuss the same event.
    License and download -
    Link http://ontotext.fbk.eu/topic.html
    Contacts bentivo[at]fbk.eu
Speech recognition/understanding (including Speech-To-Text transcription)
  • PROSO
    Name PROSO
    Author(s) Rodolfo Delmonte
    Description PROSO is a rule-based system for translating an Italian text into the corresponding version labelled for a vocal synthesizer. It uses only phonological rules and a table of verbal roots.
    License and download free download
    Link http://project.cgm.unive.it/?page_id=204
    Contacts delmont[at]unive.it
Speech synthesis (including Text-To-Speech synthesis)
(Spoken) Dialog
  • UNITN Italian Spoken Dialogue System
    Name UNITN Italian Spoken Dialogue System
    Author(s) Stefan Rigo, Evgeny A. Stepanov, Pierluigi Roberti, Silvia Quarteroni, Giuseppe Riccardi
    Description The main features supporting the UNITN SDS are the mixed-initiative control, which allows the caller to get partly in control of the dialog strategy, and the descriptive specification of dialog strategies. The application is based on a complex, high-recall grammar and a user goal planning script. The latter is tightly bound to the grammar and provides functionalities of error checking and recovery from missing or misinterpreted concepts (Automatic Speech Recognition and Spoken Language Understanding errors).
    License and download -
    Link -
    Contacts -
  • Loquendo Spoken Dialogue System
    Name Loquendo Spoken Dialogue System
    Author(s) Enrico Giraudo, Paolo Baggia
    Description The application was designed in VoiceXML and runs on the Loquendo VoxNauta platform. The dialogue strategy is "mixed-initiative", with flexible recognition grammars that were designed to be modular and easy to use in different dialogue application contexts. Changes of context and complex requests from the caller are allowed.
    License and download -
    Link http://www.loquendo.com/en/technology/voxnauta_platform.htm
    Contacts enrico.giraudo[at]loquendo.com, paolo.baggia[at]loquendo.com
Speaker Recognition
  • QUT Speaker Identity Verification System
    Name Queensland University of Technology (QUT) Speaker Identity Verification System
    Author(s) Mitchell McLaren, Robbie Vogt, Brendan Baker, Sridha Sridharan
    Description The system includes the following three components developed by QUT for the evaluation in Evalita 2009: Joint Factor GMM-UBM, GMM Supervector SVM and GLDS SVM. The QUT system is the score-level fusion of these components. Fusion was performed on the output scores using linear weights calculated with a logistic regression algorithm. This was performed using the FoCal toolkit.
    License and download -
    Link -
    Contacts m.mclaren[at]qut.edu.au, r.vogt[at]qut.edu.au, bj.baker[at]qut.edu.au, s.sridharan[at]qut.edu.au
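Score-level fusion with linear weights learned by logistic regression can be sketched as below. This is a generic illustration in plain Python, not the QUT system and not the FoCal toolkit's API; the function names, the learning rate, the number of epochs and the toy scores are all made up for the example.

```python
import math


def train_fusion(scores, labels, lr=0.1, epochs=2000):
    """Learn fusion weights by batch gradient descent on the logistic loss.

    scores: one list of per-system scores for each trial.
    labels: 1 for target trials, 0 for non-target trials.
    """
    n_sys, n = len(scores[0]), len(scores)
    w, b = [0.0] * n_sys, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_sys, 0.0
        for x, y in zip(scores, labels):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            err = 1.0 / (1.0 + math.exp(-z)) - y
            for k in range(n_sys):
                grad_w[k] += err * x[k]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def fuse(x, w, b):
    """Fused score: weighted sum of the component scores plus an offset."""
    return b + sum(wi * xi for wi, xi in zip(w, x))


# Toy development scores from three hypothetical component systems.
dev_scores = [[2.0, 1.5, 1.0], [1.2, 0.8, 1.9], [-1.0, -0.5, -1.5], [-2.0, -1.2, -0.3]]
dev_labels = [1, 1, 0, 0]
w, b = train_fusion(dev_scores, dev_labels)
print([round(fuse(x, w, b), 2) for x in dev_scores])
```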
  • UWB Speaker Identity Verification Systems
    Name University of West Bohemia (UWB) Speaker Identity Verification Systems
    Author(s) Lukas Machlica, Jan Vanek
    Description The two UWB systems were submitted to the EVALITA 2009 evaluation campaign. Both systems are based on the UBM-GMM approach.
    License and download -
    Link http://www.kky.zcu.cz/en
    Contacts machlica[at]kky.zcu.cz, vanekyjg[at]kky.zcu.cz
  • AGNITIO's Speaker Recognition System
    Name AGNITIO's Speaker Recognition System
    Author(s) Niko Brummer, Albert Strasheim
    Description AGNITIO's system is a fusion of a state-of-the-art Joint Factor Analysis system and a new I-Vector system.
    License and download -
    Link -
    Contacts -
  • RU Speaker Recognition Systems
    Name Radboud University Speaker Recognition Systems
    Author(s) Marijn Huijbregts, David van Leeuwen
    Description The first system is based on a Universal Background Model and Gaussian Mixture Model (UBM-GMM) and employs a linear scoring approach with channel. The second system is based on Joint Factor Analysis (JFA), also employing linear scoring.
    License and download -
    Link -
    Contacts m.huijbregts[at]let.ru.nl, d.vanleeuwen[at]let.ru.nl
  • SMART III Speaker Recognition Systems
    Name SMART III Speaker Recognition Systems
    Author(s) Maria Tucci
    Description The SMART III System is a formant-based method using an implemented decisional approach with a reference population of 305 male Italian speakers, containing fundamental frequency and first three formant values for the vowels /a, e, i, o/.
    License and download -
    Link http://www.linguistica.unical.it/labfon/Home.htm
    Contacts tucci.maria[at]libero.it
Machine Translation and Speech-To-Speech Translation
  • STILVEN-MOSES
    Name STILVEN-MOSES
    Author(s) Rodolfo Delmonte
    Description STILVEN-MOSES is a Venetian-English translator based on MOSES, which uses a recently updated parallel corpus of 300,000 tokens.
    License and download free access
    Link http://project.cgm.unive.it/cgi-bin/stilven/moses
    Contacts delmont[at]unive.it
Natural Language Generation
  • FUF/SURGE
    Name FUF/SURGE
    Author(s) Charles Callaway, Alessandra Novello
    Description FUF/SURGE-Italian is a rule-based, wide-coverage generator using a systemic grammar. Given a lexicalized semantic specification, it creates a grammatically correct sentence, adds morphology and orthographical information, and returns formatted text.
    License and download Free download, requires a LISP installation
    Link http://homepages.inf.ed.ac.uk/ccallawa/resources.html
    Contacts callaway[at]fbk.eu
Emotion Recognition / Generation
Linguistic annotation
  • ANANAS
    Name AN.ANA.S. 4
    Author(s) Miriam Voghera, Francesco Cutugno, Annamaria Landolfi, Carmela Sammarco
    Description AN.ANA.S. is a syntactic annotation system based on a schema of grammatical rules (DTD) that defines the tree structure of the text. It can be used to label both spoken and written texts. It supports the syntactic labelling of all types of text and relies on the XGate software, which works as an editor for creating a database of texts in XML format.
    License and download free download
    Link http://www.parlaritaliano.it/index.php/it/strumenti/717-ananas-4
    Contacts people.na.infn.it/~cutugno
  • XGATE
    Name Xgate
    Author(s) Francesco Cutugno, Miriam Voghera
    Description Xgate is a tool for the annotation and query of linguistic data. It is developed by the NLP group of the Dipartimento di Scienze Fisiche of the Università Federico II of Napoli and the Dipartimento degli Studi Linguistici e Letterari of the Università di Salerno.
    License and download free download
    Link http://www.parlaritaliano.it/index.php/en/projects/666-xgate
    Contacts people.na.infn.it/~cutugno
Language modeling
  • IRSTLM
    Name IRSTLM Toolkit
    Author(s) Marcello Federico, Nicola Bertoldi
    Description The IRST Language Modeling Toolkit features algorithms and data structures suitable to estimate, store, and access very large LMs. Our software has been integrated into a popular open source Statistical Machine Translation decoder called Moses, and is compatible with language models created with other tools, such as the SRILM Toolkit.
    License and download Open Source LGPL
    Link http://hlt.fbk.eu/en/irstlm
    Contacts Marcello Federico, Nicola Bertoldi
Lexical substitution
  • LexSub
    Name LexSub
    Author(s) Diego De Cao, Roberto Basili
    Description The LexSub experimental platform proceeds through three steps: 1) the extraction of the lexical substitution sets for the target words, 2) the acquisition of domain models for candidates and 3) the ranking of candidate lexical substitutes over individual sentences according to the acquired domain models. A further back-off step 4) is included to deal with test cases for which step 1) produces an empty candidate set.
    License and download -
    Link -
    Contacts decao[at]info.uniroma2.it, basili[at]info.uniroma2.it
Connected Digit Recognition
  • Cedat85 automatic speech recognition system
    Name Cedat85 automatic speech recognition system
    Author(s) Maria Palmerini
    Description The system has been developed within a research project led in 2008 by Cedat 85 in cooperation with the European Media Laboratory in Heidelberg. It is based on the most recent IBM VoiceTailor technology; Cedat 85 provided the whole training process (acoustic data, text data, scripts) for spontaneous Italian language. Features of the VoiceTailor system include speaker independence, the ability to handle spontaneous speech, unlimited vocabularies, different acoustic and language models and noise, and the possibility to set some parameters in order to choose different strategies with respect to accuracy or speed. The system works in a Linux environment and can run on multiple processors in order to carry out several processing jobs in parallel.
    License and download -
    Link http://www.cedat85.com
    Contacts m.palmerini[at]cedat85.com
  • Syllable-Based ASR System of Naples University
    Name Syllable-Based ASR System of Naples University
    Author(s) Francesco Cutugno, Bogdan Ludusan, Antonio Origlia, Serena Soldo
    Description The recognition system uses the syllable as its base unit. In a first stage, the continuous speech sequence is divided into syllable-like units using an energy-based algorithm. Then, the obtained syllables are passed to a classifier in order to calculate the syllable/class probability distribution. In the final stage, a Viterbi-like decoding algorithm based on multistage graphs finds the most likely sequence corresponding to the audio input.
    License and download -
    Link -
    Contacts cutugno[at]na.infn.it, ludusan[at]na.infn.it, soldo[at]na.infn.it
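The final decoding stage can be pictured with a standard Viterbi search over a multistage graph, where each stage holds the classifier's probability distribution for one syllable-like segment. The sketch below is a generic textbook version, not the Naples system: the transition table, the floor probability for unseen transitions and the toy numbers are hypothetical.

```python
import math


def viterbi(stage_probs, trans, floor=1e-6):
    """Return the most likely syllable sequence across the stages.

    stage_probs: list of dicts {syllable: P(syllable | segment)}, one per segment.
    trans: dict {(prev, curr): P(curr | prev)}; missing pairs get `floor`.
    """
    # best[s] = (log score, path) of the best path ending in syllable s
    best = {s: (math.log(p), [s]) for s, p in stage_probs[0].items()}
    for probs in stage_probs[1:]:
        nxt = {}
        for s, p in probs.items():
            score, path = max(
                (prev_score + math.log(trans.get((prev, s), floor)), prev_path)
                for prev, (prev_score, prev_path) in best.items()
            )
            nxt[s] = (score + math.log(p), path + [s])
        best = nxt
    return max(best.values())[1]


# Toy example: "ga" is locally more likely, but the transitions favour "ca sa".
stages = [{"ca": 0.45, "ga": 0.55}, {"sa": 0.9, "za": 0.1}]
trans = {("ca", "sa"): 0.9, ("ca", "za"): 0.1, ("ga", "sa"): 0.1, ("ga", "za"): 0.9}
print(viterbi(stages, trans))
```

The toy example shows why a sequence-level search is used: the best path can differ from the sequence obtained by picking the most probable syllable at each stage independently.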
  • TSpeech
    Name TSpeech
    Author(s) Leandro D’Anna, Gianpaolo Coro, Francesco Cutugno
    Description TSpeech is an Abla srl proprietary speech recognizer, based on standard decoding algorithms, with syllabic acoustic models. The recognition phase is followed by a rescoring session, based on syllable energy and duration templates, which recovers some recognition errors.
    License and download -
    Link www.abla.it
    Contacts ldanna[at]unisa.it, gianpaolo.coro[at]abla.it, cutugno[at]na.infn.it
Other
  • TEXTPRO
    Name TEXTPRO
    Author(s) Emanuele Pianta, Christian Girardi, Roberto Zanoli
    Description TextPro is a suite of modular Natural Language Processing (NLP) tools for the analysis of Italian and English texts. The suite has been designed so as to integrate and reuse state-of-the-art NLP components developed by researchers at FBK. The current version of the tool suite provides functions ranging from tokenization to chunking and Named Entity Recognition (NER).
    License and download free licence obtained by registration
    Link http://textpro.fbk.eu/
    Contacts manspera[at]fbk.eu
  • Coreference Resolution Module
    Name TEXTPRO
    Author(s) Octavian Popescu, Bernardo Magnini
    Description The coreference system has been developed to decide whether two mentions refer to the same entity or not. The input of the system consists of a list of Named Entities of type Person, i.e. Person Names (PNs), that have been automatically recognized in a document collection. Its output consists of a number of clusters of PNs, where each cluster is interpreted as the set of PNs that refer to the same entity.
    License and download free licence obtained by registration
    Link http://textpro.fbk.eu/
    Contacts bentivo[at]fbk.eu






Last updated 20 January 2012. Contacts: bosco[at]di.unito.it