Tokenization
|
|
-
Regexp_tokenizer
Name |
Regexp_tokenizer |
Author(s) |
Marco Baroni |
Description |
A tokenizer that splits a text into tokens on the basis of a set of regular expressions
that the user specifies in a parameter file. In this way, the tokenizer can be customized
for different languages and/or tokenization purposes. |
Licence and download |
free |
Link |
http://sslmit.unibo.it/~baroni/regexp_tokenizer.html |
Contact |
marco.baroni[at]unitn.it |
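The regular-expression approach described for Regexp_tokenizer can be illustrated with a minimal Python sketch. The rule list below is purely hypothetical: the real tool reads its patterns from a user-supplied parameter file whose format is not described here, and the function names are assumptions made for illustration.

```python
import re

# Hypothetical rule set, tried in order at each position. In the real tool
# the patterns come from a parameter file; these are only examples.
RULES = [
    r"[A-Za-z]\.(?:[A-Za-z]\.)+",  # dotted abbreviations such as "U.S.A."
    r"\d+(?:[.,]\d+)*",            # numbers such as "1,50" or "1.000,50"
    r"[^\W\d_]+'",                 # elided forms such as "L'"
    r"[^\W\d_]+",                  # plain alphabetic words
    r"[^\w\s]",                    # any single punctuation character
]


def build_tokenizer(rules):
    """Combine the rules into one alternation and return a tokenize function."""
    pattern = re.compile("|".join(f"(?:{rule})" for rule in rules))

    def tokenize(text):
        # findall returns whole matches because the rules use no capturing groups
        return pattern.findall(text)

    return tokenize


tokenize = build_tokenizer(RULES)
tokens = tokenize("L'acqua costa 1,50 euro.")
print(tokens)
```

Swapping in a different rule list is all it takes to adapt such a tokenizer to another language or tokenization policy, which is the point of keeping the patterns outside the code.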
|
Morphologic analysis/Pos-Tagging
|
|
-
CORISTagger
Name |
CORISTagger |
Author(s) |
Fabio Tamburini |
Description |
CORISTagger is a high-performance PoS tagger for Italian. The system is
composed of a Hidden Markov Model tagger followed by a Transformation-Based tagger.
|
Licence and download |
- |
Link |
- |
Contact |
fabio.tamburini[at]unibo.it |
-
C4
Name |
C4 |
Author(s) |
Simone Romagnoli |
Description |
C4 is a portable statistical part-of-speech tagger based on a second-order
Markov model technique, implemented in C++ using standard template libraries.
|
Licence and download |
- |
Link |
- |
Contact |
simone.romagnoli3[at]unibo.it |
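In a second-order Markov model, the probability of a tag depends on the two preceding tags. As a rough illustration (not C4's actual code, which is written in C++ and is not shown here), the following Python sketch estimates such transition probabilities from a toy tagged corpus by relative frequency. The tag names, the padding symbols and the function name are assumptions; a real tagger would also add smoothing and lexical probabilities.

```python
from collections import Counter


def trigram_transitions(tag_sequences):
    """Estimate P(t3 | t1, t2) by relative frequency over tag sequences."""
    trigram_counts, context_counts = Counter(), Counter()
    for tags in tag_sequences:
        # Pad so that the first tags and the sentence end also get a context
        padded = ["<s>", "<s>"] + list(tags) + ["</s>"]
        for t1, t2, t3 in zip(padded, padded[1:], padded[2:]):
            trigram_counts[(t1, t2, t3)] += 1
            context_counts[(t1, t2)] += 1
    return {
        trigram: count / context_counts[trigram[:2]]
        for trigram, count in trigram_counts.items()
    }


# Toy corpus of tag sequences (hypothetical tagset)
corpus = [["DET", "NOUN", "VERB"], ["DET", "NOUN", "ADV"]]
probs = trigram_transitions(corpus)
print(probs[("DET", "NOUN", "VERB")])
```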
-
Felice-POS-Tagger
Name |
Felice-POS-Tagger |
Author(s) |
Felice Dell'Orletta |
Description |
The Felice-POS-Tagger is a combination of six component taggers, with three
different algorithms, each of which is used to construct a left-to-right tagger and
a right-to-left tagger. The algorithms are TnT and others based on the ILC-UniPi
MaxEnt PoS tagger, used with different learning approaches in order to build
the ensemble system.
|
Licence and download |
- |
Link |
- |
Contact |
felice.dellorletta[at]ilc.cnr.it |
-
ILC-UniPi MaxEnt PoS Tagger
Name |
ILC-UniPi MaxEnt PoS Tagger |
Author(s) |
Felice Dell'Orletta, Maria Federico, Simonetta Montemagni, Vito Pirrelli |
Description |
The ILC-UniPi MaxEnt PoS Tagger is a combination of two Maximum Entropy PoS taggers,
operating on the output of MAGIC, an Italian rule-based morphological parser, equipped with a
general-purpose lexicon of about 100,000 entries.
|
Licence and download |
- |
Link |
- |
Contact |
felice.dellorletta[at]ilc.cnr.it, maria.federico[at]ilc.cnr.it, simonetta.montemagni[at]ilc.cnr.it,
vito.pirrelli[at]ilc.cnr.it, alessandro.lenci[at]ilc.cnr.it |
-
TagPro
Name |
TagPro |
Author(s) |
Emanuele Pianta, Roberto Zanoli |
Description |
TagPro is a system for PoS tagging based on Support Vector Machines. TagPro
exploits a rich set of features, including morphological analysis. It scored as the best
system in the Italian PoS tagging task at EVALITA 2007. |
Licence and download |
- |
Link |
- |
Contact |
pianta[at]fbk.eu, zanoli[at]fbk.eu |
-
UniPiSynthema POS tagger
Name |
UniPiSynthema POS tagger |
Author(s) |
Carlo Aliprandi, Nicola Carmignani, Nedjma Deha, Paolo Maria Mancarella, Michele Rubino |
Description |
The basic assumption of the UniPiSynthema POS tagger is that contextual information
affects the environment in which a word has to be tagged. In order to tag the word
with the most likely PoS, it is necessary to have a high-order representation of the
context. This assumption has been consolidated into stochastic methods that are
based on a second-order Markov Model.
|
Licence and download |
- |
Link |
- |
Contact |
aliprandi[at]synthema.it, nicola[at]di.unipi.it, deha[at]di.unipi.it, paolo[at]di.unipi.it, rubino[at]di.unipi.it |
-
UniToPOStagger
Name |
UniTo POS tagger |
Author(s) |
Leonardo Lesmo |
Description |
This rule-based PoS tagger is developed by the NLP Group of the Dipartimento di Informatica of the
University of Torino, and it is part of the TULE framework. It takes as input the result of the
morphological analysis of a sentence, which may include multiple entries for each word when
an ambiguity is present. The output of the tagger is a sequence of single entries, each of which
is associated with an input word.
|
Licence and download |
free download |
Link |
http://www.tule.di.unito.it/ |
Contact |
lesmo[at]di.unito.it |
-
VEST
Name |
VEnice Symbolic Tagger (VEST) |
Author(s) |
Rodolfo Delmonte |
Description |
VeST is a symbolic rule tagger that uses little quantitative and statistical information.
Most of its computational work is based on tagged lexical information available in datasets
made available from previous work in the field. The system also uses a morphological
analyzer which is only activated for derivational nouns, cliticized verbs and some adjectives.
It is also activated as a guesser for unknown and out-of-vocabulary words, which end up
with a default classification in case of failure: uppercase words are labeled proper nouns,
lowercase words common nouns.
|
Licence and download |
- |
Link |
- |
Contact |
delmont[at]unive.it |
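The fallback behaviour described for VEST (lexicon first, then the morphological guesser, then a default based on capitalization) can be sketched as follows. This is only an illustration under stated assumptions: the tag labels, the function name and the analyzer interface are hypothetical, and "uppercase word" is interpreted here as a word whose first letter is a capital.

```python
def guess_tag(word, lexicon, analyzer=None):
    """Return a PoS tag: lexicon lookup, then optional guesser, then default."""
    if word in lexicon:
        return lexicon[word]
    if analyzer is not None:
        tag = analyzer(word)  # hypothetical guesser; returns None on failure
        if tag is not None:
            return tag
    # Default classification when everything else fails
    return "PROPER_NOUN" if word[:1].isupper() else "COMMON_NOUN"


lexicon = {"casa": "COMMON_NOUN", "mangia": "VERB"}  # toy tagged lexicon
print(guess_tag("mangia", lexicon))
print(guess_tag("Zanzibar", lexicon))
print(guess_tag("zanzibar", lexicon))
```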
|
Parsing (Syntactic analysis)
|
|
-
DeSR
Name |
Dependency Shift-Reduce (DeSR) |
Author(s) |
Giuseppe Attardi |
Description |
DeSR is a statistical dependency parser for natural language sentences. DeSR is part of
the TANL framework, which provides the tools required to completely analyze sentences
starting from text. It has been used both for Italian and other languages. DeSR (ex aequo with
TULE) scored as the best system in the Italian dependency parsing task at EVALITA 2009. |
Licence and download |
free software that can be redistributed and/or modified under GNU General Public License v. 3 |
Link |
http://sites.google.com/site/desrparser/ |
Contact |
attardi[at]di.unipi.it |
-
TUP
Name |
Turin University Parser (TUP) |
Author(s) |
Leonardo Lesmo |
Description |
TUP is a rule-based dependency parser which is part of the TULE framework. It
currently supports Italian and English. Extensions to English, Spanish, Catalan, French and
Hindi are under development.
|
Licence and download |
free download |
Link |
http://www.tule.di.unito.it/ |
Contact |
lesmo[at]di.unito.it |
|
Parsing environment (including tokenizer, PoS tagger and parser)
|
|
-
CHAOS
Name |
CHAOS |
Author(s) |
Roberto Basili, Maria Teresa Pazienza, Fabio Massimo Zanzotto |
Description |
A robust syntactic parser for Italian and for English. The system implements
a modular and lexicalised approach to the syntactic parsing problem. It is based
on the notion of eXtended Dependency Graph (XDG) that has been seen as a useful
representation mechanism in a shallow parsing approach. The system offers a
collection of modules for designing parsing architectures. |
Licence and download |
free download for research purpose, but protected (send e-mail to the contact
to obtain the account for the protected area) |
Link |
http://ai-nlp.info.uniroma2.it/external/chaosproject/ |
Contact |
chaos[at]info.uniroma2.it |
-
GraFo
Name |
GraFo |
Author(s) |
Emanuele Pianta |
Description |
GraFo is a left corner parser for Italian, based on explicit rules manually coded
in a unification formalism. As the linguistic coverage of GraFo is still quite limited, the
parser produces complete parse trees for a small percentage of sentences.
|
Licence and download |
- |
Link |
- |
Contact |
pianta[at]fbk.eu |
-
TANL
Name |
Tanl Italian Parser (TANL) |
Author(s) |
Giuseppe Attardi |
Description |
The Tanl Italian Parser is a Web service for parsing Italian texts and producing parse trees
according to the Tanl Dependency Notation. The service uses the DeSR dependency parser and
other linguistic tools from the Tanl Suite. The input is plain text, the output is in CoNLL X format. |
Licence and download |
free download |
Link |
http://paleo.di.unipi.it/parse |
Contact |
attardi[at]di.unipi.it |
-
TULE
Name |
Turin University Linguistic Environment (TULE) |
Author(s) |
Leonardo Lesmo |
Description |
TULE is the environment that integrates both the PoS tagger of the University of Torino
and the dependency parser TUP. The input of TULE is plain text and the output is in TUT format,
since TULE has been developed in parallel with the Turin University Treebank (TUT) and shares
with this resource the same format. TULE scored as the best system in the Italian dependency
parsing task at EVALITA 2007 and 2009 (ex aequo with DeSR). |
Licence and download |
free download |
Link |
http://www.tule.di.unito.it/ |
Contact |
lesmo[at]di.unito.it |
|
Word Sense Disambiguation
|
|
-
JIGSAW
Name |
JIGSAW |
Author(s) |
Pierpaolo Basile, Giovanni Semeraro |
Description |
JIGSAW is a knowledge-based WSD system that attempts to disambiguate all words
in a text by exploiting an external lexical knowledge base. The main assumption is that a specific
strategy for each Part-Of-Speech (POS) is better than a single strategy.
|
Licence and download |
- |
Link |
- |
Contact |
basilepp[at]di.uniba.it, semeraro[at]di.uniba.it |
|
Information Retrieval (search engines, voice search, document classification, text categorization)
|
|
|
Information Extraction and text mining
|
|
|
Named Entity Recognition
|
|
-
Bidirectional Sequence Classification for NER
Name |
Bidirectional Sequence Classification for NER |
Author(s) |
Andrea Gesmundo |
Description |
The Bidirectional Sequence Classification is a system for Named Entity Recognition, based
on the Perceptron Algorithm. In the proposed framework, the order of the inference is not forced
into a monotonic behavior (left-to-right), but is learned together with the parameters of the local
classifier. It applies a semi-supervised training approach, which extends the Guided Learning
framework.
|
Licence and download |
- |
Link |
- |
Contact |
andrea.gesmundo[at]unige.ch |
-
EntityPro
Name |
EntityPro |
Author(s) |
Emanuele Pianta, Roberto Zanoli |
Description |
EntityPro is a system for NER based on Support Vector Machines, which is part of
TextPro, a suite of modular NLP tools developed at FBK. It was trained with a large number
of both static and dynamic features.
|
Licence and download |
free for research purposes from the following
link |
Link |
http://textpro.fbk.eu/demo.php |
Contact |
pianta[at]fbk.eu, zanoli[at]fbk.eu, manspera[at]fbk.eu |
-
Typhoon
Name |
Typhoon |
Author(s) |
Silvana Marianela Bernaola Biggio, Roberto Zanoli, Claudio Giuliano
|
Description |
Typhoon is a classifier combination system for NER,
in which two different classifiers are combined to exploit Data Redundancy and Patterns
extracted from a large text corpus. The system consists of two classifiers in cascade, but
it is possible to use a single classifier making the system faster; whereas the second classifier
in the cascade can be used when more accuracy is needed.
|
Licence and download |
- |
Link |
http://textpro.fbk.eu/typhoon.html |
Contact |
manspera[at]fbk.eu, zanoli[at]fbk.eu, pianta[at]fbk.eu, giulianog[at]fbk.eu |
-
Tanl Named Entity Recognizer
Name |
Tanl Named Entity Recognizer |
Author(s) |
Giuseppe Attardi, Stefano Dei Rossi, Felice Dell'Orletta, Eva Maria Vecchi |
Description |
The Tanl tagger is a generic, customizable text chunker, which can be applied to tasks such as
PoS tagging, Super Sense tagging and Named Entity recognition. The chunker uses a Maximum Entropy
classifier for learning how to chunk texts. Maximum Entropy is a more efficient technique than SVM, and
by complementing it with dynamic programming it can achieve similar levels of accuracy. |
Licence and download |
free download |
Link |
http://medialab.di.unipi.it/wiki/NE_tagger |
Contact |
attardi[at]di.unipi.it,deirossi[at]di.unipi.it, felice.dellorletta[at]ilc.cnr.it,evamaria.vecchi[at]ilc.cnr.it |
|
Local Entity Detection and Recognition
|
|
-
FBK-UNiTRN LER system
Name |
Fondazione Bruno Kessler and University of Trento Local Entity Recognition system |
Author(s) |
Silvana Marianela Bernaola Biggio, Claudio Giuliano, Massimo Poesio, Yannick
Versley, Olga Uryupina, Roberto Zanoli |
Description |
This system detects and recognizes local entities in Italian text. It is divided
into two modules: the Entity Mention Detection (EMD) module, which detects all the mentions of
persons, organizations, geo-political entities and locations; and the Coreference
Resolution module, which recognizes which mentions refer to the same entity. Here an entity is
understood as an object or group of objects in the world, and a mention as a textual reference to an entity.
|
Licence and download |
- |
Link |
- |
Contact |
bernaola[at]fbk.eu, giuliano[at]fbk.eu, massimo.poesio[at]unitn.it, yversley[at]gmail.com,
uryupina[at]gmail.com, zanoli[at]fbk.eu |
|
Temporal Expression Recognition and Normalization
|
|
-
ITA-Chronos
Name |
ITA-Chronos |
Author(s) |
Matteo Negri |
Description |
ITA-Chronos is designed to recognize all the Timed Entities occurring in a text, identify
their extension, and normalize them according to the TIMEX2 standard. It adopts a rule-based
approach, with different sets of hand-crafted rules specialized to deal with different aspects
of the problem. It is the Italian extension of Chronos, a multilingual system written in Lisp,
originally developed for English.
|
Licence and download |
- |
Link |
http://ontotext.fbk.eu/ita-chronos.html |
Contact |
negri[at]fbk.eu |
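The rule-based recognize-then-normalize approach described for ITA-Chronos can be illustrated with a single hand-written rule. The sketch below is in Python rather than Lisp, covers only explicit dates of the form "12 marzo 2009", and is not taken from the system itself. The rule, the function name and the output dictionary are assumptions; the normalized value follows the ISO-style YYYY-MM-DD form used for calendar dates in TIMEX2.

```python
import re

MONTHS = {
    "gennaio": 1, "febbraio": 2, "marzo": 3, "aprile": 4,
    "maggio": 5, "giugno": 6, "luglio": 7, "agosto": 8,
    "settembre": 9, "ottobre": 10, "novembre": 11, "dicembre": 12,
}

# One hypothetical hand-crafted rule: day, Italian month name, four-digit year
DATE_RULE = re.compile(
    r"\b(\d{1,2})\s+(" + "|".join(MONTHS) + r")\s+(\d{4})\b",
    re.IGNORECASE,
)


def find_dates(text):
    """Return the extent, surface text and normalized value of each date."""
    results = []
    for match in DATE_RULE.finditer(text):
        day = int(match.group(1))
        month = MONTHS[match.group(2).lower()]
        year = int(match.group(3))
        results.append({
            "extent": match.span(),  # character offsets of the expression
            "text": match.group(0),
            "val": f"{year:04d}-{month:02d}-{day:02d}",
        })
    return results


dates = find_dates("La riunione del 12 marzo 2009 è confermata.")
print(dates)
```

A full system would need many such rules, including ones for relative expressions ("ieri", "la settimana scorsa") that can only be normalized against a reference date.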
-
UniPg-TERNsystem
Name |
UniPg-TERNsystem |
Author(s) |
Loris Faina, Stefania Spina |
Description |
This system is an Italian parser that combines two separate levels of parsing: a constituent
parsing, that entails a category annotation of morphosyntactic constituents; and a dependency
parsing, that implies a functional annotation of relations such as subject, complement, etc. It has
been used as a Temporal Expression Recognizer in the Evalita contest.
|
Licence and download |
- |
Link |
- |
Contact |
faina[at]unipg.it, sspina[at]unistrapg.it |
|
Semantic analysis
|
|
-
GETARUNS
Name |
GETARUNS |
Author(s) |
Rodolfo Delmonte
|
Description |
GETARUNS is a system for semantic analysis which includes a "deep parser"
provided with subcategorization coming from different sources. At the end of
the pipeline, it produces a "discourse model" including the discourse entities with their
properties and features. This system is composed of the following modules:
- tokenizer
- sentence splitter
- tagger and disambiguator
- chunker and shallow parser
- deep parser (strictly topdown) for annotated c-structure
- semantic lexical mapping for f-structure
- pronominal binding
- anaphora resolution
- topic hierarchy and centering
- semantic information processing at propositional level
- logical form
- temporal reasoning
- semantic indexing of individuals, sets, locations and events
- discourse model creation and updating
- discourse structure with semantic relations and discourse moves
|
Licence and download |
free download - version Linux Ubuntu 9; for the updated versions send an email to
the contact person |
Link |
http://project.cgm.unive.it/?page_id=194 |
Contact |
delmont[at]unive.it |
-
VENSES
Name |
VENSES |
Author(s) |
Rodolfo Delmonte
|
Description |
VENSES is the scaled version of GETARUNS for creating a semantic analysis
system which can include a "partial parser" equipped with subcategorization information
coming from different sources. The output of VENSES is a logical form without free
variables and where all pronouns are substituted by their antecedents. The system is
composed of the following modules:
- tokenizer
- sentence splitter
- tagger and disambiguator
- chunker and shallow parser
- semantic lexical mapping for f-structure
- pronominal binding
- anaphora resolution
- topic hierarchy and centering
- semantic information processing at propositional level
- logical form
- semantic indexing of individuals, sets, locations and events
|
Licence and download |
demo version available on the web site; for a version of the system send an email to
the contact person |
Link |
http://project.cgm.unive.it/?page_id=196 |
Contact |
delmont[at]unive.it |
|
Question Answering
|
|
-
QALLME
Name |
QALL-ME (Question Answering Learning technologies in a multiLingual and Multimodal
Environment) |
Author(s) |
Bernardo Magnini (FBK - Trento, Italy) is the coordinator of the academic and industrial
partners
|
Description |
QALLME is an EU funded project in the IST area. The general objective is to establish
a shared infrastructure for multilingual and multimodal open domain Question Answering
for mobile phones.
|
Licence and download |
- |
Link |
http://qallme.fbk.eu/ |
Contact |
perenthaler[at]fbk.eu |
|
Summarization
|
|
|
Textual entailment
|
|
-
EDITS
Name |
EDITS (Edit Distance Textual Entailment Suite) |
Author(s) |
Milen Kouylekov, Matteo Negri |
Description |
EDITS is an open source software package aimed at recognizing entailment relations
between two portions of text. The system is based on edit distance algorithms, and computes
the T-H distance as the cost of the edit operations (i.e. insertion, deletion and substitution)
that are necessary to transform T into H.
|
Licence and download |
GNU Lesser General Public License |
Link |
https://docs.google.com/Doc?docid=0AV0eoH72QlJeZGNjajdyNHdfMGM1NXo1YzVn&hl=en
|
Contact |
kouylekov[at]gmail.com, negri[at]fbk.eu |
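The T-H distance described for EDITS can be illustrated with a token-level edit distance computed by dynamic programming. This is a minimal Python sketch, not the EDITS implementation: the unit costs, the whitespace tokenization and the function name are assumptions, and the real package supports configurable algorithms and cost schemes.

```python
def edit_distance(t_tokens, h_tokens, ins_cost=1.0, del_cost=1.0, sub_cost=1.0):
    """Cost of the cheapest sequence of edits that transforms T into H."""
    n, m = len(t_tokens), len(h_tokens)
    # dist[i][j] = cost of turning the first i tokens of T into the first j of H
    dist = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = dist[i - 1][0] + del_cost
    for j in range(1, m + 1):
        dist[0][j] = dist[0][j - 1] + ins_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = t_tokens[i - 1] == h_tokens[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + del_cost,                         # delete from T
                dist[i][j - 1] + ins_cost,                         # insert into T
                dist[i - 1][j - 1] + (0.0 if same else sub_cost),  # keep or substitute
            )
    return dist[n][m]


t = "il gatto dorme sul divano".split()
h = "il gatto dorme".split()
print(edit_distance(t, h))
```

An entailment decision could then be taken by comparing the distance with a threshold learned from annotated T-H pairs; the threshold is not shown because its value depends on the training data.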
-
UniAlicante Textual Entailment system
Name |
UniAlicante Textual Entailment system |
Author(s) |
Oscar Ferrandez, Antonio Toral, Rafael Munoz |
Description |
The system uses a machine learning classifier fed by features derived from lexical
distances, part-of-speech information and semantic knowledge from SIMPLE-CLIPS, an
Italian Language Resource.
|
Licence and download |
- |
Link |
-
|
Contact |
ofe[at]dlsi.ua.es, rafaelg[at]dlsi.ua.es, antonio.toral[at]ilc.cnr.it |
|
Topic Detection and Tracking
|
|
-
OntoTDT
Name |
OntoTDT |
Author(s) |
FBK |
Description |
OntoTDT is an unsupervised Topic Detection system. A topic is defined as a seminal event or activity,
along with all directly related events and activities. A topic is expressed as a chronologically ordered list
of "stories". A story is "on topic" whenever it discusses events and activities that are directly connected
to that topic's seminal event. The goal of a topic detection system is to group together stories that
discuss the same event.
|
Licence and download |
- |
Link |
http://ontotext.fbk.eu/topic.html |
Contact |
bentivo[at]fbk.eu |
|
Speech recognition/understanding (including Speech-To-Text transcription)
|
|
-
PROSO
Name |
PROSO |
Author(s) |
Rodolfo Delmonte
|
Description |
PROSO is a rule-based system for the translation of an Italian text into the corresponding
version labelled for a speech synthesizer. It uses only phonological rules and a table of
verbal roots.
|
Licence and download |
free download |
Link |
http://project.cgm.unive.it/?page_id=204 |
Contact |
delmont[at]unive.it |
|
Speech synthesis (including Text-To-Speech synthesis):
|
|
|
(Spoken) Dialog
|
|
-
UNITN Italian Spoken Dialogue System
Name |
UNITN Italian Spoken Dialogue System |
Author(s) |
Stefan Rigo, Evgeny A. Stepanov, Pierluigi Roberti, Silvia Quarteroni, Giuseppe Riccardi
|
Description |
The main features supporting the UNITN SDS are the mixed initiative control, which allows
the caller to get partly in control of the dialog strategy, and the descriptive specification of dialog
strategies. The application is based on a complex, high-recall grammar and a user goal planning
script. The latter is tightly bound to the grammar and provides functionalities of error checking and
recovery from missing or misinterpreted concepts (Automatic Speech Recognition and Spoken
Language Understanding errors). |
Licence and download |
- |
Link |
- |
Contact |
- |
-
Loquendo Spoken Dialogue System
Name |
Loquendo Spoken Dialogue System |
Author(s) |
Enrico Giraudo, Paolo Baggia
|
Description |
The application was designed in VoiceXML and runs on the Loquendo VoxNauta platform.
The dialogue strategy is "mixed-initiative", with flexible recognition grammars that were
designed to be modular and easy to use in different dialogue application contexts. Changes
of context and complex requests from the caller are allowed. |
Licence and download |
- |
Link |
http://www.loquendo.com/en/technology/voxnauta_platform.htm |
Contact |
enrico.giraudo[at]loquendo.com, paolo.baggia[at]loquendo.com |
|
Speaker Recognition
|
|
-
QUT Speaker Identity Verification System
Name |
Queensland University of Technology (QUT) Speaker Identity Verification System |
Author(s) |
Mitchell McLaren, Robbie Vogt, Brendan Baker, Sridha Sridharan |
Description |
The system includes the following three components developed by QUT for the
evaluation in Evalita 2009: Joint Factor GMM-UBM, GMM Supervector SVM and
GLDS SVM. The QUT system is the score-level fusion of these components. Fusion
was performed on the output scores using linear weights calculated through use
of a logistic regression algorithm. This was performed using the FoCal toolkit.
|
Licence and download |
- |
Link |
- |
Contact |
m.mclaren[at]qut.edu.au, r.vogt[at]qut.edu.au, bj.baker[at]qut.edu.au, s.sridharan[at]qut.edu.au |
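Score-level fusion with linear weights obtained by logistic regression, as described for the QUT system, can be sketched in plain Python. This is a stand-in for illustration only: it does not reproduce the FoCal toolkit, the toy scores and labels are invented, and the function names, learning rate and number of epochs are assumptions.

```python
import math


def fuse(scores, weights, bias):
    """Linear fusion: weighted sum of the component scores plus an offset."""
    return bias + sum(w * s for w, s in zip(weights, scores))


def train_fusion(trials, labels, lr=0.1, epochs=500):
    """Fit fusion weights by logistic regression with batch gradient descent."""
    n_systems = len(trials[0])
    weights, bias = [0.0] * n_systems, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_systems, 0.0
        for scores, label in zip(trials, labels):
            p = 1.0 / (1.0 + math.exp(-fuse(scores, weights, bias)))
            err = p - label  # gradient of the log-loss w.r.t. the fused score
            grad_b += err
            for k, s in enumerate(scores):
                grad_w[k] += err * s
        bias -= lr * grad_b / len(trials)
        weights = [w - lr * g / len(trials) for w, g in zip(weights, grad_w)]
    return weights, bias


# Toy development trials: one score per component system (three systems);
# label 1 = target speaker, 0 = impostor. All values are invented.
trials = [(2.0, 1.5, 1.0), (1.0, 2.0, 0.5), (-1.0, -0.5, -2.0), (-2.0, -1.0, -0.5)]
labels = [1, 1, 0, 0]
weights, bias = train_fusion(trials, labels)
fused = [fuse(scores, weights, bias) for scores in trials]
print(weights, bias, fused)
```

The learned weights are applied unchanged to the evaluation trials, so each component contributes in proportion to how well it separated targets from impostors on the development data.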
-
UWB Speaker Identity
Verification Systems
Name |
University of West Bohemia (UWB) Speaker Identity Verification Systems |
Author(s) |
Lukas Machlica, Jan Vanek |
Description |
The two UWB systems were submitted to the EVALITA 2009 evaluation campaign.
Both systems are based on the UBM-GMM approach.
|
Licence and download |
- |
Link |
http://www.kky.zcu.cz/en |
Contact |
machlica[at]kky.zcu.cz, vanekyjg[at]kky.zcu.cz |
-
AGNITIO's Speaker
Recognition System
Name |
AGNITIO's Speaker Recognition System |
Author(s) |
Niko Brummer, Albert Strasheim
|
Description |
AGNITIO's system is a fusion of a state-of-the-art Joint Factor Analysis system and a
new I-Vector system.
|
Licence and download |
- |
Link |
- |
Contact |
- |
-
RU Speaker
Recognition Systems
Name |
Radboud University Speaker Recognition Systems |
Author(s) |
Marijn Huijbregts, David van Leeuwen
|
Description |
The first is a system based on Universal Background Model and Gaussian Mixture Model
(UBM-GMM) and employs a linear scoring approach with channel. The second system is
based on Joint Factor Analysis (JFA), also employing linear scoring.
|
Licence and download |
- |
Link |
- |
Contact |
m.huijbregts[at]let.ru.nl, d.vanleeuwen[at]let.ru.nl |
-
SMART III Speaker
Recognition Systems
Name |
SMART III Speaker Recognition Systems |
Author(s) |
Maria Tucci
|
Description |
The SMART III System is a formant based method using an implemented
decisional approach with a reference population of 305 male Italian speakers
containing fundamental frequency and first three formant values for the vowels
/a, e, i, o/.
|
Licence and download |
- |
Link |
http://www.linguistica.unical.it/labfon/Home.htm |
Contact |
tucci.maria[at]libero.it |
|
Machine Translation and Speech-To-Speech Translation
|
|
-
STILVEN-MOSES
Name |
STILVEN-MOSES |
Author(s) |
Rodolfo Delmonte
|
Description |
STILVEN-MOSES is a Venetian-English translator based on MOSES, which uses
a recently updated parallel corpus of 300,000 tokens.
|
Licence and download |
free access |
Link |
http://project.cgm.unive.it/cgi-bin/stilven/moses |
Contact |
delmont[at]unive.it |
|
Natural Language Generation
|
|
-
FUF/SURGE
Name |
FUF/SURGE |
Author(s) |
Charles Callaway, Alessandra Novello |
Description |
FUF/SURGE-Italian is a rule-based, wide coverage generator using a systemic grammar.
Given a lexicalized semantic specification, it creates a grammatically correct sentence, adds
morphology and orthographical information, and returns formatted text. |
Licence and download |
Free download, requires a LISP installation |
Link |
http://homepages.inf.ed.ac.uk/ccallawa/resources.html |
Contact |
callaway[at]fbk.eu |
|
Emotion Recognition / Generation
|
|
|
Linguistic annotation
|
|
-
ANANAS
Name |
AN.ANA.S. 4 |
Author(s) |
Miriam Voghera, Francesco Cutugno, Annamaria Landolfi, Carmela Sammarco |
Description |
AN.ANA.S. is a syntactic annotation system based on a schema of grammatical
rules (DTD) that defines the tree structure of the text. It can be used
to label both spoken and written texts. It allows the syntactic labelling of all types
of text and relies on the XGate software, which works as an editor for creating a database of texts in
XML format. |
Licence and download |
free download |
Link |
http://www.parlaritaliano.it/index.php/it/strumenti/717-ananas-4 |
Contact |
people.na.infn.it/~cutugno |
-
XGATE
Name |
Xgate |
Author(s) |
Francesco Cutugno, Miriam Voghera
|
Description |
Xgate is a tool for the annotation and query of linguistic data. It is developed by the
NLP group of the Dipartimento di Scienze Fisiche of the Università Federico II of Napoli
and the Dipartimento degli Studi Linguistici e Letterari of the Università of Salerno. |
Licence and download |
free download |
Link |
http://www.parlaritaliano.it/index.php/en/projects/666-xgate |
Contact |
people.na.infn.it/~cutugno |
|
Language modeling
|
|
-
IRSTLM
Name |
IRSTLM Toolkit |
Author(s) |
Marcello Federico, Nicola Bertoldi |
Description |
The IRST Language Modeling Toolkit features algorithms and data structures suitable
to estimate, store, and access very large LMs. Our software has been integrated into a popular
open source Statistical Machine Translation decoder called Moses, and is compatible with
language models created with other tools, such as the SRILM Toolkit.
|
Licence and download |
Open Source LGPL |
Link |
http://hlt.fbk.eu/en/irstlm |
Contact |
Marcello Federico, Nicola Bertoldi |
|
Lexical substitution
|
|
-
LexSub
Name |
LexSub |
Author(s) |
Diego De Cao, Roberto Basili |
Description |
The LexSub experimental platform proceeds through three steps: 1) the extraction of
the lexical substitution sets for the target words, 2) the acquisition of domain models for
candidates and 3) the ranking of candidate lexical substitutes over individual sentences
according to the acquired domain models. A further step 4) back-off is included to deal
with test cases for which the step 1) produces an empty candidate set.
|
Licence and download |
- |
Link |
-
|
Contact |
decao[at]info.uniroma2.it, basili[at]info.uniroma2.it |
|
Connected Digit Recognition
|
|
-
Cedat85
automatic speech recognition system
Name |
Cedat85 automatic speech recognition system |
Author(s) |
Maria Palmerini |
Description |
The system has been developed within a research project led in 2008 by Cedat 85 in
cooperation with the European Media Laboratory in Heidelberg. It's based on the most
recent IBM VoiceTailor technology; Cedat 85 provided the whole training process (acoustic
data, text data, scripts) for spontaneous Italian language. Some of the features of VoiceTailor
system are the speaker independence, the possibility to manage spontaneous speech, to use
unlimited vocabularies, to use different acoustic and language models, to manage noise and
the possibility to set some parameters in order to choose different strategies with respect to
accuracy or speed. The system works in a Linux environment and can run on multiple processors
so that several jobs can be processed in parallel.
|
Licence and download |
- |
Link |
http://www.cedat85.com
|
Contact |
m.palmerini[at]cedat85.com |
-
Syllable-Based ASR System of Naples University
Name |
Syllable-Based ASR System of Naples University |
Author(s) |
Francesco Cutugno, Bogdan Ludusan, Antonio Origlia, Serena Soldo |
Description |
The recognition system uses the syllable as base unit. In a first stage, the continuous
speech sequence is divided into syllable-like units using an energy-based algorithm. Then,
the obtained syllables are passed to a classifier in order to calculate the syllable/class probability
distribution. In the final stage, a Viterbi-like decoding algorithm based on multistage graphs will
find the most likely sequence corresponding to the audio input.
|
Licence and download |
- |
Link |
-
|
Contact |
cutugno[at]na.infn.it, ludusan[at]na.infn.it, soldo[at]na.infn.it |
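The final decoding stage described above can be illustrated with a Viterbi search over a multistage graph, where each stage holds the class probability distribution of one syllable-like segment. This Python sketch is not the Naples system: the syllable labels, the probabilities and the transition table (a stand-in for whatever sequence model the real system uses) are all invented for illustration.

```python
import math


def viterbi(stages, trans):
    """Return the most likely class sequence through a multistage graph.

    stages: one dict {class: P(class | segment)} per syllable-like segment
    trans:  dict {(previous, current): transition probability} (assumed model)
    """
    # best[c] = (log-probability of the best path ending in class c, the path)
    best = {c: (math.log(p), [c]) for c, p in stages[0].items() if p > 0}
    for probs in stages[1:]:
        step = {}
        for cur, p in probs.items():
            if p <= 0:
                continue
            candidates = [
                (score + math.log(trans[(prev, cur)]) + math.log(p), path + [cur])
                for prev, (score, path) in best.items()
                if trans.get((prev, cur), 0.0) > 0
            ]
            if candidates:
                step[cur] = max(candidates, key=lambda item: item[0])
        best = step
    if not best:
        return []  # no path with non-zero probability
    return max(best.values(), key=lambda item: item[0])[1]


stages = [{"ca": 0.55, "ga": 0.45}, {"sa": 0.5, "za": 0.5}]
trans = {("ca", "sa"): 0.3, ("ca", "za"): 0.3, ("ga", "sa"): 0.1, ("ga", "za"): 0.9}
print(viterbi(stages, trans))
```

In this toy case the classifier slightly prefers "ca" for the first segment, but the decoder chooses "ga" because the whole path "ga za" is more probable, which is the benefit of decoding over the full sequence rather than segment by segment.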
-
TSpeech
Name |
TSpeech |
Author(s) |
Leandro D’Anna, Gianpaolo Coro, Francesco Cutugno |
Description |
TSpeech is a proprietary speech recognizer by Abla srl, based on standard decoding algorithms,
with syllabic acoustic models. The recognition phase is followed by a rescoring session, based on
syllable energy and duration templates, which recovers some recognition errors.
|
Licence and download |
- |
Link |
www.abla.it
|
Contact |
ldanna[at]unisa.it, gianpaolo.coro[at]abla.it, cutugno[at]na.infn.it |
|
Other
|
|
-
TEXTPRO
Name |
TEXTPRO |
Author(s) |
Emanuele Pianta, Christian Girardi, Roberto Zanoli |
Description |
TextPro is a suite of modular Natural Language Processing (NLP) tools for analysis
of Italian and English texts. The suite has been designed so as to integrate and reuse
state of the art NLP components developed by researchers at FBK. The current version
of the tool suite provides functions ranging from tokenization to chunking and Named
Entity Recognition (NER).
|
Licence and download |
free licence obtained by registration |
Link |
http://textpro.fbk.eu/ |
Contact |
manspera[at]fbk.eu |
-
Coreference Resolution Module
Name |
TEXTPRO |
Author(s) |
Octavian Popescu, Bernardo Magnini |
Description |
The coreference system has been developed to decide whether two mentions refer to the same
entity or not. The input of the system consists of a list of Named Entities of type Person, i.e. Person
Names (PNs), that have been automatically recognized in a document collection. Its output consists
of a number of clusters of PNs, where each cluster is interpreted as the set of PNs that refer to the
same entity.
|
Licence and download |
free licence obtained by registration |
Link |
http://textpro.fbk.eu/ |
Contact |
bentivo[at]fbk.eu |
|
|