Combines NLP techniques with Machine Learning algorithms
and semantic resources to explore large textual corpora
Analyze your corpus by aggregating services
librAIry helps you cut down processing time by providing a flexible system based on drag-and-drop deployment of services.
There are many scenarios where we may want to find pairs of textually similar documents in a large corpus (e.g. a researcher conducting a literature review, or an R&D project manager analyzing project proposals). Discovering those connections programmatically can help experts achieve these goals, but brute-force pairwise comparisons become computationally infeasible when the document corpus grows too large.
A novel hashing algorithm based on approximate nearest-neighbor techniques, which uses hierarchical sets of topics as hash codes, is proposed to explore document collections. It not only performs efficient similarity searches, but also allows extending those queries with thematic restrictions, explaining the similarity score in terms of the most relevant topics. Extensive evaluations on both scientific and industrial text datasets validate the proposed algorithm in terms of accuracy and efficiency.
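The core idea can be sketched as follows. This is a minimal illustration, not librAIry's actual implementation: a document's topic distribution is cut into hierarchical relevance tiers, the set of topic IDs at each tier forms the hash code, and the overlap between tiers both ranks candidates and explains which topics make two documents similar. All names and data below are invented for the example.

```python
# Hypothetical sketch of topic-based hashing: hash codes are
# hierarchical sets of topic IDs derived from a topic distribution.

def topic_hash(distribution, levels=3):
    """Split topics (id -> probability) into `levels` relevance tiers.
    Tier 0 holds the most probable topics; each tier is a frozenset."""
    ranked = sorted(distribution, key=distribution.get, reverse=True)
    size = max(1, len(ranked) // levels)
    return [frozenset(ranked[i * size:(i + 1) * size]) for i in range(levels)]

def similarity(hash_a, hash_b):
    """Score two hash codes by topic overlap, weighting top tiers more."""
    score = 0.0
    for level, (a, b) in enumerate(zip(hash_a, hash_b)):
        weight = 1.0 / (level + 1)          # tier 0 counts most
        score += weight * len(a & b) / max(1, len(a | b))
    return score

docs = {
    "doc1": {"sports": 0.6, "health": 0.3, "politics": 0.1},
    "doc2": {"sports": 0.5, "health": 0.4, "economy": 0.1},
    "doc3": {"politics": 0.7, "economy": 0.2, "sports": 0.1},
}
codes = {name: topic_hash(dist) for name, dist in docs.items()}

# doc1 and doc2 share their top tiers ("sports", "health"),
# so they score higher than doc1 and doc3, which share none.
print(similarity(codes["doc1"], codes["doc2"]))
print(similarity(codes["doc1"], codes["doc3"]))
```

Because candidates can be pre-filtered to documents sharing a top-tier set, most pairwise comparisons are skipped, which is what makes the approach scale beyond brute force.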
With the ongoing growth in the number of digital articles and the expanding use of different languages, we need annotation methods that enable browsing multilingual corpora. An unsupervised document similarity algorithm that requires neither parallel nor comparable corpora, nor any other type of translation resource, is provided to perform thematic explorations of collections of texts in multiple languages.
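One way to compare documents across languages without translation, sketched here under the assumption that all documents have been projected into a shared multilingual topic space (the distributions below are invented for the example), is to measure the divergence between their topic distributions, e.g. with the Jensen-Shannon divergence:

```python
import math

def jensen_shannon(p, q):
    """Jensen-Shannon divergence between two topic distributions
    (lists of probabilities over the same shared topic space)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return (kl(p, m) + kl(q, m)) / 2

# Hypothetical topic distributions inferred from a shared
# cross-lingual topic model (no translation involved).
doc_en = [0.70, 0.20, 0.10]   # English article
doc_es = [0.65, 0.25, 0.10]   # Spanish article, similar themes
doc_de = [0.05, 0.15, 0.80]   # German article, different themes

print(jensen_shannon(doc_en, doc_es))  # small: thematically close
print(jensen_shannon(doc_en, doc_de))  # large: thematically distant
```

Since the comparison happens entirely in topic space, the languages of the underlying texts never need to match.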
An efficient and easy way to analyze large amounts of multilingual text through standard HTTP and TCP APIs.
Built on top of several open-source NLP tools, it offers:
- Part-of-Speech Tagger (and filter)
- Lemmatization (lemmas)
- Entity Recognizer
- Wikipedia Relations
- WordNet Synsets
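As an illustration of how such annotations can be combined, a part-of-speech filter can keep only content words and return their lemmas. The annotation structure below is invented for the example; librAIry's actual responses are served over its HTTP and TCP APIs.

```python
# Hypothetical annotation output: token, part-of-speech tag, lemma.
# The structure is illustrative, not librAIry's actual response format.
annotations = [
    {"token": "Researchers", "pos": "NOUN", "lemma": "researcher"},
    {"token": "are",         "pos": "AUX",  "lemma": "be"},
    {"token": "exploring",   "pos": "VERB", "lemma": "explore"},
    {"token": "large",       "pos": "ADJ",  "lemma": "large"},
    {"token": "corpora",     "pos": "NOUN", "lemma": "corpus"},
]

def lemmas_by_pos(annotations, keep=frozenset({"NOUN", "VERB"})):
    """Filter annotations by PoS tag and return the lemmas."""
    return [a["lemma"] for a in annotations if a["pos"] in keep]

print(lemmas_by_pos(annotations))  # ['researcher', 'explore', 'corpus']
```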
Analyze document collections to discover the main hidden themes in their texts and create learning models that can be explored through RESTful HTTP interfaces.
These models can be used for large-scale multilingual document classification and information retrieval tasks.
Categorize texts with labels learned from them or from a different corpus.
Our annotators are designed to generate annotations for each item in large collections of textual documents, in a way that is computationally affordable and enables semantics-aware exploration of the knowledge they contain.
Relates texts by their semantic similarity through cross-lingual labels and hierarchies of multilingual concepts.
Documents from multilingual corpora can be efficiently browsed and related without the need for translation. They are described by hash codes that preserve the notion of topics and group similar documents together.
A European project aimed at providing inspiration for scientific creativity by exploiting the rich presence of web-based research resources.
(more info)
Analyze the R+D+i information space (mainly patents, papers and public grants) to support the implementation of evidence- and knowledge-based policies.
(more info)
A proposal to incorporate NLP techniques into the indexing process of open datasets to improve their characterization.
(more info)
Cross-lingual similarity between public procurement contracts and news articles.
(more info)
NLP, Knowledge Representation, Topic Models and Semantic Similarity.
NLP, Knowledge Representation, Semantic Web and Multimedia Annotation.
Ontology-based data integration and semantic technologies in Open Science.