Our text mining solution is highly modular and based on the best open source NLP tools. We wrapped the open source components in an enterprise-ready Datlowe Text Processing server, which enables high parallelism in the processing pipeline and offers a wide range of interfacing options.
We use the latest research results, including the best available language corpora and models. We dig deep into the language: besides morphological analysis and lemmatization, we also use sentence parsing. This allows us to understand information presented in a specific language in more detail.
We use several approaches to extract information from texts ranging from simple regular expressions through rule- and grammar-based approaches to machine learning. We extract everything from simple named entities (“which town is mentioned in the text?”) to complex relations (“which competing product from which competitor does the client own?”).
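As a rough illustration of the simplest end of that range, the sketch below combines a regular expression with a small dictionary lookup. It is not the Datlowe implementation: the town list, the `IBAN_RE` pattern, and the `extract_entities` function are hypothetical names chosen for this example.

```python
import re

# Hypothetical gazetteer of town names; a real system would use a far larger list.
TOWNS = {"Prague", "Brno", "Ostrava"}

# Assumed pattern for a Czech IBAN written without spaces: "CZ" followed by 22 digits.
IBAN_RE = re.compile(r"\bCZ\d{22}\b")


def extract_entities(text):
    """Return (label, value) pairs found by the regex and the dictionary lookup."""
    entities = []
    # Regular-expression extraction: account numbers in a fixed format.
    for match in IBAN_RE.finditer(text):
        entities.append(("IBAN", match.group()))
    # Dictionary lookup: which known town is mentioned in the text?
    for token in re.findall(r"\w+", text):
        if token in TOWNS:
            entities.append(("TOWN", token))
    return entities


sample = "Client from Brno sent money to CZ6508000000192000145399."
print(extract_entities(sample))
```

Complex relations, such as which competing product a client owns, generally need the rule-based, grammar-based, or machine learning approaches mentioned above, because they depend on sentence structure rather than on surface patterns.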
The entry-level generic components are tailored to each use case and can be modified by users later on.
Information extraction is usually not a goal in itself. It is the use of the extracted information that creates value for an organization. We use this information for client behavior modelling (propensity to churn or buy), for alerting and reporting, for historical and streaming analysis (e.g., detecting a sudden outbreak of unusual complaints), and for process automation (data extraction, document classification and routing).
We believe that domain knowledge is essential for high-quality information extraction, so a description of the domain needs to be created. This can be done with domain ontologies, which are dictionaries of entities and the relationships among them. Ontologies can be created from scratch, or existing ones can be re-used.
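To make the idea concrete, here is a minimal sketch of an ontology as a dictionary of entities plus a set of relations between them. The `Ontology` class and the example entity and relation names are assumptions made for illustration, not the structure of any actual Datlowe ontology.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    # Hypothetical minimal structure: entity name -> entity type.
    entities: dict = field(default_factory=dict)
    # Relations stored as (subject, predicate, object) triples.
    relations: set = field(default_factory=set)

    def add_entity(self, name, entity_type):
        self.entities[name] = entity_type

    def add_relation(self, subject, predicate, obj):
        # Only allow relations between entities the ontology already knows.
        if subject not in self.entities or obj not in self.entities:
            raise KeyError("both ends of a relation must be known entities")
        self.relations.add((subject, predicate, obj))

    def related(self, subject, predicate):
        """Return all objects linked to `subject` by `predicate`, sorted by name."""
        return sorted(o for s, p, o in self.relations
                      if s == subject and p == predicate)


# Illustrative entries only.
onto = Ontology()
onto.add_entity("payment", "Transaction")
onto.add_entity("amount", "Property")
onto.add_entity("cancelling", "Activity")
onto.add_relation("payment", "has_property", "amount")
onto.add_relation("payment", "subject_of", "cancelling")
print(onto.related("payment", "has_property"))
```

Storing relations as triples keeps the sketch small; established ontology formats and tools offer much richer modelling than this.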
We have extensive experience in banking, so we created our own retail banking ontology. It describes transactions, their properties, and their relations to other entities. Activities (e.g., cancelling, sending) form an important part of the ontology. The ontology is well suited to analyzing clients' complaints, emails, and similar texts, and to finding their root causes.
Our client ontology evolved from the need to analyze clients' lifetime events and their behavior in general. It describes clients, their relations to products, their needs, and so on. It can be used with any text that contains information about clients and their contacts with your company.
We also work with ontologies in healthcare and pharmacology. Here we mostly use existing ontologies, connecting them and sometimes extending them. We have used these ontologies in our Drug Encyclopedia and in the processing of medical records.