The ADA System ECM suite offers solutions for automatic content recognition. These solutions automatically identify content received from various sources and reduce the need to index a document or file manually, which shortens processing time and improves efficiency.
The Text Searcher module enables searches within the content of plain-text files, MS-Word, MS-Excel and text-based PDF documents, as well as OCR output files. Searches are therefore not confined to the indexing data; they can also use the document’s content. A search can look for documents containing specific words or phrases, and the search syntax is similar to the one employed by Google for web searches. Textual searches can also be combined with parameters taken from the system’s database. For instance, a search can return all documents related to a certain customer that contain the word “Order”.
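The combined search described above can be illustrated with a minimal Python sketch. It is not the module's implementation: the `Document` fields, the `search` function and the exact-match rule on the customer field are assumptions made for illustration. The query accepts plain words and double-quoted phrases, and a document matches only if it belongs to the given customer and contains every query part.

```python
import re
import shlex
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: int
    customer: str  # indexing data from the database (hypothetical field)
    text: str      # textual content extracted from the file or OCR output


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def contains_phrase(tokens, phrase_tokens):
    """True if phrase_tokens occurs as a contiguous run inside tokens."""
    n = len(phrase_tokens)
    return any(tokens[i:i + n] == phrase_tokens
               for i in range(len(tokens) - n + 1))


def search(docs, customer, query):
    """Return ids of the customer's documents that contain every query part.

    Double-quoted parts of the query are treated as exact phrases.
    """
    parts = [tokenize(part) for part in shlex.split(query)]
    hits = []
    for doc in docs:
        if doc.customer != customer:  # filter on indexing data first
            continue
        tokens = tokenize(doc.text)
        if all(contains_phrase(tokens, part) for part in parts):
            hits.append(doc.doc_id)
    return hits
```

Calling `search(docs, "ACME", "order")` would return the ids of ACME's documents whose content contains the word "order", regardless of letter case. A production engine would use an inverted index rather than scanning every document.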
The module employs Apache Lucene™ Core, a high-performance, full-featured text search engine library written entirely in Java. It provides scalable, high-performance indexing:
- over 150GB/hour on modern hardware
- small RAM requirements — only 1MB heap
- incremental indexing as fast as batch indexing
- index size roughly 20-30% the size of text indexed
Powerful, Accurate and Efficient Search Algorithms:
- ranked searching — best results returned first
- many powerful query types: phrase queries, wildcard queries, proximity queries, range queries and more
- fielded searching (e.g. title, author, contents)
- sorting by any field
- multiple-index searching with merged results
- allows simultaneous update and searching
- flexible faceting, highlighting, joins and result grouping
- fast, memory-efficient and typo-tolerant suggesters
- pluggable ranking models, including the Vector Space Model and Okapi BM25
- configurable storage engine (codecs)
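To make "ranked searching" with Okapi BM25 more concrete, here is a small Python sketch of the textbook BM25 formula. It is not Lucene's code and does not call Lucene. The function name `bm25_scores` and the defaults `k1=1.2` and `b=0.75` are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each document (a list of tokens) against the query terms.

    Uses the textbook BM25 formula with a smoothed, non-negative IDF.
    """
    n_docs = len(docs)
    if n_docs == 0:
        return []
    avg_len = sum(len(d) for d in docs) / n_docs

    # Number of documents in which each term appears at least once.
    doc_freq = Counter()
    for d in docs:
        doc_freq.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue  # term absent: contributes nothing
            idf = math.log(
                1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5)
            )
            # Length normalisation: long documents are penalised via b.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Sorting documents by descending score yields the "best results first" ordering. Documents that mention a query term more often, relative to their length, rank higher.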
BAR CODE RECOGNITION
The Bar Code Recognition module enables the system to recognize the bar code printed on a scanned document and thereby automatically receive the relevant indexing data for a document or a group of documents. The module recognizes bar codes in any position on the document, which also makes it possible to recognize several bar codes on the same page. In addition, a bar code can act as a separator between multi-page documents scanned in one batch.
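The separator role of the bar code can be sketched in Python as follows. This is an illustration under stated assumptions, not the module's logic: it assumes the recognition step has already produced the bar code values found on each page, that separator codes share a hypothetical `DOC-` prefix, and that the separator is printed on the first page of each document.

```python
def split_batch(pages, separator_prefix="DOC-"):
    """Group scanned pages into documents using separator bar codes.

    pages: list of (page_number, [bar code values found on that page]).
    A page carrying a separator code starts a new document, and the code
    value becomes that document's index key. Other bar codes are ignored.
    """
    documents = []
    current = None
    for page_number, barcodes in pages:
        separators = [c for c in barcodes if c.startswith(separator_prefix)]
        if separators:
            # Separator found: open a new document keyed by the bar code.
            current = {"index_key": separators[0], "pages": []}
            documents.append(current)
        elif current is None:
            # Leading pages without a separator form an unkeyed document.
            current = {"index_key": None, "pages": []}
            documents.append(current)
        current["pages"].append(page_number)
    return documents
```

For a four-page batch in which pages 1 and 3 carry separator codes, the function returns two documents of two pages each. Each document's key is the value read from its first page.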
The OCR module recognizes text in scanned documents. After recognition, the Text Searcher module can be used to search for documents by their content. In addition, the module can detect predefined areas in scanned forms and extract data from them.
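Extraction from a predefined area of a form can be illustrated with a short Python sketch. It assumes the OCR step returns words with bounding boxes in page coordinates. The `Word` type, the `extract_zone` function and the rectangle format are hypothetical and are not part of the product's interface.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    left: float
    top: float
    width: float
    height: float


def extract_zone(words, zone):
    """Return the text of all words whose centre lies inside the zone.

    zone: (left, top, right, bottom) in the same coordinates as the words.
    Words are ordered top to bottom, then left to right. This is a
    simplification that expects words on one line to share the same top.
    """
    z_left, z_top, z_right, z_bottom = zone
    inside = []
    for w in words:
        cx = w.left + w.width / 2
        cy = w.top + w.height / 2
        if z_left <= cx <= z_right and z_top <= cy <= z_bottom:
            inside.append(w)
    inside.sort(key=lambda w: (w.top, w.left))
    return " ".join(w.text for w in inside)
```

A form template would then map field names such as an invoice number to zones, and call `extract_zone` once per field on the OCR output of each scanned page.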