Hello Fact
An AI-powered Document Search Engine

The Challenge

We were asked to develop a system that could answer queries from documents. Sounds pretty simple, right?

Only, these queries were legal questions, and the documents to be searched numbered in the millions!

Our client wanted to eliminate the horrors of manual searching for anyone looking through a plethora of legal documents. This included personnel from different industries such as law and order, gaming companies, and casinos. Keeping that in mind, we came up with a solution in the form of a fast, efficient, and easy-to-use document search engine.

Solution

We created a web-based search engine that shows results ranked by relevance in context. Not just that, the search engine shows the paragraphs containing the answer to the query and highlights the answer in the text. Click on an answer and the full document opens. Just like any other search engine, but with more options.

1M+ documents digested and optimized

Average search time reduced from 20 min to 0.2 ms

Active scraping for documents

The system will never show outdated results

HelloFact in Action

A bird’s eye view of how the whole process works

  • Index Query – User enters a query and the AI engine works to understand it

  • Search Database – It then analyzes the query and matches it with records in the database

  • Rank Results – The system ranks results by relevance in context, with the most relevant at the top

  • Display Results – You get clickable results in a custom-made UI with highlighted answers

  • User Interacts – Users can open documents, add notes, save them, and even change views

How HelloFact Works
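The steps above can be sketched in a few lines of Python. This is a minimal illustration, not HelloFact's actual engine: it swaps the AI models for plain TF-IDF term scoring, and the sample paragraphs, the `**` highlight markers, and all function names are assumptions made up for the example.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical sample paragraphs; the real corpus holds over a million documents.
PARAGRAPHS = {
    "doc1-p1": "A casino licence must be renewed every year by the operator.",
    "doc2-p1": "The operator of a gaming company must report large payouts.",
    "doc3-p1": "Parking rules apply to all visitors of the court building.",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_index(paragraphs):
    """Map each term to the paragraphs containing it, with term counts."""
    index = defaultdict(dict)
    for pid, text in paragraphs.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][pid] = count
    return index

def search(query, index, total):
    """Score paragraphs by summed TF-IDF of the query terms, best first."""
    scores = Counter()
    for term in set(tokenize(query)):
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(1 + total / len(postings))  # rarer terms weigh more
        for pid, tf in postings.items():
            scores[pid] += tf * idf
    return [pid for pid, _ in scores.most_common()]

def highlight(text, query):
    """Wrap every query term found in the text in ** markers."""
    terms = set(tokenize(query))
    return re.sub(
        r"[A-Za-z]+",
        lambda m: f"**{m.group(0)}**" if m.group(0).lower() in terms else m.group(0),
        text,
    )

INDEX = build_index(PARAGRAPHS)
query = "casino operator licence"
results = search(query, INDEX, len(PARAGRAPHS))
for pid in results:
    print(pid, "->", highlight(PARAGRAPHS[pid], query))
```

Here the first paragraph ranks above the second because it contains all three query terms, two of which are rare in this tiny corpus. The third paragraph shares no terms with the query and is not returned at all.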

Challenges

We faced several challenges during the development of this project. 

Data Gathering

We initially had to download and maintain essential file information for more than a million documents. Curated from various websites, they had to be stored in a proper and organized manner.

Data Annotation

Data annotation was a challenge because we had no information about document categories. We had to manually explore all documents and assign valid categories to each one.

Data Parsing

We also had to extract all the text and styling information. That's a pain when the documents you find come in multiple formats, including PDFs!

We solved all of these problems with a single solution

Auto-tagging!
We devised a mechanism that allowed us to automate continuous data scraping, annotation, and, to some extent, parsing. This makes HelloFact a search engine that always shows updated and accurate results.
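To give a feel for what automated annotation can look like, here is a minimal keyword-based tagger in Python. It is only a sketch under stated assumptions: the actual auto-tagging mechanism is not described here, and the category names, keyword lists, and the `auto_tag` function are all hypothetical.

```python
import re

# Hypothetical categories and trigger keywords, for illustration only.
CATEGORY_KEYWORDS = {
    "gaming": {"casino", "wager", "gaming", "lottery"},
    "criminal": {"offence", "arrest", "sentence", "prosecution"},
    "licensing": {"licence", "permit", "renewal"},
}

def auto_tag(text, min_hits=1):
    """Return the sorted categories whose keywords appear in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    tags = [
        category
        for category, keywords in CATEGORY_KEYWORDS.items()
        if len(words & keywords) >= min_hits
    ]
    # Documents matching no category are flagged for manual review.
    return sorted(tags) or ["uncategorized"]

print(auto_tag("The casino must apply for a licence renewal."))
print(auto_tag("Minutes of the annual general meeting."))
```

Raising `min_hits` makes the tagger stricter, which trades coverage for fewer wrong tags. Anything left as `uncategorized` would still need a human look.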

Storage and Indexing

Because we are all about fast and easy-to-use solutions, document storage was a concern: it would directly affect search speed. We built state-of-the-art NLP models that allowed us to create a distributed, multitenant-capable full-text search engine with no noticeable delay.

Context-based searching

Searching within the documents had to be context-based rather than based on text matching. Running this kind of search across hundreds of thousands of documents and then showing the results in seconds was a challenging endeavor.
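The difference between the two kinds of searching can be shown with a toy Python example. It is a sketch, not the production approach: a small hand-written word-to-concept map (an assumption, standing in for a learned language model) lets a query match a passage that shares no words with it. The sample texts and all names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical word-to-concept map standing in for a learned language model.
CONCEPTS = {
    "lawyer": "legal_counsel", "attorney": "legal_counsel",
    "fees": "payment", "charges": "payment", "cost": "payment",
    "casino": "gambling", "betting": "gambling",
}

def words(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def concept_vector(text):
    """Count concepts instead of raw words; unknown words map to themselves."""
    return Counter(CONCEPTS.get(w, w) for w in words(text))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

query = "lawyer fees"
doc_a = "Attorney charges are capped by statute."
doc_b = "Casino betting limits are set by statute."

# Plain text matching finds nothing in common between the query and doc_a.
print(set(words(query)) & set(words(doc_a)))

# Concept-level comparison still ranks doc_a above doc_b.
q = concept_vector(query)
print(cosine(q, concept_vector(doc_a)), cosine(q, concept_vector(doc_b)))
```

Text matching would miss the first passage entirely, while the concept-level comparison scores it well above the unrelated one. A real system would replace the hand-written map with model-derived representations.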

Let’s work together

Get in Touch
