Readers of this blog may have noticed that Natural Language Processing (NLP) was missing from our ‘5 Data Analytical Trends To Watch For in 2018’ post. Our in-house team of predictive data analysts says it lost out to the other trends by a narrow margin. But that in no way takes away from the importance of NLP and its growing influence in the world of big data analytics. The loser by a whisker surely deserves an honorable mention, hence this 2-part post. After all, none other than Gartner has predicted that by 2019, NLP will be a standard feature in 90% of modern business intelligence and analytics platforms.
So what is NLP, or ‘computational linguistics’ as it is also known? It is a combination of machine learning (ML), Artificial Intelligence (AI), and linguistics that allows us to speak to machines in human language. This transformation in the human-data interface has, slowly but surely, started making inroads into Business Intelligence (BI).
Think of it this way: for decades, humans have been learning computer languages. With NLP, computers are now learning how humans speak, or rather, what they say and why they use certain words at certain points in a sentence. The first steps towards letting us “talk” to machines in human language were taken years ago, and their commercial applications can be seen today in simple forms such as Apple Siri or Amazon Echo. The idea is not only to ask computers questions in human language but also to receive replies the same way.
NLP is used not only in analytics but also in semantic search. Search engines like Google have been using NLP for years. The days of trying to work out how to construct the perfect Boolean string to get results are long gone.
Progress on the NLP front can also be measured by the number of patent applications filed by over a dozen companies with the US Patent and Trademark Office. Some of these pertain to querying a data warehouse. So far, the data in a warehouse has been queried using programming languages like Structured Query Language (SQL), and the results have come back in an equally structured, tabular form.
But some companies, such as DataRPM Corporation, have filed for a patent on a real-time data discovery and BI platform using natural language queries, which would allow a user to search within a data warehouse or other data sources by posing questions in human language [1].
Such search engine technology indexes data from data sources in a “de-normalized schema-free way”, and uses the indexed data to enable ad-hoc data analysis and rapid deployment. In addition, this type of search engine technology has the flexibility to handle dynamic changes in data, because individual records are stored as columnar key-value pairs. Furthermore, it can provide scalability and fast access to the data, even for high volumes, with minimal maintenance.
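To make the idea of storing records as columnar key-value pairs a little more concrete, here is a minimal Python sketch. It is only an illustration under our own assumptions, not the design described in the patent application: the `ColumnarIndex` class, its method names, and the sample records are all hypothetical.

```python
from collections import defaultdict


class ColumnarIndex:
    """Hypothetical schema-free index: one key-value map per column."""

    def __init__(self):
        # column name -> {record id -> value}
        self.columns = defaultdict(dict)
        self.next_id = 0

    def add(self, record):
        """Store a record field by field; no schema is declared up front."""
        record_id = self.next_id
        self.next_id += 1
        for key, value in record.items():
            self.columns[key][record_id] = value
        return record_id

    def values(self, column):
        """Return every stored value for one column (empty if unknown)."""
        return list(self.columns.get(column, {}).values())


# Illustrative data only. The second record adds a field ("channel") that
# the first one lacks, and nothing has to be migrated to accept it.
index = ColumnarIndex()
index.add({"region": "East", "sales": 120})
index.add({"region": "West", "sales": 80, "channel": "online"})

total_sales = sum(index.values("sales"))
```

Because each field lives in its own map, a new field simply becomes a new column, which is one way to picture how such a store could absorb changes in the shape of the data.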
The data analytics system receives a natural language question as either voice or text input. It then extracts keywords and maps them to a set of previously stored query terms and a set of operational commands. Finally, it translates the natural language question into a formatted query string based on the index file, the computational operation, and the previously stored query terms, and returns the desired answer.
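The steps described above (extract keywords, map them to stored query terms and operational commands, build a formatted query string) can be sketched in a few lines of Python. This is a toy illustration under our own assumptions rather than the patented method: the `QUERY_TERMS` and `OPERATIONS` tables, the `translate` function, and the `op=...&field=...` query format are all made up for the example.

```python
import re

# Hypothetical lookup tables standing in for the "previously stored query
# terms" and the "operational commands".
QUERY_TERMS = {"sales": "sales", "revenue": "sales",
               "region": "region", "regions": "region"}
OPERATIONS = {"total": "SUM", "sum": "SUM",
              "average": "AVG", "mean": "AVG", "count": "COUNT"}


def translate(question):
    """Turn a plain-English question into a made-up formatted query string."""
    # 1. Extract keywords: lowercase the text and split it into words.
    tokens = re.findall(r"[a-z]+", question.lower())

    # 2. Map keywords to an operation and to known fields, in order of
    #    appearance and without duplicates.
    operation = next((OPERATIONS[t] for t in tokens if t in OPERATIONS), None)
    fields = []
    for token in tokens:
        field = QUERY_TERMS.get(token)
        if field and field not in fields:
            fields.append(field)

    if operation is None or not fields:
        raise ValueError("could not map the question to a known query")

    # 3. Build the query string: the first field is measured, and any
    #    remaining fields are used for grouping.
    measure, *group_by = fields
    query = f"op={operation}&field={measure}"
    if group_by:
        query += "&group_by=" + ",".join(group_by)
    return query


example = translate("What is the total sales by region?")
```

A real system would need far more than keyword lookup (synonyms, dates, filters, ambiguity), but the sketch shows the basic shape of the pipeline: words in, structured query out.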
…to be continued.