Is understanding perfect?

No, for several reasons: incorrect grammatical parsing, wrongly resolved anaphoras, and idiomatic sentences. Moreover, the current version does not understand direct speech. We are constantly working to improve the framework.

The database is a compromise between speed, memory, and accuracy. Personalized solutions, tailored to specific problems, would be more efficient.

How it is done

The text is tagged using the tag frequencies provided by the Open American National Corpus (OANC), an open-source project available under this license. Sentences are then parsed using parsing frequencies extracted from the OANC. A "distance" between words is obtained using the WordNet corpus (3.1), freely available under the WordNet license.
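As a rough illustration of the first step, a frequency-based tagger assigns each word the part-of-speech tag it receives most often in a tagged corpus. The sketch below is a toy version of that idea with a hypothetical miniature corpus, not NLUlite's actual code or the real OANC data:

```python
from collections import Counter, defaultdict

# Hypothetical miniature tagged corpus of (word, tag) pairs;
# in NLUlite the frequencies come from the OANC instead.
tagged_corpus = [
    ("the", "DT"), ("dog", "NN"), ("barks", "VBZ"),
    ("the", "DT"), ("bark", "NN"), ("of", "IN"), ("the", "DT"), ("tree", "NN"),
    ("dogs", "NNS"), ("bark", "VBP"),
]

# Count how often each word carries each tag.
tag_counts = defaultdict(Counter)
for word, tag in tagged_corpus:
    tag_counts[word][tag] += 1

def tag(sentence):
    """Assign each word its most frequent tag (unknown words default to NN)."""
    return [(w, tag_counts[w].most_common(1)[0][0] if w in tag_counts else "NN")
            for w in sentence]

print(tag(["the", "dog", "barks"]))
# [('the', 'DT'), ('dog', 'NN'), ('barks', 'VBZ')]
```

Note that "bark" is ambiguous in the toy corpus (noun or verb); a unigram tagger like this one simply picks the more frequent reading, which is why the later parsing and FrameNet stages are needed to resolve ambiguity in context.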

The parsing is then improved by choosing the readings that make the most sense according to the FrameNet dataset, distributed under a Creative Commons license.
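A minimal sketch of this kind of reranking, under the assumption that each candidate parse assigns semantic roles to fillers: score a parse by how many of its role assignments are licensed by a frame inventory, and keep the highest-scoring one. The frame inventory and role names below are hand-made for illustration, not taken from FrameNet or from NLUlite:

```python
# Hypothetical frame inventory: predicate -> allowed filler types per role.
frames = {
    "eat": {"Agent": {"person", "animal"}, "Food": {"food"}},
    "see": {"Perceiver": {"person", "animal"},
            "Phenomenon": {"person", "animal", "food", "object"}},
}

def score(parse):
    """Count the role assignments that the frame inventory licenses."""
    predicate, roles = parse
    allowed = frames.get(predicate, {})
    return sum(1 for role, filler_type in roles
               if filler_type in allowed.get(role, set()))

# Two readings of an ambiguous sentence; the semantically coherent one wins.
candidates = [
    ("eat", [("Agent", "food"), ("Food", "person")]),   # implausible reading
    ("eat", [("Agent", "person"), ("Food", "food")]),   # plausible reading
]
best = max(candidates, key=score)
print(best)  # ('eat', [('Agent', 'person'), ('Food', 'food')])
```

The design point is that syntax alone cannot always decide between readings; a semantic resource such as FrameNet supplies the world knowledge that one reading is far more plausible than the other.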

Please note that NLUlite is developed independently and is not endorsed by any of the previously mentioned projects.

What do the different versions implement?

  • 0.1.0 (Sep 2014): Reads texts and websites, single-level inference only
  • 0.1.2 (Oct 2014): 10-20% faster, longer texts, Mac OS X version
  • 0.1.4 (Nov 2014): 80% less memory usage
  • 0.1.6 (Dec 2014): Improved query system for large datasets
  • 0.1.8 (Jan 2015): Multi-server queries
  • 0.1.10 (Feb 2015): Improved multithreading
  • 0.1.12 (Apr 2015): Wikidata support