What is missing from current NLU systems?

Unsupervised representation learning (both autoregressive and autoencoding) has been highly successful in NLP. So successful, in fact, that researchers are creating new datasets and stricter guidelines to make room for future research. ReClor is a recent dataset that uses questions from standardized exams like the GMAT to test the logical reasoning of language models. The ANLI dataset was created in the same spirit, to make natural language inference (NLI) harder for language models. SuperGLUE was created when language models surpassed human performance on the GLUE benchmark, which was supposed to be the marker of general language understanding. I understand that this is how research works, but we need to ask ourselves: what is missing from our current systems?

Such a question is necessary because the present models are simply exploiting loopholes. Some of them are:
a) Models exploit statistical patterns of the language: Many recent works (like 1, 2, 3, 4) have shown that instead of learning meaning the way we humans do, models just exploit patterns within the language.
b) Models exploit biases in the data: Most of the data involves human annotators, and models simply pick up on the biases of the data creation process [1, 2] (see the sketch after this list).
c) Current benchmarks avoid language generation tasks for better evaluability: The benchmarks prefer multiple-choice questions, which are easy to score, over free-form generation judged with traditional metrics like BLEU and ROUGE.
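To make the annotation-bias point concrete, the classic diagnostic is a hypothesis-only baseline for NLI: a classifier that never sees the premise should perform at chance, so any accuracy above chance means the hypotheses themselves leak the label. Below is a minimal, illustrative sketch using scikit-learn; the examples and cue words are made up for illustration (not drawn from any real dataset), echoing the kinds of artifacts reported in the papers cited above.

```python
# Minimal sketch of a hypothesis-only NLI baseline (toy data, illustrative only).
# If a classifier that never sees the premise still beats chance, the dataset
# leaks label information through annotation artifacts in the hypotheses.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical (hypothesis, label) pairs -- premises deliberately ignored.
hypotheses = [
    "A man is sleeping.",             # negation/inactivity cues often mark contradiction
    "Nobody is outside.",
    "A person is outdoors.",          # generic rewordings often mark entailment
    "Someone is doing something.",
    "The man is waiting for a bus.",  # added specifics often mark neutral
    "The woman is on her way to work.",
]
labels = ["contradiction", "contradiction",
          "entailment", "entailment",
          "neutral", "neutral"]

clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(hypotheses, labels)
print(clf.predict(["Nobody is sleeping."]))  # likely 'contradiction' from cue words alone
```

A model that scores well this way is not doing inference at all; it is reading the annotators' habits.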

We have stopped thinking about language as a source of knowledge and have directed our entire research effort towards building models that will make good conversational agents. In Chapter 4 of the book Steps to Success, the authors discuss the components of language comprehension that are necessary to understand language. They are:
a) Background Knowledge: Background knowledge is the information you already have about a subject; anything new only adds to that understanding. For example, if you are reading about the solar system, you build upon your existing knowledge of the solar system, and when you find a new fact about it, you append it to what you already know.
b) Vocabulary: What do the words actually mean? When we read the word ‘break’, it can carry multiple senses: items that can be broken, ways that things cause other things to break, and what it means for something to be broken. Vocabulary is never fixed; it builds upon itself. We can easily understand the meaning of ‘deactivate’ if we know the meaning of ‘activate’ and the morpheme ‘de-’. Likewise, if we encounter a new sense of ‘break’, we update the existing meaning of ‘break’ in our minds.
c) Language Structure: The same language can take different structures. The structure of poems is very different from casual English, and yet we adapt to reading poems a certain way. Similarly, how and which words we use differ between formal and informal settings. So a system has to recognize which language structure to use in which situation.

The examples generated by GPT-2 show a grasp of ‘language structure’: the model can write long passages that are syntactically and semantically coherent. DeepMind released a new book-level language modeling dataset that should help models generate even longer texts while keeping the context intact. The ability to write a long passage of fluent English depends heavily on the ‘context size’ the model was trained with: the bigger the context size, the better the language generation. Even with transfer learning, the models are just learning correlations between input representations and output labels, and they have a hard time explaining their decisions. So the question remains: are they really understanding the language?
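As a concrete illustration of how generation is tied to context size, here is a minimal sketch, assuming the Hugging Face transformers library, of sampling a continuation from the publicly released GPT-2 checkpoint. The model can only condition on the tokens that fit inside its fixed context window (1,024 tokens for GPT-2); anything earlier is invisible to it while it generates. The prompt and sampling parameters here are arbitrary choices for illustration.

```python
# Minimal sketch, assuming the Hugging Face `transformers` library:
# sample a continuation from GPT-2, which conditions only on the tokens
# inside its fixed context window.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "In a shocking finding, scientists discovered a herd of unicorns"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

output = model.generate(
    input_ids,
    max_length=100,    # total length, prompt included
    do_sample=True,    # sample instead of greedy decoding
    top_k=50,
    temperature=0.9,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

The output reads like fluent English, but nothing in this loop consults or updates any store of knowledge about the world; it is next-token prediction all the way down.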

The Turing test in itself isn’t easy. The only way to concretely verify language skills is through conversation, and if a human fails to identify the algorithm, that is a good sign of language understanding. The only way to hold such a conversation is for the algorithm/model to have access to knowledge of the world and to what it has learned about the world so far. Such a knowledge component is critical to language understanding, and it is missing from current systems. It is not sufficient on its own, but it is a necessary prerequisite to human-level language understanding and language use. This ‘knowledge component’ needs to have certain properties, and it doesn’t matter whether it arises as an emergent phenomenon or is implemented as an external memory. Some of those properties are:
a) It is incremental: The problem with existing knowledge bases is that they are rigid and error-prone when a model tries to add more knowledge.
b) Reasonable forgetting and multitask usability: Catastrophic forgetting is a big problem for present deep learning models. At the same time, we can’t remember everything about the input; if we did, we would just be building a lookup table. To enable multitask learning, we need to partially decouple the learning and prediction components from the knowledge component.
c) Reasoning enabled: I think this is a highly overlooked component. Reasoning enables explainability. It enables correct associations between different concepts within the ‘knowledge component’, resulting in faster knowledge acquisition. For example, to understand the sentence “The karate champion broke the wooden block.”, we don’t have to be told that the champion used his hands to break the block; we automatically use our reasoning and existing knowledge of a ‘karate champion’ to infer it. So the knowledge component has to be built in a way that easily facilitates reasoning.

How will we build it? I don’t know yet. LSTMs can’t store this kind of knowledge; they are limited by their capacity. We still have to see how well transformers are suited to the task. There have been many experiments that include external knowledge [1, 2, 3], but in most cases the models use the external knowledge as a read-only reference for better performance and can’t change or append it. There have also been attempts to augment deep learning architectures with external memory [1, 2], but we are still far from building a knowledge component with the aforementioned properties.
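To show what decoupling the knowledge from the predictor could look like mechanically, here is a minimal, illustrative PyTorch sketch of the read operation at the heart of memory-augmented architectures: a query attends over a table of key-value pairs and retrieves a weighted sum of values. It is not any specific published system, just the core soft-lookup idea; all names and sizes below are made up for illustration.

```python
# Minimal sketch (PyTorch, illustrative only) of a differentiable key-value
# memory read: the controller's query attends over memory keys and returns a
# weighted sum of memory values, so the knowledge lives outside the model weights.
import torch
import torch.nn.functional as F

n_slots, d_key, d_val = 128, 64, 64
memory_keys = torch.randn(n_slots, d_key)    # e.g. encoded facts / entities
memory_values = torch.randn(n_slots, d_val)  # knowledge associated with each key

def read_memory(query: torch.Tensor) -> torch.Tensor:
    """Soft lookup: attention weights over slots, then a weighted sum of values."""
    scores = memory_keys @ query / d_key ** 0.5  # (n_slots,)
    weights = F.softmax(scores, dim=0)           # differentiable addressing
    return weights @ memory_values               # (d_val,) retrieved knowledge

# Adding a new fact is just appending a (key, value) pair -- the incremental
# property described above -- without retraining the reader.
query = torch.randn(d_key)
retrieved = read_memory(query)
print(retrieved.shape)  # torch.Size([64])
```

The hard parts, of course, are everything this sketch leaves out: deciding what to write, when to overwrite, and how to reason over the retrieved entries rather than merely condition on them.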

Lastly, language is the medium with which we communicate our ideas and also our problems. To have a positive impact on humanity, any intelligent system will have to learn and understand our languages. Building systems with a ‘background knowledge component’ in them is an important step towards that goal.

