Jul 6, 2023, Innovations

NLU vs NLP vs ASR – what are they and how do they differ?

Maria Baryluk, Content Specialist
For the last couple of months, artificial intelligence has been all over the media. But did you know that AI-powered tools have been present in our lives for over a decade, whether as voice search tools, virtual assistants or translation programs? None of that would be possible without deep learning models that work on understanding human language. The core techniques behind them are natural language processing (NLP), natural language understanding (NLU) and automatic speech recognition (ASR). In today's article we will dive into the computer science behind them, backed by use cases and examples.

NLP: Natural Language Processing

First, we’re going to focus on natural language processing, which combines machine learning and computational linguistics, the field that links human language and computer science. The result is the ability of computers to process, understand and communicate in human language. We can distinguish four types of natural language processing applications that, with the help of deep learning models, perform a broad range of tasks:

Text to Speech (TTS)

This technique converts text data, such as emails, messages and medical records, into spoken words. A TTS algorithm analyzes the text and outputs full spoken sentences. You will find this technology on language platforms such as Duolingo or Google Translate, and when dealing with voice bots or virtual assistants.

Speech to Text (STT)

Speech to Text, by contrast, uses deep learning methods to convert voice data into text. You can experience this NLP technology when talking to Siri or using a dictation tool on your smartphone. Hands-free use is the major convenience here, not only for regular users but especially for people with disabilities.

Speech to Speech (STS)

If you own a smart speaker such as an Alexa device or Google Home, you’ve already experienced Speech to Speech technology. These devices use NLP tools to first process the user’s speech (regardless of accent), understand the command and its context, then search a database for the right information, and finally answer out loud. The process sounds complicated, but in reality it takes only seconds to complete.
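
The chain of steps described above can be sketched as a small pipeline. This is only an illustration under loud assumptions: every function name, the keyword-based intent detection and the canned answers are hypothetical stand-ins invented for this example, and a real smart speaker would run trained speech and language models at each stage.

```python
# A toy Speech to Speech pipeline. All stages are hypothetical stand-ins,
# not the API of any real assistant.

def speech_to_text(audio):
    # A real system would run a speech recognition model here.
    # In this sketch the "audio" is a dict that already carries its transcript.
    return audio["transcript"]

def understand(text):
    # Very rough intent detection by keyword matching.
    lowered = text.lower()
    if "weather" in lowered:
        return "get_weather"
    if "time" in lowered:
        return "get_time"
    return "unknown"

# Canned answers standing in for the database lookup.
ANSWERS = {
    "get_weather": "It is sunny today.",
    "get_time": "It is noon.",
    "unknown": "Sorry, I did not understand that.",
}

def look_up(intent):
    return ANSWERS[intent]

def text_to_speech(text):
    # A real system would synthesize audio; here we only wrap the text.
    return {"spoken": text}

def speech_to_speech(audio):
    text = speech_to_text(audio)      # 1. process the speech
    intent = understand(text)         # 2. understand the command
    answer = look_up(intent)          # 3. search for the information
    return text_to_speech(answer)     # 4. answer out loud

reply = speech_to_speech({"transcript": "What's the weather like?"})
print(reply["spoken"])
```

The point of the sketch is the order of the stages rather than their contents: each one could be swapped for a real model without changing the overall flow.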

Text to Text (TTT)

The last natural language processing type we’re going to look at is Text to Text. As the name implies, it converts one piece of text into another. It could be a translation from one language to another, or an artificial intelligence tool that summarizes or paraphrases larger portions of text, such as PDF files or whole websites. All of that is done with high accuracy in semantic analysis and grammar, while preserving the general sense of the original message.
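
To make the summarizing case more concrete, here is a minimal sketch of a frequency-based extractive summarizer. It is not how modern deep learning tools work; it is a much simpler classic technique, swapped in here for illustration. The function name `summarize`, the `max_sentences` parameter and the rule of ignoring words of three letters or fewer are assumptions made for this example.

```python
import re
from collections import Counter

def summarize(text, max_sentences=1):
    # Split the text into sentences at ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())

    # Count how often each longer word appears in the whole text.
    # Skipping words of 3 letters or fewer is a crude stopword filter.
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(w for w in words if len(w) > 3)

    def score(sentence):
        # A sentence scores higher when it contains frequent words.
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens if len(t) > 3)

    # Keep the best-scoring sentences, then restore their original order.
    chosen = set(sorted(sentences, key=score, reverse=True)[:max_sentences])
    return " ".join(s for s in sentences if s in chosen)

text = (
    "Speech recognition converts speech into text. "
    "Cats sleep. "
    "Speech recognition powers voice assistants that process speech."
)
print(summarize(text))
```

Because the score is a plain sum, this approach favors longer sentences, which is one reason production tools rely on learned models instead.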

NLU: Natural Language Understanding

Natural language understanding is considered a subfield of NLP that aims to understand human language in its full spectrum. Once NLP techniques have transformed the text data into a standardized structure, it’s time for NLU to give it context and a natural, human voice. Here, machine learning aims to interpret the natural language, derive meaning from it, identify the context and finally draw insights. Algorithms conduct sentiment analysis, which identifies the emotions and tone of the message. The next step is word sense disambiguation, the process of identifying which meaning of a word is used in a given sentence. We human beings determine it subconsciously in most cases. Machines, on the other hand, need to learn these skills. To give an example, look at these sentences:

  • I can hear the bass sound.

  • He likes to eat grilled bass.

In the first sentence, bass refers to a low-frequency sound. In the second sentence, grilled bass means… fish! Without word sense disambiguation, machines would have trouble recognizing the difference and thus wouldn’t be able to respond appropriately.
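
One simple way to picture word sense disambiguation is to compare the words around "bass" with a short description of each possible meaning and pick the meaning with the most words in common. The sketch below follows that idea (a simplified version of the classic Lesk approach). The sense descriptions, the stopword list and the function names are hypothetical, written only for this example; real systems use large dictionaries or trained models.

```python
# Toy sense inventory: hypothetical descriptions written for this example only.
SENSES = {
    "bass": {
        "sound": "low frequency sound tone music hear guitar",
        "fish": "fish eat grilled river catch cook",
    }
}

# Tiny, hand-picked stopword list (an assumption for this sketch).
STOPWORDS = {"i", "he", "the", "to", "can", "a", "an"}

def tokenize(text):
    # Lowercase the words and strip basic punctuation.
    return [w.strip(".,!?").lower() for w in text.split()]

def disambiguate(word, sentence):
    # Context = the other meaningful words in the sentence.
    context = set(tokenize(sentence)) - STOPWORDS - {word}
    best_sense, best_overlap = None, -1
    for sense, description in SENSES[word].items():
        # Count how many context words also appear in the sense description.
        overlap = len(context & set(description.split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(disambiguate("bass", "I can hear the bass sound."))      # sound
print(disambiguate("bass", "He likes to eat grilled bass."))   # fish
```

In the first sentence the context words "hear" and "sound" match the sound description, while in the second "eat" and "grilled" match the fish description.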

ASR: Automatic Speech Recognition

The last technique that we’re going to look at is Automatic Speech Recognition, which helps machines recognize and understand human speech. Sounds familiar? The earlier-mentioned Speech to Text, a form of NLP, is often confused with Automatic Speech Recognition. The two are similar, with some technical differences. In order to work, ASR has to convert spoken language into written text. The program uses machine learning, word segmentation and several algorithms to properly interpret our language. It handles the fundamental part of the conversion: it first analyzes the vocal information, which is then, with the help of NLP, formed into a sentence. So it is safe to say that both work in sync: performing sentiment analysis and looking for the right context to understand the real meaning of the words spoken by the person. These efforts ensure accurate and effective communication between machines and us humans.

The future is already here

When we talk about artificial intelligence and all its features and techniques, we often think of the future. We predict how many people will lose their jobs or how AI will affect the media. What is overlooked, though, is that AI applications have been with us for some years already. Applications such as voice assistants, speech recognition and translation have seamlessly become part of our everyday routines. All of that is powered by natural language processing, natural language understanding and automatic speech recognition. Knowing the differences between these techniques is crucial to understanding the dynamics of today’s tech world.