Speech Genie
If you have used systems like Grok or ChatGPT at all, you have probably been both amazed and, at the same time, extremely frustrated. Why? Your amazement probably comes from the speed at which they pull information together into a usable form. For instance, a prompt asking for an overview of a topic, based on research on the internet, can produce a credible AI-generated report in about 45 seconds. Doing the same research yourself would take many hours.
At the same time, the research delivered is often wrong, and the systems even “hallucinate” answers to the questions you pose. These problems all arise because none of the systems based on Large Language Models (LLMs) actually understands meaning from language the way humans do. They simply do statistics on word frequency and collocation!
So it’s obvious that, at this point in history, we cannot rely on these linguistic AI systems, especially in areas where mistakes are costly or even fatal.
What to do? It’s simple, really.
Many years ago, John Ball concluded that for computers to work with and understand language, we should model how the human brain understands language.
After all, the human brain, which weighs less than three and a half pounds and runs on only 12 watts, can understand everything communicated in a language, AS LONG AS IT HAS ALREADY CONNECTED WORDS TO MEANING THROUGH PATTERN RECOGNITION AT SOME POINT IN TIME.
John’s Patom theory of how the brain understands language by means of pattern recognition and interconnected “meaning sets” can be directly applied to develop digital systems that fully understand meaning from language!
In other words, it makes it possible to create digital language understanding systems that you can count on.
This is the basis for John’s Pat system, which allows digital machines to use human language by converting speech or text into meaning. Computer algorithms can work with data when that data embodies meaning.
It’s important to understand that language recognition is by no means trivial.
There are infinitely many sentences in a language that can contain one or more ‘errors.’ For example, what does the following sentence mean?
It means “Can you pick up the cup?”
In spoken language, this type of speech error is common. For learners of a new language, there are many other types of error that a linguistically based digital system needs to manage. The Pat system is built on fundamental linguistic science, which allows it to solve problems such as this. There are many such problems that only Pat can solve.
These solutions allow us to create a digital language partner (or “language parent,” as we like to say) that can help a learner master the core of a new language in as little as six months.
The diagram below shows the interrelated steps in understanding language. You cannot resolve an element in one of the circles without checking whether it makes sense in the next region, and also in the previous one.
Converting language to its meaning lets a developer decide what to do, even in the face of multiple types of error.
In the above example, “Pick up the cup, no glass” gives the meaning [do’(you, Ø)] CAUSE [raise’(the glass)]
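The minimal Python sketch below makes the idea of “converting a sentence into meaning” more concrete. It is not how Pat works. Pat is built on Patom theory and linguistic science, whereas this toy uses a single hand-written keyword rule. The names `Meaning`, `VERBS` and `interpret_command` are hypothetical and were invented for illustration. The sketch shows only two things. First, a meaning can be held as structured data rather than as a string of words. Second, a self-correction such as “no glass” can be resolved into that structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Meaning:
    """Hypothetical record for a causative meaning:
    [do'(actor, Ø)] CAUSE [predicate'(undergoer)]."""
    actor: str      # who acts; "you" for a command
    predicate: str  # resulting change of state, e.g. "raise"
    undergoer: str  # what is affected, e.g. "the glass"

    def __str__(self) -> str:
        return f"[do'({self.actor}, Ø)] CAUSE [{self.predicate}'({self.undergoer})]"


# Toy lexicon (an assumption): maps a verb phrase to the predicate it expresses.
VERBS = {"pick up": "raise"}


def interpret_command(utterance: str) -> Meaning:
    """Interpret a simple command such as 'Pick up the cup, no glass'.

    Toy keyword rule, NOT the Pat system: a trailing ', no X' is treated as a
    self-correction that replaces the final noun of the original object.
    """
    text = utterance.strip().rstrip(".!?").lower()
    main, _, correction = text.partition(", no ")
    for phrase, predicate in VERBS.items():
        if main.startswith(phrase + " "):
            obj = main[len(phrase) + 1:]            # e.g. "the cup"
            if correction:
                det, _, _ = obj.rpartition(" ")      # words before the noun, e.g. "the"
                obj = f"{det} {correction}".strip()  # e.g. "the glass"
            return Meaning(actor="you", predicate=predicate, undergoer=obj)
    raise ValueError(f"no rule matches: {utterance!r}")
```

Calling `print(interpret_command("Pick up the cup, no glass"))` prints `[do'(you, Ø)] CAUSE [raise'(the glass)]`. The code uses a plain apostrophe for the prime.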
Given a command to raise the glass, an algorithm built on this meaning can show a glass being picked up, or select one from a series of images. This allows us to build a fully interactive system in which the learner uses language to command the software and is delighted when the system responds by acting on full understanding!
Speech Genie will be developed on this basis.
Initially, a learner will speak to Speech Genie with commands to manipulate simple on-screen objects. Speech Genie will use the meaning of each command to select, from a set of choices, the one that best represents the request. This immediate feedback will help learners gain confidence that their commands mean what they intend in the new language as their speech improves.
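The selection step might look something like the sketch below. It assumes each on-screen object carries a name and the set of actions it can be shown performing. `Meaning`, `OnScreenObject` and `select_choice` are hypothetical names. Matching on the final noun of the undergoer is a deliberate simplification; a real system would compare full meanings, not strings.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Meaning:
    """Hypothetical meaning record, e.g. ("you", "raise", "the glass")."""
    actor: str
    predicate: str
    undergoer: str


@dataclass(frozen=True)
class OnScreenObject:
    """Hypothetical on-screen item the learner can manipulate."""
    name: str                 # e.g. "glass"
    actions: FrozenSet[str]   # predicates it can depict, e.g. {"raise"}


def select_choice(meaning: Meaning, choices: List[OnScreenObject]) -> Optional[OnScreenObject]:
    """Return the first choice matching both the object and the action, else None."""
    noun = meaning.undergoer.split()[-1]  # "the glass" -> "glass" (simplification)
    for choice in choices:
        if choice.name == noun and meaning.predicate in choice.actions:
            return choice
    return None
```

When nothing matches, the function returns `None`. The app could use that as a cue to ask the learner to try again.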
The system will then be evolved from this foundation, as shown in the following diagram. Because the language understanding component is shared, the system learns the language once and can operate with either spoken or typed input.
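Below is a minimal sketch of that shared design. As a simplification, it assumes that speech is first transcribed to text and then passed to the same understanding step used for typed input; Pat may handle speech differently. `transcribe` is a hypothetical placeholder for whatever speech recognizer is used. `understand` is a toy stand-in for the language understanding component, not Pat itself.

```python
from typing import Callable, Tuple

# Hypothetical meaning representation: (actor, predicate, undergoer)
Meaning = Tuple[str, str, str]


def understand(text: str) -> Meaning:
    """Shared language understanding step (toy stand-in, not the Pat system)."""
    words = text.lower().rstrip(".!?").split()
    if words[:2] == ["pick", "up"]:
        return ("you", "raise", " ".join(words[2:]))
    raise ValueError(f"not understood: {text!r}")


class SpeechGenie:
    """Two input front ends that feed one common understanding component."""

    def __init__(self, transcribe: Callable[[bytes], str]):
        # transcribe: hypothetical speech-to-text function supplied by the caller
        self.transcribe = transcribe

    def from_text(self, typed: str) -> Meaning:
        # Typed input goes straight to the shared component.
        return understand(typed)

    def from_speech(self, audio: bytes) -> Meaning:
        # Spoken input is transcribed first, then uses the same component.
        return understand(self.transcribe(audio))
```

Both `from_text` and `from_speech` end in the same `understand` call. As a result, any improvement to that one component benefits both input modes.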