Interview with Albert Georgel, talk4’s CEO and CTO, developer of COBOT al4
Albert, you are a specialist in neural networks, with a PhD on the unsupervised classification of scientific publications. What are the technical particularities of the problem that Cobot al4 addresses?
Sequencing natural-language answers to open questions within a few minutes, during a live conversation, raises two main problems:
- First, the training we can prepare in advance for a Machine Learning program is limited to knowledge of the language (the general language and the specialized vocabulary of a given field, for example), which is already a complex subject.
However, this training is not sufficient to reach the expected level of performance when sequencing verbatims produced in a specific context. We have to add another training on top of this preliminary one: how the supervisor reads the verbatims in the particular context of the talk. This training can only take place during data processing, which is called ‘dynamic’ Machine Learning.
An additional difficulty is that the sequence of collected verbatims is not a random process: the first answers reveal common thinking, while the following ones are more personal and specific.
- Then, during a live talk, giving access to the results within a few minutes adds an extra speed requirement.
These two combined constraints pose a particular challenge in optimizing the verbatim representation model, as well as a performance requirement on the processing chain that computes the mathematical distances between these data, and thus the confidence index that determines whether or not verbatims are sequenced into the same semantic group.
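To make this concrete, here is a minimal sketch of what such a streaming assignment could look like: each incoming verbatim vector is compared to the centroids of existing groups, cosine similarity plays the role of the confidence index, and a threshold decides whether to join a group or open a new one. The function names and threshold value are illustrative assumptions, not talk4's actual implementation.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def assign_to_group(vector, centroids, threshold=0.8):
    """Assign an incoming verbatim vector to the closest existing
    semantic group when the similarity (the confidence index) exceeds
    the threshold; otherwise open a new group. Returns the group index.
    Threshold is a hypothetical value for illustration."""
    best_idx, best_sim = None, -1.0
    for i, c in enumerate(centroids):
        sim = cosine_similarity(vector, c)
        if sim > best_sim:
            best_idx, best_sim = i, sim
    if best_idx is not None and best_sim >= threshold:
        return best_idx
    centroids.append(list(vector))
    return len(centroids) - 1
```

Because assignment happens as the data arrives, such a scheme can adapt to the non-random order of answers described above, which is the spirit of the 'dynamic' Machine Learning mentioned.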
Semantic analysis of natural language seems to have enjoyed renewed interest recently, along with an acceleration in the quality of the results obtained. Why?
Semantic analysis of natural language is indeed an old domain of algorithmic research. It is a subject that today benefits from the acceleration of research and development in artificial intelligence: with the scientific articles published in recent years, new advances in the vector modeling of words have opened new prospects. The dynamism of open innovation and the structuring provided by repositories such as GitHub also make it possible to share each other's progress faster and to build on tools that have already been developed.
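The idea behind the vector modeling of words mentioned here is that each word is mapped to a vector such that semantically related words end up close in the vector space. A toy illustration, with made-up three-dimensional vectors rather than a trained embedding model:

```python
import math

# Toy 3-dimensional word vectors (illustrative values, not a trained model).
vectors = {
    "king":  [0.90, 0.80, 0.10],
    "queen": [0.85, 0.82, 0.15],
    "apple": [0.10, 0.20, 0.90],
}

def cosine(a, b):
    """Cosine similarity: 1.0 for identical directions, 0.0 for orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Related words ("king", "queen") are closer than unrelated ones ("king", "apple").
print(cosine(vectors["king"], vectors["queen"]) > cosine(vectors["king"], vectors["apple"]))
```

In practice the vectors are learned from large text corpora and have hundreds of dimensions, but the geometric intuition is the same.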
The research and development work of talk4 makes extensive use of publications and developments available through open innovation. So what is difficult, and how can talk4 make the difference?
First of all, we must be able to identify, in the current abundance of available publications, those that can contribute to our own problem.
Then, going from a proven theory or a tested prototype to a real production chain, stable in its results and its performance in a real environment, is a huge challenge. This requires optimizing each stage of the algorithmic chain, and their assembly, from both a mathematical and a computer-science point of view. This is, today, the main purpose of an R&D team such as talk4's.
Semantic analysis of natural language is a technical field that requires expertise in linguistics, mathematics and software development. How do you build competent teams in this field of artificial intelligence?
Indeed, a research and development team in NLP needs language skills, which are critical for the data preprocessing phases, where we focus on creating a simplified representation of words and sentences.
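A minimal sketch of such a preprocessing step, assuming a typical pipeline of lowercasing, tokenization and stop-word removal (the stop-word list here is an illustrative subset, not talk4's actual resources):

```python
import re

# Illustrative subset of English stop words; a real pipeline would use a
# full, language-specific list.
STOP_WORDS = {"the", "a", "an", "is", "of", "to"}

def preprocess(text):
    """Simplified representation of a sentence: lowercase the text,
    tokenize on word characters, and drop stop words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("The training of the model is complex."))
```

The output keeps only the content-bearing words, which is the simplified representation the later vector phases operate on.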
Next, mathematical skills are needed for the vector and statistical computing phases.
Finally, we must be able to translate all of this into high-performance code, which is all the more critical for talk4 because the COBOT must deliver a ranking within minutes.
The only solution is to build a multidisciplinary team that is curious to learn outside its strict field of expertise, and to enforce a great deal of rigor in the process of developing and evaluating the results obtained.
Today everyone is talking about artificial intelligence, and many claim to do it. What is your point of view on this situation?
As we said earlier, there is today a great deal of research and many already-developed algorithms available on the internet, along with the tools that go with them. Some may believe that it is enough to use them to produce an application with artificial intelligence.
But to move to code that is stable under intensive operating conditions, and ‘scalable’ in terms of volumes and performance, we must master the logic of these prototypes in order to optimize the application. Very few have the knowledge and experience to do it.
There are also all those who claim to do artificial intelligence when they are merely using a rules-based algorithm.
Non-specialists have a hard time telling all of this apart.
As a result, generalizations and moves to production are very often disappointing.
The risk is that this generates a lot of mistrust, and that serious teams like talk4's have difficulty making themselves heard.