In 1984, I was 10, and I was watching this anime called Pole Position.
The two heroes were driving cars with on-board computers. The two computers were named Turbo and Zoom, and the crazy thing about them was they could talk with the driver (which makes a lot of sense because when you’re driving a car and facing huge dangers in every episode, it is difficult to use a keyboard).
I was not fascinated by the cars themselves, but these two computers were the most exciting thing I could imagine.
At the time, I was also a big fan of Ulysse 31, where the hero, Ulysse, is lost in space with his spaceship called Odysseus (yes, it refers to Greek mythology from beginning to end).
The Odysseus ship also had a super powerful computer, Shirka, able to talk with humans.
This idea that we could talk to a computer, that it could understand what we were saying and answer back to us, was blowing my mind.
That was the best idea ever.
So I decided to do it.
I was a lucky kid, because I had an Amstrad, and I knew how to code in Basic. So I just started implementing the thing.
The first step was to make it credible: nobody would talk to something that does not look like it could understand, right? After a few hours, I had a bunch of lines of Basic able to display a face: mouth, eyes and eyebrows, just like Turbo or Zoom in Pole Position.
I was young and naive, but still, I was reasonable enough to decide to leave speech recognition and speech synthesis aside in my first attempt.
So my second step was about language processing. As mentioned, I was naive, so I tried a naive approach: let’s just cover all the most likely possibilities. The first thing people are likely to say is obviously “Hello”, so if the user says “Hello”, my program will answer “Hello”.
Now what? Hmmm…
Well, listing all the possible sentences seems difficult (and boring). If I restrict myself to simple sentences like subject + verb + complement, maybe it would be easier. So let's create a list of possible subjects, a list of possible verbs, and a list of possible complements.
And now, let’s find which of them are a good match, and which are very unlikely. I mean, yes, it could work, no?
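Decades later, that naive plan is easy to sketch. Here is a minimal JavaScript version of the idea (the word lists are invented examples, not the actual content of my old Basic program):

```javascript
// Naive "subject + verb + complement" matcher, in the spirit of the
// Basic program described above. The word lists are made-up examples.
const subjects = ["i", "you", "the computer"];
const verbs = ["like", "see", "understand"];
const complements = ["you", "the stars", "this game"];

// Try to match a sentence against every subject/verb/complement combination.
function parseSentence(sentence) {
  // Lowercase and strip trailing punctuation before comparing.
  const s = sentence.toLowerCase().replace(/[.!?]/g, "").trim();
  for (const subject of subjects) {
    for (const verb of verbs) {
      for (const complement of complements) {
        if (s === `${subject} ${verb} ${complement}`) {
          return { subject, verb, complement };
        }
      }
    }
  }
  return null; // not a sentence the program knows
}
```

It "works", but it also shows exactly why the approach does not scale: the number of combinations explodes, and anything outside the lists is simply not understood.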
Unfortunately, I got busy with other things, like watching anime, and also drawing the Mandelbrot set on my computer.
Note: something I learned at the time is that drawing the Mandelbrot set on a Z80-based computer with 64KB of RAM (my Amstrad was the CPC464, and 64 here stands for 64KB of RAM, yes, KB, not MB; later I had the CPC6128, with 128KB of RAM, such a big jump!) takes ages. Literally. Like several days. And it is not multitasking, so while you're doing that, you cannot play Ikari Warriors or Sorcery Plus.
Anyway, for all these reasons, my plan to implement my Pole Position-like speaking software was put on hold.
But I think somehow, it stayed in my mind as a secret fantasy.
That's why what we have achieved this week at Nuclia brought back a lot of memories. Nuclia was already a very powerful AI search engine, but this week, we added the ability to generate answers from a collection of resources. Any kind of resource can be ingested: I could push all the Ulysse 31 episodes as video files, they would be processed entirely, including speech-to-text, and then I could ask questions about Ulysse's adventures.
And not just one question: I can have a discussion with Nuclia, exactly like with ChatGPT. But unlike ChatGPT, Nuclia will only use the information I have provided, so it is a controlled set of knowledge, which makes it much more trustworthy.
As we were putting the different elements together, when I saw this happening in my browser, it was a great emotion, bigger than I would have imagined, actually.
I thought about myself as a kid, 40 years ago, with my extravagant expectation (I love this quote from Jules Verne: "Nothing great was ever made which was not an extravagant expectation").
Obviously, this time, I am not alone, and I am so grateful to the talented people of the Nuclia team. Thanks to them, my childhood dream has finally come true.
Let me tell you what my part is in this team: I work on the frontend. Thinking about it now, I think I’ve been pretty consistent with myself over time: as a kid, I started with the user interface, and I’m still there. My first move is always to give a face to the system I build, whatever it is. To me, it’s a way to make it real, to make it tangible for users. I would never diminish the importance of backend, but that’s not my thing. I love frontend because that’s where the magic starts happening for users.
And talking about magic: I remembered my initial idea, and yes, I can chat with my computer now, but by text. If I were driving a Pole Position car, that would not be good enough.
Just for fun, I decided to play a bit with voice recognition and voice synthesis.
Guess what, that's actually pretty easy to achieve nowadays: most browsers support it, thanks to the Web Speech API.
So I quickly assembled a prototype, and here I am: I can speak to my computer, it understands my questions, it answers back, and it keeps track of the ongoing context. That's a real discussion!
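For the curious, the prototype boils down to very little code. Here is a minimal sketch of the voice loop, assuming a browser that exposes the Web Speech API; the `askBackend` function is a hypothetical stand-in for whatever returns the generated answer (it is not a real Nuclia API name):

```javascript
// Minimal voice-loop sketch using the Web Speech API (browser-only).
// Chrome exposes speech recognition under the webkit prefix.
const SpeechRecognitionImpl =
  typeof window !== "undefined"
    ? window.SpeechRecognition || window.webkitSpeechRecognition
    : undefined;

// Speak a text answer out loud with speech synthesis.
function speak(text) {
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.lang = "en-US";
  window.speechSynthesis.speak(utterance);
}

// Listen once and pass the recognized transcript to a callback.
function listen(onTranscript) {
  const recognition = new SpeechRecognitionImpl();
  recognition.lang = "en-US";
  recognition.onresult = (event) => {
    onTranscript(event.results[0][0].transcript);
  };
  recognition.start();
}

// Hypothetical glue: send the question to a chat backend, then answer aloud.
function converse(askBackend) {
  listen(async (question) => {
    const answer = await askBackend(question);
    speak(answer);
  });
}
```

The backend keeps the conversation context, so calling `converse` again continues the same discussion.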
Achievement unlocked, 40 years later.
I am not sure it is such a useful feature (typically, it is totally broken in noisy environments), but I love the magic of it.
It reminds me of an episode of "Halt and Catch Fire", where Joe MacMillan sees a demonstration of the Apple Macintosh. When the computer boots, it says "Hello, I am Macintosh". Joe is totally shocked; the only thing he can say at this moment is: "It speaks…". And it is really breathtaking, indeed.
When something speaks to you, that's a strange feeling. I think it goes beyond reason; it relates to something deeper. We are used to observing intelligent behaviors in animals, but language belongs to humanity. I guess we instinctively find it very odd that a non-human could speak too.
As a teenager, I read Asimov. I recently read him again, and I was surprised, for example, that Isaac Asimov considered voice synthesis a more advanced feature than the ability to manipulate objects or to analyse what is happening in the immediate environment. In the Robot cycle, there is a robot able to monitor and take care of a child: it can understand if a situation is risky (and understanding language is part of that, of course), and it can intervene to protect the child. But it cannot speak. The second version of this robot could speak, implying voice synthesis was very challenging for U.S. Robots and Mechanical Men, Inc.
And as a matter of fact, in 1984, speech synthesis was actually possible. I mean, I could not do it on my Amstrad, but Apple engineers could do it on the Macintosh. On the contrary, neither I nor they could make that computer understand language and participate in a discussion.
But today, I can.
And it makes the child in me very happy 🙂