Breaker, breaker one-nine. This is the Hack Man, come on. We got a big ol’ convoy of homemade large language model-powered voice assistants out there, but I’ll tell ya, they ain’t nothin’ special no more. They’re just keepin’ the rubber side down on the same ol’ road, no surprises, no excitement. What we need is some new rigs rollin’ out with fresh designs, somethin’ that’ll make you wanna throw your hat in the air and shout “That’s a 10-4, good buddy!” on the CB. Time to light up them new ideas and put the hammer down on innovation, ’cause this old chatter on the airwaves is gettin’ stale.
I’m saying important things because I have a radio (📷: Guy Dupont)
With a glut of AI-powered voice assistants built over the past couple of years, these projects are becoming less interesting. APIs have been released for several popular large language models, and a number of optimizations now allow smaller models to run on-device, so the formula for making them is pretty well standardized at this point. Another voice assistant? Ho-hum.
Guy Dupont has brought us another voice assistant, but this time with a creative twist that makes it unique. Technologically, it is not much different from the rest of the pack: it uses APIs to convert speech to text, query a large language model, and then pass the result to a text-to-speech utility. But Dupont’s voice assistant takes on the persona of a security guard named Doug, and you speak to Doug via a radio handset that would make the Bandit jealous.
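For readers who want to picture that three-stage flow, here is a minimal sketch in Python. It is not Dupont's code: the persona prompt, the function name `handle_transmission`, and the three pluggable callables are all hypothetical stand-ins for whichever speech-to-text, language model, and text-to-speech services one happens to use.

```python
from typing import Callable

# Hypothetical persona prompt; the wording used in the actual project is not known.
DOUG_PROMPT = (
    "You are Doug, a security guard answering over a two-way radio. "
    "Keep replies short and use radio lingo."
)

def handle_transmission(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    complete: Callable[[str, str], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Run one radio transmission through the three stages in order."""
    text = transcribe(audio)             # speech -> text
    reply = complete(DOUG_PROMPT, text)  # persona prompt + user text -> reply text
    return synthesize(reply)             # reply text -> speech audio
```

Keeping the three stages as plain callables means the persona lives entirely in the prompt string, so changing Doug's attitude would not require touching the audio code.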
The push-to-talk button transmits recognizable audio signals (📷: Guy Dupont)
The build is pretty straightforward: a cheap Motorola-style handset is wired to a QCC3034E Bluetooth audio module, which makes it wireless. A nearby smartphone or computer receives the audio and watches for distinctive patterns in the signal to recognize when the push-to-talk button has been pressed. From there, it is a standard matter of working with APIs. To reduce latency in conversations, Dupont made use of OpenAI’s new Realtime API.
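The article does not say what those distinctive patterns look like or how they are detected. As one plausible illustration only, suppose the button press produced a short tone at a known frequency. The Goertzel algorithm is a common, cheap way to check a frame of audio for a single frequency. Everything in this Python sketch (the function names, the threshold, and the idea of a fixed marker tone) is an assumption, not a description of the actual build.

```python
import math

def goertzel_power(samples, sample_rate, target_hz):
    """Squared magnitude of the DFT bin nearest target_hz (Goertzel algorithm)."""
    n = len(samples)
    k = round(n * target_hz / sample_rate)  # nearest DFT bin index
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def tone_present(samples, sample_rate, target_hz, threshold=0.25):
    """True if a large share of the frame's energy sits at target_hz.

    The ratio is normalized so a pure sine exactly on the bin scores
    about 1.0. The threshold is a hypothetical value to tune by ear.
    """
    n = len(samples)
    total_energy = sum(x * x for x in samples)
    if total_energy == 0.0:
        return False  # silence: nothing to detect
    ratio = goertzel_power(samples, sample_rate, target_hz) / (total_energy * n / 2.0)
    return ratio >= threshold
```

In use, the receiving computer would call `tone_present` on each short frame of incoming Bluetooth audio (say, 50 ms at a time) and treat a hit as the start or end of a transmission.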
Just how useful Doug really is remains debatable, but this does look like a project that would be a lot of fun to play with. And when the novelty and humor wear thin, it would be a small task to adjust the model prompt, drop the smart-aleck attitude, and repurpose Doug for something a bit more (or less) serious. Whatever one chooses to use the setup for, who can honestly say it wouldn’t be made better by speaking into a radio handset and throwing around some CB lingo?