A simple example of using SFSpeechRecognizer to control a character with voice commands. This sample uses the en_US locale but can easily be updated to support a different language.
SFSpeechRecognizer cannot easily recognize single words, so the voice commands included are all two words long.
Here’s the list of voice commands included with the sample:
I have a small problem: how can I make it display text and, at the same time, produce sound to say hello?
I tried to modify the following code, but it didn’t work. It looks as if I’m not allowed to produce sound with speech.say() while speech recognition is running.
@jfperusse - very neat, is the intention to include this facility within Codea or do we have to use these external C libraries?
@John - just a quick note on something I have seen intermittently in V4: the stop-execution button in the controls at the bottom right-hand side of the screen. I ran this demo, tapped the screen a few times, and then tried to close the project - it was very slow to respond after several attempts.
Closing the project, restarting it, and immediately closing it again worked quickly with one tap. But on a repeat run, after several on-screen taps before tapping the close control again, it took ages and initially seemed not to respond.
I don’t think this is limited to this project.
Feels like the parsing of the controls is not in the main touch loop.
Edit: weird - I exited Codea after posting this, restarted it, loaded this project, and ran it. This time I noticed the speech bubble appeared (it wasn’t there before) and the touch response was excellent. It looks like it may be one of those cases where something left in memory interferes with a newly loaded project - or, since I downloaded this project and ran it directly, some interference from the downloaded project.
On that topic - it seems a bit odd that you need to set up a new project to access the forum website from the Codea Project Editor menu. Would it be better to access the forum website via the project files window, and to download and store projects before running them?
Thanks! It’s quite possible the audio engine used for speech recognition is preventing speech audio from playing. I will have to investigate whether that’s an issue with how it’s configured or whether the two simply cannot work at the same time. If it’s the latter, one approach that should work would be to enable/disable speech recognition based on inputs. For example, you could have a “Talk” button which you hold to use speech recognition, release when you’re done talking to process the command, and then play the audio using the speech API.
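A minimal hold-to-talk sketch of that idea in Codea Lua, assuming the sample’s startListening() function is available and that a matching stopListening() exists (the button rectangle, inButton helper, and the spoken greeting are placeholders):

```lua
-- Hypothetical hold-to-talk sketch (assumes startListening()/stopListening()
-- from the sample and Codea's speech API).
local talkButton = { x = 100, y = 100, w = 200, h = 80 }
local listening = false

local function inButton(t)
    return t.x >= talkButton.x and t.x <= talkButton.x + talkButton.w
       and t.y >= talkButton.y and t.y <= talkButton.y + talkButton.h
end

function touched(touch)
    if touch.state == BEGAN and inButton(touch) and not listening then
        listening = true
        startListening()         -- recognition runs only while the button is held
    elseif touch.state == ENDED and listening then
        listening = false
        stopListening()          -- release: process the command, then...
        speech.say("Hello!")     -- ...speech playback is free to run again
    end
end
```

The point of the design is that recognition and playback never overlap, so the two audio engines never fight over the session.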
At the moment, the main intention of this sample is to show how the objc feature can be used to access speech recognition, but it should be easy to hide most of the low-level objc functionality behind a higher-level Lua library using the code I shared.
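For example, such a wrapper might expose nothing but a table of phrases and handlers; this is only a sketch, and the module name, function names, and callback shape are made up, with the bodies delegating to the objc-based functions from the sample:

```lua
-- Hypothetical Lua-facing API hiding the objc details (names are illustrative).
VoiceCommands = {}

-- commandTable maps recognized phrases to handler functions,
-- e.g. { ["go up"] = function() jump() end }
function VoiceCommands.start(commandTable)
    VoiceCommands.commands = commandTable
    startListening()   -- the objc-based function from the sample
end

function VoiceCommands.stop()
    stopListening()
end
```

A caller would then never touch SFSpeechAudioBufferRecognitionRequest or the audio engine directly.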
I’ve learned that, as in some social apps, there should be a mode switch: turn on speech recognition and disable voice playback while you speak, then turn speech recognition off and allow playback when voice input stops. Thank you for the explanation.
@jfperusse This is great! I’ve fixed a few issues with objc syntax regressions in new betas (objc.<class> vs. objc.cls.<class> ) for the WebRepo version. @UberGoober I’ve added the project to WebRepo.
Forgive me for messing with the version numbers. I’ve hooked the backend up to the forum, so new projects are announced automatically in the WebRepo thread.
Interesting… after making the change I was able to use the commands, but maybe there’s something else going on.
One thing you could try is adding a print(message) around line 51, before looping over the possible commands. Then, when you see the “?”, check the console to see what the device actually recognized. That might give us a hint as to what’s going on.
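Concretely, the suggested print would sit in the result handler just before the command loop; this excerpt reuses the sample’s own variables, with only the one print line added:

```lua
-- Inside the recognition result handler, before matching commands:
local message = oResult.bestTranscription.formattedString
print(message)  -- log exactly what the recognizer heard
for k, v in pairs(commands) do
    if message == k then
        v()
        restartListening()
        break
    end
end
```

If the console shows phrases close to, but not exactly matching, your command keys, the fix is in the matching rather than in the recognition.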
@jfperusse I’m using the zip file at the very beginning of this post. When I run the code, I get
objc.cls.ClassName is deprecated. Use objc.ClassName instead.
I don’t know whether that causes a problem, or whether the zip at the beginning is the correct version to use.
When I run the code, my volume goes to max and I can’t reduce it by trying to slide the volume down (control panel) or using the volume down button. I have to exit Codea before I can change the volume.
Here’s startListening() that I added print statements to.
When I run the code, it prints a1, a2, and a5. When I speak a command, nothing happens. I don’t get the a3 or a4 print statements.
function startListening()
    print("a1")
    recognitionRequest = objc.cls.SFSpeechAudioBufferRecognitionRequest()
    recognitionRequest.shouldReportPartialResults = true
    recognitionRequest.requiresOnDeviceRecognition = true
    recognitionTask = speechRecognizer:recognitionTaskWithRequest_resultHandler_(recognitionRequest,
        function (oResult, oError)
            print("a2")
            if oResult ~= nil and oResult.bestTranscription.formattedString ~= nil then
                print("a3")
                if messageStart == -1 then
                    messageStart = ElapsedTime
                end
                local message = oResult.bestTranscription.formattedString
                local foundCommand = false
                print("a4")
                for k, v in pairs(commands) do
                    if message == k then
                        v()
                        foundCommand = true
                        restartListening()
                        break
                    end
                end
            end
            print("a5")
        end)
    inputNode = audioEngine.inputNode
    recordingFormat = inputNode:outputFormatForBus_(0)
    inputNode:installTapOnBus_bufferSize_format_block_(0, 1024, recordingFormat,
        function(oBuffer, oTime) recognitionRequest:appendAudioPCMBuffer_(oBuffer) end)
    audioEngine:prepare()
    audioEngine:startAndReturnError_(nil)
end
@jfperusse Loaded from WebRepo. I added the print statements in the same spots as before. When I run, it prints a1, a5, a2. The volume goes to max and won’t change until I exit Codea. I say the Go Up command and nothing happens.