End of Day 1
So, at the end of the first day, one thing is clear: It’s not just easy to use the speech recognition classes in .NET 3.0’s System.Speech.Recognition namespace, it’s really really freaking easy to use them. It takes just a handful of lines to start recognizing speech.
The harder part is figuring out what you want to recognize and how to deal with the speech after you’ve recognized it.
There’s really not that much left for me to do with the speech recognition at this point. I wanted to spend time getting it working, but it didn’t actually take any time to get it working… Any more work will be toward a specific use, which I wasn’t really planning on doing here. So then, that means tomorrow will focus on the facial recognition side of the project.
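For the "how to deal with it afterwards" part, one possible approach is a lookup table from recognized phrases to actions. This is only a sketch: the `CommandDispatcher` class and its `Register` and `Dispatch` methods are hypothetical names invented for illustration, not part of System.Speech.Recognition or of this project's code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: maps recognized phrases to actions.
public class CommandDispatcher
{
    // Case-insensitive so "Red" and "red" hit the same handler.
    private readonly Dictionary<string, Action> handlers =
        new Dictionary<string, Action>(StringComparer.OrdinalIgnoreCase);

    // Associates a phrase with the action to run when it is heard.
    public void Register(string phrase, Action handler)
    {
        handlers[phrase] = handler;
    }

    // Runs the handler for the phrase, if any. Returns false for unknown phrases.
    public bool Dispatch(string recognizedText)
    {
        Action handler;
        if (handlers.TryGetValue(recognizedText, out handler))
        {
            handler();
            return true;
        }
        return false;
    }
}
```

Hooking it up would presumably be a one-liner in the SpeechRecognized handler, along the lines of `dispatcher.Dispatch(e.Result.Text)`.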
November 25, 2009 No Comments
reform might be the poetry
Like I said… Speech recognition sucks.
Instead of doing a grammar with choices, like I did before, you can use a DictationGrammar. Free form dictation. Free form, like beatnik poetry. As in it ends up making as much sense as beatnik poetry.
I have no idea what I said, but it certainly was not the following:
into the earth and one half of this room here we are we’re on
After that sentence got derailed, for a fun exercise, I decided to see how the speech recognizer would interpret the words of others, say, for instance, the Constitution:
we have people of the United States or 11 for you, justice for domestic front of the provider, and jail where welfare and secure the blessings of liberty to ourselves and our posterity Jordan’s challenge this constitution the united states of America
And to that, I believe Madison would have said “Here we are, we’re on!”
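For anyone who wants to generate their own beatnik poetry, here is a minimal sketch of the dictation setup. It assumes the same shared SpeechRecognizer used in the red/blue/green example, a project reference to the System.Speech assembly, and a working microphone; the class name `DictationSketch` is made up for illustration.

```csharp
using System;
using System.Speech.Recognition;

public static class DictationSketch
{
    public static void Main()
    {
        // Shared Windows recognizer, same class as in the color example.
        using (var recognizer = new SpeechRecognizer())
        {
            // DictationGrammar accepts free-form speech instead of a fixed list of choices.
            recognizer.LoadGrammar(new DictationGrammar());

            // Print whatever the recognizer thinks it heard.
            recognizer.SpeechRecognized += (sender, e) => Console.WriteLine(e.Result.Text);

            Console.WriteLine("Say something, then press Enter to quit.");
            Console.ReadLine();
        }
    }
}
```

The only real difference from the choices version is the grammar passed to LoadGrammar.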
November 25, 2009 No Comments
And a better example…
Here’s an example of using System.Speech.Recognition to recognize the words “red”, “blue”, and “green”.
using System;
using System.Speech.Recognition;

namespace MathPirate.AlternativeInputDevices.SpeechCommandProcessor
{
    public class SpeechProcessor
    {
        protected SpeechRecognizer Recognizer { get; set; }

        public SpeechProcessor()
        {
            Recognizer = new SpeechRecognizer();

            // Print the final recognized text.
            Recognizer.SpeechRecognized += (sender, e) => Console.WriteLine(e.Result.Text);

            // Debug output only: when speech starts, partial guesses, and rejections.
            Recognizer.SpeechDetected += (sender, e) => Console.WriteLine("Detected: {0}", e.AudioPosition);
            Recognizer.SpeechHypothesized += (sender, e) => Console.WriteLine("Hypothesis: {0}", e.Result.Text);
            Recognizer.SpeechRecognitionRejected += (sender, e) => Console.WriteLine("Rejected: {0}", e.Result.Text);

            // Restrict recognition to exactly three words.
            GrammarBuilder builder = new GrammarBuilder();
            builder.Append(new Choices("blue", "green", "red"));
            Recognizer.LoadGrammar(new Grammar(builder));
        }
    }
}
No, seriously. That’s it. And most of it’s debug writelines that you don’t even need.
November 25, 2009 No Comments
Well, that seems to be working…
Of course, this is completely meaningless without the audio to go along with it, but still… It had a decently high hit rate for something that was quickly thrown together in an attempt to get something going on.
Say Something
Detected: 00:00:00.5400000 Rejected: blue
Detected: 00:00:01.0300000 Rejected: blue
Detected: 00:00:01.7600000 Rejected: blue
Detected: 00:00:02.7600000 Hypothesis: green green
Detected: 00:00:04.8900000 Hypothesis: red red
Detected: 00:00:07.8000000 Hypothesis: blue blue
Detected: 00:00:10.8700000 blue
Detected: 00:00:15.3700000 Hypothesis: red Rejected: red
Detected: 00:00:15.7300000 Rejected: blue
Detected: 00:00:18.8700000 Rejected: red
Detected: 00:00:20.4500000 Hypothesis: blue Rejected: blue
Detected: 00:00:21.0500000 Rejected: blue
Detected: 00:00:21.4100000 Rejected: blue
Detected: 00:00:21.6900000 Hypothesis: blue Hypothesis: blue Rejected: blue
Detected: 00:00:31.7400000 Hypothesis: blue Hypothesis: blue Rejected: blue
Detected: 00:00:34.4500000 blue
Detected: 00:00:36.4600000 Rejected: blue
Detected: 00:00:38.6400000 Hypothesis: green green
Detected: 00:00:40.2800000 Hypothesis: blue Rejected: blue
Detected: 00:00:40.8800000 Rejected: blue
Detected: 00:00:42.6900000 Rejected: red
Detected: 00:00:50.6900000 Hypothesis: red red
Detected: 00:00:52 Rejected: blue
Detected: 00:00:52.4400000 Hypothesis: blue blue
Detected: 00:00:53.5600000 Rejected: blue
Detected: 00:00:55.0100000 Hypothesis: green green
Detected: 00:00:56.4800000 Rejected: blue
Detected: 00:00:58.1300000 red
Detected: 00:00:59.8000000 Rejected: blue
Detected: 00:01:01.3400000 Hypothesis: blue blue
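That's a lot of "Rejected: blue". One way to dig into the rejections, sketched here as a suggestion rather than something from the run above, is to print the Confidence value that each result carries. The class name `ConfidenceLoggingSketch` and the output format are invented for illustration; the sketch assumes a reference to System.Speech and a working microphone.

```csharp
using System;
using System.Speech.Recognition;

public static class ConfidenceLoggingSketch
{
    public static void Main()
    {
        using (var recognizer = new SpeechRecognizer())
        {
            // Same three-word grammar as the color example.
            var colors = new Choices("blue", "green", "red");
            recognizer.LoadGrammar(new Grammar(new GrammarBuilder(colors)));

            // Confidence is a float the recognizer assigns to each result.
            recognizer.SpeechRecognized += (sender, e) =>
                Console.WriteLine("{0} (confidence {1:F2})", e.Result.Text, e.Result.Confidence);
            recognizer.SpeechRecognitionRejected += (sender, e) =>
                Console.WriteLine("Rejected: {0} (confidence {1:F2})", e.Result.Text, e.Result.Confidence);

            Console.WriteLine("Say Something, then press Enter to quit.");
            Console.ReadLine();
        }
    }
}
```

Comparing the numbers on accepted and rejected lines should at least show how close the near misses were.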
November 25, 2009 No Comments
Rule #1: The Documentation Must Not Suck
So that’s how you create, initialize, and    a SpeechRecognizer…
Besides the fact that this example is apparently showing something that’s invisible, it’s not even really showing the creation and initialization, either. Calling your local functions “SetupEventHandlers()” and “LoadInitialGrammars()” isn’t exactly all that helpful to me. I understand that I need to set up event handlers and load grammars. THAT’S WHY I’M READING THE DOCUMENTATION: I know I need to do something, but I don’t know how. You’ve pretty much shown me how to call the constructor on your class. I managed to get that bit on my own, remarkably enough.
November 25, 2009 No Comments
Speech Command Processor
As mentioned before, one of the tasks will be to write a speech-activated command processor. For this, I do not mean that I’ll be doing a full speech recognition engine capable of dictation. Instead, I mean that I want to do something that will be able to recognize commands, like a “Say ‘one’ for service in Swahili” menu system or telling the computer to perform some operation like “Open the pod bay doors”.
There are two reasons for this limitation:
- I don’t need full dictation support for the application I have in mind.
- Speech recognition sucks.
Obviously, any form of speech recognition is a highly complex task, involving in-depth knowledge of linguistics and signal processing and all sorts of related things that I know nothing about. That is why I am very glad that I don’t have to write any of it. You see, one of the namespaces included in .NET 3.0 was something called System.Speech.Recognition. It looks like classes in that namespace will pretty much do everything for me, hopefully making this task dead simple to implement, while seeming really impressive to anyone who doesn’t know about them.
November 25, 2009 No Comments
Crazy Weekend Project 2: Semi-Crazy
As you may recall, back in September, I spent five solid days building a robot that could play Atari 2600 Pong. It wasn’t perfect, but it did beat the computer player in several matches. However, there was significant room for improvement. The motion was too jerky, the trajectory projection algorithm had problems, and the robot was no match for a human player. Over the next five days, I will not be continuing that project.
You see, a couple of weeks ago, I finally bought an XBox 360, so I just don’t have that kind of time to devote to building robots at the moment. Instead, I’ll be doing something much more practical and limited in scope, and only spend a few hours a day on it. The rest of the time I’ll be alone in my apartment, immersed in HD gaming glory, like any other sane person would be this weekend.
Now, by “more practical and limited in scope”, I mean that I intend to attempt to build a facial recognition system and voice activated command processor. The reason for this is plain: Everyone needs a facial recognition system and voice activated command processor. What good is a computer without one? Additionally, these are two of the three necessary pieces that I need in order to fully exploit an HP TouchSmart PC that I got from Haggle.com a few weeks back, which has an integrated webcam and microphone. The third piece, exploitation of the multi-touch screen, is left as an exercise to the reader.
As with the previous Crazy Project Weekend, I have not done any work in these areas or used any of these technologies prior to the commencement of the Crazy Project Weekend, other than a cursory glance to make sure that I’d have a chance of doing something useful in the timeframe allotted. Additionally, I will be sharing successes, failures, thoughts, and above all, source code, which, in this case, might actually be useful to other people.
November 24, 2009 No Comments