
Category — Crazy Weekend Project 2: Wesley Crusher Removal

reform might be the poetry

Like I said… Speech recognition sucks.

Instead of doing a grammar with choices, like I did before, you can use a DictationGrammar. Free form dictation. Free form, like beatnik poetry. As in it ends up making as much sense as beatnik poetry.
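For reference, swapping the choices grammar for free-form dictation might look something like the sketch below. This is a minimal sketch, not the code from the repository: the DictationProcessor class name is made up for illustration, and it assumes Windows, a reference to the System.Speech assembly, and a working microphone.

```csharp
using System;
using System.Speech.Recognition;

// Hypothetical class name, for illustration only.
public class DictationProcessor
{
    // SpeechRecognizer talks to the shared Windows desktop recognizer.
    private readonly SpeechRecognizer recognizer = new SpeechRecognizer();

    public DictationProcessor()
    {
        // Print whatever the recognizer decides it heard.
        recognizer.SpeechRecognized += (sender, e) => Console.WriteLine(e.Result.Text);

        // DictationGrammar accepts free-form speech instead of a fixed list of choices.
        recognizer.LoadGrammar(new DictationGrammar());
    }
}
```

The only real difference from a grammar built out of Choices is the argument passed to LoadGrammar.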

I have no idea what I said, but it certainly was not the following:

into the earth and one half of this room here we are we’re on

After that sentence got derailed, for a fun exercise, I decided to see how the speech recognizer would interpret the words of others, say, for instance, the Constitution:

we have people of the United States or 11 for you, justice for domestic front of the provider, and jail where welfare and secure the blessings of liberty to ourselves and our posterity Jordan’s challenge this constitution the united states of America

And to that, I believe Madison would have said “Here we are, we’re on!”

November 25, 2009   No Comments

And a better example…

Here’s an example of using System.Speech.Recognition to recognize the words “red”, “blue”, and “green”.

using System;
using System.Speech.Recognition;

namespace MathPirate.AlternativeInputDevices.SpeechCommandProcessor
{
    public class SpeechProcessor
    {
        protected SpeechRecognizer Recognizer { get; set; }

        public SpeechProcessor()
        {
            // SpeechRecognizer uses the shared Windows desktop recognizer.
            Recognizer = new SpeechRecognizer();

            // Only SpeechRecognized matters; the other handlers are debug output.
            Recognizer.SpeechRecognized += (sender, e) => Console.WriteLine(e.Result.Text);
            Recognizer.SpeechDetected += (sender, e) => Console.WriteLine("Detected: {0}", e.AudioPosition);
            Recognizer.SpeechHypothesized += (sender, e) => Console.WriteLine("Hypothesis: {0}", e.Result.Text);
            Recognizer.SpeechRecognitionRejected += (sender, e) => Console.WriteLine("Rejected: {0}", e.Result.Text);

            // Restrict recognition to exactly these three words.
            GrammarBuilder builder = new GrammarBuilder();
            builder.Append(new Choices("blue", "green", "red"));
            Recognizer.LoadGrammar(new Grammar(builder));
        }
    }
}

No, seriously. That’s it. And most of it’s debug writelines that you don’t even need.
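A host program for that class could be about as small as the sketch below. The Program class is an assumption, not something taken from the repository, and the "Say Something" prompt is simply borrowed from the first line of the console output shown further down the page. It assumes the SpeechProcessor class above is compiled into the same project, on Windows, with a reference to System.Speech.

```csharp
using System;
using MathPirate.AlternativeInputDevices.SpeechCommandProcessor;

// Hypothetical host program, for illustration only.
public static class Program
{
    public static void Main()
    {
        // Constructing the processor hooks up the events and loads the grammar.
        SpeechProcessor processor = new SpeechProcessor();

        Console.WriteLine("Say Something");

        // Recognition events arrive on their own; just keep the process alive until Enter is pressed.
        Console.ReadLine();

        // Keep the processor (and its recognizer) reachable for the whole run.
        GC.KeepAlive(processor);
    }
}
```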

November 25, 2009   No Comments

Well, that seems to be working…

Of course, this is completely meaningless without the audio to go along with it, but still…  It had a decently high hit rate for something that was quickly thrown together in an attempt to get something going on.

Say Something
Detected: 00:00:00.5400000
Rejected: blue
Detected: 00:00:01.0300000
Rejected: blue
Detected: 00:00:01.7600000
Rejected: blue
Detected: 00:00:02.7600000
Hypothesis: green
green
Detected: 00:00:04.8900000
Hypothesis: red
red
Detected: 00:00:07.8000000
Hypothesis: blue
blue
Detected: 00:00:10.8700000
blue
Detected: 00:00:15.3700000
Hypothesis: red
Rejected: red
Detected: 00:00:15.7300000
Rejected: blue
Detected: 00:00:18.8700000
Rejected: red
Detected: 00:00:20.4500000
Hypothesis: blue
Rejected: blue
Detected: 00:00:21.0500000
Rejected: blue
Detected: 00:00:21.4100000
Rejected: blue
Detected: 00:00:21.6900000
Hypothesis: blue
Hypothesis: blue
Rejected: blue
Detected: 00:00:31.7400000
Hypothesis: blue
Hypothesis: blue
Rejected: blue
Detected: 00:00:34.4500000
blue
Detected: 00:00:36.4600000
Rejected: blue
Detected: 00:00:38.6400000
Hypothesis: green
green
Detected: 00:00:40.2800000
Hypothesis: blue
Rejected: blue
Detected: 00:00:40.8800000
Rejected: blue
Detected: 00:00:42.6900000
Rejected: red
Detected: 00:00:50.6900000
Hypothesis: red
red
Detected: 00:00:52
Rejected: blue
Detected: 00:00:52.4400000
Hypothesis: blue
blue
Detected: 00:00:53.5600000
Rejected: blue
Detected: 00:00:55.0100000
Hypothesis: green
green
Detected: 00:00:56.4800000
Rejected: blue
Detected: 00:00:58.1300000
red
Detected: 00:00:59.8000000
Rejected: blue
Detected: 00:01:01.3400000
Hypothesis: blue
blue

November 25, 2009   No Comments

Rule #1: The Documentation Must Not Suck

So that’s how you create, initialize, and     a SpeechRecognizer…

[Image: WonderfulExamples]

Besides the fact that this example is apparently showing something that’s invisible, it’s not even really showing the creation and initialization, either.  Calling your local functions “SetupEventHandlers()” and “LoadInitialGrammars()” isn’t exactly all that helpful to me.  I understand that I need to set up event handlers and load grammars.  THAT’S WHY I’M READING THE DOCUMENTATION:  I know I need to do something, but I don’t know how.  You’ve pretty much shown me how to call the constructor on your class.  I managed to get that bit on my own, remarkably enough.

November 25, 2009   No Comments

Straight From the Source

Here’s where I’ll be checking in the code, as it gets written.

https://mathpirate.net/svn/Projects/AlternativeInputDevices/

November 25, 2009   No Comments

The Hardware

For this project, I’m going to need a microphone and a webcam.  So, why not both in one?

[Image: Doesn't Completely Suck]

I bought a pair of these cheap-o webcams with mics at Office Depot a few months ago, because they were cheap and small and I thought that maybe I could use them to build a 3D video camera of some sort.  Since that time, they’ve been sitting on the floor next to my computer, under a TV tray holding an Odyssey 2 system that I haven’t put away since I wrote about K.C. Munchkin! and Pac-Man a while back.

I finally opened up one of them today and plugged it in and was fairly amazed that the camera didn’t completely suck.  In fact, it seems to be the best quality webcam I’ve ever bought.  Maybe there’s been a sudden leap in webcam quality over the past few years that I haven’t been aware of, but the previous ones I’d bought, even the one that was supposed to be good, were merely producers of 320×240 blurriness.  This thing can take 1.3 MP pictures that turn out better than the first digital camera I had.

The picture above was taken using it.  It’s pretty good, even in these non-optimal lighting conditions.  Previous webcams would only be a blur without a 17 million candlepower halogen lamp melting the face off the subject.

Of course, it’s still a webcam…  This thing won’t hold a candle to my Canon SX10.  But still, for a bargain bin impulse buy, I’m impressed with it.

At least, I’m impressed at the moment.  As I use it and discover that it has some annoying limitation, I may change my mind.

November 25, 2009   No Comments

Facial Recognition System

Everyone’s got a webcam, and no one is using them to the full potential that the technology currently possesses.  People generally think of them as nothing more than a camera, allowing them to have a face-to-face conversation with family members a thousand miles away or record videos that will ruin their prospects of becoming respected voices in the conservative movement at some point in the future.  However, their true potential lies in their use as another input device, like a keyboard or mouse or, to some extent, a microphone.  The EyeToy for the PS2 showed that a video camera could be used as a controller and Microsoft’s Project Natal promises to extend that concept.

For this project, however, I don’t plan to try to implement direct control.  Instead, I’m going to focus on passive facial recognition.  I want to make a distinction between face detection and face recognition.  Detection is what your digital camera does, where it says “The thing in this box is likely to be a face”.  It doesn’t say whose face it is, just that it’s a face.  I want to do facial recognition, which will determine whose face it is.  With this, the computer will be able to know who is looking at its screen and adjust what it displays on the screen accordingly.

While I could do something constructive and useful, like using this system to display messages of interest for a specific person on a common display screen, my ultimate goal is the Minority Report model of displaying targeted advertising to people as they walk by my cube.  CHA-CHING!

For this, I’ll be returning to OpenCV, the computer vision library that I used for the Pong Robot.  Just like with the speech command, most of the stuff I need to pull this off has already been written and is just waiting for me to pull it together.  I get the sense that this won’t be as straightforward as the voice stuff and will almost definitely produce more spectacular failures.

November 25, 2009   No Comments

Speech Command Processor

As mentioned before, one of the tasks will be to write a speech-activated command processor.  For this, I do not mean that I’ll be doing a full speech recognition engine capable of dictation.  Instead, I mean that I want to do something that will be able to recognize commands, like a “Say ‘one’ for service in Swahili” menu system or telling the computer to perform some operation like “Open the pod bay doors”.

There are two reasons for this limitation:

  1. I don’t need full dictation support for the application I have in mind.
  2. Speech recognition sucks.

Obviously, any form of speech recognition is a highly complex task, involving in-depth knowledge of linguistics and signal processing and all sorts of related things that I know nothing about.  That is why I am very glad that I don’t have to write any of it.  You see, one of the namespaces included in .NET 3.0 was something called System.Speech.Recognition.  It looks like classes in that library will pretty much do everything for me, hopefully making this task dead simple to implement, while seeming really impressive to anyone who doesn’t know about them.
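Stripped of the speech part, a command processor is little more than a lookup from a recognized phrase to an action. The sketch below shows that idea under some loud assumptions: CommandTable, Register, and TryRun are hypothetical names invented for illustration, and the code deliberately uses no speech classes so it runs anywhere. In a real version, the registered phrases would presumably be handed to a Choices grammar, and TryRun would be called with e.Result.Text from a SpeechRecognized handler.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: maps spoken phrases to actions.
public class CommandTable
{
    // Phrases are matched case-insensitively.
    private readonly Dictionary<string, Action> commands =
        new Dictionary<string, Action>(StringComparer.OrdinalIgnoreCase);

    // The phrases that would be fed to a grammar.
    public IEnumerable<string> Phrases
    {
        get { return commands.Keys; }
    }

    // Adds a command, replacing any existing one with the same phrase.
    public void Register(string phrase, Action action)
    {
        commands[phrase] = action;
    }

    // Runs the action for a recognized phrase; returns false if the phrase is unknown.
    public bool TryRun(string recognizedText)
    {
        Action action;
        if (!commands.TryGetValue(recognizedText, out action))
        {
            return false;
        }

        action();
        return true;
    }
}

public static class CommandTableDemo
{
    public static void Main()
    {
        CommandTable table = new CommandTable();
        table.Register("open the pod bay doors",
            () => Console.WriteLine("I'm sorry, Dave. I'm afraid I can't do that."));

        // Stand-in for text that would come from the recognizer.
        bool handled = table.TryRun("Open the pod bay doors");
        Console.WriteLine("Handled: {0}", handled);
    }
}
```

Keeping the table separate from the recognizer means the command logic can be exercised with plain strings, without a microphone anywhere in sight.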

November 25, 2009   No Comments

Crazy Weekend Project 2: Semi-Crazy

As you may recall, back in September, I spent five solid days building a robot that could play Atari 2600 Pong. It wasn’t perfect, but it did beat the computer player in several matches. However, there was significant room for improvement. The motion was too jerky, the trajectory projection algorithm had problems, and the robot was no match for a human player. Over the next five days, I will not be continuing that project.

You see, a couple of weeks ago, I finally bought an Xbox 360, so I just don’t have that kind of time to devote to building robots at the moment.  Instead, I’ll be doing something much more practical and limited in scope, and only spend a few hours a day on it.  The rest of the time I’ll be alone in my apartment, immersed in HD gaming glory, like any other sane person would be this weekend.

Now, by “more practical and limited in scope”, I mean that I intend to attempt to build a facial recognition system and voice activated command processor.  The reason for this is plain:  Everyone needs a facial recognition system and voice activated command processor.  What good is a computer without one?  Additionally, these are two of the three necessary pieces that I need in order to fully exploit an HP TouchSmart PC that I got from Haggle.com a few weeks back, which has an integrated webcam and microphone.  The third piece, exploitation of the multi-touch screen, is left as an exercise to the reader.

As with the previous Crazy Project Weekend, I have not done any work in these areas or used any of these technologies prior to the commencement of the Crazy Project Weekend, other than a cursory glance to make sure that I’d have a chance of doing something useful in the timeframe allotted.  Additionally, I will be sharing successes, failures, thoughts, and above all, source code, which, in this case, might actually be useful to other people.

November 24, 2009   No Comments