
Posts from — November 2009

reform might be the poetry

Like I said… Speech recognition sucks.

Instead of doing a grammar with choices, like I did before, you can use a DictationGrammar. Free form dictation. Free form, like beatnik poetry. As in it ends up making as much sense as beatnik poetry.
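For reference, swapping the choices grammar for free-form dictation is roughly a one-line change. What follows is a minimal sketch, not the code from the repository: it assumes a console host and a project reference to the System.Speech assembly, and the names DictationSketch and CreateGrammar are made up for illustration.

```csharp
using System;
using System.Speech.Recognition;

public static class DictationSketch
{
    // Hypothetical helper: DictationGrammar is the built-in free-form grammar.
    public static Grammar CreateGrammar()
    {
        return new DictationGrammar();
    }

    public static void Main()
    {
        // SpeechRecognizer uses the shared Windows desktop recognizer.
        using (var recognizer = new SpeechRecognizer())
        {
            // Print whatever the recognizer thinks it heard.
            recognizer.SpeechRecognized += (sender, e) => Console.WriteLine(e.Result.Text);
            recognizer.LoadGrammar(CreateGrammar());

            // Keep the process alive so the recognition events can fire.
            Console.WriteLine("Dictate something, then press Enter to quit.");
            Console.ReadLine();
        }
    }
}
```

The only difference from the color example is the grammar passed to LoadGrammar; everything else about the recognizer setup stays the same.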

I have no idea what I said, but it certainly was not the following:

into the earth and one half of this room here we are we’re on

After that sentence got derailed, for a fun exercise, I decided to see how the speech recognizer would interpret the words of others, say, for instance, the Constitution:

we have people of the United States or 11 for you, justice for domestic front of the provider, and jail where welfare and secure the blessings of liberty to ourselves and our posterity Jordan’s challenge this constitution the united states of America

And to that, I believe Madison would have said “Here we are, we’re on!”

November 25, 2009   No Comments

And a better example…

Here’s an example of using System.Speech.Recognition to recognize the words “red”, “blue”, and “green”.

using System;
using System.Speech.Recognition;

namespace MathPirate.AlternativeInputDevices.SpeechCommandProcessor
{
    public class SpeechProcessor
    {
        protected SpeechRecognizer Recognizer { get; set; }

        public SpeechProcessor()
        {
            // SpeechRecognizer uses the recognizer shared with Windows Speech Recognition.
            Recognizer = new SpeechRecognizer();

            // Only SpeechRecognized is needed; the other three handlers are debug output.
            Recognizer.SpeechRecognized += (sender, e) => Console.WriteLine(e.Result.Text);
            Recognizer.SpeechDetected += (sender, e) => Console.WriteLine("Detected: {0}", e.AudioPosition);
            Recognizer.SpeechHypothesized += (sender, e) => Console.WriteLine("Hypothesis: {0}", e.Result.Text);
            Recognizer.SpeechRecognitionRejected += (sender, e) => Console.WriteLine("Rejected: {0}", e.Result.Text);

            // Build a grammar that accepts exactly one of the three color words.
            GrammarBuilder builder = new GrammarBuilder();
            builder.Append(new Choices("blue", "green", "red"));
            Recognizer.LoadGrammar(new Grammar(builder));
        }
    }
}

No, seriously. That’s it. And most of it’s debug writelines that you don’t even need.
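To back up the "you don't even need" part, here is a stripped-down sketch with the debug handlers removed and an entry point added so it can actually run. It assumes a console application with a reference to the System.Speech assembly. ColorListener and BuildColorGrammar are hypothetical names, and the "Say Something" prompt is only a guess at what the real host program prints.

```csharp
using System;
using System.Speech.Recognition;

public static class ColorListener
{
    // Hypothetical helper: the same three-word grammar as the class above.
    public static Grammar BuildColorGrammar()
    {
        return new Grammar(new GrammarBuilder(new Choices("blue", "green", "red")));
    }

    public static void Main()
    {
        using (var recognizer = new SpeechRecognizer())
        {
            // The one handler that matters: print the recognized word.
            recognizer.SpeechRecognized += (sender, e) => Console.WriteLine(e.Result.Text);
            recognizer.LoadGrammar(BuildColorGrammar());

            // Block on the console so the recognizer has time to raise events.
            Console.WriteLine("Say Something");
            Console.ReadLine();
        }
    }
}
```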

November 25, 2009   No Comments

Well, that seems to be working…

Of course, this is completely meaningless without the audio to go along with it, but still…  It had a decently high hit rate for something that was quickly thrown together in an attempt to get something going on.

Say Something
Detected: 00:00:00.5400000
Rejected: blue
Detected: 00:00:01.0300000
Rejected: blue
Detected: 00:00:01.7600000
Rejected: blue
Detected: 00:00:02.7600000
Hypothesis: green
green
Detected: 00:00:04.8900000
Hypothesis: red
red
Detected: 00:00:07.8000000
Hypothesis: blue
blue
Detected: 00:00:10.8700000
blue
Detected: 00:00:15.3700000
Hypothesis: red
Rejected: red
Detected: 00:00:15.7300000
Rejected: blue
Detected: 00:00:18.8700000
Rejected: red
Detected: 00:00:20.4500000
Hypothesis: blue
Rejected: blue
Detected: 00:00:21.0500000
Rejected: blue
Detected: 00:00:21.4100000
Rejected: blue
Detected: 00:00:21.6900000
Hypothesis: blue
Hypothesis: blue
Rejected: blue
Detected: 00:00:31.7400000
Hypothesis: blue
Hypothesis: blue
Rejected: blue
Detected: 00:00:34.4500000
blue
Detected: 00:00:36.4600000
Rejected: blue
Detected: 00:00:38.6400000
Hypothesis: green
green
Detected: 00:00:40.2800000
Hypothesis: blue
Rejected: blue
Detected: 00:00:40.8800000
Rejected: blue
Detected: 00:00:42.6900000
Rejected: red
Detected: 00:00:50.6900000
Hypothesis: red
red
Detected: 00:00:52
Rejected: blue
Detected: 00:00:52.4400000
Hypothesis: blue
blue
Detected: 00:00:53.5600000
Rejected: blue
Detected: 00:00:55.0100000
Hypothesis: green
green
Detected: 00:00:56.4800000
Rejected: blue
Detected: 00:00:58.1300000
red
Detected: 00:00:59.8000000
Rejected: blue
Detected: 00:01:01.3400000
Hypothesis: blue
blue

November 25, 2009   No Comments

Rule #1: The Documentation Must Not Suck

So that’s how you create, initialize, and     a SpeechRecognizer…

[Image: WonderfulExamples]

Besides the fact that this example is apparently showing something that’s invisible, it’s not even really showing the creation and initialization, either.  Calling your local functions “SetupEventHandlers()” and “LoadInitialGrammars()” isn’t exactly all that helpful to me.  I understand that I need to set up event handlers and load grammars.  THAT’S WHY I’M READING THE DOCUMENTATION:  I know I need to do something, but I don’t know how.  You’ve pretty much shown me how to call the constructor on your class.  I managed to get that bit on my own, remarkably enough.

November 25, 2009   No Comments

Straight From the Source

Here’s where I’ll be checking in the code, as it gets written.

https://mathpirate.net/svn/Projects/AlternativeInputDevices/

November 25, 2009   No Comments

The Hardware

For this project, I’m going to need a microphone and a webcam.  So, why not both in one?

[Image: Doesn't Completely Suck]

I bought a pair of these cheap-o webcams with mics at Office Depot a few months ago, because they were cheap and small and I thought that maybe I could use them to build a 3D video camera of some sort.  Since that time, they’ve been sitting on the floor next to my computer, under a TV tray holding an Odyssey 2 system that I haven’t put away since I wrote about K.C. Munchkin! and Pac-Man a while back.

I finally opened up one of them today and plugged it in and was fairly amazed that the camera didn’t completely suck.  In fact, it seems to be the best quality webcam I’ve ever bought.  Maybe there’s been a sudden leap in webcam quality over the past few years that I haven’t been aware of, but the previous ones I’d bought, even the one that was supposed to be good, were merely producers of 320×240 blurriness.  This thing can take 1.3 MP pictures that turn out better than the first digital camera I had.

The picture above was taken using it.  It’s pretty good, even in these non-optimal lighting conditions.  Previous webcams would only be a blur without a 17 million candlepower halogen lamp melting the face off the subject.

Of course, it’s still a webcam…  This thing won’t hold a candle to my Canon SX10.  But still, for a bargain bin impulse buy, I’m impressed with it.

At least, I’m impressed at the moment.  As I use it and discover that it has some annoying limitation, I may change my mind.

November 25, 2009   No Comments

Facial Recognition System

Everyone’s got a webcam, and no one is using them to the full potential that the technology currently possesses.  People generally think of them as nothing more than a camera, allowing them to have a face-to-face conversation with family members a thousand miles away or record videos that will ruin their prospects of becoming respected voices in the conservative movement at some point in the future.  However, their true potential lies in their use as another input device, like a keyboard or mouse or, to some extent, a microphone.  The EyeToy for the PS2 showed that a video camera could be used as a controller and Microsoft’s Project Natal promises to extend that concept.

For this project, however, I don’t plan to try to implement direct control.  Instead, I’m going to focus on passive facial recognition.  I want to make a distinction between face detection and face recognition.  Detection is what your digital camera does, where it says “The thing in this box is likely to be a face”.  It doesn’t say whose face it is, just that it’s a face.  I want to do facial recognition, which will determine whose face it is.  With this, the computer will be able to know who is looking at its screen and adjust what it displays on the screen accordingly.

While I could do something constructive and useful, like using this system to display messages of interest for a specific person on a common display screen, my ultimate goal is the Minority Report model of displaying targeted advertising to people as they walk by my cube.  CHA-CHING!

For this, I’ll be returning to OpenCV, the computer vision library that I used for the Pong Robot.  Just like with the speech command, most of the stuff I need to pull this off has already been written and is just waiting for me to pull it together.  I get the sense that this won’t be as straightforward as the voice stuff and will almost definitely produce more spectacular failures.

November 25, 2009   No Comments

Speech Command Processor

As mentioned before, one of the tasks will be to write a speech-activated command processor.  For this, I do not mean that I’ll be doing a full speech recognition engine capable of dictation.  Instead, I mean that I want to do something that will be able to recognize commands, like a “Say ‘one’ for service in Swahili” menu system or telling the computer to perform some operation like “Open the pod bay doors”.

There are two reasons for this limitation:

  1. I don’t need full dictation support for the application I have in mind.
  2. Speech recognition sucks.

Obviously, any form of speech recognition is a highly complex task, involving in-depth knowledge of linguistics and signal processing and all sorts of related things that I know nothing about.  That is why I am very glad that I don’t have to write any of it.  You see, one of the namespaces included in .Net 3.0 was something called System.Speech.Recognition.  It looks like classes in that library will pretty much do everything for me, hopefully making this task dead simple to implement, while seeming really impressive to anyone who doesn’t know about them.

November 25, 2009   No Comments

Crazy Weekend Project 2: Semi-Crazy

As you may recall, back in September, I spent five solid days building a robot that could play Atari 2600 Pong. It wasn’t perfect, but it did beat the computer player in several matches. However, there was significant room for improvement. The motion was too jerky, the trajectory projection algorithm had problems, and the robot was no match for a human player. Over the next five days, I will not be continuing that project.

You see, a couple of weeks ago, I finally bought an XBox 360, so I just don’t have that kind of time to devote to building robots at the moment.  Instead, I’ll be doing something much more practical and limited in scope, and only spend a few hours a day on it.  The rest of the time I’ll be alone in my apartment, immersed in HD gaming glory, like any other sane person would be this weekend.

Now, by “more practical and limited in scope”, I mean that I intend to attempt to build a facial recognition system and voice activated command processor.  The reason for this is plain:  Everyone needs a facial recognition system and voice activated command processor.  What good is a computer without one?  Additionally, these are two of the three necessary pieces that I need in order to fully exploit an HP TouchSmart PC that I got from Haggle.com a few weeks back, which has an integrated webcam and microphone.  The third piece, exploitation of the multi-touch screen, is left as an exercise to the reader.

As with the previous Crazy Project Weekend, I have not done any work in these areas or used any of these technologies prior to the commencement of the Crazy Project Weekend, other than a cursory glance to make sure that I’d have a chance of doing something useful in the timeframe allotted.  Additionally, I will be sharing successes, failures, thoughts, and above all, source code, which, in this case, might actually be useful to other people.

November 24, 2009   No Comments

Test-Driven Disaster

When first considering using Test-Driven Development, many people will consult their local tester.  This is, of course, the wrong thing to do, because their local tester doesn’t actually care about Test-Driven Development.  It’s not a testing methodology, it’s a development methodology.  Asking a tester about it because it has the word “Test” in it is just as wrong as asking a bus driver about it because it has the word “Drive” in it.  And if you’re assigning your tester the task of “Test-Driven Development”, you need to stop before you damage something, because you’re doing it wrong.

For those who don’t know, “Test-Driven Development” is an Agile methodology for designing classes and developing code, where you begin with writing an automated test, stubbing out a method as you go, then after you have the test, you implement the method until the test case passes.  What many people seem to miss is that testing is merely a side-effect, it’s not the central goal.  Instead, the goal is to develop code from the top down by looking at how you’re going to use it.  There’s virtually no difference between TDD and stubbing out a low level module while implementing a higher level module.  The approach is the same:  You focus on how you’re going to use the functions you’re writing by actually using them, rather than trying to list all the operations you might need without the context on how they’ll be called.  If you don’t understand that’s what you’re doing, you’re bound to foul it up and hurt something.  You can’t do TDD after you’ve already specced out the method signatures, so don’t even try.

Beyond the rampant misunderstanding regarding what TDD is, my biggest reservation about it is the fact that most people who want to try it don’t know how to write tests.  Bad tests are far worse than no tests.  By strictly adhering to the principles behind TDD, you’ll write five tests and think you have everything covered because you red-green-refactored your way to perfection.  In fact, since TDD and the Wide World of Agile will encourage you to use code coverage, you’re guaranteed to have tested everything.  Trouble is, you’ve only covered five specific cases, not the five hundred cases that would be apparent if you actually looked at the problem and nowhere near the five thousand cases your users will throw your way.  Code coverage will only tell you that you’ve executed lines of code, not that they’re correctly executing according to your plan.  By exactly following the process of Test-Driven Development, you’re pretty much assured to write nice shiny code that doesn’t actually work, even though it passes all of your tests.  To successfully write effective unit tests, you have to go beyond the initial requirements (Which are incomplete) and the initial design (Which has changed) and look at the solution you actually implemented to see where the problems are.  In other words, your job isn’t over after “Refactor”, you still need to go back and enhance your initial suite of tests before you’re done.

Take, for example, the requirement that you write a function that takes any two numbers and returns their sum.  In the world of TDD, you start with a test:

[Test]
public void TestSumMethod()
{
    Assert.AreEqual(5, SumMethod(2, 3), "2 + 3 = 5 Check");
}

Okay, so, we have the test.  Now we need the function stub.

public int SumMethod(int a, int b)
{
    return 0;
}

Compile, run tests, and blammo!  Your test failed, as it should, because you haven’t implemented anything yet.  This was the “Red” step.  Now it’s time to go Green.  For that, you implement the function you stubbed out.

public int SumMethod(int a, int b)
{
    return a + b;
}

Run the test again and it’ll pass.  Run code coverage and -look at that- 100%.  Now you refactor, rerun the tests and code coverage, and everything’s green, so you’re done!

Except…  You’re not done.  The requirements said “Any two numbers”.  So, you need more tests.  Does it work with negatives?  Big numbers?  Small numbers?  Any tester worth paying will immediately break your function by trying to pass in 2147483647, 2147483647.  And that’s just the beginning.  What about real numbers?  Can it do π + e?  And let’s not even get into complex numbers.  During your refactoring, did you change the inputs to a type that includes NaNs and Infinities?
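To make the 2147483647 case concrete, here is a small sketch of what that tester would see. It is plain C# with no test framework so it can run on its own. SumOverflowSketch and CheckedSum are hypothetical names, and the unchecked keyword is spelled out (it is the C# default) so the result does not depend on compiler flags.

```csharp
using System;

public static class SumOverflowSketch
{
    // Same arithmetic as the SumMethod above, with the default
    // unchecked behavior made explicit.
    public static int SumMethod(int a, int b)
    {
        return unchecked(a + b);
    }

    // Hypothetical variant: 'checked' makes overflow throw instead of wrapping.
    public static int CheckedSum(int a, int b)
    {
        checked { return a + b; }
    }

    public static void Main()
    {
        // The one case the original test covered.
        Console.WriteLine(SumMethod(2, 3));                       // prints 5

        // The tester's case: the sum silently wraps around.
        Console.WriteLine(SumMethod(int.MaxValue, int.MaxValue)); // prints -2

        try
        {
            CheckedSum(int.MaxValue, int.MaxValue);
        }
        catch (OverflowException)
        {
            Console.WriteLine("overflow");                        // prints overflow
        }
    }
}
```

Both versions would pass the original "2 + 3 = 5" test with 100% coverage. Whether wrapping or throwing is the right behavior is a requirements question that the single test never forced anyone to ask.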

The point is that following Test-Driven Development left you thinking that you had written a function that was adequately tested, when in reality, it was woefully under-tested.  Obviously, this was a simplified example.  The consequences of doing this with some real, worthwhile code could be disastrous.

Now, I like developers writing unit tests (Well, I like it when they write good unit tests, as I’ve already written…).  I don’t care if they come first, last, or during.  Just don’t fool yourself into thinking that practicing TDD will mean that you’ll have all of your unit tests written as part of your red-green-refactor cycle.

All in all, I like what TDD tries to do.  Thinking about code before you write it and writing unit tests are generally good things to do.  What I don’t like is the shiny glowing path to failure that TDD sets out for those who are unprepared.  I have no doubt that certain Agile practitioners can do TDD and do TDD well.  Unfortunately, I don’t think those people live in The Real World.  If you run your own software consulting firm, then great, go off and do things how you want.  But for everyone else, you’re going to get fired if you try to do that sort of thing, because you’re going to do it wrong and waste a lot of time in the process.

November 4, 2009   No Comments