
UI Automation: Tricks and Traps

UI Automation and testing can be among the trickiest areas of software testing.  Directly testing an API is relatively easy.  You’ve got functions to call, with well-defined inputs and outputs.  It’s meant to be used in the way you’re using it when you write your tests.  You can spend most of your time writing real test cases that will generally work correctly with minimal effort.  UI Automation, however, isn’t nearly as friendly.  You’ll sometimes spend hours twisting and tweaking one test case to get it running, and even then it’ll still randomly fail 25% of the time.

A large part of this is due to the fact that a UI is meant to present an interaction model for a human.  It’s not actually meant for another computer program to deal with.  A person is clicking the buttons and typing text in the text boxes and so on.  Allowing a computer to interact with it is usually an afterthought, hacked together using technologies that will work, sometimes, and only if the application programmer followed the rules.  If they’re not using a button, but instead are using something that they’re drawing themselves to look and act like a button, it’s not going to be a button for you and your UI tests aren’t going to be able to click it easily.

Another major problem with UI tests is that the user interface frequently changes.  That’s not supposed to be a radio button, it’s supposed to be a check box.  Move that button after the text box.  Make that list box a combo box.  The UI is often the most fluid piece of a software application.  Once the API is in place, your API tests have a decent chance of working version over version, because an API isn’t subject to focus groups or marketing studies.  But it’s very rare to leave the UI untouched between versions.

There’s also a problem of perception regarding what UI tests do.  People often think that since UI automation is testing the UI, that means that it’s covering the look of the UI, as well.  Most of the time, it won’t because it can’t.  It’s very difficult to have automated visual testing.  Sure, you can compare screenshots, but what if the window size changes?  It’ll break if you move a button or box.  Your graphical verification tests will report complete and total failures if you took the screenshots on plain XP and someone later uses Vista with Aero to run them.  Hell, they’ll likely die if you turn font smoothing on or off.  Doing something so fragile is what we testers call “A Waste of Time”.  UI testing generally doesn’t cover the look of the application.  Instead, it verifies the correct functionality of the controls in the application.  It’s possible to have your UI tests reporting a 100% success rate when nothing is shown on the screen.  As long as the controls are accessible in the way you specify in your tests, they’ll run.

So then, what can be done about automated UI testing?  It’s obviously very valuable to have, despite the difficulties.  Here are a few tips and tricks, as well as some traps to avoid.

Name Everything:

In web applications, you can give elements IDs or names.  In regular Windows apps, you can use SWA (System.Windows.Automation) or MSAA (Microsoft Active Accessibility) to identify things.  At any rate, anything you interact with should be uniquely identifiable in some way.  If you’re a developer, do this.  If you’re a tester, get your devs to do this.  If they refuse, do it for them.  Naming things will tend to make your automation resilient in the face of most general changes.  Bits and pieces can move around, but as long as they’re named the same and work the same way, your test will probably survive.

You don’t have to give a completely unique identifier to absolutely everything.  What I’ve found that tends to work well is giving logical groups a unique id for the current window or page, then naming repeated controls.  Consider, for example, a page of results from a search engine.  You’ll have a search box at the top of the page and at the bottom, and you’ll have multiple sets of results in the middle area.  Give the logical areas unique IDs, like “SearchBoxTop” and “SearchBoxBottom” for the search boxes, and “MainResults”, “AdResultsRight” and “AdResultsTop” for the result sections.  Then, those areas can share names across them.  For instance, I don’t really care that I’m dealing with the top search button or the bottom search button specifically.   All I need at that point is “Button”.   “Button” can be used as a name for fifteen controls on the page, but I already know that it’s the top search button I’m using because I got it in the context of SearchBoxTop. 
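Here’s a minimal sketch of that lookup scheme (in Python for brevity; the helper and the page model are made up for illustration).  The page is modeled as containers with unique IDs, each holding controls whose names may repeat across containers — so “Button” is unambiguous once you’ve scoped it to “SearchBoxTop”:

```python
# Made-up model: {container_id: {control_name: control}}.
# Container IDs are unique per page; control names may repeat.
def find_control(page, container_id, control_name):
    """Return the control named control_name inside the container
    with the given unique ID, or None if either lookup fails."""
    container = page.get(container_id)
    if container is None:
        return None
    return container.get(control_name)

page = {
    "SearchBoxTop":    {"Edit": "top search box",    "Button": "top search button"},
    "SearchBoxBottom": {"Edit": "bottom search box", "Button": "bottom search button"},
}
```

Both containers reuse the name “Button”, but `find_control(page, "SearchBoxTop", "Button")` still lands on exactly the control you meant.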

Turn UI Testing Into API Testing.  Sort Of…:

I’ve seen UI test code that’s an unreadable mess of copied and pasted bits to extract controls or elements followed by copied and pasted unreadable messes where the controls or elements are fiddled with followed by messes of bits that had been copied and pasted to the point of unreadability which extract results from controls or elements.  In fact, that’s what pretty much any test recorder will spit out at you.  It’s a total nightmare to look at and deal with even on a good day, and if you’re looking at it and dealing with it, chances are it’s not a good day.  Chances are all your tests broke last night and now you have to dig through a hundred separate tests and repair the element extraction code in each one of them, all because your UI developer made a “quick change” from tables to divs in the page layout.  Even though the page looks identical, the entire structure is different now and nothing is going to work.

I mentioned in the intro that API testing was relatively easy, because it’s typically well defined what you’re doing and how things are expected to function.  Things may fail, but usually they’ll fail in somewhat predictable ways.  Well, the best way I’ve found to make UI testing easier is to make it closer to API testing.  Wrap the code that interacts with the UI that you’re testing in classes and functions that behave somewhat predictably and expose the bits and pieces of the UI in ways that make sense.1  I prefer to create a class with ordinary properties or methods that operate on a web page or dialog or whatever.  Going back to the web search example, you’ll have a page with a text box and a button next to it.  That translates to a simple class along these lines:2

public class SearchPage
{
    public string SearchText { get; set; }
    public void ClickSearch() { /* find the button and click it */ }
}

Then it’s up to the SearchPage class to determine how to find the text box, how to click the button, and to deal with all of the nonsense and WTFery that the UI throws at you.  Your test case that needs to do a search then only needs a few lines:

    SearchPage page = new SearchPage();
    page.SearchText = "nuclear manatee seesaw detector";
    page.ClickSearch();

In that example, it should be clear to anyone looking at the code what’s going on.  It’s not full of element paths and SWA control patterns.  I’m just setting the text of the search box and clicking a button.  Your test usually doesn’t care about the mechanics of getting the textbox filled in or what kind of stupid tricks are required to click the button, and it shouldn’t.  Doing it this way means that it won’t.  And then the next time the devs make a “quick change” that breaks everything, you only have to make a “quick change” yourself to the code of the wrapper classes and everything should be fixed.

Always Have A Plan B.  And A Plan C.  (And D…):

Successful UI Automation often requires hacks.  Not just hacks, but dirty hacks.  If you feel completely clean after writing UI automation, then there’s a good chance your tests won’t actually work.  Start by trying to do everything the “right” way, using the controls provided to you by SWA or the browser DOM or what have you.  They’ll work, most of the time.  Unfortunately, every so often you’ll run into a button or a dialog that just doesn’t behave.  Sometimes there are security measures put in place to prevent automated tasks from doing certain things, for instance downloading files in a browser.  You have to be ready to defeat whatever is thrown in your way.  Remember, you’re dealing with UI elements, so if you have to, you can act like an actual user.  Can’t “click” a button using SWA’s InvokePattern?  Try simulating a mouse click or sending keystrokes to the application (Space or Enter will usually activate a button that has focus).  Hell, if you need to, don’t be afraid to buy a Lego Mindstorms kit and build a robot that can click a physical mouse button for you.
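The Plan B/C/D idea can be sketched as a fallback chain (Python for brevity; the two strategy functions here are stand-ins for real ones like InvokePattern, a simulated mouse click, or keystrokes):

```python
# Try progressively dirtier ways to press a button until one works.
def click_with_fallbacks(strategies):
    """Try each (name, strategy) pair in order; return the name of the
    first one that succeeds, or raise if every strategy fails."""
    errors = []
    for name, strategy in strategies:
        try:
            strategy()
            return name
        except Exception as exc:      # a strategy failing is expected here
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all click strategies failed: " + "; ".join(errors))

def invoke_pattern():
    # Stand-in for the "clean" way that sometimes just doesn't work.
    raise RuntimeError("element does not support Invoke")

def send_mouse_click():
    # Stand-in for the dirtier fallback that does.
    pass

result = click_with_fallbacks([
    ("InvokePattern", invoke_pattern),
    ("mouse click", send_mouse_click),
])
```

The test only learns which plan finally worked; it never has to know how dirty things got along the way.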

SendKeys is Your Worst Enemy: 

Available through the Windows API, as well as exposed in the .NET Framework, there’s a function called “SendKeys”.  It lets you send keystrokes to windows.  The application will then respond as if an actual user pressed the keys.  You can use keyboard shortcuts, tab through dialogs, type text into textboxes.  Pretty much anything a user can do from a keyboard, you can do with SendKeys.  It might be tempting to write all of your UI automation using SendKeys, but don’t.  Just don’t.  SendKeys is one of the least reliable and most fragile ways to try to interact with your software.  It won’t survive any kind of change to the interface, and even when it is set up properly, it doesn’t always work right.  Keys will get lost or come early or late, and if the focus changes at all for some reason, you’re screwed.

SendKeys is Your Best Friend: 

When all else fails, SendKeys will get the job done.  I once ran across a pretty normal looking Windows dialog, with normal looking buttons.  Unfortunately, for whatever reason, the dialog refused to respond to any kind of standard attempts to reach it.  I tried SWA first, and although I could find the button I wanted to click, Invoking it did nothing.  So I tried sending the button a Windows Message to tell it that it had been clicked.  Still nothing.  Then I tried setting its focus and sending the Enter key and still nothing.  In the end, what worked was SendKeys("{Right}{Right}{Enter}"), which selected the button and triggered it completely from the keyboard.  Not a happy solution by any means, but it worked and that’s all that matters.  It’s definitely worth learning its syntax for those obscure cases where you need to hold down ALT for twenty seconds or whatever.3

Beware of “Don’t Show This Dialog Again” and Similar Conditions: 

You know that dialog option.  It’s everywhere and you always check it.  It turns off stupid things like the “Tip of the Day” or warnings about the mean and scary hackers that want to steal your life on the Internet.  And it will come back to bite you when you try to do UI automation.  You’ll write your tests on your machine and they’ll run beautifully.  Then you’ll put them on your automation box and they’ll fall apart because there’s some window or dialog that appears that you had long forgotten about.   You’ll need to alter your test to take into account the possibility that an optional dialog might be there and handle it if it is or move along quickly if it isn’t.  Speaking of which…
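The shape of the fix is small (Python sketch; `find_window` and `dismiss` are hypothetical hooks standing in for whatever window lookup and button-clicking you actually use):

```python
# Handle a dialog that may or may not appear: dismiss it if it showed up,
# carry on immediately if it didn't.
def dismiss_if_present(find_window, dismiss):
    """find_window returns the optional dialog or None after a short
    search; dismiss closes it. Returns True if a dialog was handled."""
    dialog = find_window()
    if dialog is not None:
        dismiss(dialog)
        return True
    return False
```

The important part is that both branches are normal: a missing “Tip of the Day” dialog is not a failure, and a present one is not a surprise.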

Waiting, Waiting, Waiting…: 

Pretty much any piece of UI automation will have some kind of timing dance.  Normal API testing is usually synchronous.  You call a method and are blocked until it returns or have some clear way of waiting for an asynchronous operation to complete.  This is often not the case with UI automation.  After all, you’re trying to run something that doesn’t know about you and doesn’t care about your schedule.  As a human, you click an icon, wait a few seconds, and continue when the application has finished opening.  You can’t just do that with your automated test.  You click, fine, that’s easy.  Then what?  You have to wait, but for how long?  One second?  Two?  What happens if your virus scanner kicks on while the test is running and now it takes ten seconds to open the application?  It’s ridiculous to force your test to wait for ten seconds every time just in case something goes wrong, but it’s equally bad to only wait one second and fail the test one out of ten times when something does go wrong.  The common solution is to poll, looking for something you expect, like a window with a certain title.  Every 100 ms or so, see if the window (or whatever) you’re waiting for is there yet.  But don’t wait forever, because if it doesn’t show up, you don’t want to be stuck.  Use a reasonable timeout that you’re willing to wait before giving up and fail after you reach that point.
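That poll-with-timeout loop is worth writing exactly once and reusing everywhere (Python sketch; the condition is any zero-argument callable — “is the window there yet?”, “does the file exist yet?”):

```python
import time

def wait_until(condition, timeout=10.0, interval=0.1):
    """Poll condition every `interval` seconds until it returns True or
    `timeout` seconds have passed. True on success, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return False
```

Then a test says `wait_until(lambda: find_window("Results") is not None)` and either moves on the instant the window appears or fails cleanly at the deadline, instead of sleeping a fixed ten seconds every single run.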

Okay, so you’ve waited for the window to show up, so you can continue with your test.  CRASH!  Well, sure, the window is there, but the control you’re trying to use won’t actually be visible for another 20 ms, so your test dies in a fire.  Watch out for things like that.

Wherever possible, use some indicator within the application itself as a guide for when something is done.  If your app has a status bar that reads “Working” when it’s working and “Done” when it’s done, then watch that status bar text for a change.  If a file is supposed to be written, then look for that file.  You have to be careful, though; don’t always trust the application outright.  As you’ll soon see, the application isn’t always telling you what you really want to know.

My absolute favorite brainbender of a timing issue is dealing with the IE COM object that lets you run browser automation through the IE DOM.  With this COM object, your commands are shipped off to be executed in another process, largely asynchronously.  You call the navigate method to open a web page.  Obviously, since you’re opening a web page, that can take some time, so you make sure that you wait for the page to finish loading before you begin the test.  Your test runs perfectly about 70-80% of the time.  But there’s that remaining chunk where your test reports that it can’t find the page element you’re trying to use.  So, you watch the test run.  It opens the browser and navigates to the page, the element is clearly present on the page you see, yet your test whines that it’s missing and it dies.  WTF?  As far as you can tell, everything is doing exactly what it should be doing except for the failing miserably part somewhere in the middle.  You step through in a debugger, hoping to catch the bug in action, but it works every time.

Here’s where it gets fun:  The IE instance that you’re driving lives in another process and operates on its own time.  You send off an asynchronous request to load a page, then almost immediately thereafter, you ask it if it’s done.  Most of the time, the browser will say “Not yet”, and your test goes to sleep.  But, once in a while, the browser responds, “Yeah, I’m done” on that first request.  You continue, and die because obviously it hasn’t loaded your page yet.  Why is it saying that it has?  Well, you’re not asking if it’s loaded the page you’re looking for.  You’ve asked if it’s done loading.  It says “Sure”, because as far as it’s concerned, it is done…  It’s done loading the LAST page you sent it to.  It hasn’t even started loading the page you just told it to go to.
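The fix is to stop asking “are you done?” and start asking “are you done with *my* page?”.  Here’s a Python sketch of that race and its cure — `FakeBrowser` is a made-up stand-in for the IE COM object that briefly answers for the previous page right after a navigate:

```python
class FakeBrowser:
    """Stand-in for the async IE COM object. Right after Navigate it
    briefly reports Busy == False (it's answering for the LAST page),
    then it's busy loading for a while, then it's genuinely done."""
    def __init__(self):
        self.LocationURL = "about:blank"
        self._pending = None
        self._stale_polls = 0   # polls where Busy answers for the old page
        self._load_polls = 0    # polls where the new page is still loading

    def Navigate(self, url):
        self._pending = url
        self._stale_polls = 1
        self._load_polls = 3

    @property
    def Busy(self):
        if self._stale_polls > 0:
            self._stale_polls -= 1
            return False        # "Sure, I'm done" -- with the PREVIOUS page
        if self._load_polls > 0:
            self._load_polls -= 1
            return True
        if self._pending is not None:
            self.LocationURL = self._pending
            self._pending = None
        return False

def wait_for_page(browser, url, max_polls=100):
    """The cure: wait until the browser is idle AND actually sitting at
    the page we asked for, not merely 'not busy'."""
    for _ in range(max_polls):
        if not browser.Busy and browser.LocationURL == url:
            return True
    return False
```

A naive `while browser.Busy: sleep()` loop sails straight through that first stale “I’m done” answer and dies 20-30% of the time; checking the URL as well makes the wait mean what you thought it meant.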

Debugger == FAIL:

Stepping through your automated UI test case in a debugger is a blueprint for fail.  It won’t work right.  It just won’t.  Your test is happily humming along, driving controls, setting text, having fun, when all of a sudden, a breakpoint is hit.  Your trusty debugger IDE comes to the foreground and you tell it to step to the next line.

Where are you now?

The debugger stole focus.  Does it give focus back to the window you were at before?  Does it give it back to the same control?  When the debugger steps in, it FUBARs the state of your test.  Things might work.  Maybe.  Then again, your test might go completely off the rails and start opening menus and typing things in whatever application you land in.  It could go catastrophically wrong, and while it’s often entertaining to sit back and watch your computer flip out, it usually doesn’t help you solve the original problem.

You can try stepping through an automated test using a debugger, but dust off your Console.WriteLine or printf debugging skills, because there’s a good chance you’ll need them.

Make Sure You Have A UI To Test:

Standard operating procedure for automated tests in a Continuous Integration environment is to have some automated process kick off your tests in response to a check-in or a build.  Trouble is, these automated processes typically live as a service or a scheduled task on a machine hidden in a closet that no one ever logs in to.  If no one is logged into a machine, then there’s a good chance that the application you’re trying to test won’t be running in an interactive window station, and if it’s not in an interactive window station, then your application probably won’t have things like, oh, windows.  It’s pretty hard to test a UI when the UI doesn’t exist.  Make sure that you’re running your UI tests somewhere that they’ll have an interactive window station.  If you can leave a machine unlocked and open all the time, then that’s the easiest thing to do.  Unfortunately, things like “Corporate Computing Security Policies” tend to get in the way of you getting done what you need to do.   If you can’t leave a machine unlocked, then another possible solution is to use a virtual machine.  It’s not as scary to set up a simple VM as it might sound initially4, and it’s possible to have a VM running and unlocked and with a nice shiny interactive window station, even on a physical box that no one is logged in to.

Now, some of you might be thinking of using Remote Desktop to solve your problems, but good luck with that.  I’ve found that any place big enough to have a computing security policy that prohibits unlocked machines also tends to have a computing security policy that will log you out of inactive remote sessions.  Even without a policy, remote desktop sessions tend to log themselves out when they get bored.  And, to top it all off, even if you don’t get logged out, I’ve had problems with UI things over Remote Desktop, so use at your own risk.  You might have better luck than I did…

Beware of Outside Influences:

With direct API or service testing, you’re usually insulated from whatever’s happening on the machine.

“Windows has just updated your computer and it will restart automatically in 3, 2, 1…”

Unfortunately, that’s not the case with UI automation.

“There is an update available for Flash.  Download NOW!”

You’re much more at the mercy of unexpected windows and popups and dialogs.

“This application has encountered an error and will be shut down.”

There’s not much you can do about it.

“You need administrative rights to perform this action.  Allow?”

You can always try to eliminate or tune down the things that you can predict, but there will always be the unexpected willing to come along and bite you.

“You need administrative rights to allow this action to obtain administrative rights to perform this action.  Allow?  Are you REALLY sure this time?”

Basically, your only option is to be defensive.  You can’t always recover from some random dialog or other interference5, but you can make sure that your tests don’t hang forever and at least report that something went wrong.  If you’re looking for a window or a control, don’t look forever.  If it’s not there within a minute, it ain’t coming, so kill the test and move on.  And whenever possible, have your tests take screenshots of unexpected failures.  You’d be amazed how much frustration you’ll avoid if you have a screenshot that clearly shows what went wrong.
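The screenshot-on-failure habit is easy to bake into a wrapper once (Python sketch; `take_screenshot` is a placeholder for whatever capture mechanism you have, injected here so the wrapper itself stays portable):

```python
def run_with_screenshot_on_failure(test, take_screenshot):
    """Run a test callable; if it throws, grab a screenshot before
    re-raising so the failure report shows what the screen looked like."""
    try:
        test()
    except Exception:
        take_screenshot()
        raise
```

The failure still fails — you just get a picture of the random dialog that caused it instead of an inscrutable “element not found” in a log at 3 AM.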

Take Screenshots Whenever Possible If Something Goes Wrong:

Yeah, I know I just said that above, but it needed its own headline.

Sometimes It’s Just Plain Flaky:

Even when you’ve tailored the environment to be exactly what the test needs, even when you’ve taken care of all the stupid timing issues and dialog interference, even when it should work, sometimes, it just won’t work.  UI testing should always be treated as your sworn enemy because it hates you.  And there’s nothing you can do about it.  A good rule to live by: if a UI test fails once, run it again; if it fails twice, run it once more; and if it fails a third time in a row, it’s an actual bug in the software.  It is a waste of time to attempt to get UI automation running flawlessly 100% of the time.  Shoot for 90% and call it a day.  You’ll find more bugs in the software if you write 30 slightly imperfect tests than if you spend all that time writing 5 perfect tests.
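That three-strikes rule fits in a few lines of harness code (Python sketch; the test is any callable that raises on failure):

```python
def run_flaky_test(test, max_runs=3):
    """Run a flaky UI test up to max_runs times. Return True if any run
    passes; False only if the failure reproduces on every single run
    (i.e., it's probably a real bug, not UI-automation weather)."""
    for _ in range(max_runs):
        try:
            test()
            return True
        except Exception:
            continue
    return False
```

A test that passes on its second run gets quietly forgiven; one that fails three times in a row gets reported as a genuine failure worth a human’s attention.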

And When All Else Fails…


  1. And really, if you’re an SDET, you already have been thinking of a solution of some form along these lines.  If not, then give the D in your title back and get the hell out of my pay grade because you have no business calling yourself a Software Development Engineer, in Test or otherwise.
  2. I actually follow a slightly more complicated model where I have wrapper classes for the controls, too.  For this SearchPage example, I’d actually have something like a “UITextBox” class or interface with a Text property, and a “UIButton” class with a “Click” method and a “Text” property, etc.  This lets me expand the functionality of the controls without having to change the container class.  (For instance, if I need a “Focus()” method on the button, I just add it on my “UIButton” class and it’s accessible on every button I have.)  Additionally, it allows for subclassing/inheritance, so if I have a stupid button that requires keystrokes to press, I can have “StupidButton” derive from UIButton, then make the class return a StupidButton instance, and the test cases are none the wiser.
  3. Helpful tip:  If you want to send a space, call SendKeys(" ");.  Seems obvious now, but it’s amazing how your mind shuts out that possibility when you’re trying to do it.
  4. MS gives away Virtual PC and Virtual Server for free, and chances are you have an OS install disc around somewhere, and that’s all you need.
  5. Keep in mind that interference need not be from the system.  If you’re running on an open machine somewhere, it has a bad habit of being used to check Facebook in the middle of a test run, and that’s not good for your UI driving automation…


1 Neff { 01.22.10 at 18:56:57 [164] }

This was the most entertaining article in UIA that I have read! I’m just starting to get involved with UI Automation and your article has opened my eyes to many things that I hadn’t thought about. Thanks. All the anecdotes show that you have been in the trenches for a while. Thanks for sharing your experiences.

2 Dean Cornish { 12.10.10 at 05:07:10 [588] }

As a former MS SDET this made me lol.
Also recommend a leaf out of Michael Feather’s book: sensing variables are a must for any automator.

3 Trish Khoo { 12.12.10 at 21:20:20 [264] }

Awesome post. One of the best descriptions of UI automation I’ve seen, with really useful advice. Please write more!

4 A Smattering of Selenium #36B « Official Selenium Blog { 01.11.11 at 05:01:57 [584] }

[…] UI Automation: Tricks and Traps has, well, umm, tricks to try and traps to avoid. Shocking. I know. […]

5 Randy Webb { 01.12.11 at 23:19:52 [347] }

Good informative post. I was also amused by the vitriol you expressed in footnote 1, right after you were so empathetic of people who think it’s “scary” to set up a virtual machine. IT class wars are hilarious :)

6 Alex { 12.16.11 at 00:51:41 [410] }

Really good article. Funny yet very helpful. I’m just a junior in UIAutomation, but this helps me realize what I have to do from now on to avoid so many mistakes. Thanks!

7 Katya { 06.13.13 at 00:38:22 [359] }

So much relevant and helpful! Thank you.
