Posts Tagged ‘Design’

Problems with Design Testing

Monday, March 22nd, 2010

My work at OkCupid gave me the so-far-unique opportunity to help design research and development methodologies against an enormous user base. Some of my favorite work came when the rest of the frontend team and I were tasked with improving signup conversion rates, which is the endless obsession of any website with membership. OkCupid had been poking at it for years, but this was the first time we dedicated the whole team exclusively to optimizing signups.

The process is, I imagine, what any other audience-testing process is: research, research, research; put out a few designs; measure the conversion rate of each; take the best performer; theorize as to why it outperformed the others; repeat.

My final design won that particular round (woot!), but I encountered an interesting problem. My own theories about signup seemed to pan out:

  • Keep each section short. This was key for the OkCupid signup, since we required eight pieces of information at the outset, and eight items plus a captcha on one page can be a bit intimidating.
  • Order the items from simple and general to difficult and personal. I started with gender and orientation, and ended with username, password, email and captcha. The idea is to get people invested in the process with simple questions, so they’re already committed to finishing the form by the time they get to questions where they have to think (username) or cough up info they might be wary about (email).
  • Give them a taste of the site as they go through it. My particular design was nicknamed “adaptive” because once it had gender and orientation, it immediately found user profiles and displayed them alongside the signup form. Once it got to the “what are you looking for?” field, if the user only checked non-dating options (pen pals, friends, activity partners), it would stop promoting user profiles and start promoting the tests, forums, and other time-killers and social features of the site.
  • Keep it light. OkCupid allowed for a healthy sense of humor, and the signup process reflected that. If people have begun signing up, something in the site’s personality presumably appealed to them, and the signup process should keep expressing it. I have a strong negative reaction to sites that establish a particular feel, then abandon it when they have to put together complex forms. That’s deadly to a signup process.

So that all sounded valid, and seemed to pay off. Once the adaptive design started outperforming the other trials, I got interested in tweaking it. As any good scientist would when comparing data, I had to start isolating elements: change the wording of a particular question, put three questions on the opening page instead of two, show two matches instead of three, keep promoting user profiles even if people say they don’t want to date, and so on.

I made various tweaks in this vein, but eventually, we were running over a hundred trials, and starting to compete for dwindling segments of daily signups with which to test. With thousands of users to work with, you can make statistical comparisons in a day; with a few dozen, it takes a week, and we just had to move on.
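The day-versus-week difference falls out of basic power math. As a rough sketch (the conversion rates and thresholds here are invented for illustration, not OkCupid’s actual numbers), the normal-approximation formula for a two-proportion test shows how quickly the required sample per variant grows as the detectable lift shrinks:

```python
from math import sqrt

def sample_size_per_arm(p1, p2, alpha_z=1.96, power_z=0.84):
    """Approximate signups needed per variant to detect a conversion
    change from p1 to p2 (two-sided alpha=0.05, power=0.80, normal
    approximation to the two-proportion test)."""
    p_bar = (p1 + p2) / 2
    numerator = (alpha_z * sqrt(2 * p_bar * (1 - p_bar))
                 + power_z * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return numerator / (p1 - p2) ** 2

# Detecting a 40% -> 44% lift takes roughly 2,400 signups per arm;
# a subtler 40% -> 41% lift takes roughly 38,000.
print(round(sample_size_per_arm(0.40, 0.44)))
print(round(sample_size_per_arm(0.40, 0.41)))
```

With thousands of daily signups per trial, the big differences resolve in a day; once the traffic is sliced across a hundred trials, each arm gets a trickle and the same comparison takes a week or more.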

There’s a problem with this kind of testing, which is that you’re not testing what my statistics-enthusiast friend calls interaction effects. Briefly, the problem is this:

I test three color themes: red, blue, yellow. Red gets the most signups, so I go with a red theme. Then I test three variations on question text: humorless, witty, and absurd. Witty does best. Now I have a witty red theme, and go on to test something else.

What I don’t know is whether a humorless green theme would have done better. I don’t even know whether a witty blue theme would have done better. Basically, I can’t determine if I’m pursuing the best combination, because I’m not testing all possible combinations. With three colors and three writing styles, I could easily test the nine possible combinations, but it’s never three and three in real-life design scenarios; it’s thousands and thousands and thousands and thousands, and the number of combinations rapidly starts competing with those awesomely huge numbers (particles in the universe, connections in the brain) that people use to express “a whole bunch”. Not even Google in a million years could test it all.
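The trap above is easy to reproduce in miniature. In this sketch (the conversion rates are a made-up toy table, chosen to contain an interaction effect), testing one variable at a time settles on a witty red theme, while exhaustively testing all nine combinations finds a better design that the greedy procedure never tries:

```python
from itertools import product

# Toy conversion rates with an interaction effect (numbers invented):
# red is the best color given humorless copy, and witty is the best
# tone given red, but the best overall combination is neither.
rates = {
    ("red",    "humorless"): 0.30, ("red",    "witty"): 0.33, ("red",    "absurd"): 0.29,
    ("blue",   "humorless"): 0.28, ("blue",   "witty"): 0.30, ("blue",   "absurd"): 0.36,
    ("yellow", "humorless"): 0.27, ("yellow", "witty"): 0.31, ("yellow", "absurd"): 0.25,
}
colors = ["red", "blue", "yellow"]
tones = ["humorless", "witty", "absurd"]

# One variable at a time: pick the best color with the tone held
# fixed, then pick the best tone for that winning color.
best_color = max(colors, key=lambda c: rates[(c, "humorless")])
best_tone = max(tones, key=lambda t: rates[(best_color, t)])
greedy = (best_color, best_tone)

# Exhaustive search over all nine combinations.
best = max(product(colors, tones), key=lambda combo: rates[combo])

print(greedy, rates[greedy])  # ('red', 'witty') 0.33
print(best, rates[best])      # ('blue', 'absurd') 0.36
```

Nine cells can be searched exhaustively; the point is that real signup flows have far too many variables for the exhaustive column of this comparison to exist at all.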

There’s no way around this problem except to try to extrapolate theories from the tests and come up with new hypotheses. This is basic science. With the vast sample groups popular internet sites have to work with today, testing interface and aesthetic design has become more and more of a numbers game, since you can isolate any particular variable and find out which value performs best within the hour. But this method must be balanced with theoretical and more radically experimental approaches, because there’s no way to test the entirety of the staggeringly large set of possible combinations.