For machine-learning (ML) scientists to train artificial intelligence (AI) systems and algorithms, they need data. They collect many of the datasets they use to build AI systems from human behaviors and people’s interactions with the technology they use in their everyday lives.
Whether the data comprise a set of liver-disease diagnoses and outcomes, come from a consumer survey on attitudes toward marijuana use, or derive from active or passive collection of spoken phrases, AI systems always need training data to ensure their algorithms produce the right outcomes. Such data can be hard to come by. And even once ML scientists have acquired a dataset, how can we be sure that it includes what an AI system needs?
3 Types of Machine Learning
In general, there are three kinds of ML techniques for constructing AI systems, as follows:
supervised learning—In this approach, scientists feed an algorithm a dataset of labeled examples, such as text, numbers, or images, then calibrate the algorithm to recognize a certain set of inputs as a particular thing. For instance, imagine feeding an algorithm a set of pictures of dogs, in which each picture exhibits a set of features that correspond to the properties of pictures of dogs. Inputs to the algorithm could also include a number of images that are not dogs, such as pictures of cats, pigeons, polar bears, pickup trucks, or snow shovels, along with the corresponding properties of each of the not-dog images. Then, based on what the algorithm has learned about classifying images as dog or not dog through these features and properties, if you show the algorithm a picture of a dog it has never seen before, it can identify that it is, in fact, a picture of a dog. The algorithm is successful when it can accurately recognize an image as a dog and reject images that are not dogs.
unsupervised learning—This approach attempts to find classes of similar objects in a dataset based on each object’s properties. Once scientists give an algorithm a set of inputs that have certain parameters and values, the algorithm attempts to find common features and group similar objects together. For example, scientists might feed an algorithm thousands of pictures of flowers with various tags such as color, stem length, or preferred soil. The algorithm is successful if it can group all flowers of the same type.
reinforcement learning—This approach trains an algorithm through a series of positive and negative feedback loops. In classic lab studies, behavioral psychologists used such feedback loops to train pigeons. Reinforcement learning is also how many pet owners train their animals to follow simple commands such as sit or stay, rewarding them with a treat or reprimanding them with a no. In the context of machine learning, scientists show an algorithm a series of images, then as the algorithm classifies images of, say, penguins, they confirm the model when the algorithm properly identifies a penguin and adjust it when the algorithm gets it wrong. When you hear about bots on Twitter that have gone awry, it is typically because reinforcement learning has gone wrong: the bots have learned to classify examples incorrectly, even though the system believes its classifications are correct.
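Since supervised learning is the focus of the rest of this article, here is a minimal sketch of the approach in Python: a toy nearest-neighbor classifier. The feature names (ear floppiness, snout length, leg count), the numbers, and the data are all invented for illustration; a real system would learn from thousands of labeled images, not a handful of hand-picked feature vectors.

```python
# A toy supervised classifier: 1-nearest-neighbor over hand-made feature
# vectors standing in for image properties. All features and data here
# are invented for illustration.

def distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify(sample, training_data):
    """Return the label of the closest labeled training example."""
    label, _ = min(
        ((lbl, distance(sample, feats)) for feats, lbl in training_data),
        key=lambda pair: pair[1],
    )
    return label

# Labeled training set: ((ear_floppiness, snout_length, leg_count), label)
training_data = [
    ((0.9, 0.7, 4), "dog"),      # floppy ears, long snout, four legs
    ((0.8, 0.8, 4), "dog"),
    ((0.2, 0.3, 4), "not dog"),  # a cat, say
    ((0.0, 0.1, 2), "not dog"),  # a pigeon
]

# An unseen example with dog-like features is recognized as a dog.
print(classify((0.85, 0.75, 4), training_data))  # dog
```

The same mechanism rejects non-dogs: a sample closer to the cat or pigeon vectors comes back as "not dog".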
Although all ML techniques are useful and applicable in various contexts, in the remainder of this article, we’ll focus on the role of supervised learning in User Experience.
All Data Are Not Equal
Obtaining good training data is the Achilles’ heel of many ML scientists. Where does one get this type of data? Surprisingly, there are many sources that provide access to thousands of free datasets. Recently, Google launched a search tool that makes finding publicly available datasets for ML applications easier. But it is important to note that many of these datasets are very esoteric, for example, “Leading Anti-aging Facial Brands in the U.S. Sales 2018.” Nonetheless, datasets are becoming more accessible.
However, many datasets that are relevant for ML applications have limitations such as the following:
They might not have precisely what ML researchers are seeking—for example, videos of elderly people crossing a street.
They might not be tagged appropriately or usefully with the metadata that is necessary for ML use.
Other ML researchers might have used them over and over again, so models trained on them risk overfitting to familiar benchmarks.
They might not represent a rich, robust sample—for example, a dataset might not be representative of the population.
They might lack enough examples.
They might not be very clean—for example, they could have lots of missing values.
As researchers often say: all data are not equal. The inherent assumptions and context associated with a dataset often get overlooked. If scientists do not give sufficient care to a dataset’s hygiene before plugging it into an ML system, the AI might never learn, or worse, could learn incorrectly, as we described earlier. In cases where the quality of the data is suspect, it’s difficult to know whether an algorithm’s learning is real or accurate. This is a huge risk.
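As a simple hedge against dirty data, a team can audit a dataset’s completeness before training. The sketch below counts missing values per field and flags any field that falls below an assumed 90-percent completeness threshold; the field names, records, and threshold are all invented for illustration.

```python
# A quick data-hygiene check before training: measure how complete each
# field is across all records, and flag sparse fields. Field names,
# records, and the 0.9 threshold are invented for illustration.

def completeness(records, fields):
    """Fraction of non-missing values per field across all records."""
    return {
        f: sum(1 for r in records if r.get(f) is not None) / len(records)
        for f in fields
    }

records = [
    {"age": 34, "smile": "wry", "ethnicity": None},
    {"age": 51, "smile": None, "ethnicity": "Latino"},
    {"age": None, "smile": "open", "ethnicity": "Asian"},
    {"age": 27, "smile": "open", "ethnicity": "White"},
]

scores = completeness(records, ["age", "smile", "ethnicity"])
flagged = [f for f, s in scores.items() if s < 0.9]
print(flagged)  # ['age', 'smile', 'ethnicity'] -- too sparse to train on as-is
```

A check like this does not make the data clean, but it surfaces problems before they silently degrade an algorithm’s learning.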
Knowing what we now know about machine learning and the risks and limitations of datasets, how can we mitigate these risks? The answer involves User Experience.
User Experience and Machine Learning
While not all datasets relate to human behavior, the vast majority of them do. Therefore, understanding the behaviors that the data capture is essential. Over the last decade, several companies have engaged us to collect the precise examples and attribute tags necessary to train or prove out AI algorithms. (In some cases, there were thousands of samples.) Here are some of the types of samples we’ve worked with:
video samples of people doing indoor and outdoor activities
voice and text samples of doctors and nurses making clinical requests
video samples capturing the presence or absence of people in a room
video and audio samples of people approaching a front door
thumb-print samples from specific populations
Note that none of these data were available publicly. We had to acquire each of the datasets we needed through custom research, with specific intentions and research objectives.
At first glance, the sheer magnitude of the data that ML applications require screams non-UX techniques. For many scientists and researchers, the simple answer to this challenge is to use quantitative methods of acquiring data. But the clients who commissioned our projects understood a key shortcoming of these methods: low data integrity. Our project sponsors recognized that the underlying data had to be precise, especially when the nuances of captured human experiences must be considered carefully. We needed to observe and collect the behaviors in context, not simply ask for a number on a five-point scale, as is often the case in quantitative data collection.
Capturing behavior is the prerogative of User Experience and requires research rigor and formal protocols. What we learned is that User Experience is uniquely positioned to collect and code these data elements through our research methodologies and expertise in understanding and codifying human behavior.
User Experience Measures Behavior
To measure behavior, follow this process:
Identify the objective. To construct the conditions necessary for capturing user experiences, the first task is to understand what the ML researchers really need. What is the objective? What constitutes a good sample case? How much variability across cases is acceptable? What are the core cases and what are the edge cases? If we wanted to collect 10,000 pictures of people smiling, is there an objective definition of a smile? Does a wry smile count? With teeth or without? What age ranges of subjects? What genders and ethnicities? Facial hair or clean shaven? Different hair styles? And so on. ML researchers need to define both the in-scope and out-of-scope cases clearly and get all parties to agree on them.
Collect data. Next, plan for data collection. One of the strengths of UX researchers is the ability to construct and execute large-scale research programs that involve humans. Collecting masses of behavioral data face to face, efficiently and effectively, is beyond the experience and expertise of many ML researchers. In contrast, much of the practice of user research is about setting the conditions necessary to get unbiased data. Being able to recruit participants, obtain facilities, get informed consent, instruct participants, and collect, store, and transmit data is essential. UX researchers can also collect all the necessary metadata and attach it to the examples for additional support. They are practiced in the art of sorting, collecting, and categorizing data, as is evident from a skillset that includes qualitative coding and the many tools that support this type of analysis.
Do further tagging. After initial data collection, it may be necessary to organize and execute a crowdsourcing program, using a platform such as Amazon’s Mechanical Turk, to further augment the data you’ve collected so far. For instance, if we were to collect voice samples of how a person might order a decaf, skim, extra-hot, triple-shot latte in a noisy coffee shop, several properties of each sample could be of interest. In such cases, we might employ multiple researchers, or coders, to review each sample, transcribe it, and judge it for clarity and completeness. These coders would then have to resolve any observed differences to ensure the cleanliness of the coding.
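A common way to quantify how well multiple coders agree before they resolve their differences is an inter-rater reliability statistic such as Cohen’s kappa, which corrects raw agreement for chance. The sketch below computes it for two hypothetical coders judging samples as clear or unclear; the labels are invented for illustration.

```python
from collections import Counter

# Cohen's kappa for two coders labeling the same samples. The example
# labels ("clear"/"unclear") are invented for illustration.

def cohens_kappa(coder_a, coder_b):
    """Agreement between two coders, corrected for chance agreement."""
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    labels = set(coder_a) | set(coder_b)
    # Chance agreement: probability both coders pick the same label at random.
    expected = sum(counts_a[l] * counts_b[l] for l in labels) / (n * n)
    return (observed - expected) / (1 - expected)

coder_a = ["clear", "clear", "unclear", "clear", "unclear", "clear"]
coder_b = ["clear", "clear", "unclear", "unclear", "unclear", "clear"]

kappa = cohens_kappa(coder_a, coder_b)
print(round(kappa, 2))  # 0.67: substantial agreement, but differences remain
```

A low kappa signals that the coding scheme itself may be ambiguous, which is exactly the kind of problem coders must resolve before the labels are clean enough to train on.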
These are just a few of the many reasons why UX researchers are uniquely positioned to bridge the gap between ML scientists and the collection, interpretation, and use of human-behavior datasets in AI algorithms. Applying User Experience in this domain can help protect us against the limitations of available datasets for AI and prevent the use of inconclusive, useless, or incorrect datasets whose limitations might not be obvious, whether the issues lie with the data or with the algorithm. UX researchers are well positioned to help ML scientists collect clean datasets for training and testing AI algorithms.
Bob has more than 25 years of experience, working in both corporate and academic environments. After a decade doing human factors at telecom and travel companies, he joined the UX consultancy User Centric, helping it become the largest private UX consultancy in the US. In 2005, he co-founded the UX Alliance, the premier network of global UX agencies. After selling User Centric to GfK, he continued to lead GfK’s North American UX team until 2016, when he took a brief detour to lead GfK’s Consumer Insights business. In 2018, Bob left GfK and co-founded Bold Insight, where he is Managing Director. He is a frequent presenter at national and international conferences. He is the author of The Handbook of Global User Research, has written dozens of publications, and is the inventor on several patents. Bob is an adjunct professor at Northwestern University. He has a PhD in Cognitive and Experimental Psychology from the University of Illinois at Urbana-Champaign.
Gavin has 25 years of experience, working in both corporate and academic environments. He founded the UX consultancy User Centric, growing it to become the largest private UX consultancy in the US. After selling the company, he continued to lead its North American UX team, which became one of the most profitable business units of its parent organization, until 2018. Gavin then founded Bold Insight, where he is Managing Director. He is a frequent presenter at national and international conferences and the inventor on several patents. He is an adjunct professor at DePaul and Northwestern Universities and a guest lecturer at the University of California, San Diego’s Design Lab. Gavin has an MA in Experimental Psychology from Loyola University.