17. Evaluation with Users

Posted May 28, 2025 Updated Jun 19, 2025

By Mingyu An


Now, test an interface with real users! (The last one before the semester ends…)

Users are human beings… not simulators
- Ethics
- Responsibilities!

Types of User testing

Qualitative / Naturalistic
Quantitative / Experimental
Field study (only feasible when the prototype is highly polished)

Participant standpoint

Testing is a distressing experience.
- Pressure to perform
- Feeling of inadequacy
- Looking like a fool

Milgram’s Obedience Experiment

A study suggesting that an authority figure plus a social peer group influences more than 70% of people.

→ But it was not ethical.

Deceived participants.
Put them under more pressure than many believe was necessary.

Was it useful? → Did we learn anything that can be broadly applied? Or was it done just for fun, out of curiosity?

Was it ethical? → Was there no other, more ethical way? For example, explaining the whole situation up front, or trying role-play.

Treating Subjects with `respect`

Follow human subject protocols

Individual test results should be confidential
Users can stop the test any time
Users are aware of the monitoring technique
Their performance will have no implications for their life (e.g., it must not affect promotion)
Records will be anonymous
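
As one way to make the anonymity rule concrete, here is a minimal Python sketch that swaps names for opaque participant codes before results are stored. The function names, the `name` field, and the `P01` code format are assumptions made for illustration; they do not come from the lecture.

```python
import random


def assign_participant_codes(names, seed=None):
    """Map real names to opaque codes (P01, P02, ...) in shuffled order."""
    rng = random.Random(seed)
    shuffled = list(names)
    rng.shuffle(shuffled)  # shuffle so codes do not reveal sign-up order
    width = max(2, len(str(len(shuffled))))
    return {name: f"P{i:0{width}d}" for i, name in enumerate(shuffled, start=1)}


def anonymize(records, codes):
    """Replace the 'name' field of each record with its participant code."""
    return [
        {"participant": codes[r["name"]],
         **{k: v for k, v in r.items() if k != "name"}}
        for r in records
    ]
```

In practice the name-to-code mapping would be stored separately from the results (or destroyed), so that the records alone cannot identify anyone.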

Use a standard informed consent form

Conducting Experiment

Before the experiment

Have them read and sign the consent form
Explain the goal of the experiment

During the experiment

Stay neutral
Never indicate displeasure (no sighing, no "ugh, they are so bad at this, tsk tsk")

After the experiment

Debrief users
Inform users about the goal
Answer any questions!

Managing Subjects

Don’t waste users’ time
- Use pilot tests to debug experiments!
- Have everything ready
Make users comfortable
- Keep a relaxed atmosphere
- Allow breaks
- Pace tasks correctly
Compensation
- Pay them!

Concerns of User Testing

Internal Validity

The observed result is caused by the independent variables.
Confidence in our explanation
Usually good in an experimental setting
Watch for confounding variables

With no other influences, performance in the experiment depends on the independent variable (p

External Validity

Does it also apply in other settings and to other people?

Generalizability.
Confidence that the results apply in real situations

→ These two trade off against each other.

Reliability

If this experiment is repeated under the same conditions, will it produce the same result?
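
One common way to put a number on this question is test-retest correlation: run the same measurement twice on the same participants and check how strongly the two sets of scores agree. The sketch below is only an illustration under that assumption; the lecture does not prescribe a specific reliability measure, and `pearson_r` is a hypothetical helper name.

```python
from math import sqrt


def pearson_r(first_run, second_run):
    """Pearson correlation between two runs of the same measurement."""
    if len(first_run) != len(second_run) or len(first_run) < 2:
        raise ValueError("need two equally long lists with at least 2 scores")
    mean_x = sum(first_run) / len(first_run)
    mean_y = sum(second_run) / len(second_run)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(first_run, second_run))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in first_run))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in second_run))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("scores have no variance, correlation is undefined")
    return cov / (spread_x * spread_y)
```

A value close to 1 suggests the repeated run ranks participants in nearly the same way; a value near 0 suggests the measurement is not stable across repetitions.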

Considerations on…

Internal Validity

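Ordering effect
- X first? Or Y first?
- Learning effect!
- Fatigue…
Selection Bias
- We thought we had picked at random, but the resulting groups can still be biased
Experimenter Bias
- The experimenter tends to interpret results in favor of the outcome they want

→ Double-blind experiment (hire a conductor who does not know what my research topic is)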
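
The ordering and selection concerns above are often handled by shuffling participants and alternating which condition comes first (counterbalancing). That remedy is a common practice rather than something these notes state, so treat the following as a hedged sketch; `assign_orders` and the condition labels `X` and `Y` are illustrative names.

```python
import random


def assign_orders(participants, conditions=("X", "Y"), seed=None):
    """Randomly assign each participant a condition order, balanced across orders."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)  # random assignment guards against selection bias
    forward = list(conditions)
    backward = forward[::-1]
    # Alternate the two orders so learning and fatigue affect X and Y about equally.
    return {
        person: (forward if index % 2 == 0 else backward)
        for index, person in enumerate(shuffled)
    }
```

With an even number of participants, exactly half see X first and half see Y first, so an ordering effect does not systematically favor one condition.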

External Validity

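Population
- Do the participants represent the target population well?
Ecological Validity
- How closely does the environment setup match the real world?
Training validity
- Did we teach too much in the tutorial?
Task validity
- Do the tasks performed in the experiment actually represent people's real activities?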

Qualitative Evaluation

The raw data is non-numeric data.

Observations, video
Open-ended interviews
Narrative, textual description

→ We should focus on the quality, richness, and depth of the data, not on reducing it to numbers.

Grounded Theory approach

Data-driven method for building theory from qualitative data
Aim: generate new theory grounded in the data itself

Usability Study - Qualitative

Purpose

Understand the user’s perception

Emphasize the users’ ability to use the system

Methods

Introspection (cognitive walkthrough)
Direct observation
Interviews and questionnaires

First, create the tasks.

End goals
Specific and realistic
Doable
Not too long

Cognitive walkthrough (Introspection)

Designer tries the system out (without users)

Completely Subjective
Designer is non-typical user

Direct Observation

Observing users interacting with system

Good for identifying gross design/interface problems

→ The observations need to be coded (labeled with categories) for analysis
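
"Coding" here refers to labeling each observation with a category so that recurring problems become visible. A minimal sketch of tallying such labels might look like this; the code labels, participant IDs, and function names are made up for illustration.

```python
from collections import Counter, defaultdict


def tally_codes(observations):
    """Count how often each code was observed overall."""
    return Counter(code for _participant, code in observations)


def participants_per_code(observations):
    """Count how many distinct participants ran into each code."""
    seen = defaultdict(set)
    for participant, code in observations:
        seen[code].add(participant)
    return {code: len(people) for code, people in seen.items()}


# Made-up example data: (participant, code) pairs from an observation session.
observations = [
    ("P01", "missed_save_button"),
    ("P01", "confused_by_icon"),
    ("P02", "missed_save_button"),
    ("P02", "missed_save_button"),
]
```

Counting distinct participants alongside raw occurrences helps separate a problem that many users hit from one that a single user hit repeatedly.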

Three approaches

Simple observation
Think-aloud
Constructive interaction

Simple observation
- Evaluator observes!
- Drawbacks
  - No insight into the user’s decision process or attitude
Think aloud
- Subject asked to say what they are thinking
- Widely used
- Drawbacks
  - Awkward for subject, not natural
  - Thinking about it may alter the way people perform their task
  - Hard to talk when they are concentrating.
Constructive Interaction Method
- Two people work together on a task
- Normal conversation between the two users is monitored (less distortion)
  - Removes awkwardness of think-aloud
- Co-discovery learning
  - Use coach and naive subject together
  - Make naive subject use the interface
- Drawbacks
  - Need good team!

Interviews

Pick the right population
Be prepared
Probe more deeply on interesting issues (focus on goals)

Pros

Very good at directing next design phase

Cons

Subjective (leading questions)
Time-consuming

Debriefing

Post-observation interviews

Pros

Avoids erroneous reconstruction

Cons

Time-consuming

Questionnaires, Survey

Pick population
Establish purpose
Establish means
Design the questionnaire (and debug it)
Deliver

Pros

Can reach a large population
As good as the questions asked (good questions yield good answers)

Cons

Preparation is expensive
Data collection can be tedious

Closed Question

Supply possible answers

Easy to analyze
Makes it more difficult for the respondent
Be sure to be specific

→ Make sure to use an odd number of choices, so there is a neutral midpoint!

Ex)

Scalar (1 to 5; an odd number of points)
Multi-choice (Can be exclusive, or not..)
Ranked choice (Helpful for preference)
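
Part of why closed questions are easy to analyze is that the answers can be summarized mechanically. The sketch below summarizes answers to a single scalar question; `summarize_scale` and its return keys are hypothetical names, and the odd-points check simply mirrors the advice above.

```python
from collections import Counter
from statistics import median


def summarize_scale(responses, points=5):
    """Summarize answers to one scalar question on a 1..points scale."""
    if points % 2 == 0:
        raise ValueError("use an odd number of points so a neutral midpoint exists")
    if not responses:
        raise ValueError("no responses to summarize")
    if any(not (1 <= r <= points) for r in responses):
        raise ValueError(f"responses must be between 1 and {points}")
    counts = Counter(responses)
    return {
        "counts": {value: counts[value] for value in range(1, points + 1)},
        "median": median(responses),  # median suits ordinal scale data
        "midpoint": (points + 1) // 2,
    }
```

Comparing the median against the midpoint gives a quick read on whether respondents lean positive or negative on that question.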

Open-ended Questions

Respondents answer in their own words

Good for general information
Difficult to analyze
Can complement closed questions

So, What is the outcome?

High-level effect
- Task flow problems - problems in the flow
- Task description problems - parts that were not explained properly?
- Contextual findings - contextual factors such as one-handed use

Pros

Apply to real situation
- Good external validity

Cons

Poor internal validity
- Poor control of independent(predictor) variables
Often subjective data

Lecture Notes, Human Computer Interaction

hci evaluation

This post is licensed under CC BY 4.0 by the author.
