What is visual attention?

The aim of this page is to briefly describe what visual attention is and how it operates.

The narrow, high resolution foveal visual field is generally considered the “focus” of our attention, both visual and cognitive. During the day, we scan this visual spotlight all around the environment targeting things like faces, words, images on a package, and a variety of other objects. This process can occur both non-consciously and by conscious direction. What is it that determines what we shine our spotlight upon?

At its most basic, attention is defined as the process by which we select a subset from all of the available information for further processing.


While attention is a rich and diverse field of study in and of itself, a straightforward framework for thinking about attention in the context of visual processes for eye tracking can have great practical value. It is useful to think of attentional selection as an interaction, both competitive and cooperative, between bottom-up and top-down factors. Before we explore these influences, let’s acknowledge the eye’s partner in seeing: the brain.

When we think of the brain in the context of vision, we are generally considering the primary visual cortex and the downstream visual areas along the pathways for the neural processing of vision. Between the eyes and the visual cortex there are also a number of important structures, including the optic nerves and the lateral geniculate nucleus of the thalamus. Discussion of these and other areas, such as the frontal eye fields that coordinate eye movements, is beyond the scope of this workbook. It is worthwhile to note, however, that “seeing” is not confined solely to the eyes and “thinking” (as visual input processing) is not carried out solely in the brain.

It is generally true, though, that the eyes carry out more basic functions while the areas of the brain along the visual pathways conduct integrative information processing. For example, identifying an object from the multiplicity of visual cues is the work of the ventral or “what is it” stream, which runs from V1 through V2 and V4 toward the temporal lobe. Simultaneously, visual information travels from V1 in the occipital lobe along the dorsal or “where is it” stream to the parietal lobe. This is the pathway where the spatial relationships between things in the visual field are extracted and bodily movement is planned.

Bottom-up vs Top-down influences on attention

Bottom-up influences have nothing to do with direction or spatial location, per se. Rather, these are factors that are low level, early, and normative. Normative means that unless a subject has a specific condition or impairment such as color blindness, these capabilities can be expected to be inherent to people in general. Low level and early pertain to processes that occur almost as soon as the photons of light from the visual scene hit the retina.

Recall that we likened the retina to the CCD of a digital camera. But, whereas the image sensor in your smartphone only receives and passes on the signal, the retina contains neurons, like those in the brain, that actively extract features of the scene from the incoming patterns of light, for example light/dark contrast, edges, and horizontal or vertical orientation. Because this happens before the visual signal even reaches the visual centers of the brain, these preprocessed features are largely immune to conscious influence.
Visual illusions often reveal the functioning of these processes. See the Zöllner illusion in Figure 3.

Figure 3, Zöllner illusion. The horizontal lines in the top part of the image are parallel, just like the lines below, a fact that one could verify by measurement. However, the presence of the short, slanted lines activates low-level visual processes that contribute to the overall illusion of non-parallelism.
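
As a loose illustration of what "extracting light/dark contrast and edges" can mean, the following Python sketch treats a row of luminance values as a one-dimensional "retina" and responds only where neighbouring values differ. It is a toy model under stated assumptions, not a description of real retinal circuitry: the function names, the threshold, and the sample values are all hypothetical.

```python
# Toy illustration only: a 1-D "retina" that responds to light/dark contrast.
# Function names, the threshold, and the sample data are hypothetical.


def contrast_response(luminance):
    """Return the difference between each pair of neighbouring samples.

    Large absolute values mark a light/dark edge; values near zero mark
    a uniform region.
    """
    return [right - left for left, right in zip(luminance, luminance[1:])]


def edge_positions(luminance, threshold=0.5):
    """Return each index i where an edge lies between samples i and i + 1."""
    response = contrast_response(luminance)
    return [i for i, r in enumerate(response) if abs(r) >= threshold]


# A dark region (0.0), a bright bar (1.0), then dark again.
row = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0]

print(contrast_response(row))  # positive at dark-to-light, negative at light-to-dark
print(edge_positions(row))     # indices where the contrast exceeds the threshold
```

The output is non-zero only at the two borders of the bright bar. The uniform regions produce no response, which mirrors the idea that early visual processing emphasizes changes in the scene rather than absolute light levels.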

In contrast to the fast, automatic bottom-up processes, there is another set of considerations that is often at play during eye tracking tasks: top-down factors. These influencers of attention are generally high level, cognitive in nature, and individuating. They often involve some consideration, thought, or context-setting for effect. And even if everything in the experimenter’s control is standardized, each subject can react as an individual who brings a lifetime’s worth of history to any task. A few of the common top-down factors are the statement of the task, the test environment/use context, prior knowledge or experience level, and socioeconomic characteristics.

Given these two types of attention-influencing factors, how does the eye tracking researcher apply them in practice? It is useful to think of bottom-up factors as the workspace of the designer. This could be a web or packaging designer, a graphic artist, or a developmental psychology researcher. If you make visual design decisions about test stimuli, and you do so taking into account design principles, you are effectively manipulating bottom-up features of a stimulus to achieve desired ends. Such goals could be to get viewers to notice faster, remember something better, read more, click through more often, react, or decide.

The top-down factors are important concerns of the experimentalist. In addition to task design and formulating instructions, the researcher decides on the gender split, experience level, or ethnicities recruited to test the design manipulations. In other words, do the stimuli (including their bottom-up design features) elicit the expected effect among the population of interest? Often, the response is not the goal itself but rather provides clues to the cognitive processes that are the true target of interest.

So whether stimulus design and experimentation are carried out by one person or several, a successful eye tracking study keeps these considerations aligned from design through testing and, ultimately, to analysis of the data.

About overt and covert attention

While the narrow, high resolution field of foveal (central) vision is most commonly the primary target of study in eye tracking research, visual attention can be spread out over a much wider area. Accordingly, we can think of attention in terms of covert and overt components.

Overt attention is the measurable attention of the eyes, directed through the high-resolution central field of roughly 1-2 degrees of visual angle. The gaze point shows the visual targeting that takes place and is the fundamental data from eye tracking. In contrast to overt attention, we also deploy covert attention, or attention of the mind. As the label implies, covert attention is not directly measurable because it involves the attentional spotlight of our minds without deploying the eyes. Thus, our visual behavior, on the whole, is the result of an interplay between covert and overt attention. Covert attention detects objects and locations in the peripheral field of view. This is then followed by an eye movement to that area to bring the spotlight of our overt attention to the task of seeing.
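
To get a feel for how small the 1-2 degree central field is, the visual angle can be converted into a physical size on a screen with the standard relation size = 2 × distance × tan(angle / 2). The Python sketch below applies it. The 65 cm viewing distance and the 38 pixels per centimetre screen density are assumed example values, not figures from this page, and the function names are hypothetical.

```python
import math


def visual_angle_to_size(angle_deg, distance):
    """Size subtended by a visual angle at a given viewing distance.

    The result is in the same unit as `distance`.
    """
    return 2.0 * distance * math.tan(math.radians(angle_deg) / 2.0)


# Assumed example values (not from the source text).
viewing_distance_cm = 65.0   # eye-to-screen distance
pixels_per_cm = 38.0         # screen density, roughly a 96 dpi monitor

for angle in (1.0, 2.0):
    size_cm = visual_angle_to_size(angle, viewing_distance_cm)
    size_px = size_cm * pixels_per_cm
    print(f"{angle:.0f} deg -> {size_cm:.2f} cm, about {size_px:.0f} px")
```

Under these assumptions the central field covers only about 1.1 to 2.3 cm of the screen, which is why the eyes must move constantly to bring new targets into it.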

There are two important considerations with respect to the division between overt and covert attention. The first is that, although covert attention is not discernible in any given instant, subsequent eye movements can reveal the prior focus of the mind’s attention. For example, while my current measurable gaze might be on the smartphone I’m using, I could be mentally attending to the fact that the person sitting next to me is starting to write in her notebook. A moment later, as I look over toward her, an eye tracker would then be able to register my gaze on her notebook, the pen she’s using, and the words she is writing.

The second important consideration is that the relationship between one’s overt and covert attention is, to a degree, elastic and subject to manipulation. Neglecting to control aspects of the study that could unintentionally change this relationship can introduce serious experimental confounds. In extreme cases, observed effects could disappear or findings become contaminated. This is why it is crucial to pretest the experimental setting, task prompts, and instructions exhaustively to uncover any undesirable attention-focusing features.

By this point, you have learned about the components of the human visual system: the eye and the visual cortex of the brain. We’ve touched on aspects of the attentional process, including bottom-up and top-down influences on attentional selection. We’re now ready to take a look at the interaction between the human side and the machine side of the eye tracking ecosystem.

Recommended Reading

  • Connor, C.E., Egeth, H.E., Yantis, S. 2004. Visual Attention: Bottom-Up Versus Top-Down. Current Biology, 14(19), R850-R852. http://dx.doi.org/10.1016/j.cub.2004.09.041
  • Carrasco, M. 2011. Visual attention: The past 25 years. Vision Research, 51(13), 1484-1525. http://dx.doi.org/10.1016/j.visres.2011.04.012
