Chapter 13 Progress and recommendations

Near the beginning of this Element, I suggested that five findings about multiple object tracking were particularly important. Now that I’ve explained them and gone through the associated evidence, it’s time to sum up. The five findings are:

  1. The number of moving objects humans can track is limited, but not to a particular number such as four or five (Section 3).
  2. The number of targets has little effect on spatial interference, whereas it greatly increases temporal interference (Section 5).
  3. Predictability of movement paths benefits tracking only for one or two targets, not for more (Section 6).
  4. Tracking capacity is hemifield specific: capacity nearly doubles when targets are presented in different hemifields (Section 9).
  5. When tracking multiple targets, people often don’t know which target is which, and updating of non-location features is poor (Section 10).

The first theory of multiple object tracking, Pylyshyn’s FINST theory, debuted in the first paper that established that people can actually do the task. Although hundreds of MOT experiments have been published since then, as of this writing, the FINST theory is the only theory mentioned on the Wikipedia page for MOT (Editors 2021). Based on what they write in their papers, many active researchers as well as Wikipedia’s editors do not seem to appreciate how much the main points of FINST theory have been rebutted. Core to the theory was the idea that tracking is mediated by a small set of discrete and pre-attentive indices. As we have seen, however, as object speed increases, the number of targets that can be tracked steadily decreases, to just one target, which doesn’t sit well with a fixed set of indices (G. A. Alvarez and Franconeri 2007; Alex O. Holcombe and Chen 2012; see also Brian J. Scholl 2009). Instead, it suggests that tracking reflects a more continuous resource that can both be allocated entirely to one or two objects and spread thinly among several objects. However, it could also be explained by a process that has to serially switch among the targets.
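The contrast between a fixed set of indices and a continuous resource can be made concrete with a toy sketch. This is not a model from the literature: the function names, the four-index default, and the `speed_limit` parameter (the speed, in arbitrary units, at which a single target is assumed to exhaust the resource) are all hypothetical, chosen only to show the qualitative predictions.

```python
import math


def trackable_fixed_indices(speed, n_indices=4, speed_limit=8.0):
    """Toy fixed-index account: the same number of targets at any trackable speed.

    n_indices and speed_limit are hypothetical illustration values.
    """
    return n_indices if speed <= speed_limit else 0


def trackable_shared_resource(speed, speed_limit=8.0):
    """Toy resource account: each target is assumed to consume a share of one
    unit of resource equal to speed / speed_limit, so faster objects leave
    room for fewer targets.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    if speed > speed_limit:
        return 0
    return math.floor(speed_limit / speed)


if __name__ == "__main__":
    # Speeds are in arbitrary units, not taken from any experiment.
    for speed in (1.0, 2.0, 4.0, 8.0):
        print(speed, trackable_fixed_indices(speed), trackable_shared_resource(speed))
```

In this sketch the fixed-index function returns the same number at every speed, whereas the resource function falls steadily to one target as speed approaches the assumed limit. A serial process that needs more frequent visits to faster targets would yield a similar trade-off, so this pattern alone does not distinguish the resource account from the switching account.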

Another prediction of FINST theory was that participants would be aware of which target is which among the targets they are tracking. Pylyshyn himself reported evidence against this, to his credit, and the evidence that updating of target identities is poor has increased since then (important finding #5 above). Explaining the dissociation between position updating and the maintenance and updating of non-position features is an integral part of two recent theories, by Li, Oksama, and Hyönä (2019) and by Lovett, Bridewell, and Bello (2019). Both concur with FINST theory that position updating happens in parallel, but they suggest that other features of targets are maintained and updated by a process that switches among the targets one by one.

Humans’ poor awareness of which monitored object is which has consequences for the quest to explain our cognitive abilities. Our minds can represent structure in a content-independent fashion, such as with language, where syntax involves structure with distinct roles, e.g. using the word “giving” can involve a giver, a recipient, and an item. A recent paper suggested that this could be implemented by Pylyshyn’s FINSTs (O’Reilly, Ranganath, and Russin 2022). As we have seen, however, during multiple object tracking the distinct identities of the targets often are not represented, so this approach to explaining cognition may not work.

In positing a serial process for updating of features other than position, Lovett, Bridewell, and Bello (2019) further proposed that the serial process can compute the motion history of a target. This can explain important finding #3, that predictability of motion trajectories yields a measurable advantage only when there are a few targets (Piers DL Howe and Holcombe 2012; Luu and Howe 2015): with more targets, the serial process can devote less time to each one, so the benefit may be too small to be detectable.

In summary, spatial selection appears to occur in parallel, at a hemifield-specific processing stage, with other features subsequently updated and linked in by a visual-field-wide, possibly serial process. Some evidence about position updating, however, suggests that it may be more limited-capacity than it appears, which I grapple with in another manuscript (Alex O. Holcombe 2022).

13.1 Recommendations for future work

The MOT paradigm is important not only because of the insights that its findings provide, but also because it has the potential to reveal many more insights about human abilities. MOT’s test-retest reliability, on the order of .8 or .9, has been found to be higher than that of other attentional tasks. High reliability means that MOT results are often highly credible (because with a non-noisy task, less data is needed to have high statistical power) and have high potential for revealing individual differences (Section 11).
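The link between reliability and the amount of data needed can be illustrated with two standard results: Spearman's attenuation formula, under which the observed correlation between two measures equals the true correlation multiplied by the square root of the product of their reliabilities, and the Fisher z approximation for the sample size needed to detect a correlation. The sketch below is only an illustration; the true correlation of .3 and the reliabilities other than MOT's are hypothetical values, not results from any study.

```python
import math
from statistics import NormalDist


def attenuated_r(true_r, reliability_x, reliability_y):
    """Spearman's attenuation formula: expected observed correlation
    given the true correlation and the reliability of each measure."""
    return true_r * math.sqrt(reliability_x * reliability_y)


def n_for_power(r, alpha=0.05, power=0.80):
    """Approximate sample size for a two-sided test of a correlation r,
    using the Fisher z transformation."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) / math.atanh(r)) ** 2 + 3)


if __name__ == "__main__":
    true_r = 0.3             # hypothetical true correlation with some other ability
    other_reliability = 0.8  # hypothetical reliability of the other measure
    # 0.85 stands in for MOT's reported .8 to .9; 0.5 is a hypothetical noisier task.
    for task_reliability in (0.85, 0.5):
        r_obs = attenuated_r(true_r, task_reliability, other_reliability)
        print(task_reliability, round(r_obs, 3), n_for_power(r_obs))
```

Under these assumptions the noisier task yields a smaller observed correlation and therefore requires a larger sample to reach the same power, which is the sense in which a reliable task such as MOT makes individual-difference studies more feasible.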

The discovery that tracking’s capacity limit reflects two resources, one in each hemisphere, was one of the greatest advances in tracking research, but it’s disappointing how little that discovery has been built upon. Consider, for example, the issue of whether tracking draws on the same mental resources as other tasks. FINST theory proposed that the tracking process is preattentive, but dual-task studies show substantial interference from other tasks (Oksama and Hyönä 2016; Alnaes et al. 2014). Sadly, however, such studies do not seem to have ruled out the possibility that these findings were caused entirely by a process with a capacity of only one object (what I have called System B) rather than the hemifield-specific tracking processes. “Carving nature at its joints”, or dissociating the components of a biological system, is important for scientific progress but can be difficult in psychology (Fodor 1983): general cognition (System B) can do many different things and thereby contaminate the study of any processing specific to object tracking. Testing for hemifield specificity can help us tease System B apart from those tracking-specific processes.

I’d like to see fewer missed opportunities to study what makes tracking distinctive, hence my top recommendations for future research emphasize this point. Those recommendations are:

  • To dilute the influence of capacity-one System B processing, use several targets, not just two or three. But remember that even with several targets, a small effect could be explained by a capacity-one process. Test for hemifield specificity as that can help rule out a capacity-one process.

  • Always test for hemifield specificity! In addition to it helping to rule out a factor having its effect only on a capacity-one process, we know very little about what limited-capacity brain processes are hemisphere-specific, so any results here are likely to be interesting.

  • As we have seen, MOT is a complex task, so it’s difficult to interpret individual differences and predict whether they will translate to other tasks. Individual-difference studies should use task variations that help isolate the component processes that contribute to overall success or failure, such as spatial interference, temporal interference, and cognitive processing.

  • For computational modelling as well, don’t restrict yourself to standard MOT tasks with unconstrained trajectories, as that sort of data may not constrain models very much. Show that a model succeeds at task variations that isolate component processes.
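The first recommendation, that more targets dilute but do not eliminate the influence of a capacity-one process, can be illustrated with simple arithmetic. The sketch below assumes, purely for illustration, that a manipulation changes accuracy only for the single target handled by the capacity-one process and that accuracy is averaged over all targets; the function name and the effect size of .12 are hypothetical.

```python
def observed_effect(effect_on_one_target, n_targets):
    """Effect on mean accuracy across targets if only one target
    (the one assumed to be handled by the capacity-one process)
    is affected by the manipulation."""
    if n_targets < 1:
        raise ValueError("n_targets must be at least 1")
    return effect_on_one_target / n_targets


if __name__ == "__main__":
    effect = 0.12  # hypothetical accuracy change for the one affected target
    for n_targets in (2, 3, 6, 8):
        print(n_targets, round(observed_effect(effect, n_targets), 3))
```

In this sketch the average effect shrinks as targets are added but never reaches zero, which is why a small effect with many targets remains compatible with a capacity-one account and why a test for hemifield specificity is still needed.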

13.2 Omissions

Several topics that I originally planned to cover could not be included here, due to limited space. Some of the most important are the role of retinotopic, spatiotopic, and configural representations in tracking (see Yantis 1992; Bill et al. 2020; Piers D. L. Howe, Pinto, and Horowitz 2010; Meyerhoff et al. 2015; G. Liu et al. 2005; Maechler, Cavanagh, and Tse 2021), the role of distractor suppression, the role of surface features (Papenmeier et al. 2014), and the findings from dual-task paradigms. I hope those readers whose favorite topic was left out can take some consolation in the fact that my own favorite, the temporal limits on tracking (Alex O. Holcombe and Chen 2013; Roudaia and Faubert 2017), also was not covered. Because that topic has major implications for what the tracking resource actually does during tracking, and whether processing is serial or parallel, I have a separate manuscript about it (Alex O. Holcombe 2022).


Alnaes, Dag, Markus Handal Sneve, Thomas Espeseth, Steven Harry Pieter, and Bruno Laeng. 2014. “Pupil Size Signals Mental Effort Deployed During Multiple Object Tracking and Predicts Brain Activity in the Dorsal Attention Network and the Locus Coeruleus.” Journal of Vision 14: 1–20.
Alvarez, G A, and S L Franconeri. 2007. “How Many Objects Can You Track? Evidence for a Resource-Limited Attentive Tracking Mechanism.” Journal of Vision 7 (13): 1–10.
Bill, Johannes, Hrag Pailian, Samuel J. Gershman, and Jan Drugowitsch. 2020. “Hierarchical Structure Is Employed by Humans During Visual Motion Perception.” Proceedings of the National Academy of Sciences 117 (39): 24581–89.
Editors, Wikipedia. 2021. “Multiple Object Tracking.” Wikipedia, September.
Fodor, Jerry A. 1983. The Modularity of Mind. Cambridge, Mass: MIT Press.
Holcombe, Alex O. 2022. “Object Separation in Time Imposes Severe Constraints on Multiple Object Tracking.”
Holcombe, Alex O., and Wei-Ying Chen. 2012. “Exhausting Attentional Tracking Resources with a Single Fast-Moving Object.” Cognition 123 (2).
Holcombe, Alex O., and Wei-Ying Chen. 2013. “Splitting Attention Reduces Temporal Resolution from 7 Hz for Tracking One Object to <3 Hz When Tracking Three.” Journal of Vision 13 (1): 1–19.
Howe, Piers D L, Yair Pinto, and Todd S Horowitz. 2010. “The Coordinate Systems Used in Visual Tracking.” Vision Research 50 (23): 2375–80.
Howe, Piers DL, and Alex O. Holcombe. 2012. “Motion Information Is Sometimes Used as an Aid to the Visual Tracking of Objects.” Journal of Vision 12 (13): 1–10.
Li, Jie, Lauri Oksama, and Jukka Hyönä. 2019. “Model of Multiple Identity Tracking (MOMIT) 2.0: Resolving the Serial Vs. Parallel Controversy in Tracking.” Cognition 182 (January): 260–74.
Liu, Geniva, Erin L Austen, Kellogg S Booth, Brian D Fisher, Ritchie Argue, Mark I Rempel, and James T Enns. 2005. “Multiple-Object Tracking Is Based on Scene, Not Retinal, Coordinates.” Journal of Experimental Psychology: Human Perception and Performance 31 (2): 235–47.
Lovett, Andrew, Will Bridewell, and Paul Bello. 2019. “Selection Enables Enhancement: An Integrated Model of Object Tracking.” Journal of Vision 19 (14): 23.
Luu, Tina, and Piers D. L. Howe. 2015. “Extrapolation Occurs in Multiple Object Tracking When Eye Movements Are Controlled.” Attention, Perception, & Psychophysics 77: 1919–29.
Maechler, Marvin R., Patrick Cavanagh, and Peter U. Tse. 2021. “Attentional Tracking Takes Place over Perceived Rather Than Veridical Positions.” Attention, Perception, & Psychophysics 83: 1455–62.
Meyerhoff, Hauke S., Frank Papenmeier, Georg Jahn, and Markus Huff. 2015. “Distractor Locations Influence Multiple Object Tracking Beyond Interobject Spacing: Evidence From Equidistant Distractor Displacements.” Experimental Psychology 62 (3): 170–80.
O’Reilly, Randall C., Charan Ranganath, and Jacob L. Russin. 2022. “The Structure of Systematicity in the Brain.” Current Directions in Psychological Science 31 (2): 124–30.
Oksama, Lauri, and Jukka Hyönä. 2016. “Position Tracking and Identity Tracking Are Separate Systems: Evidence from Eye Movements.” Cognition 146: 393–409.
Papenmeier, Frank, Hauke S. Meyerhoff, Georg Jahn, and Markus Huff. 2014. “Tracking by Location and Features: Object Correspondence Across Spatiotemporal Discontinuities During Multiple Object Tracking.” Journal of Experimental Psychology: Human Perception and Performance 40 (1): 159.
Roudaia, Eugenie, and Jocelyn Faubert. 2017. “Different Effects of Aging and Gender on the Temporal Resolution in Attentional Tracking.” Journal of Vision 17 (11): 1.
Scholl, Brian J. 2009. “What Have We Learned about Attention from Multiple-Object Tracking (and Vice Versa)?” In Computation, Cognition, and Pylyshyn, 49–78. MIT Press.
Yantis, S. 1992. “Multielement Visual Tracking: Attention and Perceptual Organization.” Cognitive Psychology 24 (3): 295–340.