Friday, September 25, 2015

How to do cross-validation? Have to partition on the runs?

People sometimes ask why many MVPA studies use leave-one-run-out cross-validation (partitioning on the runs), and if it's valid to set up the cross-validation in other ways, such as leaving out individual trials. Short answer: partitioning on the runs is a sensible default for single-subject analyses, but (as usual with fMRI) other methods are also valid in certain situations.

As I've written before, cross-validation is not a statistical test itself, but rather the technique by which we get the statistic we want to test (see this for a bit more on doing statistical testing). My default example statistic is accuracy from a linear SVM, but these issues apply to other statistics and algorithms as well.

Why is partitioning on the runs a sensible default? A big reason is that scanner runs (also called "sessions") are natural breaks in fMRI datasets: a run is the period in which the scanner starts, takes a bunch of images at intervals of one TR, then stops. Images from within a single scanner run are intrinsically more related to each other than to images from different runs, partly because they were collected closer together in time (so the subject's mental state and physical position should be more similar; single hemodynamic responses can last more than one TR), and because many software packages perform some corrections on each run individually (e.g., SPM motion correction usually aligns each volume to the first image within its run). The fundamental nature of temporal dependency is reflected in the terminology of many analysis programs, such as "chunks" in pyMVPA and "bricks" in afni.

So, since the fMRI data within each person is structured into runs, it makes sense to partition on the runs. It's not necessary to leave-one-run-out, but if you're not leaving some multiple number of runs out for the cross-validation, you should verify that your (not run-based) scheme is ok (more later). By "some multiple number of runs" I mean leaving out two, three, or more runs on each cross-validation fold. For example, if your dataset has 20 short runs you will likely have more stable (and significant) performance if you leave out two or four runs each fold rather than just one (e.g., Permutation Tests for Classification. AI Memo 2003-019). Another common reason to partition with multiple runs is if you're doing some sort of cross-classification, like training on all examples from one scanning day and testing on all examples from another scanning day. With cross-classification there might not be full cross-validation, such as if training on "novel" runs and testing on "practiced" runs is of interest for an experiment, but training on "practiced" runs and testing with "novel" runs is not. There's no problem with these cross-classification schemes, so long as one set of runs is used for training, and another (non-intersecting) set of runs for testing in an experiment-sensible (and ideally a priori) manner.

But it can sometimes be fine to do cross-validation with smaller bits of the dataset than the run, such as leaving out individual trials or blocks. The underlying criteria is the same, though: are there sensible reasons to think that the units you want to use for cross-validation are fairly independent? If your events are separated by less than 10 seconds or so (i.e., HRF duration) it's hard to argue that it'd be ok to put one event in the training set and the event after it into the testing set, since there will be so much "smearing" of information from the first event into the second (and doubly so if the event ordering is not perfectly balanced). But if your events are fairly well-spaced (e.g. a block design with 10 second action blocks separated by 20 seconds of rest), then yes, cross-validating on the blocks might be sensible, particularly if preprocessing is aimed at isolating the signal from individual blocks.

If you think that partitioning on trials or blocks, rather than full runs, is appropriate for your particular dataset and analysis, I suggest you include an explanation of why you think it's ok (e.g., because of the event timing), as well as a demonstration that your partitioning scheme did not positively bias the results. What demonstration is convincing necessarily varies with dataset and hypotheses, but could include "random runs" and control analyses. I described "random runs" in The impact of certain methodological choices ... (2011); the basic idea is to cross-validate on the runs, but then randomly assign the trials to new "runs", and do the analysis again (i.e., so that trials from the same run are mixed into the training and testing sets). If partitioning on the random runs and the real runs produce similar results, then mixing the trials didn't artificially increase performance, and so partitioning other than on the runs (like leaving out individual blocks) is probably ok. Control analyses are classifications that shouldn't work, such as using visual cortex to classify an auditory stimulus. If such negative control analyses aren't negative (can classify) in a particular cross-validation scheme, then that cross-validation scheme shouldn't be trusted (the training and testing sets are not independent, or some other error is occurring). But if the target analysis (e.g., classifying the auditory stimuli with auditory cortex) is successful while the control analysis (e.g., classifying the auditory stimuli with visual cortex) is not, and both use the same cross-validation scheme, then the partitioning was probably ok.

This post has focused on cross-validation when each person is analyzed separately. However, for completeness, I want to mention that another sensible way to partition fMRI data is on the people, such as with leave-one-subject-out cross-validation. Subjects are generally independent (and so suitable "chunks" for cross-validation), unless the design includes an unusual population (e.g., identical twins) or design (e.g., social interactions).

No comments:

Post a Comment