Ethan Hwang, Hossein Adeli, Wenxuan Guo, Andrew Luo, and Nikolaus Kriegeskorte
NeurIPS 2025
The ensemble model combines many individual models, each trained on a particular subject, encoder layer, and hemisphere. Because it uses the validation split to compute a per-voxel confidence value for each model, it should be evaluated only on the test split or on unseen data.
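As a rough sketch of the general idea (an assumption for illustration, not the repository's actual implementation), a per-voxel confidence-weighted average of model predictions could look like this:

```python
import numpy as np

def ensemble_predictions(preds, confidences):
    """Confidence-weighted average of per-model voxel predictions.

    preds:       (n_models, n_voxels) predicted activations
    confidences: (n_models, n_voxels) per-voxel confidence from the validation split
    """
    weights = np.clip(confidences, 0.0, None)               # drop negative confidences
    weights = weights / weights.sum(axis=0, keepdims=True)  # normalize over models
    return (weights * preds).sum(axis=0)                    # weighted average per voxel

# toy example: two models, three voxels
preds = np.array([[1.0, 2.0, 3.0],
                  [3.0, 0.0, 1.0]])
conf = np.array([[1.0, 1.0, 3.0],
                 [1.0, 3.0, 1.0]])
combined = ensemble_predictions(preds, conf)  # → [2.0, 0.5, 2.5]
```

Voxels where one model is more confident lean toward that model's prediction; equal confidence reduces to a plain average.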
The hosted ensemble model includes weights for two runs each for:
- `lh` and `rh` hemispheres
- encoder layers 1, 3, 5, 7 from the DINO backbone
The model weights will be downloaded automatically from https://huggingface.co/ehwang/brain_encoder_weights/tree/main, and the directory structure will be created automatically. Note that this uses around 11 GB of space per subject. If you want to download the files manually, make sure `git lfs` is installed.
Alternatively, follow the training instructions below in Reproducing the checkpoints to train the checkpoints yourself. The weights path should point to the `checkpoints/` folder. The expected directory structure is: `checkpoints/nsd_test/dinov2_q_transformer/schaefer/subj_{subj_num:02}/enc_{enc_layer}/run_{run_num}/{hemi}`.
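As an illustration, the checkpoint directory for a given subject, encoder layer, run, and hemisphere can be built with ordinary Python string formatting (the example values below are arbitrary):

```python
subj_num, enc_layer, run_num, hemi = 1, 5, 2, "lh"

# subj_{subj_num:02} zero-pads the subject number to two digits (1 -> "01")
ckpt_dir = (
    "checkpoints/nsd_test/dinov2_q_transformer/schaefer/"
    f"subj_{subj_num:02}/enc_{enc_layer}/run_{run_num}/{hemi}"
)
print(ckpt_dir)
# checkpoints/nsd_test/dinov2_q_transformer/schaefer/subj_01/enc_5/run_2/lh
```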
Set up the conda environment for the model:
```bash
conda env create -f env/xformers.yml
```

Follow the example in `tutorials/test_wrapper.ipynb`.
Set up the conda environment for pycortex plotting:
```bash
conda env create -f env/pycortex.yml
conda activate pycortex
python plot_run_results.py --ensemble 1 --subj $SUBJECT --split $SPLIT
```

The following structure is expected: `results/schaefer/enc_{"_".join(enc_layers)}_run_{"_".join(run_nums)}/subj_{subj_num:02}/{hemi}_test_corr_avg.npy`, for both hemispheres `lh` and `rh`. For example, `results/schaefer/enc_1_3_5_7_run_1_2/subj_01/lh_test_corr_avg.npy`. Plots of the correlation across all voxels and the correlation for known ROIs will be saved in the same directory.
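For concreteness, the expected results path can be constructed in Python as follows (the subject and hemisphere values are arbitrary examples):

```python
enc_layers = ["1", "3", "5", "7"]
run_nums = ["1", "2"]
subj_num, hemi = 1, "lh"

# "_".join(...) concatenates the layer and run lists with underscores
result_path = (
    f"results/schaefer/enc_{'_'.join(enc_layers)}_run_{'_'.join(run_nums)}/"
    f"subj_{subj_num:02}/{hemi}_test_corr_avg.npy"
)
print(result_path)
# results/schaefer/enc_1_3_5_7_run_1_2/subj_01/lh_test_corr_avg.npy
```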
Valid splits include `train`, `val`, and `test`.
```bash
conda activate pycortex
python plot_run_results.py --subj $SUBJECT --enc_output_layer $layer --run $RUN_ID
```

The directory structure described in Downloading weights is expected. Plots of the correlation across all voxels and a graph showing the correlation for known ROIs will be saved in the `run_{run_num}` directory.
In `utils/args.py`, modify the following paths in the default arguments:
- `--data_dir`: Directory containing metadata and neural data files. These are from NSD.
- `--imgs_dir`: Directory containing NSD image data. These are from NSD.
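A minimal sketch of how such defaults might be defined with `argparse` (the paths shown are placeholders, not the repository's actual defaults):

```python
import argparse

parser = argparse.ArgumentParser()
# placeholder paths; point these at your local copies of the NSD files
parser.add_argument("--data_dir", default="/path/to/nsd/neural_data",
                    help="directory containing NSD metadata and neural data files")
parser.add_argument("--imgs_dir", default="/path/to/nsd/images",
                    help="directory containing NSD image data")
args = parser.parse_args([])  # empty list: fall back to the defaults
```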
Set up the conda environment for the parcellation algorithm:
```bash
conda env create -f env/parcel.yml
```

We used a k-means-based parcellation algorithm that generates around 450-500 parcels per hemisphere. To generate the brain parcels, run the following command:
```bash
conda activate parcel
python generate_parcels.py --subj $SUBJECT --hemi $HEMI --save_dir /path/to/save/
```

Set up the conda environment for model training and inference.
If you're using Slurm, see `scripts/train_plot` for a bash script to reproduce the checkpoint files.
Otherwise, to train a single model, run:
```bash
conda activate xformers
python main.py --subj $SUBJECT --enc_output_layer $layer --run $RUN_ID --hemi $HEMI
```

```bash
conda activate xformers
python main.py --parcel_dir ./parcels/nsd_labels --run $RUN_ID
```

See the BrainDIVE GitHub repo for environment requirements.
```bash
python generate_imgs.py --subj ${subj} --hemi ${hemi} --parcel_dir ${parcel_dir} --num_imgs_to_generate ${num_imgs}
```

Prerequisites:
- Predicted NSD activations for the target subject (for cross-subject retrieval):

  ```bash
  python brain_encoder_wrapper.py --subject ${subj} --split_subj ${subj_target} --split ${train, test}
  ```

- BrainDIVE generated images and activation predictions
- ImageNet predictions. Note that if you use any large dataset, you will need to refactor the data with `img_gen/activation_dist.py` so it can be read in a reasonable amount of time (i.e., prerank the top images for each parcel to avoid reading in all files).
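The preranking idea (storing only the top image indices for each parcel rather than reading every file) could be sketched as follows; the function name and array shapes are illustrative assumptions, not the actual interface of `img_gen/activation_dist.py`:

```python
import numpy as np

def prerank_top_images(activations, k):
    """Precompute the top-k image indices for each parcel, best first.

    activations: (n_images, n_parcels) predicted activation of each image per parcel
    returns:     (k, n_parcels) array of image indices
    """
    order = np.argsort(-activations, axis=0)  # descending sort within each parcel
    return order[:k]

# toy example: 4 images, 2 parcels
acts = np.array([[0.1, 0.9],
                 [0.8, 0.2],
                 [0.5, 0.4],
                 [0.3, 0.7]])
top2 = prerank_top_images(acts, k=2)
# parcel 0 keeps images [1, 2]; parcel 1 keeps images [0, 3]
```

Saving such an index per parcel lets downstream scripts load only the few relevant image files.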
```bash
cd clip_hypothesis
python preprocess_nsd.py --subject ${subj}
python preprocess_imgnet_gen.py --subject ${subj}
```

Then follow the script in `clip_hypothesis/run_test.ipynb`.