We provide access to our preprocessed data (including extracted features) and the preprocessing scripts needed to replicate our setup for the following datasets:
- Conceptual Captions
- Flickr30k
- GQA
- MS COCO
- NLVR2
- RefCOCO (UNC)
- RefCOCO+ (UNC)
- RefCOCOg (UMD)
- SNLI-VE
- VQAv2
We rely on the airsplay/bottom-up-attention Docker image to extract image features with Faster R-CNN.
The image is available on Docker Hub and can be downloaded with:
sudo docker pull airsplay/bottom-up-attention
For more details about the Docker image, see the LXMERT repository. Our scripts assume the pretrained Caffe models to be stored under snap/pretrained/.
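
Since the scripts expect the pretrained Caffe models under snap/pretrained/, it can help to check that location before starting feature extraction. The snippet below is a minimal sketch, not part of our scripts: the helper name `find_caffe_models` is hypothetical, and it assumes the weights use Caffe's usual `.caffemodel` extension and sit directly in that directory.

```python
from pathlib import Path


def find_caffe_models(model_dir="snap/pretrained"):
    """Return the sorted .caffemodel files found directly under model_dir.

    Hypothetical helper for illustration; the default path follows the
    location the preprocessing scripts are stated to assume.
    """
    root = Path(model_dir)
    if not root.is_dir():
        # Missing directory: nothing to report.
        return []
    # Assumption: pretrained weights carry the standard .caffemodel suffix.
    return sorted(
        p for p in root.iterdir() if p.is_file() and p.suffix == ".caffemodel"
    )


if __name__ == "__main__":
    models = find_caffe_models()
    if models:
        for path in models:
            print(f"found: {path}")
    else:
        print("no .caffemodel files under snap/pretrained/")
```

Run it from the repository root so that the relative path resolves; an empty result suggests the models still need to be downloaded or moved.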
See the README file of each dataset for the detailed preprocessing procedure.