# Okapi: Generalising Better By Making Statistical Matches Match

Official code for the NeurIPS 2022 paper _Okapi: Generalising Better By Making
Statistical Matches Match_.

> We propose Okapi, a simple, efficient, and general method for robust
> semi-supervised learning based on online statistical matching. Our method uses
> a nearest-neighbours-based matching procedure to generate cross-domain views
> for a consistency loss, while eliminating statistical outliers. In order to
> perform the online matching in a runtime- and memory-efficient way, we
> draw upon the self-supervised literature and combine a memory bank with
> a slow-moving momentum encoder. The consistency loss is applied within
> the feature space, rather than on the predictive distribution, making
> the method agnostic to both the modality and the task in question. We
> experiment on the WILDS 2.0 datasets (Sagawa et al.), which significantly
> expand the range of modalities, applications, and shifts available for
> studying and benchmarking real-world unsupervised adaptation. Contrary
> to Sagawa et al., we show that it is in fact possible to leverage
> additional unlabelled data to improve upon empirical risk minimisation
> (ERM) results with the right method. Our method outperforms the
> baseline methods in terms of out-of-distribution (OOD) generalisation
> on the iWildCam (a multi-class classification task) and PovertyMap (a
> regression task) image datasets as well as the CivilComments (a binary
> classification task) text dataset. Furthermore, from a qualitative
> perspective, we show that the matches obtained from the learned encoder are
> strongly semantically related.

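The abstract combines four ingredients: a momentum encoder, a memory bank, cross-domain nearest-neighbour matching with outlier rejection, and a consistency loss in feature space. The pure-Python toy below is only a sketch of how such pieces can fit together, not Okapi's implementation. The cosine-similarity neighbour search, the similarity threshold `caliper` used to reject outliers, the mean-squared-error loss, the integer domain ids and the names `ema_update`, `MemoryBank`, `match` and `consistency_loss` are all hypothetical choices made for illustration; see the paper and the code in this repository for the actual matching criterion and loss.

```python
import math
from collections import deque


def ema_update(momentum_params, online_params, m=0.99):
    """Momentum-encoder update, in place: theta_k <- m * theta_k + (1 - m) * theta_q."""
    for i, (tk, tq) in enumerate(zip(momentum_params, online_params)):
        momentum_params[i] = m * tk + (1.0 - m) * tq


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


class MemoryBank:
    """FIFO queue of (feature, domain) pairs; once full, the oldest entry is evicted."""

    def __init__(self, capacity):
        self.items = deque(maxlen=capacity)

    def push(self, feature, domain):
        # In a momentum-encoder setup these features would come from the slow encoder.
        self.items.append((list(feature), domain))


def match(query, query_domain, bank, k=2, caliper=0.5):
    """Up to k nearest neighbours of `query` drawn from *other* domains, ranked by
    cosine similarity; neighbours below `caliper` (an assumed threshold) are
    discarded as outliers."""
    scored = [(cosine(query, f), f) for f, d in bank.items if d != query_domain]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [f for sim, f in scored[:k] if sim >= caliper]


def consistency_loss(query, matches):
    """Mean squared error between `query` and the centroid of its matches, computed
    in feature space rather than on predictions; 0.0 if no match survives."""
    if not matches:
        return 0.0
    centroid = [sum(col) / len(matches) for col in zip(*matches)]
    return sum((q - c) ** 2 for q, c in zip(query, centroid)) / len(query)


# Illustrative usage: the bank holds features from domain 1, the query is from domain 0.
bank = MemoryBank(capacity=4)
for feature in ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]):
    bank.push(feature, domain=1)
matches = match([1.0, 0.0], 0, bank, k=3)
print(matches)  # [[1.0, 0.0], [0.9, 0.1]]; [0.0, 1.0] falls below the caliper
print(consistency_loss([1.0, 0.0], matches))

slow = [0.0, 0.0]
ema_update(slow, [1.0, 2.0], m=0.9)  # slow encoder moves 10% towards the online one
print(slow)
```

Restricting `match` to other domains is what makes the resulting views cross-domain, and because the loss only touches feature vectors, nothing in it depends on whether the downstream task is classification or regression.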
## Requirements
- Python >=3.9
- [poetry](https://python-poetry.org/)
- CUDA >=11.3 (if installing with ``install.sh``)

## Installation
We use [poetry](https://python-poetry.org/) for dependency management, so poetry
must be installed before the Python dependencies can be. With poetry installed,
the dependencies can be installed by running ``install.sh``, which requires
CUDA >=11.3 when installing on a CUDA-equipped machine. This constraint can be
bypassed by manually executing the following steps:
- ``poetry install``
- install the versions of PyTorch and ``torch-scatter`` (the latter is required
  for evaluation with [WILDS](https://github.com/p-lambda/wilds)) that match
  the version of CUDA installed on your machine.

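For the manual route, the PyTorch and ``torch-scatter`` wheels have to match the local CUDA version. The helper below is a hypothetical convenience, not part of this repository (``install.sh`` may do this differently), that builds the corresponding ``pip`` commands to run inside the poetry environment after ``poetry install``. It assumes the wheel index URL patterns used by PyTorch (``https://download.pytorch.org/whl/<tag>``) and PyG (``https://data.pyg.org/whl/torch-<version>+<tag>.html``); the versions in the example are placeholders, so check the PyTorch and ``torch-scatter`` installation pages for which version/CUDA pairs actually have wheels.

```python
import shlex


def install_commands(torch_version, cuda_version):
    """Build pip commands (run via `poetry run`) for a PyTorch version such as
    "1.12.0" and a CUDA version such as "11.3", or "cpu" for a CPU-only install."""
    # Both PyTorch and PyG tag wheels with e.g. "cu113" for CUDA 11.3, or "cpu".
    tag = "cpu" if cuda_version == "cpu" else "cu" + cuda_version.replace(".", "")
    torch_index = f"https://download.pytorch.org/whl/{tag}"
    pyg_index = f"https://data.pyg.org/whl/torch-{torch_version}+{tag}.html"
    return [
        ["poetry", "run", "pip", "install", f"torch=={torch_version}+{tag}",
         "--extra-index-url", torch_index],
        ["poetry", "run", "pip", "install", "torch-scatter", "-f", pyg_index],
    ]


for cmd in install_commands("1.12.0", "11.3"):
    print(shlex.join(cmd))
# poetry run pip install torch==1.12.0+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
# poetry run pip install torch-scatter -f https://data.pyg.org/whl/torch-1.12.0+cu113.html
```

Returning argument lists rather than strings means each command can also be passed directly to ``subprocess.run(cmd, check=True)``.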
## Running the code
We use [hydra](https://github.com/facebookresearch/hydra) for managing the
configuration of our experiments. Experiment configurations are grouped by
dataset in ``external_confs/experiments`` and can be selected from the
command line with ``python main.py +experiment={dataset}/{method}``.
Any config or argument can then be overridden with the syntax
``{config}={name_of_config_file}`` or ``{config}.{attribute}={value}``
(e.g. ``seed=42`` (defined in the main config class), ``backbone=iw/rn50``,
``alg.lr=1.e-5``).

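To illustrate the override syntax, the hypothetical helper below assembles the argument list for a single run: ``+experiment=...`` adds an experiment config, ``backbone=iw/rn50`` swaps in a different config file for a group, and dotted keys set individual fields. The experiment name ``iw/okapi`` is a placeholder (check ``external_confs/experiments`` for the real ``{dataset}/{method}`` names), and ``alg.lr`` is assumed to be the learning-rate field of the ``alg`` config.

```python
import shlex


def build_command(dataset, method, overrides=None):
    """Assemble a `python main.py` call: `+experiment={dataset}/{method}` selects an
    experiment config, and each override becomes `{key}={value}`, where the key is
    either a config group or a dotted `{config}.{attribute}` path."""
    args = ["python", "main.py", f"+experiment={dataset}/{method}"]
    for key, value in (overrides or {}).items():
        args.append(f"{key}={value}")
    return args


cmd = build_command(
    "iw", "okapi",  # hypothetical experiment name
    {"seed": 42, "backbone": "iw/rn50", "alg.lr": "1.e-5"},
)
print(shlex.join(cmd))
# python main.py +experiment=iw/okapi seed=42 backbone=iw/rn50 alg.lr=1.e-5
```

Keeping the overrides in a dict makes it easy to sweep a single field (for example several seeds) while holding the rest of the run fixed; each resulting list can be launched from the repository root with ``subprocess.run(cmd, check=True)``.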
## Citation
```
@article{bartlett2022okapi,
  title={Okapi: Generalising Better by Making Statistical Matches Match},
  author={Bartlett, Myles and Romiti, Sara and Sharmanska, Viktoriia and Quadrianto, Novi},
  journal={Advances in neural information processing systems},
  volume={35},
  year={2022}
}
```