
Inconsistent behavior when different seeds are initialized at evaluation time #3

Open
marcoromanelli-github opened this issue Jul 18, 2023 · 5 comments

Comments

@marcoromanelli-github

Thank you for your work and code!

After running the command

python train.py --dataset cifar10 --target_label 0 --gpu 0

we have tried to evaluate the performance of your detector with

python Beatrix.py --dataset cifar10 --gpu 0

limiting ourselves to checking only the effect of poisoning label 0.

In particular, we have changed this code to the following:

if __name__ == "__main__":
    for seed_ in range(10):
        print('-' * 50 + 'seed:', seed_)
        # fix all relevant RNGs to the current seed
        seed = seed_
        torch.manual_seed(seed)
        torch.cuda.manual_seed(seed)
        np.random.seed(seed)
        random.seed(seed)
        torch.backends.cudnn.deterministic = True

        opt = config.get_argument().parse_args()
        os.environ["CUDA_VISIBLE_DEVICES"] = opt.gpu
        for k in range(1):  # range(10):
            main(k)

to study the effect of different seeds on the performance.

From the attached log file, we have noticed that for some seeds, namely [3, 5, 7, 9], the anomaly index of the target class 0 is not the highest one.
Moreover, for some seeds, namely [0, 2, 3, 5, 7, 9], the anomaly index of class 0 appears to be below the threshold $e^2$ reported in the paper, resulting in missed detections.

These phenomena seem to occur more often than we expected.
Could you help us interpret this, and suggest what to change in case we are doing something wrong?
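
For reference, this is the check we apply to the logged values. It is a minimal sketch of the standard MAD-based anomaly index used in this line of work, not necessarily the exact computation in Beatrix.py, and the per-class scores below are made up for illustration.

import numpy as np

def anomaly_index(scores):
    # MAD-based anomaly index per class: |score - median| / (1.4826 * MAD);
    # 1.4826 makes the MAD a consistent estimator of the standard deviation.
    scores = np.asarray(scores, dtype=float)
    med = np.median(scores)
    mad = 1.4826 * np.median(np.abs(scores - med))
    return np.abs(scores - med) / mad

# Made-up per-class deviation scores from one run (class 9 is the clear outlier)
scores = [0.90, 1.10, 1.00, 1.05, 0.95, 1.20, 1.00, 0.98, 1.10, 4.00]
idx = anomaly_index(scores)
threshold = np.e ** 2  # ~7.39, the e^2 threshold quoted from the paper
print("anomaly indices:", np.round(idx, 2))
print("classes flagged as infected:", np.where(idx > threshold)[0].tolist())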

@wanlunsec
Owner

Hi Marco,

Thank you for reaching out and for your interest in our work.

In the detection evaluation, the randomness comes from the "shuffle" call:

(clean_feature,bd_feature,ori_label,bd_label) = shuffle(clean_feature,bd_feature,ori_label,bd_label)

You may be able to get more stable detection results across different random seeds if you increase the amount of clean data available for detection:

self.clean_data_perclass = 30
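
If the shuffle here is sklearn.utils.shuffle (an assumption; adjust if a different shuffle is used), its random_state argument can also be pinned so that the sampled subset is reproducible across runs. A minimal sketch with toy stand-in arrays:

import numpy as np
from sklearn.utils import shuffle  # assuming this is the shuffle used above

# Toy stand-ins for the clean/backdoor features and labels
clean_feature = np.random.randn(8, 4)
bd_feature = np.random.randn(8, 4)
ori_label = np.arange(8)
bd_label = np.zeros(8, dtype=int)

# With random_state fixed, the permutation is identical across runs,
# independent of the global numpy/torch seeds.
run1 = shuffle(clean_feature, bd_feature, ori_label, bd_label, random_state=0)
run2 = shuffle(clean_feature, bd_feature, ori_label, bd_label, random_state=0)
print(all(np.array_equal(x, y) for x, y in zip(run1, run2)))  # True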

@marcoromanelli-github
Author

Thanks for your answer. Indeed, we arrived at the same conclusion by replacing 30 with 300.

However, at this point, our questions are:

  1. How can we obtain the results you published in Tables 5 and 6?
  2. Were multiple runs with different seeds performed to produce these results?

@wanlunsec
Owner

Hi Marco,

Thanks for your questions.

  1. Following previous works, we trained multiple backdoored models with different infected labels to conduct the comparison experiments in Tables 5 and 6.
  2. Just like the other baseline methods, we conducted multiple experiments with different backdoored models rather than with different seeds (a rough sketch of this protocol follows).
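
For concreteness, a rough sketch of that protocol using the commands already shown in this thread; the loop over labels and the script structure are illustrative, and result aggregation is omitted:

import subprocess

# Train one backdoored model per infected label, then run detection on each.
# The flags are the ones used earlier in this thread; everything else about
# how the comparison is scripted is an assumption.
for target_label in range(10):  # CIFAR-10: one infected label per class
    subprocess.run(["python", "train.py", "--dataset", "cifar10",
                    "--target_label", str(target_label), "--gpu", "0"],
                   check=True)
    subprocess.run(["python", "Beatrix.py", "--dataset", "cifar10", "--gpu", "0"],
                   check=True)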

@marcoromanelli-github
Author

Thanks for your answer.
However, we still have doubts about how to reproduce these results.

We trained multiple backdoored models with different infected labels to conduct the comparison experiments in Tables 5 and 6

We understand this point, and we did the same. However, we could not obtain the same results.

  1. What seed(s) did you use in your experiments?
  2. In light of your previous answer, were these values indeed obtained with only 30 clean samples per class?

@wanlunsec
Owner

In the experiments, we used 30 clean samples per class for backdoor detection, and, as shown in the implementation, we did not set a random seed.
