In the paper, the importance score of a word is calculated by removing that word, but in the code the word is replaced with '<oov>' to calculate the importance score:
https://github.com/jind11/TextFooler/blob/master/attack_classification.py#L216
Moreover, '<oov>' will be tokenized into 4 tokens, which may have attention effects on the other words.
I'm wondering why such a nonsensical '<oov>' token is used?
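For what it's worth, the splitting can be checked directly with a BERT tokenizer. A minimal sketch, assuming the HuggingFace `transformers` WordPiece tokenizer (the repository may use a different BERT wrapper):

```python
from transformers import BertTokenizer

# Check how many WordPiece tokens the placeholder string '<oov>' becomes
# when it is not registered as a special token in the vocabulary.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
pieces = tokenizer.tokenize("<oov>")
print(pieces, len(pieces))  # the angle brackets and 'oov' get split into several pieces
```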
Hi, I have tested both methods, removing the word and replacing it with '<oov>', and the difference is not obvious. '<oov>' is in the vocab, so I don't think it can be tokenized into 4 tokens. Let me know if you have more questions.
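As a rough sketch of the two scoring variants being compared here, deleting the word versus substituting a placeholder token, assuming a hypothetical `predict` function that maps a token list to class probabilities (this is not the repository's exact code):

```python
def importance_scores(tokens, predict, true_label, mode="replace", oov_token="<oov>"):
    """Score each word by how much the true-class probability drops when
    that word is either removed (paper) or replaced with a placeholder (code)."""
    base_prob = predict(tokens)[true_label]
    scores = []
    for i in range(len(tokens)):
        perturbed = list(tokens)
        if mode == "remove":
            del perturbed[i]          # variant 1: delete the word entirely
        else:
            perturbed[i] = oov_token  # variant 2: replace it with '<oov>'
        scores.append(base_prob - predict(perturbed)[true_label])
    return scores
```

Either variant ranks words by how much the model's confidence in the original label drops, which is presumably why the two give similar results in practice.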
The README explains how to obtain the embeddings:
Run the following code to pre-compute the cosine similarity scores between word pairs based on the counter-fitting word embeddings [https://drive.google.com/file/d/1bayGomljWb6HeYDMTDKXrh0HackKtSlx/view].
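As a rough illustration of what that pre-computation amounts to, here is a minimal sketch that normalizes the counter-fitting vectors and takes a dot-product matrix; the file names and text format below are assumptions, not the repository's exact script:

```python
import numpy as np

# Load counter-fitting word vectors (assumed format: one word followed by
# its vector components per line, space-separated).
words, vecs = [], []
with open("counter-fitted-vectors.txt") as f:
    for line in f:
        parts = line.rstrip().split(" ")
        words.append(parts[0])
        vecs.append(np.array(parts[1:], dtype=np.float32))

embeddings = np.stack(vecs)
# Normalize each vector so that a plain dot product equals cosine similarity.
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)
# Pairwise cosine similarity between all word pairs (this matrix can be very large).
cos_sim = embeddings @ embeddings.T
np.save("cos_sim_counter_fitting.npy", cos_sim)
```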