In the paper, the importance score of a word is calculated by removing that word, but in the code the word is replaced with '<oov>' to calculate the importance score:
https://github.com/jind11/TextFooler/blob/master/attack_classification.py#L216
Moreover, '<oov>' will be tokenized into 4 tokens, which may have attention effects on the other words.
I'm wondering why such a nonsensical '<oov>' token is used?
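For what it's worth, the splitting can be checked directly with a BERT tokenizer. A minimal sketch, assuming the HuggingFace `transformers` WordPiece tokenizer (the repository may use a different BERT wrapper):

```python
from transformers import BertTokenizer

# Check how many WordPiece tokens the placeholder string '<oov>' becomes
# when it is not registered as a special token in the vocabulary.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
pieces = tokenizer.tokenize("<oov>")
print(pieces, len(pieces))  # the angle brackets and 'oov' get split into several pieces
```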
Hi, I have tested both methods, removing the word and replacing it with '<oov>', and the difference is not obvious. '<oov>' is in the vocab, so I don't think it can be tokenized into 4 tokens. Let me know if you have more questions.
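As a rough sketch of the two scoring variants being compared here, deleting the word versus substituting a placeholder token, assuming a hypothetical `predict` function that maps a token list to class probabilities (this is not the repository's exact code):

```python
def importance_scores(tokens, predict, true_label, mode="replace", oov_token="<oov>"):
    """Score each word by how much the true-class probability drops when
    that word is either removed (paper) or replaced with a placeholder (code)."""
    base_prob = predict(tokens)[true_label]
    scores = []
    for i in range(len(tokens)):
        perturbed = list(tokens)
        if mode == "remove":
            del perturbed[i]          # variant 1: delete the word entirely
        else:
            perturbed[i] = oov_token  # variant 2: replace it with '<oov>'
        scores.append(base_prob - predict(perturbed)[true_label])
    return scores
```

Either variant ranks words by how much the model's confidence in the original label drops, which is presumably why the two give similar results in practice.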
The README explains how to obtain the embeddings:
Run the following code to pre-compute the cosine similarity scores between word pairs based on the counter-fitting word embeddings [https://drive.google.com/file/d/1bayGomljWb6HeYDMTDKXrh0HackKtSlx/view].
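As a rough illustration of what that pre-computation amounts to, here is a minimal sketch that normalizes the counter-fitting vectors and takes a dot-product matrix; the file names and text format below are assumptions, not the repository's exact script:

```python
import numpy as np

# Load counter-fitting word vectors (assumed format: one word followed by
# its vector components per line, space-separated).
words, vecs = [], []
with open("counter-fitted-vectors.txt") as f:
    for line in f:
        parts = line.rstrip().split(" ")
        words.append(parts[0])
        vecs.append(np.array(parts[1:], dtype=np.float32))

embeddings = np.stack(vecs)
# Normalize each vector so that a plain dot product equals cosine similarity.
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)
# Pairwise cosine similarity between all word pairs (this matrix can be very large).
cos_sim = embeddings @ embeddings.T
np.save("cos_sim_counter_fitting.npy", cos_sim)
```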