To prevent miscreants from launching new RSP attacks, we withhold the vulnerable URSes and make the rest of the data public, which includes the IPT search keywords, the IPTs, the IPT contacts of various categories, and the messages collected from Telegram.
As shown in the figure above, our methodology consists of three major components. First, the IPT hunter discovers IPTs and RSPs. Second, given the IPTs, an analyzer profiles IPT categories and extracts the contact identifiers embedded in IPTs, through which a large volume of contacts, including instant messaging accounts and websites, has been discovered. Third, to reveal where these contacts redirect a victim, an IPT infiltrator automatically visits and profiles the websites and Telegram accounts promoted in IPTs.
For the crawlers, we open-source the runnable code. For the models, we open-source the training and testing scripts as well as the ground truth datasets.
Deploy a crawler to obtain reflected search poisoning data from four search engines: Google, Bing, Baidu, and Sogou.
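As a rough illustration of the first crawling step, the sketch below builds one search URL per engine for a given keyword. The endpoint paths and query-parameter names are assumptions based on each engine's public search page, and `build_search_urls` is a hypothetical helper, not a function from the released crawler.

```python
from urllib.parse import urlencode

# Assumed search endpoints and query-parameter names (not taken from the released crawler).
SEARCH_ENDPOINTS = {
    "google": ("https://www.google.com/search", "q"),
    "bing": ("https://www.bing.com/search", "q"),
    "baidu": ("https://www.baidu.com/s", "wd"),
    "sogou": ("https://www.sogou.com/web", "query"),
}


def build_search_urls(keyword: str) -> dict[str, str]:
    """Return one search URL per engine for the given keyword."""
    return {
        engine: f"{base}?{urlencode({param: keyword})}"
        for engine, (base, param) in SEARCH_ENDPOINTS.items()
    }
```

A real crawler would additionally need pagination, rate limiting, and result parsing, none of which are shown here.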
A Random Forest classifier trained on 2,229 positive samples and 1,468 negative samples to distinguish RSPs from benign URL reflections.
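To make the notion of a URL reflection concrete, the sketch below checks which query-parameter values of a URL appear verbatim in the page text. This is only one hypothetical signal that such a classifier might consume; the function name `reflected_params` and the `min_len` cutoff are assumptions, and the actual feature set is defined in the released training scripts.

```python
from urllib.parse import parse_qsl, urlsplit


def reflected_params(url: str, page_text: str, min_len: int = 8) -> list[str]:
    """Return names of query parameters whose decoded value appears verbatim in page_text."""
    hits = []
    for name, value in parse_qsl(urlsplit(url).query):
        # Skip short values (e.g. page numbers) that would match by coincidence.
        if len(value) >= min_len and value in page_text:
            hits.append(name)
    return hits
```

A reflection alone does not make a URL an RSP; benign search pages also echo the user's query, which is why a trained classifier is needed on top of signals like this.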
A Random Forest classifier trained on 1,012 positive samples and 3,170 negative samples to decide whether an IPT segment is a contact segment. Contact segments make good search keywords for guiding the search engines and discovering new RSPs/IPTs.
By fine-tuning the multilingual BERT model, we build a classifier that labels an IPT as either the harmless 'Benign' category or one or more of the 14 illicit services/goods categories.
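The "Benign or one or more categories" output can be read as a multi-label decision. The sketch below shows one possible decoding step, assuming the model emits one logit per illicit category and that an IPT with no category above the threshold is treated as 'Benign'. Both that fallback rule and the `decode_labels` helper are assumptions for illustration; the released scripts define the actual label handling.

```python
import math


def decode_labels(logits: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Map per-category logits to labels, falling back to 'Benign' if none pass."""
    picked = [
        name
        for name, z in logits.items()
        if 1.0 / (1.0 + math.exp(-z)) >= threshold  # sigmoid probability per category
    ]
    return picked or ["Benign"]
```

The category names passed in are placeholders; the 14 real category names come from the ground truth dataset.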
Taking an IPT as input, our contact extractor extracts all embedded contact entities using a contact type classifier and contact entity extractors.
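As a simplified stand-in for the entity extraction step, the sketch below pulls Telegram handles and website URLs out of an IPT with regular expressions. The patterns and the `extract_contacts` helper are assumptions for illustration; the actual extractor first applies a contact type classifier, which this sketch omits.

```python
import re

# Illustrative patterns only; they do not replicate the released extractors.
CONTACT_PATTERNS = {
    # Telegram usernames: 5-32 characters of letters, digits, and underscores.
    "telegram": re.compile(r"(?:t\.me/|@)([A-Za-z][A-Za-z0-9_]{4,31})"),
    "website": re.compile(r"https?://[^\s<>\"']+"),
}


def extract_contacts(ipt: str) -> dict[str, list[str]]:
    """Return the contact entities found in an IPT, grouped by contact type."""
    found = {}
    for contact_type, pattern in CONTACT_PATTERNS.items():
        matches = pattern.findall(ipt)
        if matches:
            found[contact_type] = matches
    return found
```

Plain patterns like these produce false positives (for example, the `@` in an email address), which is one reason a classifier is useful before extraction.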
By instrumenting a headless browser, we capture the final landing webpage as a screenshot and save all network traffic, covering both HTTP requests and HTTP responses.
Leveraging publicly available Telegram APIs, we extract the profile of each Telegram account on a weekly basis.
@misc{wu2024reflected,
title={Reflected Search Poisoning for Illicit Promotion},
author={Sangyi Wu and Jialong Xue and Shaoxuan Zhou and Xianghang Mi},
year={2024},
eprint={2404.05320},
archivePrefix={arXiv},
primaryClass={cs.CR}
}