Twint-Distributed

No longer supported

I ran into many problems with Twint, so I decided to stop developing this library. If you liked this solution, you may be interested in my other library: https://github.com/markowanga/stweet.

Description

Sometimes you need to scrape an enormous amount of tweet data in a short time. This project helps with that task. The solution is based on Twint, a popular tool for scraping Twitter data.

Main concepts

Provide a microservice architecture that is scalable and can be distributed across many machines
Divide a single scrape task into small elementary tasks
Tolerate worker errors: when a worker fails, its elementary task can be repeated on another instance
Work around Twitter's limit, which prevents downloading large amounts of data from a single IP address
Gather all data in one location
Use Docker whenever possible
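Dividing one large scrape task into small elementary tasks can be done by cutting its time range into consecutive intervals. The sketch below shows one way to do this, assuming a task is described by a search query plus `since`/`until` bounds; `ScrapeTask` and `split_task` are hypothetical names for illustration, not part of this project's code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class ScrapeTask:
    """Hypothetical elementary task: one query over one time interval."""
    query: str
    since: datetime
    until: datetime


def split_task(query: str, since: datetime, until: datetime,
               step: timedelta) -> List[ScrapeTask]:
    """Split [since, until) into contiguous, non-overlapping intervals of at most `step`."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    tasks: List[ScrapeTask] = []
    start = since
    while start < until:
        # The last interval is truncated so it never goes past `until`.
        end = min(start + step, until)
        tasks.append(ScrapeTask(query, start, end))
        start = end
    return tasks
```

For example, `split_task("python", datetime(2020, 1, 1), datetime(2020, 1, 3, 12), timedelta(days=1))` yields three tasks, the last covering only the final 12 hours. Smaller steps produce more, shorter tasks, which spreads the work over more workers and makes a failed task cheaper to repeat.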

How it works

The user submits scrape commands via HTTP requests
In response, the server publishes scrape commands to RabbitMQ; the time bounds can be divided into small intervals
Workers consume the messages from RabbitMQ and scrape the data
When an elementary task has finished, its data is uploaded to the server
The server saves all received data to central storage
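The worker side of this flow can be sketched as a loop that takes a task, scrapes it, uploads the result, and only then treats the task as done; if anything fails, the task goes back on the queue so another worker can retry it. In RabbitMQ this is usually achieved by acknowledging a message only after a successful upload (for example with the pika client's `basic_ack`, or `basic_nack(requeue=True)` on failure), but this README does not say which client the project uses. To stay self-contained, the sketch below replaces RabbitMQ with Python's in-memory `queue.Queue`; `run_worker`, `scrape`, `upload`, and `max_attempts` are hypothetical names.

```python
import queue
from typing import Callable, List


def run_worker(tasks: queue.Queue,
               scrape: Callable[[dict], List[dict]],
               upload: Callable[[dict, List[dict]], None],
               max_attempts: int = 3) -> List[dict]:
    """Drain `tasks`; return the tasks that still failed after `max_attempts` tries.

    In-memory stand-in for a RabbitMQ consumer: a task counts as done
    ("acked") only after both scraping and uploading succeed.
    """
    failed: List[dict] = []
    while True:
        try:
            task = tasks.get_nowait()
        except queue.Empty:
            break  # no more work
        try:
            tweets = scrape(task)
            upload(task, tweets)  # success: equivalent of acknowledging the message
        except Exception:
            task["attempts"] = task.get("attempts", 0) + 1
            if task["attempts"] < max_attempts:
                tasks.put(task)  # equivalent of requeueing for another worker
            else:
                failed.append(task)  # give up after too many attempts
    return failed
```

Capping the number of attempts keeps a permanently failing task (for example, a malformed query) from circulating forever; a real deployment might route such tasks to a dead-letter queue instead.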
