Distinct options for every new browser instance #107
Comments
OK, I managed to do this myself. Here is the test case:
const { Cluster } = require('./dist/index.js');
(async () => {
let browserArgs = [
'--disable-infobars',
'--window-position=0,0',
'--ignore-certificate-errors',
'--ignore-certificate-errors-spki-list',
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-dev-shm-usage',
'--disable-accelerated-2d-canvas',
'--disable-gpu',
'--window-size=1920,1080',
'--hide-scrollbars',
'--proxy-server=socks5://78.94.172.42:1080',
];
// each new call to workerInstance() removes the first
// element from this list (shift()), so
// maxConcurrency should be equal to perBrowserOptions.length
let perBrowserOptions = [
{
headless: false,
ignoreHTTPSErrors: true,
args: browserArgs.concat(['--proxy-server=socks5://78.94.172.42:1080'])
},
{
headless: true,
ignoreHTTPSErrors: true,
args: browserArgs.concat(['--proxy-server=socks5://CENSORED'])
},
];
const cluster = await Cluster.launch({
monitor: true,
concurrency: Cluster.CONCURRENCY_BROWSER,
maxConcurrency: 2,
puppeteerOptions: {
headless: false,
args: browserArgs,
ignoreHTTPSErrors: true,
},
perBrowserOptions: perBrowserOptions
});
// Event handler to be called in case of problems
cluster.on('taskerror', (err, data) => {
console.log(`Error crawling ${data}: ${err.message}`);
});
await cluster.task(async ({ page, data: url }) => {
await page.goto(url, {waitUntil: 'domcontentloaded', timeout: 20000});
const pageTitle = await page.evaluate(() => document.title);
console.log(`Page title of ${url} is ${pageTitle}`);
console.log(await page.content());
});
await cluster.queue('http://ipinfo.io/json');
await cluster.queue('http://ipinfo.io/json');
// many more pages
await cluster.idle();
await cluster.close();
})();
Here is the diff:
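Going only by the comments in the test case (each new browser takes the first remaining entry of perBrowserOptions), the selection logic might look roughly like the sketch below. createOptionAllocator and the proxy hosts are hypothetical names for illustration; this is not puppeteer-cluster API and not the author's actual change. Falling back to the shared options once the list is empty is also an assumption.

```javascript
// Hypothetical helper, not part of puppeteer-cluster: hands out one launch
// option object per browser and falls back to the shared defaults.
function createOptionAllocator(perBrowserOptions, defaultOptions) {
  // Copy the list so the caller's array is left untouched.
  const remaining = [...perBrowserOptions];
  return function nextOptions() {
    // shift() removes and returns the first element (undefined when empty).
    const next = remaining.shift();
    return next !== undefined ? next : defaultOptions;
  };
}

// Placeholder values; the proxy hosts are made up for this example.
const sharedOptions = { headless: true, args: [] };
const nextOptions = createOptionAllocator(
  [
    { headless: false, args: ['--proxy-server=socks5://proxy-a.example:1080'] },
    { headless: true, args: ['--proxy-server=socks5://proxy-b.example:1080'] },
  ],
  sharedOptions
);

// Three simulated browser launches: two distinct option sets, then the defaults.
for (let i = 0; i < 3; i++) {
  console.log(nextOptions().args);
}
```

With maxConcurrency equal to the length of the list, as the test case recommends, the fallback would only be reached if a browser were relaunched after the list had been used up.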
Thank you. The parallelization part of the code will probably be rewritten in the not-too-distant future, and then this use case will be easier to implement. It would already be possible by using a concurrency implementation, but that is just too complicated right now. Edit: I hope it's okay that I closed this. Otherwise, feel free to re-open :)
Thanks for the great module! Is this possible yet? I need to set a different HTTP proxy per browser instance. Could some kind of event be fired before launch (beforeLaunch or something like that), so that we can configure each browser/page instance? The approach above would not work for my application, because I need to dynamically queue tasks to the cluster object every X minutes, and those tasks are fetched from a server. Thanks
Thank you for a wonderful module. I am incorporating proxy rotation. NOTE: It has been said that a common issue with Puppeteer is that proxies can only be set at the Browser level, not the Page level, so each Page (browser tab) must use the same proxy. To use different proxies for each page, one needs to use the proxy-chain module. Below is the current code under development. Comments from anyone who has accomplished this are greatly appreciated. httpbin.org/ip is being used to confirm the proxy switch.
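As an illustration of the rotation part only, a minimal round-robin sketch could look like the following. createProxyRotator, assignProxies, and the proxy URLs are hypothetical and not taken from the commenter's code; how the chosen proxy is then applied (for example through proxy-chain or a separate browser launch) is left open here.

```javascript
// Hypothetical round-robin rotator; the proxy URLs used below are placeholders.
function createProxyRotator(proxies) {
  if (!Array.isArray(proxies) || proxies.length === 0) {
    throw new Error('At least one proxy URL is required');
  }
  let index = 0;
  return function nextProxy() {
    const proxy = proxies[index];
    index = (index + 1) % proxies.length; // wrap around to the start
    return proxy;
  };
}

// Pair each URL with the next proxy so a task function can read both values.
function assignProxies(urls, nextProxy) {
  return urls.map((url) => ({ url, proxy: nextProxy() }));
}

const jobs = assignProxies(
  ['http://httpbin.org/ip', 'http://httpbin.org/ip', 'http://httpbin.org/ip'],
  createProxyRotator(['http://proxy-a.example:8000', 'http://proxy-b.example:8000'])
);
console.log(jobs);
```

Each resulting object could be passed as the data argument of cluster.queue(), so the task receives the URL together with the proxy it is meant to use.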
Could you share the modified files from puppeteer-cluster that you used to get this working?
I want each running task instance to use a different IP. Is that possible?
Hi!
First of all: Very beautiful code and software. I should start learning typescript.
Is it possible to pass different options to different browser launches?
As far as I can see in the concurrency implementation of CONCURRENCY_BROWSER in src/concurrency/built-in/Browser.ts, every new browser is started with identical options:
Would it be possible to pass different options to new launches of browser instances?
I ask because I want to set different --proxy-server=some-proxy flags for new browser launches.
Thanks for viewing
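To make the request concrete, here is a minimal sketch of building one set of launch options per proxy from a shared base. withProxy, buildLaunchOptions, and the proxy hosts are hypothetical names for illustration and are not part of puppeteer-cluster; the sketch only assumes that launch options carry an args array of Chromium flags.

```javascript
// Hypothetical helper: returns a copy of baseArgs containing exactly one
// --proxy-server flag, replacing any flag that was already present.
function withProxy(baseArgs, proxyUrl) {
  const filtered = baseArgs.filter((arg) => !arg.startsWith('--proxy-server='));
  return [...filtered, `--proxy-server=${proxyUrl}`];
}

// Builds one launch-options object per proxy URL from the shared base options.
function buildLaunchOptions(baseOptions, proxyUrls) {
  return proxyUrls.map((proxyUrl) => ({
    ...baseOptions,
    args: withProxy(baseOptions.args || [], proxyUrl),
  }));
}

// Placeholder proxy hosts, made up for this example.
const launchOptions = buildLaunchOptions(
  { headless: true, ignoreHTTPSErrors: true, args: ['--no-sandbox', '--disable-gpu'] },
  ['socks5://proxy-a.example:1080', 'socks5://proxy-b.example:1080']
);
console.log(launchOptions);
```

Replacing an existing --proxy-server entry rather than appending a second one keeps each argument list unambiguous about which proxy a given browser should use.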