Merge pull request #555 from apifytech/refactoring/session-pool
Refactoring/session pool
mnmkng authored Jan 17, 2020
2 parents 311d242 + bc1a039 commit 342c727
Showing 6 changed files with 53 additions and 102 deletions.
74 changes: 15 additions & 59 deletions docs/api/Session.md
@@ -20,12 +20,10 @@ internal state can be enriched with custom user data for example some authorizat
- [`.retire()`](#Session+retire)
- [`.markBad()`](#Session+markBad)
- [`.checkStatus(statusCode)`](#Session+checkStatus)`boolean`
- [`.putResponse(response)`](#Session+putResponse)
- [`.putPuppeteerCookies(puppeteerCookies, url)`](#Session+putPuppeteerCookies)
- [`.setCookies(cookies, url)`](#Session+setCookies)
- [`.getCookies(url)`](#Session+getCookies)`Array<Object>`
- [`.setCookiesFromResponse(response)`](#Session+setCookiesFromResponse)
- [`.setPuppeteerCookies(cookies, url)`](#Session+setPuppeteerCookies)
- [`.getPuppeteerCookies(url)`](#Session+getPuppeteerCookies)`Array<Object>`
- [`.getCookieString(url)`](#Session+getCookieString)`String`
- [`.getPuppeteerCookies(url)`](#Session+getPuppeteerCookies)`*`

<a name="new_Session_new"></a>

@@ -178,9 +176,9 @@ Retires session based on status code.
<td colspan="3"><p>HTTP status code</p>
</td></tr></tbody>
</table>
<a name="Session+putResponse"></a>
<a name="Session+setCookiesFromResponse"></a>

## `session.putResponse(response)`
## `session.setCookiesFromResponse(response)`

Sets cookies from response to the cookieJar. Parses cookies from `set-cookie` header and sets them to `Session.cookieJar`.

@@ -197,36 +195,12 @@ Sets cookies from response to the cookieJar. Parses cookies from `set-cookie` he
<tr>
</tr></tbody>
</table>
<a name="Session+putPuppeteerCookies"></a>
<a name="Session+setPuppeteerCookies"></a>

## `session.putPuppeteerCookies(puppeteerCookies, url)`
## `session.setPuppeteerCookies(cookies, url)`

Persists puppeteer cookies to session for reuse.

<table>
<thead>
<tr>
<th>Param</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>puppeteerCookies</code></td>
</tr>
<tr>
<td colspan="3"><p>cookie from puppeteer <code>page.cookies</code> method.</p>
</td></tr><tr>
<td><code>url</code></td>
</tr>
<tr>
<td colspan="3"><p>Loaded url from page function.</p>
</td></tr></tbody>
</table>
<a name="Session+setCookies"></a>

## `session.setCookies(cookies, url)`

Set cookies to session cookieJar. Cookies array should be compatible with tough-cookie.
Set cookies to session cookieJar. Cookies array should be [puppeteer](https://pptr.dev/#?product=Puppeteer&version=v2.0.0&show=api-pagecookiesurls)
cookie compatible.

<table>
<thead>
@@ -236,7 +210,7 @@ Set cookies to session cookieJar. Cookies array should be compatible with tough-
</thead>
<tbody>
<tr>
<td><code>cookies</code></td><td><code>Array<Cookie></code></td>
<td><code>cookies</code></td><td><code>Array<Object></code></td>
</tr>
<tr>
</tr><tr>
@@ -245,11 +219,11 @@ Set cookies to session cookieJar. Cookies array should be compatible with tough-
<tr>
</tr></tbody>
</table>
<a name="Session+getCookies"></a>
<a name="Session+getPuppeteerCookies"></a>

## `session.getCookies(url)``Array<Object>`
## `session.getPuppeteerCookies(url)``Array<Object>`

Get cookies. Gets a array of `tough-cookie` Cookie instances.
Gets cookies in puppeteer format, ready to be used with `page.setCookie`.

<table>
<thead>
@@ -262,7 +236,8 @@ Get cookies. Gets a array of `tough-cookie` Cookie instances.
<td><code>url</code></td><td><code>String</code></td>
</tr>
<tr>
</tr></tbody>
<td colspan="3"><p>website url. Only cookies stored for this url will be returned</p>
</td></tr></tbody>
</table>
<a name="Session+getCookieString"></a>

@@ -285,22 +260,3 @@ Wrapper around `tough-cookie` Cookie jar `getCookieString` method.
<tr>
</tr></tbody>
</table>
<a name="Session+getPuppeteerCookies"></a>

## `session.getPuppeteerCookies(url)``*`

Gets cookies in format ready for puppeteer.

<table>
<thead>
<tr>
<th>Param</th><th>Type</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>url</code></td><td><code>String</code></td>
</tr>
<tr>
</tr></tbody>
</table>
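Read together, the documentation changes above come down to two public renames (`putResponse` to `setCookiesFromResponse`, `putPuppeteerCookies` to `setPuppeteerCookies`), while `setCookies` and `getCookies` drop out of the public API. For callers still using the old names, a compatibility shim could look roughly like the sketch below. `addLegacyCookieAliases` and `fakeSession` are hypothetical names used for illustration only; nothing like this is part of the commit.

```javascript
// Old public name -> new public name, as renamed in this commit.
const RENAMED_METHODS = {
    putResponse: 'setCookiesFromResponse',
    putPuppeteerCookies: 'setPuppeteerCookies',
};

// Hypothetical helper (not part of the commit): exposes the old names as thin aliases.
function addLegacyCookieAliases(session) {
    for (const [oldName, newName] of Object.entries(RENAMED_METHODS)) {
        if (typeof session[oldName] !== 'function') {
            session[oldName] = (...args) => session[newName](...args);
        }
    }
    return session;
}

// Stand-in object that only records calls; a real Session would be passed instead.
const calls = [];
const fakeSession = {
    setCookiesFromResponse: (response) => calls.push(['setCookiesFromResponse', response.url]),
    setPuppeteerCookies: (cookies, url) => calls.push(['setPuppeteerCookies', url]),
};

addLegacyCookieAliases(fakeSession);
fakeSession.putResponse({ headers: {}, url: 'https://example.com' });
fakeSession.putPuppeteerCookies([], 'https://example.com');
console.log(calls);
```

The shim deliberately leaves `setCookies` and `getCookies` alone, since the commit removes them from the public surface rather than renaming them.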
2 changes: 1 addition & 1 deletion src/crawlers/cheerio_crawler.js
@@ -403,7 +403,7 @@ class CheerioCrawler {
}

if (this.persistCookiesPerSession) {
session.putResponse(response);
session.setCookiesFromResponse(response);
}

request.loadedUrl = response.url;
4 changes: 2 additions & 2 deletions src/crawlers/puppeteer_crawler.js
@@ -298,7 +298,7 @@ class PuppeteerCrawler {
const browser = page.browser();
session = browser[BROWSER_SESSION_KEY_NAME];

// setting cookies for page
// setting cookies to page
if (this.persistCookiesPerSession) {
await page.setCookie(...session.getPuppeteerCookies(request.url));
}
@@ -312,7 +312,7 @@ class PuppeteerCrawler {
// save cookies
if (this.persistCookiesPerSession) {
const cookies = await page.cookies(request.loadedUrl);
session.putPuppeteerCookies(cookies, request.loadedUrl);
session.setPuppeteerCookies(cookies, request.loadedUrl);
}

await addTimeoutToPromise(
Expand Down
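The two hunks above bracket page navigation: cookies stored in the session are pushed into the page first, and the page's cookies are written back to the session afterwards. A rough sketch of that round trip with stand-in objects follows. `runWithSessionCookies`, `fakePage`, `fakeSession` and the `navigate` callback are hypothetical; the real crawler does considerably more between these two steps.

```javascript
// Hypothetical helper mirroring the order of operations in the two hunks above.
async function runWithSessionCookies(page, session, request, navigate) {
    // Restore cookies stored for this URL before navigating.
    await page.setCookie(...session.getPuppeteerCookies(request.url));
    await navigate(page, request);
    // Persist the cookies the page holds for the URL that was actually loaded.
    const cookies = await page.cookies(request.loadedUrl);
    session.setPuppeteerCookies(cookies, request.loadedUrl);
}

// Stand-ins for a puppeteer Page and a Session, just enough to run the sketch.
const stored = new Map();
const fakeSession = {
    getPuppeteerCookies: (url) => stored.get(url) || [],
    setPuppeteerCookies: (cookies, url) => stored.set(url, cookies),
};
const fakePage = {
    jar: [],
    async setCookie(...cookies) { this.jar.push(...cookies); },
    async cookies() { return [...this.jar]; },
};

const request = { url: 'https://example.com' };
const done = runWithSessionCookies(fakePage, fakeSession, request, async (page, req) => {
    req.loadedUrl = req.url; // pretend no redirect happened
    page.jar.push({ name: 'id', value: 'a3fWa' }); // pretend the site set a cookie
});
done.then(() => console.log(stored.get('https://example.com')));
```

Saving against `request.loadedUrl` rather than `request.url` matches the hunk: after a redirect, the cookies belong to the page that was actually loaded.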
65 changes: 30 additions & 35 deletions src/session_pool/session.js
@@ -199,48 +199,41 @@ export class Session {
* Parses cookies from `set-cookie` header and sets them to `Session.cookieJar`.
* @param response
*/
putResponse(response) {
setCookiesFromResponse(response) {
try {
const cookies = getCookiesFromResponse(response).filter(c => c);

this.setCookies(cookies, response.url);
this._setCookies(cookies, response.url);
} catch (e) {
// if invalid Cookie header is provided just log the exception.
log.exception(e, 'Session: Could not get cookies from response');
}
}

/**
* Persists puppeteer cookies to session for reuse.
* @param puppeteerCookies - cookie from puppeteer `page.cookies` method.
* @param url - Loaded url from page function.
*/
putPuppeteerCookies(puppeteerCookies, url) {
const cookies = puppeteerCookies.map(puppeteerCookie => this._transformPuppeteerCookie(puppeteerCookie));

this.setCookies(cookies, url);
}

/**
* Set cookies to session cookieJar.
* Cookies array should be compatible with tough-cookie.
* @param cookies {Array<Cookie>}
* Cookies array should be [puppeteer](https://pptr.dev/#?product=Puppeteer&version=v2.0.0&show=api-pagecookiesurls) cookie compatible.
* @param cookies {Array<Object>}
* @param url {String}
*/
setCookies(cookies, url) {
for (const cookie of cookies) {
this.cookieJar.setCookieSync(cookie, url, { ignoreError: false });
setPuppeteerCookies(cookies, url) {
try {
this._setCookies(cookies.map(this._puppeteerCookieToTough), url);
} catch (e) {
// if invalid cookies are provided just log the exception. No need to retry the request automatically.
log.exception(e, 'Session: Could not set cookies in puppeteer format.');
}
}

/**
* Get cookies.
* Gets a array of `tough-cookie` Cookie instances.
* @param url {String}
* Gets cookies in puppeteer format, ready to be used with `page.setCookie`.
* @param url {String} - website url. Only cookies stored for this url will be returned
* @return {Array<Object>}
*/
getCookies(url) {
return this.cookieJar.getCookiesSync(url);
getPuppeteerCookies(url) {
const cookies = this.cookieJar.getCookiesSync(url);

return cookies.map(this._toughCookieToPuppeteer);
}

/**
@@ -252,24 +245,14 @@ export class Session {
return this.cookieJar.getCookieStringSync(url, {});
}

/**
* Gets cookies in format ready for puppeteer.
* @param url {String}
* @return {*}
*/
getPuppeteerCookies(url) {
const cookies = this.getCookies(url);
return cookies.map(cookie => this._transformToughCookie(cookie));
}


/**
* Transforms puppeteer cookie to tough-cookie.
* @param puppeteerCookie {Object} - Cookie from puppeteer `page.cookies` method.
* @return {Cookie}
* @private
*/
_transformPuppeteerCookie(puppeteerCookie) {
_puppeteerCookieToTough(puppeteerCookie) {
return new Cookie({
key: puppeteerCookie.name,
value: puppeteerCookie.value,
@@ -287,7 +270,7 @@ export class Session {
* @return {Object} - puppeteer cookie
* @private
*/
_transformToughCookie(toughCookie) {
_toughCookieToPuppeteer(toughCookie) {
return {
name: toughCookie.key,
value: toughCookie.value,
@@ -298,4 +281,16 @@ export class Session {
httpOnly: toughCookie.httpOnly,
};
}

/**
* Sets cookies.
* @param cookies
* @param url
* @private
*/
_setCookies(cookies, url) {
for (const cookie of cookies) {
this.cookieJar.setCookieSync(cookie, url, { ignoreError: false });
}
}
}
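One detail worth noting in the hunks above: `cookies.map(this._puppeteerCookieToTough)` passes the method unbound, which only works because the converters never read `this`. The sketch below illustrates that constraint. It uses a plain object as a stand-in for the tough-cookie `Cookie` and only the fields visible in this diff (`name`/`key`, `value`, `httpOnly`); the remaining fields are elided in the source and are left out here. `CookieConverter` and `_withPrefix` are hypothetical names.

```javascript
class CookieConverter {
    // Neither converter reads `this`, so both are safe to pass unbound to Array#map.
    _puppeteerCookieToTough(puppeteerCookie) {
        // Plain object standing in for `new Cookie({...})` from tough-cookie.
        return { key: puppeteerCookie.name, value: puppeteerCookie.value };
    }

    _toughCookieToPuppeteer(toughCookie) {
        return { name: toughCookie.key, value: toughCookie.value, httpOnly: toughCookie.httpOnly };
    }

    toTough(cookies) {
        return cookies.map(this._puppeteerCookieToTough);
    }

    // Counter-example: this method depends on `this`, so it must be bound or wrapped.
    _withPrefix(cookie) {
        return `${this.prefix}${cookie.name}`;
    }
}

const converter = new CookieConverter();
converter.prefix = 'session_';
const puppeteerCookies = [{ name: 'CSRF', value: 'e8b667' }];

const tough = converter.toTough(puppeteerCookies);
console.log(tough);

let unboundFailed = false;
try {
    puppeteerCookies.map(converter._withPrefix);
} catch (e) {
    unboundFailed = true; // `this` is undefined when a class method is called unbound
}
const bound = puppeteerCookies.map(c => converter._withPrefix(c));
console.log(unboundFailed, bound);
```

If a converter ever needs instance state, wrapping it in an arrow function (as the removed `cookie => this._transformToughCookie(cookie)` form did) avoids the failure shown in the counter-example.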
2 changes: 1 addition & 1 deletion src/session_pool/session_utils.js
@@ -7,7 +7,7 @@ import { CookieParseError } from './errors';
* @return {undefined|Array}
*/
export const getCookiesFromResponse = (response) => {
const { headers } = response;
const headers = typeof response.headers === 'function' ? response.headers() : response.headers;
const cookieHeader = headers['set-cookie'] || '';

try {
Expand Down
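The changed line in this hunk lets `getCookiesFromResponse` accept a response whose `headers` is either a plain object or a function returning one (puppeteer's `Response.headers()` is a method). A minimal sketch of just that normalization follows. `readSetCookieHeader` is a hypothetical name, and the parsing that follows in the real helper is not shown in this hunk, so it is omitted.

```javascript
// Returns the raw `set-cookie` header value, or '' when the header is absent.
function readSetCookieHeader(response) {
    // Call `headers` when it is a method, otherwise use it as a plain object.
    const headers = typeof response.headers === 'function' ? response.headers() : response.headers;
    return headers['set-cookie'] || '';
}

const plainResponse = { headers: { 'set-cookie': 'id=a3fWa' } };
const methodResponse = { headers: () => ({ 'set-cookie': 'id=a3fWa' }) };
const noCookieResponse = { headers: {} };

console.log(readSetCookieHeader(plainResponse));
console.log(readSetCookieHeader(methodResponse));
console.log(JSON.stringify(readSetCookieHeader(noCookieResponse)));
```

Checking the type at the call site keeps a single helper usable from both crawlers without either of them having to reshape the response first.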
8 changes: 4 additions & 4 deletions test/session_pool/session.test.js
@@ -70,7 +70,7 @@ describe('Session - testing session behaviour ', () => {
let error;

try {
session.putResponse({ headers: { Cookie: 'invaldi*{*{*{*-----***@s' } });
session.setCookiesFromResponse({ headers: { Cookie: 'invaldi*{*{*{*-----***@s' } });
} catch (e) {
error = e;
}
@@ -164,13 +164,13 @@ describe('Session - testing session behaviour ', () => {
];
const newSession = new Session({ sessionPool: new SessionPool() });
const url = 'https://example.com';
newSession.putResponse({ headers, url });
newSession.setCookiesFromResponse({ headers, url });
let cookies = newSession.getCookieString(url);
expect(cookies).toEqual('CSRF=e8b667; id=a3fWa');

const newCookie = 'ABCD=1231231213; Domain=example.com; Secure';

newSession.putResponse({ headers: { 'set-cookie': newCookie }, url });
newSession.setCookiesFromResponse({ headers: { 'set-cookie': newCookie }, url });
cookies = newSession.getCookieString(url);
expect(cookies).toEqual('CSRF=e8b667; id=a3fWa; ABCD=1231231213');
});
@@ -185,7 +185,7 @@ describe('Session - testing session behaviour ', () => {
];
const newSession = new Session({ sessionPool: new SessionPool() });
const url = 'https://example.com';
newSession.putResponse({ headers, url });
newSession.setCookiesFromResponse({ headers, url });

const old = newSession.getState();
