Improve sources memory consumption #279
Hey @vzamanillo, we didn't focus on memory profiling in the past because subfinder is not something we run all the time; mostly it's a one-time run before you start with your target, but it is definitely one of the things to improve to make it more mature. Apart from the memory consumption improvement, do you also notice an improvement in overall run time (as we can see in the above PoC)? Is that a result of the linting work on your side, or a small improvement because of better memory management?
Hi @bauthard, there is no significant improvement in overall run time. There is some in a few cases, especially in sources with large response data, but the difference is not that important. These improvements in memory consumption are not in the branch of pull request #278; they are changes I have made based on that branch, but I have them ready to merge once #278 lands (I think it is not the time to introduce them in #278, so as not to increase the cost of the review, and because the scope of these changes is different from the changes we are talking about).
Step-by-step guide to profile:

1. Add …
2. Run …
3. After it finishes you can see the following message: …
4. Run …
While doing some memory profiles with `pprof`, I've discovered that some sources increase the memory footprint of subfinder excessively, e.g. `waybackarchive`. This is because the size of the results is very large and we are using `ioutil.ReadAll(pagesResp.Body)`. After some changes to read the response stream using `bufio.NewReader(pagesResp.Body)`, the memory consumption is drastically reduced.

It happens in other sources too, especially in those that return `json` where no decoder is used to process it; instead, all the content is put in memory with `ioutil.ReadAll(pagesResp.Body)` and `subdomainExtractor` is used with `regexp` to match subdomains (e.g. `threatminer`, `threatcrowd`, ...).

It would be nice to avoid using `ioutil.ReadAll(pagesResp.Body)` as much as possible and to check the rest of the sources so they consume the `json` responses correctly. We could do it after merging #278, or we could introduce the changes directly in that branch.