Improve sources memory consumption #279

Closed
vzamanillo opened this issue Jul 24, 2020 · 4 comments · Fixed by #281
Labels
Priority: Medium (This issue may be useful, and needs some attention.)
Type: Discussion (Some ideas need to be planned and discussed to come to a strategy.)
Type: Enhancement (Most issues will probably ask for additions or changes.)

Comments

@vzamanillo
Contributor

vzamanillo commented Jul 24, 2020

While profiling memory with pprof, I discovered that some sources increase subfinder's memory footprint excessively, e.g. waybackarchive.

wayback1

This is because the results are very large and we read them with ioutil.ReadAll(pagesResp.Body), which loads the whole response body into memory at once.

After some changes to read the response as a stream using bufio.NewReader(pagesResp.Body), memory consumption is drastically reduced.

wayback2

It happens in other sources too, especially those that return JSON but do not use a decoder to process it. Instead, the whole content is loaded into memory with ioutil.ReadAll(pagesResp.Body) and subdomainExtractor matches subdomains with a regexp (e.g. threatminer, threatcrowd...).

It would be nice to avoid ioutil.ReadAll(pagesResp.Body) wherever possible and to review the rest of the sources so that they decode JSON responses properly.

We could do it after merging #278 or we could introduce them directly in that branch.

@vzamanillo
Contributor Author

First results after some rework. I excluded github because it takes a long time to finish, but it increases consumption by only about 5MB and keeps it constant until it finishes.

mem-test

@ehsandeep
Member

Hey @vzamanillo, we didn't focus on memory profiling in the past because subfinder is not something we run all the time; mostly it's a one-time run before you start on your target. It is definitely one of the things to improve to make it more mature, though.

Apart from the memory consumption improvement, do you also notice an improvement in overall run time (as we can see in the PoC above)? Is it a result of the linting work on your side, or a small improvement from better memory management?

@vzamanillo
Contributor Author

vzamanillo commented Jul 26, 2020

Hi @bauthard, there is no significant improvement in overall run time. There is some in a few cases, but the difference is small. In fact, for sources with large responses, such as commoncrawl or waybackarchive, it is a few milliseconds slower, because the response content is iterated line by line instead of being loaded into memory and processed at once.

These memory improvements are not in the branch of pull request #278. They are changes I made on top of that branch, and I have them ready to merge once #278 lands. I don't think they should go into #278, so as not to increase the review cost and because their scope is different from the changes discussed there.

@vzamanillo
Contributor Author

vzamanillo commented Jul 27, 2020

Step-by-step guide to profiling CPU and memory in Go.

Add the profile package to the imports in main.go and start the profiler at the top of main:

import (
	"context"

	"github.com/pkg/profile"
	"github.com/projectdiscovery/gologger"
	"github.com/projectdiscovery/subfinder/pkg/runner"
)

func main() {
	// Enable only one of the two lines below at a time.
	defer profile.Start().Stop() // CPU profiling (default)
	// defer profile.Start(profile.MemProfile).Stop() // Memory profiling
	....
}

Run main.go:

# go run main.go -d uber.com -sources alienvault

When it finishes, you will see a message like the following:

2020/07/27 13:46:24 profile: cpu profiling disabled, /tmp/profile978571390/cpu.pprof

Run pprof on the profile path printed in that message (the temporary directory name changes on every run) and inspect the results; it will open a new browser window:

go tool pprof -http=:8080 /tmp/profile093511175/cpu.pprof

pprofui

freecodecamp pprof guide: https://www.freecodecamp.org/news/how-i-investigated-memory-leaks-in-go-using-pprof-on-a-large-codebase-4bec4325e192/

@ehsandeep added the Priority: Medium, Type: Discussion, and Type: Enhancement labels on Jul 27, 2020