CPU is high #9

Closed
sankxuan opened this issue Mar 13, 2017 · 18 comments

@sankxuan

Thank you for this HTML parser; it's a very smart tool, like jsoup. But I found that CPU usage is very high when I use it.

I compared it with other tools based on libxml2 (such as Fuzi and Kanna). In my project, Fuzi and Kanna use 10%-15% CPU when parsing one web site, but SwiftSoup uses 60% or more.

Still, I like this project, because I like jsoup too.

@scinfu
Owner

scinfu commented Mar 13, 2017

Thanks for the tip.
Indeed, I have never run comparative speed tests against the other libraries.
As soon as I can, I'll work on making the library lighter.

@scinfu
Owner

scinfu commented Mar 20, 2017

Can you share your test projects?

@sankxuan
Author

Hello,

You can compare with other libraries based on libxml2 and write some test code that runs many times, for example:

https://github.com/cezheng/Fuzi/blob/master/README.md
and
https://github.com/tid-kijyun/Kanna

I think the reason may be that libxml2 is faster thanks to many years of performance work.

But I don't like the libraries based on libxml2; they are not good at HTML and CSS selectors.

My code is normal usage, such as:

// let doc = try SwiftSoup.parse(stringData)
//
// let contentTag = try doc.select((site["chapter"]["content"].stringValue)).first()
// temp = try (contentTag?.html())!

@scinfu
Owner

scinfu commented Mar 25, 2017

Try 1.2.5 now. The string builder was very slow at concatenating strings, and getting the character count of a Swift string is very, very slow.
The string builder now uses an array of Characters.
With this change, performance is improved by 8x.
The gap with Kanna is very small now: tenths of a second for a single execution.
In the coming days I will make other changes to improve performance.
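
To illustrate the idea of a string builder backed by an array of Characters, here is a minimal sketch. The type name `CharacterBuffer` and its methods are hypothetical and are not SwiftSoup's actual implementation; the point is only that appending to an array is amortized O(1) and the array's `count` is O(1), whereas `String.count` has to walk the whole string.

```swift
// Hypothetical sketch; not SwiftSoup's actual StringBuilder.
struct CharacterBuffer {
    // Characters live in a plain array, so appending is amortized O(1)
    // and the length is an O(1) array count.
    private var characters: [Character] = []

    var count: Int { characters.count }

    mutating func append(_ text: String) {
        characters.append(contentsOf: text)
    }

    mutating func append(_ character: Character) {
        characters.append(character)
    }

    // Build the final String once, at the end.
    func toString() -> String {
        String(characters)
    }
}

var buffer = CharacterBuffer()
buffer.append("<p>")
buffer.append("Hello")
buffer.append("</p>")
print(buffer.count)       // 12
print(buffer.toString())  // <p>Hello</p>
```

The String is materialized only once in `toString()`, so repeated appends never re-measure or re-copy the text built so far.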

@scinfu
Owner

scinfu commented Mar 26, 2017

New version 1.2.6 is available: 2x faster than 1.2.5.

@sankxuan
Author

Hello, thank you for the fix.

I have tested 1.2.8 with my pod: pod 'SwiftSoup', :git => 'https://github.com/scinfu/SwiftSoup.git'

My project code is below:

self.sessionManager.request(book.url).responseData{ response in
                    print("update book \(book.name) start " + formatter.string(from: (Date())))

                    if let data = response.result.value {
                        
                        let baseurl = (response.request?.url?.scheme)! + "://" + (response.request?.url?.host)!
                        
                        let stringdata = Util.dataToString(data: data)
                        let site = book.getSite()

                        let chapterList = self.parseChapter(site: site, baseurl: baseurl, stringdata: stringdata)
                        
                        if chapterList.count > 0 {
                            book.newChapterName = chapterList[chapterList.count - 1].name
                        }
                        
                        if book.chapterList != nil && chapterList.count > (book.chapterList?.count)! {
                            book.isUpdate = true
                            
                        }
                        //book.isUpdate = true
                        
                        if book.isUpdate {
                            DispatchQueue.main.async {
                                self.tableView.reloadData()
                            }
                        }
                        
                    }
                    
                    print("update book \(book.name) end " + formatter.string(from: (Date())))

                }

func parseChapter(site: JSON, baseurl: String, stringdata: String) -> [Chapter] {
        var chapterList = [Chapter]()
        
        do {
            let doc = try SwiftSoup.parse(stringdata)
            
            //let clist =  doc.css(".zjlist4")
            //let ss = site["catalogue"]["list"].stringValue
            let clist =  try doc.select(site["catalogue"]["list"].stringValue)
            
            for element in clist {
                
                let aElement = try element.select(site["catalogue"]["atag"].stringValue).first()

                if aElement != nil {
                    //let baseurl = response.request?.url?.baseURL?.absoluteString
                    
                    let title = try aElement?.text()
                    
                    if title != "" {
                        let chapter = Chapter()
                        chapter.name = title!
                        //chapter.url = url
                        chapterList.append(chapter)
                        
                    }
                    
                }
            }
            
        } catch let error {
            print(error)
        }
        
        return chapterList
        
    }

The result is:
update book 神话2 start 10:04:19.030
update book 神话2 end 10:04:19.171

It takes about 140 ms,
and CPU usage is 15%-90%.

But when I test with Fuzi, the code is:

func parseChapter(site: JSON, baseurl: String, stringdata: String) -> [Chapter] {
        var chapterList = [Chapter]()
        
        do {
            let doc = try HTMLDocument(string: stringdata)
            
            //let clist =  doc.css(".zjlist4")
            //let ss = site["catalogue"]["list"].stringValue
            let clist =  doc.css(site["catalogue"]["list"].stringValue)
            
            for element in clist {
                
                let aElement = element.firstChild(tag: site["catalogue"]["atag"].stringValue)

                if aElement != nil {
                    //let baseurl = response.request?.url?.baseURL?.absoluteString
                    
                    let title = aElement?.stringValue
                    
                    if title != nil {
                        let chapter = Chapter()
                        chapter.name = title!
                        //chapter.url = url
                        chapterList.append(chapter)
                        
                    }
                    
                }
            }
            
        } catch let error {
            print(error)
        }
        
        return chapterList
        
    }


The result:
update book 神话2 start 00:12:33.849
update book 神话2 end 00:12:33.857

It takes about 10 ms,
and CPU usage is only 5%-15%.

So I think the performance is not good enough yet.

@scinfu
Owner

scinfu commented Apr 3, 2017

Thank you, I'm making changes and running tests to improve performance.

@lightsprint09

Consider using final on your classes. I'm not sure how much performance this gains, but it could help.
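
As a small sketch of that suggestion: marking a class `final` tells the compiler it cannot be subclassed, so calls to its methods can be dispatched statically instead of through dynamic dispatch. The `TokenCounter` type below is hypothetical and not part of SwiftSoup, and no speedup is measured here.

```swift
// Hypothetical type for illustration; not part of SwiftSoup.
// `final` rules out subclasses, so `record()` can be called directly
// rather than looked up dynamically.
final class TokenCounter {
    private(set) var count = 0

    func record() {
        count += 1
    }
}

let counter = TokenCounter()
for _ in 0..<1_000 {
    counter.record()
}
print(counter.count)  // 1000
```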

@scinfu
Owner

scinfu commented Jun 9, 2017

@lightsprint09 thank you, I will try this where possible.
Swift is very slow at parsing strings; where possible, I'm looking for a solution that uses arrays of Unicode scalars instead of Strings.
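
A rough sketch of what scanning an array of Unicode scalars can look like, assuming the input is converted once up front. The function `countTagOpeners` is a hypothetical helper, not SwiftSoup code; it only shows that an array gives O(1) integer indexing, which `String` does not.

```swift
// Hypothetical helper; illustrates indexing into an array of Unicode
// scalars instead of walking String indices.
func countTagOpeners(in html: String) -> Int {
    // Convert once; afterwards every access is an integer-indexed lookup.
    let scalars = Array(html.unicodeScalars)
    var openers = 0
    var index = 0
    while index < scalars.count {
        if scalars[index] == "<" {
            openers += 1
        }
        index += 1
    }
    return openers
}

print(countTagOpeners(in: "<div><a href=\"#\">神话</a></div>"))  // 4
```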

@0xTim
Contributor

0xTim commented Jun 19, 2017

@scinfu String performance should be a lot better in Swift 4. In Vapor we basically just cast everything to an array of bytes when parsing stuff. The UTF-8 strictness is a real killer for performance.

@scinfu
Owner

scinfu commented Jun 29, 2017

@0xTim have you tried Swift 4?

@0xTim
Contributor

0xTim commented Jun 30, 2017

No, I haven't used Swift 4 much yet, unfortunately. I also notice you are using Regex a lot, which is quite slow. Take a look at this presentation, which talks about building a parser from scratch; it was about 1000x quicker than a regex parser when the guy who gave it tested it on XML. Might be worth looking into long term, though I appreciate it is a big change!

https://github.com/london-vapor-meetup/event-material/tree/master/2017-05-17/Byte-sized%20-%20low-level%20programming%20with%20Vapor
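
For a sense of what a hand-written, byte-level scanner looks like compared with a regular expression, here is a minimal sketch. It is not taken from the presentation, Vapor, or SwiftSoup: the function names are hypothetical, and it deliberately ignores comments, attribute values containing `<`, and every other detail a real HTML tokenizer has to handle.

```swift
// Hypothetical sketch: collects opening-tag names by walking UTF-8 bytes,
// with no regular expressions.
func isASCIIAlphanumeric(_ byte: UInt8) -> Bool {
    let isLower = (UInt8(ascii: "a")...UInt8(ascii: "z")).contains(byte)
    let isUpper = (UInt8(ascii: "A")...UInt8(ascii: "Z")).contains(byte)
    let isDigit = (UInt8(ascii: "0")...UInt8(ascii: "9")).contains(byte)
    return isLower || isUpper || isDigit
}

func openingTagNames(in html: String) -> [String] {
    let bytes = Array(html.utf8)
    let lessThan = UInt8(ascii: "<")
    var names: [String] = []
    var i = 0
    while i < bytes.count {
        guard bytes[i] == lessThan else {
            i += 1
            continue
        }
        i += 1  // step past "<"
        let start = i
        // Consume the ASCII letters and digits that form the tag name.
        while i < bytes.count, isASCIIAlphanumeric(bytes[i]) {
            i += 1
        }
        // Closing tags start with "/", so nothing is consumed for them.
        if i > start {
            names.append(String(decoding: bytes[start..<i], as: UTF8.self))
        }
    }
    return names
}

print(openingTagNames(in: "<div class=\"x\"><a href=\"#\">hi</a></div>"))
// ["div", "a"]
```

Each byte is visited once and compared with plain integer operations, which is the kind of work a regex engine wraps in considerably more machinery.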

@scinfu
Owner

scinfu commented Jun 30, 2017

@sankxuan with 1.4.2 there is a major performance improvement in CSS selection:

@IBAction func parse(_ sender: Any) {
        let ss = try? SwiftSoup.parse(html)
        let fu = try? HTMLDocument(string: html)
        
        
        var date = Date()
        for _ in 0...50 {
            //_ = try? SwiftSoup.parse(html)
            _ = try? ss?.select(".zjlist4")
        }
        print(Date().timeIntervalSince(date))
 
        
        date = Date()
        for _ in 0...50 {
            //_ = try? HTMLDocument(string: html)
            _ = fu?.css(".zjlist4")
        }
        print(Date().timeIntervalSince(date))
    }

Before (seconds; first line is SwiftSoup, second line is Fuzi):

// 3.5837669968605
// 0.255009055137634

After:

// 0.898868978023529
// 0.129451990127563

@scinfu
Owner

scinfu commented Jun 30, 2017

@0xTim it could be a good idea to use that approach for the Regex code and CharacterReader.swift.

@onurcelikeng

The toString() method (Node -> String) is not working. Why?

@tjw0051

tjw0051 commented Aug 12, 2017

I'm also seeing really slow performance compared to Kanna. I like how similar this is to jsoup, as I have to rewrite the same code in both, but SwiftSoup is unusably slow for my application :(

@scinfu
Owner

scinfu commented Sep 7, 2017

I'm running tests on switching from String to a low-level parsing method.
Branch: Bits. It's very, very fast but needs more testing and bug fixes.

@scinfu
Owner

scinfu commented Nov 4, 2017

Now that String performance is better in Swift 4, the library has gained performance.
