Web interface / browser plugin for interactive scraper creation #13

blahah · 2014-07-18T11:03:31Z

Take ideas from PeerLibrary.

mitar · 2014-07-20T07:50:00Z

So the idea is to use a process similar to the one we use for annotation: let the user highlight parts of a page, and then store those highlights as Open Annotation standard targets. But instead of attaching annotations to them, we would use them to extract data.
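To make the idea concrete, here is a minimal sketch of turning a highlight into a stored target. It is only an illustration: the `make_target` function, the dictionary layout, and the sample page are hypothetical, loosely modelled on the position (start/end) and quote (exact/prefix/suffix) selectors of the Open Annotation data model, and are not PeerLibrary's or Annotator's actual code.

```python
import json


def make_target(page_text: str, start: int, end: int, context: int = 20) -> dict:
    """Describe the highlighted span page_text[start:end] with two selectors.

    Hypothetical helper: the field names mimic Open Annotation selectors,
    but the exact layout is an assumption made for this sketch.
    """
    if not (0 <= start < end <= len(page_text)):
        raise ValueError("highlight offsets fall outside the page text")
    return {
        # Character offsets of the highlight in the page text.
        "position": {"start": start, "end": end},
        # The highlighted text plus a little surrounding context.
        "quote": {
            "exact": page_text[start:end],
            "prefix": page_text[max(0, start - context):start],
            "suffix": page_text[end:end + context],
        },
    }


# Made-up page text; the user "highlights" the title.
page = "Title: Scraping for science. Authors: A. Person, B. Other."
start = page.index("Scraping")
end = page.index(". Authors")
target = make_target(page, start, end, context=7)
print(json.dumps(target, indent=2))
```

For a scraper, the stored `exact` quote would presumably be replaced by a pattern, since the content differs from page to page while the surrounding context stays the same.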

This could then be integrated with a nice user interface, maybe reusing parts of this feedback tool, or simply Annotator.

mitar · 2014-07-20T07:53:29Z

So this is also related to the question of how to define scrapers. I would advise using XPath as only one of the available options. I think also storing other information, similar to the open annotation standard, would be helpful:

offsets in the page
prefix/suffix + regex to match the content (instead of the direct quote used in annotations)
XPath
DOM path (my addition)
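
A rough sketch of how a scraper definition could carry several of these selectors and fall back from one to the next. Everything here is an assumption made for illustration: the `Target` class, the function names, and the order in which strategies are tried are hypothetical. The XPath step uses the limited XPath subset in Python's standard `xml.etree.ElementTree`, which only handles well-formed markup, and the DOM path option is left out.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class Target:
    """Hypothetical scraper target holding several redundant selectors."""
    start: Optional[int] = None    # offsets in the page source
    end: Optional[int] = None
    prefix: Optional[str] = None   # literal text just before the content
    suffix: Optional[str] = None   # literal text just after the content
    pattern: Optional[str] = None  # regex the content itself must match
    xpath: Optional[str] = None    # ElementTree-compatible XPath subset


def by_context(markup: str, t: Target) -> Optional[str]:
    """Find content matching the regex between the literal prefix and suffix."""
    if t.prefix is None or t.suffix is None or t.pattern is None:
        return None
    m = re.search(re.escape(t.prefix) + "(" + t.pattern + ")" + re.escape(t.suffix), markup)
    return m.group(1) if m else None


def by_xpath(markup: str, t: Target) -> Optional[str]:
    """Return the text of the first node matching the XPath, if any."""
    if t.xpath is None:
        return None
    try:
        node = ET.fromstring(markup).find(t.xpath)
    except ET.ParseError:
        return None  # page is not well-formed XML
    return "".join(node.itertext()) if node is not None else None


def by_offsets(markup: str, t: Target) -> Optional[str]:
    """Slice the page source at the stored character offsets."""
    if t.start is None or t.end is None or not (0 <= t.start < t.end <= len(markup)):
        return None
    return markup[t.start:t.end]


def extract(markup: str, t: Target) -> Optional[str]:
    """Try each strategy in turn and return the first result."""
    for strategy in (by_context, by_xpath, by_offsets):
        result = strategy(markup, t)
        if result is not None:
            return result
    return None


# Made-up pages: the second one wraps the paragraph in an extra <div>.
page_a = '<html><body><h1 class="title">Scraping for science</h1><p>2014</p></body></html>'
page_b = '<html><body><div><p>2015</p></div></body></html>'

year = Target(prefix="<p>", suffix="</p>", pattern=r"\d{4}", xpath="./body/p")
title = Target(xpath="./body/h1")

print(extract(page_a, year))   # context match on the original layout
print(extract(page_b, year))   # context still matches although ./body/p no longer does
print(extract(page_a, title))  # only an XPath is stored for this target
```

The second page shows why storing more than the XPath can help: the path `./body/p` no longer matches once the paragraph is nested, but the prefix/suffix + regex selector still finds the value.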

blahah added the enhancement label Jul 18, 2014
