
Web interface / browser plugin for interactive scraper creation #13

Open
blahah opened this issue Jul 18, 2014 · 2 comments

Comments

@blahah
Member

blahah commented Jul 18, 2014

Take ideas from

  • PeerLibrary

cc @mitar

@mitar

mitar commented Jul 20, 2014

So the idea is to use a process similar to the one we use for annotation: let the user highlight parts of a page, and then store those highlights as Open Annotation standard targets. But instead of attaching annotations to them, we would use them to extract data.

This could then be integrated with a nice user interface, maybe reusing parts of this feedback tool, or simply Annotator.
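As a rough illustration of storing a highlight as an annotation-style target, the sketch below turns a highlighted character range into two selectors named after the W3C Web Annotation vocabulary (`TextPositionSelector` and `TextQuoteSelector`). The function name `highlight_to_target`, the dict layout, the example URL, and the default context width are hypothetical, and the page is assumed to be available as plain text.

```python
def highlight_to_target(source_url: str, page_text: str, start: int, end: int,
                        context: int = 20) -> dict:
    """Describe the highlighted span page_text[start:end] with two selectors.

    The selector names follow the W3C Web Annotation vocabulary; the
    surrounding dict layout is a hypothetical scraper-target format.
    """
    if not 0 <= start < end <= len(page_text):
        raise ValueError("highlight must be a non-empty span inside the page text")
    return {
        "source": source_url,
        "selector": [
            # Character offsets of the highlight in the page's text.
            {"type": "TextPositionSelector", "start": start, "end": end},
            # The highlighted text plus up to `context` characters on each side.
            {
                "type": "TextQuoteSelector",
                "exact": page_text[start:end],
                "prefix": page_text[max(0, start - context):start],
                "suffix": page_text[end:end + context],
            },
        ],
    }


# Example: the user highlights the author list on one paper's page.
page = "Title: Deep Learning\nAuthors: A. Smith, B. Jones\nYear: 2014\n"
start = page.index("A. Smith")
end = page.index("\nYear")
target = highlight_to_target("https://example.org/paper/1", page, start, end, context=9)
```

Here the stored prefix is `"Authors: "`, which is what could let the same field be found on other pages. The fixed-width suffix, however, captures part of the year (`"\nYear: 20"`), so an exact quote match would not carry over to a paper from another year; generalizing the quote into prefix/suffix plus a regex avoids that.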

@mitar

mitar commented Jul 20, 2014

So this is also related to the question of how to define scrapers. I would advise using XPath as only one of the available options. I think also storing other information, similar to what the Open Annotation standard stores, would be helpful:

  • offsets in the page
  • prefix/suffix + regex to match the content (instead of the exact quote used in annotations)
  • XPath
  • DOM path (my addition)
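One way to keep XPath as just one option among several is to store a scraper field as an ordered list of selectors and take the first one that matches. The sketch below assumes a hypothetical rule format (selector types `"path"`, `"quote_regex"`, `"offsets"` are invented names, not the Open Annotation vocabulary) and assumes the page is well-formed XHTML. It uses the standard library's `xml.etree.ElementTree`, whose `find` supports only a limited XPath subset, as a stand-in for full XPath or a DOM path.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional


def extract(markup: str, selectors: list[dict]) -> Optional[str]:
    """Try each selector in order and return the first non-empty match."""
    root = ET.fromstring(markup)  # assumes well-formed XHTML, not arbitrary HTML
    text = "".join(root.itertext())  # the page's text content, used for text selectors
    for sel in selectors:
        kind = sel["type"]
        if kind == "path":
            # Limited XPath subset supported by ElementTree.
            node = root.find(sel["path"])
            value = "".join(node.itertext()) if node is not None else None
        elif kind == "quote_regex":
            # Fixed prefix/suffix as anchors, a regex for the variable content.
            pattern = re.escape(sel["prefix"]) + "(" + sel["regex"] + ")" + re.escape(sel["suffix"])
            m = re.search(pattern, text)
            value = m.group(1) if m else None
        elif kind == "offsets":
            # Raw character offsets into the text content.
            s, e = sel["start"], sel["end"]
            value = text[s:e] if 0 <= s < e <= len(text) else None
        else:
            raise ValueError(f"unknown selector type: {kind}")
        if value:
            return value
    return None


# Example: the path selector misses on this page, so the regex selector is used.
page = ('<html><body><p>Authors: <span class="authors">C. Lee, D. Park</span></p>'
        '<p>Year: 2015</p></body></html>')
rule = [
    {"type": "path", "path": ".//span[@class='byline']"},
    {"type": "quote_regex", "prefix": "Authors: ", "regex": ".+?", "suffix": "Year"},
    {"type": "offsets", "start": 9, "end": 24},
]
authors = extract(page, rule)
```

The ordering here is a design choice, not something fixed by the discussion: a structural path is precise when the markup is stable, the prefix/suffix regex arguably survives layout changes better, and raw offsets are the most brittle, so they come last.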
