The idea is to use a process similar to the one we use for annotation: let the user highlight parts of a page and store those highlights as Open Annotation standard targets. Instead of attaching annotations to them, we would use them to extract data.
This could then be integrated with a nice user interface, maybe reusing parts of this feedback tool, or simply Annotator.
This is also related to the question of how to define scrapers. I would advise using XPath as only one of the available options. I think also storing other information, similar to what the Open Annotation standard stores, would be helpful:
offsets in the page
prefix/suffix + regex to match the content (instead of direct quote as used in annotations)
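To make the prefix/suffix + regex idea concrete, here is a minimal sketch in Python. The `ExtractionTarget` class, its field names, and the `extract` function are hypothetical and only illustrate the idea; they are not taken from the Open Annotation standard or from any existing scraper. The stored offset is used only as a hint to pick between several candidate matches.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractionTarget:
    """Hypothetical target record, loosely modelled on annotation selectors."""
    prefix: str                  # literal text expected right before the value
    pattern: str                 # regex describing the value (replaces the direct quote)
    suffix: str                  # literal text expected right after the value
    start: Optional[int] = None  # offset in the page recorded at highlight time (hint only)


def extract(text: str, target: ExtractionTarget) -> Optional[str]:
    """Return the value matched by the target, or None if nothing matches."""
    # Prefix and suffix are matched literally; only the middle part is a regex.
    regex = re.compile(
        re.escape(target.prefix) + "(" + target.pattern + ")" + re.escape(target.suffix)
    )
    matches = list(regex.finditer(text))
    if not matches:
        return None
    if target.start is None:
        return matches[0].group(1)
    # Several candidates: prefer the one closest to the recorded offset.
    best = min(matches, key=lambda m: abs(m.start(1) - target.start))
    return best.group(1)


page = "Price: 12.50 EUR. Old price: 15.00 EUR."
first = extract(page, ExtractionTarget("rice: ", r"\d+\.\d{2}", " EUR"))
near_end = extract(page, ExtractionTarget("rice: ", r"\d+\.\d{2}", " EUR", start=29))
```

Because the value is described by a pattern rather than a direct quote, the same target keeps working when the number on the page changes, which is the point of using this for extraction instead of annotation.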
Take ideas from
cc @mitar