This repository was archived by the owner on Aug 30, 2018. It is now read-only.

The behaviour attribute value doesn't specify its parsing. #8

Open
buckett opened this issue Mar 23, 2015 · 14 comments

Comments

@buckett

buckett commented Mar 23, 2015

There's no specification of how the behaviour attribute's value should be parsed. How should strings, URIs and XPath expressions be quoted?

@buckett
Author

buckett commented Mar 23, 2015

When attempting to parse a behaviour="cit(.,'uri://something')" it would be good to know how I should parse the arguments.

@sebastianrahtz
Member

in tei-pm, there should be a datatype for each parameter of a function. That should deal with this? XPaths are not quoted, strings are.

@buckett
Author

buckett commented Mar 24, 2015

For example how is a " escaped in a string? I'm guessing the existing implementation treats the function as an XSLT function and so the parsing rules are the same as XSLT function parsing rules.
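
For reference, XPath 2.0 string literals escape their delimiter by doubling it (`""` inside a double-quoted literal, `''` inside a single-quoted one), while XPath 1.0 has no escape at all. Nothing in this thread says the behaviour attribute follows that rule, so the sketch below only assumes it does. The function name `unquote_xpath_literal` is made up for illustration.

```python
def unquote_xpath_literal(literal: str) -> str:
    """Decode a string literal using the XPath 2.0 doubled-quote rule.

    Assumption: the behaviour attribute reuses XPath 2.0 literal syntax.
    The spec discussed in this issue does not confirm that.
    """
    if len(literal) < 2 or literal[0] not in "'\"" or literal[-1] != literal[0]:
        raise ValueError(f"not a quoted string literal: {literal!r}")
    quote = literal[0]
    body = literal[1:-1]
    # After removing every doubled delimiter, no delimiter may remain:
    # a lone one would have ended the literal early.
    if quote in body.replace(quote * 2, ""):
        raise ValueError(f"unescaped {quote} inside literal: {literal!r}")
    return body.replace(quote * 2, quote)


print(unquote_xpath_literal('"""Hello"" said the policeman"'))
```

Note that inside an XML attribute delimited by `"`, each of those quote characters would additionally have to be written as `&quot;` in the source file.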

@sebastianrahtz
Member

um. we have no idea! we don't know how we'd handle that in XSLT.

@buckett
Author

buckett commented Mar 24, 2015

So are strings assumed to be XML encoded, so a string of "Hello" said the policeman should be written as &amp;quot;Hello&amp;quot; said the policeman?

@sebastianrahtz
Member

That doesn't help you, because the XML parser expands the entities into
Unicode anyway. I honestly don't know how to deal with this.
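
A small illustration of that point, using Python's standard `xml.etree.ElementTree`: by the time an application reads the attribute, the entity references are already gone. The `model` element and the attribute content are only an example input, not taken from the spec.

```python
import xml.etree.ElementTree as ET

# Example input only: the quotes inside the attribute are written as &quot;
source = '<model behaviour="cit(., &quot;Hello&quot;)"/>'

# The XML parser expands the entities before the application sees the value.
value = ET.fromstring(source).get("behaviour")
print(value)  # cit(., "Hello")
```

So XML-level escaping lets the quote character into the attribute, but the behaviour parser still receives a bare `"` and needs its own rule for it.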


@buckett
Author

buckett commented Mar 24, 2015

This came about because an XPath expression may contain a comma (I think), so I was thinking about how to parse the function to extract the 2 XPath expressions for alternate(xpath,xpath).
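
One way to approach this, sketched under assumptions: split only on commas that sit outside any parentheses, square brackets or string literals. The function name `split_arguments` is hypothetical, and the sketch ignores XPath comments `(: ... :)` and anything else a real XPath grammar would need.

```python
def split_arguments(call: str) -> tuple[str, list[str]]:
    """Split 'name(arg, arg)' into the name and its top-level arguments.

    Assumption: commas only separate arguments when they are outside
    brackets and quotes. This is not a rule stated by the spec.
    """
    name, sep, rest = call.strip().partition("(")
    if not sep or not rest.endswith(")"):
        raise ValueError(f"not a function call: {call!r}")
    body = rest[:-1]
    args: list[str] = []
    current: list[str] = []
    depth = 0      # nesting level of ( and [
    quote = None   # the open quote character, if inside a literal
    for ch in body:
        if quote is not None:
            current.append(ch)
            if ch == quote:
                # A doubled quote closes and immediately reopens the
                # literal, which is harmless for splitting purposes.
                quote = None
        elif ch in "'\"":
            quote = ch
            current.append(ch)
        elif ch == "," and depth == 0:
            args.append("".join(current).strip())
            current = []
        else:
            if ch in "([":
                depth += 1
            elif ch in ")]":
                depth -= 1
            current.append(ch)
    if quote is not None or depth != 0:
        raise ValueError(f"unbalanced quotes or brackets in: {call!r}")
    last = "".join(current).strip()
    if last or args:
        args.append(last)
    return name.strip(), args


print(split_arguments("alternate(string-join((a, b), ','), c)"))
```

Here the commas inside `(a, b)` and inside `','` are left alone, and only the last comma separates the two arguments.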

@sebastianrahtz
Member

ah. I see where you are going now. and I just met a similar problem.
I just wrote
behaviour="break('page',if (@n) then @n else @FACS)"
and it doesn't look right at all.

I am beginning to think we should change this spec to say that the XPath expression should be passed as a string, i.e. surrounded by quotes. Doesn't help with how to pass quotes, but does deal with the embedded comma.
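
To show what that change would buy, here is a rough sketch: if every argument is a quoted string, a parser only has to find string literals and never has to understand XPath. The name `quoted_arguments` is hypothetical, and treating a doubled delimiter as an escape is an assumption borrowed from XPath 2.0, not something the spec says.

```python
import re

# One single- or double-quoted literal. A doubled delimiter inside the
# literal is assumed to be an escape (borrowed from XPath 2.0).
LITERAL = re.compile(r"'(?:[^']|'')*'" + "|" + r'"(?:[^"]|"")*"')


def quoted_arguments(call: str) -> tuple[str, list[str]]:
    """Return the function name and every quoted argument, still quoted."""
    name, _, rest = call.partition("(")
    return name.strip(), LITERAL.findall(rest)


print(quoted_arguments("break('page','if (@n) then @n else @FACS')"))
```

This does no validation of what lies between the literals, so it is only meant to show why the embedded comma stops being a problem.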

@Conal-Tuohy

I suggest elements should be used instead of attributes (for behaviour and predicate). Otherwise I think
this is going to be a source of endless pain.
EDIT: on the other hand, if this stuff will typically be implemented in XSLT etc then perhaps it makes sense to use attributes, so that encoders are forced to write XPath expressions in a way that will work in XSLT, however awkward it may make certain expressions.

@Conal-Tuohy

Since this is XPath 2, we have the codepoints-to-string() function, but it's not pretty.

"concat(codepoints-to-string(34), 'Hello', codepoints-to-string(34), ' said the policeman')"
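
For anyone generating such expressions rather than writing them by hand, a minimal sketch of the same trick follows. The function name is made up, and the choice to also replace apostrophes (codepoint 39) is an extra assumption so that the single-quoted pieces stay valid.

```python
import re


def xpath_string_without_quotes(text: str) -> str:
    """Build an XPath 2.0 expression for text without literal quote characters."""
    parts = []
    # Splitting on a capturing group keeps the quote characters as pieces.
    for piece in re.split(r"(['\"])", text):
        if piece == '"':
            parts.append("codepoints-to-string(34)")
        elif piece == "'":
            parts.append("codepoints-to-string(39)")
        elif piece:
            parts.append(f"'{piece}'")
    if not parts:
        return "''"
    if len(parts) == 1:
        return parts[0]  # concat() needs at least two arguments
    return "concat(" + ", ".join(parts) + ")"


print(xpath_string_without_quotes('"Hello" said the policeman'))
```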

@sebastianrahtz
Member

It's a fair point, Conal. I don't want to change horses mid-race when the problem right now is
checking that the functionality is there, but after we have a stable 1.0 using attributes, it would be a good idea
to reconsider the choice of using attributes rather than element children.

@martinmueller39
Contributor

I've compared the TEI Simple DTD with the DTA schema. Simple is more generous than DTA, but DTA has the following elements that Simple does not allow for:

addName
country
foreName
genName
nameLink
orgName
persName
roleName
surname

Should we include them? I can see three different arguments in favour of doing so. First, DTA has been adopted by CLARIN as its base format. Other things being equal, there is a benefit if a text in that format validates under Simple.

Second, and perhaps more substantively, named entity extraction seems to be the chief, and often the only, thing that people are interested in when they work with texts.

Third, when I showed Simple to the Perseus folks, they were very interested in the processing model but objected to the exclusion of the name elements.

On the minus side, you can just use type attributes for sub specification of names, and Simple may run the risk of no longer being simple. Do we want to slide down that slippery slope?

@tuurma
Contributor

tuurma commented Apr 9, 2015

I think we quite consciously made the decision to exclude 'syntactic
sugar' options for types and subtypes of names, all for the sake of leaving
the editor with precisely one way of encoding things.
To accommodate DTA and other corpora we provided a conversion piece from
'general TEI' to 'Simple TEI' that converts all of the specific name elements
into typed generic ones. Funnily enough, I can't find the conversion stylesheet
on GitHub now.


@sebastianrahtz
Member

the naming thing is hard. we can put back all the specific ones, but then we'd have to remove the generic @type version. would that actually be better? i.e. not to support the generic element at all?

the conversion stylesheet is now in the TEI Stylesheets
