Enhancement request: Support '... Ph. D.' instead of only '... Ph.D.' #43
Comments
Interesting. Are you sure that it used to work that way? I have tried locally with every version back to v0.3.5, and the result from "John Smith, Ph. D." is always the same as in v0.3.12. Am I doing something wrong with my local env, or maybe you are mistaken about the previous behavior?

In general I try to avoid having the parser correct mistakes in the input, just because there are so many potential mistakes and correcting one frequently causes other valid input to not work. It's more important that it work correctly for input with no mistakes. But it would be nice if the parser could be a useful tool for that, because the reality is that these mistakes sometimes exist in the input.

One approach would be to use the

Another approach would be to make the parser recognize "Ph d" as a suffix. This would be somewhat difficult because at the moment the first thing the parser does is break up the string on spaces, so "ph" and "d" end up in different pieces. Maybe you could do something like what is done with the conjunctions: whenever you find a "ph" by itself, connect it to the following piece, I guess only if it's a "d". But it's hard to imagine an agnostic solution that would be helpful for more than just "ph d". Can you think of other similar examples?

Ideally I'd like the parser to make it easy for each developer to handle correcting the input for their particular use case. I'm not sure of the best way to do that, though, partly because I know so little about how people actually use this parser. Suggestions welcome.
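For illustration, the piece-joining idea could be sketched roughly like this. This is only a standalone sketch that works on a list of space-split pieces; it is not the parser's actual code, and `DEGREE_PREFIXES` and `join_split_degrees` are hypothetical names that do not exist in the library.

```python
# Hypothetical sketch, not part of the parser: merge a degree that was
# split by a stray space ("Ph." + "D.") back into a single piece.
DEGREE_PREFIXES = {"ph", "ed"}  # assumed set, based on the examples in this thread


def join_split_degrees(pieces):
    """Return a new list where e.g. ['Ph.', 'D.'] becomes ['Ph.D.']."""
    joined = []
    i = 0
    while i < len(pieces):
        # Compare without surrounding periods/commas and without case.
        current = pieces[i].strip(".,").lower()
        following = (
            pieces[i + 1].strip(".,").lower() if i + 1 < len(pieces) else None
        )
        if current in DEGREE_PREFIXES and following == "d":
            # Rebuild the piece in the conventional "Ph.D." spelling.
            joined.append(pieces[i].strip(".,") + ".D.")
            i += 2  # skip the "D." piece that was just consumed
        else:
            joined.append(pieces[i])
            i += 1
    return joined


pieces = join_split_degrees("John Smith, Ph. D.".split())
print(pieces)
```

Note that a name like "Ed D. Smith" would also be merged by this sketch, which is exactly the kind of breakage of valid input described above.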
No. Something about suffix handling changed in the last release but I may have mis-remembered which test case it was that originally caught my attention.
That sounds wise.
Not really; I think Ed.D. is the only other real example. These errors come up occasionally in older book author data. For example, http://clas.caltech.edu/record/418307?ln=en lists a
The change to suffix handling in the last release is here: fcd7652. It does pertain to the handling of suffixes after a comma. Now the parser will only consider the name to be in the "Firstname Lastname, Suffix" format if the part before the first comma has more than one piece when split on spaces, the assumption being that "Lastname, Suffix" is not an expected/supported format. Does that break something in your data?
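As a rough sketch of just the piece-count condition described here (the function name is hypothetical, and this is not the code from the commit; the parser also has to decide whether what follows the comma is actually a suffix):

```python
def has_multiple_pieces_before_comma(name):
    """True if there is a comma and more than one space-separated piece before it."""
    before, comma, _after = name.partition(",")
    # str.split() with no arguments splits on runs of whitespace.
    return bool(comma) and len(before.split()) > 1


print(has_multiple_pieces_before_comma("John Smith, Ph.D."))
print(has_multiple_pieces_before_comma("Smith, Jr."))
```

Under this condition alone, "John Smith, Ph.D." qualifies for the "Firstname Lastname, Suffix" reading, while "Smith, Jr." does not.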
No.
In 0.3.12:
HumanName('John Smith, Ph.D.') works as expected, but the common misspelling HumanName('John Smith, Ph. D.'), which incorrectly has a space between Ph. and D., now yields 'Ph. D. John Smith'. Personally I would prefer to go back to 0.3.11's behavior, where it left the misspelled title at the end.