Umlauts are encoded in the xml file #14

Open
binarious opened this issue Oct 8, 2014 · 4 comments

@binarious

I'm having a problem with commits 54d0c42 and 9ceabbf. They encode umlauts as HTML entities. Why is htmlentities() used? I assume it is meant to prevent XML injection, but DOMDocument already covers that (more precisely, libxml's xmlOutputBufferWriteEscape function, which is used by libxml's xmlNodeDump, which in turn is used by DOMDocument's saveXML).
The htmlentities() calls should be removed. I tested this against PHP 5.3 and 5.4: the umlauts were written correctly and HTML/XML tags were escaped.
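
To make the difference concrete, here is a minimal sketch comparing the two approaches. The element name `Nm` and the sample value are made up for illustration, and the sketch assumes the value is attached as a text node; it is not taken from the library's code.

```php
<?php
// Illustrative value only: an umlaut plus XML markup characters.
$name = 'Müller & <Söhne>';

// htmlentities() converts every character that has an HTML named entity,
// so the umlauts are rewritten along with the markup characters.
$encoded = htmlentities($name, ENT_QUOTES, 'UTF-8');
// $encoded is 'M&uuml;ller &amp; &lt;S&ouml;hne&gt;'

// A DOM text node is escaped by libxml on serialization:
// only the XML markup characters change, the umlauts stay as they are.
$doc = new DOMDocument('1.0', 'UTF-8');
$element = $doc->createElement('Nm');
$element->appendChild($doc->createTextNode($name));
$doc->appendChild($element);

$xml = $doc->saveXML($element);
// $xml is '<Nm>Müller &amp; &lt;Söhne&gt;</Nm>'
```

Note that `&uuml;` is an HTML entity, not one of the five entities predefined in XML, so it is not something an XML parser will resolve on its own.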

@JelleM-Congressus
Contributor

I had to dig into this a bit (we released our Python version today, so we were kind of busy), but yes, I think removing this is a good idea. Umlauts likely won't work anyway: most (all?) banks restrict the character set to the basic Latin characters below (even though the document is UTF-8):

a b c d e f g h i j k l m n o p q r s t u v w x y z
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
0 1 2 3 4 5 6 7 8 9
! # $ % * ~ / - ? : ; = @ [ \ ] ^ _ { | } ( ) . , ' +
Space
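
A check against this list could be sketched as follows. The function name `isBankSafe` is hypothetical, the allowed set is copied from the list quoted in this comment (reading the quote mark as a plain apostrophe, which is an assumption), and individual banks may accept a different set.

```php
<?php
// Hypothetical helper: true if $text uses only characters from the list above.
function isBankSafe(string $text): bool
{
    // Allowed characters as quoted in this thread; not an official bank specification.
    $allowed = 'abcdefghijklmnopqrstuvwxyz'
        . 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
        . '0123456789'
        . '!#$%*~/-?:;=@[\\]^_{|}().,\'+ ';

    // strspn() returns the length of the leading run made up only of allowed bytes.
    // Every byte of a multi-byte UTF-8 character (such as an umlaut) is outside
    // the set, so the comparison fails for those characters as well.
    return strspn($text, $allowed) === strlen($text);
}

$plainOk = isBankSafe('Mueller und Soehne 2014');
$umlautOk = isBankSafe('Müller');
```

With this list, `$plainOk` is `true` and `$umlautOk` is `false`; what to do with a rejected value (reject, transliterate, ask the user) is left open here.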

@binarious
Author

I think so, too. Do you have any reference for the character set restriction on the name values? I read that just the identifying elements (MessageId, PaymentInformationId, InstructionId and EndToEndId) need to be without special characters.

@JelleM-Congressus
Contributor

Well, there is a lot of documentation on SEPA (every bank has its own manual). Some say ALL text must stick to the restricted character set, others say only the identifying elements. I DO know from experience that most banks do not allow special characters (even if they are HTML-encoded), and I strongly urge you NOT to use them. I also read something about a German SUPA which does allow them, but I'm not quite sure about that one.

@si4dev

si4dev commented Jan 11, 2016

This issue might be solved by using createTextNode() instead of htmlentities(). Currently, htmlentities() encodes every character that has an HTML entity, including umlauts. createTextNode() escapes only the XML reserved characters, which keeps the document well-formed while changing the original string as little as possible.

I also understand the reaction that umlauts are probably not allowed, or at least not by all banks. But it seems that for the issue owner, binarious, it worked without the htmlentities() conversion.

In another project I found that I couldn't add a string to the DOM without first replacing the XML reserved characters. Even the second parameter of createElement(name, value) does NOT do the XML escaping. It takes a separate text node, created with createTextNode(), to get escaping support.
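
The difference between the two calls can be sketched like this. The helper `appendTextElement` and the element names `Dbtr` and `Nm` are hypothetical, chosen only for illustration. The sample value deliberately contains an already-encoded `&amp;` so that both paths run without warnings.

```php
<?php
// Hypothetical helper: append <$name> to $parent with $text as literal character data.
function appendTextElement(DOMDocument $doc, DOMNode $parent, string $name, string $text): DOMElement
{
    $element = $doc->createElement($name);
    // createTextNode() takes the string literally; escaping happens on serialization.
    $element->appendChild($doc->createTextNode($text));
    $parent->appendChild($element);
    return $element;
}

$doc = new DOMDocument('1.0', 'UTF-8');
$root = $doc->appendChild($doc->createElement('Dbtr'));

// Second argument of createElement(): '&' is treated as the start of an
// entity reference, so the value is not stored as literal text.
$viaValue = $root->appendChild($doc->createElement('Nm', 'A &amp; B'));
// $viaValue->textContent is 'A & B'

// Text node: the same string is stored literally.
$viaTextNode = appendTextElement($doc, $root, 'Nm', 'A &amp; B');
// $viaTextNode->textContent is 'A &amp; B'

// Round trip with raw input: what goes in as text comes back out unchanged.
$raw = 'Müller & <Söhne>';
$roundTrip = appendTextElement($doc, $root, 'Nm', $raw);
$parsed = new DOMDocument();
$parsed->loadXML($doc->saveXML());
$readBack = $parsed->documentElement->lastChild->textContent;
```

Here `$readBack` equals `$raw`, which is the property a caller usually wants: no manual escaping before insertion and no change to the umlauts.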
