Umlauts are encoded in the xml file #14

binarious · 2014-10-08T08:53:10Z

I'm having a problem with commits 54d0c42 and 9ceabbf. They encode umlauts as HTML entities using htmlentities(). Why is this function used? I assume it's to prevent XML injection, but DOMDocument already covers that. More precisely, libxml's xmlOutputBufferWriteEscape function handles it; xmlNodeDump uses that function, and DOMDocument's saveXML uses xmlNodeDump.
The htmlentities() calls should be removed. I tested this on PHP 5.3 and 5.4. Umlauts were written correctly and HTML/XML tags were properly escaped.
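Here is a minimal sketch of the point above, in PHP like the library itself. The element name `Nm` and the sample string are made up for illustration.

A text node added to a `DOMDocument` comes out of `saveXML()` with `<`, `>` and `&` escaped, and the umlaut stays literal. `htmlentities()`, by contrast, turns `ü` into `&uuml;`. That is not one of XML's five predefined entities (`&amp;`, `&lt;`, `&gt;`, `&quot;`, `&apos;`), so it is undefined unless a DTD declares it.

Declaring the document encoding matters too. Without it, libxml typically writes non-ASCII characters as numeric references such as `&#xFC;`.

```php
<?php
// Declare UTF-8 so non-ASCII characters are written literally by saveXML().
$doc = new DOMDocument('1.0', 'UTF-8');

// Illustrative element; the text contains an umlaut plus XML markup characters.
$nm = $doc->appendChild($doc->createElement('Nm'));
$nm->appendChild($doc->createTextNode('Müller <GmbH> & Co'));

// libxml escapes <, > and & in the text node; the umlaut is left as is.
$xml = $doc->saveXML();
echo $xml;

// For comparison: htmlentities() also rewrites the umlaut.
echo htmlentities('Müller', ENT_QUOTES, 'UTF-8'), "\n"; // M&uuml;ller
```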


JelleM-Congressus · 2014-10-08T14:08:32Z

Had to dig into this a bit (we just released our Python version today, so we were rather busy), but yes, I think removing this is a good idea. Note, though, that umlauts probably won't be accepted anyway. Most (all?) banks restrict the character set to the basic Latin characters below, even though the document is UTF-8. This set is much narrower than Latin-1, which does include umlauts.

a b c d e f g h i j k l m n o p q r s t u v w x y z
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
0 1 2 3 4 5 6 7 8 9
! # $ % * ~ / - ? : ; = @ [ \ ] ^ _ { | } ( ) . , ' +
Space
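If names may contain umlauts but the bank only accepts the set listed above, one option is to transliterate and filter the text before building the XML. This is a minimal sketch that assumes the allowed set is exactly the one quoted here. The helper name `sepaSanitize` and the German transliteration table are hypothetical choices, so check your bank's own manual.

```php
<?php
// Hypothetical helper: transliterate common German characters, then drop
// everything outside the character set quoted in the comment above.
function sepaSanitize(string $text): string
{
    // Transliteration table is an assumption, not a bank requirement.
    $map = [
        'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue',
        'Ä' => 'Ae', 'Ö' => 'Oe', 'Ü' => 'Ue',
        'ß' => 'ss',
    ];
    $text = strtr($text, $map);

    // Punctuation allowed besides letters and digits (space included).
    $allowed = '!#$%*~/-?:;=@[\\]^_{|}().,\'+ ';
    $pattern = '/[^a-zA-Z0-9' . preg_quote($allowed, '/') . ']/u';

    // Remove every remaining character that is not in the allowed set.
    // preg_replace() returns null on invalid UTF-8 input.
    $result = preg_replace($pattern, '', $text);
    return $result ?? '';
}

echo sepaSanitize('Jürgen Weiß (Köln)'), "\n"; // Juergen Weiss (Koeln)
```

Anything the table does not cover (for example `é`) is simply dropped, so extend the map for the names you expect to handle.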

binarious · 2014-10-08T15:35:06Z

I think so, too. Do you have a reference for the character set restriction on the name values? I read that only the identifying elements (MessageId, PaymentInformationId, InstructionId and EndToEndId) must be free of special characters.

JelleM-Congressus · 2014-10-08T15:47:23Z

Well, there is a lot of documentation on SEPA (every bank has its own manual). Some say ALL text must be Latin-1, others say this applies only to the identifying elements. I DO know from experience that most banks do not accept special characters (even when they are HTML-encoded), and I strongly urge you NOT to use them. I also read something about a German SUPA variant that does allow them, but I'm not quite sure about that one.

si4dev · 2016-01-11T08:05:33Z

This issue might be solved by using createTextNode() instead of htmlentities(). Currently, htmlentities() encodes every character that has an HTML entity, including umlauts. createTextNode(), by contrast, only escapes the characters that are reserved in XML, which keeps the document well-formed. The string therefore changes less and more of the original is preserved.
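Here is a minimal sketch of this round-trip argument. It assumes the `htmlentities()` output ends up in a text node, and the helper `roundTrip` is hypothetical.

With `createTextNode()` alone, the value parses back unchanged. If the value is pre-encoded with `htmlentities()`, the entities themselves get escaped, so the stored text is no longer the original string.

```php
<?php
// Hypothetical helper: store $value in an <Nm> text node, serialize the
// document, parse it again and return the text that comes back.
function roundTrip(string $value, bool $useHtmlentities): string
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    $nm = $doc->appendChild($doc->createElement('Nm'));

    // Optionally pre-encode the value, as the htmlentities() approach does.
    $text = $useHtmlentities ? htmlentities($value, ENT_QUOTES, 'UTF-8') : $value;
    $nm->appendChild($doc->createTextNode($text));

    // Parse the serialized XML again and read the text back.
    $copy = new DOMDocument();
    $copy->loadXML($doc->saveXML());
    return $copy->documentElement->textContent;
}

echo roundTrip('Müller & Söhne', false), "\n"; // Müller & Söhne
echo roundTrip('Müller & Söhne', true), "\n";  // M&uuml;ller &amp; S&ouml;hne
```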

I also understand the point that umlauts are probably not allowed, or at least not by all banks. But for binarious, who opened this issue, it seems to have worked without the htmlentities() conversion.

In another project I found that I couldn't add a string to the DOM without replacing the XML reserved characters myself. Passing the value as the second parameter of createElement(name, value) does NOT escape it. You need to create a separate text node with createTextNode(), which does support escaping.
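Here is a small helper that follows this advice. It creates the element without a value, then attaches the text through `createTextNode()` so the text is escaped on output. The function name `appendTextElement` is hypothetical, and the SEPA-style element names `Cdtr` and `Nm` are only for illustration.

```php
<?php
// Hypothetical helper: create <$name> under $parent with an escaped text child,
// instead of passing the raw value to createElement($name, $value).
function appendTextElement(DOMDocument $doc, DOMNode $parent, string $name, string $value): DOMElement
{
    $element = $doc->createElement($name);               // no value argument
    $element->appendChild($doc->createTextNode($value)); // escaped by saveXML()
    $parent->appendChild($element);
    return $element;
}

$doc = new DOMDocument('1.0', 'UTF-8');
$cdtr = $doc->appendChild($doc->createElement('Cdtr'));
appendTextElement($doc, $cdtr, 'Nm', 'Smith & Sons <Ltd>');
echo $doc->saveXML();
```

Keeping element creation and text insertion in one helper means no call site can accidentally fall back to the unescaped two-argument `createElement()`.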

JelleM-Congressus self-assigned this Oct 8, 2014

JelleM-Congressus added the bug label Oct 8, 2014

JelleM-Congressus added the wontfix label Oct 6, 2015
