should support combined-UCS-4, replacing combined-UCS-2 #63

Open
vinc17fr opened this issue Feb 17, 2025 · 0 comments

In the recode 3.7.14 manual:

   The Recode library is able to combine 'UCS-2' some sequences of codes
into single code characters, to represent a few diacriticized
characters, ligatures or diphtongs which have been included to ease
mapping with other existing charsets.  It is also able to explode such
single code characters into the corresponding sequence of codes.  The
request syntax for triggering such operations is rudimentary and
temporary.  The 'combined-UCS-2' pseudo character set is a special form
of 'UCS-2' in which known combinings have been replaced by the simpler
code.  Using 'combined-UCS-2' instead of 'UCS-2' in an _after_ position
of a request forces a combining step, while using 'combined-UCS-2'
instead of 'UCS-2' in a _before_ position of a request forces an
exploding step.  For the time being, one has to resort to advanced
request syntax to achieve other effects.  For example:

     recode u8..co,u2..u8 < INPUT > OUTPUT

copies an 'UTF-8' INPUT over OUTPUT, still to be in 'UTF-8', yet merging
combining characters into single codes whenever possible.

However, not all characters can be represented in UCS-2 nowadays: it is limited to the Basic Multilingual Plane (U+0000 to U+FFFF). UCS-4 should be used instead, so it would be nice to have combined-UCS-4.
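
To illustrate the limitation, here is a minimal Python sketch. It does not use recode or its combining table: Unicode NFC/NFD normalization from the standard `unicodedata` module stands in for the combining and exploding steps, and the helper names `combine`, `explode` and `fits_in_ucs2` are made up for this example. It only shows that a 16-bit code unit cannot hold characters beyond U+FFFF, which is why a UCS-4 based variant would be needed.

```python
import unicodedata


def combine(text: str) -> str:
    """Merge base + combining sequences into single codes where possible.

    Assumption: NFC is used as a stand-in; recode's own table may differ.
    """
    return unicodedata.normalize("NFC", text)


def explode(text: str) -> str:
    """Split precomposed characters back into base + combining sequences."""
    return unicodedata.normalize("NFD", text)


def fits_in_ucs2(text: str) -> bool:
    """True if every code point fits in one 16-bit unit (at most U+FFFF)."""
    return all(ord(ch) <= 0xFFFF for ch in text)


if __name__ == "__main__":
    decomposed = "e\u0301"        # 'e' followed by COMBINING ACUTE ACCENT
    composed = combine(decomposed)
    print(len(decomposed), len(composed))   # two code points become one
    print(fits_in_ucs2(composed))           # U+00E9 is inside the BMP
    print(fits_in_ucs2("\U0001D11E"))       # MUSICAL SYMBOL G CLEF is not
```

Text containing a character such as U+1D11E cannot go through a UCS-2 intermediate step as single 16-bit codes, whereas a 32-bit UCS-4 intermediate has room for every code point.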
