The Recode library is able to combine some 'UCS-2' sequences of codes
into single code characters, to represent a few diacriticized
characters, ligatures or diphthongs which have been included to ease
mapping with other existing charsets. It is also able to explode such
single code characters into the corresponding sequence of codes. The
request syntax for triggering such operations is rudimentary and
temporary. The 'combined-UCS-2' pseudo character set is a special form
of 'UCS-2' in which known combinings have been replaced by the simpler
code. Using 'combined-UCS-2' instead of 'UCS-2' in an _after_ position
of a request forces a combining step, while using 'combined-UCS-2'
instead of 'UCS-2' in a _before_ position of a request forces an
exploding step. For the time being, one has to resort to advanced
request syntax to achieve other effects. For example:

    recode u8..co,u2..u8 < INPUT > OUTPUT

copies a 'UTF-8' INPUT to OUTPUT, still in 'UTF-8', but merging
combining characters into single codes whenever possible.
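To make the combining and exploding steps more concrete, here is a minimal Python sketch. It is only an analogy: it uses the standard library's `unicodedata.normalize` with NFC (compose) and NFD (decompose) as a stand-in for Recode's own table of known combinings. The helper names `combine`, `explode` and `recode_utf8_combining` are hypothetical. Because NFC applies every canonical composition Unicode defines (not just the "few" characters the manual mentions), its output on real text may differ from what `recode u8..co,u2..u8` produces.

```python
import unicodedata

# Hypothetical helpers: Unicode normalization used as an analogy for
# Recode's combining/exploding steps, NOT Recode's actual tables.

def combine(text: str) -> str:
    """Merge base + combining-mark sequences into precomposed code points (NFC)."""
    return unicodedata.normalize("NFC", text)

def explode(text: str) -> str:
    """Split precomposed code points back into base + combining marks (NFD)."""
    return unicodedata.normalize("NFD", text)

def recode_utf8_combining(data: bytes) -> bytes:
    """Rough analogue of 'recode u8..co,u2..u8': UTF-8 in, combined, UTF-8 out."""
    return combine(data.decode("utf-8")).encode("utf-8")

decomposed = "e\u0301"          # 'e' followed by COMBINING ACUTE ACCENT
combined = combine(decomposed)  # single code point U+00E9
print([f"U+{ord(c):04X}" for c in decomposed])  # ['U+0065', 'U+0301']
print([f"U+{ord(c):04X}" for c in combined])    # ['U+00E9']
print(explode(combined) == decomposed)          # True
```

The two-way split mirrors the manual's description: putting the combined form in the _after_ position corresponds to `combine`, and putting it in the _before_ position corresponds to `explode`.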
The passage above is quoted from the recode 3.7.14 manual.

However, nowadays not all characters can be represented in UCS-2, which only covers the Basic Multilingual Plane (U+0000 to U+FFFF); UCS-4 should be used instead. So it would be nice to have a 'combined-UCS-4' charset as well.
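The following Python sketch illustrates why UCS-2 falls short and what a 'combined-UCS-4' step might look like. `fits_ucs2`, `encode_ucs4_be` and `combine_ucs4` are hypothetical names, and 'combined-UCS-4' does not exist in recode (that is what this issue requests). As in the manual's UCS-2 case, the combining step is approximated here with Unicode NFC normalization rather than Recode's own table, and UCS-4 is modelled with Python's big-endian UTF-32 codec.

```python
import unicodedata

def fits_ucs2(text: str) -> bool:
    """True if every code point lies in the BMP, i.e. fits in one UCS-2 unit."""
    return all(ord(ch) <= 0xFFFF for ch in text)

def encode_ucs4_be(text: str) -> bytes:
    """Encode as UCS-4 big-endian: exactly 4 bytes per code point."""
    return text.encode("utf-32-be")

def combine_ucs4(data: bytes) -> bytes:
    """Hypothetical 'combined-UCS-4' step: decode UCS-4, compose (NFC), re-encode.

    NFC stands in for Recode's list of known combinings (an assumption).
    """
    text = data.decode("utf-32-be")
    return unicodedata.normalize("NFC", text).encode("utf-32-be")

g_clef = "\U0001D11E"                 # MUSICAL SYMBOL G CLEF, outside the BMP
print(fits_ucs2(g_clef))              # False
print(encode_ucs4_be(g_clef).hex())   # 0001d11e
print(g_clef.encode("utf-16-be").hex())  # d834dd1e (a surrogate pair, not UCS-2)
```

A usage example: `combine_ucs4(encode_ucs4_be("e\u0301" + g_clef))` yields the UCS-4 bytes of `"\u00e9" + g_clef`, composing the accented letter while passing the non-BMP character through unchanged, which a UCS-2 based pipeline cannot do.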