
ANSI escape codes not yet implemented #2

Open
withoutboats opened this issue Nov 17, 2015 · 13 comments

Comments

@withoutboats
Owner

There are a large number of ANSI escape sequences which are not currently implemented, but most of these are almost never actually used. This is a tracking issue for any ANSI escape sequences seen in the wild by anyone testing or using a terminal based on this library.

The escape codes that are most important to implement are those called out in the terminfo entries for xterm and its descendants, because that protocol, and not the full feature set of an ANSI terminal, is what well-written programs are written against. The only major gap here is the terminfo capability csr, which VT manuals call DECSTBM (Set Top and Bottom Margins).
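For reference, here is roughly what the csr capability expands to, as a minimal Python sketch. It assumes xterm's terminfo definition `csr=\E[%i%p1%d;%p2%dr`: terminfo passes 0-based row numbers, and `%i` increments both so the terminal receives DECSTBM's 1-based `CSI Pt ; Pb r`. The names `csr` and `parse_decstbm` are illustrative, not part of notty.

```python
ESC = "\x1b"


def csr(top: int, bottom: int) -> str:
    """Expand xterm's terminfo csr (\\E[%i%p1%d;%p2%dr) for 0-based rows.

    The %i step adds one to both parameters, giving DECSTBM: CSI Pt ; Pb r.
    """
    if not 0 <= top < bottom:
        raise ValueError("need 0 <= top < bottom")
    return f"{ESC}[{top + 1};{bottom + 1}r"


def parse_decstbm(params: str, rows: int) -> tuple[int, int]:
    """Turn the parameter string of a received DECSTBM into 0-based rows.

    Omitted or zero parameters fall back to the screen edges (1 and rows).
    """
    first, second = (params.split(";") + ["", ""])[:2]
    top = int(first or 0) or 1
    bottom = int(second or 0) or rows
    return top - 1, bottom - 1
```

So `csr(0, 23)` yields `ESC [ 1 ; 24 r`, a region covering the whole of a 24-row screen.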

Page margins are a very complicated feature to implement, and not very efficient. More importantly, though, their interaction with other commands that move the cursor or scroll the screen is very underspecified in the documentation of VT series terminals and their emulators. I have read these manuals with interest and still have no idea how exactly this is supposed to work.
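To make this concrete, here is a toy model of only the baseline behavior the manuals do agree on: when the cursor sits on the bottom margin, a line feed scrolls the rows between the margins and leaves the rest of the screen alone. The `Grid` class is hypothetical. It deliberately skips the underspecified cases, such as a cursor outside the region, origin mode, and insert/delete line.

```python
class Grid:
    """Toy screen: a list of row strings, a cursor row and a scroll region.

    Only the baseline is modeled: a line feed on the bottom margin scrolls
    rows top..bottom up by one and leaves every other row untouched.
    """

    def __init__(self, rows: list[str]) -> None:
        self.rows = list(rows)
        self.cursor = 0
        self.top = 0
        self.bottom = len(self.rows) - 1

    def set_margins(self, top: int, bottom: int) -> None:
        # 0-based, inclusive margins. DECSTBM also homes the cursor
        # (row 0 here, since origin mode is not modeled).
        self.top, self.bottom = top, bottom
        self.cursor = 0

    def line_feed(self) -> None:
        if self.cursor == self.bottom:
            # Scroll only the region: drop its first row, blank its last row.
            del self.rows[self.top]
            self.rows.insert(self.bottom, "")
        elif self.cursor < len(self.rows) - 1:
            # Elsewhere, move down; on the last screen row, do nothing.
            self.cursor += 1
```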

vim uses this feature, and seems a bit slower without it. It would be good to add for legacy programs. However, a more powerful and cleaner feature serving the same purpose for future programs would be simply to allow multiple visible grids on the screen, each with its own boundaries and cursor, scrolling separately. Logically, this is what this feature is being used to implement anyway.

@ghost

ghost commented Dec 4, 2015

Is it a design goal for applications which render to notty to fall back gracefully on traditional vt100/xterm terminals? If so, might I suggest using the APC or PM sequences to build the extended "not-vt100" protocol into? Most "extended" terminals get this wrong: they often overlap with valid CSI sequences and C0 control characters, making vttest-passing terminals spew garbage on the screen. Using APC or PM gives you an entire world to play with that a good xterm-compatible emulator will quietly ignore: see for example http://invisible-island.net/xterm/ctlseqs/ctlseqs.html#h2-Application-Program-Control-functions .

Sorry I don't know enough about your code to figure out myself how you are transferring images.

@withoutboats
Owner Author

There's a notty.md file which attempts to explain the format well, but it's basically this (omitting some details):

  • notty codes are initiated with \E{, which is afaik a totally unused escape sequence.
  • They then have a hex-encoded opcode which identifies the command, and a series of arguments which are all hex-encoded also (ish).
  • They then may contain attachments, which have a hex-encoded length followed by binary data of that many bytes.
  • They terminate with }.
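As a rough illustration of that framing, here is a Python sketch of an encoder. The exact field layout lives in notty.md. The `;` separator and the field order below are assumptions made only for this example, and `encode_notty_v0` is a hypothetical name.

```python
ESC = b"\x1b"


def encode_notty_v0(opcode: int, args: list[int], attachments: list[bytes]) -> bytes:
    """Hypothetical rendering of the original ESC { ... } framing.

    Assumed layout: ESC { opcode ; arg ; ... ; len ; data ; ... }
    with the opcode, arguments and attachment lengths written in hex.
    """
    fields = [b"%x" % opcode] + [b"%x" % a for a in args]
    for data in attachments:
        # The hex length tells the parser how many raw bytes to consume,
        # so the attachment itself may contain ';' or '}'.
        fields.append(b"%x" % len(data))
        fields.append(data)
    return ESC + b"{" + b";".join(fields) + b"}"
```

Because attachments are arbitrary bytes, a reader of this format has to honor the length prefix rather than scan ahead for `}`.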

Image-drawing opcodes have some positioning arguments, then a MIME-type attachment and a binary data attachment. Later there could be commands that fetch the data via https:// and file:// URLs to "sideload" large binary files, but for now inline images are the only thing that works.

I suppose we could use \E_ and \E^G as initiator/terminator instead, so they will be more likely to be ignored by other terminals, but two things make that not especially appealing to me:

  • notty codes with binary attachments will likely contain \E^G and pollute output anyway. Terminals may also be ill-equipped to handle escape codes as long as the ones with binary attachments can get.
  • I intend to release an ncurses-like library which will polyfill a semblance of these features on other terminals, and it will expose the ability to serialize esc codes abstractly without knowing how the protocol works. So hopefully programs writing these codes to non-notty terminals will be a very uncommon problem.

It's a good idea and I'm considering it though. Just not sure that the benefit outweighs the memetic disadvantage of not being symmetrical in the same way.

@withoutboats
Owner Author

Is it a design goal for applications which render to notty to fall back gracefully on traditional vt100/xterm terminals?

To answer this question more broadly, the answer is yes, but my plan has been more focused on polyfilling those features as well as possible using the library mentioned above, to make applications at least somewhat functional on ANSI terminals. This would mean implementing the local echo features as non-local echo in the library, drawing graphics using ASCII art, and so on. The goal here is of course that using this library to write a terminal app won't seem so ridiculous, because your app will mostly work on other terminals, except for the features that just aren't possible on those devices.

@ghost

ghost commented Dec 4, 2015

Using "ESC { ... stuff ... }" will result in xterm-type terminals emitting all of "... stuff ... }" to the screen. If that includes raw binary (C0/C1) it has the potential to hose the xterm session. (For terminals based on something like http://www.vt100.net/emu/dec_ansi_parser it goes GROUND => ESCAPE => GROUND.)
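The failure mode can be reproduced with a toy version of that state machine. This is a hypothetical sketch of a small subset of the vt100.net parser: printable text in GROUND, two-character escapes, the SOS/PM/APC string state, and the CAN/SUB/ST exits. CSI, OSC and DCS handling are left out.

```python
GROUND, ESCAPE, STRING = "ground", "escape", "sos_pm_apc_string"


def visible_text(stream: str) -> str:
    """Return the characters a tiny subset of the vt100.net DEC parser prints.

    Models printable ASCII in GROUND, two-character ESC sequences, the
    SOS/PM/APC string state, and the CAN/SUB/ST exits. CSI, OSC, DCS and
    C0 execution are not modeled, so ESC [ is treated like any other ESC x.
    """
    state = GROUND
    shown = []
    for ch in stream:
        code = ord(ch)
        if code in (0x18, 0x1A, 0x9C):
            # CAN, SUB and 8-bit ST return to GROUND from any state.
            state = GROUND
        elif code == 0x1B:
            # ESC from any state, even inside a string, starts a new escape.
            state = ESCAPE
        elif state == ESCAPE:
            # ESC X, ESC ^ and ESC _ open SOS, PM and APC strings; any other
            # final (such as '{', or the '\' of ESC \) dispatches to GROUND.
            state = STRING if ch in "X^_" else GROUND
        elif state == STRING:
            pass  # string contents, including BEL, are silently dropped
        elif 0x20 <= code < 0x7F:
            shown.append(ch)
    return "".join(shown)
```

With this model, `ESC { 1a;3 }` leaks `1a;3}` to the screen, while the same payload inside `ESC _ ... ESC \` disappears. Any ESC byte inside an APC payload also ends the string early, which is one more reason to keep raw binary out of it.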

A couple of options come to mind:

  1. Use PM or APC, and then encode the "stuff" so that it contains no C0/C1 control characters. As you mentioned, there is a risk that a terminal with a short PM/APC buffer will spew junk, but terminals built on a proper state machine should not buffer the string at all and so should be OK. Testing with xterm would be sufficient to figure that out.
  2. Figure out early on if the terminal is notty-aware and disable the non-vt100 stuff if it isn't. A nice way to do this would be CSI ? XX n (DSR, DEC specific), with XX being above 85 so it doesn't overlap with the existing commands. xterms ought to simply ignore it. You could also use plain ^E (ENQ) and have notty respond with 'notty-'. This one gives you a lot more flexibility: you know immediately whether you've got an xterm or notty on the other side, and if it's notty you don't have to contort your protocol into something xterm will fall back on.
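Option 2 would look roughly like this on the wire. This is a sketch under stated assumptions: the thread specifies the private DSR number only as "above 85", so any concrete value is a placeholder, the `notty-` reply is treated as a prefix, and the actual tty write/read is omitted.

```python
ENQ = b"\x05"


def dsr_probe(code: int) -> bytes:
    """Build the private DSR query CSI ? code n.

    Only the "above 85" constraint comes from the proposal; the specific
    code is a placeholder chosen by the caller.
    """
    if code <= 85:
        raise ValueError("choose a code above 85")
    return b"\x1b[?%dn" % code


def is_notty_answerback(reply: bytes) -> bool:
    """Classify a reply to ENQ: the proposal is for notty to answer 'notty-...'."""
    return reply.startswith(b"notty-")
```

An application would write the probe and then read from the tty with a short timeout, since a terminal that ignores the query sends nothing back.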

@withoutboats
Owner Author

I think I'm going to switch to APC and encoding the attachments in base64. This should avoid all kinds of backcompat issues.
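Here is a quick check of why base64 helps. The standard base64 alphabet is `A-Z`, `a-z`, `0-9`, `+`, `/`, plus `=` padding, so an encoded attachment cannot contain any byte that would end or abort an APC string (ESC, BEL, CAN, SUB, 0x9C) or that the tty would rewrite (`\n`, `\r`). `encode_attachment` is a hypothetical helper.

```python
import base64

# Bytes that could end or abort an APC string, or be rewritten by the tty.
UNSAFE = {0x07, 0x0A, 0x0D, 0x18, 0x1A, 0x1B, 0x9C}


def encode_attachment(data: bytes) -> bytes:
    """Base64-encode attachment data for embedding in an APC string."""
    encoded = base64.b64encode(data)
    # The output uses only printable ASCII, so no C0 or C1 control
    # byte from the original data can survive into it.
    assert not UNSAFE.intersection(encoded)
    return encoded
```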

As to notty awareness, I was going to see if I could add a custom capability to the terminfo page, rather than doing an inline identification like that.

@withoutboats
Owner Author

@klamonte do you know if ^G is recognized as a string terminator for APC codes by most ANSI terminals?

@ghost

ghost commented Dec 6, 2015

@withoutboats As far as I can tell, ^G will NOT terminate an APC command as per http://vt100.net/emu/dec_ansi_parser . For xterms, ^G will terminate an OSC string, though I don't know what historical accident led to that, because ST (ESC \) is supposed to (and does) terminate DCS, PM, APC, and OSC too.

If you need it, you can also terminate using CAN (0x18). Terminals are supposed to return to GROUND state immediately once they see it, and from a quick check just now, xterm does. CAN is supposed to mean that there was an error in the escape code, so the sequence gets ignored.

@withoutboats
Owner Author

In recording my own terminal use, every instance of an OSC (usually setting the title) was terminated with ^G rather than ST. I was hoping that using ^G as an alternative to ST was just common practice for all of these commands.

Does ST here mean the byte 0x9C, or the Unicode code point U+009C encoded in UTF-8? I suppose if it's the byte, that's "fine", because the UTF-8 representation of U+009C is 0xC2 0x9C, and that byte would not appear in any other code points of the protocol. You can also represent ST as ESC \, correct? Using CAN might also be workable, but it seems like an incredibly hacky answer. I doubt that all commonly used virtual terminals implement that functionality.

Of these options, terminating with UTF-8 encoded U+009C sounds like the best one.
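For comparison, here are the three candidate encodings of ST as bytes, in an illustrative sketch. A raw 0x9C is not valid UTF-8 on its own, and the same byte also occurs as a continuation byte inside other characters (for example U+011C, Ĝ). A byte-level scan for 0x9C is therefore only safe when the payload is known to be ASCII, as base64 is.

```python
ST_7BIT = b"\x1b\\"                  # ESC \ : the 7-bit form of ST
ST_8BIT = b"\x9c"                    # the raw C1 byte a VT220 expects
ST_UTF8 = "\u009c".encode("utf-8")   # U+009C encoded as UTF-8

# 0x9C also appears as a continuation byte in other characters.
G_CIRCUMFLEX = "\u011c".encode("utf-8")


def valid_utf8(data: bytes) -> bool:
    """True if data decodes as UTF-8 on its own."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```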

@ghost

ghost commented Dec 7, 2015

OSC for xterms is often terminated with ^G, true. But the Linux console doesn't terminate OSC at all: it has "ESC ] P <7 hex digits>" as a way to change the VGA palette color used for one of the 8 basic colors (RED/BLUE/CYAN/etc). xterm devised the "brokenLinuxOSC" resource (which defaults to true these days) because otherwise it would appear to hang, since there was no ^G or ST to see.
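Here is the Linux console sequence in question, as a small sketch based on the console_codes(4) man page. That page documents the sequence as `ESC ] P nrrggbb`, where `n` is a single hex digit selecting the palette slot (0-15) and `rrggbb` is the color. `linux_set_palette` is a hypothetical helper name.

```python
def linux_set_palette(index: int, rgb: tuple[int, int, int]) -> str:
    """Build the Linux console's unterminated ESC ] P nrrggbb sequence.

    Exactly seven hex digits follow the P and no BEL or ST is sent, so a
    parser waiting for an OSC terminator never sees one.
    """
    if not 0 <= index <= 15:
        raise ValueError("palette index must be 0-15")
    r, g, b = rgb
    return "\x1b]P%x%02x%02x%02x" % (index, r, g, b)
```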

(Aside: ^N (shift out) has been used by BBS ANSI terminals for terminating "ANSI music" strings: http://webtweakers.com/swag/ANSI/0019.PAS.html . These were actually implemented by several DOS emulators and at least one big (SMLR/OLX) offline mail reader. Dickey had some words on Usenet long ago about the idiocy of that but I can't find it anymore.)

9C vs U+009C: good question indeed. The Unicode standard states that encoding (in all forms, including UTF-8) is handled before any terminal emulation processing is done, so that certainly supports the U+009C option. However, true VT220 terminals require raw 0x9C, and I don't remember for sure which behavior I saw uxterm exhibit back when I was playing with it. Then again, there is also X10 mouse reporting, which in its raw form hoses a UTF-8 stream, and only recently have (u)xterm, (u)rxvt, and friends settled on SGR (1006) mode, which is all 7-bit.
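The X10 problem shows up clearly in a sketch. X10 sends the button and each 1-based coordinate as a single byte offset by 32, so from column 96 onward the bytes are 0x80 or above and do not form valid UTF-8. SGR mode (1006) writes the same numbers as ASCII decimal. The function names are illustrative.

```python
def x10_mouse_report(button: int, col: int, row: int) -> bytes:
    """X10-style report CSI M Cb Cx Cy: each value is one byte, plus 32.

    col and row are 1-based. From column 96 the bytes are >= 0x80, and
    values above 223 cannot be encoded at all (bytes() raises ValueError).
    """
    return b"\x1b[M" + bytes([32 + button, 32 + col, 32 + row])


def sgr_mouse_report(button: int, col: int, row: int, pressed: bool = True) -> bytes:
    """SGR (mode 1006) report CSI < Cb ; Cx ; Cy M/m, always 7-bit ASCII."""
    final = b"M" if pressed else b"m"
    return b"\x1b[<%d;%d;%d" % (button, col, row) + final
```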

All in all, I think it would be fine to go with either U+009C or ESC \ . The former is what the standards really push for, and if uxterm doesn't do it already then I think Dickey would accept it as a bug and fix it. The latter is conveniently 7-bit already (and even has a harmless meaning on VT52 (it ends "hold mode"), talk about old).

If you want to use the most consistent thing, then you could make both ends either 7-bit or 8-bit: "ESC _ ...stuff... ESC \", or "U+009F ...stuff... U+009C" .

Thanks also for considering this change and the discussion.

@withoutboats
Owner Author

"ANSI music" strings: http://webtweakers.com/swag/ANSI/0019.PAS.html

Oh wow, what a different world.

All in all, I think it would be fine to go with either U+009C or ESC \ . The former is what the standards really push for, and if uxterm doesn't do it already then I think Dickey would accept it as a bug and fix it. The latter is conveniently 7-bit already (and even has a harmless meaning on VT52 (it ends "hold mode"), talk about old).

If you want to use the most consistent thing, then you could make both ends either 7-bit or 8-bit: "ESC _ ...stuff... ESC \", or "U+009F ...stuff... U+009C" .

I think I'm settled on ESC _ and U+009C. It's a bit weird to use a C0 init and a C1 term, but I think these are the options most consistent with how terminal apps usually emit OSC and DCS sequences. I'm actually using ESC _ [ right now, kind of just on feel.
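Here is a sketch of the framing as settled above: `ESC _` as the opener, with the `[` mentioned, and UTF-8 encoded U+009C as the closer. The `;` before the base64 attachment and the function names are assumptions for illustration, not notty's documented format.

```python
import base64

APC_START = b"\x1b_["                # ESC _ followed by '['
ST = "\u009c".encode("utf-8")        # U+009C as UTF-8: b"\xc2\x9c"


def frame(body: str, attachment: bytes = b"") -> bytes:
    """Wrap an ASCII command body (and optional attachment) in APC ... ST.

    The attachment is base64-encoded; the ';' separator is hypothetical.
    """
    payload = body.encode("ascii")
    if attachment:
        payload += b";" + base64.b64encode(attachment)
    return APC_START + payload + ST


def unframe(data: bytes) -> bytes:
    """Return the payload of the first framed command (ValueError if none)."""
    start = data.index(APC_START) + len(APC_START)
    end = data.index(ST, start)
    return data[start:end]
```

Since the payload is pure ASCII, the search for the two-byte terminator cannot match anything inside it.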

Thanks also for considering this change and the discussion.

Of course! This is definitely an improvement to the protocol, and I'm very glad to discuss with other people what has been a solitary project for the past year (the first 9 or so months of which were brainstorming and reading manuals while focusing on other things). Also, your experience with terminals seems different from and longer than mine: I have never even seen a VT terminal, and before deciding I was interested in solving this problem I had very little idea how terminals actually worked. It's really good to get other perspectives.

In general, my philosophy about backward compatibility is that in the vast majority of cases, using ANSI programs on a notty terminal should work and using notty programs on ANSI terminals should fail gracefully, but that there's a trade-off here and it's time some features were deprecated. I haven't supported multiple character sets, for example, and I don't plan to, and I don't really think most of the complicated tab stop support is worth implementing either. Page margins are another feature that I think would be better replaced by a whole different design, but this one is actually used by ncurses and so I am more inclined to support it. If someone's 25-year-old legacy system uses features this terminal doesn't support, I'm okay with that.

@singpolyma

Isn't this better solved by not emitting extended content on other terminals? Every program already has to configure its output based on the terminal it's in if it uses anything but pure-ansi, so this is not new ground.

base64-encoding things should not be needed in a modern program and just slows things down...

@withoutboats
Owner Author

@singpolyma not emitting notty codes on non-notty terminals is indeed the ideal and correct behavior, but there is merit in failing politely if someone writes an application which doesn't check that it is running in a notty terminal. While every program ought to configure its output based on the capabilities of the running terminal, in practice today many programs don't check and simply assume ANSI compatibility. If notty ever becomes standard enough that people assume their programs are being run in a notty terminal, it would be good for those programs to fail politely instead of spewing garbage all over the screen (or worse: this property could possibly be abused to create malicious programs by embedding valid ANSI escapes in the binary data). We cannot control what all programs will ever do.

Base64 has other advantages over binary data beyond this one. In particular, the default setting of the tty is to mangle binary data by inserting a \r before every \n in program output. While notty-enabled programs can unset this flag (called ONLCR in the tty ioctl documentation), they can only do this while running locally: when running over ssh, nothing can be done. So binary output is in fact a non-starter, because it would mean that notty doesn't work over ssh.
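Here is the ONLCR problem in miniature. The sketch mimics only that one output flag; real termios output processing also depends on OPOST and other flags. The 8-byte PNG file signature happens to contain two `\n` bytes, so it gets corrupted, while its base64 form passes through untouched.

```python
import base64


def onlcr(output: bytes) -> bytes:
    """Mimic the tty ONLCR output flag: every \\n is sent as \\r\\n."""
    return output.replace(b"\n", b"\r\n")


# The PNG signature: 0x89 'P' 'N' 'G' CR LF SUB LF.
raw = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])
encoded = base64.b64encode(raw)
```

Here `onlcr(raw)` grows to 10 bytes and no longer matches the file, while `onlcr(encoded)` is unchanged. The 12-byte encoding of the 8-byte input also shows base64's 4/3 size overhead.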

The performance cost of base64ing is minimal in comparison to the cost of loading or generating the media, transferring the image over the tty (and possibly over the network), and parsing and rendering the media.

The other changes discussed here, which just swap the code initializer and terminator from one string to another, have no real performance cost.

@singpolyma

So binary output is in fact a non-starter because it would mean that notty doesn't work over ssh.

That is extremely sad, but also a very good reason to make the change. Stupid history.

withoutboats pushed a commit that referenced this issue Jun 3, 2016
Merge in download progress indicator changes