public inbox for cygwin-apps@cygwin.com
* ITP: rxvt-unicode-X
@ 2006-03-21  6:28 Charles Wilson
  2006-03-21 15:33 ` Reid Thompson
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Charles Wilson @ 2006-03-21  6:28 UTC
  To: CygWin-Apps

Okay, so it doesn't actually support unicode on cygwin.  So why bother?

(1) bugfixes.  Upstream development of rxvt is dead.  cygwin's rxvt is 
moribund.  rxvt-unicode is actively maintained.

(2) Heck, eventually we might actually get unicode support assuming the 
proper stuff goes into newlib.  I dunno -- but I'm sure I'll be informed 
by cygwin's non-US userbase whenever it DOES become possible ...

(3) Pretty. xft support. Styled text[*].  Looks cool with inheritPixmap 
and xsri. (xft with antialias is a bit slower, but not too bad on a fast 
machine, and you can go back to non-antialias or plain old bitmap fonts 
if you're desperate).
[*] italic and boldItalic submodes don't seem to work yet.  But that's 
probably because I haven't researched the proper escape codes to 
actually *activate* I and BI.

(4) Lightweight.  Has an optional client-server mode where all client 
windows are part of the same process.  Yes, it does present a 
single-point-of-failure (but so does xwin!) -- but I haven't had a 
problem yet.

(5) no need for run.exe: the standalone urxvt-X and the server urxvtd-X 
will hide their console window themselves (using code borrowed from 
inetutils).  Actually, run.exe + urxvt + [-ls | loginShell=true | -e 
${SHELL} --login ] == 100% CPU for some reason, but ONLY that 
combination.  Any other combination is fine.

=====
AH! But your version is X-only.  What happened to my split-personality rxvt?

Well, IMO the split personality is a bad idea: the worst of both worlds. 
  rxvt is configured to support only the least common denominator 
options, those that BOTH modes can each support.  So, no xft support 
ever.  InheritPixmap is, err, at-your-own-risk.  Plus, the underlying 
W11 library is just as moribund as rxvt -- and the wrapper system means 
ALL library calls in EITHER mode must be handled by dlsym().

So, it's all part of my diabolical plan: to augment the dying rxvt with 
rxvt-unicode-X and rxvt*W, where the X version is as featureful as 
possible, and the native version is at least as usable as the present 
rxvt(native).

I'm not there yet, and I'll need help to get there, but that's the plan, 
anyway.  The rxvt-unicode-common package provides stuff like man pages, 
terminfo and termcap entries (auto-added by postinstall if not present 
already), documentation, etc -- things that would be shared between 
rxvt-unicode-X and some future rxvt-unicode-W...

=====
http://cygutils.fruitbat.org/ITP/rxvt-unicode-X-7.7-1-src.tar.bz2
http://cygutils.fruitbat.org/ITP/rxvt-unicode-X-7.7-1.tar.bz2
http://cygutils.fruitbat.org/ITP/rxvt-unicode-common-7.7-1.tar.bz2
http://cygutils.fruitbat.org/ITP/rxvt-unicode-X.hint
http://cygutils.fruitbat.org/ITP/rxvt-unicode-common.hint

Fedora4, Mandriva2006, Debian

download.fedora.redhat.com/pub/fedora/linux/extras/4/i386/rxvt-unicode-7.5-1.fc4.i386.rpm
carroll.cac.psu.edu/pub/linux/distributions/mandrakelinux/official/2006.0/i586/media/contrib/rxvt-unicode-5.6-1mdk.i586.rpm
http://packages.debian.org/unstable/x11/rxvt-unicode
http://packages.debian.org/stable/x11/rxvt-unicode

--
Chuck


---- rxvt-unicode-X.hint ----
sdesc: "An improved version of rxvt requiring an Xserver."
ldesc: "rxvt-unicode-X is an X-based version of rxvt-unicode, which is an
improved version of the venerable rxvt terminal emulator.  The upstream
codebase supports unicode as well as possessing numerous bugfixes
over the unmaintained rxvt code; however, on cygwin the unicode
support is non-functional."
category: Shells
requires: cygwin bash coreutils xorg-x11-bin-dlls libXft2 xorg-x11-xwin 
rxvt-unicode-common
-------------------------------

---- rxvt-unicode-common.hint ----
sdesc: "An improved version of rxvt requiring an Xserver."
ldesc: "rxvt-unicode-X is an X-based version of rxvt-unicode, which is an
improved version of the venerable rxvt terminal emulator.  The upstream
codebase supports unicode as well as possessing numerous bugfixes
over the unmaintained rxvt code; however, on cygwin the unicode
support is non-functional."
category: Shells
external-source: rxvt-unicode-X
requires: cygwin bash grep ncurses
----------------------------------


* Re: ITP: rxvt-unicode-X
  2006-03-21  6:28 ITP: rxvt-unicode-X Charles Wilson
@ 2006-03-21 15:33 ` Reid Thompson
       [not found] ` <200603211640.k2LGecHO013433@ns-srv-2.bln1.siemens.de>
  2006-04-02 21:07 ` Charles Wilson
  2 siblings, 0 replies; 7+ messages in thread
From: Reid Thompson @ 2006-03-21 15:33 UTC
  To: cygwin-apps

Charles Wilson wrote:
>
> =====
> http://cygutils.fruitbat.org/ITP/rxvt-unicode-X-7.7-1-src.tar.bz2
> http://cygutils.fruitbat.org/ITP/rxvt-unicode-X-7.7-1.tar.bz2
> http://cygutils.fruitbat.org/ITP/rxvt-unicode-common-7.7-1.tar.bz2
> http://cygutils.fruitbat.org/ITP/rxvt-unicode-X.hint
> http://cygutils.fruitbat.org/ITP/rxvt-unicode-common.hint
>
> Fedora4, Mandriva2006, Debian
>
> download.fedora.redhat.com/pub/fedora/linux/extras/4/i386/rxvt-unicode-7.5-1.fc4.i386.rpm 
>
> carroll.cac.psu.edu/pub/linux/distributions/mandrakelinux/official/2006.0/i586/media/contrib/rxvt-unicode-5.6-1mdk.i586.rpm 
>
> http://packages.debian.org/unstable/x11/rxvt-unicode
> http://packages.debian.org/stable/x11/rxvt-unicode
>
>
>
Could someone point me to what i'm missing....
....
checking for unix-compliant filehandle passing ability... no
configure: error: libptytty requires unix-compliant filehandle passing 
ability
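
The failing check appears to probe whether an open file descriptor can be handed to another process over a Unix-domain socket using SCM_RIGHTS ancillary data. That reading of the configure test is an assumption, not something stated in this thread. A minimal sketch of the technique, with hypothetical helper names send_fd and recv_fd:

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one descriptor. */
union fd_ctl {
    struct cmsghdr hdr;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Send descriptor fd over the Unix-domain socket sock; 0 on success. */
int send_fd(int sock, int fd)
{
    char data = 0;                       /* at least one payload byte is sent */
    struct iovec iov = { &data, 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof msg);
    memset(&ctl, 0, sizeof ctl);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;        /* "this message carries descriptors" */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor from sock; returns it, or -1 on failure. */
int recv_fd(int sock)
{
    char data;
    struct iovec iov = { &data, 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&msg, 0, sizeof msg);
    memset(&ctl, 0, sizeof ctl);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET
        || cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

If a platform cannot build or run something along these lines, a "no" from that configure check would be the expected result.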


* Re: ITP: rxvt-unicode-X
       [not found] ` <200603211640.k2LGecHO013433@ns-srv-2.bln1.siemens.de>
@ 2006-03-21 21:54   ` Charles Wilson
  2006-05-10 20:31     ` Thomas Wolff
  0 siblings, 1 reply; 7+ messages in thread
From: Charles Wilson @ 2006-03-21 21:54 UTC
  To: mined, cygwin-apps


On Tue, 21 Mar 2006 17:40:38 +0100 (MET), XXXXdadgum webmail quoting
addresses by defaultXXXX said:
> I have some questions on porting rxvt.
> 
> * How did you get it to compile 7.7? When I try it myself, it fails with
>   rxvtfont.C:1328: error: 'struct rxvt_term' has no member named
>   'bgPixmap'

That looks like an error in how configure.ac harmonizes configure
options (or in how rxvtfont.C uses them).  The way rxvtfont.C is coded,
you can't have transparency support without also enabling xpm support. 
That restriction may be true for obscure X11 reasons -- in which case
configure.ac should flag an error if you try otherwise -- or the
restriction may be bogus -- in which case rxvtfont.C should be more
careful.

In any event, for now either enable both transparency and xpm, or
neither.
 
> * I was previously able to compile rxvt-unicode 4.8 myself on cygwin.
>   Missing the Unicode support, I first tried to trick out the 
>   dogmatic locale dependency of rxvt (as the cygwin locale mechanism is 
>   unfortunately bogus). I patched rxvt around its locale requests so 
>   it was forced to assume a UTF-8 environment. There was nothing around 
>   these positions in the code that suggested further dependencies.
>   So what is the actual "newlib" problem that prevents rxvt from 
>   supporting Unicode - apparently even from trying to support it?

I don't know.  All I know is that (a) I didn't see it actually work, and
(b) I've read other reports that unicode doesn't actually work on
cygwin.  Maybe I'm wrong.  I'm pretty clueless on unicode issues: do I
need a specific unicode font to even try it?  How many LC_* variables
*should* I have to set in order to "enable" unicode -- say, if I were on
a Linux system with full unicode support?  I dunno.  I was hoping others
with more experience could use my package -- or my build system -- and
experiment, reporting successes and failures.  I know, that's fairly
pollyanna-ish of me, but...  I was eventually planning on building
rxvt-unicode with identical options over on my Linux box, and playing
around with it there, but that's a round-tuit item.

> * Did you notice that the Backspace key enters a quote character rather 
>   than Backspace? This is since rxvt-unicode version 5 or so and also 
>   happens with the Linux-compiled version. I have the impression that 
>   a program that carries such a striking bug over 3 versions has some 
>   maintenance deficiencies. That leads me to my next question:

I do not observe this behavior.  It may be related to your TERM setting
and the current state of your terminfo/termcap databases.  I've
explicitly compiled rxvt-unicode to report 'TERM=rxvt-unicode'; I do not
override that value in my startup scripts.  The package I've created
will install the appropriate termcap and terminfo entries if necessary. 
Try *my package* and not some older one you've compiled, ensure 'echo
$TERM' says rxvt-unicode, and see if that works.  I can't debug your
private, older versions for you.

> * Why deal with rxvt at all? Wouldn't it be feasible with the same 
>   effort to make a native version of xterm with your (highly appreciated) 
>   libW11 plans?
>   That would be of even higher advantage as Unicode is already working 
>   with xterm on cygwin because xterm is not so dogmatic about its 
>   environment when asked to support Unicode.

Several reasons. One, xterm requires much more support from X than rxvt: 
  D:\cygwin\usr\X11R6\bin\cygXaw-8.dll
    D:\cygwin\usr\X11R6\bin\cygXext-6.dll
    D:\cygwin\usr\X11R6\bin\cygXmu-6.dll
      D:\cygwin\usr\X11R6\bin\cygXt-6.dll
        D:\cygwin\usr\X11R6\bin\cygICE-6.dll
        D:\cygwin\usr\X11R6\bin\cygSM-6.dll
    D:\cygwin\usr\X11R6\bin\cygXp-6.dll

(I suppose, just like with rxvt-unicode, the following could be turned
off)
  D:\cygwin\usr\X11R6\bin\cygXft-2.dll
    D:\cygwin\usr\X11R6\bin\cygXrender-1.dll

So it's a much higher mountain to climb before we'd have something that
kinda-sorta works.  Read /usr/share/doc/Cygwin/libW11-20050610.README
for more info, but although the upstream version of libW11 is intended
as a "drop in" *replacement* for cygX11-6.dll, that's not the way I'm
envisioning cygwin-libW11.  THEIR way, you replace the real cygX11-6.dll
with a fake one that contains libW11 code -- and all of the other X libs
will use the new libW11 stuff and it'll all "just work".

Except that it doesn't.  libW11 isn't complete enough for that, and it's
an all-or-nothing major system mod: you can't have some apps in "X" mode
and others in "libW11" mode.  So basically, their way breaks almost
everything.  That's bad.

MY way, libW11 code is in a specific, cygW11-6.dll library.  Apps (and
other DLLs) that use it must explicitly be built (i.e. linked) against it
[that is, -L/usr/lib/W11 -lX11].  So, we'd need cygSM-W11.dll and
cygICE-W11.dll and cygXmu-W11.dll and ... which are all built against
libW11.  Now, that may eventually be possible -- especially if we use
the modular x.org sourcecode.  However, since not even the official
cygwin-X guys have released a "true" X using the modular sourcecode, I'm
not brave enough to guinea pig THAT on *top* of the incomplete libW11.  
I'm convinced my plan is much less work than all that rot, and gets to
my desired end goal faster: a maintained replacement for X- and native-
rxvt that is upstream-supported as far as the rxvt-unicode sourcecode is
concerned. (Unicode support is "nice" but not necessary in this
conception; current rxvt doesn't support it, so it's not an immediate
goal for "my" rxvt, either, no matter the "name" of the package).

Second, and related, rxvt is much lighter weight -- just look at all
those DLLs that xterm needs.

Third, geez, xterm is just so gosh-darned ugly. :-) Not "important" --
but never underestimate the appeal of eye-candy.  It's also personal
preference: I despise the xterm scrollbar: middle click, no arrows, no
keep-scrolling, blech.  The only terminal in existence worse than xterm
is command.com. :-)

--
Chuck
--
  Charles Wilson
  cygwin at removespam cwilson dot fastmail dot fm


* Re: ITP: rxvt-unicode-X
  2006-03-21  6:28 ITP: rxvt-unicode-X Charles Wilson
  2006-03-21 15:33 ` Reid Thompson
       [not found] ` <200603211640.k2LGecHO013433@ns-srv-2.bln1.siemens.de>
@ 2006-04-02 21:07 ` Charles Wilson
  2 siblings, 0 replies; 7+ messages in thread
From: Charles Wilson @ 2006-04-02 21:07 UTC
  To: CygWin-Apps

Charles Wilson wrote:

Updated packages: they now contain 'checkX', described here:
http://cygwin.com/ml/cygwin-apps/2006-03/msg00148.html

> http://cygutils.fruitbat.org/ITP/rxvt-unicode-X-7.7-2-src.tar.bz2
> http://cygutils.fruitbat.org/ITP/rxvt-unicode-X-7.7-2.tar.bz2
> http://cygutils.fruitbat.org/ITP/rxvt-unicode-common-7.7-2.tar.bz2
> http://cygutils.fruitbat.org/ITP/rxvt-unicode-X.hint
> http://cygutils.fruitbat.org/ITP/rxvt-unicode-common.hint
> 
> Fedora4, Mandriva2006, Debian
> 
> download.fedora.redhat.com/pub/fedora/linux/extras/4/i386/rxvt-unicode-7.5-1.fc4.i386.rpm 
> 
> carroll.cac.psu.edu/pub/linux/distributions/mandrakelinux/official/2006.0/i586/media/contrib/rxvt-unicode-5.6-1mdk.i586.rpm 
> 
> http://packages.debian.org/unstable/x11/rxvt-unicode
> http://packages.debian.org/stable/x11/rxvt-unicode
> 

How about a GTG, anybody?

--
Chuck


* Re: ITP: rxvt-unicode-X
  2006-03-21 21:54   ` Charles Wilson
@ 2006-05-10 20:31     ` Thomas Wolff
  2006-05-11  4:48       ` Charles Wilson
  0 siblings, 1 reply; 7+ messages in thread
From: Thomas Wolff @ 2006-05-10 20:31 UTC
  To: Charles Wilson, cygwin-apps

Sorry for the very late response, but I've finally successfully 
persuaded rxvt-unicode to actually support Unicode on cygwin, 
and I'd like to suggest including that in the package.


Charles Wilson wrote on Tue, 21 Mar 2006 16:54:11 -0500:

> >   ...
> >   So what is the actual "newlib" problem that prevents rxvt from 
> >   supporting Unicode - apparently even from trying to support it?

> I don't know.  All I know is that (a) I didn't see it actually work, and
> (b) I've read other reports that unicode doesn't actually work on
> cygwin.  Maybe I'm wrong.  I'm pretty clueless on unicode issues: do I
> need a specific unicode font to even try it?  How many LC_* variables
> *should* I have to set in order to "enable" unicode -- say, if I were on
> a Linux system with full unicode support?  I dunno.  I was hoping others
> with more experience could use my package -- or my build system -- and
> experiment, reporting successes and failures.  I know, that's fairly
> pollyanna-ish of me, but...  I was eventually planning on building
> rxvt-unicode with identical options over on my Linux box, and playing
> around with it there, but that's a round-tuit item.

Some general remarks:
Depending on the application, Unicode may be triggered either
1) explicitly or
2) using the locale mechanism (which is bogus on cygwin).
   It should be noted that the set of locale variables (LC_* and LANG) 
   is not identical to the locale mechanism, which needs additional 
   library support.

1) For example, xterm has an explicit command line option:
	xterm -u8
   which invokes xterm in UTF-8 mode. Additional configuration is 
   needed to use Unicode fonts. And LC_* variables are unfortunately 
   not set implicitly in this invocation mode which confuses many 
   applications.

   My package mined includes a script uterm which invokes xterm in a 
   suitable mode, including font setup. Cygwin/X does include some 
   Unicode fonts, but apparently a very outdated version of them with 
   a very limited character range. I would offer to maintain a package 
   of Unicode X fonts if that helps.

2) Rxvt insists on locale configuration to provide desired encodings.
   This means, you would have to invoke rxvt like this:
	LC_CTYPE=en_US.UTF-8 rxvt
   or
	LC_ALL=vi_VN rxvt
   (Note: vi_VN is one of the UTF-8 locales that lack the usual 
   indication suffix.)
   And rxvt would run in UTF-8 mode where the locale mechanism 
   works (which it doesn't on cygwin).


The reason why I couldn't trick out rxvt before by just setting the 
variables was that it also depends on the wide character library 
functions which in turn depend on a working locale mechanism.
I have now replaced those functions (well, the subset of them needed 
by rxvt) with substitutes that either operate in UTF-8 mode, or 
delegate to the system functions, depending on the setting of the 
locale variables, and it works. At least it does so for display, 
although it suppresses 8-bit input for some obscure reason still to be 
found.
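
The decision "depending on the setting of the locale variables" can be pictured roughly as below. This is only a sketch under stated assumptions: the function names are hypothetical and not taken from the patch, and the test simply looks for a UTF-8 codeset suffix after applying the usual POSIX precedence (LC_ALL, then LC_CTYPE, then LANG).

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Pick the locale name that governs character type handling.
   Unset (NULL) or empty values are skipped; "C" is the fallback. */
const char *effective_ctype(const char *lc_all, const char *lc_ctype,
                            const char *lang)
{
    if (lc_all && *lc_all) return lc_all;
    if (lc_ctype && *lc_ctype) return lc_ctype;
    if (lang && *lang) return lang;
    return "C";
}

/* Case-insensitive match of the n characters at a against the string b. */
static int ieq(const char *a, size_t n, const char *b)
{
    size_t i;
    if (strlen(b) != n) return 0;
    for (i = 0; i < n; i++)
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    return 1;
}

/* 1 if the name has a ".UTF-8" or ".utf8" codeset part, else 0.
   An "@modifier" after the codeset is ignored. */
int locale_name_is_utf8(const char *name)
{
    const char *dot = strchr(name, '.');
    const char *codeset;
    size_t n;
    if (dot == NULL) return 0;
    codeset = dot + 1;
    n = strcspn(codeset, "@");
    return ieq(codeset, n, "UTF-8") || ieq(codeset, n, "UTF8");
}

/* Combine both steps: do the three variables ask for UTF-8? */
int env_requests_utf8(const char *lc_all, const char *lc_ctype,
                      const char *lang)
{
    return locale_name_is_utf8(effective_ctype(lc_all, lc_ctype, lang));
}
```

A suffix test like this would miss locales such as vi_VN that imply UTF-8 without saying so; covering those would need an explicit table of names.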

I will send the files to you (Charles Wilson) directly and would 
appreciate if you confirm the solution.

Kind regards,
Thomas Wolff


* Re: ITP: rxvt-unicode-X
  2006-05-10 20:31     ` Thomas Wolff
@ 2006-05-11  4:48       ` Charles Wilson
  0 siblings, 0 replies; 7+ messages in thread
From: Charles Wilson @ 2006-05-11  4:48 UTC
  To: CygWin-Apps

Thomas Wolff wrote:
> Sorry for the very late response, but I've finally successfully 
> persuaded rxvt-unicode to actually support Unicode on cygwin, 
> and I'd like to suggest including that in the package.

That's great, thank you very much.  I received your other emails and 
will take a look as soon as possible.  However, I'll let the brand new 
(not even announced yet) rxvt-unicode-X package stay as-is for a while 
to give folks a chance to try it out before incorporating any new 
features/changes.

> Some general remarks:
> Depending on the application, Unicode may be triggered either
> 1) explicitly or
> 2) using the locale mechanism (which is bogus on cygwin).
>    It should be noted that the set of locale variables (LC_* and LANG) 
>    is not identical to the locale mechanism, which needs additional 
>    library support.
> 
> 1) For example, xterm has an explicit command line option:
> 	xterm -u8
>    which invokes xterm in UTF-8 mode. Additional configuration is 
>    needed to use Unicode fonts. And LC_* variables are unfortunately 
>    not set implicitly in this invocation mode which confuses many 
>    applications.
> 
>    My package mined includes a script uterm which invokes xterm in a 
>    suitable mode, including font setup. Cygwin/X does include some 
>    Unicode fonts, but apparently a very outdated version of them with 
>    a very limited character range. I would offer to maintain a package 
>    of Unicode X fonts if that helps.
> 
> 2) Rxvt insists on locale configuration to provide desired encodings.
>    This means, you would have to invoke rxvt like this:
> 	LC_CTYPE=en_US.UTF-8 rxvt
>    or
> 	LC_ALL=vi_VN rxvt
>    (Note: vi_VN is one of the UTF-8 locales that lack the usual 
>    indication suffix.)
>    And rxvt would run in UTF-8 mode where the locale mechanism 
>    works (which it doesn't on cygwin).

So, you're saying that rxvt-unicode doesn't have an explicit switch, but 
relies on pre-existing env vars.  This is good, because the apps one 
runs IN the terminal will need those env vars too, something a command 
line switch won't set for you properly anyway.

BUT...

> The reason why I couldn't trick out rxvt before by just setting the 
> variables was that it also depends on the wide character library 
> functions which in turn depend on a working locale mechanism.

if the wide char library functions don't exist, then rxvt ignores the LC 
vars anyway.  Gotcha.

> I have now replaced those functions (well, the subset of them needed 
> by rxvt) with substitutes that either operate in UTF-8 mode, or 
> delegate to the system functions, depending on the setting of the 
> locale variables, and it works. 

Shims -- that's a reasonable approach.  (I'd prefer if unicode/locale 
support were added to cygwin's version of newlib but that might be 
Augean Stables-level of effort.) OTOH,  I *really* prefer 
things-that-work, sooner rather than later -- so this is good.

> At least it does so for display, 
> although it suppresses 8-bit input for some obscure reason still to be 
> found.

I'm just guessing, but this could be related to the configure settings 
in my build script, if that's what you were using:

   --enable-shared --enable-utmp --enable-wtmp --enable-lastlog \
   --enable-xft --enable-font-styles --disable-xim --enable-combining \
   --enable-fallback=Rxvt --with-res-name=urxvt --with-res-class=URxvt \
   --program-suffix=-X \
   --enable-xpm-background  --enable-menubar --enable-rxvt-scroll \
   --enable-next-scroll --enable-xterm-scroll --enable-plain-scroll \
   --enable-transparency --enable-tinting --enable-fading \
   --enable-frills --enable-smart-resize --enable-pointer-blank \
   --enable-mousewheel --enable-slipwheeling --enable-keepscrolling \
   --enable-old-selection --disable-perl \
   --with-xpm-includes=/usr/X11R6/include \
   --with-xpm-library=/usr/X11R6/lib \
   --x-libraries=/usr/X11R6/lib


Note: --disable-xim as well as not specifying --enable-8bitctrls

Now, the latter is "not recommended" and its only effect is the 
following block of code in the input-processing loop:

#ifdef EIGHT_BIT_CONTROLS
       // 8-bit controls
       case 0x90:        /* DCS */
         process_dcs_seq ();
         break;
       case 0x9b:        /* CSI */
         process_csi_seq ();
         break;
       case 0x9d:        /* OSC */
         process_osc_seq ();
         break;
#endif

So, I don't think that's it.
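
For context, each of those 8-bit controls has a two-byte 7-bit spelling: ESC followed by the control's value minus 0x40. A small sketch of that mapping (the function name is hypothetical, and this is not rxvt-unicode code):

```c
/* Map an 8-bit C1 control (0x80..0x9f) to the final byte of its
   7-bit form "ESC <final>". Returns 0 if c is not a C1 control. */
unsigned char c1_to_esc_final(unsigned char c)
{
    if (c < 0x80 || c > 0x9f)
        return 0;
    return (unsigned char)(c - 0x40);
}
```

So 0x90 (DCS) corresponds to ESC P, 0x9b (CSI) to ESC [, and 0x9d (OSC) to ESC ]. The 7-bit forms are what applications normally send, which is consistent with the 8-bit variants being optional.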

=====

While 8bit input != xim, there are two things I've discovered about the 
rxvt-unicode sourcecode:
   (1) very little testing is done in non-default configurations (and 
--enable-xim is the default)
   (2) some #define macros turn on/turn off more than their simple names 
and descriptions might suggest -- and the code often makes unwarranted 
assumptions (e.g. see earlier thread about an unwarranted linkage 
between transparency and XPM support)

So, it's possible that --disable-xim turns off some non-XIM input 
support needed for 8bit entry.

Try: --enable-xim.
=====

Also, try the iso14755 support (CTRL-SHFT-key).  Maybe that helps?

=====

Finally, input is a cooperative affair between the terminal, the shell, 
and for X11 terminals, the Xserver.  In the case of bash, that also 
includes readline.  How's your ~/.inputrc set up?

      # don't strip characters to 7 bits when reading
      set input-meta on

      # allow iso-latin1 characters to be inserted rather
      # than converted to prefix-meta sequences
      set convert-meta off

      # display characters with the eighth bit set directly
      # rather than as meta-prefixed characters
      set output-meta on

Also, are you sure that the "meta" key is what you think it is?  You can 
force it by using the -mod cmdline option of rxvt-unicode (see the 
urxvt manpage).  I think the cygwin Xserver defaults to using Alt.

And then, there's the -meta8 cmdline option to rxvt-unicode:

      meta8: boolean
           True: handle Meta (Alt) + keypress to set the 8th bit.
           False: handle Meta (Alt) + keypress as an escape prefix
           [False is default].

Maybe you want True?

> I will send the files to you (Charles Wilson) directly and would 
> appreciate if you confirm the solution.

Quick perusal looks pretty good.  I like the caching of is_u_utf8_mode, 
but you should watch out: --enable-frills turns on
    'locale switching escape sequence'
so you might need to add a hook in that handler to "un-cache".
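
To make the "un-cache" concern concrete, here is a rough sketch with hypothetical names (set_locale_name, utf8_mode, utf8_mode_invalidate); it is not the code from the patch. The cached answer goes stale when the locale changes unless the handler also drops the cache.

```c
#include <string.h>

static char current_locale[64] = "C";  /* stand-in for the active locale */
static int cached_utf8 = -1;           /* -1 means "not computed yet" */

/* What a locale-switching handler does at minimum: record the new name. */
void set_locale_name(const char *name)
{
    strncpy(current_locale, name, sizeof current_locale - 1);
    current_locale[sizeof current_locale - 1] = '\0';
}

/* The hook such a handler would also need to call. */
void utf8_mode_invalidate(void)
{
    cached_utf8 = -1;
}

/* Cached query: recomputed only after an invalidate. */
int utf8_mode(void)
{
    if (cached_utf8 < 0)
        cached_utf8 = strstr(current_locale, "UTF-8") != NULL;
    return cached_utf8;
}
```

Calling set_locale_name("C") after the cache was filled under "en_US.UTF-8" leaves utf8_mode() returning 1 until utf8_mode_invalidate() runs.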

--
Chuck


* Re: ITP: rxvt-unicode-X
@ 2006-05-18 18:45 Thomas Wolff
  0 siblings, 0 replies; 7+ messages in thread
From: Thomas Wolff @ 2006-05-18 18:45 UTC
  To: Charles Wilson, cygwin-apps


I have now succeeded in finishing my Unicode support hook for rxvt on 
cygwin (almost, as far as Unicode operation is concerned).
There were some more obstacles to overcome, which I describe below in 
case anyone is interested :)

A few problems remain:
* If I start rxvt in NON-Unicode mode, 8 bit input doesn't work. This 
  also happens with the unpatched rxvt-unicode 6.0 (compiled from the 
  source archive), but it works in Charles' package, so I would hope 
  that the patch is applicable to the package without injecting this 
  error.
* The wchar_t type on cygwin is only "unsigned short", raising a minor 
  problem with handling Unicode characters beyond 16 bit; my patch is 
  now mapping the output to the Unicode replacement character U+FFFD.
  Substituting a sufficiently wide type might work but would require 
  more subtle modifications to the code.
* Charles pointed out that an application can use setlocale multiple 
  times, switching encoding dynamically, and that rxvt actually does 
  that (although I didn't understand for which purpose). Anyway, 
  a proper substitution of setlocale that mimics this behaviour is 
  still missing in my patch library.
* Suspected remaining handling bug in 'draw_string' as described below.
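
The 16-bit wchar_t workaround mentioned above amounts to something like the following sketch (hypothetical function name, not the code from uwc.zip):

```c
#include <stdint.h>

#define REPLACEMENT_CHAR 0xFFFDu   /* U+FFFD REPLACEMENT CHARACTER */

/* Squeeze a Unicode code point into a 16-bit unit: anything that does
   not fit is shown as U+FFFD instead of being truncated. */
uint16_t to_wchar16(uint32_t cp)
{
    if (cp > 0xFFFFu)
        return REPLACEMENT_CHAR;
    return (uint16_t)cp;
}
```

Plain truncation would turn, say, U+1F600 into the unrelated U+F600, so substituting the replacement character is the safer display choice.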

To apply the patch, please unzip the uwc.zip archive in the rxvt 
src subdirectory. Then invoke the uwc script which applies the patch 
generically, by substituting the respective function names in the 
source files. The final "return NOCHAR" fix described below still has 
to be applied manually, sorry.
The patch can be downloaded from <http://towo.net/mined/cygwin/uwc.zip>

Thomas


------------------------------------------------------------------------
Now about the problems I had:
* First, I had to remove one more bug in my wide character replacement 
  functions in order to avoid an occasional crash. Alright.
* Then, Unicode input still would not work. I found that indeed I had 
  overlooked one function to be replaced which is XwcLookupString.
  The code in rxvt (command.C) has an alternative invocation of 
  Xutf8LookupString which is commented "// currently disabled, doesn't 
  seem to work, nor is useful".
  It turns out that it is indeed very useful in making input work; the 
  reason the disabled rxvt code could not work is that the return 
  values are not handled properly.
* Finally, there was some occasional weird display garbage remaining 
  which I am describing below in some detail because there is some 
  really buggy rxvt code involved.

When displaying a long string to the screen it may happen that 
rxvt splits a single UTF-8 character into subsequent fills of some 
internal buffer. (I could not observe this on Linux, however, where 
the buffer seems to be chosen always long enough to fit in the complete 
output, whereas on cygwin it seems to have a maximum length of 257 bytes.)

Then at the end of the buffer, rxvt invokes mbrtowc with an incomplete 
UTF-8 sequence:

mbrtowc (& wc, C3 BC E2, 3, & ps) -> 2, wc = FC
mbrtowc (& wc, E2, 1, & ps) -> -1, wc unchanged
now the continuation of E2, combining to E2 80 A7, the dot symbol U+2027:
mbrtowc (& wc, 80 A7 C3 A4 C3 B6 C3 9F ..., 257, & ps) -> -1, wc unchanged
mbrtowc (& wc, A7 C3 A4 C3 B6 C3 9F E2 ..., 256, & ps) -> -1, wc unchanged
mbrtowc (& wc, C3 A4 C3 B6 C3 9F E2 87 ..., 255, & ps) -> 2, wc = E4

The display produced is "üâ§ä" instead of "ü‧ä".

A sample program xwrite.c demonstrating the bug is included in uwc.zip 
(only if the "return NOCHAR" fix below has not yet been applied).


When I further analysed the mbrtowc function (on Linux where it works), 
it turned out that it maintains a state of incomplete UTF-8 and is 
able to automatically consider this with a continuation sequence 
requested later. Also some comments in the rxvt source suggest that 
rxvt might even depend on this undocumented behaviour. So I 
reimplemented it with my cygwin mbrtowc replacement but the display 
bug remained. It finally turned out that rxvt does not need this 
"feature" (or rather bug, as it's not documented), at least not for 
screen display.
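
For reference, ISO C does describe mbrtowc returning (size_t)-2 when the bytes given form an incomplete but possibly valid character, with the partial result kept in the mbstate_t argument. The sketch below imitates that contract with a hand-written UTF-8 decoder; the type and function names are hypothetical, and it does not reject overlong forms or surrogates.

```c
#include <stddef.h>
#include <stdint.h>

#define U8_INVALID    ((size_t)-1)   /* malformed input */
#define U8_INCOMPLETE ((size_t)-2)   /* input ended inside a sequence */

/* Plays the role that mbstate_t plays for mbrtowc. */
typedef struct {
    uint32_t partial;   /* bits collected so far */
    int need;           /* continuation bytes still expected */
} u8state;

/* Decode one code point from s[0..n). Returns the number of bytes
   consumed by this call, U8_INCOMPLETE if all n bytes were absorbed
   into the state without finishing a character, or U8_INVALID. */
size_t u8_decode(uint32_t *out, const unsigned char *s, size_t n, u8state *st)
{
    size_t i = 0;

    if (n == 0)
        return U8_INCOMPLETE;

    if (st->need == 0) {                 /* start of a new character */
        unsigned char c = s[i++];
        if (c < 0x80) { *out = c; return 1; }
        else if ((c & 0xE0) == 0xC0) { st->partial = c & 0x1F; st->need = 1; }
        else if ((c & 0xF0) == 0xE0) { st->partial = c & 0x0F; st->need = 2; }
        else if ((c & 0xF8) == 0xF0) { st->partial = c & 0x07; st->need = 3; }
        else return U8_INVALID;          /* stray continuation or bad lead */
    }

    while (st->need > 0) {
        unsigned char c;
        if (i == n)
            return U8_INCOMPLETE;        /* resume on the next call */
        c = s[i];
        if ((c & 0xC0) != 0x80) {        /* not a continuation byte */
            st->need = 0;
            st->partial = 0;
            return U8_INVALID;
        }
        st->partial = (st->partial << 6) | (c & 0x3F);
        st->need--;
        i++;
    }

    *out = st->partial;
    st->partial = 0;
    return i;
}
```

Fed the byte runs from the trace above (C3 BC E2, then 80 A7 C3 A4), a decoder of this shape yields U+00FC, then "incomplete", then U+2027 from the two remaining bytes, then U+00E4.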

So I checked the invocations of mbrtowc in rxvt in command.C and 
menubar.C; I thought it was the latter because it's inside a function 
called 'draw_string' which quite clearly suggests that it would be used 
for screen display but it was not the case.
It rather turned out that the function 'next_char' in command.C is 
handling screen output which is really weird (the function is 
commented "// read the next octet").
The function has the return option
      if (len == (size_t)-1) {
        return *cmdbuf_ptr++;
with the comment 
"// the _occasional_ latin1 character is allowed to slip through"; 
now this sounds mega-weird - why should something that's not right 
be allowed to slip through? Anyway, replacing this with just
      if (len == (size_t)-1) {
        return NOCHAR;
finally solves the display problem and there we are with a working 
rxvt-unicode on cygwin.
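
The difference between the two return options can be reproduced with a small stateless model. This is a sketch only: next_char_model, u8_seq_len and the NOCHAR value used here are hypothetical stand-ins, not the rxvt-unicode definitions.

```c
#include <stddef.h>
#include <stdint.h>

#define NOCHAR 0xFFFFFFFFu   /* stand-in for "no character available" */

/* Total length of a UTF-8 sequence judged by its lead byte, 0 if the
   byte cannot start a sequence. */
static size_t u8_seq_len(unsigned char c)
{
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 0;
}

/* Read one character from buf[*pos..end). If the bytes at *pos are not
   a complete, well-formed sequence:
     latin1_fallback != 0: return the raw byte and advance by one
     latin1_fallback == 0: return NOCHAR and leave *pos untouched */
uint32_t next_char_model(const unsigned char *buf, size_t *pos, size_t end,
                         int latin1_fallback)
{
    unsigned char c;
    size_t len, k;
    int ok;
    uint32_t cp;

    if (*pos >= end)
        return NOCHAR;

    c = buf[*pos];
    len = u8_seq_len(c);
    ok = len != 0 && *pos + len <= end;
    for (k = 1; ok && k < len; k++)
        ok = (buf[*pos + k] & 0xC0) == 0x80;

    if (!ok) {
        if (latin1_fallback) {
            (*pos)++;
            return c;
        }
        return NOCHAR;
    }

    cp = (len == 1) ? c : (uint32_t)(c & (0xFF >> (len + 1)));
    for (k = 1; k < len; k++)
        cp = (cp << 6) | (buf[*pos + k] & 0x3F);
    *pos += len;
    return cp;
}
```

With the fallback enabled, the split buffers C3 BC E2 and 80 A7 C3 A4 come out as FC, E2, 80, A7, E4, which matches the garbled display described earlier. Without it, the reader stops before E2 so the caller can prepend the leftover byte to the next read. In this model a genuinely malformed byte would stall the non-fallback variant, so a real implementation would presumably need to tell "truncated" apart from "invalid".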


A remaining issue might be 'draw_string' in menubar.C; I don't know 
what its purpose is.


The re-implementation of the setlocale functionality in my replacement 
function which you correctly pointed out is still pending.


------------------------------------------------------------------------
