public inbox for cygwin@cygwin.com
* Internal echo of shell behaves (sometimes) different to external echo
@ 2012-07-19 14:51 Ralf
  2012-07-19 20:16 ` Cyrille Lefevre
  0 siblings, 1 reply; 7+ messages in thread
From: Ralf @ 2012-07-19 14:51 UTC (permalink / raw)
  To: cygwin

Is there a way to get the right umlaut with the internal echo of the shell?
Example script:

export LC_ALL=de_DE
c:/unix/bin/uname -a
echo "Rücken" > ttt.txt
cat ttt.txt
c:/unix/bin/od -c ttt.txt
c:/unix/bin/echo "Rücken"
c:/unix/bin/echo "Rücken" | c:/unix/bin/od -c
echo "Rücken"
echo "Rücken" | c:/unix/bin/od -c

The output is:
CYGWIN_NT-6.0-WOW64 WIESWEG 1.7.15(0.260/5/3) 2012-05-09 10:25 i686 Cygwin
Rücken
0000000   R   ü   c   k   e   n  \r  \n
0000010
Rücken
0000000   R   ü   c   k   e   n  \n
0000007
R▒cken
0000000   R   ü   c   k   e   n  \n
0000007

It's strange that the internal echo only gives the right output if it's
redirected. Is there a way to get the right output without redirecting?
(I tried output-meta, but with no success.)


--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple


* Re: Internal echo of shell behaves (sometimes) different to external echo
  2012-07-19 14:51 Internal echo of shell behaves (sometimes) different to external echo Ralf
@ 2012-07-19 20:16 ` Cyrille Lefevre
  2012-07-20  7:08   ` Ralf
  0 siblings, 1 reply; 7+ messages in thread
From: Cyrille Lefevre @ 2012-07-19 20:16 UTC (permalink / raw)
  To: cygwin

On 19/07/2012 16:51, Ralf wrote:
> Is there a way to get the right umlaut with the internal echo of the shell?
> Example script:
>
> export LC_ALL=de_DE

seems to default to iso8859-1 or something like that, let's try

export LC_ALL=de_DE.UTF-8

which should work better...

Also, check that your terminal is set to UTF-8 for both input and output.

I tried fr_FR vs fr_FR.UTF-8 and echo vs /bin/echo; both work well
in all cases!


Regards,

Cyrille Lefevre
-- 
mailto:Cyrille.Lefevre-lists@laposte.net





* Re: Internal echo of shell behaves (sometimes) different to external echo
  2012-07-19 20:16 ` Cyrille Lefevre
@ 2012-07-20  7:08   ` Ralf
  2012-07-20  8:25     ` Corinna Vinschen
  0 siblings, 1 reply; 7+ messages in thread
From: Ralf @ 2012-07-20  7:08 UTC (permalink / raw)
  To: cygwin

Cyrille Lefevre <cyrille.lefevre-lists <at> laposte.net> writes:

> 
> On 19/07/2012 16:51, Ralf wrote:
> > Is there a way to get the right umlaut with the internal echo of the shell?
> > Example script:
> >
> > export LC_ALL=de_DE
> 
> seems to default to iso8859-1 or something like that, let's try
> 
> export LC_ALL=de_DE.UTF-8
> 
> which should work better...
> 

export LC_ALL=de_DE.UTF-8 gives the following output:

 C:\>bash ttt.sh
 CYGWIN_NT-6.0-WOW64 WIESWEG 1.7.15(0.260/5/3) 2012-05-09 10:25 i686 Cygwin
 R▒cken
 0000000   R 374   c   k   e   n  \r  \n
 0000010
 R▒cken
 0000000   R 374   c   k   e   n  \n
 0000007
 R▒cken
 0000000   R 374   c   k   e   n  \n
 0000007


> Also, check that your terminal is set to UTF-8 for both input and output.

How do I check that my terminal setting is UTF-8 both for input/output?
All settings I use are made in inputrc:
 C:\>cat ~/.inputrc
 set meta-flag on
 set convert-meta off
 set output-meta on
And in my environment variables:
 ...
 BASH=/usr/bin/sh
 BASHOPTS=cmdhist:expand_aliases:extquote:force_fignore:hostcomplete:interactive_comments:progcomp:promptvars:sourcepath
 BASH_ALIASES=()
 BASH_ARGC=()
 BASH_ARGV=()
 BASH_CMDS=()
 BASH_LINENO=()
 BASH_SOURCE=()
 BASH_VERSINFO=([0]="4" [1]="1" [2]="10" [3]="4" [4]="release" [5]="i686-pc-cygwin")
 BASH_VERSION='4.1.10(4)-release'
 ...
 CYGWIN=nodosfilewarning
 ...
 POSIXLY_CORRECT=y
 ...
 SHELL=/bin/bash
 SHELLOPTS=braceexpand:emacs:hashall:histexpand:history:interactive-comments:monitor:posix
 ...
 TERM=cygwin
 ....
> 
> I tried fr_FR vs fr_FR.UTF-8 and echo vs /bin/echo; both work well
> in all cases!
> 

Do you use rxvt, mintty or something like that?

Regards
Ralf



* Re: Internal echo of shell behaves (sometimes) different to external echo
  2012-07-20  7:08   ` Ralf
@ 2012-07-20  8:25     ` Corinna Vinschen
  2012-07-20 10:47       ` Ralf
  0 siblings, 1 reply; 7+ messages in thread
From: Corinna Vinschen @ 2012-07-20  8:25 UTC (permalink / raw)
  To: cygwin

On Jul 20 07:08, Ralf wrote:
> Cyrille Lefevre <cyrille.lefevre-lists <at> laposte.net> writes:
> 
> > 
> > On 19/07/2012 16:51, Ralf wrote:
> > > Is there a way to get the right umlaut with the internal echo of the shell?
> > > Example script:
> > >
> > > export LC_ALL=de_DE
> > 
> > seems to default to iso8859-1 or something like that, let's try
> > 
> > export LC_ALL=de_DE.UTF-8
> > 
> > which should work better...
> > 
> 
> export LC_ALL=de_DE.UTF-8 gives the following output:
> 
>  C:\>bash ttt.sh
>  CYGWIN_NT-6.0-WOW64 WIESWEG 1.7.15(0.260/5/3) 2012-05-09 10:25 i686 Cygwin
>  R▒cken
>  0000000   R 374   c   k   e   n  \r  \n
>  0000010
>  R▒cken
>  0000000   R 374   c   k   e   n  \n
>  0000007
>  R▒cken
>  0000000   R 374   c   k   e   n  \n
>  0000007

What you don't seem to see is that the codeset doesn't play any role
anymore *at this point in time*.  You already created the string
"Rücken" in ISO-8859-1 at the time you created the script and your
script will diligently create the file ttt.txt with the word Rücken in
ISO-8859-1, because that's how it's stored in the script.  Thus, it
doesn't matter what codeset you have set when running that script.
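
A minimal sketch of that point, assuming a POSIX shell with od and tr
available; the helper name hex_bytes is hypothetical. It shows that the
ISO-8859-1 and UTF-8 encodings of the word are different byte sequences,
and that echoing or redirecting them does not change the bytes:

```shell
# hex_bytes (hypothetical helper): print stdin as one contiguous hex string.
hex_bytes() {
    od -An -tx1 | tr -d ' \n'
    echo
}

# "Rücken" as stored in an ISO-8859-1 script: ü is the single byte 0xFC (octal 374).
printf 'R\374cken' | hex_bytes      # prints 52fc636b656e

# "Rücken" as stored in a UTF-8 script: ü is the two bytes 0xC3 0xBC.
printf 'R\303\274cken' | hex_bytes  # prints 52c3bc636b656e
```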

Here's an idea for you to test:

Replace

  echo "Rücken" > ttt.txt

with

  read -p "Enter: " foo
  echo "$foo" > ttt.txt

And then start your script with LANG set to, for instance, C.UTF-8, as is
the default when running an interactive Cygwin shell like bash or tcsh.
(though I would prefer to use POSIX paths rather than DOS paths:
 http://cygwin.com/cygwin-ug-net/using.html#using-pathnames)


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat


* Re: Internal echo of shell behaves (sometimes) different to external echo
  2012-07-20  8:25     ` Corinna Vinschen
@ 2012-07-20 10:47       ` Ralf
  2012-07-20 11:53         ` Andy Koppe
  0 siblings, 1 reply; 7+ messages in thread
From: Ralf @ 2012-07-20 10:47 UTC (permalink / raw)
  To: cygwin

My problem is not that the script is in ISO-8859-1, nor that the strings
or ttt.txt are in ISO-8859-1. They have to be in ISO-8859-1 because all my
scripts are in ISO-8859-1 and they are used together with Windows programs
(in the DOS box) which read and write only ISO-8859-1.

My problem is handling, in shell scripts, strings which are encoded in
ISO-8859-1 (and line endings which depend on relative/absolute filenames,
mounting and so on) without rewriting all the stuff.

So what's the best setting in Cygwin to echo ISO-8859-1? I still don't
understand why the internal echo behaves differently from the external
echo.





* Re: Internal echo of shell behaves (sometimes) different to external echo
  2012-07-20 10:47       ` Ralf
@ 2012-07-20 11:53         ` Andy Koppe
  2012-07-20 12:09           ` Ralf
  0 siblings, 1 reply; 7+ messages in thread
From: Andy Koppe @ 2012-07-20 11:53 UTC (permalink / raw)
  To: cygwin

On 20 July 2012 11:46, Ralf wrote:
> My problem is not that the script is in ISO-8859-1, nor that the strings
> or ttt.txt are in ISO-8859-1. They have to be in ISO-8859-1 because all my
> scripts are in ISO-8859-1 and they are used together with Windows programs
> (in the DOS box) which read and write only ISO-8859-1.
>
> My problem is handling, in shell scripts, strings which are encoded in
> ISO-8859-1 (and line endings which depend on relative/absolute filenames,
> mounting and so on) without rewriting all the stuff.
>
> So what's the best setting in Cygwin to echo ISO-8859-1? I still don't
> understand why the internal echo behaves differently from the external
> echo.

It's because setting LC_ALL in a bash script is too late for the bash
process itself, which will be using the default C.UTF-8 locale unless
something else is set when bash is invoked.

When stuff is written to a console (but not a pty-based terminal), the
Cygwin DLL converts it from the process charset (UTF-8 in this case)
to UTF-16 to pass it to the relevant Windows API function. Your
ISO-8859-1 encoded 'ü' is an invalid byte when interpreted as UTF-8,
hence the error character.

/usr/bin/echo, on the other hand, is invoked as a separate process with
LC_ALL already set appropriately, hence you're getting the expected
output.
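
To illustrate the invalid-byte point, here is a minimal sketch, assuming
an iconv binary is on the PATH; the helper name utf8_status is
hypothetical. It only checks whether a byte stream decodes as UTF-8 and
does not reproduce the console code path inside the Cygwin DLL:

```shell
# utf8_status (hypothetical helper): report whether stdin is valid UTF-8.
# iconv exits non-zero when it meets a byte sequence it cannot decode.
utf8_status() {
    if iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
        echo valid
    else
        echo invalid
    fi
}

# ISO-8859-1 'ü' is the lone byte 0xFC (octal 374), which UTF-8 never uses.
printf 'R\374cken\n' | utf8_status

# The UTF-8 encoding of 'ü' is the two-byte sequence 0xC3 0xBC.
printf 'R\303\274cken\n' | utf8_status
```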

Andy


* Re: Internal echo of shell behaves (sometimes) different to external echo
  2012-07-20 11:53         ` Andy Koppe
@ 2012-07-20 12:09           ` Ralf
  0 siblings, 0 replies; 7+ messages in thread
From: Ralf @ 2012-07-20 12:09 UTC (permalink / raw)
  To: cygwin

Andy Koppe <andy.koppe <at> gmail.com> writes:


> 
> It's because setting LC_ALL in a bash script is too late for the bash
> process itself, which will be using the default C.UTF-8 locale unless
> something else is set when bash is invoked.
> 
Now I understand. Setting LC_ALL before calling ttt.sh works:

C:\>set LC_ALL=de_DE
C:\>bash ttt.sh
CYGWIN_NT-6.0-WOW64 WIESWEG 1.7.15(0.260/5/3) 2012-05-09 10:25 i686 Cygwin
Rücken
0000000   R   ü   c   k   e   n  \r  \n
0000010
Rücken
0000000   R   ü   c   k   e   n  \n
0000007
Rücken
0000000   R   ü   c   k   e   n  \n
0000007

Thanks!
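
When the script is started from a POSIX shell instead of cmd.exe, the
same fix might look like the sketch below. The helper name
run_with_locale is hypothetical; the only point is that LC_ALL is
already in the environment when the new process starts:

```shell
# run_with_locale (hypothetical helper): run a command with LC_ALL set in
# its environment from the start, without changing the calling shell.
run_with_locale() {
    locale_name=$1
    shift
    env LC_ALL="$locale_name" "$@"
}

# The child process sees the value from its very first instruction.
run_with_locale C sh -c 'echo "LC_ALL is $LC_ALL"'
```

For the script in this thread, the call would be
run_with_locale de_DE bash ttt.sh.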

