From: Csaba Raduly <rcsaba@gmail.com>
To: cygwin@cygwin.com
Subject: Re: length in gawk returns wrong value
Date: Thu, 19 Jul 2012 12:36:00 -0000
Message-ID: <CAEhDDbCJyHY-MWPCZ5=OQJFyvohuUU4AFsoPDzFudLQgfb-8Jw@mail.gmail.com>
In-Reply-To: <20120719113927.GH31055@calimero.vinschen.de>
On Thu, Jul 19, 2012 at 1:39 PM, Corinna Vinschen wrote:
> On Jul 19 11:27, Ralf wrote:
>> Corinna Vinschen <corinna-cygwin <at> cygwin.com> writes:
>>
>> >
>> > Uh oh. 1.7.9 is old. Please update.
>> >
>> > > 0000000 R 374 c k e n \r \n
>> > > 0000010
>> > > Length: 1
>> > >
>> > > What can I do to get the correct length in gawk without changing
>> > > ttt.txt?
>> >
>> > Dunno. This is not what I see. What did you have $LANG and $LC_CTYPE
>> > set to? Here's what I see:
>> >
>> > $ uname -a
>> > CYGWIN_NT-6.1 vmbert7 1.7.16(0.261/5/3) 2012-07-09 14:51 i686 Cygwin
>> >
>> > $ echo $LANG
>> > C.UTF-8
>> >
>> > $ echo "Rücken" > ttt.txt
>> > $ od -c ttt.txt
>> > 0000000 R 303 274 c k e n \n
>> > 0000010
>> >
>> > $ gawk '{print "Length: " length($0)}' ttt.txt
>> > Length: 6
>> >
>> > $ gawk --version | head -1
>> > GNU Awk 4.0.1
>> >
>> > Corinna
>> >
>>
>> After updating, I added the following lines at the top of my script:
>> export LANG=C.UTF-8
>> echo LANG: $LANG
>> echo LC_CTYPE: $LC_CTYPE
>> c:/unix/bin/gawk --version | head -1
>>
>> And this is my output:
>> LANG: C.UTF-8
>> LC_CTYPE:
>> GNU Awk 4.0.1
>> CYGWIN_NT-6.0-WOW64 WIESWEG 1.7.15(0.260/5/3) 2012-05-09 10:25 i686 Cygwin
>> 0000000 R 374 c k e n \r \n
>> 0000010
>> Length: 5
>>
>> Very strange!
>
> Not at all. The file contains an invalid character. 0374 is the
> umlaut-u in the ISO-8859-1 or ISO-8859-15 codesets. Try this:
>
> $ LC_ALL=de_DE gawk '{print "Length: " length($0)}' ttt.txt
> Length: 6
>
> When you create the file under the UTF-8 codeset, you'll get:
>
> 0000000 R 303 274 c k e n \n
>
Proving, once again, that "There Ain't No Such Thing as Plain Text"
http://www.joelonsoftware.com/articles/Unicode.html
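
To make the byte-level point above concrete, here is a minimal sketch in Python (not part of the original exchange, and not a model of how gawk counts characters). It only checks how the two byte sequences shown by od decode under each codeset. The helper name try_decode is made up for illustration.

```python
# Bytes as shown by "od -c" in the thread (line terminators left out).
# Octal 374 is 0xFC; octal 303 274 is 0xC3 0xBC.
latin1_bytes = bytes([0o122, 0o374]) + b"cken"          # R 374 c k e n
utf8_bytes = bytes([0o122, 0o303, 0o274]) + b"cken"     # R 303 274 c k e n


def try_decode(raw, encoding):
    """Hypothetical helper: return the decoded text, or None if the
    bytes are not valid in the given encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


as_latin1 = try_decode(latin1_bytes, "iso-8859-1")  # valid: 0xFC is u-umlaut
as_utf8 = try_decode(latin1_bytes, "utf-8")         # invalid: lone 0xFC byte
from_utf8 = try_decode(utf8_bytes, "utf-8")         # valid: 0xC3 0xBC is u-umlaut

print("ISO-8859-1 file read as ISO-8859-1:", repr(as_latin1))
print("ISO-8859-1 file read as UTF-8:     ", repr(as_utf8))
print("UTF-8 file read as UTF-8:          ", repr(from_utf8))
```

The same seven-letter word is six bytes in one file and seven in the other, and only one of the two files is valid in a UTF-8 locale. What a tool reports for an invalid sequence is up to the tool, which is why matching the locale to the file's actual encoding (as with LC_ALL=de_DE above) is the safer fix.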
Csaba
--
GCS a+ e++ d- C++ ULS$ L+$ !E- W++ P+++$ w++$ tv+ b++ DI D++ 5++
The Tao of math: The numbers you can count are not the real numbers.
Life is complex, with real and imaginary parts.
"Ok, it boots. Which means it must be bug-free and perfect. " -- Linus Torvalds
"People disagree with me. I just ignore them." -- Linus Torvalds