* length in gawk returns wrong value
@ 2012-07-19  8:50 Ralf
  2012-07-19  9:21 ` Corinna Vinschen
  0 siblings, 1 reply; 14+ messages in thread
From: Ralf @ 2012-07-19  8:50 UTC
  To: cygwin

The following lines create a file named ttt.txt. The file ttt.txt contains
exactly what I want (oct 374 for the umlaut u). But if you look at the output of
these lines you can see that the function length() of gawk can not handle this
character:

uname -a
echo "Rücken" > ttt.txt
od -c ttt.txt
gawk '{print "Length: " length($0)}' ttt.txt

Output:
CYGWIN_NT-6.0-WOW64 WIESWEG 1.7.9(0.237/5/3) 2011-03-29 10:10 i686 Cygwin
0000000   R 374   c   k   e   n  \r  \n
0000010
Length: 1

What can I do to get the correct length in gawk without changing the contents of
ttt.txt?

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
* Re: length in gawk returns wrong value
  2012-07-19  8:50 length in gawk returns wrong value Ralf
@ 2012-07-19  9:21 ` Corinna Vinschen
  2012-07-19 11:27   ` Ralf
  0 siblings, 1 reply; 14+ messages in thread
From: Corinna Vinschen @ 2012-07-19  9:21 UTC
  To: cygwin

On Jul 19 08:50, Ralf wrote:
> The following lines create a file named ttt.txt. The file ttt.txt contains
> exactly what I want (oct 374 for the umlaut u). But if you look at the output of
> these lines you can see that the function length() of gawk can not handle this
> character:
>
> uname -a
> echo "Rücken" > ttt.txt
> od -c ttt.txt
> gawk '{print "Length: " length($0)}' ttt.txt
>
> Output:
> CYGWIN_NT-6.0-WOW64 WIESWEG 1.7.9(0.237/5/3) 2011-03-29 10:10 i686 Cygwin

Uh oh. 1.7.9 is old. Please update.

> 0000000   R 374   c   k   e   n  \r  \n
> 0000010
> Length: 1
>
> What can I do to get the correct length in gawk without changing the contents of
> ttt.txt?

Dunno. This is not what I see. What did you have $LANG and $LC_CTYPE
set to? Here's what I see:

  $ uname -a
  CYGWIN_NT-6.1 vmbert7 1.7.16(0.261/5/3) 2012-07-09 14:51 i686 Cygwin

  $ echo $LANG
  C.UTF-8

  $ echo "Rücken" > ttt.txt
  $ od -c ttt.txt
  0000000   R 303 274   c   k   e   n  \n
  0000010

  $ gawk '{print "Length: " length($0)}' ttt.txt
  Length: 6

  $ gawk --version | head -1
  GNU Awk 4.0.1

Corinna

--
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
* Re: length in gawk returns wrong value
  2012-07-19  9:21 ` Corinna Vinschen
@ 2012-07-19 11:27   ` Ralf
  2012-07-19 11:40     ` Corinna Vinschen
  0 siblings, 1 reply; 14+ messages in thread
From: Ralf @ 2012-07-19 11:27 UTC
  To: cygwin

Corinna Vinschen <corinna-cygwin <at> cygwin.com> writes:

>
> Uh oh. 1.7.9 is old. Please update.
>
> > 0000000   R 374   c   k   e   n  \r  \n
> > 0000010
> > Length: 1
> >
> > What can I do to get the correct length in gawk without changing
> > ttt.txt?
>
> Dunno. This is not what I see. What did you have $LANG and $LC_CTYPE
> set to? Here's what I see:
>
> $ uname -a
> CYGWIN_NT-6.1 vmbert7 1.7.16(0.261/5/3) 2012-07-09 14:51 i686 Cygwin
>
> $ echo $LANG
> C.UTF-8
>
> $ echo "Rücken" > ttt.txt
> $ od -c ttt.txt
> 0000000   R 303 274   c   k   e   n  \n
> 0000010
>
> $ gawk '{print "Length: " length($0)}' ttt.txt
> Length: 6
>
> $ gawk --version | head -1
> GNU Awk 4.0.1
>
> Corinna
>

After updating I added following lines on top of my script:
export LANG=C.UTF-8
echo LANG: $LANG
echo LC_CTYPE: $LC_TYPE
c:/unix/bin/gawk --version | head -1

And this is my output:
LANG: C.UTF-8
LC_CTYPE:
GNU Awk 4.0.1
CYGWIN_NT-6.0-WOW64 WIESWEG 1.7.15(0.260/5/3) 2012-05-09 10:25 i686 Cygwin
0000000   R 374   c   k   e   n  \r  \n
0000010
Length: 5

Very strange!
But after adding
export LC_CTYPE=C
I got the correct result.

Thanks for your quick help!
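Editor's sketch (not part of the original message): the reason
"export LC_CTYPE=C" overrides "LANG=C.UTF-8" is the POSIX precedence of the
locale variables. For the character-type category, LC_ALL wins if it is set
and non-empty, then LC_CTYPE, then LANG. The helper name effective_ctype is
hypothetical and only illustrates that lookup order; it does not ask the C
library what locale was actually loaded. Note also that the script above
echoes $LC_TYPE rather than $LC_CTYPE, so that line would print an empty
value either way.

```shell
# Report which variable decides the character-type locale, following the
# POSIX lookup order: LC_ALL, then LC_CTYPE, then LANG.
# Empty values are treated like unset ones.
effective_ctype() {
    if [ -n "${LC_ALL-}" ]; then
        printf 'LC_ALL=%s\n' "$LC_ALL"
    elif [ -n "${LC_CTYPE-}" ]; then
        printf 'LC_CTYPE=%s\n' "$LC_CTYPE"
    elif [ -n "${LANG-}" ]; then
        printf 'LANG=%s\n' "$LANG"
    else
        # Nothing set: the implementation-defined default locale applies.
        printf 'default\n'
    fi
}
```

With only LANG=C.UTF-8 set, the function reports LANG=C.UTF-8; after also
setting LC_CTYPE=C it reports LC_CTYPE=C, which matches the change in
behaviour described in the message.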
* Re: length in gawk returns wrong value
  2012-07-19 11:27   ` Ralf
@ 2012-07-19 11:40     ` Corinna Vinschen
  2012-07-19 12:36       ` Csaba Raduly
  0 siblings, 1 reply; 14+ messages in thread
From: Corinna Vinschen @ 2012-07-19 11:40 UTC
  To: cygwin

On Jul 19 11:27, Ralf wrote:
> Corinna Vinschen <corinna-cygwin <at> cygwin.com> writes:
>
> >
> > Uh oh. 1.7.9 is old. Please update.
> >
> > > 0000000   R 374   c   k   e   n  \r  \n
> > > 0000010
> > > Length: 1
> > >
> > > What can I do to get the correct length in gawk without changing
> > > ttt.txt?
> >
> > Dunno. This is not what I see. What did you have $LANG and $LC_CTYPE
> > set to? Here's what I see:
> >
> > $ uname -a
> > CYGWIN_NT-6.1 vmbert7 1.7.16(0.261/5/3) 2012-07-09 14:51 i686 Cygwin
> >
> > $ echo $LANG
> > C.UTF-8
> >
> > $ echo "Rücken" > ttt.txt
> > $ od -c ttt.txt
> > 0000000   R 303 274   c   k   e   n  \n
> > 0000010
> >
> > $ gawk '{print "Length: " length($0)}' ttt.txt
> > Length: 6
> >
> > $ gawk --version | head -1
> > GNU Awk 4.0.1
> >
> > Corinna
> >
>
> After updating I added following lines on top of my script:
> export LANG=C.UTF-8
> echo LANG: $LANG
> echo LC_CTYPE: $LC_TYPE
> c:/unix/bin/gawk --version | head -1
>
> And this is my output:
> LANG: C.UTF-8
> LC_CTYPE:
> GNU Awk 4.0.1
> CYGWIN_NT-6.0-WOW64 WIESWEG 1.7.15(0.260/5/3) 2012-05-09 10:25 i686 Cygwin
> 0000000   R 374   c   k   e   n  \r  \n
> 0000010
> Length: 5
>
> Very strange!

Not at all. The file contains an invalid character. 0374 is the
umlaut-u in the ISO-8859-1 or ISO-8859-15 codesets. Try this:

  $ LC_ALL=de_DE gawk '{print "Length: " length($0)}' ttt.txt
  Length: 6

When you create the file under the UTF-8 codeset, you'll get:

  0000000   R 303 274   c   k   e   n  \n

Corinna

--
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
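Editor's sketch (not part of the original message): the two od dumps in this
thread differ because the same word is stored in two encodings. The snippet
below rebuilds both byte sequences and counts them in the C locale, where awk
counts bytes rather than characters. It assumes a POSIX shell with printf,
mktemp, awk and iconv available; the file and variable names are made up for
the illustration, and the test line has no trailing \r, unlike Ralf's file.

```shell
# Build the same word in both encodings. \374 is the octal byte for u-umlaut
# in ISO-8859-1; iconv re-encodes it as the two bytes \303\274 in UTF-8.
tmpdir=$(mktemp -d)
printf 'R\374cken\n' > "$tmpdir/latin1.txt"
iconv -f ISO-8859-1 -t UTF-8 "$tmpdir/latin1.txt" > "$tmpdir/utf8.txt"

# In the C locale awk counts bytes, so the UTF-8 file is one byte longer.
latin1_len=$(LC_ALL=C awk '{ print length($0) }' "$tmpdir/latin1.txt")
utf8_len=$(LC_ALL=C awk '{ print length($0) }' "$tmpdir/utf8.txt")
echo "latin1 bytes: $latin1_len, utf8 bytes: $utf8_len"

# iconv doubles as a validity check: converting UTF-8 to UTF-8 fails when
# the input contains a byte sequence that is not valid UTF-8, such as \374.
if iconv -f UTF-8 -t UTF-8 "$tmpdir/latin1.txt" > /dev/null 2>&1; then
    latin1_is_utf8=yes
else
    latin1_is_utf8=no
fi
echo "latin1 file is valid UTF-8: $latin1_is_utf8"

rm -rf "$tmpdir"
```

The byte counts come out as 6 and 7. A UTF-8 aware awk reading the Latin-1
file under a UTF-8 locale has no valid character to decode at the \374 byte,
which is the situation described above; converting the file with iconv, or
running awk under a locale that matches the file's encoding, avoids it.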
* Re: length in gawk returns wrong value
  2012-07-19 11:40     ` Corinna Vinschen
@ 2012-07-19 12:36       ` Csaba Raduly
  2012-07-19 13:58         ` Aaron Schneider
  0 siblings, 1 reply; 14+ messages in thread
From: Csaba Raduly @ 2012-07-19 12:36 UTC
  To: cygwin

On Thu, Jul 19, 2012 at 1:39 PM, Corinna Vinschen wrote:
> On Jul 19 11:27, Ralf wrote:
>> Corinna Vinschen <corinna-cygwin <at> cygwin.com> writes:
>>
>> >
>> > Uh oh. 1.7.9 is old. Please update.
>> >
>> > > 0000000   R 374   c   k   e   n  \r  \n
>> > > 0000010
>> > > Length: 1
>> > >
>> > > What can I do to get the correct length in gawk without changing
>> > > ttt.txt?
>> >
>> > Dunno. This is not what I see. What did you have $LANG and $LC_CTYPE
>> > set to? Here's what I see:
>> >
>> > $ uname -a
>> > CYGWIN_NT-6.1 vmbert7 1.7.16(0.261/5/3) 2012-07-09 14:51 i686 Cygwin
>> >
>> > $ echo $LANG
>> > C.UTF-8
>> >
>> > $ echo "Rücken" > ttt.txt
>> > $ od -c ttt.txt
>> > 0000000   R 303 274   c   k   e   n  \n
>> > 0000010
>> >
>> > $ gawk '{print "Length: " length($0)}' ttt.txt
>> > Length: 6
>> >
>> > $ gawk --version | head -1
>> > GNU Awk 4.0.1
>> >
>> > Corinna
>> >
>>
>> After updating I added following lines on top of my script:
>> export LANG=C.UTF-8
>> echo LANG: $LANG
>> echo LC_CTYPE: $LC_TYPE
>> c:/unix/bin/gawk --version | head -1
>>
>> And this is my output:
>> LANG: C.UTF-8
>> LC_CTYPE:
>> GNU Awk 4.0.1
>> CYGWIN_NT-6.0-WOW64 WIESWEG 1.7.15(0.260/5/3) 2012-05-09 10:25 i686 Cygwin
>> 0000000   R 374   c   k   e   n  \r  \n
>> 0000010
>> Length: 5
>>
>> Very strange!
>
> Not at all. The file contains an invalid character. 0374 is the
> umlaut-u in the ISO-8859-1 or ISO-8859-15 codesets. Try this:
>
>   $ LC_ALL=de_DE gawk '{print "Length: " length($0)}' ttt.txt
>   Length: 6
>
> When you create the file under the UTF-8 codeset, you'll get:
>
>   0000000   R 303 274   c   k   e   n  \n
>

Proving, once again, that "There Ain't No Such Thing as Plain Text"
http://www.joelonsoftware.com/articles/Unicode.html

Csaba
--
GCS a+ e++ d- C++ ULS$ L+$ !E- W++ P+++$ w++$ tv+ b++ DI D++ 5++
The Tao of math: The numbers you can count are not the real numbers.
Life is complex, with real and imaginary parts.
"Ok, it boots. Which means it must be bug-free and perfect. " -- Linus Torvalds
"People disagree with me. I just ignore them." -- Linus Torvalds
* Re: length in gawk returns wrong value
  2012-07-19 12:36       ` Csaba Raduly
@ 2012-07-19 13:58         ` Aaron Schneider
  2012-07-19 14:56           ` Corinna Vinschen
  0 siblings, 1 reply; 14+ messages in thread
From: Aaron Schneider @ 2012-07-19 13:58 UTC
  To: cygwin

On 19/07/2012 14:35, Csaba Raduly wrote:
>
> Proving, once again, that "There Ain't No Such Thing as Plain Text"
> http://www.joelonsoftware.com/articles/Unicode.html
>
>
> Csaba
>

No idea, but can't cygwin come with native UTF-8 enabled by default so
the behavior is the same for everyone?
* Re: length in gawk returns wrong value
  2012-07-19 13:58         ` Aaron Schneider
@ 2012-07-19 14:56           ` Corinna Vinschen
  2012-07-19 16:17             ` Aaron Schneider
  0 siblings, 1 reply; 14+ messages in thread
From: Corinna Vinschen @ 2012-07-19 14:56 UTC
  To: cygwin

On Jul 19 15:58, Aaron Schneider wrote:
> On 19/07/2012 14:35, Csaba Raduly wrote:
> >
> >Proving, once again, that "There Ain't No Such Thing as Plain Text"
> >http://www.joelonsoftware.com/articles/Unicode.html
> >
> >
> >Csaba
> >
>
> No idea, but can't cygwin come with native UTF-8 enabled by default
> so the behavior is the same for everyone?

It is. See /etc/profile.d/lang.{sh,csh}

Corinna

--
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
* Re: length in gawk returns wrong value
  2012-07-19 14:56           ` Corinna Vinschen
@ 2012-07-19 16:17             ` Aaron Schneider
  2012-07-19 16:46               ` Cliff Hones
  0 siblings, 1 reply; 14+ messages in thread
From: Aaron Schneider @ 2012-07-19 16:17 UTC
  To: cygwin

On 19/07/2012 16:55, Corinna Vinschen wrote:
> On Jul 19 15:58, Aaron Schneider wrote:
>> On 19/07/2012 14:35, Csaba Raduly wrote:
>>>
>>> Proving, once again, that "There Ain't No Such Thing as Plain Text"
>>> http://www.joelonsoftware.com/articles/Unicode.html
>>>
>>>
>>> Csaba
>>>
>>
>> No idea, but can't cygwin come with native UTF-8 enabled by default
>> so the behavior is the same for everyone?
>
> It is. See /etc/profile.d/lang.{sh,csh}
>
>
> Corinna
>

Looking at /etc/profile.d/lang.csh

if ( $?LC_ALL == 0 && $?LC_CTYPE == 0 && $?LANG == 0 ) setenv LANG `/usr/bin/locale -uU`

I wonder why in my system the setenv command does not exist:

$ setenv
-bash: setenv: command not found

and why the if structure is not followed

if (test for true) then command ; fi

On the other side, /etc/profile.d/lang.sh seems to be ok.
* Re: length in gawk returns wrong value
  2012-07-19 16:17             ` Aaron Schneider
@ 2012-07-19 16:46               ` Cliff Hones
  2012-07-19 16:54                 ` Aaron Schneider
  0 siblings, 1 reply; 14+ messages in thread
From: Cliff Hones @ 2012-07-19 16:46 UTC
  To: cygwin

On 19/07/2012 17:16, Aaron Schneider wrote:

> Looking at /etc/profile.d/lang.csh
> if ( $?LC_ALL == 0 && $?LC_CTYPE == 0 && $?LANG == 0 ) setenv LANG `/usr/bin/locale -uU`
>
> I wonder why in my system the setenv command does not exist:
> $ setenv
> -bash: setenv: command not found
>
> and why the if structure is not followed
> if (test for true) then command ; fi
>
> On the other side, /etc/profile.d/lang.sh seems to be ok.

I think you'll find the clue is the ".csh" extension. That syntax is
for the C-shell, not bash.

-- Cliff
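Editor's sketch (not part of the original message): for readers who only know
Bourne-style shells, the csh one-liner quoted above can be read roughly as the
function below. This is an illustration of the logic, not the contents of
Cygwin's /etc/profile.d/lang.sh. The names set_default_lang and default_locale
are hypothetical, and default_locale simply prints a fixed value as a stand-in
for the Cygwin-specific `/usr/bin/locale -uU` call from the quoted file.

```shell
# Hypothetical stand-in for `/usr/bin/locale -uU` from the quoted lang.csh;
# it returns a fixed value so the sketch runs outside Cygwin.
default_locale() {
    printf '%s\n' "C.UTF-8"
}

# Set LANG only when none of LC_ALL, LC_CTYPE and LANG is defined.
# ${VAR+set} expands to "set" when VAR is defined (even if empty), which
# mirrors csh's $?VAR test; `export` plays the role of csh's `setenv`.
set_default_lang() {
    if [ -z "${LC_ALL+set}" ] && [ -z "${LC_CTYPE+set}" ] && [ -z "${LANG+set}" ]; then
        LANG=$(default_locale)
        export LANG
    fi
}
```

Calling set_default_lang in a shell where all three variables are unset
exports LANG; if any of them is already defined, the function leaves the
environment alone, which is the same guard the csh line expresses.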
* Re: length in gawk returns wrong value
  2012-07-19 16:46               ` Cliff Hones
@ 2012-07-19 16:54                 ` Aaron Schneider
  2012-07-19 17:02                   ` Eric Blake
  2012-07-19 17:03                   ` Cliff Hones
  0 siblings, 2 replies; 14+ messages in thread
From: Aaron Schneider @ 2012-07-19 16:54 UTC
  To: cygwin

On 19/07/2012 18:44, Cliff Hones wrote:
> On 19/07/2012 17:16, Aaron Schneider wrote:
>
>> Looking at /etc/profile.d/lang.csh
>> if ( $?LC_ALL == 0 && $?LC_CTYPE == 0 && $?LANG == 0 ) setenv LANG `/usr/bin/locale -uU`
>>
>> I wonder why in my system the setenv command does not exist:
>> $ setenv
>> -bash: setenv: command not found
>>
>> and why the if structure is not followed
>> if (test for true) then command ; fi
>>
>> On the other side, /etc/profile.d/lang.sh seems to be ok.
>
> I think you'll find the clue is the ".csh" extension. That syntax is
> for the C-shell, not bash.
>
> -- Cliff
>

I can't find such csh or cshell on my system, I've searched from
packages and I only see scsh, slsh, posh, mosh, tcsh, zsh, mksh that I
don't have installed in my system any of them, unless csh comes with the
system. How do I run the csh?
* Re: length in gawk returns wrong value
  2012-07-19 16:54                 ` Aaron Schneider
@ 2012-07-19 17:02                   ` Eric Blake
  2012-07-19 17:15                     ` Aaron Schneider
  2012-07-20 14:42                     ` Reini Urban
  2012-07-19 17:03                   ` Cliff Hones
  1 sibling, 2 replies; 14+ messages in thread
From: Eric Blake @ 2012-07-19 17:02 UTC
  To: cygwin

[-- Attachment #1: Type: text/plain, Size: 1073 bytes --]

On 07/19/2012 10:53 AM, Aaron Schneider wrote:
>> I think you'll find the clue is the ".csh" extension. That syntax is
>> for the C-shell, not bash.
>>
>> -- Cliff
>>
>
> I can't find such csh or cshell on my system, I've searched from
> packages and I only see scsh, slsh, posh, mosh, tcsh, zsh, mksh that I

Both scsh and tcsh are from the csh family of shells.

posh, zsh, mksh, bash, dash, and ksh are from the Bourne family of shells

No idea what slsh or mosh are

> don't have installed in my system any of them, unless csh comes with the
> system. How do I run the csh?

Why bother? csh syntax is non-standard, and in my opinion, it is ugly
(others around here disagree, or tcsh would have died long ago, but
that's a different story - it's mostly people that were on a system that
picked csh as its default shell long before standardization picked
Bourne over csh syntax).
http://www.faqs.org/faqs/unix-faq/shell/csh-whynot/

--
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 620 bytes --]
* Re: length in gawk returns wrong value
  2012-07-19 17:02                   ` Eric Blake
@ 2012-07-19 17:15                     ` Aaron Schneider
  2012-07-20 14:42                     ` Reini Urban
  1 sibling, 0 replies; 14+ messages in thread
From: Aaron Schneider @ 2012-07-19 17:15 UTC
  To: cygwin

On 19/07/2012 19:02, Eric Blake wrote:
>
> Why bother? csh syntax is non-standard, and in my opinion, it is ugly
> (others around here disagree, or tcsh would have died long ago, but
> that's a different story - it's mostly people that were on a system that
> picked csh as its default shell long before standardization picked
> Bourne over csh syntax).
> http://www.faqs.org/faqs/unix-faq/shell/csh-whynot/
>

Ok, I understand that you don't have to execute both lang.sh or
lang.csh; they are executed depending on the shell you have, there is no
need to run both, in fact they do the same.

Default shell will suffice and is better for porting scripts.
* Re: length in gawk returns wrong value
  2012-07-19 17:02                   ` Eric Blake
  2012-07-19 17:15                     ` Aaron Schneider
@ 2012-07-20 14:42                     ` Reini Urban
  1 sibling, 0 replies; 14+ messages in thread
From: Reini Urban @ 2012-07-20 14:42 UTC
  To: cygwin

On Thu, Jul 19, 2012 at 12:02 PM, Eric Blake wrote:
> On 07/19/2012 10:53 AM, Aaron Schneider wrote:
>> I can't find such csh or cshell on my system, I've searched from
>> packages and I only see scsh, slsh, posh, mosh, tcsh, zsh, mksh that I
>
> Both scsh and tcsh are from the csh family of shells.
>
> posh, zsh, mksh, bash, dash, and ksh are from the Bourne family of shells
>
> No idea what slsh or mosh are

No login shells.

mosh is a remote shell, a better ssh for slow or varying connections.
http://mosh.mit.edu/

slsh is the S-Lang shell. A different beast.
http://www.jedsoft.org/slang/doc/html/slang-2.html#ss2.1
--
Reini Urban
http://cpanel.net/ http://www.perl-compiler.org/
* Re: length in gawk returns wrong value
  2012-07-19 16:54                 ` Aaron Schneider
  2012-07-19 17:02                   ` Eric Blake
@ 2012-07-19 17:03                   ` Cliff Hones
  1 sibling, 0 replies; 14+ messages in thread
From: Cliff Hones @ 2012-07-19 17:03 UTC
  To: cygwin

On 19/07/2012 17:53, Aaron Schneider wrote:
> I can't find such csh or cshell on my system, I've searched from packages and I only see scsh, slsh, posh, mosh, tcsh, zsh, mksh that I don't have installed in my system any of them, unless csh comes with the system. How do I run the csh?

tcsh is the C shell - as indeed it says in the description
setup.exe shows you.

-- Cliff
end of thread, other threads:[~2012-07-20 14:42 UTC]

Thread overview: 14+ messages
2012-07-19  8:50 length in gawk returns wrong value Ralf
2012-07-19  9:21 ` Corinna Vinschen
2012-07-19 11:27   ` Ralf
2012-07-19 11:40     ` Corinna Vinschen
2012-07-19 12:36       ` Csaba Raduly
2012-07-19 13:58         ` Aaron Schneider
2012-07-19 14:56           ` Corinna Vinschen
2012-07-19 16:17             ` Aaron Schneider
2012-07-19 16:46               ` Cliff Hones
2012-07-19 16:54                 ` Aaron Schneider
2012-07-19 17:02                   ` Eric Blake
2012-07-19 17:15                     ` Aaron Schneider
2012-07-20 14:42                     ` Reini Urban
2012-07-19 17:03                   ` Cliff Hones