Date: Fri, 24 Apr 2009 16:15:00 -0000
From: Jakub Jelinek
To: Ulrich Drepper
Cc: Glibc hackers
Subject: [PATCH] Fix iconv from SHIFT-JIS
Message-ID: <20090424162458.GA16681@sunsite.ms.mff.cuni.cz>

Hi!

As can be seen on the https://bugzilla.redhat.com/show_bug.cgi?id=497267
testcase, sjis.c handles invalid 2 byte inputs incorrectly, inptr += 2
shouldn't be done before STANDARD_FROM_LOOP_ERR_HANDLER (2).
The code does:

  if (something)
    STANDARD_FROM_LOOP_ERR_HANDLER (1);
  else
    {
      if (...)
        ch = ...
      else...
        ch = ...
      inptr += 2;
    }
  if (ch == 0)
    STANDARD_FROM_LOOP_ERR_HANDLER (2);

As STANDARD_FROM_LOOP_ERR_HANDLER never falls through (either does break
or continue), we can move the second S_F_L_E_H into the else and it really
has to be done before the inptr += 2, otherwise when stopping on errors
we move over the invalid 2 byte sequence and when ignoring errors skip over
4 bytes instead of 2.

Also, I believe we should skip over 2 bytes instead of 1 when ignoring errors
for say \xea\xaf, instead of trying to convert the second byte as start of
a character.

2009-04-24  Jakub Jelinek

	* iconvdata/sjis.c (BODY): Don't advance inptr before
	STANDARD_FROM_LOOP_ERR_HANDLER (2) for 2 byte invalid input.
	Use STANDARD_FROM_LOOP_ERR_HANDLER with 2 instead of 1 for
	two byte chars.

--- libc/iconvdata/sjis.c.jj	2002-12-02 23:07:56.000000000 +0100
+++ libc/iconvdata/sjis.c	2009-04-24 18:09:01.000000000 +0200
@@ -1,5 +1,5 @@
 /* Mapping tables for SJIS handling.
-   Copyright (C) 1997-2001, 2002 Free Software Foundation, Inc.
+   Copyright (C) 1997-2001, 2002, 2009 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
    Contributed by Ulrich Drepper , 1997.
 
@@ -4379,7 +4379,7 @@ static const char from_ucs4_extra[0x100]
             || __builtin_expect (idx > 0xeaa4, 0))                    \
           {                                                           \
             /* This is illegal.  */                                   \
-            STANDARD_FROM_LOOP_ERR_HANDLER (1);                       \
+            STANDARD_FROM_LOOP_ERR_HANDLER (2);                       \
           }                                                           \
         else                                                          \
           {                                                           \
@@ -4395,14 +4395,15 @@ static const char from_ucs4_extra[0x100]
             else                                                      \
               ch = cjk_block4[(ch - 0xe0) * 192 + ch2 - 0x40];        \
                                                                       \
+            if (__builtin_expect (ch == 0, 0))                        \
+              {                                                       \
+                /* This is an illegal character.  */                  \
+                STANDARD_FROM_LOOP_ERR_HANDLER (2);                   \
+              }                                                       \
+                                                                      \
             inptr += 2;                                               \
           }                                                           \
                                                                       \
-        if (__builtin_expect (ch == 0, 0))                            \
-          {                                                           \
-            /* This is an illegal character.  */                      \
-            STANDARD_FROM_LOOP_ERR_HANDLER (2);                       \
-          }                                                           \
       }                                                               \
                                                                       \
     put32 (outptr, ch);                                               \

	Jakub
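
The ordering problem described in the mail can be modelled outside glibc. What follows is a toy sketch, not the sjis.c code: toy_lookup, toy_decode and the rule that a 0xff trail byte has no mapping are made up for illustration, and the error path is only a simplified stand-in for STANDARD_FROM_LOOP_ERR_HANDLER (2), assuming it skips two bytes and continues when errors are ignored and stops otherwise.

```c
#include <stddef.h>

/* Hypothetical stand-in for the cjk_block lookups: 0 means "no mapping".
   The 0xff rule is invented for this sketch only.  */
static unsigned toy_lookup(unsigned char lead, unsigned char trail)
{
    return trail == 0xff ? 0 : ((unsigned) lead << 8) | trail;
}

/* Decode two-byte pairs from in[0..len).  Returns the number of characters
   decoded; *stop receives the input offset at which the loop ended.
   advance_first != 0 models the old order (advance before the ch == 0
   check); advance_first == 0 models the order after the patch.  */
static size_t toy_decode(const unsigned char *in, size_t len,
                         int advance_first, int ignore_errors, size_t *stop)
{
    size_t pos = 0, decoded = 0;

    while (pos + 2 <= len) {
        unsigned ch = toy_lookup(in[pos], in[pos + 1]);

        if (advance_first)
            pos += 2;           /* old order: pointer already moved */

        if (ch == 0) {
            /* Simplified error handler with an increment of 2.  */
            if (!ignore_errors)
                break;          /* pos should still name the bad pair */
            pos += 2;           /* skip the bad pair */
            continue;
        }

        if (!advance_first)
            pos += 2;           /* patched order: advance after the check */
        decoded++;
    }

    *stop = pos;
    return decoded;
}
```

For the input 81 40 81 ff 81 41 81 42, the old order stops at offset 4 instead of 2 when errors are fatal, and decodes 2 characters instead of 3 when errors are ignored, because the valid pair after the bad one is skipped as well.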