From mboxrd@z Thu Jan 1 00:00:00 1970
From: "soko246 at gmail dot com"
To: glibc-bugs@sourceware.org
Subject: [Bug locale/28409] New: iconv corrupted output when input has convertable and non-convertable chars
Date: Fri, 01 Oct 2021 22:23:25 +0000

https://sourceware.org/bugzilla/show_bug.cgi?id=28409

            Bug ID: 28409
           Summary: iconv corrupted output when input has convertable and
                    non-convertable chars
           Product: glibc
           Version: unspecified
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: locale
          Assignee: unassigned at sourceware dot org
          Reporter: soko246 at gmail dot com
  Target Milestone: ---

Using iconv results in corrupted output when the "-c" flag is used for input
where characters that *can* and *cannot* be converted appear together.
The issue only manifests for rather large inputs (presumably > 32K). Steps to reproduce: run the following in bash: >export LANG=3DC >perl -E 'say "\x58\xe2\x58\xc3\x92\x58\xe2\x58\x58\xe2\x58\xc3\x92\x58\xe2= \x58\n" x 15000' | iconv -c -f ISO-8859-3 -t UTF-8 | sort | uniq -c Expected output: >15000 X=C3=A2X=EF=BF=BDX=C3=A2XX=C3=A2X=EF=BF=BDX=C3=A2X Actual output: > 1 > 2 XX=C3=A2X=EF=BF=BDX=C3=A2X > 2 X=C3=A2X=EF=BF=BDXX=C3=A2X > 2 X=C3=A2X=EF=BF=BDX=C3=A2X > 1 X=C3=A2X=EF=BF=BDX=C3=A2XX > 2 X=C3=A2X=EF=BF=BDX=C3=A2XX=C3=A2X=EF=BF=BDX=EF=BF=BDX=C3=A2XX=C3=A2X=EF= =BF=BDX=C3=A2X > 14917 X=C3=A2X=EF=BF=BDX=C3=A2XX=C3=A2X=EF=BF=BDX=C3=A2X As can be seen, many lines just disappear (14917+2+1+2+2+2+1 don't sum up to 15000).=20 Actual specific input does not matter, as long as it has a mix of convertab= le and non-convertable characters. Reducing number of input lines to smaller number (ex. 1000) and all works as expected: >1000 X=C3=A2X=EF=BF=BDX=C3=A2XX=C3=A2X=EF=BF=BDX=C3=A2X I tried this for ISO-8859-3 and ISO-8859-8 (same input) with similar (wrong) results. Using piconv (Perl variant of iconv) instead of iconv produces correct resu= lts. --=20 You are receiving this mail because: You are on the CC list for the bug.=
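
For anyone who wants to check the result without Perl or piconv, the comparison above can be sketched in Python. This is only a rough cross-check, not part of the original report: the helper names (reference_convert, tally_lines, run_iconv) are made up for illustration, and it assumes that Python's "iso8859_3" codec with errors="ignore" is a fair stand-in for what "iconv -c" is supposed to do (silently drop the unconvertible byte 0xc3).

```python
import shutil
import subprocess
from collections import Counter

# One input line from the report: 0xe2 is convertible, 0xc3 is undefined
# in ISO-8859-3 and should be dropped by "iconv -c".
LINE = b"\x58\xe2\x58\xc3\x92\x58\xe2\x58\x58\xe2\x58\xc3\x92\x58\xe2\x58\n"


def reference_convert(data: bytes) -> bytes:
    """Reference conversion (assumption: approximates 'iconv -c')."""
    # errors="ignore" discards bytes with no mapping in the source charset.
    return data.decode("iso8859_3", errors="ignore").encode("utf-8")


def tally_lines(data: bytes) -> Counter:
    """Count identical output lines, like 'sort | uniq -c'."""
    return Counter(data.splitlines())


def run_iconv(data: bytes) -> bytes:
    """Pipe data through the system iconv with the flags from the report."""
    # iconv may exit non-zero when it discards characters, so the exit
    # status is deliberately not checked here.
    result = subprocess.run(
        ["iconv", "-c", "-f", "ISO-8859-3", "-t", "UTF-8"],
        input=data,
        stdout=subprocess.PIPE,
        check=False,
    )
    return result.stdout


if __name__ == "__main__":
    data = LINE * 15000
    expected = tally_lines(reference_convert(data))
    print("reference:", expected)
    if shutil.which("iconv"):
        actual = tally_lines(run_iconv(data))
        print("iconv -c: ", actual)
        print("match:", actual == expected)
```

On an affected system the last line would be expected to print "match: False"; with the line count lowered to 1000 the two tallies should agree, as described above.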