From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 129158 invoked by alias); 2 Nov 2015 14:32:28 -0000
Mailing-List: contact glibc-bugs-regex-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Subscribe:
List-Post:
List-Help: ,
Sender: glibc-bugs-regex-owner@sourceware.org
Received: (qmail 129105 invoked by uid 48); 2 Nov 2015 14:32:24 -0000
From: "arekm at maven dot pl"
To: glibc-bugs-regex@sourceware.org
Subject: [Bug regex/18986] ERE '0|()0|\1|0' causes regexec undefined behavior
Date: Mon, 02 Nov 2015 14:32:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: glibc
X-Bugzilla-Component: regex
X-Bugzilla-Version: 2.22
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: arekm at maven dot pl
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at sourceware dot org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: security+
X-Bugzilla-Changed-Fields: cc
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://sourceware.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2015-11/txt/msg00000.txt.bz2

https://sourceware.org/bugzilla/show_bug.cgi?id=18986

Arkadiusz Miskiewicz changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |arekm at maven dot pl

-- 
You are receiving this mail because:
You are on the CC list for the bug.
>>From glibc-bugs-regex-return-700-listarch-glibc-bugs-regex=sources.redhat.com@sourceware.org Wed Dec 09 14:40:29 2015
Return-Path:
Delivered-To: listarch-glibc-bugs-regex@sources.redhat.com
Received: (qmail 54966 invoked by alias); 9 Dec 2015 14:40:28 -0000
Mailing-List: contact glibc-bugs-regex-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Subscribe:
List-Post:
List-Help: ,
Sender: glibc-bugs-regex-owner@sourceware.org
Delivered-To: mailing list glibc-bugs-regex@sourceware.org
Received: (qmail 54630 invoked by uid 48); 9 Dec 2015 14:40:21 -0000
From: "alex_y_xu at yahoo dot ca"
To: glibc-bugs-regex@sourceware.org
Subject: [Bug regex/19348] New: re_search is incredibly slow when processing '$' on long lines
Date: Wed, 09 Dec 2015 14:40:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: glibc
X-Bugzilla-Component: regex
X-Bugzilla-Version: unspecified
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: alex_y_xu at yahoo dot ca
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at sourceware dot org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status bug_severity priority component assigned_to reporter cc target_milestone
Message-ID:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://sourceware.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2015-12/txt/msg00000.txt.bz2
Content-length: 1548

https://sourceware.org/bugzilla/show_bug.cgi?id=19348

            Bug ID: 19348
           Summary: re_search is incredibly slow when processing '$' on long
                    lines
           Product: glibc
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: regex
          Assignee: unassigned at sourceware dot org
          Reporter: alex_y_xu at yahoo dot ca
                CC: drepper.fsp at gmail dot com
  Target Milestone: ---

$ echo {1..5000000} > file # adjust based on CPU speed
$ time sed -e 's/$/stuff/' file >/dev/null # logical way to append to lines
sed -e 's/$/stuff/' file > /dev/null 2.91s user 0.09s system 99% cpu 3.007 total
$ time sed -e 's/.*/&stuff/' file >/dev/null
sed -e 's/.*/&stuff/' file > /dev/null 1.62s user 0.34s system 99% cpu 1.972 total

musl via busybox sed was tested to be 2x faster in the first case than in the
second.

intuitively, this does not make sense. .* should be slower because it needs to
match the entire string whereas $ can skip to the end of the line (since sed
must already find the new line in order to run the commands). however, glibc
spends an inordinate amount of time inside of check_halt_state_context,
re_state_reconstruct, and re_string_context_at, according to callgrind.

I am unsure whether this qualifies as a glibc bug or how to fix it, but I think
it is useful to have on the record.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
>>From glibc-bugs-regex-return-701-listarch-glibc-bugs-regex=sources.redhat.com@sourceware.org Wed Dec 09 16:43:38 2015
Return-Path:
Delivered-To: listarch-glibc-bugs-regex@sources.redhat.com
Received: (qmail 44982 invoked by alias); 9 Dec 2015 16:43:37 -0000
Mailing-List: contact glibc-bugs-regex-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Subscribe:
List-Post:
List-Help: ,
Sender: glibc-bugs-regex-owner@sourceware.org
Delivered-To: mailing list glibc-bugs-regex@sourceware.org
Received: (qmail 44731 invoked by uid 48); 9 Dec 2015 16:43:34 -0000
From: "alex_y_xu at yahoo dot ca"
To: glibc-bugs-regex@sourceware.org
Subject: [Bug regex/19348] re_search matches $ much slower than .*
Date: Wed, 09 Dec 2015 16:43:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: glibc
X-Bugzilla-Component: regex
X-Bugzilla-Version: unspecified
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: alex_y_xu at yahoo dot ca
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at sourceware dot org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: short_desc
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://sourceware.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2015-12/txt/msg00001.txt.bz2
Content-length: 493

https://sourceware.org/bugzilla/show_bug.cgi?id=19348

alex_y_xu at yahoo dot ca changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|re_search is incredibly     |re_search matches $ much
                   |slow when processing '$' on |slower than .*
                   |long lines                  |

-- 
You are receiving this mail because:
You are on the CC list for the bug.
>>From glibc-bugs-regex-return-702-listarch-glibc-bugs-regex=sources.redhat.com@sourceware.org Fri Dec 18 09:26:19 2015
Return-Path:
Delivered-To: listarch-glibc-bugs-regex@sources.redhat.com
Received: (qmail 109039 invoked by alias); 18 Dec 2015 09:26:19 -0000
Mailing-List: contact glibc-bugs-regex-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Subscribe:
List-Post:
List-Help: ,
Sender: glibc-bugs-regex-owner@sourceware.org
Delivered-To: mailing list glibc-bugs-regex@sourceware.org
Received: (qmail 98108 invoked by uid 48); 18 Dec 2015 09:26:15 -0000
From: "t.rus76 at ya dot ru"
To: glibc-bugs-regex@sourceware.org
Subject: [Bug regex/19376] New: regcomp.c needs to be upgraded to GNU Grep's one
Date: Fri, 18 Dec 2015 09:26:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: glibc
X-Bugzilla-Component: regex
X-Bugzilla-Version: 2.22
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: t.rus76 at ya dot ru
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at sourceware dot org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status bug_severity priority component assigned_to reporter cc target_milestone
Message-ID:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://sourceware.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2015-12/txt/msg00002.txt.bz2
Content-length: 1392

https://sourceware.org/bugzilla/show_bug.cgi?id=19376

            Bug ID: 19376
           Summary: regcomp.c needs to be upgraded to GNU Grep's one
           Product: glibc
           Version: 2.22
            Status: NEW
          Severity: normal
          Priority: P2
         Component: regex
          Assignee: unassigned at sourceware dot org
          Reporter: t.rus76 at ya dot ru
                CC: drepper.fsp at gmail dot com
  Target Milestone: ---

Symptom: GNU Grep does not handle Syriac characters (U+0700 – U+074F)
correctly

$ echo 'ܫܠܡܐ' > peace
$ egrep '\<[ܐ-ܬ]' peace
grep: Invalid collation character
$ awk /'\<[ܐ-ܬ]'/ peace
ܫܠܡܐ

However, when grep is built with

./configure --with-included-regex

it works just fine and there is no REG_ECOLLATE error

$ echo ܫܠܡܐ | src/egrep [ܫ-ܬ]
ܫܠܡܐ
$ echo ܫܠܡܐ | src/egrep [ܒ-ܓ]
$

This is because GNU Grep contains an improved version of regcomp.

The bug was found here:
http://forum.rosalab.ru/viewtopic.php?f=53&t=6219&p=54747 (in Russian)

It is tested and confirmed also on Gentoo (both glibc and grep are 2.22).

I expect there are other bugs that could be fixed with this upgrade.

-- 
You are receiving this mail because:
You are on the CC list for the bug.