public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* Fix regcomp.c build for 32-bit with GCC mainline
@ 2017-11-20 19:04 Joseph Myers
  2017-11-20 20:05 ` Adhemerval Zanella
  0 siblings, 1 reply; 8+ messages in thread
From: Joseph Myers @ 2017-11-20 19:04 UTC (permalink / raw)
  To: libc-alpha

posix/regcomp.c fails to build for 32-bit configurations with GCC
mainline:

In file included from regex.c:66:
regcomp.c: In function 'init_word_char':
regcomp.c:934:24: error: conversion from 'long long unsigned int' to 'bitset_word_t {aka long unsigned int}' changes value from '287948901175001088' to '0' [-Werror=overflow]
    dfa->word_char[0] = wc0;
                        ^~~
regcomp.c:935:24: error: conversion from 'long long unsigned int' to 'bitset_word_t {aka long unsigned int}' changes value from '576460745995190270' to '2281701374' [-Werror=overflow]
    dfa->word_char[1] = wc1;
                        ^~~

The code has a workaround using const temporary variables instead of
having constants directly on the RHS of the assignments; this
workaround no longer suffices to avoid the warning.  This patch
replaces the workaround by a different one, using conditional
expressions that evaluate to 0, in the case where this code is inside
an "if" whose condition evaluates to 0.

Tested compilation for arm with GCC mainline with build-many-glibcs.py
(where it allows the compilation to get further, next running into
-Werror=format-overflow= failures in res_debug.c, for which I've filed
bug 22463 as there seems to be a potentially user-visible bug there);
also tested for 32-bit x86 with an older compiler to make sure no
execution failures result from this patch.

2017-11-20  Joseph Myers  <joseph@codesourcery.com>

	* posix/regcomp.c (init_word_char): Use conditional expressions
	not intermediate const variables to avoid -Werror=overflow errors
	for 32-bit systems in 64-bit case.

diff --git a/posix/regcomp.c b/posix/regcomp.c
index 871ae2f..54573f7 100644
--- a/posix/regcomp.c
+++ b/posix/regcomp.c
@@ -927,12 +927,14 @@ init_word_char (re_dfa_t *dfa)
     {
       if (sizeof (dfa->word_char[0]) == 8)
 	{
-          /* The extra temporaries here avoid "implicitly truncated"
+          /* The conditionals here avoid "implicitly truncated"
              warnings in the case when this is dead code, i.e. 32-bit.  */
-          const uint64_t wc0 = UINT64_C (0x03ff000000000000);
-          const uint64_t wc1 = UINT64_C (0x07fffffe87fffffe);
-	  dfa->word_char[0] = wc0;
-	  dfa->word_char[1] = wc1;
+	  dfa->word_char[0] = (sizeof (dfa->word_char[0]) == 8
+			       ? UINT64_C (0x03ff000000000000)
+			       : 0);
+	  dfa->word_char[1] = (sizeof (dfa->word_char[0]) == 8
+			       ? UINT64_C (0x07fffffe87fffffe)
+			       : 0);
 	  i = 2;
 	}
       else if (sizeof (dfa->word_char[0]) == 4)

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Fix regcomp.c build for 32-bit with GCC mainline
  2017-11-20 19:04 Fix regcomp.c build for 32-bit with GCC mainline Joseph Myers
@ 2017-11-20 20:05 ` Adhemerval Zanella
  2017-11-20 23:18   ` Joseph Myers
  0 siblings, 1 reply; 8+ messages in thread
From: Adhemerval Zanella @ 2017-11-20 20:05 UTC (permalink / raw)
  To: libc-alpha



On 20/11/2017 17:03, Joseph Myers wrote:
> posix/regcomp.c fails to build for 32-bit configurations with GCC
> mainline:
> 
> In file included from regex.c:66:
> regcomp.c: In function 'init_word_char':
> regcomp.c:934:24: error: conversion from 'long long unsigned int' to 'bitset_word_t {aka long unsigned int}' changes value from '287948901175001088' to '0' [-Werror=overflow]
>     dfa->word_char[0] = wc0;
>                         ^~~
> regcomp.c:935:24: error: conversion from 'long long unsigned int' to 'bitset_word_t {aka long unsigned int}' changes value from '576460745995190270' to '2281701374' [-Werror=overflow]
>     dfa->word_char[1] = wc1;
>                         ^~~
> 
> The code has a workaround using const temporary variables instead of
> having constants directly on the RHS of the assignments; this
> workaround no longer suffices to avoid the warning.  This patch
> replaces the workaround by a different one, using conditional
> expressions that evaluate to 0, in the case where this code is inside
> an "if" whose condition evaluates to 0.
> 
> Tested compilation for arm with GCC mainline with build-many-glibcs.py
> (where it allows the compilation to get further, next running into
> -Werror=format-overflow= failures in res_debug.c, for which I've filed
> bug 22463 as there seems to be a potentially user-visible bug there);
> also tested for 32-bit x86 with an older compiler to make sure no
> execution failures result from this patch.
> 
> 2017-11-20  Joseph Myers  <joseph@codesourcery.com>
> 
> 	* posix/regcomp.c (init_word_char): Use conditional expressions
> 	not intermediate const variables to avoid -Werror=overflow errors
> 	for 32-bit systems in 64-bit case.
> 
> diff --git a/posix/regcomp.c b/posix/regcomp.c
> index 871ae2f..54573f7 100644
> --- a/posix/regcomp.c
> +++ b/posix/regcomp.c
> @@ -927,12 +927,14 @@ init_word_char (re_dfa_t *dfa)
>      {
>        if (sizeof (dfa->word_char[0]) == 8)
>  	{
> -          /* The extra temporaries here avoid "implicitly truncated"
> +          /* The conditionals here avoid "implicitly truncated"
>               warnings in the case when this is dead code, i.e. 32-bit.  */
> -          const uint64_t wc0 = UINT64_C (0x03ff000000000000);
> -          const uint64_t wc1 = UINT64_C (0x07fffffe87fffffe);
> -	  dfa->word_char[0] = wc0;
> -	  dfa->word_char[1] = wc1;
> +	  dfa->word_char[0] = (sizeof (dfa->word_char[0]) == 8
> +			       ? UINT64_C (0x03ff000000000000)
> +			       : 0);
> +	  dfa->word_char[1] = (sizeof (dfa->word_char[0]) == 8
> +			       ? UINT64_C (0x07fffffe87fffffe)
> +			       : 0);
>  	  i = 2;
>  	}
>        else if (sizeof (dfa->word_char[0]) == 4)
> 

Why not use the already upstream fix from gnulib 
(252b52457da7887667c036d18cc5169777615bb0) ?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Fix regcomp.c build for 32-bit with GCC mainline
  2017-11-20 20:05 ` Adhemerval Zanella
@ 2017-11-20 23:18   ` Joseph Myers
  2017-11-21  0:36     ` Paul Eggert
  0 siblings, 1 reply; 8+ messages in thread
From: Joseph Myers @ 2017-11-20 23:18 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: libc-alpha

On Mon, 20 Nov 2017, Adhemerval Zanella wrote:

> Why not use the already upstream fix from gnulib 
> (252b52457da7887667c036d18cc5169777615bb0) ?

That change would not apply cleanly to current sources, and if we have 
over five years of regex changes out of sync with gnulib, it doesn't seem 
a good idea to try to tie any possible resync with gnulib (in both 
directions) to fixing build failures to make it possible to detect build 
regressions when they occur (rather than piling one failure on top of 
another that isn't yet fixed, so making it hard to determine the 
problematic patch, which is the current state).

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Fix regcomp.c build for 32-bit with GCC mainline
  2017-11-20 23:18   ` Joseph Myers
@ 2017-11-21  0:36     ` Paul Eggert
  2017-11-21  0:56       ` Joseph Myers
  2017-11-21  9:11       ` Andreas Schwab
  0 siblings, 2 replies; 8+ messages in thread
From: Paul Eggert @ 2017-11-21  0:36 UTC (permalink / raw)
  To: Joseph Myers, Adhemerval Zanella; +Cc: libc-alpha

[-- Attachment #1: Type: text/plain, Size: 1254 bytes --]

On 11/20/2017 03:18 PM, Joseph Myers wrote:
> On Mon, 20 Nov 2017, Adhemerval Zanella wrote:
>
>> Why not use the already upstream fix from gnulib
>> (252b52457da7887667c036d18cc5169777615bb0) ?
> That change would not apply cleanly to current sources,

That should be easy enough to fix. Proposed patch attached.

> and if we have
> over five years of regex changes out of sync with gnulib, it doesn't seem
> a good idea to try to tie any possible resync with gnulib (in both
> directions) to fixing build failures to make it possible to detect build
> regressions when they occur (rather than piling one failure on top of
> another that isn't yet fixed, so making it hard to determine the
> problematic patch, which is the current state).

Although I didn't quite follow all that, I vaguely took it to mean "it's 
too much trouble to try to keep glibc regex code close to Gnulib". But 
if someone has taken the trouble to keep them closer (as in the attached 
patch) then we might as well keep them as close as we reasonably can. On 
the Gnulib side I've been trying to do that by merging glibc's changes, 
as in the patch I installed into Gnulib today:

https://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=6dc8556f97cf3e7953249190b46465ad845761cf

[-- Attachment #2: 0001-regex-don-t-assume-uint64_t-or-uint32_t.txt --]
[-- Type: text/plain, Size: 3830 bytes --]

From 99a25c4defd9ff20c9592d999c5ff2e932c22008 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Mon, 20 Nov 2017 16:25:49 -0800
Subject: [PATCH] regex: don't assume uint64_t or uint32_t

This avoids -Werror=overflow errors for 32-bit systems in
the 64-bit case.  Problem reported by Joseph Myers in:
https://sourceware.org/ml/libc-alpha/2017-11/msg00694.html
Also, when this code is used in Gnulib it ports to platforms
that lack uint64_t and uint32_t.  The C standard doesn't guarantee
them, and on some 32-bit compilers there is no uint64_t.
Problem reported by Gianluigi Tiesi in:
http://lists.gnu.org/archive/html/bug-gnulib/2012-03/msg00154.html
* posix/regcomp.c (init_word_char): Don't assume that the types
uint64_t and uint32_t exist.  Adapted from Gnulib patch
2012-05-27T06:40:00!eggert@cs.ucla.edu.  See:
https://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=252b52457da7887667c036d18cc5169777615bb0
---
 ChangeLog       | 16 ++++++++++++++++
 posix/regcomp.c | 29 +++++++++++++++--------------
 2 files changed, 31 insertions(+), 14 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 3688c7f624..50da3df0b9 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,19 @@
+2017-11-20  Paul Eggert  <eggert@cs.ucla.edu>
+
+	regex: don't assume uint64_t or uint32_t
+	This avoids -Werror=overflow errors for 32-bit systems in
+	the 64-bit case.  Problem reported by Joseph Myers in:
+	https://sourceware.org/ml/libc-alpha/2017-11/msg00694.html
+	Also, when this code is used in Gnulib it ports to platforms
+	that lack uint64_t and uint32_t.  The C standard doesn't guarantee
+	them, and on some 32-bit compilers there is no uint64_t.
+	Problem reported by Gianluigi Tiesi in:
+	http://lists.gnu.org/archive/html/bug-gnulib/2012-03/msg00154.html
+	* posix/regcomp.c (init_word_char): Don't assume that the types
+	uint64_t and uint32_t exist.  Adapted from Gnulib patch
+	2012-05-27T06:40:00!eggert@cs.ucla.edu.  See:
+	https://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=252b52457da7887667c036d18cc5169777615bb0
+
 2017-11-20  Siddhesh Poyarekar  <siddhesh@sourceware.org>
 
 	* sysdeps/aarch64/memset-reg.h: New file.
diff --git a/posix/regcomp.c b/posix/regcomp.c
index 871ae2ffab..520596b51b 100644
--- a/posix/regcomp.c
+++ b/posix/regcomp.c
@@ -925,26 +925,26 @@ init_word_char (re_dfa_t *dfa)
   int ch = 0;
   if (BE (dfa->map_notascii == 0, 1))
     {
-      if (sizeof (dfa->word_char[0]) == 8)
-	{
-          /* The extra temporaries here avoid "implicitly truncated"
-             warnings in the case when this is dead code, i.e. 32-bit.  */
-          const uint64_t wc0 = UINT64_C (0x03ff000000000000);
-          const uint64_t wc1 = UINT64_C (0x07fffffe87fffffe);
-	  dfa->word_char[0] = wc0;
-	  dfa->word_char[1] = wc1;
+      bitset_word_t bits0 = 0x00000000;
+      bitset_word_t bits1 = 0x03ff0000;
+      bitset_word_t bits2 = 0x87fffffe;
+      bitset_word_t bits3 = 0x07fffffe;
+      if (BITSET_WORD_BITS == 64)
+	{
+	  dfa->word_char[0] = bits1 << 31 << 1 | bits0;
+	  dfa->word_char[1] = bits3 << 31 << 1 | bits2;
 	  i = 2;
 	}
-      else if (sizeof (dfa->word_char[0]) == 4)
+      else if (BITSET_WORD_BITS == 32)
 	{
-	  dfa->word_char[0] = UINT32_C (0x00000000);
-	  dfa->word_char[1] = UINT32_C (0x03ff0000);
-	  dfa->word_char[2] = UINT32_C (0x87fffffe);
-	  dfa->word_char[3] = UINT32_C (0x07fffffe);
+	  dfa->word_char[0] = bits0;
+	  dfa->word_char[1] = bits1;
+	  dfa->word_char[2] = bits2;
+	  dfa->word_char[3] = bits3;
 	  i = 4;
 	}
       else
-	abort ();
+        goto general_case;
       ch = 128;
 
       if (BE (dfa->is_utf8, 1))
@@ -954,6 +954,7 @@ init_word_char (re_dfa_t *dfa)
 	}
     }
 
+ general_case:
   for (; i < BITSET_WORDS; ++i)
     for (int j = 0; j < BITSET_WORD_BITS; ++j, ++ch)
       if (isalnum (ch) || ch == '_')
-- 
2.14.3


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Fix regcomp.c build for 32-bit with GCC mainline
  2017-11-21  0:36     ` Paul Eggert
@ 2017-11-21  0:56       ` Joseph Myers
  2017-11-21  4:41         ` Paul Eggert
  2017-11-21  9:11       ` Andreas Schwab
  1 sibling, 1 reply; 8+ messages in thread
From: Joseph Myers @ 2017-11-21  0:56 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Adhemerval Zanella, libc-alpha

On Mon, 20 Nov 2017, Paul Eggert wrote:

> On 11/20/2017 03:18 PM, Joseph Myers wrote:
> > On Mon, 20 Nov 2017, Adhemerval Zanella wrote:
> > 
> > > Why not use the already upstream fix from gnulib
> > > (252b52457da7887667c036d18cc5169777615bb0) ?
> > That change would not apply cleanly to current sources,
> 
> That should be easy enough to fix. Proposed patch attached.

Please commit this fix.

> > and if we have
> > over five years of regex changes out of sync with gnulib, it doesn't seem
> > a good idea to try to tie any possible resync with gnulib (in both
> > directions) to fixing build failures to make it possible to detect build
> > regressions when they occur (rather than piling one failure on top of
> > another that isn't yet fixed, so making it hard to determine the
> > problematic patch, which is the current state).
> 
> Although I didn't quite follow all that, I vaguely took it to mean "it's too
> much trouble to try to keep glibc regex code close to Gnulib". But if someone

It means that *in the context where we haven't had regression-free builds 
with GCC and binutils mainline for over two months* (over two weeks even 
if you disregard the old powerpc64le problems), and so investigation of 
regressions is routinely complicated by them having appeared while things 
were already broken for another reason, we want simple, safe fixes for 
build regressions that can be understood and reviewed quickly.  
Complicated, high-risk changes such as merging years of changes from 
gnulib need more detailed review, which would leave the build broken for 
longer, and if the build is already clean at the time such complicated 
changes go in, it makes it a lot easier to identify when those changes 
have caused build regressions.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Fix regcomp.c build for 32-bit with GCC mainline
  2017-11-21  0:56       ` Joseph Myers
@ 2017-11-21  4:41         ` Paul Eggert
  0 siblings, 0 replies; 8+ messages in thread
From: Paul Eggert @ 2017-11-21  4:41 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Adhemerval Zanella, libc-alpha

Joseph Myers wrote:
> Complicated, high-risk changes such as merging years of changes from
> gnulib need more detailed review

No argument there! And thanks for the further explanation. I committed the patch.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Fix regcomp.c build for 32-bit with GCC mainline
  2017-11-21  0:36     ` Paul Eggert
  2017-11-21  0:56       ` Joseph Myers
@ 2017-11-21  9:11       ` Andreas Schwab
  2017-11-22 19:24         ` Paul Eggert
  1 sibling, 1 reply; 8+ messages in thread
From: Andreas Schwab @ 2017-11-21  9:11 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Joseph Myers, Adhemerval Zanella, libc-alpha

On Nov 20 2017, Paul Eggert <eggert@cs.ucla.edu> wrote:

> This avoids -Werror=overflow errors for 32-bit systems in
> the 64-bit case.  Problem reported by Joseph Myers in:
> https://sourceware.org/ml/libc-alpha/2017-11/msg00694.html
> Also, when this code is used in Gnulib it ports to platforms
> that lack uint64_t and uint32_t.  The C standard doesn't guarantee
> them, and on some 32-bit compilers there is no uint64_t.
> Problem reported by Gianluigi Tiesi in:

This needs to go in a comment.

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Fix regcomp.c build for 32-bit with GCC mainline
  2017-11-21  9:11       ` Andreas Schwab
@ 2017-11-22 19:24         ` Paul Eggert
  0 siblings, 0 replies; 8+ messages in thread
From: Paul Eggert @ 2017-11-22 19:24 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Joseph Myers, Adhemerval Zanella, libc-alpha, Gnulib bugs

[-- Attachment #1: Type: text/plain, Size: 150 bytes --]

On 11/21/2017 01:11 AM, Andreas Schwab wrote:
> This needs to go in a comment.

Sure, I installed the attached into glibc and merged it into Gnulib.


[-- Attachment #2: 0001-posix-regcomp.c-init_word_char-Add-comments.patch --]
[-- Type: text/x-patch, Size: 1344 bytes --]

From 0285e6bdf223314d7751a83795001c0e87a1f825 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Wed, 22 Nov 2017 11:21:44 -0800
Subject: [PATCH] * posix/regcomp.c (init_word_char): Add comments.

---
 ChangeLog       | 4 ++++
 posix/regcomp.c | 3 +++
 2 files changed, 7 insertions(+)

diff --git a/ChangeLog b/ChangeLog
index 013d1e47e5..e67b128fe6 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,7 @@
+2017-11-22  Paul Eggert  <eggert@cs.ucla.edu>
+
+	* posix/regcomp.c (init_word_char): Add comments.
+
 2017-11-22  Joseph Myers  <joseph@codesourcery.com>
 
 	[BZ #22447]
diff --git a/posix/regcomp.c b/posix/regcomp.c
index 520596b51b..81c2932991 100644
--- a/posix/regcomp.c
+++ b/posix/regcomp.c
@@ -925,12 +925,15 @@ init_word_char (re_dfa_t *dfa)
   int ch = 0;
   if (BE (dfa->map_notascii == 0, 1))
     {
+      /* Avoid uint32_t and uint64_t as some non-GCC platforms lack
+	 them, an issue when this code is used in Gnulib.  */
       bitset_word_t bits0 = 0x00000000;
       bitset_word_t bits1 = 0x03ff0000;
       bitset_word_t bits2 = 0x87fffffe;
       bitset_word_t bits3 = 0x07fffffe;
       if (BITSET_WORD_BITS == 64)
 	{
+	  /* Pacify gcc -Woverflow on 32-bit platformns.  */
 	  dfa->word_char[0] = bits1 << 31 << 1 | bits0;
 	  dfa->word_char[1] = bits3 << 31 << 1 | bits2;
 	  i = 2;
-- 
2.14.3


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-11-22 19:24 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-20 19:04 Fix regcomp.c build for 32-bit with GCC mainline Joseph Myers
2017-11-20 20:05 ` Adhemerval Zanella
2017-11-20 23:18   ` Joseph Myers
2017-11-21  0:36     ` Paul Eggert
2017-11-21  0:56       ` Joseph Myers
2017-11-21  4:41         ` Paul Eggert
2017-11-21  9:11       ` Andreas Schwab
2017-11-22 19:24         ` Paul Eggert

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).