public inbox for glibc-bugs-regex@sourceware.org
* [Bug regex/1278] New: regex undefined behavior with shifting past word length
From: eggert at gnu dot org @ 2005-08-31 19:37 UTC (permalink / raw)
  To: glibc-bugs-regex

The regex code sometimes shifts a word by a value greater than or equal to the
word size, which has undefined behavior.  While fixing this, I also fixed a
few related porting glitches. I'll attach a patch.
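A minimal C sketch of the usual guard against this kind of undefined behavior, assuming a left shift of an unsigned long int. The helper safe_shl and the macro ULONG_WIDTH_BITS are hypothetical names for illustration; this is not code from the attached patch, which may handle the problem differently.

```c
#include <limits.h>

/* Width of unsigned long int in bits.  This assumes the type has no
   padding bits, which holds on mainstream hosts.  */
#define ULONG_WIDTH_BITS (sizeof (unsigned long int) * CHAR_BIT)

/* Shift X left by N bits without undefined behavior.  In C, X << N is
   undefined when N is negative or greater than or equal to the width
   of the promoted left operand, so the out-of-range case is handled
   explicitly.  It yields 0, which is what X * 2^N reduced modulo
   2^width would give.  (safe_shl is hypothetical, not a glibc
   function.)  */
static unsigned long int
safe_shl (unsigned long int x, unsigned int n)
{
  return n < ULONG_WIDTH_BITS ? x << n : 0;
}
```

Writing the shift as a plain x << n would let a compiler assume n is in range and optimize accordingly; the explicit comparison removes that assumption at the cost of one branch.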

-- 
           Summary: regex undefined behavior with shifting past word length
           Product: glibc
           Version: 2.3.5
            Status: NEW
          Severity: normal
          Priority: P2
         Component: regex
        AssignedTo: gotom at debian dot or dot jp
        ReportedBy: eggert at gnu dot org
                CC: glibc-bugs-regex at sources dot redhat dot com,
                    glibc-bugs at sources dot redhat dot com


http://sources.redhat.com/bugzilla/show_bug.cgi?id=1278

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.



* [Bug regex/1278] regex undefined behavior with shifting past word length
From: eggert at gnu dot org @ 2005-08-31 19:37 UTC (permalink / raw)
  To: glibc-bugs-regex


------- Additional Comments From eggert at gnu dot org  2005-08-31 19:37 -------
Created an attachment (id=633)
 --> (http://sources.redhat.com/bugzilla/attachment.cgi?id=633&action=view)
shift-related patches for regex



* [Bug regex/1278] regex undefined behavior with shifting past word length
From: paolo dot bonzini at lu dot unisi dot ch @ 2005-09-01  7:04 UTC (permalink / raw)
  To: glibc-bugs-regex


------- Additional Comments From paolo dot bonzini at lu dot unisi dot ch  2005-09-01 07:03 -------
Subject: Re:  regex undefined behavior with shifting past
 word length

The last hunk is surely wrong.  I really meant ~0.

Paolo



* [Bug regex/1278] regex undefined behavior with shifting past word length
From: schwab at suse dot de @ 2005-09-01 10:00 UTC (permalink / raw)
  To: glibc-bugs-regex


------- Additional Comments From schwab at suse dot de  2005-09-01 10:00 -------
-1 is better. 


* [Bug regex/1278] regex undefined behavior with shifting past word length
From: eggert at gnu dot org @ 2005-09-01 22:29 UTC (permalink / raw)
  To: glibc-bugs-regex


------- Additional Comments From eggert at gnu dot org  2005-09-01 22:29 -------
The last hunk is purely for ports to ones' complement and
signed-magnitude hosts.  It has no effect in the normal case.

For example, on a ones' complement host, ~0 has the numeric value
zero, i.e., ~0 == 0.  Also, ~0 is of type int.  When ~0 is converted
to unsigned int, it is converted by value, not by bit-pattern.  (The C
Standard requires this.)  Hence ((unsigned) ~0) is equivalent to
((unsigned) 0), which in turn is equivalent to 0u, which is zero.

The same problem occurs with signed-magnitude hosts.  It also occurs
with unsigned short int (the type being used here).

Admittedly this is a minor point since such hosts are rare, but it's
easy to do portably so we might as well do it that way.
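To illustrate the by-value conversion rule described above, here is a small C sketch with hypothetical function names. It can only be executed on a two's complement host, where both functions return USHRT_MAX; the comments describe what would differ on a ones' complement host.

```c
#include <limits.h>

/* Converting an int to an unsigned type is defined by value (C99
   6.3.1.3): a negative value has (MAX + 1) added to it until it is in
   range.  So -1 always becomes the destination type's maximum value,
   which is the all-1s bit pattern, on every conforming host.  */
static unsigned short int
all_ones_from_minus_one (void)
{
  return -1;
}

/* ~0 is computed in int first.  On two's complement hosts it is -1,
   and the conversion gives the same all-1s result.  On a ones'
   complement host it is negative zero, whose value is 0, so the
   conversion would give 0 instead.  */
static unsigned short int
all_ones_from_complement (void)
{
  return ~0;
}
```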



* [Bug regex/1278] regex undefined behavior with shifting past word length
From: paolo dot bonzini at lu dot unisi dot ch @ 2005-09-02  6:17 UTC (permalink / raw)
  To: glibc-bugs-regex


------- Additional Comments From paolo dot bonzini at lu dot unisi dot ch  2005-09-02 06:17 -------
Subject: Re:  regex undefined behavior with shifting past
 word length


>For example, on a ones' complement host, ~0 has the numeric value
>zero, i.e., ~0 == 0.  Also, ~0 is of type int.  When ~0 is converted
>to unsigned int, it is converted by value, not by bit-pattern.  (The C
>Standard requires this.)  Hence ((unsigned) ~0) is equivalent to
>((unsigned) 0), which in turn is equivalent to 0u, which is zero.
So you want ~0u, but not -1.

Paolo



* [Bug regex/1278] regex undefined behavior with shifting past word length
From: schwab at suse dot de @ 2005-09-02 10:21 UTC (permalink / raw)
  To: glibc-bugs-regex


------- Additional Comments From schwab at suse dot de  2005-09-02 10:20 -------
-1 when cast to unsigned is exactly the same as ~0u and also works with any 
other unsigned type regardless of its width, whereas ~0u doesn't. 
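A short C sketch of the width argument, with hypothetical function names. It uses unsigned long long int (a C99 type) only because that type is guaranteed to be at least 64 bits wide, and so is wider than unsigned int on mainstream hosts; unsigned long int behaves the same way on typical LP64 hosts.

```c
#include <limits.h>

/* ~0u has type unsigned int, so its value is UINT_MAX.  Widening that
   value to a larger unsigned type keeps the value, which leaves the
   extra high-order bits clear.  */
static unsigned long long int
from_complement_of_0u (void)
{
  return ~0u;
}

/* -1 is converted straight to the destination type, so the result is
   that type's maximum value (all 1s) whatever its width.  */
static unsigned long long int
from_minus_one (void)
{
  return -1;
}
```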


* [Bug regex/1278] regex undefined behavior with shifting past word length
From: eggert at gnu dot org @ 2005-09-02 23:17 UTC (permalink / raw)
  To: glibc-bugs-regex


------- Additional Comments From eggert at gnu dot org  2005-09-02 23:17 -------
Andreas is right.  For example, "unsigned long int x = ~0u;" will not
have an all-1s value on most 64-bit hosts.

In this particular hunk, ~0u would also work since the destination
type is unsigned short int.  So if you'd really rather use ~0u I
guess that would be OK.  However, as a style matter, it is confusing
to use ~0u in some unsigned contexts, while using -1 in other unsigned
contexts.  Since -1 always works, it's more consistent to use it in
all unsigned contexts.

For example, suppose someone later changes eps_reachable_subexps_map
from unsigned short int to unsigned long int, for performance reasons.
If the code used ~0u here, it would have to be changed to ~ (unsigned
long int) 0, and it's quite possible that people would forget to make
that change.  Whereas if we simply change it to -1 now, it will work
regardless of later changes like this.
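A sketch of this maintenance argument in C. The two structures are hypothetical, simplified stand-ins: only the field name eps_reachable_subexps_map comes from the thread, and the real glibc regex structure has other members.

```c
#include <limits.h>

/* Hypothetical, simplified cache entries.  The second struct models
   the field after a later change from unsigned short int to unsigned
   long int.  */
struct entry_short
{
  unsigned short int eps_reachable_subexps_map;
};

struct entry_long
{
  unsigned long int eps_reachable_subexps_map;
};

/* Both initializers use -1, so both set every bit of the field.  Had
   they used ~0u, init_long would store only UINT_MAX, leaving the high
   bits clear on hosts where unsigned long int is wider than unsigned
   int.  */
static void
init_short (struct entry_short *ent)
{
  ent->eps_reachable_subexps_map = -1;
}

static void
init_long (struct entry_long *ent)
{
  ent->eps_reachable_subexps_map = -1;
}
```

The initializer text is identical in both functions, which is the point: widening the field does not require touching the assignment.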

I should mention that the situation is different in signed contexts.
In general one must use ~ (SIGNED_TYPE) 0 in that case to get an
all-1s pattern.  But signed bit-twiddling is trickier (since one must
in general worry about ~0 == 0 and overflow issues), and I'd rather
that the regex code stuck with unsigned bit-twiddling.
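A C sketch contrasting the two cases, assuming int as the signed type and using hypothetical helper names (all_ones_int, clear_bit). The value comments for ones' complement and sign-and-magnitude hosts follow the C99 rules on integer representations; the code itself can only be exercised on a two's complement host.

```c
#include <limits.h>

/* For a signed type, the all-1s pattern has to be requested as
   ~ (TYPE) 0, and its value depends on the representation: -1 with
   two's complement, negative zero with ones' complement (which C99
   allows to be a trap representation), and -INT_MAX with sign and
   magnitude.  */
static int
all_ones_int (void)
{
  return ~ (int) 0;
}

/* The unsigned equivalent has none of those caveats.  This helper
   clears bit IDX of MAP; IDX must be less than the width of unsigned
   int, or the shift itself is undefined.  */
static unsigned int
clear_bit (unsigned int map, unsigned int idx)
{
  return map & ~(1u << idx);
}
```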



* [Bug regex/1278] regex undefined behavior with shifting past word length
From: eggert at gnu dot org @ 2005-09-06  7:32 UTC (permalink / raw)
  To: glibc-bugs-regex



-- 
                What    |Removed                     |Added
----------------------------------------------------------------------------
OtherBugsDependingOnThis|                            |1302



* [Bug regex/1278] regex undefined behavior with shifting past word length
From: drepper at redhat dot com @ 2005-09-06 23:30 UTC (permalink / raw)
  To: glibc-bugs-regex


------- Additional Comments From drepper at redhat dot com  2005-09-06 23:29 -------
It is ridiculous to care about 1-complement and "signed-magnitude" hosts.

I've applied most of the patch.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


http://sourceware.org/bugzilla/show_bug.cgi?id=1278

