* [Bug c/18019] -march=pentium4 generates word fetch instead of byte fetch
2004-10-15 18:00 [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch stuart at apple dot com
@ 2004-10-15 18:05 ` stuart at apple dot com
2004-10-15 18:12 ` [Bug target/18019] " pinskia at gcc dot gnu dot org
` (21 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: stuart at apple dot com @ 2004-10-15 18:05 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From stuart at apple dot com 2004-10-15 18:05 -------
Created an attachment (id=7359)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=7359&action=view)
testcase
Attaching the testcase for covenience.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019
^ permalink raw reply [flat|nested] 24+ messages in thread
* [Bug target/18019] -march=pentium4 generates word fetch instead of byte fetch
2004-10-15 18:00 [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch stuart at apple dot com
2004-10-15 18:05 ` [Bug c/18019] " stuart at apple dot com
@ 2004-10-15 18:12 ` pinskia at gcc dot gnu dot org
2004-10-15 18:23 ` pinskia at gcc dot gnu dot org
` (20 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2004-10-15 18:12 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From pinskia at gcc dot gnu dot org 2004-10-15 18:12 -------
This is definitely a target problem, the RTL is correct:
(insn:HI 23 22 24 (set (reg:QI 0 ax [orig:60 D.1459 ] [60])
(mem:QI (reg/v/f:SI 2 cx [orig:63 from ] [63]) [0 S1 A8])) 48 {*movqi_1} (nil)
(nil))
I think the comment in i386.md summarizies the issues:
;; Situation is quite tricky about when to choose full sized (SImode) move
;; over QImode moves. For Q_REG -> Q_REG move we use full size only for
;; partial register dependency machines (such as AMD Athlon), where QImode
;; moves issue extra dependency and for partial register stalls machines
;; that don't use QImode patterns (and QImode move cause stall on the next
;; instruction).
;;
;; For loads of Q_REG to NONQ_REG we use full sized moves except for partial
;; register stall machines with, where we use QImode instructions, since
;; partial register stall can be caused there. Then we use movzx.
--
What |Removed |Added
----------------------------------------------------------------------------
Component|c |target
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019
^ permalink raw reply [flat|nested] 24+ messages in thread
* [Bug target/18019] -march=pentium4 generates word fetch instead of byte fetch
2004-10-15 18:00 [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch stuart at apple dot com
2004-10-15 18:05 ` [Bug c/18019] " stuart at apple dot com
2004-10-15 18:12 ` [Bug target/18019] " pinskia at gcc dot gnu dot org
@ 2004-10-15 18:23 ` pinskia at gcc dot gnu dot org
2004-10-15 18:27 ` stuart at apple dot com
` (19 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2004-10-15 18:23 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From pinskia at gcc dot gnu dot org 2004-10-15 18:23 -------
I think this is expected behavior (on pentium4 word fetches are always faster than byte fetches).
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019
^ permalink raw reply [flat|nested] 24+ messages in thread
* [Bug target/18019] -march=pentium4 generates word fetch instead of byte fetch
2004-10-15 18:00 [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch stuart at apple dot com
` (2 preceding siblings ...)
2004-10-15 18:23 ` pinskia at gcc dot gnu dot org
@ 2004-10-15 18:27 ` stuart at apple dot com
2004-10-15 18:36 ` [Bug target/18019] [4.0 Regression] " pinskia at gcc dot gnu dot org
` (18 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: stuart at apple dot com @ 2004-10-15 18:27 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From stuart at apple dot com 2004-10-15 18:27 -------
The bug was discovered when it walked off the end of a VM page and faulted. Are you certain this is
"expected behavior?"
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019
^ permalink raw reply [flat|nested] 24+ messages in thread
* [Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch
2004-10-15 18:00 [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch stuart at apple dot com
` (3 preceding siblings ...)
2004-10-15 18:27 ` stuart at apple dot com
@ 2004-10-15 18:36 ` pinskia at gcc dot gnu dot org
2004-10-28 15:19 ` steven at gcc dot gnu dot org
` (17 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2004-10-15 18:36 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From pinskia at gcc dot gnu dot org 2004-10-15 18:36 -------
Confirmed, something is wrong in the .md file.
--
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Ever Confirmed| |1
Keywords| |wrong-code
Last reconfirmed|0000-00-00 00:00:00 |2004-10-15 18:36:39
date| |
Summary|-march=pentium4 generates |[4.0 Regression] -
|word fetch instead of byte |march=pentium4 generates
|fetch |word fetch instead of byte
| |fetch
Target Milestone|--- |4.0.0
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019
^ permalink raw reply [flat|nested] 24+ messages in thread
* [Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch
2004-10-15 18:00 [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch stuart at apple dot com
` (4 preceding siblings ...)
2004-10-15 18:36 ` [Bug target/18019] [4.0 Regression] " pinskia at gcc dot gnu dot org
@ 2004-10-28 15:19 ` steven at gcc dot gnu dot org
2004-10-28 15:22 ` steven at gcc dot gnu dot org
` (16 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: steven at gcc dot gnu dot org @ 2004-10-28 15:19 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From steven at gcc dot gnu dot org 2004-10-28 15:19 -------
vrong quode
--
What |Removed |Added
----------------------------------------------------------------------------
Severity|normal |critical
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019
^ permalink raw reply [flat|nested] 24+ messages in thread
* [Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch
2004-10-15 18:00 [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch stuart at apple dot com
` (5 preceding siblings ...)
2004-10-28 15:19 ` steven at gcc dot gnu dot org
@ 2004-10-28 15:22 ` steven at gcc dot gnu dot org
2004-11-09 4:50 ` pinskia at gcc dot gnu dot org
` (15 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: steven at gcc dot gnu dot org @ 2004-10-28 15:22 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From steven at gcc dot gnu dot org 2004-10-28 15:22 -------
Is this perhaps related to Bug 17949?
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019
^ permalink raw reply [flat|nested] 24+ messages in thread
* [Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch
2004-10-15 18:00 [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch stuart at apple dot com
` (6 preceding siblings ...)
2004-10-28 15:22 ` steven at gcc dot gnu dot org
@ 2004-11-09 4:50 ` pinskia at gcc dot gnu dot org
2004-11-09 18:10 ` roger at eyesopen dot com
` (14 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2004-11-09 4:50 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From pinskia at gcc dot gnu dot org 2004-11-09 04:50 -------
No this is not related to PR 17949 at all.
The behavior changed between 2004-01-18 and 2004-01-23.
Which means most likely this:
2004-01-19 Roger Sayle <roger@eyesopen.com>
* config/i386/i386.md (*movhi_1, *movqi_1): When optimizing for
size, don't use the larger zero-extending loads.
Roger this patch can cause bad code if done on a page boundary and the next page is protected.
--
What |Removed |Added
----------------------------------------------------------------------------
CC| |roger at eyesopen dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019
^ permalink raw reply [flat|nested] 24+ messages in thread
* [Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch
2004-10-15 18:00 [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch stuart at apple dot com
` (7 preceding siblings ...)
2004-11-09 4:50 ` pinskia at gcc dot gnu dot org
@ 2004-11-09 18:10 ` roger at eyesopen dot com
2004-11-09 19:34 ` stuart at apple dot com
` (13 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: roger at eyesopen dot com @ 2004-11-09 18:10 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From roger at eyesopen dot com 2004-11-09 18:10 -------
I believe that the bug is a latent problem that was only exposed by my patch.
The *movqi_1 (and *movhi_1) pattern(s) have two attributes "type" and "mode".
The mode attribute indicates the RTL machine mode the move should be done in
QImode or SImode, and the type attribute is either IMOVX or IMOV indicating with
or without extension respectively. My patch tweaked the type attribute, such
that when optimizing for size IMOV is smaller on x86 than IMOVX. The
independent underlying problem is that the mode is incorrectly set to SImode,
even if the alignment is unknown on partial register stall machines.
IMOVX SImode -> movzbl
IMOVX QImode -> movzbl
IMOV SImode -> movl
IMOV QImode -> movb
This issue was hidden prior to my patch as fortuituously even if the mode was
incorrectly SImode, using a IMOVX instruction would only read a single byte.
The issue appears to be when to use SImode and when QImode. Clearly, partial
register stall machines get a performance benefit from using SImode, but it
looks like there's no attempt to check the alignment of the access when making
this optimization.
I'm happy to revert my patch, clearly a minor code size tweak is much less
important than a correctness issue. But I'm trying to convince myself that
this isn't papering over the more fundamental problem in mode selection.
Any help/advice/opinions would be greatly appreciated.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019
^ permalink raw reply [flat|nested] 24+ messages in thread
* [Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch
2004-10-15 18:00 [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch stuart at apple dot com
` (8 preceding siblings ...)
2004-11-09 18:10 ` roger at eyesopen dot com
@ 2004-11-09 19:34 ` stuart at apple dot com
2004-11-09 20:56 ` roger at eyesopen dot com
` (12 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: stuart at apple dot com @ 2004-11-09 19:34 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From stuart at apple dot com 2004-11-09 19:34 -------
I agree with Roger.
I'm suspicious of the "0,1,2 ... TARGET_PARTIAL_xx" clauses of the "*movqi_1" pattern. (Also the
analogous parts of "*movhi_1".)
I've tried reverting Roger's patch, and excising the TARGET_PARTIAL_xx clauses; either change appears
to fix the problem. I've also successfully regression-tested the excision.
I will invite Jan Hubicka, author of the TARGET_PARTIAL_xx clauses (i386.md, v1.503), to look at this.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019
^ permalink raw reply [flat|nested] 24+ messages in thread
* [Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch
2004-10-15 18:00 [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch stuart at apple dot com
` (9 preceding siblings ...)
2004-11-09 19:34 ` stuart at apple dot com
@ 2004-11-09 20:56 ` roger at eyesopen dot com
2004-11-16 19:40 ` stuart at apple dot com
` (11 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: roger at eyesopen dot com @ 2004-11-09 20:56 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From roger at eyesopen dot com 2004-11-09 20:56 -------
I agree with Stuart :>
Although both of the patches he investigated will cure the problem, reverting
mine will generate a movzbl, whilst disabling the TARGET_PARTIAL_... bits will
generate a "movb". As hinted in the original problem description, when
optimizing for size we ideally want the smaller "movb" (2 bytes vs. 3 bytes).
But I still don't think disabling the TARGET_PARTIAL_... bits completely is
an ideal solution. It would be better to take advantage of any alignment
information on the MEM to determine whether this transformation is safe, and
still benefit from this optimization when it is.
I'm also still not convinced that there aren't any non -Os cases where the
existing TARGET_PARTIAL_... code is incorrect without checking alignment.
Perhaps initially we should disable TARGET_PARTIAL_... only when optimizing
for size? I'm not convinced its perfect, but it would resolve this failure
and not hurt performance at -O2/-O3.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019
^ permalink raw reply [flat|nested] 24+ messages in thread
* [Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch
2004-10-15 18:00 [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch stuart at apple dot com
` (10 preceding siblings ...)
2004-11-09 20:56 ` roger at eyesopen dot com
@ 2004-11-16 19:40 ` stuart at apple dot com
2004-11-27 18:28 ` neroden at gcc dot gnu dot org
` (10 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: stuart at apple dot com @ 2004-11-16 19:40 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From stuart at apple dot com 2004-11-16 19:39 -------
Here is the body of an email I sent to Jan Hubicka concerning this bug. In the body of the message,
'you' refers to Jan.
--------------------------------------------------------------
For discussion, here is the pattern in question as it exists on the FSF mainline today:
1503 ;; Situation is quite tricky about when to choose full sized (SImode) move
1504 ;; over QImode moves. For Q_REG -> Q_REG move we use full size only for
1505 ;; partial register dependency machines (such as AMD Athlon), where QImode
1506 ;; moves issue extra dependency and for partial register stalls machines
1507 ;; that don't use QImode patterns (and QImode move cause stall on the next
1508 ;; instruction).
1509 ;;
1510 ;; For loads of Q_REG to NONQ_REG we use full sized moves except for partial
1511 ;; register stall machines with, where we use QImode instructions, since
1512 ;; partial register stall can be caused there. Then we use movzx.
1513 (define_insn "*movqi_1"
1514 [(set (match_operand:QI 0 "nonimmediate_operand" "=q,q ,q ,r,r ,?r,m")
1515 (match_operand:QI 1 "general_operand" " q,qn,qm,q,rn,qm,qn"))]
1516 "GET_CODE (operands[0]) != MEM || GET_CODE (operands[1]) != MEM"
1517 {
1518 switch (get_attr_type (insn))
1519 {
1520 case TYPE_IMOVX:
1521 if (!ANY_QI_REG_P (operands[1]) && GET_CODE (operands[1]) != MEM)
1522 abort ();
1523 return "movz{bl|x}\t{%1, %k0|%k0, %1}";
1524 default:
1525 if (get_attr_mode (insn) == MODE_SI)
1526 return "mov{l}\t{%k1, %k0|%k0, %k1}";
1527 else
1528 return "mov{b}\t{%1, %0|%0, %1}";
1529 }
1530 }
1531 [(set (attr "type")
1532 (cond [(ne (symbol_ref "optimize_size") (const_int 0))
1533 (const_string "imov")
1534 (and (eq_attr "alternative" "3")
1535 (ior (eq (symbol_ref "TARGET_PARTIAL_REG_STALL")
1536 (const_int 0))
1537 (eq (symbol_ref "TARGET_QIMODE_MATH")
1538 (const_int 0))))
1539 (const_string "imov")
1540 (eq_attr "alternative" "3,5")
1541 (const_string "imovx")
1542 (and (ne (symbol_ref "TARGET_MOVX")
1543 (const_int 0))
1544 (eq_attr "alternative" "2"))
1545 (const_string "imovx")
1546 ]
1547 (const_string "imov")))
1548 (set (attr "mode")
1549 (cond [(eq_attr "alternative" "3,4,5")
1550 (const_string "SI")
1551 (eq_attr "alternative" "6")
1552 (const_string "QI")
1553 (eq_attr "type" "imovx")
1554 (const_string "SI")
1555 (and (eq_attr "type" "imov")
1556 (and (eq_attr "alternative" "0,1,2")
1557 (ne (symbol_ref "TARGET_PARTIAL_REG_DEPENDENCY")
1558 (const_int 0))))
1559 (const_string "SI")
1560 ;; Avoid partial register stalls when not using QImode arithmetic
1561 (and (eq_attr "type" "imov")
1562 (and (eq_attr "alternative" "0,1,2")
1563 (and (ne (symbol_ref "TARGET_PARTIAL_REG_STALL")
1564 (const_int 0))
1565 (eq (symbol_ref "TARGET_QIMODE_MATH")
1566 (const_int 0)))))
1567 (const_string "SI")
1568 ]
1569 (const_string "QI")))])
Roger added lines 1532-1533 in January of this year. It looks like you added lines 1555-1567 in 2000.
The combination of lines 1532-1533 (use "imov" if -Os) and lines 1555-1559 (use SImode if "imov" and
byte-load and K8/P4/Nocona) means we generate a "movl" that should be a "movb". (The testcase is
strcpy(); see the Bugzilla.) For the following discussion, note that GCC currently matches "movqi_1"
alternative #2 ("q" and "qm" in the attribute list) on the critical byte-fetch-from-memory in the strcpy()
testcase.
It appears to me that the 1555-1559 clause depends upon any CPU with TARGET_MOVX always
choosing "movx|movbl" over "imove". If some not-yet-existing CPU had
TARGET_PARTIAL_REG_DEPENDENCY but did /not/ have TARGET_MOVX support, I think this would
generate a 'movl' where a 'movb' is required. (Yes, I agree no such CPU exists, nor is one likely to be
built.) Roger's patch made it choose "imove" because it's a smaller instruction than imovx (one byte
versus two, plus whatever additional mod/r/m glop).
Furthermore, as Roger pointed out in the Bugzilla, if we've chosen to use imovx, what do we gain by
marking it with SImode? It appears to generate the same "movx/movzbl" instruction either way. (I am
really confused by this.)
I can think of several potential fixes. All are suspect, because I'm not convinced I fully understand
what's going on:
a) revert Rogers patch (I'd prefer to keep it)
b) remove the ",2" from line 1556 (clause won't apply to byte loads)
c) remove lines 1555-1559 (allow byte-register/byte-register mov insns on P4-class CPUs? not clear to
me)
d) mark the bug "behaves correctly"
e) some idea of Jan's that is much smarter than any of the above
If you immediately see how this should work, please make a suggestion, or even commit a fix. I claim
no expertise in this area.
As a side note, I'm curious about this usage of TARGET_QIMODE_MATH; CVS suggests that you made it
synonymous with TRUE in March 2000. In the subsequent April, you added lines 1563-1566. If
TARGET_QIMODE_MATH is always true, will the clause on lines 1561-1567 ever be used? Again, I just
don't understand how this is supposed to work.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019
^ permalink raw reply [flat|nested] 24+ messages in thread
* [Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch
2004-10-15 18:00 [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch stuart at apple dot com
` (11 preceding siblings ...)
2004-11-16 19:40 ` stuart at apple dot com
@ 2004-11-27 18:28 ` neroden at gcc dot gnu dot org
2004-12-02 1:08 ` stuart at apple dot com
` (9 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: neroden at gcc dot gnu dot org @ 2004-11-27 18:28 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From neroden at gcc dot gnu dot org 2004-11-27 18:28 -------
I suggest excising the TARGET_PARTIAL clauses ASAP, and reinstating them only
when alignment checking (and/or something else) is used in them to verify that
the transform is safe. That seems to be the most correct thing to do.
If someone is up to writing the alignment checking right now, of course,
that's great, but I'm not!
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019
^ permalink raw reply [flat|nested] 24+ messages in thread
* [Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch
2004-10-15 18:00 [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch stuart at apple dot com
` (12 preceding siblings ...)
2004-11-27 18:28 ` neroden at gcc dot gnu dot org
@ 2004-12-02 1:08 ` stuart at apple dot com
2004-12-02 3:35 ` neroden at gcc dot gnu dot org
` (8 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: stuart at apple dot com @ 2004-12-02 1:08 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From stuart at apple dot com 2004-12-02 01:07 -------
Jan emailed this to me privately. Appended here for completeness. - stuart
----------------------------------------------------------------
Just to clarify things a bit. TARGET_MOVX and
TARGET_PARTIAL_REG_DEPENDENCY is not about supporting some feature but
about a way the CPU deals with dependencies on partial registers. Some
CPUs (Athlon+,P4+) deal with partial register writes as read-modify
operation of the whole thing (TARGET_PARTIAL_REG_DEPENDENCT) and for
some of these it is profitable to do dummy zero extend (TARGET_MOVX)
instead of loads to avoid the dependency, while others (K6, P3) give it
another internal name and don't see the false dependency
(TARGET_PARTIAL_REG_STALL). On the other hand they get penalty if the
result is used as a whole register. There is unlikely to be ever CPU
spoiled up in both directions..
However the Roger's patch, as I understand it, is about avoiding movx as
it encodes longer on -Os. It seems to me that for targets not defining
TARGET_PARTIAL_REG_STALL/TARGET_PARTIAL_REG_DEPENDENCY we should always
produce the straighforward movq as expected, while for
TARGET_PARTIAL_REG_STALL/TARGET_PARTIAL_REG_DEPENDENCY we can still use
the full moves as long as they don't encode longer. I can't check right
now, but i believe it is only the movl imm, register that comes out
longer and that is the alternative 2.
We can also probably kill the TARGET_QIMODE_MATH as it is no longer
used.
There is the type and mode argument not only to choose the particular
instruction but also to drive scheduling (K6 for instance has limited
supply of units that do 8bit operations). imovx is 32bit operation so
it needs to get SImode. I would simply break out the alternative 2 from
both conditionals and would additionally check optimize_size to be
nonzero
I can prepare patch later next week unless someone beats me :)
Honza
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019
^ permalink raw reply [flat|nested] 24+ messages in thread
* [Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch
2004-10-15 18:00 [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch stuart at apple dot com
` (13 preceding siblings ...)
2004-12-02 1:08 ` stuart at apple dot com
@ 2004-12-02 3:35 ` neroden at gcc dot gnu dot org
2004-12-02 9:02 ` jh at suse dot cz
` (7 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: neroden at gcc dot gnu dot org @ 2004-12-02 3:35 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From neroden at gcc dot gnu dot org 2004-12-02 03:35 -------
Jan's message quoted in the previous comment seems to be orthogonal to the
main problem in this bug. The problem is that a word fetch is being generated
which *reads out of bounds memory*.
If the last byte in a page is being fetched, you must not extend that to fetch
the next byte; if the first byte in a page is being fetched, you must not
extend that to fetch the previous byte. Those are the key failure situations
which must be prevented, regardless of whether this is -Os, -O2, or -O0.
It doesn't appear to me that the changes proposed in Jan's message actually
address that, since none of them seem to feature a alignment or other type of
correctness check before converting a QImode move into a (possibly illegal)
SImode move.
In particular, this statement looks like it's wrong:
>while for
>TARGET_PARTIAL_REG_STALL/TARGET_PARTIAL_REG_DEPENDENCY we can still use
>the full moves as long as they don't encode longer.
I believe a check is required that the full moves don't cause a segmentation
violation. An alignment check would be sufficient (it would prohibit more
cases than strictly necessary, but this is a correctness issue, and I haven't
thought of a better check).
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019
^ permalink raw reply [flat|nested] 24+ messages in thread
* [Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch
2004-10-15 18:00 [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch stuart at apple dot com
` (14 preceding siblings ...)
2004-12-02 3:35 ` neroden at gcc dot gnu dot org
@ 2004-12-02 9:02 ` jh at suse dot cz
2004-12-10 17:09 ` neroden at gcc dot gnu dot org
` (6 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: jh at suse dot cz @ 2004-12-02 9:02 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From jh at suse dot cz 2004-12-02 09:02 -------
Subject: Re: [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch
>
> ------- Additional Comments From neroden at gcc dot gnu dot org 2004-12-02 03:35 -------
> Jan's message quoted in the previous comment seems to be orthogonal to the
> main problem in this bug. The problem is that a word fetch is being generated
> which *reads out of bounds memory*.
>
> If the last byte in a page is being fetched, you must not extend that to fetch
> the next byte; if the first byte in a page is being fetched, you must not
> extend that to fetch the previous byte. Those are the key failure situations
> which must be prevented, regardless of whether this is -Os, -O2, or -O0.
>
> It doesn't appear to me that the changes proposed in Jan's message actually
> address that, since none of them seem to feature a alignment or other type of
> correctness check before converting a QImode move into a (possibly illegal)
> SImode move.
We don't need to check that since the code always use zero extension
conversion (unless it hit the buggy -Os conditional) and zero extension
still reads byte, just writes whole register. It is not good idea to
use full sized read even for aligned sources since we will get partial
memory stalls.
We use full sized moves only for register to register moves where it is
safe.
I will prepare the patch later today.
Honza
>
> In particular, this statement looks like it's wrong:
> >while for
> >TARGET_PARTIAL_REG_STALL/TARGET_PARTIAL_REG_DEPENDENCY we can still use
> >the full moves as long as they don't encode longer.
>
> I believe a check is required that the full moves don't cause a segmentation
> violation. An alignment check would be sufficient (it would prohibit more
> cases than strictly necessary, but this is a correctness issue, and I haven't
> thought of a better check).
>
>
> --
>
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019
>
> ------- You are receiving this mail because: -------
> You are on the CC list for the bug, or are watching someone who is.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019
^ permalink raw reply [flat|nested] 24+ messages in thread
* [Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch
2004-10-15 18:00 [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch stuart at apple dot com
` (15 preceding siblings ...)
2004-12-02 9:02 ` jh at suse dot cz
@ 2004-12-10 17:09 ` neroden at gcc dot gnu dot org
2004-12-18 22:49 ` steven at gcc dot gnu dot org
` (5 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: neroden at gcc dot gnu dot org @ 2004-12-10 17:09 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From neroden at gcc dot gnu dot org 2004-12-10 17:09 -------
Oh, cool. :-) Did you get the patch linked up?
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019
^ permalink raw reply [flat|nested] 24+ messages in thread
* [Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch
2004-10-15 18:00 [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch stuart at apple dot com
` (16 preceding siblings ...)
2004-12-10 17:09 ` neroden at gcc dot gnu dot org
@ 2004-12-18 22:49 ` steven at gcc dot gnu dot org
2004-12-19 16:16 ` jh at suse dot cz
` (4 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: steven at gcc dot gnu dot org @ 2004-12-18 22:49 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From steven at gcc dot gnu dot org 2004-12-18 22:49 -------
Honza,
PING!!!!
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019
^ permalink raw reply [flat|nested] 24+ messages in thread
* [Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch
2004-10-15 18:00 [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch stuart at apple dot com
` (17 preceding siblings ...)
2004-12-18 22:49 ` steven at gcc dot gnu dot org
@ 2004-12-19 16:16 ` jh at suse dot cz
2004-12-19 19:58 ` pinskia at gcc dot gnu dot org
` (3 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: jh at suse dot cz @ 2004-12-19 16:16 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From jh at suse dot cz 2004-12-19 16:16 -------
Subject: Re: [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch
>
> ------- Additional Comments From steven at gcc dot gnu dot org 2004-12-18 22:49 -------
> Honza,
>
> PING!!!!
I am still backlogged but I will look into it now. What about cleaning up tree-inline.c and other
mess? That thing is hell to merge...
Honza
>
>
>
> --
>
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019
>
> ------- You are receiving this mail because: -------
> You are on the CC list for the bug, or are watching someone who is.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019
^ permalink raw reply [flat|nested] 24+ messages in thread
* [Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch
2004-10-15 18:00 [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch stuart at apple dot com
` (18 preceding siblings ...)
2004-12-19 16:16 ` jh at suse dot cz
@ 2004-12-19 19:58 ` pinskia at gcc dot gnu dot org
2004-12-22 12:20 ` steven at gcc dot gnu dot org
` (2 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2004-12-19 19:58 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From pinskia at gcc dot gnu dot org 2004-12-19 19:58 -------
Patch here: <http://gcc.gnu.org/ml/gcc-patches/2004-12/msg01412.html>.
--
What |Removed |Added
----------------------------------------------------------------------------
Keywords| |patch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019
^ permalink raw reply [flat|nested] 24+ messages in thread
* [Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch
2004-10-15 18:00 [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch stuart at apple dot com
` (19 preceding siblings ...)
2004-12-19 19:58 ` pinskia at gcc dot gnu dot org
@ 2004-12-22 12:20 ` steven at gcc dot gnu dot org
2004-12-30 13:16 ` cvs-commit at gcc dot gnu dot org
2004-12-30 13:21 ` steven at gcc dot gnu dot org
22 siblings, 0 replies; 24+ messages in thread
From: steven at gcc dot gnu dot org @ 2004-12-22 12:20 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From steven at gcc dot gnu dot org 2004-12-22 12:11 -------
One of these days, someone will have to explain to me how such a small
patch for a bug marked "critical regression" can stay unreviewed for
three days... This kind of patches should *not* need pinging.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019
^ permalink raw reply [flat|nested] 24+ messages in thread
* [Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch
2004-10-15 18:00 [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch stuart at apple dot com
` (20 preceding siblings ...)
2004-12-22 12:20 ` steven at gcc dot gnu dot org
@ 2004-12-30 13:16 ` cvs-commit at gcc dot gnu dot org
2004-12-30 13:21 ` steven at gcc dot gnu dot org
22 siblings, 0 replies; 24+ messages in thread
From: cvs-commit at gcc dot gnu dot org @ 2004-12-30 13:16 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From cvs-commit at gcc dot gnu dot org 2004-12-30 13:16 -------
Subject: Bug 18019
CVSROOT: /cvs/gcc
Module name: gcc
Changes by: hubicka@gcc.gnu.org 2004-12-30 13:16:14
Modified files:
gcc : ChangeLog
gcc/config/i386: i386.md
Log message:
PR target/18019
* i386.md (movqi_1): Fix -Os instruction choice.
Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.6985&r2=2.6986
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/i386/i386.md.diff?cvsroot=gcc&r1=1.594&r2=1.595
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019
^ permalink raw reply [flat|nested] 24+ messages in thread
* [Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch
2004-10-15 18:00 [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch stuart at apple dot com
` (21 preceding siblings ...)
2004-12-30 13:16 ` cvs-commit at gcc dot gnu dot org
@ 2004-12-30 13:21 ` steven at gcc dot gnu dot org
22 siblings, 0 replies; 24+ messages in thread
From: steven at gcc dot gnu dot org @ 2004-12-30 13:21 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From steven at gcc dot gnu dot org 2004-12-30 13:21 -------
.
--
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019
^ permalink raw reply [flat|nested] 24+ messages in thread