public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch
@ 2004-10-15 18:00 stuart at apple dot com
  2004-10-15 18:05 ` [Bug c/18019] " stuart at apple dot com
                   ` (22 more replies)
  0 siblings, 23 replies; 24+ messages in thread
From: stuart at apple dot com @ 2004-10-15 18:00 UTC (permalink / raw)
  To: gcc-bugs

When compiling the testcase (strcpy()) with -Os -march=pentium4, GCC generates a word-fetch 
instead of a byte-fetch.  This can provoke memory faults (e.g. at the end of a page).

Omitting "-march=pentium4" generates correct code.

Here is the testcase:
------------------------------------------------
char *
mystrcpy(char * __restrict to, const char * __restrict from)
{
  char *save = to;
  
  for (; (*to = *from); ++from, ++to);
  return(save);
}
----------------------------------------------------
And the invocation:  % gcc.fsf.pure.obj/gcc/xgcc -B gcc.fsf.pure.obj/gcc -Os -S mystrcpy.c 
-march=pentium4
----------------------------------------------------
And the result (error is marked):
----------------------------------------------------
       .file   "mystrcpy.c"
        .text
.globl mystrcpy
        .type   mystrcpy, @function
mystrcpy:
        pushl   %ebp
        movl    %esp, %ebp
        movl    12(%ebp), %ecx
        movl    8(%ebp), %edx
        jmp     .L2
.L3:
        incl    %ecx
        incl    %edx
.L2:
        movl    (%ecx), %eax            #### should be 'movb'
        movb    %al, (%edx)
        testb   %al, %al
        jne     .L3
        movl    8(%ebp), %eax
        popl    %ebp
        ret
        .size   mystrcpy, .-mystrcpy
        .ident  "GCC: (GNU) 4.0.0 20041015 (experimental)"
        .section        .note.GNU-stack,"",@progbits
----------------------------------------------------------
Also reproducible on i686-apple-darwin.

-- 
           Summary: -march=pentium4 generates word fetch instead of byte
                    fetch
           Product: gcc
           Version: 4.0.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: stuart at apple dot com
                CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: i686-pc-linux
  GCC host triplet: i686-pc-linux
GCC target triplet: i686-pc-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug c/18019] -march=pentium4 generates word fetch instead of byte fetch
  2004-10-15 18:00 [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch stuart at apple dot com
@ 2004-10-15 18:05 ` stuart at apple dot com
  2004-10-15 18:12 ` [Bug target/18019] " pinskia at gcc dot gnu dot org
                   ` (21 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: stuart at apple dot com @ 2004-10-15 18:05 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From stuart at apple dot com  2004-10-15 18:05 -------
Created an attachment (id=7359)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=7359&action=view)
testcase

Attaching the testcase for covenience.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/18019] -march=pentium4 generates word fetch instead of byte fetch
  2004-10-15 18:00 [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch stuart at apple dot com
  2004-10-15 18:05 ` [Bug c/18019] " stuart at apple dot com
@ 2004-10-15 18:12 ` pinskia at gcc dot gnu dot org
  2004-10-15 18:23 ` pinskia at gcc dot gnu dot org
                   ` (20 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2004-10-15 18:12 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2004-10-15 18:12 -------
This is definitely a target problem, the RTL is correct:
(insn:HI 23 22 24 (set (reg:QI 0 ax [orig:60 D.1459 ] [60])
        (mem:QI (reg/v/f:SI 2 cx [orig:63 from ] [63]) [0 S1 A8])) 48 {*movqi_1} (nil)
    (nil))

I think the comment in i386.md summarizies the issues:
;; Situation is quite tricky about when to choose full sized (SImode) move
;; over QImode moves.  For Q_REG -> Q_REG move we use full size only for
;; partial register dependency machines (such as AMD Athlon), where QImode
;; moves issue extra dependency and for partial register stalls machines
;; that don't use QImode patterns (and QImode move cause stall on the next
;; instruction).
;; 
;; For loads of Q_REG to NONQ_REG we use full sized moves except for partial
;; register stall machines with, where we use QImode instructions, since
;; partial register stall can be caused there.  Then we use movzx.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|c                           |target


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/18019] -march=pentium4 generates word fetch instead of byte fetch
  2004-10-15 18:00 [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch stuart at apple dot com
  2004-10-15 18:05 ` [Bug c/18019] " stuart at apple dot com
  2004-10-15 18:12 ` [Bug target/18019] " pinskia at gcc dot gnu dot org
@ 2004-10-15 18:23 ` pinskia at gcc dot gnu dot org
  2004-10-15 18:27 ` stuart at apple dot com
                   ` (19 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2004-10-15 18:23 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2004-10-15 18:23 -------
I think this is expected behavior (on pentium4 word fetches are always faster than byte fetches).

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/18019] -march=pentium4 generates word fetch instead of byte fetch
  2004-10-15 18:00 [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch stuart at apple dot com
                   ` (2 preceding siblings ...)
  2004-10-15 18:23 ` pinskia at gcc dot gnu dot org
@ 2004-10-15 18:27 ` stuart at apple dot com
  2004-10-15 18:36 ` [Bug target/18019] [4.0 Regression] " pinskia at gcc dot gnu dot org
                   ` (18 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: stuart at apple dot com @ 2004-10-15 18:27 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From stuart at apple dot com  2004-10-15 18:27 -------
The bug was discovered when it walked off the end of a VM page and faulted.  Are you certain this is 
"expected behavior?"

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch
  2004-10-15 18:00 [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch stuart at apple dot com
                   ` (3 preceding siblings ...)
  2004-10-15 18:27 ` stuart at apple dot com
@ 2004-10-15 18:36 ` pinskia at gcc dot gnu dot org
  2004-10-28 15:19 ` steven at gcc dot gnu dot org
                   ` (17 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2004-10-15 18:36 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2004-10-15 18:36 -------
Confirmed, something is wrong in the .md file.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|                            |1
           Keywords|                            |wrong-code
   Last reconfirmed|0000-00-00 00:00:00         |2004-10-15 18:36:39
               date|                            |
            Summary|-march=pentium4 generates   |[4.0 Regression] -
                   |word fetch instead of byte  |march=pentium4 generates
                   |fetch                       |word fetch instead of byte
                   |                            |fetch
   Target Milestone|---                         |4.0.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch
  2004-10-15 18:00 [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch stuart at apple dot com
                   ` (4 preceding siblings ...)
  2004-10-15 18:36 ` [Bug target/18019] [4.0 Regression] " pinskia at gcc dot gnu dot org
@ 2004-10-28 15:19 ` steven at gcc dot gnu dot org
  2004-10-28 15:22 ` steven at gcc dot gnu dot org
                   ` (16 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: steven at gcc dot gnu dot org @ 2004-10-28 15:19 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From steven at gcc dot gnu dot org  2004-10-28 15:19 -------
vrong quode 

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |critical


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch
  2004-10-15 18:00 [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch stuart at apple dot com
                   ` (5 preceding siblings ...)
  2004-10-28 15:19 ` steven at gcc dot gnu dot org
@ 2004-10-28 15:22 ` steven at gcc dot gnu dot org
  2004-11-09  4:50 ` pinskia at gcc dot gnu dot org
                   ` (15 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: steven at gcc dot gnu dot org @ 2004-10-28 15:22 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From steven at gcc dot gnu dot org  2004-10-28 15:22 -------
Is this perhaps related to Bug 17949? 

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch
  2004-10-15 18:00 [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch stuart at apple dot com
                   ` (6 preceding siblings ...)
  2004-10-28 15:22 ` steven at gcc dot gnu dot org
@ 2004-11-09  4:50 ` pinskia at gcc dot gnu dot org
  2004-11-09 18:10 ` roger at eyesopen dot com
                   ` (14 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2004-11-09  4:50 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2004-11-09 04:50 -------
No this is not related to PR 17949 at all.

The behavior changed between 2004-01-18 and 2004-01-23.

Which means most likely this:
2004-01-19  Roger Sayle  <roger@eyesopen.com>

        * config/i386/i386.md (*movhi_1, *movqi_1): When optimizing for
        size, don't use the larger zero-extending loads.

Roger this patch can cause bad code if done on a page boundary and the next page is protected.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |roger at eyesopen dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch
  2004-10-15 18:00 [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch stuart at apple dot com
                   ` (7 preceding siblings ...)
  2004-11-09  4:50 ` pinskia at gcc dot gnu dot org
@ 2004-11-09 18:10 ` roger at eyesopen dot com
  2004-11-09 19:34 ` stuart at apple dot com
                   ` (13 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: roger at eyesopen dot com @ 2004-11-09 18:10 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From roger at eyesopen dot com  2004-11-09 18:10 -------
I believe that the bug is a latent problem that was only exposed by my patch.
The *movqi_1 (and *movhi_1) pattern(s) have two attributes "type" and "mode".
The mode attribute indicates the RTL machine mode the move should be done in
QImode or SImode, and the type attribute is either IMOVX or IMOV indicating with
or without extension respectively.  My patch tweaked the type attribute, such
that when optimizing for size IMOV is smaller on x86 than IMOVX.  The
independent underlying problem is that the mode is incorrectly set to SImode,
even if the alignment is unknown on partial register stall machines.

IMOVX  SImode   -> movzbl
IMOVX  QImode   -> movzbl
IMOV   SImode   -> movl
IMOV   QImode   -> movb

This issue was hidden prior to my patch as fortuituously even if the mode was
incorrectly SImode, using a IMOVX instruction would only read a single byte.

The issue appears to be when to use SImode and when QImode.  Clearly, partial
register stall machines get a performance benefit from using SImode, but it
looks like there's no attempt to check the alignment of the access when making
this optimization.

I'm happy to revert my patch, clearly a minor code size tweak is much less
important than a correctness issue.  But I'm trying to convince myself that
this isn't papering over the more fundamental problem in mode selection.

Any help/advice/opinions would be greatly appreciated.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch
  2004-10-15 18:00 [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch stuart at apple dot com
                   ` (8 preceding siblings ...)
  2004-11-09 18:10 ` roger at eyesopen dot com
@ 2004-11-09 19:34 ` stuart at apple dot com
  2004-11-09 20:56 ` roger at eyesopen dot com
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: stuart at apple dot com @ 2004-11-09 19:34 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From stuart at apple dot com  2004-11-09 19:34 -------
I agree with Roger.

I'm suspicious of the "0,1,2 ... TARGET_PARTIAL_xx" clauses of the "*movqi_1" pattern.  (Also the 
analogous parts of "*movhi_1".)

I've tried reverting Roger's patch, and excising the TARGET_PARTIAL_xx clauses; either change appears 
to fix the problem.  I've also successfully regression-tested the excision.

I will invite Jan Hubicka, author of the TARGET_PARTIAL_xx clauses (i386.md, v1.503), to look at this.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch
  2004-10-15 18:00 [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch stuart at apple dot com
                   ` (9 preceding siblings ...)
  2004-11-09 19:34 ` stuart at apple dot com
@ 2004-11-09 20:56 ` roger at eyesopen dot com
  2004-11-16 19:40 ` stuart at apple dot com
                   ` (11 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: roger at eyesopen dot com @ 2004-11-09 20:56 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From roger at eyesopen dot com  2004-11-09 20:56 -------
I agree with Stuart :>

Although both of the patches he investigated will cure the problem, reverting
mine will generate a movzbl, whilst disabling the TARGET_PARTIAL_... bits will
generate a "movb".  As hinted in the original problem description, when
optimizing for size we ideally want the smaller "movb" (2 bytes vs. 3 bytes). 
But I still don't think disabling the TARGET_PARTIAL_... bits completely is
an ideal solution.  It would be better to take advantage of any alignment
information on the MEM to determine whether this transformation is safe, and
still benefit from this optimization when it is.

I'm also still not convinced that there aren't any non -Os cases where the
existing TARGET_PARTIAL_... code is incorrect without checking alignment.
Perhaps initially we should disable TARGET_PARTIAL_... only when optimizing
for size? I'm not convinced its perfect, but it would resolve this failure
and not hurt performance at -O2/-O3.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch
  2004-10-15 18:00 [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch stuart at apple dot com
                   ` (10 preceding siblings ...)
  2004-11-09 20:56 ` roger at eyesopen dot com
@ 2004-11-16 19:40 ` stuart at apple dot com
  2004-11-27 18:28 ` neroden at gcc dot gnu dot org
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: stuart at apple dot com @ 2004-11-16 19:40 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From stuart at apple dot com  2004-11-16 19:39 -------
Here is the body of an email I sent to Jan Hubicka concerning this bug.  In the body of the message, 
'you' refers to Jan.
--------------------------------------------------------------
For discussion, here is the pattern in question as it exists on the FSF mainline today:

  1503  ;; Situation is quite tricky about when to choose full sized (SImode) move
  1504  ;; over QImode moves.  For Q_REG -> Q_REG move we use full size only for
  1505  ;; partial register dependency machines (such as AMD Athlon), where QImode
  1506  ;; moves issue extra dependency and for partial register stalls machines
  1507  ;; that don't use QImode patterns (and QImode move cause stall on the next
  1508  ;; instruction).
  1509  ;;
  1510  ;; For loads of Q_REG to NONQ_REG we use full sized moves except for partial
  1511  ;; register stall machines with, where we use QImode instructions, since
  1512  ;; partial register stall can be caused there.  Then we use movzx.
  1513  (define_insn "*movqi_1"
  1514    [(set (match_operand:QI 0 "nonimmediate_operand" "=q,q ,q ,r,r ,?r,m")
  1515          (match_operand:QI 1 "general_operand"      " q,qn,qm,q,rn,qm,qn"))]
  1516    "GET_CODE (operands[0]) != MEM || GET_CODE (operands[1]) != MEM"
  1517  {
  1518    switch (get_attr_type (insn))
  1519      {
  1520      case TYPE_IMOVX:
  1521        if (!ANY_QI_REG_P (operands[1]) && GET_CODE (operands[1]) != MEM)
  1522          abort ();
  1523        return "movz{bl|x}\t{%1, %k0|%k0, %1}";
  1524      default:
  1525        if (get_attr_mode (insn) == MODE_SI)
  1526          return "mov{l}\t{%k1, %k0|%k0, %k1}";
  1527        else
  1528          return "mov{b}\t{%1, %0|%0, %1}";
  1529      }
  1530  }
  1531    [(set (attr "type")
  1532       (cond [(ne (symbol_ref "optimize_size") (const_int 0))
  1533                (const_string "imov")
  1534              (and (eq_attr "alternative" "3")
  1535                   (ior (eq (symbol_ref "TARGET_PARTIAL_REG_STALL")
  1536                            (const_int 0))
  1537                        (eq (symbol_ref "TARGET_QIMODE_MATH")
  1538                            (const_int 0))))
  1539                (const_string "imov")
  1540              (eq_attr "alternative" "3,5")
  1541                (const_string "imovx")
  1542              (and (ne (symbol_ref "TARGET_MOVX")
  1543                       (const_int 0))
  1544                   (eq_attr "alternative" "2"))
  1545                (const_string "imovx")
  1546             ]
  1547             (const_string "imov")))
  1548     (set (attr "mode")
  1549        (cond [(eq_attr "alternative" "3,4,5")
  1550                 (const_string "SI")
  1551               (eq_attr "alternative" "6")
  1552                 (const_string "QI")
  1553               (eq_attr "type" "imovx")
  1554                 (const_string "SI")
  1555               (and (eq_attr "type" "imov")
  1556                    (and (eq_attr "alternative" "0,1,2")
  1557                         (ne (symbol_ref "TARGET_PARTIAL_REG_DEPENDENCY")
  1558                             (const_int 0))))
  1559                 (const_string "SI")
  1560               ;; Avoid partial register stalls when not using QImode arithmetic
  1561               (and (eq_attr "type" "imov")
  1562                    (and (eq_attr "alternative" "0,1,2")
  1563                         (and (ne (symbol_ref "TARGET_PARTIAL_REG_STALL")
  1564                                  (const_int 0))
  1565                              (eq (symbol_ref "TARGET_QIMODE_MATH")
  1566                                  (const_int 0)))))
  1567                 (const_string "SI")
  1568             ]
  1569             (const_string "QI")))])

Roger added lines 1532-1533 in January of this year.  It looks like you added lines 1555-1567 in 2000.

The combination of lines 1532-1533 (use "imov" if -Os) and lines 1555-1559 (use SImode if "imov" and 
byte-load and K8/P4/Nocona) means we generate a "movl" that should be a "movb".  (The testcase is 
strcpy(); see the Bugzilla.)  For the following discussion, note that GCC currently matches "movqi_1" 
alternative #2 ("q" and "qm" in the attribute list) on the critical byte-fetch-from-memory in the strcpy() 
testcase.

It appears to me that the 1555-1559 clause depends upon any CPU with TARGET_MOVX always 
choosing "movx|movbl" over "imove".  If some not-yet-existing CPU had 
TARGET_PARTIAL_REG_DEPENDENCY but did /not/ have TARGET_MOVX support, I think this would 
generate a 'movl' where a 'movb' is required.  (Yes, I agree no such CPU exists, nor is one likely to be 
built.)  Roger's patch made it choose "imove" because it's a smaller instruction than imovx (one byte 
versus two, plus whatever additional mod/r/m glop).

Furthermore, as Roger pointed out in the Bugzilla, if we've chosen to use imovx, what do we gain by 
marking it with SImode?  It appears to generate the same "movx/movzbl" instruction either way.  (I am 
really confused by this.)

I can think of several potential fixes.  All are suspect, because I'm not convinced I fully understand 
what's going on:

a) revert Rogers patch (I'd prefer to keep it)
b) remove the ",2" from line 1556 (clause won't apply to byte loads)
c) remove lines 1555-1559 (allow byte-register/byte-register mov insns on P4-class CPUs? not clear to 
me)
d) mark the bug "behaves correctly"
e) some idea of Jan's that is much smarter than any of the above

If you immediately see how this should work, please make a suggestion, or even commit a fix.  I claim 
no expertise in this area.

As a side note, I'm curious about this usage of TARGET_QIMODE_MATH; CVS suggests that you made it 
synonymous with TRUE in March 2000.  In the subsequent April, you added lines 1563-1566.  If 
TARGET_QIMODE_MATH is always true, will the clause on lines 1561-1567 ever be used?  Again, I just 
don't understand how this is supposed to work.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch
  2004-10-15 18:00 [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch stuart at apple dot com
                   ` (11 preceding siblings ...)
  2004-11-16 19:40 ` stuart at apple dot com
@ 2004-11-27 18:28 ` neroden at gcc dot gnu dot org
  2004-12-02  1:08 ` stuart at apple dot com
                   ` (9 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: neroden at gcc dot gnu dot org @ 2004-11-27 18:28 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From neroden at gcc dot gnu dot org  2004-11-27 18:28 -------
I suggest excising the TARGET_PARTIAL clauses ASAP, and reinstating them only 
when alignment checking (and/or something else) is used in them to verify that 
the transform is safe.  That seems to be the most correct thing to do. 
If someone is up to writing the alignment checking right now, of course, 
that's great, but I'm not! 
 

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch
  2004-10-15 18:00 [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch stuart at apple dot com
                   ` (12 preceding siblings ...)
  2004-11-27 18:28 ` neroden at gcc dot gnu dot org
@ 2004-12-02  1:08 ` stuart at apple dot com
  2004-12-02  3:35 ` neroden at gcc dot gnu dot org
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: stuart at apple dot com @ 2004-12-02  1:08 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From stuart at apple dot com  2004-12-02 01:07 -------
Jan emailed this to me privately.  Appended here for completeness.  - stuart
----------------------------------------------------------------
Just to clarify things a bit.  TARGET_MOVX and
TARGET_PARTIAL_REG_DEPENDENCY is not about supporting some feature but
about a way the CPU deals with dependencies on partial registers.  Some
CPUs (Athlon+,P4+) deal with partial register writes as read-modify
operation of the whole thing (TARGET_PARTIAL_REG_DEPENDENCT) and for
some of these it is profitable to do dummy zero extend (TARGET_MOVX)
instead of loads to avoid the dependency, while others (K6, P3) give it
another internal name and don't see the false dependency
(TARGET_PARTIAL_REG_STALL). On the other hand they get penalty if the
result is used as a whole register.  There is unlikely to be ever CPU
spoiled up in both directions..

However the Roger's patch, as I understand it, is about avoiding movx as
it encodes longer on -Os.  It seems to me that for targets not defining
TARGET_PARTIAL_REG_STALL/TARGET_PARTIAL_REG_DEPENDENCY we should always
produce the straighforward movq as expected, while for
TARGET_PARTIAL_REG_STALL/TARGET_PARTIAL_REG_DEPENDENCY we can still use
the full moves as long as they don't encode longer.  I can't check right
now, but i believe it is only the movl imm, register that comes out
longer and that is the alternative 2.

We can also probably kill the TARGET_QIMODE_MATH as it is no longer
used.

There is the type and mode argument not only to choose the particular
instruction but also to drive scheduling (K6 for instance has limited
supply of units that do 8bit operations).  imovx is 32bit operation so
it needs to get SImode.  I would simply break out the alternative 2 from
both conditionals and would additionally check optimize_size to be
nonzero

I can prepare patch later next week unless someone beats me :)

Honza

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch
  2004-10-15 18:00 [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch stuart at apple dot com
                   ` (13 preceding siblings ...)
  2004-12-02  1:08 ` stuart at apple dot com
@ 2004-12-02  3:35 ` neroden at gcc dot gnu dot org
  2004-12-02  9:02 ` jh at suse dot cz
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: neroden at gcc dot gnu dot org @ 2004-12-02  3:35 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From neroden at gcc dot gnu dot org  2004-12-02 03:35 -------
Jan's message quoted in the previous comment seems to be orthogonal to the 
main problem in this bug.  The problem is that a word fetch is being generated 
which *reads out of bounds memory*. 
 
If the last byte in a page is being fetched, you must not extend that to fetch 
the next byte; if the first byte in a page is being fetched, you must not 
extend that to fetch the previous byte.  Those are the key failure situations 
which must be prevented, regardless of whether this is -Os, -O2, or -O0. 
 
It doesn't appear to me that the changes proposed in Jan's message actually 
address that, since none of them seem to feature a alignment or other type of 
correctness check before converting a QImode move into a (possibly illegal) 
SImode move. 
 
In particular, this statement looks like it's wrong: 
>while for 
>TARGET_PARTIAL_REG_STALL/TARGET_PARTIAL_REG_DEPENDENCY we can still use 
>the full moves as long as they don't encode longer.  
 
I believe a check is required that the full moves don't cause a segmentation 
violation.  An alignment check would be sufficient (it would prohibit more 
cases than strictly necessary, but this is a correctness issue, and I haven't 
thought of a better check). 
 

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch
  2004-10-15 18:00 [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch stuart at apple dot com
                   ` (14 preceding siblings ...)
  2004-12-02  3:35 ` neroden at gcc dot gnu dot org
@ 2004-12-02  9:02 ` jh at suse dot cz
  2004-12-10 17:09 ` neroden at gcc dot gnu dot org
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: jh at suse dot cz @ 2004-12-02  9:02 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From jh at suse dot cz  2004-12-02 09:02 -------
Subject: Re:  [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch

> 
> ------- Additional Comments From neroden at gcc dot gnu dot org  2004-12-02 03:35 -------
> Jan's message quoted in the previous comment seems to be orthogonal to the 
> main problem in this bug.  The problem is that a word fetch is being generated 
> which *reads out of bounds memory*. 
>  
> If the last byte in a page is being fetched, you must not extend that to fetch 
> the next byte; if the first byte in a page is being fetched, you must not 
> extend that to fetch the previous byte.  Those are the key failure situations 
> which must be prevented, regardless of whether this is -Os, -O2, or -O0. 
>  
> It doesn't appear to me that the changes proposed in Jan's message actually 
> address that, since none of them seem to feature a alignment or other type of 
> correctness check before converting a QImode move into a (possibly illegal) 
> SImode move. 

We don't need to check that since the code always use zero extension
conversion (unless it hit the buggy -Os conditional) and zero extension
still reads byte, just writes whole register.  It is not good idea to
use full sized read even for aligned sources since we will get partial
memory stalls.

We use full sized moves only for register to register moves where it is
safe.

I will prepare the patch later today.
Honza
>  
> In particular, this statement looks like it's wrong: 
> >while for 
> >TARGET_PARTIAL_REG_STALL/TARGET_PARTIAL_REG_DEPENDENCY we can still use 
> >the full moves as long as they don't encode longer.  
>  
> I believe a check is required that the full moves don't cause a segmentation 
> violation.  An alignment check would be sufficient (it would prohibit more 
> cases than strictly necessary, but this is a correctness issue, and I haven't 
> thought of a better check). 
>  
> 
> -- 
> 
> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019
> 
> ------- You are receiving this mail because: -------
> You are on the CC list for the bug, or are watching someone who is.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch
  2004-10-15 18:00 [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch stuart at apple dot com
                   ` (15 preceding siblings ...)
  2004-12-02  9:02 ` jh at suse dot cz
@ 2004-12-10 17:09 ` neroden at gcc dot gnu dot org
  2004-12-18 22:49 ` steven at gcc dot gnu dot org
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: neroden at gcc dot gnu dot org @ 2004-12-10 17:09 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From neroden at gcc dot gnu dot org  2004-12-10 17:09 -------
Oh, cool.  :-)  Did you get the patch linked up? 

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch
  2004-10-15 18:00 [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch stuart at apple dot com
                   ` (16 preceding siblings ...)
  2004-12-10 17:09 ` neroden at gcc dot gnu dot org
@ 2004-12-18 22:49 ` steven at gcc dot gnu dot org
  2004-12-19 16:16 ` jh at suse dot cz
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: steven at gcc dot gnu dot org @ 2004-12-18 22:49 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From steven at gcc dot gnu dot org  2004-12-18 22:49 -------
Honza, 
 
PING!!!! 
 
 

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch
  2004-10-15 18:00 [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch stuart at apple dot com
                   ` (17 preceding siblings ...)
  2004-12-18 22:49 ` steven at gcc dot gnu dot org
@ 2004-12-19 16:16 ` jh at suse dot cz
  2004-12-19 19:58 ` pinskia at gcc dot gnu dot org
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: jh at suse dot cz @ 2004-12-19 16:16 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From jh at suse dot cz  2004-12-19 16:16 -------
Subject: Re:  [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch

> 
> ------- Additional Comments From steven at gcc dot gnu dot org  2004-12-18 22:49 -------
> Honza, 
>  
> PING!!!! 

I am still backlogged but I will look into it now.  What about cleaning up tree-inline.c and other
mess?  That thing is hell to merge...

Honza
>  
>  
> 
> -- 
> 
> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019
> 
> ------- You are receiving this mail because: -------
> You are on the CC list for the bug, or are watching someone who is.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch
  2004-10-15 18:00 [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch stuart at apple dot com
                   ` (18 preceding siblings ...)
  2004-12-19 16:16 ` jh at suse dot cz
@ 2004-12-19 19:58 ` pinskia at gcc dot gnu dot org
  2004-12-22 12:20 ` steven at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2004-12-19 19:58 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2004-12-19 19:58 -------
Patch here: <http://gcc.gnu.org/ml/gcc-patches/2004-12/msg01412.html>.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |patch


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch
  2004-10-15 18:00 [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch stuart at apple dot com
                   ` (19 preceding siblings ...)
  2004-12-19 19:58 ` pinskia at gcc dot gnu dot org
@ 2004-12-22 12:20 ` steven at gcc dot gnu dot org
  2004-12-30 13:16 ` cvs-commit at gcc dot gnu dot org
  2004-12-30 13:21 ` steven at gcc dot gnu dot org
  22 siblings, 0 replies; 24+ messages in thread
From: steven at gcc dot gnu dot org @ 2004-12-22 12:20 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From steven at gcc dot gnu dot org  2004-12-22 12:11 -------
One of these days, someone will have to explain to me how such a small
patch for a bug marked "critical regression" can stay unreviewed for
three days...  This kind of patches should *not* need pinging.




-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch
  2004-10-15 18:00 [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch stuart at apple dot com
                   ` (20 preceding siblings ...)
  2004-12-22 12:20 ` steven at gcc dot gnu dot org
@ 2004-12-30 13:16 ` cvs-commit at gcc dot gnu dot org
  2004-12-30 13:21 ` steven at gcc dot gnu dot org
  22 siblings, 0 replies; 24+ messages in thread
From: cvs-commit at gcc dot gnu dot org @ 2004-12-30 13:16 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From cvs-commit at gcc dot gnu dot org  2004-12-30 13:16 -------
Subject: Bug 18019

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	hubicka@gcc.gnu.org	2004-12-30 13:16:14

Modified files:
	gcc            : ChangeLog 
	gcc/config/i386: i386.md 

Log message:
	PR target/18019
	* i386.md (movqi_1): Fix -Os instruction choice.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.6985&r2=2.6986
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/i386/i386.md.diff?cvsroot=gcc&r1=1.594&r2=1.595



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch
  2004-10-15 18:00 [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch stuart at apple dot com
                   ` (21 preceding siblings ...)
  2004-12-30 13:16 ` cvs-commit at gcc dot gnu dot org
@ 2004-12-30 13:21 ` steven at gcc dot gnu dot org
  22 siblings, 0 replies; 24+ messages in thread
From: steven at gcc dot gnu dot org @ 2004-12-30 13:21 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From steven at gcc dot gnu dot org  2004-12-30 13:21 -------
.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2004-12-30 13:21 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-10-15 18:00 [Bug c/18019] New: -march=pentium4 generates word fetch instead of byte fetch stuart at apple dot com
2004-10-15 18:05 ` [Bug c/18019] " stuart at apple dot com
2004-10-15 18:12 ` [Bug target/18019] " pinskia at gcc dot gnu dot org
2004-10-15 18:23 ` pinskia at gcc dot gnu dot org
2004-10-15 18:27 ` stuart at apple dot com
2004-10-15 18:36 ` [Bug target/18019] [4.0 Regression] " pinskia at gcc dot gnu dot org
2004-10-28 15:19 ` steven at gcc dot gnu dot org
2004-10-28 15:22 ` steven at gcc dot gnu dot org
2004-11-09  4:50 ` pinskia at gcc dot gnu dot org
2004-11-09 18:10 ` roger at eyesopen dot com
2004-11-09 19:34 ` stuart at apple dot com
2004-11-09 20:56 ` roger at eyesopen dot com
2004-11-16 19:40 ` stuart at apple dot com
2004-11-27 18:28 ` neroden at gcc dot gnu dot org
2004-12-02  1:08 ` stuart at apple dot com
2004-12-02  3:35 ` neroden at gcc dot gnu dot org
2004-12-02  9:02 ` jh at suse dot cz
2004-12-10 17:09 ` neroden at gcc dot gnu dot org
2004-12-18 22:49 ` steven at gcc dot gnu dot org
2004-12-19 16:16 ` jh at suse dot cz
2004-12-19 19:58 ` pinskia at gcc dot gnu dot org
2004-12-22 12:20 ` steven at gcc dot gnu dot org
2004-12-30 13:16 ` cvs-commit at gcc dot gnu dot org
2004-12-30 13:21 ` steven at gcc dot gnu dot org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).