Date: Tue, 16 Nov 2004 19:40:00 -0000
Message-ID: <20041116194007.31871.qmail@sourceware.org>
From: "stuart at apple dot com"
To: gcc-bugs@gcc.gnu.org
In-Reply-To: <20041015180025.18019.stuart@apple.com>
References: <20041015180025.18019.stuart@apple.com>
Reply-To: gcc-bugzilla@gcc.gnu.org
Subject: [Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch

------- Additional Comments From stuart at apple dot com  2004-11-16 19:39 -------
Here is the body of an email I sent to Jan Hubicka concerning this bug.
In the body of the message, 'you' refers to Jan.

--------------------------------------------------------------

For discussion, here is the pattern in question as it exists on the FSF
mainline today:

1503 ;; Situation is quite tricky about when to choose full sized (SImode) move
1504 ;; over QImode moves.  For Q_REG -> Q_REG move we use full size only for
1505 ;; partial register dependency machines (such as AMD Athlon), where QImode
1506 ;; moves issue extra dependency and for partial register stalls machines
1507 ;; that don't use QImode patterns (and QImode move cause stall on the next
1508 ;; instruction).
1509 ;;
1510 ;; For loads of Q_REG to NONQ_REG we use full sized moves except for partial
1511 ;; register stall machines with, where we use QImode instructions, since
1512 ;; partial register stall can be caused there.  Then we use movzx.
1513 (define_insn "*movqi_1"
1514   [(set (match_operand:QI 0 "nonimmediate_operand" "=q,q ,q ,r,r ,?r,m")
1515         (match_operand:QI 1 "general_operand"      " q,qn,qm,q,rn,qm,qn"))]
1516   "GET_CODE (operands[0]) != MEM || GET_CODE (operands[1]) != MEM"
1517 {
1518   switch (get_attr_type (insn))
1519     {
1520     case TYPE_IMOVX:
1521       if (!ANY_QI_REG_P (operands[1]) && GET_CODE (operands[1]) != MEM)
1522         abort ();
1523       return "movz{bl|x}\t{%1, %k0|%k0, %1}";
1524     default:
1525       if (get_attr_mode (insn) == MODE_SI)
1526         return "mov{l}\t{%k1, %k0|%k0, %k1}";
1527       else
1528         return "mov{b}\t{%1, %0|%0, %1}";
1529     }
1530 }
1531   [(set (attr "type")
1532      (cond [(ne (symbol_ref "optimize_size") (const_int 0))
1533               (const_string "imov")
1534             (and (eq_attr "alternative" "3")
1535                  (ior (eq (symbol_ref "TARGET_PARTIAL_REG_STALL")
1536                           (const_int 0))
1537                       (eq (symbol_ref "TARGET_QIMODE_MATH")
1538                           (const_int 0))))
1539               (const_string "imov")
1540             (eq_attr "alternative" "3,5")
1541               (const_string "imovx")
1542             (and (ne (symbol_ref "TARGET_MOVX")
1543                      (const_int 0))
1544                  (eq_attr "alternative" "2"))
1545               (const_string "imovx")
1546            ]
1547            (const_string "imov")))
1548    (set (attr "mode")
1549      (cond [(eq_attr "alternative" "3,4,5")
1550               (const_string "SI")
1551             (eq_attr "alternative" "6")
1552               (const_string "QI")
1553             (eq_attr "type" "imovx")
1554               (const_string "SI")
1555             (and (eq_attr "type" "imov")
1556                  (and (eq_attr "alternative" "0,1,2")
1557                       (ne (symbol_ref "TARGET_PARTIAL_REG_DEPENDENCY")
1558                           (const_int 0))))
1559               (const_string "SI")
1560             ;; Avoid partial register stalls when not using QImode arithmetic
1561             (and (eq_attr "type" "imov")
1562                  (and (eq_attr "alternative" "0,1,2")
1563                       (and (ne (symbol_ref "TARGET_PARTIAL_REG_STALL")
1564                                (const_int 0))
1565                            (eq (symbol_ref "TARGET_QIMODE_MATH")
1566                                (const_int 0)))))
1567               (const_string "SI")
1568            ]
1569            (const_string "QI")))])

Roger added lines 1532-1533 in January of this year. It looks like you added
lines 1555-1567 in 2000. The combination of lines 1532-1533 (use "imov" if
-Os) and lines 1555-1559 (use SImode if "imov" and byte-load and
K8/P4/Nocona) means we generate a "movl" that should be a "movb". (The
testcase is strcpy(); see the Bugzilla.)

For the following discussion, note that GCC currently matches "*movqi_1"
alternative #2 (constraints "q" and "qm") on the critical byte fetch from
memory in the strcpy() testcase.

It appears to me that the 1555-1559 clause depends on the assumption that
any CPU with TARGET_MOVX always chooses "imovx" (movzbl) over "imov". If
some not-yet-existing CPU had TARGET_PARTIAL_REG_DEPENDENCY but did /not/
have TARGET_MOVX support, I think this would generate a 'movl' where a
'movb' is required. (Yes, I agree no such CPU exists, nor is one likely to
be built.) Roger's patch made it choose "imov" because it's a smaller
instruction than imovx (one byte versus two, plus whatever additional
mod/r/m glop).

Furthermore, as Roger pointed out in the Bugzilla, if we've chosen to use
imovx, what do we gain by marking it with SImode? It appears to generate
the same movzx/movzbl instruction either way. (I am really confused by
this.)
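To make the failure concrete before I list possible fixes, here is a minimal
byte-copy loop of my own. It is only an illustration, not the testcase in the
Bugzilla, and the function name and flags are just for discussion:

/* Sketch of the strcpy()-style byte fetch under discussion.  Built with
   something like "-Os -march=pentium4" (optimize_size plus
   TARGET_PARTIAL_REG_DEPENDENCY), the load of *src below is the kind of
   QImode fetch that currently comes out as a 4-byte "movl".  A "movl" of
   a byte object can read past the end of the string and fault if the
   following bytes land in an unmapped page, which is why a byte fetch is
   wanted here.  */

char *
my_strcpy (char *dst, const char *src)
{
  char *d = dst;

  for (;;)
    {
      char c = *src++;	/* byte fetch: should be mov{b} or movz{bl|x} */
      *d++ = c;
      if (c == '\0')
	break;
    }
  return dst;
}

With the mode attribute as it stands, such a load matches alternative #2 and
is emitted through the MODE_SI arm of the template above (mov{l}); with mode
QI it would use the mov{b} arm.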
I can think of several potential fixes. All are suspect, because I'm not
convinced I fully understand what's going on:

a) revert Roger's patch (I'd prefer to keep it)
b) remove the ",2" from line 1556 (so the clause won't apply to byte loads)
c) remove lines 1555-1559 (allow byte-register/byte-register mov insns on
   P4-class CPUs? not clear to me)
d) mark the bug "behaves correctly"
e) some idea of Jan's that is much smarter than any of the above

If you immediately see how this should work, please make a suggestion, or
even commit a fix. I claim no expertise in this area.

As a side note, I'm curious about this usage of TARGET_QIMODE_MATH; CVS
suggests that you made it synonymous with TRUE in March 2000. The following
April, you added lines 1563-1566. If TARGET_QIMODE_MATH is always true, will
the clause on lines 1561-1567 ever be used? Again, I just don't understand
how this is supposed to work.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019