Date: Tue, 16 Nov 2004 19:40:00 -0000
Message-ID: <20041116194007.31871.qmail@sourceware.org>
From: "stuart at apple dot com"
To: gcc-bugs@gcc.gnu.org
In-Reply-To: <20041015180025.18019.stuart@apple.com>
References: <20041015180025.18019.stuart@apple.com>
Reply-To: gcc-bugzilla@gcc.gnu.org
Subject: [Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch

------- Additional Comments From stuart at apple dot com  2004-11-16 19:39 -------
Here is the body of an email I sent to Jan Hubicka concerning this bug.
In the body of the message, 'you' refers to Jan.

--------------------------------------------------------------

For discussion, here is the pattern in question as it exists on the FSF
mainline today:

1503 ;; Situation is quite tricky about when to choose full sized (SImode) move
1504 ;; over QImode moves.  For Q_REG -> Q_REG move we use full size only for
1505 ;; partial register dependency machines (such as AMD Athlon), where QImode
1506 ;; moves issue extra dependency and for partial register stalls machines
1507 ;; that don't use QImode patterns (and QImode move cause stall on the next
1508 ;; instruction).
1509 ;;
1510 ;; For loads of Q_REG to NONQ_REG we use full sized moves except for partial
1511 ;; register stall machines with, where we use QImode instructions, since
1512 ;; partial register stall can be caused there.  Then we use movzx.
1513 (define_insn "*movqi_1"
1514   [(set (match_operand:QI 0 "nonimmediate_operand" "=q,q ,q ,r,r ,?r,m")
1515         (match_operand:QI 1 "general_operand"      " q,qn,qm,q,rn,qm,qn"))]
1516   "GET_CODE (operands[0]) != MEM || GET_CODE (operands[1]) != MEM"
1517 {
1518   switch (get_attr_type (insn))
1519     {
1520     case TYPE_IMOVX:
1521       if (!ANY_QI_REG_P (operands[1]) && GET_CODE (operands[1]) != MEM)
1522         abort ();
1523       return "movz{bl|x}\t{%1, %k0|%k0, %1}";
1524     default:
1525       if (get_attr_mode (insn) == MODE_SI)
1526         return "mov{l}\t{%k1, %k0|%k0, %k1}";
1527       else
1528         return "mov{b}\t{%1, %0|%0, %1}";
1529     }
1530 }
1531   [(set (attr "type")
1532      (cond [(ne (symbol_ref "optimize_size") (const_int 0))
1533               (const_string "imov")
1534             (and (eq_attr "alternative" "3")
1535                  (ior (eq (symbol_ref "TARGET_PARTIAL_REG_STALL")
1536                           (const_int 0))
1537                       (eq (symbol_ref "TARGET_QIMODE_MATH")
1538                           (const_int 0))))
1539               (const_string "imov")
1540             (eq_attr "alternative" "3,5")
1541               (const_string "imovx")
1542             (and (ne (symbol_ref "TARGET_MOVX")
1543                      (const_int 0))
1544                  (eq_attr "alternative" "2"))
1545               (const_string "imovx")
1546            ]
1547            (const_string "imov")))
1548    (set (attr "mode")
1549      (cond [(eq_attr "alternative" "3,4,5")
1550               (const_string "SI")
1551             (eq_attr "alternative" "6")
1552               (const_string "QI")
1553             (eq_attr "type" "imovx")
1554               (const_string "SI")
1555             (and (eq_attr "type" "imov")
1556                  (and (eq_attr "alternative" "0,1,2")
1557                       (ne (symbol_ref "TARGET_PARTIAL_REG_DEPENDENCY")
1558                           (const_int 0))))
1559               (const_string "SI")
1560             ;; Avoid partial register stalls when not using QImode arithmetic
1561             (and (eq_attr "type" "imov")
1562                  (and (eq_attr "alternative" "0,1,2")
1563                       (and (ne (symbol_ref "TARGET_PARTIAL_REG_STALL")
1564                                (const_int 0))
1565                            (eq (symbol_ref "TARGET_QIMODE_MATH")
1566                                (const_int 0)))))
1567               (const_string "SI")
1568            ]
1569            (const_string "QI")))])

Roger added lines 1532-1533 in January of this year. It looks like you added
lines 1555-1567 in 2000. The combination of lines 1532-1533 (use "imov" if
-Os) and lines 1555-1559 (use SImode if "imov" and byte-load and
K8/P4/Nocona) means we generate a "movl" that should be a "movb". (The
testcase is strcpy(); see the Bugzilla.)

For the following discussion, note that GCC currently matches "*movqi_1"
alternative #2 (constraints "q" and "qm") on the critical byte fetch from
memory in the strcpy() testcase.

It appears to me that the 1555-1559 clause depends on the assumption that
any CPU with TARGET_MOVX always chooses "imovx" (movzbl) over "imov". If
some not-yet-existing CPU had TARGET_PARTIAL_REG_DEPENDENCY but did /not/
have TARGET_MOVX support, I think this would generate a 'movl' where a
'movb' is required. (Yes, I agree no such CPU exists, nor is one likely to
be built.) Roger's patch made it choose "imov" because it's a smaller
instruction than imovx (one byte versus two, plus whatever additional
mod/r/m glop).

Furthermore, as Roger pointed out in the Bugzilla, if we've chosen to use
imovx, what do we gain by marking it with SImode? It appears to generate
the same movzx/movzbl instruction either way. (I am really confused by
this.)
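To make the failure concrete before I list possible fixes, here is a minimal
byte-copy loop of my own. It is only an illustration, not the testcase in the
Bugzilla, and the function name and flags are just for discussion:

/* Sketch of the strcpy()-style byte fetch under discussion.  Built with
   something like "-Os -march=pentium4" (optimize_size plus
   TARGET_PARTIAL_REG_DEPENDENCY), the load of *src below is the kind of
   QImode fetch that currently comes out as a 4-byte "movl".  A "movl" of
   a byte object can read past the end of the string and fault if the
   following bytes land in an unmapped page, which is why a byte fetch is
   wanted here.  */

char *
my_strcpy (char *dst, const char *src)
{
  char *d = dst;

  for (;;)
    {
      char c = *src++;	/* byte fetch: should be mov{b} or movz{bl|x} */
      *d++ = c;
      if (c == '\0')
	break;
    }
  return dst;
}

With the mode attribute as it stands, such a load matches alternative #2 and
is emitted through the MODE_SI arm of the template above (mov{l}); with mode
QI it would use the mov{b} arm.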
I can think of several potential fixes. All are suspect, because I'm not
convinced I fully understand what's going on:

a) revert Roger's patch (I'd prefer to keep it)
b) remove the ",2" from line 1556 (so the clause won't apply to byte loads)
c) remove lines 1555-1559 (allow byte-register/byte-register mov insns on
   P4-class CPUs? not clear to me)
d) mark the bug "behaves correctly"
e) some idea of Jan's that is much smarter than any of the above

If you immediately see how this should work, please make a suggestion, or
even commit a fix. I claim no expertise in this area.

As a side note, I'm curious about this usage of TARGET_QIMODE_MATH; CVS
suggests that you made it synonymous with TRUE in March 2000. The following
April, you added lines 1563-1566. If TARGET_QIMODE_MATH is always true, will
the clause on lines 1561-1567 ever be used? Again, I just don't understand
how this is supposed to work.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019