From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 30579 invoked by alias); 8 Dec 2002 06:06:01 -0000
Mailing-List: contact gcc-prs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Archive:
List-Post:
List-Help:
Sender: gcc-prs-owner@gcc.gnu.org
Received: (qmail 30560 invoked by uid 71); 8 Dec 2002 06:06:01 -0000
Resent-Date: 8 Dec 2002 06:06:01 -0000
Resent-Message-ID: <20021208060601.30559.qmail@sources.redhat.com>
Resent-From: gcc-gnats@gcc.gnu.org (GNATS Filer)
Resent-Cc: gcc-prs@gcc.gnu.org, gcc-bugs@gcc.gnu.org
Resent-Reply-To: gcc-gnats@gcc.gnu.org, otaylor@redhat.com
Received: (qmail 30033 invoked by uid 61); 8 Dec 2002 05:59:46 -0000
Message-Id: <20021208055946.30032.qmail@sources.redhat.com>
Date: Sat, 07 Dec 2002 22:06:00 -0000
From: otaylor@redhat.com
Reply-To: otaylor@redhat.com
To: gcc-gnats@gcc.gnu.org
X-Send-Pr-Version: gnatsweb-2.9.3 (1.1.1.1.2.31)
Subject: target/8871: Inefficient zero_extendsidi2 for MMX
X-SW-Source: 2002-12/txt/msg00458.txt.bz2
List-Id:

>Number: 8871
>Category: target
>Synopsis: Inefficient zero_extendsidi2 for MMX
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: unassigned
>State: open
>Class: pessimizes-code
>Submitter-Id: net
>Arrival-Date: Sat Dec 07 22:06:01 PST 2002
>Closed-Date:
>Last-Modified:
>Originator: otaylor@redhat.com
>Release: CVS Head, 7 December 2002
>Organization:
>Environment:
Linux/ia32
>Description:
When moving a 32-bit quantity into an MMX register, GCC first
zero-extends it in the integer registers, as it would when emulating
64-bit arithmetic, then stores the pair to the stack and uses movq to
load it into the MMX register. So it generates code like:

===
	xorl	%edx, %edx
	movl	%eax, -16(%ebp)
	movl	%edx, -12(%ebp)
	movq	-16(%ebp), %mm1
===

instead of simply:

===
	movd	%eax, %mm1
===

This (and the associated overhead) causes a pretty big hit for the
typical uses of MMX: the attached demonstration patch improved one
alpha-compositing routine from 29 million pixels/sec to 51 million
pixels/sec.
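As a rough illustration of why the single movd is sufficient, the following C sketch models both instruction sequences as plain integer functions and checks that they produce the same 64-bit value. The function names are hypothetical and exist only for this sketch; it says nothing about how GCC itself represents the operation, and it relies only on the fact that movd from a 32-bit register into an MMX register clears the upper 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the four-instruction sequence: the value is
   split into a low word and an explicitly zeroed high word, and the
   two words are then read back as one 64-bit quantity. */
static uint64_t zext_two_words(uint32_t a)
{
    uint32_t lo = a;                   /* movl %eax, -16(%ebp)            */
    uint32_t hi = 0;                   /* xorl %edx, %edx; movl %edx, ... */
    return ((uint64_t)hi << 32) | lo;  /* movq -16(%ebp), %mm1            */
}

/* Hypothetical model of the single instruction: movd from a 32-bit
   register zero-fills the upper half of the MMX register. */
static uint64_t zext_direct(uint32_t a)
{
    return (uint64_t)a;                /* movd %eax, %mm1 */
}

/* Returns nonzero when both models agree for the given input. */
static int sequences_agree(uint32_t a)
{
    return zext_two_words(a) == zext_direct(a);
}

/* Spot-check a few inputs, including ones with the top bit set, where
   an accidental sign extension would show up. */
static void check_samples(void)
{
    static const uint32_t samples[] = {
        0u, 1u, 0x7FFFFFFFu, 0x80000000u, 0xFFFFFFFFu
    };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(sequences_agree(samples[i]));
}
```

Calling check_samples() from a test harness exercises both models; the point is only that the two sequences compute the same value, so the extra stores and the load are pure overhead.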
(With the patch, results for a range of routines were comparable to
hand-written assembly.)

The attached patch just replaces the existing patterns for
zero_extendsidi2 with a pattern using movd. This is clearly wrong,
but my minimal GCC hacking skills proved unequal to integrating
it in properly.
>How-To-Repeat:
A simple example demonstrating the code generation is:

===
typedef int di __attribute__ ((mode(DI)));

di foo (unsigned int a, unsigned int b)
{
  return __builtin_ia32_por (a, b);
}
===
>Fix:
>Release-Note:
>Audit-Trail:
>Unformatted:
----gnatsweb-attachment----
Content-Type: application/octet-stream; name="zero_extend.patch"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="zero_extend.patch"

SW5kZXg6IGkzODYubWQKPT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09 PT09PT09PT09PT09PT09PT09PT09PT09PQpSQ1MgZmlsZTogL2N2c3Jvb3QvZ2NjL2djYy9nY2Mv Y29uZmlnL2kzODYvaTM4Ni5tZCx2CnJldHJpZXZpbmcgcmV2aXNpb24gMS40MDQKZGlmZiAtdSAt cCAtcjEuNDA0IGkzODYubWQKLS0tIGkzODYubWQJMTkgTm92IDIwMDIgMjI6NTI6NDAgLTAwMDAJ MS40MDQKKysrIGkzODYubWQJOCBEZWMgMjAwMiAwNTo0MjozNCAtMDAwMApAQCAtMzAzMSw2MCAr MzAzMSwxMiBAQAogCSAgICAgIChjbG9iYmVyIChyZWc6Q0MgMTcpKV0pXQogICAiIikKIAotOzsg JSUlIEtpbGwgbWUgb25jZSBtdWx0aS13b3JkIG9wcyBhcmUgc2FuZS4KLShkZWZpbmVfZXhwYW5k ICJ6ZXJvX2V4dGVuZHNpZGkyIgotICBbKHNldCAobWF0Y2hfb3BlcmFuZDpESSAwICJyZWdpc3Rl cl9vcGVyYW5kIiAiPXIiKQotICAgICAoemVyb19leHRlbmQ6REkgKG1hdGNoX29wZXJhbmQ6U0kg MSAibm9uaW1tZWRpYXRlX29wZXJhbmQiICJybSIpKSldCi0gICIiCi0gICJpZiAoIVRBUkdFVF82 NEJJVCkKLSAgICAgewotICAgICAgIGVtaXRfaW5zbiAoZ2VuX3plcm9fZXh0ZW5kc2lkaTJfMzIg KG9wZXJhbmRzWzBdLCBvcGVyYW5kc1sxXSkpOwotICAgICAgIERPTkU7Ci0gICAgIH0KLSAgIikK LQotKGRlZmluZV9pbnNuICJ6ZXJvX2V4dGVuZHNpZGkyXzMyIgotICBbKHNldCAobWF0Y2hfb3Bl cmFuZDpESSAwICJub25pbW1lZGlhdGVfb3BlcmFuZCIgIj1yLD9yLD8qbyIpCi0JKHplcm9fZXh0 ZW5kOkRJIChtYXRjaF9vcGVyYW5kOlNJIDEgIm5vbmltbWVkaWF0ZV9vcGVyYW5kIiAiMCxybSxy IikpKQotICAgKGNsb2JiZXIgKHJlZzpDQyAxNykpXQotICAiIVRBUkdFVF82NEJJVCIKLSAgIiMi 
Ci0gIFsoc2V0X2F0dHIgIm1vZGUiICJTSSIpXSkKLQotKGRlZmluZV9pbnNuICJ6ZXJvX2V4dGVu ZHNpZGkyX3JleDY0IgotICBbKHNldCAobWF0Y2hfb3BlcmFuZDpESSAwICJub25pbW1lZGlhdGVf b3BlcmFuZCIgIj1yLG8iKQotICAgICAoemVyb19leHRlbmQ6REkgKG1hdGNoX29wZXJhbmQ6U0kg MSAibm9uaW1tZWRpYXRlX29wZXJhbmQiICJybSwwIikpKV0KLSAgIlRBUkdFVF82NEJJVCIKLSAg IkAKLSAgIG1vdlx0eyVrMSwgJWswfCVrMCwgJWsxfQotICAgIyIKLSAgWyhzZXRfYXR0ciAidHlw ZSIgImltb3Z4LGltb3YiKQotICAgKHNldF9hdHRyICJtb2RlIiAiU0ksREkiKV0pCi0KLShkZWZp bmVfc3BsaXQKLSAgWyhzZXQgKG1hdGNoX29wZXJhbmQ6REkgMCAibWVtb3J5X29wZXJhbmQiICIi KQotICAgICAoemVyb19leHRlbmQ6REkgKG1hdGNoX2R1cCAwKSkpXQotICAiVEFSR0VUXzY0QklU IgotICBbKHNldCAobWF0Y2hfZHVwIDQpIChjb25zdF9pbnQgMCkpXQotICAic3BsaXRfZGkgKCZv cGVyYW5kc1swXSwgMSwgJm9wZXJhbmRzWzNdLCAmb3BlcmFuZHNbNF0pOyIpCi0KLShkZWZpbmVf c3BsaXQgCi0gIFsoc2V0IChtYXRjaF9vcGVyYW5kOkRJIDAgInJlZ2lzdGVyX29wZXJhbmQiICIi KQotCSh6ZXJvX2V4dGVuZDpESSAobWF0Y2hfb3BlcmFuZDpTSSAxICJyZWdpc3Rlcl9vcGVyYW5k IiAiIikpKQotICAgKGNsb2JiZXIgKHJlZzpDQyAxNykpXQotICAiIVRBUkdFVF82NEJJVCAmJiBy ZWxvYWRfY29tcGxldGVkCi0gICAmJiB0cnVlX3JlZ251bSAob3BlcmFuZHNbMF0pID09IHRydWVf cmVnbnVtIChvcGVyYW5kc1sxXSkiCi0gIFsoc2V0IChtYXRjaF9kdXAgNCkgKGNvbnN0X2ludCAw KSldCi0gICJzcGxpdF9kaSAoJm9wZXJhbmRzWzBdLCAxLCAmb3BlcmFuZHNbM10sICZvcGVyYW5k c1s0XSk7IikKLQotKGRlZmluZV9zcGxpdCAKLSAgWyhzZXQgKG1hdGNoX29wZXJhbmQ6REkgMCAi bm9uaW1tZWRpYXRlX29wZXJhbmQiICIiKQotCSh6ZXJvX2V4dGVuZDpESSAobWF0Y2hfb3BlcmFu ZDpTSSAxICJnZW5lcmFsX29wZXJhbmQiICIiKSkpCi0gICAoY2xvYmJlciAocmVnOkNDIDE3KSld Ci0gICIhVEFSR0VUXzY0QklUICYmIHJlbG9hZF9jb21wbGV0ZWQiCi0gIFsoc2V0IChtYXRjaF9k dXAgMykgKG1hdGNoX2R1cCAxKSkKLSAgIChzZXQgKG1hdGNoX2R1cCA0KSAoY29uc3RfaW50IDAp KV0KLSAgInNwbGl0X2RpICgmb3BlcmFuZHNbMF0sIDEsICZvcGVyYW5kc1szXSwgJm9wZXJhbmRz WzRdKTsiKQorKGRlZmluZV9pbnNuICJ6ZXJvX2V4dGVuZHNpZGkyIgorICBbKHNldCAobWF0Y2hf b3BlcmFuZDpESSAwICJub25pbW1lZGlhdGVfb3BlcmFuZCIgIj15IikKKwkoemVyb19leHRlbmQ6 REkgKG1hdGNoX29wZXJhbmQ6U0kgMSAibm9uaW1tZWRpYXRlX29wZXJhbmQiICJybSIpKSldCisg 
ICJUQVJHRVRfTU1YIgorICAibW92ZFx0eyUxLCAlMHwlMCwgJTF9IgorICBbKHNldF9hdHRyICJt b2RlIiAiREkiKV0pCiAKIChkZWZpbmVfaW5zbiAiemVyb19leHRlbmRoaWRpMiIKICAgWyhzZXQg KG1hdGNoX29wZXJhbmQ6REkgMCAicmVnaXN0ZXJfb3BlcmFuZCIgIj1yLHIiKQo=