From: "Fang, Changpeng"
To: Ian Bolton
CC: "gcc@gcc.gnu.org"
Date: Thu, 09 Sep 2010 23:13:00 -0000
Subject: RE: How to avoid auto-vectorization for this loop (rolls at most 3 times)
In-Reply-To: <680044E4997F5343A2C58032DDD099161733F9@ZIPPY.Emea.Arm.com>
X-SW-Source: 2010-09/txt/msg00135.txt.bz2

>> It seems the auto-vectorizer could not recognize that this loop will
>> roll at most 3 times.
>> And it will generate quite messy code.
>>
>> int a[1024], b[1024];
>> void foo (int n)
>> {
>>   int i;
>>   for (i = (n/4)*4; i< n; i++)
>>     a[i] = a[i] + b[i];
>> }
>>
>> How can we correctly estimate the number of iterations for this case
>> and use this info for the vectorizer?

>Does it recognise it if you rewrite the loop as follows:

>for (i = n&~0x3; i< n; i++)
>  a[i] = a[i] + b[i];

No.

But it is OK for the following case:

for (i = n-3; i< n; i++)
  a[i] = a[i] + b[i];

It seems to fail in the "unknown but small" case. Anyway, this mostly
affects compilation time and code size, and has limited impact on
performance.

For

for (i = n&~0x3; i< n; i++)
  a[i] = a[i] + b[i];

the attached foo-O3-no-tree-vectorize.s is what we expect from the
optimizer. The code in foo-O3.s is much worse.

Thanks,

Changpeng

Attachment: foo-O3-no-tree-vectorize.s (500 bytes)

	.file	"foo1.c"
	.text
	.p2align 4,,15
	.globl	foo
	.type	foo, @function
foo:
.LFB0:
	.cfi_startproc
	movl	%edi, %eax
	andl	$-4, %eax
	cmpl	%eax, %edi
	jle	.L1
	.p2align 4,,10
	.p2align 3
.L4:
	movslq	%eax, %rdx
	addl	$1, %eax
	movl	a(,%rdx,4), %ecx
	addl	b(,%rdx,4), %ecx
	cmpl	%edi, %eax
	movl	%ecx, a(,%rdx,4)
	jne	.L4
.L1:
	rep
	ret
	.cfi_endproc
.LFE0:
	.size	foo, .-foo
	.comm	b,4096,32
	.comm	a,4096,32
	.ident	"GCC: (GNU) 4.6.0 20100831 (experimental)"
	.section	.note.GNU-stack,"",@progbits

Attachment: foo-O3.s (1634 bytes)

	.file	"foo1.c"
	.text
	.p2align 4,,15
	.globl	foo
	.type	foo, @function
foo:
.LFB0:
	.cfi_startproc
	movl	%edi, %esi
	pushq	%rbp
	.cfi_def_cfa_offset 16
	andl	$-4, %esi
	cmpl	%esi, %edi
	pushq	%rbx
	.cfi_def_cfa_offset 24
	jle	.L1
	.cfi_offset 3, -24
	.cfi_offset 6, -16
	movslq	%esi, %rbx
	movl	%edi, %r9d
	leaq	a(,%rbx,4), %r8
	subl	%esi, %r9d
	andl	$15, %r8d
	shrq	$2, %r8
	negl	%r8d
	andl	$3, %r8d
	cmpl	%r9d, %r8d
	cmova	%r9d, %r8d
	testl	%r8d, %r8d
	mov	%r8d, %ebp
	je	.L8
	movl	%esi, %eax
	.p2align 4,,10
	.p2align 3
.L4:
	movslq	%eax, %rdx
	addl	$1, %eax
	movl	a(,%rdx,4), %ecx
	addl	b(,%rdx,4), %ecx
	movl	%ecx, a(,%rdx,4)
	movl	%eax, %edx
	subl	%esi, %edx
	cmpl	%edx, %r8d
	ja	.L4
	cmpl	%r8d, %r9d
	je	.L1
.L3:
	movl	%r9d, %r11d
	subl	%r8d, %r11d
	movl	%r11d, %r8d
	shrl	$2, %r8d
	leal	0(,%r8,4), %r10d
	testl	%r10d, %r10d
	je	.L9
	leaq	(%rbx,%rbp), %r9
	xorl	%edx, %edx
	xorl	%ecx, %ecx
	salq	$2, %r9
	leaq	a(%r9), %rsi
	addq	$b, %r9
	.p2align 4,,10
	.p2align 3
.L6:
	movdqu	(%r9,%rdx), %xmm0
	addl	$1, %ecx
	paddd	(%rsi,%rdx), %xmm0
	movdqa	%xmm0, (%rsi,%rdx)
	addq	$16, %rdx
	cmpl	%r8d, %ecx
	jb	.L6
	addl	%r10d, %eax
	cmpl	%r10d, %r11d
	je	.L1
	.p2align 4,,10
	.p2align 3
.L9:
	movslq	%eax, %rdx
	addl	$1, %eax
	movl	a(,%rdx,4), %ecx
	addl	b(,%rdx,4), %ecx
	cmpl	%eax, %edi
	movl	%ecx, a(,%rdx,4)
	jg	.L9
.L1:
	popq	%rbx
	.cfi_remember_state
	.cfi_def_cfa_offset 16
	popq	%rbp
	.cfi_def_cfa_offset 8
	ret
.L8:
	.cfi_restore_state
	movl	%esi, %eax
	jmp	.L3
	.cfi_endproc
.LFE0:
	.size	foo, .-foo
	.comm	b,4096,32
	.comm	a,4096,32
	.ident	"GCC: (GNU) 4.6.0 20100831 (experimental)"
	.section	.note.GNU-stack,"",@progbits
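
The "rolls at most 3 times" claim in the message can be checked with a small sketch. The helper names remainder_trip_count and check_trip_count_bound are hypothetical and exist only for illustration. The NO_VECTORIZE macro is an assumption as well: it relies on GCC's optimize function attribute with "no-tree-vectorize" as a per-function counterpart of the -fno-tree-vectorize flag that the attachment name suggests. Nobody in the thread proposes this workaround, and the GCC manual describes the attribute as a debugging aid rather than something for production code.

```c
#include <assert.h>

int a[1024], b[1024];

/* Assumption: GCC's optimize attribute; expands to nothing elsewhere. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_VECTORIZE __attribute__((optimize("no-tree-vectorize")))
#else
#define NO_VECTORIZE
#endif

/* Hypothetical helper: counts the iterations of the loop from the message
   without touching the arrays. */
int remainder_trip_count(int n)
{
    int count = 0;
    for (int i = (n / 4) * 4; i < n; i++)
        count++;
    return count;
}

/* Hypothetical helper: for every n that keeps a[i] in bounds, the loop runs
   n % 4 times (so at most 3), and both start expressions from the thread
   agree for non-negative n. */
void check_trip_count_bound(void)
{
    for (int n = 0; n <= 1024; n++) {
        assert(remainder_trip_count(n) == n % 4);
        assert(remainder_trip_count(n) <= 3);
        assert((n / 4) * 4 == (n & ~0x3));
    }
}

/* The loop from the message, with vectorization switched off for this one
   function (on GCC only, under the assumption stated above). */
NO_VECTORIZE
void foo(int n)
{
    for (int i = (n / 4) * 4; i < n; i++)
        a[i] = a[i] + b[i];
}
```

Calling check_trip_count_bound() from a test driver exercises every valid n. Comparing the output of gcc -O3 -S with and without the attribute would show whether the scalar loop is kept on a given compiler version.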