From mboxrd@z Thu Jan  1 00:00:00 1970
Received: (qmail 14822 invoked by alias); 17 May 2012 18:35:34 -0000
Received: (qmail 14711 invoked by uid 22791); 17 May 2012 18:35:33 -0000
X-SWARE-Spam-Status: No, hits=-2.5 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00,KHOP_THREADED,TW_DD,TW_DQ,TW_PX,TW_VD,TW_XX,TW_ZJ
X-Spam-Check-By: sourceware.org
Received: from localhost (HELO gcc.gnu.org) (127.0.0.1) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Thu, 17 May 2012 18:35:21 +0000
From: "ubizjak at gmail dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/53346] [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
Date: Thu, 17 May 2012 18:35:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: ubizjak at gmail dot com
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 4.8.0
X-Bugzilla-Changed-Fields: Status Last reconfirmed Ever Confirmed
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
Content-Type: text/plain; charset="UTF-8"
MIME-Version: 1.0
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2012-05/txt/msg01750.txt.bz2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346

Uros Bizjak changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2012-05-17
     Ever Confirmed|0                           |1

--- Comment #3 from Uros Bizjak 2012-05-17 18:29:12 UTC ---
Confirmed, -O2 vs. -O2 -ftree-vectorize on x86_64:

-O2 -ftree-vectorize:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 43.83      9.73     9.73       64     0.15     0.15  cptrf2_
 40.68     18.76     9.03     6685     0.00     0.00  trs2a2.2054
  7.70     20.47     1.71       64     0.03     0.03  gentrs_
  1.49     20.80     0.33       64     0.01     0.01  cptrf1_
  1.40     21.11     0.31        1     0.31    12.33  matsim_
  1.40     21.42     0.31     6685     0.00     0.00  invima.2045
  1.13     21.67     0.25       64     0.00     0.00  cmpcpt_

-O2:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 55.20      9.20     9.20     6685     0.00     0.00  trs2a2.2054
 23.40     13.10     3.90       64     0.06     0.06  cptrf2_
 10.38     14.83     1.73       64     0.03     0.03  gentrs_
  2.58     15.26     0.43       64     0.01     0.01  cptrf1_
  2.34     15.65     0.39     6685     0.00     0.00  invima.2045
  1.98     15.98     0.33        1     0.33     6.58  matsim_
  1.14     16.17     0.19       64     0.00     0.00  cmpcpt_

cptrf2_ runtime increased by almost 6 seconds (9.73 s vs. 3.90 s self time)!

The only vectorization is in:

3530: LOOP VECTORIZED.
rnflow.f90:3510: note: vectorized 1 loops in function.

Which corresponds to:

! ______________________________________________________________________
      real, dimension (1:nxtr), intent (in)     :: xxtrt ! extrema
      integer, intent (in)                      :: nxtr  ! their count
      integer, dimension (1:nxtr), intent (out) :: ixtrt ! indices
      integer, intent (out)                     :: kerr  ! error code
! ______________________________________________________________________
!
      kerr = 0
      ixtrt = 0    <<<<<<<<<<<<<< HERE

This vectorization results in the zeroing of a memory area, 16 bytes per
iteration:

        pxor    %xmm0, %xmm0
        leaq    (%rdx,%r8,4), %r8
        xorl    %esi, %esi
        .p2align 4,,10
        .p2align 3
.L183:
        addq    $1, %rsi
        movdqa  %xmm0, (%r8)
        addq    $16, %r8
        cmpq    %rsi, %r11
        ja      .L183

And this causes a 6 second difference?!
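
To make the comparison between the two flat profiles explicit, here is a small sketch that subtracts the "self seconds" column of the -O2 run from that of the -O2 -ftree-vectorize run. The numbers are copied from the gprof output quoted in this comment; the dictionary names and the helper self_time_deltas are illustrative only and are not part of gprof or GCC.

```python
# Self seconds per function, copied from the two gprof flat profiles
# quoted in this comment (hypothetical variable names).
vectorized = {
    "cptrf2_": 9.73,
    "trs2a2.2054": 9.03,
    "gentrs_": 1.71,
    "cptrf1_": 0.33,
    "matsim_": 0.31,
    "invima.2045": 0.31,
    "cmpcpt_": 0.25,
}
plain_o2 = {
    "trs2a2.2054": 9.20,
    "cptrf2_": 3.90,
    "gentrs_": 1.73,
    "cptrf1_": 0.43,
    "invima.2045": 0.39,
    "matsim_": 0.33,
    "cmpcpt_": 0.19,
}


def self_time_deltas(new, old):
    """Return (name, new - old) pairs in seconds, largest slowdown first."""
    deltas = {
        name: round(new[name] - old[name], 2)
        for name in new
        if name in old
    }
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)


deltas = self_time_deltas(vectorized, plain_o2)
for name, delta in deltas:
    print(f"{name:<12} {delta:+.2f} s")
```

With these inputs cptrf2_ is the only function with a large change (about +5.83 s). Every other listed function moves by less than 0.2 s, which is consistent with the regression being confined to cptrf2_.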