From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 26347 invoked by alias); 12 Oct 2012 13:41:59 -0000
Received: (qmail 26296 invoked by uid 48); 12 Oct 2012 13:41:37 -0000
From: "glisse at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/54855] Unnecessary duplication when performing scalar operation on vector element
Date: Fri, 12 Oct 2012 13:41:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: glisse at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
Content-Type: text/plain; charset="UTF-8"
MIME-Version: 1.0
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2012-10/txt/msg01182.txt.bz2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54855

--- Comment #2 from Marc Glisse 2012-10-12 13:41:35 UTC ---
(In reply to comment #1)
> Does not work for + though, as -0.0 + 0.0 is 0.0.
[...]
> On the tree level we see in-memory v because of the component modification:
>
>   _7 = BIT_FIELD_REF ;
>   _8 = _7 - 1.0e+0;
>   BIT_FIELD_REF = _8;
>   v.0_10 = v;
>   v.1_11 = v.0_10 * { 2.0e+0, 2.0e+0 };
>   v = v.1_11;
>
> so either lowering this differently in the first place or detecting
> this kind of pattern would fix it.

Do you mean that at the tree level v[0] -= 1.0 could be changed to
v -= {1., 0.} ? That's not exactly what Ulrich was suggesting.
It could be nice too, but then we would need a different optimization in the
back-end that detects the special case of a vector subtraction where the
second part of one argument is 0, in order to produce the optimal code.

In the x86 md, the sd instruction is represented as:

  [(set (match_operand:VF_128 0 "register_operand" "=x,x")
        (vec_merge:VF_128
          (plusminus:VF_128
            (match_operand:VF_128 1 "register_operand" "0,x")
            (match_operand:VF_128 2 "nonimmediate_operand" "xm,xm"))
          (match_dup 1)
          (const_int 1)))]

which is going to be hard to recognize from:

(insn 26 53 28 4 (set (reg:DF 81 [ D.2546 ])
        (vec_select:DF (reg/v:V2DF 73 [ v ])
            (parallel [
                    (const_int 0 [0])
                ]))) d.c:12 1408 {sse2_storelpd}
     (nil))

(insn 28 26 29 4 (set (reg:DF 82 [ D.2546 ])
        (minus:DF (reg:DF 81 [ D.2546 ])
            (reg:DF 84))) d.c:12 760 {*fop_df_1_sse}
     (expr_list:REG_DEAD (reg:DF 81 [ D.2546 ])
        (nil)))

(insn 29 28 30 4 (set (reg/v:V2DF 73 [ v ])
        (vec_concat:V2DF (reg:DF 82 [ D.2546 ])
            (vec_select:DF (reg/v:V2DF 73 [ v ])
                (parallel [
                        (const_int 1 [0x1])
                    ])))) d.c:12 1411 {sse2_loadlpd}
     (expr_list:REG_DEAD (reg:DF 82 [ D.2546 ])
        (nil)))

However, since that's only 3 insn, providing an additional define_insn for the
same instruction but with a pattern of vec_select and vec_concat might be
enough for combine.
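
For reference, here is a small C sketch of the two source-level forms discussed
above, using the GCC vector extension. It is not the test case attached to the
bug: the type name v2df and all function names are made up for illustration.
It only checks, for one sample input, that v[0] -= x and v -= {x, 0.} give the
same lanes (including the sign of a -0.0 in lane 1), and that the analogous
rewrite with + turns a -0.0 in lane 1 into +0.0, which is the objection quoted
from comment #1.

```c
#include <assert.h>
#include <math.h>   /* signbit */

/* Hypothetical type name: two doubles in one 16-byte vector
   (GCC/Clang vector extension). */
typedef double v2df __attribute__((vector_size(16)));

/* Element-wise form: extract lane 0, subtract, put it back.
   This mirrors the vec_select / minus / vec_concat sequence. */
static v2df sub_lane0_scalar(v2df v, double x)
{
    v[0] -= x;
    return v;
}

/* Whole-vector form: v -= {x, 0.}.  Lane 1 computes v[1] - 0.0. */
static v2df sub_lane0_vector(v2df v, double x)
{
    v2df delta = { x, 0.0 };
    return v - delta;
}

/* The same rewrite for addition.  Lane 1 computes v[1] + 0.0,
   which is +0.0 when v[1] is -0.0 (round-to-nearest). */
static v2df add_lane0_vector(v2df v, double x)
{
    v2df delta = { x, 0.0 };
    return v + delta;
}

/* Equal values with equal sign bits, so -0.0 and +0.0 are distinguished. */
static int same_double(double a, double b)
{
    return a == b && !signbit(a) == !signbit(b);
}

/* Sample check with -0.0 in the untouched lane. */
static void demo(void)
{
    v2df v = { 3.0, -0.0 };

    v2df a = sub_lane0_scalar(v, 1.0);
    v2df b = sub_lane0_vector(v, 1.0);
    assert(same_double(a[0], b[0]));   /* both 2.0 */
    assert(same_double(a[1], b[1]));   /* both still -0.0 */

    v2df c = add_lane0_vector(v, 1.0);
    assert(c[0] == 4.0);
    assert(!signbit(c[1]));            /* -0.0 + 0.0 became +0.0 */
}
```

Compiling the two sub_lane0_* functions with something like gcc -O2 -S (with
static removed so they are emitted) is one way to compare what the back-end
produces for each form; no particular output is claimed here.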