From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugs-return-403506-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 26347 invoked by alias); 12 Oct 2012 13:41:59 -0000
Received: (qmail 26296 invoked by uid 48); 12 Oct 2012 13:41:37 -0000
From: "glisse at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/54855] Unnecessary duplication when performing scalar operation on vector element
Date: Fri, 12 Oct 2012 13:41:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: glisse at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Changed-Fields:
Message-ID: <bug-54855-4-KGLhiDfv3r@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-54855-4@http.gcc.gnu.org/bugzilla/>
References: <bug-54855-4@http.gcc.gnu.org/bugzilla/>
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
Content-Type: text/plain; charset="UTF-8"
MIME-Version: 1.0
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-bugs.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2012-10/txt/msg01182.txt.bz2


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54855
--- Comment #2 from Marc Glisse <glisse at gcc dot gnu.org> 2012-10-12 13:41:35 UTC ---
(In reply to comment #1)
> Does not work for + though, as -0.0 + 0.0 is 0.0.
[...]
> On the tree level we see in-memory v because of the component modification:
> 
>   _7 = BIT_FIELD_REF <v, 64, 0>;
>   _8 = _7 - 1.0e+0;
>   BIT_FIELD_REF <v, 64, 0> = _8;
>   v.0_10 = v;
>   v.1_11 = v.0_10 * { 2.0e+0, 2.0e+0 };
>   v = v.1_11;
> 
> so either lowering this differently in the first place or detecting
> this kind of pattern would fix it.

Do you mean that at the tree level v[0] -= 1.0 could be changed to
v -= {1., 0.}? That's not exactly what Ulrich was suggesting. It could be nice
too, but then we would need a different optimization in the back-end, one that
detects the special case of a vector subtraction where the second element of
one argument is 0, in order to produce the optimal code.
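
To make the two source-level forms concrete, here is a minimal sketch using
GCC's generic vector extensions. The function names and the v2df typedef are
only illustrative (they are not from the test case), and the sketch assumes
default IEEE semantics (no -ffast-math, round-to-nearest). It also shows the
point from comment #1: the rewrite is safe for -, but for + an element holding
-0.0 would become +0.0.

```c
#include <math.h>

/* Two doubles in one 16-byte vector (illustrative typedef). */
typedef double v2df __attribute__((vector_size(16)));

/* Element-wise form, as written in the source: v[0] -= x. */
static v2df sub_low_scalar(v2df v, double x)
{
    v[0] -= x;
    return v;
}

/* Whole-vector form: v -= {x, 0.}.  Element 1 computes v[1] - 0.0,
   which leaves v[1] unchanged, including the sign of a zero. */
static v2df sub_low_vector(v2df v, double x)
{
    v2df rhs = { x, 0.0 };
    return v - rhs;
}

/* Addition analogue: v += {x, 0.}.  Element 1 computes v[1] + 0.0,
   which turns -0.0 into +0.0, so this is not an identity on element 1. */
static v2df add_low_vector(v2df v, double x)
{
    v2df rhs = { x, 0.0 };
    return v + rhs;
}
```

With v = {3.0, -0.0} and x = 1.0, the two subtraction forms agree on both
elements, while the addition form loses the sign of the zero in element 1
(observable with signbit).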

In the x86 machine description, the scalar sd instruction (addsd/subsd for
V2DF) is represented as:
  [(set (match_operand:VF_128 0 "register_operand" "=x,x")
        (vec_merge:VF_128
          (plusminus:VF_128
            (match_operand:VF_128 1 "register_operand" "0,x")
            (match_operand:VF_128 2 "nonimmediate_operand" "xm,xm"))
          (match_dup 1)
          (const_int 1)))]

which is going to be hard to recognize from:
(insn 26 53 28 4 (set (reg:DF 81 [ D.2546 ])
        (vec_select:DF (reg/v:V2DF 73 [ v ])
            (parallel [
                    (const_int 0 [0])
                ]))) d.c:12 1408 {sse2_storelpd}
     (nil))
(insn 28 26 29 4 (set (reg:DF 82 [ D.2546 ])
        (minus:DF (reg:DF 81 [ D.2546 ])
            (reg:DF 84))) d.c:12 760 {*fop_df_1_sse}
     (expr_list:REG_DEAD (reg:DF 81 [ D.2546 ])
        (nil)))
(insn 29 28 30 4 (set (reg/v:V2DF 73 [ v ])
        (vec_concat:V2DF (reg:DF 82 [ D.2546 ])
            (vec_select:DF (reg/v:V2DF 73 [ v ])
                (parallel [
                        (const_int 1 [0x1])
                    ])))) d.c:12 1411 {sse2_loadlpd}
     (expr_list:REG_DEAD (reg:DF 82 [ D.2546 ])
        (nil)))

However, since that's only 3 insns, providing an additional define_insn for
the same instruction, but with a pattern made of vec_select and vec_concat,
might be enough for combine to match it.
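
As a rough illustration of why the two RTL shapes describe the same operation,
here is a C model of each, specialized to minus on V2DF. This is only a sketch
of the semantics, not GCC code: the function names are made up, and the
scalar operand b0 stands in for (reg:DF 84) in insn 28.

```c
/* Two doubles in one 16-byte vector (illustrative typedef). */
typedef double v2df __attribute__((vector_size(16)));

/* Shape of the existing pattern:
     (vec_merge (minus a b) a (const_int 1))
   Bit 0 of the mask is set, so element 0 comes from the vector
   difference and element 1 is taken from a. */
static v2df minus_vec_merge(v2df a, v2df b)
{
    v2df diff = a - b;
    v2df r = { diff[0], a[1] };
    return r;
}

/* Shape after expansion (insns 26, 28, 29 in the dump above). */
static v2df minus_select_concat(v2df a, double b0)
{
    double lo = a[0];       /* insn 26: vec_select:DF a [0]          */
    double d = lo - b0;     /* insn 28: minus:DF                     */
    v2df r = { d, a[1] };   /* insn 29: vec_concat d, vec_select [1] */
    return r;
}
```

For a = {5.0, 7.0} and b = {1.5, 100.0}, both minus_vec_merge(a, b) and
minus_select_concat(a, b[0]) give {3.5, 7.0}; the upper element of b is
irrelevant in the first form because the merge discards it.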