From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Thu, 02 Jun 2011 08:46:00 -0000
Subject: Re: [patch] Improve detection of widening multiplication in the vectorizer
From: Ira Rosen
To: Richard Guenther
Cc: gcc-patches@gcc.gnu.org, Patch Tracking
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
X-SW-Source: 2011-06/txt/msg00128.txt.bz2

On 1 June 2011 15:14, Richard Guenther wrote:
> On Wed, Jun 1, 2011 at 1:37 PM, Ira Rosen wrote:
>> On 1 June 2011 12:42, Richard Guenther wrote:
>>
>>> Did you think about moving pass_optimize_widening_mul before
>>> loop optimizations?  Does that pass catch the cases you are
>>> teaching the pattern recognizer?  I think we should try to expose
>>> these more complicated instructions to loop optimizers.
>>>
>>
>> pass_optimize_widening_mul doesn't catch these cases, but I can try to
>> teach it instead of the vectorizer.
>> I am now testing
>>
>> Index: passes.c
>> ===================================================================
>> --- passes.c    (revision 174391)
>> +++ passes.c    (working copy)
>> @@ -870,6 +870,7 @@
>>        NEXT_PASS (pass_split_crit_edges);
>>        NEXT_PASS (pass_pre);
>>        NEXT_PASS (pass_sink_code);
>> +      NEXT_PASS (pass_optimize_widening_mul);
>>        NEXT_PASS (pass_tree_loop);
>>          {
>>            struct opt_pass **p = &pass_tree_loop.pass.sub;
>> @@ -934,7 +935,6 @@
>>        NEXT_PASS (pass_forwprop);
>>        NEXT_PASS (pass_phiopt);
>>        NEXT_PASS (pass_fold_builtins);
>> -      NEXT_PASS (pass_optimize_widening_mul);
>>        NEXT_PASS (pass_tail_calls);
>>        NEXT_PASS (pass_rename_ssa_copies);
>>        NEXT_PASS (pass_uncprop);
>>
>> to see how it affects other loop optimizations (vectorizer pattern
>> tests obviously fail).

Looks like it needs copy_prop and dce as well:

Index: passes.c
===================================================================
--- passes.c    (revision 174391)
+++ passes.c    (working copy)
@@ -870,6 +870,9 @@
       NEXT_PASS (pass_split_crit_edges);
       NEXT_PASS (pass_pre);
       NEXT_PASS (pass_sink_code);
+      NEXT_PASS (pass_copy_prop);
+      NEXT_PASS (pass_dce);
+      NEXT_PASS (pass_optimize_widening_mul);
       NEXT_PASS (pass_tree_loop);
         {
           struct opt_pass **p = &pass_tree_loop.pass.sub;
@@ -934,7 +937,6 @@
       NEXT_PASS (pass_forwprop);
       NEXT_PASS (pass_phiopt);
       NEXT_PASS (pass_fold_builtins);
-      NEXT_PASS (pass_optimize_widening_mul);
       NEXT_PASS (pass_tail_calls);
       NEXT_PASS (pass_rename_ssa_copies);
       NEXT_PASS (pass_uncprop);

otherwise I get (on x86_64-suse-linux)

FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfmaddss
FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfmaddsd
FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfmsubss
FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfmsubsd
FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfnmaddss
FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfnmaddsd

Ira

>
> Thanks.  I would hope that we eventually can get rid of the
> pattern recognizer ... at least for SSE there is also always
> a scalar variant instruction for each vectorized one.
>
> Richard.
>
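
For readers following the thread, here is a minimal sketch of what
"widening multiplication" means at the source level: two narrow
operands are converted to a wider type and then multiplied, so the
product fits in the result. The function names are hypothetical and
do not come from the patch or from the GCC testsuite; whether a given
compiler version turns either form into a widening multiply
instruction depends on the target and on pass ordering, which is what
is being discussed above.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar form: both 16-bit operands are widened to 32 bits before the
   multiplication, so the full product is representable.
   (Hypothetical name, for illustration only.)  */
int32_t
widen_mul (int16_t a, int16_t b)
{
  return (int32_t) a * (int32_t) b;
}

/* Loop form of the same computation, i.e. the element-wise shape that
   a vectorizer would have to recognize.
   (Hypothetical name, for illustration only.)  */
void
widen_mul_loop (const int16_t *a, const int16_t *b, int32_t *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = (int32_t) a[i] * (int32_t) b[i];
}
```

The scalar function shows the statement-level pattern, and the loop
shows the same pattern applied per element.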