From: Uros Bizjak
To: Ilya Enkovich
Cc: Jeff Law, "gcc-patches@gcc.gnu.org"
Date: Wed, 09 Sep 2015 08:20:00 -0000
Subject: Re: [RFC, PR target/65105] Use vector instructions for scalar 64bit computations on 32bit target
In-Reply-To: <20150908154904.GA5849@msticlxl57.ims.intel.com>
References: <20150619132130.GA15263@msticlxl57.ims.intel.com> <55BFD483.9020107@redhat.com> <20150821134447.GA3232@msticlxl57.ims.intel.com> <55D753F6.6020900@redhat.com> <20150908154904.GA5849@msticlxl57.ims.intel.com>

On Tue, Sep 8, 2015 at 5:49 PM, Ilya Enkovich wrote:
> On 21 Aug 10:38, Jeff Law wrote:
>> On 08/21/2015 07:44 AM, Ilya Enkovich wrote:
>> >>Our of curiosity, what does LLVM do here in terms of costing
>> >>models?
>> >
>> >Unfortunately I have no idea where and how LLVM does this
>> >optimization. Will try to find out. For now I just try to follow a
>> >common sense and don't hurt any benchmark performance.
>> Sounds wise. No reason we can't look at the overall heuristics they're
>> using for when this optimization ought to fire.
>>
>> >>
>> >>From a correctness standpoint, one of the interesting tests would
>> >>be to turn off all tuning -- ie, always convert if it's supposed to
>> >>be possible. Then throw as much code as possible at it and see if
>> >>anything breaks. Also a good time to instrument so that you can
>> >>then build testcases from real-world code.
>> >
>> >I did such testing previously for SPEC.
>> Excellent to hear.
>>
>> > Now I also tried it for
>> >bootstrap and found issue with EH edges. Fixed it in a new version.
>>
>>
>> When you track down the bootstrap failure, you might consider adding a
>> test for whatever went wrong to the suite if it's feasible.
>
> I added several tests including EH one.
>
>>
>>
>> >
>> >Thanks a lot for your review! Here is an updated version. Bootstrap
>> >is OK. Regression testing shows a fail in gcc.dg/lower-subreg-1.c. It
>> >happens because ior:DI is a subject for a new optimization and is not
>> >lowered by subreg pass. I see test had multiple modifications to be
>> >disabled on different targets. Will it actually be tested anywhere if
>> >I disable it for i386? Probably remove the test?
>>
>> I'd twiddle the test to turn off your new pass. Which leads to the
>> comment that your pass needs to be selectable via a -m argument.
>
> I added an option to control new pass but it doesn't affect regressed
> test. Test is compiled using -O and new pass doesn't even work.
> Regression is caused by my new patterns which make lowering of 64-bit
> IOR on subreg pass unnecessary. Test tries to check we split register
> on subreg1 for 64-bit IOR and fails.
>
>>
>> Jeff
>>
>
> Here is an updated version with tests and an option added.
>
> Thanks,
> Ilya
> --
> gcc/
>
> 2015-09-08 Ilya Enkovich
>
>     * config/i386/i386.c: Include dbgcnt.h.
>     (has_non_address_hard_reg): New.
>     (convertible_comparison_p): New.
>     (scalar_to_vector_candidate_p): New.
>     (remove_non_convertible_regs): New.
>     (scalar_chain): New.
>     (scalar_chain::scalar_chain): New.
>     (scalar_chain::~scalar_chain): New.
>     (scalar_chain::add_to_queue): New.
>     (scalar_chain::mark_dual_mode_def): New.
>     (scalar_chain::analyze_register_chain): New.
>     (scalar_chain::add_insn): New.
>     (scalar_chain::build): New.
>     (scalar_chain::compute_convert_gain): New.
>     (scalar_chain::replace_with_subreg): New.
>     (scalar_chain::replace_with_subreg_in_insn): New.
>     (scalar_chain::emit_conversion_insns): New.
>     (scalar_chain::make_vector_copies): New.
>     (scalar_chain::convert_reg): New.
>     (scalar_chain::convert_op): New.
>     (scalar_chain::convert_insn): New.
>     (scalar_chain::convert): New.
>     (convert_scalars_to_vector): New.
>     (pass_data_stv): New.
>     (pass_stv): New.
>     (make_pass_stv): New.
>     (ix86_option_override): Created and register stv pass.
>     (flag_opts): Add -mstv.
>     (ix86_option_override_internal): Likewise.
>     * config/i386/i386.md (SWIM1248x): New.
>     (*movdi_internal): Remove '*' modifier for xmm to mem alternative.
>     (and3): Use SWIM1248x iterator instead of SWIM.
>     (*anddi3_doubleword): New.
>     (*zext_doubleword): New.
>     (*zextqi_doubleword): New.
>     (3): Use SWIM1248x iterator instead of SWIM.
>     (*di3_doubleword): New.
>     * config/i386/i386.opt (mstv): New.
>     * dbgcnt.def (stv_conversion): New.
>
> gcc/testsuite/
>
> 2015-09-08 Ilya Enkovich
>
>     * gcc.target/i386/pr65105-1.c: New.
>     * gcc.target/i386/pr65105-2.c: New.
>     * gcc.target/i386/pr65105-3.c: New.
>     * gcc.target/i386/pr65105-4.C: New.

Please make the new changes to insn patterns depend on TARGET_STV.
This way, non-STV compiles will behave exactly as they do now.

+;; Math-dependant integer modes with DImode.
+(define_mode_iterator SWIM1248x [(QI "TARGET_QIMODE_MATH")
+                                 (HI "TARGET_HIMODE_MATH")
+                                 SI DI])
+

DI should depend on TARGET_STV && TARGET_SSE2.

@@ -2093,9 +2098,9 @@
 (define_insn "*movdi_internal"
   [(set (match_operand:DI 0 "nonimmediate_operand"
-    "=r ,o ,r,r ,r,m ,*y,*y,?*y,?m,?r ,?*Ym,*v,*v,*v,m ,?r ,?r,?*Yi,?*Ym,?*Yi,*k,*k ,*r ,*m")
+    "=r ,o ,r,r ,r,m ,*y,*y,?*y,?m,?r ,?*Ym,*v,*v,*v,m,?r ,?r,?*Yi,?*Ym,?*Yi,*k,*k ,*r ,*m")
         (match_operand:DI 1 "general_operand"
-    "riFo,riF,Z,rem,i,re,C ,*y,m ,*y,*Yn,r ,C ,*v,m ,*v,*Yj,*v,r ,*Yj ,*Yn ,*r ,*km,*k,*k"))]
+    "riFo,riF,Z,rem,i,re,C ,*y,m ,*y,*Yn,r ,C ,*v,m ,v,*Yj,*v,r ,*Yj ,*Yn ,*r ,*km,*k,*k"))]
   "!(MEM_P (operands[0]) && MEM_P (operands[1]))"
 {

Please add a new alternative and use the "enabled" attribute to
conditionally select the correct alternative. Preferably, the new
alternative should come just after the one it changes, so you will
have to change many of the alternative numbers in the attribute
calculations.

+(define_insn_and_split "*anddi3_doubleword"
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,rm,r")
+        (and:DI
+         (match_operand:DI 1 "nonimmediate_operand" "%0,0,0")
+         (match_operand:DI 2 "x86_64_szext_general_operand" "Z,re,rm")))
+   (clobber (reg:CC FLAGS_REG))]
+  "!TARGET_64BIT && ix86_binary_operator_ok (AND, DImode, operands)"
+  "#"
+  "!TARGET_64BIT && reload_completed"
+  [(parallel [(set (match_dup 0)

You should add TARGET_STV && TARGET_SSE2 to the above and the other
added patterns.
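
For illustration only, the two condition changes requested above might look roughly like the following untested fragment, written against the lines of the posted patch. The exact conditions are assumptions. In particular, a DI entry gated only on TARGET_STV && TARGET_SSE2 may also need to stay enabled for TARGET_64BIT so that existing 64-bit DImode patterns keep working, which is not settled in this message.

```diff
 ;; Sketch: gate the DI entry of the new iterator on STV and SSE2.
 ;; (Assumption: "|| TARGET_64BIT" may be needed in the real condition.)
 (define_mode_iterator SWIM1248x [(QI "TARGET_QIMODE_MATH")
                                  (HI "TARGET_HIMODE_MATH")
-                                 SI DI])
+                                 SI (DI "TARGET_STV && TARGET_SSE2")])

 ;; Sketch: the same gating in the insn condition of a new doubleword
 ;; pattern, here *anddi3_doubleword.
-  "!TARGET_64BIT && ix86_binary_operator_ok (AND, DImode, operands)"
+  "!TARGET_64BIT && TARGET_STV && TARGET_SSE2
+   && ix86_binary_operator_ok (AND, DImode, operands)"
```

With conditions along these lines, a compile with STV disabled (or without SSE2) would not match the new patterns at all, which is the "behave exactly as now" property asked for above.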