public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Ajit Agarwal <aagarwa1@linux.ibm.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: gcc-patches <gcc-patches@gcc.gnu.org>,
	Segher Boessenkool <segher@kernel.crashing.org>,
	bergner@linux.ibm.com
Subject: Re: [PATCH] rs6000: suboptimal code for returning bool value on target ppc
Date: Thu, 16 Mar 2023 16:13:39 +0530	[thread overview]
Message-ID: <64d20729-65d5-dcd2-af37-9bbabd4c874b@linux.ibm.com> (raw)
In-Reply-To: <CAFiYyc1Dn=2qf8tQPOd_sRD6qQEvwzzEqECwT18XuPipaLHQfQ@mail.gmail.com>



On 16/03/23 4:00 pm, Richard Biener wrote:
> On Thu, Mar 16, 2023 at 11:12 AM Ajit Agarwal <aagarwa1@linux.ibm.com> wrote:
>>
>>
>> Hello Richard:
>>
>> On 16/03/23 3:22 pm, Richard Biener wrote:
>>> On Thu, Mar 16, 2023 at 9:19 AM Ajit Agarwal <aagarwa1@linux.ibm.com> wrote:
>>>>
>>>>
>>>>
>>>> On 16/03/23 1:44 pm, Richard Biener wrote:
>>>>> On Thu, Mar 16, 2023 at 9:11 AM Ajit Agarwal <aagarwa1@linux.ibm.com> wrote:
>>>>>>
>>>>>> Hello Richard:
>>>>>>
>>>>>> On 16/03/23 1:10 pm, Richard Biener wrote:
>>>>>>> On Thu, Mar 16, 2023 at 6:21 AM Ajit Agarwal via Gcc-patches
>>>>>>> <gcc-patches@gcc.gnu.org> wrote:
>>>>>>>>
>>>>>>>> Hello All:
>>>>>>>>
>>>>>>>>
>>>>>>>> This patch eliminates unnecessary zero extension instruction from power generated assembly.
>>>>>>>> Bootstrapped and regtested on powerpc64-linux-gnu.
>>>>>>>
>>>>>>> What makes this so special that we cannot deal with it from generic code?
>>>>>>> In particular we do have the REE pass, why is target specific
>>>>>>> knowledge neccessary
>>>>>>> to eliminate the extension?
>>>>>>>
>>>>>>
>>>>>> For returning bool values and comparision with integers generates the following by all the rtl passes.
>>>>>>
>>>>>> set compare (subreg)
>>>>>> set if_then_else
>>>>>> Convert SImode -> QImode
>>>>>> set zero_extend to SImode from QImode
>>>>>> set return value 0 in one path of cfg.
>>>>>> set return value 1 in other path of cfg.
>>>>>>
>>>>>> This pass replaces the above zero extension and conversion from QImode to DImode with copy operation to keep QImode in 64 bit registers in powerpc target.
>>>>>
>>>>> Sorry, I can't parse that - as there's no testcase with the patch I
>>>>> cannot even try to see what the actual RTL
>>>>> looks like (without the pass).
>>>>>
>>>>
>>>> Here is the PR with bugzilla.
>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103784
>>>>
>>>> I can add the attached testcase with this PR in the patch.
>>>
>>> I don't see any zero-extends there.
>>>
>>
>> Here is the testcase.
>>
>>
>> bool (int a, int b)
>> {
>>           if (a > 2)
>>                       return false;
>>            if (b < 10)
>>                        return true;
>>              return false;
>> }
>>
>> compiled with gcc -O3 -m64 testcase.cc -mcpu=power9 -save-temps.
>>
>> Here is the rtl after cse.
>> (note 12 11 15 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
>> (insn 15 12 16 3 (set (reg:CC 123)
>>         (compare:CC (subreg/s/u:SI (reg/v:DI 120 [ b ]) 0)
>>             (const_int 9 [0x9]))) "ext.cc":5:5 796 {*cmpsi_signed}
>>      (expr_list:REG_DEAD (reg/v:DI 120 [ b ])
>>         (nil)))
>> (insn 16 15 17 3 (set (reg:SI 124)
>>         (const_int 1 [0x1])) "ext.cc":5:5 555 {*movsi_internal1}
>>      (nil))
>> (insn 17 16 18 3 (set (reg:SI 122)
>>         (if_then_else:SI (gt (reg:CC 123)
>>                 (const_int 0 [0]))
>>             (const_int 0 [0])
>>             (reg:SI 124))) "ext.cc":5:5 344 {isel_cc_si}
>>      (expr_list:REG_DEAD (reg:SI 124)
>>         (expr_list:REG_DEAD (reg:CC 123)
>>             (nil))))
>> (insn 18 17 32 3 (set (reg:QI 117 [ _1 ])
>>         (subreg:QI (reg:SI 122) 0)) "ext.cc":5:5 562 {*movqi_internal}
>>      (expr_list:REG_DEAD (reg:SI 122)
>>         (nil)))
>>       ; pc falls through to BB 5
>> (code_label 32 18 31 4 3 (nil) [1 uses])
>> (note 31 32 5 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
>> (insn 5 31 19 4 (set (reg:QI 117 [ _1 ])
>>         (const_int 0 [0])) "ext.cc":4:16 562 {*movqi_internal}
>>      (nil))
>> (code_label 19 5 20 5 2 (nil) [0 uses])
>> (note 20 19 21 5 [bb 5] NOTE_INSN_BASIC_BLOCK)
>> (insn 21 20 22 5 (set (reg:DI 126 [ _1 ])
>>         (zero_extend:DI (reg:QI 117 [ _1 ]))) "ext.cc":8:1 5 {zero_extendqidi2}
>>      (expr_list:REG_DEAD (reg:QI 117 [ _1 ])
>>         (nil)))
>> (insn 22 21 26 5 (set (reg:DI 118 [ <retval> ])
>>         (reg:DI 126 [ _1 ])) "ext.cc":8:1 681 {*movdi_internal64}
>>      (expr_list:REG_DEAD (reg:DI 126 [ _1 ])
>>         (nil)))
>> (insn 26 22 27 5 (set (reg/i:DI 3 3)
>>         (reg:DI 126 [ _1 ])) "ext.cc":8:1 681 {*movdi_internal64}
>>      (expr_list:REG_DEAD (reg:DI 118 [ <retval> ])
>>         (nil)))
>> (insn 27 26 0 5 (use (reg/i:DI 3 3)) "ext.cc":8:1 -1
>>      (nil))
> 
> But after combine there's just
> 
> (note 6 0 38 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
> (insn 38 6 2 2 (set (reg:DI 126)
>         (reg:DI 3 3 [ a ])) "t.c":3:1 634 {*movdi_internal64}
>      (expr_list:REG_DEAD (reg:DI 3 3 [ a ])
>         (nil)))
> (note 2 38 39 2 NOTE_INSN_DELETED)
> (insn 39 2 3 2 (set (reg:DI 127)
>         (reg:DI 4 4 [ b ])) "t.c":3:1 634 {*movdi_internal64}
>      (expr_list:REG_DEAD (reg:DI 4 4 [ b ])
>         (nil)))
> (insn 3 39 4 2 (set (reg/v:DI 119 [ b ])
>         (reg:DI 127)) "t.c":3:1 634 {*movdi_internal64}
>      (expr_list:REG_DEAD (reg:DI 127)
>         (nil)))
> (note 4 3 10 2 NOTE_INSN_FUNCTION_BEG)
> (insn 10 4 11 2 (set (reg:CC 120)
>         (compare:CC (subreg/s/u:SI (reg:DI 126) 0)
>             (const_int 2 [0x2]))) "t.c":4:6 755 {*cmpsi_signed}
>      (expr_list:REG_DEAD (reg:DI 126)
>         (nil)))
> (jump_insn 11 10 12 2 (set (pc)
>         (if_then_else (gt (reg:CC 120)
>                 (const_int 0 [0]))
>             (label_ref:DI 32)
>             (pc))) "t.c":4:6 838 {*cbranch}
>      (expr_list:REG_DEAD (reg:CC 120)
>         (int_list:REG_BR_PROB 365072228 (nil)))
>  -> 32)
> (note 12 11 15 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
> (note 15 12 16 3 NOTE_INSN_DELETED)
> (note 16 15 17 3 NOTE_INSN_DELETED)
> (note 17 16 19 3 NOTE_INSN_DELETED)
> (insn 19 17 32 3 (parallel [
>             (set (reg:DI 117 [ <retval> ])
>                 (le:DI (subreg/s/u:SI (reg/v:DI 119 [ b ]) 0)
>                     (const_int 9 [0x9])))
>             (clobber (scratch:DI))
>             (clobber (scratch:DI))
>             (clobber (scratch:CC))
>         ]) "t.c":6:6 783 {ledisi2_isel}
>      (expr_list:REG_DEAD (reg/v:DI 119 [ b ])
>         (nil)))
>       ; pc falls through to BB 5
> (code_label 32 19 31 4 3 (nil) [1 uses])
> (note 31 32 5 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
> (insn 5 31 20 4 (set (reg:DI 117 [ <retval> ])
>         (const_int 0 [0])) "t.c":5:12 634 {*movdi_internal64}
>      (nil))
> (code_label 20 5 21 5 2 (nil) [0 uses])
> (note 21 20 26 5 [bb 5] NOTE_INSN_BASIC_BLOCK)
> (insn 26 21 27 5 (set (reg/i:DI 3 3)
>         (reg:DI 117 [ <retval> ])) "t.c":9:1 634 {*movdi_internal64}
>      (expr_list:REG_DEAD (reg:DI 117 [ <retval> ])
>         (nil)))
> (insn 27 26 0 5 (use (reg/i:DI 3 3)) "t.c":9:1 -1
>      (nil))
> 
> and we get
> 
> foo:
> .LFB0:
>         .cfi_startproc
>         cmpwi 0,3,2
>         bgt 0,.L3
>         cmpwi 0,4,9
>         li 3,1
>         isel 3,0,3,1
>         blr
>         .p2align 4,,15
> .L3:
>         li 3,0
>         blr
> 
> where I don't see what we can do better (ok, not knowing ppc very much)
> 

After combine I get the following:

(insn 10 4 11 2 (set (reg:CC 121)
        (compare:CC (subreg/s/u:SI (reg:DI 127) 0)
            (const_int 2 [0x2]))) "ext.cc":3:4 796 {*cmpsi_signed}
     (expr_list:REG_DEAD (reg:DI 127)
        (nil)))
(jump_insn 11 10 12 2 (set (pc)
        (if_then_else (gt (reg:CC 121)
                (const_int 0 [0]))
            (label_ref:DI 32)
            (pc))) "ext.cc":3:4 879 {*cbranch}
     (expr_list:REG_DEAD (reg:CC 121)
        (int_list:REG_BR_PROB 365072228 (nil)))
 -> 32)
(note 12 11 15 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
(note 15 12 16 3 NOTE_INSN_DELETED)
(note 16 15 17 3 NOTE_INSN_DELETED)
(insn 17 16 18 3 (parallel [
            (set (reg:SI 122)
                (le:SI (subreg/s/u:SI (reg/v:DI 120 [ b ]) 0)
                    (const_int 9 [0x9])))
            (clobber (scratch:SI))
            (clobber (scratch:SI))
            (clobber (scratch:CC))
        ]) "ext.cc":5:5 814 {lesisi2_isel}
     (expr_list:REG_DEAD (reg/v:DI 120 [ b ])
        (nil)))
(insn 18 17 32 3 (set (reg:QI 117 [ _1 ])
        (subreg:QI (reg:SI 122) 0)) "ext.cc":5:5 562 {*movqi_internal}
     (expr_list:REG_DEAD (reg:SI 122)
        (nil)))
      ; pc falls through to BB 5
(code_label 32 18 31 4 3 (nil) [1 uses])
(note 31 32 5 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
(insn 5 31 19 4 (set (reg:QI 117 [ _1 ])
        (const_int 0 [0])) "ext.cc":4:16 562 {*movqi_internal}
     (nil))
(code_label 19 5 20 5 2 (nil) [0 uses])
(note 20 19 21 5 [bb 5] NOTE_INSN_BASIC_BLOCK)
(note 21 20 26 5 NOTE_INSN_DELETED)
(insn 26 21 27 5 (set (reg/i:DI 3 %r3)
        (and:DI (subreg:DI (reg:QI 117 [ _1 ]) 0)
            (const_int 1 [0x1]))) "ext.cc":8:1 207 {anddi3_mask}
     (expr_list:REG_DEAD (reg:QI 117 [ _1 ])
        (nil)))
(insn 27 26 0 5 (use (reg/i:DI 3 %r3)) "ext.cc":8:1 -1
     (nil))

and here is the assembly:


        .file   "ext.cc"
        .machine power9
        .abiversion 2
        .section        ".text"
        .align 2
        .p2align 4,,15
        .globl _Z3fooii
        .type   _Z3fooii, @function
_Z3fooii:
.LFB0:
        .cfi_startproc
        cmpwi %cr0,%r3,2
        bgt %cr0,.L3
        cmpwi %cr0,%r4,9
        li %r3,1
        isel %r3,0,%r3,1
        rldicl %r3,%r3,0,63
        blr
        .p2align 4,,15
.L3:
        li %r3,0
        rldicl %r3,%r3,0,63
        blr
        .long 0
        .byte 0,9,0,0,0,0,0,0
        .cfi_endproc
.LFE0:
        .size   _Z3fooii,.-_Z3fooii
        .ident  "GCC: (GNU) 13.0.1 20230310 (experimental)"
        .section        .note.GNU-stack,"",@progbits

Did you try with -O3.

Thanks & Regards
Ajit
>>
>> Thanks & Regards
>> Ajit
>>
>>>> Thanks & Regards
>>>> Ajit
>>>>> Richard.
>>>>>
>>>>>> Thanks & Regards
>>>>>> Ajit
>>>>>>>> +  In cfgexpand pass QImode is generated with
>>>>>>>> +  bool register value and this pass uses QI
>>>>>>>> +  as 64 bit registers.
>>>>>>>> +
>>>>>>
>>>>>>>>         rs6000: suboptimal code for returning bool value on target ppc.
>>>>>>>>
>>>>>>>>         New pass to eliminate unnecessary zero extension. This pass
>>>>>>>>         is registered after cse rtl pass.
>>>>>>>>
>>>>>>>>         2023-03-16  Ajit Kumar Agarwal  <aagarwa1@linux.ibm.com>
>>>>>>>>
>>>>>>>> gcc/ChangeLog:
>>>>>>>>
>>>>>>>>         * config/rs6000/rs6000-passes.def: Registered zero elimination
>>>>>>>>         pass.
>>>>>>>>         * config/rs6000/rs6000-zext-elim.cc: Add new pass.
>>>>>>>>         * config.gcc: Add new executable.
>>>>>>>>         * config/rs6000/rs6000-protos.h: Add new prototype for zero
>>>>>>>>         elimination pass.
>>>>>>>>         * config/rs6000/rs6000.cc: Add new prototype for zero
>>>>>>>>         elimination pass.
>>>>>>>>         * config/rs6000/t-rs6000: Add new rule.
>>>>>>>>         * expr.cc: Modified gcc assert.
>>>>>>>>         * explow.cc: Modified gcc assert.
>>>>>>>>         * optabs.cc: Modified gcc assert.
>>>>>>>> ---
>>>>>>>>  gcc/config.gcc                        |   4 +-
>>>>>>>>  gcc/config/rs6000/rs6000-passes.def   |   2 +
>>>>>>>>  gcc/config/rs6000/rs6000-protos.h     |   1 +
>>>>>>>>  gcc/config/rs6000/rs6000-zext-elim.cc | 361 ++++++++++++++++++++++++++
>>>>>>>>  gcc/config/rs6000/rs6000.cc           |   2 +
>>>>>>>>  gcc/config/rs6000/t-rs6000            |   5 +
>>>>>>>>  gcc/explow.cc                         |   3 +-
>>>>>>>>  gcc/expr.cc                           |   4 +-
>>>>>>>>  gcc/optabs.cc                         |   3 +-
>>>>>>>>  9 files changed, 379 insertions(+), 6 deletions(-)
>>>>>>>>  create mode 100644 gcc/config/rs6000/rs6000-zext-elim.cc
>>>>>>>>
>>>>>>>> diff --git a/gcc/config.gcc b/gcc/config.gcc
>>>>>>>> index da3a6d3ba1f..e8ac9d882f0 100644
>>>>>>>> --- a/gcc/config.gcc
>>>>>>>> +++ b/gcc/config.gcc
>>>>>>>> @@ -503,7 +503,7 @@ or1k*-*-*)
>>>>>>>>         ;;
>>>>>>>>  powerpc*-*-*)
>>>>>>>>         cpu_type=rs6000
>>>>>>>> -       extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
>>>>>>>> +       extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-zext-elim.o rs6000-logue.o"
>>>>>>>>         extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
>>>>>>>>         extra_objs="${extra_objs} rs6000-builtins.o rs6000-builtin.o"
>>>>>>>>         extra_headers="ppc-asm.h altivec.h htmintrin.h htmxlintrin.h"
>>>>>>>> @@ -538,7 +538,7 @@ riscv*)
>>>>>>>>         ;;
>>>>>>>>  rs6000*-*-*)
>>>>>>>>         extra_options="${extra_options} g.opt fused-madd.opt rs6000/rs6000-tables.opt"
>>>>>>>> -       extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
>>>>>>>> +       extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-zext-elim.o rs6000-logue.o"
>>>>>>>>         extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
>>>>>>>>         target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-logue.cc \$(srcdir)/config/rs6000/rs6000-call.cc"
>>>>>>>>         target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-pcrel-opt.cc"
>>>>>>>> diff --git a/gcc/config/rs6000/rs6000-passes.def b/gcc/config/rs6000/rs6000-passes.def
>>>>>>>> index ca899d5f7af..d7500feddf1 100644
>>>>>>>> --- a/gcc/config/rs6000/rs6000-passes.def
>>>>>>>> +++ b/gcc/config/rs6000/rs6000-passes.def
>>>>>>>> @@ -28,6 +28,8 @@ along with GCC; see the file COPYING3.  If not see
>>>>>>>>       The power8 does not have instructions that automaticaly do the byte swaps
>>>>>>>>       for loads and stores.  */
>>>>>>>>    INSERT_PASS_BEFORE (pass_cse, 1, pass_analyze_swaps);
>>>>>>>> +  INSERT_PASS_AFTER (pass_cse, 1, pass_analyze_zext);
>>>>>>>> +
>>>>>>>>
>>>>>>>>    /* Pass to do the PCREL_OPT optimization that combines the load of an
>>>>>>>>       external symbol's address along with a single load or store using that
>>>>>>>> diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
>>>>>>>> index 1a4fc1df668..f6cf2d673d4 100644
>>>>>>>> --- a/gcc/config/rs6000/rs6000-protos.h
>>>>>>>> +++ b/gcc/config/rs6000/rs6000-protos.h
>>>>>>>> @@ -340,6 +340,7 @@ namespace gcc { class context; }
>>>>>>>>  class rtl_opt_pass;
>>>>>>>>
>>>>>>>>  extern rtl_opt_pass *make_pass_analyze_swaps (gcc::context *);
>>>>>>>> +extern rtl_opt_pass *make_pass_analyze_zext (gcc::context *);
>>>>>>>>  extern rtl_opt_pass *make_pass_pcrel_opt (gcc::context *);
>>>>>>>>  extern bool rs6000_sum_of_two_registers_p (const_rtx expr);
>>>>>>>>  extern bool rs6000_quadword_masked_address_p (const_rtx exp);
>>>>>>>> diff --git a/gcc/config/rs6000/rs6000-zext-elim.cc b/gcc/config/rs6000/rs6000-zext-elim.cc
>>>>>>>> new file mode 100644
>>>>>>>> index 00000000000..777c7a5a387
>>>>>>>> --- /dev/null
>>>>>>>> +++ b/gcc/config/rs6000/rs6000-zext-elim.cc
>>>>>>>> @@ -0,0 +1,361 @@
>>>>>>>> +/* Subroutine to eliminate redundant zero extend for power architecture.
>>>>>>>> +   Copyright (C) 1991-2023 Free Software Foundation, Inc.
>>>>>>>> +
>>>>>>>> +   This file is part of GCC.
>>>>>>>> +
>>>>>>>> +   GCC is free software; you can redistribute it and/or modify it
>>>>>>>> +   under the terms of the GNU General Public License as published
>>>>>>>> +   by the Free Software Foundation; either version 3, or (at your
>>>>>>>> +   option) any later version.
>>>>>>>> +
>>>>>>>> +   GCC is distributed in the hope that it will be useful, but WITHOUT
>>>>>>>> +   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
>>>>>>>> +   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
>>>>>>>> +   License for more details.
>>>>>>>> +
>>>>>>>> +   You should have received a copy of the GNU General Public License
>>>>>>>> +   along with GCC; see the file COPYING3.  If not see
>>>>>>>> +   <http://www.gnu.org/licenses/>.  */
>>>>>>>> +
>>>>>>>> +/* This pass remove unnecessary zero extension instruction from
>>>>>>>> +  power generated assembly. This pass is register after cse
>>>>>>>> +  pass.
>>>>>>>> +  Identifies the following sequence of instruction after cse
>>>>>>>> +  rtl pass.
>>>>>>>> +
>>>>>>>> +  set compare (subreg)
>>>>>>>> +  set if_then_else
>>>>>>>> +  set SImode -> QImode
>>>>>>>> +  set zero_extend to DImode from QImode
>>>>>>>> +  set return value 0 in one path of cfg.
>>>>>>>> +  set return value 1 in other path of cfg.
>>>>>>>> +
>>>>>>>> +  In cfgexpand pass QImode is generated with
>>>>>>>> +  bool register value and this pass uses QI
>>>>>>>> +  as 64 bit registers.
>>>>>>>> +
>>>>>>>> +  This pass replace copy operation from QImode to DImode
>>>>>>>> +  and return appropriate return values.*/
>>>>>>>> +
>>>>>>>> +#define IN_TARGET_CODE 1
>>>>>>>> +
>>>>>>>> +#include "config.h"
>>>>>>>> +#include "system.h"
>>>>>>>> +#include "coretypes.h"
>>>>>>>> +#include "backend.h"
>>>>>>>> +#include "rtl.h"
>>>>>>>> +#include "tree.h"
>>>>>>>> +#include "memmodel.h"
>>>>>>>> +#include "df.h"
>>>>>>>> +#include "tm_p.h"
>>>>>>>> +#include "ira.h"
>>>>>>>> +#include "print-tree.h"
>>>>>>>> +#include "varasm.h"
>>>>>>>> +#include "explow.h"
>>>>>>>> +#include "expr.h"
>>>>>>>> +#include "output.h"
>>>>>>>> +#include "tree-pass.h"
>>>>>>>> +
>>>>>>>> +/* This is based on the union-find logic in web.cc.  web_entry_base is
>>>>>>>> +   defined in df.h.  */
>>>>>>>> +class zext_web_entry : public web_entry_base
>>>>>>>> +{
>>>>>>>> + public:
>>>>>>>> +  /* Pointer to the insn.  */
>>>>>>>> +  rtx_insn *insn;
>>>>>>>> +  unsigned int is_relevant : 1;
>>>>>>>> +  /* Set if insn is a load.  */
>>>>>>>> +  unsigned int is_load : 1;
>>>>>>>> +  /* Set if insn is a store.  */
>>>>>>>> +  unsigned int is_store : 1;
>>>>>>>> +  unsigned int is_zext :1 ;
>>>>>>>> +  unsigned int is_move :1;
>>>>>>>> +  unsigned int is_delete_move :1;
>>>>>>>> +  /* Set if this insn should be deleted.  */
>>>>>>>> +  unsigned int will_delete : 1;
>>>>>>>> +  unsigned int will_delete_chances : 1;
>>>>>>>> +};
>>>>>>>> +
>>>>>>>> +/* Checks if instruction is zero extension
>>>>>>>> + * with QIMode to DImode.*/
>>>>>>>> +static unsigned int
>>>>>>>> +insn_is_zext_p(rtx insn)
>>>>>>>> +{
>>>>>>>> +  rtx body = PATTERN (insn);
>>>>>>>> +
>>>>>>>> +  if (GET_CODE (body) == SET
>>>>>>>> +      && GET_MODE(SET_DEST (body)) == DImode
>>>>>>>> +      && GET_CODE(SET_SRC (body)) == ZERO_EXTEND)
>>>>>>>> +  {
>>>>>>>> +    rtx set = XEXP (SET_SRC (body), 0);
>>>>>>>> +
>>>>>>>> +    if (REG_P (set))
>>>>>>>> +    {
>>>>>>>> +      if (GET_MODE (set) == QImode) return 1;
>>>>>>>> +    }
>>>>>>>> +    else
>>>>>>>> +      return 0;
>>>>>>>> +  }
>>>>>>>> +  return 0;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +/* Checks if instruction is SET operation with QImode.*/
>>>>>>>> +static unsigned int
>>>>>>>> +insn_is_store_p (rtx insn)
>>>>>>>> +{
>>>>>>>> +  rtx body = PATTERN (insn);
>>>>>>>> +  if (GET_CODE (body) == SET
>>>>>>>> +      && SUBREG_P(SET_SRC (body))
>>>>>>>> +      && !CONST_INT_P(SET_SRC (body))
>>>>>>>> +      && GET_MODE(XEXP (SET_SRC (body), 0)) == SImode
>>>>>>>> +      && GET_MODE(SET_SRC (body)) == QImode)
>>>>>>>> +    return 1;
>>>>>>>> +
>>>>>>>> +  return 0;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +/* Find out zero extension removal candidate with use-def web.*/
>>>>>>>> +static void
>>>>>>>> +find_zero_ext_elimination_candidate (zext_web_entry *insn_entry,
>>>>>>>> +                                    rtx insn, df_ref def)
>>>>>>>> +{
>>>>>>>> +  struct df_link *link = DF_REF_CHAIN (def);
>>>>>>>> +
>>>>>>>> +  rtx move_insn = NULL_RTX;
>>>>>>>> +  rtx compare_insn = NULL_RTX;
>>>>>>>> +
>>>>>>>> +  while (link)
>>>>>>>> +  {
>>>>>>>> +    if (!DF_REF_INSN_INFO (link->ref))
>>>>>>>> +      insn_entry[INSN_UID(insn)].will_delete_chances = 0;
>>>>>>>> +
>>>>>>>> +    if (DF_REF_INSN_INFO (link->ref))
>>>>>>>> +      {
>>>>>>>> +       rtx use_insn = DF_REF_INSN (link->ref);
>>>>>>>> +
>>>>>>>> +       if (GET_CODE (PATTERN (use_insn)) == SET
>>>>>>>> +           && (GET_CODE (SET_SRC (PATTERN (use_insn))) == IF_THEN_ELSE))
>>>>>>>> +         {
>>>>>>>> +           if (GET_CODE (PATTERN (insn)) == SET
>>>>>>>> +               && GET_CODE (SET_SRC (PATTERN (insn))) == COMPARE)
>>>>>>>> +             {
>>>>>>>> +               rtx body = XEXP (SET_SRC (PATTERN (insn)), 0);
>>>>>>>> +
>>>>>>>> +               if (SUBREG_P (body))
>>>>>>>> +                 {
>>>>>>>> +                   compare_insn = use_insn;
>>>>>>>> +                   rtx compare_body = XEXP (SET_SRC (PATTERN (compare_insn)), 0);
>>>>>>>> +
>>>>>>>> +                   if (compare_insn
>>>>>>>> +                       && ((REGNO (XEXP (compare_body, 0)))
>>>>>>>> +                               == REGNO (SET_DEST (PATTERN (insn)))))
>>>>>>>> +                     insn_entry[INSN_UID(use_insn)].will_delete_chances = 1;
>>>>>>>> +                 }
>>>>>>>> +              }
>>>>>>>> +           }
>>>>>>>> +
>>>>>>>> +       if (insn_is_store_p(use_insn)
>>>>>>>> +           && GET_CODE (PATTERN (insn)) == SET
>>>>>>>> +           && (GET_CODE (SET_SRC (PATTERN(insn))) == IF_THEN_ELSE))
>>>>>>>> +         {
>>>>>>>> +           if (GET_MODE (SET_DEST (PATTERN (insn))) == SImode)
>>>>>>>> +             {
>>>>>>>> +               if (insn_entry[INSN_UID(insn)].will_delete_chances)
>>>>>>>> +                 insn_entry[INSN_UID(use_insn)].will_delete_chances = 1;
>>>>>>>> +             }
>>>>>>>> +         }
>>>>>>>> +
>>>>>>>> +       if (insn_is_zext_p (insn))
>>>>>>>> +         {
>>>>>>>> +           if (GET_CODE (PATTERN (use_insn)) == SET
>>>>>>>> +               && REG_P (SET_SRC (PATTERN (use_insn))))
>>>>>>>> +             {
>>>>>>>> +               if (move_insn
>>>>>>>> +                   && REGNO (SET_SRC (PATTERN (use_insn)))
>>>>>>>> +                      == REGNO (SET_SRC (PATTERN (move_insn)))
>>>>>>>> +                   && insn_entry[INSN_UID(insn)].is_delete_move)
>>>>>>>> +                 {
>>>>>>>> +                   insn_entry[INSN_UID (insn)].is_move = 1;
>>>>>>>> +                   break;
>>>>>>>> +                 }
>>>>>>>> +                 else if (insn_entry[INSN_UID (insn)].will_delete)
>>>>>>>> +                   {
>>>>>>>> +                     move_insn = use_insn;
>>>>>>>> +                     insn_entry[INSN_UID(insn)].is_delete_move= 1;
>>>>>>>> +                   }
>>>>>>>> +             }
>>>>>>>> +         }
>>>>>>>> +
>>>>>>>> +       if (insn_is_zext_p (use_insn))
>>>>>>>> +         {
>>>>>>>> +           insn_entry[INSN_UID (use_insn)].is_zext = 1;
>>>>>>>> +           insn_entry[INSN_UID(use_insn)].is_relevant = 1;
>>>>>>>> +
>>>>>>>> +           if (insn_is_store_p (insn)
>>>>>>>> +               && insn_entry[INSN_UID (insn)].will_delete_chances)
>>>>>>>> +           {
>>>>>>>> +             insn_entry[INSN_UID (use_insn)].will_delete = 1;
>>>>>>>> +             insn_entry[INSN_UID (insn)].will_delete = 1;
>>>>>>>> +             insn_entry[INSN_UID( insn)].is_store = 1;
>>>>>>>> +           }
>>>>>>>> +
>>>>>>>> +          if (NONDEBUG_INSN_P (use_insn))
>>>>>>>> +            unionfind_union (insn_entry + INSN_UID (insn),
>>>>>>>> +                             insn_entry + INSN_UID (use_insn));
>>>>>>>> +       }
>>>>>>>> +      }
>>>>>>>> +
>>>>>>>> +    link = link->next;
>>>>>>>> +  }
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +/* Replace QImode extensions with copy operations.*/
>>>>>>>> +static void
>>>>>>>> +replace_marked_insns (zext_web_entry *insn_entry, unsigned i)
>>>>>>>> +{
>>>>>>>> +  rtx_insn *insn = insn_entry[i].insn;
>>>>>>>> +  rtx body = PATTERN (insn);
>>>>>>>> +  rtx src_reg;
>>>>>>>> +  src_reg = XEXP (SET_SRC (body), 0);
>>>>>>>> +  set_mode_and_regno (src_reg, DImode, REGNO(src_reg));
>>>>>>>> +
>>>>>>>> +  if (GET_MODE(SET_DEST(body)) != DImode)
>>>>>>>> +    set_mode_and_regno (SET_DEST(body), DImode, REGNO (SET_DEST (body)));
>>>>>>>> +
>>>>>>>> +  rtx copy = gen_rtx_SET (SET_DEST (body), src_reg);
>>>>>>>> +  rtx_insn *new_insn = emit_insn_before (copy, insn);
>>>>>>>> +  set_block_for_insn (new_insn, BLOCK_FOR_INSN (insn));
>>>>>>>> +  df_insn_rescan (new_insn);
>>>>>>>> +
>>>>>>>> +  df_insn_delete (insn);
>>>>>>>> +  remove_insn (insn);
>>>>>>>> +  insn->set_deleted ();
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +/* Main entry point for this pass.  */
>>>>>>>> +unsigned int
>>>>>>>> +rs6000_analyze_zext (function *fun)
>>>>>>>> +{
>>>>>>>> +  zext_web_entry *insn_entry;
>>>>>>>> +  basic_block bb;
>>>>>>>> +  rtx_insn *insn, *curr_insn = 0;
>>>>>>>> +
>>>>>>>> +  /* Dataflow analysis for use-def chains.  */
>>>>>>>> +  df_set_flags (DF_RD_PRUNE_DEAD_DEFS);
>>>>>>>> +  df_chain_add_problem (DF_DU_CHAIN | DF_UD_CHAIN);
>>>>>>>> +  df_analyze ();
>>>>>>>> +  df_set_flags (DF_DEFER_INSN_RESCAN);
>>>>>>>> +
>>>>>>>> +  /* Rebuild ud- and du-chains.  */
>>>>>>>> +  df_remove_problem (df_chain);
>>>>>>>> +  df_process_deferred_rescans ();
>>>>>>>> +  df_set_flags (DF_RD_PRUNE_DEAD_DEFS);
>>>>>>>> +  df_chain_add_problem (DF_DU_CHAIN | DF_UD_CHAIN);
>>>>>>>> +  df_analyze ();
>>>>>>>> +  df_set_flags (DF_DEFER_INSN_RESCAN);
>>>>>>>> +
>>>>>>>> +  /* Allocate structure to represent webs of insns.  */
>>>>>>>> +  insn_entry = XCNEWVEC (zext_web_entry, get_max_uid ());
>>>>>>>> +
>>>>>>>> +  /* Walk the insns to gather basic data.  */
>>>>>>>> +  FOR_ALL_BB_FN (bb, fun)
>>>>>>>> +    FOR_BB_INSNS_SAFE (bb, insn, curr_insn)
>>>>>>>> +    {
>>>>>>>> +      unsigned int uid = INSN_UID (insn);
>>>>>>>> +      if (NONDEBUG_INSN_P (insn))
>>>>>>>> +       {
>>>>>>>> +         insn_entry[uid].insn = insn;
>>>>>>>> +
>>>>>>>> +         if (GET_CODE (insn) == insn_is_store_p (insn))
>>>>>>>> +           {
>>>>>>>> +             insn_entry[uid].is_store = 1;
>>>>>>>> +             insn_entry[uid].is_relevant = 1;
>>>>>>>> +           }
>>>>>>>> +
>>>>>>>> +         /* Walk the uses and defs to identify the optimization
>>>>>>>> +            candidates.*/
>>>>>>>> +         struct df_insn_info *insn_info = DF_INSN_INFO_GET (insn);
>>>>>>>> +         df_ref mention;
>>>>>>>> +
>>>>>>>> +         FOR_EACH_INSN_INFO_DEF (mention, insn_info)
>>>>>>>> +           {
>>>>>>>> +             insn_entry[uid].is_relevant = 1;
>>>>>>>> +             insn_entry[uid].is_store = insn_is_store_p (insn);
>>>>>>>> +             find_zero_ext_elimination_candidate (insn_entry, insn, mention);
>>>>>>>> +           }
>>>>>>>> +
>>>>>>>> +         if (insn_entry[uid].is_relevant)
>>>>>>>> +           {
>>>>>>>> +             /* Determine if this is a store.  */
>>>>>>>> +             insn_entry[uid].is_store = insn_is_store_p (insn);
>>>>>>>> +           }
>>>>>>>> +       }
>>>>>>>> +     }
>>>>>>>> +
>>>>>>>> +   unsigned e = get_max_uid (), i;
>>>>>>>> +
>>>>>>>> +   int store_index = -1;
>>>>>>>> +
>>>>>>>> +   /* Replace with copy operation.*/
>>>>>>>> +   for (i = 0; i < e; ++i)
>>>>>>>> +     {
>>>>>>>> +       if (insn_entry[i].is_store && insn_entry[i].will_delete)
>>>>>>>> +        store_index  = i;
>>>>>>>> +
>>>>>>>> +       if ((store_index != -1)
>>>>>>>> +            && insn_entry[i].is_move && insn_entry[i].will_delete)
>>>>>>>> +         {
>>>>>>>> +           replace_marked_insns (insn_entry, store_index);
>>>>>>>> +           replace_marked_insns (insn_entry, i);
>>>>>>>> +         }
>>>>>>>> +     }
>>>>>>>> +    /* Clean up.  */
>>>>>>>> +    free (insn_entry);
>>>>>>>> +
>>>>>>>> +    return 0;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +const pass_data pass_data_analyze_zext =
>>>>>>>> +{
>>>>>>>> +  RTL_PASS, /* type */
>>>>>>>> +  "zext", /* name */
>>>>>>>> +  OPTGROUP_NONE, /* optinfo_flags */
>>>>>>>> +  TV_NONE, /* tv_id */
>>>>>>>> +  0, /* properties_required */
>>>>>>>> +  0, /* properties_provided */
>>>>>>>> +  0, /* properties_destroyed */
>>>>>>>> +  0, /* todo_flags_start */
>>>>>>>> +  TODO_df_finish, /* todo_flags_finish */
>>>>>>>> +};
>>>>>>>> +
>>>>>>>> +class pass_analyze_zext : public rtl_opt_pass
>>>>>>>> +{
>>>>>>>> +public:
>>>>>>>> +  pass_analyze_zext(gcc::context *ctxt)
>>>>>>>> +    : rtl_opt_pass(pass_data_analyze_zext, ctxt)
>>>>>>>> +  {}
>>>>>>>> +
>>>>>>>> +  /* opt_pass methods: */
>>>>>>>> +  virtual bool gate (function *)
>>>>>>>> +    {
>>>>>>>> +      return (optimize > 0 );
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +  virtual unsigned int execute (function *fun)
>>>>>>>> +    {
>>>>>>>> +      return rs6000_analyze_zext (fun);
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +  opt_pass *clone ()
>>>>>>>> +    {
>>>>>>>> +      return new pass_analyze_zext (m_ctxt);
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +}; // class pass_analyze_zext
>>>>>>>> +
>>>>>>>> +rtl_opt_pass *
>>>>>>>> +make_pass_analyze_zext (gcc::context *ctxt)
>>>>>>>> +{
>>>>>>>> +  return new pass_analyze_zext (ctxt);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>>>>>>>> index 8e0b0d022db..6541334bf2d 100644
>>>>>>>> --- a/gcc/config/rs6000/rs6000.cc
>>>>>>>> +++ b/gcc/config/rs6000/rs6000.cc
>>>>>>>> @@ -1178,6 +1178,8 @@ static bool rs6000_secondary_reload_move (enum rs6000_reg_type,
>>>>>>>>                                           bool);
>>>>>>>>  rtl_opt_pass *make_pass_analyze_swaps (gcc::context*);
>>>>>>>>
>>>>>>>> +rtl_opt_pass *make_pass_analyze_zext (gcc::context*);
>>>>>>>> +
>>>>>>>>  /* Hash table stuff for keeping track of TOC entries.  */
>>>>>>>>
>>>>>>>>  struct GTY((for_user)) toc_hash_struct
>>>>>>>> diff --git a/gcc/config/rs6000/t-rs6000 b/gcc/config/rs6000/t-rs6000
>>>>>>>> index f183b42ce1d..c1f61591d2f 100644
>>>>>>>> --- a/gcc/config/rs6000/t-rs6000
>>>>>>>> +++ b/gcc/config/rs6000/t-rs6000
>>>>>>>> @@ -35,6 +35,11 @@ rs6000-p8swap.o: $(srcdir)/config/rs6000/rs6000-p8swap.cc
>>>>>>>>         $(COMPILE) $<
>>>>>>>>         $(POSTCOMPILE)
>>>>>>>>
>>>>>>>> +rs6000-zext-elim.o: $(srcdir)/config/rs6000/rs6000-zext-elim.cc
>>>>>>>> +       $(COMPILE) $<
>>>>>>>> +       $(POSTCOMPILE)
>>>>>>>> +
>>>>>>>> +
>>>>>>>>  rs6000-d.o: $(srcdir)/config/rs6000/rs6000-d.cc
>>>>>>>>         $(COMPILE) $<
>>>>>>>>         $(POSTCOMPILE)
>>>>>>>> diff --git a/gcc/explow.cc b/gcc/explow.cc
>>>>>>>> index 32e9498ee07..316aa975e40 100644
>>>>>>>> --- a/gcc/explow.cc
>>>>>>>> +++ b/gcc/explow.cc
>>>>>>>> @@ -654,7 +654,8 @@ copy_to_mode_reg (machine_mode mode, rtx x)
>>>>>>>>    if (! general_operand (x, VOIDmode))
>>>>>>>>      x = force_operand (x, temp);
>>>>>>>>
>>>>>>>> -  gcc_assert (GET_MODE (x) == mode || GET_MODE (x) == VOIDmode);
>>>>>>>> +  gcc_assert (mode == DImode || GET_MODE (x) == mode
>>>>>>>> +              || GET_MODE (x) == VOIDmode);
>>>>>>>>    if (x != temp)
>>>>>>>>      emit_move_insn (temp, x);
>>>>>>>>    return temp;
>>>>>>>> diff --git a/gcc/expr.cc b/gcc/expr.cc
>>>>>>>> index 15be1c8db99..6162ef92b88 100644
>>>>>>>> --- a/gcc/expr.cc
>>>>>>>> +++ b/gcc/expr.cc
>>>>>>>> @@ -4223,9 +4223,9 @@ emit_move_insn (rtx x, rtx y)
>>>>>>>>    rtx y_cst = NULL_RTX;
>>>>>>>>    rtx_insn *last_insn;
>>>>>>>>    rtx set;
>>>>>>>> -
>>>>>>>>    gcc_assert (mode != BLKmode
>>>>>>>> -             && (GET_MODE (y) == mode || GET_MODE (y) == VOIDmode));
>>>>>>>> +             && (mode == DImode || GET_MODE (y) == mode
>>>>>>>> +             || GET_MODE (y) == VOIDmode));
>>>>>>>>
>>>>>>>>    /* If we have a copy that looks like one of the following patterns:
>>>>>>>>         (set (subreg:M1 (reg:M2 ...)) (subreg:M1 (reg:M2 ...)))
>>>>>>>> diff --git a/gcc/optabs.cc b/gcc/optabs.cc
>>>>>>>> index 4c641cab192..9d22fadc7ef 100644
>>>>>>>> --- a/gcc/optabs.cc
>>>>>>>> +++ b/gcc/optabs.cc
>>>>>>>> @@ -7902,7 +7902,8 @@ maybe_legitimize_operand (enum insn_code icode, unsigned int opno,
>>>>>>>>      input:
>>>>>>>>        gcc_assert (mode != VOIDmode);
>>>>>>>>        gcc_assert (GET_MODE (op->value) == VOIDmode
>>>>>>>> -                 || GET_MODE (op->value) == mode);
>>>>>>>> +                 || GET_MODE (op->value) == mode
>>>>>>>> +                 || mode == DImode);
>>>>>>>>        if (maybe_legitimize_operand_same_code (icode, opno, op))
>>>>>>>>         return true;
>>>>>>>>
>>>>>>>> --
>>>>>>>> 2.31.1
>>>>>>>>

  reply	other threads:[~2023-03-16 10:43 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-16  5:20 Ajit Agarwal
2023-03-16  7:40 ` Richard Biener
2023-03-16  8:11   ` Ajit Agarwal
2023-03-16  8:14     ` Richard Biener
2023-03-16  8:19       ` Ajit Agarwal
2023-03-16  9:52         ` Richard Biener
2023-03-16 10:11           ` Ajit Agarwal
2023-03-16 10:30             ` Richard Biener
2023-03-16 10:43               ` Ajit Agarwal [this message]
2023-03-16 10:56                 ` Richard Biener
2023-03-16 11:43                   ` Ajit Agarwal
2023-03-16 14:48             ` Jeff Law
2023-03-17 11:49               ` Ajit Agarwal
2023-03-17  3:37 ` Surya Kumari Jangala
2023-03-17 21:20   ` Peter Bergner
2023-03-18  3:53     ` Peter Bergner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=64d20729-65d5-dcd2-af37-9bbabd4c874b@linux.ibm.com \
    --to=aagarwa1@linux.ibm.com \
    --cc=bergner@linux.ibm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=richard.guenther@gmail.com \
    --cc=segher@kernel.crashing.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).