From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-vs1-xe36.google.com (mail-vs1-xe36.google.com [IPv6:2607:f8b0:4864:20::e36]) by sourceware.org (Postfix) with ESMTPS id 789673858D20 for ; Mon, 12 Jun 2023 15:17:15 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 789673858D20 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-vs1-xe36.google.com with SMTP id ada2fe7eead31-43dc7a78a37so1174985137.0 for ; Mon, 12 Jun 2023 08:17:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1686583034; x=1689175034; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=0+azTbguvZuWmQJe8TExLHl3Qf/DNMAfjeMEkOudjSg=; b=T+V97tTNH4hgQQKrIKEUw7KWR4eMO5m3azzy/kr7jJSnhuBSKMBwVdi8pTvzhQtp2J jfVoMZOogEY/+PW/goouJfQcBliXAYZt0SxK5hatlmOwXH/1Zm02pT9+tjIU/wQaQ1zE RJOKCFXxecevdOSRVoNO6nJWQLEiVyG8eqKax4GAOYDIB2I8zfE2Tbis0hPHlx7Try5V WH57O7oU8aopTBtaQgkv7N8egNR+oi0nNWuo/1/PErqd6qzn75+B+CiEAhlT8JfJTyo8 krHGGIM/KPhdxeSPPGLwDRK0F45yW5kHC/bnrjVWQIUc8+rN/S2IhvR8r+Uots0zl4zO iKIQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686583034; x=1689175034; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=0+azTbguvZuWmQJe8TExLHl3Qf/DNMAfjeMEkOudjSg=; b=WbQwCLn/wWNx6nNaz1ZnvmPxeLTRkFnNMb36tOVH7vyKb2mohp7M1HIMmeO2GW73XG 11kACa5azTyK5n+6GQOO/hyqJzY55pwR6IfYyh2sbbvCc1cQM0VUUWBi3uNpmnjjfqZr D19KrqzQS3BygQyYLTwA8PrutGzQNLk9djgYaWGRXvknDJFccnCh0k6i/NljkR7wdxMX +nAYLje562h5HBa4s9T4HXaeDIkgqI9eqbguWMQcgCckX9NWlc6iDrbfeluTGKI9qa3z 2aIRjKq4ExXAqq0LOKu9PeUJDciC84vOUqRTMivRbOVTe1qH2cWZwxMqrL2rDPLl5mGp 0Hlw== X-Gm-Message-State: AC+VfDyLnmLK1tfY1MLS7vrQrftoxKdq6woFIDtDAMOfJJOAzqoRJq3a AILYiFzlNe5820tfVPnfY/GBKzbrtzrgLfrzAUw= X-Google-Smtp-Source: ACHHUZ6MifChR957Jw/33j5CknA2pi4y7KHwKFBb547STDZ6f9/lCJ2vQJqYhIHZIaFmig+mJfGLnxnrfsKwFWXHnDg= X-Received: by 2002:a67:f490:0:b0:43b:1dfa:2534 with SMTP id o16-20020a67f490000000b0043b1dfa2534mr4296321vsn.10.1686583034078; Mon, 12 Jun 2023 08:17:14 -0700 (PDT) MIME-Version: 1.0 References: <20230607055215.29332-1-gaofei@eswincomputing.com> <20230607055215.29332-4-gaofei@eswincomputing.com> In-Reply-To: <20230607055215.29332-4-gaofei@eswincomputing.com> From: Kito Cheng Date: Mon, 12 Jun 2023 23:17:01 +0800 Message-ID: Subject: Re: [PATCH 3/4] [RISC-V] resolve confilct between zcmp multi push/pop and shrink-wrap-separate To: Fei Gao Cc: gcc-patches@gcc.gnu.org, palmer@dabbelt.com, jeffreyalaw@gmail.com, sinan.lin@linux.alibaba.com, jiawei@iscas.ac.cn Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-7.8 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,KAM_SHORT,LIKELY_SPAM_BODY,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: I would suggest breaking this patch into two parts: RISC-V part and the rest part (shrink-wrap.h / shrink-wrap.cc). On Wed, Jun 7, 2023 at 1:55=E2=80=AFPM Fei Gao = wrote: > > Disable zcmp multi push/pop if shrink-wrap-separate is active. > > So in -Os that prefers smaller code size, by default shrink-wrap-separate > is disabled while zcmp multi push/pop is enabled. > > And in -O2 and others that prefers speed, by default shrink-wrap-separate > is enabled while zcmp multi push/pop is disabled. To force enabling zcmp = multi > push/pop in this case, -fno-shrink-wrap-separate has to be explictly give= n. > > The following TC shows the issues in -O2 before this patch with both > shrink-wrap-separate and zcmp multi push/pop active. > 1. duplicated store of s regs. > 2. cm.push pushes ra, s0-s11 in reverse order than what normal > prologue does, causing stack corruption and failure to resotre s regs. > > TC: zcmp_shrink_wrap_separate.c included in this patch. > > output asm before this patch: > calc_func: > cm.push {ra, s0-s3}, -32 > ... > beq a5,zero,.L2 > ... > .L2: > ... > sw s1,20(sp) //issue here > sw s3,12(sp) //issue here > ... > sw s2,16(sp) //issue here > > output asm after this patch: > calc_func: > addi sp,sp,-32 > sw s0,24(sp) > ... > beq a5,zero,.L2 > ... > .L2: > ... > sw s1,20(sp) > sw s3,12(sp) > ... > sw s2,16(sp) > gcc/ChangeLog: > > * config/riscv/riscv.cc > (riscv_avoid_shrink_wrapping_separate): wrap the condition check = in > riscv_avoid_shrink_wrapping_separate. > (riscv_avoid_multi_push): avoid multi push if shrink_wrapping_sep= arate > is active. > (riscv_get_separate_components): call riscv_avoid_shrink_wrapping= _separate > * shrink-wrap.cc (try_shrink_wrapping_separate): call > use_shrink_wrapping_separate. > (use_shrink_wrapping_separate):wrap the condition > check in use_shrink_wrapping_separate > * shrink-wrap.h (use_shrink_wrapping_separate): add to extern > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/zcmp_shrink_wrap_separate.c: New test. > * gcc.target/riscv/zcmp_shrink_wrap_separate2.c: New test. > > Signed-off-by: Fei Gao > Co-Authored-By: Zhangjin Liao > --- > gcc/config/riscv/riscv.cc | 19 +++- > gcc/shrink-wrap.cc | 25 +++-- > gcc/shrink-wrap.h | 1 + > .../riscv/zcmp_shrink_wrap_separate.c | 97 +++++++++++++++++++ > .../riscv/zcmp_shrink_wrap_separate2.c | 97 +++++++++++++++++++ > 5 files changed, 228 insertions(+), 11 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/riscv/zcmp_shrink_wrap_separ= ate.c > create mode 100644 gcc/testsuite/gcc.target/riscv/zcmp_shrink_wrap_separ= ate2.c > > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc > index f60c241a526..b505cdeca34 100644 > --- a/gcc/config/riscv/riscv.cc > +++ b/gcc/config/riscv/riscv.cc > @@ -64,6 +64,7 @@ along with GCC; see the file COPYING3. If not see > #include "cfghooks.h" > #include "cfgloop.h" > #include "cfgrtl.h" > +#include "shrink-wrap.h" > #include "sel-sched.h" > #include "fold-const.h" > #include "gimple-iterator.h" > @@ -389,6 +390,7 @@ static const struct riscv_tune_param optimize_size_tu= ne_info =3D { > false, /* use_divmod_expansion *= / > }; > > +static bool riscv_avoid_shrink_wrapping_separate (); > static tree riscv_handle_fndecl_attribute (tree *, tree, tree, int, bool= *); > static tree riscv_handle_type_attribute (tree *, tree, tree, int, bool *= ); > > @@ -4910,6 +4912,8 @@ riscv_avoid_multi_push(const struct riscv_frame_inf= o *frame) > || cfun->machine->interrupt_handler_p > || cfun->machine->varargs_size !=3D 0 > || crtl->args.pretend_args_size !=3D 0 > + || (use_shrink_wrapping_separate () > + && !riscv_avoid_shrink_wrapping_separate ()) > || (frame->mask & ~ MULTI_PUSH_GPR_MASK)) > return true; > > @@ -6077,6 +6081,17 @@ riscv_epilogue_uses (unsigned int regno) > return false; > } > > +static bool > +riscv_avoid_shrink_wrapping_separate () > +{ > + if (riscv_use_save_libcall (&cfun->machine->frame) > + || cfun->machine->interrupt_handler_p > + || !cfun->machine->frame.gp_sp_offset.is_constant ()) > + return true; > + > + return false; > +} > + > /* Implement TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS. */ > > static sbitmap > @@ -6086,9 +6101,7 @@ riscv_get_separate_components (void) > sbitmap components =3D sbitmap_alloc (FIRST_PSEUDO_REGISTER); > bitmap_clear (components); > > - if (riscv_use_save_libcall (&cfun->machine->frame) > - || cfun->machine->interrupt_handler_p > - || !cfun->machine->frame.gp_sp_offset.is_constant ()) > + if (riscv_avoid_shrink_wrapping_separate ()) > return components; > > offset =3D cfun->machine->frame.gp_sp_offset.to_constant (); > diff --git a/gcc/shrink-wrap.cc b/gcc/shrink-wrap.cc > index b8d7b557130..d534964321a 100644 > --- a/gcc/shrink-wrap.cc > +++ b/gcc/shrink-wrap.cc > @@ -1776,16 +1776,14 @@ insert_prologue_epilogue_for_components (sbitmap = components) > commit_edge_insertions (); > } > > -/* The main entry point to this subpass. FIRST_BB is where the prologue > - would be normally put. */ > -void > -try_shrink_wrapping_separate (basic_block first_bb) > +bool > +use_shrink_wrapping_separate (void) > { > if (!(SHRINK_WRAPPING_ENABLED > - && flag_shrink_wrap_separate > - && optimize_function_for_speed_p (cfun) > - && targetm.shrink_wrap.get_separate_components)) > - return; > + && flag_shrink_wrap_separate > + && optimize_function_for_speed_p (cfun) > + && targetm.shrink_wrap.get_separate_components)) > + return false; > > /* We don't handle "strange" functions. */ > if (cfun->calls_alloca > @@ -1794,6 +1792,17 @@ try_shrink_wrapping_separate (basic_block first_bb= ) > || crtl->calls_eh_return > || crtl->has_nonlocal_goto > || crtl->saves_all_registers) > + return false; > + > + return true; > +} > + > +/* The main entry point to this subpass. FIRST_BB is where the prologue > + would be normally put. */ > +void > +try_shrink_wrapping_separate (basic_block first_bb) > +{ > + if (!use_shrink_wrapping_separate ()) > return; > > /* Ask the target what components there are. If it returns NULL, don'= t > diff --git a/gcc/shrink-wrap.h b/gcc/shrink-wrap.h > index 161647711a3..82386c2b712 100644 > --- a/gcc/shrink-wrap.h > +++ b/gcc/shrink-wrap.h > @@ -26,6 +26,7 @@ along with GCC; see the file COPYING3. If not see > extern bool requires_stack_frame_p (rtx_insn *, HARD_REG_SET, HARD_REG_S= ET); > extern void try_shrink_wrapping (edge *entry_edge, rtx_insn *prologue_se= q); > extern void try_shrink_wrapping_separate (basic_block first_bb); > +extern bool use_shrink_wrapping_separate (void); > #define SHRINK_WRAPPING_ENABLED \ > (flag_shrink_wrap && targetm.have_simple_return ()) > > diff --git a/gcc/testsuite/gcc.target/riscv/zcmp_shrink_wrap_separate.c b= /gcc/testsuite/gcc.target/riscv/zcmp_shrink_wrap_separate.c > new file mode 100644 > index 00000000000..11f87aee607 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/riscv/zcmp_shrink_wrap_separate.c > @@ -0,0 +1,97 @@ > +/* { dg-do compile } */ > +/* { dg-options " -O2 -march=3Drv32imaf_zca_zcmp -mabi=3Dilp32f" } */ > +/* { dg-skip-if "" { *-*-* } {"-O0" "-O1" "-Os" "-Og" "-O3" "-Oz" "-flto= "} } */ > + > +typedef struct MAT_PARAMS_S > +{ > + int N; > + signed short *A; > + signed short *B; > + signed int *C; > +} mat_params; > + > +typedef struct CORE_PORTABLE_S > +{ > + unsigned char portable_id; > +} core_portable; > + > +typedef struct RESULTS_S > +{ > + /* inputs */ > + signed short seed1; /* Initializing seed */ > + signed short seed2; /* Initializing seed */ > + signed short seed3; /* Initializing seed */ > + void * memblock[4]; /* Pointer to safe memory location = */ > + unsigned int size; /* Size of the data */ > + unsigned int iterations; /* Number of iterations to ex= ecute */ > + unsigned int execs; /* Bitmask of operations to e= xecute */ > + struct list_head_s *list; > + mat_params mat; > + /* outputs */ > + unsigned short crc; > + unsigned short crclist; > + unsigned short crcmatrix; > + unsigned short crcstate; > + signed short err; > + /* ultithread specific */ > + core_portable port; > +} core_results; > + > +extern signed short > +core_bench_state(unsigned int, void *, signed short, signed short, signe= d short, unsigned short); > + > +extern signed short > +core_bench_matrix(mat_params *, signed short, unsigned short); > + > +extern unsigned short > +crcu16(signed short, unsigned short); > + > +signed short > +calc_func(signed short *pdata, core_results *res) > +{ > + signed short data =3D *pdata; > + signed short retval; > + unsigned char optype > + =3D (data >> 7) > + & 1; /* bit 7 indicates if the function result has been cache= d */ > + if (optype) /* if cached, use cache */ > + return (data & 0x007f); > + else > + { /* otherwise calculate and cache the r= esult */ > + signed short flag =3D data & 0x7; /* bits 0-2 is type of functio= n to perform */ > + signed short dtype > + =3D ((data >> 3) > + & 0xf); /* bits 3-6 is specific data for the operat= ion */ > + dtype |=3D dtype << 4; /* replicate the lower 4 bits to get an 8= b value */ > + switch (flag) > + { > + case 0: > + if (dtype < 0x22) /* set min period for bit corruption *= / > + dtype =3D 0x22; > + retval =3D core_bench_state(res->size, > + res->memblock[3], > + res->seed1, > + res->seed2, > + dtype, > + res->crc); > + if (res->crcstate =3D=3D 0) > + res->crcstate =3D retval; > + break; > + case 1: > + retval =3D core_bench_matrix(&(res->mat), dtype, res->cr= c); > + if (res->crcmatrix =3D=3D 0) > + res->crcmatrix =3D retval; > + break; > + default: > + retval =3D data; > + break; > + } > + res->crc =3D crcu16(retval, res->crc); > + retval &=3D 0x007f; > + *pdata =3D (data & 0xff00) | 0x0080 | retval; /* cache the resul= t */ > + return retval; > + } > +} > + > +/* { dg-final { scan-assembler-not "cm\.push" } } */ > + > diff --git a/gcc/testsuite/gcc.target/riscv/zcmp_shrink_wrap_separate2.c = b/gcc/testsuite/gcc.target/riscv/zcmp_shrink_wrap_separate2.c > new file mode 100644 > index 00000000000..ec7e9c39b5d > --- /dev/null > +++ b/gcc/testsuite/gcc.target/riscv/zcmp_shrink_wrap_separate2.c > @@ -0,0 +1,97 @@ > +/* { dg-do compile } */ > +/* { dg-options " -O2 -fno-shrink-wrap-separate -march=3Drv32imaf_zca_zc= mp -mabi=3Dilp32f" } */ > +/* { dg-skip-if "" { *-*-* } {"-O0" "-O1" "-Os" "-Og" "-O3" "-Oz" "-flto= "} } */ > + > +typedef struct MAT_PARAMS_S > +{ > + int N; > + signed short *A; > + signed short *B; > + signed int *C; > +} mat_params; > + > +typedef struct CORE_PORTABLE_S > +{ > + unsigned char portable_id; > +} core_portable; > + > +typedef struct RESULTS_S > +{ > + /* inputs */ > + signed short seed1; /* Initializing seed */ > + signed short seed2; /* Initializing seed */ > + signed short seed3; /* Initializing seed */ > + void * memblock[4]; /* Pointer to safe memory location = */ > + unsigned int size; /* Size of the data */ > + unsigned int iterations; /* Number of iterations to ex= ecute */ > + unsigned int execs; /* Bitmask of operations to e= xecute */ > + struct list_head_s *list; > + mat_params mat; > + /* outputs */ > + unsigned short crc; > + unsigned short crclist; > + unsigned short crcmatrix; > + unsigned short crcstate; > + signed short err; > + /* ultithread specific */ > + core_portable port; > +} core_results; > + > +extern signed short > +core_bench_state(unsigned int, void *, signed short, signed short, signe= d short, unsigned short); > + > +extern signed short > +core_bench_matrix(mat_params *, signed short, unsigned short); > + > +extern unsigned short > +crcu16(signed short, unsigned short); > + > +signed short > +calc_func(signed short *pdata, core_results *res) > +{ > + signed short data =3D *pdata; > + signed short retval; > + unsigned char optype > + =3D (data >> 7) > + & 1; /* bit 7 indicates if the function result has been cache= d */ > + if (optype) /* if cached, use cache */ > + return (data & 0x007f); > + else > + { /* otherwise calculate and cache the r= esult */ > + signed short flag =3D data & 0x7; /* bits 0-2 is type of functio= n to perform */ > + signed short dtype > + =3D ((data >> 3) > + & 0xf); /* bits 3-6 is specific data for the operat= ion */ > + dtype |=3D dtype << 4; /* replicate the lower 4 bits to get an 8= b value */ > + switch (flag) > + { > + case 0: > + if (dtype < 0x22) /* set min period for bit corruption *= / > + dtype =3D 0x22; > + retval =3D core_bench_state(res->size, > + res->memblock[3], > + res->seed1, > + res->seed2, > + dtype, > + res->crc); > + if (res->crcstate =3D=3D 0) > + res->crcstate =3D retval; > + break; > + case 1: > + retval =3D core_bench_matrix(&(res->mat), dtype, res->cr= c); > + if (res->crcmatrix =3D=3D 0) > + res->crcmatrix =3D retval; > + break; > + default: > + retval =3D data; > + break; > + } > + res->crc =3D crcu16(retval, res->crc); > + retval &=3D 0x007f; > + *pdata =3D (data & 0xff00) | 0x0080 | retval; /* cache the resul= t */ > + return retval; > + } > +} > + > +/* { dg-final { scan-assembler "cm\.push" } } */ > + > -- > 2.17.1 >