public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool
@ 2020-10-14 11:56 kjetilos at gmail dot com
  2020-10-15 20:34 ` [Bug other/97417] " wilson at gcc dot gnu.org
                   ` (61 more replies)
  0 siblings, 62 replies; 63+ messages in thread
From: kjetilos at gmail dot com @ 2020-10-14 11:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

            Bug ID: 97417
           Summary: RISC-V Unnecessary andi instruction when loading
                    volatile bool
           Product: gcc
           Version: 10.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: other
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kjetilos at gmail dot com
  Target Milestone: ---

When loading a volatile bool value I see that the compiler adds unnecessary
andi instruction. Here is reproducer:

#include <stdbool.h>

extern volatile bool active;

int foo(void)
{
  if (!active) {
    return 42;
  } else {
    return -42;
  }
}

And the code generated looks like this:

foo:
        lui     a5,%hi(active)
        lbu     a5,%lo(active)(a5)
        li      a0,42
        andi    a5,a5,0xff
        beq     a5,zero,.L1
        li      a0,-42
.L1:
        ret

The andi instruction seems to be unnecessary since lbu is zero extending the
byte.

ref. https://github.com/riscv/riscv-gnu-toolchain/issues/737

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
@ 2020-10-15 20:34 ` wilson at gcc dot gnu.org
  2020-10-21  3:36 ` jiawei at iscas dot ac.cn
                   ` (60 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: wilson at gcc dot gnu.org @ 2020-10-15 20:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

Jim Wilson <wilson at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wilson at gcc dot gnu.org
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2020-10-15
     Ever confirmed|0                           |1

--- Comment #1 from Jim Wilson <wilson at gcc dot gnu.org> ---
Comparing with the ARM port, I see that in the ARM port, the movqi pattern
emits
(insn 8 7 9 2 (set (reg:SI 117)
        (zero_extend:SI (mem/v/c:QI (reg/f:SI 115) [1 active+0 S1 A8])))
"tmp.c\
":7:7 -1
     (nil))
(insn 9 8 10 2 (set (reg:QI 116)
        (subreg:QI (reg:SI 117) 0)) "tmp.c":7:7 -1
     (nil))
and then later it combines the subreg operation with the following zero_extend
and cancels them out.

Whereas in the RISC-V port, the movqi pattern emits
(insn 9 7 10 2 (set (reg:QI 76)
        (mem/v/c:QI (lo_sum:DI (reg:DI 74)
                (symbol_ref:DI ("active") [flags 0xc4]  <var_decl
0x7f9f0310312\
0 active>)) [1 active+0 S1 A8])) "tmp.c":7:7 -1
     (nil))
and then combine refuses to combine the following zero-extend with this insn as
the memory operation is volatile.

So it seems we need to rewrite the RISC-V port to make movqi and movhi zero
extend to si/di mode and then subreg.  That probably will require cascading
changes to avoid code size and performance regressions.

Looks like a tractable small to medium size project, but will need to wait for
a volunteer to work on it.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
  2020-10-15 20:34 ` [Bug other/97417] " wilson at gcc dot gnu.org
@ 2020-10-21  3:36 ` jiawei at iscas dot ac.cn
  2020-10-21  5:58 ` wilson at gcc dot gnu.org
                   ` (59 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: jiawei at iscas dot ac.cn @ 2020-10-21  3:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

jiawei <jiawei at iscas dot ac.cn> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jiawei at iscas dot ac.cn

--- Comment #2 from jiawei <jiawei at iscas dot ac.cn> ---
(In reply to Jim Wilson from comment #1)
> Comparing with the ARM port, I see that in the ARM port, the movqi pattern
> emits
> (insn 8 7 9 2 (set (reg:SI 117)
>         (zero_extend:SI (mem/v/c:QI (reg/f:SI 115) [1 active+0 S1 A8])))
> "tmp.c\
> ":7:7 -1
>      (nil))
> (insn 9 8 10 2 (set (reg:QI 116)
>         (subreg:QI (reg:SI 117) 0)) "tmp.c":7:7 -1
>      (nil))
> and then later it combines the subreg operation with the following
> zero_extend and cancels them out.
> 
> Whereas in the RISC-V port, the movqi pattern emits
> (insn 9 7 10 2 (set (reg:QI 76)
> 	(mem/v/c:QI (lo_sum:DI (reg:DI 74)
>                 (symbol_ref:DI ("active") [flags 0xc4]  <var_decl
> 0x7f9f0310312\
> 0 active>)) [1 active+0 S1 A8])) "tmp.c":7:7 -1
>      (nil))
> and then combine refuses to combine the following zero-extend with this insn
> as the memory operation is volatile.
> 
> So it seems we need to rewrite the RISC-V port to make movqi and movhi zero
> extend to si/di mode and then subreg.  That probably will require cascading
> changes to avoid code size and performance regressions.
> 
> Looks like a tractable small to medium size project, but will need to wait
> for a volunteer to work on it.

Hi Jim, My name is Jiawei Chen. I am from the PLCT Lab. I had recurrented this
bug, and want to try to help fixing this bug. What should I modify,is there any
suggestions?

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
  2020-10-15 20:34 ` [Bug other/97417] " wilson at gcc dot gnu.org
  2020-10-21  3:36 ` jiawei at iscas dot ac.cn
@ 2020-10-21  5:58 ` wilson at gcc dot gnu.org
  2020-10-27 11:18 ` jiawei at iscas dot ac.cn
                   ` (58 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: wilson at gcc dot gnu.org @ 2020-10-21  5:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #3 from Jim Wilson <wilson at gcc dot gnu.org> ---
The basic idea here is that the movqi pattern in riscv.md currently emits RTL
for a load that looks like this
  (set (reg:QI target) (mem:QI (address)))
As an experiment, we want to try changing that to something like this
  (set (reg:DI temp) (zero_extend:DI (mem:DI (address))))
  (set (reg:QI target) (subreg:QI (reg:DI temp) 0))
The hope is that the optimizer will combine the subreg with following
operations resulting in smaller faster code at the end.  And this should also
solve the volatile load optimization problem.  So we need a patch, and then we
need experiments to see if the patch actually produces better code on real
programs.  It should be fairly easy to write the patch even if you don't have
any gcc experience.  The testing part of this is probably more work than the
patch writing.

The movqi pattern calls riscv_legitmize_move in riscv.c, so that would have to
be modified to look for qimode loads from memory, allocate a temporary
register, do a zero_extending load into the temp reg, and then a subreg copy
into the target register.

You will probably also need to handle cases where both the target and source
are memory locations, in which case this already gets split into two
instructions, a load followed by a store.

You can look at the movqi pattern in arm.md file to see an example of how to do
this, where it calls gen_zero_extendqisi2.  Though for RISC-V, we would want
gen_zero_extendqidi2 for rv64 and gen_zero_extendqisi2 for rv32.

If the movqi change works, then we would want similar changes for movhi and
maybe also movsi for rv64.

It might also be worth checking whether zero-extend or sign-extend is the
better choice.  We zero extend char by default, so that should be best.  For
rv64, the hardware sign extends simode to dimode by default, so sign-extend is
probably best for that.  For himode I'm not sure, I think we prefer sign-extend
by default, but that isn't necessarily the best choice for loads.  This would
have to be tested.

You can see rtl dumps by using -fdump-rtl-all.  The combiner is the pass that
should be optimizing away the unnecessary zero-extend.  You can see details of
what the combiner pass is doing by using -fdump-rtl-combine-all.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (2 preceding siblings ...)
  2020-10-21  5:58 ` wilson at gcc dot gnu.org
@ 2020-10-27 11:18 ` jiawei at iscas dot ac.cn
  2020-10-27 15:25 ` wilson at gcc dot gnu.org
                   ` (57 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: jiawei at iscas dot ac.cn @ 2020-10-27 11:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #4 from jiawei <jiawei at iscas dot ac.cn> ---
I had did some tests with this problem and find:

foo.c

#include <stdbool.h>

extern volatile bool active;

int foo(void)
{
  if (!active) {
    return 42;
  } else {
    return -42;
  }
}

code generated in foo.s

foo:
        lui     a5,%hi(active)
        lbu     a5,%lo(active)(a5)
        li      a0,42
        andi    a5,a5,0xff
        beq     a5,zero,.L2
        li      a0,-42

When we remove the keyword `volatile`

foo_without_volatile.c

#include <stdbool.h>

extern bool active;

int foo(void)
{
  if (!active) {
    return 42;
  } else {
    return -42;
  }
}

code generated in foo_without_volatile.s

foo:
        lui     a5,%hi(active)
        lbu     a5,%lo(active)(a5)
        li      a0,42
        beq     a5,zero,.L2
        li      a0,-42

and then we change the type from `bool` to `int`

foo_int.c

#include <stdbool.h>

extern volatile int active;

int foo(void)
{
  if (!active) {
    return 42;
  } else {
    return -42;
  }
}

code generated in foo_int.s

foo:
        lui     a5,%hi(active)
        lw      a5,%lo(active)(a5)
        li      a0,42
        sext.w  a5,a5
        beq     a5,zero,.L2
        li      a0,-42

the `sext.w` instruction replace the `andi`

We also remove the keyword `volatile` in foo_int_without_volatile.c

#include <stdbool.h>

extern int active;

int foo(void)
{
  if (!active) {
    return 42;
  } else {
    return -42;
  }
}

code generated in foo_int_without_volatile.s also look like optimized

foo:
        lui     a5,%hi(active)
        lw      a5,%lo(active)(a5)
        li      a0,42
        beq     a5,zero,.L2
        li      a0,-42

Maybe this problem is due to the keyword `volatile`.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (3 preceding siblings ...)
  2020-10-27 11:18 ` jiawei at iscas dot ac.cn
@ 2020-10-27 15:25 ` wilson at gcc dot gnu.org
  2020-10-29  5:27 ` admin at levyhsu dot com
                   ` (56 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: wilson at gcc dot gnu.org @ 2020-10-27 15:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #5 from Jim Wilson <wilson at gcc dot gnu.org> ---
Yes, the volatile is the problem.  We need to disable some optimizations like
the combiner to avoid breaking the semantics of volatile.  However, if you try
looking at other ports, like arm, you can see that they don't have this problem
because they generate different RTL at the start and hence do not need the
combiner to generate the sign-extended load.  So the proposal here is that we
modify the RISC-V gcc port to also emit the sign-extended load at RTL
generation time, which solves this problem. And then we need to do some testing
to make sure that this actually generates good code in every case, as we don't
want to accidentally introduce a code size or performance regression while
fixing this volatile optimization problem.

If you are curious about the combiner issue, see the init_recog_no_volatile
call in combine.c.  If you comment that out, the andi will be optimized away. 
But we can't remove that call, because that would break programs using
volatile.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (4 preceding siblings ...)
  2020-10-27 15:25 ` wilson at gcc dot gnu.org
@ 2020-10-29  5:27 ` admin at levyhsu dot com
  2020-10-29 22:49 ` wilson at gcc dot gnu.org
                   ` (55 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: admin at levyhsu dot com @ 2020-10-29  5:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

Levy <admin at levyhsu dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |admin at levyhsu dot com

--- Comment #6 from Levy <admin at levyhsu dot com> ---
Hi Jim

Levy from StarFive. 

Adding following code to the head of riscv_legitimize_move() according to your
reply seems to solve the problem:

if(mode == QImode && MEM_P (src) && REG_P (dest))
  {
    rtx temp_reg;
    if (TARGET_64BIT)
    {
      temp_reg = gen_reg_rtx (DImode);
      emit_insn(gen_zero_extendqidi2(temp_reg, src));
    }
    else
    {
      temp_reg = gen_reg_rtx (SImode);
      emit_insn(gen_zero_extendqisi2(temp_reg, src));
    }

    riscv_emit_move(dest, gen_rtx_SUBREG(QImode,temp_reg,0));
    return true;
  }

same foo.c will produce:
foo:
        lui     a5,%hi(active)
        lbu     a5,%lo(active)(a5)
        li      a0,42
        bne     a5,zero,.L6
        ret
.L6:
        li      a0,-42
        ret
        .size   foo, .-foo
        .ident  "GCC: (GNU) 10.2.0"

Not sure if I'm doing it right, especially for 64bit DImode because I've only
been with gcc for a month. Just wonder if you have time after Monday's compiler
meeting so we may discuss movsi, movhi and MEM to MEM copy.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (5 preceding siblings ...)
  2020-10-29  5:27 ` admin at levyhsu dot com
@ 2020-10-29 22:49 ` wilson at gcc dot gnu.org
  2020-10-30  8:35 ` admin at levyhsu dot com
                   ` (54 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: wilson at gcc dot gnu.org @ 2020-10-29 22:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #7 from Jim Wilson <wilson at gcc dot gnu.org> ---
That patch is basically correct.  I would suggest using gen_lowpart instead of
gen_rtx_SUBREG as a minor cleanup.  It will do the same thing, and is shorter
and easier to read.

There is one problem here that you can't generate new pseudo registers during
register allocation, or after register allocation is complete.  So you need to
disable this optimization in this case.  You can do that by adding a check for
can_create_pseudo_p ().  This is already used explicitly in one place in
riscv_legitimize_move and implicitly in some subfunctions, and is used in the
arm.md movqi pattern.

The next part is testing the patch.  We need some correctness testing.  You can
just run the gcc testsuite for that.  And we need some code size/performance
testing.  I'd suggest compiling some code with and without the patch and check
function sizes and look for anything that got bigger with the patch, and check
to see if it is a problem.  I like to use the toolchain libraries like libc.a
and libstdc++.a since they are being built anways, but using a nice benchmark
is OK also as long as it is big enough to stress the compiler.

If the patch passes testing, then we can look at expanding the scope to handle
more modes, and also handle MEM dest in addition to REG dest.

Yes, we can discuss this Monday.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (6 preceding siblings ...)
  2020-10-29 22:49 ` wilson at gcc dot gnu.org
@ 2020-10-30  8:35 ` admin at levyhsu dot com
  2020-10-30 12:46 ` admin at levyhsu dot com
                   ` (53 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: admin at levyhsu dot com @ 2020-10-30  8:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #8 from Levy <admin at levyhsu dot com> ---
Created attachment 49470
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49470&action=edit
optimization fix for BUG 97417

proposing a temp patch here in case someone needs it, then I'll submit a full
patch with test case later.

Following code was added to the riscv_legitimize_move () in the
riscv-gcc/gcc/config/riscv/riscv.c

  if(mode == QImode && MEM_P (src) && REG_P (dest) && can_create_pseudo_p())
  {
    rtx temp_reg;

    if (TARGET_64BIT)
    {
      temp_reg = gen_reg_rtx (DImode);
      emit_insn(gen_zero_extendqidi2(temp_reg, src));
    }
    else
    {
      temp_reg = gen_reg_rtx (SImode);
      emit_insn(gen_zero_extendqisi2(temp_reg, src));
    }

    riscv_emit_move(dest, gen_lowpart(QImode,temp_reg));
    return true;
  }

Tested with make report-gcc, haven't done the regression/performance test yet:

========= Summary of gcc testsuite =========
                            | # of unexpected case / # of unique unexpected
case
                            |          gcc |          g++ |     gfortran |
 rv64imafdc/  lp64d/ medlow |    0 /     0 |    0 /     0 |      - |

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (7 preceding siblings ...)
  2020-10-30  8:35 ` admin at levyhsu dot com
@ 2020-10-30 12:46 ` admin at levyhsu dot com
  2020-11-04  6:10 ` admin at levyhsu dot com
                   ` (52 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: admin at levyhsu dot com @ 2020-10-30 12:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #9 from Levy <admin at levyhsu dot com> ---
Thanks Jim. See u on Monday.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (8 preceding siblings ...)
  2020-10-30 12:46 ` admin at levyhsu dot com
@ 2020-11-04  6:10 ` admin at levyhsu dot com
  2020-11-04  6:35 ` kito at gcc dot gnu.org
                   ` (51 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: admin at levyhsu dot com @ 2020-11-04  6:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #10 from Levy <admin at levyhsu dot com> ---
Created attachment 49500
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49500&action=edit
Optimzation Patch for QI/HImode(32bit) and QI/HI/SImode(64bit)

Proposing second patch for QI/HImode(32bit) and QI/HI/SImode(64bit)
Both Zero-Extend & Subreg

Tested with make report-gcc
Two failed cases: shorten-memrefs-5.c & shorten-memrefs-6.c

Both were failed due to dejaGNU rule:
/* { dg-final { scan-assembler "load1r:\n\taddi" } } */

But if we look at the assembly code, for same input in both file:

int load1r (int *array)
{
  int a = 0;
  a += array[200];
  a += array[201];
  a += array[202];
  a += array[203];
  return a;
}

Current gcc risc-v port will produce:
load1r:
        addi    a5,a0,768
        lw      a4,36(a5)
        lw      a0,32(a5)
        addw    a0,a0,a4
        lw      a4,40(a5)
        addw    a4,a4,a0
        lw      a0,44(a5)
        addw    a0,a0,a4
        ret
Patched new port will produce:
load1r:
        lwu     a4,800(a0)
        lwu     a5,804(a0)
        addw    a5,a5,a4
        lwu     a4,808(a0)
        lwu     a0,812(a0)
        addw    a5,a5,a4
        addw    a0,a5,a0
        ret
With one instruction less, the patched one seems right and even faster to me.
However we still need a test on sign extend and check performance

Test case and performance evaluation will be provided later (hopefully)

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (9 preceding siblings ...)
  2020-11-04  6:10 ` admin at levyhsu dot com
@ 2020-11-04  6:35 ` kito at gcc dot gnu.org
  2020-11-04  7:03 ` admin at levyhsu dot com
                   ` (50 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: kito at gcc dot gnu.org @ 2020-11-04  6:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

Kito Cheng <kito at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kito at gcc dot gnu.org

--- Comment #11 from Kito Cheng <kito at gcc dot gnu.org> ---
> Two failed cases: shorten-memrefs-5.c & shorten-memrefs-6.c

Seems like shorten_memrefs pass didn't handle zero_extend and sign_extend with
memory.

You can take a look into riscv-shorten-memrefs.c:pass_shorten_memrefs::analyze
and add logic to handle zero_extend and sign_extend.


> With one instruction less, the patched one seems right and even faster to me. However we still need a test on sign extend and check performance

shorten_memrefs is optimize for size, the idea is transform several load
instructions with large immediate to a load immediate instruction and load
instructions with small immediate, to use C-extensions instruction as possible.

so the instruction count seems increased, but the code size is smaller.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (10 preceding siblings ...)
  2020-11-04  6:35 ` kito at gcc dot gnu.org
@ 2020-11-04  7:03 ` admin at levyhsu dot com
  2020-11-05 22:57 ` wilson at gcc dot gnu.org
                   ` (49 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: admin at levyhsu dot com @ 2020-11-04  7:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #12 from Levy <admin at levyhsu dot com> ---
(In reply to Kito Cheng from comment #11)
> > Two failed cases: shorten-memrefs-5.c & shorten-memrefs-6.c
> 
> Seems like shorten_memrefs pass didn't handle zero_extend and sign_extend
> with memory.
> 
> You can take a look into
> riscv-shorten-memrefs.c:pass_shorten_memrefs::analyze and add logic to
> handle zero_extend and sign_extend.
> 
> 
> > With one instruction less, the patched one seems right and even faster to me. However we still need a test on sign extend and check performance
> 
> shorten_memrefs is optimize for size, the idea is transform several load
> instructions with large immediate to a load immediate instruction and load
> instructions with small immediate, to use C-extensions instruction as
> possible.
> 
> so the instruction count seems increased, but the code size is smaller.

Thank you cheng, I'll have a look.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (11 preceding siblings ...)
  2020-11-04  7:03 ` admin at levyhsu dot com
@ 2020-11-05 22:57 ` wilson at gcc dot gnu.org
  2020-11-06  1:20 ` wilson at gcc dot gnu.org
                   ` (48 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: wilson at gcc dot gnu.org @ 2020-11-05 22:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #13 from Jim Wilson <wilson at gcc dot gnu.org> ---
The attachments show the entire riscv.c file being deleted and then readded
with your change.  This is due to a line ending problem.  The original file has
the unix style linefeed and the new file has the windows style carriage return
and linefeed.  Git has a setting called core.autocrlf that can help with this. 
This will require using git diff instead of just diff though to generate
patche, but should give better diffs.  Or alternatively, maybe to can force the
original file to have msdos line endings before you make the diff, e.g. maybe
just loading the original file into your editor and saving it without making
changes will fix the line endings.

You have in the second patch
    (mode == QImode || mode == SImode || mode == HImode)
which is wrong but harmless for rv32 since we can't extend SImode, and is also
wrong for the eventual rv128 support.  You can fix this by using something like
    (GET_MODE_CLASS (mode) == MODE_INT && GET_MODE_SIZE (mode) < UNITS_PER_WORD
There is one place in riscv_legitimize_move that already uses this.

The code inside the if statement is a lot more verbose then necessary.  You can
use some helper functions to simplify it.  First you can use word_mode which is
DImode for rv64 and SImode for rv32 (and will be OImode for rv128).  Then you
can call a helper to do the mode conversion.  So for instance something like
  temp_reg = force_reg (word_mode, convert_to_mode (word_mode, src, 1));
should work.  That should emit an insn to do the zero extend and put it in a
reg.  Now you no longer need to check src mode or TARGET_64BIT as the code is
the same in all cases.  So it should just be about 3 or 4 lines of code for the
body of the if statement.

You have a check for REG_P (dest).  This is stricter than you need, since it
only works for REG and doesn't accept SUBREG.  We should handle both.  Also, it
isn't hard to also handle non-reg or non-subreg dests.  You just need to force
the source to a reg, and you already did that when you generated the
zero_extend operation.  So this code looks like it should work for any dest and
you should be able to drop the REG_P (dest) check.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (12 preceding siblings ...)
  2020-11-05 22:57 ` wilson at gcc dot gnu.org
@ 2020-11-06  1:20 ` wilson at gcc dot gnu.org
  2020-11-06  2:40 ` wilson at gcc dot gnu.org
                   ` (47 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: wilson at gcc dot gnu.org @ 2020-11-06  1:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #14 from Jim Wilson <wilson at gcc dot gnu.org> ---
Actually, for the SImode to DImode conversion, zero extending is unlikely to be
correct in that case.  The rv64 'w' instructions require the input to be
sign-extended from 32-bits to 64-bits, so a sign-extend is probably required. 
There will be a similar consideration for rv128 when we get to that.  So maybe
we should only handle QImode and HImode here.

Or maybe we make the choice of zero-extend or sign-extend depend on the mode,
and use zero-extend for smaller than SImode and sign-extend for SImode and
larger.

For qimode, char is unsigned by default, so zero extension is likely the right
choice.  For himode, it isn't clear which is best, but the arm port does a zero
extend.  Also, the Huawei code size proposal says that zero exnteded byte and
short loads are more important than sign extended byte and short load, so that
is more evidence that zero extend is more useful in those cases.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (13 preceding siblings ...)
  2020-11-06  1:20 ` wilson at gcc dot gnu.org
@ 2020-11-06  2:40 ` wilson at gcc dot gnu.org
  2020-11-06  2:44 ` kito at gcc dot gnu.org
                   ` (46 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: wilson at gcc dot gnu.org @ 2020-11-06  2:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #15 from Jim Wilson <wilson at gcc dot gnu.org> ---
I tried testing your first patch.  I just built unpatched/patch
rv32-elf/rv64-linux toolchains and used nm/objdump to look at libraries like
libc, libstdc++, libm.

I managed to find a testcase from glibc that gives worse code with the patch.
struct
{
  unsigned int a : 1;
  unsigned int b : 1;
  unsigned int c : 1;
  unsigned int d : 1;
  unsigned int pad1 : 28;
} s;

void
sub (void)
{
  s.a = 1;
  s.c = 1;
}

Without the patch we get
sub:
        lui     a5,%hi(s)
        addi    a5,a5,%lo(s)
        lbu     a4,1(a5)
        ori     a4,a4,5
        sb      a4,1(a5)
        ret
and with the patch we get
sub:
        lui     a4,%hi(s)
        lbu     a5,%lo(s)(a4)
        andi    a5,a5,-6
        ori     a5,a5,5
        sb      a5,%lo(s)(a4)
        ret
Note the extra and instruction.

This seems to be a combine problem.  With the patched compiler, I see in the
-fdump-rtl-combine-all dump file
Trying 9 -> 11:
    9: r79:DI=r78:DI&0xfffffffffffffffa
      REG_DEAD r78:DI
   11: r81:DI=r79:DI|0x5
      REG_DEAD r79:DI
Failed to match this instruction:
(set (reg:DI 81)
    (ior:DI (and:DI (reg:DI 78 [ MEM <unsigned char> [(struct  *)&sD.1491] ])
            (const_int 250 [0xfa]))
        (const_int 5 [0x5])))
Combine knows that reg 78 only has 8 nonzero bits, so it simplified the AND -6
to AND 250.  If combine had left that constant alone, or if maybe combine
propagated that info about nonzero bits through to r81, then it should have
been able to optimize out the AND operation.  This does work when the load does
not zero extend by default.

The ARM port shows the exact same problem.  I see
sub:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        movw    r2, #:lower16:.LANCHOR0
        movt    r2, #:upper16:.LANCHOR0
        ldrb    r3, [r2]        @ zero_extendqisi2
        bic     r3, r3, #5
        orr     r3, r3, #5
        strb    r3, [r2]
        bx      lr
and the bic (bit clear) is obviously unnecessary.

This probably should be submitted as a separate bug if we don't want to fix it
here.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (14 preceding siblings ...)
  2020-11-06  2:40 ` wilson at gcc dot gnu.org
@ 2020-11-06  2:44 ` kito at gcc dot gnu.org
  2020-11-06  3:35 ` wilson at gcc dot gnu.org
                   ` (45 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: kito at gcc dot gnu.org @ 2020-11-06  2:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #16 from Kito Cheng <kito at gcc dot gnu.org> ---
> Or maybe we make the choice of zero-extend or sign-extend depend on the mode, and use zero-extend for smaller than SImode and sign-extend for SImode and larger.


Maybe depend on LOAD_EXTEND_OP?

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (15 preceding siblings ...)
  2020-11-06  2:44 ` kito at gcc dot gnu.org
@ 2020-11-06  3:35 ` wilson at gcc dot gnu.org
  2020-11-06  9:46 ` admin at levyhsu dot com
                   ` (44 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: wilson at gcc dot gnu.org @ 2020-11-06  3:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #17 from Jim Wilson <wilson at gcc dot gnu.org> ---
Yes, LOAD_EXTEND_OP is a good suggestion.  We can maybe do something like
  int extend = (LOAD_EXTEND_OP (mode) == ZERO_EXTEND);
  temp_reg = force_reg (word_mode, convert_to_mode (word_mode, src, extend));

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (16 preceding siblings ...)
  2020-11-06  3:35 ` wilson at gcc dot gnu.org
@ 2020-11-06  9:46 ` admin at levyhsu dot com
  2020-11-06 11:38 ` admin at levyhsu dot com
                   ` (43 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: admin at levyhsu dot com @ 2020-11-06  9:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #18 from Levy <admin at levyhsu dot com> ---
if (GET_MODE_CLASS (mode) == MODE_INT
              && GET_MODE_SIZE (mode) < UNITS_PER_WORD
        && can_create_pseudo_p()
        && MEM_P (src))
  {
    int extend = (LOAD_EXTEND_OP (mode) == ZERO_EXTEND);
    rtx temp_reg = force_reg (word_mode, convert_to_mode (word_mode, src,
extend));
    riscv_emit_move(dest, temp_reg);
    return true;
  }

tried to insert code at the beginning of riscv_legitimize_move() but seems
convert_to_mode() raised seg fault druing make

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (17 preceding siblings ...)
  2020-11-06  9:46 ` admin at levyhsu dot com
@ 2020-11-06 11:38 ` admin at levyhsu dot com
  2020-11-06 20:44 ` wilson at gcc dot gnu.org
                   ` (42 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: admin at levyhsu dot com @ 2020-11-06 11:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #19 from Levy <admin at levyhsu dot com> ---
Also tested code without int extend, just zero-extend with unsignedp set to 1,
Still seg fault.

if (GET_MODE_CLASS (mode) == MODE_INT
              && GET_MODE_SIZE (mode) < UNITS_PER_WORD
        && can_create_pseudo_p()
        && MEM_P (src))
  {
    rtx temp_reg = force_reg (word_mode, convert_to_mode (word_mode, src, 1));
    riscv_emit_move(dest, temp_reg);
    return true;
  }

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (18 preceding siblings ...)
  2020-11-06 11:38 ` admin at levyhsu dot com
@ 2020-11-06 20:44 ` wilson at gcc dot gnu.org
  2020-11-06 21:08 ` wilson at gcc dot gnu.org
                   ` (41 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: wilson at gcc dot gnu.org @ 2020-11-06 20:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #20 from Jim Wilson <wilson at gcc dot gnu.org> ---
Maybe convert_to_mode is recursively calling convert_to_mode until you run out
of stack space.  You can use --save-temps to generate a .i preprocessed file
form your input testcasxe, then run cc1 under gdb.  You need to give the
default arch/abi options or you will get an error
(gdb) run -march=rv64gc -mabi=lp64d -O2 tmp.i
when look at the backtrace when you hit the segfault.

There are other ways to get the  zero extend we need here.  We could just
generate the RTL for the insn directly instead of calling the
gen_zero_extendXiYi2 functions because we know that they exist and what form
they are in.  This is inconvenient though.  We could try querying the optabs
table to get the right insn code.  There is a gen_extend_insn function that
looks like it would work and is more direct than convert_to_mode. 
gen_extend_insn does require that the input is in a form that the convert will
accept, so it must be forced to a register if it isn't already a register. 
Another solution would be to use one of the rtx simplier functions, e.g. you
can do
     simplify_gen_unary (ZERO_EXTEND, word_mode, src, mode);
but the simplify functions are really for simplifying complex RTL to simpler
RTL, so I think not the right approach here.

I think gen_extend_insn is the best approach if we can't use convert_to_mode.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (19 preceding siblings ...)
  2020-11-06 20:44 ` wilson at gcc dot gnu.org
@ 2020-11-06 21:08 ` wilson at gcc dot gnu.org
  2020-11-09  8:35 ` admin at levyhsu dot com
                   ` (40 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: wilson at gcc dot gnu.org @ 2020-11-06 21:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #21 from Jim Wilson <wilson at gcc dot gnu.org> ---
I submitted my testcase as 97747 so it will get more attention.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (20 preceding siblings ...)
  2020-11-06 21:08 ` wilson at gcc dot gnu.org
@ 2020-11-09  8:35 ` admin at levyhsu dot com
  2020-11-09  9:22 ` admin at levyhsu dot com
                   ` (39 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: admin at levyhsu dot com @ 2020-11-09  8:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #22 from Levy <admin at levyhsu dot com> ---
Under condition 

if (GET_MODE_CLASS (mode) == MODE_INT
              && GET_MODE_SIZE (mode) < UNITS_PER_WORD
        && can_create_pseudo_p()
        && MEM_P (src))

with var:

rtx temp_reg;
int extend = (LOAD_EXTEND_OP (mode) == ZERO_EXTEND);

I've tried the combination of:

gen_extend_insn (temp_reg, force_reg (mode, src), word_mode, mode, extend);
gen_extend_insn (temp_reg, force_reg (word_mode, src), word_mode, word_mode,
extend);
gen_extend_insn (temp_reg, src, word_mode, mode, extend);

with:
riscv_emit_move(dest, gen_lowpart (mode, temp_reg));
riscv_emit_move(dest, force_reg(mode, temp_reg));

then return true

All raised segfault during make gcc.

For example:

  if (GET_MODE_CLASS (mode) == MODE_INT
              && GET_MODE_SIZE (mode) < UNITS_PER_WORD
        && can_create_pseudo_p()
        && MEM_P (src))
  {
    rtx temp_reg;
    int extend = (LOAD_EXTEND_OP (mode) == ZERO_EXTEND);
    gen_extend_insn (temp_reg, force_reg (mode, src), word_mode, mode, extend);
    riscv_emit_move(dest, force_reg(mode, temp_reg));
    return true;
  }
At beginning of riscv_legitimize_move()

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (21 preceding siblings ...)
  2020-11-09  8:35 ` admin at levyhsu dot com
@ 2020-11-09  9:22 ` admin at levyhsu dot com
  2020-11-10  5:20 ` admin at levyhsu dot com
                   ` (38 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: admin at levyhsu dot com @ 2020-11-09  9:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

Levy <admin at levyhsu dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #49470|0                           |1
        is obsolete|                            |
  Attachment #49500|0                           |1
        is obsolete|                            |

--- Comment #23 from Levy <admin at levyhsu dot com> ---
Created attachment 49524
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49524&action=edit
Third test patch

While I'm waiting for a solution, I've reused my second patch and made a new
patch.
Third test patch adds one extra function gen_extend_insn_auto() in optabs.c/h
then just called by riscv_legitimize_move()
Now it can emit sign/unsigned-extend insn automatically. 

Still haven't solved the shorten-memrefs issue. So it will still raise 2 error
when make report-gcc
So the -Os option (size optimization) may not behave as expected.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (22 preceding siblings ...)
  2020-11-09  9:22 ` admin at levyhsu dot com
@ 2020-11-10  5:20 ` admin at levyhsu dot com
  2020-11-10  5:29 ` kito at gcc dot gnu.org
                   ` (37 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: admin at levyhsu dot com @ 2020-11-10  5:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

Levy <admin at levyhsu dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #49524|0                           |1
        is obsolete|                            |

--- Comment #24 from Levy <admin at levyhsu dot com> ---
Created attachment 49532
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49532&action=edit
QI/HI/SI/DI zero/sign-extend for RV32/64/128

Rewrote the third proposed patch.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (23 preceding siblings ...)
  2020-11-10  5:20 ` admin at levyhsu dot com
@ 2020-11-10  5:29 ` kito at gcc dot gnu.org
  2020-11-10  5:34 ` admin at levyhsu dot com
                   ` (36 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: kito at gcc dot gnu.org @ 2020-11-10  5:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #25 from Kito Cheng <kito at gcc dot gnu.org> ---
Seem like you have add code to gcc/optabs.h and gcc/optabs.c, however those
functions are RISC-V specific, so I would suggest you put in riscv.c and
riscv-protos.h.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (24 preceding siblings ...)
  2020-11-10  5:29 ` kito at gcc dot gnu.org
@ 2020-11-10  5:34 ` admin at levyhsu dot com
  2020-11-10  5:36 ` admin at levyhsu dot com
                   ` (35 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: admin at levyhsu dot com @ 2020-11-10  5:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

Levy <admin at levyhsu dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #49532|0                           |1
        is obsolete|                            |

--- Comment #26 from Levy <admin at levyhsu dot com> ---
Created attachment 49533
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49533&action=edit
QI/HI/SI/DI zero/sign-extend for RV32/64/128

BUG fix for unimplemented code

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (25 preceding siblings ...)
  2020-11-10  5:34 ` admin at levyhsu dot com
@ 2020-11-10  5:36 ` admin at levyhsu dot com
  2020-11-10  6:01 ` admin at levyhsu dot com
                   ` (34 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: admin at levyhsu dot com @ 2020-11-10  5:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #27 from Levy <admin at levyhsu dot com> ---
(In reply to Kito Cheng from comment #25)
> Seem like you have add code to gcc/optabs.h and gcc/optabs.c, however those
> functions are RISC-V specific, so I would suggest you put in riscv.c and
> riscv-protos.h.

No problem, I'll make a new patch.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (26 preceding siblings ...)
  2020-11-10  5:36 ` admin at levyhsu dot com
@ 2020-11-10  6:01 ` admin at levyhsu dot com
  2020-11-10 10:47 ` admin at levyhsu dot com
                   ` (33 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: admin at levyhsu dot com @ 2020-11-10  6:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

Levy <admin at levyhsu dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #49533|0                           |1
        is obsolete|                            |

--- Comment #28 from Levy <admin at levyhsu dot com> ---
Created attachment 49534
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49534&action=edit
V4 patch for QI/HI/SI/DI zero/sign-extend for RV32/64/128

Suggest by Kito, The V4 patch moved the gen_extend_insn_auto() to riscv.c and
was included in riscv-protos.h since it handles riscv backend only.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (27 preceding siblings ...)
  2020-11-10  6:01 ` admin at levyhsu dot com
@ 2020-11-10 10:47 ` admin at levyhsu dot com
  2020-11-11  1:21 ` wilson at gcc dot gnu.org
                   ` (32 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: admin at levyhsu dot com @ 2020-11-10 10:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

Levy <admin at levyhsu dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #49534|0                           |1
        is obsolete|                            |

--- Comment #29 from Levy <admin at levyhsu dot com> ---
Created attachment 49536
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49536&action=edit
QI/HI/SImode auto Zero/Sign-extend

Finally, make gen_extend_insn() seems to work with auto-extension based on Jim
and Kito's idea.

Just 10 lines of code at the beginning of riscv_legitimize_move() in
riscv-gcc/gcc/config/riscv.c

if (GET_MODE_CLASS (mode) == MODE_INT
            && GET_MODE_SIZE (mode) < UNITS_PER_WORD
      && can_create_pseudo_p()
      && MEM_P (src))
  {
      rtx temp_reg = gen_reg_rtx (word_mode);
      int zero_sign_extend = (LOAD_EXTEND_OP (mode) == ZERO_EXTEND);
      emit_insn(gen_extend_insn(temp_reg, src, word_mode, mode,
zero_sign_extend));
      riscv_emit_move(dest, gen_lowpart(mode, temp_reg));
      return true;
  }

Haven't make report-gcc, but already passed 2-stage make. 
I'll have a test later.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (28 preceding siblings ...)
  2020-11-10 10:47 ` admin at levyhsu dot com
@ 2020-11-11  1:21 ` wilson at gcc dot gnu.org
  2020-11-11  5:43 ` admin at levyhsu dot com
                   ` (31 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: wilson at gcc dot gnu.org @ 2020-11-11  1:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #30 from Jim Wilson <wilson at gcc dot gnu.org> ---
Looking at your v2 patch, the first verison fails because you are passing
mismatched modes to emit_move_insn.  The version with gen_lowpart solves that
problem, but fails because of infinite recursion.

The v4 patch looks good.  There are some coding style issues, but I can always
fix them for you.  There should be a space before open paren.  The first &&
should line up with the last two.  The braces should be indented two more
spaces.  The comment needs to end with a period and two spaces.

In the comment, instead of "Separate ... to ..." I'd say "Expand ... to ..." or
maybe "Emit ... as ...".

Now we need the copyright assignment paperwork.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (29 preceding siblings ...)
  2020-11-11  1:21 ` wilson at gcc dot gnu.org
@ 2020-11-11  5:43 ` admin at levyhsu dot com
  2020-11-11  6:43 ` admin at levyhsu dot com
                   ` (30 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: admin at levyhsu dot com @ 2020-11-11  5:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

Levy <admin at levyhsu dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #49536|0                           |1
        is obsolete|                            |

--- Comment #31 from Levy <admin at levyhsu dot com> ---
Created attachment 49542
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49542&action=edit
QI/HI/SImode auto Zero/Sign-extend

Much appreciated Jim.

The new patch should solve the format issue by adding the same tabs on each
line.

In the meanwhile I'll try to patch the pass_shorten_memrefs::analyze() in
riscv-shorten-memrefs.c

Any idea on the combiner issue?

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (30 preceding siblings ...)
  2020-11-11  5:43 ` admin at levyhsu dot com
@ 2020-11-11  6:43 ` admin at levyhsu dot com
  2020-11-11 19:35 ` wilson at gcc dot gnu.org
                   ` (29 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: admin at levyhsu dot com @ 2020-11-11  6:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

Levy <admin at levyhsu dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #49542|0                           |1
        is obsolete|                            |

--- Comment #32 from Levy <admin at levyhsu dot com> ---
Created attachment 49543
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49543&action=edit
QI/HI/SImode auto Zero/Sign-extend

Added one missing space at the end of the comment.
Added one tab before each brace.
Replace all tabs with space (come on....)

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (31 preceding siblings ...)
  2020-11-11  6:43 ` admin at levyhsu dot com
@ 2020-11-11 19:35 ` wilson at gcc dot gnu.org
  2020-11-12  1:26 ` admin at levyhsu dot com
                   ` (28 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: wilson at gcc dot gnu.org @ 2020-11-11 19:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #33 from Jim Wilson <wilson at gcc dot gnu.org> ---
I did say that I'm willing to fix code style issues.  All major software
projects I have worked with have coding style conventions.  It makes it easier
to maintain a large software base when everything is using the same coding
style.  We do have info to help you follow the GNU coding style.  See
https://gcc.gnu.org/wiki/FormattingCodeForGCC
which has emacs and vim settings to get the right code style.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (32 preceding siblings ...)
  2020-11-11 19:35 ` wilson at gcc dot gnu.org
@ 2020-11-12  1:26 ` admin at levyhsu dot com
  2020-11-13  0:00 ` wilson at gcc dot gnu.org
                   ` (27 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: admin at levyhsu dot com @ 2020-11-12  1:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #34 from Levy <admin at levyhsu dot com> ---
(In reply to Jim Wilson from comment #33)
> I did say that I'm willing to fix code style issues.  All major software
> projects I have worked with have coding style conventions.  It makes it
> easier to maintain a large software base when everything is using the same
> coding style.  We do have info to help you follow the GNU coding style.  See
> https://gcc.gnu.org/wiki/FormattingCodeForGCC
> which has emacs and vim settings to get the right code style.

No problem and thank you, Jim, I'll try to catch up the coding style.
what about the combine issue and shorten-memrefs? 
Shall we fix it here or someplace else?

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (33 preceding siblings ...)
  2020-11-12  1:26 ` admin at levyhsu dot com
@ 2020-11-13  0:00 ` wilson at gcc dot gnu.org
  2020-11-16  1:17 ` admin at levyhsu dot com
                   ` (26 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: wilson at gcc dot gnu.org @ 2020-11-13  0:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #35 from Jim Wilson <wilson at gcc dot gnu.org> ---
By combine issue, are you referring to the regression I mentioned in comment 3
and filed as bug 97747?  We can handle that as a separate issue.  It should be
uncommon.  I expect to get much more benefit from this patch than the downside
due to that combine issue.

As for the shorten-memrefs problem, I didn't notice that one in my testing.  It
does need to be fixed.  Taking a look now, it looks pretty simple to fix.  The
code currently looks for MEM, it needs to handle (SIGN_EXTEND (MEM)) and
((ZERO_EXTEND (MEM)).  See the get_si_mem_base_reg function.  You need to skip
over the sign_extend or zero_extend when looking fot the mem at the first call.
 Then at the second call you need to be careful to put the sign_extend or
zero_extend back when creating the new RTL.  Maybe just another arg to
get_si_mem_base so it can record the parent rtx code of the mem.  Or maybe do
this outside get_si_mem_base to skip over a sign/zero extend at the first call,
and then do the same at the second call but record what rtx we skipped over so
that we can put it back.  We can either handle this here or as another patch. 
But since you have some time while waiting for paperwork maybe you can try
writing a fix.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (34 preceding siblings ...)
  2020-11-13  0:00 ` wilson at gcc dot gnu.org
@ 2020-11-16  1:17 ` admin at levyhsu dot com
  2020-11-16  3:24 ` kito at gcc dot gnu.org
                   ` (25 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: admin at levyhsu dot com @ 2020-11-16  1:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #36 from Levy <admin at levyhsu dot com> ---
It seems get_si_mem_base_reg() were called repeatly FOR_BB_INSNS from both
pass_shorten_memrefs::analyze and pass_shorten_memrefs::transform

Correct me if I'm wrong:
It seems we need some data structure (a linked list should work) to store the
zero/sign extend when we strip it off like:

if (GET_CODE (mem) == ZERO_EXTEND
      || GET_CODE (mem) == SIGN_EXTEND)
    mem = XEXP (mem, 0);
in each insns.

Then in pass_shorten_memrefs::transform(), each time get_si_mem_base_reg() is
called, we check whether if we need to put it back.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (35 preceding siblings ...)
  2020-11-16  1:17 ` admin at levyhsu dot com
@ 2020-11-16  3:24 ` kito at gcc dot gnu.org
  2020-11-17 10:19 ` admin at levyhsu dot com
                   ` (24 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: kito at gcc dot gnu.org @ 2020-11-16  3:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #37 from Kito Cheng <kito at gcc dot gnu.org> ---
Maybe we could add a parameter to indicate the type of memory access,
plain_mem, zext_mem or sext_mem for pass_shorten_memrefs::get_si_mem_base_reg.

e.g.
      for (int i = 0; i < 2; i++)
        {   
          rtx mem = XEXP (pat, i); 
          rtx addr;
          mem_access_type_t type;
          if (get_si_mem_base_reg (mem, &addr, &type))
            {   
              HOST_WIDE_INT regno = REGNO (XEXP (addr, 0));
              /* Do not transform store zero as these cannot be compressed.  */
              if (i == 0)
                {   
                  if (XEXP (pat, 1) == CONST0_RTX (GET_MODE (XEXP (pat, 1))))
                    continue;
                }   
              if (m->get_or_insert (regno) > 3)
                {
                  addr
                    = targetm.legitimize_address (addr, addr, GET_MODE (mem));
                  switch (type)
                    {
                    case plain_mem:
                      XEXP (pat, i) = replace_equiv_address (mem, addr);
                      break;
                    case zext_mem:
                      ...
                      break;
                    case sext_mem:
                      ...
                      break;
                    }
                  df_insn_rescan (insn);
                }   
            }

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (36 preceding siblings ...)
  2020-11-16  3:24 ` kito at gcc dot gnu.org
@ 2020-11-17 10:19 ` admin at levyhsu dot com
  2020-11-18  6:09 ` admin at levyhsu dot com
                   ` (23 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: admin at levyhsu dot com @ 2020-11-17 10:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #38 from Levy <admin at levyhsu dot com> ---
Created attachment 49575
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49575&action=edit
riscv-shorten-memrefs_V1.patch

Did little bit change in get_si_mem_base_reg() and transform()
Now for the same c input array_test.c

int load1r (int *array)
{
  int a = 0;
  a += array[200];
  a += array[201];
  a += array[202];
  a += array[203];
  return a;
}

int main ()
{
    int arr[300]= {0};
    arr[200] = 15;
    arr[201] = -33;
    arr[202] = 7;
    arr[203] = -999;
    int b = load1r(arr);
    printf("Result: %d\n",b);
    return 0;
}

in loadlr, when put a debug_rtx(pat) after:

(unpatched)XEXP (pat, i) = replace_equiv_address (mem, addr); 
or 
(patched)XEXP (XEXP (pat, I), 0) = replace_equiv_address(XEXP (mem, 0), addr);



unpatched gcc will produce follwing insns:
---------------------------------------------------------
(set (reg:SI 81 [ MEM[(int *)array_5(D) + 800B] ])
    (mem:SI (plus:DI (reg:DI 88)
            (const_int 32 [0x20])) [1 MEM[(int *)array_5(D) + 800B]+0 S4 A32]))
(set (reg:SI 82 [ MEM[(int *)array_5(D) + 804B] ])
    (mem:SI (plus:DI (reg:DI 89)
            (const_int 36 [0x24])) [1 MEM[(int *)array_5(D) + 804B]+0 S4 A32]))
(set (reg:SI 84 [ MEM[(int *)array_5(D) + 808B] ])
    (mem:SI (plus:DI (reg:DI 90)
            (const_int 40 [0x28])) [1 MEM[(int *)array_5(D) + 808B]+0 S4 A32]))
(set (reg:SI 86 [ MEM[(int *)array_5(D) + 812B] ])
    (mem:SI (plus:DI (reg:DI 91)
            (const_int 44 [0x2c])) [1 MEM[(int *)array_5(D) + 812B]+0 S4 A32]))
---------------------------------------------------------


patched gcc will produce follwing insns:
---------------------------------------------------------
(set (reg:DI 82 [ MEM[(int *)array_5(D) + 800B] ])
    (sign_extend:DI (mem:SI (plus:DI (reg:DI 92)
                (const_int 32 [0x20])) [1 MEM[(int *)array_5(D) + 800B]+0 S4
A32])))
(set (reg:DI 84 [ MEM[(int *)array_5(D) + 804B] ])
    (sign_extend:DI (mem:SI (plus:DI (reg:DI 93)
                (const_int 36 [0x24])) [1 MEM[(int *)array_5(D) + 804B]+0 S4
A32])))
(set (reg:DI 87 [ MEM[(int *)array_5(D) + 808B] ])
    (sign_extend:DI (mem:SI (plus:DI (reg:DI 94)
                (const_int 40 [0x28])) [1 MEM[(int *)array_5(D) + 808B]+0 S4
A32])))
(set (reg:DI 90 [ MEM[(int *)array_5(D) + 812B] ])
    (sign_extend:DI (mem:SI (plus:DI (reg:DI 95)
                (const_int 44 [0x2c])) [1 MEM[(int *)array_5(D) + 812B]+0 S4
A32])))
---------------------------------------------------------
which the patched one looks ok for me, but the final assembly code still shows
no change (both under -Os)

Not quite sure where the problem is, I'll have a look near
array_test.c.250r.shorten_memrefs tomorrow.

Please ignore the coding style as it's just a debug patch

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (37 preceding siblings ...)
  2020-11-17 10:19 ` admin at levyhsu dot com
@ 2020-11-18  6:09 ` admin at levyhsu dot com
  2020-11-18 18:31 ` [Bug target/97417] " wilson at gcc dot gnu.org
                   ` (22 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: admin at levyhsu dot com @ 2020-11-18  6:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #39 from Levy <admin at levyhsu dot com> ---
Checked all pass from 250r.shorten_memrefs to 270r.ce2

In 269r.combine I saw the following combination merged the replaced address:
-------------------------------------------------------
modifying insn i3    27: r92:DI=r96:DI+0x300
      REG_DEAD r96:DI
deferring rescan insn with uid = 27.
(!)allowing combination of insns 27 and 6
original costs 4 + 16 = 20
replacement costs 4 + 16 = 20
modifying insn i2    27: r92:DI=r96:DI+0x300
deferring rescan insn with uid = 27.
modifying insn i3     6: r82:DI=sign_extend([r96:DI+0x320])
      REG_DEAD r96:DI
deferring rescan insn with uid = 6.
(!)allowing combination of insns 27 and 8
original costs 4 + 16 = 20
replacement costs 4 + 16 = 20
modifying insn i2    27: r92:DI=r96:DI+0x300
deferring rescan insn with uid = 27.
modifying insn i3     8: r84:DI=sign_extend([r96:DI+0x324])
      REG_DEAD r96:DI
deferring rescan insn with uid = 8.
(!)allowing combination of insns 27 and 12
original costs 4 + 16 = 20
replacement costs 4 + 16 = 20
modifying insn i2    27: r92:DI=r96:DI+0x300
deferring rescan insn with uid = 27.
modifying insn i3    12: r87:DI=sign_extend([r96:DI+0x328])
      REG_DEAD r96:DI
deferring rescan insn with uid = 12.
(!)allowing combination of insns 27 and 16
original costs 4 + 16 = 20
replacement cost 16
deferring deletion of insn with uid = 27.
modifying insn i3    16: r90:DI=sign_extend([r96:DI+0x32c])
      REG_DEAD r96:DI
deferring rescan insn with uid = 16.
allowing combination of insns 18 and 19
-------------------------------------------------------
Where in 268r.ud_dce both insns 27 are same (except for memory address):

(insn 27 26 28 2 (set (reg:DI 10 a0)
        (lo_sum:DI (reg/f:DI 85)
            (symbol_ref/f:DI ("*.LC0") [flags 0x82]  <var_decl 0x7f3546fe91b0
*.LC0>))) "array_test.c":21:5 133 {*lowdi}
     (expr_list:REG_DEAD (reg/f:DI 85)
        (expr_list:REG_EQUAL (symbol_ref/f:DI ("*.LC0") [flags 0x82]  <var_decl
0x7f3546fe91b0 *.LC0>)
            (nil))))
-------------------------------------------------------
In 270r.combine (patched):

(note 27 3 6 2 NOTE_INSN_DELETED)

and following insns 768 + 32/36/40/44 were put back as:

(insn 6 27 8 2 (set (reg:DI 82 [ MEM[(int *)array_5(D) + 800B] ])
        (sign_extend:DI (mem:SI (plus:DI (reg:DI 96)
                    (const_int 800 [0x320])) [1 MEM[(int *)array_5(D) + 800B]+0
S4 A32]))) "array_test.c":7:5 90 {extendsidi2}


(insn 8 6 10 2 (set (reg:DI 84 [ MEM[(int *)array_5(D) + 804B] ])
        (sign_extend:DI (mem:SI (plus:DI (reg:DI 96)
                    (const_int 804 [0x324])) [1 MEM[(int *)array_5(D) + 804B]+0
S4 A32]))) "array_test.c":7:5 90 {extendsidi2}


(insn 12 10 14 2 (set (reg:DI 87 [ MEM[(int *)array_5(D) + 808B] ])
        (sign_extend:DI (mem:SI (plus:DI (reg:DI 96)
                    (const_int 808 [0x328])) [1 MEM[(int *)array_5(D) + 808B]+0
S4 A32]))) "array_test.c":8:5 90 {extendsidi2}

(insn 16 14 18 2 (set (reg:DI 90 [ MEM[(int *)array_5(D) + 812B] ])
        (sign_extend:DI (mem:SI (plus:DI (reg:DI 96)
                    (const_int 812 [0x32c])) [1 MEM[(int *)array_5(D) + 812B]+0
S4 A32]))) "array_test.c":9:5 90 {extendsidi2}
     (expr_list:REG_DEAD (reg:DI 96)
        (nil)))

Maybe combine.c needs some modification?

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug target/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (38 preceding siblings ...)
  2020-11-18  6:09 ` admin at levyhsu dot com
@ 2020-11-18 18:31 ` wilson at gcc dot gnu.org
  2020-11-20  2:41 ` admin at levyhsu dot com
                   ` (21 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: wilson at gcc dot gnu.org @ 2020-11-18 18:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #40 from Jim Wilson <wilson at gcc dot gnu.org> ---
If you look at riscv.opt you will see that the -mshorten-memrefs option sets
the variable riscv_mshorten_memrefs.  If you grep for that, you will see that
it is used in riscv_address_cost in riscv.c.  I believe it is this change to
the address cost that is supposed to prevent the recombination back into
addresses that don't fit in compressed instructions.  So you need to look at
why this works in the current code, but not with your zero/sign extend load
patch.  Maybe there is something about the rtx costs for a regular load versus
a zero/sign extend load that causes the problem.  In the combine dump where it
says "original costs" and "replacement costs" that is where it is using
rtx_cost and riscv_address_cost.  The replacement cost should be more than the
original cost to prevent the recombination.  You should see that if you look at
the combine dump for the unpatched compiler.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug target/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (39 preceding siblings ...)
  2020-11-18 18:31 ` [Bug target/97417] " wilson at gcc dot gnu.org
@ 2020-11-20  2:41 ` admin at levyhsu dot com
  2020-11-20  3:32 ` wilson at gcc dot gnu.org
                   ` (20 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: admin at levyhsu dot com @ 2020-11-20  2:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #41 from Levy <admin at levyhsu dot com> ---
When putting the same debug_rtx(addr) at the first line of riscv_address_cost()

Unpatched one outputs:
(plus:DI (reg/f:DI 15 a5 [88])
    (const_int 32 [0x20]))
(plus:DI (reg:DI 10 a0 [92])
    (const_int 800 [0x320]))
(plus:DI (reg/f:DI 15 a5 [88])
    (const_int 36 [0x24]))
(plus:DI (reg:DI 10 a0 [92])
    (const_int 804 [0x324]))
.......

Patched one outputs nothing. what it means to me is that riscv backend,
something like:
(sign_extend:DI (mem:SI (plus:DI (reg:DI 93) 

is never passed to riscv_address_cost(), unlike:
(mem:SI (plus:DI (reg:DI 89)

so whether riscv_mshorten_memrefs is set or not doesn't really matter here.
Then I traced back to where address_cost() is called, 

1.address_cost() is called by riscv_new_address_profitable_p() in riscv.c
2.riscv_new_address_profitable_p() is called by attempt_change() in
sched-deps.c
And since I'm not that familiar with struct mem_inc_info, this of trace could
be wrong:
3.attempt_change() is called by find_inc() in sched-deps.c
....(Still tracing)

------------------------------------------------------
Also since Arm can handle sign/zero/extend with subreg under -Os, I had a quick
search on arm.c

in arm_address_cost(), rtx here were passed as x, not addr (which addr may be
XEXP (mem, 0)). 

So GET_CODE (x) cam be used to determine whether it has a
MEM/LABEL_REF/SYMBOL_REF... at front. Then cost can be vary from 0 all the way
to 10.

This also worth some investigation to see how -Os works on arm side.
------------------------------------------------------

Need some time to work on it.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug target/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (40 preceding siblings ...)
  2020-11-20  2:41 ` admin at levyhsu dot com
@ 2020-11-20  3:32 ` wilson at gcc dot gnu.org
  2020-11-23  6:17 ` admin at levyhsu dot com
                   ` (19 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: wilson at gcc dot gnu.org @ 2020-11-20  3:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #42 from Jim Wilson <wilson at gcc dot gnu.org> ---
riscv_address_cost is a hook, so it is targetm.address_cost which is only
called from address_cost which is only called in a few places one of which is
in postreload.c so that is the one I would look at first.  This is in
try_replace_in_use which is called from reload_combine_recognize_const_pattern
which is trying to put offsets back into mems which is exactly what we don't
want here.  This suggests that containing_mem isn't getting set when we have a
sign/zero extend.  It should get set in reload_combine_note_use.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug target/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (41 preceding siblings ...)
  2020-11-20  3:32 ` wilson at gcc dot gnu.org
@ 2020-11-23  6:17 ` admin at levyhsu dot com
  2020-11-23  6:38 ` admin at levyhsu dot com
                   ` (18 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: admin at levyhsu dot com @ 2020-11-23  6:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #43 from Levy <admin at levyhsu dot com> ---
Thanks for pointing the hook out Jim.

for both patched and unpatched, so far I've been traced through

try_replace_in_use()
to
reload_combine_recognize_const_pattern()
to
reload_combine()
to
reload_cse_regs()
to
pass_postreload_cse::execute()

in postreload.c

-------------------------------------------------------------------
For reload_combine(), by printing each insn at front of the loop: (line 1302)

for (insn = get_last_insn (); insn; insn = prev)
{
   debug_rtx(insn)
   ...
}
-------------------------------------------------------------------
Unpatched gcc shows:

(insn 13 11 14 2 (set (reg:DI 10 a0)
        (sign_extend:DI (mem:SI (plus:DI (reg/f:DI 15 a5 [88])
                    (const_int 44 [0x2c])) [1 MEM[(int *)array_5(D) + 812B]+0
S4 A32]))) "array_test.c":9:5 90 {extendsidi2}
     (nil))
(insn 11 10 13 2 (set (reg:SI 14 a4 [83])
        (plus:SI (reg:SI 14 a4 [orig:84 MEM[(int *)array_5(D) + 808B] ] [84])
            (reg:SI 10 a0 [80]))) "array_test.c":8:5 3 {addsi3}
     (nil))
(insn 10 8 11 2 (set (reg:DI 14 a4)
        (sign_extend:DI (mem:SI (plus:DI (reg/f:DI 15 a5 [88])
                    (const_int 40 [0x28])) [1 MEM[(int *)array_5(D) + 808B]+0
S4 A32]))) "array_test.c":8:5 90 {extendsidi2}
     (expr_list:REG_EQUIV (mem:SI (plus:DI (reg/f:DI 15 a5 [88])
                (const_int 40 [0x28])) [1 MEM[(int *)array_5(D) + 808B]+0 S4
A32])
        (nil)))
(insn 8 7 10 2 (set (reg:SI 10 a0 [80])
        (plus:SI (reg:SI 10 a0 [orig:81 MEM[(int *)array_5(D) + 800B] ] [81])
            (reg:SI 14 a4 [orig:82 MEM[(int *)array_5(D) + 804B] ] [82])))
"array_test.c":7:5 3 {addsi3}
     (nil))
(insn 7 6 8 2 (set (reg:DI 14 a4)
        (sign_extend:DI (mem:SI (plus:DI (reg/f:DI 15 a5 [88])
                    (const_int 36 [0x24])) [1 MEM[(int *)array_5(D) + 804B]+0
S4 A32]))) "array_test.c":7:5 90 {extendsidi2}
     (expr_list:REG_EQUIV (mem:SI (plus:DI (reg/f:DI 15 a5 [88])
                (const_int 36 [0x24])) [1 MEM[(int *)array_5(D) + 804B]+0 S4
A32])
        (nil)))
(insn 6 23 7 2 (set (reg:DI 10 a0)
        (sign_extend:DI (mem:SI (plus:DI (reg/f:DI 15 a5 [88])
                    (const_int 32 [0x20])) [1 MEM[(int *)array_5(D) + 800B]+0
S4 A32]))) "array_test.c":7:5 90 {extendsidi2}
     (expr_list:REG_EQUIV (mem:SI (plus:DI (reg/f:DI 15 a5 [88])
                (const_int 32 [0x20])) [1 MEM[(int *)array_5(D) + 800B]+0 S4
A32])
        (nil)))
-------------------------------------------------------------------
Patched one shows already merged results:

(insn 16 14 18 2 (set (reg:DI 10 a0 [orig:90 MEM[(int *)array_5(D) + 812B] ]
[90])
        (sign_extend:DI (mem:SI (plus:DI (reg:DI 10 a0 [96])
                    (const_int 812 [0x32c])) [1 MEM[(int *)array_5(D) + 812B]+0
S4 A32]))) "array_test.c":9:5 90 {extendsidi2}
     (nil))
(insn 14 12 16 2 (set (reg:SI 15 a5 [85])
        (plus:SI (reg:SI 15 a5 [80])
            (reg:SI 14 a4 [orig:87 MEM[(int *)array_5(D) + 808B] ] [87])))
"array_test.c":8:5 3 {addsi3}
     (nil))
(insn 12 10 14 2 (set (reg:DI 14 a4 [orig:87 MEM[(int *)array_5(D) + 808B] ]
[87])
        (sign_extend:DI (mem:SI (plus:DI (reg:DI 10 a0 [96])
                    (const_int 808 [0x328])) [1 MEM[(int *)array_5(D) + 808B]+0
S4 A32]))) "array_test.c":8:5 90 {extendsidi2}
     (nil))
(insn 10 8 12 2 (set (reg:SI 15 a5 [80])
        (plus:SI (reg:SI 15 a5 [orig:84 MEM[(int *)array_5(D) + 804B] ] [84])
            (reg:SI 14 a4 [orig:82 MEM[(int *)array_5(D) + 800B] ] [82])))
"array_test.c":7:5 3 {addsi3}
     (nil))
(insn 8 6 10 2 (set (reg:DI 15 a5 [orig:84 MEM[(int *)array_5(D) + 804B] ]
[84])
        (sign_extend:DI (mem:SI (plus:DI (reg:DI 10 a0 [96])
                    (const_int 804 [0x324])) [1 MEM[(int *)array_5(D) + 804B]+0
S4 A32]))) "array_test.c":7:5 90 {extendsidi2}
     (nil))
(insn 6 27 8 2 (set (reg:DI 14 a4 [orig:82 MEM[(int *)array_5(D) + 800B] ]
[82])
        (sign_extend:DI (mem:SI (plus:DI (reg:DI 10 a0 [96])
                    (const_int 800 [0x320])) [1 MEM[(int *)array_5(D) + 800B]+0
S4 A32]))) "array_test.c":7:5 90 {extendsidi2}
     (nil))
-------------------------------------------------------------------
Before reload_combine () is reload_cse_regs_1() in reload_cse_regs() which
"detects no-op moves where we happened to assign two different pseudo-registers
to the same hard register"

and pass_postreload_cse::execute calls reload_cse_regs()

pass_postreload_cse::execute() look like the entry point for postreload.c

In order to confirm it's not in postreload.c, I put:
----------------------------------------------------------
  rtx_insn *insn, *next;
  for (insn = get_insns (); insn; insn = next)
  {
          debug_rtx(insn);
          next = NEXT_INSN (insn);
  }
----------------------------------------------------------
in pass_postreload_cse::execute (function *fun)

right after:

if (!dbg_cnt (postreload_cse))
    return 0;
----------------------------------------------------------
Unpathed:

(note 1 0 4 NOTE_INSN_DELETED)
(note 4 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(note 2 4 3 2 NOTE_INSN_DELETED)
(note 3 2 23 2 NOTE_INSN_FUNCTION_BEG)
(insn 23 3 6 2 (set (reg/f:DI 15 a5 [88])
        (plus:DI (reg:DI 10 a0 [92])
            (const_int 768 [0x300]))) "array_test.c":7:5 4 {adddi3}
     (nil))
(insn 6 23 7 2 (set (reg:SI 10 a0 [orig:81 MEM[(int *)array_5(D) + 800B] ]
[81])
        (mem:SI (plus:DI (reg/f:DI 15 a5 [88])
                (const_int 32 [0x20])) [1 MEM[(int *)array_5(D) + 800B]+0 S4
A32])) "array_test.c":7:5 136 {*movsi_internal}
     (expr_list:REG_EQUIV (mem:SI (plus:DI (reg/f:DI 15 a5 [88])
                (const_int 32 [0x20])) [1 MEM[(int *)array_5(D) + 800B]+0 S4
A32])
        (nil)))
(insn 7 6 8 2 (set (reg:SI 14 a4 [orig:82 MEM[(int *)array_5(D) + 804B] ] [82])
        (mem:SI (plus:DI (reg/f:DI 15 a5 [88])
                (const_int 36 [0x24])) [1 MEM[(int *)array_5(D) + 804B]+0 S4
A32])) "array_test.c":7:5 136 {*movsi_internal}
     (expr_list:REG_EQUIV (mem:SI (plus:DI (reg/f:DI 15 a5 [88])
                (const_int 36 [0x24])) [1 MEM[(int *)array_5(D) + 804B]+0 S4
A32])
        (nil)))
(insn 8 7 10 2 (set (reg:SI 10 a0 [80])
        (plus:SI (reg:SI 10 a0 [orig:81 MEM[(int *)array_5(D) + 800B] ] [81])
            (reg:SI 14 a4 [orig:82 MEM[(int *)array_5(D) + 804B] ] [82])))
"array_test.c":7:5 3 {addsi3}
     (nil))
(insn 10 8 11 2 (set (reg:SI 14 a4 [orig:84 MEM[(int *)array_5(D) + 808B] ]
[84])
        (mem:SI (plus:DI (reg/f:DI 15 a5 [88])
                (const_int 40 [0x28])) [1 MEM[(int *)array_5(D) + 808B]+0 S4
A32])) "array_test.c":8:5 136 {*movsi_internal}
     (expr_list:REG_EQUIV (mem:SI (plus:DI (reg/f:DI 15 a5 [88])
                (const_int 40 [0x28])) [1 MEM[(int *)array_5(D) + 808B]+0 S4
A32])
        (nil)))
(insn 11 10 13 2 (set (reg:SI 14 a4 [83])
        (plus:SI (reg:SI 14 a4 [orig:84 MEM[(int *)array_5(D) + 808B] ] [84])
            (reg:SI 10 a0 [80]))) "array_test.c":8:5 3 {addsi3}
     (nil))
(insn 13 11 14 2 (set (reg:SI 10 a0 [orig:86 MEM[(int *)array_5(D) + 812B] ]
[86])
        (mem:SI (plus:DI (reg/f:DI 15 a5 [88])
                (const_int 44 [0x2c])) [1 MEM[(int *)array_5(D) + 812B]+0 S4
A32])) "array_test.c":9:5 136 {*movsi_internal}
     (nil))
......
----------------------------------------------------------
Patched:

(note 1 0 4 NOTE_INSN_DELETED)
(note 4 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(note 2 4 3 2 NOTE_INSN_DELETED)
(note 3 2 27 2 NOTE_INSN_FUNCTION_BEG)
(note 27 3 6 2 NOTE_INSN_DELETED)
(insn 6 27 8 2 (set (reg:DI 14 a4 [orig:82 MEM[(int *)array_5(D) + 800B] ]
[82])
        (sign_extend:DI (mem:SI (plus:DI (reg:DI 10 a0 [96])
                    (const_int 800 [0x320])) [1 MEM[(int *)array_5(D) + 800B]+0
S4 A32]))) "array_test.c":7:5 90 {extendsidi2}
     (nil))
(insn 8 6 10 2 (set (reg:DI 15 a5 [orig:84 MEM[(int *)array_5(D) + 804B] ]
[84])
        (sign_extend:DI (mem:SI (plus:DI (reg:DI 10 a0 [96])
                    (const_int 804 [0x324])) [1 MEM[(int *)array_5(D) + 804B]+0
S4 A32]))) "array_test.c":7:5 90 {extendsidi2}
     (nil))
(insn 10 8 12 2 (set (reg:SI 15 a5 [80])
        (plus:SI (reg:SI 15 a5 [orig:84 MEM[(int *)array_5(D) + 804B] ] [84])
            (reg:SI 14 a4 [orig:82 MEM[(int *)array_5(D) + 800B] ] [82])))
"array_test.c":7:5 3 {addsi3}
     (nil))
(insn 12 10 14 2 (set (reg:DI 14 a4 [orig:87 MEM[(int *)array_5(D) + 808B] ]
[87])
        (sign_extend:DI (mem:SI (plus:DI (reg:DI 10 a0 [96])
                    (const_int 808 [0x328])) [1 MEM[(int *)array_5(D) + 808B]+0
S4 A32]))) "array_test.c":8:5 90 {extendsidi2}
     (nil))
(insn 14 12 16 2 (set (reg:SI 15 a5 [85])
        (plus:SI (reg:SI 15 a5 [80])
            (reg:SI 14 a4 [orig:87 MEM[(int *)array_5(D) + 808B] ] [87])))
"array_test.c":8:5 3 {addsi3}
     (nil))
(insn 16 14 18 2 (set (reg:DI 10 a0 [orig:90 MEM[(int *)array_5(D) + 812B] ]
[90])
        (sign_extend:DI (mem:SI (plus:DI (reg:DI 10 a0 [96])
                    (const_int 812 [0x32c])) [1 MEM[(int *)array_5(D) + 812B]+0
S4 A32]))) "array_test.c":9:5 90 {extendsidi2}
     (nil))
......
----------------------------------------------------------
In other words this was before reload_cse_regs(), before reload_combine(),
before reload_combine_recognize_const_pattern()....

------------------------------------------------------------------
In postreload.c at around line 939, comments says:

/* We look for a REG1 = REG2 + CONSTANT insn, followed by either
     uses of REG1 inside an address, or inside another add insn.  If
     possible and profitable, merge the addition into subsequent
     uses.  */

Which really looks like the function we're looking for, but I've debug each
insn from button to top, maybe I've missed something in between?

I'll investigate other places where address_cost() are called.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug target/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (42 preceding siblings ...)
  2020-11-23  6:17 ` admin at levyhsu dot com
@ 2020-11-23  6:38 ` admin at levyhsu dot com
  2020-11-23  7:43 ` admin at levyhsu dot com
                   ` (17 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: admin at levyhsu dot com @ 2020-11-23  6:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #44 from Levy <admin at levyhsu dot com> ---
should_replace_address() in fwprop.c looks really interesting:

/* OLD is a memory address.  Return whether it is good to use NEW instead,
   for a memory access in the given MODE.  */

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug target/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (43 preceding siblings ...)
  2020-11-23  6:38 ` admin at levyhsu dot com
@ 2020-11-23  7:43 ` admin at levyhsu dot com
  2020-12-01  3:03 ` admin at levyhsu dot com
                   ` (16 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: admin at levyhsu dot com @ 2020-11-23  7:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #45 from Levy <admin at levyhsu dot com> ---
Basically crossed out the rtlanal.c and fwprop.c
I'm looking back at riscv.c. Since address_cost() was called by hook function
new_address_profitable_p(), may be some place uses this function would provide
more info

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug target/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (44 preceding siblings ...)
  2020-11-23  7:43 ` admin at levyhsu dot com
@ 2020-12-01  3:03 ` admin at levyhsu dot com
  2020-12-08  9:22 ` admin at levyhsu dot com
                   ` (15 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: admin at levyhsu dot com @ 2020-12-01  3:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #46 from Levy <admin at levyhsu dot com> ---
Looking at gcc/passed.def and gcc/config/riscv-passes.def:

pass_shorten_memrefs is inserted after NEXT_PASS (pass_rtl_store_motion);

  NEXT_PASS (pass_rtl_store_motion);
(pass_shorten_memrefs)
  NEXT_PASS (pass_cse_after_global_opts);
  NEXT_PASS (pass_rtl_ifcvt);
  NEXT_PASS (pass_reginfo_init);
  /* Perform loop optimizations.  It might be better to do them a bit
  sooner, but we want the profile feedback to work more
  efficiently.  */
  NEXT_PASS (pass_loop2);
  PUSH_INSERT_PASSES_WITHIN (pass_loop2)
NEXT_PASS (pass_rtl_loop_init);
NEXT_PASS (pass_rtl_move_loop_invariants);
NEXT_PASS (pass_rtl_unroll_loops);
NEXT_PASS (pass_rtl_doloop);
NEXT_PASS (pass_rtl_loop_done);
  POP_INSERT_PASSES ()
  NEXT_PASS (pass_lower_subreg2);
  NEXT_PASS (pass_web);
  NEXT_PASS (pass_rtl_cprop);
  NEXT_PASS (pass_cse2);
  NEXT_PASS (pass_rtl_dse1);
  NEXT_PASS (pass_rtl_fwprop_addr);
  NEXT_PASS (pass_inc_dec);
  NEXT_PASS (pass_initialize_regs);
  NEXT_PASS (pass_ud_rtl_dce);
  NEXT_PASS (pass_combine);
  NEXT_PASS (pass_if_after_combine);
  NEXT_PASS (pass_jump_after_combine);
  NEXT_PASS (pass_partition_blocks);
  NEXT_PASS (pass_outof_cfg_layout_mode);
  NEXT_PASS (pass_split_all_insns);
  NEXT_PASS (pass_lower_subreg3);
  NEXT_PASS (pass_df_initialize_no_opt);
  NEXT_PASS (pass_stack_ptr_mod);
  NEXT_PASS (pass_mode_switching);
  NEXT_PASS (pass_match_asm_constraints);
  NEXT_PASS (pass_sms);
  NEXT_PASS (pass_live_range_shrinkage);
  NEXT_PASS (pass_sched);
  NEXT_PASS (pass_early_remat);
  NEXT_PASS (pass_ira);
  NEXT_PASS (pass_reload);
  NEXT_PASS (pass_postreload);
  PUSH_INSERT_PASSES_WITHIN (pass_postreload)
NEXT_PASS (pass_postreload_cse);
NEXT_PASS (pass_gcse2);
NEXT_PASS (pass_split_after_reload);
......

After some debugging processes. it seems either:
1.The address cost info was calculated between (pass_combine) and
(pass_shorten_memrefs) for patched version, then merged in the combined pass. 
patched one is failed to be recognized as unpathed one due to Sign/Zero extend
then Subreg.
This can be verified by adding -fdisable-rtl-combine option when compile, also
the address_cost was not called for the whole time.

2.4 insn was determined(or say fixed?) before (pass_rtl_fwprop_addr), as for
patched version, I saw forward_propagate_and_simplify() was called for 4 extra
times, then pass all the way to
propagate_rtx()->propagate_rtx_1()->should_replace_address()->address_cost() in
fwprop.c

I've also tested the (pass_postreload) as mentioned by Jim and
new_address_profitable_p(). But they seem not to be the right one.

Need some time to examine and trace the pass between (pass_shorten_memrefs) and
(pass_rtl_fwprop_addr).

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug target/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (45 preceding siblings ...)
  2020-12-01  3:03 ` admin at levyhsu dot com
@ 2020-12-08  9:22 ` admin at levyhsu dot com
  2020-12-14 10:43 ` admin at levyhsu dot com
                   ` (14 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: admin at levyhsu dot com @ 2020-12-08  9:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #47 from Levy <admin at levyhsu dot com> ---
where insns are merged:
In combine.c (pass_combine)

rest_of_handle_combine()
calls:
combine_instructions()
calls:
creat_log_links() creates links of insn (768&32/36/40/44) 
for both patched and unpatched version with log_links()

Then in combine_instructions(), for_each_log_link(), try_combine() calls
combine_validate_cost()

in combine_validate_cost(), for the patched version:

OLD===================846930886===================OLD
i2 & Cost 4:
(insn 27 3 6 2 (set (reg/f:DI 92)
        (plus:DI (reg:DI 96)
            (const_int 768 [0x300]))) "array_test.c":7:5 4 {adddi3}
     (expr_list:REG_DEAD (reg:DI 96)
        (nil)))
i3 & Cost 16:
(insn 6 27 8 2 (set (reg:DI 82 [ MEM[(int *)array_5(D) + 800B] ])
        (sign_extend:DI (mem:SI (plus:DI (reg:DI 96)
                    (const_int 800 [0x320])) [1 MEM[(int *)array_5(D) + 800B]+0
S4 A32]))) "array_test.c":7:5 90 {extendsidi2}
     (nil))
Old Cost 20:



NEW===================846930886===================NEW
New_Cost: 20
i0 & Cost 0:
(nil)
i1 & Cost 0:
(nil)
i2 & Cost 4:
(insn 27 3 6 2 (set (reg/f:DI 92)
        (plus:DI (reg:DI 96)
            (const_int 768 [0x300]))) "array_test.c":7:5 4 {adddi3}
     (expr_list:REG_DEAD (reg:DI 96)
        (nil)))
i3 & Cost 16:
(insn 6 27 8 2 (set (reg:DI 82 [ MEM[(int *)array_5(D) + 800B] ])
        (sign_extend:DI (mem:SI (plus:DI (reg:DI 96)
                    (const_int 800 [0x320])) [1 MEM[(int *)array_5(D) + 800B]+0
S4 A32]))) "array_test.c":7:5 90 {extendsidi2}
     (nil))
newpat:
(set (reg:DI 82 [ MEM[(int *)array_5(D) + 800B] ])
    (sign_extend:DI (mem:SI (plus:DI (reg:DI 96)
                (const_int 800 [0x320])) [1 MEM[(int *)array_5(D) + 800B]+0 S4
A32])))
newi2pat:
(set (reg/f:DI 92)
    (plus:DI (reg:DI 96)
        (const_int 768 [0x300])))
newotherpat:
(nil)
GO!---------------------------------------------------


OLD===================1681692777===================OLD
i2 & Cost 4:
(insn 27 3 6 2 (set (reg/f:DI 92)
        (plus:DI (reg:DI 96)
            (const_int 768 [0x300]))) "array_test.c":7:5 4 {adddi3}
     (nil))
i3 & Cost 16:
(insn 8 6 10 2 (set (reg:DI 84 [ MEM[(int *)array_5(D) + 804B] ])
        (sign_extend:DI (mem:SI (plus:DI (reg:DI 96)
                    (const_int 804 [0x324])) [1 MEM[(int *)array_5(D) + 804B]+0
S4 A32]))) "array_test.c":7:5 90 {extendsidi2}
     (nil))
Old Cost 20:



NEW===================1681692777===================NEW
New_Cost: 20
i0 & Cost 0:
(nil)
i1 & Cost 0:
(nil)
i2 & Cost 4:
(insn 27 3 6 2 (set (reg/f:DI 92)
        (plus:DI (reg:DI 96)
            (const_int 768 [0x300]))) "array_test.c":7:5 4 {adddi3}
     (nil))
i3 & Cost 16:
(insn 8 6 10 2 (set (reg:DI 84 [ MEM[(int *)array_5(D) + 804B] ])
        (sign_extend:DI (mem:SI (plus:DI (reg:DI 96)
                    (const_int 804 [0x324])) [1 MEM[(int *)array_5(D) + 804B]+0
S4 A32]))) "array_test.c":7:5 90 {extendsidi2}
     (nil))
newpat:
(set (reg:DI 84 [ MEM[(int *)array_5(D) + 804B] ])
    (sign_extend:DI (mem:SI (plus:DI (reg:DI 96)
                (const_int 804 [0x324])) [1 MEM[(int *)array_5(D) + 804B]+0 S4
A32])))
newi2pat:
(set (reg/f:DI 92)
    (plus:DI (reg:DI 96)
        (const_int 768 [0x300])))
newotherpat:
(nil)
GO!---------------------------------------------------


OLD===================1714636915===================OLD
i2 & Cost 4:
(insn 27 3 6 2 (set (reg/f:DI 92)
        (plus:DI (reg:DI 96)
            (const_int 768 [0x300]))) "array_test.c":7:5 4 {adddi3}
     (nil))
i3 & Cost 16:
(insn 12 10 14 2 (set (reg:DI 87 [ MEM[(int *)array_5(D) + 808B] ])
        (sign_extend:DI (mem:SI (plus:DI (reg:DI 96)
                    (const_int 808 [0x328])) [1 MEM[(int *)array_5(D) + 808B]+0
S4 A32]))) "array_test.c":8:5 90 {extendsidi2}
     (nil))
Old Cost 20:



NEW===================1714636915===================NEW
New_Cost: 20
i0 & Cost 0:
(nil)
i1 & Cost 0:
(nil)
i2 & Cost 4:
(insn 27 3 6 2 (set (reg/f:DI 92)
        (plus:DI (reg:DI 96)
            (const_int 768 [0x300]))) "array_test.c":7:5 4 {adddi3}
     (nil))
i3 & Cost 16:
(insn 12 10 14 2 (set (reg:DI 87 [ MEM[(int *)array_5(D) + 808B] ])
        (sign_extend:DI (mem:SI (plus:DI (reg:DI 96)
                    (const_int 808 [0x328])) [1 MEM[(int *)array_5(D) + 808B]+0
S4 A32]))) "array_test.c":8:5 90 {extendsidi2}
     (nil))
newpat:
(set (reg:DI 87 [ MEM[(int *)array_5(D) + 808B] ])
    (sign_extend:DI (mem:SI (plus:DI (reg:DI 96)
                (const_int 808 [0x328])) [1 MEM[(int *)array_5(D) + 808B]+0 S4
A32])))
newi2pat:
(set (reg/f:DI 92)
    (plus:DI (reg:DI 96)
        (const_int 768 [0x300])))
newotherpat:
(nil)
GO!---------------------------------------------------


OLD===================1957747793===================OLD
i2 & Cost 4:
(insn 27 3 6 2 (set (reg/f:DI 92)
        (plus:DI (reg:DI 96)
            (const_int 768 [0x300]))) "array_test.c":7:5 4 {adddi3}
     (nil))
i3 & Cost 16:
(insn 16 14 18 2 (set (reg:DI 90 [ MEM[(int *)array_5(D) + 812B] ])
        (sign_extend:DI (mem:SI (plus:DI (reg:DI 96)
                    (const_int 812 [0x32c])) [1 MEM[(int *)array_5(D) + 812B]+0
S4 A32]))) "array_test.c":9:5 90 {extendsidi2}
     (expr_list:REG_DEAD (reg/f:DI 92)
        (nil)))
Old Cost 20:



NEW===================1957747793===================NEW
New_Cost: 16
i0 & Cost 0:
(nil)
i1 & Cost 0:
(nil)
i2 & Cost 4:
(insn 27 3 6 2 (set (reg/f:DI 92)
        (plus:DI (reg:DI 96)
            (const_int 768 [0x300]))) "array_test.c":7:5 4 {adddi3}
     (nil))
i3 & Cost 16:
(insn 16 14 18 2 (set (reg:DI 90 [ MEM[(int *)array_5(D) + 812B] ])
        (sign_extend:DI (mem:SI (plus:DI (reg:DI 96)
                    (const_int 812 [0x32c])) [1 MEM[(int *)array_5(D) + 812B]+0
S4 A32]))) "array_test.c":9:5 90 {extendsidi2}
     (expr_list:REG_DEAD (reg/f:DI 92)
        (nil)))
newpat:
(set (reg:DI 90 [ MEM[(int *)array_5(D) + 812B] ])
    (sign_extend:DI (mem:SI (plus:DI (reg:DI 96)
                (const_int 812 [0x32c])) [1 MEM[(int *)array_5(D) + 812B]+0 S4
A32])))
newi2pat:
(nil)
newotherpat:
(nil)
GO!---------------------------------------------------

where the go stands for !reject (replacement happens) and newpat is the new
pattern.

And above merge is not observed in the unpatched version.

  One naive fix to verify this would be changing:
int reject = old_cost > 0 && new_cost > old_cost;
  to:
int reject = old_cost > 0 && new_cost >= old_cost;
  since both unmerged and merged results cost 20, but it would surely cause
side effect.

As for why the unpatched version's insns are missed here, I think it would be
better to look back at try_combine() or before.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug target/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (46 preceding siblings ...)
  2020-12-08  9:22 ` admin at levyhsu dot com
@ 2020-12-14 10:43 ` admin at levyhsu dot com
  2020-12-15  9:55 ` admin at levyhsu dot com
                   ` (13 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: admin at levyhsu dot com @ 2020-12-14 10:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #48 from Levy <admin at levyhsu dot com> ---
Created attachment 49757
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49757&action=edit
Initial V1 patch on combine.c

Three patches together: 


               ========= Summary of gcc testsuite =========
                            | # of unexpected case / # of unique unexpected
case
                            |          gcc |          g++ |     gfortran |
 rv64imafdc/  lp64d/ medlow |    0 /     0 |    0 /     0 |      - |

I'll merge all 3 patches together and fix all the debug/coding style/efficiency
/whatever problem with explanations later this week.

Looks likes it's fixed from my side.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug target/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (47 preceding siblings ...)
  2020-12-14 10:43 ` admin at levyhsu dot com
@ 2020-12-15  9:55 ` admin at levyhsu dot com
  2020-12-16  2:40 ` wilson at gcc dot gnu.org
                   ` (12 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: admin at levyhsu dot com @ 2020-12-15  9:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

Levy <admin at levyhsu dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #49543|0                           |1
        is obsolete|                            |
  Attachment #49575|0                           |1
        is obsolete|                            |
  Attachment #49757|0                           |1
        is obsolete|                            |

--- Comment #49 from Levy <admin at levyhsu dot com> ---
Created attachment 49767
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49767&action=edit
Auto-extend Patch

Combined all three patches.

               ========= Summary of gcc testsuite =========
                            | # of unexpected case / # of unique unexpected
case
                            |          gcc |          g++ |     gfortran |
 rv64imafdc/  lp64d/ medlow |    0 /     0 |    0 /     0 |      - |

(May require some work on coding style)

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug target/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (48 preceding siblings ...)
  2020-12-15  9:55 ` admin at levyhsu dot com
@ 2020-12-16  2:40 ` wilson at gcc dot gnu.org
  2020-12-16  2:42 ` wilson at gcc dot gnu.org
                   ` (11 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: wilson at gcc dot gnu.org @ 2020-12-16  2:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #50 from Jim Wilson <wilson at gcc dot gnu.org> ---
The combine change is inconvenient.  We can't do that in stage3, and it means
we need to make sure that this doesn't break other targets.

If the combine change is a good idea, then I think you can just modify
is_just_move rather than add a new function.  Just skip over a ZERO_EXTEND or
SIGN_EXTEND before the the general_operand check.  We might need a mode check
against UNITS_PER_WORD since extending past the word size is not necessarily a
simple move.

So the problem stems from the code in combine that is willing to do a 2->2
split if neither original instruction is a simple move.  When we add a
SIGN_EXTEND or ZERO_EXTEND they aren't considered simple moves anymore.

There is still the question of why the instruction cost allows the change. 
First I noticed that riscv_address_cost has a hook to check for shorten_memrefs
but that riscv_rtx_costs isn't calling it.  It uses riscv_address_insns
instead.  So it seems like adding a shorten_memref check to the MEM case
riscv_rtx_costs might solve the problem.  But riscv_compressed_lw_address_p has
a !reload_completed check, and combine runs before reload, so that returns the
same result for both the new and old insns.  I think that reload_completed
check isn't quite right.  If we have a pseudo-reg, then we should allow that,
but if we have an offset that is clearly not compressible, then we should
reject it before reload.  So I think the reload_completed check should be moved
down to where it checks for a compressible register.  With those two changes, I
can make the testcase work without a combine change.  Though since I changed
how shorten_memrefs works we need a check to make sure this patch doens't
change code size.  I haven't tried to do that yet.

With my changes, in the combine dump, I see

Successfully matched this instruction:
(set (reg/f:DI 92)
    (plus:DI (reg:DI 96)
        (const_int 768 [0x300])))
Successfully matched this instruction:
(set (reg:DI 82 [ MEM[(intD.1 *)array_5(D) + 800B] ])
    (sign_extend:DI (mem:SI (plus:DI (reg:DI 96)
                (const_int 800 [0x320])) [1 MEM[(intD.1 *)array_5(D) + 800B]+0
S4 A32])))
rejecting combination of insns 27 and 6
original costs 4 + 16 = 20
replacement costs 4 + 20 = 24

so now the replacement gets rejected because of the increased address cost.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug target/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (49 preceding siblings ...)
  2020-12-16  2:40 ` wilson at gcc dot gnu.org
@ 2020-12-16  2:42 ` wilson at gcc dot gnu.org
  2020-12-17 18:13 ` wilson at gcc dot gnu.org
                   ` (10 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: wilson at gcc dot gnu.org @ 2020-12-16  2:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #51 from Jim Wilson <wilson at gcc dot gnu.org> ---
Created attachment 49773
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49773&action=edit
untested fix to use instead of levy's combine.c patch

Needs testing without Levy's patch to make sure it doesn't accidentally
increase code size.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug target/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (50 preceding siblings ...)
  2020-12-16  2:42 ` wilson at gcc dot gnu.org
@ 2020-12-17 18:13 ` wilson at gcc dot gnu.org
  2020-12-17 18:26 ` jiawei at iscas dot ac.cn
                   ` (9 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: wilson at gcc dot gnu.org @ 2020-12-17 18:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #52 from Jim Wilson <wilson at gcc dot gnu.org> ---
I did some simple testing of my patch alone.  I modified the
riscv-gnu-toolchain makefile to use -Os to build all libraries, built a
rv32imac/ilp32 toolchain, and looked at library code size.  I see a few cases
where my patch gives smaller code, and no obvious cases where it gives larger
code, so I think it is OK.

I haven't tried a full test with my patch combined with Levy's patch minus the
combine change.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug target/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (51 preceding siblings ...)
  2020-12-17 18:13 ` wilson at gcc dot gnu.org
@ 2020-12-17 18:26 ` jiawei at iscas dot ac.cn
  2020-12-21 15:08 ` jiawei at iscas dot ac.cn
                   ` (8 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: jiawei at iscas dot ac.cn @ 2020-12-17 18:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #53 from jiawei <jiawei at iscas dot ac.cn> ---
   Hi Jim,

   Levy had asked me to help about the test. I was resigned on EEMBC and
   waiting acess for more benchmarks. Now I am testing on csibe and
   coremax-pro. I think will lineout the result in this week.If it need
   test on more set. Just mail me about it.

   Best wishes,
   Jiawei


   在 2020年12月18日 上午2:13,"wilson at gcc dot gnu.org"
   <gcc-bugzilla@gcc.gnu.org>写道:

     https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

     --- Comment #52 from Jim Wilson <wilson at gcc dot gnu.org[1]> ---
     I did some simple testing of my patch alone. I modified the
     riscv-gnu-toolchain makefile to use -Os to build all libraries,
     built a
     rv32imac/ilp32 toolchain, and looked at library code size. I see a
     few cases
     where my patch gives smaller code, and no obvious cases where it
     gives larger
     code, so I think it is OK.

     I haven't tried a full test with my patch combined with Levy's
     patch minus the
     combine change.

     --
     You are receiving this mail because:
     You are on the CC list for the bug.



   1. http://gnu.org

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug target/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (52 preceding siblings ...)
  2020-12-17 18:26 ` jiawei at iscas dot ac.cn
@ 2020-12-21 15:08 ` jiawei at iscas dot ac.cn
  2020-12-21 15:38 ` kito at gcc dot gnu.org
                   ` (7 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: jiawei at iscas dot ac.cn @ 2020-12-21 15:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #54 from jiawei <jiawei at iscas dot ac.cn> ---
Hi Jim.

I had finished the test on the benchmark Coremark-pro.And it shows that the
patch doesn't accidentally increase code size.

This test with the args "XCMD='-c4' certify-all", and the result shows follow:

WORKLOAD RESULTS TABLE(origin gcc-10.2.0 from upstream compiled with -mabi=lp64
-march=rv64imafdc)

                                                 MultiCore SingleCore           
Workload Name                                     (iter/s)   (iter/s)   
Scaling
----------------------------------------------- ---------- ----------
----------
cjpeg-rose7-preset                                  119.05      46.08      
2.58
core                                                  1.10       0.28      
3.93
linear_alg-mid-100x100-sp                            35.19       8.92      
3.95
loops-all-mid-10k-sp                                  1.77       0.46      
3.85
nnet_test                                             1.69       0.51      
3.31
parser-125k                                          43.01      15.62      
2.75
radix2-big-64k                                      257.93      56.84      
4.54
sha-test                                            156.25      57.80      
2.70
zip-test                                             81.63      26.32      
3.10

MARK RESULTS TABLE

Mark Name                                        MultiCore SingleCore   
Scaling
----------------------------------------------- ---------- ----------
----------
CoreMark-PRO                                       2669.64     796.33      
3.35



WORKLOAD RESULTS TABLE(Add patch one -- Auto-extend Patch)

                                                 MultiCore SingleCore           
Workload Name                                     (iter/s)   (iter/s)   
Scaling
----------------------------------------------- ---------- ----------
----------
cjpeg-rose7-preset                                  131.58      45.87      
2.87
core                                                  1.08       0.28      
3.86
linear_alg-mid-100x100-sp                            35.49       8.75      
4.06
loops-all-mid-10k-sp                                  1.75       0.46      
3.80
nnet_test                                             1.68       0.51      
3.29
parser-125k                                          54.05      15.62      
3.46
radix2-big-64k                                      257.80      65.57      
3.93
sha-test                                            153.85      57.47      
2.68
zip-test                                             76.92      26.32      
2.92

MARK RESULTS TABLE

Mark Name                                        MultiCore SingleCore   
Scaling
----------------------------------------------- ---------- ----------
----------
CoreMark-PRO                                       2737.52     806.42      
3.39



WORKLOAD RESULTS TABLE(Add all patches -- Auto-extend Patch & untested fix to
use instead of levy's combine.c patch)

                                                 MultiCore SingleCore           
Workload Name                                     (iter/s)   (iter/s)   
Scaling
----------------------------------------------- ---------- ----------
----------
cjpeg-rose7-preset                                  120.48      45.66      
2.64
core                                                  1.08       0.28      
3.86
linear_alg-mid-100x100-sp                            35.61       8.77      
4.06
loops-all-mid-10k-sp                                  1.76       0.46      
3.83
nnet_test                                             1.68       0.51      
3.29
parser-125k                                          46.51      15.62      
2.98
radix2-big-64k                                      257.33      65.18      
3.95
sha-test                                            153.85      57.47      
2.68
zip-test                                             83.33      26.32      
3.17

MARK RESULTS TABLE

Mark Name                                        MultiCore SingleCore   
Scaling
----------------------------------------------- ---------- ----------
----------
CoreMark-PRO                                       2691.95     805.68      
3.34

The csibe test is still modifying,and will test after it works on
riscv-gnu-toolchain.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug target/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (53 preceding siblings ...)
  2020-12-21 15:08 ` jiawei at iscas dot ac.cn
@ 2020-12-21 15:38 ` kito at gcc dot gnu.org
  2020-12-21 16:09 ` jiawei at iscas dot ac.cn
                   ` (6 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: kito at gcc dot gnu.org @ 2020-12-21 15:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #55 from Kito Cheng <kito at gcc dot gnu.org> ---
Hi jiawei:

Thanks for the data, the performance changing for coremark-pro seems
interesting, could you find which part generate different code after the patch?

And I am curious what the platform you used for the performance data?

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug target/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (54 preceding siblings ...)
  2020-12-21 15:38 ` kito at gcc dot gnu.org
@ 2020-12-21 16:09 ` jiawei at iscas dot ac.cn
  2020-12-22  6:35 ` admin at levyhsu dot com
                   ` (5 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: jiawei at iscas dot ac.cn @ 2020-12-21 16:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #56 from jiawei <jiawei at iscas dot ac.cn> ---
   Hi Kito,

   I test the performance data on qemu-riscv64, and compile the benchmark
   with riscv-unknown-linux-gnu-gcc -Os.
   All the modify is set in /coremark-pro/util/make/
   to change the toolchain and run env.

   I am also insterested about which part has changed due to the patch. I
   will try to find out the different.在 2020年12月21日 下午11:38,"kito at gcc
   dot gnu.org" <gcc-bugzilla@gcc.gnu.org>写道:

     https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

     --- Comment #55 from Kito Cheng <kito at gcc dot gnu.org[1]> ---
     Hi jiawei:

     Thanks for the data, the performance changing for coremark-pro
     seems
     interesting, could you find which part generate different code
     after the patch?

     And I am curious what the platform you used for the performance
     data?

     --
     You are receiving this mail because:
     You are on the CC list for the bug.



   1. http://gnu.org

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug target/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (55 preceding siblings ...)
  2020-12-21 16:09 ` jiawei at iscas dot ac.cn
@ 2020-12-22  6:35 ` admin at levyhsu dot com
  2020-12-25  3:31 ` kito at gcc dot gnu.org
                   ` (4 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: admin at levyhsu dot com @ 2020-12-22  6:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #57 from Levy <admin at levyhsu dot com> ---
Thank you JiaWei for the CoreMark-Pro result.

Personally, I agree with Jim, since changing the split behaviour of try_combine
in the combine.c could affect other platforms, theoretically, we can fix it
with platform flag and UNITS_PER_WORD check or maybe Just skip over a
ZERO_EXTEND or SIGN_EXTEND before the general_operand check, but that seems
inconvenient.

Probably need more testing on all patches to see the differences in code size &
speed. Maybe after EEMBC results come out then decide what to proceed next.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug target/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (56 preceding siblings ...)
  2020-12-22  6:35 ` admin at levyhsu dot com
@ 2020-12-25  3:31 ` kito at gcc dot gnu.org
  2020-12-25  9:09 ` jiawei at iscas dot ac.cn
                   ` (3 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: kito at gcc dot gnu.org @ 2020-12-25  3:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #58 from Kito Cheng <kito at gcc dot gnu.org> ---
Hi jiawei:

I would suggest you just using inst count rather than cycle or time for
measuring benchmark if you using qemu, since qemu is functional simulator not
cycle accurate neither nearly-cycle accurate simulator, so the performance
number is coming from your native host (x86_64) cpu, and it also might 
sensitive on your host loading. or maybe you could try gem5 for that?

Thanks your helping benchmark that :)

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug target/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (57 preceding siblings ...)
  2020-12-25  3:31 ` kito at gcc dot gnu.org
@ 2020-12-25  9:09 ` jiawei at iscas dot ac.cn
  2021-02-13 20:24 ` cvs-commit at gcc dot gnu.org
                   ` (2 subsequent siblings)
  61 siblings, 0 replies; 63+ messages in thread
From: jiawei at iscas dot ac.cn @ 2020-12-25  9:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #59 from jiawei <jiawei at iscas dot ac.cn> ---
   Hi Kito,

   Okay, I will retest the benchmark on gem5.



   发自我的小米手机在 "kito at gcc dot gnu.org"
   <gcc-bugzilla@gcc.gnu.org>,2020年12月25日 上午11:31写道:

     https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

     --- Comment #58 from Kito Cheng <kito at gcc dot gnu.org[1]> ---
     Hi jiawei:

     I would suggest you just using inst count rather than cycle or
     time for
     measuring benchmark if you using qemu, since qemu is functional
     simulator not
     cycle accurate neither nearly-cycle accurate simulator, so the
     performance
     number is coming from your native host (x86_64) cpu, and it also
     might
     sensitive on your host loading. or maybe you could try gem5 for
     that?

     Thanks your helping benchmark that :)

     --
     You are receiving this mail because:
     You are on the CC list for the bug.



   1. http://gnu.org

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug target/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (58 preceding siblings ...)
  2020-12-25  9:09 ` jiawei at iscas dot ac.cn
@ 2021-02-13 20:24 ` cvs-commit at gcc dot gnu.org
  2021-02-13 20:37 ` cvs-commit at gcc dot gnu.org
  2021-02-13 20:48 ` wilson at gcc dot gnu.org
  61 siblings, 0 replies; 63+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-02-13 20:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #60 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jim Wilson <wilson@gcc.gnu.org>:

https://gcc.gnu.org/g:a4953810bac524e19126a2745c75fed58db962c2

commit r11-7236-ga4953810bac524e19126a2745c75fed58db962c2
Author: Jim Wilson <jimw@sifive.com>
Date:   Sat Feb 13 12:13:08 2021 -0800

    RISC-V: Shorten memrefs improvement, partial fix 97417.

    We already have a check for riscv_shorten_memrefs in riscv_address_cost.
    This adds the same check to riscv_rtx_costs.  Making this work also
    requires a change to riscv_compressed_lw_address_p to work before reload
    by checking the offset and assuming any pseudo reg is OK.  Testing shows
    that this consistently gives small code size reductions.

            gcc/
            PR target/97417
            * config/riscv/riscv.c (riscv_compressed_lw_address_p): Drop early
            exit when !reload_completed.  Only perform check for compressed reg
            if reload_completed.
            (riscv_rtx_costs): In MEM case, when optimizing for size and
            shorten memrefs, if not compressible, then increase cost.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug target/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (59 preceding siblings ...)
  2021-02-13 20:24 ` cvs-commit at gcc dot gnu.org
@ 2021-02-13 20:37 ` cvs-commit at gcc dot gnu.org
  2021-02-13 20:48 ` wilson at gcc dot gnu.org
  61 siblings, 0 replies; 63+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-02-13 20:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #61 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jim Wilson <wilson@gcc.gnu.org>:

https://gcc.gnu.org/g:18fabc35f47f0abf4ec14d147098ec4e0734d2a3

commit r11-7237-g18fabc35f47f0abf4ec14d147098ec4e0734d2a3
Author: Levy Hsu <admin@levyhsu.com>
Date:   Sat Feb 13 12:26:33 2021 -0800

    RISC-V: Avoid zero/sign extend for volatile loads.  Fix for 97417.

    This expands sub-word loads as a zero/sign extended load, followed by
    a subreg.  This helps eliminate unnecessary zero/sign extend insns after
    the load, particularly for volatiles, but also in some other cases.
    Testing shows that it gives consistent code size decreases.

    Tested with riscv32-elf rv32imac/ilp32 and riscv64-linux rv64gc/lp064d
    builds and checks.  Some -gsplit-stack tests fail with the patch, but
    this turns out to be an existing bug with the split-stack support that
    I hadn't noticed before.  It isn't a bug in this patch.  Ignoring that
    there are no regressions.

    Committed.

            gcc/
            PR target/97417
            * config/riscv/riscv-shorten-memrefs.c (pass_shorten_memrefs): Add
            extend parameter to get_si_mem_base_reg declaration.
            (get_si_mem_base_reg): Add extend parameter.  Set it.
            (analyze): Pass extend arg to get_si_mem_base_reg.
            (transform): Likewise.  Use it when rewriting mems.
            * config/riscv/riscv.c (riscv_legitimize_move): Check for subword
            loads and emit sign/zero extending load followed by subreg move.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Bug target/97417] RISC-V Unnecessary andi instruction when loading volatile bool
  2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
                   ` (60 preceding siblings ...)
  2021-02-13 20:37 ` cvs-commit at gcc dot gnu.org
@ 2021-02-13 20:48 ` wilson at gcc dot gnu.org
  61 siblings, 0 replies; 63+ messages in thread
From: wilson at gcc dot gnu.org @ 2021-02-13 20:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

Jim Wilson <wilson at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #62 from Jim Wilson <wilson at gcc dot gnu.org> ---
I committed my shorten memrefs patch and Levy's patch minus the combine change.
 I don't believe that we need the combine change.

I did notice one interesting case where we get unnecesssary zero extends.  I
will submit that as a new bug report.

I also noticed with riscv64-linux testing that some -gsplit-dwarf tests fail,
but this turns out to be a latent bug in the split-dwarf support.  I will
submit that as a new bug report.

I believe this is fixed, so closing as resolved.

^ permalink raw reply	[flat|nested] 63+ messages in thread

end of thread, other threads:[~2021-02-13 20:48 UTC | newest]

Thread overview: 63+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-14 11:56 [Bug other/97417] New: RISC-V Unnecessary andi instruction when loading volatile bool kjetilos at gmail dot com
2020-10-15 20:34 ` [Bug other/97417] " wilson at gcc dot gnu.org
2020-10-21  3:36 ` jiawei at iscas dot ac.cn
2020-10-21  5:58 ` wilson at gcc dot gnu.org
2020-10-27 11:18 ` jiawei at iscas dot ac.cn
2020-10-27 15:25 ` wilson at gcc dot gnu.org
2020-10-29  5:27 ` admin at levyhsu dot com
2020-10-29 22:49 ` wilson at gcc dot gnu.org
2020-10-30  8:35 ` admin at levyhsu dot com
2020-10-30 12:46 ` admin at levyhsu dot com
2020-11-04  6:10 ` admin at levyhsu dot com
2020-11-04  6:35 ` kito at gcc dot gnu.org
2020-11-04  7:03 ` admin at levyhsu dot com
2020-11-05 22:57 ` wilson at gcc dot gnu.org
2020-11-06  1:20 ` wilson at gcc dot gnu.org
2020-11-06  2:40 ` wilson at gcc dot gnu.org
2020-11-06  2:44 ` kito at gcc dot gnu.org
2020-11-06  3:35 ` wilson at gcc dot gnu.org
2020-11-06  9:46 ` admin at levyhsu dot com
2020-11-06 11:38 ` admin at levyhsu dot com
2020-11-06 20:44 ` wilson at gcc dot gnu.org
2020-11-06 21:08 ` wilson at gcc dot gnu.org
2020-11-09  8:35 ` admin at levyhsu dot com
2020-11-09  9:22 ` admin at levyhsu dot com
2020-11-10  5:20 ` admin at levyhsu dot com
2020-11-10  5:29 ` kito at gcc dot gnu.org
2020-11-10  5:34 ` admin at levyhsu dot com
2020-11-10  5:36 ` admin at levyhsu dot com
2020-11-10  6:01 ` admin at levyhsu dot com
2020-11-10 10:47 ` admin at levyhsu dot com
2020-11-11  1:21 ` wilson at gcc dot gnu.org
2020-11-11  5:43 ` admin at levyhsu dot com
2020-11-11  6:43 ` admin at levyhsu dot com
2020-11-11 19:35 ` wilson at gcc dot gnu.org
2020-11-12  1:26 ` admin at levyhsu dot com
2020-11-13  0:00 ` wilson at gcc dot gnu.org
2020-11-16  1:17 ` admin at levyhsu dot com
2020-11-16  3:24 ` kito at gcc dot gnu.org
2020-11-17 10:19 ` admin at levyhsu dot com
2020-11-18  6:09 ` admin at levyhsu dot com
2020-11-18 18:31 ` [Bug target/97417] " wilson at gcc dot gnu.org
2020-11-20  2:41 ` admin at levyhsu dot com
2020-11-20  3:32 ` wilson at gcc dot gnu.org
2020-11-23  6:17 ` admin at levyhsu dot com
2020-11-23  6:38 ` admin at levyhsu dot com
2020-11-23  7:43 ` admin at levyhsu dot com
2020-12-01  3:03 ` admin at levyhsu dot com
2020-12-08  9:22 ` admin at levyhsu dot com
2020-12-14 10:43 ` admin at levyhsu dot com
2020-12-15  9:55 ` admin at levyhsu dot com
2020-12-16  2:40 ` wilson at gcc dot gnu.org
2020-12-16  2:42 ` wilson at gcc dot gnu.org
2020-12-17 18:13 ` wilson at gcc dot gnu.org
2020-12-17 18:26 ` jiawei at iscas dot ac.cn
2020-12-21 15:08 ` jiawei at iscas dot ac.cn
2020-12-21 15:38 ` kito at gcc dot gnu.org
2020-12-21 16:09 ` jiawei at iscas dot ac.cn
2020-12-22  6:35 ` admin at levyhsu dot com
2020-12-25  3:31 ` kito at gcc dot gnu.org
2020-12-25  9:09 ` jiawei at iscas dot ac.cn
2021-02-13 20:24 ` cvs-commit at gcc dot gnu.org
2021-02-13 20:37 ` cvs-commit at gcc dot gnu.org
2021-02-13 20:48 ` wilson at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).