* [Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
2011-06-01 20:17 [Bug target/49263] New: SH Target: underutilized "TST #imm, R0" instruction oleg.endo@t-online.de
@ 2011-06-01 20:42 ` oleg.endo@t-online.de
2011-06-12 23:12 ` kkojima at gcc dot gnu.org
` (43 subsequent siblings)
44 siblings, 0 replies; 46+ messages in thread
From: oleg.endo@t-online.de @ 2011-06-01 20:42 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
--- Comment #1 from Oleg Endo <oleg.endo@t-online.de> 2011-06-01 20:42:00 UTC ---
Created attachment 24412
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24412
Proposed Patch
Although the patch gets the job done, programmer's sense tells me it is fishy,
or at least pretty much brute forced cure of the symptoms, not the cause. It's
my first GCC patch, so any feedback is highly appreciated.
What I did was looking at the RTL, in particular the combine pass, identifying
patterns it failed to find a "shortcut" (tst insn) for and adding them to the
insn descriptions.
I also had to expand the costs calculation of the AND instruction to cover AND,
OR and XOR (on SH they are the same anyways), or else the cost of a matched
replacement insn would result in a rejection in the combine pass.
^ permalink raw reply [flat|nested] 46+ messages in thread
* [Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
2011-06-01 20:17 [Bug target/49263] New: SH Target: underutilized "TST #imm, R0" instruction oleg.endo@t-online.de
2011-06-01 20:42 ` [Bug target/49263] " oleg.endo@t-online.de
@ 2011-06-12 23:12 ` kkojima at gcc dot gnu.org
2011-06-19 16:42 ` oleg.endo@t-online.de
` (42 subsequent siblings)
44 siblings, 0 replies; 46+ messages in thread
From: kkojima at gcc dot gnu.org @ 2011-06-12 23:12 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
Kazumoto Kojima <kkojima at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target| |sh*-*-*
--- Comment #2 from Kazumoto Kojima <kkojima at gcc dot gnu.org> 2011-06-12 23:11:19 UTC ---
It looks that playing with the internal behavior of the combine
pass is a bit fragile. Perhaps an alternative way is to define
a peephole for tst #imm8,r0, something like:
;; A peephole for the TST immediate instruction.
(define_peephole2
[(set (match_operand:SI 2 "arith_reg_operand" "r")
(and:SI (match_operand:SI 0 "arith_reg_operand" "z")
(match_operand:SI 1 "const_int_operand" "i")))
(set (reg:SI T_REG) (eq:SI (match_dup 2) (const_int 0)))]
"TARGET_SH1
&& peep2_reg_dead_p (2, operands[2])
&& (satisfies_constraint_I08 (operands[1])
|| satisfies_constraint_K08 (operands[1]))"
[(set (reg:SI T_REG)
(eq:SI (and:SI (match_dup 0) (match_dup 1)) (const_int 0)))]
"
{
operands[1] = GEN_INT (INTVAL (operands[1]) & 0xff);
}")
which will work at -O2 or -fpeephole2, though there are pros
and cons of peephole approach.
^ permalink raw reply [flat|nested] 46+ messages in thread
* [Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
2011-06-01 20:17 [Bug target/49263] New: SH Target: underutilized "TST #imm, R0" instruction oleg.endo@t-online.de
2011-06-01 20:42 ` [Bug target/49263] " oleg.endo@t-online.de
2011-06-12 23:12 ` kkojima at gcc dot gnu.org
@ 2011-06-19 16:42 ` oleg.endo@t-online.de
2011-06-22 22:34 ` kkojima at gcc dot gnu.org
` (41 subsequent siblings)
44 siblings, 0 replies; 46+ messages in thread
From: oleg.endo@t-online.de @ 2011-06-19 16:42 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
--- Comment #3 from Oleg Endo <oleg.endo@t-online.de> 2011-06-19 16:42:01 UTC ---
Thanks for having a look at it Kaz.
Yes, the fiddling with the combine pass is fragile. I've tried out your
peephole idea. It works in most cases but not all the time. E.g. it doesn't
catch the following example:
int test_imm (int i)
{
return (i & 3) ? 20 : 40;
}
mov #3,r1
and r1,r4
tst r4,r4
bt/s .L5
mov #40,r0
mov #20,r0
.L5:
rts
nop
Also the following...
int test_imm (int* i)
{
return (*i & 255) ? 20 : 40;
}
becomes on little endian (OK):
mov.b @r4,r1
tst r1,r1
bt/s .L10
mov #40,r0
mov #20,r0
.L10:
rts
nop
but on big endian (NG):
mov.l @r4,r1
extu.b r1,r1
tst r1,r1
bt/s .L10
mov #40,r0
mov #20,r0
.L10:
rts
nop
I'll give the thing another try.
Regarding the change to the andcosts function, the following should be better:
--- sh.c (revision 175188)
+++ sh.c (working copy)
@@ -243,7 +243,7 @@
static int flow_dependent_p (rtx, rtx);
static void flow_dependent_p_1 (rtx, const_rtx, void *);
static int shiftcosts (rtx);
-static int andcosts (rtx);
+static int and_xor_ior_costs (rtx, int code);
static int addsubcosts (rtx);
static int multcosts (rtx);
static bool unspec_caller_rtx_p (rtx);
@@ -2841,14 +2841,14 @@
return shift_insns[value];
}
-/* Return the cost of an AND operation. */
+/* Return the cost of an AND/XOR/IOR operation. */
static inline int
-andcosts (rtx x)
+and_xor_ior_costs (rtx x, int code)
{
int i;
- /* Anding with a register is a single cycle and instruction. */
+ /* register x register is a single cycle instruction. */
if (!CONST_INT_P (XEXP (x, 1)))
return 1;
@@ -2864,17 +2864,17 @@
}
/* These constants are single cycle extu.[bw] instructions. */
- if (i == 0xff || i == 0xffff)
+ if ((i == 0xff || i == 0xffff) && code == AND)
return 1;
- /* Constants that can be used in an and immediate instruction in a single
+ /* Constants that can be used in an instruction with an immediate are a
single
cycle, but this requires r0, so make it a little more expensive. */
if (CONST_OK_FOR_K08 (i))
return 2;
- /* Constants that can be loaded with a mov immediate and an and.
+ /* Constants that can be loaded with a mov immediate need one more cycle.
This case is probably unnecessary. */
if (CONST_OK_FOR_I08 (i))
return 2;
- /* Any other constants requires a 2 cycle pc-relative load plus an and.
+ /* Any other constant requires an additional 2 cycle pc-relative load.
This case is probably unnecessary. */
return 3;
}
@@ -3043,7 +3043,9 @@
return true;
case AND:
- *total = COSTS_N_INSNS (andcosts (x));
+ case XOR:
+ case IOR:
+ *total = COSTS_N_INSNS (and_xor_ior_costs (x, code));
return true;
case MULT:
^ permalink raw reply [flat|nested] 46+ messages in thread
* [Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
2011-06-01 20:17 [Bug target/49263] New: SH Target: underutilized "TST #imm, R0" instruction oleg.endo@t-online.de
` (2 preceding siblings ...)
2011-06-19 16:42 ` oleg.endo@t-online.de
@ 2011-06-22 22:34 ` kkojima at gcc dot gnu.org
2011-06-26 22:31 ` oleg.endo@t-online.de
` (40 subsequent siblings)
44 siblings, 0 replies; 46+ messages in thread
From: kkojima at gcc dot gnu.org @ 2011-06-22 22:34 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
--- Comment #4 from Kazumoto Kojima <kkojima at gcc dot gnu.org> 2011-06-22 22:34:04 UTC ---
Yes, that peephole doesn't catch all the patterns which could
make tst #imm8,r0 use. Perhaps it would be a good idea to get
numbers for the test like CSiBE test with the vanilla and new
insns/peepholes patched compilers. Something covers 80% of
the possible cases in the usual working set, it would be enough
successful for such a micro-optimization, I guess.
Cost patch looks fine to me. Could you propose it as a separate
patch on gcc-patches list with an appropriate ChangeLog entry?
When proposing it, please refer how you've tested it. Also
the numbers got with the patch are highly welcome.
BTW, do you have FSF copyright assignment for your GCC work?
Although the cost patch itself is essentially several lines which
doesn't require copyright assignment, the other changes you've
proposed clearly require the paper work, I think.
^ permalink raw reply [flat|nested] 46+ messages in thread
* [Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
2011-06-01 20:17 [Bug target/49263] New: SH Target: underutilized "TST #imm, R0" instruction oleg.endo@t-online.de
` (3 preceding siblings ...)
2011-06-22 22:34 ` kkojima at gcc dot gnu.org
@ 2011-06-26 22:31 ` oleg.endo@t-online.de
2011-06-27 5:15 ` kkojima at gcc dot gnu.org
` (39 subsequent siblings)
44 siblings, 0 replies; 46+ messages in thread
From: oleg.endo@t-online.de @ 2011-06-26 22:31 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
--- Comment #5 from Oleg Endo <oleg.endo@t-online.de> 2011-06-26 22:30:05 UTC ---
> Yes, that peephole doesn't catch all the patterns which could
> make tst #imm8,r0 use. Perhaps it would be a good idea to get
> numbers for the test like CSiBE test with the vanilla and new
> insns/peepholes patched compilers. Something covers 80% of
> the possible cases in the usual working set, it would be enough
> successful for such a micro-optimization, I guess.
I'd also like to see some numbers on those micro-improvements. I'll have a look
at CSiBE.
Anyway, why not just add all the currently known-to-work cases? What are your
concerns regarding that? I can imagine that it is a maintenance burden to keep
all those definitions and special cases in the MD up-to-date (bit rot etc). Do
you have anything other than that in mind?
It seems that others have been trying to solve the same problem in a very
similar way: http://patchwork.ozlabs.org/patch/58832/ ;)
I've figured that the following pattern works quite well for this particular
case. It seems to catch all the bit patterns. (sh_tst_const_ok and
sh_zero_extract_val are added functions in sh.c)
(define_insn_and_split "*tstsi3"
[(set (reg:SI T_REG)
(zero_extract:SI (match_operand:SI 0 "arith_reg_operand" "")
(match_operand:SI 1 "const_int_operand")
(match_operand:SI 2 "const_int_operand")))]
"TARGET_SH1 && sh_tst_const_ok (sh_zero_extract_val (operands[1],
operands[2]))"
"#"
"&& 1"
[(const_int 0)]
"{
int tstval = sh_zero_extract_val (operands[1], operands[2]);
emit_insn (gen_tstsi (operands[0], GEN_INT (tstval)));
DONE;
}"
)
...however, the problem with that one is that the T bit is inverted afterwards,
and thus the following branch logic (or whatever) gets inverted as well. One
option could be to emit some T bit inversion insn after gen_tstsi and then
optimize it away later on with e.g. a peephole (?), but I haven't tried that
yet.
The actual "problem" is that the combine pass first sees the andsi, eq, ...
insns and correctly matches them to those in the MD. But instead of continuing
to match patterns from the MD it expands the and insn into something like
zero_extract, which in turn will never be combined back into the tst insn.
Isn't there a way to tell the combine pass not to do so, but instead first look
deeper at what is in the MD?
Regarding the peephole:
> (satisfies_constraint_I08 (operands[1])
> || satisfies_constraint_K08 (operands[1])
I guess this might generate wrong code for e.g. "if (x & -2)". When x has any
bits[31:1] set this must return true. The code after the peephole optimization
will look only at the lower 8 bits and would possibly return false for x =
0xFF000000, which is wrong. So it should be satisfies_constraint_K08 only,
shouldn't it?
>
> Cost patch looks fine to me. Could you propose it as a separate
> patch on gcc-patches list with an appropriate ChangeLog entry?
> When proposing it, please refer how you've tested it. Also
> the numbers got with the patch are highly welcome.
>
> BTW, do you have FSF copyright assignment for your GCC work?
> Although the cost patch itself is essentially several lines which
> doesn't require copyright assignment, the other changes you've
> proposed clearly require the paper work, I think.
I'll contact you directly regarding that.
^ permalink raw reply [flat|nested] 46+ messages in thread
* [Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
2011-06-01 20:17 [Bug target/49263] New: SH Target: underutilized "TST #imm, R0" instruction oleg.endo@t-online.de
` (4 preceding siblings ...)
2011-06-26 22:31 ` oleg.endo@t-online.de
@ 2011-06-27 5:15 ` kkojima at gcc dot gnu.org
2011-10-09 23:35 ` oleg.endo@t-online.de
` (38 subsequent siblings)
44 siblings, 0 replies; 46+ messages in thread
From: kkojima at gcc dot gnu.org @ 2011-06-27 5:15 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
--- Comment #6 from Kazumoto Kojima <kkojima at gcc dot gnu.org> 2011-06-27 05:14:36 UTC ---
(In reply to comment #5)
> Anyway, why not just add all the currently known-to-work cases? What are your
> concerns regarding that? I can imagine that it is a maintenance burden to keep
> all those definitions and special cases in the MD up-to-date (bit rot etc). Do
> you have anything other than that in mind?
Yep, maintenance burden but I don't mean ack/nak for anything.
If it's enough fruitful, we should take that route. When it
gives 5% improvement in the usual working set like as CSiBE,
hundreds lines would be OK, but if it's ~0.5% or less, it doesn't
look worth to add many patterns for that.
> Isn't there a way to tell the combine pass not to do so, but instead first look
> deeper at what is in the MD?
I don't know how to do it cleanly.
> I guess this might generate wrong code for e.g. "if (x & -2)". When x has any
> bits[31:1] set this must return true. The code after the peephole optimization
> will look only at the lower 8 bits and would possibly return false for x =
> 0xFF000000, which is wrong. So it should be satisfies_constraint_K08 only,
> shouldn't it?
You are right. That peephole was simply 'something like this'.
^ permalink raw reply [flat|nested] 46+ messages in thread
* [Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
2011-06-01 20:17 [Bug target/49263] New: SH Target: underutilized "TST #imm, R0" instruction oleg.endo@t-online.de
` (5 preceding siblings ...)
2011-06-27 5:15 ` kkojima at gcc dot gnu.org
@ 2011-10-09 23:35 ` oleg.endo@t-online.de
2011-10-10 1:32 ` kkojima at gcc dot gnu.org
` (37 subsequent siblings)
44 siblings, 0 replies; 46+ messages in thread
From: oleg.endo@t-online.de @ 2011-10-09 23:35 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
--- Comment #7 from Oleg Endo <oleg.endo@t-online.de> 2011-10-09 23:34:45 UTC ---
(In reply to comment #6)
> Yep, maintenance burden but I don't mean ack/nak for anything.
> If it's enough fruitful, we should take that route. When it
> gives 5% improvement in the usual working set like as CSiBE,
> hundreds lines would be OK, but if it's ~0.5% or less, it doesn't
> look worth to add many patterns for that.
>
> > Isn't there a way to tell the combine pass not to do so, but instead first look
> > deeper at what is in the MD?
>
> I don't know how to do it cleanly.
>
I've tried out a couple of things and got some CSiBE numbers based on
trunk rev 179430. Unfortunately only code size comparisons, no run time
performance numbers. The tests were compiled with
-ml -m4-single -Os -mnomacsave -mpretend-cmove -mfused-madd -freg-struct-return
Option 1)
Use many (~10) patterns in the MD and some cost calculation tuning.
The last patch required some adaptation, because the combine pass
started trying to match things slightly differently. I've also
noticed that it requires a special case for one pattern on SH4A...
size of all modules: 2916390 -> 2909026 -7364 / -0.252504 %
avg difference over all modules: -409.111111 / -0.273887 %
max: compiler 22808 -> 22812 +4 / +0.017538 %
min: libpng-1.2.5 99120 -> 97804 -1316 / -1.327684 %
Option 2)
I've added another combine pass which has the make_compound_operation
function turned off. The make_compound_operation function is used to
produce zero_extract patterns. If the resulting "simplified" pattern does
not match anything in the MD, combine reverts the transformation and
proceeds with the next insn. That way, it never tries to match the
tst #imm pattern in the MD.
With this option only ~5 patterns seem to be required and a small
extension of the costs calculation.
size of all modules: 2916390 -> 2909170 -7220 / -0.247566 %
avg difference over all modules: -401.111111 / -0.254423 %
max: compiler 22808 -> 22812 +4 / +0.017538 %
min: libpng-1.2.5 99120 -> 97836 -1284 / -1.295400 %
Not so spectacular on average. It highly depends on the type of SW being
compiled, but it hits quite a lot of files in CSiBE.
Option 2 seems more robust even if it seems less effective, what do you think?
^ permalink raw reply [flat|nested] 46+ messages in thread
* [Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
2011-06-01 20:17 [Bug target/49263] New: SH Target: underutilized "TST #imm, R0" instruction oleg.endo@t-online.de
` (6 preceding siblings ...)
2011-10-09 23:35 ` oleg.endo@t-online.de
@ 2011-10-10 1:32 ` kkojima at gcc dot gnu.org
2011-10-10 23:48 ` oleg.endo@t-online.de
` (36 subsequent siblings)
44 siblings, 0 replies; 46+ messages in thread
From: kkojima at gcc dot gnu.org @ 2011-10-10 1:32 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
--- Comment #8 from Kazumoto Kojima <kkojima at gcc dot gnu.org> 2011-10-10 01:31:42 UTC ---
(In reply to comment #7)
> Option 2 seems more robust even if it seems less effective, what do you think?
Another combine pass to reduce size less than 0.3% on one target
would be not acceptable, I guess. ~10 new patterns would be
overkill for that result, though I'm still expecting that a few
patterns of them were dominant. Could you get numbers which pattern
was used in the former option?
^ permalink raw reply [flat|nested] 46+ messages in thread
* [Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
2011-06-01 20:17 [Bug target/49263] New: SH Target: underutilized "TST #imm, R0" instruction oleg.endo@t-online.de
` (7 preceding siblings ...)
2011-10-10 1:32 ` kkojima at gcc dot gnu.org
@ 2011-10-10 23:48 ` oleg.endo@t-online.de
2011-10-11 1:47 ` kkojima at gcc dot gnu.org
` (35 subsequent siblings)
44 siblings, 0 replies; 46+ messages in thread
From: oleg.endo@t-online.de @ 2011-10-10 23:48 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
--- Comment #9 from Oleg Endo <oleg.endo@t-online.de> 2011-10-10 23:48:17 UTC ---
Created attachment 25461
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25461
CSiBE comparisons
(In reply to comment #8)
>
> Another combine pass to reduce size less than 0.3% on one target
> would be not acceptable, I guess.
I'm sorry, I forgot to mention that it was just a proof of concept hack
of mine, just to see whether it has any chance to work at all.
I think it would be better to change/fix the behavior of the combine pass
in this regard, so that it tries matching combined patterns without
sophisticated transformations. I will try asking on the gcc list about that.
> ~10 new patterns would be
> overkill for that result, though I'm still expecting that a few
> patterns of them were dominant.
Yep, even if it turned out to be actually only 8 patterns in total, but
still.. I haven't looked at the issue with SH4A but most likely it would add
another one or two patterns... so basically ~10 :)
> Could you get numbers which pattern
> was used in the former option?
I think it would be a bit too much checking out each individual pattern.
Instead I grouped them into what they are effectively doing.
While I was at it, I also added your peephole idea, and a top 10 listing of
the individual files.
^ permalink raw reply [flat|nested] 46+ messages in thread
* [Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
2011-06-01 20:17 [Bug target/49263] New: SH Target: underutilized "TST #imm, R0" instruction oleg.endo@t-online.de
` (8 preceding siblings ...)
2011-10-10 23:48 ` oleg.endo@t-online.de
@ 2011-10-11 1:47 ` kkojima at gcc dot gnu.org
2011-10-13 22:55 ` oleg.endo@t-online.de
` (34 subsequent siblings)
44 siblings, 0 replies; 46+ messages in thread
From: kkojima at gcc dot gnu.org @ 2011-10-11 1:47 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
--- Comment #10 from Kazumoto Kojima <kkojima at gcc dot gnu.org> 2011-10-11 01:47:03 UTC ---
(In reply to comment #9)
> 3) only zero_extract special cases
looks to be dominant.
> I'm sorry, I forgot to mention that it was just a proof of concept hack
> of mine, just to see whether it has any chance to work at all.
> I think it would be better to change/fix the behavior of the combine pass
> in this regard, so that it tries matching combined patterns without
> sophisticated transformations. I will try asking on the gcc list about that.
I see. I also expect that the experts have some idea for
this issue.
> I think it would be a bit too much checking out each individual pattern.
I don't think that it's too much. Those numbers can be easily
collected for CSiBE. If your patterns are named, you could
simply add "-dap -save-temps" to the compiler option which is
specified when ruining CSiBE's create-config and then get
the occurrences of testsi_6, for example, with something like
grep "testsi_6" `find . -name "*.s" -print` | wc -l
after running the CSiBE size test.
^ permalink raw reply [flat|nested] 46+ messages in thread
* [Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
2011-06-01 20:17 [Bug target/49263] New: SH Target: underutilized "TST #imm, R0" instruction oleg.endo@t-online.de
` (9 preceding siblings ...)
2011-10-11 1:47 ` kkojima at gcc dot gnu.org
@ 2011-10-13 22:55 ` oleg.endo@t-online.de
2011-10-14 23:06 ` kkojima at gcc dot gnu.org
` (33 subsequent siblings)
44 siblings, 0 replies; 46+ messages in thread
From: oleg.endo@t-online.de @ 2011-10-13 22:55 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
Oleg Endo <oleg.endo@t-online.de> changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #24412|0 |1
is obsolete| |
--- Comment #11 from Oleg Endo <oleg.endo@t-online.de> 2011-10-13 22:54:54 UTC ---
Created attachment 25491
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25491
Proposed patch including test case
> > 3) only zero_extract special cases
>
> looks to be dominant.
Yes. I've briefly looked through the test sources. A popular use case
for bit test instructions seem to be single bit tests, which the patch
is basically adding.
> I see. I also expect that the experts have some idea for
> this issue.
Hm .. http://gcc.gnu.org/ml/gcc/2011-10/msg00189.html
Eric pointed me to the i386 back-end. Unfortunately, what I found there is
where I originally started...
;; Combine likes to form bit extractions for some tests. Humor it.
I.e. it is also coded against the behavior of the combine pass with a bunch
of pattern variations. I guess that's the way it's supposed to be done then :T
> I don't think that it's too much. Those numbers can be easily
> collected for CSiBE. If your patterns are named, you could
> simply add "-dap -save-temps" to the compiler option which is
> specified when ruining CSiBE's create-config and then get
> the occurrences of testsi_6, for example, with something like
> grep "testsi_6" `find . -name "*.s" -print` | wc -l
> after running the CSiBE size test.
Ah, right! With the attached latest patch applied to trunk rev 179778
the numbers for
"-ml -m4-single -Os -mnomacsave -mpretend-cmove -mfused-madd
-freg-struct-return"
look something like that:
tstsi_t: 1391
tsthi_t: 4
tstqi_t: 23
tstqi_t_zero: 667
tstsi_t_and_not: 598
tstsi_t_zero_extract_eq: 70
tstsi_t_zero_extract_xor: 923
Notice that the split contributes to the tstsi_t number.
Also, the 3 patterns
tstsi_t_zero_extract_xor
tstsi_t_zero_extract_subreg_xor_little
tstsi_t_zero_extract_subreg_xor_big
are basically one and the same. On SH4A the subreg variants are required,
because tstsi_t_zero_extract_xor will never match.
I've also added a special case to sh_rtx_costs to detect at least the tstsi_t
pattern. However, the other patterns are not really covered by that and the
combine pass calculates the cost as a sum of all the operations of the pattern.
I guess the selection of the test instruction could be stimulated a bit more
by a more accurate costs calculation, but my feeling is that it won't do a lot.
Cheers,
Oleg
^ permalink raw reply [flat|nested] 46+ messages in thread
* [Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
2011-06-01 20:17 [Bug target/49263] New: SH Target: underutilized "TST #imm, R0" instruction oleg.endo@t-online.de
` (10 preceding siblings ...)
2011-10-13 22:55 ` oleg.endo@t-online.de
@ 2011-10-14 23:06 ` kkojima at gcc dot gnu.org
2011-10-15 2:33 ` kkojima at gcc dot gnu.org
` (32 subsequent siblings)
44 siblings, 0 replies; 46+ messages in thread
From: kkojima at gcc dot gnu.org @ 2011-10-14 23:06 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
--- Comment #12 from Kazumoto Kojima <kkojima at gcc dot gnu.org> 2011-10-14 23:06:06 UTC ---
(In reply to comment #11)
> Created attachment 25491 [details]
> Proposed patch including test case
Looks fine. A very minor style nits:
> + if (GET_CODE (XEXP (x, 0)) == AND /* tst instruction. */
This comment looks a bit bogus. A full sentence comment would
be better.
> +
> +
There are some extra empty lines. GNU/GCC coding style says
that only one empty line is needed. I know that there are
extra empty lines already, but we should not add new ones :-)
^ permalink raw reply [flat|nested] 46+ messages in thread
* [Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
2011-06-01 20:17 [Bug target/49263] New: SH Target: underutilized "TST #imm, R0" instruction oleg.endo@t-online.de
` (11 preceding siblings ...)
2011-10-14 23:06 ` kkojima at gcc dot gnu.org
@ 2011-10-15 2:33 ` kkojima at gcc dot gnu.org
2011-11-20 14:20 ` oleg.endo@t-online.de
` (31 subsequent siblings)
44 siblings, 0 replies; 46+ messages in thread
From: kkojima at gcc dot gnu.org @ 2011-10-15 2:33 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
--- Comment #13 from Kazumoto Kojima <kkojima at gcc dot gnu.org> 2011-10-15 02:32:56 UTC ---
Author: kkojima
Date: Sat Oct 15 02:32:53 2011
New Revision: 180020
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=180020
Log:
PR target/49263
* config/sh/sh.h (ZERO_EXTRACT_ANDMASK): New macro.
* config/sh/sh.c (sh_rtx_costs): Add test instruction case.
* config/sh/sh.md (tstsi_t): Name existing insn. Make inner
and instruction commutative.
(tsthi_t, tstqi_t, tstqi_t_zero, tstsi_t_and_not,
tstsi_t_zero_extract_eq, tstsi_t_zero_extract_xor,
tstsi_t_zero_extract_subreg_xor_little,
tstsi_t_zero_extract_subreg_xor_big): New insns.
(*movsicc_t_false, *movsicc_t_true): Replace space with tab in
asm output.
(*andsi_compact): Reorder alternatives so that K08 is considered
first.
* gcc.target/sh/pr49263.c: New.
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/sh.c
trunk/gcc/config/sh/sh.h
trunk/gcc/config/sh/sh.md
trunk/gcc/testsuite/ChangeLog
^ permalink raw reply [flat|nested] 46+ messages in thread
* [Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
2011-06-01 20:17 [Bug target/49263] New: SH Target: underutilized "TST #imm, R0" instruction oleg.endo@t-online.de
` (12 preceding siblings ...)
2011-10-15 2:33 ` kkojima at gcc dot gnu.org
@ 2011-11-20 14:20 ` oleg.endo@t-online.de
2011-12-29 1:09 ` oleg.endo@t-online.de
` (30 subsequent siblings)
44 siblings, 0 replies; 46+ messages in thread
From: oleg.endo@t-online.de @ 2011-11-20 14:20 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
--- Comment #14 from Oleg Endo <oleg.endo@t-online.de> 2011-11-20 13:04:39 UTC ---
With trunk rev 181517 I have observed the following problem, which happens when
compiling for -m2*, -m3*, -m4* and -Os:
The function compiled with -m2 -Os
int test_int64_lowword (long long* x)
{
return *x & 0xFFFFFFFF ? -20 : -40;
}
results in the following code:
mov #0,r2 ! 42 movsi_i/3 [length = 2]
tst r2,r2 ! 44 cmpeqsi_t/1 [length = 2]
bf.s .L4 ! 45 branch_false [length = 2]
mov.l @(4,r4),r3 ! 12 movsi_i/5 [length = 2]
tst r3,r3 ! 46 cmpeqsi_t/1 [length = 2]
.L4:
bt .L3 ! 14 branch_true [length = 2]
mov #-20,r0 ! 4 movsi_i/3 [length = 2]
rts
nop ! 52 *return_i [length = 4]
.L3:
rts ! 54 *return_i [length = 2]
mov #-40,r0 ! 5 movsi_i/3 [length = 2]
When compiled with -O1 or -O2 the high word test is completely eliminated
correctly:
mov.l @(4,r4),r1 ! 10 movsi_i/5 [length = 2]
tst r1,r1 ! 17 cmpeqsi_t/1 [length = 2]
bt .L4 ! 18 branch_true [length = 2]
mov #-20,r0 ! 4 movsi_i/3 [length = 2]
rts
nop ! 50 *return_i [length = 4]
.L4:
rts ! 52 *return_i [length = 2]
mov #-40,r0 ! 5 movsi_i/3 [length = 2]
I'm not sure whether this is actually related to this PR, but have noticed it
with the test cases of this PR. It seems only to happen for comparison-like
insns. If I remember correctly, this problem did not exist when I started
working on this PR.
^ permalink raw reply [flat|nested] 46+ messages in thread
* [Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
2011-06-01 20:17 [Bug target/49263] New: SH Target: underutilized "TST #imm, R0" instruction oleg.endo@t-online.de
` (13 preceding siblings ...)
2011-11-20 14:20 ` oleg.endo@t-online.de
@ 2011-12-29 1:09 ` oleg.endo@t-online.de
2012-02-26 16:28 ` olegendo at gcc dot gnu.org
` (29 subsequent siblings)
44 siblings, 0 replies; 46+ messages in thread
From: oleg.endo@t-online.de @ 2011-12-29 1:09 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
--- Comment #15 from Oleg Endo <oleg.endo@t-online.de> 2011-12-29 00:34:53 UTC ---
(In reply to comment #14)
> With trunk rev 181517 I have observed the following problem, which happens when
> compiling for -m2*, -m3*, -m4* and -Os:
>
This is still present as of rev 182713 and seems to be a different issue.
I've created PR51697 for it.
^ permalink raw reply [flat|nested] 46+ messages in thread
* [Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
2011-06-01 20:17 [Bug target/49263] New: SH Target: underutilized "TST #imm, R0" instruction oleg.endo@t-online.de
` (14 preceding siblings ...)
2011-12-29 1:09 ` oleg.endo@t-online.de
@ 2012-02-26 16:28 ` olegendo at gcc dot gnu.org
2012-02-26 23:29 ` olegendo at gcc dot gnu.org
` (28 subsequent siblings)
44 siblings, 0 replies; 46+ messages in thread
From: olegendo at gcc dot gnu.org @ 2012-02-26 16:28 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
--- Comment #16 from olegendo at gcc dot gnu.org 2012-02-26 13:31:36 UTC ---
Author: olegendo
Date: Sun Feb 26 13:31:32 2012
New Revision: 184585
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=184585
Log:
PR target/49263
* gcc.target/sh/pr49263.c: New.
Added:
trunk/gcc/testsuite/gcc.target/sh/pr49263.c
Modified:
trunk/gcc/testsuite/ChangeLog
^ permalink raw reply [flat|nested] 46+ messages in thread
* [Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
2011-06-01 20:17 [Bug target/49263] New: SH Target: underutilized "TST #imm, R0" instruction oleg.endo@t-online.de
` (15 preceding siblings ...)
2012-02-26 16:28 ` olegendo at gcc dot gnu.org
@ 2012-02-26 23:29 ` olegendo at gcc dot gnu.org
2012-08-27 19:52 ` olegendo at gcc dot gnu.org
` (27 subsequent siblings)
44 siblings, 0 replies; 46+ messages in thread
From: olegendo at gcc dot gnu.org @ 2012-02-26 23:29 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
olegendo at gcc dot gnu.org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |RESOLVED
CC| |olegendo at gcc dot gnu.org
Resolution| |FIXED
--- Comment #17 from olegendo at gcc dot gnu.org 2012-02-26 23:26:28 UTC ---
I guess this one is done now.
^ permalink raw reply [flat|nested] 46+ messages in thread
* [Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
2011-06-01 20:17 [Bug target/49263] New: SH Target: underutilized "TST #imm, R0" instruction oleg.endo@t-online.de
` (16 preceding siblings ...)
2012-02-26 23:29 ` olegendo at gcc dot gnu.org
@ 2012-08-27 19:52 ` olegendo at gcc dot gnu.org
2012-10-28 22:02 ` olegendo at gcc dot gnu.org
` (26 subsequent siblings)
44 siblings, 0 replies; 46+ messages in thread
From: olegendo at gcc dot gnu.org @ 2012-08-27 19:52 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
Oleg Endo <olegendo at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|RESOLVED |REOPENED
Last reconfirmed| |2012-08-27
Resolution|FIXED |
AssignedTo|unassigned at gcc dot |olegendo at gcc dot gnu.org
|gnu.org |
Ever Confirmed|0 |1
--- Comment #18 from Oleg Endo <olegendo at gcc dot gnu.org> 2012-08-27 19:51:42 UTC ---
Not quite so done, actually. The following example case does not work
properly:
void test00 (int* x, int xb)
{
if (! (xb & 128))
x[0] = 0;
}
-O2 -m4:
mov r5,r0
and #128,r0
tst r0,r0
bf .L3
mov.l r0,@r4
.L3:
rts
nop
void test01 (int* x, int xb)
{
if (! (xb & 0x55))
x[0] = 0;
}
-O2 -m4:
mov r5,r0
and #85,r0
tst r0,r0
bf .L7
mov.l r0,@r4
.L7:
rts
nop
It seems that combine is trying to look for the following patterns:
Failed to match this instruction:
(set (pc)
(if_then_else (ne (zero_extract:SI (reg:SI 5 r5 [ xb ])
(const_int 1 [0x1])
(const_int 7 [0x7]))
(const_int 0 [0]))
(label_ref:SI 15)
(pc)))
Failed to match this instruction:
(set (pc)
(if_then_else (ne (and:SI (reg:SI 5 r5 [ xb ])
(const_int 85 [0x55]))
(const_int 0 [0]))
(label_ref:SI 15)
(pc)))
Another case that could see some improvement ...
bool test04 (int* x, int xb)
{
return ! (xb & 0x55);
}
-O2 -m4 (OK):
mov r5,r0
tst #85,r0
rts
movt r0
bool test02 (int* x, int xb)
{
return ! (xb & (1 << 6));
}
-O2 -m4 (NG):
mov r5,r0
mov #-6,r1
shld r1,r0
xor #1,r0
rts
and #1,r0
^ permalink raw reply [flat|nested] 46+ messages in thread
* [Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
2011-06-01 20:17 [Bug target/49263] New: SH Target: underutilized "TST #imm, R0" instruction oleg.endo@t-online.de
` (17 preceding siblings ...)
2012-08-27 19:52 ` olegendo at gcc dot gnu.org
@ 2012-10-28 22:02 ` olegendo at gcc dot gnu.org
2012-10-31 13:47 ` olegendo at gcc dot gnu.org
` (25 subsequent siblings)
44 siblings, 0 replies; 46+ messages in thread
From: olegendo at gcc dot gnu.org @ 2012-10-28 22:02 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
--- Comment #19 from Oleg Endo <olegendo at gcc dot gnu.org> 2012-10-28 22:01:47 UTC ---
Another thing I've noticed...
Cases such as:
mov.l r0,@r2 ! LS
mov r13,r0 ! MT
and #7,r0 ! EX
tst r0,r0 ! MT
bt/s .L8 ! BR
mov.l r0,@(16,r1)
where the result of the and op is re-used would be slightly better as:
mov.l r0,@r2 ! LS
mov r13,r0 ! MT
tst #7,r0 ! MT
and #7,r0 ! EX
bt/s .L8 ! BR
mov.l r0,@(16,r1)
because it reduces dependency on the result of the and op and thus has a higher
chance to be executed in parallel.
^ permalink raw reply [flat|nested] 46+ messages in thread
* [Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
2011-06-01 20:17 [Bug target/49263] New: SH Target: underutilized "TST #imm, R0" instruction oleg.endo@t-online.de
` (18 preceding siblings ...)
2012-10-28 22:02 ` olegendo at gcc dot gnu.org
@ 2012-10-31 13:47 ` olegendo at gcc dot gnu.org
2013-12-08 13:47 ` olegendo at gcc dot gnu.org
` (24 subsequent siblings)
44 siblings, 0 replies; 46+ messages in thread
From: olegendo at gcc dot gnu.org @ 2012-10-31 13:47 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
--- Comment #20 from Oleg Endo <olegendo at gcc dot gnu.org> 2012-10-31 13:47:07 UTC ---
(In reply to comment #19)
> Another thing I've noticed...
> Cases such as:
>
> mov.l r0,@r2 ! LS
> mov r13,r0 ! MT
> and #7,r0 ! EX
> tst r0,r0 ! MT
> bt/s .L8 ! BR
> mov.l r0,@(16,r1)
>
> where the result of the and op is re-used would be slightly better as:
>
> mov.l r0,@r2 ! LS
> mov r13,r0 ! MT
> tst #7,r0 ! MT
> and #7,r0 ! EX
> bt/s .L8 ! BR
> mov.l r0,@(16,r1)
>
> because it reduces dependency on the result of the and op and thus has a higher
> chance to be executed in parallel.
Other similar cases where hoisting the tst insn would make sense:
mov.b @(13,r2),r0
extu.b r0,r0
tst #1,r0
bt .L53
mov.b @(13,r1),r0
extu.b r0,r0
tst r0,r0
bt/s .L37
^ permalink raw reply [flat|nested] 46+ messages in thread
* [Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
2011-06-01 20:17 [Bug target/49263] New: SH Target: underutilized "TST #imm, R0" instruction oleg.endo@t-online.de
` (19 preceding siblings ...)
2012-10-31 13:47 ` olegendo at gcc dot gnu.org
@ 2013-12-08 13:47 ` olegendo at gcc dot gnu.org
2013-12-17 12:37 ` olegendo at gcc dot gnu.org
` (23 subsequent siblings)
44 siblings, 0 replies; 46+ messages in thread
From: olegendo at gcc dot gnu.org @ 2013-12-08 13:47 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
--- Comment #21 from Oleg Endo <olegendo at gcc dot gnu.org> ---
(In reply to Oleg Endo from comment #18)
>
> It seems that combine is trying to look for the following patterns:
>
> Failed to match this instruction:
> (set (pc)
> (if_then_else (ne (and:SI (reg:SI 5 r5 [ xb ])
> (const_int 85 [0x55]))
> (const_int 0 [0]))
> (label_ref:SI 15)
> (pc)))
Implementing such a combine pattern like ...
(define_insn_and_split "*tst_cbranch"
[(set (pc)
(if_then_else (ne (and:SI (match_operand:SI 0 "logical_operand")
(match_operand:SI 1 "const_int_operand"))
(const_int 0))
(label_ref (match_operand 2))
(pc)))
(clobber (reg:SI T_REG))]
"TARGET_SH1"
"#"
"&& 1"
[(set (reg:SI T_REG) (eq:SI (and:SI (match_dup 0) (match_dup 1))
(const_int 0)))
(set (pc) (if_then_else (eq (reg:SI T_REG) (const_int 0))
(label_ref (match_dup 2))
(pc)))])
results in code such as following code:
mov #33,r1
mov r5,r0
tst #33,r0
bf/s .L3
and r5,r1
mov.l r1,@r4
.L3:
rts
nop
which is worse.
What happens is that the sequence is expanded to RTL as follows:
(insn 7 4 8 2 (set (reg:SI 163 [ D.1856 ])
(and:SI (reg/v:SI 162 [ xb ])
(const_int 33 [0x21]))) sh_tmp.cpp:17 -1
(nil))
(insn 8 7 9 2 (set (reg:SI 147 t)
(eq:SI (reg:SI 163 [ D.1856 ])
(const_int 0 [0]))) sh_tmp.cpp:17 -1
(nil))
(jump_insn 9 8 10 2 (set (pc)
(if_then_else (eq (reg:SI 147 t)
(const_int 0 [0]))
(label_ref:SI 15)
(pc))) sh_tmp.cpp:17 301 {*cbranch_t}
(int_list:REG_BR_PROB 3900 (nil))
-> 15)
(note 10 9 11 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
(insn 11 10 12 4 (set (reg:SI 164)
(const_int 0 [0])) sh_tmp.cpp:18 -1
(nil))
(insn 12 11 15 4 (set (mem:SI (reg/v/f:SI 161 [ x ]) [2 *x_5(D)+0 S4 A32])
(reg:SI 164)) sh_tmp.cpp:18 -1
(nil))
and the cse1 pass decides that the result of the and operation can be shared
and replaces the operand in insn 12 with reg:SI 163:
(insn 12 11 15 3 (set (mem:SI (reg/v/f:SI 161 [ x ]) [2 *x_5(D)+0 S4 A32])
(reg:SI 163 [ D.1856 ])) sh_tmp.cpp:18 258 {movsi_ie}
(expr_list:REG_DEAD (reg:SI 164)
(expr_list:REG_DEAD (reg/v/f:SI 161 [ x ])
(nil))))
and insn 11 becomes dead code and is eliminated.
All of that happens long time before combine, so the tst combine patterns have
no chance to reconstruct the original code.
A sequence such as
mov r5,r0
mov #0,r1
tst #33,r0
bf .L3
mov.l r1,@r4
.L3:
rts
nop
could probably be achieved by combining insn 7 and insn 8 shortly after RTL
expansion, or even during the expansion of insn 8 (by looking at previous
already expanded insns and emitting a tst insn directly).
The idea would be to reduce dependencies on the tested register which allows
better scheduling. In addition to that, on SH4A "mov #imm8,Rn" is an MT group
instruction which has a higher probability of being executed in parallel.
^ permalink raw reply [flat|nested] 46+ messages in thread
* [Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
2011-06-01 20:17 [Bug target/49263] New: SH Target: underutilized "TST #imm, R0" instruction oleg.endo@t-online.de
` (20 preceding siblings ...)
2013-12-08 13:47 ` olegendo at gcc dot gnu.org
@ 2013-12-17 12:37 ` olegendo at gcc dot gnu.org
2014-12-30 18:45 ` olegendo at gcc dot gnu.org
` (22 subsequent siblings)
44 siblings, 0 replies; 46+ messages in thread
From: olegendo at gcc dot gnu.org @ 2013-12-17 12:37 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
--- Comment #22 from Oleg Endo <olegendo at gcc dot gnu.org> ---
(In reply to Oleg Endo from comment #21)
> What happens is that the sequence is expanded to RTL as follows:
>
> (insn 7 4 8 2 (set (reg:SI 163 [ D.1856 ])
> (and:SI (reg/v:SI 162 [ xb ])
> (const_int 33 [0x21]))) sh_tmp.cpp:17 -1
> (nil))
> (insn 8 7 9 2 (set (reg:SI 147 t)
> (eq:SI (reg:SI 163 [ D.1856 ])
> (const_int 0 [0]))) sh_tmp.cpp:17 -1
> (nil))
> (jump_insn 9 8 10 2 (set (pc)
> (if_then_else (eq (reg:SI 147 t)
> (const_int 0 [0]))
> (label_ref:SI 15)
> (pc))) sh_tmp.cpp:17 301 {*cbranch_t}
> (int_list:REG_BR_PROB 3900 (nil))
> -> 15)
> (note 10 9 11 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
> (insn 11 10 12 4 (set (reg:SI 164)
> (const_int 0 [0])) sh_tmp.cpp:18 -1
> (nil))
> (insn 12 11 15 4 (set (mem:SI (reg/v/f:SI 161 [ x ]) [2 *x_5(D)+0 S4 A32])
> (reg:SI 164)) sh_tmp.cpp:18 -1
> (nil))
>
>
> and insn 11 becomes dead code and is eliminated.
> All of that happens long time before combine, so the tst combine patterns
> have no chance to reconstruct the original code.
>
Adding an early peephole pass as described in
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59533#c2 and then adding the
following peephole:
;; Peephole after initial expansion.
(define_peephole2
[(set (match_operand:SI 0 "arith_reg_dest")
(and:SI (match_operand:SI 1 "arith_reg_operand")
(match_operand:SI 2 "logical_operand")))
(set (reg:SI T_REG) (eq:SI (match_dup 0) (const_int 0)))]
"TARGET_SH1 && can_create_pseudo_p ()"
[(set (reg:SI T_REG) (eq:SI (and:SI (match_dup 1) (match_dup 2))
(const_int 0)))
(set (match_dup 0) (and:SI (match_dup 1) (match_dup 2)))])
... fixes the problem and results in more uses of the tst #imm,r0 insn
according to the CSiBE set. On the other hand there is a total code size
increase of 792 bytes on the whole set. Below are some things that get worse
in the Linux source (mm/filemap.c):
mov.b @(15,r1),r0 -> mov.b @(15,r1),r0
cmp/pz r0 tst #128,r0 // cmp/pz has less
bf .L1016 bf .L1001 // pressure on r0
mov.b @(15,r0),r0 -> mov.b @(15,r0),r0
tst #4,r0 shar r0
bf .L107 shar r0
tst #1,r0
add #16,r0 -> add #16,r0
mov.b @(15,r0),r0 mov.b @(15,r0),r0
tst #16,r0 mov #-4,r1
bf/s .L509 shad r1,r0
tst #1,r0
bf/s .L509
^ permalink raw reply [flat|nested] 46+ messages in thread
* [Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
2011-06-01 20:17 [Bug target/49263] New: SH Target: underutilized "TST #imm, R0" instruction oleg.endo@t-online.de
` (21 preceding siblings ...)
2013-12-17 12:37 ` olegendo at gcc dot gnu.org
@ 2014-12-30 18:45 ` olegendo at gcc dot gnu.org
2015-01-24 13:05 ` olegendo at gcc dot gnu.org
` (21 subsequent siblings)
44 siblings, 0 replies; 46+ messages in thread
From: olegendo at gcc dot gnu.org @ 2014-12-30 18:45 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
--- Comment #23 from Oleg Endo <olegendo at gcc dot gnu.org> ---
Author: olegendo
Date: Tue Dec 30 18:44:27 2014
New Revision: 219111
URL: https://gcc.gnu.org/viewcvs?rev=219111&root=gcc&view=rev
Log:
gcc/testsuite/
PR target/49263
* gcc.target/sh/pr49263-1.c: New.
* gcc.target/sh/pr49263-2.c: New.
Added:
trunk/gcc/testsuite/gcc.target/sh/pr49263-1.c
trunk/gcc/testsuite/gcc.target/sh/pr49263-2.c
Modified:
trunk/gcc/testsuite/ChangeLog
^ permalink raw reply [flat|nested] 46+ messages in thread
* [Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
2011-06-01 20:17 [Bug target/49263] New: SH Target: underutilized "TST #imm, R0" instruction oleg.endo@t-online.de
` (22 preceding siblings ...)
2014-12-30 18:45 ` olegendo at gcc dot gnu.org
@ 2015-01-24 13:05 ` olegendo at gcc dot gnu.org
2015-01-26 23:57 ` olegendo at gcc dot gnu.org
` (20 subsequent siblings)
44 siblings, 0 replies; 46+ messages in thread
From: olegendo at gcc dot gnu.org @ 2015-01-24 13:05 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
--- Comment #25 from Oleg Endo <olegendo at gcc dot gnu.org> ---
Author: olegendo
Date: Sat Jan 24 13:04:53 2015
New Revision: 220081
URL: https://gcc.gnu.org/viewcvs?rev=220081&root=gcc&view=rev
Log:
gcc/
PR target/49263
PR target/53987
PR target/64345
PR target/59533
PR target/52933
PR target/54236
PR target/51244
* config/sh/sh-protos.h
(sh_extending_set_of_reg::can_use_as_unextended_reg,
sh_extending_set_of_reg::use_as_unextended_reg,
sh_is_nott_insn, sh_movt_set_dest, sh_movrt_set_dest, sh_is_movt_insn,
sh_is_movrt_insn, sh_insn_operands_modified_between_p,
sh_reg_dead_or_unused_after_insn, sh_in_recog_treg_set_expr,
sh_recog_treg_set_expr, sh_split_treg_set_expr): New functions.
(sh_treg_insns): New class.
* config/sh/sh.c (TARGET_LEGITIMATE_COMBINED_INSN): Define target hook.
(scope_counter): New class.
(sh_legitimate_combined_insn, sh_is_nott_insn, sh_movt_set_dest,
sh_movrt_set_dest, sh_reg_dead_or_unused_after_insn,
sh_extending_set_of_reg::can_use_as_unextended_reg,
sh_extending_set_of_reg::use_as_unextended_reg, sh_recog_treg_set_expr,
sh_in_recog_treg_set_expr, sh_try_split_insn_simple,
sh_split_treg_set_expr): New functions.
(addsubcosts): Handle treg_set_expr.
(sh_rtx_costs): Handle IF_THEN_ELSE and ZERO_EXTRACT.
(sh_rtx_costs): Use arith_reg_operand in SIGN_EXTEND and ZERO_EXTEND.
(sh_rtx_costs): Handle additional bit test patterns in EQ and AND cases.
(sh_insn_operands_modified_between_p): Make non-static.
* config/sh/predicates.md (zero_extend_movu_operand): Allow
simple_mem_operand in addition to displacement_mem_operand.
(zero_extend_operand): Don't allow zero_extend_movu_operand.
(treg_set_expr, treg_set_expr_not_const01,
arith_reg_or_treg_set_expr): New predicates.
* config/sh/sh.md (tstsi_t): Use arith_reg_operand and
arith_or_int_operand instead of logical_operand. Convert to
insn_and_split. Try to optimize constant operand in splitter.
(tsthi_t, tstqi_t): Fold into *tst<mode>_t. Convert to insn_and_split.
(*tstqi_t_zero): Delete.
(*tst<mode>_t_subregs): Add !sh_in_recog_treg_set_expr split condition.
(tstsi_t_and_not): Delete.
(tst<mode>_t_zero_extract_eq): Rename to *tst<mode>_t_zero_extract.
Convert to insn_and_split.
(unnamed split, tstsi_t_zero_extract_xor,
tstsi_t_zero_extract_subreg_xor_little,
tstsi_t_zero_extract_subreg_xor_big): Delete.
(*tstsi_t_shift_mask): New insn_and_split.
(cmpeqsi_t, cmpgesi_t): Add new split for const_int 0 operands and try
to recombine with surrounding insns when splitting.
(*negtstsi): Add !sh_in_recog_treg_set_expr condition.
(cmp_div0s_0, cmp_div0s_1, *cmp_div0s_0, *cmp_div0s_1): Rewrite as ...
(cmp_div0s, *cmp_div0s_1, *cmp_div0s_2, *cmp_div0s_3, *cmp_div0s_4,
*cmp_div0s_5, *cmp_div0s_6): ... these new insn_and_split patterns.
(*cbranch_div0s: Delete.
(*addc): Convert to insn_and_split. Use treg_set_expr as 3rd operand.
Try to recombine with surrounding insns when splitting. Add operand
order variants.
(*addc_t_r, *addc_r_t): Use treg_set_expr_not_const01.
(*addc_r_r_1, *addc_r_lsb, *addc_r_r_lsb, *addc_r_lsb_r, *addc_r_msb,
*addc_r_r_msb, *addc_2r_msb): Delete.
(*addc_2r_lsb): Rename to *addc_2r_t. Use treg_set_expr. Add operand
order variant.
(*addc_negreg_t): New insn_and_split.
(*subc): Convert to insn_and_split. Use treg_set_expr as 3rd operand.
Try to recombine with surrounding insns when splitting.
Add operand order variants.
(*subc_negt_reg, *subc_negreg_t, *reg_lsb_t, *reg_msb_t): New
insn_and_split patterns.
(*rotcr): Use arith_reg_or_treg_set_expr. Try to recombine with
surrounding insns when splitting.
(unnamed rotcr split): Use arith_reg_or_treg_set_expr.
(*rotcl): Likewise. Add zero_extract variant.
(*ashrsi2_31): New insn_and_split.
(*negc): Convert to insn_and_split. Use treg_set_expr.
(*zero_extend<mode>si2_disp_mem): Update comment.
(movrt_negc, *movrt_negc, nott): Add !sh_in_recog_treg_set_expr split
condition.
(*mov_t_msb_neg, mov_neg_si_t): Use treg_set_expr. Try to recombine
with surrounding insns when splitting.
(any_treg_expr_to_reg): New insn_and_split.
(*neg_zero_extract_0, *neg_zero_extract_1, *neg_zero_extract_2,
*neg_zero_extract_3, *neg_zero_extract_4, *neg_zero_extract_5,
*neg_zero_extract_6, *zero_extract_0, *zero_extract_1,
*zero_extract_2): New single bit zero extract patterns.
(bld_reg, *bld_regqi): Fold into bld<mode>_reg.
(*get_thread_pointersi, store_gbr, *mov<mode>_gbr_load,
*mov<mode>_gbr_load, *mov<mode>_gbr_load, *mov<mode>_gbr_load,
*movdi_gbr_load): Use arith_reg_dest instead of register_operand for
set destination.
(set_thread_pointersi, load_gbr): Use arith_reg_operand instead of
register_operand for set source.
gcc/testsuite/
PR target/49263
PR target/53987
PR target/64345
PR target/59533
PR target/52933
PR target/54236
PR target/51244
* gcc.target/sh/pr64345-1.c: New.
* gcc.target/sh/pr64345-2.c: New.
* gcc.target/sh/pr59533-1.c: New.
* gcc.target/sh/pr49263.c: Adjust matching of expected insns.
* gcc.target/sh/pr52933-2.c: Likewise.
* gcc.target/sh/pr54089-1.c: Likewise.
* gcc.target/sh/pr54236-1.c: Likewise.
* gcc.target/sh/pr51244-20-sh2a.c: Likewise.
* gcc.target/sh/pr49263-1.c: Remove xfails.
* gcc.target/sh/pr49263-2.c: Likewise.
* gcc.target/sh/pr49263-3.c: Likewise.
* gcc.target/sh/pr53987-1.c: Likewise.
* gcc.target/sh/pr52933-1.c: Adjust matching of expected insns.
(test_24, test_25, test_26, test_27, test_28, test_29, test_30): New.
* gcc.target/sh/pr51244-12.c: Adjust matching of expected insns.
(test05, test06, test07, test08, test09, test10, test11, test12): New.
* gcc.target/sh/pr54236-3.c: Adjust matching of expected insns.
(test_002, test_003, test_004, test_005, test_006, test_007, test_008,
test_009): New.
* gcc.target/sh/pr51244-4.c: Adjust matching of expected insns.
(test_02): New.
Added:
trunk/gcc/testsuite/gcc.target/sh/pr59533-1.c
trunk/gcc/testsuite/gcc.target/sh/pr64345-1.c
trunk/gcc/testsuite/gcc.target/sh/pr64345-2.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/predicates.md
trunk/gcc/config/sh/sh-protos.h
trunk/gcc/config/sh/sh.c
trunk/gcc/config/sh/sh.md
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.target/sh/pr49263-1.c
trunk/gcc/testsuite/gcc.target/sh/pr49263-2.c
trunk/gcc/testsuite/gcc.target/sh/pr49263-3.c
trunk/gcc/testsuite/gcc.target/sh/pr49263.c
trunk/gcc/testsuite/gcc.target/sh/pr51244-12.c
trunk/gcc/testsuite/gcc.target/sh/pr51244-20-sh2a.c
trunk/gcc/testsuite/gcc.target/sh/pr51244-4.c
trunk/gcc/testsuite/gcc.target/sh/pr52933-1.c
trunk/gcc/testsuite/gcc.target/sh/pr52933-2.c
trunk/gcc/testsuite/gcc.target/sh/pr53987-1.c
trunk/gcc/testsuite/gcc.target/sh/pr54089-1.c
trunk/gcc/testsuite/gcc.target/sh/pr54236-1.c
trunk/gcc/testsuite/gcc.target/sh/pr54236-3.c
^ permalink raw reply [flat|nested] 46+ messages in thread
* [Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
2011-06-01 20:17 [Bug target/49263] New: SH Target: underutilized "TST #imm, R0" instruction oleg.endo@t-online.de
` (23 preceding siblings ...)
2015-01-24 13:05 ` olegendo at gcc dot gnu.org
@ 2015-01-26 23:57 ` olegendo at gcc dot gnu.org
2023-05-12 11:46 ` klepikov.alex+bugs at gmail dot com
` (19 subsequent siblings)
44 siblings, 0 replies; 46+ messages in thread
From: olegendo at gcc dot gnu.org @ 2015-01-26 23:57 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
--- Comment #26 from Oleg Endo <olegendo at gcc dot gnu.org> ---
Author: olegendo
Date: Mon Jan 26 23:56:05 2015
New Revision: 220144
URL: https://gcc.gnu.org/viewcvs?rev=220144&root=gcc&view=rev
Log:
gcc/
PR target/49263
* config/sh/sh.c (sh_split_treg_set_expr): Invoke emit_insn before
remove_insn.
* config/sh/sh.md (tstsi_t): Don't try to optimize constant with right
shifts if it already fits into K08.
gcc/testsuite/
PR target/49263
* gcc.target/sh/pr49263-4.c: New.
Added:
trunk/gcc/testsuite/gcc.target/sh/pr49263-4.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/sh.c
trunk/gcc/config/sh/sh.md
trunk/gcc/testsuite/ChangeLog
^ permalink raw reply [flat|nested] 46+ messages in thread
* [Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
2011-06-01 20:17 [Bug target/49263] New: SH Target: underutilized "TST #imm, R0" instruction oleg.endo@t-online.de
` (24 preceding siblings ...)
2015-01-26 23:57 ` olegendo at gcc dot gnu.org
@ 2023-05-12 11:46 ` klepikov.alex+bugs at gmail dot com
2023-05-23 12:34 ` klepikov.alex+bugs at gmail dot com
` (18 subsequent siblings)
44 siblings, 0 replies; 46+ messages in thread
From: klepikov.alex+bugs at gmail dot com @ 2023-05-12 11:46 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
Alexander Klepikov <klepikov.alex+bugs at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |klepikov.alex+bugs at gmail dot co
| |m
--- Comment #31 from Alexander Klepikov <klepikov.alex+bugs at gmail dot com> ---
I've found new cases for SH2 and SH2E CPUs only:
#define FLAG 0x40
unsigned int f(char v){
return (v & FLAG) == FLAG;
}
For both big and little endian translates to dynamic shift call:
-O -m2 (or -m2e)
_f:
sts.l pr,@-r15
mov.l .L3,r1
jsr @r1
exts.b r4,r4
mov r4,r0
and #1,r0
lds.l @r15+,pr
rts
nop
.L3:
.long ___ashiftrt_r4_6
And
#define FLAG 0x40
#define ADDR 0xFFFF0000
#define P ((unsigned char *)ADDR)
unsigned int f(void){
return (*P & FLAG) == FLAG;
}
Translates to
_f:
sts.l pr,@-r15
mov.l .L3,r1
mov.b @r1,r4
mov.l .L4,r1
jsr @r1
nop
mov r4,r0
and #1,r0
lds.l @r15+,pr
rts
nop
.L3:
.long -65536
.L4:
.long ___ashiftrt_r4_6
Assembler output does not depend on ADDR value, but depends on variable (or
pointer) type. When type is integer, assembler code uses 'tst' for all options
'-m4', '-m2', '-m2e':
#define FLAG 0x40
unsigned int f(unsigned int v){
return (v & FLAG) == FLAG;
}
translates to
_f:
mov r4,r0
tst #64,r0
mov #-1,r0
rts
negc r0,r0
and
#define FLAG 0x40
#define ADDR 0xFFFF0000
#define P ((unsigned int *)ADDR)
unsigned int f(void){
return (*P & FLAG) == FLAG;
}
translates to
_f:
mov.l .L2,r1
mov.l @r1,r0
tst #64,r0
mov #-1,r0
rts
negc r0,r0
.L2:
.long -65536
Interesting that when '-m4' flag is specified, later GCC always translates to
code with 'tst' instruction. I played with godbolt and that's what I found. GCC
ver 4 uses dynamic shift 'shad' with '-m4' option and library call with '-m2'
or '-m2e' options. GCC 9.5 and later uses 'tst' with '-m4' and library call
with both '-m2' and '-m2e' when FLAG==0x40 and 'shll' instruction with both
'-m2' and '-m2e' when FLAG==0x80. I remind you that this is happening when char
type used only.
Maybe SH4 solution can be extended to support SH2/SH2E?
^ permalink raw reply [flat|nested] 46+ messages in thread
* [Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
2011-06-01 20:17 [Bug target/49263] New: SH Target: underutilized "TST #imm, R0" instruction oleg.endo@t-online.de
` (25 preceding siblings ...)
2023-05-12 11:46 ` klepikov.alex+bugs at gmail dot com
@ 2023-05-23 12:34 ` klepikov.alex+bugs at gmail dot com
2023-05-23 12:35 ` klepikov.alex+bugs at gmail dot com
` (17 subsequent siblings)
44 siblings, 0 replies; 46+ messages in thread
From: klepikov.alex+bugs at gmail dot com @ 2023-05-23 12:34 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
--- Comment #32 from Alexander Klepikov <klepikov.alex+bugs at gmail dot com> ---
I'm not sure whether I should write here or open new discussion, but these
topics are related very closely. I've been writing a patch to eliminate the
generation of dynamic shift instructions 'shad' and 'shld' completely at least
for SH4 CPU. And then I get a surprising result - in all the examples I gave
earlier, library call converted to 'tst' instructions!
Here is the patch itself (I also will attach a file):
--- ../gcc-12.3.0.orig/gcc/config/sh/sh.cc 2023-05-08 15:14:39.681161695
+0300
+++ ./gcc/config/sh/sh.cc 2023-05-23 12:23:25.964375731 +0300
@@ -3061,7 +3061,7 @@
else
insn_count = ashl_lshr_seq[shift_amount_i].insn_count;
- return TARGET_DYNSHIFT && (insn_count > 1 + SH_DYNAMIC_SHIFT_COST);
+ return TARGET_DYNSHIFT && (insn_count > 1 + SH_DYNAMIC_SHIFT_COST) && !
disable_dynshift;
}
/* Assuming we have a value that has been sign-extended by at least one bit,
@@ -3812,8 +3812,10 @@
rtx wrk;
char func[18];
int value;
+ int long_shift = disable_dynshift ? 30 : 19;
+ int short_shift = disable_dynshift ? 15 : 5;
- if (TARGET_DYNSHIFT)
+ if (TARGET_DYNSHIFT && ! disable_dynshift)
{
if (!CONST_INT_P (operands[2]))
{
@@ -3851,7 +3853,7 @@
emit_insn (gen_ashrsi2_31 (operands[0], operands[1]));
return true;
}
- else if (value >= 16 && value <= 19)
+ else if (value >= 16 && value <= long_shift)
{
wrk = gen_reg_rtx (SImode);
emit_insn (gen_ashrsi2_16 (wrk, operands[1]));
@@ -3862,7 +3864,7 @@
return true;
}
/* Expand a short sequence inline, longer call a magic routine. */
- else if (value <= 5)
+ else if (value <= short_shift)
{
wrk = gen_reg_rtx (SImode);
emit_move_insn (wrk, operands[1]);
diff -ur ../gcc-12.3.0.orig/gcc/config/sh/sh.opt ./gcc/config/sh/sh.opt
--- ../gcc-12.3.0.orig/gcc/config/sh/sh.opt 2023-05-08 15:14:39.689161810
+0300
+++ ./gcc/config/sh/sh.opt 2023-05-23 10:45:36.814371159 +0300
@@ -301,3 +301,7 @@
mlra
Target Var(sh_lra_flag) Init(0) Save
Use LRA instead of reload (transitional).
+
+mdisable-dynshift
+Target Var(disable_dynshift) Init(0)
+Disable dynamic shift 'shad' and 'shld' instructions
And here are my tests:
$ cat f.c
#define ADDR 0xFFFF0000
#define P ((unsigned char *)ADDR)
#define FLAG 0x40
#define S 7
unsigned char f(char v){
return (v & FLAG) == FLAG;
}
unsigned char f_(unsigned char v){
return (v & FLAG) == FLAG;
}
unsigned char f1(void){
return (*P & FLAG) == FLAG;
}
int f_signed_rshift(int v){
return v >> S;
}
int f_signed_lshift(int v){
return v << S;
}
unsigned int f_unsigned_rshift(unsigned int v){
return v >> S;
}
unsigned int f_unsigned_lshift(unsigned int v){
return v << S;
}
$ /usr/local/sh-toolchain/bin/sh-elf-gcc -c -mrenesas -m2e -mb -O
-fno-toplevel-reorder -mdisable-dynshift -S f.c
$ cat f.s
.file "f.c"
.text
.text
.align 1
.global _f
.type _f, @function
_f:
mov r4,r0
tst #64,r0
mov #-1,r0
rts
negc r0,r0
.size _f, .-_f
.align 1
.global _f_
.type _f_, @function
_f_:
mov r4,r0
tst #64,r0
mov #-1,r0
rts
negc r0,r0
.size _f_, .-_f_
.align 1
.global _f1
.type _f1, @function
_f1:
mov.l .L4,r1
mov.b @r1,r0
tst #64,r0
mov #-1,r0
rts
negc r0,r0
.L5:
.align 2
.L4:
.long -65536
.size _f1, .-_f1
.align 1
.global _f_signed_rshift
.type _f_signed_rshift, @function
_f_signed_rshift:
mov r4,r0
shar r0
shar r0
shar r0
shar r0
shar r0
shar r0
rts
shar r0
.size _f_signed_rshift, .-_f_signed_rshift
.align 1
.global _f_signed_lshift
.type _f_signed_lshift, @function
_f_signed_lshift:
mov r4,r0
shll2 r0
shll2 r0
add r0,r0
rts
shll2 r0
.size _f_signed_lshift, .-_f_signed_lshift
.align 1
.global _f_unsigned_rshift
.type _f_unsigned_rshift, @function
_f_unsigned_rshift:
mov r4,r0
shlr2 r0
shlr2 r0
shlr r0
rts
shlr2 r0
.size _f_unsigned_rshift, .-_f_unsigned_rshift
.align 1
.global _f_unsigned_lshift
.type _f_unsigned_lshift, @function
_f_unsigned_lshift:
mov r4,r0
shll2 r0
shll2 r0
add r0,r0
rts
shll2 r0
.size _f_unsigned_lshift, .-_f_unsigned_lshift
.ident "GCC: (GNU) 12.3.0"
I also compiled my project with '-m2e' and new '-mdisable-dynshift' options and
tested it in SH-2E mone on Renesas's emulator that comes with High-performance
Embedded Workshop and all unit tests run as expected.
If this patch is useful let's include it in GCC.
^ permalink raw reply [flat|nested] 46+ messages in thread
* [Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
2011-06-01 20:17 [Bug target/49263] New: SH Target: underutilized "TST #imm, R0" instruction oleg.endo@t-online.de
` (26 preceding siblings ...)
2023-05-23 12:34 ` klepikov.alex+bugs at gmail dot com
@ 2023-05-23 12:35 ` klepikov.alex+bugs at gmail dot com
2023-05-23 19:05 ` olegendo at gcc dot gnu.org
` (16 subsequent siblings)
44 siblings, 0 replies; 46+ messages in thread
From: klepikov.alex+bugs at gmail dot com @ 2023-05-23 12:35 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
--- Comment #33 from Alexander Klepikov <klepikov.alex+bugs at gmail dot com> ---
Created attachment 55142
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55142&action=edit
Disable dynamic shift instructions patch
^ permalink raw reply [flat|nested] 46+ messages in thread
* [Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
2011-06-01 20:17 [Bug target/49263] New: SH Target: underutilized "TST #imm, R0" instruction oleg.endo@t-online.de
` (27 preceding siblings ...)
2023-05-23 12:35 ` klepikov.alex+bugs at gmail dot com
@ 2023-05-23 19:05 ` olegendo at gcc dot gnu.org
2023-05-24 11:40 ` klepikov.alex+bugs at gmail dot com
` (15 subsequent siblings)
44 siblings, 0 replies; 46+ messages in thread
From: olegendo at gcc dot gnu.org @ 2023-05-23 19:05 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
--- Comment #34 from Oleg Endo <olegendo at gcc dot gnu.org> ---
(In reply to Alexander Klepikov from comment #33)
> Created attachment 55142 [details]
> Disable dynamic shift instructions patch
First of all, thanks for digging into this. This issue has been a can of
worms, due to all sorts of reasons.
As you have discovered, some code patterns take the shift instruction route,
which is basically decided earlier by the various middle-end optimizations.
There have also been some changes to those parts recently, but I haven't been
watching what it does for SH.
> unsigned int f(char v){
> return (v & FLAG) == FLAG;
> }
Bit-tests of char and unsigned char should be covered by the test-suite and
should work -- at least originally. However, what might be triggering this
problem is the '== FLAG' comparison. When I was working on this issue I only
used '== 0' or '!= 0' comparison. I can imagine that your test code triggers
some other middle end optimizations and hence we get this.
Can you try to rewrite your test code to something like this?
unsigned int f(char v){
return (v & FLAG) != 0;
}
... and see if it generates the tst instruction as expected?
> I also compiled my project with '-m2e' and new '-mdisable-dynshift'
> options and tested it in SH-2E mone on Renesas's emulator that comes
> with High-performance Embedded Workshop and all unit tests run as expected.
I'm not sure what the purpose of the '-mdisable-dynshift' option would be here
though. For '-m2e' TARGET_DYNSHIFT is already 'false'. So the option seems
misnamed.
^ permalink raw reply [flat|nested] 46+ messages in thread
* [Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
2011-06-01 20:17 [Bug target/49263] New: SH Target: underutilized "TST #imm, R0" instruction oleg.endo@t-online.de
` (28 preceding siblings ...)
2023-05-23 19:05 ` olegendo at gcc dot gnu.org
@ 2023-05-24 11:40 ` klepikov.alex+bugs at gmail dot com
2023-05-24 11:57 ` olegendo at gcc dot gnu.org
` (14 subsequent siblings)
44 siblings, 0 replies; 46+ messages in thread
From: klepikov.alex+bugs at gmail dot com @ 2023-05-24 11:40 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
--- Comment #35 from Alexander Klepikov <klepikov.alex+bugs at gmail dot com> ---
(In reply to Oleg Endo from comment #34)
> Bit-tests of char and unsigned char should be covered by the test-suite and
> should work -- at least originally. However, what might be triggering this
> problem is the '== FLAG' comparison. When I was working on this issue I
> only used '== 0' or '!= 0' comparison. I can imagine that your test code
> triggers some other middle end optimizations and hence we get this.
Yes, I am sure that the problem is the '== FLAG' comparison. Before I reported
that bug, I tried to bypass it and this macro does not produce shift
instructions even on GCC 4.7:
#define BIT_MASK_IS_SET_(VALUE, BITMASK)\
({int _value = VALUE & BITMASK,\
_result;\
if (_value == BITMASK){\
_result = 1;\
}\
else {\
_result = 0;\
}\
_result;})
So this is definitely the comparison.
>
> Can you try to rewrite your test code to something like this?
>
> unsigned int f(char v){
> return (v & FLAG) != 0;
> }
>
> ... and see if it generates the tst instruction as expected?
>
As I understand, you meant the following (I added new functions at the end of
file):
$ cat f.c
#define ADDR 0xFFFF0000
#define P ((unsigned char *)ADDR)
#define FLAG 0x40
#define S 7
unsigned char f_char_var(char v){
return (v & FLAG) == FLAG;
}
unsigned char f_unsigned_char_var(unsigned char v){
return (v & FLAG) == FLAG;
}
unsigned char f_symbol(void){
return (*P & FLAG) == FLAG;
}
unsigned char f_symbol_zero(void){
return (*P & FLAG) == 0;
}
unsigned char f_symbol_non_zero(void){
return (*P & FLAG) != 0;
}
Compiler flags: -c -mrenesas -m2e -mb -O -fno-toplevel-reorder
With patch disabled:
$ cat f_clean.s
.file "f.c"
.text
.text
.align 1
.global _f_char_var
.type _f_char_var, @function
_f_char_var:
sts.l pr,@-r15
mov.l .L3,r1
jsr @r1
exts.b r4,r4
mov r4,r0
and #1,r0
lds.l @r15+,pr
rts
nop
.L4:
.align 2
.L3:
.long ___ashiftrt_r4_6
.size _f_char_var, .-_f_char_var
.align 1
.global _f_unsigned_char_var
.type _f_unsigned_char_var, @function
_f_unsigned_char_var:
sts.l pr,@-r15
mov.l .L7,r1
jsr @r1
exts.b r4,r4
mov r4,r0
and #1,r0
lds.l @r15+,pr
rts
nop
.L8:
.align 2
.L7:
.long ___ashiftrt_r4_6
.size _f_unsigned_char_var, .-_f_unsigned_char_var
.align 1
.global _f_symbol
.type _f_symbol, @function
_f_symbol:
sts.l pr,@-r15
mov.l .L11,r1
mov.b @r1,r4
mov.l .L12,r1
jsr @r1
nop
mov r4,r0
and #1,r0
lds.l @r15+,pr
rts
nop
.L13:
.align 2
.L11:
.long -65536
.L12:
.long ___ashiftrt_r4_6
.size _f_symbol, .-_f_symbol
.align 1
.global _f_symbol_zero
.type _f_symbol_zero, @function
_f_symbol_zero:
mov.l .L15,r1
mov.b @r1,r0
tst #64,r0
rts
movt r0
.L16:
.align 2
.L15:
.long -65536
.size _f_symbol_zero, .-_f_symbol_zero
.align 1
.global _f_symbol_non_zero
.type _f_symbol_non_zero, @function
_f_symbol_non_zero:
sts.l pr,@-r15
mov.l .L19,r1
mov.b @r1,r4
mov.l .L20,r1
jsr @r1
nop
mov r4,r0
and #1,r0
lds.l @r15+,pr
rts
nop
.L21:
.align 2
.L19:
.long -65536
.L20:
.long ___ashiftrt_r4_6
.size _f_symbol_non_zero, .-_f_symbol_non_zero
.ident "GCC: (GNU) 12.3.0"
With patch enabled:
$ cat f.s
.file "f.c"
.text
.text
.align 1
.global _f_char_var
.type _f_char_var, @function
_f_char_var:
mov r4,r0
tst #64,r0
mov #-1,r0
rts
negc r0,r0
.size _f_char_var, .-_f_char_var
.align 1
.global _f_unsigned_char_var
.type _f_unsigned_char_var, @function
_f_unsigned_char_var:
mov r4,r0
tst #64,r0
mov #-1,r0
rts
negc r0,r0
.size _f_unsigned_char_var, .-_f_unsigned_char_var
.align 1
.global _f_symbol
.type _f_symbol, @function
_f_symbol:
mov.l .L4,r1
mov.b @r1,r0
tst #64,r0
mov #-1,r0
rts
negc r0,r0
.L5:
.align 2
.L4:
.long -65536
.size _f_symbol, .-_f_symbol
.align 1
.global _f_symbol_zero
.type _f_symbol_zero, @function
_f_symbol_zero:
mov.l .L7,r1
mov.b @r1,r0
tst #64,r0
rts
movt r0
.L8:
.align 2
.L7:
.long -65536
.size _f_symbol_zero, .-_f_symbol_zero
.align 1
.global _f_symbol_non_zero
.type _f_symbol_non_zero, @function
_f_symbol_non_zero:
mov.l .L10,r1
mov.b @r1,r0
tst #64,r0
mov #-1,r0
rts
negc r0,r0
.L11:
.align 2
.L10:
.long -65536
.size _f_symbol_non_zero, .-_f_symbol_non_zero
.ident "GCC: (GNU) 12.3.0"
> I'm not sure what the purpose of the '-mdisable-dynshift' option would be
> here though. For '-m2e' TARGET_DYNSHIFT is already 'false'. So the option
> seems misnamed.
I choose that name because I wanted to disable dynamic shift instructions for
all CPUs. I did not hope that it will affect SH-2E code in such way.
I can rewrite the patch so that it only affects CPUs that do not support
dynamic shifts and disables library call for dynamic shifts. I'll do it anyway
because I need it badly. How do you think, what name of option would be better:
'-mdisable-dynshift-libcall' or '-mhw-shift'? Or if you want, please suggest
another one. Thank you!
^ permalink raw reply [flat|nested] 46+ messages in thread
* [Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
2011-06-01 20:17 [Bug target/49263] New: SH Target: underutilized "TST #imm, R0" instruction oleg.endo@t-online.de
` (29 preceding siblings ...)
2023-05-24 11:40 ` klepikov.alex+bugs at gmail dot com
@ 2023-05-24 11:57 ` olegendo at gcc dot gnu.org
2023-05-24 13:34 ` klepikov.alex+bugs at gmail dot com
` (13 subsequent siblings)
44 siblings, 0 replies; 46+ messages in thread
From: olegendo at gcc dot gnu.org @ 2023-05-24 11:57 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
--- Comment #36 from Oleg Endo <olegendo at gcc dot gnu.org> ---
(In reply to Alexander Klepikov from comment #35)
>
> As I understand, you meant the following (I added new functions at the end
> of file):
>
> $ cat f.c
> #define ADDR 0xFFFF0000
> #define P ((unsigned char *)ADDR)
> #define FLAG 0x40
> #define S 7
> ....
Yes, that's what I meant, thanks.
Can you also compile for little endian, and most of all, use -O2 optimization
level. Some optimizations are not done below -O2.
>
> I choose that name because I wanted to disable dynamic shift instructions
> for all CPUs. I did not hope that it will affect SH-2E code in such way.
>
> I can rewrite the patch so that it only affects CPUs that do not support
> dynamic shifts and disables library call for dynamic shifts. I'll do it
> anyway because I need it badly. How do you think, what name of option would
> be better: '-mdisable-dynshift-libcall' or '-mhw-shift'? Or if you want,
> please suggest another one. Thank you!
'-mdisable-dynshift-libcall' would be more appropriate for what it tries to do,
I think. Although that is a whole different issue ... but what is it going to
do for real dynamic shifts on SH2?
What kind of code is it supposed to emit for things like
unsigned int dyn_shift (unsigned int x, unsigned int y)
{
return x << (y & 31);
}
^ permalink raw reply [flat|nested] 46+ messages in thread
* [Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
2011-06-01 20:17 [Bug target/49263] New: SH Target: underutilized "TST #imm, R0" instruction oleg.endo@t-online.de
` (30 preceding siblings ...)
2023-05-24 11:57 ` olegendo at gcc dot gnu.org
@ 2023-05-24 13:34 ` klepikov.alex+bugs at gmail dot com
2023-05-24 15:00 ` olegendo at gcc dot gnu.org
` (12 subsequent siblings)
44 siblings, 0 replies; 46+ messages in thread
From: klepikov.alex+bugs at gmail dot com @ 2023-05-24 13:34 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
--- Comment #37 from Alexander Klepikov <klepikov.alex+bugs at gmail dot com> ---
> Can you also compile for little endian, and most of all, use -O2
> optimization level. Some optimizations are not done below -O2.
Here's source file, I added functions with non-constant shifts
$ cat f.c
#define ADDR 0xFFFF0000
#define P ((unsigned char *)ADDR)
#define FLAG 0x40
#define S 7
unsigned char f_char_var(char v){
return (v & FLAG) == FLAG;
}
unsigned char f_unsigned_char_var(unsigned char v){
return (v & FLAG) == FLAG;
}
unsigned char f_symbol(void){
return (*P & FLAG) == FLAG;
}
unsigned char f_symbol_zero(void){
return (*P & FLAG) == 0;
}
unsigned char f_symbol_non_zero(void){
return (*P & FLAG) != 0;
}
unsigned int dyn_lshift (unsigned int x, unsigned int y)
{
return x << (y & 31);
}
unsigned int dyn_rshift (unsigned int x, unsigned int y)
{
return x >> (y & 31);
}
unsigned int really_dyn_lshift (unsigned int x, unsigned int y)
{
return x << y;
}
unsigned int really_dyn_rshift (unsigned int x, unsigned int y)
{
return x >> y;
}
With patch disabled, -O2 -mb:
$ cat f.s
.file "f.c"
.text
.text
.align 1
.align 2
.global _f_char_var
.type _f_char_var, @function
_f_char_var:
mov.l .L4,r1
sts.l pr,@-r15
jsr @r1
exts.b r4,r4
mov r4,r0
and #1,r0
lds.l @r15+,pr
rts
nop
.L5:
.align 2
.L4:
.long ___ashiftrt_r4_6
.size _f_char_var, .-_f_char_var
.align 1
.align 2
.global _f_unsigned_char_var
.type _f_unsigned_char_var, @function
_f_unsigned_char_var:
mov.l .L8,r1
sts.l pr,@-r15
jsr @r1
exts.b r4,r4
mov r4,r0
and #1,r0
lds.l @r15+,pr
rts
nop
.L9:
.align 2
.L8:
.long ___ashiftrt_r4_6
.size _f_unsigned_char_var, .-_f_unsigned_char_var
.align 1
.align 2
.global _f_symbol
.type _f_symbol, @function
_f_symbol:
mov.l .L12,r1
sts.l pr,@-r15
mov.b @r1,r4
mov.l .L13,r1
jsr @r1
nop
mov r4,r0
and #1,r0
lds.l @r15+,pr
rts
nop
.L14:
.align 2
.L12:
.long -65536
.L13:
.long ___ashiftrt_r4_6
.size _f_symbol, .-_f_symbol
.align 1
.align 2
.global _f_symbol_zero
.type _f_symbol_zero, @function
_f_symbol_zero:
mov.l .L16,r1
mov.b @r1,r0
tst #64,r0
rts
movt r0
.L17:
.align 2
.L16:
.long -65536
.size _f_symbol_zero, .-_f_symbol_zero
.align 1
.align 2
.global _f_symbol_non_zero
.type _f_symbol_non_zero, @function
_f_symbol_non_zero:
mov.l .L20,r1
sts.l pr,@-r15
mov.b @r1,r4
mov.l .L21,r1
jsr @r1
nop
mov r4,r0
and #1,r0
lds.l @r15+,pr
rts
nop
.L22:
.align 2
.L20:
.long -65536
.L21:
.long ___ashiftrt_r4_6
.size _f_symbol_non_zero, .-_f_symbol_non_zero
.align 1
.align 2
.global _dyn_lshift
.type _dyn_lshift, @function
_dyn_lshift:
mov.l .L25,r1
sts.l pr,@-r15
jsr @r1
mov r5,r0
lds.l @r15+,pr
rts
nop
.L26:
.align 2
.L25:
.long ___ashlsi3_r0
.size _dyn_lshift, .-_dyn_lshift
.align 1
.align 2
.global _dyn_rshift
.type _dyn_rshift, @function
_dyn_rshift:
mov.l .L29,r1
sts.l pr,@-r15
jsr @r1
mov r5,r0
lds.l @r15+,pr
rts
nop
.L30:
.align 2
.L29:
.long ___lshrsi3_r0
.size _dyn_rshift, .-_dyn_rshift
.align 1
.align 2
.global _really_dyn_lshift
.type _really_dyn_lshift, @function
_really_dyn_lshift:
mov.l .L33,r1
sts.l pr,@-r15
jsr @r1
mov r5,r0
lds.l @r15+,pr
rts
nop
.L34:
.align 2
.L33:
.long ___ashlsi3_r0
.size _really_dyn_lshift, .-_really_dyn_lshift
.align 1
.align 2
.global _really_dyn_rshift
.type _really_dyn_rshift, @function
_really_dyn_rshift:
mov.l .L37,r1
sts.l pr,@-r15
jsr @r1
mov r5,r0
lds.l @r15+,pr
rts
nop
.L38:
.align 2
.L37:
.long ___lshrsi3_r0
.size _really_dyn_rshift, .-_really_dyn_rshift
.ident "GCC: (GNU) 12.3.0"
With patch disabled, -O2 -ml
$ cat f.s
.file "f.c"
.text
.little
.text
.align 1
.align 2
.global _f_char_var
.type _f_char_var, @function
_f_char_var:
mov.l .L4,r1
sts.l pr,@-r15
jsr @r1
exts.b r4,r4
mov r4,r0
and #1,r0
lds.l @r15+,pr
rts
nop
.L5:
.align 2
.L4:
.long ___ashiftrt_r4_6
.size _f_char_var, .-_f_char_var
.align 1
.align 2
.global _f_unsigned_char_var
.type _f_unsigned_char_var, @function
_f_unsigned_char_var:
mov.l .L8,r1
sts.l pr,@-r15
jsr @r1
exts.b r4,r4
mov r4,r0
and #1,r0
lds.l @r15+,pr
rts
nop
.L9:
.align 2
.L8:
.long ___ashiftrt_r4_6
.size _f_unsigned_char_var, .-_f_unsigned_char_var
.align 1
.align 2
.global _f_symbol
.type _f_symbol, @function
_f_symbol:
mov.l .L12,r1
sts.l pr,@-r15
mov.b @r1,r4
mov.l .L13,r1
jsr @r1
nop
mov r4,r0
and #1,r0
lds.l @r15+,pr
rts
nop
.L14:
.align 2
.L12:
.long -65536
.L13:
.long ___ashiftrt_r4_6
.size _f_symbol, .-_f_symbol
.align 1
.align 2
.global _f_symbol_zero
.type _f_symbol_zero, @function
_f_symbol_zero:
mov.l .L16,r1
mov.b @r1,r0
tst #64,r0
rts
movt r0
.L17:
.align 2
.L16:
.long -65536
.size _f_symbol_zero, .-_f_symbol_zero
.align 1
.align 2
.global _f_symbol_non_zero
.type _f_symbol_non_zero, @function
_f_symbol_non_zero:
mov.l .L20,r1
sts.l pr,@-r15
mov.b @r1,r4
mov.l .L21,r1
jsr @r1
nop
mov r4,r0
and #1,r0
lds.l @r15+,pr
rts
nop
.L22:
.align 2
.L20:
.long -65536
.L21:
.long ___ashiftrt_r4_6
.size _f_symbol_non_zero, .-_f_symbol_non_zero
.align 1
.align 2
.global _dyn_lshift
.type _dyn_lshift, @function
_dyn_lshift:
mov.l .L25,r1
sts.l pr,@-r15
jsr @r1
mov r5,r0
lds.l @r15+,pr
rts
nop
.L26:
.align 2
.L25:
.long ___ashlsi3_r0
.size _dyn_lshift, .-_dyn_lshift
.align 1
.align 2
.global _dyn_rshift
.type _dyn_rshift, @function
_dyn_rshift:
mov.l .L29,r1
sts.l pr,@-r15
jsr @r1
mov r5,r0
lds.l @r15+,pr
rts
nop
.L30:
.align 2
.L29:
.long ___lshrsi3_r0
.size _dyn_rshift, .-_dyn_rshift
.align 1
.align 2
.global _really_dyn_lshift
.type _really_dyn_lshift, @function
_really_dyn_lshift:
mov.l .L33,r1
sts.l pr,@-r15
jsr @r1
mov r5,r0
lds.l @r15+,pr
rts
nop
.L34:
.align 2
.L33:
.long ___ashlsi3_r0
.size _really_dyn_lshift, .-_really_dyn_lshift
.align 1
.align 2
.global _really_dyn_rshift
.type _really_dyn_rshift, @function
_really_dyn_rshift:
mov.l .L37,r1
sts.l pr,@-r15
jsr @r1
mov r5,r0
lds.l @r15+,pr
rts
nop
.L38:
.align 2
.L37:
.long ___lshrsi3_r0
.size _really_dyn_rshift, .-_really_dyn_rshift
.ident "GCC: (GNU) 12.3.0"
With patch enabled -O2 -mb
$ cat f.s
.file "f.c"
.text
.text
.align 1
.align 2
.global _f_char_var
.type _f_char_var, @function
_f_char_var:
mov r4,r0
tst #64,r0
mov #-1,r0
rts
negc r0,r0
.size _f_char_var, .-_f_char_var
.align 1
.align 2
.global _f_unsigned_char_var
.type _f_unsigned_char_var, @function
_f_unsigned_char_var:
mov r4,r0
tst #64,r0
mov #-1,r0
rts
negc r0,r0
.size _f_unsigned_char_var, .-_f_unsigned_char_var
.align 1
.align 2
.global _f_symbol
.type _f_symbol, @function
_f_symbol:
mov.l .L5,r1
mov.b @r1,r0
tst #64,r0
mov #-1,r0
rts
negc r0,r0
.L6:
.align 2
.L5:
.long -65536
.size _f_symbol, .-_f_symbol
.align 1
.align 2
.global _f_symbol_zero
.type _f_symbol_zero, @function
_f_symbol_zero:
mov.l .L8,r1
mov.b @r1,r0
tst #64,r0
rts
movt r0
.L9:
.align 2
.L8:
.long -65536
.size _f_symbol_zero, .-_f_symbol_zero
.align 1
.align 2
.global _f_symbol_non_zero
.type _f_symbol_non_zero, @function
_f_symbol_non_zero:
mov.l .L11,r1
mov.b @r1,r0
tst #64,r0
mov #-1,r0
rts
negc r0,r0
.L12:
.align 2
.L11:
.long -65536
.size _f_symbol_non_zero, .-_f_symbol_non_zero
.align 1
.align 2
.global _dyn_lshift
.type _dyn_lshift, @function
_dyn_lshift:
mov.l .L15,r1
sts.l pr,@-r15
jsr @r1
mov r5,r0
lds.l @r15+,pr
rts
nop
.L16:
.align 2
.L15:
.long ___ashlsi3_r0
.size _dyn_lshift, .-_dyn_lshift
.align 1
.align 2
.global _dyn_rshift
.type _dyn_rshift, @function
_dyn_rshift:
mov.l .L19,r1
sts.l pr,@-r15
jsr @r1
mov r5,r0
lds.l @r15+,pr
rts
nop
.L20:
.align 2
.L19:
.long ___lshrsi3_r0
.size _dyn_rshift, .-_dyn_rshift
.align 1
.align 2
.global _really_dyn_lshift
.type _really_dyn_lshift, @function
_really_dyn_lshift:
mov.l .L23,r1
sts.l pr,@-r15
jsr @r1
mov r5,r0
lds.l @r15+,pr
rts
nop
.L24:
.align 2
.L23:
.long ___ashlsi3_r0
.size _really_dyn_lshift, .-_really_dyn_lshift
.align 1
.align 2
.global _really_dyn_rshift
.type _really_dyn_rshift, @function
_really_dyn_rshift:
mov.l .L27,r1
sts.l pr,@-r15
jsr @r1
mov r5,r0
lds.l @r15+,pr
rts
nop
.L28:
.align 2
.L27:
.long ___lshrsi3_r0
.size _really_dyn_rshift, .-_really_dyn_rshift
.ident "GCC: (GNU) 12.3.0"
With patch enabled, -O2 -ml
$ cat f.s
.file "f.c"
.text
.little
.text
.align 1
.align 2
.global _f_char_var
.type _f_char_var, @function
_f_char_var:
mov r4,r0
tst #64,r0
mov #-1,r0
rts
negc r0,r0
.size _f_char_var, .-_f_char_var
.align 1
.align 2
.global _f_unsigned_char_var
.type _f_unsigned_char_var, @function
_f_unsigned_char_var:
mov r4,r0
tst #64,r0
mov #-1,r0
rts
negc r0,r0
.size _f_unsigned_char_var, .-_f_unsigned_char_var
.align 1
.align 2
.global _f_symbol
.type _f_symbol, @function
_f_symbol:
mov.l .L5,r1
mov.b @r1,r0
tst #64,r0
mov #-1,r0
rts
negc r0,r0
.L6:
.align 2
.L5:
.long -65536
.size _f_symbol, .-_f_symbol
.align 1
.align 2
.global _f_symbol_zero
.type _f_symbol_zero, @function
_f_symbol_zero:
mov.l .L8,r1
mov.b @r1,r0
tst #64,r0
rts
movt r0
.L9:
.align 2
.L8:
.long -65536
.size _f_symbol_zero, .-_f_symbol_zero
.align 1
.align 2
.global _f_symbol_non_zero
.type _f_symbol_non_zero, @function
_f_symbol_non_zero:
mov.l .L11,r1
mov.b @r1,r0
tst #64,r0
mov #-1,r0
rts
negc r0,r0
.L12:
.align 2
.L11:
.long -65536
.size _f_symbol_non_zero, .-_f_symbol_non_zero
.align 1
.align 2
.global _dyn_lshift
.type _dyn_lshift, @function
_dyn_lshift:
mov.l .L15,r1
sts.l pr,@-r15
jsr @r1
mov r5,r0
lds.l @r15+,pr
rts
nop
.L16:
.align 2
.L15:
.long ___ashlsi3_r0
.size _dyn_lshift, .-_dyn_lshift
.align 1
.align 2
.global _dyn_rshift
.type _dyn_rshift, @function
_dyn_rshift:
mov.l .L19,r1
sts.l pr,@-r15
jsr @r1
mov r5,r0
lds.l @r15+,pr
rts
nop
.L20:
.align 2
.L19:
.long ___lshrsi3_r0
.size _dyn_rshift, .-_dyn_rshift
.align 1
.align 2
.global _really_dyn_lshift
.type _really_dyn_lshift, @function
_really_dyn_lshift:
mov.l .L23,r1
sts.l pr,@-r15
jsr @r1
mov r5,r0
lds.l @r15+,pr
rts
nop
.L24:
.align 2
.L23:
.long ___ashlsi3_r0
.size _really_dyn_lshift, .-_really_dyn_lshift
.align 1
.align 2
.global _really_dyn_rshift
.type _really_dyn_rshift, @function
_really_dyn_rshift:
mov.l .L27,r1
sts.l pr,@-r15
jsr @r1
mov r5,r0
lds.l @r15+,pr
rts
nop
.L28:
.align 2
.L27:
.long ___lshrsi3_r0
.size _really_dyn_rshift, .-_really_dyn_rshift
.ident "GCC: (GNU) 12.3.0"
> '-mdisable-dynshift-libcall' would be more appropriate for what it tries to
> do, I think. Although that is a whole different issue ... but what is it
> going to do for real dynamic shifts on SH2?
>
> What kind of code is it supposed to emit for things like
>
> unsigned int dyn_shift (unsigned int x, unsigned int y)
> {
> return x << (y & 31);
> }
As far as I understand from GCC sources, function I patched 'expand_ashiftrt'
process only constant values of shift. As you can see earlier, I added your and
other examples to tests. It looks like really dynamic shifts translate to
library calls.
Should I test more exotic situations? If so, could you please help me with
really exotic or weired examples?
^ permalink raw reply [flat|nested] 46+ messages in thread
* [Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
2011-06-01 20:17 [Bug target/49263] New: SH Target: underutilized "TST #imm, R0" instruction oleg.endo@t-online.de
` (31 preceding siblings ...)
2023-05-24 13:34 ` klepikov.alex+bugs at gmail dot com
@ 2023-05-24 15:00 ` olegendo at gcc dot gnu.org
2023-05-25 17:53 ` klepikov.alex+bugs at gmail dot com
` (11 subsequent siblings)
44 siblings, 0 replies; 46+ messages in thread
From: olegendo at gcc dot gnu.org @ 2023-05-24 15:00 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
--- Comment #38 from Oleg Endo <olegendo at gcc dot gnu.org> ---
(In reply to Alexander Klepikov from comment #37)
>
> As far as I understand from GCC sources, function I patched
> 'expand_ashiftrt' process only constant values of shift. As you can see
> earlier, I added your and other examples to tests.
OK, thanks for the additional test cases. It really looks like the way the
constant shift is expanded (via ashrsi3_n insn) on SH1/SH2 is getting in the
way.
The tst insn is mainly formed by the combine pass, which relies on certain insn
patterns and combinations thereof. See also sh.md, around line 530.
You can look at the debug output with the -fdump-rtl-all option to see what's
happening in the RTL passes.
What your patch is doing is to make it not emit the ashrsi3_n insn for constant
shifts altogether? I guess it will make code that actually needs those real
shifts larger, as it will always emit the whole shift stitching sequence. That
might be a good thing or not.
> It looks like really
> dynamic shifts translate to library calls.
So the option name '-mdisable-dynshift-libcall' doesn't make sense.
What it actually does is more like '-mdisable-constshift-libcall'.
>
> Should I test more exotic situations? If so, could you please help me with
> really exotic or weired examples?
Have you had a look at the existing test cases for this in
gcc/testsuite/gcc.target/sh ?
^ permalink raw reply [flat|nested] 46+ messages in thread
* [Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
2011-06-01 20:17 [Bug target/49263] New: SH Target: underutilized "TST #imm, R0" instruction oleg.endo@t-online.de
` (32 preceding siblings ...)
2023-05-24 15:00 ` olegendo at gcc dot gnu.org
@ 2023-05-25 17:53 ` klepikov.alex+bugs at gmail dot com
2023-05-25 21:32 ` olegendo at gcc dot gnu.org
` (10 subsequent siblings)
44 siblings, 0 replies; 46+ messages in thread
From: klepikov.alex+bugs at gmail dot com @ 2023-05-25 17:53 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
--- Comment #39 from Alexander Klepikov <klepikov.alex+bugs at gmail dot com> ---
> The tst insn is mainly formed by the combine pass, which relies on certain
> insn patterns and combinations thereof. See also sh.md, around line 530.
I'm sorry, but .md lang is too complicated for me.
> You can look at the debug output with the -fdump-rtl-all option to see
> what's happening in the RTL passes.
It looks like with optimization enabled it converts bitwise AND to right shift
and then optimizes again. But SH4 has 'shad' and 'shad' can be optimized to
'tst'. And SH2E has libcall instead of dynamic shift and libcll cannot be
converted. It seems that very first optimization spoils things.
But when we have numerous 'shar' instructions, optimization joins the game
again and converts them to 'tst'.
> What your patch is doing is to make it not emit the ashrsi3_n insn for
> constant shifts altogether? I guess it will make code that actually needs
> those real shifts larger, as it will always emit the whole shift stitching
> sequence. That might be a good thing or not.
You are absolutely right, the code will be larger when we do right shifts. But
there's situations when you can't use library. For exmple, SH7055 engine
control unit can barely contain the program. The library just won't fit.
^ permalink raw reply [flat|nested] 46+ messages in thread
* [Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
2011-06-01 20:17 [Bug target/49263] New: SH Target: underutilized "TST #imm, R0" instruction oleg.endo@t-online.de
` (33 preceding siblings ...)
2023-05-25 17:53 ` klepikov.alex+bugs at gmail dot com
@ 2023-05-25 21:32 ` olegendo at gcc dot gnu.org
2023-05-26 12:03 ` klepikov.alex+bugs at gmail dot com
` (9 subsequent siblings)
44 siblings, 0 replies; 46+ messages in thread
From: olegendo at gcc dot gnu.org @ 2023-05-25 21:32 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
--- Comment #40 from Oleg Endo <olegendo at gcc dot gnu.org> ---
(In reply to Alexander Klepikov from comment #39)
>
> I'm sorry, but .md lang is too complicated for me.
Yeah, it looks alien at first sight. But it's where a lot of things happen
w.r.t. instruction selection.
> It looks like with optimization enabled it converts bitwise AND to right
> shift and then optimizes again. But SH4 has 'shad' and 'shad' can be
> optimized to 'tst'. And SH2E has libcall instead of dynamic shift and libcll
> cannot be converted. It seems that very first optimization spoils things.
>
> But when we have numerous 'shar' instructions, optimization joins the game
> again and converts them to 'tst'.
Yes, something like this is what I remember happening there. I'll try to look
into the issue with your test cases and see if it's possible to add some
patterns to catch those.
BTW, have you tried it on a more recent GCC? There have also been some
optimizations in the middle-end (a bit more backend independent) for this kind
of thing.
> You are absolutely right, the code will be larger when we do right shifts.
> But there's situations when you can't use library. For exmple, SH7055 engine
> control unit can barely contain the program. The library just won't fit.
Have you tried to use whole program optimization via -flto and
-ffunction-sections? It should be able to strip out all unnecessary library
functions.
^ permalink raw reply [flat|nested] 46+ messages in thread
* [Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
2011-06-01 20:17 [Bug target/49263] New: SH Target: underutilized "TST #imm, R0" instruction oleg.endo@t-online.de
` (34 preceding siblings ...)
2023-05-25 21:32 ` olegendo at gcc dot gnu.org
@ 2023-05-26 12:03 ` klepikov.alex+bugs at gmail dot com
2023-05-26 17:44 ` olegendo at gcc dot gnu.org
` (8 subsequent siblings)
44 siblings, 0 replies; 46+ messages in thread
From: klepikov.alex+bugs at gmail dot com @ 2023-05-26 12:03 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
--- Comment #41 from Alexander Klepikov <klepikov.alex+bugs at gmail dot com> ---
> > It looks like with optimization enabled it converts bitwise AND to right
> > shift and then optimizes again. But SH4 has 'shad' and 'shad' can be
> > optimized to 'tst'. And SH2E has libcall instead of dynamic shift and libcll
> > cannot be converted. It seems that very first optimization spoils things.
> >
> > But when we have numerous 'shar' instructions, optimization joins the game
> > again and converts them to 'tst'.
>
> Yes, something like this is what I remember happening there. I'll try to
> look into the issue with your test cases and see if it's possible to add
> some patterns to catch those.
Thank you! I have an idea. If it's impossible to defer initial optimization,
maybe it's possible to emit some intermediate insn and catch it and optimize
later?
> BTW, have you tried it on a more recent GCC? There have also been some
> optimizations in the middle-end (a bit more backend independent) for this
> kind of thing.
I tried 13.1 about week or two ago with the same result.
> Have you tried to use whole program optimization via -flto and
> -ffunction-sections? It should be able to strip out all unnecessary library
> functions.
Thank you for advice, I'll take a try.
^ permalink raw reply [flat|nested] 46+ messages in thread
* [Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
2011-06-01 20:17 [Bug target/49263] New: SH Target: underutilized "TST #imm, R0" instruction oleg.endo@t-online.de
` (35 preceding siblings ...)
2023-05-26 12:03 ` klepikov.alex+bugs at gmail dot com
@ 2023-05-26 17:44 ` olegendo at gcc dot gnu.org
2023-05-28 10:24 ` klepikov.alex+bugs at gmail dot com
` (7 subsequent siblings)
44 siblings, 0 replies; 46+ messages in thread
From: olegendo at gcc dot gnu.org @ 2023-05-26 17:44 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
--- Comment #42 from Oleg Endo <olegendo at gcc dot gnu.org> ---
(In reply to Alexander Klepikov from comment #41)
>
> Thank you! I have an idea. If it's impossible to defer initial optimization,
> maybe it's possible to emit some intermediate insn and catch it and optimize
> later?
This is basically what is supposed to be happening there already.
However, it's a bit of a dilemma.
1) If we don't have a dynamic shift insn and we smash the constant shift into
individual
stitching shifts early, it might open some new optimization opportunities, e.g.
by sharing intermediate shift results. Not sure though if that actually
happens in practice though.
2) Whether to use the dynamic shift insn or emit a function call or use
stitching shifts sequence, it all has an impact on register allocation and
other instruction use. This can be problematic during the course of RTL
optimization passes.
3) Even if we have a dynamic shift, sometimes it's more beneficial to emit a
shorter stitching shift sequence. Which one is better depends on the
surrounding code. I don't think there is anything good there to make the
proper choice.
Some other shift related PRs: PR 54089, PR 65317, PR 67691, PR 67869, PR 52628,
PR 58017
> > BTW, have you tried it on a more recent GCC? There have also been some
> > optimizations in the middle-end (a bit more backend independent) for this
> > kind of thing.
>
> I tried 13.1 about week or two ago with the same result.
>
Good to know. Thanks for checking it!
^ permalink raw reply [flat|nested] 46+ messages in thread
* [Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
2011-06-01 20:17 [Bug target/49263] New: SH Target: underutilized "TST #imm, R0" instruction oleg.endo@t-online.de
` (36 preceding siblings ...)
2023-05-26 17:44 ` olegendo at gcc dot gnu.org
@ 2023-05-28 10:24 ` klepikov.alex+bugs at gmail dot com
2023-05-28 10:48 ` olegendo at gcc dot gnu.org
` (6 subsequent siblings)
44 siblings, 0 replies; 46+ messages in thread
From: klepikov.alex+bugs at gmail dot com @ 2023-05-28 10:24 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
--- Comment #43 from Alexander Klepikov <klepikov.alex+bugs at gmail dot com> ---
> > Thank you! I have an idea. If it's impossible to defer initial optimization,
> > maybe it's possible to emit some intermediate insn and catch it and optimize
> > later?
>
> This is basically what is supposed to be happening there already.
Well, not really. Look what's happening during expand pass when 'ashrsi3' is
expanding. Function 'expand_ashiftrt' is called and what it does at the end -
it explicitly emits 3 insns:
wrk = gen_reg_rtx (Pmode);
//This one
emit_move_insn (gen_rtx_REG (SImode, 4), operands[1]);
sprintf (func, "__ashiftrt_r4_%d", value);
rtx lab = function_symbol (wrk, func, SFUNC_STATIC).lab;
//This one
emit_insn (gen_ashrsi3_n (GEN_INT (value), wrk, lab));
//And this one
emit_move_insn (operands[0], gen_rtx_REG (SImode, 4));
As far as I understand these insns could be catched later by a peephole and
converted to 'tstsi_t' insn like it is done for other much simple insn
sequences.
What I'm thinkig about is to emit only one, say 'compound', insn. Which could
be then splitted later somwhere in split pass to function call, to those 3
insns.
I wrote test code that emits only one bogus insn. This insn expands to pure asm
code. Of course, that asm code is invalid, because it is impossible to place a
libcall label at the end of function with pure asm code injection. But then all
what is could be coverted to 'tst', converts to 'tst'. And all pure right
shifts converts to invalid asm code, of course.
That's why I am thinking about possibility of emitting some intermediate insn
at expand pass that will defer it real expanding. But I still don't know how to
do it right and even if it is possible.
By the way, right shift for integers expands to only one 'lshiftrt' insn and
that's why it can be catched and converted to 'tst'.
>
> However, it's a bit of a dilemma.
>
> 1) If we don't have a dynamic shift insn and we smash the constant shift
> into individual
> stitching shifts early, it might open some new optimization opportunities,
> e.g. by sharing intermediate shift results. Not sure though if that
> actually happens in practice though.
>
> 2) Whether to use the dynamic shift insn or emit a function call or use
> stitching shifts sequence, it all has an impact on register allocation and
> other instruction use. This can be problematic during the course of RTL
> optimization passes.
>
> 3) Even if we have a dynamic shift, sometimes it's more beneficial to emit a
> shorter stitching shift sequence. Which one is better depends on the
> surrounding code. I don't think there is anything good there to make the
> proper choice.
>
> Some other shift related PRs: PR 54089, PR 65317, PR 67691, PR 67869, PR
> 52628, PR 58017
Thank you for your time and detailed explanations! I agree with you on all
points. Software cannot be perfect and it's OK for GCC not to be super
optimized, so this part better sholud be left intact.
By the way, I tried to link library to my project and I figured out that linker
is smart enough to link only necessary library functions even without LTO. So
increase in size is about 100 or 200 bytes, that is acceptable. Thank you very
much for help!
^ permalink raw reply [flat|nested] 46+ messages in thread
* [Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
2011-06-01 20:17 [Bug target/49263] New: SH Target: underutilized "TST #imm, R0" instruction oleg.endo@t-online.de
` (37 preceding siblings ...)
2023-05-28 10:24 ` klepikov.alex+bugs at gmail dot com
@ 2023-05-28 10:48 ` olegendo at gcc dot gnu.org
2023-05-29 14:54 ` klepikov.alex+bugs at gmail dot com
` (5 subsequent siblings)
44 siblings, 0 replies; 46+ messages in thread
From: olegendo at gcc dot gnu.org @ 2023-05-28 10:48 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
--- Comment #44 from Oleg Endo <olegendo at gcc dot gnu.org> ---
(In reply to Alexander Klepikov from comment #43)
>
> Well, not really. Look what's happening during expand pass when 'ashrsi3' is
> expanding. Function 'expand_ashiftrt' is called and what it does at the end
> - it explicitly emits 3 insns:
> [...]
>
> By the way, right shift for integers expands to only one 'lshiftrt' insn and
> that's why it can be catched and converted to 'tst'.
>
Yeah, I might have dropped the ball on the right shift patterns back then and
only reworked the left shift patterns to do that.
>
> As far as I understand these insns could be catched later by a peephole and
> converted to 'tstsi_t' insn like it is done for other much simple insn
> sequences.
It's the combine RTL pass and split1 RTL pass that does most of this work here.
Peephole pass in GCC is too simplistic for this.
>
> Thank you for your time and detailed explanations! I agree with you on all
> points. Software cannot be perfect and it's OK for GCC not to be super
> optimized, so this part better sholud be left intact.
We can't have it perfect, but we can try ;)
>
> By the way, I tried to link library to my project and I figured out that
> linker is smart enough to link only necessary library functions even without
> LTO. So increase in size is about 100 or 200 bytes, that is acceptable.
> Thank you very much for help!
You're welcome.
Yes, to strip out unused library functions it doesn't need LTO. But using LTO
for embedded/MCU firmware, I find it can reduce the code size by about 20%.
For example, it can also inline small library functions in your code (if the
library was also compiled with LTO).
^ permalink raw reply [flat|nested] 46+ messages in thread
* [Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
2011-06-01 20:17 [Bug target/49263] New: SH Target: underutilized "TST #imm, R0" instruction oleg.endo@t-online.de
` (38 preceding siblings ...)
2023-05-28 10:48 ` olegendo at gcc dot gnu.org
@ 2023-05-29 14:54 ` klepikov.alex+bugs at gmail dot com
2023-05-30 1:48 ` egallager at gcc dot gnu.org
` (4 subsequent siblings)
44 siblings, 0 replies; 46+ messages in thread
From: klepikov.alex+bugs at gmail dot com @ 2023-05-29 14:54 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
--- Comment #45 from Alexander Klepikov <klepikov.alex+bugs at gmail dot com> ---
>I have an idea. If it's impossible to defer initial optimization,
> maybe it's possible to emit some intermediate insn and catch it and optimize
> later?
Good news. I've made a proof of concept. It works at least sometimes - on
simple tests.
$ cat f.c
#define A 0xFFFF0000
#define P ((unsigned char *)A)
#define F 64
#define S 8
unsigned char f_non_zero(unsigned char v){
return (v & F) != 0;
}
unsigned f_sym_non_zero(void){
return (*P & F) != 0;
}
unsigned f_sym_mask(void){
return (*P & F) == F;
}
int f_rshift(char v){
return v >> S;
}
$ /usr/local/sh-toolchain/bin/sh-elf-gcc -O2 -mb -m2e -da -S f.c
$ cat f.s
.file "f.c"
.text
.text
.align 1
.align 2
.global _f_non_zero
.type _f_non_zero, @function
_f_non_zero:
mov r4,r0
sts.l pr,@-r15
tst #64,r0
mov #-1,r0
negc r0,r0
lds.l @r15+,pr
rts
nop
.size _f_non_zero, .-_f_non_zero
.align 1
.align 2
.global _f_sym_non_zero
.type _f_sym_non_zero, @function
_f_sym_non_zero:
mov.l .L6,r1
sts.l pr,@-r15
mov.b @r1,r0
tst #64,r0
mov #-1,r0
negc r0,r0
lds.l @r15+,pr
rts
nop
.L7:
.align 2
.L6:
.long -65536
.size _f_sym_non_zero, .-_f_sym_non_zero
.align 1
.align 2
.global _f_sym_mask
.type _f_sym_mask, @function
_f_sym_mask:
mov.l .L10,r1
sts.l pr,@-r15
mov.b @r1,r0
tst #64,r0
mov #-1,r0
negc r0,r0
lds.l @r15+,pr
rts
nop
.L11:
.align 2
.L10:
.long -65536
.size _f_sym_mask, .-_f_sym_mask
.align 1
.align 2
.global _f_rshift
.type _f_rshift, @function
_f_rshift:
mov.l .L14,r1
sts.l pr,@-r15
jsr @r1
exts.b r4,r4
mov r4,r0
lds.l @r15+,pr
rts
nop
.L15:
.align 2
.L14:
.long ___ashiftrt_r4_8
.size _f_rshift, .-_f_rshift
.ident "GCC: (GNU) 13.1.0"
$ /usr/local/sh-toolchain/bin/sh-elf-gcc -O2 -ml -m2e -da -S f.c
$ cat f.s
.file "f.c"
.text
.little
.text
.align 1
.align 2
.global _f_non_zero
.type _f_non_zero, @function
_f_non_zero:
mov r4,r0
sts.l pr,@-r15
tst #64,r0
mov #-1,r0
negc r0,r0
lds.l @r15+,pr
rts
nop
.size _f_non_zero, .-_f_non_zero
.align 1
.align 2
.global _f_sym_non_zero
.type _f_sym_non_zero, @function
_f_sym_non_zero:
mov.l .L6,r1
sts.l pr,@-r15
mov.b @r1,r0
tst #64,r0
mov #-1,r0
negc r0,r0
lds.l @r15+,pr
rts
nop
.L7:
.align 2
.L6:
.long -65536
.size _f_sym_non_zero, .-_f_sym_non_zero
.align 1
.align 2
.global _f_sym_mask
.type _f_sym_mask, @function
_f_sym_mask:
mov.l .L10,r1
sts.l pr,@-r15
mov.b @r1,r0
tst #64,r0
mov #-1,r0
negc r0,r0
lds.l @r15+,pr
rts
nop
.L11:
.align 2
.L10:
.long -65536
.size _f_sym_mask, .-_f_sym_mask
.align 1
.align 2
.global _f_rshift
.type _f_rshift, @function
_f_rshift:
mov.l .L14,r1
sts.l pr,@-r15
jsr @r1
exts.b r4,r4
mov r4,r0
lds.l @r15+,pr
rts
nop
.L15:
.align 2
.L14:
.long ___ashiftrt_r4_8
.size _f_rshift, .-_f_rshift
.ident "GCC: (GNU) 13.1.0"
Splitting takes place at split1 pass as expected. Here is the patch itself.
$ cat gcc-13.1.0-ashrsi3_libcall.patch
diff -ur gcc-13.1.0.orig/gcc/config/sh/sh-protos.h
gcc-13.1.0/gcc/config/sh/sh-protos.h
--- gcc-13.1.0.orig/gcc/config/sh/sh-protos.h 2023-04-26 10:09:39.000000000
+0300
+++ gcc-13.1.0/gcc/config/sh/sh-protos.h 2023-05-29 11:45:05.134723435
+0300
@@ -78,6 +78,7 @@
extern void gen_shifty_op (int, rtx *);
extern void gen_shifty_hi_op (int, rtx *);
extern bool expand_ashiftrt (rtx *);
+extern bool expand_ashrsi3_libcall (rtx *);//delete
extern bool sh_dynamicalize_shift_p (rtx);
extern int shl_and_kind (rtx, rtx, int *);
extern int shl_and_length (rtx);
diff -ur gcc-13.1.0.orig/gcc/config/sh/sh.cc gcc-13.1.0/gcc/config/sh/sh.cc
--- gcc-13.1.0.orig/gcc/config/sh/sh.cc 2023-04-26 10:09:39.000000000 +0300
+++ gcc-13.1.0/gcc/config/sh/sh.cc 2023-05-29 17:09:54.602787537 +0300
@@ -3875,11 +3877,37 @@
wrk = gen_reg_rtx (Pmode);
/* Load the value into an arg reg and call a helper. */
- emit_move_insn (gen_rtx_REG (SImode, 4), operands[1]);
+ /*emit_move_insn (gen_rtx_REG (SImode, 4), operands[1]);
sprintf (func, "__ashiftrt_r4_%d", value);
rtx lab = function_symbol (wrk, func, SFUNC_STATIC).lab;
emit_insn (gen_ashrsi3_n (GEN_INT (value), wrk, lab));
emit_move_insn (operands[0], gen_rtx_REG (SImode, 4));
+ return true;*/
+
+ if (dump_file)
+ fprintf(dump_file, "ashrsi3: Emitting collapsed libcall\n");
+ emit_insn (gen_ashrsi3_libcall_collapsed (operands[0], operands[1],
GEN_INT(value)));//delete
+ return true;//delete
+}
+
+//delete
+bool
+expand_ashrsi3_libcall (rtx *operands) {
+ char func[18];
+
+ if (dump_file)
+ fprintf(dump_file, "ashrsi3_libcall_collapsed: Expanding ashrsi3
libcall\n");
+
+ rtx wrk = gen_reg_rtx (Pmode);
+ emit_move_insn (gen_rtx_REG (SImode, 4), operands[1]);
+
+ sprintf (func, "__ashiftrt_r4_%d", INTVAL (operands[2]));
+
+ rtx lab = function_symbol (wrk, func, SFUNC_STATIC).lab;
+
+ emit_insn (gen_ashrsi3_n (operands[2], wrk, lab));
+ emit_move_insn (operands[0], gen_rtx_REG (SImode, 4));
+
return true;
}
diff -ur gcc-13.1.0.orig/gcc/config/sh/sh.md gcc-13.1.0/gcc/config/sh/sh.md
--- gcc-13.1.0.orig/gcc/config/sh/sh.md 2023-04-26 10:09:39.000000000 +0300
+++ gcc-13.1.0/gcc/config/sh/sh.md 2023-05-29 17:10:42.752779922 +0300
@@ -3867,6 +3867,35 @@
[(set_attr "type" "sfunc")
(set_attr "needs_delay_slot" "yes")])
+(define_insn "ashrsi3_libcall_collapsed"
+ [(set (match_operand:SI 0 "arith_reg_dest" "=r")
+ (ashiftrt:SI (match_operand:SI 1 "arith_reg_operand" "0")
+ (match_operand:SI 2 "const_int_operand")))
+ (clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))]
+ "TARGET_SH1"
+ "OOPS"
+ [(set_attr "type" "dyn_shift")
+ (set_attr "needs_delay_slot" "yes")])
+
+(define_insn_and_split "ashrsi3_libcall_expand"
+ [(parallel [(set (match_operand:SI 0 "arith_reg_dest")
+ (ashiftrt:SI (match_operand:SI 1 "arith_reg_operand")
+ (match_operand:SI 2 "const_int_operand"))
+ )(clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ ])]
+ "TARGET_SH1"
+ "OOPS_1"
+ "&& 1"
+ [(const_int 0)]
+{
+ if (expand_ashrsi3_libcall(operands))
+ DONE;
+ else
+ FAIL;
+})
+
;; . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
;; DImode arithmetic shift right
I did it by feel, actually picking up the parameters until it worked. So please
check it and improve it because I'm sure it will break something. Thank you.
^ permalink raw reply [flat|nested] 46+ messages in thread
* [Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
2011-06-01 20:17 [Bug target/49263] New: SH Target: underutilized "TST #imm, R0" instruction oleg.endo@t-online.de
` (39 preceding siblings ...)
2023-05-29 14:54 ` klepikov.alex+bugs at gmail dot com
@ 2023-05-30 1:48 ` egallager at gcc dot gnu.org
2023-05-30 1:56 ` olegendo at gcc dot gnu.org
` (3 subsequent siblings)
44 siblings, 0 replies; 46+ messages in thread
From: egallager at gcc dot gnu.org @ 2023-05-30 1:48 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
--- Comment #46 from Eric Gallager <egallager at gcc dot gnu.org> ---
Reminder that patches go to the gcc-patches mailing list
^ permalink raw reply [flat|nested] 46+ messages in thread
* [Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
2011-06-01 20:17 [Bug target/49263] New: SH Target: underutilized "TST #imm, R0" instruction oleg.endo@t-online.de
` (40 preceding siblings ...)
2023-05-30 1:48 ` egallager at gcc dot gnu.org
@ 2023-05-30 1:56 ` olegendo at gcc dot gnu.org
2023-05-30 12:42 ` klepikov.alex+bugs at gmail dot com
` (2 subsequent siblings)
44 siblings, 0 replies; 46+ messages in thread
From: olegendo at gcc dot gnu.org @ 2023-05-30 1:56 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
--- Comment #47 from Oleg Endo <olegendo at gcc dot gnu.org> ---
(In reply to Eric Gallager from comment #46)
> Reminder that patches go to the gcc-patches mailing list
It's OK. We're just throwing around ideas, nothing final
^ permalink raw reply [flat|nested] 46+ messages in thread
* [Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
2011-06-01 20:17 [Bug target/49263] New: SH Target: underutilized "TST #imm, R0" instruction oleg.endo@t-online.de
` (41 preceding siblings ...)
2023-05-30 1:56 ` olegendo at gcc dot gnu.org
@ 2023-05-30 12:42 ` klepikov.alex+bugs at gmail dot com
2023-05-30 19:57 ` olegendo at gcc dot gnu.org
2023-05-30 20:00 ` olegendo at gcc dot gnu.org
44 siblings, 0 replies; 46+ messages in thread
From: klepikov.alex+bugs at gmail dot com @ 2023-05-30 12:42 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
--- Comment #48 from Alexander Klepikov <klepikov.alex+bugs at gmail dot com> ---
I made tests (including *.c files from GCC testsuite) and everything looks fine
for now. But I'm still afraid that pattern for 'ashrsi3_libcall_expand' is too
wide. It is possible to narrow it down as much as possible by adding distinct
attribute and set when emitting 'ashrsi3_libcall_collapsed' and then check it
and fail if not set:
(define_attr "libcall_collapsed"
"ashrsi3,nil"
(const_string "nil"))
(define_insn "ashrsi3_libcall_collapsed"
[(set (match_operand:SI 0 "arith_reg_dest" "=r")
(ashiftrt:SI (match_operand:SI 1 "arith_reg_operand" "0")
(match_operand:SI 2 "const_int_operand")))
(clobber (reg:SI T_REG))
(clobber (reg:SI PR_REG))]
"TARGET_SH1"
"OOPS"
[(set_attr "type" "dyn_shift")
(set_attr "libcall_collapsed" "ashrsi3")
(set_attr "needs_delay_slot" "yes")])
if (get_attr_libcall_collapsed(insn) != LIBCALL_COLLAPSED_ASHRSI3)
return false;
It will be super safe then but ugly a little bit.
^ permalink raw reply [flat|nested] 46+ messages in thread
* [Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
2011-06-01 20:17 [Bug target/49263] New: SH Target: underutilized "TST #imm, R0" instruction oleg.endo@t-online.de
` (42 preceding siblings ...)
2023-05-30 12:42 ` klepikov.alex+bugs at gmail dot com
@ 2023-05-30 19:57 ` olegendo at gcc dot gnu.org
2023-05-30 20:00 ` olegendo at gcc dot gnu.org
44 siblings, 0 replies; 46+ messages in thread
From: olegendo at gcc dot gnu.org @ 2023-05-30 19:57 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
--- Comment #49 from Oleg Endo <olegendo at gcc dot gnu.org> ---
(In reply to Alexander Klepikov from comment #48)
> I made tests (including *.c files from GCC testsuite) and everything looks
> fine for now. But I'm still afraid that pattern for 'ashrsi3_libcall_expand'
> is too wide. It is possible to narrow it down as much as possible by adding
> distinct attribute and set when emitting 'ashrsi3_libcall_collapsed' and
> then check it and fail if not set:
>
For this kind of change, the whole GCC test suite needs to be ran for at least
big/little -m2,-m4 variants.
+(define_insn_and_split "ashrsi3_libcall_expand"
+ [(parallel [(set (match_operand:SI 0 "arith_reg_dest")
+ (ashiftrt:SI (match_operand:SI 1 "arith_reg_operand")
+ (match_operand:SI 2 "const_int_operand"))
+ )(clobber (reg:SI T_REG))
+ (clobber (reg:SI PR_REG))
+ ])]
The 'parallel' construct looks strange.
^ permalink raw reply [flat|nested] 46+ messages in thread
* [Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
2011-06-01 20:17 [Bug target/49263] New: SH Target: underutilized "TST #imm, R0" instruction oleg.endo@t-online.de
` (43 preceding siblings ...)
2023-05-30 19:57 ` olegendo at gcc dot gnu.org
@ 2023-05-30 20:00 ` olegendo at gcc dot gnu.org
44 siblings, 0 replies; 46+ messages in thread
From: olegendo at gcc dot gnu.org @ 2023-05-30 20:00 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
--- Comment #50 from Oleg Endo <olegendo at gcc dot gnu.org> ---
Actually, let's take any further discussion of shift patterns to PR 54089.
^ permalink raw reply [flat|nested] 46+ messages in thread