* [Bug target/93372] cris performance regressions due to de-cc0 work
[not found] <bug-93372-4@http.gcc.gnu.org/bugzilla/>
@ 2020-05-09 2:31 ` cvs-commit at gcc dot gnu.org
2020-05-09 3:46 ` hp at gcc dot gnu.org
` (6 subsequent siblings)
7 siblings, 0 replies; 8+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2020-05-09 2:31 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93372
--- Comment #2 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Hans-Peter Nilsson <hp@gcc.gnu.org>:
https://gcc.gnu.org/g:27228024598c3515389cdb378346433fb2c48551
commit r11-222-g27228024598c3515389cdb378346433fb2c48551
Author: Hans-Peter Nilsson <hp@axis.com>
Date: Thu Jan 23 02:30:49 2020 +0100
cris: Emit trivial btstq expected by gcc.target/cris/sync-2i.c, sync-2c.c
As the added FIXME says, the new insn_and_split generates only a
small subset of the bit-tests that can be matched by "*btst" and
that were emitted by the undecc0rated cris.md at combine-time.
It is, however, naturally separable from a general variant: it is
just what's needed for the test-cases that were previously
xfailed, and it requires no additional CCmodes.
gcc:
PR target/93372
* config/cris/cris.md (zcond): New code_iterator.
("*cbranch<mode>4_btstq<CC>"): New insn_and_split.
* [Bug target/93372] cris performance regressions due to de-cc0 work
From: hp at gcc dot gnu.org @ 2020-05-09 3:46 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93372
--- Comment #3 from Hans-Peter Nilsson <hp at gcc dot gnu.org> ---
In https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545452.html I mentioned a
performance-regression with coremark, from 5227456 cycles (with cc0) to 5238564
(CC_REG), which is about 0.21%.
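The percentage follows directly from the two cycle counts quoted above. A
minimal sketch of the arithmetic (the function name is made up for
illustration, it is not from any GCC script):

```python
def regression_percent(before_cycles: int, after_cycles: int) -> float:
    """Relative slowdown of after_cycles versus before_cycles, in percent."""
    return (after_cycles - before_cycles) / before_cycles * 100.0


# Cycle counts quoted in the comment: cc0 baseline first, then CC_REG.
cc0_cycles = 5227456
cc_reg_cycles = 5238564

print(f"{regression_percent(cc0_cycles, cc_reg_cycles):.2f}%")
```

The difference is 11108 cycles, which rounds to the 0.21% stated.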
* [Bug target/93372] cris performance regressions due to de-cc0 work
From: cvs-commit at gcc dot gnu.org @ 2020-07-13 8:15 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93372
--- Comment #4 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Hans-Peter Nilsson <hp@gcc.gnu.org>:
https://gcc.gnu.org/g:ef07c7a5884c130b48e653993bfaaf1ae9e6dedd
commit r11-2048-gef07c7a5884c130b48e653993bfaaf1ae9e6dedd
Author: Hans-Peter Nilsson <hp@axis.com>
Date: Wed Jul 8 23:59:12 2020 +0200
cris: Use addi.b for additions where flags aren't inspected
Comparing to the cc0 version of the CRIS port, I ran a few
microbenchmarks, for example gcc.c-torture/execute/arith-rand.c,
where there's sometimes an addition between an operation of
interest and the test on the result.
Unfortunately, this patch doesn't remedy all of the performance
regression for that program. Still, the patch by itself helps
and makes sense to commit separately: it puts lots of addi.b in
previously empty delay-slots and shortens functions in libgcc by
one or a few insns. I ran into the reload-related caveat of % on
constraints, which has long been "fixed" documentation-wise
(almost 15 years ago; be3914df4cc8/r105517). I removed an even
older related FIXME.
gcc:
PR target/93372
* config/cris/cris.md ("*add<mode>3_addi"): New splitter.
("*addi_b_<mode>"): New pattern.
("*addsi3<setnz>"): Remove stale %-related comment.
gcc/testsuite:
PR target/93372
* gcc.target/cris/pr93372-45.c: New test.
* [Bug target/93372] cris performance regressions due to de-cc0 work
From: cvs-commit at gcc dot gnu.org @ 2020-07-13 8:15 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93372
--- Comment #5 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Hans-Peter Nilsson <hp@gcc.gnu.org>:
https://gcc.gnu.org/g:9a2ae08b02d185a11e3e525e100ba637ce81c7ff
commit r11-2050-g9a2ae08b02d185a11e3e525e100ba637ce81c7ff
Author: Hans-Peter Nilsson <hp@axis.com>
Date: Sun Jul 12 18:41:25 2020 +0200
cris: Add new pass eliminating compares after delay-slot-filling
Delayed-branch-slot-filling, a.k.a. reorg or dbr, often creates
opportunities for more compare-elimination than were visible to
the cmpelim pass. With cc0, these were caught by the
elimination pass run in "final", so the missed opportunities
are a regression. A simple reorg-aware pass run just after reorg
handles most of them, if not all. I chose to keep the "mach2"
pass identifier string I copy-pasted from the SPARC port instead
of inventing one like "postdbr_cmpelim". Note the gap in numbers
in the test-case file names.
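To illustrate the idea of eliminating compares that have become redundant,
here is a toy model. It is not the code of cris_postdbr_cmpelim: the Insn
class, its fields and the function name are hypothetical, the insn texts are
display strings only, and the model ignores delay slots and the question of
which condition codes are valid after each insn.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Insn:
    text: str                        # assembly-like text, for display only
    dest: Optional[str] = None       # register written, if any
    flags_of: Optional[str] = None   # register whose value the flags reflect afterwards
    clobbers_flags: bool = False     # flags are left in an unrelated state
    is_test: bool = False            # compare of `flags_of` against zero


def eliminate_redundant_tests(insns: List[Insn]) -> List[Insn]:
    """Drop zero-compares whose result is already held in the flags."""
    kept: List[Insn] = []
    flags_reflect: Optional[str] = None  # register described by the flags, if known
    for insn in insns:
        if insn.is_test and flags_reflect is not None and insn.flags_of == flags_reflect:
            continue  # the flags already hold this comparison result
        kept.append(insn)
        if insn.flags_of is not None:
            flags_reflect = insn.flags_of
        elif insn.clobbers_flags:
            flags_reflect = None
        elif insn.dest is not None and insn.dest == flags_reflect:
            flags_reflect = None  # register changed without updating the flags
    return kept


stream = [
    Insn("sub.d r11,r10", dest="r10", flags_of="r10"),
    Insn("test.d r10", flags_of="r10", is_test=True),
    Insn("beq .L1"),
]
print([insn.text for insn in eliminate_redundant_tests(stream)])
```

In this model the test is dropped only when nothing between it and the
flag-setting insn touches the flags or the tested register, which is the
kind of adjacency that can appear once reorg has moved insns into delay
slots.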
gcc:
PR target/93372
* config/cris/cris-passes.def: New file.
* config/cris/t-cris (PASSES_EXTRA): Add cris-passes.def.
* config/cris/cris.c: Add infrastructure bits and pass execute
function cris_postdbr_cmpelim.
* config/cris/cris-protos.h (make_pass_cris_postdbr_cmpelim):
Declare.
gcc/testsuite:
* gcc.target/cris/pr93372-44.c, gcc.target/cris/pr93372-46.c: New.
* [Bug target/93372] cris performance regressions due to de-cc0 work
From: cvs-commit at gcc dot gnu.org @ 2020-07-16 23:53 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93372
--- Comment #6 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Segher Boessenkool <segher@gcc.gnu.org>:
https://gcc.gnu.org/g:84c5396d4bdbf9f1d628c77db4421808f9a9dcb6
commit r11-2185-g84c5396d4bdbf9f1d628c77db4421808f9a9dcb6
Author: Segher Boessenkool <segher@kernel.crashing.org>
Date: Thu Jul 16 23:42:46 2020 +0000
combine: Use single_set for is_just_move
Since we now only call is_just_move on the original instructions, we
always have an rtx_insn* (not just a pattern), so we can use single_set
on it. This makes no detectable difference at all on all thirty Linux
targets I test, but it does help cris, and it is simpler, cleaner code
anyway.
2020-07-16 Hans-Peter Nilsson <hp@axis.com>
Segher Boessenkool <segher@kernel.crashing.org>
PR target/93372
* combine.c (is_just_move): Take an rtx_insn* as argument. Use
single_set on it.
* [Bug target/93372] cris performance regressions due to de-cc0 work
From: cvs-commit at gcc dot gnu.org @ 2020-08-24 1:15 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93372
--- Comment #7 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Hans-Peter Nilsson <hp@gcc.gnu.org>:
https://gcc.gnu.org/g:0e6c51de8ec47bf5f0dfaabfd1898c722d0485b4
commit r11-2814-g0e6c51de8ec47bf5f0dfaabfd1898c722d0485b4
Author: Hans-Peter Nilsson <hp@axis.com>
Date: Mon Aug 24 03:15:21 2020 +0200
reorg.c (fill_slots_from_thread): Improve for TARGET_FLAGS_REGNUM
This handles TARGET_FLAGS_REGNUM clobbering insns as delay-slot
fillers using a method similar to that in commit 33c2207d3fda,
where care was taken for fill_simple_delay_slots to allow such
insns when scanning for delay-slot fillers *backwards* (before
the insn).
A TARGET_FLAGS_REGNUM target is typically a former cc0 target.
For cc0 targets, insns don't mention clobbering cc0, so the
clobbers are mentioned in the "resources" only as a special
entity and only for compare-insns and branches, where the cc0
value matters.
In contrast, with TARGET_FLAGS_REGNUM, most insns clobber it and
the register liveness detection in reorg.c / resource.c treats
that as a blocker (for other insns mentioning it, i.e. most)
when looking for delay-slot-filling candidates. This means that
when comparing code and performance for a delay-slot cc0 target
before and after the de-cc0 conversion, the inability to fill a
delay slot after conversion manifests as a regression. This was
one such case, for CRIS, with random_bitstring in
gcc.c-torture/execute/arith-rand-ll.c as well as the target
libgcc division function.
After this, all known performance regressions compared to cc0
are fixed.
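The blocking effect described above can be shown with a toy model of
resource sets. This is not the reorg.c / resource.c code: the register name
"ccr", the function name and the set-based bookkeeping are assumptions made
for illustration, and the relaxation shown (ignoring a flags clobber when
the flags are neither needed later nor read by the candidate) is only a
rough reading of the commit message.

```python
from typing import AbstractSet

FLAGS = "ccr"  # hypothetical name standing in for TARGET_FLAGS_REGNUM


def blocks_filler(
    trial_sets: AbstractSet[str],
    trial_uses: AbstractSet[str],
    passed_sets: AbstractSet[str],
    needed: AbstractSet[str],
    *,
    ignore_dead_flags: bool,
) -> bool:
    """Return True if the candidate insn must not be used as a filler."""
    if ignore_dead_flags and FLAGS not in needed and FLAGS not in trial_uses:
        # The candidate only clobbers flags that nobody reads afterwards.
        trial_sets = trial_sets - {FLAGS}
    return bool(
        trial_sets & needed          # would overwrite a value still needed
        or trial_sets & passed_sets  # would be reordered past another write
        or trial_uses & passed_sets  # would read a value written in between
    )


# An add that writes r2 and clobbers the flags, moved past insns that
# wrote r3 and also clobbered the flags; only r3 is needed afterwards.
add_sets, add_uses = {"r2", FLAGS}, {"r1", "r2"}
passed, needed = {"r3", FLAGS}, {"r3"}

print(blocks_filler(add_sets, add_uses, passed, needed, ignore_dead_flags=False))
print(blocks_filler(add_sets, add_uses, passed, needed, ignore_dead_flags=True))
```

With the strict check nearly every candidate is rejected, because nearly
every insn clobbers the flags; with the relaxation the add is accepted,
unless the flags are still needed, in which case it stays blocked.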
gcc:
PR target/93372
* reorg.c (fill_slots_from_thread): Allow trial insns that clobber
TARGET_FLAGS_REGNUM as delay-slot fillers.
gcc/testsuite:
PR target/93372
* gcc.target/cris/pr93372-47.c: New test.
* [Bug target/93372] cris performance regressions due to de-cc0 work
From: jakub at gcc dot gnu.org @ 2021-04-27 11:38 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93372
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|11.0                        |11.2
--- Comment #8 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 11.1 has been released, retargeting bugs to GCC 11.2.
* [Bug target/93372] cris performance regressions due to de-cc0 work
From: hp at gcc dot gnu.org @ 2021-04-27 14:54 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93372
Hans-Peter Nilsson <hp at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED
--- Comment #9 from Hans-Peter Nilsson <hp at gcc dot gnu.org> ---
Whoops, this should be closed. All observed related regressions were fixed for
gcc-11.