* [Bug rtl-optimization/7061] Access of bytes in struct parameters
[not found] <bug-7061-4@http.gcc.gnu.org/bugzilla/>
@ 2021-09-22 20:26 ` gabravier at gmail dot com
2022-05-30 20:40 ` cvs-commit at gcc dot gnu.org
` (3 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: gabravier at gmail dot com @ 2021-09-22 20:26 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=7061
Gabriel Ravier <gabravier at gmail dot com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |gabravier at gmail dot com
--- Comment #7 from Gabriel Ravier <gabravier at gmail dot com> ---
Compiling this under ia64 seems to now be optimized perfectly as of at least
GCC 10, though the other ones look like they're still badly handled.
* [Bug rtl-optimization/7061] Access of bytes in struct parameters
From: cvs-commit at gcc dot gnu.org @ 2022-05-30 20:40 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=7061
--- Comment #8 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:
https://gcc.gnu.org/g:1ad584d538d349db13cfa8440222d91d5e9aff3f
commit r13-859-g1ad584d538d349db13cfa8440222d91d5e9aff3f
Author: Roger Sayle <roger@nextmovesoftware.com>
Date: Mon May 30 21:32:58 2022 +0100
Allow SCmode and DImode to be tieable with TARGET_64BIT on x86_64.
This patch is a form of insurance policy in case my patch for PR 7061 runs
into problems on non-x86 targets; the middle-end can add an extra check
that the backend is happy placing SCmode and DImode values in the same
register, before creating a SUBREG. Unfortunately, ix86_modes_tieable_p
currently claims this is not allowed(?), even though the default target
hook for modes_tieable_p is to always return true [i.e. false can be
used to specifically prohibit bad combinations], and the x86_64 ABI
passes SCmode values in DImode registers! This makes the backend's
modes_tieable_p hook a little more forgiving, and additionally enables
interconversion between SCmode and V2SFmode, and between DCmode and
V2DFmode, which opens interesting opportunities in the future.
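As a rough illustration of why these pairings are plausible, the sketch below checks that each pair of types named above occupies the same number of bytes. The typedefs `v2sf` and `v2df` and the function `tieable_pairs_have_equal_size` are hypothetical names chosen for this example, and the vector types rely on the GNU `vector_size` extension. Whether a backend actually allows two modes to share a register is decided by its modes_tieable_p hook, which this sketch does not model.

```c
#include <stdint.h>

/* GNU vector-extension types standing in for V2SFmode and V2DFmode
   (hypothetical names, used only for this illustration). */
typedef float  v2sf __attribute__((vector_size(8)));
typedef double v2df __attribute__((vector_size(16)));

/* Returns 1 when each pairing mentioned in the commit message has
   matching sizes: SCmode/DImode, SCmode/V2SFmode, DCmode/V2DFmode. */
static int tieable_pairs_have_equal_size(void)
{
    return sizeof(float _Complex)  == sizeof(int64_t)
        && sizeof(float _Complex)  == sizeof(v2sf)
        && sizeof(double _Complex) == sizeof(v2df);
}
```

On a typical x86_64 toolchain, calling `tieable_pairs_have_equal_size()` is expected to return 1.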
2022-05-30 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386.cc (ix86_modes_tieable_p): Allow SCmode to be
tieable with DImode on TARGET_64BIT, and SCmode tieable with
V2SFmode, and DCmode with V2DFmode.
* [Bug rtl-optimization/7061] Access of bytes in struct parameters
From: cvs-commit at gcc dot gnu.org @ 2022-06-10 14:20 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=7061
--- Comment #9 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:
https://gcc.gnu.org/g:1753a7120109c1d3b682f9487d6cca64fb2f0929
commit r13-1038-g1753a7120109c1d3b682f9487d6cca64fb2f0929
Author: Roger Sayle <roger@nextmovesoftware.com>
Date: Fri Jun 10 15:14:23 2022 +0100
PR rtl-optimization/7061: Complex number arguments on x86_64-like ABIs.
This patch addresses the issue in comment #6 of PR rtl-optimization/7061
(a four digit PR number) from 2006 where on x86_64 complex number arguments
are unconditionally spilled to the stack.
For the test cases below:
float re(float _Complex a) { return __real__ a; }
float im(float _Complex a) { return __imag__ a; }
GCC with -O2 currently generates:
re:     movq    %xmm0, -8(%rsp)
        movss   -8(%rsp), %xmm0
        ret
im:     movq    %xmm0, -8(%rsp)
        movss   -4(%rsp), %xmm0
        ret
with this patch we now generate:
re:     ret
im:     movq    %xmm0, %rax
        shrq    $32, %rax
        movd    %eax, %xmm0
        ret
[Technically, this shift can be performed on %xmm0 in a single
instruction, but the backend needs to be taught to do that. The
important bit is that the SCmode argument isn't written to the
stack.]
The patch itself is to emit_group_store: just before RTL expansion
commits to writing to the stack, we check whether the store group
consists of a single scalar integer register that holds a complex
mode value (on x86_64, SCmode arguments are passed in DImode
registers). If this is the case, we can use a SUBREG to
"view_convert" the integer to the equivalent complex mode.
An interesting corner case that showed up during testing is that
x86_64 also passes HCmode arguments in DImode registers(!), i.e.
using modes of different sizes. This is easily handled/supported
by first converting to an integer mode of the correct size, and
then generating a complex mode SUBREG of this. This is similar
in concept to the patch I proposed here:
https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590139.html
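The two cases described above can be modelled at the source level. This is only a sketch of the idea, not the RTL code in emit_group_store: `view_as_complex`, `pack_floats`, `half_complex_bits` and `view_low32_as_half_complex` are hypothetical names, `memcpy` stands in for the SUBREG "view_convert", two `uint16_t` fields stand in for the halves of an HCmode value, and a little-endian layout (as on x86_64) is assumed.

```c
#include <stdint.h>
#include <string.h>

/* Same-size case: reinterpret the 64 bits of an integer as a
   float _Complex (SCmode viewed through a DImode register). */
static float _Complex view_as_complex(uint64_t bits)
{
    float _Complex z;
    _Static_assert(sizeof z == sizeof bits, "both are 8 bytes");
    memcpy(&z, &bits, sizeof z);
    return z;
}

/* Build the 64-bit image of (re, im); little-endian assumption:
   the real part occupies the low 32 bits. */
static uint64_t pack_floats(float re, float im)
{
    uint32_t lo, hi;
    memcpy(&lo, &re, sizeof lo);
    memcpy(&hi, &im, sizeof hi);
    return ((uint64_t)hi << 32) | lo;
}

/* Stand-in for a half-precision complex value: raw 16-bit payloads. */
struct half_complex_bits { uint16_t re; uint16_t im; };

/* Different-size case: first narrow to an integer of the right size,
   then reinterpret that narrower integer. */
static struct half_complex_bits view_low32_as_half_complex(uint64_t reg)
{
    uint32_t narrow = (uint32_t)reg;   /* step 1: 32-bit integer */
    struct half_complex_bits z;
    _Static_assert(sizeof z == sizeof narrow, "both are 4 bytes");
    memcpy(&z, &narrow, sizeof z);     /* step 2: view-convert */
    return z;
}
```

For example, `view_as_complex(pack_floats(1.5f, -2.0f))` yields a value whose `__real__` part is 1.5 and whose `__imag__` part is -2.0, without any explicit per-component load from memory in the source.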
2022-06-10 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR rtl-optimization/7061
* expr.cc (emit_group_store): For groups that consist of a single
scalar integer register that holds a complex mode value, use
gen_lowpart to generate a SUBREG to "view_convert" to the complex
mode. For modes of different sizes, first convert to an integer
mode of the appropriate size.
gcc/testsuite/ChangeLog
PR rtl-optimization/7061
* gcc.target/i386/pr7061-1.c: New test case.
* gcc.target/i386/pr7061-2.c: New test case.
* [Bug rtl-optimization/7061] Access of bytes in struct parameters
From: david.bolvansky at gmail dot com @ 2022-06-11 13:29 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=7061
Dávid Bolvanský <david.bolvansky at gmail dot com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |david.bolvansky at gmail dot com
--- Comment #10 from Dávid Bolvanský <david.bolvansky at gmail dot com> ---
LLVM emits just:
im:                                     # @im
        shufps  xmm0, xmm0, 85          # xmm0 = xmm0[1,1,1,1]
        ret
* [Bug rtl-optimization/7061] Access of bytes in struct parameters
From: cvs-commit at gcc dot gnu.org @ 2022-06-27 6:49 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=7061
--- Comment #11 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:
https://gcc.gnu.org/g:64d4f27a0ce47e97867512bda7fa5683acf8a134
commit r13-1282-g64d4f27a0ce47e97867512bda7fa5683acf8a134
Author: Roger Sayle <roger@nextmovesoftware.com>
Date: Mon Jun 27 07:47:40 2022 +0100
Implement __imag__ of float _Complex using shufps on x86_64.
This patch is a follow-up improvement to my recent patch for
PR rtl-optimization/7061. That patch added the test case
gcc.target/i386/pr7061-2.c:
float im(float _Complex a) { return __imag__ a; }
For which GCC on x86_64 currently generates:
        movq    %xmm0, %rax
        shrq    $32, %rax
        movd    %eax, %xmm0
        ret
but with this patch we now generate (the same as LLVM):
        shufps  $85, %xmm0, %xmm0
        ret
This is achieved by providing a define_insn_and_split that allows
truncated lshiftrt:DI by 32 to be performed on either SSE or general
regs, where if the register allocator prefers to use SSE, we split
to a shufps_v4si, or if not, we use a regular shrq.
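To see why the two alternatives agree, here is a small portable model. It is a sketch for illustration only, not the define_insn_and_split itself: `shufps_same` and `highpart` are hypothetical helper names. The first emulates shufps with both operands equal, where each 2-bit field of the immediate selects a source element; the second is the general-register form, a logical right shift by 32 followed by truncation.

```c
#include <stdint.h>

/* Model of shufps with identical source and destination registers:
   result element i is source element (imm >> (2 * i)) & 3.
   With imm = 85 (binary 01010101) every field selects element 1. */
static void shufps_same(const uint32_t v[4], unsigned imm, uint32_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = v[(imm >> (2 * i)) & 3];
}

/* General-register alternative: the high 32 bits of a 64-bit value. */
static uint32_t highpart(uint64_t x)
{
    return (uint32_t)(x >> 32);
}
```

If elements 0 and 1 of the vector hold the low and high halves of the 64-bit value, then element 0 of the `shufps_same(v, 85, out)` result equals `highpart` of that value, which is the imaginary part in the `im` test case.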
2022-06-27 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR rtl-optimization/7061
* config/i386/i386.md (*highpartdisi2): New define_insn_and_split.
gcc/testsuite/ChangeLog
PR rtl-optimization/7061
* gcc.target/i386/pr7061-2.c: Update to look for shufps.