public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/96401] New: [nvptx] Take advantage of subword ld/st/cvt
@ 2020-07-31 13:36 vries at gcc dot gnu.org
2020-07-31 13:49 ` [Bug target/96401] " vries at gcc dot gnu.org
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: vries at gcc dot gnu.org @ 2020-07-31 13:36 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96401
Bug ID: 96401
Summary: [nvptx] Take advantage of subword ld/st/cvt
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: vries at gcc dot gnu.org
Target Milestone: ---
Consider test-case test.c:
...
void
foo (void)
{
volatile unsigned int v;
volatile unsigned short v2;
v2 = v;
}
...
With the current compiler, we have:
...
$ gcc test.c -S -o- -O2
...
.reg.u32 %r22;
.reg.u16 %r24;
ld.u32 %r22, [%frame];
cvt.u16.u32 %r24, %r22;
st.u16 [%frame+4], %r24;
}
...
As it happens, the nvptx manual states at 5.2.2 "Restricted Use of Sub-Word
Sizes":
...
For convenience, ld, st, and cvt instructions permit source and destination
data operands to be wider than the instruction-type size, so that narrow values
may be loaded, stored, and converted using regular-width registers. For
example, 8-bit or 16-bit values may be held directly in 32-bit or 64-bit
registers when being loaded, stored, or converted to other types and sizes.
...
In other words, we may emit instead:
...
.reg.u32 %r22;
ld.u32 %r22, [%frame];
st.u16 [%frame+4], %r22;
...
^ permalink raw reply [flat|nested] 4+ messages in thread
* [Bug target/96401] [nvptx] Take advantage of subword ld/st/cvt
2020-07-31 13:36 [Bug target/96401] New: [nvptx] Take advantage of subword ld/st/cvt vries at gcc dot gnu.org
@ 2020-07-31 13:49 ` vries at gcc dot gnu.org
2020-07-31 14:03 ` vries at gcc dot gnu.org
2020-07-31 14:14 ` vries at gcc dot gnu.org
2 siblings, 0 replies; 4+ messages in thread
From: vries at gcc dot gnu.org @ 2020-07-31 13:49 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96401
--- Comment #1 from Tom de Vries <vries at gcc dot gnu.org> ---
(In reply to Tom de Vries from comment #0)
> In other words, we may emit instead:
> ...
> .reg.u32 %r22;
> ld.u32 %r22, [%frame];
> st.u16 [%frame+4], %r22;
> ...
So, why don't we?
Using -dP we see the respective insns:
...
//(insn 5 2 6 2
// (set (reg:SI 22 [ v$0_1 ])
// (mem/v/c:SI (reg/f:DI 2 %frame) [1 v+0 S4 A128]))
// "test.c":7:6 6 {*movsi_insn}
// (nil))
ld.u32 %r22, [%frame]; // 5 [c=4] *movsi_insn/1
//(insn 6 5 9 2
// (set (reg:HI 24 [ v$0_1 ])
// (subreg:HI (reg:SI 22 [ v$0_1 ]) 0))
// "test.c":7:6 5 {*movhi_insn}
// (expr_list:REG_DEAD (reg:SI 22 [ v$0_1 ])
// (nil)))
cvt.u16.u32 %r24, %r22; // 6 [c=12] *movhi_insn/0
//(insn 9 6 12 2
// (set (mem/v/c:HI (plus:DI (reg/f:DI 2 %frame)
// (const_int 4 [0x4])) [2 v2+0 S2 A32])
// (reg:HI 24 [ v$0_1 ]))
// "test.c":7:6 5 {*movhi_insn}
// (expr_list:REG_DEAD (reg:HI 24 [ v$0_1 ])
// (nil)))
st.u16 [%frame+4], %r24; // 9 [c=4] *movhi_insn/2
...
I went to investigate why combine doesn't combine insns 6 and 9, that is, why
doesn't it generate:
...
//(insn 9 6 12 2
// (set (mem/v/c:HI (plus:DI (reg/f:DI 2 %frame)
// (const_int 4 [0x4])) [2 v2+0 S2 A32])
// (subreg:HI (reg:SI 22 [ v$0_1 ]) 0))
// "test.c":7:6 5 {*movhi_insn}
// (expr_list:REG_DEAD (reg:HI 22 [ v$0_1 ])
// (nil)))
...
Part of the required changes is to make the movhi_insn store alternative work
for subreg source operand:
...
@@ -229,8 +234,8 @@
(define_insn "*mov<mode>_insn"
[(set (match_operand:QHSDIM 0 "nonimmediate_operand" "=R,R,m")
- (match_operand:QHSDIM 1 "general_operand" "Ri,m,R"))]
- "!MEM_P (operands[0]) || REG_P (operands[1])"
+ (match_operand:QHSDIM 1 "general_operand" "Ri,m,Q"))]
+ "!MEM_P (operands[0]) || REG_P (operands[1]) || SUBREG_P (operands[1])"
{
if (which_alternative == 1)
return "%.\\tld%A1%u1\\t%0, %1;";
...
which required me to define:
...
+(define_constraint "Q"
+ "A pseudo register or subreg."
+ (ior (match_code "reg")
+ (match_code "subreg")))
+
...
[ Note that this constraint is an oddity, like the R constraint: it's not a
register constraint. ]
After debugging I found that I needed this as well:
...
diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index d2f321fcbcc..2234edad53b 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -6444,7 +6444,7 @@ nvptx_data_alignment (const_tree type, unsigned int
basic_align)
static bool
nvptx_modes_tieable_p (machine_mode, machine_mode)
{
- return false;
+ return true;
}
/* Implement TARGET_HARD_REGNO_NREGS. */
...
due to this bit in combine.c:subst():
...
/* In general, don't install a subreg involving two
modes not tieable. It can worsen register
allocation, and can even make invalid reload
insns, since the reg inside may need to be copied
from in the outside mode, and that may be invalid
if it is an fp reg copied in integer mode.
...
Using these changes, I get the desired:
...
.reg.u32 %r22;
ld.u32 %r22, [%frame];
st.u16 [%frame+4], %r22;
...
^ permalink raw reply [flat|nested] 4+ messages in thread
* [Bug target/96401] [nvptx] Take advantage of subword ld/st/cvt
2020-07-31 13:36 [Bug target/96401] New: [nvptx] Take advantage of subword ld/st/cvt vries at gcc dot gnu.org
2020-07-31 13:49 ` [Bug target/96401] " vries at gcc dot gnu.org
@ 2020-07-31 14:03 ` vries at gcc dot gnu.org
2020-07-31 14:14 ` vries at gcc dot gnu.org
2 siblings, 0 replies; 4+ messages in thread
From: vries at gcc dot gnu.org @ 2020-07-31 14:03 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96401
--- Comment #2 from Tom de Vries <vries at gcc dot gnu.org> ---
(In reply to Tom de Vries from comment #1)
> Using these changes, I get the desired:
> ...
> .reg.u32 %r22;
> ld.u32 %r22, [%frame];
> st.u16 [%frame+4], %r22;
> ...
And to be precise about it, that's starting at fwprop1 that we have two insns:
...
(insn 5 2 9 2
(set (reg:SI 22 [ v$0_1 ])
(mem/v/c:SI (reg/f:DI 2 %frame) [1 v+0 S4 A128]))
"test.c":7:6 6 {*movsi_insn}
(nil))
(insn 9 5 0 2
(set (mem/v/c:HI (plus:DI (reg/f:DI 2 %frame)
(const_int 4 [0x4])) [2 v2+0 S2 A32])
(subreg:HI (reg:SI 22 [ v$0_1 ]) 0))
"test.c":7:6 5 {*movhi_insn}
(expr_list:REG_DEAD (reg:SI 23 [ _2 ])
(nil)))
...
Which is a bit earlier (at 247r) than combine (at 271r).
^ permalink raw reply [flat|nested] 4+ messages in thread
* [Bug target/96401] [nvptx] Take advantage of subword ld/st/cvt
2020-07-31 13:36 [Bug target/96401] New: [nvptx] Take advantage of subword ld/st/cvt vries at gcc dot gnu.org
2020-07-31 13:49 ` [Bug target/96401] " vries at gcc dot gnu.org
2020-07-31 14:03 ` vries at gcc dot gnu.org
@ 2020-07-31 14:14 ` vries at gcc dot gnu.org
2 siblings, 0 replies; 4+ messages in thread
From: vries at gcc dot gnu.org @ 2020-07-31 14:14 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96401
--- Comment #3 from Tom de Vries <vries at gcc dot gnu.org> ---
Note that with the proposed TARGET_TRULY_NOOP_TRUNCATION -> false change (
https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549896.html ), we start out
with the same ptx insns, but with the cvt.u16.u32 a truncate instead of a
subreg move:
...
//(insn 5 2 6 2
// (set (reg:SI 22 [ v$0_1 ])
// (mem/v/c:SI (reg/f:DI 2 %frame) [1 v+0 S4 A128]))
// "test.c":7:6 6 {*movsi_insn}
// (nil))
ld.u32 %r22, [%frame]; // 5 [c=4] *movsi_insn/1
//(insn 6 5 9 2
// (set (reg:HI 24 [ v$0_1 ])
// (truncate:HI (reg:SI 22 [ v$0_1 ])))
"test.c":7:6 30 {truncsihi2}
// (expr_list:REG_DEAD (reg:SI 22 [ v$0_1 ])
// (nil)))
cvt.u16.u32 %r24, %r22; // 6 [c=4] truncsihi2/0
//(insn 9 6 12 2
// (set (mem/v/c:HI (plus:DI (reg/f:DI 2 %frame)
// (const_int 4 [0x4])) [2 v2+0 S2 A32])
// (reg:HI 24 [ v$0_1 ])) "test.c":7:6 5 {*movhi_insn}
// (expr_list:REG_DEAD (reg:HI 24 [ v$0_1 ])
// (nil)))
st.u16 [%frame+4], %r24; // 9 [c=4] *movhi_insn/2
...
Still, with the changes in comment 1 enabled we end up with the desired two
insns, though a bit later, at cse2 (265r), and not using movhi_insn:
...
(insn 9 5 0 2 (set (mem/v/c:HI (plus:DI (reg/f:DI 2 %frame)
(const_int 4 [0x4])) [2 v2+0 S2 A32])
(truncate:HI (reg:SI 22 [ v$0_1 ]))) "test.c":7:6 30 {truncsihi2}
(expr_list:REG_DEAD (reg:HI 24 [ v$0_1 ])
(nil)))
...
so we might get this just with the nvptx_modes_tieable_p change.
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2020-07-31 14:14 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-31 13:36 [Bug target/96401] New: [nvptx] Take advantage of subword ld/st/cvt vries at gcc dot gnu.org
2020-07-31 13:49 ` [Bug target/96401] " vries at gcc dot gnu.org
2020-07-31 14:03 ` vries at gcc dot gnu.org
2020-07-31 14:14 ` vries at gcc dot gnu.org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).