* [RFC PATCH, i386]: Allow zero_extended addresses (+ problems with reload and offsetable address, "o" constraint) @ 2011-08-05 18:51 Uros Bizjak 2011-08-07 12:39 ` Uros Bizjak 0 siblings, 1 reply; 7+ messages in thread From: Uros Bizjak @ 2011-08-05 18:51 UTC (permalink / raw) To: gcc-patches; +Cc: GCC Development [-- Attachment #1: Type: text/plain, Size: 2436 bytes --] Hello! Attached patch introduces generation of addr32 prefixed addresses, mainly intended to merge ZERO_EXTRACTed LEA calculations into address. After fixing various inconsistencies with "o" constraints, the patch works surprisingly well (in its current form fixes all reported problems in the PR [1]), but one problem remains w.r.t. handling of "o" constraint. Patched gcc ICEs on gcc.dg/torture/pr47744-2.c with: $ ~/gcc-build-fast/gcc/cc1 -O2 -mx32 -std=gnu99 -quiet pr47744-2.c pr47744-2.c: In function ‘matmul_i16’: pr47744-2.c:40:1: error: insn does not satisfy its constraints: (insn 116 66 67 4 (set (reg:TI 0 ax) (mem:TI (zero_extend:DI (plus:SI (reg:SI 4 si [orig:114 ivtmp.26 ] [114]) (reg:SI 5 di [orig:101 dest_y ] [101]))) [6 MEM[base: dest_y_18, index: ivtmp.26_53, offset: 0B]+0 S16 A128])) pr47744-2.c:34 60 {*movti_internal_rex64} (nil)) pr47744-2.c:40:1: internal compiler error: in reload_cse_simplify_operands, at postreload.c:403 Please submit a full bug report, ... ... due to the fact that the address is not offsetable, and plus ((zero_extend (...)) (const_int ...)) gets rejected from ix86_legitimate_address_p. However, the section "16.8.1 Simple Constraints" of the documentation claims: --quote-- * A nonoffsettable memory reference can be reloaded by copying the address into a register. So if the constraint uses the letter `o', all memory references are taken care of. --/quote-- As I read this sentence, the RTX is forced into a temporary register, and reload tries to satisfy "o" constraint with plus ((reg ...) (const_int ...)), as said at the introduction of "o" constraint a couple of pages earlier. Unfortunately, this does not seem to be the case. Is there anything wrong with my approach, or is there something wrong in reload? 2011-08-05 Uros Bizjak <ubizjak@gmail.com> PR target/49781 * config/i386/i386.c (ix86_decompose_address): Allow zero-extended SImode addresses. (ix86_print_operand_address): Handle zero-extended addresses. (memory_address_length): Add length of addr32 prefix for zero-extended addresses. * config/i386/predicates.md (lea_address_operand): Reject zero-extended operands. Patch is otherwise bootstrapped and tested on x86_64-pc-linux-gnu {,-m32} without regressions. [1] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49781 Thanks, Uros. [-- Attachment #2: p.diff.txt --] [-- Type: text/plain, Size: 2370 bytes --] Index: config/i386/predicates.md =================================================================== --- config/i386/predicates.md (revision 177456) +++ config/i386/predicates.md (working copy) @@ -801,6 +801,10 @@ struct ix86_address parts; int ok; + /* LEA handles zero-extend by itself. */ + if (GET_CODE (op) == ZERO_EXTEND) + return false; + ok = ix86_decompose_address (op, &parts); gcc_assert (ok); return parts.seg == SEG_DEFAULT; Index: config/i386/i386.c =================================================================== --- config/i386/i386.c (revision 177456) +++ config/i386/i386.c (working copy) @@ -11146,6 +11146,14 @@ ix86_decompose_address (rtx addr, struct ix86_addr int retval = 1; enum ix86_address_seg seg = SEG_DEFAULT; + /* Allow zero-extended SImode addresses, + they will be emitted with addr32 prefix. */ + if (TARGET_64BIT + && GET_CODE (addr) == ZERO_EXTEND + && GET_MODE (addr) == DImode + && GET_MODE (XEXP (addr, 0)) == SImode) + addr = XEXP (addr, 0); + if (REG_P (addr)) base = addr; else if (GET_CODE (addr) == SUBREG) @@ -14163,9 +14171,13 @@ ix86_print_operand_address (FILE *file, rtx addr) } else { - /* Print DImode registers on 64bit targets to avoid addr32 prefixes. */ - int code = TARGET_64BIT ? 'q' : 0; + int code = 0; + /* Print SImode registers for zero-extended addresses to force + addr32 prefix. Otherwise print DImode registers to avoid it. */ + if (TARGET_64BIT) + code = (GET_CODE (addr) == ZERO_EXTEND) ? 'l' : 'q'; + if (ASSEMBLER_DIALECT == ASM_ATT) { if (disp) @@ -21776,7 +21788,8 @@ assign_386_stack_local (enum machine_mode mode, en } \f /* Calculate the length of the memory address in the instruction - encoding. Does not include the one-byte modrm, opcode, or prefix. */ + encoding. Includes addr32 prefix, does not include the one-byte modrm, + opcode, or other prefixes. */ int memory_address_length (rtx addr) @@ -21803,8 +21816,10 @@ memory_address_length (rtx addr) base = parts.base; index = parts.index; disp = parts.disp; - len = 0; + /* Add length of addr32 prefix. */ + len = (GET_CODE (addr) == ZERO_EXTEND); + /* Rule of thumb: - esp as the base always wants an index, - ebp as the base always wants a displacement, ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC PATCH, i386]: Allow zero_extended addresses (+ problems with reload and offsetable address, "o" constraint) 2011-08-05 18:51 [RFC PATCH, i386]: Allow zero_extended addresses (+ problems with reload and offsetable address, "o" constraint) Uros Bizjak @ 2011-08-07 12:39 ` Uros Bizjak 2011-08-08 15:30 ` Ulrich Weigand 0 siblings, 1 reply; 7+ messages in thread From: Uros Bizjak @ 2011-08-07 12:39 UTC (permalink / raw) To: gcc-patches; +Cc: GCC Development, H.J. Lu On Fri, Aug 5, 2011 at 8:51 PM, Uros Bizjak <ubizjak@gmail.com> wrote: > As I read this sentence, the RTX is forced into a temporary register, > and reload tries to satisfy "o" constraint with plus ((reg ...) > (const_int ...)), as said at the introduction of "o" constraint a > couple of pages earlier. Unfortunately, this does not seem to be the > case. > > Is there anything wrong with my approach, or is there something wrong in reload? To answer my own question, the problem was in *add<dwi>3_doubleword pattern, defined as: (define_insn_and_split "*add<dwi>3_doubleword" [(set (match_operand:<DWI> 0 "nonimmediate_operand" "=r,o") (plus:<DWI> (match_operand:<DWI> 1 "nonimmediate_operand" "%0,0") (match_operand:<DWI> 2 "<general_operand>" "ro<di>,r<di>"))) (clobber (reg:CC FLAGS_REG))] "ix86_binary_operator_ok (PLUS, <DWI>mode, operands)" When reload tried to satisfy alternative 1 (the "o" and matching "0") with a non-offsettable (in this particular case, zero-extended) address, it CSE'd operand 0 and operand 1 to a temporary TImode register. Unfortunately a Timode move has its own constraints: (define_insn "*movti_internal_rex64" [(set (match_operand:TI 0 "nonimmediate_operand" "=!r,o,x,x,xm") (match_operand:TI 1 "general_operand" "riFo,riF,C,xm,x"))] "TARGET_64BIT && !(MEM_P (operands[0]) && MEM_P (operands[1]))" where move from/to a general register to/from non-offsettable memory is not valid. Although, it would be nice for reload to subsequently fix CSE'd non-offsetable memory by copying address to temporary reg (*as said in the documentation*), we could simply require an XMM temporary for TImode reloads to/from integer registers, and this fixes ICE for x32. The testcase to play with (gcc -O2 -mx32): --cut here-- void test (__int128 *array, int idx, int off) { __int128 *dest = &array [idx]; dest[0] += 1; dest[off] = 0; } --cut here-- So, following additional patch saves the day: Index: i386/i386.c =================================================================== --- i386/i386.c (revision 177536) +++ i386/i386.c (working copy) @@ -28233,6 +28248,15 @@ ix86_secondary_reload (bool in_p, rtx x, reg_class enum machine_mode mode, secondary_reload_info *sri ATTRIBUTE_UNUSED) { + /* Double-word spills from general registers to non-offsettable + memory references go through XMM register. Following code + handles zero-extended addresses on x32 target. */ + if (TARGET_64BIT + && GET_MODE_SIZE (mode) > UNITS_PER_WORD + && rclass == GENERAL_REGS + && !offsettable_memref_p (x)) + return SSE_REGS; + /* QImode spills from non-QI registers require intermediate register on 32bit targets. */ if (!TARGET_64BIT Uros. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC PATCH, i386]: Allow zero_extended addresses (+ problems with reload and offsetable address, "o" constraint) 2011-08-07 12:39 ` Uros Bizjak @ 2011-08-08 15:30 ` Ulrich Weigand 2011-08-08 17:12 ` Uros Bizjak 0 siblings, 1 reply; 7+ messages in thread From: Ulrich Weigand @ 2011-08-08 15:30 UTC (permalink / raw) To: Uros Bizjak; +Cc: gcc-patches, GCC Development, H.J. Lu Uros Bizjak wrote: > Although, it would be nice for reload to subsequently fix CSE'd > non-offsetable memory by copying address to temporary reg (*as said in > the documentation*), we could simply require an XMM temporary for > TImode reloads to/from integer registers, and this fixes ICE for x32. Moves are special as far as reload is concerned. If there is already a move instruction present *before* reload, it will get fixed up according to its constraints as any other instruction. However, reload will *introduce* new moves as part of its operation, and those will *not* themselves get reloaded. Instead, reload simply assumes that every plain move will just succeed without requiring any reload; if this is not true, the target *must* provide a secondary reload for this move. (Note that the secondary reload could also work by reloading the target address into a temporary; that's up to the target to implement.) Bye, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE Ulrich.Weigand@de.ibm.com ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC PATCH, i386]: Allow zero_extended addresses (+ problems with reload and offsetable address, "o" constraint) 2011-08-08 15:30 ` Ulrich Weigand @ 2011-08-08 17:12 ` Uros Bizjak 2011-08-08 17:14 ` H.J. Lu 2011-08-09 7:41 ` Uros Bizjak 0 siblings, 2 replies; 7+ messages in thread From: Uros Bizjak @ 2011-08-08 17:12 UTC (permalink / raw) To: Ulrich Weigand; +Cc: gcc-patches, GCC Development, H.J. Lu [-- Attachment #1: Type: text/plain, Size: 2145 bytes --] On Mon, Aug 8, 2011 at 5:30 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote: > Uros Bizjak wrote: > >> Although, it would be nice for reload to subsequently fix CSE'd >> non-offsetable memory by copying address to temporary reg (*as said in >> the documentation*), we could simply require an XMM temporary for >> TImode reloads to/from integer registers, and this fixes ICE for x32. > > Moves are special as far as reload is concerned. If there is already > a move instruction present *before* reload, it will get fixed up > according to its constraints as any other instruction. > > However, reload will *introduce* new moves as part of its operation, > and those will *not* themselves get reloaded. Instead, reload simply > assumes that every plain move will just succeed without requiring > any reload; if this is not true, the target *must* provide a > secondary reload for this move. > > (Note that the secondary reload could also work by reloading the > target address into a temporary; that's up to the target to > implement.) Whoa, indeed. Using attached patch that reloads memory address instead of going through XMM register, the code for the testcase improves from: test: .LFB0: .cfi_startproc pushq %rbx .cfi_def_cfa_offset 16 .cfi_offset 3, -16 sall $4, %esi addl %edi, %esi movdqa (%esi), %xmm0 movdqa %xmm0, -16(%rsp) movq -16(%rsp), %rcx movq -8(%rsp), %rbx addq $1, %rcx adcq $0, %rbx movq %rcx, -16(%rsp) sall $4, %edx movq %rbx, -8(%rsp) movdqa -16(%rsp), %xmm0 movdqa %xmm0, (%esi) pxor %xmm0, %xmm0 movdqa %xmm0, (%edx,%esi) popq %rbx .cfi_def_cfa_offset 8 ret to: test: .LFB0: .cfi_startproc sall $4, %esi pushq %rbx .cfi_def_cfa_offset 16 .cfi_offset 3, -16 addl %edi, %esi pxor %xmm0, %xmm0 mov %esi, %eax movq (%rax), %rcx movq 8(%rax), %rbx addq $1, %rcx adcq $0, %rbx sall $4, %edx movq %rcx, (%rax) movq %rbx, 8(%rax) movdqa %xmm0, (%edx,%esi) popq %rbx .cfi_def_cfa_offset 8 ret H.J., can you please test attached patch? This optimization won't trigger on x86_64 anymore. Thanks, Uros. [-- Attachment #2: z.diff.txt --] [-- Type: text/plain, Size: 2657 bytes --] Index: config/i386/i386.md =================================================================== --- config/i386/i386.md (revision 177565) +++ config/i386/i386.md (working copy) @@ -2073,6 +2073,40 @@ (const_string "orig"))) (set_attr "mode" "SI,DI,DI,DI,SI,DI,DI,DI,DI,DI,TI,DI,TI,DI,DI,DI,DI,DI")]) +;; Reload patterns to support multi-word load/store +;; with non-offsetable address. +(define_expand "reload_noff_store" + [(parallel [(match_operand 0 "memory_operand" "=m") + (match_operand 1 "register_operand" "r") + (match_operand:DI 2 "register_operand" "=&r")])] + "TARGET_64BIT" +{ + rtx mem = operands[0]; + rtx addr = XEXP (mem, 0); + + emit_move_insn (operands[2], addr); + mem = replace_equiv_address_nv (mem, operands[2]); + + emit_insn (gen_rtx_SET (VOIDmode, mem, operands[1])); + DONE; +}) + +(define_expand "reload_noff_load" + [(parallel [(match_operand 0 "register_operand" "=r") + (match_operand 1 "memory_operand" "m") + (match_operand:DI 2 "register_operand" "=r")])] + "TARGET_64BIT" +{ + rtx mem = operands[1]; + rtx addr = XEXP (mem, 0); + + emit_move_insn (operands[2], addr); + mem = replace_equiv_address_nv (mem, operands[2]); + + emit_insn (gen_rtx_SET (VOIDmode, operands[0], mem)); + DONE; +}) + ;; Convert impossible stores of immediate to existing instructions. ;; First try to get scratch register and go through it. In case this ;; fails, move by 32bit parts. Index: config/i386/i386.c =================================================================== --- config/i386/i386.c (revision 177566) +++ config/i386/i386.c (working copy) @@ -28245,18 +28245,25 @@ static reg_class_t ix86_secondary_reload (bool in_p, rtx x, reg_class_t rclass, - enum machine_mode mode, - secondary_reload_info *sri ATTRIBUTE_UNUSED) + enum machine_mode mode, secondary_reload_info *sri) { /* Double-word spills from general registers to non-offsettable memory - references (zero-extended addresses) go through XMM register. */ + references (zero-extended addresses) require special handling. */ if (TARGET_64BIT && MEM_P (x) && GET_MODE_SIZE (mode) > UNITS_PER_WORD && rclass == GENERAL_REGS && !offsettable_memref_p (x)) - return SSE_REGS; + { + sri->icode = (in_p + ? CODE_FOR_reload_noff_load + : CODE_FOR_reload_noff_store); + /* Add the cost of move to a temporary. */ + sri->extra_cost = 1; + return NO_REGS; + } + /* QImode spills from non-QI registers require intermediate register on 32bit targets. */ if (!TARGET_64BIT ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC PATCH, i386]: Allow zero_extended addresses (+ problems with reload and offsetable address, "o" constraint) 2011-08-08 17:12 ` Uros Bizjak @ 2011-08-08 17:14 ` H.J. Lu 2011-08-09 7:41 ` Uros Bizjak 1 sibling, 0 replies; 7+ messages in thread From: H.J. Lu @ 2011-08-08 17:14 UTC (permalink / raw) To: Uros Bizjak; +Cc: Ulrich Weigand, gcc-patches, GCC Development On Mon, Aug 8, 2011 at 10:11 AM, Uros Bizjak <ubizjak@gmail.com> wrote: > On Mon, Aug 8, 2011 at 5:30 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote: >> Uros Bizjak wrote: >> >>> Although, it would be nice for reload to subsequently fix CSE'd >>> non-offsetable memory by copying address to temporary reg (*as said in >>> the documentation*), we could simply require an XMM temporary for >>> TImode reloads to/from integer registers, and this fixes ICE for x32. >> >> Moves are special as far as reload is concerned. If there is already >> a move instruction present *before* reload, it will get fixed up >> according to its constraints as any other instruction. >> >> However, reload will *introduce* new moves as part of its operation, >> and those will *not* themselves get reloaded. Instead, reload simply >> assumes that every plain move will just succeed without requiring >> any reload; if this is not true, the target *must* provide a >> secondary reload for this move. >> >> (Note that the secondary reload could also work by reloading the >> target address into a temporary; that's up to the target to >> implement.) > > Whoa, indeed. > > Using attached patch that reloads memory address instead of going > through XMM register, the code for the testcase improves from: > > test: > .LFB0: > .cfi_startproc > pushq %rbx > .cfi_def_cfa_offset 16 > .cfi_offset 3, -16 > sall $4, %esi > addl %edi, %esi > movdqa (%esi), %xmm0 > movdqa %xmm0, -16(%rsp) > movq -16(%rsp), %rcx > movq -8(%rsp), %rbx > addq $1, %rcx > adcq $0, %rbx > movq %rcx, -16(%rsp) > sall $4, %edx > movq %rbx, -8(%rsp) > movdqa -16(%rsp), %xmm0 > movdqa %xmm0, (%esi) > pxor %xmm0, %xmm0 > movdqa %xmm0, (%edx,%esi) > popq %rbx > .cfi_def_cfa_offset 8 > ret > > to: > > test: > .LFB0: > .cfi_startproc > sall $4, %esi > pushq %rbx > .cfi_def_cfa_offset 16 > .cfi_offset 3, -16 > addl %edi, %esi > pxor %xmm0, %xmm0 > mov %esi, %eax > movq (%rax), %rcx > movq 8(%rax), %rbx > addq $1, %rcx > adcq $0, %rbx > sall $4, %edx > movq %rcx, (%rax) > movq %rbx, 8(%rax) > movdqa %xmm0, (%edx,%esi) > popq %rbx > .cfi_def_cfa_offset 8 > ret > > H.J., can you please test attached patch? This optimization won't > trigger on x86_64 anymore. > I will test it. Thanks. -- H.J. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC PATCH, i386]: Allow zero_extended addresses (+ problems with reload and offsetable address, "o" constraint) 2011-08-08 17:12 ` Uros Bizjak 2011-08-08 17:14 ` H.J. Lu @ 2011-08-09 7:41 ` Uros Bizjak 2011-08-09 15:40 ` H.J. Lu 1 sibling, 1 reply; 7+ messages in thread From: Uros Bizjak @ 2011-08-09 7:41 UTC (permalink / raw) To: Ulrich Weigand; +Cc: gcc-patches, GCC Development, H.J. Lu On Mon, Aug 8, 2011 at 7:11 PM, Uros Bizjak <ubizjak@gmail.com> wrote: >> Moves are special as far as reload is concerned. If there is already >> a move instruction present *before* reload, it will get fixed up >> according to its constraints as any other instruction. >> >> However, reload will *introduce* new moves as part of its operation, >> and those will *not* themselves get reloaded. Instead, reload simply >> assumes that every plain move will just succeed without requiring >> any reload; if this is not true, the target *must* provide a >> secondary reload for this move. >> >> (Note that the secondary reload could also work by reloading the >> target address into a temporary; that's up to the target to >> implement.) > > Whoa, indeed. > > Using attached patch that reloads memory address instead of going > through XMM register, the code for the testcase improves from: Committed to mainline with following ChangeLog entry: 2011-08-09 Uros Bizjak <ubizjak@gmail.com> PR target/49781 * config/i386/i386.md (reload_noff_load): New. (reload_noff_store): Ditto. * config/i386/i386.c (ix86_secondary_reload): Use CODE_FOR_reload_noff_load and CODE_FOR_reload_noff_store to handle double-word moves from/to non-offsetable addresses instead of generating XMM temporary. Re-bootstrapped and re-tested on x86_64-pc-linux-gnu {,-m32}. Uros. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC PATCH, i386]: Allow zero_extended addresses (+ problems with reload and offsetable address, "o" constraint) 2011-08-09 7:41 ` Uros Bizjak @ 2011-08-09 15:40 ` H.J. Lu 0 siblings, 0 replies; 7+ messages in thread From: H.J. Lu @ 2011-08-09 15:40 UTC (permalink / raw) To: Uros Bizjak; +Cc: Ulrich Weigand, gcc-patches, GCC Development On Tue, Aug 9, 2011 at 12:40 AM, Uros Bizjak <ubizjak@gmail.com> wrote: > On Mon, Aug 8, 2011 at 7:11 PM, Uros Bizjak <ubizjak@gmail.com> wrote: > >>> Moves are special as far as reload is concerned. If there is already >>> a move instruction present *before* reload, it will get fixed up >>> according to its constraints as any other instruction. >>> >>> However, reload will *introduce* new moves as part of its operation, >>> and those will *not* themselves get reloaded. Instead, reload simply >>> assumes that every plain move will just succeed without requiring >>> any reload; if this is not true, the target *must* provide a >>> secondary reload for this move. >>> >>> (Note that the secondary reload could also work by reloading the >>> target address into a temporary; that's up to the target to >>> implement.) >> >> Whoa, indeed. >> >> Using attached patch that reloads memory address instead of going >> through XMM register, the code for the testcase improves from: > > Committed to mainline with following ChangeLog entry: > > 2011-08-09 Uros Bizjak <ubizjak@gmail.com> > > PR target/49781 > * config/i386/i386.md (reload_noff_load): New. > (reload_noff_store): Ditto. > * config/i386/i386.c (ix86_secondary_reload): Use > CODE_FOR_reload_noff_load and CODE_FOR_reload_noff_store to handle > double-word moves from/to non-offsetable addresses instead of > generating XMM temporary. > > Re-bootstrapped and re-tested on x86_64-pc-linux-gnu {,-m32}. > No regressions on x32 with GCC, glibc and SPEC CPU 2K/2006. Thanks. -- H.J. ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2011-08-09 15:40 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2011-08-05 18:51 [RFC PATCH, i386]: Allow zero_extended addresses (+ problems with reload and offsetable address, "o" constraint) Uros Bizjak 2011-08-07 12:39 ` Uros Bizjak 2011-08-08 15:30 ` Ulrich Weigand 2011-08-08 17:12 ` Uros Bizjak 2011-08-08 17:14 ` H.J. Lu 2011-08-09 7:41 ` Uros Bizjak 2011-08-09 15:40 ` H.J. Lu
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).