public inbox for gcc-bugs@sourceware.org
* [Bug c/104674] New: i686 sse2: The two results of __divmoddi4 are mixed up
From: xavier.leroy at inria dot fr @ 2022-02-24 9:54 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104674
Bug ID: 104674
Summary: i686 sse2: The two results of __divmoddi4 are mixed up
Product: gcc
Version: 11.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: xavier.leroy at inria dot fr
Target Milestone: ---
Created attachment 52505
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52505&action=edit
Repro case
Configuration: GCC 11.2.0 configured for i686-pc-cygwin, as packaged by Cygwin,
running under Cygwin.
When compiled with
gcc -O2 -mfpmath=sse -msse2
the attached repro case incorrectly prints
888888.08889
The correct result is 888888.00000.
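For readers without access to the attachment, here is a minimal sketch of the kind of code that exercises this path. It is a hypothetical reconstruction chosen to be consistent with the numbers above (888888 + 888888/1e7 rounds to 888888.08889), not the attached repro case; the function name and the input value are assumptions.

```c
/* Hypothetical reconstruction, NOT the attached repro case.
   Splits a 64-bit count into a whole part and a fractional part
   expressed in units of 1e-7, then recombines them as a double. */
__attribute__((noinline))
static double whole_plus_fraction(long long ticks)
{
    long long whole = ticks / 10000000;   /* quotient (64-bit) */
    int frac = (int)(ticks % 10000000);   /* remainder, fits in 32 bits */
    double s = (double)whole;             /* 64-bit integer to double */
    return s + (double)frac / 1e7;        /* 32-bit integer to double */
}
```

Printing `whole_plus_fraction(8888880000000LL)` with `"%.5f"` should give 888888.00000 on a correct compiler; if the remainder were replaced by the low half of the quotient, the result would round to the 888888.08889 reported above.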
"gcc -O2" and "gcc -O1 -mfpmath=sse -msse2" produce the correct result.
Looking at the assembly code generated by "gcc -O2 -mfpmath=sse -msse2", we see
that the two results of __divmoddi4 (the quotient and the remainder) end up
stored at the same stack location:
leal 40(%esp), %eax <-- address where to store the remainder
movl 68(%esp), %edx
movl $10000000, 8(%esp)
movl %eax, 16(%esp)
movl 64(%esp), %eax
movl $0, 12(%esp)
movl %eax, (%esp)
movl %edx, 4(%esp)
call ___divmoddi4 <-- quotient is in edx:eax
movsd LC1, %xmm2
movd %eax, %xmm0
movd %edx, %xmm3
punpckldq %xmm3, %xmm0 <-- quotient is in xmm0
movq %xmm0, 40(%esp) <-- overwrites the remainder
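The calling convention visible in the listing (quotient returned in registers, remainder written through a pointer passed as an extra argument) can be sketched in C as follows. `divmod64` is a hypothetical stand-in written for illustration, not libgcc's `__divmoddi4` itself, and `recombine` is likewise an assumed name.

```c
#include <stdint.h>

/* Hypothetical stand-in for a combined divide-and-modulo helper:
   returns the quotient and stores the remainder through `rem`. */
static int64_t divmod64(int64_t a, int64_t b, int64_t *rem)
{
    *rem = a % b;
    return a / b;
}

/* The caller must keep `rem` in memory that nothing else writes
   until the remainder has been read back. */
static double recombine(int64_t x)
{
    int64_t rem;                                 /* stack slot for the remainder */
    int64_t quot = divmod64(x, 10000000, &rem);  /* quotient comes back by value */
    return (double)quot + (double)(int32_t)rem / 1e7;
}
```

In the miscompiled code, the store of the quotient to 40(%esp) plays the role of a write to `rem` between the call and the read.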
Regards,
- Xavier Leroy
* [Bug target/104674] [11/12 Regression] i686 sse2: The two results of __divmoddi4 are mixed up
From: rguenth at gcc dot gnu.org @ 2022-02-24 10:04 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104674
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords| |needs-bisection
Ever confirmed|0 |1
Status|UNCONFIRMED |NEW
Component|middle-end |target
Last reconfirmed| |2022-02-24
Target Milestone|--- |11.3
Target|i686-pc-cygwin |i?86-*-*
Summary|i686 sse2: The two results |[11/12 Regression] i686
|of __divmoddi4 are mixed up |sse2: The two results of
| |__divmoddi4 are mixed up
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed. It seems to work for me with GCC 10 but that uses separate div/mod
calls.
* [Bug target/104674] [11/12 Regression] i686 sse2: The two results of __divmoddi4 are mixed up
From: jakub at gcc dot gnu.org @ 2022-02-24 10:24 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104674
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords|needs-bisection |
CC| |jakub at gcc dot gnu.org
Assignee|unassigned at gcc dot gnu.org |jakub at gcc dot gnu.org
Priority|P3 |P2
Status|NEW |ASSIGNED
--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Started with my r11-3671-gbf510679bb3f9bfd6019666065016bb26a5b5466.
I'll have a look.
* [Bug target/104674] [11/12 Regression] i686 sse2: The two results of __divmoddi4 are mixed up
From: jakub at gcc dot gnu.org @ 2022-02-24 12:25 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104674
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |uros at gcc dot gnu.org
--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
So, when emitting the __divmoddi4 call, expand_DIVMOD ->
ix86_expand_divmod_libfunc calls
assign_386_stack_local (E_DImode, SLOT_TEMP)
to obtain a temporary stack slot for the remainder.
(mem:DI (plus:SI (frame) (const_int -8)))
is what is returned and the IL looks reasonable e.g. in vregs:
(insn 12 6 13 2 (parallel [
(set (reg:SI 97)
(plus:SI (reg/f:SI 19 frame)
(const_int -8 [0xfffffffffffffff8])))
(clobber (reg:CC 17 flags))
]) 229 {*addsi_1}
(nil))
...
(insn 19 18 20 2 (set (reg:DI 89 [ divmod_tmp_15 ])
(reg:DI 0 ax)) 80 {*movdi_internal}
(nil))
(insn 20 19 21 2 (set (reg:DI 90 [ divmod_tmp_15+8 ])
(mem/c:DI (plus:SI (reg/f:SI 19 frame)
(const_int -8 [0xfffffffffffffff8])) [0 S8 A64])) 80
{*movdi_internal}
(nil))
...
(insn 25 24 26 2 (set (reg/v:DF 87 [ s ])
(float:DF (reg:DI 89 [ divmod_tmp_15 ]))) "pr104674.c":8:10 214
{*floatdidf2_i387}
(nil))
...
(insn 30 29 31 2 (set (reg:DF 98)
(float:DF (reg:SI 104 [ divmod_tmp_15+8 ]))) "pr104674.c":9:14 207
{*floatsidf2}
(expr_list:REG_DEAD (reg:SI 104 [ divmod_tmp_15+8 ])
(nil)))
i.e. it first loads from the temporary slot and only afterwards does some
further operations on the results.
Later on that insn 20 becomes
(insn 67 19 21 2 (set (reg:SI 104 [ divmod_tmp_15+8 ])
(mem/c:SI (plus:SI (reg/f:SI 19 frame)
(const_int -8 [0xfffffffffffffff8])) [0 S4 A64])) 81
{*movsi_internal}
(nil))
but it is still ok. Combine propagates that memory load into a later insn
though, so we have:
...
(insn 70 18 19 2 (set (reg:DI 106)
(reg:DI 0 ax)) -1
(expr_list:REG_DEAD (reg:DI 0 ax)
(nil)))
...
(insn 25 24 26 2 (set (reg/v:DF 87 [ s ])
(float:DF (reg:DI 106))) "pr104674.c":8:10 214 {*floatdidf2_i387}
(expr_list:REG_DEAD (reg:DI 106)
(nil)))
...
(insn 30 29 31 2 (set (reg:DF 98)
(float:DF (mem/c:SI (plus:SI (reg/f:SI 19 frame)
(const_int -8 [0xfffffffffffffff8])) [0 S4 A64])))
"pr104674.c":9:14 207 {*floatsidf2}
(nil))
i.e. it effectively extended the lifetime of the DImode SLOT_TEMP (well, the low
SImode part of it) across insn 25.
But then the split1 pass splits the:
(insn 25 24 26 2 (set (reg/v:DF 87 [ s ])
(float:DF (reg:DI 106))) "pr104674.c":8:10 214 {*floatdidf2_i387}
(expr_list:REG_DEAD (reg:DI 106)
(nil)))
insn into:
(insn 72 24 26 2 (parallel [
(set (reg/v:DF 87 [ s ])
(float:DF (reg:DI 106)))
(clobber (mem/c:DI (plus:SI (reg/f:SI 19 frame)
(const_int -8 [0xfffffffffffffff8])) [0 S8 A64]))
(clobber (scratch:V4SI))
(clobber (scratch:V4SI))
]) "pr104674.c":8:10 -1
(nil))
and for that it uses assign_386_stack_local (E_DImode, SLOT_TEMP), which returns
the same temporary slot, which is unfortunately live across that instruction:
;; Avoid store forwarding (partial memory) stall penalty
;; by passing DImode value through XMM registers. */
(define_split
[(set (match_operand:X87MODEF 0 "register_operand")
(float:X87MODEF
(match_operand:DI 1 "register_operand")))]
"!TARGET_64BIT && TARGET_INTER_UNIT_MOVES_TO_VEC
&& TARGET_80387 && X87_ENABLE_FLOAT (<X87MODEF:MODE>mode, DImode)
&& TARGET_SSE2 && optimize_function_for_speed_p (cfun)
&& can_create_pseudo_p ()"
[(const_int 0)]
{
emit_insn (gen_floatdi<mode>2_i387_with_xmm
(operands[0], operands[1],
assign_386_stack_local (DImode, SLOT_TEMP)));
DONE;
})
From what I can see, SLOT_TEMP is used in:
i386.md: assign_386_stack_local (DImode, SLOT_TEMP)));
i386.md: assign_386_stack_local (DImode, SLOT_TEMP)));
sync.md: assign_386_stack_local (DImode, SLOT_TEMP)));
sync.md: assign_386_stack_local (DImode, SLOT_TEMP)));
i386-expand.cc: target = assign_386_stack_local (SImode, SLOT_TEMP);
i386-expand.cc: target = assign_386_stack_local (SImode, SLOT_TEMP);
i386-expand.cc: rtx rem = assign_386_stack_local (mode, SLOT_TEMP);
and except for this define_split, all other uses are either in some builtin's
expansion or in a define_expand. Those look good, but for this define_split I
don't think anything can guarantee that SLOT_TEMP isn't live across the insn
being split, so we need to use a different SLOT_* kind there.
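The aliasing described above can be illustrated with a toy model in C. This is a sketch under stated assumptions, not GCC code: it assumes only that a request for a given slot kind always hands back the same storage within a function, and all names here (`get_slot`, `remainder_after_spill`, `SLOT_SPLIT_SCRATCH`) are hypothetical.

```c
#include <stdint.h>

/* Toy model: one 64-bit stack slot per slot kind. */
enum slot_kind { SLOT_TEMP, SLOT_SPLIT_SCRATCH, SLOT_KIND_COUNT };

static int64_t slots[SLOT_KIND_COUNT];

/* Every request for the same kind returns the same storage. */
static int64_t *get_slot(enum slot_kind kind)
{
    return &slots[kind];
}

/* Models the problematic ordering: the remainder is stored in a
   SLOT_TEMP slot, the quotient is then spilled through a scratch slot
   of kind `scratch_kind`, and only afterwards is the remainder read. */
static int64_t remainder_after_spill(enum slot_kind scratch_kind,
                                     int64_t quotient, int64_t remainder)
{
    int64_t *rem_slot = get_slot(SLOT_TEMP);
    *rem_slot = remainder;                      /* written by the divmod call */
    int64_t *scratch = get_slot(scratch_kind);
    *scratch = quotient;                        /* spill done by the split insn */
    return *rem_slot;                           /* delayed load of the remainder */
}
```

With `scratch_kind == SLOT_TEMP` the function returns the quotient instead of the remainder; with a distinct kind it returns the remainder unchanged, which is the effect a separate SLOT_* kind is meant to achieve.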
* [Bug target/104674] [11/12 Regression] i686 sse2: The two results of __divmoddi4 are mixed up
From: jakub at gcc dot gnu.org @ 2022-02-24 12:29 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104674
--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Seems similar to PR78791.
* [Bug target/104674] [11/12 Regression] i686 sse2: The two results of __divmoddi4 are mixed up
From: jakub at gcc dot gnu.org @ 2022-02-24 13:03 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104674
--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 52508
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52508&action=edit
gcc12-pr104674.patch
Untested fix.
* [Bug target/104674] [11/12 Regression] i686 sse2: The two results of __divmoddi4 are mixed up
From: cvs-commit at gcc dot gnu.org @ 2022-02-25 11:07 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104674
--- Comment #6 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:
https://gcc.gnu.org/g:eabf7bbe601f2c0d87bd0a1012d7a602df2037da
commit r12-7388-geabf7bbe601f2c0d87bd0a1012d7a602df2037da
Author: Jakub Jelinek <jakub@redhat.com>
Date: Fri Feb 25 12:06:52 2022 +0100
i386: Use a new temp slot kind for splitter to floatdi<mode>2_i387_with_xmm [PR104674]
As mentioned in the PR, the following testcase is miscompiled for similar
reasons as the already fixed PR78791 - we use SLOT_TEMP slots in various
places during expansion and during expansion we can guarantee that the
lifetimes of those temporary slots don't overlap. But the following
splitter uses SLOT_TEMP too and in between expansion and split1 there is
a possibility that something extends the lifetime of SLOT_TEMP created
slots across an instruction that will be split by this splitter.
The following patch fixes it by using a new temp slot kind to make sure
it doesn't reuse a SLOT_TEMP that could be live across the instruction.
2022-02-25 Jakub Jelinek <jakub@redhat.com>
PR target/104674
* config/i386/i386.h (enum ix86_stack_slot): Add SLOT_FLOATxFDI_387.
* config/i386/i386.md (splitter to floatdi<mode>2_i387_with_xmm): Use
SLOT_FLOATxFDI_387 rather than SLOT_TEMP.
* gcc.target/i386/pr104674.c: New test.
* [Bug target/104674] [11 Regression] i686 sse2: The two results of __divmoddi4 are mixed up
From: jakub at gcc dot gnu.org @ 2022-02-25 11:15 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104674
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|[11/12 Regression] i686 |[11 Regression] i686 sse2:
|sse2: The two results of |The two results of
|__divmoddi4 are mixed up |__divmoddi4 are mixed up
--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Fixed on the trunk so far.
* [Bug target/104674] [11 Regression] i686 sse2: The two results of __divmoddi4 are mixed up
From: cvs-commit at gcc dot gnu.org @ 2022-03-29 5:53 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104674
--- Comment #8 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-11 branch has been updated by Jakub Jelinek
<jakub@gcc.gnu.org>:
https://gcc.gnu.org/g:acb9ea44fcceea0a54a89c7f94af4338c10759ef
commit r11-9720-gacb9ea44fcceea0a54a89c7f94af4338c10759ef
Author: Jakub Jelinek <jakub@redhat.com>
Date: Fri Feb 25 12:06:52 2022 +0100
i386: Use a new temp slot kind for splitter to floatdi<mode>2_i387_with_xmm [PR104674]
As mentioned in the PR, the following testcase is miscompiled for similar
reasons as the already fixed PR78791 - we use SLOT_TEMP slots in various
places during expansion and during expansion we can guarantee that the
lifetimes of those temporary slots don't overlap. But the following
splitter uses SLOT_TEMP too and in between expansion and split1 there is
a possibility that something extends the lifetime of SLOT_TEMP created
slots across an instruction that will be split by this splitter.
The following patch fixes it by using a new temp slot kind to make sure
it doesn't reuse a SLOT_TEMP that could be live across the instruction.
2022-02-25 Jakub Jelinek <jakub@redhat.com>
PR target/104674
* config/i386/i386.h (enum ix86_stack_slot): Add SLOT_FLOATxFDI_387.
* config/i386/i386.md (splitter to floatdi<mode>2_i387_with_xmm): Use
SLOT_FLOATxFDI_387 rather than SLOT_TEMP.
* gcc.target/i386/pr104674.c: New test.
(cherry picked from commit eabf7bbe601f2c0d87bd0a1012d7a602df2037da)
* [Bug target/104674] [11 Regression] i686 sse2: The two results of __divmoddi4 are mixed up
From: jakub at gcc dot gnu.org @ 2022-03-30 8:15 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104674
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Status|ASSIGNED |RESOLVED
--- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Fixed for 11.3 too.