public inbox for gcc-bugs@sourceware.org
* [Bug c/104674] New: i686 sse2: The two results of __divmoddi4 are mixed up
@ 2022-02-24  9:54 xavier.leroy at inria dot fr
From: xavier.leroy at inria dot fr @ 2022-02-24  9:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104674

            Bug ID: 104674
           Summary: i686 sse2: The two results of __divmoddi4 are mixed up
           Product: gcc
           Version: 11.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: xavier.leroy at inria dot fr
  Target Milestone: ---

Created attachment 52505
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52505&action=edit
Repro case

Configuration: GCC 11.2.0 configured for i686-pc-cygwin, as packaged by Cygwin,
running under Cygwin.

When compiled with 

gcc -O2 -mfpmath=sse -msse2

the attached repro case incorrectly prints

888888.08889

The correct result is 888888.00000.

"gcc -O2" and "gcc -O1 -mfpmath=sse -msse2" produce the correct result.

Looking at the assembly code generated by "gcc -O2 -mfpmath=sse -msse2", we see
that the two results of __divmoddi4 (the quotient and the remainder) end up
stored at the same stack location:

        leal    40(%esp), %eax       <-- address where to store the remainder
        movl    68(%esp), %edx
        movl    $10000000, 8(%esp)
        movl    %eax, 16(%esp)
        movl    64(%esp), %eax
        movl    $0, 12(%esp)
        movl    %eax, (%esp)
        movl    %edx, 4(%esp)
        call    ___divmoddi4         <-- quotient is in edx:eax
        movsd   LC1, %xmm2
        movd    %eax, %xmm0
        movd    %edx, %xmm3
        punpckldq       %xmm3, %xmm0 <-- quotient is in xmm0
        movq    %xmm0, 40(%esp)      <-- overwrites the remainder

Regards,

- Xavier Leroy

* [Bug target/104674] [11/12 Regression] i686 sse2: The two results of __divmoddi4 are mixed up
From: rguenth at gcc dot gnu.org @ 2022-02-24 10:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104674

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |needs-bisection
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
          Component|middle-end                  |target
   Last reconfirmed|                            |2022-02-24
   Target Milestone|---                         |11.3
             Target|i686-pc-cygwin              |i?86-*-*
            Summary|i686 sse2: The two results  |[11/12 Regression] i686
                   |of __divmoddi4 are mixed up |sse2: The two results of
                   |                            |__divmoddi4 are mixed up

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.  It seems to work for me with GCC 10 but that uses separate div/mod
calls.

* [Bug target/104674] [11/12 Regression] i686 sse2: The two results of __divmoddi4 are mixed up
From: jakub at gcc dot gnu.org @ 2022-02-24 10:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104674

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|needs-bisection             |
                 CC|                            |jakub at gcc dot gnu.org
           Assignee|unassigned at gcc dot gnu.org      |jakub at gcc dot gnu.org
           Priority|P3                          |P2
             Status|NEW                         |ASSIGNED

--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Started with my r11-3671-gbf510679bb3f9bfd6019666065016bb26a5b5466
I'll have a look.

* [Bug target/104674] [11/12 Regression] i686 sse2: The two results of __divmoddi4 are mixed up
From: jakub at gcc dot gnu.org @ 2022-02-24 12:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104674

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |uros at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
So, when emitting the __divmoddi4 call, expand_DIVMOD ->
ix86_expand_divmod_libfunc calls
assign_386_stack_local (E_DImode, SLOT_TEMP)
to obtain a temporary stack slot for the remainder.
(mem:DI (plus:SI (frame) (const_int -8)))
is what is returned and the IL looks reasonable e.g. in vregs:
(insn 12 6 13 2 (parallel [
            (set (reg:SI 97)
                (plus:SI (reg/f:SI 19 frame)
                    (const_int -8 [0xfffffffffffffff8])))
            (clobber (reg:CC 17 flags))
        ]) 229 {*addsi_1}
     (nil))
...
(insn 19 18 20 2 (set (reg:DI 89 [ divmod_tmp_15 ])
        (reg:DI 0 ax)) 80 {*movdi_internal}
     (nil))
(insn 20 19 21 2 (set (reg:DI 90 [ divmod_tmp_15+8 ])
        (mem/c:DI (plus:SI (reg/f:SI 19 frame)
                (const_int -8 [0xfffffffffffffff8])) [0  S8 A64])) 80 {*movdi_internal}
     (nil))
...
(insn 25 24 26 2 (set (reg/v:DF 87 [ s ])
        (float:DF (reg:DI 89 [ divmod_tmp_15 ]))) "pr104674.c":8:10 214 {*floatdidf2_i387}
     (nil))
...
(insn 30 29 31 2 (set (reg:DF 98)
        (float:DF (reg:SI 104 [ divmod_tmp_15+8 ]))) "pr104674.c":9:14 207 {*floatsidf2}
     (expr_list:REG_DEAD (reg:SI 104 [ divmod_tmp_15+8 ])
        (nil)))
i.e. it first loads from the temporary slot and only afterwards does some
further operations on the results.
Later on that insn 20 becomes
(insn 67 19 21 2 (set (reg:SI 104 [ divmod_tmp_15+8 ])
        (mem/c:SI (plus:SI (reg/f:SI 19 frame)
                (const_int -8 [0xfffffffffffffff8])) [0  S4 A64])) 81 {*movsi_internal}
     (nil))
but it is still ok.  Combine propagates that memory load into a later insn
though, so we have:
...
(insn 70 18 19 2 (set (reg:DI 106)
        (reg:DI 0 ax)) -1
     (expr_list:REG_DEAD (reg:DI 0 ax)
        (nil)))
...
(insn 25 24 26 2 (set (reg/v:DF 87 [ s ])
        (float:DF (reg:DI 106))) "pr104674.c":8:10 214 {*floatdidf2_i387}
     (expr_list:REG_DEAD (reg:DI 106)
        (nil)))
...
(insn 30 29 31 2 (set (reg:DF 98)
        (float:DF (mem/c:SI (plus:SI (reg/f:SI 19 frame)
                    (const_int -8 [0xfffffffffffffff8])) [0  S4 A64]))) "pr104674.c":9:14 207 {*floatsidf2}
     (nil))
i.e. effectively it extended the lifetime of the DImode SLOT_TEMP slot (well,
its low SImode part) across insn 25.
But then the split1 pass splits the insn
(insn 25 24 26 2 (set (reg/v:DF 87 [ s ])
        (float:DF (reg:DI 106))) "pr104674.c":8:10 214 {*floatdidf2_i387}
     (expr_list:REG_DEAD (reg:DI 106)
        (nil)))
into:
(insn 72 24 26 2 (parallel [
            (set (reg/v:DF 87 [ s ])
                (float:DF (reg:DI 106)))
            (clobber (mem/c:DI (plus:SI (reg/f:SI 19 frame)
                        (const_int -8 [0xfffffffffffffff8])) [0  S8 A64]))
            (clobber (scratch:V4SI))
            (clobber (scratch:V4SI))
        ]) "pr104674.c":8:10 -1
     (nil))
and the splitter obtains that clobbered slot from assign_386_stack_local (DImode,
SLOT_TEMP), which returns the same temporary slot that is unfortunately live
across that instruction:
;; Avoid store forwarding (partial memory) stall penalty
;; by passing DImode value through XMM registers.  */

(define_split
  [(set (match_operand:X87MODEF 0 "register_operand")
        (float:X87MODEF
          (match_operand:DI 1 "register_operand")))]
  "!TARGET_64BIT && TARGET_INTER_UNIT_MOVES_TO_VEC
   && TARGET_80387 && X87_ENABLE_FLOAT (<X87MODEF:MODE>mode, DImode)
   && TARGET_SSE2 && optimize_function_for_speed_p (cfun)
   && can_create_pseudo_p ()"
  [(const_int 0)]
{
  emit_insn (gen_floatdi<mode>2_i387_with_xmm
             (operands[0], operands[1],
              assign_386_stack_local (DImode, SLOT_TEMP)));
  DONE;
})

From what I can see, SLOT_TEMP is used in:
i386.md:              assign_386_stack_local (DImode, SLOT_TEMP)));
i386.md:                   assign_386_stack_local (DImode, SLOT_TEMP)));
sync.md:                assign_386_stack_local (DImode, SLOT_TEMP)));
sync.md:                  assign_386_stack_local (DImode, SLOT_TEMP)));
i386-expand.cc:      target = assign_386_stack_local (SImode, SLOT_TEMP);
i386-expand.cc:      target = assign_386_stack_local (SImode, SLOT_TEMP);
i386-expand.cc:  rtx rem = assign_386_stack_local (mode, SLOT_TEMP);
Except for this define_split, all other uses are either in some builtin's
expansion or in a define_expand, and those look good.  In this define_split,
however, I think we can't guarantee that SLOT_TEMP isn't live across the insn
being split, so we need to use a different SLOT_* kind there.

* [Bug target/104674] [11/12 Regression] i686 sse2: The two results of __divmoddi4 are mixed up
From: jakub at gcc dot gnu.org @ 2022-02-24 12:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104674

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Seems similar to PR78791.

* [Bug target/104674] [11/12 Regression] i686 sse2: The two results of __divmoddi4 are mixed up
From: jakub at gcc dot gnu.org @ 2022-02-24 13:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104674

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 52508
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52508&action=edit
gcc12-pr104674.patch

Untested fix.

* [Bug target/104674] [11/12 Regression] i686 sse2: The two results of __divmoddi4 are mixed up
From: cvs-commit at gcc dot gnu.org @ 2022-02-25 11:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104674

--- Comment #6 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:eabf7bbe601f2c0d87bd0a1012d7a602df2037da

commit r12-7388-geabf7bbe601f2c0d87bd0a1012d7a602df2037da
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Fri Feb 25 12:06:52 2022 +0100

    i386: Use a new temp slot kind for splitter to floatdi<mode>2_i387_with_xmm [PR104674]

    As mentioned in the PR, the following testcase is miscompiled for similar
    reasons as the already fixed PR78791 - we use SLOT_TEMP slots in various
    places during expansion, and during expansion we can guarantee that the
    lifetimes of those temporary slots don't overlap.  But the following
    splitter uses SLOT_TEMP too and in between expansion and split1 there is
    a possibility that something extends the lifetime of SLOT_TEMP created
    slots across an instruction that will be split by this splitter.

    The following patch fixes it by using a new temp slot kind to make sure
    it doesn't reuse a SLOT_TEMP that could be live across the instruction.

    2022-02-25  Jakub Jelinek  <jakub@redhat.com>

            PR target/104674
            * config/i386/i386.h (enum ix86_stack_slot): Add
            SLOT_FLOATxFDI_387.
            * config/i386/i386.md (splitter to floatdi<mode>2_i387_with_xmm):
            Use SLOT_FLOATxFDI_387 rather than SLOT_TEMP.

            * gcc.target/i386/pr104674.c: New test.

* [Bug target/104674] [11 Regression] i686 sse2: The two results of __divmoddi4 are mixed up
From: jakub at gcc dot gnu.org @ 2022-02-25 11:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104674

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[11/12 Regression] i686     |[11 Regression] i686 sse2:
                   |sse2: The two results of    |The two results of
                   |__divmoddi4 are mixed up    |__divmoddi4 are mixed up

--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Fixed on the trunk so far.

* [Bug target/104674] [11 Regression] i686 sse2: The two results of __divmoddi4 are mixed up
From: cvs-commit at gcc dot gnu.org @ 2022-03-29  5:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104674

--- Comment #8 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-11 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:acb9ea44fcceea0a54a89c7f94af4338c10759ef

commit r11-9720-gacb9ea44fcceea0a54a89c7f94af4338c10759ef
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Fri Feb 25 12:06:52 2022 +0100

    i386: Use a new temp slot kind for splitter to floatdi<mode>2_i387_with_xmm [PR104674]

    As mentioned in the PR, the following testcase is miscompiled for similar
    reasons as the already fixed PR78791 - we use SLOT_TEMP slots in various
    places during expansion, and during expansion we can guarantee that the
    lifetimes of those temporary slots don't overlap.  But the following
    splitter uses SLOT_TEMP too and in between expansion and split1 there is
    a possibility that something extends the lifetime of SLOT_TEMP created
    slots across an instruction that will be split by this splitter.

    The following patch fixes it by using a new temp slot kind to make sure
    it doesn't reuse a SLOT_TEMP that could be live across the instruction.

    2022-02-25  Jakub Jelinek  <jakub@redhat.com>

            PR target/104674
            * config/i386/i386.h (enum ix86_stack_slot): Add
            SLOT_FLOATxFDI_387.
            * config/i386/i386.md (splitter to floatdi<mode>2_i387_with_xmm):
            Use SLOT_FLOATxFDI_387 rather than SLOT_TEMP.

            * gcc.target/i386/pr104674.c: New test.

    (cherry picked from commit eabf7bbe601f2c0d87bd0a1012d7a602df2037da)

* [Bug target/104674] [11 Regression] i686 sse2: The two results of __divmoddi4 are mixed up
From: jakub at gcc dot gnu.org @ 2022-03-30  8:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104674

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Fixed for 11.3 too.
