Re: GCC Compiler Optimization ignores or mistreats MFENCE memory barrier related instruction

public inbox for gcc@gcc.gnu.org

* Re: GCC Compiler Optimization ignores or mistreats MFENCE memory barrier related instruction
@ 2018-05-07  8:28 Umesh Kalappa
  2018-05-07  8:38 ` Jakub Jelinek
  0 siblings, 1 reply; 10+ messages in thread
From: Umesh Kalappa @ 2018-05-07  8:28 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: gcc, Jakub Jelinek

CCed Jakub,


> Hi Alex,
> Agreed that float division doesn't touch memory, but the fdiv result (a
> stack register) is stored back to memory, i.e. fResult.
>
> So shouldn't the compiler barrier in the inline asm (the "memory" clobber)
> prevent instructions like "fstps   fResult(%rip)" from moving across the
> fence?
>
> BTW, if we make fDivident and fResult (= 0.0f) globals, the emitted code
> looks OK, i.e.
> #gcc -S test.c -O3  -mmmx -mno-sse
>
>        flds    .LC0(%rip)
>         fsts    fDivident(%rip)
>         fdivs   .LC1(%rip)
>         fstps   fResult(%rip)
> #APP
> # 10 "test.c" 1
>         mfence
> # 0 "" 2
> #NO_APP
>         flds    fResult(%rip)
>         movl    $.LC2, %edi
>         xorl    %eax, %eax
>         fstpl   (%rsp)
>         call    printf
>
> So I strongly believe that it's a compiler issue; please feel free to
> correct me if I'm wrong.
>
> Thank you and waiting for your reply.
>
> ~Umesh
>
>
>
>
> On Fri, Apr 13, 2018 at 5:58 PM, Alexander Monakov <amonakov@ispras.ru> wrote:
>> On Fri, 13 Apr 2018, Vivek Kinhekar wrote:
>>> The mfence instruction with memory clobber asm instruction should create a
>>> barrier between division and printf instructions.
>>
>> No, floating-point division does not touch memory, so the asm does not (and
>> need not) restrict its motion.
>>
>> Alexander


* Re: GCC Compiler Optimization ignores or mistreats MFENCE memory barrier related instruction
  2018-05-07  8:28 GCC Compiler Optimization ignores or mistreats MFENCE memory barrier related instruction Umesh Kalappa
@ 2018-05-07  8:38 ` Jakub Jelinek
  2018-05-29  4:20   ` Umesh Kalappa
  0 siblings, 1 reply; 10+ messages in thread
From: Jakub Jelinek @ 2018-05-07  8:38 UTC (permalink / raw)
  To: Umesh Kalappa; +Cc: Alexander Monakov, gcc

On Mon, May 07, 2018 at 01:58:48PM +0530, Umesh Kalappa wrote:
> CCed Jakub,

> > Agreed that float division doesn't touch memory, but the fdiv result (a
> > stack register) is stored back to memory, i.e. fResult.

That doesn't really matter.  It is stored to a stack spill slot, an object
whose address is never taken and which other code (e.g. in other threads)
cannot access in a valid program.  That is not considered memory for the
inline asm; only objects that must live in memory count.

	Jakub
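
A minimal sketch of one way to make the result an object that must live in
memory, assuming a volatile-qualified local is acceptable for the use case.
The function name, its parameters and the FULL_FENCE macro are illustrative
and not from the thread; the non-x86 branch is only a stand-in so the sketch
compiles elsewhere.

```c
#if defined(__i386__) || defined(__x86_64__)
#define FULL_FENCE() __asm__ volatile ("mfence" ::: "memory")
#else
#define FULL_FENCE() __sync_synchronize()  /* non-x86 stand-in */
#endif

/* Hypothetical variant of Test(): the result is volatile, so the compiler
   has to perform a real store and a real load instead of keeping the value
   only in an x87 stack register or a private spill slot. */
float divide_into_volatile(float dividend, float divisor)
{
    volatile float result = 0.0f;

    result = dividend / divisor;   /* volatile store, issued before the fence */
    FULL_FENCE();
    return result;                 /* volatile load, issued after the fence */
}
```

Because the volatile store needs the quotient, the division should now be
emitted ahead of the fence; checking the -S output for the target in question
is still advisable.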


* Re: GCC Compiler Optimization ignores or mistreats MFENCE memory barrier related instruction
  2018-05-07  8:38 ` Jakub Jelinek
@ 2018-05-29  4:20   ` Umesh Kalappa
  0 siblings, 0 replies; 10+ messages in thread
From: Umesh Kalappa @ 2018-05-29  4:20 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Alexander Monakov, gcc

OK, thanks for the clarification, Jakub.

Umesh

On Mon, May 7, 2018, 2:08 PM Jakub Jelinek <jakub@redhat.com> wrote:

> On Mon, May 07, 2018 at 01:58:48PM +0530, Umesh Kalappa wrote:
> > CCed Jakub,
>
> > > Agreed that float division doesn't touch memory, but the fdiv result (a
> > > stack register) is stored back to memory, i.e. fResult.
>
> That doesn't really matter.  It is stored to a stack spill slot, an object
> whose address is never taken and which other code (e.g. in other threads)
> cannot access in a valid program.  That is not considered memory for the
> inline asm; only objects that must live in memory count.
>
>         Jakub
>


* Re: GCC Compiler Optimization ignores or mistreats MFENCE memory barrier related instruction
  2018-04-13 12:34 ` Alexander Monakov
  2018-04-13 12:34   ` Vivek Kinhekar
  2018-04-13 13:51   ` Vivek Kinhekar
@ 2018-05-04 12:50   ` Umesh Kalappa
  2 siblings, 0 replies; 10+ messages in thread
From: Umesh Kalappa @ 2018-05-04 12:50 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: gcc

Hi Alex ,

Agreed that float division doesn't touch memory, but the fdiv result (a
stack register) is stored back to memory, i.e. fResult.

So shouldn't the compiler barrier in the inline asm (the "memory" clobber)
prevent instructions like "fstps   fResult(%rip)" from moving across the
fence?

BTW, if we make fDivident and fResult (= 0.0f) globals, the emitted code
looks OK, i.e.
#gcc -S test.c -O3  -mmmx -mno-sse

       flds    .LC0(%rip)
        fsts    fDivident(%rip)
        fdivs   .LC1(%rip)
        fstps   fResult(%rip)
#APP
# 10 "test.c" 1
        mfence
# 0 "" 2
#NO_APP
        flds    fResult(%rip)
        movl    $.LC2, %edi
        xorl    %eax, %eax
        fstpl   (%rsp)
        call    printf

So I strongly believe that it's a compiler issue; please feel free to
correct me if I'm wrong.

Thank you and waiting for your reply.

~Umesh
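
For reference, a sketch of what the globals variant of test.c might look
like. The exact source was not posted, so the placement of the assignments
inside Test() is an assumption inferred from the "fsts fDivident(%rip)"
store in the listing above; the non-x86 branch is only a stand-in so the
sketch compiles elsewhere.

```c
#include <stdio.h>

/* File-scope objects: other code can reach them, so they must live in
   memory and the "memory" clobber applies to accesses to them. */
float fDivident;
float fResult;

void Test(void)
{
    fDivident = 0.000000001f;
    fResult = 0.0f;

    fResult = (fDivident / fResult);   /* division by 0.0f, as in the testcase */

#if defined(__i386__) || defined(__x86_64__)
    __asm volatile ("mfence" ::: "memory");
#else
    __sync_synchronize();              /* non-x86 stand-in */
#endif

    printf("\nResult: %f\n", fResult);
}
```

Here the store to fResult and the later reload for printf are accesses to
memory that the asm is declared to clobber, which matches the ordering seen
in the emitted code.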




On Fri, Apr 13, 2018 at 5:58 PM, Alexander Monakov <amonakov@ispras.ru> wrote:
> On Fri, 13 Apr 2018, Vivek Kinhekar wrote:
>> The mfence instruction with memory clobber asm instruction should create a
>> barrier between division and printf instructions.
>
> No, floating-point division does not touch memory, so the asm does not (and
> need not) restrict its motion.
>
> Alexander


* Re: GCC Compiler Optimization ignores or mistreats MFENCE memory barrier related instruction
  2018-04-13 13:51   ` Vivek Kinhekar
@ 2018-04-13 16:10     ` Jakub Jelinek
  2018-04-13 14:32       ` Vivek Kinhekar
  0 siblings, 1 reply; 10+ messages in thread
From: Jakub Jelinek @ 2018-04-13 16:10 UTC (permalink / raw)
  To: Vivek Kinhekar; +Cc: Alexander Monakov, gcc

On Fri, Apr 13, 2018 at 01:34:21PM +0000, Vivek Kinhekar wrote:
> Hello Alexander,
> 
> In the given testcase, the generated fdivrs instruction performs the
> division of a symbol ref (memory value) by FPU Stack Register and stores
> the value in FPU Stack Register.

The stack registers are not memory.

> Please find the following RTL dump of the generated fdivrs instruction.
> It clearly accesses memory for a read!

That is a read of a constant, which doesn't count either.  The constant is in
memory only because the instruction doesn't support constant immediates, and
that memory is read-only.

	Jakub


* RE: GCC Compiler Optimization ignores or mistreats MFENCE memory barrier related instruction
  2018-04-13 16:10     ` Jakub Jelinek
@ 2018-04-13 14:32       ` Vivek Kinhekar
  0 siblings, 0 replies; 10+ messages in thread
From: Vivek Kinhekar @ 2018-04-13 14:32 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Alexander Monakov, gcc

Oh! Thanks for the quick response, Jakub.

Regards,
Vivek Kinhekar

-----Original Message-----
From: Jakub Jelinek <jakub@redhat.com> 
Sent: Friday, April 13, 2018 7:08 PM
To: Vivek Kinhekar <vivek.kinhekar@blackfigtech.com>
Cc: Alexander Monakov <amonakov@ispras.ru>; gcc@gcc.gnu.org
Subject: Re: GCC Compiler Optimization ignores or mistreats MFENCE memory barrier related instruction

On Fri, Apr 13, 2018 at 01:34:21PM +0000, Vivek Kinhekar wrote:
> Hello Alexander,
> 
> In the given testcase, the generated fdivrs instruction performs the 
> division of a symbol ref (memory value) by FPU Stack Register and 
> stores the value in FPU Stack Register.

The stack registers are not memory.

> Please find the following RTL dump of the generated fdivrs instruction.
> It clearly accesses memory for a read!

That is a read of a constant, which doesn't count either.  The constant is in memory only because the instruction doesn't support constant immediates, and that memory is read-only.

	Jakub


* RE: GCC Compiler Optimization ignores or mistreats MFENCE memory barrier related instruction
  2018-04-13 12:34 ` Alexander Monakov
  2018-04-13 12:34   ` Vivek Kinhekar
@ 2018-04-13 13:51   ` Vivek Kinhekar
  2018-04-13 16:10     ` Jakub Jelinek
  2018-05-04 12:50   ` Umesh Kalappa
  2 siblings, 1 reply; 10+ messages in thread
From: Vivek Kinhekar @ 2018-04-13 13:51 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: gcc

Hello Alexander,

In the given testcase, the generated fdivrs instruction divides a symbol ref (memory value) by an FPU stack register and stores the result in an FPU stack register.

Please find below the RTL dump of the generated fdivrs instruction. It clearly accesses memory for a read!
===============================================================================
#(insn:TI 13 20 16 2 (set (reg:XF 8 st)
#        (div:XF (float_extend:XF (mem/u/c:SF (symbol_ref/u:SI ("*.LC0") [flags 0x2]) [4 S4 A32]))
#            (reg:XF 8 st)))  {*fop_xf_4_i387}
#     (nil))
        fdivrs  .LC0    # 13    *fop_xf_4_i387/1        [length = 6]
===============================================================================

Are we missing anything subtle here?

Regards,
Vivek Kinhekar

-----Original Message-----
From: Alexander Monakov <amonakov@ispras.ru> 
Sent: Friday, April 13, 2018 5:58 PM
To: Vivek Kinhekar <vivek.kinhekar@blackfigtech.com>
Cc: gcc@gcc.gnu.org
Subject: Re: GCC Compiler Optimization ignores or mistreats MFENCE memory barrier related instruction

On Fri, 13 Apr 2018, Vivek Kinhekar wrote:
> The mfence instruction with memory clobber asm instruction should 
> create a barrier between division and printf instructions.

No, floating-point division does not touch memory, so the asm does not (and need not) restrict its motion.

Alexander


* RE: GCC Compiler Optimization ignores or mistreats MFENCE memory barrier related instruction
  2018-04-13 12:34 ` Alexander Monakov
@ 2018-04-13 12:34   ` Vivek Kinhekar
  2018-04-13 13:51   ` Vivek Kinhekar
  2018-05-04 12:50   ` Umesh Kalappa
  2 siblings, 0 replies; 10+ messages in thread
From: Vivek Kinhekar @ 2018-04-13 12:34 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: gcc

Thanks for the quick response, Alexander!

Regards,
Vivek Kinhekar
+91-7709046470

-----Original Message-----
From: Alexander Monakov <amonakov@ispras.ru> 
Sent: Friday, April 13, 2018 5:58 PM
To: Vivek Kinhekar <vivek.kinhekar@blackfigtech.com>
Cc: gcc@gcc.gnu.org
Subject: Re: GCC Compiler Optimization ignores or mistreats MFENCE memory barrier related instruction

On Fri, 13 Apr 2018, Vivek Kinhekar wrote:
> The mfence instruction with memory clobber asm instruction should 
> create a barrier between division and printf instructions.

No, floating-point division does not touch memory, so the asm does not (and need not) restrict its motion.

Alexander


* Re: GCC Compiler Optimization ignores or mistreats MFENCE memory barrier related instruction
  2018-04-13 12:28 Vivek Kinhekar
@ 2018-04-13 12:34 ` Alexander Monakov
  2018-04-13 12:34   ` Vivek Kinhekar
                     ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Alexander Monakov @ 2018-04-13 12:34 UTC (permalink / raw)
  To: Vivek Kinhekar; +Cc: gcc

On Fri, 13 Apr 2018, Vivek Kinhekar wrote:
> The mfence instruction with memory clobber asm instruction should create a
> barrier between division and printf instructions.

No, floating-point division does not touch memory, so the asm does not (and
need not) restrict its motion.

Alexander
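
A minimal sketch of one way to restrict the division's motion, assuming the
goal is only to have the quotient computed before the fence: name the result
as a read-write memory operand of the asm. The function name and parameters
are illustrative and not from the thread; the non-x86 branch is only a
stand-in so the sketch compiles elsewhere.

```c
/* Hypothetical helper: divides, then issues a fence whose asm statement
   lists `result` as an operand. */
float divide_then_fence(float dividend, float divisor)
{
    float result = dividend / divisor;

#if defined(__i386__) || defined(__x86_64__)
    /* "+m"(result): the asm is declared to read and write `result` in
       memory, so the value has to be computed and stored before the asm,
       and reloaded after it. */
    __asm__ volatile ("mfence" : "+m" (result) : : "memory");
#else
    /* Non-x86 stand-in: same operand dependency, generic full barrier. */
    __asm__ volatile ("" : "+m" (result) : : "memory");
    __sync_synchronize();
#endif

    return result;
}
```

The operand creates a data dependency between the division and the asm, so
the ordering no longer relies on the "memory" clobber alone.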


* GCC Compiler Optimization ignores or mistreats MFENCE memory barrier related instruction
@ 2018-04-13 12:28 Vivek Kinhekar
  2018-04-13 12:34 ` Alexander Monakov
  0 siblings, 1 reply; 10+ messages in thread
From: Vivek Kinhekar @ 2018-04-13 12:28 UTC (permalink / raw)
  To: gcc; +Cc: Vivek Kinhekar

Hi,

We are trying to create a memory barrier with following testcase.

=====================================
#include <stdio.h>

void Test()
{
    float fDivident = 0.000000001f;
    float fResult = 0.0f;

    fResult = ( fDivident / fResult );

    __asm volatile ("mfence" ::: "memory");

    printf("\nResult: %f\n", fResult);
}
======================================

'mfence' performs a serializing operation on all load-from-memory and store-to-memory instructions that were issued prior to the MFENCE instruction. This serializing operation guarantees that every load and store instruction that precedes the MFENCE instruction in program order becomes globally visible before any load or store instruction that follows the MFENCE instruction.

The mfence instruction in an asm statement with a "memory" clobber should create a barrier between the division and printf instructions.

When the testcase is compiled at optimization level -O1 or above, the mfence instruction is reordered and precedes the division instruction.

We expected that the two sets of assembly instructions, one pertaining to the division operation and the other to the printf operation, would not be intermixed by the GCC optimizer, because the __asm volatile ("mfence" ::: "memory"); line sits between them.

But the generated assembly, inlined below for reference, does not match our expectation.

====================================================================

        pushl   %ebp    # 23    *pushsi2        [length = 1]
        movl    %esp, %ebp      # 24    *movsi_internal/1       [length = 2]
        subl    $24, %esp       # 25    pro_epilogue_adjust_stack_si_add/1      [length = 3]
        mfence
        fldz    # 20    *movxf_internal/3       [length = 2]
        fdivrs  .LC0    # 13    *fop_xf_4_i387/1        [length = 6]
====================================================================
Note that the mfence instruction is generated before the fdivrs instruction.

Can you please let us know whether using "asm (mfence)" as in the above testcase is the right way to create the expected memory barrier between the two sets of instructions pertaining to the division and printf operations, respectively?

If yes, then we think it's a bug in the compiler. Could you please confirm?

If no, then what is the correct usage of "asm (mfence)" to achieve the memory barrier functionality expected in the above testcase?

Thanks,
Vivek Kinhekar


end of thread, other threads:[~2018-05-29  4:20 UTC | newest]

Thread overview: 10+ messages
2018-05-07  8:28 GCC Compiler Optimization ignores or mistreats MFENCE memory barrier related instruction Umesh Kalappa
2018-05-07  8:38 ` Jakub Jelinek
2018-05-29  4:20   ` Umesh Kalappa
  -- strict thread matches above, loose matches on Subject: below --
2018-04-13 12:28 Vivek Kinhekar
2018-04-13 12:34 ` Alexander Monakov
2018-04-13 12:34   ` Vivek Kinhekar
2018-04-13 13:51   ` Vivek Kinhekar
2018-04-13 16:10     ` Jakub Jelinek
2018-04-13 14:32       ` Vivek Kinhekar
2018-05-04 12:50   ` Umesh Kalappa
