public inbox for gcc-patches@gcc.gnu.org
* [PATCH] Use type alignment in get_builtin_sync_mem
@ 2019-09-02 20:35 Ulrich Weigand
  2019-09-03 10:55 ` Richard Biener
  0 siblings, 1 reply; 10+ messages in thread
From: Ulrich Weigand @ 2019-09-02 20:35 UTC (permalink / raw)
  To: gcc-patches

Hello,

on s390x the 128-bit integer type is only aligned to 8 bytes by default,
but lock-free atomic operations can only be performed on objects
aligned to 16 bytes.  However, we've noticed that GCC sometimes falls
back to library calls *even if* the object is actually 16 byte aligned,
and GCC could easily know this from type alignment information.
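
To illustrate, a tiny example of the two alignments involved (the typedef
mirrors the one added to the testcase below; the printed values are of
course target-dependent):

#include <stdio.h>

typedef __int128 __attribute__ ((aligned (16))) aligned_int128;

int
main (void)
{
  /* On s390x this prints "8 16": the default ABI alignment of __int128
     is below what the 16-byte atomic instructions need, while the
     attribute raises it to 16.  */
  printf ("%zu %zu\n", _Alignof (__int128), _Alignof (aligned_int128));
  return 0;
}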

However, it turns out that get_builtin_sync_mem *ignores* type alignment
data, and only looks at *mode* alignment and pointer alignment data.
This is a problem since mode alignment doesn't know about user-provided
alignment attributes, and while pointer alignment does sometimes know
about those, this usually only happens with optimization on and also
if the optimizers can trace pointer assignments to the final object.

One important use case where the latter does not happen is with the
C++ atomic classes, because here the object is accessed via the "this"
pointer.  Pointer alignment tracking at this point never knows the
final object, and thus we always get a library call *even though*
libstdc++ has actually marked the class type with the correct alignment
attribute.
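
To make this concrete, here is a small example of the pattern in question
(it mirrors the testcase change below; the callee only ever sees a pointer,
so everything hinges on the alignment of the pointed-to type):

typedef __int128 __attribute__ ((aligned (16))) aligned_int128;

/* Pointed-to type only has the default 8-byte alignment: the compiler
   cannot assume 16-byte alignment, so the library call (with its
   run-time check) is the right expansion.  */
__int128
exchange_plain (__int128 *p, __int128 v)
{
  return __atomic_exchange_n (p, v, __ATOMIC_SEQ_CST);
}

/* Pointed-to type promises 16-byte alignment: the inline instruction
   sequence is safe, which is what this patch enables.  */
__int128
exchange_aligned (aligned_int128 *p, __int128 v)
{
  return __atomic_exchange_n (p, v, __ATOMIC_SEQ_CST);
}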

Now one question might be, why does get_pointer_alignment not take
type alignment into account by itself?  This appears to be deliberate
to avoid considering numeric pointer values to be aligned when they
are not for target-specific reasons (e.g. the low bit that indicates
Thumb on ARM).

However, this is not an issue in get_builtin_sync_mem, where we are
actually interested in the alignment of the MEM we're just about to
generate, so it should be fine to check type alignment here.

This patch does just that, fixing the issue we've been seeing.

Tested on s390x-ibm-linux.

OK for mainline?

Bye,
Ulrich


ChangeLog:

	* builtins.c (get_builtin_sync_mem): Respect type alignment.

testsuite/ChangeLog:

	* gcc.target/s390/md/atomic_exchange-1.c: Do not use -latomic.
	(aligned_int128): New data type.
	(ae_128_0): Use it instead of __int128 to ensure proper alignment
	for atomic accesses.
	(ae_128_1): Likewise.
	(g128): Likewise.
	(main): Likewise.

Index: gcc/builtins.c
===================================================================
--- gcc/builtins.c	(revision 274142)
+++ gcc/builtins.c	(working copy)
@@ -6001,9 +6001,16 @@
 
   mem = validize_mem (mem);
 
-  /* The alignment needs to be at least according to that of the mode.  */
-  set_mem_align (mem, MAX (GET_MODE_ALIGNMENT (mode),
-			   get_pointer_alignment (loc)));
+  /* The alignment needs to be at least according to that of the mode.
+     Also respect alignment requirements of the type, and alignment
+     info that may be deduced from the expression itself.  */
+  unsigned int align = GET_MODE_ALIGNMENT (mode);
+  if (POINTER_TYPE_P (TREE_TYPE (loc)))
+    {
+      unsigned int talign = min_align_of_type (TREE_TYPE (TREE_TYPE (loc)));
+      align = MAX (align, talign * BITS_PER_UNIT);
+    }
+  set_mem_align (mem, MAX (align, get_pointer_alignment (loc)));
   set_mem_alias_set (mem, ALIAS_SET_MEMORY_BARRIER);
   MEM_VOLATILE_P (mem) = 1;
 
Index: gcc/testsuite/gcc.target/s390/md/atomic_exchange-1.c
===================================================================
--- gcc/testsuite/gcc.target/s390/md/atomic_exchange-1.c	(revision 274142)
+++ gcc/testsuite/gcc.target/s390/md/atomic_exchange-1.c	(working copy)
@@ -1,7 +1,7 @@
 /* Machine description pattern tests.  */
 
 /* { dg-do compile } */
-/* { dg-options "-lpthread -latomic" } */
+/* { dg-options "-lpthread" } */
 /* { dg-do run { target { s390_useable_hw } } } */
 
 /**/
@@ -119,19 +119,21 @@
 /**/
 
 #ifdef __s390x__
+typedef __int128 __attribute__((aligned(16))) aligned_int128;
+
 __int128
-ae_128_0 (__int128 *lock)
+ae_128_0 (aligned_int128 *lock)
 {
   return __atomic_exchange_n (lock, 0, 2);
 }
 
 __int128
-ae_128_1 (__int128 *lock)
+ae_128_1 (aligned_int128 *lock)
 {
   return __atomic_exchange_n (lock, 1, 2);
 }
 
-__int128 g128;
+aligned_int128 g128;
 
 __int128
 ae_128_g_0 (void)
@@ -274,7 +276,7 @@
 
 #ifdef __s390x__
       {
-	__int128 lock;
+	aligned_int128 lock;
 	__int128 rval;
 
 	lock = oval;
-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  Ulrich.Weigand@de.ibm.com


* Re: [PATCH] Use type alignment in get_builtin_sync_mem
  2019-09-02 20:35 [PATCH] Use type alignment in get_builtin_sync_mem Ulrich Weigand
@ 2019-09-03 10:55 ` Richard Biener
  2019-09-03 11:56   ` Ulrich Weigand
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Biener @ 2019-09-03 10:55 UTC (permalink / raw)
  To: Ulrich Weigand; +Cc: GCC Patches

On Mon, Sep 2, 2019 at 10:35 PM Ulrich Weigand <uweigand@de.ibm.com> wrote:
>
> Hello,
>
> on s390x the 128-bit integer type is only aligned to 8 bytes by default,
> but lock-free atomic operations can only be performed on objects
> aligned to 16 bytes.  However, we've noticed that GCC sometimes falls
> back to library calls *even if* the object is actually 16 byte aligned,
> and GCC could easily know this from type alignment information.
>
> However, it turns out that get_builtin_sync_mem *ignores* type alignment
> data, and only looks at *mode* alignment and pointer alignment data.
> This is a problem since mode alignment doesn't know about user-provided
> alignment attributes, and while pointer alignment does sometimes know
> about those, this usually only happens with optimization on and also
> if the optimizers can trace pointer assignments to the final object.
>
> One important use case where the latter does not happen is with the
> C++ atomic classes, because here the object is accessed via the "this"
> pointer.  Pointer alignment tracking at this point never knows the
> final object, and thus we always get a library call *even though*
> libstdc++ has actually marked the class type with the correct alignment
> attribute.
>
> Now one question might be, why does get_pointer_alignment not take
> type alignment into account by itself?  This appears to be deliberate
> to avoid considering numeric pointer values to be aligned when they
> are not for target-specific reasons (e.g. the low bit that indicates
> Thumb on ARM).

It's simply because we throw away type casting on GIMPLE so
when you see an aligned __int128 * as argument to
__builtin_sync* then there's no guarantee it's actually the type
the user used here.   The user might have written

__builtin_sync_* ((__int128 *) ptr /* cast away bogus over-alignedness */, ...);

and GIMPLE happily throws away the cast.

> However, this is not an issue in get_builtin_sync_mem, where we are
> actually interested in the alignment of the MEM we're just about to
> generate, so it should be fine to check type alignment here.

The proper way to carry the information from source to RTL expansion
(or earlier GIMPLE) is to add another argument to the builtins
specifying alignment which the FEs can derive from the argument type
given appropriate semantics of the builtin.

Alternatively you could make it simply undefined behavior passing
not appropriately aligned memory to those builtins.  But I guess
we support misaligned cases so that's not an option.

> This patch does just that, fixing the issue we've been seeing.
>
> Tested on s390x-ibm-linux.
>
> OK for mainline?

I don't think the patch is technically correct.  Unlikely to break
for __int128 I guess but quite possibly for smaller types.

Oh, and in this context

/* The alignment needs to be at least according to that of the mode.  */

also looks fishy.  So maybe it _is_ an option to simply require
larger alignment?  Maybe add targetm.mode_alignment_for_atomics ()?

Richard.

> Bye,
> Ulrich
>
>
> ChangeLog:
>
>         * builtins.c (get_builtin_sync_mem): Respect type alignment.
>
> testsuite/ChangeLog:
>
>         * gcc.target/s390/md/atomic_exchange-1.c: Do not use -latomic.
>         (aligned_int128): New data type.
>         (ae_128_0): Use it instead of __int128 to ensure proper alignment
>         for atomic accesses.
>         (ae_128_1): Likewise.
>         (g128): Likewise.
>         (main): Likewise.
>
> Index: gcc/builtins.c
> ===================================================================
> --- gcc/builtins.c      (revision 274142)
> +++ gcc/builtins.c      (working copy)
> @@ -6001,9 +6001,16 @@
>
>    mem = validize_mem (mem);
>
> -  /* The alignment needs to be at least according to that of the mode.  */
> -  set_mem_align (mem, MAX (GET_MODE_ALIGNMENT (mode),
> -                          get_pointer_alignment (loc)));
> +  /* The alignment needs to be at least according to that of the mode.
> +     Also respect alignment requirements of the type, and alignment
> +     info that may be deduced from the expression itself.  */
> +  unsigned int align = GET_MODE_ALIGNMENT (mode);
> +  if (POINTER_TYPE_P (TREE_TYPE (loc)))
> +    {
> +      unsigned int talign = min_align_of_type (TREE_TYPE (TREE_TYPE (loc)));
> +      align = MAX (align, talign * BITS_PER_UNIT);
> +    }
> +  set_mem_align (mem, MAX (align, get_pointer_alignment (loc)));
>    set_mem_alias_set (mem, ALIAS_SET_MEMORY_BARRIER);
>    MEM_VOLATILE_P (mem) = 1;
>
> Index: gcc/testsuite/gcc.target/s390/md/atomic_exchange-1.c
> ===================================================================
> --- gcc/testsuite/gcc.target/s390/md/atomic_exchange-1.c        (revision 274142)
> +++ gcc/testsuite/gcc.target/s390/md/atomic_exchange-1.c        (working copy)
> @@ -1,7 +1,7 @@
>  /* Machine description pattern tests.  */
>
>  /* { dg-do compile } */
> -/* { dg-options "-lpthread -latomic" } */
> +/* { dg-options "-lpthread" } */
>  /* { dg-do run { target { s390_useable_hw } } } */
>
>  /**/
> @@ -119,19 +119,21 @@
>  /**/
>
>  #ifdef __s390x__
> +typedef __int128 __attribute__((aligned(16))) aligned_int128;
> +
>  __int128
> -ae_128_0 (__int128 *lock)
> +ae_128_0 (aligned_int128 *lock)
>  {
>    return __atomic_exchange_n (lock, 0, 2);
>  }
>
>  __int128
> -ae_128_1 (__int128 *lock)
> +ae_128_1 (aligned_int128 *lock)
>  {
>    return __atomic_exchange_n (lock, 1, 2);
>  }
>
> -__int128 g128;
> +aligned_int128 g128;
>
>  __int128
>  ae_128_g_0 (void)
> @@ -274,7 +276,7 @@
>
>  #ifdef __s390x__
>        {
> -       __int128 lock;
> +       aligned_int128 lock;
>         __int128 rval;
>
>         lock = oval;
> --
>   Dr. Ulrich Weigand
>   GNU/Linux compilers and toolchain
>   Ulrich.Weigand@de.ibm.com
>


* Re: [PATCH] Use type alignment in get_builtin_sync_mem
  2019-09-03 10:55 ` Richard Biener
@ 2019-09-03 11:56   ` Ulrich Weigand
  2019-09-03 12:28     ` Richard Biener
  0 siblings, 1 reply; 10+ messages in thread
From: Ulrich Weigand @ 2019-09-03 11:56 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

Richard Biener wrote:
> On Mon, Sep 2, 2019 at 10:35 PM Ulrich Weigand <uweigand@de.ibm.com> wrote:
> > Now one question might be, why does get_pointer_alignment not take
> > type alignment into account by itself?  This appears to be deliberate
> > to avoid considering numeric pointer values to be aligned when they
> > are not for target-specific reasons (e.g. the low bit that indicates
> > Thumb on ARM).
> 
> It's simply because we throw away type casting on GIMPLE so
> when you see an aligned __int128 * as argument to
> __builtin_sync* then there's no guarantee it's actually the type
> the user used here.   The user might have written
> 
> __builtin_sync_* ((__int128 *) ptr /* cast away bogus over-alignedness */, ...);
> 
> and GIMPLE happily throws away the cast.

Huh, I was simply going after the comment in front of get_object_alignment_2:
   Note that the address (and thus the alignment) computed here is based
   on the address to which a symbol resolves, whereas DECL_ALIGN is based
   on the address at which an object is actually located.  These two
   addresses are not always the same.  For example, on ARM targets,
   the address &foo of a Thumb function foo() has the lowest bit set,
   whereas foo() itself starts on an even address.

combined with the fact that get_object_alignment_2 actually itself
uses type alignment if we have an actual memory object:
      /* When EXP is an actual memory reference then we can use
         TYPE_ALIGN of a pointer indirection to derive alignment.
         Do so only if get_pointer_alignment_1 did not reveal absolute
         alignment knowledge and if using that alignment would
         improve the situation.  */

Is this not correct either, or is the situation there somehow different?

In any case, I'm not sure I understand your example.  Is this supposed
to cover the case where "ptr" is a type with 16 byte alignment, while
__int128 only has 8 byte alignment?  But in that case I'd assume that
every possible value of "ptr" at runtime must be 16 byte aligned, or
else it wouldn't be a valid value of a variable of that type, right?
So in that case assuming a 16-byte alignment would be correct?

> The proper way to carry the information from source to RTL expansion
> (or earlier GIMPLE) is to add another argument to the builtins
> specifying alignment which the FEs can derive from the argument type
> given appropriate semantics of the builtin.

I guess I can have a look to do it this way, if necessary.
Is there already some example of a builtin that does that?

> Alternatively you could make it simply undefined behavior passing
> not appropriately aligned memory to those builtins.  But I guess
> we support misaligned cases so that's not an option.

We definitely must handle the misaligned case (since the **default**
alignment of __int128 on s390x is misaligned for atomic access).

> also looks fishy.  So maybe it _is_ an option to simply require
> larger alignment?  Maybe add targetm.mode_alignment_for_atomics ()?

That's why this wouldn't work either -- a default __int128 is not
correctly aligned, we can only assume alignment if it was specifically
specified by the user via an attribute or the like.  (But since the
latter is a very common case, it is also important to handle.)

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  Ulrich.Weigand@de.ibm.com


* Re: [PATCH] Use type alignment in get_builtin_sync_mem
  2019-09-03 11:56   ` Ulrich Weigand
@ 2019-09-03 12:28     ` Richard Biener
  2019-09-03 13:09       ` Ulrich Weigand
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Biener @ 2019-09-03 12:28 UTC (permalink / raw)
  To: Ulrich Weigand; +Cc: GCC Patches

On Tue, Sep 3, 2019 at 1:56 PM Ulrich Weigand <uweigand@de.ibm.com> wrote:
>
> Richard Biener wrote:
> > On Mon, Sep 2, 2019 at 10:35 PM Ulrich Weigand <uweigand@de.ibm.com> wrote:
> > > Now one question might be, why does get_pointer_alignment not take
> > > type alignment into account by itself?  This appears to be deliberate
> > > to avoid considering numeric pointer values to be aligned when they
> > > are not for target-specific reasons (e.g. the low bit that indicates
> > > Thumb on ARM).
> >
> > It's simply because we throw away type casting on GIMPLE so
> > when you see an aligned __int128 * as argument to
> > __builtin_sync* then there's no guarantee it's actually the type
> > the user used here.   The user might have written
> >
> > __builtin_sync_* ((__int128 *) ptr /* cast away bogus over-alignedness */, ...);
> >
> > and GIMPLE happily throws away the cast.
>
> Huh, I was simply going after the comment in front of get_object_alignment_2:
>    Note that the address (and thus the alignment) computed here is based
>    on the address to which a symbol resolves, whereas DECL_ALIGN is based
>    on the address at which an object is actually located.  These two
>    addresses are not always the same.  For example, on ARM targets,
>    the address &foo of a Thumb function foo() has the lowest bit set,
>    whereas foo() itself starts on an even address.

Sure, that is another reason - but you are not passing a FUNCTION_DECL here.

> combined with the fact that get_object_alignment_2 actually itself
> uses type alignment if we have an actual memory object:
>       /* When EXP is an actual memory reference then we can use
>          TYPE_ALIGN of a pointer indirection to derive alignment.
>          Do so only if get_pointer_alignment_1 did not reveal absolute
>          alignment knowledge and if using that alignment would
>          improve the situation.  */
>
> Is this not correct either, or is the situation there somehow different?

We're looking at an address, the above applies to places where we actually
dereference it and there we take the type from the type of the access
and not from the type of the pointer (or the pointed-to type) since that would
be equally wrong.

> In any case, I'm not sure I understand your example.  Is this supposed
> to cover the case where "ptr" is a type with 16 byte alignment, while
> __int128 only has 8 byte alignment?

Yes.

>  But in that case I'd assume that
> every possible value of "ptr" at runtime must be 16 byte aligned, or
> else it wouldn't be a valid value of a variable of that type, right?
> So in that case assuming a 16-byte alignment would be correct?

If you read the C standards fine-print then yes.  But people (or
even the C frontend!) hardly get that correct since for example
for

struct __attribute__((packed)) { int i; } s;

void foo ()
{
  __builtin_printf ("%p", &s.i);
}

the C frontend actually creates an int * typed pointer for the ADDR_EXPR
(and not an int * variant with 1-byte alignment).  And people use
int * to pass such pointers everywhere.

> > The proper way to carry the information from source to RTL expansion
> > (or earlier GIMPLE) is to add another argument to the builtins
> > specifying alignment which the FEs can derive from the argument type
> > given appropriate semantics of the builtin.
>
> I guess I can have a look to do it this way, if necessary.
> Is there already some example of a builtin that does that?

I think the atomic_* builtins avoid (or should avoid) the issue by
only "expanding"
to the fixed-size variants if the memory is appropriately aligned
and otherwise fall back to _n which doesn't assume any alignment(?)

There are internal functions that put alignment into extra args; I'm not
aware of user-accessible builtins that do.
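
For reference, the two documented user-level forms look like this (just a
sketch; the global's name is made up, and whether each call ends up inline
or as a libatomic call depends on what alignment can be proven):

__int128 g128var;

__int128
load_generic (void)
{
  __int128 tmp;
  /* Generic form: works for objects of any type; may become a libatomic
     call if a fixed-size expansion is not possible.  */
  __atomic_load (&g128var, &tmp, __ATOMIC_SEQ_CST);
  return tmp;
}

__int128
load_sized (void)
{
  /* _n form: only for 1/2/4/8/16-byte integral scalars.  */
  return __atomic_load_n (&g128var, __ATOMIC_SEQ_CST);
}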

> > Alternatively you could make it simply undefined behavior passing
> > not appropriately aligned memory to those builtins.  But I guess
> > we support misaligned cases so that's not an option.
>
> We definitely must handle the misaligned case (since the **default**
> alignment of __int128 on s390x is misaligned for atomic access).

But is it fast (and atomic)?  Would going the _n way "work" or do
you need a special 8-byte-aligned __int128 path?

Richard.

> > also looks fishy.  So maybe it _is_ an option to simply require
> > larger alignment?  Maybe add targetm.mode_alignment_for_atomics ()?
>
> That's why this wouldn't work either -- a default __int128 is not
> correctly aligned, we can only assume alignment if it was specifically
> specified by the user via an attribute or the like.  (But since the
> latter is a very common case, it is also important to handle.)
>
> Bye,
> Ulrich
>
> --
>   Dr. Ulrich Weigand
>   GNU/Linux compilers and toolchain
>   Ulrich.Weigand@de.ibm.com
>


* Re: [PATCH] Use type alignment in get_builtin_sync_mem
  2019-09-03 12:28     ` Richard Biener
@ 2019-09-03 13:09       ` Ulrich Weigand
  2019-09-06 10:46         ` Richard Biener
  0 siblings, 1 reply; 10+ messages in thread
From: Ulrich Weigand @ 2019-09-03 13:09 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

Richard Biener wrote:
> On Tue, Sep 3, 2019 at 1:56 PM Ulrich Weigand <uweigand@de.ibm.com> wrote:
> > combined with the fact that get_object_alignment_2 actually itself
> > uses type alignment if we have an actual memory object:
> >       /* When EXP is an actual memory reference then we can use
> >          TYPE_ALIGN of a pointer indirection to derive alignment.
> >          Do so only if get_pointer_alignment_1 did not reveal absolute
> >          alignment knowledge and if using that alignment would
> >          improve the situation.  */
> >
> > Is this not correct either, or is the situation there somehow different?
> 
> We're looking at an address, the above applies to places where we actually
> dereference it and there we take the type from the type of the access
> and not from the type of the pointer (or the pointed-to type) since that would
> be equally wrong.

I guess I'm confused again, sorry :-)   What is "the type of the access",
and where do we get it from?  In your example below, if somebody were
to access s.i here, what would the "type of the access" be?

> >  But in that case I'd assume that
> > every possible value of "ptr" at runtime must be 16 byte aligned, or
> > else it wouldn't be a valid value of a variable of that type, right?
> > So in that case assuming a 16-byte alignment would be correct?
> 
> If you read the C standards fine-print then yes.  But people (or
> even the C frontend!) hardly get that correct since for example
> for
> 
> struct __attribute__((packed)) { int i; } s;
> 
> void foo ()
> {
>   __builtin_printf ("%p", &s.i);
> }
> 
> the C frontend actually creates an int * typed pointer for the ADDR_EXPR
> (and not an int * variant with 1-byte alignment).  And people use
> int * to pass such pointers everywhere.

Well ... I'd say if you cast to int * and then use an atomic operation,
it's your own fault that it fails :-/   If the frontend itself uses
the wrong type, that is of course a problem.

> > > The proper way to carry the information from source to RTL expansion
> > > (or earlier GIMPLE) is to add another argument to the builtins
> > > specifying alignment which the FEs can derive from the argument type
> > > given appropriate semantics of the builtin.
> >
> > I guess I can have a look to do it this way, if necessary.
> > Is there already some example of a builtin that does that?
> 
> I think the atomic_* builtins avoid (or should avoid) the issue by
> only "expanding"
> to the fixed-size variants if the memory is appropriately aligned
> and otherwise fall back to _n which doesn't assume any alignment(?)

Well, this is exactly what we're trying to do!  If the compiler can
prove correct alignment, we want to emit the atomic instruction.
Otherwise, we want to emit the library call.  (The library function
will do a run-time alignment check, and if the object is aligned
after all, it will also use the atomic instruction; otherwise it
will use a lock.)
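
Roughly, the library-side dispatch looks like this (a simplified sketch,
not the actual libatomic source; the function and lock names are made up):

#include <stdint.h>
#include <pthread.h>

typedef __int128 __attribute__ ((aligned (16))) aligned_int128;

static pthread_mutex_t lock_16 = PTHREAD_MUTEX_INITIALIZER;

__int128
sketch_exchange_16 (__int128 *ptr, __int128 newval)
{
  __int128 oldval;

  if (((uintptr_t) ptr & 15) == 0)
    /* Really 16-byte aligned at run time: the aligned pointer type
       asserts this, standing in for the CDSG-based instruction
       sequence the library uses on this path.  */
    return __atomic_exchange_n ((aligned_int128 *) ptr, newval,
                                __ATOMIC_SEQ_CST);

  /* Not aligned: protect the access with an out-of-line lock.  */
  pthread_mutex_lock (&lock_16);
  oldval = *ptr;
  *ptr = newval;
  pthread_mutex_unlock (&lock_16);
  return oldval;
}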

The problem is that the decision whether to emit the instruction or
the library call is currently done by the back-end: if the insn
expander fails, the middle-end will fall back to the libcall.

So the back-end needs to base this decision on the alignment, but
the only input it has is the MEM (and its alignment setting).  But
that MEM is **generated** by get_builtin_sync_mem using the alignment
computed there, which is exactly the place I'm trying to fix ...

Are you saying that this very decision ought to be made earlier
in the front-end?  That would of course also be OK with me.

> > We definitely must handle the misaligned case (since the **default**
> > alignment of __int128 on s390x is misaligned for atomic access).
> 
> But is it fast (and atomic)?  Would going the _n way "work" or do
> you need a special 8-byte-aligned __int128 path?

See above.  We want the libcall fallback in the unaligned case.  It
is sort-of fast (except for the function call and run-time check
overhead) if the object actually is aligned, less fast if it isn't.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  Ulrich.Weigand@de.ibm.com


* Re: [PATCH] Use type alignment in get_builtin_sync_mem
  2019-09-03 13:09       ` Ulrich Weigand
@ 2019-09-06 10:46         ` Richard Biener
  2019-09-06 13:01           ` Ulrich Weigand
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Biener @ 2019-09-06 10:46 UTC (permalink / raw)
  To: Ulrich Weigand; +Cc: GCC Patches

On Tue, Sep 3, 2019 at 3:09 PM Ulrich Weigand <uweigand@de.ibm.com> wrote:
>
> Richard Biener wrote:
> > On Tue, Sep 3, 2019 at 1:56 PM Ulrich Weigand <uweigand@de.ibm.com> wrote:
> > > combined with the fact that get_object_alignment_2 actually itself
> > > uses type alignment if we have an actual memory object:
> > >       /* When EXP is an actual memory reference then we can use
> > >          TYPE_ALIGN of a pointer indirection to derive alignment.
> > >          Do so only if get_pointer_alignment_1 did not reveal absolute
> > >          alignment knowledge and if using that alignment would
> > >          improve the situation.  */
> > >
> > > Is this not correct either, or is the situation there somehow different?
> >
> > We're looking at an address, the above applies to places where we actually
> > dereference it and there we take the type from the type of the access
> > and not from the type of the pointer (or the pointed-to type) since that would
> > be equally wrong.
>
> I guess I'm confused again, sorry :-)   What is "the type of the access",
> and where do we get it from?  In your example below, if somebody were
> to access s.i here, what would the "type of the access" be?

The type of the access is 'int' I guess (packed is outside of the C standard).

> > >  But in that case I'd assume that
> > > every possible value of "ptr" at runtime must be 16 byte aligned, or
> > > else it wouldn't be a valid value of a variable of that type, right?
> > > So in that case assuming a 16-byte alignment would be correct?
> >
> > If you read the C standards fine-print then yes.  But people (or
> > even the C frontend!) hardly get that correct since for example
> > for
> >
> > struct __attribute__((packed)) { int i; } s;
> >
> > void foo ()
> > {
> >   __builtin_printf ("%p", &s.i);
> > }
> >
> > the C frontend actually creates an int * typed pointer for the ADDR_EXPR
> > (and not an int * variant with 1-byte alignment).  And people use
> > int * to pass such pointers everywhere.
>
> Well ... I'd say if you cast to int * and then use an atomic operation,
> it's your own fault that it fails :-/   If the frontend itself uses
> the wrong type, that is of course a problem.

Yes.  The C standard says that if you cast something to a pointer type
the pointer has to be aligned according to the pointed-to type, otherwise
undefined.  But we have no chance to use this because of this kind of
issue (and of course developer laziness...).

> > > > The proper way to carry the information from source to RTL expansion
> > > > (or earlier GIMPLE) is to add another argument to the builtins
> > > > specifying alignment which the FEs can derive from the argument type
> > > > given appropriate semantics of the builtin.
> > >
> > > I guess I can have a look to do it this way, if necessary.
> > > Is there already some example of a builtin that does that?
> >
> > I think the atomic_* builtins avoid (or should avoid) the issue by
> > only "expanding"
> > to the fixed-size variants if the memory is appropriately aligned
> > and otherwise fall back to _n which doesn't assume any alignment(?)
>
> Well, this is exactly what we're trying to do!  If the compiler can
> prove correct alignment, we want to emit the atomic instruction.
> Otherwise, we want to emit the library call.  (The library function
> will do a run-time alignment check, and if the object is aligned
> after all, it will also use the atomic instruction; otherwise it
> will use a lock.)
>
> The problem is that the decision whether to emit the instruction or
> the library call is currently done by the back-end: if the insn
> expander fails, the middle-end will fall back to the libcall.
>
> So the back-end needs to base this decision on the alignment, but
> the only input it has is the MEM (and its alignment setting).  But
> that MEM is **generated** by get_builtin_sync_mem using the alignment
> computed there, which is exactly the place I'm trying to fix ...
>
> Are you saying that this very decision ought to be made earlier
> in the front-end?  That would of course also be OK with me.

I'm not sure how it is done now but IIRC the users
use __atomic_load (ptr) and then the frontend changes that
to one of BUILT_IN_ATOMIC_LOAD_{N,1,2,4,8,16} based on
some criteria (size mainly).  I'm saying we should factor in
alignment here, _not_ using say BUILT_IN_ATOMIC_LOAD_16
if according to the C standard the data isn't aligned.  Maybe we can
ask the target whether the alignment according to C matches the
requirement for _16 expansion.  And if things are not fine
the FE should instead use BUILT_IN_ATOMIC_LOAD_N, which we could
later expand as one of the others if the compiler can prove bigger
alignment and N is constant.

But safety first.

> > > We definitely must handle the misaligned case (since the **default**
> > > alignment of __int128 on s390x is misaligned for atomic access).
> >
> > But is it fast (and atomic)?  Would going the _n way "work" or do
> > you need a special 8-byte-aligned __int128 path?
>
> See above.  We want the libcall fallback in the unaligned case.  It
> is sort-of fast (except for the function call and run-time check
> overhead) if the object actually is aligned, less fast if it isn't.

OK.

So yes, I'd say try tackling the issue in the frontend which is the
last to know the alignment in the most optimistic way (according
to the C standard).  At RTL expansion time we have to be (very)
conservative.

Btw, the alternative is to add an extra alignment parameter
to all of the atomic builtins where the FE could stick the alignment
for us to use during RTL expansion.  But given we have the _{1,2,4,8,16,N}
variants it sounds easier to use those...

Note we only document __atomic_load and __atomic_load_n so
the _{1,2,4,8,16} variants seem to be implementation detail.

Richard.

> Bye,
> Ulrich
>
> --
>   Dr. Ulrich Weigand
>   GNU/Linux compilers and toolchain
>   Ulrich.Weigand@de.ibm.com
>


* Re: [PATCH] Use type alignment in get_builtin_sync_mem
  2019-09-06 10:46         ` Richard Biener
@ 2019-09-06 13:01           ` Ulrich Weigand
  2019-09-09  8:37             ` Richard Biener
  0 siblings, 1 reply; 10+ messages in thread
From: Ulrich Weigand @ 2019-09-06 13:01 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

Richard Biener wrote:
> On Tue, Sep 3, 2019 at 3:09 PM Ulrich Weigand <uweigand@de.ibm.com> wrote:
> > > If you read the C standards fine-print then yes.  But people (or
> > > even the C frontend!) hardly get that correct since for example
> > > for
> > >
> > > struct __attribute__((packed)) { int i; } s;
> > >
> > > void foo ()
> > > {
> > >   __builtin_printf ("%p", &s.i);
> > > }
> > >
> > > the C frontend actually creates an int * typed pointer for the ADDR_EXPR
> > > (and not an int * variant with 1-byte alignment).  And people use
> > > int * to pass such pointers everywhere.
> >
> > Well ... I'd say if you cast to int * and then use an atomic operation,
> > it's your own fault that it fails :-/   If the frontend itself uses
> > the wrong type, that is of course a problem.
> 
> Yes.  The C standard says that if you cast something to a pointer type
> the pointer has to be aligned according to the pointed-to type, otherwise
> undefined.  But we have no chance to use this because of this kind of
> issue (and of course developer laziness...).

But as far as I can see, for *atomic* operations at least, we do make
that assumption.  The x86 back-end for example just assumes that any
"int" or "long" object that is the target of an atomic operation is
naturally aligned, or else the generated code would just crash.   So
if you used your example with a packed struct and passed that pointer
to an atomic, the back-end would still assume the alignment and the
code would crash.  But I'd still consider this a perfectly reasonable
and expected behavior in this case ...

The only thing that is special on s390 is that for the 16-byte integer
type, the "natural" (mode) alignment is only 8 bytes, but for atomics
we require 16 bytes.  But if you explicitly use a 16-byte aligned
pointer type to assert to the compiler that this object *is* aligned,
the compiler should be able to rely on that.

Of course if some part of the middle end get the alignment wrong, we
have a problem.  But I'm not sure how this could happen here.  I agree
that it might be the case that a user-specified *under*-alignment might
get lost somewhere (e.g. you have a packed int and somewhere in the
optimizers this gets accidentally cast to a normal int with the default
alignment).  But in *this* case, what would have to happen is that the
middle-end accidentally casts something to a pointer with *higher*
than the default alignment for the type, even though no such pointer
cast is present in the source.  Can this really happen?

> I'm not sure how it is done now but IIRC the users
> use __atomic_load (ptr) and then the frontend changes that
> to one of BUILT_IN_ATOMIC_LOAD_{N,1,2,4,8,16} based on
> some criteria (size mainly).  I'm saying we should factor in
> alignment here, _not_ using say BUILT_IN_ATOMIC_LOAD_16
> if according to the C standard the data isn't aligned.  Maybe we can
> ask the target whether the alignment according to C matches the
> requirement for _16 expansion.  And if things are not fine
> the FE should instead use BUILT_IN_ATOMIC_LOAD_N, which we could
> later expand as one of the others if the compiler can prove bigger
> alignment and N is constant.
> 
> But safety first.

The problem with using the _N variant is that we then get a call
to the _n version of the library routine, right?  This would actually
be wrong on s390.  The problem is that all atomic operations on any
one single object need to be consistent: they either all use the
16-byte atomic instruction, or else they all protect the access with
a lock.  If you have parts of the code use the lock and other parts
use the instruction, they're not actually protected against each other.

This is why the _16 version of the library routine does the runtime
alignment check, so that all accesses to actually 16-byte aligned
objects use the instruction, both in the library and in compiler-
generated code.   The _n version doesn't do that.

So I guess we'd have to convert the _n version back to the _16
library call if the size is constant 16, or something?

> So yes, I'd say try tackling the issue in the frontend which is the
> last to know the alignment in the most optimistic way (according
> to the C standard).  At RTL expansion time we have to be (very)
> conservative.
> 
> Btw, the alternative is to add an extra alignment parameter
> to all of the atomic builtins where the FE could stick the alignment
> for us to use during RTL expansion.  But given we have the _{1,2,4,8,16,N}
> variants it sounds easier to use those...
> 
> Note we only document __atomic_load and __atomic_load_n so
> the _{1,2,4,8,16} variants seem to be implementation detail.

Adding the alignment parameter would work, I think.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  Ulrich.Weigand@de.ibm.com


* Re: [PATCH] Use type alignment in get_builtin_sync_mem
  2019-09-06 13:01           ` Ulrich Weigand
@ 2019-09-09  8:37             ` Richard Biener
  2019-09-09 14:17               ` Ulrich Weigand
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Biener @ 2019-09-09  8:37 UTC (permalink / raw)
  To: Ulrich Weigand; +Cc: GCC Patches

On Fri, Sep 6, 2019 at 3:00 PM Ulrich Weigand <uweigand@de.ibm.com> wrote:
>
> Richard Biener wrote:
> > On Tue, Sep 3, 2019 at 3:09 PM Ulrich Weigand <uweigand@de.ibm.com> wrote:
> > > > If you read the C standards fine-print then yes.  But people (or
> > > > even the C frontend!) hardly get that correct since for example
> > > > for
> > > >
> > > > struct __attribute__((packed)) { int i; } s;
> > > >
> > > > void foo ()
> > > > {
> > > >   __builtin_printf ("%p", &s.i);
> > > > }
> > > >
> > > > the C frontend actually creates an int * typed pointer for the ADDR_EXPR
> > > > (and not an int * variant with 1-byte alignment).  And people use
> > > > int * to pass such pointers everywhere.
> > >
> > > Well ... I'd say if you cast to int * and then use an atomic operation,
> > > it's your own fault that it fails :-/   If the frontend itself uses
> > > the wrong type, that is of course a problem.
> >
> > Yes.  The C standard says that if you cast something to a pointer type
> > the pointer has to be aligned according to the pointed-to type, otherwise
> > undefined.  But we have no chance to use this because of this kind of
> > issue (and of course developer laziness...).
>
> But as far as I can see, for *atomic* operations at least, we do make
> that assumption.  The x86 back-end for example just assumes that any
> "int" or "long" object that is the target of an atomic operation is
> naturally aligned, or else the generated code would just crash.   So
> if you used your example with a packed struct and passed that pointer
> to an atomic, the back-end would still assume the alignment and the
> code would crash.  But I'd still consider this a perfectly reasonable
> and expected behavior in this case ...

Would it crash?  I think it would be not atomic if it crossed a cache-line
boundary.

> The only thing that is special on s390 is that for the 16-byte integer
> type, the "natural" (mode) alignment is only 8 bytes, but for atomics
> we require 16 bytes.  But if you explicitly use a 16-byte aligned
> pointer type to assert to the compiler that this object *is* aligned,
> the compiler should be able to rely on that.
>
> Of course if some part of the middle end get the alignment wrong, we
> have a problem.  But I'm not sure how this could happen here.  I agree
> that it might be the case that a user-specified *under*-alignment might
> get lost somewhere (e.g. you have a packed int and somewhere in the
> optimizers this gets accidentally cast to a normal int with the default
> alignment).  But in *this* case, what would have to happen is that the
> middle-end accidentally casts something to a pointer with *higher*
> than the default alignment for the type, even though no such pointer
> cast is present in the source.  Can this really happen?

If the cast to the lower-aligned type is lost and there is an earlier cast
to the aligned type.

So - I don't see how this cannot happen.  Will it be likely?  Probably
not.

> > I'm not sure how it is done now but IIRC the users
> > use __atomic_load (ptr) and then the frontend changes that
> > to one of BUILT_IN_ATOMIC_LOAD_{N,1,2,4,8,16} based on
> > some criteria (size mainly).  I'm saying we should factor in
> > alignment here, _not_ using say BUILT_IN_ATOMIC_LOAD_16
> > if according to the C standard the data isn't aligned.  Maybe we can
> > ask the target whether the alignment according to C matches the
> > requirement for _16 expansion.  And if things are not fine
> > the FE should instead use BUILT_IN_ATOMIC_LOAD_N, which we could
> > later expand as one of the others if the compiler can prove bigger
> > alignment and N is constant.
> >
> > But safety first.
>
> The problem with using the _N variant is that we then get a call
> to the _n version of the library routine, right?

Yes.

> This would actually
> be wrong on s390.  The problem is that all atomic operations on any
> one single object need to be consistent: they either all use the
> 16-byte atomic instruction, or else they all protect the access with
> a lock.  If you have parts of the code use the lock and other parts
> use the instruction, they're not actually protected against each other.

But then the user has to be consistent in accessing the object
atomically.  If he accesses it once as (aligned_int128_t *)
and once as (int128_t *) it's his fault, no?

If we'd document that the user invokes undefined behavior
when performing __builtin_atomic () on objects that are not
sufficiently aligned according to target specific needs then
we are of course fine and should simply assume the memory
is aligned accordingly (similar to your patch but probably with
some target hook specifying the atomic alignment requirement
when it differs from mode alignment).  But I don't read the
documentation of our atomic builtins that way.

Does _Atomic __int128_t work properly on s390?

> This is why the _16 version of the library routine does the runtime
> alignment check, so that all accesses to actually 16-byte aligned
> objects use the instruction, both in the library and in compiler-
> generated code.   The _n version doesn't do that.
>
> So I guess we'd have to convert the _n version back to the _16
> library call if the size is constant 16, or something?

If it is aligned 16?  But not sure - see above.

> > So yes, I'd say try tackling the issue in the frontend which is the
> > last to know the alignment in the most optimistic way (according
> > to the C standard).  At RTL expansion time we have to be (very)
> > conservative.
> >
> > Btw, the alternative is to add an extra alignment parameter
> > to all of the atomic builtins where the FE could stick the alignment
> > for us to use during RTL expansion.  But given we have the _{1,2,4,8,16,N}
> > variants it sounds easier to use those...
> >
> > Note we only document __atomic_load and __atomic_load_n so
> > the _{1,2,4,8,16} variants seem to be implementation detail.
>
> Adding the alignment parameter would work, I think.

But then the user still needs to access the int128_t with
consistent alignment, no?  Otherwise it will again not protect
against each other?

Why not always use the slow & correct variant for __int128?

Richard.

> Bye,
> Ulrich
>
> --
>   Dr. Ulrich Weigand
>   GNU/Linux compilers and toolchain
>   Ulrich.Weigand@de.ibm.com
>


* Re: [PATCH] Use type alignment in get_builtin_sync_mem
  2019-09-09  8:37             ` Richard Biener
@ 2019-09-09 14:17               ` Ulrich Weigand
  2019-09-09 18:27                 ` Richard Biener
  0 siblings, 1 reply; 10+ messages in thread
From: Ulrich Weigand @ 2019-09-09 14:17 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

Richard Biener wrote:
> On Fri, Sep 6, 2019 at 3:00 PM Ulrich Weigand <uweigand@de.ibm.com> wrote:
> > But as far as I can see, for *atomic* operations at least, we do make
> > that assumption.  The x86 back-end for example just assumes that any
> > "int" or "long" object that is the target of an atomic operation is
> > naturally aligned, or else the generated code would just crash.   So
> > if you used your example with a packed struct and passed that pointer
> > to an atomic, the back-end would still assume the alignment and the
> > code would crash.  But I'd still consider this a perfectly reasonable
> > and expected behavior in this case ...
> 
> Would it crash?  I think it would be not atomic if it crossed a cache-line
> boundary.

Sorry, I misremembered the x86 operations, it does indeed work for
unaligned 4- or 8-byte accesses.  However, for 16-byte accesses,
CMPXCHG16B does require aligned memory, the manual says:

  Note that CMPXCHG16B requires that the destination (memory)
  operand be 16-byte aligned.
[...]
64-Bit Mode Exceptions
[...]
#GP(0) If the memory address is in a non-canonical form.
       If memory operand for CMPXCHG16B is not aligned on a
       16-byte boundary.
[...]

So this is basically the same situation as on s390, except that on x86
the default TImode alignment is already 16 bytes.

> > Of course if some part of the middle end get the alignment wrong, we
> > have a problem.  But I'm not sure how this could happen here.  I agree
> > that it might be the case that a user-specified *under*-alignment might
> > get lost somewhere (e.g. you have a packed int and somewhere in the
> > optimizers this gets accidentally cast to a normal int with the default
> > alignment).  But in *this* case, what would have to happen is that the
> > middle-end accidentally casts something to a pointer with *higher*
> > than the default alignment for the type, even though no such pointer
> > cast is present in the source.  Can this really happen?
> 
> If the cast to the lower-aligned type is lost and there is an earlier cast
> to the aligned type.

My point is that this "cast to the aligned type" must have come from the
user in this case (since the aligned type is *more* aligned than any
standard version of the type), and if the user casts the value to the aligned
type, it is undefined behavior anyway if the value was in fact not aligned.

> > This would actually
> > be wrong on s390.  The problem is that all atomic operations on any
> > one single object need to be consistent: they either all use the
> > 16-byte atomic instruction, or else they all protect the access with
> > a lock.  If you have parts of the code use the lock and other parts
> > use the instruction, they're not actually protected against each other.
> 
> But then the user has to be consistent in accessing the object
> atomically.  If he accesses it once as (aligned_int128_t *)
> and once as (int128_t *) it's his fault, no?

I'm not sure why this should be a requirement.  E.g. if we have a
set of subroutines that operate (correctly) on any int128_t *,
aligned or not, and have one user of those routines that actually
locally has an aligned_int128_t, then that user could no longer
safely pass a pointer to its aligned variable to that
subsystem if it also does atomic operations locally ...

> If we'd document that the user invokes undefined behavior
> when performing __builtin_atomic () on objects that are not
> sufficiently aligned according to target specific needs then
> we are of course fine and should simply assume the memory
> is aligned accordingly (similar to your patch but probably with
> some target hook specifying the atomic alignment requirement
> when it differs from mode alignment).  But I don't read the
> documentation of our atomic builtins that way.
> 
> Does _Atomic __int128_t work properly on s390?

Yes, it currently does work properly in all cases (just not in
all cases as efficiently as it could be).

The rule to perform atomic operations on __int128_t on s390 is:
 - If the object is *actually* 16-byte aligned at runtime, then
   every atomic access must be performed using one of the atomic
   instructions (CDSG, LPQ, STPQ).
 - If the object is actually *not* 16-byte aligned, then every
   atomic access must be performed under protection of an
   out-of-line lock.

This rule is correctly implemented by:
 - The __atomic_*_16 family of libatomic library routines;
   these all perform a run-time alignment check and use either
   the instruction or the lock, as appropriate; and
 - Compiler-generated inline code;
   this will always use the instruction, but the compiler will
   generate it only if it can prove at compile-time that the
   object *must* be aligned at run-time.

[ However, this rule is *not* implemented by the _n family of
libatomic library routines.  That is not a problem at the moment
since those will *never* get called on any object of size 16;
but they would be if we implemented your proposal; that's why
I pointed out that this would then *introduce* unsafe behavior. ]

The original point of why I started this discussion is nothing to
do with safety -- code is currently safe and would remain safe
after this change -- but simply performance: we would get compiler
generated inline code more frequently than we do today.

(This is not *just* performance but also convenience, since we
can avoid having to explicitly link in libatomic in many cases
as well.  This would help with portability of some Makefiles
etc. to s390, where they don't link in libatomic since it is
not needed on x86 ...)

> > This is why the _16 version of the library routine does the runtime
> > alignment check, so that all accesses to actually 16-byte aligned
> > objects use the instruction, both in the library and in compiler-
> > generated code.   The _n version doesn't do that.
> >
> > So I guess we'd have to convert the _n version back to the _16
> > library call if the size is constant 16, or something?
> 
> If it is aligned 16?  But not sure - see above.

No, it would have to be done no matter whether it is (i.e. can be
proven at compile-time) aligned or not, the point is we must call
the _16 version of the library routine so that it can do the run-time
alignment check and perform the appropriate action.

> > > So yes, I'd say try tackling the issue in the frontend which is the
> > > last to know the alignment in the most optimistic way (according
> > > to the C standard).  At RTL expansion time we have to be (very)
> > > conservative.
> > >
> > > Btw, the alternative is to add an extra alignment parameter
> > > to all of the atomic builtins where the FE could stick the alignment
> > > for us to use during RTL expansion.  But given we have the _{1,2,4,8,16,N}
> > > variants it sounds easier to use those...
> > >
> > > Note we only document __atomic_load and __atomic_load_n so
> > > the _{1,2,4,8,16} variants seem to be implementation detail.
> >
> > Adding the alignment parameter would work, I think.
> 
> But then the user still needs to access the int128_t with
> consistent alignment, no?  Otherwise it will again not protect
> against each other?

See above, it will still work *correctly* in either case.  The
alignment info simply allows the compiler to safely choose the
fast (inlined) variant. 

> Why not always use the slow & correct variant for __int128?

As above, both slow & fast variants are correct; it's just that
the fast variant is, well, faster and more convenient.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  Ulrich.Weigand@de.ibm.com


* Re: [PATCH] Use type alignment in get_builtin_sync_mem
  2019-09-09 14:17               ` Ulrich Weigand
@ 2019-09-09 18:27                 ` Richard Biener
  0 siblings, 0 replies; 10+ messages in thread
From: Richard Biener @ 2019-09-09 18:27 UTC (permalink / raw)
  To: Ulrich Weigand; +Cc: gcc-patches

On September 9, 2019 4:17:20 PM GMT+02:00, Ulrich Weigand <uweigand@de.ibm.com> wrote:
>Richard Biener wrote:
>> On Fri, Sep 6, 2019 at 3:00 PM Ulrich Weigand <uweigand@de.ibm.com>
>wrote:
>> > But as far as I can see, for *atomic* operations at least, we do
>make
>> > that assumption.  The x86 back-end for example just assumes that
>any
>> > "int" or "long" object that is the target of an atomic operation is
>> > naturally aligned, or else the generated code would just crash.  
>So
>> > if you used your example with a packed struct and passed that
>pointer
>> > to an atomic, the back-end would still assume the alignment and the
>> > code would crash.  But I'd still consider this a perfectly
>reasonable
>> > and expected behavior in this case ...
>> 
>> Would it crash?  I think it would be not atomic if it crossed a
>cache-line
>> boundary.
>
>Sorry, I misremembered the x86 operations, it does indeed work for
>unaligned 4- or 8-byte accesses.  However, for 16-byte accesses,
>CMPXCHG16B does require aligned memory, the manual says:
>
>  Note that CMPXCHG16B requires that the destination (memory)
>  operand be 16-byte aligned.
>[...]
>64-Bit Mode Exceptions
>[...]
>#GP(0) If the memory address is in a non-canonical form.
>       If memory operand for CMPXCHG16B is not aligned on a
>       16-byte boundary.
>[...]
>
>So this is basically the same situation as on s390, except that on x86
>the default TImode alignment is already 16 bytes.
>
>> > Of course if some part of the middle end get the alignment wrong,
>we
>> > have a problem.  But I'm not sure how this could happen here.  I
>agree
>> > that it might be the case that a user-specified *under*-alignment
>might
>> > get lost somewhere (e.g. you have a packed int and somewhere in the
>> > optimizers this gets accidentally cast to a normal int with the
>default
>> > alignment).  But in *this* case, what would have to happen is that
>the
>> > middle-end accidentally casts something to a pointer with *higher*
>> > than the default alignment for the type, even though no such
>pointer
>> > cast is present in the source.  Can this really happen?
>> 
>> If the cast to the lower-aligned type is lost and there is an earlier
>cast
>> to the aligned type.
>
>My point is that this "cast to the aligned type" must have come from
>the
>user in this case (since the aligned type is *more* aligned than any
>standard version of the type), and if the user casts the value to the
>aligned
>type, it is undefined behavior anyway if the value was in fact not
>aligned.
>
>> > This would actually
>> > be wrong on s390.  The problem is that all atomic operations on any
>> > one single object need to be consistent: they either all use the
>> > 16-byte atomic instruction, or else they all protect the access
>with
>> > a lock.  If you have parts of the code use the lock and other parts
>> > use the instruction, they're not actually protected against each
>other.
>> 
>> But then the user has to be consistent in accessing the object
>> atomically.  If he accesses it once as (aligned_int128_t *)
>> and once as (int128_t *) it's his fault, no?
>
>I'm not sure why this should be a requirement.  E.g. if we have a
>set of subroutines that operate (correctly) on any int128_t *,
>aligned or not, and have one user of those routines that actually
>locally has an aligned_int128_t, then that user could no longer
>safely pass a pointer to its aligned variable to that
>subsystem if it also does atomic operations locally ...
>
>> If we'd document that the user invokes undefined behavior
>> when performing __builtin_atomic () on objects that are not
>> sufficiently aligned according to target specific needs then
>> we are of course fine and should simply assume the memory
>> is aligned accordingly (similar to your patch but probably with
>> some target hook specifying the atomic alignment requirement
>> when it differs from mode alignment).  But I don't read the
>> documentation of our atomic builtins that way.
>> 
>> Does _Atomic __int128_t work properly on s390?
>
>Yes, it currently does work properly in all cases (just not in
>all cases as efficiently as it could be).
>
>The rule to perform atomic operations on __int128_t on s390 is:
> - If the object is *actually* 16-byte aligned at runtime, then
>   every atomic access must be performed using one of the atomic
>   instructions (CDSG, LPQ, STPQ).
> - If the object is actually *not* 16-byte aligned, then every
>   atomic access must be performed under protection of an
>   out-of-line lock.
>
>This rule is correctly implemented by:
> - The __atomic_*_16 family of libatomic library routines;
>   these all perform a run-time alignment check and use either
>   the instruction or the lock, as appropriate; and
> - Compiler-generated inline code;
>   this will always use the instruction, but the compiler will
>   generate it only if it can prove at compile-time that the
>   object *must* be aligned at run-time.
>
>[ However, this rule is *not* implemented by the _n family of
>libatomic library routines.  That is not a problem at the moment
>since those will *never* get called on any object of size 16;
>but they would be if we implemented your proposal; that's why
>I pointed out that this would then *introduce* unsafe behavior. ]
>
>The original point of why I started this discussion is nothing to
>do with safety -- code is currently safe and would remain safe
>after this change -- but simply performance: we would get compiler
>generated inline code more frequently than we do today.
>
>(This is not *just* performance but also convenience, since we
>can avoid having to explicitly link in libatomic in many cases
>as well.  This would help with portability of some Makefiles
>etc. to s390, where they don't link in libatomic since it is
>not needed on x86 ...)
>
>> > This is why the _16 version of the library routine does the runtime
>> > alignment check, so that all accesses to actually 16-byte aligned
>> > objects use the instruction, both in the library and in compiler-
>> > generated code.   The _n version doesn't do that.
>> >
>> > So I guess we'd have to convert the _n version back to the _16
>> > library call if the size is constant 16, or something?
>> 
>> If it is aligned 16?  But not sure - see above.
>
>No, it would have to be done no matter whether it is (i.e. can be
>proven at compile-time) aligned or not, the point is we must call
>the _16 version of the library routine so that it can do the run-time
>alignment check and perform the appropriate action.
>
>> > > So yes, I'd say try tackling the issue in the frontend which is
>the
>> > > last to know the alignment in the most optimistic way (according
>> > > to the C standard).  At RTL expansion time we have to be (very)
>> > > conservative.
>> > >
>> > > Btw, the alternative is to add an extra alignment parameter
>> > > to all of the atomic builtins where the FE could stick the
>alignment
>> > > for us to use during RTL expansion.  But given we have the
>_{1,2,4,8,16,N}
>> > > variants it sounds easier to use those...
>> > >
>> > > Note we only document __atomic_load and __atomic_load_n so
>> > > the _{1,2,4,8,16} variants seem to be implementation detail.
>> >
>> > Adding the alignment parameter would work, I think.
>> 
>> But then the user still needs to access the int128_t with
>> consistent alignment, no?  Otherwise it will again not protect
>> against each other?
>
>See above, it will still work *correctly* in either case.  The
>alignment info simply allows the compiler to safely choose the
>fast (inlined) variant. 
>
>> Why not always use the slow & correct variant for __int128?
>
>As above, both slow & fast variants are correct; it's just that
>the fast variant is, well, faster and more convenient.

So the compiler already does that correctly then via get_pointer_alignment. 
Why not inline the runtime alignment check as well plus the fast case and the library call fallback? 

Richard. 

>Bye,
>Ulrich



Thread overview: 10+ messages
2019-09-02 20:35 [PATCH] Use type alignment in get_builtin_sync_mem Ulrich Weigand
2019-09-03 10:55 ` Richard Biener
2019-09-03 11:56   ` Ulrich Weigand
2019-09-03 12:28     ` Richard Biener
2019-09-03 13:09       ` Ulrich Weigand
2019-09-06 10:46         ` Richard Biener
2019-09-06 13:01           ` Ulrich Weigand
2019-09-09  8:37             ` Richard Biener
2019-09-09 14:17               ` Ulrich Weigand
2019-09-09 18:27                 ` Richard Biener
