public inbox for gcc-patches@gcc.gnu.org
* [PATCH 1/3] Remove support for obsolete x86 -malign-foo options
  2017-04-18 18:30 [PATCH 0/3] Extend -falign-FOO=N to N[,M[,N2[,M2]]] version 8 Denys Vlasenko
  2017-04-18 18:30 ` [PATCH 2/3] Temporary remove "at least 8 byte alignment" code from x86 Denys Vlasenko
@ 2017-04-18 18:30 ` Denys Vlasenko
  2017-05-06  7:22   ` Uros Bizjak
  2017-04-18 18:46 ` [PATCH 3/3] Extend -falign-FOO=N to N[,M[,N2[,M2]]] Denys Vlasenko
  2017-05-05 14:40 ` [PATCH 0/3] Extend -falign-FOO=N to N[,M[,N2[,M2]]] version 8 Denys Vlasenko
  3 siblings, 1 reply; 26+ messages in thread
From: Denys Vlasenko @ 2017-04-18 18:30 UTC (permalink / raw)
  To: gcc-patches
  Cc: Denys Vlasenko, Andrew Pinski, Uros Bizjak, Bernd Schmidt,
	Sandra Loosemore

2017-04-18  Denys Vlasenko  <dvlasenk@redhat.com>

    * common/config/i386/i386-common.c (ix86_handle_option): Remove support
    for obsolete -malign-loops, -malign-jumps and -malign-functions
    options.
    * config/i386/i386.opt: Likewise.

Index: gcc/common/config/i386/i386-common.c
===================================================================
--- gcc/common/config/i386/i386-common.c	(revision 240663)
+++ gcc/common/config/i386/i386-common.c	(working copy)
@@ -998,38 +998,6 @@ ix86_handle_option (struct gcc_options *opts,
 	}
       return true;
 
-
-  /* Comes from final.c -- no real reason to change it.  */
-#define MAX_CODE_ALIGN 16
-
-    case OPT_malign_loops_:
-      warning_at (loc, 0, "-malign-loops is obsolete, use -falign-loops");
-      if (value > MAX_CODE_ALIGN)
-	error_at (loc, "-malign-loops=%d is not between 0 and %d",
-		  value, MAX_CODE_ALIGN);
-      else
-	opts->x_align_loops = 1 << value;
-      return true;
-
-    case OPT_malign_jumps_:
-      warning_at (loc, 0, "-malign-jumps is obsolete, use -falign-jumps");
-      if (value > MAX_CODE_ALIGN)
-	error_at (loc, "-malign-jumps=%d is not between 0 and %d",
-		  value, MAX_CODE_ALIGN);
-      else
-	opts->x_align_jumps = 1 << value;
-      return true;
-
-    case OPT_malign_functions_:
-      warning_at (loc, 0,
-		  "-malign-functions is obsolete, use -falign-functions");
-      if (value > MAX_CODE_ALIGN)
-	error_at (loc, "-malign-functions=%d is not between 0 and %d",
-		  value, MAX_CODE_ALIGN);
-      else
-	opts->x_align_functions = 1 << value;
-      return true;
-
     case OPT_mbranch_cost_:
       if (value > 5)
 	{
Index: gcc/config/i386/i386.opt
===================================================================
--- gcc/config/i386/i386.opt	(revision 240663)
+++ gcc/config/i386/i386.opt	(working copy)
@@ -205,18 +205,6 @@ malign-double
 Target Report Mask(ALIGN_DOUBLE) Save
 Align some doubles on dword boundary.
 
-malign-functions=
-Target RejectNegative Joined UInteger
-Function starts are aligned to this power of 2.
-
-malign-jumps=
-Target RejectNegative Joined UInteger
-Jump targets are aligned to this power of 2.
-
-malign-loops=
-Target RejectNegative Joined UInteger
-Loop code aligned to this power of 2.
-
 malign-stringops
 Target RejectNegative Report InverseMask(NO_ALIGN_STRINGOPS, ALIGN_STRINGOPS) Save
 Align destination of the string operations.


* [PATCH 0/3] Extend -falign-FOO=N to N[,M[,N2[,M2]]] version 8
@ 2017-04-18 18:30 Denys Vlasenko
  2017-04-18 18:30 ` [PATCH 2/3] Temporary remove "at least 8 byte alignment" code from x86 Denys Vlasenko
                   ` (3 more replies)
  0 siblings, 4 replies; 26+ messages in thread
From: Denys Vlasenko @ 2017-04-18 18:30 UTC (permalink / raw)
  To: gcc-patches
  Cc: Denys Vlasenko, Andrew Pinski, Uros Bizjak, Bernd Schmidt,
	Sandra Loosemore

These patches are for this bug:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66240
"RFE: extend -falign-xyz syntax"

An extended explanation is in the commit message of patch 3.

The test program:

int g();
int f(int i) {
        i *= 3;
        while (--i > 100) {
 L1:            if (g()) goto L1;
                if (g()) goto L2;
        }
        return i;
 L2:    return 123;
}

"-O2" assembly before the patch:	After the patch:
        .text                           	.text
        .p2align 4,,15                  	.p2align 4
        .globl  f                       	.globl	f
        .type   f, @function            	.type	f, @function
f:                                      f:
.LFB0:                                  .LFB0:
        pushq   %rbx                    	pushq	%rbx
        leal    (%rdi,%rdi,2), %ebx     	leal	(%rdi,%rdi,2), %ebx
        .p2align 4,,10                  	.p2align 4,,10
        .p2align 3                      	.p2align 3
.L2:                                    .L2:
        subl    $1, %ebx                	subl	$1, %ebx
        cmpl    $100, %ebx              	cmpl	$100, %ebx
        jle     .L1                     	jle	.L1
        .p2align 4,,10                  	.p2align 4,,10
        .p2align 3                      	.p2align 3
.L3:                                    .L3:
        xorl    %eax, %eax              	xorl	%eax, %eax
        call    g                       	call	g
        testl   %eax, %eax              	testl	%eax, %eax
        jne     .L3                     	jne	.L3
        call    g                       	call	g
        testl   %eax, %eax              	testl	%eax, %eax
        je      .L2                     	je	.L2
        movl    $123, %ebx              	movl	$123, %ebx
.L4:                                    .L4:
.L1:                                    .L1:
        movl    %ebx, %eax              	movl	%ebx, %eax
        popq    %rbx                    	popq	%rbx
        ret                             	ret

This is version 8 of the patch set.

Changes since version 7:

* Documentation fixes

Changes since version 6:

* Rediffed to accommodate changes made by the recently introduced
  -flimit-function-alignment

Changes since version 5:

* Changes in rs6000, mips, alpha, visium, sh, rx, spu to accommodate
  new alignment options.
* Explicitly list secondary alignment of 8 ("n,m,8") in x86 tables
  for all types of jump targets.

Changes since version 4:

* Deleted rather than NOPed -malign-foo=N support.
* Improved behavior match with x86 8-byte subalignment for labels.

Changes since version 3:

* Improved documentation in invoke.texi
* Fixed x86-specific calculation of default N2 value:
  previous version was doing it incorrectly for cross-compile


* [PATCH 2/3] Temporary remove "at least 8 byte alignment" code from x86
  2017-04-18 18:30 [PATCH 0/3] Extend -falign-FOO=N to N[,M[,N2[,M2]]] version 8 Denys Vlasenko
@ 2017-04-18 18:30 ` Denys Vlasenko
  2017-04-18 18:30 ` [PATCH 1/3] Remove support for obsolete x86 -malign-foo options Denys Vlasenko
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 26+ messages in thread
From: Denys Vlasenko @ 2017-04-18 18:30 UTC (permalink / raw)
  To: gcc-patches
  Cc: Denys Vlasenko, Andrew Pinski, Uros Bizjak, Bernd Schmidt,
	Sandra Loosemore

This change drops the forced alignment to 8 when the requested alignment is
higher than 8: before the patch, -falign-functions=9 was generating

        .p2align 4,,8
        .p2align 3

which means: "align to 16 if the skip is 8 bytes or less; else align to 8".
After this change, ".p2align 3" is not emitted.
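
The removed logic can be restated as a small standalone C function. This is
only a sketch of the condition deleted from ASM_OUTPUT_MAX_SKIP_ALIGN; the
function name and the return value (number of directives written) are
hypothetical and exist only for illustration. The -falign-functions=9 case
above corresponds to LOG = 4 and MAX_SKIP = 8.

```c
#include <assert.h>
#include <stdio.h>

/* Sketch of the pre-patch x86 behavior (hypothetical helper, not GCC code).
   Writes the alignment directives to OUT and returns how many were written.  */
static int
emit_old_x86_align (FILE *out, int log, int max_skip)
{
  assert (log >= 0 && max_skip >= 0);
  if (log == 0)
    return 0;
  if (max_skip == 0)
    {
      fprintf (out, "\t.p2align %d\n", log);
      return 1;
    }
  fprintf (out, "\t.p2align %d,,%d\n", log, max_skip);
  /* Same test as the removed code: if more than 8-byte alignment was
     requested but the first directive may be skipped, fall back to
     8-byte alignment.  */
  if (log > 3 && (1 << log) > max_skip + 1 && max_skip >= 7)
    {
      fputs ("\t.p2align 3\n", out);
      return 2;
    }
  return 1;
}
```

Calling emit_old_x86_align (stdout, 4, 8) writes both directives shown above;
with this patch applied, only the first one would be written.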

This behavior will be implemented differently by the next patch.

The new SUBALIGN_LOG define will be used by the next patch.

While we are here, avoid generating ".p2align N,,2^N-1" -
it is functionally equivalent to ".p2align N". In this case, use the latter.
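
The equivalence holds because reaching a 2^N boundary never needs more than
2^N-1 bytes of padding, so a max-skip of 2^N-1 (or more) can never cause the
directive to be skipped. A minimal sketch of the new check follows; the helper
names are hypothetical, only the condition itself comes from the patch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: nonzero when the max-skip operand adds nothing.
   The condition matches the one the patch adds to the macros.  */
static int
max_skip_is_redundant (int log, int max_skip)
{
  assert (log > 0 && log < 31 && max_skip >= 0);
  return max_skip == 0 || max_skip >= (1 << log) - 1;
}

/* Hypothetical helper: write the shortest equivalent directive to OUT.  */
static void
emit_align (FILE *out, int log, int max_skip)
{
  if (log == 0)
    return;
  if (max_skip_is_redundant (log, max_skip))
    fprintf (out, "\t.p2align %d\n", log);
  else
    fprintf (out, "\t.p2align %d,,%d\n", log, max_skip);
}
```

For example, emit_align (stdout, 4, 15) writes ".p2align 4", while
emit_align (stdout, 4, 10) keeps the ".p2align 4,,10" form.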

2017-04-18  Denys Vlasenko  <dvlasenk@redhat.com>

    * config/i386/dragonfly.h (ASM_OUTPUT_MAX_SKIP_ALIGN):
    Use a simpler align directive also if MAXSKIP = ALIGN-1.
    * config/i386/gas.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Likewise.
    * config/i386/lynx.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Likewise.
    * config/i386/netbsd-elf.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Likewise.
    * config/i386/i386.h (ASM_OUTPUT_MAX_SKIP_PAD): Likewise.
    * config/i386/freebsd.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Remove "If N
    is large, do at least 8 byte alignment" code. Add SUBALIGN_LOG
    define. Use a simpler align directive also if MAXSKIP = ALIGN-1.
    * config/i386/gnu-user.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Likewise.
    * config/i386/iamcu.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Likewise.
    * config/i386/openbsdelf.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Likewise.
    * config/i386/x86-64.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Likewise.

Index: gcc/config/i386/dragonfly.h
===================================================================
--- gcc/config/i386/dragonfly.h	(revision 239860)
+++ gcc/config/i386/dragonfly.h	(working copy)
@@ -69,10 +69,12 @@ see the files COPYING3 and COPYING.RUNTIME respect
 
 #ifdef HAVE_GAS_MAX_SKIP_P2ALIGN
 #undef  ASM_OUTPUT_MAX_SKIP_ALIGN
-#define ASM_OUTPUT_MAX_SKIP_ALIGN(FILE, LOG, MAX_SKIP)					\
-  if ((LOG) != 0) {														\
-    if ((MAX_SKIP) == 0) fprintf ((FILE), "\t.p2align %d\n", (LOG));	\
-    else fprintf ((FILE), "\t.p2align %d,,%d\n", (LOG), (MAX_SKIP));	\
+#define ASM_OUTPUT_MAX_SKIP_ALIGN(FILE, LOG, MAX_SKIP)			\
+  if ((LOG) != 0) {							\
+    if ((MAX_SKIP) == 0 || (MAX_SKIP) >= (1<<(LOG))-1)			\
+      fprintf ((FILE), "\t.p2align %d\n", (LOG));			\
+    else								\
+      fprintf ((FILE), "\t.p2align %d,,%d\n", (LOG), (MAX_SKIP));	\
   }
 #endif
 
Index: gcc/config/i386/freebsd.h
===================================================================
--- gcc/config/i386/freebsd.h	(revision 239860)
+++ gcc/config/i386/freebsd.h	(working copy)
@@ -92,9 +92,9 @@ along with GCC; see the file COPYING3.  If not see
 
 /* A C statement to output to the stdio stream FILE an assembler
    command to advance the location counter to a multiple of 1<<LOG
-   bytes if it is within MAX_SKIP bytes.
+   bytes if it is within MAX_SKIP bytes.  */
 
-   This is used to align code labels according to Intel recommendations.  */
+#define SUBALIGN_LOG 3
 
 #ifdef HAVE_GAS_MAX_SKIP_P2ALIGN
 #undef  ASM_OUTPUT_MAX_SKIP_ALIGN
@@ -101,16 +101,10 @@ along with GCC; see the file COPYING3.  If not see
 #define ASM_OUTPUT_MAX_SKIP_ALIGN(FILE,LOG,MAX_SKIP)			\
   do {									\
     if ((LOG) != 0) {							\
-      if ((MAX_SKIP) == 0) fprintf ((FILE), "\t.p2align %d\n", (LOG));	\
-      else {								\
+      if ((MAX_SKIP) == 0 || (MAX_SKIP) >= (1<<(LOG))-1)		\
+	fprintf ((FILE), "\t.p2align %d\n", (LOG));			\
+      else								\
 	fprintf ((FILE), "\t.p2align %d,,%d\n", (LOG), (MAX_SKIP));	\
-	/* Make sure that we have at least 8 byte alignment if > 8 byte \
-	   alignment is preferred.  */					\
-	if ((LOG) > 3							\
-	    && (1 << (LOG)) > ((MAX_SKIP) + 1)				\
-	    && (MAX_SKIP) >= 7)						\
-	  fputs ("\t.p2align 3\n", (FILE));				\
-      }									\
     }									\
   } while (0)
 #endif
Index: gcc/config/i386/gas.h
===================================================================
--- gcc/config/i386/gas.h	(revision 239860)
+++ gcc/config/i386/gas.h	(working copy)
@@ -72,10 +72,12 @@ along with GCC; see the file COPYING3.  If not see
 
 #ifdef HAVE_GAS_MAX_SKIP_P2ALIGN
 #  define ASM_OUTPUT_MAX_SKIP_ALIGN(FILE,LOG,MAX_SKIP) \
-     if ((LOG) != 0) {\
-       if ((MAX_SKIP) == 0) fprintf ((FILE), "\t.p2align %d\n", (LOG)); \
-       else fprintf ((FILE), "\t.p2align %d,,%d\n", (LOG), (MAX_SKIP)); \
-     }
+    if ((LOG) != 0) { \
+      if ((MAX_SKIP) == 0 || (MAX_SKIP) >= (1<<(LOG))-1)		\
+	fprintf ((FILE), "\t.p2align %d\n", (LOG));			\
+      else								\
+	fprintf ((FILE), "\t.p2align %d,,%d\n", (LOG), (MAX_SKIP));	\
+    }
 #endif
 \f
 /* A C statement or statements which output an assembler instruction
Index: gcc/config/i386/gnu-user.h
===================================================================
--- gcc/config/i386/gnu-user.h	(revision 239860)
+++ gcc/config/i386/gnu-user.h	(working copy)
@@ -94,24 +94,18 @@ along with GCC; see the file COPYING3.  If not see
 
 /* A C statement to output to the stdio stream FILE an assembler
    command to advance the location counter to a multiple of 1<<LOG
-   bytes if it is within MAX_SKIP bytes.
+   bytes if it is within MAX_SKIP bytes.  */
 
-   This is used to align code labels according to Intel recommendations.  */
+#define SUBALIGN_LOG 3
 
 #ifdef HAVE_GAS_MAX_SKIP_P2ALIGN
 #define ASM_OUTPUT_MAX_SKIP_ALIGN(FILE,LOG,MAX_SKIP)			\
   do {									\
     if ((LOG) != 0) {							\
-      if ((MAX_SKIP) == 0) fprintf ((FILE), "\t.p2align %d\n", (LOG));	\
-      else {								\
+      if ((MAX_SKIP) == 0 || (MAX_SKIP) >= (1<<(LOG))-1)		\
+	fprintf ((FILE), "\t.p2align %d\n", (LOG));			\
+      else								\
 	fprintf ((FILE), "\t.p2align %d,,%d\n", (LOG), (MAX_SKIP));	\
-	/* Make sure that we have at least 8 byte alignment if > 8 byte \
-	   alignment is preferred.  */					\
-	if ((LOG) > 3							\
-	    && (1 << (LOG)) > ((MAX_SKIP) + 1)				\
-	    && (MAX_SKIP) >= 7)						\
-	  fputs ("\t.p2align 3\n", (FILE));				\
-      }									\
     }									\
   } while (0)
 #endif
Index: gcc/config/i386/i386.h
===================================================================
--- gcc/config/i386/i386.h	(revision 239860)
+++ gcc/config/i386/i386.h	(working copy)
@@ -2271,7 +2271,7 @@ do {									\
 #define ASM_OUTPUT_MAX_SKIP_PAD(FILE, LOG, MAX_SKIP)			\
   if ((LOG) != 0)							\
     {									\
-      if ((MAX_SKIP) == 0)						\
+      if ((MAX_SKIP) == 0 || (MAX_SKIP) >= (1<<(LOG))-1)		\
         fprintf ((FILE), "\t.p2align %d\n", (LOG));			\
       else								\
         fprintf ((FILE), "\t.p2align %d,,%d\n", (LOG), (MAX_SKIP));	\
Index: gcc/config/i386/iamcu.h
===================================================================
--- gcc/config/i386/iamcu.h	(revision 239860)
+++ gcc/config/i386/iamcu.h	(working copy)
@@ -62,23 +62,17 @@ see the files COPYING3 and COPYING.RUNTIME respect
 
 /* A C statement to output to the stdio stream FILE an assembler
    command to advance the location counter to a multiple of 1<<LOG
-   bytes if it is within MAX_SKIP bytes.
+   bytes if it is within MAX_SKIP bytes.  */
 
-   This is used to align code labels according to Intel recommendations.  */
+#define SUBALIGN_LOG 3
 
 #define ASM_OUTPUT_MAX_SKIP_ALIGN(FILE,LOG,MAX_SKIP)			\
   do {									\
     if ((LOG) != 0) {							\
-      if ((MAX_SKIP) == 0) fprintf ((FILE), "\t.p2align %d\n", (LOG));	\
-      else {								\
+      if ((MAX_SKIP) == 0 || (MAX_SKIP) >= (1<<(LOG))-1)		\
+	fprintf ((FILE), "\t.p2align %d\n", (LOG));			\
+      else								\
 	fprintf ((FILE), "\t.p2align %d,,%d\n", (LOG), (MAX_SKIP));	\
-	/* Make sure that we have at least 8 byte alignment if > 8 byte \
-	   alignment is preferred.  */					\
-	if ((LOG) > 3							\
-	    && (1 << (LOG)) > ((MAX_SKIP) + 1)				\
-	    && (MAX_SKIP) >= 7)						\
-	  fputs ("\t.p2align 3\n", (FILE));				\
-      }									\
     }									\
   } while (0)
 
Index: gcc/config/i386/lynx.h
===================================================================
--- gcc/config/i386/lynx.h	(revision 239860)
+++ gcc/config/i386/lynx.h	(working copy)
@@ -61,8 +61,10 @@ along with GCC; see the file COPYING3.  If not see
 #define ASM_OUTPUT_MAX_SKIP_ALIGN(FILE,LOG,MAX_SKIP)			\
   do {									\
     if ((LOG) != 0) {							\
-      if ((MAX_SKIP) == 0) fprintf ((FILE), "\t.p2align %d\n", (LOG));	\
-      else fprintf ((FILE), "\t.p2align %d,,%d\n", (LOG), (MAX_SKIP));	\
+      if ((MAX_SKIP) == 0 || (MAX_SKIP) >= (1<<(LOG))-1)		\
+	fprintf ((FILE), "\t.p2align %d\n", (LOG));			\
+      else								\
+	fprintf ((FILE), "\t.p2align %d,,%d\n", (LOG), (MAX_SKIP));	\
     }									\
   } while (0)
 #endif
Index: gcc/config/i386/netbsd-elf.h
===================================================================
--- gcc/config/i386/netbsd-elf.h	(revision 239860)
+++ gcc/config/i386/netbsd-elf.h	(working copy)
@@ -104,8 +104,10 @@ along with GCC; see the file COPYING3.  If not see
 #ifdef HAVE_GAS_MAX_SKIP_P2ALIGN
 #define ASM_OUTPUT_MAX_SKIP_ALIGN(FILE, LOG, MAX_SKIP)			\
   if ((LOG) != 0) {							\
-    if ((MAX_SKIP) == 0) fprintf ((FILE), "\t.p2align %d\n", (LOG));	\
-    else fprintf ((FILE), "\t.p2align %d,,%d\n", (LOG), (MAX_SKIP));	\
+    if ((MAX_SKIP) == 0 || (MAX_SKIP) >= (1<<(LOG))-1)			\
+      fprintf ((FILE), "\t.p2align %d\n", (LOG));			\
+    else								\
+      fprintf ((FILE), "\t.p2align %d,,%d\n", (LOG), (MAX_SKIP));	\
   }
 #endif
 
Index: gcc/config/i386/openbsdelf.h
===================================================================
--- gcc/config/i386/openbsdelf.h	(revision 239860)
+++ gcc/config/i386/openbsdelf.h	(working copy)
@@ -63,24 +63,18 @@ along with GCC; see the file COPYING3.  If not see
 
 /* A C statement to output to the stdio stream FILE an assembler
    command to advance the location counter to a multiple of 1<<LOG
-   bytes if it is within MAX_SKIP bytes.
+   bytes if it is within MAX_SKIP bytes.  */
 
-   This is used to align code labels according to Intel recommendations.  */
+#define SUBALIGN_LOG 3
 
 #ifdef HAVE_GAS_MAX_SKIP_P2ALIGN
 #define ASM_OUTPUT_MAX_SKIP_ALIGN(FILE,LOG,MAX_SKIP)			\
   do {									\
     if ((LOG) != 0) {							\
-      if ((MAX_SKIP) == 0) fprintf ((FILE), "\t.p2align %d\n", (LOG));	\
-      else {								\
+      if ((MAX_SKIP) == 0 || (MAX_SKIP) >= (1<<(LOG))-1)		\
+	fprintf ((FILE), "\t.p2align %d\n", (LOG));			\
+      else								\
 	fprintf ((FILE), "\t.p2align %d,,%d\n", (LOG), (MAX_SKIP));	\
-	/* Make sure that we have at least 8 byte alignment if > 8 byte \
-	   alignment is preferred.  */					\
-	if ((LOG) > 3							\
-	    && (1 << (LOG)) > ((MAX_SKIP) + 1)				\
-	    && (MAX_SKIP) >= 7)						\
-	  fputs ("\t.p2align 3\n", (FILE));				\
-      }									\
     }									\
   } while (0)
 #endif
Index: gcc/config/i386/x86-64.h
===================================================================
--- gcc/config/i386/x86-64.h	(revision 239860)
+++ gcc/config/i386/x86-64.h	(working copy)
@@ -61,20 +61,16 @@ see the files COPYING3 and COPYING.RUNTIME respect
 
 /* This is used to align code labels according to Intel recommendations.  */
 
+#define SUBALIGN_LOG 3
+
 #ifdef HAVE_GAS_MAX_SKIP_P2ALIGN
 #define ASM_OUTPUT_MAX_SKIP_ALIGN(FILE,LOG,MAX_SKIP)			\
   do {									\
     if ((LOG) != 0) {							\
-      if ((MAX_SKIP) == 0) fprintf ((FILE), "\t.p2align %d\n", (LOG));	\
-      else {								\
+      if ((MAX_SKIP) == 0 || (MAX_SKIP) >= (1<<(LOG))-1)		\
+	fprintf ((FILE), "\t.p2align %d\n", (LOG));			\
+      else								\
 	fprintf ((FILE), "\t.p2align %d,,%d\n", (LOG), (MAX_SKIP));	\
-	/* Make sure that we have at least 8 byte alignment if > 8 byte \
-	   alignment is preferred.  */					\
-	if ((LOG) > 3							\
-	    && (1 << (LOG)) > ((MAX_SKIP) + 1)				\
-	    && (MAX_SKIP) >= 7)						\
-	  fputs ("\t.p2align 3\n", (FILE));				\
-      }									\
     }									\
   } while (0)
 #undef  ASM_OUTPUT_MAX_SKIP_PAD
@@ -81,7 +77,7 @@ see the files COPYING3 and COPYING.RUNTIME respect
 #define ASM_OUTPUT_MAX_SKIP_PAD(FILE, LOG, MAX_SKIP)			\
   if ((LOG) != 0)							\
     {									\
-      if ((MAX_SKIP) == 0)						\
+      if ((MAX_SKIP) == 0 || (MAX_SKIP) >= (1<<(LOG))-1)		\
         fprintf ((FILE), "\t.p2align %d\n", (LOG));			\
       else								\
         fprintf ((FILE), "\t.p2align %d,,%d\n", (LOG), (MAX_SKIP));	\


* [PATCH 3/3] Extend -falign-FOO=N to N[,M[,N2[,M2]]]
  2017-04-18 18:30 [PATCH 0/3] Extend -falign-FOO=N to N[,M[,N2[,M2]]] version 8 Denys Vlasenko
  2017-04-18 18:30 ` [PATCH 2/3] Temporary remove "at least 8 byte alignment" code from x86 Denys Vlasenko
  2017-04-18 18:30 ` [PATCH 1/3] Remove support for obsolete x86 -malign-foo options Denys Vlasenko
@ 2017-04-18 18:46 ` Denys Vlasenko
  2017-04-18 19:12   ` Sandra Loosemore
  2017-05-05 14:40 ` [PATCH 0/3] Extend -falign-FOO=N to N[,M[,N2[,M2]]] version 8 Denys Vlasenko
  3 siblings, 1 reply; 26+ messages in thread
From: Denys Vlasenko @ 2017-04-18 18:46 UTC (permalink / raw)
  To: gcc-patches
  Cc: Denys Vlasenko, Andrew Pinski, Uros Bizjak, Bernd Schmidt,
	Sandra Loosemore

-falign-functions=N is too simplistic.

Ingo Molnar ran some tests and it seems that on the latest x86 CPUs, 64-byte
alignment of functions runs fastest (he tried many other possibilities):
this way, after a call the CPU can fetch a lot of insns in the first cacheline fill.

However, developers are less than thrilled by the idea of blanket 64-byte
alignment of everything. Too much waste:
        On 05/20/2015 02:47 AM, Linus Torvalds wrote:
        > At the same time, I have to admit that I abhor a 64-byte function
        > alignment, when we have a fair number of functions that are (much)
        > smaller than that.
        >
        > Is there some way to get gcc to take the size of the function into
        > account? Because aligning a 16-byte or 32-byte function on a 64-byte
        > alignment is just criminally nasty and wasteful.

This change makes it possible to align functions to 64-byte boundaries *if*
this does not introduce a huge amount of padding.

Example syntax is -falign-functions=64,9: "align to 64 by skipping up to
9 bytes (not inclusive)". IOW: "after a call insn, CPU will always be able
to fetch at least 9 bytes of insns".
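
As an illustration of the N,M semantics only, here is a standalone C sketch
that turns an "N" or "N,M" string into the log/maxskip pair used by .p2align.
It is not the parser from this patch: the names, the cap on N, and the
handling of M = 0 and of trailing text are all assumptions made for this
example. The one rule taken from the text above is that M means "up to M-1
bytes of padding".

```c
#include <assert.h>
#include <stdio.h>

struct align_pair { int log; int maxskip; };

/* Smallest LOG such that (1 << LOG) >= N, for N >= 1.  */
static int
ceil_log2 (unsigned n)
{
  int log = 0;
  while ((1u << log) < n)
    log++;
  return log;
}

/* Hypothetical parser for "N" or "N,M".  Returns 0 on success and -1 on
   malformed input.  Text after the parsed fields is ignored here.  */
static int
parse_n_m (const char *s, struct align_pair *out)
{
  unsigned n, m;
  int fields;

  assert (s != NULL && out != NULL);
  fields = sscanf (s, "%u,%u", &n, &m);
  if (fields < 1 || n > 65536)  /* The cap is arbitrary for this sketch.  */
    return -1;
  if (n == 0)
    n = 1;
  /* Assumption: a missing, zero or too-large M behaves like M = N.  */
  if (fields == 1 || m == 0 || m > n)
    m = n;
  out->log = ceil_log2 (n);
  out->maxskip = (int) m - 1;   /* "N,M" allows at most M-1 bytes of padding.  */
  return 0;
}
```

With this sketch, "64,9" yields log 6 and maxskip 8, i.e. ".p2align 6,,8",
and a plain "10" yields log 4 and maxskip 9.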

x86 had a tweak: -falign-functions=N with N > 8 was adding secondary alignment.
For example, -falign-functions=10 was emitting this before every function:
	.p2align 4,,9
	.p2align 3
This tweak was removed by the previous patch. Now it is reinstated
by the rule that if -falign-functions=N[,M] is specified and N > 8,
then the default value of N2 is 8, not 1. This can now be suppressed with
-falign-functions=N,M,1, which wasn't possible before.
In general, the optional N2,M2 pair can be used to generate any secondary
alignment the user wants.
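
The default rule for N2 on x86 can be written down in a few lines of C. This
is a sketch with hypothetical names, not code from the patch, and it does not
model whether a redundant secondary directive is elided when the primary
alignment is unconditional.

```c
#include <assert.h>

/* Hypothetical helper: the N2 value in effect on x86.  N2_GIVEN is nonzero
   when the user wrote an explicit third field (N,M,N2).  */
static unsigned
effective_n2 (unsigned n, int n2_given, unsigned n2)
{
  assert (n >= 1);
  if (n2_given)
    return n2;
  return n > 8 ? 8 : 1;   /* Default: 8 if N > 8, otherwise 1.  */
}

/* Hypothetical helper: a 1-byte alignment needs no second directive.  */
static int
wants_secondary_align (unsigned n, int n2_given, unsigned n2)
{
  return effective_n2 (n, n2_given, n2) > 1;
}
```

Under these assumptions, -falign-functions=64,9 gets an 8-byte secondary
alignment by default, while -falign-functions=64,9,1 gets none.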

Subalignment for loops/jumps/labels is trickier to fully implement.
The implementation in this patch uses the -falign-labels subalignment values
for any of these three types of labels, but only if the "main" alignment
triggers. With -O2 defaults, this provides a matching behavior on x86:
loops and jumps are aligned (to 16-32 bytes depending on selected CPU)
and subaligned to 8 bytes. Labels are not aligned.

Testing:

Tested that with -falign-functions=N (tried 8, 15, 16, 17...) the alignment
directives are the same before and after the patch.
Tested that -falign-functions=N,N (two equal parameters) works exactly
like -falign-functions=N.

No change from past behavior:
Tested that "-falign-functions" uses an arch-dependent alignment.
Tested that "-O2" uses an arch-dependent alignment.
Tested that "-O2 -falign-functions=N" uses explicitly given alignment.

2017-04-18  Denys Vlasenko  <dvlasenk@redhat.com>

    * doc/invoke.texi: Update option documentation.
    * common.opt (-falign-functions=): Accept a string instead of integer.
    (-falign-jumps=): Likewise.
    (-falign-labels=): Likewise.
    (-falign-loops=): Likewise.
    * flags.h (struct target_flag_state): Revamp how alignment data is stored:
    for each of four alignment types, store two pairs of log/maxskip values.
    * toplev.c (read_uint): New function.
    (read_log_maxskip): New function.
    (parse_N_M): New function.
    (init_alignments): Rename to parse_alignment_opts, make globally visible.
    Set align_foo[0/1].log/maxskip from
    specified -falign-FOO=N[,M[,N2[,M2]]] options.
    * toplev.h (parse_alignment_opts): Now globally visible.
    (min_align_loops_log): Variable which holds arch override for minimal
    alignment of loops.
    (min_align_jumps_log): Likewise for jumps.
    (min_align_labels_log): Likewise for labels.
    (min_align_functions_log): Likewise for functions.
    * varasm.c (assemble_start_function): Call two ASM_OUTPUT_MAX_SKIP_ALIGN
    macros, first for N,M and second time for N2,M2 from
    falign-functions=N,M,N2,M2. This generates 0, 1, or 2 align directives.
    * final.c (final_scan_insn): If a label, jump or loop target
    is being aligned, emit a secondary alignment directive.
    * config/i386/i386.c (struct ptt): Change foo_align members from
    integers to strings. Add align_label member. Set it to "0,0,8"
    on the processors which have maxskips > 7 for loops and jumps -
    this preserves the existing behavior of adding 8-byte subalign.
    * config/i386/i386.c (processor_target_table): Likewise.
    * config/aarch64/aarch64-protos.h (struct tune_params):
    Change foo_align members from integers to strings.
    * config/aarch64/aarch64.c (<cpu>_tunings):
    Change foo_align field values from integers to strings.
    * config/arm/arm.c (arm_override_options_after_change_1):
    Fix if() condition to detect that -falign-functions is specified,
    change code which sets arch-default alignment.
    * config/i386/i386.c (ix86_default_align): Likewise.
    * config/rs6000/rs6000.c (rs6000_option_override_internal): Likewise.
    * config/mips/mips.c (mips_set_compression_mode): Likewise.
    * config/alpha/alpha.c (alpha_override_options_after_change): Likewise.
    * config/visium/visium.c (visium_option_override): Likewise.
    * config/sh/sh.c (sh_override_options_after_change): Likewise.
    * config/rx/rx.c (rx_option_override): Likewise.
    * config/rx/rx.h (JUMP_ALIGN): Use new variables to access alignment
    information.
    (LABEL_ALIGN): Likewise.
    (LOOP_ALIGN): Likewise.
    * config/spu/spu.c (spu_sched_init): Call parse_alignment_opts(), then
    use new variables to access alignment information.
    * config/sh/sh.c (sh_override_options_after_change): Likewise.
    * testsuite/gcc.target/i386/falign-functions.c: New file.

Index: gcc/common.opt
===================================================================
--- gcc/common.opt	(revision 246948)
+++ gcc/common.opt	(working copy)
@@ -921,35 +921,35 @@ Common Report Var(flag_aggressive_loop_optimizatio
 Aggressively optimize loops using language constraints.
 
 falign-functions
-Common Report Var(align_functions,0) Optimization UInteger
+Common Report Var(flag_align_functions) Optimization
 Align the start of functions.
 
 falign-functions=
-Common RejectNegative Joined UInteger Var(align_functions)
+Common RejectNegative Joined Var(str_align_functions)
 
 flimit-function-alignment
 Common Report Var(flag_limit_function_alignment) Optimization Init(0)
 
 falign-jumps
-Common Report Var(align_jumps,0) Optimization UInteger
+Common Report Var(flag_align_jumps) Optimization
 Align labels which are only reached by jumping.
 
 falign-jumps=
-Common RejectNegative Joined UInteger Var(align_jumps)
+Common RejectNegative Joined Var(str_align_jumps)
 
 falign-labels
-Common Report Var(align_labels,0) Optimization UInteger
+Common Report Var(flag_align_labels) Optimization
 Align all labels.
 
 falign-labels=
-Common RejectNegative Joined UInteger Var(align_labels)
+Common RejectNegative Joined Var(str_align_labels)
 
 falign-loops
-Common Report Var(align_loops,0) Optimization UInteger
+Common Report Var(flag_align_loops) Optimization
 Align the start of loops.
 
 falign-loops=
-Common RejectNegative Joined UInteger Var(align_loops)
+Common RejectNegative Joined Var(str_align_loops)
 
 fargument-alias
 Common Ignore
Index: gcc/config/aarch64/aarch64-protos.h
===================================================================
--- gcc/config/aarch64/aarch64-protos.h	(revision 246948)
+++ gcc/config/aarch64/aarch64-protos.h	(working copy)
@@ -214,9 +214,9 @@ struct tune_params
   int memmov_cost;
   int issue_rate;
   unsigned int fusible_ops;
-  int function_align;
-  int jump_align;
-  int loop_align;
+  const char *function_align;
+  const char *jump_align;
+  const char *loop_align;
   int int_reassoc_width;
   int fp_reassoc_width;
   int vec_reassoc_width;
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c	(revision 246948)
+++ gcc/config/aarch64/aarch64.c	(working copy)
@@ -537,9 +537,9 @@ static const struct tune_params generic_tunings =
   4, /* memmov_cost  */
   2, /* issue_rate  */
   (AARCH64_FUSE_AES_AESMC), /* fusible_ops  */
-  8,	/* function_align.  */
-  8,	/* jump_align.  */
-  4,	/* loop_align.  */
+  "8",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "4",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -563,9 +563,9 @@ static const struct tune_params cortexa35_tunings
   1, /* issue_rate  */
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
-  16,	/* function_align.  */
-  8,	/* jump_align.  */
-  8,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "8",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -589,9 +589,9 @@ static const struct tune_params cortexa53_tunings
   2, /* issue_rate  */
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
-  16,	/* function_align.  */
-  8,	/* jump_align.  */
-  8,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "8",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -615,9 +615,9 @@ static const struct tune_params cortexa57_tunings
   3, /* issue_rate  */
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
-  16,	/* function_align.  */
-  8,	/* jump_align.  */
-  8,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "8",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -641,9 +641,9 @@ static const struct tune_params cortexa72_tunings
   3, /* issue_rate  */
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
-  16,	/* function_align.  */
-  8,	/* jump_align.  */
-  8,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "8",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -667,9 +667,9 @@ static const struct tune_params cortexa73_tunings
   2, /* issue_rate.  */
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
-  16,	/* function_align.  */
-  8,	/* jump_align.  */
-  8,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "8",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -692,9 +692,9 @@ static const struct tune_params exynosm1_tunings =
   4,	/* memmov_cost  */
   3,	/* issue_rate  */
   (AARCH64_FUSE_AES_AESMC), /* fusible_ops  */
-  4,	/* function_align.  */
-  4,	/* jump_align.  */
-  4,	/* loop_align.  */
+  "4",	/* function_align.  */
+  "4",	/* jump_align.  */
+  "4",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -717,9 +717,9 @@ static const struct tune_params thunderx_tunings =
   6, /* memmov_cost  */
   2, /* issue_rate  */
   AARCH64_FUSE_CMP_BRANCH, /* fusible_ops  */
-  8,	/* function_align.  */
-  8,	/* jump_align.  */
-  8,	/* loop_align.  */
+  "8",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "8",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -742,9 +742,9 @@ static const struct tune_params xgene1_tunings =
   6, /* memmov_cost  */
   4, /* issue_rate  */
   AARCH64_FUSE_NOTHING, /* fusible_ops  */
-  16,	/* function_align.  */
-  8,	/* jump_align.  */
-  16,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "16",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -768,9 +768,9 @@ static const struct tune_params qdf24xx_tunings =
   4, /* issue_rate  */
   (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK), /* fuseable_ops  */
-  16,	/* function_align.  */
-  8,	/* jump_align.  */
-  16,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "16",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -793,9 +793,9 @@ static const struct tune_params thunderx2t99_tunin
   4, /* memmov_cost.  */
   4, /* issue_rate.  */
   (AARCH64_FUSE_CMP_BRANCH | AARCH64_FUSE_AES_AESMC), /* fusible_ops  */
-  16,	/* function_align.  */
-  8,	/* jump_align.  */
-  16,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "16",	/* loop_align.  */
   3,	/* int_reassoc_width.  */
   2,	/* fp_reassoc_width.  */
   2,	/* vec_reassoc_width.  */
Index: gcc/config/alpha/alpha.c
===================================================================
--- gcc/config/alpha/alpha.c	(revision 246948)
+++ gcc/config/alpha/alpha.c	(working copy)
@@ -609,13 +609,13 @@ alpha_override_options_after_change (void)
   /* ??? Kludge these by not doing anything if we don't optimize.  */
   if (optimize > 0)
     {
-      if (align_loops <= 0)
-	align_loops = 16;
-      if (align_jumps <= 0)
-	align_jumps = 16;
+      if (flag_align_loops && !str_align_loops)
+	str_align_loops = "16";
+      if (flag_align_jumps && !str_align_jumps)
+	str_align_jumps = "16";
     }
-  if (align_functions <= 0)
-    align_functions = 16;
+  if (flag_align_functions && !str_align_functions)
+    str_align_functions = "16";
 }
 \f
 /* Returns 1 if VALUE is a mask that contains full bytes of zero or ones.  */
Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 246948)
+++ gcc/config/arm/arm.c	(working copy)
@@ -2902,9 +2902,10 @@ static GTY(()) tree init_optimize;
 static void
 arm_override_options_after_change_1 (struct gcc_options *opts)
 {
-  if (opts->x_align_functions <= 0)
-    opts->x_align_functions = TARGET_THUMB_P (opts->x_target_flags)
-      && opts->x_optimize_size ? 2 : 4;
+  /* -falign-functions without argument: supply one */
+  if (opts->x_flag_align_functions && !opts->x_str_align_functions)
+    opts->x_str_align_functions = TARGET_THUMB_P (opts->x_target_flags)
+      && opts->x_optimize_size ? "2" : "4";
 }
 
 /* Implement targetm.override_options_after_change.  */
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 246948)
+++ gcc/config/i386/i386.c	(working copy)
@@ -2636,45 +2636,47 @@ struct ptt
 {
   const char *const name;			/* processor name  */
   const struct processor_costs *cost;		/* Processor costs */
-  const int align_loop;				/* Default alignments.  */
-  const int align_loop_max_skip;
-  const int align_jump;
-  const int align_jump_max_skip;
-  const int align_func;
+  const char *const align_loop;			/* Default alignments.  */
+  const char *const align_jump;
+  const char *const align_label;
+  const char *const align_func;
 };
 
 /* This table must be in sync with enum processor_type in i386.h.  */ 
 static const struct ptt processor_target_table[PROCESSOR_max] =
 {
-  {"generic", &generic_cost, 16, 10, 16, 10, 16},
-  {"i386", &i386_cost, 4, 3, 4, 3, 4},
-  {"i486", &i486_cost, 16, 15, 16, 15, 16},
-  {"pentium", &pentium_cost, 16, 7, 16, 7, 16},
-  {"lakemont", &lakemont_cost, 16, 7, 16, 7, 16},
-  {"pentiumpro", &pentiumpro_cost, 16, 15, 16, 10, 16},
-  {"pentium4", &pentium4_cost, 0, 0, 0, 0, 0},
-  {"nocona", &nocona_cost, 0, 0, 0, 0, 0},
-  {"core2", &core_cost, 16, 10, 16, 10, 16},
-  {"nehalem", &core_cost, 16, 10, 16, 10, 16},
-  {"sandybridge", &core_cost, 16, 10, 16, 10, 16},
-  {"haswell", &core_cost, 16, 10, 16, 10, 16},
-  {"bonnell", &atom_cost, 16, 15, 16, 7, 16},
-  {"silvermont", &slm_cost, 16, 15, 16, 7, 16},
-  {"knl", &slm_cost, 16, 15, 16, 7, 16},
-  {"skylake-avx512", &core_cost, 16, 10, 16, 10, 16},
-  {"intel", &intel_cost, 16, 15, 16, 7, 16},
-  {"geode", &geode_cost, 0, 0, 0, 0, 0},
-  {"k6", &k6_cost, 32, 7, 32, 7, 32},
-  {"athlon", &athlon_cost, 16, 7, 16, 7, 16},
-  {"k8", &k8_cost, 16, 7, 16, 7, 16},
-  {"amdfam10", &amdfam10_cost, 32, 24, 32, 7, 32},
-  {"bdver1", &bdver1_cost, 16, 10, 16, 7, 11},
-  {"bdver2", &bdver2_cost, 16, 10, 16, 7, 11},
-  {"bdver3", &bdver3_cost, 16, 10, 16, 7, 11},
-  {"bdver4", &bdver4_cost, 16, 10, 16, 7, 11},
-  {"btver1", &btver1_cost, 16, 10, 16, 7, 11},
-  {"btver2", &btver2_cost, 16, 10, 16, 7, 11},
-  {"znver1", &znver1_cost, 16, 15, 16, 15, 16}
+/* The "0,0,8" label alignment specified for some processors generates
+   secondary 8-byte alignment only for those label/jump/loop targets
+   which have primary alignment.  */
+  {"generic",    &generic_cost,   "16,11,8", "16,11,8", "0,0,8", "16"},
+  {"i386",       &i386_cost,      "4",       "4",       NULL,    "4" },
+  {"i486",       &i486_cost,      "16,16,8", "16,16,8", "0,0,8", "16"},
+  {"pentium",    &pentium_cost,   "16,8,8",  "16,8,8",  "0,0,8", "16"},
+  {"lakemont",   &lakemont_cost,  "16,8,8",  "16,8,8",  "0,0,8", "16"},
+  {"pentiumpro", &pentiumpro_cost,"16,16,8", "16,11,8", "0,0,8", "16"},
+  {"pentium4",   &pentium4_cost,  NULL,      NULL,      NULL,    NULL},
+  {"nocona",     &nocona_cost,    NULL,      NULL,      NULL,    NULL},
+  {"core2",      &core_cost,      "16,11,8", "16,11,8", "0,0,8", "16"},
+  {"nehalem",    &core_cost,      "16,11,8", "16,11,8", "0,0,8", "16"},
+  {"sandybridge",&core_cost,      "16,11,8", "16,11,8", "0,0,8", "16"},
+  {"haswell",    &core_cost,      "16,11,8", "16,11,8", "0,0,8", "16"},
+  {"bonnell",    &atom_cost,      "16,16,8", "16,8,8",  "0,0,8", "16"},
+  {"silvermont", &slm_cost,       "16,16,8", "16,8,8",  "0,0,8", "16"},
+  {"knl",        &slm_cost,       "16,16,8", "16,8,8",  "0,0,8", "16"},
+  {"skylake-avx512", &core_cost,  "16,11,8", "16,11,8", "0,0,8", "16"},
+  {"intel",      &intel_cost,     "16,16,8", "16,8,8",  "0,0,8", "16"},
+  {"geode",      &geode_cost,     NULL,      NULL,      NULL,    NULL},
+  {"k6",         &k6_cost,        "32,8,8",  "32,8,8",  "0,0,8", "32"},
+  {"athlon",     &athlon_cost,    "16,8,8",  "16,8,8",  "0,0,8", "16"},
+  {"k8",         &k8_cost,        "16,8,8",  "16,8,8",  "0,0,8", "16"},
+  {"amdfam10",   &amdfam10_cost,  "32,25,8", "32,8,8",  "0,0,8", "32"},
+  {"bdver1",     &bdver1_cost,    "16,11,8", "16,8,8",  "0,0,8", "11"},
+  {"bdver2",     &bdver2_cost,    "16,11,8", "16,8,8",  "0,0,8", "11"},
+  {"bdver3",     &bdver3_cost,    "16,11,8", "16,8,8",  "0,0,8", "11"},
+  {"bdver4",     &bdver4_cost,    "16,11,8", "16,8,8",  "0,0,8", "11"},
+  {"btver1",     &btver1_cost,    "16,11,8", "16,8,8",  "0,0,8", "11"},
+  {"btver2",     &btver2_cost,    "16,11,8", "16,8,8",  "0,0,8", "11"},
+  {"znver1",     &znver1_cost,    "16,16,8", "16,16,8", "0,0,8", "16"}
 };
 \f
 static unsigned int
@@ -4856,20 +4858,23 @@ set_ix86_tune_features (enum processor_type ix86_t
 static void
 ix86_default_align (struct gcc_options *opts)
 {
-  if (opts->x_align_loops == 0)
+  /* -falign-foo without argument: supply one */
+  if (opts->x_flag_align_loops && !opts->x_str_align_loops)
     {
-      opts->x_align_loops = processor_target_table[ix86_tune].align_loop;
-      align_loops_max_skip = processor_target_table[ix86_tune].align_loop_max_skip;
+      opts->x_str_align_loops = processor_target_table[ix86_tune].align_loop;
     }
-  if (opts->x_align_jumps == 0)
+  if (opts->x_flag_align_jumps && !opts->x_str_align_jumps)
     {
-      opts->x_align_jumps = processor_target_table[ix86_tune].align_jump;
-      align_jumps_max_skip = processor_target_table[ix86_tune].align_jump_max_skip;
+      opts->x_str_align_jumps = processor_target_table[ix86_tune].align_jump;
     }
-  if (opts->x_align_functions == 0)
+  if (opts->x_flag_align_labels && !opts->x_str_align_labels)
     {
-      opts->x_align_functions = processor_target_table[ix86_tune].align_func;
+      opts->x_str_align_labels = processor_target_table[ix86_tune].align_label;
     }
+  if (opts->x_flag_align_functions && !opts->x_str_align_functions)
+    {
+      opts->x_str_align_functions = processor_target_table[ix86_tune].align_func;
+    }
 }
 
 /* Implement TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE hook.  */
Index: gcc/config/mips/mips.c
===================================================================
--- gcc/config/mips/mips.c	(revision 246948)
+++ gcc/config/mips/mips.c	(working copy)
@@ -488,9 +488,9 @@ unsigned int mips_base_compression_flags;
 static int mips_base_schedule_insns; /* flag_schedule_insns */
 static int mips_base_reorder_blocks_and_partition; /* flag_reorder... */
 static int mips_base_move_loop_invariants; /* flag_move_loop_invariants */
-static int mips_base_align_loops; /* align_loops */
-static int mips_base_align_jumps; /* align_jumps */
-static int mips_base_align_functions; /* align_functions */
+static const char *mips_base_align_loops; /* align_loops */
+static const char *mips_base_align_jumps; /* align_jumps */
+static const char *mips_base_align_functions; /* align_functions */
 
 /* Index [M][R] is true if register R is allowed to hold a value of mode M.  */
 bool mips_hard_regno_mode_ok[(int) MAX_MACHINE_MODE][FIRST_PSEUDO_REGISTER];
@@ -19453,12 +19453,12 @@ mips_set_compression_mode (unsigned int compressio
       /* Provide default values for align_* for 64-bit targets.  */
       if (TARGET_64BIT)
 	{
-	  if (align_loops == 0)
-	    align_loops = 8;
-	  if (align_jumps == 0)
-	    align_jumps = 8;
-	  if (align_functions == 0)
-	    align_functions = 8;
+	  if (flag_align_loops && !str_align_loops)
+	    str_align_loops = "8";
+	  if (flag_align_jumps && !str_align_jumps)
+	    str_align_jumps = "8";
+	  if (flag_align_functions && !str_align_functions)
+	    str_align_functions = "8";
 	}
 
       targetm.min_anchor_offset = -32768;
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 246948)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -5218,29 +5218,25 @@ rs6000_option_override_internal (bool global_init_
 	  if (rs6000_cpu == PROCESSOR_TITAN
 	      || rs6000_cpu == PROCESSOR_CELL)
 	    {
-	      if (align_functions <= 0)
-		align_functions = 8;
-	      if (align_jumps <= 0)
-		align_jumps = 8;
-	      if (align_loops <= 0)
-		align_loops = 8;
+	      if (flag_align_functions && !str_align_functions)
+		str_align_functions = "8";
+	      if (flag_align_jumps && !str_align_jumps)
+		str_align_jumps = "8";
+	      if (flag_align_loops && !str_align_loops)
+		str_align_loops = "8";
 	    }
 	  if (rs6000_align_branch_targets)
 	    {
-	      if (align_functions <= 0)
-		align_functions = 16;
-	      if (align_jumps <= 0)
-		align_jumps = 16;
-	      if (align_loops <= 0)
+	      if (flag_align_functions && !str_align_functions)
+		str_align_functions = "16";
+	      if (flag_align_jumps && !str_align_jumps)
+		str_align_jumps = "16";
+	      if (flag_align_loops && !str_align_loops)
 		{
 		  can_override_loop_align = 1;
-		  align_loops = 16;
+		  str_align_loops = "16";
 		}
 	    }
-	  if (align_jumps_max_skip <= 0)
-	    align_jumps_max_skip = 15;
-	  if (align_loops_max_skip <= 0)
-	    align_loops_max_skip = 15;
 	}
 
       /* Arrange to save and restore machine status around nested functions.  */
Index: gcc/config/rx/rx.c
===================================================================
--- gcc/config/rx/rx.c	(revision 246948)
+++ gcc/config/rx/rx.c	(working copy)
@@ -2820,12 +2820,15 @@ rx_option_override (void)
   rx_override_options_after_change ();
 
   /* These values are bytes, not log.  */
-  if (align_jumps == 0 && ! optimize_size)
-    align_jumps = ((rx_cpu_type == RX100 || rx_cpu_type == RX200) ? 4 : 8);
-  if (align_loops == 0 && ! optimize_size)
-    align_loops = ((rx_cpu_type == RX100 || rx_cpu_type == RX200) ? 4 : 8);
-  if (align_labels == 0 && ! optimize_size)
-    align_labels = ((rx_cpu_type == RX100 || rx_cpu_type == RX200) ? 4 : 8);
+  if (! optimize_size)
+    {
+      if (flag_align_jumps && !str_align_jumps)
+	str_align_jumps = ((rx_cpu_type == RX100 || rx_cpu_type == RX200) ? "4" : "8");
+      if (flag_align_loops && !str_align_loops)
+	str_align_loops = ((rx_cpu_type == RX100 || rx_cpu_type == RX200) ? "4" : "8");
+      if (flag_align_labels && !str_align_labels)
+	str_align_labels = ((rx_cpu_type == RX100 || rx_cpu_type == RX200) ? "4" : "8");
+    }
 }
 
 \f
Index: gcc/config/rx/rx.h
===================================================================
--- gcc/config/rx/rx.h	(revision 246948)
+++ gcc/config/rx/rx.h	(working copy)
@@ -432,9 +432,9 @@ typedef unsigned int CUMULATIVE_ARGS;
 /* Compute the alignment needed for label X in various situations.
    If the user has specified an alignment then honour that, otherwise
    use rx_align_for_label.  */
-#define JUMP_ALIGN(x)				(align_jumps > 1 ? align_jumps_log : rx_align_for_label (x, 0))
-#define LABEL_ALIGN(x)				(align_labels > 1 ? align_labels_log : rx_align_for_label (x, 3))
-#define LOOP_ALIGN(x)				(align_loops > 1 ? align_loops_log : rx_align_for_label (x, 2))
+#define JUMP_ALIGN(x)				(align_jumps_log > 0 ? align_jumps_log : rx_align_for_label (x, 0))
+#define LABEL_ALIGN(x)				(align_labels_log > 0 ? align_labels_log : rx_align_for_label (x, 3))
+#define LOOP_ALIGN(x)				(align_loops_log > 0 ? align_loops_log : rx_align_for_label (x, 2))
 #define LABEL_ALIGN_AFTER_BARRIER(x)		rx_align_for_label (x, 0)
 
 #define ASM_OUTPUT_MAX_SKIP_ALIGN(STREAM, LOG, MAX_SKIP)	\
Index: gcc/config/sh/sh.c
===================================================================
--- gcc/config/sh/sh.c	(revision 246948)
+++ gcc/config/sh/sh.c	(working copy)
@@ -984,16 +984,16 @@ sh_override_options_after_change (void)
       Aligning all jumps increases the code size, even if it might
       result in slightly faster code.  Thus, it is set to the smallest 
       alignment possible if not specified by the user.  */
-  if (align_loops == 0)
-    align_loops = optimize_size ? 2 : 4;
+  if (flag_align_loops && !str_align_loops)
+    str_align_loops = optimize_size ? "2" : "4";
 
-  if (align_jumps == 0)
-    align_jumps = 2;
-  else if (align_jumps < 2)
-    align_jumps = 2;
+  if (flag_align_jumps && !str_align_jumps)
+    str_align_jumps = "2";
+  else
+    min_align_jumps_log = 1;
 
-  if (align_functions == 0)
-    align_functions = optimize_size ? 2 : 4;
+  if (flag_align_functions && !str_align_functions)
+    str_align_functions = optimize_size ? "2" : "4";
 
   /* The linker relaxation code breaks when a function contains
      alignments that are larger than that at the start of a
@@ -1000,13 +1000,13 @@ sh_override_options_after_change (void)
      compilation unit.  */
   if (TARGET_RELAX)
     {
-      int min_align = align_loops > align_jumps ? align_loops : align_jumps;
+      parse_alignment_opts ();
+      min_align_functions_log = align_loops_log > align_jumps_log ?
+				align_loops_log : align_jumps_log;
 
       /* Also take possible .long constants / mova tables into account.	*/
-      if (min_align < 4)
-	min_align = 4;
-      if (align_functions < min_align)
-	align_functions = min_align;
+      if (min_align_functions_log < 2)
+	min_align_functions_log = 2;
     }
 }
 \f
Index: gcc/config/spu/spu.c
===================================================================
--- gcc/config/spu/spu.c	(revision 246948)
+++ gcc/config/spu/spu.c	(working copy)
@@ -2767,7 +2767,8 @@ static void
 spu_sched_init (FILE *file ATTRIBUTE_UNUSED, int verbose ATTRIBUTE_UNUSED,
 		int max_ready ATTRIBUTE_UNUSED)
 {
-  if (align_labels > 4 || align_loops > 4 || align_jumps > 4)
+  parse_alignment_opts ();
+  if (align_labels_log > 2 || align_loops_log > 2 || align_jumps_log > 2)
     {
       /* When any block might be at least 8-byte aligned, assume they
          will all be at least 8-byte aligned to make sure dual issue
Index: gcc/config/visium/visium.c
===================================================================
--- gcc/config/visium/visium.c	(revision 246948)
+++ gcc/config/visium/visium.c	(working copy)
@@ -413,12 +413,12 @@ visium_option_override (void)
 
   /* Align functions on 256-byte (32-quadword) for GR5 and 64-byte (8-quadword)
      boundaries for GR6 so they start a new burst mode window.  */
-  if (align_functions == 0)
+  if (flag_align_functions && !str_align_functions)
     {
       if (visium_cpu == PROCESSOR_GR6)
-	align_functions = 64;
+	str_align_functions = "64";
       else
-	align_functions = 256;
+	str_align_functions = "256";
 
       /* Allow the size of compilation units to double because of inlining.
 	 In practice the global size of the object code is hardly affected
@@ -429,26 +429,25 @@ visium_option_override (void)
     }
 
   /* Likewise for loops.  */
-  if (align_loops == 0)
+  if (flag_align_loops && !str_align_loops)
     {
       if (visium_cpu == PROCESSOR_GR6)
-	align_loops = 64;
+	str_align_loops = "64";
       else
 	{
-	  align_loops = 256;
 	  /* But not if they are too far away from a 256-byte boundary.  */
-	  align_loops_max_skip = 31;
+	  str_align_loops = "256,32";
 	}
     }
 
   /* Align all jumps on quadword boundaries for the burst mode, and even
      on 8-quadword boundaries for GR6 so they start a new window.  */
-  if (align_jumps == 0)
+  if (flag_align_jumps && !str_align_jumps)
     {
       if (visium_cpu == PROCESSOR_GR6)
-	align_jumps = 64;
+	str_align_jumps = "64";
       else
-	align_jumps = 8;
+	str_align_jumps = "8";
     }
 
   /* We register a machine-specific pass.  This pass must be scheduled as
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 246948)
+++ gcc/doc/invoke.texi	(working copy)
@@ -351,9 +351,11 @@ Objective-C and Objective-C++ Dialects}.
 
 @item Optimization Options
 @xref{Optimize Options,,Options that Control Optimization}.
-@gccoptlist{-faggressive-loop-optimizations  -falign-functions[=@var{n}] @gol
--falign-jumps[=@var{n}] @gol
--falign-labels[=@var{n}]  -falign-loops[=@var{n}] @gol
+@gccoptlist{-faggressive-loop-optimizations @gol
+-falign-functions[=@var{n}[,@var{m}[,@var{n2}[,@var{m2}]]]] @gol
+-falign-jumps[=@var{n}[,@var{m}[,@var{n2}[,@var{m2}]]]] @gol
+-falign-labels[=@var{n}[,@var{m}[,@var{n2}[,@var{m2}]]]] @gol
+-falign-loops[=@var{n}[,@var{m}[,@var{n2}[,@var{m2}]]]] @gol
 -fassociative-math  -fauto-profile  -fauto-profile[=@var{path}] @gol
 -fauto-inc-dec  -fbranch-probabilities @gol
 -fbranch-target-load-optimize  -fbranch-target-load-optimize2 @gol
@@ -8672,19 +8674,36 @@ The @option{-fstrict-overflow} option is enabled a
 
 @item -falign-functions
 @itemx -falign-functions=@var{n}
+@itemx -falign-functions=@var{n},@var{m}
+@itemx -falign-functions=@var{n},@var{m},@var{n2}
+@itemx -falign-functions=@var{n},@var{m},@var{n2},@var{m2}
 @opindex falign-functions
 Align the start of functions to the next power-of-two greater than
-@var{n}, skipping up to @var{n} bytes.  For instance,
-@option{-falign-functions=32} aligns functions to the next 32-byte
-boundary, but @option{-falign-functions=24} aligns to the next
-32-byte boundary only if this can be done by skipping 23 bytes or less.
+@var{n}, skipping up to @var{m}-1 bytes.  This ensures that at least
+the first @var{m} bytes of the function can be fetched by the CPU
+without crossing an @var{n}-byte alignment boundary.
 
-@option{-fno-align-functions} and @option{-falign-functions=1} are
-equivalent and mean that functions are not aligned.
+If @var{m} is not specified, it defaults to @var{n}.
 
+Examples: @option{-falign-functions=32} aligns functions to the next
+32-byte boundary; @option{-falign-functions=24} aligns to the next
+32-byte boundary only if this can be done by skipping 23 bytes or less;
+@option{-falign-functions=32,7} aligns to the next
+32-byte boundary only if this can be done by skipping 6 bytes or less.
+
+The second pair of @var{n2},@var{m2} values allows you to specify
+a secondary alignment: @option{-falign-functions=64,7,32,3} aligns to
+the next 64-byte boundary if this can be done by skipping 6 bytes or less,
+otherwise aligns to the next 32-byte boundary if this can be done
+by skipping 2 bytes or less.
+If @var{m2} is not specified, it defaults to @var{n2}.
+
 Some assemblers only support this flag when @var{n} is a power of two;
 in that case, it is rounded up.
 
+@option{-fno-align-functions} and @option{-falign-functions=1} are
+equivalent and mean that functions are not aligned.
+
 If @var{n} is not specified or is zero, use a machine-dependent default.
 
 Enabled at levels @option{-O2}, @option{-O3}.
@@ -8697,12 +8716,13 @@ skip more bytes than the size of the function.
 
 @item -falign-labels
 @itemx -falign-labels=@var{n}
+@itemx -falign-labels=@var{n},@var{m}
+@itemx -falign-labels=@var{n},@var{m},@var{n2}
+@itemx -falign-labels=@var{n},@var{m},@var{n2},@var{m2}
 @opindex falign-labels
-Align all branch targets to a power-of-two boundary, skipping up to
-@var{n} bytes like @option{-falign-functions}.  This option can easily
-make code slower, because it must insert dummy operations for when the
-branch target is reached in the usual flow of the code.
+Align all branch targets to a power-of-two boundary.
 
+Parameters of this option are analogous to the @option{-falign-functions} option.
 @option{-fno-align-labels} and @option{-falign-labels=1} are
 equivalent and mean that labels are not aligned.
 
@@ -8716,12 +8736,15 @@ Enabled at levels @option{-O2}, @option{-O3}.
 
 @item -falign-loops
 @itemx -falign-loops=@var{n}
+@itemx -falign-loops=@var{n},@var{m}
+@itemx -falign-loops=@var{n},@var{m},@var{n2}
+@itemx -falign-loops=@var{n},@var{m},@var{n2},@var{m2}
 @opindex falign-loops
-Align loops to a power-of-two boundary, skipping up to @var{n} bytes
-like @option{-falign-functions}.  If the loops are
-executed many times, this makes up for any execution of the dummy
-operations.
+Align loops to a power-of-two boundary.  If the loops are executed
+many times, this makes up for any execution of the dummy padding
+instructions.
 
+Parameters of this option are analogous to the @option{-falign-functions} option.
 @option{-fno-align-loops} and @option{-falign-loops=1} are
 equivalent and mean that loops are not aligned.
 
@@ -8731,12 +8754,15 @@ Enabled at levels @option{-O2}, @option{-O3}.
 
 @item -falign-jumps
 @itemx -falign-jumps=@var{n}
+@itemx -falign-jumps=@var{n},@var{m}
+@itemx -falign-jumps=@var{n},@var{m},@var{n2}
+@itemx -falign-jumps=@var{n},@var{m},@var{n2},@var{m2}
 @opindex falign-jumps
 Align branch targets to a power-of-two boundary, for branch targets
-where the targets can only be reached by jumping, skipping up to @var{n}
-bytes like @option{-falign-functions}.  In this case, no dummy operations
-need be executed.
+where the targets can only be reached by jumping.  In this case,
+no dummy operations need be executed.
 
+Parameters of this option are analogous to the @option{-falign-functions} option.
 @option{-fno-align-jumps} and @option{-falign-jumps=1} are
 equivalent and mean that loops are not aligned.
 
Index: gcc/final.c
===================================================================
--- gcc/final.c	(revision 246948)
+++ gcc/final.c	(working copy)
@@ -2429,6 +2429,12 @@ final_scan_insn (rtx_insn *insn, FILE *file, int o
 	    {
 #ifdef ASM_OUTPUT_MAX_SKIP_ALIGN
 	      ASM_OUTPUT_MAX_SKIP_ALIGN (file, align, max_skip);
+	      /* Above, we don't know whether a label, jump or loop
+		 alignment was used. Conservatively apply
+		 label subalignment, not jump or loop
+		 subalignment (they are almost always larger).  */
+	      ASM_OUTPUT_MAX_SKIP_ALIGN (file, align_labels[1].log,
+					 align_labels[1].maxskip);
 #else
 #ifdef ASM_OUTPUT_ALIGN_WITH_NOP
               ASM_OUTPUT_ALIGN_WITH_NOP (file, align);
Index: gcc/flags.h
===================================================================
--- gcc/flags.h	(revision 246948)
+++ gcc/flags.h	(working copy)
@@ -43,19 +43,22 @@ extern bool final_insns_dump_p;
 /* Other basic status info about current function.  */
 
 /* Target-dependent global state.  */
-struct target_flag_state {
+struct align_flags {
   /* Values of the -falign-* flags: how much to align labels in code.
-     0 means `use default', 1 means `don't align'.
-     For each variable, there is an _log variant which is the power
-     of two not less than the variable, for .align output.  */
-  int x_align_loops_log;
-  int x_align_loops_max_skip;
-  int x_align_jumps_log;
-  int x_align_jumps_max_skip;
-  int x_align_labels_log;
-  int x_align_labels_max_skip;
-  int x_align_functions_log;
+     log is "align to 2^log" (so 0 means no alignment).
+     maxskip is the maximum allowed amount of padding to insert. */
+  int log;
+  int maxskip;
+};
 
+struct target_flag_state {
+  /* Each falign-foo can generate up to two levels of alignment:
+     -falign-foo=N,M[,N2,M2] */
+  struct align_flags x_align_loops[2];
+  struct align_flags x_align_jumps[2];
+  struct align_flags x_align_labels[2];
+  struct align_flags x_align_functions[2];
+
   /* The excess precision currently in effect.  */
   enum excess_precision x_flag_excess_precision;
 };
@@ -67,20 +70,21 @@ extern struct target_flag_state *this_target_flag_
 #define this_target_flag_state (&default_target_flag_state)
 #endif
 
-#define align_loops_log \
-  (this_target_flag_state->x_align_loops_log)
-#define align_loops_max_skip \
-  (this_target_flag_state->x_align_loops_max_skip)
-#define align_jumps_log \
-  (this_target_flag_state->x_align_jumps_log)
-#define align_jumps_max_skip \
-  (this_target_flag_state->x_align_jumps_max_skip)
-#define align_labels_log \
-  (this_target_flag_state->x_align_labels_log)
-#define align_labels_max_skip \
-  (this_target_flag_state->x_align_labels_max_skip)
-#define align_functions_log \
-  (this_target_flag_state->x_align_functions_log)
+#define align_loops              (this_target_flag_state->x_align_loops)
+#define align_jumps              (this_target_flag_state->x_align_jumps)
+#define align_labels             (this_target_flag_state->x_align_labels)
+#define align_functions          (this_target_flag_state->x_align_functions)
+#define align_loops_log          (align_loops[0].log)
+#define align_jumps_log          (align_jumps[0].log)
+#define align_labels_log         (align_labels[0].log)
+#define align_functions_log      (align_functions[0].log)
+#define align_loops_max_skip     (align_loops[0].maxskip)
+#define align_jumps_max_skip     (align_jumps[0].maxskip)
+#define align_labels_max_skip    (align_labels[0].maxskip)
+#define align_functions_max_skip (align_functions[0].maxskip)
+/* String representations of the above options are available in
+   const char *str_align_foo.  NULL if not set.  */
+
 #define flag_excess_precision \
   (this_target_flag_state->x_flag_excess_precision)
 
Index: gcc/testsuite/gcc.target/i386/falign-functions.c
===================================================================
--- gcc/testsuite/gcc.target/i386/falign-functions.c	(nonexistent)
+++ gcc/testsuite/gcc.target/i386/falign-functions.c	(working copy)
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -falign-functions=64,8" } */
+/* { dg-final { scan-assembler ".p2align 6,,7" } } */
+
+void
+test_func (void)
+{
+}
Index: gcc/toplev.c
===================================================================
--- gcc/toplev.c	(revision 246948)
+++ gcc/toplev.c	(working copy)
@@ -1177,31 +1177,111 @@ target_supports_section_anchors_p (void)
   return true;
 }
 
-/* Default the align_* variables to 1 if they're still unset, and
-   set up the align_*_log variables.  */
+/* Read a decimal number from string FLAG, up to end of line or comma.
+   Emit error message if number ends with any other character.
+   Return pointer past comma, or NULL if end of line.  */
+static const char *
+read_uint (const char *flag, const char *name, int *np)
+{
+  const char *flag_start = flag;
+  int n = 0;
+  char c;
+
+  while ((c = *flag++) >= '0' && c <= '9')
+    n = n*10 + (c-'0');
+  *np = n & 0x3fffffff; /* avoid accidentally negative numbers */
+  if (c == '\0')
+    return NULL;
+  if (c == ',')
+    return flag;
+
+  error_at (UNKNOWN_LOCATION, "-falign-%s parameter is bad at '%s'",
+            name, flag_start);
+  return NULL;
+}
+
+/* Parse "N[,M][,...]" string FLAG into struct align_flags A.
+   Return pointer past second comma, or NULL if end of line.  */
+static const char *
+read_log_maxskip (const char *flag, const char *name, struct align_flags *a)
+{
+  int n, m;
+  flag = read_uint (flag, name, &a->log);
+  n = a->log;
+  if (n != 0)
+    a->log = floor_log2 (n * 2 - 1);
+  if (!flag)
+    {
+      a->maxskip = n ? n - 1 : 0;
+      return flag;
+    }
+  flag = read_uint (flag, name, &a->maxskip);
+  m = a->maxskip;
+  if (m > n) m = n;
+  if (m > 0) m--; /* -falign-foo=N,M means M-1 max bytes of padding, not M */
+  a->maxskip = m;
+  return flag;
+}
+
+/* Parse "N[,M[,N2[,M2]]]" string FLAG into a pair of struct align_flags.  */
 static void
-init_alignments (void)
+parse_N_M (const char *flag, const char *name, struct align_flags a[2],
+	   unsigned int min_align_log)
 {
-  if (align_loops <= 0)
-    align_loops = 1;
-  if (align_loops_max_skip > align_loops)
-    align_loops_max_skip = align_loops - 1;
-  align_loops_log = floor_log2 (align_loops * 2 - 1);
-  if (align_jumps <= 0)
-    align_jumps = 1;
-  if (align_jumps_max_skip > align_jumps)
-    align_jumps_max_skip = align_jumps - 1;
-  align_jumps_log = floor_log2 (align_jumps * 2 - 1);
-  if (align_labels <= 0)
-    align_labels = 1;
-  align_labels_log = floor_log2 (align_labels * 2 - 1);
-  if (align_labels_max_skip > align_labels)
-    align_labels_max_skip = align_labels - 1;
-  if (align_functions <= 0)
-    align_functions = 1;
-  align_functions_log = floor_log2 (align_functions * 2 - 1);
+  if (flag)
+    {
+      flag = read_log_maxskip (flag, name, &a[0]);
+      if (flag)
+	flag = read_log_maxskip (flag, name, &a[1]);
+#ifdef SUBALIGN_LOG
+      else
+	{
+	  /* N2[,M2] is not specified. This arch has a default for N2.
+	     Before -falign-foo=N,M,N2,M2 was introduced, x86 had a tweak.
+	     -falign-functions=N with N > 8 was adding secondary alignment.
+	     -falign-functions=10 was emitting this before every function:
+			.p2align 4,,9
+			.p2align 3
+	     Now this behavior (and more) can be explicitly requested:
+	     -falign-functions=16,10,8
+	     Retain old behavior if N2 is missing: */
+
+	  int align = 1 << a[0].log;
+	  int subalign = 1 << SUBALIGN_LOG;
+
+	  if (a[0].log > SUBALIGN_LOG && a[0].maxskip >= subalign - 1)
+	    {
+	      /* Set N2 unless subalign can never have any effect */
+	      if (align > a[0].maxskip + 1)
+		a[1].log = SUBALIGN_LOG;
+	    }
+	}
+#endif
+    }
+  if ((unsigned int)a[0].log < min_align_log)
+    {
+      a[0].log = min_align_log;
+      a[0].maxskip = (1 << min_align_log) - 1;
+    }
 }
 
+/* Minimum alignment requirements, if arch has them.  */
+unsigned int min_align_loops_log = 0;
+unsigned int min_align_jumps_log = 0;
+unsigned int min_align_labels_log = 0;
+unsigned int min_align_functions_log = 0;
+
+/* Process -falign-foo=N[,M[,N2[,M2]]] options.  */
+void
+parse_alignment_opts (void)
+{
+  parse_N_M (str_align_loops, "loops", align_loops, min_align_loops_log);
+  parse_N_M (str_align_jumps, "jumps", align_jumps, min_align_jumps_log);
+  parse_N_M (str_align_labels, "labels", align_labels, min_align_labels_log);
+  parse_N_M (str_align_functions, "functions", align_functions,
+	     min_align_functions_log);
+}
+
 /* Process the options that have been parsed.  */
 static void
 process_options (void)
@@ -1640,7 +1720,7 @@ static void
 backend_init_target (void)
 {
   /* Initialize alignment variables.  */
-  init_alignments ();
+  parse_alignment_opts ();
 
   /* This depends on stack_pointer_rtx.  */
   init_fake_stack_mems ();
Index: gcc/toplev.h
===================================================================
--- gcc/toplev.h	(revision 246948)
+++ gcc/toplev.h	(working copy)
@@ -93,6 +93,13 @@ extern bool set_src_pwd		       (const char *);
 extern HOST_WIDE_INT get_random_seed (bool);
 extern const char *set_random_seed (const char *);
 
+extern unsigned int min_align_loops_log;
+extern unsigned int min_align_jumps_log;
+extern unsigned int min_align_labels_log;
+extern unsigned int min_align_functions_log;
+
+extern void parse_alignment_opts (void);
+
 extern void initialize_rtl (void);
 
 #endif /* ! GCC_TOPLEV_H */
Index: gcc/varasm.c
===================================================================
--- gcc/varasm.c	(revision 246948)
+++ gcc/varasm.c	(working copy)
@@ -1792,9 +1792,9 @@ assemble_start_function (tree decl, const char *fn
       && optimize_function_for_speed_p (cfun))
     {
 #ifdef ASM_OUTPUT_MAX_SKIP_ALIGN
-      int align_log = align_functions_log;
+      int align_log = align_functions[0].log;
 #endif
-      int max_skip = align_functions - 1;
+      int max_skip = align_functions[0].maxskip;
       if (flag_limit_function_alignment && crtl->max_insn_address > 0
 	  && max_skip >= crtl->max_insn_address)
 	max_skip = crtl->max_insn_address - 1;
@@ -1801,8 +1801,11 @@ assemble_start_function (tree decl, const char *fn
 
 #ifdef ASM_OUTPUT_MAX_SKIP_ALIGN
       ASM_OUTPUT_MAX_SKIP_ALIGN (asm_out_file, align_log, max_skip);
+      if (max_skip == align_functions[0].maxskip)
+        ASM_OUTPUT_MAX_SKIP_ALIGN (asm_out_file, align_functions[1].log,
+				   align_functions[1].maxskip);
 #else
-      ASM_OUTPUT_ALIGN (asm_out_file, align_functions_log);
+      ASM_OUTPUT_ALIGN (asm_out_file, align_functions[0].log);
 #endif
     }
 

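For readers following the SUBALIGN_LOG branch of parse_N_M in the toplev.c hunk, the decision about a default secondary alignment can be sketched on its own. This is only an illustration, not the patch's code: the helper name default_subalign_log is hypothetical, and SUBALIGN_LOG is assumed to be 3, matching the ".p2align 3" shown in the patch comment.

```c
#include <assert.h>

/* Assumption: 8-byte secondary alignment, as in the ".p2align 3" example.  */
#define SUBALIGN_LOG 3

/* Hypothetical helper, not part of the patch.  Given the primary alignment
   as (log, maxskip), return the log of the default secondary alignment,
   or 0 when no secondary alignment would be added.  */
static int
default_subalign_log (int log, int maxskip)
{
  assert (log >= 0 && log < 31 && maxskip >= 0);

  int align = 1 << log;
  int subalign = 1 << SUBALIGN_LOG;

  /* The primary alignment must be stricter than the secondary one, and
     its max skip must be at least subalign - 1.  */
  if (log > SUBALIGN_LOG && maxskip >= subalign - 1)
    {
      /* When maxskip + 1 == align the primary alignment always succeeds,
         so a secondary alignment could never have any effect.  */
      if (align > maxskip + 1)
        return SUBALIGN_LOG;
    }
  return 0;
}
```

Under these assumptions, -falign-functions=10 (log 4, max skip 9) gets a secondary log of 3, while -falign-functions=16 (log 4, max skip 15) gets none, because the primary alignment is then always reached.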
^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] Extend -falign-FOO=N to N[,M[,N2[,M2]]]
  2017-04-18 18:46 ` [PATCH 3/3] Extend -falign-FOO=N to N[,M[,N2[,M2]]] Denys Vlasenko
@ 2017-04-18 19:12   ` Sandra Loosemore
  0 siblings, 0 replies; 26+ messages in thread
From: Sandra Loosemore @ 2017-04-18 19:12 UTC (permalink / raw)
  To: Denys Vlasenko, gcc-patches; +Cc: Andrew Pinski, Uros Bizjak, Bernd Schmidt

On 04/18/2017 12:30 PM, Denys Vlasenko wrote:
>
> 2017-04-18  Denys Vlasenko  <dvlasenk@redhat.com>
>
>      * doc/invoke.texi: Update option documentation.
>      [snip]

The documentation part of this version is OK.

-Sandra


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/3] Extend -falign-FOO=N to N[,M[,N2[,M2]]] version 8
  2017-04-18 18:30 [PATCH 0/3] Extend -falign-FOO=N to N[,M[,N2[,M2]]] version 8 Denys Vlasenko
                   ` (2 preceding siblings ...)
  2017-04-18 18:46 ` [PATCH 3/3] Extend -falign-FOO=N to N[,M[,N2[,M2]]] Denys Vlasenko
@ 2017-05-05 14:40 ` Denys Vlasenko
  3 siblings, 0 replies; 26+ messages in thread
From: Denys Vlasenko @ 2017-05-05 14:40 UTC (permalink / raw)
  To: gcc-patches; +Cc: Andrew Pinski, Uros Bizjak, Bernd Schmidt, Sandra Loosemore

On 04/18/2017 08:30 PM, Denys Vlasenko wrote:
> These patches are for this bug:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66240
> "RFE: extend -falign-xyz syntax"

Ping.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] Remove support for obsolete x86 -malign-foo options
  2017-04-18 18:30 ` [PATCH 1/3] Remove support for obsolete x86 -malign-foo options Denys Vlasenko
@ 2017-05-06  7:22   ` Uros Bizjak
  2017-05-11 12:24     ` Denys Vlasenko
  2018-02-12 10:07     ` Martin Liška
  0 siblings, 2 replies; 26+ messages in thread
From: Uros Bizjak @ 2017-05-06  7:22 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: gcc-patches, Andrew Pinski, Bernd Schmidt, Sandra Loosemore

On Tue, Apr 18, 2017 at 8:30 PM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
> 2017-04-18  Denys Vlasenko  <dvlasenk@redhat.com>
>
>     * config/i386/i386-common.c (ix86_handle_option): Remove support
>     for obsolete -malign-loops, -malign-jumps and -malign-functions
>     options.
>     * config/i386/i386.opt: Likewise.
> Index: gcc/common/config/i386/i386-common.c
> ===================================================================
> --- gcc/common/config/i386/i386-common.c        (revision 240663)
> +++ gcc/common/config/i386/i386-common.c        (working copy)
> @@ -998,38 +998,6 @@ ix86_handle_option (struct gcc_options *opts,
>         }
>        return true;
>
> -
> -  /* Comes from final.c -- no real reason to change it.  */
> -#define MAX_CODE_ALIGN 16
> -
> -    case OPT_malign_loops_:
> -      warning_at (loc, 0, "-malign-loops is obsolete, use -falign-loops");
> -      if (value > MAX_CODE_ALIGN)
> -       error_at (loc, "-malign-loops=%d is not between 0 and %d",
> -                 value, MAX_CODE_ALIGN);
> -      else
> -       opts->x_align_loops = 1 << value;
> -      return true;
> -
> -    case OPT_malign_jumps_:
> -      warning_at (loc, 0, "-malign-jumps is obsolete, use -falign-jumps");
> -      if (value > MAX_CODE_ALIGN)
> -       error_at (loc, "-malign-jumps=%d is not between 0 and %d",
> -                 value, MAX_CODE_ALIGN);
> -      else
> -       opts->x_align_jumps = 1 << value;
> -      return true;
> -
> -    case OPT_malign_functions_:
> -      warning_at (loc, 0,
> -                 "-malign-functions is obsolete, use -falign-functions");
> -      if (value > MAX_CODE_ALIGN)
> -       error_at (loc, "-malign-functions=%d is not between 0 and %d",
> -                 value, MAX_CODE_ALIGN);
> -      else
> -       opts->x_align_functions = 1 << value;
> -      return true;
> -
>      case OPT_mbranch_cost_:
>        if (value > 5)
>         {
> Index: gcc/config/i386/i386.opt
> ===================================================================
> --- gcc/config/i386/i386.opt    (revision 240663)
> +++ gcc/config/i386/i386.opt    (working copy)
> @@ -205,18 +205,6 @@ malign-double
>  Target Report Mask(ALIGN_DOUBLE) Save
>  Align some doubles on dword boundary.
>
> -malign-functions=
> -Target RejectNegative Joined UInteger
> -Function starts are aligned to this power of 2.
> -
> -malign-jumps=
> -Target RejectNegative Joined UInteger
> -Jump targets are aligned to this power of 2.
> -
> -malign-loops=
> -Target RejectNegative Joined UInteger
> -Loop code aligned to this power of 2.
> -
>  malign-stringops
>  Target RejectNegative Report InverseMask(NO_ALIGN_STRINGOPS, ALIGN_STRINGOPS) Save
>  Align destination of the string operations.

Instead of removing the above definitions, please redefine them
in the same way that -mcpu is obsoleted in i386.opt, e.g.:

malign-functions=
Target RejectNegative Joined Undocumented Alias(falign-functions=)
Warn(%<-malign-functions%> is obsolete, use %<-falign-functions%>)

This cleanup should have been done a long time ago; the patch can be
committed independently of the other patches in the series.

Uros.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] Remove support for obsolete x86 -malign-foo options
  2017-05-06  7:22   ` Uros Bizjak
@ 2017-05-11 12:24     ` Denys Vlasenko
  2018-02-12 10:07     ` Martin Liška
  1 sibling, 0 replies; 26+ messages in thread
From: Denys Vlasenko @ 2017-05-11 12:24 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, Andrew Pinski, Bernd Schmidt, Sandra Loosemore

On 05/06/2017 09:20 AM, Uros Bizjak wrote:
> On Tue, Apr 18, 2017 at 8:30 PM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
>> 2017-04-18  Denys Vlasenko  <dvlasenk@redhat.com>
>>
>>     * config/i386/i386-common.c (ix86_handle_option): Remove support
>>     for obsolete -malign-loops, -malign-jumps and -malign-functions
>>     options.
>>     * config/i386/i386.opt: Likewise.
...
>> --- gcc/config/i386/i386.opt    (revision 240663)
>> +++ gcc/config/i386/i386.opt    (working copy)
>> @@ -205,18 +205,6 @@ malign-double
>>  Target Report Mask(ALIGN_DOUBLE) Save
>>  Align some doubles on dword boundary.
>>
>> -malign-functions=
>> -Target RejectNegative Joined UInteger
>> -Function starts are aligned to this power of 2.
>> -
>> -malign-jumps=
>> -Target RejectNegative Joined UInteger
>> -Jump targets are aligned to this power of 2.
>> -
>> -malign-loops=
>> -Target RejectNegative Joined UInteger
>> -Loop code aligned to this power of 2.
>> -
>>  malign-stringops
>>  Target RejectNegative Report InverseMask(NO_ALIGN_STRINGOPS, ALIGN_STRINGOPS) Save
>>  Align destination of the string operations.
>
> Instead of removing the above definitions, please rather redefine them
> in a similar way -mcpu in i386.opt is obsoleted

They were already obsoleted sixteen years ago. The warning message
was added:

    if (ix86_align_loops_string)
      {
-      i = atoi (ix86_align_loops_string);
-      if (i < 0 || i > MAX_CODE_ALIGN)
-       error ("-malign-loops=%d is not between 0 and %d", i, MAX_CODE_ALIGN);
-      else
-       ix86_align_loops = i;
+      warning ("-malign-loops is obsolete, use -falign-loops");

in the year 2001:

commit a2b35d8705efb23182c3e4b75a5e7727b6ddfc88
Author: geoffk <geoffk@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Fri May 4 06:31:27 2001 +0000

             * invoke.texi (i386 Options): Delete references to -malign-jumps,
             -malign-loops, -malign-functions.
             * i386.c (ix86_align_funcs): Delete.
             (ix86_align_loops): Delete.
             (ix86_align_jumps): Delete.
             (override_options): Mark -malign-* as obsolete.  Emulate their
             behaviour with the -falign-* options.  Default -falign-* from
             the processor table.
             * i386.h (FUNCTION_BOUNDARY): Define to 16; revert Richard Kenner's
             patch of Wed May 2 13:09:36 2001.
             (LOOP_ALIGN): Delete.
             (LOOP_ALIGN_MAX_SKIP): Delete.
             (LABEL_ALIGN_AFTER_BARRIER): Delete.
             (LABEL_ALIGN_AFTER_BARRIER_MAX_SKIP): Delete.

     git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@41825 138bc75d-0d04-0410-961f-82ee72b054a4


I would think sixteen years of receiving these warnings should be enough
for everyone to switch to the -falign options.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] Remove support for obsolete x86 -malign-foo options
  2017-05-06  7:22   ` Uros Bizjak
  2017-05-11 12:24     ` Denys Vlasenko
@ 2018-02-12 10:07     ` Martin Liška
  1 sibling, 0 replies; 26+ messages in thread
From: Martin Liška @ 2018-02-12 10:07 UTC (permalink / raw)
  To: Uros Bizjak, Denys Vlasenko
  Cc: gcc-patches, Andrew Pinski, Bernd Schmidt, Sandra Loosemore

On 05/06/2017 09:20 AM, Uros Bizjak wrote:
> On Tue, Apr 18, 2017 at 8:30 PM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
>> 2017-04-18  Denys Vlasenko  <dvlasenk@redhat.com>
>>
>>     * config/i386/i386-common.c (ix86_handle_option): Remove support
>>     for obsolete -malign-loops, -malign-jumps and -malign-functions
>>     options.
>>     * config/i386/i386.opt: Likewise.
>> Index: gcc/common/config/i386/i386-common.c
>> ===================================================================
>> --- gcc/common/config/i386/i386-common.c        (revision 240663)
>> +++ gcc/common/config/i386/i386-common.c        (working copy)
>> @@ -998,38 +998,6 @@ ix86_handle_option (struct gcc_options *opts,
>>         }
>>        return true;
>>
>> -
>> -  /* Comes from final.c -- no real reason to change it.  */
>> -#define MAX_CODE_ALIGN 16
>> -
>> -    case OPT_malign_loops_:
>> -      warning_at (loc, 0, "-malign-loops is obsolete, use -falign-loops");
>> -      if (value > MAX_CODE_ALIGN)
>> -       error_at (loc, "-malign-loops=%d is not between 0 and %d",
>> -                 value, MAX_CODE_ALIGN);
>> -      else
>> -       opts->x_align_loops = 1 << value;
>> -      return true;
>> -
>> -    case OPT_malign_jumps_:
>> -      warning_at (loc, 0, "-malign-jumps is obsolete, use -falign-jumps");
>> -      if (value > MAX_CODE_ALIGN)
>> -       error_at (loc, "-malign-jumps=%d is not between 0 and %d",
>> -                 value, MAX_CODE_ALIGN);
>> -      else
>> -       opts->x_align_jumps = 1 << value;
>> -      return true;
>> -
>> -    case OPT_malign_functions_:
>> -      warning_at (loc, 0,
>> -                 "-malign-functions is obsolete, use -falign-functions");
>> -      if (value > MAX_CODE_ALIGN)
>> -       error_at (loc, "-malign-functions=%d is not between 0 and %d",
>> -                 value, MAX_CODE_ALIGN);
>> -      else
>> -       opts->x_align_functions = 1 << value;
>> -      return true;
>> -
>>      case OPT_mbranch_cost_:
>>        if (value > 5)
>>         {
>> Index: gcc/config/i386/i386.opt
>> ===================================================================
>> --- gcc/config/i386/i386.opt    (revision 240663)
>> +++ gcc/config/i386/i386.opt    (working copy)
>> @@ -205,18 +205,6 @@ malign-double
>>  Target Report Mask(ALIGN_DOUBLE) Save
>>  Align some doubles on dword boundary.
>>
>> -malign-functions=
>> -Target RejectNegative Joined UInteger
>> -Function starts are aligned to this power of 2.
>> -
>> -malign-jumps=
>> -Target RejectNegative Joined UInteger
>> -Jump targets are aligned to this power of 2.
>> -
>> -malign-loops=
>> -Target RejectNegative Joined UInteger
>> -Loop code aligned to this power of 2.
>> -
>>  malign-stringops
>>  Target RejectNegative Report InverseMask(NO_ALIGN_STRINGOPS, ALIGN_STRINGOPS) Save
>>  Align destination of the string operations.
> 
> Instead of removing the above definitions, please rather redefine them
> in a similar way -mcpu in i386.opt is obsoleted, e.g.:
> 
> malign-functions=
> Target RejectNegative Joined Undocumented Alias(falign-functions=)
> Warn(%<-malign-functions%> is obsolete, use %<-falign-functions%>)

Please correct me if I'm wrong, but an alias is not that simple: the value of the
-malign-functions option is a power-of-2 exponent, while -falign-functions= takes an absolute value.
Thus -malign-functions=5 == -falign-functions=32.

I believe the legacy options are not a problem for the patch series, as their handling only sets
the value of the -falign-functions option.
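
The exponent-versus-bytes difference can be shown with a small sketch. The helper name malign_to_falign_bytes is hypothetical; the limit of 16 is the MAX_CODE_ALIGN value from the quoted i386-common.c code, and the conversion is the same "1 << value" that code performs.

```c
#include <assert.h>

/* Limit applied by the quoted -malign-* handling.  */
#define MAX_CODE_ALIGN 16

/* Hypothetical helper: convert a -malign-foo=EXPONENT argument into the
   byte value that the equivalent -falign-foo= option would take.  */
static unsigned
malign_to_falign_bytes (unsigned exponent)
{
  assert (exponent <= MAX_CODE_ALIGN);
  return 1u << exponent;
}
```

For example, malign_to_falign_bytes (5) gives 32, which is why a plain Alias() in the .opt file would not preserve the old meaning.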

Martin

> 
> This cleanup should have been done a long time ago, the patch can be
> committed independently of other patches in the series.
> 
> Uros.
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] Extend -falign-FOO=N to N[:M[:N2[:M2]]]
  2018-07-03 19:12       ` Martin Liška
@ 2018-07-04  0:20         ` Jeff Law
  0 siblings, 0 replies; 26+ messages in thread
From: Jeff Law @ 2018-07-04  0:20 UTC (permalink / raw)
  To: Martin Liška, gcc-patches; +Cc: dvlasenk, Segher Boessenkool

On 07/03/2018 01:11 PM, Martin Liška wrote:
> On 07/03/2018 10:53 AM, Martin Liška wrote:
>> Thank you Jeff.
>>
>> I found some issues when doing build of all targets
>> (contrib/config-list.mk).
>> I'll update patch and test that affected cross-compilers still produce
>> same output.
> 
> Hello.
> 
> I'm done with testing, I bootstrapped and regtested the patch on
> x86_64-linux and ppc64-linux-gnu.
> I also built all cross compilers we have in contrib/config-list.mk and I
> verified that
> results for gcc/gcc/testsuite/gcc.dg/params/blocksort-part.c source file
> is equal for all cross compilers
> that I touched in the patch. I tested these options:
> 
> -O2
> -O2 -falign-loops=256
> -O2 -falign-loops=256 -falign-functions=512 -falign-labels=1024
> -falign-jumps=2048
> -O2 -falign-loops=1024 -falign-functions=512 -falign-jumps=2048
> -O2 -falign-loops=256 -falign-jumps=2048
> -O2 -falign-loops=100 -falign-functions=200 -falign-labels=300
> -falign-jumps=400
> -O2 -falign-loops=1111 -falign-functions=1112 -falign-labels=1113
> -falign-jumps=1114
> 
> There are no issues except ones that are present on current trunk:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86394
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86390
> 
> Is the patchset still ready for approval?
Yes.

jeff

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] Extend -falign-FOO=N to N[:M[:N2[:M2]]]
  2018-07-03  8:53     ` Martin Liška
  2018-07-03  9:55       ` Segher Boessenkool
@ 2018-07-03 19:12       ` Martin Liška
  2018-07-04  0:20         ` Jeff Law
  1 sibling, 1 reply; 26+ messages in thread
From: Martin Liška @ 2018-07-03 19:12 UTC (permalink / raw)
  To: Jeff Law, gcc-patches; +Cc: dvlasenk, Segher Boessenkool

[-- Attachment #1: Type: text/plain, Size: 1196 bytes --]

On 07/03/2018 10:53 AM, Martin Liška wrote:
> Thank you Jeff.
> 
> I found some issues when doing build of all targets (contrib/config-list.mk).
> I'll update patch and test that affected cross-compilers still produce same output.

Hello.

I'm done with testing: I bootstrapped and regtested the patch on x86_64-linux and ppc64-linux-gnu.
I also built all the cross compilers we have in contrib/config-list.mk and verified that
the results for the gcc/gcc/testsuite/gcc.dg/params/blocksort-part.c source file are unchanged for all cross compilers
that I touched in the patch. I tested these options:

-O2
-O2 -falign-loops=256
-O2 -falign-loops=256 -falign-functions=512 -falign-labels=1024 -falign-jumps=2048
-O2 -falign-loops=1024 -falign-functions=512 -falign-jumps=2048
-O2 -falign-loops=256 -falign-jumps=2048
-O2 -falign-loops=100 -falign-functions=200 -falign-labels=300 -falign-jumps=400
-O2 -falign-loops=1111 -falign-functions=1112 -falign-labels=1113 -falign-jumps=1114

There are no issues except ones that are present on current trunk:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86394
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86390

Is the patchset still ready for approval?
Thanks,
Martin
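
Several of the tested values (100, 200, 1111, ...) are not powers of two. As a rough illustration of how such values map to an alignment log, here is the rounding that the removed init_alignments code used, floor_log2 (n * 2 - 1). The helper names below are hypothetical, and floor_log2_u is a simplified stand-in for GCC's floor_log2, not its actual implementation.

```c
#include <assert.h>

/* Simplified stand-in for GCC's floor_log2: index of the highest set bit,
   or -1 for zero.  */
static int
floor_log2_u (unsigned x)
{
  int log = -1;
  while (x)
    {
      x >>= 1;
      log++;
    }
  return log;
}

/* Hypothetical helper: round alignment N up to the next power of two and
   return its log, using the floor_log2 (n * 2 - 1) formula.  */
static int
align_to_log (unsigned n)
{
  assert (n > 0 && n <= 0x40000000u);
  return floor_log2_u (n * 2 - 1);
}
```

With this formula, 100 maps to log 7 (128 bytes), 256 stays at log 8, and 1111 maps to log 11 (2048 bytes).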

[-- Attachment #2: 0001-Extend-falign-FOO-N-to-N-M-N2-M2.patch --]
[-- Type: text/x-patch, Size: 78344 bytes --]

From cd071ae635d24bfaf41afbe4531de578833dce8c Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Mon, 21 May 2018 20:58:02 +0200
Subject: [PATCH] Extend -falign-FOO=N to N[:M[:N2[:M2]]]

gcc/ChangeLog:

2018-05-25  Denys Vlasenko  <dvlasenk@redhat.com>
	    Martin Liska  <mliska@suse.cz>

	PR middle-end/66240
	PR target/45996
	PR c/84100
	* common.opt: Rename align options with 'str_' prefix.
	* common/config/i386/i386-common.c (set_malign_value): New
	function.
	(ix86_handle_option): Use it to set -falign-* options.
	* config/aarch64/aarch64-protos.h (struct tune_params): Change
	type from int to string.
	* config/aarch64/aarch64.c: Update default values from int
	to string.
	* config/alpha/alpha.c (alpha_override_options_after_change):
	Likewise.
	* config/arm/arm.c (arm_override_options_after_change_1): Likewise.
	* config/i386/dragonfly.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Print
	max skip conditionally.
	* config/i386/freebsd.h (SUBALIGN_LOG): New.
	(ASM_OUTPUT_MAX_SKIP_ALIGN): Print
	max skip conditionally.
	* config/i386/gas.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Print
	max skip conditionally.
	* config/i386/gnu-user.h (SUBALIGN_LOG): New.
	(ASM_OUTPUT_MAX_SKIP_ALIGN): Print
	max skip conditionally.
	* config/i386/i386.c (struct ptt): Change type from int to
	string.
	(ix86_default_align): Set default values.
	* config/i386/i386.h (ASM_OUTPUT_MAX_SKIP_PAD): Print
	max skip conditionally.
	* config/i386/iamcu.h (SUBALIGN_LOG): New.
	(ASM_OUTPUT_MAX_SKIP_ALIGN):
	* config/i386/lynx.h (ASM_OUTPUT_MAX_SKIP_ALIGN):
	* config/i386/netbsd-elf.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Print
	max skip conditionally.
	* config/i386/openbsdelf.h (SUBALIGN_LOG): New.
	(ASM_OUTPUT_MAX_SKIP_ALIGN): Print max skip conditionally.
	* config/i386/x86-64.h (SUBALIGN_LOG): New.
	(ASM_OUTPUT_MAX_SKIP_ALIGN): Print
	max skip conditionally.
	(ASM_OUTPUT_MAX_SKIP_PAD): Likewise.
	* config/ia64/ia64.c (ia64_option_override): Set default values
        for alignment options.
	* config/m68k/m68k.c: Handle new str_align_* options.
	* config/mips/mips.c (mips_set_compression_mode): Change
	type of constants.
	(mips_option_override): Set default values for options.
	* config/powerpcspe/powerpcspe.c (rs6000_option_override_internal):
        Likewise.
	* config/rs6000/rs6000.c (rs6000_option_override_internal):
	Likewise.
	* config/rx/rx.c (rx_option_override): Likewise.
	* config/rx/rx.h (JUMP_ALIGN): Use align_jumps_log.
	(LABEL_ALIGN): Use align_labels_log.
	(LOOP_ALIGN): Use align_loops_align.
	* config/s390/s390.c (s390_asm_output_function_label): Use new
        macros.
	* config/sh/sh.c (sh_override_options_after_change):
	Change type of constants.
	* config/spu/spu.c (spu_sched_init): Likewise.
	* config/sparc/sparc.c (sparc_option_override): Set default
        values for options.
	* config/visium/visium.c (visium_option_override): Likewise.
	* config/visium/visium.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Do not
        emit p2align format with last argument if it's not needed.
	* doc/invoke.texi: Document extended format of -falign-*.
	* final.c: Use align_labels alignment.
	* flags.h (struct target_flag_state): Change type to use
	align_flags.
	(struct align_flags_tuple): New.
	(struct align_flags): Likewise.
	(align_loops_log): Redefine macro to use new types.
	(align_loops_max_skip): Redefine macro to use new types.
	(align_jumps_log): Redefine macro to use new types.
	(align_jumps_max_skip): Redefine macro to use new types.
	(align_labels_log): Redefine macro to use new types.
	(align_labels_max_skip): Redefine macro to use new types.
	(align_functions_log): Redefine macro to use new types.
	(align_loops): Redefine macro to use new types.
	(align_jumps): Redefine macro to use new types.
	(align_labels): Redefine macro to use new types.
	(align_functions): Redefine macro to use new types.
	(align_functions_max_skip): Redefine macro to use new types.
	(align_loops_value): New macro.
	(align_jumps_value): New macro.
	(align_labels_value): New macro.
	(align_functions_value): New macro.
	* function.c (invoke_set_current_function_hook): Propagate
	alignment values from flags to global variables default in
	toplev.h.
	* ipa-icf.c (sem_function::equals_wpa): Use
	cl_optimization_option_eq instead of memcmp.
	* lto-streamer.h (cl_optimization_stream_out): Support streaming
	of string types.
	(cl_optimization_stream_in): Likewise.
	* optc-save-gen.awk: Support strings in cl_optimization.
	* opth-gen.awk: Likewise.
	* opts.c (finish_options): Remove error checking of invalid
	value ranges.
	(MAX_CODE_ALIGN): Remove.
	(MAX_CODE_ALIGN_VALUE): Likewise.
	(parse_and_check_align_values): New function.
	(check_alignment_argument): Likewise.
	(common_handle_option): Use check_alignment_argument.
	* opts.h (parse_and_check_align_values): Declare.
	* toplev.c (init_alignments): Remove.
	(read_log_maxskip): New.
	(parse_N_M): Likewise.
	(parse_alignment_opts): Likewise.
	(backend_init_target): Remove usage of init_alignments.
	* toplev.h (parse_alignment_opts): Declare.
	* tree-streamer-in.c (streamer_read_tree_bitfields): Add new
	argument.
	* tree-streamer-out.c (streamer_write_tree_bitfields): Likewise.
	* tree.c (cl_option_hasher::equal): New.
	* varasm.c: Use new global macros.

gcc/lto/ChangeLog:

2018-05-25  Martin Liska  <mliska@suse.cz>

	PR middle-end/66240
	PR target/45996
	PR c/84100
	* lto.c (compare_tree_sccs_1): Use cl_optimization_option_eq
	instead of memcmp.

gcc/testsuite/ChangeLog:

2018-05-25  Martin Liska  <mliska@suse.cz>

	PR middle-end/66240
	PR target/45996
	PR c/84100
	* gcc.dg/pr84100.c (foo):
	* gcc.target/i386/falign-functions-2.c: New test.
	* gcc.target/i386/falign-functions.c: New test.
---
 gcc/common.opt                                |  16 +--
 gcc/common/config/i386/i386-common.c          |  16 ++-
 gcc/config/aarch64/aarch64-protos.h           |   6 +-
 gcc/config/aarch64/aarch64.c                  |  90 ++++++------
 gcc/config/alpha/alpha.c                      |  20 +--
 gcc/config/arm/arm.c                          |   7 +-
 gcc/config/i386/i386.c                        | 112 +++++++--------
 gcc/config/ia64/ia64.c                        |   8 +-
 gcc/config/m68k/m68k.c                        |  15 +-
 gcc/config/mips/mips.c                        |  30 ++--
 gcc/config/powerpcspe/powerpcspe.c            |  33 ++---
 gcc/config/rs6000/rs6000.c                    |  33 ++---
 gcc/config/rx/rx.c                            |  18 ++-
 gcc/config/rx/rx.h                            |   6 +-
 gcc/config/s390/s390.c                        |   4 +-
 gcc/config/sh/sh.c                            |  31 ++--
 gcc/config/sparc/sparc.c                      |   6 +-
 gcc/config/spu/spu.c                          |   9 +-
 gcc/config/spu/spu.h                          |   2 +-
 gcc/config/visium/visium.c                    |  19 ++-
 gcc/config/visium/visium.h                    |   3 +-
 gcc/doc/invoke.texi                           |  66 ++++++---
 gcc/final.c                                   |   6 +
 gcc/flags.h                                   |  71 +++++----
 gcc/function.c                                |   3 +
 gcc/ipa-icf.c                                 |   2 +-
 gcc/lto-streamer.h                            |   6 +-
 gcc/lto/lto.c                                 |   4 +-
 gcc/optc-save-gen.awk                         |  95 +++++++++++-
 gcc/opth-gen.awk                              |   3 +
 gcc/opts.c                                    | 108 +++++++++++---
 gcc/opts.h                                    |   7 +
 gcc/testsuite/gcc.dg/pr84100.c                |   2 +-
 .../gcc.target/i386/falign-functions-2.c      |  30 ++++
 .../gcc.target/i386/falign-functions.c        |   8 ++
 gcc/toplev.c                                  | 136 ++++++++++++++----
 gcc/toplev.h                                  |   7 +
 gcc/tree-streamer-in.c                        |   2 +-
 gcc/tree-streamer-out.c                       |   2 +-
 gcc/tree.c                                    |  20 +--
 gcc/varasm.c                                  |  10 +-
 41 files changed, 725 insertions(+), 347 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/falign-functions-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/falign-functions.c

diff --git a/gcc/common.opt b/gcc/common.opt
index 5a50bc27710..963c37f04cd 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -959,35 +959,35 @@ Common Report Var(flag_aggressive_loop_optimizations) Optimization Init(1)
 Aggressively optimize loops using language constraints.
 
 falign-functions
-Common Report Var(align_functions,0) Optimization UInteger
+Common Report Var(flag_align_functions) Optimization
 Align the start of functions.
 
 falign-functions=
-Common RejectNegative Joined UInteger Var(align_functions) Optimization
+Common RejectNegative Joined Var(str_align_functions) Optimization
 
 flimit-function-alignment
 Common Report Var(flag_limit_function_alignment) Optimization Init(0)
 
 falign-jumps
-Common Report Var(align_jumps,0) Optimization UInteger
+Common Report Var(flag_align_jumps) Optimization
 Align labels which are only reached by jumping.
 
 falign-jumps=
-Common RejectNegative Joined UInteger Var(align_jumps) Optimization
+Common RejectNegative Joined Var(str_align_jumps) Optimization
 
 falign-labels
-Common Report Var(align_labels,0) Optimization UInteger
+Common Report Var(flag_align_labels) Optimization
 Align all labels.
 
 falign-labels=
-Common RejectNegative Joined UInteger Var(align_labels) Optimization
+Common RejectNegative Joined Var(str_align_labels) Optimization
 
 falign-loops
-Common Report Var(align_loops,0) Optimization UInteger
+Common Report Var(flag_align_loops) Optimization
 Align the start of loops.
 
 falign-loops=
-Common RejectNegative Joined UInteger Var(align_loops) Optimization
+Common RejectNegative Joined Var(str_align_loops)
 
 fargument-alias
 Common Ignore
diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c
index 664240e7e8d..277ee55a093 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -275,6 +275,16 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET \
   (OPTION_MASK_ISA2_AVX512F_UNSET)
 
+/* Set 1 << value as value of -malign-FLAG option.  */
+
+static void
+set_malign_value (const char **flag, unsigned value)
+{
+  char *r = XNEWVEC (char, 6);
+  sprintf (r, "%d", 1 << value);
+  *flag = r;
+}
+
 /* Implement TARGET_HANDLE_OPTION.  */
 
 bool
@@ -1317,7 +1327,7 @@ ix86_handle_option (struct gcc_options *opts,
 	error_at (loc, "-malign-loops=%d is not between 0 and %d",
 		  value, MAX_CODE_ALIGN);
       else
-	opts->x_align_loops = 1 << value;
+	set_malign_value (&opts->x_str_align_loops, value);
       return true;
 
     case OPT_malign_jumps_:
@@ -1326,7 +1336,7 @@ ix86_handle_option (struct gcc_options *opts,
 	error_at (loc, "-malign-jumps=%d is not between 0 and %d",
 		  value, MAX_CODE_ALIGN);
       else
-	opts->x_align_jumps = 1 << value;
+	set_malign_value (&opts->x_str_align_jumps, value);
       return true;
 
     case OPT_malign_functions_:
@@ -1336,7 +1346,7 @@ ix86_handle_option (struct gcc_options *opts,
 	error_at (loc, "-malign-functions=%d is not between 0 and %d",
 		  value, MAX_CODE_ALIGN);
       else
-	opts->x_align_functions = 1 << value;
+	set_malign_value (&opts->x_str_align_functions, value);
       return true;
 
     case OPT_mbranch_cost_:
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 87c6ae20278..0530747ece4 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -250,9 +250,9 @@ struct tune_params
   int memmov_cost;
   int issue_rate;
   unsigned int fusible_ops;
-  int function_align;
-  int jump_align;
-  int loop_align;
+  const char *function_align;
+  const char *jump_align;
+  const char *loop_align;
   int int_reassoc_width;
   int fp_reassoc_width;
   int vec_reassoc_width;
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b88e7cac27a..1efa97ff66a 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -637,9 +637,9 @@ static const struct tune_params generic_tunings =
   4, /* memmov_cost  */
   2, /* issue_rate  */
   (AARCH64_FUSE_AES_AESMC), /* fusible_ops  */
-  8,	/* function_align.  */
-  4,	/* jump_align.  */
-  8,	/* loop_align.  */
+  "8",	/* function_align.  */
+  "4",	/* jump_align.  */
+  "8",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -663,9 +663,9 @@ static const struct tune_params cortexa35_tunings =
   1, /* issue_rate  */
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
-  16,	/* function_align.  */
-  4,	/* jump_align.  */
-  8,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "4",	/* jump_align.  */
+  "8",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -689,9 +689,9 @@ static const struct tune_params cortexa53_tunings =
   2, /* issue_rate  */
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
-  16,	/* function_align.  */
-  4,	/* jump_align.  */
-  8,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "4",	/* jump_align.  */
+  "8",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -715,9 +715,9 @@ static const struct tune_params cortexa57_tunings =
   3, /* issue_rate  */
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
-  16,	/* function_align.  */
-  4,	/* jump_align.  */
-  8,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "4",	/* jump_align.  */
+  "8",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -741,9 +741,9 @@ static const struct tune_params cortexa72_tunings =
   3, /* issue_rate  */
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
-  16,	/* function_align.  */
-  4,	/* jump_align.  */
-  8,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "4",	/* jump_align.  */
+  "8",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -767,9 +767,9 @@ static const struct tune_params cortexa73_tunings =
   2, /* issue_rate.  */
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
-  16,	/* function_align.  */
-  4,	/* jump_align.  */
-  8,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "4",	/* jump_align.  */
+  "8",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -794,9 +794,9 @@ static const struct tune_params exynosm1_tunings =
   4,	/* memmov_cost  */
   3,	/* issue_rate  */
   (AARCH64_FUSE_AES_AESMC), /* fusible_ops  */
-  4,	/* function_align.  */
-  4,	/* jump_align.  */
-  4,	/* loop_align.  */
+  "4",	/* function_align.  */
+  "4",	/* jump_align.  */
+  "4",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -819,9 +819,9 @@ static const struct tune_params thunderxt88_tunings =
   6, /* memmov_cost  */
   2, /* issue_rate  */
   AARCH64_FUSE_CMP_BRANCH, /* fusible_ops  */
-  8,	/* function_align.  */
-  8,	/* jump_align.  */
-  8,	/* loop_align.  */
+  "8",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "8",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -844,9 +844,9 @@ static const struct tune_params thunderx_tunings =
   6, /* memmov_cost  */
   2, /* issue_rate  */
   AARCH64_FUSE_CMP_BRANCH, /* fusible_ops  */
-  8,	/* function_align.  */
-  8,	/* jump_align.  */
-  8,	/* loop_align.  */
+  "8",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "8",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -870,9 +870,9 @@ static const struct tune_params xgene1_tunings =
   6, /* memmov_cost  */
   4, /* issue_rate  */
   AARCH64_FUSE_NOTHING, /* fusible_ops  */
-  16,	/* function_align.  */
-  8,	/* jump_align.  */
-  16,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "16",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -896,9 +896,9 @@ static const struct tune_params qdf24xx_tunings =
   4, /* issue_rate  */
   (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK), /* fuseable_ops  */
-  16,	/* function_align.  */
-  8,	/* jump_align.  */
-  16,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "16",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -924,9 +924,9 @@ static const struct tune_params saphira_tunings =
   4, /* issue_rate  */
   (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK), /* fuseable_ops  */
-  16,	/* function_align.  */
-  8,	/* jump_align.  */
-  16,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "16",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -950,9 +950,9 @@ static const struct tune_params thunderx2t99_tunings =
   4, /* issue_rate.  */
   (AARCH64_FUSE_CMP_BRANCH | AARCH64_FUSE_AES_AESMC
    | AARCH64_FUSE_ALU_BRANCH), /* fusible_ops  */
-  16,	/* function_align.  */
-  8,	/* jump_align.  */
-  16,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "16",	/* loop_align.  */
   3,	/* int_reassoc_width.  */
   2,	/* fp_reassoc_width.  */
   2,	/* vec_reassoc_width.  */
@@ -10572,12 +10572,12 @@ aarch64_override_options_after_change_1 (struct gcc_options *opts)
      alignment to what the target wants.  */
   if (!opts->x_optimize_size)
     {
-      if (opts->x_align_loops <= 0)
-	opts->x_align_loops = aarch64_tune_params.loop_align;
-      if (opts->x_align_jumps <= 0)
-	opts->x_align_jumps = aarch64_tune_params.jump_align;
-      if (opts->x_align_functions <= 0)
-	opts->x_align_functions = aarch64_tune_params.function_align;
+      if (opts->x_flag_align_loops && !opts->x_str_align_loops)
+	opts->x_str_align_loops = aarch64_tune_params.loop_align;
+      if (opts->x_flag_align_jumps && !opts->x_str_align_jumps)
+	opts->x_str_align_jumps = aarch64_tune_params.jump_align;
+      if (opts->x_flag_align_functions && !opts->x_str_align_functions)
+	opts->x_str_align_functions = aarch64_tune_params.function_align;
     }
 
   /* We default to no pc-relative literal loads.  */
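
Note (not part of the patch): the same test recurs in every target hunk of this patch. The old "integer <= 0 means unset" check becomes "the flag is enabled and no explicit string was given". A minimal sketch of that pattern follows. The struct and function names are hypothetical stand-ins rather than GCC's, and it assumes, as the comments elsewhere in this patch suggest, that a NULL string means the option was given without an argument.

```c
#include <stddef.h>

/* Hypothetical stand-in for the two fields the patch consults;
   the real fields live in struct gcc_options.  */
struct opts_sketch
{
  int flag_align_loops;         /* 0 when alignment is disabled  */
  const char *str_align_loops;  /* NULL when no explicit value was given  */
};

/* Install TARGET_DEFAULT only when alignment is enabled and no
   explicit value is present.  */
static void
default_align_loops_sketch (struct opts_sketch *opts,
                            const char *target_default)
{
  if (opts->flag_align_loops && !opts->str_align_loops)
    opts->str_align_loops = target_default;
}
```

With the flag cleared, or with an explicit string already present, the function leaves the options untouched, so a user-supplied value always wins over the tuning default.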
diff --git a/gcc/config/alpha/alpha.c b/gcc/config/alpha/alpha.c
index 26d89f3ea13..9adfe159381 100644
--- a/gcc/config/alpha/alpha.c
+++ b/gcc/config/alpha/alpha.c
@@ -67,6 +67,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "params.h"
 #include "builtins.h"
 #include "rtl-iter.h"
+#include "flags.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -614,13 +615,13 @@ alpha_override_options_after_change (void)
   /* ??? Kludge these by not doing anything if we don't optimize.  */
   if (optimize > 0)
     {
-      if (align_loops <= 0)
-	align_loops = 16;
-      if (align_jumps <= 0)
-	align_jumps = 16;
+      if (flag_align_loops && !str_align_loops)
+	str_align_loops = "16";
+      if (flag_align_jumps && !str_align_jumps)
+	str_align_jumps = "16";
     }
-  if (align_functions <= 0)
-    align_functions = 16;
+  if (flag_align_functions && !str_align_functions)
+    str_align_functions = "16";
 }
 \f
 /* Returns 1 if VALUE is a mask that contains full bytes of zero or ones.  */
@@ -9268,10 +9269,11 @@ alpha_align_insns_1 (unsigned int max_align,
   /* Let shorten branches care for assigning alignments to code labels.  */
   shorten_branches (get_insns ());
 
-  if (align_functions < 4)
+  unsigned int option_alignment = align_functions_max_skip + 1;
+  if (option_alignment < 4)
     align = 4;
-  else if ((unsigned int) align_functions < max_align)
-    align = align_functions;
+  else if ((unsigned int) option_alignment < max_align)
+    align = option_alignment;
   else
     align = max_align;
 
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index f1a99588bab..8d5897c8f0f 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -2952,9 +2952,10 @@ static GTY(()) tree init_optimize;
 static void
 arm_override_options_after_change_1 (struct gcc_options *opts)
 {
-  if (opts->x_align_functions <= 0)
-    opts->x_align_functions = TARGET_THUMB_P (opts->x_target_flags)
-      && opts->x_optimize_size ? 2 : 4;
+  /* -falign-functions without argument: supply one.  */
+  if (opts->x_flag_align_functions && !opts->x_str_align_functions)
+    opts->x_str_align_functions = TARGET_THUMB_P (opts->x_target_flags)
+      && opts->x_optimize_size ? "2" : "4";
 }
 
 /* Implement targetm.override_options_after_change.  */
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index ba23cd0a1ab..a4d8a2a86a4 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -834,53 +834,58 @@ struct ptt
 {
   const char *const name;			/* processor name  */
   const struct processor_costs *cost;		/* Processor costs */
-  const int align_loop;				/* Default alignments.  */
-  const int align_loop_max_skip;
-  const int align_jump;
-  const int align_jump_max_skip;
-  const int align_func;
+
+  /* Default alignments.  */
+  const char *const align_loop;
+  const char *const align_jump;
+  const char *const align_label;
+  const char *const align_func;
 };
 
 /* This table must be in sync with enum processor_type in i386.h.  */ 
 static const struct ptt processor_target_table[PROCESSOR_max] =
 {
-  {"generic", &generic_cost, 16, 10, 16, 10, 16},
-  {"i386", &i386_cost, 4, 3, 4, 3, 4},
-  {"i486", &i486_cost, 16, 15, 16, 15, 16},
-  {"pentium", &pentium_cost, 16, 7, 16, 7, 16},
-  {"lakemont", &lakemont_cost, 16, 7, 16, 7, 16},
-  {"pentiumpro", &pentiumpro_cost, 16, 15, 16, 10, 16},
-  {"pentium4", &pentium4_cost, 0, 0, 0, 0, 0},
-  {"nocona", &nocona_cost, 0, 0, 0, 0, 0},
-  {"core2", &core_cost, 16, 10, 16, 10, 16},
-  {"nehalem", &core_cost, 16, 10, 16, 10, 16},
-  {"sandybridge", &core_cost, 16, 10, 16, 10, 16},
-  {"haswell", &core_cost, 16, 10, 16, 10, 16},
-  {"bonnell", &atom_cost, 16, 15, 16, 7, 16},
-  {"silvermont", &slm_cost, 16, 15, 16, 7, 16},
-  {"goldmont", &slm_cost, 16, 15, 16, 7, 16},
-  {"goldmont-plus", &slm_cost, 16, 15, 16, 7, 16},
-  {"tremont", &slm_cost, 16, 15, 16, 7, 16},
-  {"knl", &slm_cost, 16, 15, 16, 7, 16},
-  {"knm", &slm_cost, 16, 15, 16, 7, 16},
-  {"skylake", &skylake_cost, 16, 10, 16, 10, 16},
-  {"skylake-avx512", &skylake_cost, 16, 10, 16, 10, 16},
-  {"cannonlake", &skylake_cost, 16, 10, 16, 10, 16},
-  {"icelake-client", &skylake_cost, 16, 10, 16, 10, 16},
-  {"icelake-server", &skylake_cost, 16, 10, 16, 10, 16},
-  {"intel", &intel_cost, 16, 15, 16, 7, 16},
-  {"geode", &geode_cost, 0, 0, 0, 0, 0},
-  {"k6", &k6_cost, 32, 7, 32, 7, 32},
-  {"athlon", &athlon_cost, 16, 7, 16, 7, 16},
-  {"k8", &k8_cost, 16, 7, 16, 7, 16},
-  {"amdfam10", &amdfam10_cost, 32, 24, 32, 7, 32},
-  {"bdver1", &bdver1_cost, 16, 10, 16, 7, 11},
-  {"bdver2", &bdver2_cost, 16, 10, 16, 7, 11},
-  {"bdver3", &bdver3_cost, 16, 10, 16, 7, 11},
-  {"bdver4", &bdver4_cost, 16, 10, 16, 7, 11},
-  {"btver1", &btver1_cost, 16, 10, 16, 7, 11},
-  {"btver2", &btver2_cost, 16, 10, 16, 7, 11},
-  {"znver1", &znver1_cost, 16, 15, 16, 15, 16}
+/* The "0:0:8" label alignment specified for some processors generates
+   secondary 8-byte alignment only for those label/jump/loop targets
+   which have primary alignment.  */
+
+  {"generic",        &generic_cost,    "16:11:8", "16:11:8", "0:0:8", "16"},
+  {"i386",           &i386_cost,       "4",       "4",       NULL,    "4" },
+  {"i486",           &i486_cost,       "16",      "16",      "0:0:8", "16"},
+  {"pentium",        &pentium_cost,    "16:8:8",  "16:8:8",  "0:0:8", "16"},
+  {"lakemont",       &lakemont_cost,   "16:8:8",  "16:8:8",  "0:0:8", "16"},
+  {"pentiumpro",     &pentiumpro_cost, "16",      "16:11:8", "0:0:8", "16"},
+  {"pentium4",       &pentium4_cost,   NULL,      NULL,      NULL,    NULL},
+  {"nocona",         &nocona_cost,     NULL,      NULL,      NULL,    NULL},
+  {"core2",          &core_cost,       "16:11:8", "16:11:8", "0:0:8", "16"},
+  {"nehalem",        &core_cost,       "16:11:8", "16:11:8", "0:0:8", "16"},
+  {"sandybridge",    &core_cost,       "16:11:8", "16:11:8", "0:0:8", "16"},
+  {"haswell",        &core_cost,       "16:11:8", "16:11:8", "0:0:8", "16"},
+  {"bonnell",        &atom_cost,       "16",      "16:8:8",  "0:0:8", "16"},
+  {"silvermont",     &slm_cost,        "16",      "16:8:8",  "0:0:8", "16"},
+  {"goldmont",       &slm_cost,        "16",      "16:8:8",  "0:0:8", "16"},
+  {"goldmont-plus",  &slm_cost,        "16",      "16:8:8",  "0:0:8", "16"},
+  {"tremont",        &slm_cost,        "16",      "16:8:8",  "0:0:8", "16"},
+  {"knl",            &slm_cost,        "16",      "16:8:8",  "0:0:8", "16"},
+  {"knm",            &slm_cost,        "16",      "16:8:8",  "0:0:8", "16"},
+  {"skylake",        &skylake_cost,    "16:11:8", "16:11:8", "0:0:8", "16"},
+  {"skylake-avx512", &skylake_cost,    "16:11:8", "16:11:8", "0:0:8", "16"},
+  {"cannonlake",     &skylake_cost,    "16:11:8", "16:11:8", "0:0:8", "16"},
+  {"icelake-client", &skylake_cost,    "16:11:8", "16:11:8", "0:0:8", "16"},
+  {"icelake-server", &skylake_cost,    "16:11:8", "16:11:8", "0:0:8", "16"},
+  {"intel",          &intel_cost,      "16",      "16:8:8",  "0:0:8", "16"},
+  {"geode",          &geode_cost,      NULL,      NULL,      NULL,    NULL},
+  {"k6",             &k6_cost,         "32:8:8",  "32:8:8",  "0:0:8", "32"},
+  {"athlon",         &athlon_cost,     "16:8:8",  "16:8:8",  "0:0:8", "16"},
+  {"k8",             &k8_cost,         "16:8:8",  "16:8:8",  "0:0:8", "16"},
+  {"amdfam10",       &amdfam10_cost,   "32:25:8", "32:8:8",  "0:0:8", "32"},
+  {"bdver1",         &bdver1_cost,     "16:11:8", "16:8:8",  "0:0:8", "11"},
+  {"bdver2",         &bdver2_cost,     "16:11:8", "16:8:8",  "0:0:8", "11"},
+  {"bdver3",         &bdver3_cost,     "16:11:8", "16:8:8",  "0:0:8", "11"},
+  {"bdver4",         &bdver4_cost,     "16:11:8", "16:8:8",  "0:0:8", "11"},
+  {"btver1",         &btver1_cost,     "16:11:8", "16:8:8",  "0:0:8", "11"},
+  {"btver2",         &btver2_cost,     "16:11:8", "16:8:8",  "0:0:8", "11"},
+  {"znver1",         &znver1_cost,     "16",      "16",      "0:0:8", "16"}
 };
 \f
 static unsigned int
@@ -3346,20 +3351,15 @@ set_ix86_tune_features (enum processor_type ix86_tune, bool dump)
 static void
 ix86_default_align (struct gcc_options *opts)
 {
-  if (opts->x_align_loops == 0)
-    {
-      opts->x_align_loops = processor_target_table[ix86_tune].align_loop;
-      align_loops_max_skip = processor_target_table[ix86_tune].align_loop_max_skip;
-    }
-  if (opts->x_align_jumps == 0)
-    {
-      opts->x_align_jumps = processor_target_table[ix86_tune].align_jump;
-      align_jumps_max_skip = processor_target_table[ix86_tune].align_jump_max_skip;
-    }
-  if (opts->x_align_functions == 0)
-    {
-      opts->x_align_functions = processor_target_table[ix86_tune].align_func;
-    }
+  /* -falign-foo without argument: supply one.  */
+  if (opts->x_flag_align_loops && !opts->x_str_align_loops)
+    opts->x_str_align_loops = processor_target_table[ix86_tune].align_loop;
+  if (opts->x_flag_align_jumps && !opts->x_str_align_jumps)
+    opts->x_str_align_jumps = processor_target_table[ix86_tune].align_jump;
+  if (opts->x_flag_align_labels && !opts->x_str_align_labels)
+    opts->x_str_align_labels = processor_target_table[ix86_tune].align_label;
+  if (opts->x_flag_align_functions && !opts->x_str_align_functions)
+    opts->x_str_align_functions = processor_target_table[ix86_tune].align_func;
 }
 
 /* Implement TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE hook.  */
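
Note (not part of the patch): the new table entries pack what used to be separate alignment and max-skip integers into one "N:M:N2" string. The sketch below shows one way such a string could be decoded, assuming the semantics documented in the invoke.texi hunk of this patch (skip at most M-1 bytes, with M defaulting to N). It is an independent illustration with hypothetical names. It is not the parse_alignment_opts routine the patch calls, whose implementation is not shown in this section.

```c
#include <stdlib.h>

/* Hypothetical result type for this sketch; not a GCC structure.  */
struct align_sketch
{
  int log;           /* log2 of the primary alignment, rounded up  */
  int max_skip;      /* most padding bytes for the primary alignment  */
  int log_2nd;       /* log2 of the secondary alignment, rounded up  */
  int max_skip_2nd;  /* most padding bytes for the secondary alignment  */
};

/* Smallest LOG such that (1 << LOG) >= N; 0 for N <= 1.  */
static int
ceil_log2_sketch (int n)
{
  int log = 0;
  while (log < 30 && (1 << log) < n)
    log++;
  return log;
}

/* Padding limit for alignment N and window M: M - 1 bytes, where a
   missing (zero) M defaults to N.  No alignment means no padding.  */
static int
max_skip_sketch (int n, int m)
{
  if (n <= 0)
    return 0;
  return (m > 0 ? m : n) - 1;
}

/* Decode "N[:M[:N2[:M2]]]".  NULL and missing fields count as 0.  */
static struct align_sketch
parse_align_sketch (const char *str)
{
  int v[4] = { 0, 0, 0, 0 };
  struct align_sketch s;

  for (int i = 0; str != NULL && i < 4; i++)
    {
      char *end;
      v[i] = (int) strtol (str, &end, 10);
      if (*end != ':')
        break;
      str = end + 1;
    }

  s.log = ceil_log2_sketch (v[0]);
  s.max_skip = max_skip_sketch (v[0], v[1]);
  s.log_2nd = ceil_log2_sketch (v[2]);
  s.max_skip_2nd = max_skip_sketch (v[2], v[3]);
  return s;
}
```

Under these assumptions "16:11:8" decodes to a primary alignment of 2^4 bytes with at most 10 bytes of padding, which matches the 16, 10 pair in the removed "generic" row, plus a secondary 8-byte alignment. "0:0:8" decodes to no primary alignment and a secondary 8-byte one.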
diff --git a/gcc/config/ia64/ia64.c b/gcc/config/ia64/ia64.c
index 74888714086..f121cee1997 100644
--- a/gcc/config/ia64/ia64.c
+++ b/gcc/config/ia64/ia64.c
@@ -6107,10 +6107,10 @@ ia64_option_override (void)
 
   init_machine_status = ia64_init_machine_status;
 
-  if (align_functions <= 0)
-    align_functions = 64;
-  if (align_loops <= 0)
-    align_loops = 32;
+  if (flag_align_functions && !str_align_functions)
+    str_align_functions = "64";
+  if (flag_align_loops && !str_align_loops)
+    str_align_loops = "32";
   if (TARGET_ABI_OPEN_VMS)
     flag_no_common = 1;
 
diff --git a/gcc/config/m68k/m68k.c b/gcc/config/m68k/m68k.c
index 495a80b759e..cea5c0ecab5 100644
--- a/gcc/config/m68k/m68k.c
+++ b/gcc/config/m68k/m68k.c
@@ -65,6 +65,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "optabs.h"
 #include "builtins.h"
 #include "rtl-iter.h"
+#include "toplev.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -651,15 +652,17 @@ m68k_option_override (void)
     }
 
 #ifndef ASM_OUTPUT_ALIGN_WITH_NOP
-  if (align_labels > 2)
+  parse_alignment_opts ();
+  if (align_labels_value > 2)
     {
-      warning (0, "-falign-labels=%d is not supported", align_labels);
-      align_labels = 0;
+      warning (0, "-falign-labels=%d is not supported", align_labels_value);
+      str_align_labels = "1";
     }
-  if (align_loops > 2)
+
+  if (align_loops_value > 2)
     {
-      warning (0, "-falign-loops=%d is not supported", align_loops);
-      align_loops = 0;
+      warning (0, "-falign-loops=%d is not supported", align_loops_value);
+      str_align_loops = "1";
     }
 #endif
 
diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index ad393040bee..75ee834137e 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -501,9 +501,9 @@ unsigned int mips_base_compression_flags;
 static int mips_base_schedule_insns; /* flag_schedule_insns */
 static int mips_base_reorder_blocks_and_partition; /* flag_reorder... */
 static int mips_base_move_loop_invariants; /* flag_move_loop_invariants */
-static int mips_base_align_loops; /* align_loops */
-static int mips_base_align_jumps; /* align_jumps */
-static int mips_base_align_functions; /* align_functions */
+static const char *mips_base_align_loops; /* align_loops */
+static const char *mips_base_align_jumps; /* align_jumps */
+static const char *mips_base_align_functions; /* align_functions */
 
 /* Index [M][R] is true if register R is allowed to hold a value of mode M.  */
 static bool mips_hard_regno_mode_ok_p[MAX_MACHINE_MODE][FIRST_PSEUDO_REGISTER];
@@ -19517,9 +19517,9 @@ mips_set_compression_mode (unsigned int compression_mode)
   flag_schedule_insns = mips_base_schedule_insns;
   flag_reorder_blocks_and_partition = mips_base_reorder_blocks_and_partition;
   flag_move_loop_invariants = mips_base_move_loop_invariants;
-  align_loops = mips_base_align_loops;
-  align_jumps = mips_base_align_jumps;
-  align_functions = mips_base_align_functions;
+  str_align_loops = mips_base_align_loops;
+  str_align_jumps = mips_base_align_jumps;
+  str_align_functions = mips_base_align_functions;
   target_flags &= ~(MASK_MIPS16 | MASK_MICROMIPS);
   target_flags |= compression_mode;
 
@@ -19589,12 +19589,12 @@ mips_set_compression_mode (unsigned int compression_mode)
       /* Provide default values for align_* for 64-bit targets.  */
       if (TARGET_64BIT)
 	{
-	  if (align_loops == 0)
-	    align_loops = 8;
-	  if (align_jumps == 0)
-	    align_jumps = 8;
-	  if (align_functions == 0)
-	    align_functions = 8;
+	  if (flag_align_loops && !str_align_loops)
+	    str_align_loops = "8";
+	  if (flag_align_jumps && !str_align_jumps)
+	    str_align_jumps = "8";
+	  if (flag_align_functions && !str_align_functions)
+	    str_align_functions = "8";
 	}
 
       targetm.min_anchor_offset = -32768;
@@ -20278,9 +20278,9 @@ mips_option_override (void)
   mips_base_schedule_insns = flag_schedule_insns;
   mips_base_reorder_blocks_and_partition = flag_reorder_blocks_and_partition;
   mips_base_move_loop_invariants = flag_move_loop_invariants;
-  mips_base_align_loops = align_loops;
-  mips_base_align_jumps = align_jumps;
-  mips_base_align_functions = align_functions;
+  mips_base_align_loops = str_align_loops;
+  mips_base_align_jumps = str_align_jumps;
+  mips_base_align_functions = str_align_functions;
 
   /* Now select the ISA mode.
 
diff --git a/gcc/config/powerpcspe/powerpcspe.c b/gcc/config/powerpcspe/powerpcspe.c
index f67505a3552..80f67de12fc 100644
--- a/gcc/config/powerpcspe/powerpcspe.c
+++ b/gcc/config/powerpcspe/powerpcspe.c
@@ -5406,29 +5406,30 @@ rs6000_option_override_internal (bool global_init_p)
 	  if (rs6000_cpu == PROCESSOR_TITAN
 	      || rs6000_cpu == PROCESSOR_CELL)
 	    {
-	      if (align_functions <= 0)
-		align_functions = 8;
-	      if (align_jumps <= 0)
-		align_jumps = 8;
-	      if (align_loops <= 0)
-		align_loops = 8;
+	      if (flag_align_functions && !str_align_functions)
+		str_align_functions = "8";
+	      if (flag_align_jumps && !str_align_jumps)
+		str_align_jumps = "8";
+	      if (flag_align_loops && !str_align_loops)
+		str_align_loops = "8";
 	    }
 	  if (rs6000_align_branch_targets)
 	    {
-	      if (align_functions <= 0)
-		align_functions = 16;
-	      if (align_jumps <= 0)
-		align_jumps = 16;
-	      if (align_loops <= 0)
+	      if (flag_align_functions && !str_align_functions)
+		str_align_functions = "16";
+	      if (flag_align_jumps && !str_align_jumps)
+		str_align_jumps = "16";
+	      if (flag_align_loops && !str_align_loops)
 		{
 		  can_override_loop_align = 1;
-		  align_loops = 16;
+		  str_align_loops = "16";
 		}
 	    }
-	  if (align_jumps_max_skip <= 0)
-	    align_jumps_max_skip = 15;
-	  if (align_loops_max_skip <= 0)
-	    align_loops_max_skip = 15;
+
+	  if (flag_align_jumps && !str_align_jumps)
+	    str_align_jumps = "16";
+	  if (flag_align_loops && !str_align_loops)
+	    str_align_loops = "16";
 	}
 
       /* Arrange to save and restore machine status around nested functions.  */
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index e61c9cee893..f815221b1af 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4958,29 +4958,30 @@ rs6000_option_override_internal (bool global_init_p)
 	  if (rs6000_tune == PROCESSOR_TITAN
 	      || rs6000_tune == PROCESSOR_CELL)
 	    {
-	      if (align_functions <= 0)
-		align_functions = 8;
-	      if (align_jumps <= 0)
-		align_jumps = 8;
-	      if (align_loops <= 0)
-		align_loops = 8;
+	      if (flag_align_functions && !str_align_functions)
+		str_align_functions = "8";
+	      if (flag_align_jumps && !str_align_jumps)
+		str_align_jumps = "8";
+	      if (flag_align_loops && !str_align_loops)
+		str_align_loops = "8";
 	    }
 	  if (rs6000_align_branch_targets)
 	    {
-	      if (align_functions <= 0)
-		align_functions = 16;
-	      if (align_jumps <= 0)
-		align_jumps = 16;
-	      if (align_loops <= 0)
+	      if (flag_align_functions && !str_align_functions)
+		str_align_functions = "16";
+	      if (flag_align_jumps && !str_align_jumps)
+		str_align_jumps = "16";
+	      if (flag_align_loops && !str_align_loops)
 		{
 		  can_override_loop_align = 1;
-		  align_loops = 16;
+		  str_align_loops = "16";
 		}
 	    }
-	  if (align_jumps_max_skip <= 0)
-	    align_jumps_max_skip = 15;
-	  if (align_loops_max_skip <= 0)
-	    align_loops_max_skip = 15;
+
+	  if (flag_align_jumps && !str_align_jumps)
+	    str_align_jumps = "16";
+	  if (flag_align_loops && !str_align_loops)
+	    str_align_loops = "16";
 	}
 
       /* Arrange to save and restore machine status around nested functions.  */
diff --git a/gcc/config/rx/rx.c b/gcc/config/rx/rx.c
index fe467f7bd3a..af97bef301d 100644
--- a/gcc/config/rx/rx.c
+++ b/gcc/config/rx/rx.c
@@ -2843,12 +2843,18 @@ rx_option_override (void)
   rx_override_options_after_change ();
 
   /* These values are bytes, not log.  */
-  if (align_jumps == 0 && ! optimize_size)
-    align_jumps = ((rx_cpu_type == RX100 || rx_cpu_type == RX200) ? 4 : 8);
-  if (align_loops == 0 && ! optimize_size)
-    align_loops = ((rx_cpu_type == RX100 || rx_cpu_type == RX200) ? 4 : 8);
-  if (align_labels == 0 && ! optimize_size)
-    align_labels = ((rx_cpu_type == RX100 || rx_cpu_type == RX200) ? 4 : 8);
+  if (! optimize_size)
+    {
+      if (flag_align_jumps && !str_align_jumps)
+	str_align_jumps = ((rx_cpu_type == RX100
+			    || rx_cpu_type == RX200) ? "4" : "8");
+      if (flag_align_loops && !str_align_loops)
+	str_align_loops = ((rx_cpu_type == RX100
+			    || rx_cpu_type == RX200) ? "4" : "8");
+      if (flag_align_labels && !str_align_labels)
+	str_align_labels = ((rx_cpu_type == RX100
+			     || rx_cpu_type == RX200) ? "4" : "8");
+    }
 }
 
 \f
diff --git a/gcc/config/rx/rx.h b/gcc/config/rx/rx.h
index a2aa392ce67..2f5a0e94677 100644
--- a/gcc/config/rx/rx.h
+++ b/gcc/config/rx/rx.h
@@ -417,9 +417,9 @@ typedef unsigned int CUMULATIVE_ARGS;
 /* Compute the alignment needed for label X in various situations.
    If the user has specified an alignment then honour that, otherwise
    use rx_align_for_label.  */
-#define JUMP_ALIGN(x)				(align_jumps > 1 ? align_jumps_log : rx_align_for_label (x, 0))
-#define LABEL_ALIGN(x)				(align_labels > 1 ? align_labels_log : rx_align_for_label (x, 3))
-#define LOOP_ALIGN(x)				(align_loops > 1 ? align_loops_log : rx_align_for_label (x, 2))
+#define JUMP_ALIGN(x)				(align_jumps_log > 0 ? align_jumps_log : rx_align_for_label (x, 0))
+#define LABEL_ALIGN(x)				(align_labels_log > 0 ? align_labels_log : rx_align_for_label (x, 3))
+#define LOOP_ALIGN(x)				(align_loops_log > 0 ? align_loops_log : rx_align_for_label (x, 2))
 #define LABEL_ALIGN_AFTER_BARRIER(x)		rx_align_for_label (x, 0)
 
 #define ASM_OUTPUT_MAX_SKIP_ALIGN(STREAM, LOG, MAX_SKIP)	\
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 5add5985866..23c3f3db621 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -7543,9 +7543,9 @@ s390_asm_output_function_label (FILE *asm_out_file, const char *fname,
       function_alignment = MAX (8, DECL_ALIGN (decl) / BITS_PER_UNIT);
       if (! DECL_USER_ALIGN (decl))
 	function_alignment = MAX (function_alignment,
-				  (unsigned int) align_functions);
+				  (unsigned int) align_functions_max_skip + 1);
       fputs ("\t# alignment for hotpatch\n", asm_out_file);
-      ASM_OUTPUT_ALIGN (asm_out_file, floor_log2 (function_alignment));
+      ASM_OUTPUT_ALIGN (asm_out_file, align_functions_log);
     }
 
   if (S390_USE_TARGET_ATTRIBUTE && TARGET_DEBUG_ARG)
diff --git a/gcc/config/sh/sh.c b/gcc/config/sh/sh.c
index 5f6fbb37e3e..a1cad42eb70 100644
--- a/gcc/config/sh/sh.c
+++ b/gcc/config/sh/sh.c
@@ -66,6 +66,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "builtins.h"
 #include "rtl-iter.h"
 #include "regs.h"
+#include "toplev.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -1007,29 +1008,37 @@ sh_override_options_after_change (void)
       Aligning all jumps increases the code size, even if it might
       result in slightly faster code.  Thus, it is set to the smallest 
       alignment possible if not specified by the user.  */
-  if (align_loops == 0)
-    align_loops = optimize_size ? 2 : 4;
+  if (flag_align_loops && !str_align_loops)
+    str_align_loops = optimize_size ? "2" : "4";
 
-  if (align_jumps == 0)
-    align_jumps = 2;
-  else if (align_jumps < 2)
-    align_jumps = 2;
+  /* Parse values so that we can compare for current value.  */
+  parse_alignment_opts ();
+  if (flag_align_jumps && !str_align_jumps)
+    str_align_jumps = "2";
+  else if (align_jumps_value < 2)
+    str_align_jumps = "2";
 
-  if (align_functions == 0)
-    align_functions = optimize_size ? 2 : 4;
+  if (flag_align_functions && !str_align_functions)
+    str_align_functions = optimize_size ? "2" : "4";
 
   /* The linker relaxation code breaks when a function contains
      alignments that are larger than that at the start of a
      compilation unit.  */
   if (TARGET_RELAX)
     {
-      int min_align = align_loops > align_jumps ? align_loops : align_jumps;
+      /* Parse values so that we can compare for current value.  */
+      parse_alignment_opts ();
+      int min_align = MAX (align_loops_value, align_jumps_value);
 
       /* Also take possible .long constants / mova tables into account.	*/
       if (min_align < 4)
 	min_align = 4;
-      if (align_functions < min_align)
-	align_functions = min_align;
+      if (align_functions_value < min_align)
+	{
+	  char *r = XNEWVEC (char, 16);
+	  sprintf (r, "%d", min_align);
+	  str_align_functions = r;
+	}
     }
 }
 \f
diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c
index 6b6f155f49f..d90a260785c 100644
--- a/gcc/config/sparc/sparc.c
+++ b/gcc/config/sparc/sparc.c
@@ -1906,7 +1906,7 @@ sparc_option_override (void)
     target_flags &= ~MASK_FSMULD;
 
   /* Supply a default value for align_functions.  */
-  if (align_functions == 0)
+  if (flag_align_functions && !str_align_functions)
     {
       if (sparc_cpu == PROCESSOR_ULTRASPARC
 	  || sparc_cpu == PROCESSOR_ULTRASPARC3
@@ -1914,10 +1914,10 @@ sparc_option_override (void)
 	  || sparc_cpu == PROCESSOR_NIAGARA2
 	  || sparc_cpu == PROCESSOR_NIAGARA3
 	  || sparc_cpu == PROCESSOR_NIAGARA4)
-	align_functions = 32;
+	str_align_functions = "32";
       else if (sparc_cpu == PROCESSOR_NIAGARA7
 	       || sparc_cpu == PROCESSOR_M8)
-	align_functions = 64;
+	str_align_functions = "64";
     }
 
   /* Validate PCC_STRUCT_RETURN.  */
diff --git a/gcc/config/spu/spu.c b/gcc/config/spu/spu.c
index 53935795424..fe2a2a34a05 100644
--- a/gcc/config/spu/spu.c
+++ b/gcc/config/spu/spu.c
@@ -58,6 +58,8 @@
 #include "dumpfile.h"
 #include "builtins.h"
 #include "rtl-iter.h"
+#include "flags.h"
+#include "toplev.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -238,8 +240,9 @@ spu_option_override (void)
   flag_omit_frame_pointer = 1;
 
   /* Functions must be 8 byte aligned so we correctly handle dual issue */
-  if (align_functions < 8)
-    align_functions = 8;
+  parse_alignment_opts ();
+  if (align_functions_value < 8)
+    str_align_functions = "8";
 
   spu_hint_dist = 8*4 - spu_max_nops*4;
   if (spu_hint_dist < 0) 
@@ -2769,7 +2772,7 @@ static void
 spu_sched_init (FILE *file ATTRIBUTE_UNUSED, int verbose ATTRIBUTE_UNUSED,
 		int max_ready ATTRIBUTE_UNUSED)
 {
-  if (align_labels > 4 || align_loops > 4 || align_jumps > 4)
+  if (align_labels_value > 4 || align_loops_value > 4 || align_jumps_value > 4)
     {
       /* When any block might be at least 8-byte aligned, assume they
          will all be at least 8-byte aligned to make sure dual issue
diff --git a/gcc/config/spu/spu.h b/gcc/config/spu/spu.h
index c9797505781..e846f1c2512 100644
--- a/gcc/config/spu/spu.h
+++ b/gcc/config/spu/spu.h
@@ -107,7 +107,7 @@ extern GTY(()) int spu_tune;
 	(GET_CODE (X) == SYMBOL_REF \
           && (SYMBOL_REF_FLAGS (X) & SYMBOL_FLAG_ALIGN1) == 0 \
 	  && (! SYMBOL_REF_FUNCTION_P (X) \
-	      || align_functions >= 16))
+	      || align_functions_value >= 16))
 
 #define PCC_BITFIELD_TYPE_MATTERS 1
 
diff --git a/gcc/config/visium/visium.c b/gcc/config/visium/visium.c
index 106cdaf9e3f..37de6249797 100644
--- a/gcc/config/visium/visium.c
+++ b/gcc/config/visium/visium.c
@@ -443,12 +443,12 @@ visium_option_override (void)
 
   /* Align functions on 256-byte (32-quadword) for GR5 and 64-byte (8-quadword)
      boundaries for GR6 so they start a new burst mode window.  */
-  if (align_functions == 0)
+  if (flag_align_functions && !str_align_functions)
     {
       if (visium_cpu == PROCESSOR_GR6)
-	align_functions = 64;
+	str_align_functions = "64";
       else
-	align_functions = 256;
+	str_align_functions = "256";
 
       /* Allow the size of compilation units to double because of inlining.
 	 In practice the global size of the object code is hardly affected
@@ -459,26 +459,25 @@ visium_option_override (void)
     }
 
   /* Likewise for loops.  */
-  if (align_loops == 0)
+  if (flag_align_loops && !str_align_loops)
     {
       if (visium_cpu == PROCESSOR_GR6)
-	align_loops = 64;
+	str_align_loops = "64";
       else
 	{
-	  align_loops = 256;
 	  /* But not if they are too far away from a 256-byte boundary.  */
-	  align_loops_max_skip = 31;
+	  str_align_loops = "256:32";
 	}
     }
 
   /* Align all jumps on quadword boundaries for the burst mode, and even
      on 8-quadword boundaries for GR6 so they start a new window.  */
-  if (align_jumps == 0)
+  if (flag_align_jumps && !str_align_jumps)
     {
       if (visium_cpu == PROCESSOR_GR6)
-	align_jumps = 64;
+	str_align_jumps = "64";
       else
-	align_jumps = 8;
+	str_align_jumps = "8";
     }
 
   /* We register a machine-specific pass.  This pass must be scheduled as
diff --git a/gcc/config/visium/visium.h b/gcc/config/visium/visium.h
index ebac7f12818..dac9a4565d5 100644
--- a/gcc/config/visium/visium.h
+++ b/gcc/config/visium/visium.h
@@ -1501,7 +1501,8 @@ do									\
    expression of type `int'. */
 #define ASM_OUTPUT_MAX_SKIP_ALIGN(STREAM,LOG,MAX_SKIP)			\
   if ((LOG) != 0) {							\
-    if ((MAX_SKIP) == 0) fprintf ((STREAM), "\t.p2align %d\n", (LOG));	\
+    if ((MAX_SKIP) == 0 || (MAX_SKIP) >= (1<<(LOG))-1)			\
+      fprintf ((STREAM), "\t.p2align %d\n", (LOG));			\
     else {								\
       fprintf ((STREAM), "\t.p2align %d,,%d\n", (LOG), (MAX_SKIP));	\
       /* Make sure that we have at least 8-byte alignment if > 8-byte	\
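
Note (not part of the patch): the condition added to ASM_OUTPUT_MAX_SKIP_ALIGN above emits the plain form of .p2align whenever the skip limit cannot restrict anything, that is, when MAX_SKIP is zero or at least (1 << LOG) - 1. A sketch of that decision as a standalone function follows. The function name and the buffer interface are hypothetical, and the follow-up 8-byte alignment step that the macro performs after this point is left out.

```c
#include <stdio.h>

/* Hypothetical helper mirroring the branch structure of the macro.
   Writes the directive into BUF, or an empty string when LOG is 0.  */
static void
format_p2align_sketch (char *buf, size_t size, int log, int max_skip)
{
  if (log == 0)
    buf[0] = '\0';
  else if (max_skip == 0 || max_skip >= (1 << log) - 1)
    /* The limit is absent or never binding: use the short form.  */
    snprintf (buf, size, "\t.p2align %d\n", log);
  else
    /* The limit can bind: pass it as the third operand.  */
    snprintf (buf, size, "\t.p2align %d,,%d\n", log, max_skip);
}
```

For the "256:32" loop default set in visium.c by this patch (LOG 8, MAX_SKIP 31) the limit is binding, so the three-operand form is produced. A plain "8" (LOG 3, MAX_SKIP 7) collapses to the short form.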
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 56cd122b0d7..fe68e53f667 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -365,9 +365,11 @@ Objective-C and Objective-C++ Dialects}.
 
 @item Optimization Options
 @xref{Optimize Options,,Options that Control Optimization}.
-@gccoptlist{-faggressive-loop-optimizations  -falign-functions[=@var{n}] @gol
--falign-jumps[=@var{n}] @gol
--falign-labels[=@var{n}]  -falign-loops[=@var{n}] @gol
+@gccoptlist{-faggressive-loop-optimizations @gol
+-falign-functions[=@var{n}[:@var{m}[:@var{n2}[:@var{m2}]]]] @gol
+-falign-jumps[=@var{n}[:@var{m}[:@var{n2}[:@var{m2}]]]] @gol
+-falign-labels[=@var{n}[:@var{m}[:@var{n2}[:@var{m2}]]]] @gol
+-falign-loops[=@var{n}[:@var{m}[:@var{n2}[:@var{m2}]]]] @gol
 -fassociative-math  -fauto-profile  -fauto-profile[=@var{path}] @gol
 -fauto-inc-dec  -fbranch-probabilities @gol
 -fbranch-target-load-optimize  -fbranch-target-load-optimize2 @gol
@@ -9240,19 +9242,36 @@ The @option{-fstrict-aliasing} option is enabled at levels
 
 @item -falign-functions
 @itemx -falign-functions=@var{n}
+@itemx -falign-functions=@var{n}:@var{m}
+@itemx -falign-functions=@var{n}:@var{m}:@var{n2}
+@itemx -falign-functions=@var{n}:@var{m}:@var{n2}:@var{m2}
 @opindex falign-functions
 Align the start of functions to the next power-of-two greater than
-@var{n}, skipping up to @var{n} bytes.  For instance,
-@option{-falign-functions=32} aligns functions to the next 32-byte
-boundary, but @option{-falign-functions=24} aligns to the next
-32-byte boundary only if this can be done by skipping 23 bytes or less.
+@var{n}, skipping up to @var{m}-1 bytes.  This ensures that at least
+the first @var{m} bytes of the function can be fetched by the CPU
+without crossing an @var{n}-byte alignment boundary.
 
-@option{-fno-align-functions} and @option{-falign-functions=1} are
-equivalent and mean that functions are not aligned.
+If @var{m} is not specified, it defaults to @var{n}.
+
+Examples: @option{-falign-functions=32} aligns functions to the next
+32-byte boundary, @option{-falign-functions=24} aligns to the next
+32-byte boundary only if this can be done by skipping 23 bytes or less,
+@option{-falign-functions=32:7} aligns to the next
+32-byte boundary only if this can be done by skipping 6 bytes or less.
+
+The second pair of @var{n2}:@var{m2} values allows you to specify
+a secondary alignment: @option{-falign-functions=64:7:32:3} aligns to
+the next 64-byte boundary if this can be done by skipping 6 bytes or less,
+otherwise aligns to the next 32-byte boundary if this can be done
+by skipping 2 bytes or less.
+If @var{m2} is not specified, it defaults to @var{n2}.
 
 Some assemblers only support this flag when @var{n} is a power of two;
 in that case, it is rounded up.
 
+@option{-fno-align-functions} and @option{-falign-functions=1} are
+equivalent and mean that functions are not aligned.
+
 If @var{n} is not specified or is zero, use a machine-dependent default.
 The maximum allowed @var{n} option value is 65536.
 
@@ -9266,12 +9285,13 @@ skip more bytes than the size of the function.
 
 @item -falign-labels
 @itemx -falign-labels=@var{n}
+@itemx -falign-labels=@var{n}:@var{m}
+@itemx -falign-labels=@var{n}:@var{m}:@var{n2}
+@itemx -falign-labels=@var{n}:@var{m}:@var{n2}:@var{m2}
 @opindex falign-labels
-Align all branch targets to a power-of-two boundary, skipping up to
-@var{n} bytes like @option{-falign-functions}.  This option can easily
-make code slower, because it must insert dummy operations for when the
-branch target is reached in the usual flow of the code.
+Align all branch targets to a power-of-two boundary.
 
+Parameters of this option are analogous to the @option{-falign-functions} option.
 @option{-fno-align-labels} and @option{-falign-labels=1} are
 equivalent and mean that labels are not aligned.
 
@@ -9286,12 +9306,15 @@ Enabled at levels @option{-O2}, @option{-O3}.
 
 @item -falign-loops
 @itemx -falign-loops=@var{n}
+@itemx -falign-loops=@var{n}:@var{m}
+@itemx -falign-loops=@var{n}:@var{m}:@var{n2}
+@itemx -falign-loops=@var{n}:@var{m}:@var{n2}:@var{m2}
 @opindex falign-loops
-Align loops to a power-of-two boundary, skipping up to @var{n} bytes
-like @option{-falign-functions}.  If the loops are
-executed many times, this makes up for any execution of the dummy
-operations.
+Align loops to a power-of-two boundary.  If the loops are executed
+many times, this makes up for any execution of the dummy padding
+instructions.
 
+Parameters of this option are analogous to the @option{-falign-functions} option.
 @option{-fno-align-loops} and @option{-falign-loops=1} are
 equivalent and mean that loops are not aligned.
 The maximum allowed @var{n} option value is 65536.
@@ -9302,12 +9325,15 @@ Enabled at levels @option{-O2}, @option{-O3}.
 
 @item -falign-jumps
 @itemx -falign-jumps=@var{n}
+@itemx -falign-jumps=@var{n}:@var{m}
+@itemx -falign-jumps=@var{n}:@var{m}:@var{n2}
+@itemx -falign-jumps=@var{n}:@var{m}:@var{n2}:@var{m2}
 @opindex falign-jumps
 Align branch targets to a power-of-two boundary, for branch targets
-where the targets can only be reached by jumping, skipping up to @var{n}
-bytes like @option{-falign-functions}.  In this case, no dummy operations
-need be executed.
+where the targets can only be reached by jumping.  In this case,
+no dummy operations need be executed.
 
+Parameters of this option are analogous to the @option{-falign-functions} option.
 @option{-fno-align-jumps} and @option{-falign-jumps=1} are
 equivalent and mean that loops are not aligned.
 
diff --git a/gcc/final.c b/gcc/final.c
index a17a3a67b54..ea238656d34 100644
--- a/gcc/final.c
+++ b/gcc/final.c
@@ -2528,6 +2528,12 @@ final_scan_insn_1 (rtx_insn *insn, FILE *file, int optimize_p ATTRIBUTE_UNUSED,
 	    {
 #ifdef ASM_OUTPUT_MAX_SKIP_ALIGN
 	      ASM_OUTPUT_MAX_SKIP_ALIGN (file, align, max_skip);
+	      /* Above, we don't know whether a label, jump or loop
+		 alignment was used.  Conservatively apply
+		 label subalignment, not jump or loop
+		 subalignment (they are almost always larger).  */
+	      ASM_OUTPUT_MAX_SKIP_ALIGN (file, state_align_labels.levels[1].log,
+					 state_align_labels.levels[1].maxskip);
 #else
 #ifdef ASM_OUTPUT_ALIGN_WITH_NOP
               ASM_OUTPUT_ALIGN_WITH_NOP (file, align);
diff --git a/gcc/flags.h b/gcc/flags.h
index d5d4d78e18f..bfd645b7f29 100644
--- a/gcc/flags.h
+++ b/gcc/flags.h
@@ -42,19 +42,32 @@ extern bool final_insns_dump_p;
 \f
 /* Other basic status info about current function.  */
 
-/* Target-dependent global state.  */
-struct target_flag_state {
+/* Align flags tuple with alignment in log form and with a maximum skip.  */
+
+struct align_flags_tuple
+{
   /* Values of the -falign-* flags: how much to align labels in code.
-     0 means `use default', 1 means `don't align'.
-     For each variable, there is an _log variant which is the power
-     of two not less than the variable, for .align output.  */
-  int x_align_loops_log;
-  int x_align_loops_max_skip;
-  int x_align_jumps_log;
-  int x_align_jumps_max_skip;
-  int x_align_labels_log;
-  int x_align_labels_max_skip;
-  int x_align_functions_log;
+     log is "align to 2^log" (so 0 means no alignment).
+     maxskip is the maximum allowed amount of padding to insert.  */
+  int log;
+  int maxskip;
+};
+
+/* Target-dependent global state.  */
+
+struct align_flags
+{
+  align_flags_tuple levels[2];
+};
+
+struct target_flag_state
+{
+  /* Each falign-foo can generate up to two levels of alignment:
+     -falign-foo=N:M[:N2:M2] */
+  align_flags x_align_loops;
+  align_flags x_align_jumps;
+  align_flags x_align_labels;
+  align_flags x_align_functions;
 
   /* The excess precision currently in effect.  */
   enum excess_precision x_flag_excess_precision;
@@ -67,20 +80,26 @@ extern struct target_flag_state *this_target_flag_state;
 #define this_target_flag_state (&default_target_flag_state)
 #endif
 
-#define align_loops_log \
-  (this_target_flag_state->x_align_loops_log)
-#define align_loops_max_skip \
-  (this_target_flag_state->x_align_loops_max_skip)
-#define align_jumps_log \
-  (this_target_flag_state->x_align_jumps_log)
-#define align_jumps_max_skip \
-  (this_target_flag_state->x_align_jumps_max_skip)
-#define align_labels_log \
-  (this_target_flag_state->x_align_labels_log)
-#define align_labels_max_skip \
-  (this_target_flag_state->x_align_labels_max_skip)
-#define align_functions_log \
-  (this_target_flag_state->x_align_functions_log)
+#define state_align_loops	 (this_target_flag_state->x_align_loops)
+#define state_align_jumps	 (this_target_flag_state->x_align_jumps)
+#define state_align_labels	 (this_target_flag_state->x_align_labels)
+#define state_align_functions	 (this_target_flag_state->x_align_functions)
+#define align_loops_log		 (state_align_loops.levels[0].log)
+#define align_jumps_log		 (state_align_jumps.levels[0].log)
+#define align_labels_log	 (state_align_labels.levels[0].log)
+#define align_functions_log      (state_align_functions.levels[0].log)
+#define align_loops_max_skip     (state_align_loops.levels[0].maxskip)
+#define align_jumps_max_skip     (state_align_jumps.levels[0].maxskip)
+#define align_labels_max_skip    (state_align_labels.levels[0].maxskip)
+#define align_functions_max_skip (state_align_functions.levels[0].maxskip)
+#define align_loops_value	 (align_loops_max_skip + 1)
+#define align_jumps_value	 (align_jumps_max_skip + 1)
+#define align_labels_value	 (align_labels_max_skip + 1)
+#define align_functions_value	 (align_functions_max_skip + 1)
+
+/* String representations of the above options are available in
+   const char *str_align_foo.  NULL if not set.  */
+
 #define flag_excess_precision \
   (this_target_flag_state->x_flag_excess_precision)
 
diff --git a/gcc/function.c b/gcc/function.c
index 47232a27611..142cdaec2ce 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -4609,6 +4609,9 @@ invoke_set_current_function_hook (tree fndecl)
       targetm.set_current_function (fndecl);
       this_fn_optabs = this_target_optabs;
 
+      /* Initialize global alignment variables after option processing.  */
+      parse_alignment_opts ();
+
       if (opts != optimization_default_node)
 	{
 	  init_tree_optimization_optabs (opts);
diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c
index 90d1e17e5cd..39b96ba13be 100644
--- a/gcc/ipa-icf.c
+++ b/gcc/ipa-icf.c
@@ -658,7 +658,7 @@ sem_function::equals_wpa (sem_item *item,
   cl_optimization *opt1 = opts_for_fn (decl);
   cl_optimization *opt2 = opts_for_fn (item->decl);
 
-  if (opt1 != opt2 && memcmp (opt1, opt2, sizeof(cl_optimization)))
+  if (opt1 != opt2 && !cl_optimization_option_eq (opt1, opt2))
     {
       if (dump_file && (dump_flags & TDF_DETAILS))
 	{
diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
index 236fcc4480b..dd279f6762b 100644
--- a/gcc/lto-streamer.h
+++ b/gcc/lto-streamer.h
@@ -919,9 +919,11 @@ void cl_target_option_stream_in (struct data_in *,
 				 struct bitpack_d *,
 				 struct cl_target_option *);
 
-void cl_optimization_stream_out (struct bitpack_d *, struct cl_optimization *);
+void cl_optimization_stream_out (struct output_block *,
+				 struct bitpack_d *, struct cl_optimization *);
 
-void cl_optimization_stream_in (struct bitpack_d *, struct cl_optimization *);
+void cl_optimization_stream_in (struct data_in *,
+				struct bitpack_d *, struct cl_optimization *);
 
 
 
diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index 3b57c12bc9f..6f10dab27e5 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -1222,8 +1222,8 @@ compare_tree_sccs_1 (tree t1, tree t2, tree **map)
       return false;
 
   if (CODE_CONTAINS_STRUCT (code, TS_OPTIMIZATION))
-    if (memcmp (TREE_OPTIMIZATION (t1), TREE_OPTIMIZATION (t2),
-		sizeof (struct cl_optimization)) != 0)
+    if (!cl_optimization_option_eq (TREE_OPTIMIZATION (t1),
+				    TREE_OPTIMIZATION (t2)))
       return false;
 
   if (CODE_CONTAINS_STRUCT (code, TS_BINFO))
diff --git a/gcc/optc-save-gen.awk b/gcc/optc-save-gen.awk
index 1a365fc883c..6e33a4320c1 100644
--- a/gcc/optc-save-gen.awk
+++ b/gcc/optc-save-gen.awk
@@ -85,6 +85,7 @@ n_opt_char = 3;
 n_opt_short = 0;
 n_opt_int = 0;
 n_opt_enum = 0;
+n_opt_string = 0;
 n_opt_other = 0;
 var_opt_char[0] = "optimize";
 var_opt_char[1] = "optimize_size";
@@ -123,6 +124,8 @@ for (i = 0; i < n_opts; i++) {
 			else if (otype ~ "^signed +char *$")
 				var_opt_range[name] = "-128, 127"
 		}
+		else if (otype ~ "^const char \\**$")
+			var_opt_string[n_opt_string++] = name;
 		else
 			var_opt_other[n_opt_other++] = name;
 	}
@@ -155,6 +158,10 @@ for (i = 0; i < n_opt_char; i++) {
 	print "  ptr->x_" var_opt_char[i] " = opts->x_" var_opt_char[i] ";";
 }
 
+for (i = 0; i < n_opt_string; i++) {
+	print "  ptr->x_" var_opt_string[i] " = opts->x_" var_opt_string[i] ";";
+}
+
 print "}";
 
 print "";
@@ -183,6 +190,10 @@ for (i = 0; i < n_opt_char; i++) {
 	print "  opts->x_" var_opt_char[i] " = ptr->x_" var_opt_char[i] ";";
 }
 
+for (i = 0; i < n_opt_string; i++) {
+	print "  opts->x_" var_opt_string[i] " = ptr->x_" var_opt_string[i] ";";
+}
+
 print "  targetm.override_options_after_change ();";
 print "}";
 
@@ -239,6 +250,15 @@ for (i = 0; i < n_opt_char; i++) {
 	print "";
 }
 
+for (i = 0; i < n_opt_string; i++) {
+	print "  if (ptr->x_" var_opt_string[i] ")";
+	print "    fprintf (file, \"%*s%s (%s)\\n\",";
+	print "             indent_to, \"\",";
+	print "             \"" var_opt_string[i] "\",";
+	print "             ptr->x_" var_opt_string[i] ");";
+	print "";
+}
+
 print "}";
 
 print "";
@@ -301,6 +321,19 @@ for (i = 0; i < n_opt_char; i++) {
 	print "";
 }
 
+for (i = 0; i < n_opt_string; i++) {
+	name = var_opt_string[i]
+	print "  if (ptr1->x_" name " != ptr2->x_" name "";
+	print "      && (!ptr1->x_" name" || !ptr2->x_" name
+	print "          || strcmp (ptr1->x_" name", ptr2->x_" name ")))";
+	print "    fprintf (file, \"%*s%s (%s/%s)\\n\",";
+	print "             indent_to, \"\",";
+	print "             \"" name "\",";
+	print "             ptr1->x_" name ",";
+	print "             ptr2->x_" name ");";
+	print "";
+}
+
 print "}";
 
 
@@ -766,32 +799,82 @@ for (i = 0; i < n_opt_val; i++) {
 	if (!var_opt_hash[i])
 		continue;
 	name = var_opt_val[i]
-	print "  hstate.add_hwi (ptr->" name");";
+	otype = var_opt_val_type[i];
+	if (otype ~ "^const char \\**$")
+	{
+		print "  if (ptr->" name")";
+		print "    hstate.add (ptr->" name", strlen (ptr->" name"));";
+		print "  else";
+		print "    hstate.add_int (0);";
+	}
+	else
+		print "  hstate.add_hwi (ptr->" name");";
 }
 print "  return hstate.end ();";
 print "}";
 
+print "";
+print "/* Compare two optimization options  */";
+print "bool";
+print "cl_optimization_option_eq (cl_optimization const *ptr1,";
+print "                           cl_optimization const *ptr2)";
+print "{";
+for (i = 0; i < n_opt_val; i++) {
+	if (!var_opt_hash[i])
+		continue;
+	name = var_opt_val[i]
+	otype = var_opt_val_type[i];
+	if (otype ~ "^const char \\**$")
+	{
+		print "  if (ptr1->" name" != ptr2->" name;
+		print "      && (!ptr1->" name" || !ptr2->" name
+		print "          || strcmp (ptr1->" name", ptr2->" name ")))";
+		print "    return false;";
+	}
+	else
+	{
+		print "  if (ptr1->" name" != ptr2->" name ")";
+		print "    return false;";
+	}
+}
+print "  return true;";
+print "}";
+
 print "";
 print "/* Stream out optimization options  */";
 print "void";
-print "cl_optimization_stream_out (struct bitpack_d *bp,";
+print "cl_optimization_stream_out (struct output_block *ob,";
+print "                            struct bitpack_d *bp,";
 print "                            struct cl_optimization *ptr)";
 print "{";
 for (i = 0; i < n_opt_val; i++) {
 	name = var_opt_val[i]
-	print "  bp_pack_value (bp, ptr->" name", 64);";
+	otype = var_opt_val_type[i];
+	if (otype ~ "^const char \\**$")
+		print "  bp_pack_string (ob, bp, ptr->" name", true);";
+	else
+		print "  bp_pack_value (bp, ptr->" name", 64);";
 }
 print "}";
 
 print "";
 print "/* Stream in optimization options  */";
 print "void";
-print "cl_optimization_stream_in (struct bitpack_d *bp,";
-print "                           struct cl_optimization *ptr)";
+print "cl_optimization_stream_in (struct data_in *data_in ATTRIBUTE_UNUSED,";
+print "                           struct bitpack_d *bp ATTRIBUTE_UNUSED,";
+print "                           struct cl_optimization *ptr ATTRIBUTE_UNUSED)";
 print "{";
 for (i = 0; i < n_opt_val; i++) {
 	name = var_opt_val[i]
-	print "  ptr->" name" = (" var_opt_val_type[i] ") bp_unpack_value (bp, 64);";
+	otype = var_opt_val_type[i];
+	if (otype ~ "^const char \\**$")
+	{
+	      print "  ptr->" name" = bp_unpack_string (data_in, bp);";
+	      print "  if (ptr->" name")";
+	      print "    ptr->" name" = xstrdup (ptr->" name");";
+	}
+	else
+	      print "  ptr->" name" = (" var_opt_val_type[i] ") bp_unpack_value (bp, 64);";
 }
 print "}";
 }
diff --git a/gcc/opth-gen.awk b/gcc/opth-gen.awk
index fecd4b8a0b5..8358b9b2b67 100644
--- a/gcc/opth-gen.awk
+++ b/gcc/opth-gen.awk
@@ -308,6 +308,9 @@ print "";
 print "/* Hash optimization from a structure.  */";
 print "extern hashval_t cl_optimization_hash (const struct cl_optimization *);";
 print "";
+print "/* Compare two optimization options.  */";
+print "extern bool cl_optimization_option_eq (cl_optimization const *ptr1, cl_optimization const *ptr2);"
+print "";
 print "/* Generator files may not have access to location_t, and don't need these.  */"
 print "#if defined(UNKNOWN_LOCATION)"
 print "bool                                                                  "
diff --git a/gcc/opts.c b/gcc/opts.c
index ed102c05c22..e536607fe79 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -1039,26 +1039,6 @@ finish_options (struct gcc_options *opts, struct gcc_options *opts_set,
   if ((opts->x_flag_sanitize & SANITIZE_KERNEL_ADDRESS) && opts->x_flag_tm)
     sorry ("transactional memory is not supported with "
 	   "%<-fsanitize=kernel-address%>");
-
-  /* Comes from final.c -- no real reason to change it.  */
-#define MAX_CODE_ALIGN 16
-#define MAX_CODE_ALIGN_VALUE (1 << MAX_CODE_ALIGN)
-
-  if (opts->x_align_loops > MAX_CODE_ALIGN_VALUE)
-    error_at (loc, "-falign-loops=%d is not between 0 and %d",
-	      opts->x_align_loops, MAX_CODE_ALIGN_VALUE);
-
-  if (opts->x_align_jumps > MAX_CODE_ALIGN_VALUE)
-    error_at (loc, "-falign-jumps=%d is not between 0 and %d",
-	      opts->x_align_jumps, MAX_CODE_ALIGN_VALUE);
-
-  if (opts->x_align_functions > MAX_CODE_ALIGN_VALUE)
-    error_at (loc, "-falign-functions=%d is not between 0 and %d",
-	      opts->x_align_functions, MAX_CODE_ALIGN_VALUE);
-
-  if (opts->x_align_labels > MAX_CODE_ALIGN_VALUE)
-    error_at (loc, "-falign-labels=%d is not between 0 and %d",
-	      opts->x_align_labels, MAX_CODE_ALIGN_VALUE);
 }
 
 #define LEFT_COLUMN	27
@@ -1779,6 +1759,78 @@ parse_no_sanitize_attribute (char *value)
   return flags;
 }
 
+/* Parse the argument FLAG of an -falign-NAME option.  Store the parsed
+   integer values in RESULT_VALUES.  If REPORT_ERROR is set, report
+   errors at location LOC.  Return true if FLAG is valid.  */
+
+bool
+parse_and_check_align_values (const char *flag,
+			      const char *name,
+			      auto_vec<unsigned> &result_values,
+			      bool report_error,
+			      location_t loc)
+{
+  char *str = xstrdup (flag);
+  for (char *p = strtok (str, ":"); p; p = strtok (NULL, ":"))
+    {
+      char *end;
+      int v = strtol (p, &end, 10);
+      if (*end != '\0' || v < 0)
+	{
+	  if (report_error)
+	    error_at (loc, "invalid arguments for %<-falign-%s%> option: %qs",
+		      name, flag);
+	  free (str);
+	  return false;
+	}
+
+      result_values.safe_push ((unsigned)v);
+    }
+
+  free (str);
+
+  /* Check that we have a correct number of values.  */
+#ifdef SUBALIGN_LOG
+  unsigned max_valid_values = 4;
+#else
+  unsigned max_valid_values = 2;
+#endif
+
+  if (result_values.is_empty ()
+      || result_values.length () > max_valid_values)
+    {
+      if (report_error)
+	error_at (loc, "invalid number of arguments for %<-falign-%s%> "
+		  "option: %qs", name, flag);
+      return false;
+    }
+
+  /* Comes from final.c -- no real reason to change it.  */
+#define MAX_CODE_ALIGN 16
+#define MAX_CODE_ALIGN_VALUE (1 << MAX_CODE_ALIGN)
+
+  for (unsigned i = 0; i < result_values.length (); i++)
+    if (result_values[i] > MAX_CODE_ALIGN_VALUE)
+      {
+	if (report_error)
+	  error_at (loc, "%<-falign-%s%> is not between 0 and %d",
+		    name, MAX_CODE_ALIGN_VALUE);
+	return false;
+      }
+
+  return true;
+}
+
+/* Check that alignment value FLAG for -falign-NAME is valid at a given
+   location LOC.  */
+
+static void
+check_alignment_argument (location_t loc, const char *flag, const char *name)
+{
+  auto_vec<unsigned> align_result;
+  parse_and_check_align_values (flag, name, align_result, true, loc);
+}
+
 /* Handle target- and language-independent options.  Return zero to
    generate an "unknown option" message.  Only options that need
    extra handling need to be listed here; if you simply want
@@ -2501,6 +2553,22 @@ common_handle_option (struct gcc_options *opts,
       opts->x_flag_ipa_icf_variables = value;
       break;
 
+    case OPT_falign_loops_:
+      check_alignment_argument (loc, arg, "loops");
+      break;
+
+    case OPT_falign_jumps_:
+      check_alignment_argument (loc, arg, "jumps");
+      break;
+
+    case OPT_falign_labels_:
+      check_alignment_argument (loc, arg, "labels");
+      break;
+
+    case OPT_falign_functions_:
+      check_alignment_argument (loc, arg, "functions");
+      break;
+
     default:
       /* If the flag was handled in a standard way, assume the lack of
 	 processing here is intentional.  */
diff --git a/gcc/opts.h b/gcc/opts.h
index 3c4065eae92..3723bdbf95b 100644
--- a/gcc/opts.h
+++ b/gcc/opts.h
@@ -442,4 +442,11 @@ extern const char *candidates_list_and_hint (const char *arg, char *&str,
 					     const auto_vec <const char *> &
 					     candidates);
 
+
+extern bool parse_and_check_align_values (const char *flag,
+					  const char *name,
+					  auto_vec<unsigned> &result_values,
+					  bool report_error,
+					  location_t loc);
+
 #endif
diff --git a/gcc/testsuite/gcc.dg/pr84100.c b/gcc/testsuite/gcc.dg/pr84100.c
index 86fbc4f7a3e..676d0c78dea 100644
--- a/gcc/testsuite/gcc.dg/pr84100.c
+++ b/gcc/testsuite/gcc.dg/pr84100.c
@@ -8,7 +8,7 @@ __attribute__((optimize ("align-loops=16", "align-jumps=16",
 			 "align-labels=16", "align-functions=16")))
 void
 foo (void)
-{			/* { dg-bogus "bad option" } */
+{			/* { dg-warning "bad option" } */
   for (int i = 0; i < 1024; ++i)
     bar ();
 }
diff --git a/gcc/testsuite/gcc.target/i386/falign-functions-2.c b/gcc/testsuite/gcc.target/i386/falign-functions-2.c
new file mode 100644
index 00000000000..26d505e3bea
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/falign-functions-2.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -falign-functions=64:8" } */
+
+void
+a (void)
+{
+}
+
+#pragma GCC push_options
+#pragma GCC optimize "align-functions=128:100"
+void b (void)
+{
+}
+#pragma GCC pop_options
+
+void
+__attribute__((optimize("-falign-functions=88:88:32")))
+c (void)
+{
+}
+
+void
+d (void)
+{
+}
+
+/* { dg-final { scan-assembler-times ".p2align 6,,7" 2 } } */
+/* { dg-final { scan-assembler-times ".p2align 7,,99" 1 } } */
+/* { dg-final { scan-assembler-times ".p2align 7,,87" 1 } } */
+/* { dg-final { scan-assembler-times ".p2align 5" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/falign-functions.c b/gcc/testsuite/gcc.target/i386/falign-functions.c
new file mode 100644
index 00000000000..27daa1d0e6b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/falign-functions.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -falign-functions=64:8" } */
+/* { dg-final { scan-assembler ".p2align 6,,7" } } */
+
+void
+test_func (void)
+{
+}
diff --git a/gcc/toplev.c b/gcc/toplev.c
index d1080968833..cf7bab655bd 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -1197,29 +1197,120 @@ target_supports_section_anchors_p (void)
   return true;
 }
 
-/* Default the align_* variables to 1 if they're still unset, and
-   set up the align_*_log variables.  */
+/* Read one "N[:M]" pair into the alignment tuple A.
+   VALUES contains parsed values (in reverse order); all processed
+   values are popped.  */
+
 static void
-init_alignments (void)
+read_log_maxskip (auto_vec<unsigned> &values, align_flags_tuple *a)
 {
-  if (align_loops <= 0)
-    align_loops = 1;
-  if (align_loops_max_skip > align_loops)
-    align_loops_max_skip = align_loops - 1;
-  align_loops_log = floor_log2 (align_loops * 2 - 1);
-  if (align_jumps <= 0)
-    align_jumps = 1;
-  if (align_jumps_max_skip > align_jumps)
-    align_jumps_max_skip = align_jumps - 1;
-  align_jumps_log = floor_log2 (align_jumps * 2 - 1);
-  if (align_labels <= 0)
-    align_labels = 1;
-  align_labels_log = floor_log2 (align_labels * 2 - 1);
-  if (align_labels_max_skip > align_labels)
-    align_labels_max_skip = align_labels - 1;
-  if (align_functions <= 0)
-    align_functions = 1;
-  align_functions_log = floor_log2 (align_functions * 2 - 1);
+  unsigned n = values.pop ();
+  if (n != 0)
+    a->log = floor_log2 (n * 2 - 1);
+  if (values.is_empty ())
+    a->maxskip = n ? n - 1 : 0;
+  else
+    {
+      unsigned m = values.pop ();
+      if (m > n)
+	m = n;
+      /* -falign-foo=N:M means M-1 max bytes of padding, not M.  */
+      if (m > 0)
+	m--;
+      a->maxskip = m;
+    }
+}
+
+/* Parse "N[:M[:N2[:M2]]]" string FLAG into both levels of A.  */
+
+static void
+parse_N_M (const char *flag, align_flags &a, unsigned int min_align_log)
+{
+  if (flag)
+    {
+      static hash_map <nofree_string_hash, align_flags> cache;
+      align_flags *entry = cache.get (flag);
+      if (entry)
+	{
+	  a = *entry;
+	  return;
+	}
+
+      auto_vec<unsigned> result_values;
+      bool r = parse_and_check_align_values (flag, NULL, result_values, false,
+					     UNKNOWN_LOCATION);
+      if (!r)
+	return;
+
+      /* Reverse values for easier manipulation.  */
+      result_values.reverse ();
+
+      read_log_maxskip (result_values, &a.levels[0]);
+      if (!result_values.is_empty ())
+	read_log_maxskip (result_values, &a.levels[1]);
+#ifdef SUBALIGN_LOG
+      else
+	{
+	  /* N2[:M2] is not specified.  This arch has a default for N2.
+	     Before -falign-foo=N:M:N2:M2 was introduced, x86 had a tweak.
+	     -falign-functions=N with N > 8 was adding secondary alignment.
+	     -falign-functions=10 was emitting this before every function:
+			.p2align 4,,9
+			.p2align 3
+	     Now this behavior (and more) can be explicitly requested:
+	     -falign-functions=16:10:8
+	     Retain old behavior if N2 is missing: */
+
+	  int align = 1 << a.levels[0].log;
+	  int subalign = 1 << SUBALIGN_LOG;
+
+	  if (a.levels[0].log > SUBALIGN_LOG
+	      && a.levels[0].maxskip >= subalign - 1)
+	    {
+	      /* Set N2 unless subalign can never have any effect.  */
+	      if (align > a.levels[0].maxskip + 1)
+		a.levels[1].log = SUBALIGN_LOG;
+	    }
+	}
+#endif
+
+      /* Cache seen value.  */
+      cache.put (flag, a);
+    }
+  else
+    {
+      /* Reset values to zero.  */
+      for (unsigned i = 0; i < 2; i++)
+	{
+	  a.levels[i].log = 0;
+	  a.levels[i].maxskip = 0;
+	}
+    }
+
+  if ((unsigned int)a.levels[0].log < min_align_log)
+    {
+      a.levels[0].log = min_align_log;
+      a.levels[0].maxskip = (1 << min_align_log) - 1;
+    }
+}
+
+/* Minimum alignment requirements, if arch has them.  */
+
+unsigned int min_align_loops_log = 0;
+unsigned int min_align_jumps_log = 0;
+unsigned int min_align_labels_log = 0;
+unsigned int min_align_functions_log = 0;
+
+/* Process -falign-foo=N[:M[:N2[:M2]]] options.  */
+
+void
+parse_alignment_opts (void)
+{
+  parse_N_M (str_align_loops, state_align_loops, min_align_loops_log);
+  parse_N_M (str_align_jumps, state_align_jumps, min_align_jumps_log);
+  parse_N_M (str_align_labels, state_align_labels, min_align_labels_log);
+  parse_N_M (str_align_functions, state_align_functions,
+	     min_align_functions_log);
 }
 
 /* Process the options that have been parsed.  */
@@ -1722,9 +1813,6 @@ process_options (void)
 static void
 backend_init_target (void)
 {
-  /* Initialize alignment variables.  */
-  init_alignments ();
-
   /* This depends on stack_pointer_rtx.  */
   init_fake_stack_mems ();
 
diff --git a/gcc/toplev.h b/gcc/toplev.h
index c97375b1ca1..98f3ceea872 100644
--- a/gcc/toplev.h
+++ b/gcc/toplev.h
@@ -93,6 +93,13 @@ extern bool set_src_pwd		       (const char *);
 extern HOST_WIDE_INT get_random_seed (bool);
 extern void set_random_seed (const char *);
 
+extern unsigned int min_align_loops_log;
+extern unsigned int min_align_jumps_log;
+extern unsigned int min_align_labels_log;
+extern unsigned int min_align_functions_log;
+
+extern void parse_alignment_opts (void);
+
 extern void initialize_rtl (void);
 
 #endif /* ! GCC_TOPLEV_H */
diff --git a/gcc/tree-streamer-in.c b/gcc/tree-streamer-in.c
index da3a7efbe5e..4bb420cb5e8 100644
--- a/gcc/tree-streamer-in.c
+++ b/gcc/tree-streamer-in.c
@@ -530,7 +530,7 @@ streamer_read_tree_bitfields (struct lto_input_block *ib,
     unpack_ts_translation_unit_decl_value_fields (data_in, &bp, expr);
 
   if (CODE_CONTAINS_STRUCT (code, TS_OPTIMIZATION))
-    cl_optimization_stream_in (&bp, TREE_OPTIMIZATION (expr));
+    cl_optimization_stream_in (data_in, &bp, TREE_OPTIMIZATION (expr));
 
   if (CODE_CONTAINS_STRUCT (code, TS_CONSTRUCTOR))
     {
diff --git a/gcc/tree-streamer-out.c b/gcc/tree-streamer-out.c
index 59db3b906af..8b20f0a74e0 100644
--- a/gcc/tree-streamer-out.c
+++ b/gcc/tree-streamer-out.c
@@ -466,7 +466,7 @@ streamer_write_tree_bitfields (struct output_block *ob, tree expr)
     pack_ts_translation_unit_decl_value_fields (ob, &bp, expr);
 
   if (CODE_CONTAINS_STRUCT (code, TS_OPTIMIZATION))
-    cl_optimization_stream_out (&bp, TREE_OPTIMIZATION (expr));
+    cl_optimization_stream_out (ob, &bp, TREE_OPTIMIZATION (expr));
 
   if (CODE_CONTAINS_STRUCT (code, TS_CONSTRUCTOR))
     bp_pack_var_len_unsigned (&bp, CONSTRUCTOR_NELTS (expr));
diff --git a/gcc/tree.c b/gcc/tree.c
index 8fc206d0abb..afd41d42dd7 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -11972,30 +11972,18 @@ cl_option_hasher::equal (tree x, tree y)
 {
   const_tree const xt = x;
   const_tree const yt = y;
-  const char *xp;
-  const char *yp;
-  size_t len;
 
   if (TREE_CODE (xt) != TREE_CODE (yt))
     return 0;
 
   if (TREE_CODE (xt) == OPTIMIZATION_NODE)
-    {
-      xp = (const char *)TREE_OPTIMIZATION (xt);
-      yp = (const char *)TREE_OPTIMIZATION (yt);
-      len = sizeof (struct cl_optimization);
-    }
-
+    return cl_optimization_option_eq (TREE_OPTIMIZATION (xt),
+				      TREE_OPTIMIZATION (yt));
   else if (TREE_CODE (xt) == TARGET_OPTION_NODE)
-    {
-      return cl_target_option_eq (TREE_TARGET_OPTION (xt),
-				  TREE_TARGET_OPTION (yt));
-    }
-
+    return cl_target_option_eq (TREE_TARGET_OPTION (xt),
+				TREE_TARGET_OPTION (yt));
   else
     gcc_unreachable ();
-
-  return (memcmp (xp, yp, len) == 0);
 }
 
 /* Build an OPTIMIZATION_NODE based on the options in OPTS.  */
diff --git a/gcc/varasm.c b/gcc/varasm.c
index 5769bc6d63e..81f460643ea 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -1805,17 +1805,21 @@ assemble_start_function (tree decl, const char *fnname)
       && optimize_function_for_speed_p (cfun))
     {
 #ifdef ASM_OUTPUT_MAX_SKIP_ALIGN
-      int align_log = align_functions_log;
+      int align_log = state_align_functions.levels[0].log;
 #endif
-      int max_skip = align_functions - 1;
+      int max_skip = state_align_functions.levels[0].maxskip;
       if (flag_limit_function_alignment && crtl->max_insn_address > 0
 	  && max_skip >= crtl->max_insn_address)
 	max_skip = crtl->max_insn_address - 1;
 
 #ifdef ASM_OUTPUT_MAX_SKIP_ALIGN
       ASM_OUTPUT_MAX_SKIP_ALIGN (asm_out_file, align_log, max_skip);
+      if (max_skip == state_align_functions.levels[0].maxskip)
+	ASM_OUTPUT_MAX_SKIP_ALIGN (asm_out_file,
+				   state_align_functions.levels[1].log,
+				   state_align_functions.levels[1].maxskip);
 #else
-      ASM_OUTPUT_ALIGN (asm_out_file, align_functions_log);
+      ASM_OUTPUT_ALIGN (asm_out_file, state_align_functions.levels[0].log);
 #endif
     }
 
-- 
2.18.0
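
The varasm.c hunk above emits the primary function alignment and then, only if the max skip was not reduced by the -flimit-function-alignment clamp, a second and weaker alignment. A minimal sketch of that control flow follows. The struct, the function name, and the literal ".p2align LOG,,MAXSKIP" output format are illustrative assumptions and are not part of the patch; the real code goes through ASM_OUTPUT_MAX_SKIP_ALIGN.

```c
#include <stdio.h>

/* One alignment level: align to 1 << log, skipping at most maxskip bytes.
   Hypothetical stand-in for an entry of state_align_functions.levels[].  */
struct align_level_sketch
{
  int log;
  int maxskip;
};

/* Write the alignment directives for a function start into BUF and return
   how many directives were written (0 means BUF was too small).  */
static int
format_function_alignment (char *buf, size_t len,
			   const struct align_level_sketch levels[2],
			   int limit_function_alignment,
			   int max_insn_address)
{
  int max_skip = levels[0].maxskip;

  /* Same clamp as in the hunk: never skip more bytes than the function
     itself occupies.  */
  if (limit_function_alignment && max_insn_address > 0
      && max_skip >= max_insn_address)
    max_skip = max_insn_address - 1;

  int n = snprintf (buf, len, "\t.p2align %d,,%d\n", levels[0].log, max_skip);
  if (n < 0 || (size_t) n >= len)
    return 0;

  /* The secondary alignment is emitted only when the clamp did not fire.  */
  if (max_skip != levels[0].maxskip)
    return 1;

  int m = snprintf (buf + n, len - (size_t) n, "\t.p2align %d,,%d\n",
		    levels[1].log, levels[1].maxskip);
  if (m < 0 || (size_t) m >= len - (size_t) n)
    {
      buf[n] = '\0';	/* Drop the truncated second directive.  */
      return 1;
    }
  return 2;
}
```

With levels { 6, 6 } and { 3, 7 } and no clamp, the sketch writes both directives; with the clamp active and a 4-byte function, it writes only the first one with a max skip of 3.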


* Re: [PATCH 3/3] Extend -falign-FOO=N to N[:M[:N2[:M2]]]
  2018-07-03 12:51             ` Martin Liška
@ 2018-07-03 13:23               ` Segher Boessenkool
  0 siblings, 0 replies; 26+ messages in thread
From: Segher Boessenkool @ 2018-07-03 13:23 UTC (permalink / raw)
  To: Martin Liška; +Cc: Jeff Law, gcc-patches, dvlasenk

On Tue, Jul 03, 2018 at 02:51:27PM +0200, Martin Liška wrote:
> On 07/03/2018 12:58 PM, Segher Boessenkool wrote:
> > On Tue, Jul 03, 2018 at 12:15:48PM +0200, Martin Liška wrote:
> >>> toplev.c already has (in init_alignments):
> >>>
> >>>   if (align_jumps_max_skip > align_jumps)
> >>>     align_jumps_max_skip = align_jumps - 1;
> >>
> >> I'm rewriting this logic in the patch set. Issue is that 
> >> checking for value of align_jumps_max_skip is done
> >> in rs6000_option_override_internal, which is place before
> >> align_jumps_max_skip is parsed.
> >>
> >> That said, 'align_jumps_max_skip <= 0' is always true.
> > 
> > It's not clear to me what you want me to do.
> > 
> > You should write your patch so that the end result behaves the same as
> > before, on all targets.  If that requires changing (or at least checking)
> > all targets, then you have a lot of work to do.
> > 
> > If you think the rs6000 backend is doing something wrong, please say
> > what exactly?  I don't see it.
> 
> Uf, it's quite complicated I would say.
> So first I believe for all -falign-{labels,loops,jumps} we don't handle properly
> value of the argument. More precisely for a value of N (not power of 2),
> we don't respect max_skip and we generate alignment to M, where M is first bigger
> power of 2 number. Example:
> 
> $ gcc /home/marxin/Programming/gcc/gcc/testsuite/gcc.dg/params/blocksort-part.c -O2 -falign-labels=1025 -c -S  -o /dev/stdout | grep align | sort | uniq -c
>       1 	.align 32
>     132 	.p2align 11
>       7 	.p2align 4,,15
> 
> 2^11 == 2048, but I would expect '.p2align 11,,1024' to be generated. That's what you get for function alignment:
> 
> $ gcc /home/marxin/Programming/gcc/gcc/testsuite/gcc.dg/params/blocksort-part.c -O2 -falign-functions=1025 -c -S  -o /dev/stdout | grep align | sort | uniq -c
>       1 	.align 32
>       7 	.p2align 11,,1024
>      55 	.p2align 3
>      48 	.p2align 4,,10
> 
> Do I understand that correctly that it's broken?

Yes, this behaviour contradicts our documentation:

'-falign-labels=N'
     Align all branch targets to a power-of-two boundary, skipping up to
     N bytes like '-falign-functions'.

'-falign-functions=N'
     Align the start of functions to the next power-of-two greater than
     N, skipping up to N bytes.  For instance, '-falign-functions=32'
     aligns functions to the next 32-byte boundary, but
     '-falign-functions=24' aligns to the next 32-byte boundary only if
     this can be done by skipping 23 bytes or less.

> On powerpc, because align_jumps_max_skip is set to 15, then we see inconsistency like:
> 
> ./xgcc -B. /home/marxin/Programming/gcc/gcc/testsuite/gcc.dg/params/blocksort-part.c -O2 -falign-jumps=14 -c -S  -o /dev/stdout | grep align | sort | uniq -c
> ...
>      27 	.p2align 4,,13
> ...
> 
> which is correct.
> 
> but:
> 
> ./xgcc -B. /home/marxin/Programming/gcc/gcc/testsuite/gcc.dg/params/blocksort-part.c -O2 -falign-jumps=1025 -c -S  -o /dev/stdout | grep align | sort | uniq -c
> ...
>      27 	.p2align 11,,15
> ...
> 
> Here 11,,15 is completely broken value.

Yup.

This is specific to align-jumps...  Not many people ever change that :-)


Segher

* Re: [PATCH 3/3] Extend -falign-FOO=N to N[:M[:N2[:M2]]]
  2018-07-03 10:58           ` Segher Boessenkool
@ 2018-07-03 12:51             ` Martin Liška
  2018-07-03 13:23               ` Segher Boessenkool
  0 siblings, 1 reply; 26+ messages in thread
From: Martin Liška @ 2018-07-03 12:51 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Jeff Law, gcc-patches, dvlasenk

On 07/03/2018 12:58 PM, Segher Boessenkool wrote:
> On Tue, Jul 03, 2018 at 12:15:48PM +0200, Martin Liška wrote:
>>> toplev.c already has (in init_alignments):
>>>
>>>   if (align_jumps_max_skip > align_jumps)
>>>     align_jumps_max_skip = align_jumps - 1;
>>
>> I'm rewriting this logic in the patch set. Issue is that 
>> checking for value of align_jumps_max_skip is done
>> in rs6000_option_override_internal, which is place before
>> align_jumps_max_skip is parsed.
>>
>> That said, 'align_jumps_max_skip <= 0' is always true.
> 
> It's not clear to me what you want me to do.
> 
> You should write your patch so that the end result behaves the same as
> before, on all targets.  If that requires changing (or at least checking)
> all targets, then you have a lot of work to do.
> 
> If you think the rs6000 backend is doing something wrong, please say
> what exactly?  I don't see it.

Uf, it's quite complicated, I would say.
First, I believe that for all of -falign-{labels,loops,jumps} we don't handle the value
of the argument properly. More precisely, for a value N that is not a power of 2,
we don't respect max_skip and we generate an alignment to M, where M is the next
power of 2 above N. Example:

$ gcc /home/marxin/Programming/gcc/gcc/testsuite/gcc.dg/params/blocksort-part.c -O2 -falign-labels=1025 -c -S  -o /dev/stdout | grep align | sort | uniq -c
      1 	.align 32
    132 	.p2align 11
      7 	.p2align 4,,15

2^11 == 2048, but I would expect '.p2align 11,,1024' to be generated. That's what you get for function alignment:

$ gcc /home/marxin/Programming/gcc/gcc/testsuite/gcc.dg/params/blocksort-part.c -O2 -falign-functions=1025 -c -S  -o /dev/stdout | grep align | sort | uniq -c
      1 	.align 32
      7 	.p2align 11,,1024
     55 	.p2align 3
     48 	.p2align 4,,10

Do I understand correctly that it's broken?
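
To spell out the expectation: for -falign-FOO=N the alignment is the smallest power of two that is at least N, and the max skip is N - 1. A minimal sketch of that arithmetic follows. The helper names are hypothetical and this is not the GCC implementation; it only reproduces the directive I would expect for a given N.

```c
#include <stdio.h>

/* Smallest LOG with (1 << LOG) >= N, capped at 31.
   Hypothetical helper, not a GCC function.  */
static int
ceil_log2_sketch (unsigned n)
{
  int log = 0;
  while (log < 31 && (1u << log) < n)
    log++;
  return log;
}

/* Format the directive expected for -falign-FOO=N into BUF.
   N == 0 means "no alignment requested" and yields an empty string.  */
static void
expected_p2align (unsigned n, char *buf, size_t len)
{
  if (n == 0)
    {
      if (len > 0)
	buf[0] = '\0';
      return;
    }
  /* Alignment is 1 << log; at most N - 1 padding bytes may be skipped.  */
  snprintf (buf, len, ".p2align %d,,%u", ceil_log2_sketch (n), n - 1);
}
```

For N = 1025 this gives '.p2align 11,,1024', which is what -falign-functions=1025 produces above, and for N = 14 it gives '.p2align 4,,13'.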

On powerpc, because align_jumps_max_skip is set to 15, we then see an inconsistency like:

./xgcc -B. /home/marxin/Programming/gcc/gcc/testsuite/gcc.dg/params/blocksort-part.c -O2 -falign-jumps=14 -c -S  -o /dev/stdout | grep align | sort | uniq -c
...
     27 	.p2align 4,,13
...

which is correct.

but:

./xgcc -B. /home/marxin/Programming/gcc/gcc/testsuite/gcc.dg/params/blocksort-part.c -O2 -falign-jumps=1025 -c -S  -o /dev/stdout | grep align | sort | uniq -c
...
     27 	.p2align 11,,15
...

Here 11,,15 is a completely broken value.

Martin


> 
> Still confused,
> 
> 
> Segher
> 

* Re: [PATCH 3/3] Extend -falign-FOO=N to N[:M[:N2[:M2]]]
  2018-07-03 10:16         ` Martin Liška
@ 2018-07-03 10:58           ` Segher Boessenkool
  2018-07-03 12:51             ` Martin Liška
  0 siblings, 1 reply; 26+ messages in thread
From: Segher Boessenkool @ 2018-07-03 10:58 UTC (permalink / raw)
  To: Martin Liška; +Cc: Jeff Law, gcc-patches, dvlasenk

On Tue, Jul 03, 2018 at 12:15:48PM +0200, Martin Liška wrote:
> > toplev.c already has (in init_alignments):
> > 
> >   if (align_jumps_max_skip > align_jumps)
> >     align_jumps_max_skip = align_jumps - 1;
> 
> I'm rewriting this logic in the patch set. Issue is that 
> checking for value of align_jumps_max_skip is done
> in rs6000_option_override_internal, which is place before
> align_jumps_max_skip is parsed.
> 
> That said, 'align_jumps_max_skip <= 0' is always true.

It's not clear to me what you want me to do.

You should write your patch so that the end result behaves the same as
before, on all targets.  If that requires changing (or at least checking)
all targets, then you have a lot of work to do.

If you think the rs6000 backend is doing something wrong, please say
what exactly?  I don't see it.

Still confused,


Segher

* Re: [PATCH 3/3] Extend -falign-FOO=N to N[:M[:N2[:M2]]]
  2018-07-03  9:55       ` Segher Boessenkool
@ 2018-07-03 10:16         ` Martin Liška
  2018-07-03 10:58           ` Segher Boessenkool
  0 siblings, 1 reply; 26+ messages in thread
From: Martin Liška @ 2018-07-03 10:16 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Jeff Law, gcc-patches, dvlasenk

On 07/03/2018 11:55 AM, Segher Boessenkool wrote:
> On Tue, Jul 03, 2018 at 10:53:20AM +0200, Martin Liška wrote:
>> On 06/29/2018 09:04 PM, Jeff Law wrote:
>>> I think this is fine for the trunk.
>>>
>>> jeff
>>
>> Thank you Jeff.
>>
>> I found some issues when doing build of all targets (contrib/config-list.mk).
>> I'll update patch and test that affected cross-compilers still produce same output.
>>
>> However I noticed one ppc64 issue:
>>
>> $ cat -n gcc/config/powerpcspe/powerpcspe.c
>>
>>   5401        /* Set branch target alignment, if not optimizing for size.  */
>>   5402        if (!optimize_size)
>>   5403          {
>>   5404            /* Cell wants to be aligned 8byte for dual issue.  Titan wants to be
>>   5405               aligned 8byte to avoid misprediction by the branch predictor.  */
>>   5406            if (rs6000_cpu == PROCESSOR_TITAN
>>   5407                || rs6000_cpu == PROCESSOR_CELL)
>>   5408              {
>>   5409                if (align_functions <= 0)
>>   5410                  align_functions = 8;
>>   5411                if (align_jumps <= 0)
>>   5412                  align_jumps = 8;
>>   5413                if (align_loops <= 0)
>>   5414                  align_loops = 8;
>>   5415              }
>>   5416            if (rs6000_align_branch_targets)
>>   5417              {
>>   5418                if (align_functions <= 0)
>>   5419                  align_functions = 16;
>>   5420                if (align_jumps <= 0)
>>   5421                  align_jumps = 16;
>>   5422                if (align_loops <= 0)
>>   5423                  {
>>   5424                    can_override_loop_align = 1;
>>   5425                    align_loops = 16;
>>   5426                  }
>>   5427              }
>>   5428            if (align_jumps_max_skip <= 0)
>>   5429              align_jumps_max_skip = 15;
>>   5430            if (align_loops_max_skip <= 0)
>>   5431              align_loops_max_skip = 15;
>>
>> Note that at line 5429 there's set of align_jumps_max_skip to 15 if not set by default.
>> At line 5412 align_jumps is set to 8, and align_jumps_max_skip should be equal align_jumps - 1.
>> That's a discrepancy. Segher can you please take a look?
> 
> This is powerpcspe, that's not mine.
> 
> But rs6000 has the same code, sure.

Right, that's why I wrote to you.

> Why do you say "align_jumps_max_skip
> should be equal align_jumps - 1"?  If that were true, why does it exist
> at all?
> 
> toplev.c already has (in init_alignments):
> 
>   if (align_jumps_max_skip > align_jumps)
>     align_jumps_max_skip = align_jumps - 1;

I'm rewriting this logic in the patch set. The issue is that
the check of align_jumps_max_skip is done
in rs6000_option_override_internal, which runs before
align_jumps_max_skip is parsed.

That said, 'align_jumps_max_skip <= 0' is always true.

Martin

> 
> so why would targets duplicate that logic?  (The target override is called
> before init_alignments).
> 
> 
> Segher
> 

* Re: [PATCH 3/3] Extend -falign-FOO=N to N[:M[:N2[:M2]]]
  2018-07-03  8:53     ` Martin Liška
@ 2018-07-03  9:55       ` Segher Boessenkool
  2018-07-03 10:16         ` Martin Liška
  2018-07-03 19:12       ` Martin Liška
  1 sibling, 1 reply; 26+ messages in thread
From: Segher Boessenkool @ 2018-07-03  9:55 UTC (permalink / raw)
  To: Martin Liška; +Cc: Jeff Law, gcc-patches, dvlasenk

On Tue, Jul 03, 2018 at 10:53:20AM +0200, Martin Liška wrote:
> On 06/29/2018 09:04 PM, Jeff Law wrote:
> > I think this is fine for the trunk.
> > 
> > jeff
> 
> Thank you Jeff.
> 
> I found some issues when doing build of all targets (contrib/config-list.mk).
> I'll update patch and test that affected cross-compilers still produce same output.
> 
> However I noticed one ppc64 issue:
> 
> $ cat -n gcc/config/powerpcspe/powerpcspe.c
> 
>   5401        /* Set branch target alignment, if not optimizing for size.  */
>   5402        if (!optimize_size)
>   5403          {
>   5404            /* Cell wants to be aligned 8byte for dual issue.  Titan wants to be
>   5405               aligned 8byte to avoid misprediction by the branch predictor.  */
>   5406            if (rs6000_cpu == PROCESSOR_TITAN
>   5407                || rs6000_cpu == PROCESSOR_CELL)
>   5408              {
>   5409                if (align_functions <= 0)
>   5410                  align_functions = 8;
>   5411                if (align_jumps <= 0)
>   5412                  align_jumps = 8;
>   5413                if (align_loops <= 0)
>   5414                  align_loops = 8;
>   5415              }
>   5416            if (rs6000_align_branch_targets)
>   5417              {
>   5418                if (align_functions <= 0)
>   5419                  align_functions = 16;
>   5420                if (align_jumps <= 0)
>   5421                  align_jumps = 16;
>   5422                if (align_loops <= 0)
>   5423                  {
>   5424                    can_override_loop_align = 1;
>   5425                    align_loops = 16;
>   5426                  }
>   5427              }
>   5428            if (align_jumps_max_skip <= 0)
>   5429              align_jumps_max_skip = 15;
>   5430            if (align_loops_max_skip <= 0)
>   5431              align_loops_max_skip = 15;
> 
> Note that at line 5429 there's set of align_jumps_max_skip to 15 if not set by default.
> At line 5412 align_jumps is set to 8, and align_jumps_max_skip should be equal align_jumps - 1.
> That's a discrepancy. Segher can you please take a look?

This is powerpcspe, that's not mine.

But rs6000 has the same code, sure.  Why do you say "align_jumps_max_skip
should be equal align_jumps - 1"?  If that were true, why does it exist
at all?

toplev.c already has (in init_alignments):

  if (align_jumps_max_skip > align_jumps)
    align_jumps_max_skip = align_jumps - 1;

so why would targets duplicate that logic?  (The target override is called
before init_alignments).
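
A minimal sketch of that ordering, assuming plain int stand-ins for the two globals and hypothetical function names (this is not the actual toplev.c or rs6000.c code): the target override fills in defaults first, and the init_alignments clamp runs afterwards.

```c
/* Stand-ins for the GCC globals of the same names (assumption: plain ints).  */
static int align_jumps;
static int align_jumps_max_skip;

/* Mimics the target defaults quoted above for a CPU that wants 8-byte
   branch targets.  */
static void
target_override_sketch (void)
{
  if (align_jumps <= 0)
    align_jumps = 8;
  if (align_jumps_max_skip <= 0)
    align_jumps_max_skip = 15;
}

/* The clamp quoted from init_alignments.  */
static void
init_alignments_sketch (void)
{
  if (align_jumps_max_skip > align_jumps)
    align_jumps_max_skip = align_jumps - 1;
}

/* Return the max skip that results for a given -falign-jumps value
   (0 means the option was not given).  */
static int
effective_jump_max_skip (int user_align_jumps)
{
  align_jumps = user_align_jumps;
  align_jumps_max_skip = 0;
  target_override_sketch ();	/* Runs first.  */
  init_alignments_sketch ();	/* Runs afterwards and clamps.  */
  return align_jumps_max_skip;
}
```

With no user value the target default of 15 is clamped to 7, so the target does not need its own copy of the clamp. A user value larger than 16 leaves the max skip at 15, because the clamp only ever lowers it.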


Segher

* Re: [PATCH 3/3] Extend -falign-FOO=N to N[:M[:N2[:M2]]]
  2018-06-29 19:05   ` Jeff Law
@ 2018-07-03  8:53     ` Martin Liška
  2018-07-03  9:55       ` Segher Boessenkool
  2018-07-03 19:12       ` Martin Liška
  0 siblings, 2 replies; 26+ messages in thread
From: Martin Liška @ 2018-07-03  8:53 UTC (permalink / raw)
  To: Jeff Law, gcc-patches; +Cc: dvlasenk, Segher Boessenkool

On 06/29/2018 09:04 PM, Jeff Law wrote:
> I think this is fine for the trunk.
> 
> jeff

Thank you Jeff.

I found some issues when building all targets (contrib/config-list.mk).
I'll update the patch and test that the affected cross-compilers still produce the same output.

However, I noticed one ppc64 issue:

$ cat -n gcc/config/powerpcspe/powerpcspe.c

  5401        /* Set branch target alignment, if not optimizing for size.  */
  5402        if (!optimize_size)
  5403          {
  5404            /* Cell wants to be aligned 8byte for dual issue.  Titan wants to be
  5405               aligned 8byte to avoid misprediction by the branch predictor.  */
  5406            if (rs6000_cpu == PROCESSOR_TITAN
  5407                || rs6000_cpu == PROCESSOR_CELL)
  5408              {
  5409                if (align_functions <= 0)
  5410                  align_functions = 8;
  5411                if (align_jumps <= 0)
  5412                  align_jumps = 8;
  5413                if (align_loops <= 0)
  5414                  align_loops = 8;
  5415              }
  5416            if (rs6000_align_branch_targets)
  5417              {
  5418                if (align_functions <= 0)
  5419                  align_functions = 16;
  5420                if (align_jumps <= 0)
  5421                  align_jumps = 16;
  5422                if (align_loops <= 0)
  5423                  {
  5424                    can_override_loop_align = 1;
  5425                    align_loops = 16;
  5426                  }
  5427              }
  5428            if (align_jumps_max_skip <= 0)
  5429              align_jumps_max_skip = 15;
  5430            if (align_loops_max_skip <= 0)
  5431              align_loops_max_skip = 15;

Note that line 5429 sets align_jumps_max_skip to 15 if it has not been set already.
Line 5412 sets align_jumps to 8, and align_jumps_max_skip should be equal to align_jumps - 1.
That's a discrepancy. Segher, can you please take a look?

Thanks,
Martin

* [PATCH 3/3] Extend -falign-FOO=N to N[:M[:N2[:M2]]]
  2018-05-25 11:04 ` [PATCH 3/3] Extend -falign-FOO=N to N[:M[:N2[:M2]]] marxin
@ 2018-06-29 19:05   ` Jeff Law
  2018-07-03  8:53     ` Martin Liška
  0 siblings, 1 reply; 26+ messages in thread
From: Jeff Law @ 2018-06-29 19:05 UTC (permalink / raw)
  To: marxin, gcc-patches; +Cc: dvlasenk

On 05/21/2018 12:58 PM, marxin wrote:
> gcc/ChangeLog:
> 
> 2018-05-25  Denys Vlasenko  <dvlasenk@redhat.com>
> 	    Martin Liska  <mliska@suse.cz>
> 
> 	PR middle-end/66240
> 	PR target/45996
> 	PR c/84100
> 	* common.opt: Rename align options with 'str_' prefix.
> 	* common/config/i386/i386-common.c (set_malign_value): New
> 	function.
> 	(ix86_handle_option): Use it to set -falign-* options.
> 	* config/aarch64/aarch64-protos.h (struct tune_params): Change
> 	type from int to string.
> 	* config/aarch64/aarch64.c: Update default values from int
> 	to string.
> 	* config/alpha/alpha.c (alpha_override_options_after_change):
> 	Likewise.
> 	* config/arm/arm.c (arm_override_options_after_change_1): Likewise.
> 	* config/i386/dragonfly.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Print
> 	max skip conditionally.
> 	* config/i386/freebsd.h (SUBALIGN_LOG): New.
> 	(ASM_OUTPUT_MAX_SKIP_ALIGN): Print
> 	max skip conditionally.
> 	* config/i386/gas.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Print
> 	max skip conditionally.
> 	* config/i386/gnu-user.h (SUBALIGN_LOG): New.
> 	(ASM_OUTPUT_MAX_SKIP_ALIGN): Print
> 	max skip conditionally.
> 	* config/i386/i386.c (struct ptt): Change type from int to
> 	string.
> 	(ix86_default_align): Set default values.
> 	* config/i386/i386.h (ASM_OUTPUT_MAX_SKIP_PAD): Print
> 	max skip conditionally.
> 	* config/i386/iamcu.h (SUBALIGN_LOG): New.
> 	(ASM_OUTPUT_MAX_SKIP_ALIGN):
> 	* config/i386/lynx.h (ASM_OUTPUT_MAX_SKIP_ALIGN):
> 	* config/i386/netbsd-elf.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Print
> 	max skip conditionally.
> 	* config/i386/openbsdelf.h (SUBALIGN_LOG): New.
> 	(ASM_OUTPUT_MAX_SKIP_ALIGN): Print max skip conditionally.
> 	* config/i386/x86-64.h (SUBALIGN_LOG): New.
> 	(ASM_OUTPUT_MAX_SKIP_ALIGN): Print
> 	max skip conditionally.
> 	(ASM_OUTPUT_MAX_SKIP_PAD): Likewise.
> 	* config/mips/mips.c (mips_set_compression_mode): Change
> 	type of constants.
> 	* config/rs6000/rs6000.c (rs6000_option_override_internal):
> 	Likewise.
> 	* config/rx/rx.c (rx_option_override): Likewise.
> 	* config/rx/rx.h (JUMP_ALIGN): Use align_jumps_log.
> 	(LABEL_ALIGN): Use align_labels_log.
> 	(LOOP_ALIGN): Use align_loops_align.
> 	* config/sh/sh.c (sh_override_options_after_change):
> 	Change type of constants.
> 	* config/spu/spu.c (spu_sched_init): Likewise.
> 	* config/visium/visium.c (visium_option_override): Likewise.
> 	* doc/invoke.texi: Document extended format of -falign-*.
> 	* final.c: Use align_labels alignment.
> 	* flags.h (struct target_flag_state): Change type to use
> 	align_flags.
> 	(struct align_flags_tuple): New.
> 	(struct align_flags): Likewise.
> 	(align_loops_log): Redefine macro to use new types.
> 	(align_loops_max_skip): Redefine macro to use new types.
> 	(align_jumps_log): Redefine macro to use new types.
> 	(align_jumps_max_skip): Redefine macro to use new types.
> 	(align_labels_log): Redefine macro to use new types.
> 	(align_labels_max_skip): Redefine macro to use new types.
> 	(align_functions_log): Redefine macro to use new types.
> 	(align_loops): Redefine macro to use new types.
> 	(align_jumps): Redefine macro to use new types.
> 	(align_labels): Redefine macro to use new types.
> 	(align_functions): Redefine macro to use new types.
> 	(align_functions_max_skip): Redefine macro to use new types.
> 	* function.c (invoke_set_current_function_hook): Propagate
> 	alignment values from flags to global variables default in
> 	toplev.h.
> 	* ipa-icf.c (sem_function::equals_wpa): Use
> 	cl_optimization_option_eq instead of memcmp.
> 	* lto-streamer.h (cl_optimization_stream_out): Support streaming
> 	of string types.
> 	(cl_optimization_stream_in): Likewise.
> 	* optc-save-gen.awk: Support strings in cl_optimization.
> 	* opth-gen.awk: Likewise.
> 	* opts.c (finish_options): Remove error checking of invalid
> 	value ranges.
> 	(MAX_CODE_ALIGN): Remove.
> 	(MAX_CODE_ALIGN_VALUE): Likewise.
> 	(parse_and_check_align_values): New function.
> 	(check_alignment_argument): Likewise.
> 	(common_handle_option): Use check_alignment_argument.
> 	* opts.h (parse_and_check_align_values): Declare.
> 	* toplev.c (init_alignments): Remove.
> 	(read_log_maxskip): New.
> 	(parse_N_M): Likewise.
> 	(parse_alignment_opts): Likewise.
> 	(backend_init_target): Remove usage of init_alignments.
> 	* toplev.h (parse_alignment_opts): Declare.
> 	* tree-streamer-in.c (streamer_read_tree_bitfields): Add new
> 	argument.
> 	* tree-streamer-out.c (streamer_write_tree_bitfields): Likewise.
> 	* tree.c (cl_option_hasher::equal): New.
> 	* varasm.c: Use new global macros.
> 
> gcc/lto/ChangeLog:
> 
> 2018-05-25  Martin Liska  <mliska@suse.cz>
> 
> 	PR middle-end/66240
> 	PR target/45996
> 	PR c/84100
> 	* lto.c (compare_tree_sccs_1): Use cl_optimization_option_eq
> 	instead of memcmp.
> 
> gcc/testsuite/ChangeLog:
> 
> 2018-05-25  Martin Liska  <mliska@suse.cz>
> 
> 	PR middle-end/66240
> 	PR target/45996
> 	PR c/84100
> 	* gcc.dg/pr84100.c (foo):
> 	* gcc.target/i386/falign-functions-2.c: New test.
> 	* gcc.target/i386/falign-functions.c: New test.
I think this is fine for the trunk.

jeff

* [PATCH 3/3] Extend -falign-FOO=N to N[:M[:N2[:M2]]]
  2018-05-25 11:04 [PATCH 0/3] Extend -falign-FOO=N to N[,M[,N2[,M2]]] version 9 marxin
@ 2018-05-25 11:04 ` marxin
  2018-06-29 19:05   ` Jeff Law
  0 siblings, 1 reply; 26+ messages in thread
From: marxin @ 2018-05-25 11:04 UTC (permalink / raw)
  To: gcc-patches; +Cc: dvlasenk

[-- Attachment #1: Type: text/plain, Size: 7116 bytes --]


gcc/ChangeLog:

2018-05-25  Denys Vlasenko  <dvlasenk@redhat.com>
	    Martin Liska  <mliska@suse.cz>

	PR middle-end/66240
	PR target/45996
	PR c/84100
	* common.opt: Rename align options with 'str_' prefix.
	* common/config/i386/i386-common.c (set_malign_value): New
	function.
	(ix86_handle_option): Use it to set -falign-* options.
	* config/aarch64/aarch64-protos.h (struct tune_params): Change
	type from int to string.
	* config/aarch64/aarch64.c: Update default values from int
	to string.
	* config/alpha/alpha.c (alpha_override_options_after_change):
	Likewise.
	* config/arm/arm.c (arm_override_options_after_change_1): Likewise.
	* config/i386/dragonfly.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Print
	max skip conditionally.
	* config/i386/freebsd.h (SUBALIGN_LOG): New.
	(ASM_OUTPUT_MAX_SKIP_ALIGN): Print
	max skip conditionally.
	* config/i386/gas.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Print
	max skip conditionally.
	* config/i386/gnu-user.h (SUBALIGN_LOG): New.
	(ASM_OUTPUT_MAX_SKIP_ALIGN): Print
	max skip conditionally.
	* config/i386/i386.c (struct ptt): Change type from int to
	string.
	(ix86_default_align): Set default values.
	* config/i386/i386.h (ASM_OUTPUT_MAX_SKIP_PAD): Print
	max skip conditionally.
	* config/i386/iamcu.h (SUBALIGN_LOG): New.
	(ASM_OUTPUT_MAX_SKIP_ALIGN):
	* config/i386/lynx.h (ASM_OUTPUT_MAX_SKIP_ALIGN):
	* config/i386/netbsd-elf.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Print
	max skip conditionally.
	* config/i386/openbsdelf.h (SUBALIGN_LOG): New.
	(ASM_OUTPUT_MAX_SKIP_ALIGN): Print max skip conditionally.
	* config/i386/x86-64.h (SUBALIGN_LOG): New.
	(ASM_OUTPUT_MAX_SKIP_ALIGN): Print
	max skip conditionally.
	(ASM_OUTPUT_MAX_SKIP_PAD): Likewise.
	* config/mips/mips.c (mips_set_compression_mode): Change
	type of constants.
	* config/rs6000/rs6000.c (rs6000_option_override_internal):
	Likewise.
	* config/rx/rx.c (rx_option_override): Likewise.
	* config/rx/rx.h (JUMP_ALIGN): Use align_jumps_log.
	(LABEL_ALIGN): Use align_labels_log.
	(LOOP_ALIGN): Use align_loops_align.
	* config/sh/sh.c (sh_override_options_after_change):
	Change type of constants.
	* config/spu/spu.c (spu_sched_init): Likewise.
	* config/visium/visium.c (visium_option_override): Likewise.
	* doc/invoke.texi: Document extended format of -falign-*.
	* final.c: Use align_labels alignment.
	* flags.h (struct target_flag_state): Change type to use
	align_flags.
	(struct align_flags_tuple): New.
	(struct align_flags): Likewise.
	(align_loops_log): Redefine macro to use new types.
	(align_loops_max_skip): Redefine macro to use new types.
	(align_jumps_log): Redefine macro to use new types.
	(align_jumps_max_skip): Redefine macro to use new types.
	(align_labels_log): Redefine macro to use new types.
	(align_labels_max_skip): Redefine macro to use new types.
	(align_functions_log): Redefine macro to use new types.
	(align_loops): Redefine macro to use new types.
	(align_jumps): Redefine macro to use new types.
	(align_labels): Redefine macro to use new types.
	(align_functions): Redefine macro to use new types.
	(align_functions_max_skip): Redefine macro to use new types.
	* function.c (invoke_set_current_function_hook): Propagate
	alignment values from flags to global variables default in
	toplev.h.
	* ipa-icf.c (sem_function::equals_wpa): Use
	cl_optimization_option_eq instead of memcmp.
	* lto-streamer.h (cl_optimization_stream_out): Support streaming
	of string types.
	(cl_optimization_stream_in): Likewise.
	* optc-save-gen.awk: Support strings in cl_optimization.
	* opth-gen.awk: Likewise.
	* opts.c (finish_options): Remove error checking of invalid
	value ranges.
	(MAX_CODE_ALIGN): Remove.
	(MAX_CODE_ALIGN_VALUE): Likewise.
	(parse_and_check_align_values): New function.
	(check_alignment_argument): Likewise.
	(common_handle_option): Use check_alignment_argument.
	* opts.h (parse_and_check_align_values): Declare.
	* toplev.c (init_alignments): Remove.
	(read_log_maxskip): New.
	(parse_N_M): Likewise.
	(parse_alignment_opts): Likewise.
	(backend_init_target): Remove usage of init_alignments.
	* toplev.h (parse_alignment_opts): Declare.
	* tree-streamer-in.c (streamer_read_tree_bitfields): Add new
	argument.
	* tree-streamer-out.c (streamer_write_tree_bitfields): Likewise.
	* tree.c (cl_option_hasher::equal): Use cl_optimization_option_eq
	instead of memcmp.
	* varasm.c: Use new global macros.

gcc/lto/ChangeLog:

2018-05-25  Martin Liska  <mliska@suse.cz>

	PR middle-end/66240
	PR target/45996
	PR c/84100
	* lto.c (compare_tree_sccs_1): Use cl_optimization_option_eq
	instead of memcmp.

gcc/testsuite/ChangeLog:

2018-05-25  Martin Liska  <mliska@suse.cz>

	PR middle-end/66240
	PR target/45996
	PR c/84100
	* gcc.dg/pr84100.c (foo):
	* gcc.target/i386/falign-functions-2.c: New test.
	* gcc.target/i386/falign-functions.c: New test.
---
 gcc/common.opt                                     |  16 +--
 gcc/common/config/i386/i386-common.c               |  16 ++-
 gcc/config/aarch64/aarch64-protos.h                |   6 +-
 gcc/config/aarch64/aarch64.c                       |  60 ++++-----
 gcc/config/alpha/alpha.c                           |  12 +-
 gcc/config/arm/arm.c                               |   7 +-
 gcc/config/i386/i386.c                             | 110 ++++++++---------
 gcc/config/mips/mips.c                             |  18 +--
 gcc/config/rs6000/rs6000.c                         |  28 ++---
 gcc/config/rx/rx.c                                 |  18 ++-
 gcc/config/rx/rx.h                                 |   6 +-
 gcc/config/sh/sh.c                                 |  26 ++--
 gcc/config/spu/spu.c                               |   3 +-
 gcc/config/visium/visium.c                         |  19 ++-
 gcc/doc/invoke.texi                                |  66 +++++++---
 gcc/final.c                                        |   6 +
 gcc/flags.h                                        |  66 ++++++----
 gcc/function.c                                     |   3 +
 gcc/ipa-icf.c                                      |   2 +-
 gcc/lto-streamer.h                                 |   6 +-
 gcc/lto/lto.c                                      |   4 +-
 gcc/optc-save-gen.awk                              |  95 ++++++++++++++-
 gcc/opth-gen.awk                                   |   3 +
 gcc/opts.c                                         | 108 ++++++++++++++---
 gcc/opts.h                                         |   7 ++
 gcc/testsuite/gcc.dg/pr84100.c                     |   2 +-
 gcc/testsuite/gcc.target/i386/falign-functions-2.c |  30 +++++
 gcc/testsuite/gcc.target/i386/falign-functions.c   |   8 ++
 gcc/toplev.c                                       | 135 +++++++++++++++++----
 gcc/toplev.h                                       |   7 ++
 gcc/tree-streamer-in.c                             |   2 +-
 gcc/tree-streamer-out.c                            |   2 +-
 gcc/tree.c                                         |  20 +--
 gcc/varasm.c                                       |   9 +-
 34 files changed, 637 insertions(+), 289 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/falign-functions-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/falign-functions.c


[-- Attachment #2: 0003-Extend-falign-FOO-N-to-N-M-N2-M2.patch --]
[-- Type: text/x-patch; name="0003-Extend-falign-FOO-N-to-N-M-N2-M2.patch", Size: 58141 bytes --]

diff --git a/gcc/common.opt b/gcc/common.opt
index 13ab5c65d43..49e46f09a56 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -950,35 +950,35 @@ Common Report Var(flag_aggressive_loop_optimizations) Optimization Init(1)
 Aggressively optimize loops using language constraints.
 
 falign-functions
-Common Report Var(align_functions,0) Optimization UInteger
+Common Report Var(flag_align_functions) Optimization
 Align the start of functions.
 
 falign-functions=
-Common RejectNegative Joined UInteger Var(align_functions) Optimization
+Common RejectNegative Joined Var(str_align_functions) Optimization
 
 flimit-function-alignment
 Common Report Var(flag_limit_function_alignment) Optimization Init(0)
 
 falign-jumps
-Common Report Var(align_jumps,0) Optimization UInteger
+Common Report Var(flag_align_jumps) Optimization
 Align labels which are only reached by jumping.
 
 falign-jumps=
-Common RejectNegative Joined UInteger Var(align_jumps) Optimization
+Common RejectNegative Joined Var(str_align_jumps) Optimization
 
 falign-labels
-Common Report Var(align_labels,0) Optimization UInteger
+Common Report Var(flag_align_labels) Optimization
 Align all labels.
 
 falign-labels=
-Common RejectNegative Joined UInteger Var(align_labels) Optimization
+Common RejectNegative Joined Var(str_align_labels) Optimization
 
 falign-loops
-Common Report Var(align_loops,0) Optimization UInteger
+Common Report Var(flag_align_loops) Optimization
 Align the start of loops.
 
 falign-loops=
-Common RejectNegative Joined UInteger Var(align_loops) Optimization
+Common RejectNegative Joined Var(str_align_loops)
 
 fargument-alias
 Common Ignore
diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c
index 3aa32f5934b..e32dc3c2331 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -275,6 +275,16 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET \
   (OPTION_MASK_ISA2_AVX512F_UNSET | OPTION_MASK_ISA_MPX)
 
+/* Set 1 << value as value of -malign-FLAG option.  */
+
+static void
+set_malign_value (const char **flag, unsigned value)
+{
+  char *r = XNEWVEC (char, 6);
+  sprintf (r, "%d", 1 << value);
+  *flag = r;
+}
+
 /* Implement TARGET_HANDLE_OPTION.  */
 
 bool
@@ -1317,7 +1327,7 @@ ix86_handle_option (struct gcc_options *opts,
 	error_at (loc, "-malign-loops=%d is not between 0 and %d",
 		  value, MAX_CODE_ALIGN);
       else
-	opts->x_align_loops = 1 << value;
+	set_malign_value (&opts->x_str_align_loops, value);
       return true;
 
     case OPT_malign_jumps_:
@@ -1326,7 +1336,7 @@ ix86_handle_option (struct gcc_options *opts,
 	error_at (loc, "-malign-jumps=%d is not between 0 and %d",
 		  value, MAX_CODE_ALIGN);
       else
-	opts->x_align_jumps = 1 << value;
+	set_malign_value (&opts->x_str_align_jumps, value);
       return true;
 
     case OPT_malign_functions_:
@@ -1336,7 +1346,7 @@ ix86_handle_option (struct gcc_options *opts,
 	error_at (loc, "-malign-functions=%d is not between 0 and %d",
 		  value, MAX_CODE_ALIGN);
       else
-	opts->x_align_functions = 1 << value;
+	set_malign_value (&opts->x_str_align_functions, value);
       return true;
 
     case OPT_mbranch_cost_:
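Note on set_malign_value above: the 6-byte buffer is enough only because
the callers reject values above MAX_CODE_ALIGN first. Assuming
MAX_CODE_ALIGN is 16, as in the -malign-* handler code quoted elsewhere
in this thread, the largest string is "65536", which is five digits plus
the terminating NUL. The sketch below makes that bound explicit with
snprintf; "format_malign" is a hypothetical name, not part of the patch.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_CODE_ALIGN 16	/* Assumed, see the note above.  */

/* Write the byte alignment for a log2 VALUE into BUF as decimal text.
   Returns the string length, or -1 if BUF is too small.  */
static int
format_malign (char *buf, size_t size, unsigned value)
{
  assert (value <= MAX_CODE_ALIGN);
  int len = snprintf (buf, size, "%d", 1 << value);
  return (len >= 0 && (size_t) len < size) ? len : -1;
}
```

For value 16 this needs exactly 6 bytes; a 5-byte buffer is reported as
too small instead of being overrun.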
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 4ea50acaa59..e635b74437c 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -250,9 +250,9 @@ struct tune_params
   int memmov_cost;
   int issue_rate;
   unsigned int fusible_ops;
-  int function_align;
-  int jump_align;
-  int loop_align;
+  const char *function_align;
+  const char *jump_align;
+  const char *loop_align;
   int int_reassoc_width;
   int fp_reassoc_width;
   int vec_reassoc_width;
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index afc91850d6f..477af02440e 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -637,9 +637,9 @@ static const struct tune_params generic_tunings =
   4, /* memmov_cost  */
   2, /* issue_rate  */
   (AARCH64_FUSE_AES_AESMC), /* fusible_ops  */
-  8,	/* function_align.  */
-  4,	/* jump_align.  */
-  8,	/* loop_align.  */
+  "8",	/* function_align.  */
+  "4",	/* jump_align.  */
+  "8",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -663,9 +663,9 @@ static const struct tune_params cortexa35_tunings =
   1, /* issue_rate  */
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
-  16,	/* function_align.  */
-  4,	/* jump_align.  */
-  8,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "4",	/* jump_align.  */
+  "8",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -689,9 +689,9 @@ static const struct tune_params cortexa53_tunings =
   2, /* issue_rate  */
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
-  16,	/* function_align.  */
-  4,	/* jump_align.  */
-  8,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "4",	/* jump_align.  */
+  "8",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -715,9 +715,9 @@ static const struct tune_params cortexa57_tunings =
   3, /* issue_rate  */
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
-  16,	/* function_align.  */
-  4,	/* jump_align.  */
-  8,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "4",	/* jump_align.  */
+  "8",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -741,9 +741,9 @@ static const struct tune_params cortexa72_tunings =
   3, /* issue_rate  */
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
-  16,	/* function_align.  */
-  4,	/* jump_align.  */
-  8,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "4",	/* jump_align.  */
+  "8",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -767,9 +767,9 @@ static const struct tune_params cortexa73_tunings =
   2, /* issue_rate.  */
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
-  16,	/* function_align.  */
-  4,	/* jump_align.  */
-  8,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "4",	/* jump_align.  */
+  "8",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -794,9 +794,9 @@ static const struct tune_params exynosm1_tunings =
   4,	/* memmov_cost  */
   3,	/* issue_rate  */
   (AARCH64_FUSE_AES_AESMC), /* fusible_ops  */
-  4,	/* function_align.  */
-  4,	/* jump_align.  */
-  4,	/* loop_align.  */
+  "4",	/* function_align.  */
+  "4",	/* jump_align.  */
+  "4",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -819,9 +819,9 @@ static const struct tune_params thunderxt88_tunings =
   6, /* memmov_cost  */
   2, /* issue_rate  */
   AARCH64_FUSE_CMP_BRANCH, /* fusible_ops  */
-  8,	/* function_align.  */
-  8,	/* jump_align.  */
-  8,	/* loop_align.  */
+  "8",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "8",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -870,9 +870,9 @@ static const struct tune_params xgene1_tunings =
   6, /* memmov_cost  */
   4, /* issue_rate  */
   AARCH64_FUSE_NOTHING, /* fusible_ops  */
-  16,	/* function_align.  */
-  8,	/* jump_align.  */
-  16,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "16",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -896,9 +896,9 @@ static const struct tune_params qdf24xx_tunings =
   4, /* issue_rate  */
   (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK), /* fuseable_ops  */
-  16,	/* function_align.  */
-  8,	/* jump_align.  */
-  16,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "16",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
diff --git a/gcc/config/alpha/alpha.c b/gcc/config/alpha/alpha.c
index 26d89f3ea13..5c688d1fe81 100644
--- a/gcc/config/alpha/alpha.c
+++ b/gcc/config/alpha/alpha.c
@@ -614,13 +614,13 @@ alpha_override_options_after_change (void)
   /* ??? Kludge these by not doing anything if we don't optimize.  */
   if (optimize > 0)
     {
-      if (align_loops <= 0)
-	align_loops = 16;
-      if (align_jumps <= 0)
-	align_jumps = 16;
+      if (flag_align_loops && !str_align_loops)
+	str_align_loops = "16";
+      if (flag_align_jumps && !str_align_jumps)
+	str_align_jumps = "16";
     }
-  if (align_functions <= 0)
-    align_functions = 16;
+  if (flag_align_functions && !str_align_functions)
+    str_align_functions = "16";
 }
 \f
 /* Returns 1 if VALUE is a mask that contains full bytes of zero or ones.  */
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index c70be366ed8..e0b3c94c6fb 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -2952,9 +2952,10 @@ static GTY(()) tree init_optimize;
 static void
 arm_override_options_after_change_1 (struct gcc_options *opts)
 {
-  if (opts->x_align_functions <= 0)
-    opts->x_align_functions = TARGET_THUMB_P (opts->x_target_flags)
-      && opts->x_optimize_size ? 2 : 4;
+  /* -falign-functions without argument: supply one.  */
+  if (opts->x_flag_align_functions && !opts->x_str_align_functions)
+    opts->x_str_align_functions = TARGET_THUMB_P (opts->x_target_flags)
+      && opts->x_optimize_size ? "2" : "4";
 }
 
 /* Implement targetm.override_options_after_change.  */
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 637c10565d5..3796f492fab 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -837,52 +837,57 @@ struct ptt
 {
   const char *const name;			/* processor name  */
   const struct processor_costs *cost;		/* Processor costs */
-  const int align_loop;				/* Default alignments.  */
-  const int align_loop_max_skip;
-  const int align_jump;
-  const int align_jump_max_skip;
-  const int align_func;
+
+  /* Default alignments.  */
+  const char *const align_loop;
+  const char *const align_jump;
+  const char *const align_label;
+  const char *const align_func;
 };
 
 /* This table must be in sync with enum processor_type in i386.h.  */ 
 static const struct ptt processor_target_table[PROCESSOR_max] =
 {
-  {"generic", &generic_cost, 16, 10, 16, 10, 16},
-  {"i386", &i386_cost, 4, 3, 4, 3, 4},
-  {"i486", &i486_cost, 16, 15, 16, 15, 16},
-  {"pentium", &pentium_cost, 16, 7, 16, 7, 16},
-  {"lakemont", &lakemont_cost, 16, 7, 16, 7, 16},
-  {"pentiumpro", &pentiumpro_cost, 16, 15, 16, 10, 16},
-  {"pentium4", &pentium4_cost, 0, 0, 0, 0, 0},
-  {"nocona", &nocona_cost, 0, 0, 0, 0, 0},
-  {"core2", &core_cost, 16, 10, 16, 10, 16},
-  {"nehalem", &core_cost, 16, 10, 16, 10, 16},
-  {"sandybridge", &core_cost, 16, 10, 16, 10, 16},
-  {"haswell", &core_cost, 16, 10, 16, 10, 16},
-  {"bonnell", &atom_cost, 16, 15, 16, 7, 16},
-  {"silvermont", &slm_cost, 16, 15, 16, 7, 16},
-  {"goldmont", &slm_cost, 16, 15, 16, 7, 16},
-  {"goldmont-plus", &slm_cost, 16, 15, 16, 7, 16},
-  {"knl", &slm_cost, 16, 15, 16, 7, 16},
-  {"knm", &slm_cost, 16, 15, 16, 7, 16},
-  {"skylake", &skylake_cost, 16, 10, 16, 10, 16},
-  {"skylake-avx512", &skylake_cost, 16, 10, 16, 10, 16},
-  {"cannonlake", &skylake_cost, 16, 10, 16, 10, 16},
-  {"icelake-client", &skylake_cost, 16, 10, 16, 10, 16},
-  {"icelake-server", &skylake_cost, 16, 10, 16, 10, 16},
-  {"intel", &intel_cost, 16, 15, 16, 7, 16},
-  {"geode", &geode_cost, 0, 0, 0, 0, 0},
-  {"k6", &k6_cost, 32, 7, 32, 7, 32},
-  {"athlon", &athlon_cost, 16, 7, 16, 7, 16},
-  {"k8", &k8_cost, 16, 7, 16, 7, 16},
-  {"amdfam10", &amdfam10_cost, 32, 24, 32, 7, 32},
-  {"bdver1", &bdver1_cost, 16, 10, 16, 7, 11},
-  {"bdver2", &bdver2_cost, 16, 10, 16, 7, 11},
-  {"bdver3", &bdver3_cost, 16, 10, 16, 7, 11},
-  {"bdver4", &bdver4_cost, 16, 10, 16, 7, 11},
-  {"btver1", &btver1_cost, 16, 10, 16, 7, 11},
-  {"btver2", &btver2_cost, 16, 10, 16, 7, 11},
-  {"znver1", &znver1_cost, 16, 15, 16, 15, 16}
+/* The "0:0:8" label alignment specified for some processors generates
+   secondary 8-byte alignment only for those label/jump/loop targets
+   which have primary alignment.  */
+
+  {"generic",        &generic_cost,    "16:11:8", "16:11:8", "0:0:8", "16"},
+  {"i386",           &i386_cost,       "4",       "4",       NULL,    "4" },
+  {"i486",           &i486_cost,       "16",      "16",      "0:0:8", "16"},
+  {"pentium",        &pentium_cost,    "16:8:8",  "16:8:8",  "0:0:8", "16"},
+  {"lakemont",       &lakemont_cost,   "16:8:8",  "16:8:8",  "0:0:8", "16"},
+  {"pentiumpro",     &pentiumpro_cost, "16",      "16:11:8", "0:0:8", "16"},
+  {"pentium4",       &pentium4_cost,   NULL,      NULL,      NULL,    NULL},
+  {"nocona",         &nocona_cost,     NULL,      NULL,      NULL,    NULL},
+  {"core2",          &core_cost,       "16:11:8", "16:11:8", "0:0:8", "16"},
+  {"nehalem",        &core_cost,       "16:11:8", "16:11:8", "0:0:8", "16"},
+  {"sandybridge",    &core_cost,       "16:11:8", "16:11:8", "0:0:8", "16"},
+  {"haswell",        &core_cost,       "16:11:8", "16:11:8", "0:0:8", "16"},
+  {"bonnell",        &atom_cost,       "16",      "16:8:8",  "0:0:8", "16"},
+  {"silvermont",     &slm_cost,        "16",      "16:8:8",  "0:0:8", "16"},
+  {"goldmont",       &slm_cost,        "16",      "16:8:8",  "0:0:8", "16"},
+  {"goldmont-plus",  &slm_cost,        "16",      "16:8:8",  "0:0:8", "16"},
+  {"knl",            &slm_cost,        "16",      "16:8:8",  "0:0:8", "16"},
+  {"knm",            &slm_cost,        "16",      "16:8:8",  "0:0:8", "16"},
+  {"skylake",        &skylake_cost,    "16:11:8", "16:11:8", "0:0:8", "16"},
+  {"skylake-avx512", &skylake_cost,    "16:11:8", "16:11:8", "0:0:8", "16"},
+  {"cannonlake",     &skylake_cost,    "16:11:8", "16:11:8", "0:0:8", "16"},
+  {"icelake-client", &skylake_cost,    "16:11:8", "16:11:8", "0:0:8", "16"},
+  {"icelake-server", &skylake_cost,    "16:11:8", "16:11:8", "0:0:8", "16"},
+  {"intel",          &intel_cost,      "16",      "16:8:8",  "0:0:8", "16"},
+  {"geode",          &geode_cost,      NULL,      NULL,      NULL,    NULL},
+  {"k6",             &k6_cost,         "32:8:8",  "32:8:8",  "0:0:8", "32"},
+  {"athlon",         &athlon_cost,     "16:8:8",  "16:8:8",  "0:0:8", "16"},
+  {"k8",             &k8_cost,         "16:8:8",  "16:8:8",  "0:0:8", "16"},
+  {"amdfam10",       &amdfam10_cost,   "32:25:8", "32:8:8",  "0:0:8", "32"},
+  {"bdver1",         &bdver1_cost,     "16:11:8", "16:8:8",  "0:0:8", "11"},
+  {"bdver2",         &bdver2_cost,     "16:11:8", "16:8:8",  "0:0:8", "11"},
+  {"bdver3",         &bdver3_cost,     "16:11:8", "16:8:8",  "0:0:8", "11"},
+  {"bdver4",         &bdver4_cost,     "16:11:8", "16:8:8",  "0:0:8", "11"},
+  {"btver1",         &btver1_cost,     "16:11:8", "16:8:8",  "0:0:8", "11"},
+  {"btver2",         &btver2_cost,     "16:11:8", "16:8:8",  "0:0:8", "11"},
+  {"znver1",         &znver1_cost,     "16",      "16",      "0:0:8", "16"}
 };
 \f
 static unsigned int
@@ -3349,20 +3354,15 @@ set_ix86_tune_features (enum processor_type ix86_tune, bool dump)
 static void
 ix86_default_align (struct gcc_options *opts)
 {
-  if (opts->x_align_loops == 0)
-    {
-      opts->x_align_loops = processor_target_table[ix86_tune].align_loop;
-      align_loops_max_skip = processor_target_table[ix86_tune].align_loop_max_skip;
-    }
-  if (opts->x_align_jumps == 0)
-    {
-      opts->x_align_jumps = processor_target_table[ix86_tune].align_jump;
-      align_jumps_max_skip = processor_target_table[ix86_tune].align_jump_max_skip;
-    }
-  if (opts->x_align_functions == 0)
-    {
-      opts->x_align_functions = processor_target_table[ix86_tune].align_func;
-    }
+  /* -falign-foo without argument: supply one.  */
+  if (opts->x_flag_align_loops && !opts->x_str_align_loops)
+    opts->x_str_align_loops = processor_target_table[ix86_tune].align_loop;
+  if (opts->x_flag_align_jumps && !opts->x_str_align_jumps)
+    opts->x_str_align_jumps = processor_target_table[ix86_tune].align_jump;
+  if (opts->x_flag_align_labels && !opts->x_str_align_labels)
+    opts->x_str_align_labels = processor_target_table[ix86_tune].align_label;
+  if (opts->x_flag_align_functions && !opts->x_str_align_functions)
+    opts->x_str_align_functions = processor_target_table[ix86_tune].align_func;
 }
 
 /* Implement TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE hook.  */
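Note on the processor_target_table rewrite above: comparing the removed
and added rows, the loop and jump strings appear to follow one rule. An
old max_skip of N-1 becomes plain "N", any smaller max_skip becomes
"N:max_skip+1:8", and an alignment of 0 becomes NULL. This rule is
inferred from the table, not stated by the author, and the helper name
below is hypothetical. A small sketch that reproduces the rows:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>	/* strcmp, for checking the results.  */

/* Build the string form of an old (ALIGN, MAX_SKIP) table entry.
   Returns NULL when ALIGN is 0 (no default), otherwise BUF.  */
static const char *
old_entry_to_string (char *buf, size_t size, int align, int max_skip)
{
  assert (align >= 0 && max_skip >= 0);
  if (align == 0)
    return NULL;
  if (max_skip >= align - 1)
    /* Primary alignment is always reachable: M and the 8-byte
       secondary alignment are left out.  */
    snprintf (buf, size, "%d", align);
  else
    /* M is one more than the old max_skip; 8 is the secondary N2.  */
    snprintf (buf, size, "%d:%d:8", align, max_skip + 1);
  return buf;
}
```

For example, the old "generic" loop entry (16, 10) maps to "16:11:8",
the old "amdfam10" loop entry (32, 24) maps to "32:25:8", and the old
"i486" entry (16, 15) maps to "16".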
diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index bfe64bb060c..8ae3ef39267 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -491,9 +491,9 @@ unsigned int mips_base_compression_flags;
 static int mips_base_schedule_insns; /* flag_schedule_insns */
 static int mips_base_reorder_blocks_and_partition; /* flag_reorder... */
 static int mips_base_move_loop_invariants; /* flag_move_loop_invariants */
-static int mips_base_align_loops; /* align_loops */
-static int mips_base_align_jumps; /* align_jumps */
-static int mips_base_align_functions; /* align_functions */
+static const char *mips_base_align_loops; /* align_loops */
+static const char *mips_base_align_jumps; /* align_jumps */
+static const char *mips_base_align_functions; /* align_functions */
 
 /* Index [M][R] is true if register R is allowed to hold a value of mode M.  */
 static bool mips_hard_regno_mode_ok_p[MAX_MACHINE_MODE][FIRST_PSEUDO_REGISTER];
@@ -19497,12 +19497,12 @@ mips_set_compression_mode (unsigned int compression_mode)
       /* Provide default values for align_* for 64-bit targets.  */
       if (TARGET_64BIT)
 	{
-	  if (align_loops == 0)
-	    align_loops = 8;
-	  if (align_jumps == 0)
-	    align_jumps = 8;
-	  if (align_functions == 0)
-	    align_functions = 8;
+	  if (flag_align_loops && !str_align_loops)
+	    str_align_loops = "8";
+	  if (flag_align_jumps && !str_align_jumps)
+	    str_align_jumps = "8";
+	  if (flag_align_functions && !str_align_functions)
+	    str_align_functions = "8";
 	}
 
       targetm.min_anchor_offset = -32768;
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 26d58fc4c28..2c0a5756e68 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4936,29 +4936,25 @@ rs6000_option_override_internal (bool global_init_p)
 	  if (rs6000_tune == PROCESSOR_TITAN
 	      || rs6000_tune == PROCESSOR_CELL)
 	    {
-	      if (align_functions <= 0)
-		align_functions = 8;
-	      if (align_jumps <= 0)
-		align_jumps = 8;
-	      if (align_loops <= 0)
-		align_loops = 8;
+	      if (flag_align_functions && !str_align_functions)
+		str_align_functions = "8";
+	      if (flag_align_jumps && !str_align_jumps)
+		str_align_jumps = "8";
+	      if (flag_align_loops && !str_align_loops)
+		str_align_loops = "8";
 	    }
 	  if (rs6000_align_branch_targets)
 	    {
-	      if (align_functions <= 0)
-		align_functions = 16;
-	      if (align_jumps <= 0)
-		align_jumps = 16;
-	      if (align_loops <= 0)
+	      if (flag_align_functions && !str_align_functions)
+		str_align_functions = "16";
+	      if (flag_align_jumps && !str_align_jumps)
+		str_align_jumps = "16";
+	      if (flag_align_loops && !str_align_loops)
 		{
 		  can_override_loop_align = 1;
-		  align_loops = 16;
+		  str_align_loops = "16";
 		}
 	    }
-	  if (align_jumps_max_skip <= 0)
-	    align_jumps_max_skip = 15;
-	  if (align_loops_max_skip <= 0)
-	    align_loops_max_skip = 15;
 	}
 
       /* Arrange to save and restore machine status around nested functions.  */
diff --git a/gcc/config/rx/rx.c b/gcc/config/rx/rx.c
index fe467f7bd3a..af97bef301d 100644
--- a/gcc/config/rx/rx.c
+++ b/gcc/config/rx/rx.c
@@ -2843,12 +2843,18 @@ rx_option_override (void)
   rx_override_options_after_change ();
 
   /* These values are bytes, not log.  */
-  if (align_jumps == 0 && ! optimize_size)
-    align_jumps = ((rx_cpu_type == RX100 || rx_cpu_type == RX200) ? 4 : 8);
-  if (align_loops == 0 && ! optimize_size)
-    align_loops = ((rx_cpu_type == RX100 || rx_cpu_type == RX200) ? 4 : 8);
-  if (align_labels == 0 && ! optimize_size)
-    align_labels = ((rx_cpu_type == RX100 || rx_cpu_type == RX200) ? 4 : 8);
+  if (! optimize_size)
+    {
+      if (flag_align_jumps && !str_align_jumps)
+	str_align_jumps = ((rx_cpu_type == RX100
+			    || rx_cpu_type == RX200) ? "4" : "8");
+      if (flag_align_loops && !str_align_loops)
+	str_align_loops = ((rx_cpu_type == RX100
+			    || rx_cpu_type == RX200) ? "4" : "8");
+      if (flag_align_labels && !str_align_labels)
+	str_align_labels = ((rx_cpu_type == RX100
+			     || rx_cpu_type == RX200) ? "4" : "8");
+    }
 }
 
 \f
diff --git a/gcc/config/rx/rx.h b/gcc/config/rx/rx.h
index a2aa392ce67..2f5a0e94677 100644
--- a/gcc/config/rx/rx.h
+++ b/gcc/config/rx/rx.h
@@ -417,9 +417,9 @@ typedef unsigned int CUMULATIVE_ARGS;
 /* Compute the alignment needed for label X in various situations.
    If the user has specified an alignment then honour that, otherwise
    use rx_align_for_label.  */
-#define JUMP_ALIGN(x)				(align_jumps > 1 ? align_jumps_log : rx_align_for_label (x, 0))
-#define LABEL_ALIGN(x)				(align_labels > 1 ? align_labels_log : rx_align_for_label (x, 3))
-#define LOOP_ALIGN(x)				(align_loops > 1 ? align_loops_log : rx_align_for_label (x, 2))
+#define JUMP_ALIGN(x)				(align_jumps_log > 0 ? align_jumps_log : rx_align_for_label (x, 0))
+#define LABEL_ALIGN(x)				(align_labels_log > 0 ? align_labels_log : rx_align_for_label (x, 3))
+#define LOOP_ALIGN(x)				(align_loops_log > 0 ? align_loops_log : rx_align_for_label (x, 2))
 #define LABEL_ALIGN_AFTER_BARRIER(x)		rx_align_for_label (x, 0)
 
 #define ASM_OUTPUT_MAX_SKIP_ALIGN(STREAM, LOG, MAX_SKIP)	\
diff --git a/gcc/config/sh/sh.c b/gcc/config/sh/sh.c
index ced66408265..e5dc32e45c3 100644
--- a/gcc/config/sh/sh.c
+++ b/gcc/config/sh/sh.c
@@ -1007,29 +1007,29 @@ sh_override_options_after_change (void)
       Aligning all jumps increases the code size, even if it might
       result in slightly faster code.  Thus, it is set to the smallest 
       alignment possible if not specified by the user.  */
-  if (align_loops == 0)
-    align_loops = optimize_size ? 2 : 4;
+  if (flag_align_loops && !str_align_loops)
+    str_align_loops = optimize_size ? "2" : "4";
 
-  if (align_jumps == 0)
-    align_jumps = 2;
-  else if (align_jumps < 2)
-    align_jumps = 2;
+  if (flag_align_jumps && !str_align_jumps)
+    str_align_jumps = "2";
+  else
+    min_align_jumps_log = 1;
 
-  if (align_functions == 0)
-    align_functions = optimize_size ? 2 : 4;
+  if (flag_align_functions && !str_align_functions)
+    str_align_functions = optimize_size ? "2" : "4";
 
   /* The linker relaxation code breaks when a function contains
      alignments that are larger than that at the start of a
      compilation unit.  */
   if (TARGET_RELAX)
     {
-      int min_align = align_loops > align_jumps ? align_loops : align_jumps;
+      parse_alignment_opts ();
+      min_align_functions_log = align_loops_log > align_jumps_log ?
+				align_loops_log : align_jumps_log;
 
       /* Also take possible .long constants / mova tables into account.	*/
-      if (min_align < 4)
-	min_align = 4;
-      if (align_functions < min_align)
-	align_functions = min_align;
+      if (min_align_functions_log < 2)
+	min_align_functions_log = 2;
     }
 }
 \f
diff --git a/gcc/config/spu/spu.c b/gcc/config/spu/spu.c
index 53935795424..4db01e057a5 100644
--- a/gcc/config/spu/spu.c
+++ b/gcc/config/spu/spu.c
@@ -2769,7 +2769,8 @@ static void
 spu_sched_init (FILE *file ATTRIBUTE_UNUSED, int verbose ATTRIBUTE_UNUSED,
 		int max_ready ATTRIBUTE_UNUSED)
 {
-  if (align_labels > 4 || align_loops > 4 || align_jumps > 4)
+  parse_alignment_opts ();
+  if (align_labels_log > 2 || align_loops_log > 2 || align_jumps_log > 2)
     {
       /* When any block might be at least 8-byte aligned, assume they
          will all be at least 8-byte aligned to make sure dual issue
diff --git a/gcc/config/visium/visium.c b/gcc/config/visium/visium.c
index 106cdaf9e3f..37de6249797 100644
--- a/gcc/config/visium/visium.c
+++ b/gcc/config/visium/visium.c
@@ -443,12 +443,12 @@ visium_option_override (void)
 
   /* Align functions on 256-byte (32-quadword) for GR5 and 64-byte (8-quadword)
      boundaries for GR6 so they start a new burst mode window.  */
-  if (align_functions == 0)
+  if (flag_align_functions && !str_align_functions)
     {
       if (visium_cpu == PROCESSOR_GR6)
-	align_functions = 64;
+	str_align_functions = "64";
       else
-	align_functions = 256;
+	str_align_functions = "256";
 
       /* Allow the size of compilation units to double because of inlining.
 	 In practice the global size of the object code is hardly affected
@@ -459,26 +459,25 @@ visium_option_override (void)
     }
 
   /* Likewise for loops.  */
-  if (align_loops == 0)
+  if (flag_align_loops && !str_align_loops)
     {
       if (visium_cpu == PROCESSOR_GR6)
-	align_loops = 64;
+	str_align_loops = "64";
       else
 	{
-	  align_loops = 256;
 	  /* But not if they are too far away from a 256-byte boundary.  */
-	  align_loops_max_skip = 31;
+	  str_align_loops = "256:32";
 	}
     }
 
   /* Align all jumps on quadword boundaries for the burst mode, and even
      on 8-quadword boundaries for GR6 so they start a new window.  */
-  if (align_jumps == 0)
+  if (flag_align_jumps && !str_align_jumps)
     {
       if (visium_cpu == PROCESSOR_GR6)
-	align_jumps = 64;
+	str_align_jumps = "64";
       else
-	align_jumps = 8;
+	str_align_jumps = "8";
     }
 
   /* We register a machine-specific pass.  This pass must be scheduled as
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 65f32d67640..9b97bf0a36f 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -364,9 +364,11 @@ Objective-C and Objective-C++ Dialects}.
 
 @item Optimization Options
 @xref{Optimize Options,,Options that Control Optimization}.
-@gccoptlist{-faggressive-loop-optimizations  -falign-functions[=@var{n}] @gol
--falign-jumps[=@var{n}] @gol
--falign-labels[=@var{n}]  -falign-loops[=@var{n}] @gol
+@gccoptlist{-faggressive-loop-optimizations @gol
+-falign-functions[=@var{n}[:@var{m}[:@var{n2}[:@var{m2}]]]] @gol
+-falign-jumps[=@var{n}[:@var{m}[:@var{n2}[:@var{m2}]]]] @gol
+-falign-labels[=@var{n}[:@var{m}[:@var{n2}[:@var{m2}]]]] @gol
+-falign-loops[=@var{n}[:@var{m}[:@var{n2}[:@var{m2}]]]] @gol
 -fassociative-math  -fauto-profile  -fauto-profile[=@var{path}] @gol
 -fauto-inc-dec  -fbranch-probabilities @gol
 -fbranch-target-load-optimize  -fbranch-target-load-optimize2 @gol
@@ -9175,19 +9177,36 @@ The @option{-fstrict-aliasing} option is enabled at levels
 
 @item -falign-functions
 @itemx -falign-functions=@var{n}
+@itemx -falign-functions=@var{n}:@var{m}
+@itemx -falign-functions=@var{n}:@var{m}:@var{n2}
+@itemx -falign-functions=@var{n}:@var{m}:@var{n2}:@var{m2}
 @opindex falign-functions
 Align the start of functions to the next power-of-two greater than
-@var{n}, skipping up to @var{n} bytes.  For instance,
-@option{-falign-functions=32} aligns functions to the next 32-byte
-boundary, but @option{-falign-functions=24} aligns to the next
-32-byte boundary only if this can be done by skipping 23 bytes or less.
+or equal to @var{n}, skipping up to @var{m}-1 bytes.  This ensures that at least
+the first @var{m} bytes of the function can be fetched by the CPU
+without crossing an @var{n}-byte alignment boundary.
 
-@option{-fno-align-functions} and @option{-falign-functions=1} are
-equivalent and mean that functions are not aligned.
+If @var{m} is not specified, it defaults to @var{n}.
+
+Examples: @option{-falign-functions=32} aligns functions to the next
+32-byte boundary; @option{-falign-functions=24} aligns to the next
+32-byte boundary only if this can be done by skipping 23 bytes or less;
+@option{-falign-functions=32:7} aligns to the next
+32-byte boundary only if this can be done by skipping 6 bytes or less.
+
+The second pair of @var{n2}:@var{m2} values allows you to specify
+a secondary alignment: @option{-falign-functions=64:7:32:3} aligns to
+the next 64-byte boundary if this can be done by skipping 6 bytes or less,
+otherwise aligns to the next 32-byte boundary if this can be done
+by skipping 2 bytes or less.
+If @var{m2} is not specified, it defaults to @var{n2}.
 
 Some assemblers only support this flag when @var{n} is a power of two;
 in that case, it is rounded up.
 
+@option{-fno-align-functions} and @option{-falign-functions=1} are
+equivalent and mean that functions are not aligned.
+
 If @var{n} is not specified or is zero, use a machine-dependent default.
 The maximum allowed @var{n} option value is 65536.
 
@@ -9201,12 +9220,13 @@ skip more bytes than the size of the function.
 
 @item -falign-labels
 @itemx -falign-labels=@var{n}
+@itemx -falign-labels=@var{n}:@var{m}
+@itemx -falign-labels=@var{n}:@var{m}:@var{n2}
+@itemx -falign-labels=@var{n}:@var{m}:@var{n2}:@var{m2}
 @opindex falign-labels
-Align all branch targets to a power-of-two boundary, skipping up to
-@var{n} bytes like @option{-falign-functions}.  This option can easily
-make code slower, because it must insert dummy operations for when the
-branch target is reached in the usual flow of the code.
+Align all branch targets to a power-of-two boundary.
 
+Parameters of this option are analogous to the @option{-falign-functions} option.
 @option{-fno-align-labels} and @option{-falign-labels=1} are
 equivalent and mean that labels are not aligned.
 
@@ -9221,12 +9241,15 @@ Enabled at levels @option{-O2}, @option{-O3}.
 
 @item -falign-loops
 @itemx -falign-loops=@var{n}
+@itemx -falign-loops=@var{n}:@var{m}
+@itemx -falign-loops=@var{n}:@var{m}:@var{n2}
+@itemx -falign-loops=@var{n}:@var{m}:@var{n2}:@var{m2}
 @opindex falign-loops
-Align loops to a power-of-two boundary, skipping up to @var{n} bytes
-like @option{-falign-functions}.  If the loops are
-executed many times, this makes up for any execution of the dummy
-operations.
+Align loops to a power-of-two boundary.  If the loops are executed
+many times, this makes up for any execution of the dummy padding
+instructions.
 
+Parameters of this option are analogous to the @option{-falign-functions} option.
 @option{-fno-align-loops} and @option{-falign-loops=1} are
 equivalent and mean that loops are not aligned.
 The maximum allowed @var{n} option value is 65536.
@@ -9237,12 +9260,15 @@ Enabled at levels @option{-O2}, @option{-O3}.
 
 @item -falign-jumps
 @itemx -falign-jumps=@var{n}
+@itemx -falign-jumps=@var{n}:@var{m}
+@itemx -falign-jumps=@var{n}:@var{m}:@var{n2}
+@itemx -falign-jumps=@var{n}:@var{m}:@var{n2}:@var{m2}
 @opindex falign-jumps
 Align branch targets to a power-of-two boundary, for branch targets
-where the targets can only be reached by jumping, skipping up to @var{n}
-bytes like @option{-falign-functions}.  In this case, no dummy operations
-need be executed.
+where the targets can only be reached by jumping.  In this case,
+no dummy operations need be executed.
 
+Parameters of this option are analogous to the @option{-falign-functions} option.
 @option{-fno-align-jumps} and @option{-falign-jumps=1} are
 equivalent and mean that loops are not aligned.
 
diff --git a/gcc/final.c b/gcc/final.c
index 4c600f0edf2..7a0bf2d17fd 100644
--- a/gcc/final.c
+++ b/gcc/final.c
@@ -2529,6 +2529,12 @@ final_scan_insn_1 (rtx_insn *insn, FILE *file, int optimize_p ATTRIBUTE_UNUSED,
 	    {
 #ifdef ASM_OUTPUT_MAX_SKIP_ALIGN
 	      ASM_OUTPUT_MAX_SKIP_ALIGN (file, align, max_skip);
+	      /* Above, we don't know whether a label, jump or loop
+		 alignment was used.  Conservatively apply
+		 label subalignment, not jump or loop
+		 subalignment (they are almost always larger).  */
+	      ASM_OUTPUT_MAX_SKIP_ALIGN (file, align_labels.levels[1].log,
+					 align_labels.levels[1].maxskip);
 #else
 #ifdef ASM_OUTPUT_ALIGN_WITH_NOP
               ASM_OUTPUT_ALIGN_WITH_NOP (file, align);
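Note on the final.c hunk above: with two alignment levels, the label
now gets two alignment requests in a row, the primary one and then the
secondary one. The sketch below shows what that could look like in the
assembly output, assuming GNU as ".p2align LOG,,MAXSKIP" syntax. The
function names are hypothetical, and dropping the max-skip operand when
it cannot restrict anything is an assumption, not something this patch
shows.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>	/* strcmp, for checking the result.  */

/* Hypothetical stand-in for the patch's align_flags_tuple.  */
struct level { int log; int maxskip; };

/* Format one ".p2align" directive for LV into BUF.  Nothing is
   emitted when LV.log is 0.  Returns the number of characters.  */
static int
format_p2align (char *buf, size_t size, struct level lv)
{
  if (lv.log == 0)
    {
      if (size > 0)
	buf[0] = '\0';
      return 0;
    }
  /* A max skip of 0, or one that already covers the whole padding
     range, is written without the third operand.  */
  if (lv.maxskip == 0 || lv.maxskip >= (1 << lv.log) - 1)
    return snprintf (buf, size, "\t.p2align %d\n", lv.log);
  return snprintf (buf, size, "\t.p2align %d,,%d\n", lv.log, lv.maxskip);
}

/* Format the primary directive followed by the secondary one.  */
static void
format_two_levels (char *buf, size_t size, const struct level lv[2])
{
  int used = format_p2align (buf, size, lv[0]);
  assert (used >= 0 && (size_t) used < size);
  format_p2align (buf + used, size - (size_t) used, lv[1]);
}
```

For the levels corresponding to "16:11:8" (log 4 with max skip 10, then
log 3 with max skip 7) this produces ".p2align 4,,10" followed by
".p2align 3".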
diff --git a/gcc/flags.h b/gcc/flags.h
index d5d4d78e18f..26c7f021f55 100644
--- a/gcc/flags.h
+++ b/gcc/flags.h
@@ -42,19 +42,32 @@ extern bool final_insns_dump_p;
 \f
 /* Other basic status info about current function.  */
 
-/* Target-dependent global state.  */
-struct target_flag_state {
+/* Align flags tuple with alignment in log form and with a maximum skip.  */
+
+struct align_flags_tuple
+{
   /* Values of the -falign-* flags: how much to align labels in code.
-     0 means `use default', 1 means `don't align'.
-     For each variable, there is an _log variant which is the power
-     of two not less than the variable, for .align output.  */
-  int x_align_loops_log;
-  int x_align_loops_max_skip;
-  int x_align_jumps_log;
-  int x_align_jumps_max_skip;
-  int x_align_labels_log;
-  int x_align_labels_max_skip;
-  int x_align_functions_log;
+     log is "align to 2^log" (so 0 means no alignment).
+     maxskip is the maximum allowed amount of padding to insert.  */
+  int log;
+  int maxskip;
+};
+
+/* Up to two levels of alignment for one -falign-foo option.  */
+struct align_flags
+{
+  align_flags_tuple levels[2];
+};
+
+/* Target-dependent global state.  */
+struct target_flag_state
+{
+  /* Each falign-foo can generate up to two levels of alignment:
+     -falign-foo=N:M[:N2:M2] */
+  align_flags x_align_loops;
+  align_flags x_align_jumps;
+  align_flags x_align_labels;
+  align_flags x_align_functions;
 
   /* The excess precision currently in effect.  */
   enum excess_precision x_flag_excess_precision;
@@ -67,20 +80,21 @@ extern struct target_flag_state *this_target_flag_state;
 #define this_target_flag_state (&default_target_flag_state)
 #endif
 
-#define align_loops_log \
-  (this_target_flag_state->x_align_loops_log)
-#define align_loops_max_skip \
-  (this_target_flag_state->x_align_loops_max_skip)
-#define align_jumps_log \
-  (this_target_flag_state->x_align_jumps_log)
-#define align_jumps_max_skip \
-  (this_target_flag_state->x_align_jumps_max_skip)
-#define align_labels_log \
-  (this_target_flag_state->x_align_labels_log)
-#define align_labels_max_skip \
-  (this_target_flag_state->x_align_labels_max_skip)
-#define align_functions_log \
-  (this_target_flag_state->x_align_functions_log)
+#define align_loops		 (this_target_flag_state->x_align_loops)
+#define align_jumps		 (this_target_flag_state->x_align_jumps)
+#define align_labels		 (this_target_flag_state->x_align_labels)
+#define align_functions		 (this_target_flag_state->x_align_functions)
+#define align_loops_log		 (align_loops.levels[0].log)
+#define align_jumps_log		 (align_jumps.levels[0].log)
+#define align_labels_log	 (align_labels.levels[0].log)
+#define align_functions_log      (align_functions.levels[0].log)
+#define align_loops_max_skip     (align_loops.levels[0].maxskip)
+#define align_jumps_max_skip     (align_jumps.levels[0].maxskip)
+#define align_labels_max_skip    (align_labels.levels[0].maxskip)
+#define align_functions_max_skip (align_functions.levels[0].maxskip)
+/* String representations of the above options are available in
+   const char *str_align_foo.  NULL if not set.  */
+
 #define flag_excess_precision \
   (this_target_flag_state->x_flag_excess_precision)
 
diff --git a/gcc/function.c b/gcc/function.c
index 61515e38e47..2aa818be64e 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -4823,6 +4823,9 @@ invoke_set_current_function_hook (tree fndecl)
       targetm.set_current_function (fndecl);
       this_fn_optabs = this_target_optabs;
 
+      /* Initialize global alignment variables after op.  */
+      parse_alignment_opts ();
+
       if (opts != optimization_default_node)
 	{
 	  init_tree_optimization_optabs (opts);
diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c
index 37e63fc2ba8..8ae461f2584 100644
--- a/gcc/ipa-icf.c
+++ b/gcc/ipa-icf.c
@@ -658,7 +658,7 @@ sem_function::equals_wpa (sem_item *item,
   cl_optimization *opt1 = opts_for_fn (decl);
   cl_optimization *opt2 = opts_for_fn (item->decl);
 
-  if (opt1 != opt2 && memcmp (opt1, opt2, sizeof(cl_optimization)))
+  if (opt1 != opt2 && !cl_optimization_option_eq (opt1, opt2))
     {
       if (dump_file && (dump_flags & TDF_DETAILS))
 	{
diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
index d2006fad0ad..27f62056255 100644
--- a/gcc/lto-streamer.h
+++ b/gcc/lto-streamer.h
@@ -919,9 +919,11 @@ void cl_target_option_stream_in (struct data_in *,
 				 struct bitpack_d *,
 				 struct cl_target_option *);
 
-void cl_optimization_stream_out (struct bitpack_d *, struct cl_optimization *);
+void cl_optimization_stream_out (struct output_block *,
+				 struct bitpack_d *, struct cl_optimization *);
 
-void cl_optimization_stream_in (struct bitpack_d *, struct cl_optimization *);
+void cl_optimization_stream_in (struct data_in *,
+				struct bitpack_d *, struct cl_optimization *);
 
 
 
diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index d2ccaf67689..0ecab1f98f8 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -1222,8 +1222,8 @@ compare_tree_sccs_1 (tree t1, tree t2, tree **map)
       return false;
 
   if (CODE_CONTAINS_STRUCT (code, TS_OPTIMIZATION))
-    if (memcmp (TREE_OPTIMIZATION (t1), TREE_OPTIMIZATION (t2),
-		sizeof (struct cl_optimization)) != 0)
+    if (!cl_optimization_option_eq (TREE_OPTIMIZATION (t1),
+				    TREE_OPTIMIZATION (t2)))
       return false;
 
   if (CODE_CONTAINS_STRUCT (code, TS_BINFO))
diff --git a/gcc/optc-save-gen.awk b/gcc/optc-save-gen.awk
index 1a365fc883c..6e33a4320c1 100644
--- a/gcc/optc-save-gen.awk
+++ b/gcc/optc-save-gen.awk
@@ -85,6 +85,7 @@ n_opt_char = 3;
 n_opt_short = 0;
 n_opt_int = 0;
 n_opt_enum = 0;
+n_opt_string = 0;
 n_opt_other = 0;
 var_opt_char[0] = "optimize";
 var_opt_char[1] = "optimize_size";
@@ -123,6 +124,8 @@ for (i = 0; i < n_opts; i++) {
 			else if (otype ~ "^signed +char *$")
 				var_opt_range[name] = "-128, 127"
 		}
+		else if (otype ~ "^const char \\**$")
+			var_opt_string[n_opt_string++] = name;
 		else
 			var_opt_other[n_opt_other++] = name;
 	}
@@ -155,6 +158,10 @@ for (i = 0; i < n_opt_char; i++) {
 	print "  ptr->x_" var_opt_char[i] " = opts->x_" var_opt_char[i] ";";
 }
 
+for (i = 0; i < n_opt_string; i++) {
+	print "  ptr->x_" var_opt_string[i] " = opts->x_" var_opt_string[i] ";";
+}
+
 print "}";
 
 print "";
@@ -183,6 +190,10 @@ for (i = 0; i < n_opt_char; i++) {
 	print "  opts->x_" var_opt_char[i] " = ptr->x_" var_opt_char[i] ";";
 }
 
+for (i = 0; i < n_opt_string; i++) {
+	print "  opts->x_" var_opt_string[i] " = ptr->x_" var_opt_string[i] ";";
+}
+
 print "  targetm.override_options_after_change ();";
 print "}";
 
@@ -239,6 +250,15 @@ for (i = 0; i < n_opt_char; i++) {
 	print "";
 }
 
+for (i = 0; i < n_opt_string; i++) {
+	print "  if (ptr->x_" var_opt_string[i] ")";
+	print "    fprintf (file, \"%*s%s (%s)\\n\",";
+	print "             indent_to, \"\",";
+	print "             \"" var_opt_string[i] "\",";
+	print "             ptr->x_" var_opt_string[i] ");";
+	print "";
+}
+
 print "}";
 
 print "";
@@ -301,6 +321,19 @@ for (i = 0; i < n_opt_char; i++) {
 	print "";
 }
 
+for (i = 0; i < n_opt_string; i++) {
+	name = var_opt_string[i]
+	print "  if (ptr1->x_" name " != ptr2->x_" name "";
+	print "      && (!ptr1->x_" name" || !ptr2->x_" name
+	print "          || strcmp (ptr1->x_" name", ptr2->x_" name ")))";
+	print "    fprintf (file, \"%*s%s (%s/%s)\\n\",";
+	print "             indent_to, \"\",";
+	print "             \"" name "\",";
+	print "             ptr1->x_" name ",";
+	print "             ptr2->x_" name ");";
+	print "";
+}
+
 print "}";
 
 
@@ -766,32 +799,82 @@ for (i = 0; i < n_opt_val; i++) {
 	if (!var_opt_hash[i])
 		continue;
 	name = var_opt_val[i]
-	print "  hstate.add_hwi (ptr->" name");";
+	otype = var_opt_val_type[i];
+	if (otype ~ "^const char \\**$")
+	{
+		print "  if (ptr->" name")";
+		print "    hstate.add (ptr->" name", strlen (ptr->" name"));";
+		print "  else";
+		print "    hstate.add_int (0);";
+	}
+	else
+		print "  hstate.add_hwi (ptr->" name");";
 }
 print "  return hstate.end ();";
 print "}";
 
+print "";
+print "/* Compare two optimization options  */";
+print "bool";
+print "cl_optimization_option_eq (cl_optimization const *ptr1,";
+print "                           cl_optimization const *ptr2)";
+print "{";
+for (i = 0; i < n_opt_val; i++) {
+	if (!var_opt_hash[i])
+		continue;
+	name = var_opt_val[i]
+	otype = var_opt_val_type[i];
+	if (otype ~ "^const char \\**$")
+	{
+		print "  if (ptr1->" name" != ptr2->" name;
+		print "      && (!ptr1->" name" || !ptr2->" name
+		print "          || strcmp (ptr1->" name", ptr2->" name ")))";
+		print "    return false;";
+	}
+	else
+	{
+		print "  if (ptr1->" name" != ptr2->" name ")";
+		print "    return false;";
+	}
+}
+print "  return true;";
+print "}";
+
 print "";
 print "/* Stream out optimization options  */";
 print "void";
-print "cl_optimization_stream_out (struct bitpack_d *bp,";
+print "cl_optimization_stream_out (struct output_block *ob,";
+print "                            struct bitpack_d *bp,";
 print "                            struct cl_optimization *ptr)";
 print "{";
 for (i = 0; i < n_opt_val; i++) {
 	name = var_opt_val[i]
-	print "  bp_pack_value (bp, ptr->" name", 64);";
+	otype = var_opt_val_type[i];
+	if (otype ~ "^const char \\**$")
+		print "  bp_pack_string (ob, bp, ptr->" name", true);";
+	else
+		print "  bp_pack_value (bp, ptr->" name", 64);";
 }
 print "}";
 
 print "";
 print "/* Stream in optimization options  */";
 print "void";
-print "cl_optimization_stream_in (struct bitpack_d *bp,";
-print "                           struct cl_optimization *ptr)";
+print "cl_optimization_stream_in (struct data_in *data_in ATTRIBUTE_UNUSED,";
+print "                           struct bitpack_d *bp ATTRIBUTE_UNUSED,";
+print "                           struct cl_optimization *ptr ATTRIBUTE_UNUSED)";
 print "{";
 for (i = 0; i < n_opt_val; i++) {
 	name = var_opt_val[i]
-	print "  ptr->" name" = (" var_opt_val_type[i] ") bp_unpack_value (bp, 64);";
+	otype = var_opt_val_type[i];
+	if (otype ~ "^const char \\**$")
+	{
+	      print "  ptr->" name" = bp_unpack_string (data_in, bp);";
+	      print "  if (ptr->" name")";
+	      print "    ptr->" name" = xstrdup (ptr->" name");";
+	}
+	else
+	      print "  ptr->" name" = (" var_opt_val_type[i] ") bp_unpack_value (bp, 64);";
 }
 print "}";
 }
diff --git a/gcc/opth-gen.awk b/gcc/opth-gen.awk
index fecd4b8a0b5..8358b9b2b67 100644
--- a/gcc/opth-gen.awk
+++ b/gcc/opth-gen.awk
@@ -308,6 +308,9 @@ print "";
 print "/* Hash optimization from a structure.  */";
 print "extern hashval_t cl_optimization_hash (const struct cl_optimization *);";
 print "";
+print "/* Compare two optimization options.  */";
+print "extern bool cl_optimization_option_eq (cl_optimization const *ptr1, cl_optimization const *ptr2);"
+print "";
 print "/* Generator files may not have access to location_t, and don't need these.  */"
 print "#if defined(UNKNOWN_LOCATION)"
 print "bool                                                                  "
diff --git a/gcc/opts.c b/gcc/opts.c
index 33efcc0d6e7..f5f0947faf3 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -1039,26 +1039,6 @@ finish_options (struct gcc_options *opts, struct gcc_options *opts_set,
   if ((opts->x_flag_sanitize & SANITIZE_KERNEL_ADDRESS) && opts->x_flag_tm)
     sorry ("transactional memory is not supported with "
 	   "%<-fsanitize=kernel-address%>");
-
-  /* Comes from final.c -- no real reason to change it.  */
-#define MAX_CODE_ALIGN 16
-#define MAX_CODE_ALIGN_VALUE (1 << MAX_CODE_ALIGN)
-
-  if (opts->x_align_loops > MAX_CODE_ALIGN_VALUE)
-    error_at (loc, "-falign-loops=%d is not between 0 and %d",
-	      opts->x_align_loops, MAX_CODE_ALIGN_VALUE);
-
-  if (opts->x_align_jumps > MAX_CODE_ALIGN_VALUE)
-    error_at (loc, "-falign-jumps=%d is not between 0 and %d",
-	      opts->x_align_jumps, MAX_CODE_ALIGN_VALUE);
-
-  if (opts->x_align_functions > MAX_CODE_ALIGN_VALUE)
-    error_at (loc, "-falign-functions=%d is not between 0 and %d",
-	      opts->x_align_functions, MAX_CODE_ALIGN_VALUE);
-
-  if (opts->x_align_labels > MAX_CODE_ALIGN_VALUE)
-    error_at (loc, "-falign-labels=%d is not between 0 and %d",
-	      opts->x_align_labels, MAX_CODE_ALIGN_VALUE);
 }
 
 #define LEFT_COLUMN	27
@@ -1779,6 +1759,78 @@ parse_no_sanitize_attribute (char *value)
   return flags;
 }
 
+/* Parse -falign-NAME format for a FLAG value.  Return individual
+   parsed integer values into RESULT_VALUES array.  If REPORT_ERROR is
+   set, print error message at LOC location.  */
+
+bool
+parse_and_check_align_values (const char *flag,
+			      const char *name,
+			      auto_vec<unsigned> &result_values,
+			      bool report_error,
+			      location_t loc)
+{
+  char *str = xstrdup (flag);
+  for (char *p = strtok (str, ":"); p; p = strtok (NULL, ":"))
+    {
+      char *end;
+      int v = strtol (p, &end, 10);
+      if (*end != '\0' || v < 0)
+	{
+	  if (report_error)
+	    error_at (loc, "invalid arguments for %<-falign-%s%> option: %qs",
+		      name, flag);
+
+	  return false;
+	}
+
+      result_values.safe_push ((unsigned)v);
+    }
+
+  free (str);
+
+  /* Check that we have a correct number of values.  */
+#ifdef SUBALIGN_LOG
+  unsigned max_valid_values = 4;
+#else
+  unsigned max_valid_values = 2;
+#endif
+
+  if (result_values.is_empty ()
+      || result_values.length () > max_valid_values)
+    {
+      if (report_error)
+	error_at (loc, "invalid number of arguments for %<-falign-%s%> "
+		  "option: %qs", name, flag);
+      return false;
+    }
+
+  /* Comes from final.c -- no real reason to change it.  */
+#define MAX_CODE_ALIGN 16
+#define MAX_CODE_ALIGN_VALUE (1 << MAX_CODE_ALIGN)
+
+  for (unsigned i = 0; i < result_values.length (); i++)
+    if (result_values[i] > MAX_CODE_ALIGN_VALUE)
+      {
+	if (report_error)
+	  error_at (loc, "%<-falign-%s%> is not between 0 and %d",
+		    name, MAX_CODE_ALIGN_VALUE);
+	return false;
+      }
+
+  return true;
+}
+
+/* Check that alignment value FLAG for -falign-NAME is valid at a given
+   location LOC.  */
+
+static void
+check_alignment_argument (location_t loc, const char *flag, const char *name)
+{
+  auto_vec<unsigned> align_result;
+  parse_and_check_align_values (flag, name, align_result, true, loc);
+}
+
 /* Handle target- and language-independent options.  Return zero to
    generate an "unknown option" message.  Only options that need
    extra handling need to be listed here; if you simply want
@@ -2498,6 +2550,22 @@ common_handle_option (struct gcc_options *opts,
       opts->x_flag_ipa_icf_variables = value;
       break;
 
+    case OPT_falign_loops_:
+      check_alignment_argument (loc, arg, "loops");
+      break;
+
+    case OPT_falign_jumps_:
+      check_alignment_argument (loc, arg, "jumps");
+      break;
+
+    case OPT_falign_labels_:
+      check_alignment_argument (loc, arg, "labels");
+      break;
+
+    case OPT_falign_functions_:
+      check_alignment_argument (loc, arg, "functions");
+      break;
+
     default:
       /* If the flag was handled in a standard way, assume the lack of
 	 processing here is intentional.  */
diff --git a/gcc/opts.h b/gcc/opts.h
index 484fc1c39d9..8a2ca5a06ab 100644
--- a/gcc/opts.h
+++ b/gcc/opts.h
@@ -439,4 +439,11 @@ extern const char *candidates_list_and_hint (const char *arg, char *&str,
 					     const auto_vec <const char *> &
 					     candidates);
 
+
+extern bool parse_and_check_align_values (const char *flag,
+					  const char *name,
+					  auto_vec<unsigned> &result_values,
+					  bool report_error,
+					  location_t loc);
+
 #endif
diff --git a/gcc/testsuite/gcc.dg/pr84100.c b/gcc/testsuite/gcc.dg/pr84100.c
index 86fbc4f7a3e..676d0c78dea 100644
--- a/gcc/testsuite/gcc.dg/pr84100.c
+++ b/gcc/testsuite/gcc.dg/pr84100.c
@@ -8,7 +8,7 @@ __attribute__((optimize ("align-loops=16", "align-jumps=16",
 			 "align-labels=16", "align-functions=16")))
 void
 foo (void)
-{			/* { dg-bogus "bad option" } */
+{			/* { dg-warning "bad option" } */
   for (int i = 0; i < 1024; ++i)
     bar ();
 }
diff --git a/gcc/testsuite/gcc.target/i386/falign-functions-2.c b/gcc/testsuite/gcc.target/i386/falign-functions-2.c
new file mode 100644
index 00000000000..26d505e3bea
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/falign-functions-2.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -falign-functions=64:8" } */
+
+void
+a (void)
+{
+}
+
+#pragma GCC push_options
+#pragma GCC optimize "align-functions=128:100"
+void b (void)
+{
+}
+#pragma GCC pop_options
+
+void
+__attribute__((optimize("-falign-functions=88:88:32")))
+c (void)
+{
+}
+
+void
+d (void)
+{
+}
+
+/* { dg-final { scan-assembler-times ".p2align 6,,7" 2 } } */
+/* { dg-final { scan-assembler-times ".p2align 7,,99" 1 } } */
+/* { dg-final { scan-assembler-times ".p2align 7,,87" 1 } } */
+/* { dg-final { scan-assembler-times ".p2align 5" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/falign-functions.c b/gcc/testsuite/gcc.target/i386/falign-functions.c
new file mode 100644
index 00000000000..27daa1d0e6b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/falign-functions.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -falign-functions=64:8" } */
+/* { dg-final { scan-assembler ".p2align 6,,7" } } */
+
+void
+test_func (void)
+{
+}
diff --git a/gcc/toplev.c b/gcc/toplev.c
index b066bcc7229..5d3b46f000e 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -1200,29 +1200,119 @@ target_supports_section_anchors_p (void)
   return true;
 }
 
-/* Default the align_* variables to 1 if they're still unset, and
-   set up the align_*_log variables.  */
+/* Parse "N[:M][:...]" into struct align_flags A.
+   VALUES contains parsed values (in reverse order), all processed
+   values are popped.  */
+
 static void
-init_alignments (void)
+read_log_maxskip (auto_vec<unsigned> &values, align_flags_tuple *a)
 {
-  if (align_loops <= 0)
-    align_loops = 1;
-  if (align_loops_max_skip > align_loops)
-    align_loops_max_skip = align_loops - 1;
-  align_loops_log = floor_log2 (align_loops * 2 - 1);
-  if (align_jumps <= 0)
-    align_jumps = 1;
-  if (align_jumps_max_skip > align_jumps)
-    align_jumps_max_skip = align_jumps - 1;
-  align_jumps_log = floor_log2 (align_jumps * 2 - 1);
-  if (align_labels <= 0)
-    align_labels = 1;
-  align_labels_log = floor_log2 (align_labels * 2 - 1);
-  if (align_labels_max_skip > align_labels)
-    align_labels_max_skip = align_labels - 1;
-  if (align_functions <= 0)
-    align_functions = 1;
-  align_functions_log = floor_log2 (align_functions * 2 - 1);
+  unsigned n = values.pop ();
+  if (n != 0)
+    a->log = floor_log2 (n * 2 - 1);
+  if (values.is_empty ())
+    a->maxskip = n ? n - 1 : 0;
+  else
+    {
+      unsigned m = values.pop ();
+      if (m > n)
+	m = n;
+      /* -falign-foo=N:M means M-1 max bytes of padding, not M.  */
+      if (m > 0)
+	m--;
+      a->maxskip = m;
+    }
+}
+
+/* Parse "N[:M[:N2[:M2]]]" string FLAG into a pair of struct align_flags.  */
+
+static void
+parse_N_M (const char *flag, align_flags &a, unsigned int min_align_log)
+{
+  if (flag)
+    {
+      static hash_map <nofree_string_hash, align_flags> cache;
+      align_flags *entry = cache.get (flag);
+      if (entry)
+	{
+	  a = *entry;
+	  return;
+	}
+
+      auto_vec<unsigned> result_values;
+      bool r = parse_and_check_align_values (flag, NULL, result_values, false,
+					     UNKNOWN_LOCATION);
+      if (!r)
+	return;
+
+      /* Reverse values for easier manipulation.  */
+      result_values.reverse ();
+
+      read_log_maxskip (result_values, &a.levels[0]);
+      if (!result_values.is_empty ())
+	read_log_maxskip (result_values, &a.levels[1]);
+#ifdef SUBALIGN_LOG
+      else
+	{
+	  /* N2[:M2] is not specified.  This arch has a default for N2.
+	     Before -falign-foo=N:M:N2:M2 was introduced, x86 had a tweak.
+	     -falign-functions=N with N > 8 was adding secondary alignment.
+	     -falign-functions=10 was emitting this before every function:
+			.p2align 4,,9
+			.p2align 3
+	     Now this behavior (and more) can be explicitly requested:
+	     -falign-functions=16:10:8
+	     Retain old behavior if N2 is missing: */
+
+	  int align = 1 << a.levels[0].log;
+	  int subalign = 1 << SUBALIGN_LOG;
+
+	  if (a.levels[0].log > SUBALIGN_LOG
+	      && a.levels[0].maxskip >= subalign - 1)
+	    {
+	      /* Set N2 unless subalign can never have any effect.  */
+	      if (align > a.levels[0].maxskip + 1)
+		a.levels[1].log = SUBALIGN_LOG;
+	    }
+	}
+#endif
+
+      /* Cache seen value.  */
+      cache.put (flag, a);
+    }
+  else
+    {
+      /* Reset values to zero.  */
+      for (unsigned i = 0; i < 2; i++)
+	{
+	  a.levels[i].log = 0;
+	  a.levels[i].maxskip = 0;
+	}
+    }
+
+  if ((unsigned int)a.levels[0].log < min_align_log)
+    {
+      a.levels[0].log = min_align_log;
+      a.levels[0].maxskip = (1 << min_align_log) - 1;
+    }
+}
+
+/* Minimum alignment requirements, if arch has them.  */
+
+unsigned int min_align_loops_log = 0;
+unsigned int min_align_jumps_log = 0;
+unsigned int min_align_labels_log = 0;
+unsigned int min_align_functions_log = 0;
+
+/* Process -falign-foo=N[:M[:N2[:M2]]] options.  */
+
+void
+parse_alignment_opts (void)
+{
+  parse_N_M (str_align_loops, align_loops, min_align_loops_log);
+  parse_N_M (str_align_jumps, align_jumps, min_align_jumps_log);
+  parse_N_M (str_align_labels, align_labels, min_align_labels_log);
+  parse_N_M (str_align_functions, align_functions, min_align_functions_log);
 }
 
 /* Process the options that have been parsed.  */
@@ -1768,9 +1858,6 @@ process_options (void)
 static void
 backend_init_target (void)
 {
-  /* Initialize alignment variables.  */
-  init_alignments ();
-
   /* This depends on stack_pointer_rtx.  */
   init_fake_stack_mems ();
 
diff --git a/gcc/toplev.h b/gcc/toplev.h
index c97375b1ca1..98f3ceea872 100644
--- a/gcc/toplev.h
+++ b/gcc/toplev.h
@@ -93,6 +93,13 @@ extern bool set_src_pwd		       (const char *);
 extern HOST_WIDE_INT get_random_seed (bool);
 extern void set_random_seed (const char *);
 
+extern unsigned int min_align_loops_log;
+extern unsigned int min_align_jumps_log;
+extern unsigned int min_align_labels_log;
+extern unsigned int min_align_functions_log;
+
+extern void parse_alignment_opts (void);
+
 extern void initialize_rtl (void);
 
 #endif /* ! GCC_TOPLEV_H */
diff --git a/gcc/tree-streamer-in.c b/gcc/tree-streamer-in.c
index 912fa5f0f02..add3b32ba7b 100644
--- a/gcc/tree-streamer-in.c
+++ b/gcc/tree-streamer-in.c
@@ -531,7 +531,7 @@ streamer_read_tree_bitfields (struct lto_input_block *ib,
     unpack_ts_translation_unit_decl_value_fields (data_in, &bp, expr);
 
   if (CODE_CONTAINS_STRUCT (code, TS_OPTIMIZATION))
-    cl_optimization_stream_in (&bp, TREE_OPTIMIZATION (expr));
+    cl_optimization_stream_in (data_in, &bp, TREE_OPTIMIZATION (expr));
 
   if (CODE_CONTAINS_STRUCT (code, TS_BINFO))
     {
diff --git a/gcc/tree-streamer-out.c b/gcc/tree-streamer-out.c
index 03145b4cf58..f70d8215288 100644
--- a/gcc/tree-streamer-out.c
+++ b/gcc/tree-streamer-out.c
@@ -466,7 +466,7 @@ streamer_write_tree_bitfields (struct output_block *ob, tree expr)
     pack_ts_translation_unit_decl_value_fields (ob, &bp, expr);
 
   if (CODE_CONTAINS_STRUCT (code, TS_OPTIMIZATION))
-    cl_optimization_stream_out (&bp, TREE_OPTIMIZATION (expr));
+    cl_optimization_stream_out (ob, &bp, TREE_OPTIMIZATION (expr));
 
   if (CODE_CONTAINS_STRUCT (code, TS_BINFO))
     bp_pack_var_len_unsigned (&bp, vec_safe_length (BINFO_BASE_ACCESSES (expr)));
diff --git a/gcc/tree.c b/gcc/tree.c
index e8dc42557d0..966b6bd5e3c 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -11946,30 +11946,18 @@ cl_option_hasher::equal (tree x, tree y)
 {
   const_tree const xt = x;
   const_tree const yt = y;
-  const char *xp;
-  const char *yp;
-  size_t len;
 
   if (TREE_CODE (xt) != TREE_CODE (yt))
     return 0;
 
   if (TREE_CODE (xt) == OPTIMIZATION_NODE)
-    {
-      xp = (const char *)TREE_OPTIMIZATION (xt);
-      yp = (const char *)TREE_OPTIMIZATION (yt);
-      len = sizeof (struct cl_optimization);
-    }
-
+    return cl_optimization_option_eq (TREE_OPTIMIZATION (xt),
+				      TREE_OPTIMIZATION (yt));
   else if (TREE_CODE (xt) == TARGET_OPTION_NODE)
-    {
-      return cl_target_option_eq (TREE_TARGET_OPTION (xt),
-				  TREE_TARGET_OPTION (yt));
-    }
-
+    return cl_target_option_eq (TREE_TARGET_OPTION (xt),
+				TREE_TARGET_OPTION (yt));
   else
     gcc_unreachable ();
-
-  return (memcmp (xp, yp, len) == 0);
 }
 
 /* Build an OPTIMIZATION_NODE based on the options in OPTS.  */
diff --git a/gcc/varasm.c b/gcc/varasm.c
index 6b9f87b203f..62ddf770077 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -1809,17 +1809,20 @@ assemble_start_function (tree decl, const char *fnname)
       && optimize_function_for_speed_p (cfun))
     {
 #ifdef ASM_OUTPUT_MAX_SKIP_ALIGN
-      int align_log = align_functions_log;
+      int align_log = align_functions.levels[0].log;
 #endif
-      int max_skip = align_functions - 1;
+      int max_skip = align_functions.levels[0].maxskip;
       if (flag_limit_function_alignment && crtl->max_insn_address > 0
 	  && max_skip >= crtl->max_insn_address)
 	max_skip = crtl->max_insn_address - 1;
 
 #ifdef ASM_OUTPUT_MAX_SKIP_ALIGN
       ASM_OUTPUT_MAX_SKIP_ALIGN (asm_out_file, align_log, max_skip);
+      if (max_skip == align_functions.levels[0].maxskip)
+	ASM_OUTPUT_MAX_SKIP_ALIGN (asm_out_file, align_functions.levels[1].log,
+				   align_functions.levels[1].maxskip);
 #else
-      ASM_OUTPUT_ALIGN (asm_out_file, align_functions_log);
+      ASM_OUTPUT_ALIGN (asm_out_file, align_functions.levels[0].log);
 #endif
     }
 

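The N[:M] to (log, maxskip) conversion done by read_log_maxskip in the
patch above can be tried in isolation. What follows is only a minimal
sketch under stated assumptions: struct tuple, ceil_log2_u and
to_log_maxskip are hypothetical names that are not part of the patch, and
ceil_log2_u stands in for GCC's floor_log2 (n * 2 - 1).

```c
#include <assert.h>

/* Hypothetical stand-in for align_flags_tuple from the patch.  */
struct tuple
{
  int log;
  int maxskip;
};

/* Smallest L with (1 << L) >= n.  For n > 0 this gives the same result
   as floor_log2 (n * 2 - 1) in the patch.  */
static int
ceil_log2_u (unsigned n)
{
  assert (n > 0);
  int log = 0;
  while ((1u << log) < n)
    log++;
  return log;
}

/* Convert one N[:M] pair.  HAVE_M is zero when M was omitted.  */
static struct tuple
to_log_maxskip (unsigned n, int have_m, unsigned m)
{
  struct tuple t = { 0, 0 };
  if (n != 0)
    t.log = ceil_log2_u (n);
  if (!have_m)
    t.maxskip = n ? (int) n - 1 : 0;
  else
    {
      if (m > n)
	m = n;
      /* N:M means at most M-1 bytes of padding, not M.  */
      if (m > 0)
	m--;
      t.maxskip = (int) m;
    }
  return t;
}
```

With these definitions, 64:8 maps to log 6 and maxskip 7, and 88:88 maps
to log 7 and maxskip 87, which lines up with the ".p2align 6,,7" and
".p2align 7,,87" patterns scanned for in falign-functions-2.c.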

* Re: [PATCH 3/3] Extend -falign-FOO=N to N[,M[,N2[,M2]]]
  2017-04-17 20:02   ` Sandra Loosemore
@ 2017-04-18 18:30     ` Denys Vlasenko
  0 siblings, 0 replies; 26+ messages in thread
From: Denys Vlasenko @ 2017-04-18 18:30 UTC (permalink / raw)
  To: Sandra Loosemore, gcc-patches; +Cc: Andrew Pinski, Uros Bizjak, Bernd Schmidt

On 04/17/2017 09:54 PM, Sandra Loosemore wrote:
>>   @item -falign-functions
>>   @itemx -falign-functions=@var{n}
>> +@itemx -falign-functions=@var{n},@var{m}
>> +@itemx -falign-functions=@var{n},@var{m},@var{n2}
>> +@itemx -falign-functions=@var{n},@var{m},@var{n2},@var{m2}
>>   @opindex falign-functions
>>   Align the start of functions to the next power-of-two greater than
>> -@var{n}, skipping up to @var{n} bytes.  For instance,
>> -@option{-falign-functions=32} aligns functions to the next 32-byte
>> -boundary, but @option{-falign-functions=24} aligns to the next
>> -32-byte boundary only if this can be done by skipping 23 bytes or less.
>> +@var{n}, skipping up to @var{m}-1 bytes.  Such alignment ensures that
>> +after branch, at least @var{m} bytes can be fetched by the CPU
>> +without crossing specified alignment boundary.
>
> This last sentence doesn't make much sense to me.  How about something like
>
> This ensures that at least the first @var{m} bytes of the function can be fetched by the CPU without crossing an @var{n}-byte alignment boundary.
>
>> -@option{-fno-align-functions} and @option{-falign-functions=1} are
>> -equivalent and mean that functions are not aligned.
>> +If @var{m} is not specified, it defaults to @var{n}.
>> +Same for @var{m2} and @var{n2}.
>
> You haven't said what m2 and n2 are yet.  The last sentence should be moved to the end of this paragraph instead.
>
>> +The second pair of @var{n2},@var{m2} values allows to have a secondary
>> +alignment: @option{-falign-functions=64,7,32,3} aligns to the next
>> +64-byte boundary if this can be done by skipping 6 bytes or less,
>> +otherwise aligns to the next 32-byte boundary if this can be done
>> +by skipping 2 bytes or less.
>
> Also please
> s/allows to have/allows you to specify/
>
>> @@ -8697,12 +8716,13 @@ skip more bytes than the size of the function.
>>
>>   @item -falign-labels
>>   @itemx -falign-labels=@var{n}
>> +@itemx -falign-labels=@var{n},@var{m}
>> +@itemx -falign-labels=@var{n},@var{m},@var{n2}
>> +@itemx -falign-labels=@var{n},@var{m},@var{n2},@var{m2}
>>   @opindex falign-labels
>> -Align all branch targets to a power-of-two boundary, skipping up to
>> -@var{n} bytes like @option{-falign-functions}.  This option can easily
>> -make code slower, because it must insert dummy operations for when the
>> -branch target is reached in the usual flow of the code.
>> +Align all branch targets to a power-of-two boundary.
>>
>> +Parameters of this option are analogous to @option{-falign-functions} option.
>
> s/to @option/to the @option/
>
> Here and for -falign-loops and -falign-jumps too.

Thanks for the review.

I'm sending version 8 which has all of your changes incorporated.


* Re: [PATCH 3/3] Extend -falign-FOO=N to N[,M[,N2[,M2]]]
  2017-04-17 16:20 ` [PATCH 3/3] Extend -falign-FOO=N to N[,M[,N2[,M2]]] Denys Vlasenko
@ 2017-04-17 20:02   ` Sandra Loosemore
  2017-04-18 18:30     ` Denys Vlasenko
  0 siblings, 1 reply; 26+ messages in thread
From: Sandra Loosemore @ 2017-04-17 20:02 UTC (permalink / raw)
  To: Denys Vlasenko, gcc-patches; +Cc: Andrew Pinski, Uros Bizjak, Bernd Schmidt

On 04/17/2017 09:57 AM, Denys Vlasenko wrote:
> Index: gcc/doc/invoke.texi
> ===================================================================
> --- gcc/doc/invoke.texi	(revision 246948)
> +++ gcc/doc/invoke.texi	(working copy)
> @@ -351,9 +351,11 @@ Objective-C and Objective-C++ Dialects}.
>
>   @item Optimization Options
>   @xref{Optimize Options,,Options that Control Optimization}.
> -@gccoptlist{-faggressive-loop-optimizations  -falign-functions[=@var{n}] @gol
> --falign-jumps[=@var{n}] @gol
> --falign-labels[=@var{n}]  -falign-loops[=@var{n}] @gol
> +@gccoptlist{-faggressive-loop-optimizations @gol
> +-falign-functions[=@var{n}[,@var{m},[@var{n2}[,@var{m2}]]]] @gol
> +-falign-jumps[=@var{n}[,@var{m},[@var{n2}[,@var{m2}]]]] @gol
> +-falign-labels[=@var{n}[,@var{m},[@var{n2}[,@var{m2}]]]] @gol
> +-falign-loops[=@var{n}[,@var{m},[@var{n2}[,@var{m2}]]]] @gol
>   -fassociative-math  -fauto-profile  -fauto-profile[=@var{path}] @gol
>   -fauto-inc-dec  -fbranch-probabilities @gol
>   -fbranch-target-load-optimize  -fbranch-target-load-optimize2 @gol
> @@ -8672,19 +8674,36 @@ The @option{-fstrict-overflow} option is enabled a
>
>   @item -falign-functions
>   @itemx -falign-functions=@var{n}
> +@itemx -falign-functions=@var{n},@var{m}
> +@itemx -falign-functions=@var{n},@var{m},@var{n2}
> +@itemx -falign-functions=@var{n},@var{m},@var{n2},@var{m2}
>   @opindex falign-functions
>   Align the start of functions to the next power-of-two greater than
> -@var{n}, skipping up to @var{n} bytes.  For instance,
> -@option{-falign-functions=32} aligns functions to the next 32-byte
> -boundary, but @option{-falign-functions=24} aligns to the next
> -32-byte boundary only if this can be done by skipping 23 bytes or less.
> +@var{n}, skipping up to @var{m}-1 bytes.  Such alignment ensures that
> +after branch, at least @var{m} bytes can be fetched by the CPU
> +without crossing specified alignment boundary.

This last sentence doesn't make much sense to me.  How about something like

This ensures that at least the first @var{m} bytes of the function can 
be fetched by the CPU without crossing an @var{n}-byte alignment boundary.

> -@option{-fno-align-functions} and @option{-falign-functions=1} are
> -equivalent and mean that functions are not aligned.
> +If @var{m} is not specified, it defaults to @var{n}.
> +Same for @var{m2} and @var{n2}.

You haven't said what m2 and n2 are yet.  The last sentence should be 
moved to the end of this paragraph instead.

> +The second pair of @var{n2},@var{m2} values allows to have a secondary
> +alignment: @option{-falign-functions=64,7,32,3} aligns to the next
> +64-byte boundary if this can be done by skipping 6 bytes or less,
> +otherwise aligns to the next 32-byte boundary if this can be done
> +by skipping 2 bytes or less.

Also please
s/allows to have/allows you to specify/

> @@ -8697,12 +8716,13 @@ skip more bytes than the size of the function.
>
>   @item -falign-labels
>   @itemx -falign-labels=@var{n}
> +@itemx -falign-labels=@var{n},@var{m}
> +@itemx -falign-labels=@var{n},@var{m},@var{n2}
> +@itemx -falign-labels=@var{n},@var{m},@var{n2},@var{m2}
>   @opindex falign-labels
> -Align all branch targets to a power-of-two boundary, skipping up to
> -@var{n} bytes like @option{-falign-functions}.  This option can easily
> -make code slower, because it must insert dummy operations for when the
> -branch target is reached in the usual flow of the code.
> +Align all branch targets to a power-of-two boundary.
>
> +Parameters of this option are analogous to @option{-falign-functions} option.

s/to @option/to the @option/

Here and for -falign-loops and -falign-jumps too.

-Sandra


* [PATCH 3/3] Extend -falign-FOO=N to N[,M[,N2[,M2]]]
  2017-04-17 15:57 [PATCH 0/3] Extend -falign-FOO=N to N[,M[,N2[,M2]]] version 7 Denys Vlasenko
@ 2017-04-17 16:20 ` Denys Vlasenko
  2017-04-17 20:02   ` Sandra Loosemore
  0 siblings, 1 reply; 26+ messages in thread
From: Denys Vlasenko @ 2017-04-17 16:20 UTC (permalink / raw)
  To: gcc-patches; +Cc: Denys Vlasenko, Andrew Pinski, Uros Bizjak, Bernd Schmidt

falign-functions=N is too simplistic.

Ingo Molnar ran some tests, and it seems that on the latest x86 CPUs, 64-byte
alignment of functions runs fastest (he tried many other possibilities):
this way, after a call the CPU can fetch a lot of insns in the first cacheline fill.

However, developers are less than thrilled by the idea of slam-dunk
aligning everything to 64 bytes. Too much waste:
        On 05/20/2015 02:47 AM, Linus Torvalds wrote:
        > At the same time, I have to admit that I abhor a 64-byte function
        > alignment, when we have a fair number of functions that are (much)
        > smaller than that.
        >
        > Is there some way to get gcc to take the size of the function into
        > account? Because aligning a 16-byte or 32-byte function on a 64-byte
        > alignment is just criminally nasty and wasteful.

This change makes it possible to align functions to 64-byte boundaries *if*
this does not introduce a huge amount of padding.

Example syntax is -falign-functions=64,9: "align to 64 by skipping up to
9 bytes (not inclusive)". IOW: "after a call insn, CPU will always be able
to fetch at least 9 bytes of insns".
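
To make the N,M semantics above concrete, here is a minimal C sketch of the
padding decision for "align to N, skipping fewer than M bytes". The helper
name pad_bytes and its parameters are hypothetical and are not part of the
patch; it only models what the assembler ends up doing with the directive.

```c
#include <assert.h>

/* Hypothetical helper, not part of the patch: number of padding bytes
   inserted at address ADDR for "align to ALIGN (a power of two), but
   only if fewer than M bytes must be skipped".  */
static unsigned
pad_bytes (unsigned long addr, unsigned align, unsigned m)
{
  assert (align != 0 && (align & (align - 1)) == 0);

  /* Bytes needed to reach the next ALIGN-byte boundary.  */
  unsigned need = (unsigned) (-addr & (align - 1));

  /* M is "not inclusive": at most M-1 bytes may be skipped.  */
  return need < m ? need : 0;
}
```

With -falign-functions=64,9, a function that would start 8 bytes short of a
64-byte boundary gets padded, while one that is 9 bytes short is left where
it is, so at least 9 bytes of insns sit before the boundary.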

x86 had a tweak: -falign-functions=N with N > 8 was adding secondary alignment.
For example, falign-functions=10 was emitting this before every function:
	.p2align 4,,9
	.p2align 3
This tweak was removed by the previous patch. Now it is reinstated
by the logic that if falign-functions=N[,M] is specified and N > 8,
then the default value of N2 is 8, not 1. Now this can be suppressed by
falign-functions=N,M,1 - which wasn't possible before.
In general, the optional N2,M2 pair can be used to generate any secondary
alignment the user wants.
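
The defaulting rules described above can be sketched in C as follows. This
is an illustration only, not the code in toplev.c or varasm.c: the names
ceil_log2_u, emit_p2align and format_align_directives are hypothetical, the
N2 default of 8 for N > 8 follows the x86 behavior described in this mail,
and emitting nothing for M = 1 is an assumption of the sketch.

```c
#include <assert.h>
#include <stdio.h>

/* Smallest LOG such that (1 << LOG) >= N; 0 for N <= 1.  */
static unsigned
ceil_log2_u (unsigned n)
{
  unsigned log = 0;
  while (log < 31 && (1u << log) < n)
    log++;
  return log;
}

/* Append the directive for the pair N,M to BUF at offset POS and
   return the new offset.  */
static size_t
emit_p2align (char *buf, size_t size, size_t pos, unsigned n, unsigned m)
{
  unsigned log = ceil_log2_u (n);
  unsigned maxskip;
  int len;

  /* Nothing to do without alignment; M of 1 permits no skipping, so
     this sketch emits nothing for it (an assumption).  */
  if (log == 0 || m <= 1)
    return pos;

  maxskip = m - 1;
  if (maxskip >= (1u << log) - 1)
    len = snprintf (buf + pos, size - pos, "\t.p2align %u\n", log);
  else
    len = snprintf (buf + pos, size - pos, "\t.p2align %u,,%u\n",
		    log, maxskip);
  assert (len > 0 && (size_t) len < size - pos);
  return pos + (size_t) len;
}

/* Format the directives for ARG, the text after "-falign-functions=".  */
static void
format_align_directives (const char *arg, char *buf, size_t size)
{
  unsigned n = 0, m = 0, n2 = 0, m2 = 0;
  int got = sscanf (arg, "%u,%u,%u,%u", &n, &m, &n2, &m2);
  size_t pos = 0;

  assert (size > 0);
  buf[0] = '\0';
  if (got < 1)
    return;
  if (got < 2 || m == 0)
    m = n;			/* M defaults to N.  */
  if (got < 3)
    n2 = n > 8 ? 8 : 1;		/* x86 tweak described above.  */
  if (got < 4 || m2 == 0)
    m2 = n2;			/* M2 defaults to N2.  */

  pos = emit_p2align (buf, size, pos, n, m);
  emit_p2align (buf, size, pos, n2, m2);
}
```

Passing "10" reproduces the two directives quoted above, and passing
"16,16,1" shows how an explicit N2 of 1 suppresses the secondary alignment.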

Subalignment for loops/jumps/labels is trickier to fully implement.
The implementation in this patch uses falign-labels subalignment values
for any of these three types of labels - but only if "main" alignment
triggers. With -O2 defaults, this provides a matching behavior on x86:
loops and jumps are aligned (to 16-32 bytes depending on selected CPU)
and subaligned to 8 bytes. Labels are not aligned.
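
A rough C sketch of the "only if main alignment triggers" rule, for
illustration. The struct align_pair type and emit_label_align function are
hypothetical names, not the ones used in final.c, and the check that the
secondary alignment is smaller than the primary one is an assumption of
this sketch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical log/maxskip pair, mirroring the "two pairs of
   log/maxskip values" mentioned in the ChangeLog.  */
struct align_pair
{
  int log;
  int maxskip;
};

/* Write the directives for one loop/jump/label target into BUF and
   return how many were written.  PRIMARY is the target's own main
   alignment; LABEL_SUB is the secondary pair from -falign-labels.  */
static int
emit_label_align (char *buf, size_t size,
		  struct align_pair primary, struct align_pair label_sub)
{
  int len;

  assert (size > 0);
  buf[0] = '\0';

  /* Main alignment did not trigger: no subalignment either.  */
  if (primary.log <= 0)
    return 0;

  len = snprintf (buf, size, "\t.p2align %d,,%d\n",
		  primary.log, primary.maxskip);
  assert (len > 0 && (size_t) len < size);

  /* Assumption: a secondary alignment is only useful when it is
     smaller than the primary one.  */
  if (label_sub.log > 0 && label_sub.log < primary.log)
    {
      int len2 = snprintf (buf + len, size - (size_t) len,
			   "\t.p2align %d\n", label_sub.log);
      assert (len2 > 0 && (size_t) len2 < size - (size_t) len);
      return 2;
    }
  return 1;
}
```

With a loop aligned to 16 bytes (log 4, maxskip 10) and a label subalignment
of 8 bytes (log 3), the loop head gets both directives, while a plain label
whose primary log is 0 gets none.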

Testing:

Tested that with -falign-functions=N (tried 8, 15, 16, 17...) the alignment
directives are the same before and after the patch.
Tested that -falign-functions=N,N (two equal parameters) works exactly
like -falign-functions=N.

No change from past behavior:
Tested that "-falign-functions" uses an arch-dependent alignment.
Tested that "-O2" uses an arch-dependent alignment.
Tested that "-O2 -falign-functions=N" uses explicitly given alignment.

2016-09-27  Denys Vlasenko  <dvlasenk@redhat.com>

    * doc/invoke.texi: Update option documentation.
    * common.opt (-falign-functions=): Accept a string instead of integer.
    (-falign-jumps=): Likewise.
    (-falign-labels=): Likewise.
    (-falign-loops=): Likewise.
    * flags.h (struct target_flag_state): Revamp how alignment data is stored:
    for each of four alignment types, store two pairs of log/maxskip values.
    * toplev.c (read_uint): New function.
    (read_log_maxskip): New function.
    (parse_N_M): New function.
    (init_alignments): Rename to parse_alignment_opts, make globally visible.
    Set align_foo[0/1].log/maxskip from
    specified falign-FOO=N[,M[,N2[,M2]]] options.
    * toplev.h (parse_alignment_opts): Now globally visible.
    (min_align_loops_log): Variable which holds arch override for minimal
    alignment of loops.
    (min_align_jumps_log): Likewise for jumps.
    (min_align_labels_log): Likewise for labels.
    (min_align_functions_log): Likewise for functions.
    * varasm.c (assemble_start_function): Call two ASM_OUTPUT_MAX_SKIP_ALIGN
    macros, first for N,M and second time for N2,M2 from
    falign-functions=N,M,N2,M2. This generates 0, 1, or 2 align directives.
    * final.c (final_scan_insn): If a label, jump or loop target
    is being aligned, emit a secondary alignment directive.
    * config/i386/i386.c (struct ptt): Change foo_align members from
    integers to strings. Add align_label member. Set it to "0,0,8"
    on the processors which have maxskips > 7 for loops and jumps -
    this preserves existing behavior of adding 8-byte subalign.
    * config/i386/i386.c (processor_target_table): Likewise.
    * config/aarch64/aarch64-protos.h (struct tune_params):
    Change foo_align members from integers to strings.
    * config/aarch64/aarch64.c (<cpu>_tunings):
    Change foo_align field values from integers to strings.
    * config/arm/arm.c (arm_override_options_after_change_1):
    Fix if() condition to detect that -falign-functions is specified,
    change code which sets arch-default alignment.
    * config/i386/i386.c (ix86_default_align): Likewise.
    * config/rs6000/rs6000.c (rs6000_option_override_internal): Likewise.
    * config/mips/mips.c (mips_set_compression_mode): Likewise.
    * config/alpha/alpha.c (alpha_override_options_after_change): Likewise.
    * config/visium/visium.c (visium_option_override): Likewise.
    * config/sh/sh.c (sh_override_options_after_change): Likewise.
    * config/rx/rx.c (rx_option_override): Likewise.
    * config/rx/rx.h (JUMP_ALIGN): Use new variables to access alignment
    information.
    (LABEL_ALIGN): Likewise.
    (LOOP_ALIGN): Likewise.
    * config/spu/spu.c (spu_sched_init): Call parse_alignment_opts(), then
    use new variables to access alignment information.
    * config/sh/sh.c (sh_override_options_after_change): Likewise.
    * testsuite/gcc.target/i386/falign-functions.c: New file.

Index: gcc/common.opt
===================================================================
--- gcc/common.opt	(revision 246948)
+++ gcc/common.opt	(working copy)
@@ -921,35 +921,35 @@ Common Report Var(flag_aggressive_loop_optimizatio
 Aggressively optimize loops using language constraints.
 
 falign-functions
-Common Report Var(align_functions,0) Optimization UInteger
+Common Report Var(flag_align_functions) Optimization
 Align the start of functions.
 
 falign-functions=
-Common RejectNegative Joined UInteger Var(align_functions)
+Common RejectNegative Joined Var(str_align_functions)
 
 flimit-function-alignment
 Common Report Var(flag_limit_function_alignment) Optimization Init(0)
 
 falign-jumps
-Common Report Var(align_jumps,0) Optimization UInteger
+Common Report Var(flag_align_jumps) Optimization
 Align labels which are only reached by jumping.
 
 falign-jumps=
-Common RejectNegative Joined UInteger Var(align_jumps)
+Common RejectNegative Joined Var(str_align_jumps)
 
 falign-labels
-Common Report Var(align_labels,0) Optimization UInteger
+Common Report Var(flag_align_labels) Optimization
 Align all labels.
 
 falign-labels=
-Common RejectNegative Joined UInteger Var(align_labels)
+Common RejectNegative Joined Var(str_align_labels)
 
 falign-loops
-Common Report Var(align_loops,0) Optimization UInteger
+Common Report Var(flag_align_loops) Optimization
 Align the start of loops.
 
 falign-loops=
-Common RejectNegative Joined UInteger Var(align_loops)
+Common RejectNegative Joined Var(str_align_loops)
 
 fargument-alias
 Common Ignore
Index: gcc/config/aarch64/aarch64-protos.h
===================================================================
--- gcc/config/aarch64/aarch64-protos.h	(revision 246948)
+++ gcc/config/aarch64/aarch64-protos.h	(working copy)
@@ -214,9 +214,9 @@ struct tune_params
   int memmov_cost;
   int issue_rate;
   unsigned int fusible_ops;
-  int function_align;
-  int jump_align;
-  int loop_align;
+  const char *function_align;
+  const char *jump_align;
+  const char *loop_align;
   int int_reassoc_width;
   int fp_reassoc_width;
   int vec_reassoc_width;
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c	(revision 246948)
+++ gcc/config/aarch64/aarch64.c	(working copy)
@@ -537,9 +537,9 @@ static const struct tune_params generic_tunings =
   4, /* memmov_cost  */
   2, /* issue_rate  */
   (AARCH64_FUSE_AES_AESMC), /* fusible_ops  */
-  8,	/* function_align.  */
-  8,	/* jump_align.  */
-  4,	/* loop_align.  */
+  "8",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "4",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -563,9 +563,9 @@ static const struct tune_params cortexa35_tunings
   1, /* issue_rate  */
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
-  16,	/* function_align.  */
-  8,	/* jump_align.  */
-  8,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "8",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -589,9 +589,9 @@ static const struct tune_params cortexa53_tunings
   2, /* issue_rate  */
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
-  16,	/* function_align.  */
-  8,	/* jump_align.  */
-  8,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "8",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -615,9 +615,9 @@ static const struct tune_params cortexa57_tunings
   3, /* issue_rate  */
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
-  16,	/* function_align.  */
-  8,	/* jump_align.  */
-  8,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "8",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -641,9 +641,9 @@ static const struct tune_params cortexa72_tunings
   3, /* issue_rate  */
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
-  16,	/* function_align.  */
-  8,	/* jump_align.  */
-  8,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "8",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -667,9 +667,9 @@ static const struct tune_params cortexa73_tunings
   2, /* issue_rate.  */
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
-  16,	/* function_align.  */
-  8,	/* jump_align.  */
-  8,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "8",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -692,9 +692,9 @@ static const struct tune_params exynosm1_tunings =
   4,	/* memmov_cost  */
   3,	/* issue_rate  */
   (AARCH64_FUSE_AES_AESMC), /* fusible_ops  */
-  4,	/* function_align.  */
-  4,	/* jump_align.  */
-  4,	/* loop_align.  */
+  "4",	/* function_align.  */
+  "4",	/* jump_align.  */
+  "4",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -717,9 +717,9 @@ static const struct tune_params thunderx_tunings =
   6, /* memmov_cost  */
   2, /* issue_rate  */
   AARCH64_FUSE_CMP_BRANCH, /* fusible_ops  */
-  8,	/* function_align.  */
-  8,	/* jump_align.  */
-  8,	/* loop_align.  */
+  "8",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "8",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -742,9 +742,9 @@ static const struct tune_params xgene1_tunings =
   6, /* memmov_cost  */
   4, /* issue_rate  */
   AARCH64_FUSE_NOTHING, /* fusible_ops  */
-  16,	/* function_align.  */
-  8,	/* jump_align.  */
-  16,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "16",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -768,9 +768,9 @@ static const struct tune_params qdf24xx_tunings =
   4, /* issue_rate  */
   (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK), /* fuseable_ops  */
-  16,	/* function_align.  */
-  8,	/* jump_align.  */
-  16,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "16",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -793,9 +793,9 @@ static const struct tune_params thunderx2t99_tunin
   4, /* memmov_cost.  */
   4, /* issue_rate.  */
   (AARCH64_FUSE_CMP_BRANCH | AARCH64_FUSE_AES_AESMC), /* fusible_ops  */
-  16,	/* function_align.  */
-  8,	/* jump_align.  */
-  16,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "16",	/* loop_align.  */
   3,	/* int_reassoc_width.  */
   2,	/* fp_reassoc_width.  */
   2,	/* vec_reassoc_width.  */
Index: gcc/config/alpha/alpha.c
===================================================================
--- gcc/config/alpha/alpha.c	(revision 246948)
+++ gcc/config/alpha/alpha.c	(working copy)
@@ -609,13 +609,13 @@ alpha_override_options_after_change (void)
   /* ??? Kludge these by not doing anything if we don't optimize.  */
   if (optimize > 0)
     {
-      if (align_loops <= 0)
-	align_loops = 16;
-      if (align_jumps <= 0)
-	align_jumps = 16;
+      if (flag_align_loops && !str_align_loops)
+	str_align_loops = "16";
+      if (flag_align_jumps && !str_align_jumps)
+	str_align_jumps = "16";
     }
-  if (align_functions <= 0)
-    align_functions = 16;
+  if (flag_align_functions && !str_align_functions)
+    str_align_functions = "16";
 }
 \f
 /* Returns 1 if VALUE is a mask that contains full bytes of zero or ones.  */
Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 246948)
+++ gcc/config/arm/arm.c	(working copy)
@@ -2902,9 +2902,10 @@ static GTY(()) tree init_optimize;
 static void
 arm_override_options_after_change_1 (struct gcc_options *opts)
 {
-  if (opts->x_align_functions <= 0)
-    opts->x_align_functions = TARGET_THUMB_P (opts->x_target_flags)
-      && opts->x_optimize_size ? 2 : 4;
+  /* -falign-functions without argument: supply one */
+  if (opts->x_flag_align_functions && !opts->x_str_align_functions)
+    opts->x_str_align_functions = TARGET_THUMB_P (opts->x_target_flags)
+      && opts->x_optimize_size ? "2" : "4";
 }
 
 /* Implement targetm.override_options_after_change.  */
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 246948)
+++ gcc/config/i386/i386.c	(working copy)
@@ -2636,45 +2636,47 @@ struct ptt
 {
   const char *const name;			/* processor name  */
   const struct processor_costs *cost;		/* Processor costs */
-  const int align_loop;				/* Default alignments.  */
-  const int align_loop_max_skip;
-  const int align_jump;
-  const int align_jump_max_skip;
-  const int align_func;
+  const char *const align_loop;			/* Default alignments.  */
+  const char *const align_jump;
+  const char *const align_label;
+  const char *const align_func;
 };
 
 /* This table must be in sync with enum processor_type in i386.h.  */ 
 static const struct ptt processor_target_table[PROCESSOR_max] =
 {
-  {"generic", &generic_cost, 16, 10, 16, 10, 16},
-  {"i386", &i386_cost, 4, 3, 4, 3, 4},
-  {"i486", &i486_cost, 16, 15, 16, 15, 16},
-  {"pentium", &pentium_cost, 16, 7, 16, 7, 16},
-  {"lakemont", &lakemont_cost, 16, 7, 16, 7, 16},
-  {"pentiumpro", &pentiumpro_cost, 16, 15, 16, 10, 16},
-  {"pentium4", &pentium4_cost, 0, 0, 0, 0, 0},
-  {"nocona", &nocona_cost, 0, 0, 0, 0, 0},
-  {"core2", &core_cost, 16, 10, 16, 10, 16},
-  {"nehalem", &core_cost, 16, 10, 16, 10, 16},
-  {"sandybridge", &core_cost, 16, 10, 16, 10, 16},
-  {"haswell", &core_cost, 16, 10, 16, 10, 16},
-  {"bonnell", &atom_cost, 16, 15, 16, 7, 16},
-  {"silvermont", &slm_cost, 16, 15, 16, 7, 16},
-  {"knl", &slm_cost, 16, 15, 16, 7, 16},
-  {"skylake-avx512", &core_cost, 16, 10, 16, 10, 16},
-  {"intel", &intel_cost, 16, 15, 16, 7, 16},
-  {"geode", &geode_cost, 0, 0, 0, 0, 0},
-  {"k6", &k6_cost, 32, 7, 32, 7, 32},
-  {"athlon", &athlon_cost, 16, 7, 16, 7, 16},
-  {"k8", &k8_cost, 16, 7, 16, 7, 16},
-  {"amdfam10", &amdfam10_cost, 32, 24, 32, 7, 32},
-  {"bdver1", &bdver1_cost, 16, 10, 16, 7, 11},
-  {"bdver2", &bdver2_cost, 16, 10, 16, 7, 11},
-  {"bdver3", &bdver3_cost, 16, 10, 16, 7, 11},
-  {"bdver4", &bdver4_cost, 16, 10, 16, 7, 11},
-  {"btver1", &btver1_cost, 16, 10, 16, 7, 11},
-  {"btver2", &btver2_cost, 16, 10, 16, 7, 11},
-  {"znver1", &znver1_cost, 16, 15, 16, 15, 16}
+/* The "0,0,8" label alignment specified for some processors generates
+   secondary 8-byte alignment only for those label/jump/loop targets
+   which have primary alignment.  */
+  {"generic",    &generic_cost,   "16,11,8", "16,11,8", "0,0,8", "16"},
+  {"i386",       &i386_cost,      "4",       "4",       NULL,    "4" },
+  {"i486",       &i486_cost,      "16,16,8", "16,16,8", "0,0,8", "16"},
+  {"pentium",    &pentium_cost,   "16,8,8",  "16,8,8",  "0,0,8", "16"},
+  {"lakemont",   &lakemont_cost,  "16,8,8",  "16,8,8",  "0,0,8", "16"},
+  {"pentiumpro", &pentiumpro_cost,"16,16,8", "16,11,8", "0,0,8", "16"},
+  {"pentium4",   &pentium4_cost,  NULL,      NULL,      NULL,    NULL},
+  {"nocona",     &nocona_cost,    NULL,      NULL,      NULL,    NULL},
+  {"core2",      &core_cost,      "16,11,8", "16,11,8", "0,0,8", "16"},
+  {"nehalem",    &core_cost,      "16,11,8", "16,11,8", "0,0,8", "16"},
+  {"sandybridge",&core_cost,      "16,11,8", "16,11,8", "0,0,8", "16"},
+  {"haswell",    &core_cost,      "16,11,8", "16,11,8", "0,0,8", "16"},
+  {"bonnell",    &atom_cost,      "16,16,8", "16,8,8",  "0,0,8", "16"},
+  {"silvermont", &slm_cost,       "16,16,8", "16,8,8",  "0,0,8", "16"},
+  {"knl",        &slm_cost,       "16,16,8", "16,8,8",  "0,0,8", "16"},
+  {"skylake-avx512", &core_cost,  "16,11,8", "16,11,8", "0,0,8", "16"},
+  {"intel",      &intel_cost,     "16,16,8", "16,8,8",  "0,0,8", "16"},
+  {"geode",      &geode_cost,     NULL,      NULL,      NULL,    NULL},
+  {"k6",         &k6_cost,        "32,8,8",  "32,8,8",  "0,0,8", "32"},
+  {"athlon",     &athlon_cost,    "16,8,8",  "16,8,8",  "0,0,8", "16"},
+  {"k8",         &k8_cost,        "16,8,8",  "16,8,8",  "0,0,8", "16"},
+  {"amdfam10",   &amdfam10_cost,  "32,25,8", "32,8,8",  "0,0,8", "32"},
+  {"bdver1",     &bdver1_cost,    "16,11,8", "16,8,8",  "0,0,8", "11"},
+  {"bdver2",     &bdver2_cost,    "16,11,8", "16,8,8",  "0,0,8", "11"},
+  {"bdver3",     &bdver3_cost,    "16,11,8", "16,8,8",  "0,0,8", "11"},
+  {"bdver4",     &bdver4_cost,    "16,11,8", "16,8,8",  "0,0,8", "11"},
+  {"btver1",     &btver1_cost,    "16,11,8", "16,8,8",  "0,0,8", "11"},
+  {"btver2",     &btver2_cost,    "16,11,8", "16,8,8",  "0,0,8", "11"},
+  {"znver1",     &znver1_cost,    "16,16,8", "16,16,8", "0,0,8", "16"}
 };
 \f
 static unsigned int
@@ -4856,20 +4858,23 @@ set_ix86_tune_features (enum processor_type ix86_t
 static void
 ix86_default_align (struct gcc_options *opts)
 {
-  if (opts->x_align_loops == 0)
+  /* -falign-foo without argument: supply one */
+  if (opts->x_flag_align_loops && !opts->x_str_align_loops)
     {
-      opts->x_align_loops = processor_target_table[ix86_tune].align_loop;
-      align_loops_max_skip = processor_target_table[ix86_tune].align_loop_max_skip;
+      opts->x_str_align_loops = processor_target_table[ix86_tune].align_loop;
     }
-  if (opts->x_align_jumps == 0)
+  if (opts->x_flag_align_jumps && !opts->x_str_align_jumps)
     {
-      opts->x_align_jumps = processor_target_table[ix86_tune].align_jump;
-      align_jumps_max_skip = processor_target_table[ix86_tune].align_jump_max_skip;
+      opts->x_str_align_jumps = processor_target_table[ix86_tune].align_jump;
     }
-  if (opts->x_align_functions == 0)
+  if (opts->x_flag_align_labels && !opts->x_str_align_labels)
     {
-      opts->x_align_functions = processor_target_table[ix86_tune].align_func;
+      opts->x_str_align_labels = processor_target_table[ix86_tune].align_label;
     }
+  if (opts->x_flag_align_functions && !opts->x_str_align_functions)
+    {
+      opts->x_str_align_functions = processor_target_table[ix86_tune].align_func;
+    }
 }
 
 /* Implement TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE hook.  */
Index: gcc/config/mips/mips.c
===================================================================
--- gcc/config/mips/mips.c	(revision 246948)
+++ gcc/config/mips/mips.c	(working copy)
@@ -488,9 +488,9 @@ unsigned int mips_base_compression_flags;
 static int mips_base_schedule_insns; /* flag_schedule_insns */
 static int mips_base_reorder_blocks_and_partition; /* flag_reorder... */
 static int mips_base_move_loop_invariants; /* flag_move_loop_invariants */
-static int mips_base_align_loops; /* align_loops */
-static int mips_base_align_jumps; /* align_jumps */
-static int mips_base_align_functions; /* align_functions */
+static const char *mips_base_align_loops; /* align_loops */
+static const char *mips_base_align_jumps; /* align_jumps */
+static const char *mips_base_align_functions; /* align_functions */
 
 /* Index [M][R] is true if register R is allowed to hold a value of mode M.  */
 bool mips_hard_regno_mode_ok[(int) MAX_MACHINE_MODE][FIRST_PSEUDO_REGISTER];
@@ -19453,12 +19453,12 @@ mips_set_compression_mode (unsigned int compressio
       /* Provide default values for align_* for 64-bit targets.  */
       if (TARGET_64BIT)
 	{
-	  if (align_loops == 0)
-	    align_loops = 8;
-	  if (align_jumps == 0)
-	    align_jumps = 8;
-	  if (align_functions == 0)
-	    align_functions = 8;
+	  if (flag_align_loops && !str_align_loops)
+	    str_align_loops = "8";
+	  if (flag_align_jumps && !str_align_jumps)
+	    str_align_jumps = "8";
+	  if (flag_align_functions && !str_align_functions)
+	    str_align_functions = "8";
 	}
 
       targetm.min_anchor_offset = -32768;
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 246948)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -5218,29 +5218,25 @@ rs6000_option_override_internal (bool global_init_
 	  if (rs6000_cpu == PROCESSOR_TITAN
 	      || rs6000_cpu == PROCESSOR_CELL)
 	    {
-	      if (align_functions <= 0)
-		align_functions = 8;
-	      if (align_jumps <= 0)
-		align_jumps = 8;
-	      if (align_loops <= 0)
-		align_loops = 8;
+	      if (flag_align_functions && !str_align_functions)
+		str_align_functions = "8";
+	      if (flag_align_jumps && !str_align_jumps)
+		str_align_jumps = "8";
+	      if (flag_align_loops && !str_align_loops)
+		str_align_loops = "8";
 	    }
 	  if (rs6000_align_branch_targets)
 	    {
-	      if (align_functions <= 0)
-		align_functions = 16;
-	      if (align_jumps <= 0)
-		align_jumps = 16;
-	      if (align_loops <= 0)
+	      if (flag_align_functions && !str_align_functions)
+		str_align_functions = "16";
+	      if (flag_align_jumps && !str_align_jumps)
+		str_align_jumps = "16";
+	      if (flag_align_loops && !str_align_loops)
 		{
 		  can_override_loop_align = 1;
-		  align_loops = 16;
+		  str_align_loops = "16";
 		}
 	    }
-	  if (align_jumps_max_skip <= 0)
-	    align_jumps_max_skip = 15;
-	  if (align_loops_max_skip <= 0)
-	    align_loops_max_skip = 15;
 	}
 
       /* Arrange to save and restore machine status around nested functions.  */
Index: gcc/config/rx/rx.c
===================================================================
--- gcc/config/rx/rx.c	(revision 246948)
+++ gcc/config/rx/rx.c	(working copy)
@@ -2820,12 +2820,15 @@ rx_option_override (void)
   rx_override_options_after_change ();
 
   /* These values are bytes, not log.  */
-  if (align_jumps == 0 && ! optimize_size)
-    align_jumps = ((rx_cpu_type == RX100 || rx_cpu_type == RX200) ? 4 : 8);
-  if (align_loops == 0 && ! optimize_size)
-    align_loops = ((rx_cpu_type == RX100 || rx_cpu_type == RX200) ? 4 : 8);
-  if (align_labels == 0 && ! optimize_size)
-    align_labels = ((rx_cpu_type == RX100 || rx_cpu_type == RX200) ? 4 : 8);
+  if (! optimize_size)
+    {
+      if (flag_align_jumps && !str_align_jumps)
+	str_align_jumps = ((rx_cpu_type == RX100 || rx_cpu_type == RX200) ? "4" : "8");
+      if (flag_align_loops && !str_align_loops)
+	str_align_loops = ((rx_cpu_type == RX100 || rx_cpu_type == RX200) ? "4" : "8");
+      if (flag_align_labels && !str_align_labels)
+	str_align_labels = ((rx_cpu_type == RX100 || rx_cpu_type == RX200) ? "4" : "8");
+    }
 }
 
 \f
Index: gcc/config/rx/rx.h
===================================================================
--- gcc/config/rx/rx.h	(revision 246948)
+++ gcc/config/rx/rx.h	(working copy)
@@ -432,9 +432,9 @@ typedef unsigned int CUMULATIVE_ARGS;
 /* Compute the alignment needed for label X in various situations.
    If the user has specified an alignment then honour that, otherwise
    use rx_align_for_label.  */
-#define JUMP_ALIGN(x)				(align_jumps > 1 ? align_jumps_log : rx_align_for_label (x, 0))
-#define LABEL_ALIGN(x)				(align_labels > 1 ? align_labels_log : rx_align_for_label (x, 3))
-#define LOOP_ALIGN(x)				(align_loops > 1 ? align_loops_log : rx_align_for_label (x, 2))
+#define JUMP_ALIGN(x)				(align_jumps_log > 0 ? align_jumps_log : rx_align_for_label (x, 0))
+#define LABEL_ALIGN(x)				(align_labels_log > 0 ? align_labels_log : rx_align_for_label (x, 3))
+#define LOOP_ALIGN(x)				(align_loops_log > 0 ? align_loops_log : rx_align_for_label (x, 2))
 #define LABEL_ALIGN_AFTER_BARRIER(x)		rx_align_for_label (x, 0)
 
 #define ASM_OUTPUT_MAX_SKIP_ALIGN(STREAM, LOG, MAX_SKIP)	\
Index: gcc/config/sh/sh.c
===================================================================
--- gcc/config/sh/sh.c	(revision 246948)
+++ gcc/config/sh/sh.c	(working copy)
@@ -984,16 +984,16 @@ sh_override_options_after_change (void)
       Aligning all jumps increases the code size, even if it might
       result in slightly faster code.  Thus, it is set to the smallest 
       alignment possible if not specified by the user.  */
-  if (align_loops == 0)
-    align_loops = optimize_size ? 2 : 4;
+  if (flag_align_loops && !str_align_loops)
+    str_align_loops = optimize_size ? "2" : "4";
 
-  if (align_jumps == 0)
-    align_jumps = 2;
-  else if (align_jumps < 2)
-    align_jumps = 2;
+  if (flag_align_jumps && !str_align_jumps)
+    str_align_jumps = "2";
+  else
+    min_align_jumps_log = 1;
 
-  if (align_functions == 0)
-    align_functions = optimize_size ? 2 : 4;
+  if (flag_align_functions && !str_align_functions)
+    str_align_functions = optimize_size ? "2" : "4";
 
   /* The linker relaxation code breaks when a function contains
      alignments that are larger than that at the start of a
@@ -1000,13 +1000,13 @@ sh_override_options_after_change (void)
      compilation unit.  */
   if (TARGET_RELAX)
     {
-      int min_align = align_loops > align_jumps ? align_loops : align_jumps;
+      parse_alignment_opts ();
+      min_align_functions_log = align_loops_log > align_jumps_log ?
+				align_loops_log : align_jumps_log;
 
       /* Also take possible .long constants / mova tables into account.	*/
-      if (min_align < 4)
-	min_align = 4;
-      if (align_functions < min_align)
-	align_functions = min_align;
+      if (min_align_functions_log < 2)
+	min_align_functions_log = 2;
     }
 }
 \f
Index: gcc/config/spu/spu.c
===================================================================
--- gcc/config/spu/spu.c	(revision 246948)
+++ gcc/config/spu/spu.c	(working copy)
@@ -2767,7 +2767,8 @@ static void
 spu_sched_init (FILE *file ATTRIBUTE_UNUSED, int verbose ATTRIBUTE_UNUSED,
 		int max_ready ATTRIBUTE_UNUSED)
 {
-  if (align_labels > 4 || align_loops > 4 || align_jumps > 4)
+  parse_alignment_opts ();
+  if (align_labels_log > 2 || align_loops_log > 2 || align_jumps_log > 2)
     {
       /* When any block might be at least 8-byte aligned, assume they
          will all be at least 8-byte aligned to make sure dual issue
Index: gcc/config/visium/visium.c
===================================================================
--- gcc/config/visium/visium.c	(revision 246948)
+++ gcc/config/visium/visium.c	(working copy)
@@ -413,12 +413,12 @@ visium_option_override (void)
 
   /* Align functions on 256-byte (32-quadword) for GR5 and 64-byte (8-quadword)
      boundaries for GR6 so they start a new burst mode window.  */
-  if (align_functions == 0)
+  if (flag_align_functions && !str_align_functions)
     {
       if (visium_cpu == PROCESSOR_GR6)
-	align_functions = 64;
+	str_align_functions = "64";
       else
-	align_functions = 256;
+	str_align_functions = "256";
 
       /* Allow the size of compilation units to double because of inlining.
 	 In practice the global size of the object code is hardly affected
@@ -429,26 +429,25 @@ visium_option_override (void)
     }
 
   /* Likewise for loops.  */
-  if (align_loops == 0)
+  if (flag_align_loops && !str_align_loops)
     {
       if (visium_cpu == PROCESSOR_GR6)
-	align_loops = 64;
+	str_align_loops = "64";
       else
 	{
-	  align_loops = 256;
 	  /* But not if they are too far away from a 256-byte boundary.  */
-	  align_loops_max_skip = 31;
+	  str_align_loops = "256,32";
 	}
     }
 
   /* Align all jumps on quadword boundaries for the burst mode, and even
      on 8-quadword boundaries for GR6 so they start a new window.  */
-  if (align_jumps == 0)
+  if (flag_align_jumps && !str_align_jumps)
     {
       if (visium_cpu == PROCESSOR_GR6)
-	align_jumps = 64;
+	str_align_jumps = "64";
       else
-	align_jumps = 8;
+	str_align_jumps = "8";
     }
 
   /* We register a machine-specific pass.  This pass must be scheduled as
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 246948)
+++ gcc/doc/invoke.texi	(working copy)
@@ -351,9 +351,11 @@ Objective-C and Objective-C++ Dialects}.
 
 @item Optimization Options
 @xref{Optimize Options,,Options that Control Optimization}.
-@gccoptlist{-faggressive-loop-optimizations  -falign-functions[=@var{n}] @gol
--falign-jumps[=@var{n}] @gol
--falign-labels[=@var{n}]  -falign-loops[=@var{n}] @gol
+@gccoptlist{-faggressive-loop-optimizations @gol
+-falign-functions[=@var{n}[,@var{m}[,@var{n2}[,@var{m2}]]]] @gol
+-falign-jumps[=@var{n}[,@var{m}[,@var{n2}[,@var{m2}]]]] @gol
+-falign-labels[=@var{n}[,@var{m}[,@var{n2}[,@var{m2}]]]] @gol
+-falign-loops[=@var{n}[,@var{m}[,@var{n2}[,@var{m2}]]]] @gol
 -fassociative-math  -fauto-profile  -fauto-profile[=@var{path}] @gol
 -fauto-inc-dec  -fbranch-probabilities @gol
 -fbranch-target-load-optimize  -fbranch-target-load-optimize2 @gol
@@ -8672,19 +8674,36 @@ The @option{-fstrict-overflow} option is enabled a
 
 @item -falign-functions
 @itemx -falign-functions=@var{n}
+@itemx -falign-functions=@var{n},@var{m}
+@itemx -falign-functions=@var{n},@var{m},@var{n2}
+@itemx -falign-functions=@var{n},@var{m},@var{n2},@var{m2}
 @opindex falign-functions
 Align the start of functions to the next power-of-two greater than
-@var{n}, skipping up to @var{n} bytes.  For instance,
-@option{-falign-functions=32} aligns functions to the next 32-byte
-boundary, but @option{-falign-functions=24} aligns to the next
-32-byte boundary only if this can be done by skipping 23 bytes or less.
+@var{n}, skipping up to @var{m}-1 bytes.  Such alignment ensures that
+after branch, at least @var{m} bytes can be fetched by the CPU
+without crossing specified alignment boundary.
 
-@option{-fno-align-functions} and @option{-falign-functions=1} are
-equivalent and mean that functions are not aligned.
+If @var{m} is not specified, it defaults to @var{n}.
+Same for @var{m2} and @var{n2}.
 
+Examples: @option{-falign-functions=32} aligns functions to the next
+32-byte boundary, @option{-falign-functions=24} aligns to the next
+32-byte boundary only if this can be done by skipping 23 bytes or less,
+@option{-falign-functions=32,7} aligns to the next
+32-byte boundary only if this can be done by skipping 6 bytes or less.
+
+The second pair of @var{n2},@var{m2} values allows to have a secondary
+alignment: @option{-falign-functions=64,7,32,3} aligns to the next
+64-byte boundary if this can be done by skipping 6 bytes or less,
+otherwise aligns to the next 32-byte boundary if this can be done
+by skipping 2 bytes or less.
+
 Some assemblers only support this flag when @var{n} is a power of two;
 in that case, it is rounded up.
 
+@option{-fno-align-functions} and @option{-falign-functions=1} are
+equivalent and mean that functions are not aligned.
+
 If @var{n} is not specified or is zero, use a machine-dependent default.
 
 Enabled at levels @option{-O2}, @option{-O3}.
@@ -8697,12 +8716,13 @@ skip more bytes than the size of the function.
 
 @item -falign-labels
 @itemx -falign-labels=@var{n}
+@itemx -falign-labels=@var{n},@var{m}
+@itemx -falign-labels=@var{n},@var{m},@var{n2}
+@itemx -falign-labels=@var{n},@var{m},@var{n2},@var{m2}
 @opindex falign-labels
-Align all branch targets to a power-of-two boundary, skipping up to
-@var{n} bytes like @option{-falign-functions}.  This option can easily
-make code slower, because it must insert dummy operations for when the
-branch target is reached in the usual flow of the code.
+Align all branch targets to a power-of-two boundary.
 
+Parameters of this option are analogous to @option{-falign-functions} option.
 @option{-fno-align-labels} and @option{-falign-labels=1} are
 equivalent and mean that labels are not aligned.
 
@@ -8716,12 +8736,15 @@ Enabled at levels @option{-O2}, @option{-O3}.
 
 @item -falign-loops
 @itemx -falign-loops=@var{n}
+@itemx -falign-loops=@var{n},@var{m}
+@itemx -falign-loops=@var{n},@var{m},@var{n2}
+@itemx -falign-loops=@var{n},@var{m},@var{n2},@var{m2}
 @opindex falign-loops
-Align loops to a power-of-two boundary, skipping up to @var{n} bytes
-like @option{-falign-functions}.  If the loops are
-executed many times, this makes up for any execution of the dummy
-operations.
+Align loops to a power-of-two boundary.  If the loops are executed
+many times, this makes up for any execution of the dummy padding
+instructions.
 
+Parameters of this option are analogous to @option{-falign-functions} option.
 @option{-fno-align-loops} and @option{-falign-loops=1} are
 equivalent and mean that loops are not aligned.
 
@@ -8731,12 +8754,15 @@ Enabled at levels @option{-O2}, @option{-O3}.
 
 @item -falign-jumps
 @itemx -falign-jumps=@var{n}
+@itemx -falign-jumps=@var{n},@var{m}
+@itemx -falign-jumps=@var{n},@var{m},@var{n2}
+@itemx -falign-jumps=@var{n},@var{m},@var{n2},@var{m2}
 @opindex falign-jumps
 Align branch targets to a power-of-two boundary, for branch targets
-where the targets can only be reached by jumping, skipping up to @var{n}
-bytes like @option{-falign-functions}.  In this case, no dummy operations
-need be executed.
+where the targets can only be reached by jumping.  In this case,
+no dummy operations need be executed.
 
+Parameters of this option are analogous to @option{-falign-functions} option.
 @option{-fno-align-jumps} and @option{-falign-jumps=1} are
 equivalent and mean that loops are not aligned.
 
Index: gcc/final.c
===================================================================
--- gcc/final.c	(revision 246948)
+++ gcc/final.c	(working copy)
@@ -2429,6 +2429,12 @@ final_scan_insn (rtx_insn *insn, FILE *file, int o
 	    {
 #ifdef ASM_OUTPUT_MAX_SKIP_ALIGN
 	      ASM_OUTPUT_MAX_SKIP_ALIGN (file, align, max_skip);
+	      /* Above, we don't know whether a label, jump or loop
+		 alignment was used. Conservatively apply
+		 label subalignment, not jump or loop
+		 subalignment (they are almost always larger).  */
+	      ASM_OUTPUT_MAX_SKIP_ALIGN (file, align_labels[1].log,
+					 align_labels[1].maxskip);
 #else
 #ifdef ASM_OUTPUT_ALIGN_WITH_NOP
               ASM_OUTPUT_ALIGN_WITH_NOP (file, align);
Index: gcc/flags.h
===================================================================
--- gcc/flags.h	(revision 246948)
+++ gcc/flags.h	(working copy)
@@ -43,19 +43,22 @@ extern bool final_insns_dump_p;
 /* Other basic status info about current function.  */
 
 /* Target-dependent global state.  */
-struct target_flag_state {
+struct align_flags {
   /* Values of the -falign-* flags: how much to align labels in code.
-     0 means `use default', 1 means `don't align'.
-     For each variable, there is an _log variant which is the power
-     of two not less than the variable, for .align output.  */
-  int x_align_loops_log;
-  int x_align_loops_max_skip;
-  int x_align_jumps_log;
-  int x_align_jumps_max_skip;
-  int x_align_labels_log;
-  int x_align_labels_max_skip;
-  int x_align_functions_log;
+     log is "align to 2^log" (so 0 means no alignment).
+     maxskip is the maximum allowed amount of padding to insert. */
+  int log;
+  int maxskip;
+};
 
+struct target_flag_state {
+  /* Each falign-foo can generate up to two levels of alignment:
+     -falign-foo=N,M[,N2,M2] */
+  struct align_flags x_align_loops[2];
+  struct align_flags x_align_jumps[2];
+  struct align_flags x_align_labels[2];
+  struct align_flags x_align_functions[2];
+
   /* The excess precision currently in effect.  */
   enum excess_precision x_flag_excess_precision;
 };
@@ -67,20 +70,21 @@ extern struct target_flag_state *this_target_flag_
 #define this_target_flag_state (&default_target_flag_state)
 #endif
 
-#define align_loops_log \
-  (this_target_flag_state->x_align_loops_log)
-#define align_loops_max_skip \
-  (this_target_flag_state->x_align_loops_max_skip)
-#define align_jumps_log \
-  (this_target_flag_state->x_align_jumps_log)
-#define align_jumps_max_skip \
-  (this_target_flag_state->x_align_jumps_max_skip)
-#define align_labels_log \
-  (this_target_flag_state->x_align_labels_log)
-#define align_labels_max_skip \
-  (this_target_flag_state->x_align_labels_max_skip)
-#define align_functions_log \
-  (this_target_flag_state->x_align_functions_log)
+#define align_loops              (this_target_flag_state->x_align_loops)
+#define align_jumps              (this_target_flag_state->x_align_jumps)
+#define align_labels             (this_target_flag_state->x_align_labels)
+#define align_functions          (this_target_flag_state->x_align_functions)
+#define align_loops_log          (align_loops[0].log)
+#define align_jumps_log          (align_jumps[0].log)
+#define align_labels_log         (align_labels[0].log)
+#define align_functions_log      (align_functions[0].log)
+#define align_loops_max_skip     (align_loops[0].maxskip)
+#define align_jumps_max_skip     (align_jumps[0].maxskip)
+#define align_labels_max_skip    (align_labels[0].maxskip)
+#define align_functions_max_skip (align_functions[0].maxskip)
+/* String representations of the above options are available in
+   const char *str_align_foo. NULL if not set. */
+
 #define flag_excess_precision \
   (this_target_flag_state->x_flag_excess_precision)
 
Index: gcc/testsuite/gcc.target/i386/falign-functions.c
===================================================================
--- gcc/testsuite/gcc.target/i386/falign-functions.c	(nonexistent)
+++ gcc/testsuite/gcc.target/i386/falign-functions.c	(working copy)
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -falign-functions=64,8" } */
+/* { dg-final { scan-assembler ".p2align 6,,7" } } */
+
+void
+test_func (void)
+{
+}
Index: gcc/toplev.c
===================================================================
--- gcc/toplev.c	(revision 246948)
+++ gcc/toplev.c	(working copy)
@@ -1177,31 +1177,111 @@ target_supports_section_anchors_p (void)
   return true;
 }
 
-/* Default the align_* variables to 1 if they're still unset, and
-   set up the align_*_log variables.  */
+/* Read a decimal number from string FLAG, up to end of line or comma.
+   Emit error message if number ends with any other character.
+   Return pointer past comma, or NULL if end of line.  */
+static const char *
+read_uint (const char *flag, const char *name, int *np)
+{
+  const char *flag_start = flag;
+  int n = 0;
+  char c;
+
+  while ((c = *flag++) >= '0' && c <= '9')
+    n = n*10 + (c-'0');
+  *np = n & 0x3fffffff; /* avoid accidentally negative numbers */
+  if (c == '\0')
+    return NULL;
+  if (c == ',')
+    return flag;
+
+  error_at (UNKNOWN_LOCATION, "-falign-%s parameter is bad at '%s'",
+            name, flag_start);
+  return NULL;
+}
+
+/* Parse "N[,M][,...]" string FLAG into struct align_flags A.
+   Return pointer past second comma, or NULL if end of line.  */
+static const char *
+read_log_maxskip (const char *flag, const char *name, struct align_flags *a)
+{
+  int n, m;
+  flag = read_uint (flag, name, &a->log);
+  n = a->log;
+  if (n != 0)
+    a->log = floor_log2 (n * 2 - 1);
+  if (!flag)
+    {
+      a->maxskip = n ? n - 1 : 0;
+      return flag;
+    }
+  flag = read_uint (flag, name, &a->maxskip);
+  m = a->maxskip;
+  if (m > n) m = n;
+  if (m > 0) m--; /* -falign-foo=N,M means M-1 max bytes of padding, not M */
+  a->maxskip = m;
+  return flag;
+}
+
+/* Parse "N[,M[,N2[,M2]]]" string FLAG into a pair of struct align_flags.  */
 static void
-init_alignments (void)
+parse_N_M (const char *flag, const char *name, struct align_flags a[2],
+	   unsigned int min_align_log)
 {
-  if (align_loops <= 0)
-    align_loops = 1;
-  if (align_loops_max_skip > align_loops)
-    align_loops_max_skip = align_loops - 1;
-  align_loops_log = floor_log2 (align_loops * 2 - 1);
-  if (align_jumps <= 0)
-    align_jumps = 1;
-  if (align_jumps_max_skip > align_jumps)
-    align_jumps_max_skip = align_jumps - 1;
-  align_jumps_log = floor_log2 (align_jumps * 2 - 1);
-  if (align_labels <= 0)
-    align_labels = 1;
-  align_labels_log = floor_log2 (align_labels * 2 - 1);
-  if (align_labels_max_skip > align_labels)
-    align_labels_max_skip = align_labels - 1;
-  if (align_functions <= 0)
-    align_functions = 1;
-  align_functions_log = floor_log2 (align_functions * 2 - 1);
+  if (flag)
+    {
+      flag = read_log_maxskip (flag, name, &a[0]);
+      if (flag)
+	flag = read_log_maxskip (flag, name, &a[1]);
+#ifdef SUBALIGN_LOG
+      else
+	{
+	  /* N2[,M2] is not specified. This arch has a default for N2.
+	     Before -falign-foo=N,M,N2,M2 was introduced, x86 had a tweak.
+	     -falign-functions=N with N > 8 was adding secondary alignment.
+	     -falign-functions=10 was emitting this before every function:
+			.p2align 4,,9
+			.p2align 3
+	     Now this behavior (and more) can be explicitly requested:
+	     -falign-functions=16,10,8
+	     Retain old behavior if N2 is missing: */
+
+	  int align = 1 << a[0].log;
+	  int subalign = 1 << SUBALIGN_LOG;
+
+	  if (a[0].log > SUBALIGN_LOG && a[0].maxskip >= subalign - 1)
+	    {
+	      /* Set N2 unless subalign can never have any effect */
+	      if (align > a[0].maxskip + 1)
+		a[1].log = SUBALIGN_LOG;
+	    }
+	}
+#endif
+    }
+  if ((unsigned int)a[0].log < min_align_log)
+    {
+      a[0].log = min_align_log;
+      a[0].maxskip = (1 << min_align_log) - 1;
+    }
 }
 
+/* Minimum alignment requirements, if arch has them.  */
+unsigned int min_align_loops_log = 0;
+unsigned int min_align_jumps_log = 0;
+unsigned int min_align_labels_log = 0;
+unsigned int min_align_functions_log = 0;
+
+/* Process -falign-foo=N[,M[,N2[,M2]]] options.  */
+void
+parse_alignment_opts (void)
+{
+  parse_N_M (str_align_loops, "loops", align_loops, min_align_loops_log);
+  parse_N_M (str_align_jumps, "jumps", align_jumps, min_align_jumps_log);
+  parse_N_M (str_align_labels, "labels", align_labels, min_align_labels_log);
+  parse_N_M (str_align_functions, "functions", align_functions,
+	     min_align_functions_log);
+}
+
 /* Process the options that have been parsed.  */
 static void
 process_options (void)
@@ -1640,7 +1720,7 @@ static void
 backend_init_target (void)
 {
   /* Initialize alignment variables.  */
-  init_alignments ();
+  parse_alignment_opts ();
 
   /* This depends on stack_pointer_rtx.  */
   init_fake_stack_mems ();
Index: gcc/toplev.h
===================================================================
--- gcc/toplev.h	(revision 246948)
+++ gcc/toplev.h	(working copy)
@@ -93,6 +93,13 @@ extern bool set_src_pwd		       (const char *);
 extern HOST_WIDE_INT get_random_seed (bool);
 extern const char *set_random_seed (const char *);
 
+extern unsigned int min_align_loops_log;
+extern unsigned int min_align_jumps_log;
+extern unsigned int min_align_labels_log;
+extern unsigned int min_align_functions_log;
+
+extern void parse_alignment_opts (void);
+
 extern void initialize_rtl (void);
 
 #endif /* ! GCC_TOPLEV_H */
Index: gcc/varasm.c
===================================================================
--- gcc/varasm.c	(revision 246948)
+++ gcc/varasm.c	(working copy)
@@ -1792,9 +1792,9 @@ assemble_start_function (tree decl, const char *fn
       && optimize_function_for_speed_p (cfun))
     {
 #ifdef ASM_OUTPUT_MAX_SKIP_ALIGN
-      int align_log = align_functions_log;
+      int align_log = align_functions[0].log;
 #endif
-      int max_skip = align_functions - 1;
+      int max_skip = align_functions[0].maxskip;
       if (flag_limit_function_alignment && crtl->max_insn_address > 0
 	  && max_skip >= crtl->max_insn_address)
 	max_skip = crtl->max_insn_address - 1;
@@ -1801,8 +1801,11 @@ assemble_start_function (tree decl, const char *fn
 
 #ifdef ASM_OUTPUT_MAX_SKIP_ALIGN
       ASM_OUTPUT_MAX_SKIP_ALIGN (asm_out_file, align_log, max_skip);
+      if (max_skip == align_functions[0].maxskip)
+        ASM_OUTPUT_MAX_SKIP_ALIGN (asm_out_file, align_functions[1].log,
+				   align_functions[1].maxskip);
 #else
-      ASM_OUTPUT_ALIGN (asm_out_file, align_functions_log);
+      ASM_OUTPUT_ALIGN (asm_out_file, align_functions[0].log);
 #endif
     }
 


* [PATCH 3/3] Extend -falign-FOO=N to N[,M[,N2[,M2]]]
  2016-10-12 20:53 [PATCH 0/3] Extend -falign-FOO=N to N[,M[,N2[,M2]]] version 4 Denys Vlasenko
@ 2016-10-12 20:53 ` Denys Vlasenko
  0 siblings, 0 replies; 26+ messages in thread
From: Denys Vlasenko @ 2016-10-12 20:53 UTC (permalink / raw)
  To: gcc-patches; +Cc: Denys Vlasenko, Andrew Pinski, Uros Bizjak, Bernd Schmidt

falign-functions=N is too simplistic.

Ingo Molnar ran some tests and it seems that on the latest x86 CPUs, 64-byte
alignment of functions runs fastest (he tried many other possibilities):
this way, after a call the CPU can fetch a lot of insns in the first cacheline fill.

However, developers are less than thrilled by the idea of unconditionally
aligning everything to 64 bytes. Too much waste:
        On 05/20/2015 02:47 AM, Linus Torvalds wrote:
        > At the same time, I have to admit that I abhor a 64-byte function
        > alignment, when we have a fair number of functions that are (much)
        > smaller than that.
        >
        > Is there some way to get gcc to take the size of the function into
        > account? Because aligning a 16-byte or 32-byte function on a 64-byte
        > alignment is just criminally nasty and wasteful.

This change makes it possible to align functions to 64-byte boundaries *if*
this does not introduce a huge amount of padding.

Example syntax is -falign-functions=64,9: "align to 64 by skipping up to
9 bytes (not inclusive)". IOW: "after a call insn, CPU will always be able
to fetch at least 9 bytes of insns".
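
To make the "up to 9 bytes (not inclusive)" rule concrete, here is a minimal
C sketch of what a ".p2align LOG,,MAXSKIP" directive does to an address.
The helper name padding_for is made up for illustration and is not part of
the patch; it only models the assembler behaviour assumed in the
description above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch: number of padding bytes
   that ".p2align LOG,,MAXSKIP" would insert at address ADDR.  The
   directive pads up to the next 2^LOG boundary, but only if that takes
   at most MAXSKIP bytes; otherwise it inserts nothing.  */
static unsigned
padding_for (uint64_t addr, unsigned log, unsigned maxskip)
{
  assert (log < 32);
  uint64_t align = (uint64_t) 1 << log;
  /* Distance to the next boundary; 0 if ADDR is already aligned.  */
  unsigned pad = (unsigned) ((align - (addr & (align - 1))) & (align - 1));
  return pad <= maxskip ? pad : 0;
}
```

With -falign-functions=64,9 the directive is ".p2align 6,,8". A function
that would start 8 bytes before a 64-byte boundary (say at 0x1038) gets
8 bytes of padding; one that would start 9 bytes before it (0x1037) is left
where it is, which still leaves 9 bytes to fetch before the boundary.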

x86 had a tweak: -falign-functions=N with N > 8 was adding secondary alignment.
For example, falign-functions=10 was emitting this before every function:
	.p2align 4,,9
	.p2align 3
This tweak was removed by the previous patch. Now it is reinstated
by the logic that if falign-functions=N[,M] is specified and N > 8,
then the default value of N2 is 8, not 1. This can now be suppressed with
falign-functions=N,M,1, which wasn't possible before.
In general, the optional N2,M2 pair can be used to generate any secondary
alignment the user wants.
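
The defaulting rule can be sketched in C roughly as follows. This is a
simplified model of what read_log_maxskip and parse_N_M in the patch do,
not the patch code itself: the names align_pair, ceil_log2 and
resolve_align are invented here, M == 0 is used to mean "M not given",
and the x86-style default of N2 = 8 (SUBALIGN_LOG == 3) is hard-coded as
an assumption.

```c
#include <assert.h>

/* Illustrative stand-in for the patch's struct align_flags.  */
struct align_pair { int log; int maxskip; };

/* Smallest LOG such that 2^LOG >= N.  */
static int
ceil_log2 (int n)
{
  int log = 0;
  assert (n >= 1 && n <= (1 << 30));
  while ((1 << log) < n)
    log++;
  return log;
}

/* Hypothetical model: resolve -falign-functions=N[,M] into a primary
   pair a[0] and a default secondary pair a[1].  M == 0 means M was
   omitted, in which case it defaults to N.  */
static void
resolve_align (int n, int m, struct align_pair a[2])
{
  const int subalign_log = 3;	/* assumed default N2 = 8 */

  a[1].log = 0;			/* log 0 means "no secondary alignment" */
  a[1].maxskip = 0;		/* left at 0: no explicit skip limit */

  a[0].log = ceil_log2 (n);
  if (m == 0 || m > n)
    m = n;
  a[0].maxskip = m - 1;		/* N,M means at most M-1 bytes of padding */

  /* Add the 8-byte subalignment only when it can have an effect: the
     primary alignment is larger than 8, may skip at least 7 bytes,
     and is not guaranteed to succeed.  */
  if (a[0].log > subalign_log
      && a[0].maxskip >= (1 << subalign_log) - 1
      && (1 << a[0].log) > a[0].maxskip + 1)
    a[1].log = subalign_log;
}
```

Under these assumptions N=10 resolves to log 4, maxskip 9 plus a secondary
log of 3, matching the ".p2align 4,,9" / ".p2align 3" pair shown above,
while N=16 gets no secondary alignment because the primary one always
succeeds.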

Subalignment for loops/jumps/labels is trickier to fully implement.
The implementation in this patch uses falign-labels subalignment values
for any of these three types of labels - but only if "main" alignment
triggers. With -O2 defaults, this provides a matching behavior on x86:
loops and jumps are aligned (to 16-32 bytes depending on selected CPU)
and subaligned to 8 bytes. Labels are not aligned.

Testing:

Tested that with -falign-functions=N (tried 8, 15, 16, 17...) the alignment
directives are the same before and after the patch.
Tested that -falign-functions=N,N (two equal parameters) works exactly
like -falign-functions=N.

No change from past behavior:
Tested that "-falign-functions" uses an arch-dependent alignment.
Tested that "-O2" uses an arch-dependent alignment.
Tested that "-O2 -falign-functions=N" uses the explicitly given alignment.

2016-09-27  Denys Vlasenko  <dvlasenk@redhat.com>

    * doc/invoke.texi: Update option documentation.
    * common.opt (-falign-functions=): Accept a string instead of integer.
    (-falign-jumps=): Likewise.
    (-falign-labels=): Likewise.
    (-falign-loops=): Likewise.
    * flags.h (struct target_flag_state): Revamp how alignment data is stored:
    for each of four alignment types, store two pairs of log/maxskip values.
    * toplev.c (read_uint): New function.
    (read_log_maxskip): New function.
    (parse_N_M): New function.
    (init_alignments): Rename to parse_alignment_opts, make globally visible.
    Set align_foo[0/1].log/maxskip from
    specified falign-FOO=N[,M[,N2[,M2]]] options.
    * toplev.h (parse_alignment_opts): Now globally visible.
    (min_align_loops_log): Variable which holds arch override for minimal
    alignment of loops.
    (min_align_jumps_log): Likewise for jumps.
    (min_align_labels_log): Likewise for labels.
    (min_align_functions_log): Likewise for functions.
    * varasm.c (assemble_start_function): Call two ASM_OUTPUT_MAX_SKIP_ALIGN
    macros, first for N,M and second time for N2,M2 from
    falign-functions=N,M,N2,M2. This generates 0, 1, or 2 align directives.
    * final.c (final_scan_insn): If a label, jump or loop target
    is being aligned, emit a secondary alignment directive.
    * config/i386/i386.c (struct ptt): Change foo_align members from
    integers to strings. Add align_label member. Set it to "0,0,8"
    on the processors which have maxskips > 7 for loops and jumps -
    this preserves the existing behaviour of adding 8-byte subalign.
    * config/i386/i386.c (processor_target_table): Likewise.
    * config/aarch64/aarch64-protos.h (struct tune_params):
    Change foo_align members from integers to strings.
    * config/aarch64/aarch64.c (<cpu>_tunings):
    Change foo_align field values from integers to strings.
    * config/arm/arm.c (arm_override_options_after_change_1):
    Fix if() condition to detect that -falign-functions is specified,
    change code which sets arch-default alignment.
    * config/i386/i386.c (ix86_default_align): Likewise.
    * config/rs6000/rs6000.c (rs6000_option_override_internal): Likewise.
    * config/mips/mips.c (mips_set_compression_mode): Likewise.
    * config/alpha/alpha.c (alpha_override_options_after_change): Likewise.
    * config/visium/visium.c (visium_option_override): Likewise.
    * config/sh/sh.c (sh_override_options_after_change): Likewise.
    * config/rx/rx.c (rx_option_override): Likewise.
    * config/rx/rx.h (JUMP_ALIGN): Use new variables to access alignment
    information.
    (LABEL_ALIGN): Likewise.
    (LOOP_ALIGN): Likewise.
    * config/spu/spu.c (spu_sched_init): Call parse_alignment_opts(), then
    use new variables to access alignment information.
    * config/sh/sh.c (sh_override_options_after_change): Likewise.
    * testsuite/gcc.target/i386/falign-functions.c: New file.

Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 240663)
+++ gcc/doc/invoke.texi	(working copy)
@@ -339,9 +339,11 @@ Objective-C and Objective-C++ Dialects}.
 
 @item Optimization Options
 @xref{Optimize Options,,Options that Control Optimization}.
-@gccoptlist{-faggressive-loop-optimizations -falign-functions[=@var{n}] @gol
--falign-jumps[=@var{n}] @gol
--falign-labels[=@var{n}] -falign-loops[=@var{n}] @gol
+@gccoptlist{-faggressive-loop-optimizations @gol
+-falign-functions[=@var{n}[,@var{m},[@var{n2}[,@var{m2}]]]] @gol
+-falign-jumps[=@var{n}[,@var{m},[@var{n2}[,@var{m2}]]]] @gol
+-falign-labels[=@var{n}[,@var{m},[@var{n2}[,@var{m2}]]]] @gol
+-falign-loops[=@var{n}[,@var{m},[@var{n2}[,@var{m2}]]]] @gol
 -fassociative-math -fauto-profile -fauto-profile[=@var{path}] @gol
 -fauto-inc-dec -fbranch-probabilities @gol
 -fbranch-target-load-optimize -fbranch-target-load-optimize2 @gol
@@ -8231,19 +8233,36 @@ The @option{-fstrict-overflow} option is enabled a
 
 @item -falign-functions
 @itemx -falign-functions=@var{n}
+@itemx -falign-functions=@var{n},@var{m}
+@itemx -falign-functions=@var{n},@var{m},@var{n2}
+@itemx -falign-functions=@var{n},@var{m},@var{n2},@var{m2}
 @opindex falign-functions
 Align the start of functions to the next power-of-two greater than
-@var{n}, skipping up to @var{n} bytes.  For instance,
-@option{-falign-functions=32} aligns functions to the next 32-byte
-boundary, but @option{-falign-functions=24} aligns to the next
-32-byte boundary only if this can be done by skipping 23 bytes or less.
+@var{n}, skipping up to @var{m}-1 bytes.  Such alignment ensures that
+after branch, at least @var{m} bytes can be fetched by the CPU
+without crossing specified alignment boundary.
 
-@option{-fno-align-functions} and @option{-falign-functions=1} are
-equivalent and mean that functions are not aligned.
+If @var{m} is not specified, it defaults to @var{n}.
+Same for @var{m2} and @var{n2}.
 
+Examples: @option{-falign-functions=32} aligns functions to the next
+32-byte boundary, @option{-falign-functions=24} aligns to the next
+32-byte boundary only if this can be done by skipping 23 bytes or less,
+@option{-falign-functions=32,7} aligns to the next
+32-byte boundary only if this can be done by skipping 6 bytes or less.
+
+The second pair of @var{n2},@var{m2} values allows to have a secondary
+alignment: @option{-falign-functions=64,7,32,3} aligns to the next
+64-byte boundary if this can be done by skipping 6 bytes or less,
+otherwise aligns to the next 32-byte boundary if this can be done
+by skipping 2 bytes or less.
+
 Some assemblers only support this flag when @var{n} is a power of two;
 in that case, it is rounded up.
 
+@option{-fno-align-functions} and @option{-falign-functions=1} are
+equivalent and mean that functions are not aligned.
+
 If @var{n} is not specified or is zero, use a machine-dependent default.
 
 Enabled at levels @option{-O2}, @option{-O3}.
@@ -8250,12 +8269,13 @@ Enabled at levels @option{-O2}, @option{-O3}.
 
 @item -falign-labels
 @itemx -falign-labels=@var{n}
+@itemx -falign-labels=@var{n},@var{m}
+@itemx -falign-labels=@var{n},@var{m},@var{n2}
+@itemx -falign-labels=@var{n},@var{m},@var{n2},@var{m2}
 @opindex falign-labels
-Align all branch targets to a power-of-two boundary, skipping up to
-@var{n} bytes like @option{-falign-functions}.  This option can easily
-make code slower, because it must insert dummy operations for when the
-branch target is reached in the usual flow of the code.
+Align all branch targets to a power-of-two boundary.
 
+Parameters of this option are analogous to @option{-falign-functions} option.
 @option{-fno-align-labels} and @option{-falign-labels=1} are
 equivalent and mean that labels are not aligned.
 
@@ -8269,12 +8289,15 @@ Enabled at levels @option{-O2}, @option{-O3}.
 
 @item -falign-loops
 @itemx -falign-loops=@var{n}
+@itemx -falign-loops=@var{n},@var{m}
+@itemx -falign-loops=@var{n},@var{m},@var{n2}
+@itemx -falign-loops=@var{n},@var{m},@var{n2},@var{m2}
 @opindex falign-loops
-Align loops to a power-of-two boundary, skipping up to @var{n} bytes
-like @option{-falign-functions}.  If the loops are
-executed many times, this makes up for any execution of the dummy
-operations.
+Align loops to a power-of-two boundary.  If the loops are executed
+many times, this makes up for any execution of the dummy padding
+instructions.
 
+Parameters of this option are analogous to @option{-falign-functions} option.
 @option{-fno-align-loops} and @option{-falign-loops=1} are
 equivalent and mean that loops are not aligned.
 
@@ -8284,12 +8307,15 @@ Enabled at levels @option{-O2}, @option{-O3}.
 
 @item -falign-jumps
 @itemx -falign-jumps=@var{n}
+@itemx -falign-jumps=@var{n},@var{m}
+@itemx -falign-jumps=@var{n},@var{m},@var{n2}
+@itemx -falign-jumps=@var{n},@var{m},@var{n2},@var{m2}
 @opindex falign-jumps
 Align branch targets to a power-of-two boundary, for branch targets
-where the targets can only be reached by jumping, skipping up to @var{n}
-bytes like @option{-falign-functions}.  In this case, no dummy operations
-need be executed.
+where the targets can only be reached by jumping.  In this case,
+no dummy operations need be executed.
 
+Parameters of this option are analogous to @option{-falign-functions} option.
 @option{-fno-align-jumps} and @option{-falign-jumps=1} are
 equivalent and mean that loops are not aligned.
 
Index: gcc/common.opt
===================================================================
--- gcc/common.opt	(revision 240663)
+++ gcc/common.opt	(working copy)
@@ -900,32 +900,32 @@ Common Report Var(flag_aggressive_loop_optimizatio
 Aggressively optimize loops using language constraints.
 
 falign-functions
-Common Report Var(align_functions,0) Optimization UInteger
+Common Report Var(flag_align_functions) Optimization
 Align the start of functions.
 
 falign-functions=
-Common RejectNegative Joined UInteger Var(align_functions)
+Common RejectNegative Joined Var(str_align_functions)
 
 falign-jumps
-Common Report Var(align_jumps,0) Optimization UInteger
+Common Report Var(flag_align_jumps) Optimization
 Align labels which are only reached by jumping.
 
 falign-jumps=
-Common RejectNegative Joined UInteger Var(align_jumps)
+Common RejectNegative Joined Var(str_align_jumps)
 
 falign-labels
-Common Report Var(align_labels,0) Optimization UInteger
+Common Report Var(flag_align_labels) Optimization
 Align all labels.
 
 falign-labels=
-Common RejectNegative Joined UInteger Var(align_labels)
+Common RejectNegative Joined Var(str_align_labels)
 
 falign-loops
-Common Report Var(align_loops,0) Optimization UInteger
+Common Report Var(flag_align_loops) Optimization
 Align the start of loops.
 
 falign-loops=
-Common RejectNegative Joined UInteger Var(align_loops)
+Common RejectNegative Joined Var(str_align_loops)
 
 fargument-alias
 Common Ignore
Index: gcc/config/aarch64/aarch64-protos.h
===================================================================
--- gcc/config/aarch64/aarch64-protos.h	(revision 240663)
+++ gcc/config/aarch64/aarch64-protos.h	(working copy)
@@ -208,9 +208,9 @@ struct tune_params
   int memmov_cost;
   int issue_rate;
   unsigned int fusible_ops;
-  int function_align;
-  int jump_align;
-  int loop_align;
+  const char *function_align;
+  const char *jump_align;
+  const char *loop_align;
   int int_reassoc_width;
   int fp_reassoc_width;
   int vec_reassoc_width;
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c	(revision 240663)
+++ gcc/config/aarch64/aarch64.c	(working copy)
@@ -522,9 +522,9 @@ static const struct tune_params generic_tunings =
   4, /* memmov_cost  */
   2, /* issue_rate  */
   AARCH64_FUSE_NOTHING, /* fusible_ops  */
-  8,	/* function_align.  */
-  8,	/* jump_align.  */
-  4,	/* loop_align.  */
+  "8",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "4",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -548,9 +548,9 @@ static const struct tune_params cortexa35_tunings
   1, /* issue_rate  */
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
-  16,	/* function_align.  */
-  8,	/* jump_align.  */
-  8,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "8",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -574,9 +574,9 @@ static const struct tune_params cortexa53_tunings
   2, /* issue_rate  */
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
-  16,	/* function_align.  */
-  8,	/* jump_align.  */
-  8,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "8",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -600,9 +600,9 @@ static const struct tune_params cortexa57_tunings
   3, /* issue_rate  */
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
-  16,	/* function_align.  */
-  8,	/* jump_align.  */
-  8,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "8",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -626,9 +626,9 @@ static const struct tune_params cortexa72_tunings
   3, /* issue_rate  */
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
-  16,	/* function_align.  */
-  8,	/* jump_align.  */
-  8,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "8",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -652,9 +652,9 @@ static const struct tune_params cortexa73_tunings
   2, /* issue_rate.  */
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
-  16,	/* function_align.  */
-  8,	/* jump_align.  */
-  8,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "8",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -677,9 +677,9 @@ static const struct tune_params exynosm1_tunings =
   4,	/* memmov_cost  */
   3,	/* issue_rate  */
   (AARCH64_FUSE_AES_AESMC), /* fusible_ops  */
-  4,	/* function_align.  */
-  4,	/* jump_align.  */
-  4,	/* loop_align.  */
+  "4",	/* function_align.  */
+  "4",	/* jump_align.  */
+  "4",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -702,9 +702,9 @@ static const struct tune_params thunderx_tunings =
   6, /* memmov_cost  */
   2, /* issue_rate  */
   AARCH64_FUSE_CMP_BRANCH, /* fusible_ops  */
-  8,	/* function_align.  */
-  8,	/* jump_align.  */
-  8,	/* loop_align.  */
+  "8",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "8",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -727,9 +727,9 @@ static const struct tune_params xgene1_tunings =
   6, /* memmov_cost  */
   4, /* issue_rate  */
   AARCH64_FUSE_NOTHING, /* fusible_ops  */
-  16,	/* function_align.  */
-  8,	/* jump_align.  */
-  16,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "16",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -753,9 +753,9 @@ static const struct tune_params qdf24xx_tunings =
   4, /* issue_rate  */
   (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK), /* fuseable_ops  */
-  16,	/* function_align.  */
-  8,	/* jump_align.  */
-  16,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "16",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -778,9 +778,9 @@ static const struct tune_params vulcan_tunings =
   4, /* memmov_cost.  */
   4, /* issue_rate.  */
   AARCH64_FUSE_NOTHING, /* fuseable_ops.  */
-  16,	/* function_align.  */
-  8,	/* jump_align.  */
-  16,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "16",	/* loop_align.  */
   3,	/* int_reassoc_width.  */
   2,	/* fp_reassoc_width.  */
   2,	/* vec_reassoc_width.  */
Index: gcc/config/alpha/alpha.c
===================================================================
--- gcc/config/alpha/alpha.c	(revision 240663)
+++ gcc/config/alpha/alpha.c	(working copy)
@@ -624,13 +624,13 @@ alpha_override_options_after_change (void)
   /* ??? Kludge these by not doing anything if we don't optimize.  */
   if (optimize > 0)
     {
-      if (align_loops <= 0)
-	align_loops = 16;
-      if (align_jumps <= 0)
-	align_jumps = 16;
+      if (flag_align_loops && !str_align_loops)
+	str_align_loops = "16";
+      if (flag_align_jumps && !str_align_jumps)
+	str_align_jumps = "16";
     }
-  if (align_functions <= 0)
-    align_functions = 16;
+  if (flag_align_functions && !str_align_functions)
+    str_align_functions = "16";
 }
 \f
 /* Returns 1 if VALUE is a mask that contains full bytes of zero or ones.  */
Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 240663)
+++ gcc/config/arm/arm.c	(working copy)
@@ -2922,9 +2922,10 @@ static GTY(()) tree init_optimize;
 static void
 arm_override_options_after_change_1 (struct gcc_options *opts)
 {
-  if (opts->x_align_functions <= 0)
-    opts->x_align_functions = TARGET_THUMB_P (opts->x_target_flags)
-      && opts->x_optimize_size ? 2 : 4;
+  /* -falign-functions without argument: supply one */
+  if (opts->x_flag_align_functions && !opts->x_str_align_functions)
+    opts->x_str_align_functions = TARGET_THUMB_P (opts->x_target_flags)
+      && opts->x_optimize_size ? "2" : "4";
 }
 
 /* Implement targetm.override_options_after_change.  */
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 240663)
+++ gcc/config/i386/i386.c	(working copy)
@@ -2627,45 +2627,47 @@ struct ptt
 {
   const char *const name;			/* processor name  */
   const struct processor_costs *cost;		/* Processor costs */
-  const int align_loop;				/* Default alignments.  */
-  const int align_loop_max_skip;
-  const int align_jump;
-  const int align_jump_max_skip;
-  const int align_func;
+  const char *const align_loop;			/* Default alignments.  */
+  const char *const align_jump;
+  const char *const align_label;
+  const char *const align_func;
 };
 
-/* This table must be in sync with enum processor_type in i386.h.  */ 
+/* This table must be in sync with enum processor_type in i386.h.  */
 static const struct ptt processor_target_table[PROCESSOR_max] =
 {
-  {"generic", &generic_cost, 16, 10, 16, 10, 16},
-  {"i386", &i386_cost, 4, 3, 4, 3, 4},
-  {"i486", &i486_cost, 16, 15, 16, 15, 16},
-  {"pentium", &pentium_cost, 16, 7, 16, 7, 16},
-  {"lakemont", &lakemont_cost, 16, 7, 16, 7, 16},
-  {"pentiumpro", &pentiumpro_cost, 16, 15, 16, 10, 16},
-  {"pentium4", &pentium4_cost, 0, 0, 0, 0, 0},
-  {"nocona", &nocona_cost, 0, 0, 0, 0, 0},
-  {"core2", &core_cost, 16, 10, 16, 10, 16},
-  {"nehalem", &core_cost, 16, 10, 16, 10, 16},
-  {"sandybridge", &core_cost, 16, 10, 16, 10, 16},
-  {"haswell", &core_cost, 16, 10, 16, 10, 16},
-  {"bonnell", &atom_cost, 16, 15, 16, 7, 16},
-  {"silvermont", &slm_cost, 16, 15, 16, 7, 16},
-  {"knl", &slm_cost, 16, 15, 16, 7, 16},
-  {"skylake-avx512", &core_cost, 16, 10, 16, 10, 16},
-  {"intel", &intel_cost, 16, 15, 16, 7, 16},
-  {"geode", &geode_cost, 0, 0, 0, 0, 0},
-  {"k6", &k6_cost, 32, 7, 32, 7, 32},
-  {"athlon", &athlon_cost, 16, 7, 16, 7, 16},
-  {"k8", &k8_cost, 16, 7, 16, 7, 16},
-  {"amdfam10", &amdfam10_cost, 32, 24, 32, 7, 32},
-  {"bdver1", &bdver1_cost, 16, 10, 16, 7, 11},
-  {"bdver2", &bdver2_cost, 16, 10, 16, 7, 11},
-  {"bdver3", &bdver3_cost, 16, 10, 16, 7, 11},
-  {"bdver4", &bdver4_cost, 16, 10, 16, 7, 11},
-  {"btver1", &btver1_cost, 16, 10, 16, 7, 11},
-  {"btver2", &btver2_cost, 16, 10, 16, 7, 11},
-  {"znver1", &znver1_cost, 16, 10, 16, 7, 11}
+/* The "0,0,8" label alignment specified for some processors generates
+   secondary 8-byte alignment only for those label/jump/loop targets
+   which have primary alignment.  */
+  {"generic",    &generic_cost,   "16,11,8", "16,11,8", "0,0,8", "16"},
+  {"i386",       &i386_cost,      "4",       "4",       NULL,    "4" },
+  {"i486",       &i486_cost,      "16,16,8", "16,16,8", "0,0,8", "16"},
+  {"pentium",    &pentium_cost,   "16,8,8",  "16,8,8",  "0,0,8", "16"},
+  {"lakemont",   &lakemont_cost,  "16,8,8",  "16,8,8",  "0,0,8", "16"},
+  {"pentiumpro", &pentiumpro_cost,"16,16,8", "16,11,8", "0,0,8", "16"},
+  {"pentium4",   &pentium4_cost,  NULL,      NULL,      NULL,    NULL},
+  {"nocona",     &nocona_cost,    NULL,      NULL,      NULL,    NULL},
+  {"core2",      &core_cost,      "16,11,8", "16,11,8", "0,0,8", "16"},
+  {"nehalem",    &core_cost,      "16,11,8", "16,11,8", "0,0,8", "16"},
+  {"sandybridge",&core_cost,      "16,11,8", "16,11,8", "0,0,8", "16"},
+  {"haswell",    &core_cost,      "16,11,8", "16,11,8", "0,0,8", "16"},
+  {"bonnell",    &atom_cost,      "16,16,8", "16,8,8",  "0,0,8", "16"},
+  {"silvermont", &slm_cost,       "16,16,8", "16,8,8",  "0,0,8", "16"},
+  {"knl",        &slm_cost,       "16,16,8", "16,8,8",  "0,0,8", "16"},
+  {"skylake-avx512", &core_cost,  "16,11,8", "16,11,8", "0,0,8", "16"},
+  {"intel",      &intel_cost,     "16,16,8", "16,8,8",  "0,0,8", "16"},
+  {"geode",      &geode_cost,     NULL,      NULL,      NULL,    NULL},
+  {"k6",         &k6_cost,        "32,8,8",  "32,8,8",  "0,0,8", "32"},
+  {"athlon",     &athlon_cost,    "16,8,8",  "16,8,8",  "0,0,8", "16"},
+  {"k8",         &k8_cost,        "16,8,8",  "16,8,8",  "0,0,8", "16"},
+  {"amdfam10",   &amdfam10_cost,  "32,25,8", "32,8,8",  "0,0,8", "32"},
+  {"bdver1",     &bdver1_cost,    "16,11,8", "16,8,8",  "0,0,8", "11"},
+  {"bdver2",     &bdver2_cost,    "16,11,8", "16,8,8",  "0,0,8", "11"},
+  {"bdver3",     &bdver3_cost,    "16,11,8", "16,8,8",  "0,0,8", "11"},
+  {"bdver4",     &bdver4_cost,    "16,11,8", "16,8,8",  "0,0,8", "11"},
+  {"btver1",     &btver1_cost,    "16,11,8", "16,8,8",  "0,0,8", "11"},
+  {"btver2",     &btver2_cost,    "16,11,8", "16,8,8",  "0,0,8", "11"},
+  {"znver1",     &znver1_cost,    "16,11,8", "16,8,8",  "0,0,8", "11"}
 };
 \f
 static unsigned int
@@ -4706,20 +4708,23 @@ set_ix86_tune_features (enum processor_type ix86_t
 static void
 ix86_default_align (struct gcc_options *opts)
 {
-  if (opts->x_align_loops == 0)
+  /* -falign-foo without argument: supply one */
+  if (opts->x_flag_align_loops && !opts->x_str_align_loops)
     {
-      opts->x_align_loops = processor_target_table[ix86_tune].align_loop;
-      align_loops_max_skip = processor_target_table[ix86_tune].align_loop_max_skip;
+      opts->x_str_align_loops = processor_target_table[ix86_tune].align_loop;
     }
-  if (opts->x_align_jumps == 0)
+  if (opts->x_flag_align_jumps && !opts->x_str_align_jumps)
     {
-      opts->x_align_jumps = processor_target_table[ix86_tune].align_jump;
-      align_jumps_max_skip = processor_target_table[ix86_tune].align_jump_max_skip;
+      opts->x_str_align_jumps = processor_target_table[ix86_tune].align_jump;
     }
-  if (opts->x_align_functions == 0)
+  if (opts->x_flag_align_labels && !opts->x_str_align_labels)
     {
-      opts->x_align_functions = processor_target_table[ix86_tune].align_func;
+      opts->x_str_align_labels = processor_target_table[ix86_tune].align_label;
     }
+  if (opts->x_flag_align_functions && !opts->x_str_align_functions)
+    {
+      opts->x_str_align_functions = processor_target_table[ix86_tune].align_func;
+    }
 }
 
 /* Implement TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE hook.  */
Index: gcc/config/mips/mips.c
===================================================================
--- gcc/config/mips/mips.c	(revision 240663)
+++ gcc/config/mips/mips.c	(working copy)
@@ -488,9 +488,9 @@ unsigned int mips_base_compression_flags;
 static int mips_base_schedule_insns; /* flag_schedule_insns */
 static int mips_base_reorder_blocks_and_partition; /* flag_reorder... */
 static int mips_base_move_loop_invariants; /* flag_move_loop_invariants */
-static int mips_base_align_loops; /* align_loops */
-static int mips_base_align_jumps; /* align_jumps */
-static int mips_base_align_functions; /* align_functions */
+static const char *mips_base_align_loops; /* align_loops */
+static const char *mips_base_align_jumps; /* align_jumps */
+static const char *mips_base_align_functions; /* align_functions */
 
 /* Index [M][R] is true if register R is allowed to hold a value of mode M.  */
 bool mips_hard_regno_mode_ok[(int) MAX_MACHINE_MODE][FIRST_PSEUDO_REGISTER];
@@ -19303,12 +19303,12 @@ mips_set_compression_mode (unsigned int compressio
       /* Provide default values for align_* for 64-bit targets.  */
       if (TARGET_64BIT)
 	{
-	  if (align_loops == 0)
-	    align_loops = 8;
-	  if (align_jumps == 0)
-	    align_jumps = 8;
-	  if (align_functions == 0)
-	    align_functions = 8;
+	  if (flag_align_loops && !str_align_loops)
+	    str_align_loops = "8";
+	  if (flag_align_jumps && !str_align_jumps)
+	    str_align_jumps = "8";
+	  if (flag_align_functions && !str_align_functions)
+	    str_align_functions = "8";
 	}
 
       targetm.min_anchor_offset = -32768;
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 240663)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -4750,29 +4750,25 @@ rs6000_option_override_internal (bool global_init_
 	  if (rs6000_cpu == PROCESSOR_TITAN
 	      || rs6000_cpu == PROCESSOR_CELL)
 	    {
-	      if (align_functions <= 0)
-		align_functions = 8;
-	      if (align_jumps <= 0)
-		align_jumps = 8;
-	      if (align_loops <= 0)
-		align_loops = 8;
+	      if (flag_align_functions && !str_align_functions)
+		str_align_functions = "8";
+	      if (flag_align_jumps && !str_align_jumps)
+		str_align_jumps = "8";
+	      if (flag_align_loops && !str_align_loops)
+		str_align_loops = "8";
 	    }
 	  if (rs6000_align_branch_targets)
 	    {
-	      if (align_functions <= 0)
-		align_functions = 16;
-	      if (align_jumps <= 0)
-		align_jumps = 16;
-	      if (align_loops <= 0)
+	      if (flag_align_functions && !str_align_functions)
+		str_align_functions = "16";
+	      if (flag_align_jumps && !str_align_jumps)
+		str_align_jumps = "16";
+	      if (flag_align_loops && !str_align_loops)
 		{
 		  can_override_loop_align = 1;
-		  align_loops = 16;
+		  str_align_loops = "16";
 		}
 	    }
-	  if (align_jumps_max_skip <= 0)
-	    align_jumps_max_skip = 15;
-	  if (align_loops_max_skip <= 0)
-	    align_loops_max_skip = 15;
 	}
 
       /* Arrange to save and restore machine status around nested functions.  */
Index: gcc/config/rx/rx.c
===================================================================
--- gcc/config/rx/rx.c	(revision 240663)
+++ gcc/config/rx/rx.c	(working copy)
@@ -2819,12 +2819,15 @@ rx_option_override (void)
   rx_override_options_after_change ();
 
   /* These values are bytes, not log.  */
-  if (align_jumps == 0 && ! optimize_size)
-    align_jumps = ((rx_cpu_type == RX100 || rx_cpu_type == RX200) ? 4 : 8);
-  if (align_loops == 0 && ! optimize_size)
-    align_loops = ((rx_cpu_type == RX100 || rx_cpu_type == RX200) ? 4 : 8);
-  if (align_labels == 0 && ! optimize_size)
-    align_labels = ((rx_cpu_type == RX100 || rx_cpu_type == RX200) ? 4 : 8);
+  if (! optimize_size)
+    {
+      if (flag_align_jumps && !str_align_jumps)
+	str_align_jumps = ((rx_cpu_type == RX100 || rx_cpu_type == RX200) ? "4" : "8");
+      if (flag_align_loops && !str_align_loops)
+	str_align_loops = ((rx_cpu_type == RX100 || rx_cpu_type == RX200) ? "4" : "8");
+      if (flag_align_labels && !str_align_labels)
+	str_align_labels = ((rx_cpu_type == RX100 || rx_cpu_type == RX200) ? "4" : "8");
+    }
 }
 
 \f
Index: gcc/config/rx/rx.h
===================================================================
--- gcc/config/rx/rx.h	(revision 240663)
+++ gcc/config/rx/rx.h	(working copy)
@@ -432,9 +432,9 @@ typedef unsigned int CUMULATIVE_ARGS;
 /* Compute the alignment needed for label X in various situations.
    If the user has specified an alignment then honour that, otherwise
    use rx_align_for_label.  */
-#define JUMP_ALIGN(x)				(align_jumps > 1 ? align_jumps_log : rx_align_for_label (x, 0))
-#define LABEL_ALIGN(x)				(align_labels > 1 ? align_labels_log : rx_align_for_label (x, 3))
-#define LOOP_ALIGN(x)				(align_loops > 1 ? align_loops_log : rx_align_for_label (x, 2))
+#define JUMP_ALIGN(x)				(align_jumps_log > 0 ? align_jumps_log : rx_align_for_label (x, 0))
+#define LABEL_ALIGN(x)				(align_labels_log > 0 ? align_labels_log : rx_align_for_label (x, 3))
+#define LOOP_ALIGN(x)				(align_loops_log > 0 ? align_loops_log : rx_align_for_label (x, 2))
 #define LABEL_ALIGN_AFTER_BARRIER(x)		rx_align_for_label (x, 0)
 
 #define ASM_OUTPUT_MAX_SKIP_ALIGN(STREAM, LOG, MAX_SKIP)	\
Index: gcc/config/sh/sh.c
===================================================================
--- gcc/config/sh/sh.c	(revision 240663)
+++ gcc/config/sh/sh.c	(working copy)
@@ -983,16 +983,16 @@ sh_override_options_after_change (void)
       Aligning all jumps increases the code size, even if it might
       result in slightly faster code.  Thus, it is set to the smallest 
       alignment possible if not specified by the user.  */
-  if (align_loops == 0)
-    align_loops = optimize_size ? 2 : 4;
+  if (flag_align_loops && !str_align_loops)
+    str_align_loops = optimize_size ? "2" : "4";
 
-  if (align_jumps == 0)
-    align_jumps = 2;
-  else if (align_jumps < 2)
-    align_jumps = 2;
+  if (flag_align_jumps && !str_align_jumps)
+    str_align_jumps = "2";
+  else
+    min_align_jumps_log = 1;
 
-  if (align_functions == 0)
-    align_functions = optimize_size ? 2 : 4;
+  if (flag_align_functions && !str_align_functions)
+    str_align_functions = optimize_size ? "2" : "4";
 
   /* The linker relaxation code breaks when a function contains
      alignments that are larger than that at the start of a
@@ -999,13 +999,13 @@ sh_override_options_after_change (void)
      compilation unit.  */
   if (TARGET_RELAX)
     {
-      int min_align = align_loops > align_jumps ? align_loops : align_jumps;
+      parse_alignment_opts ();
+      min_align_functions_log = align_loops_log > align_jumps_log ?
+				align_loops_log : align_jumps_log;
 
       /* Also take possible .long constants / mova tables into account.	*/
-      if (min_align < 4)
-	min_align = 4;
-      if (align_functions < min_align)
-	align_functions = min_align;
+      if (min_align_functions_log < 2)
+	min_align_functions_log = 2;
     }
 }
 \f
Index: gcc/config/spu/spu.c
===================================================================
--- gcc/config/spu/spu.c	(revision 240663)
+++ gcc/config/spu/spu.c	(working copy)
@@ -2767,7 +2767,8 @@ static void
 spu_sched_init (FILE *file ATTRIBUTE_UNUSED, int verbose ATTRIBUTE_UNUSED,
 		int max_ready ATTRIBUTE_UNUSED)
 {
-  if (align_labels > 4 || align_loops > 4 || align_jumps > 4)
+  parse_alignment_opts ();
+  if (align_labels_log > 2 || align_loops_log > 2 || align_jumps_log > 2)
     {
       /* When any block might be at least 8-byte aligned, assume they
          will all be at least 8-byte aligned to make sure dual issue
Index: gcc/config/visium/visium.c
===================================================================
--- gcc/config/visium/visium.c	(revision 240663)
+++ gcc/config/visium/visium.c	(working copy)
@@ -412,12 +412,12 @@ visium_option_override (void)
 
   /* Align functions on 256-byte (32-quadword) for GR5 and 64-byte (8-quadword)
      boundaries for GR6 so they start a new burst mode window.  */
-  if (align_functions == 0)
+  if (flag_align_functions && !str_align_functions)
     {
       if (visium_cpu == PROCESSOR_GR6)
-	align_functions = 64;
+	str_align_functions = "64";
       else
-	align_functions = 256;
+	str_align_functions = "256";
 
       /* Allow the size of compilation units to double because of inlining.
 	 In practice the global size of the object code is hardly affected
@@ -428,26 +428,25 @@ visium_option_override (void)
     }
 
   /* Likewise for loops.  */
-  if (align_loops == 0)
+  if (flag_align_loops && !str_align_loops)
     {
       if (visium_cpu == PROCESSOR_GR6)
-	align_loops = 64;
+	str_align_loops = "64";
       else
 	{
-	  align_loops = 256;
 	  /* But not if they are too far away from a 256-byte boundary.  */
-	  align_loops_max_skip = 31;
+	  str_align_loops = "256,32";
 	}
     }
 
   /* Align all jumps on quadword boundaries for the burst mode, and even
      on 8-quadword boundaries for GR6 so they start a new window.  */
-  if (align_jumps == 0)
+  if (flag_align_jumps && !str_align_jumps)
     {
       if (visium_cpu == PROCESSOR_GR6)
-	align_jumps = 64;
+	str_align_jumps = "64";
       else
-	align_jumps = 8;
+	str_align_jumps = "8";
     }
 
   /* We register a machine-specific pass.  This pass must be scheduled as
Index: gcc/final.c
===================================================================
--- gcc/final.c	(revision 240663)
+++ gcc/final.c	(working copy)
@@ -2417,6 +2417,12 @@ final_scan_insn (rtx_insn *insn, FILE *file, int o
 	    {
 #ifdef ASM_OUTPUT_MAX_SKIP_ALIGN
 	      ASM_OUTPUT_MAX_SKIP_ALIGN (file, align, max_skip);
+	      /* Above, we don't know whether a label, jump or loop
+		 alignment was used. Conservatively apply
+		 label subalignment, not jump or loop
+		 subalignment (they are almost always larger).  */
+	      ASM_OUTPUT_MAX_SKIP_ALIGN (file, align_labels[1].log,
+					 align_labels[1].maxskip);
 #else
 #ifdef ASM_OUTPUT_ALIGN_WITH_NOP
               ASM_OUTPUT_ALIGN_WITH_NOP (file, align);
Index: gcc/flags.h
===================================================================
--- gcc/flags.h	(revision 240663)
+++ gcc/flags.h	(working copy)
@@ -43,19 +43,22 @@ extern bool final_insns_dump_p;
 /* Other basic status info about current function.  */
 
 /* Target-dependent global state.  */
-struct target_flag_state {
+struct align_flags {
   /* Values of the -falign-* flags: how much to align labels in code.
-     0 means `use default', 1 means `don't align'.
-     For each variable, there is an _log variant which is the power
-     of two not less than the variable, for .align output.  */
-  int x_align_loops_log;
-  int x_align_loops_max_skip;
-  int x_align_jumps_log;
-  int x_align_jumps_max_skip;
-  int x_align_labels_log;
-  int x_align_labels_max_skip;
-  int x_align_functions_log;
+     log is "align to 2^log" (so 0 means no alignment).
+     maxskip is the maximum allowed amount of padding to insert. */
+  int log;
+  int maxskip;
+};
 
+struct target_flag_state {
+  /* Each falign-foo can generate up to two levels of alignment:
+     -falign-foo=N,M[,N2,M2] */
+  struct align_flags x_align_loops[2];
+  struct align_flags x_align_jumps[2];
+  struct align_flags x_align_labels[2];
+  struct align_flags x_align_functions[2];
+
   /* The excess precision currently in effect.  */
   enum excess_precision x_flag_excess_precision;
 };
@@ -67,20 +70,21 @@ extern struct target_flag_state *this_target_flag_
 #define this_target_flag_state (&default_target_flag_state)
 #endif
 
-#define align_loops_log \
-  (this_target_flag_state->x_align_loops_log)
-#define align_loops_max_skip \
-  (this_target_flag_state->x_align_loops_max_skip)
-#define align_jumps_log \
-  (this_target_flag_state->x_align_jumps_log)
-#define align_jumps_max_skip \
-  (this_target_flag_state->x_align_jumps_max_skip)
-#define align_labels_log \
-  (this_target_flag_state->x_align_labels_log)
-#define align_labels_max_skip \
-  (this_target_flag_state->x_align_labels_max_skip)
-#define align_functions_log \
-  (this_target_flag_state->x_align_functions_log)
+#define align_loops              (this_target_flag_state->x_align_loops)
+#define align_jumps              (this_target_flag_state->x_align_jumps)
+#define align_labels             (this_target_flag_state->x_align_labels)
+#define align_functions          (this_target_flag_state->x_align_functions)
+#define align_loops_log          (align_loops[0].log)
+#define align_jumps_log          (align_jumps[0].log)
+#define align_labels_log         (align_labels[0].log)
+#define align_functions_log      (align_functions[0].log)
+#define align_loops_max_skip     (align_loops[0].maxskip)
+#define align_jumps_max_skip     (align_jumps[0].maxskip)
+#define align_labels_max_skip    (align_labels[0].maxskip)
+#define align_functions_max_skip (align_functions[0].maxskip)
+/* String representations of the above options are available in
+   const char *str_align_foo. NULL if not set. */
+
 #define flag_excess_precision \
   (this_target_flag_state->x_flag_excess_precision)
 
Index: gcc/toplev.c
===================================================================
--- gcc/toplev.c	(revision 240663)
+++ gcc/toplev.c	(working copy)
@@ -1179,31 +1179,111 @@ target_supports_section_anchors_p (void)
   return true;
 }
 
-/* Default the align_* variables to 1 if they're still unset, and
-   set up the align_*_log variables.  */
+/* Read a decimal number from string FLAG, up to end of line or comma.
+   Emit error message if number ends with any other character.
+   Return pointer past comma, or NULL if end of line.  */
+static const char *
+read_uint (const char *flag, const char *name, int *np)
+{
+  const char *flag_start = flag;
+  int n = 0;
+  char c;
+
+  while ((c = *flag++) >= '0' && c <= '9')
+    n = n*10 + (c-'0');
+  *np = n & 0x3fffffff; /* avoid accidentally negative numbers */
+  if (c == '\0')
+    return NULL;
+  if (c == ',')
+    return flag;
+
+  error_at (UNKNOWN_LOCATION, "-falign-%s parameter is bad at '%s'",
+            name, flag_start);
+  return NULL;
+}
+
+/* Parse "N[,M][,...]" string FLAG into struct align_flags A.
+   Return pointer past second comma, or NULL if end of line.  */
+static const char *
+read_log_maxskip (const char *flag, const char *name, struct align_flags *a)
+{
+  int n, m;
+  flag = read_uint (flag, name, &a->log);
+  n = a->log;
+  if (n != 0)
+    a->log = floor_log2 (n * 2 - 1);
+  if (!flag)
+    {
+      a->maxskip = n ? n - 1 : 0;
+      return flag;
+    }
+  flag = read_uint (flag, name, &a->maxskip);
+  m = a->maxskip;
+  if (m > n) m = n;
+  if (m > 0) m--; /* -falign-foo=N,M means M-1 max bytes of padding, not M */
+  a->maxskip = m;
+  return flag;
+}
+
+/* Parse "N[,M[,N2[,M2]]]" string FLAG into a pair of struct align_flags.  */
 static void
-init_alignments (void)
+parse_N_M (const char *flag, const char *name, struct align_flags a[2],
+	   unsigned int min_align_log)
 {
-  if (align_loops <= 0)
-    align_loops = 1;
-  if (align_loops_max_skip > align_loops)
-    align_loops_max_skip = align_loops - 1;
-  align_loops_log = floor_log2 (align_loops * 2 - 1);
-  if (align_jumps <= 0)
-    align_jumps = 1;
-  if (align_jumps_max_skip > align_jumps)
-    align_jumps_max_skip = align_jumps - 1;
-  align_jumps_log = floor_log2 (align_jumps * 2 - 1);
-  if (align_labels <= 0)
-    align_labels = 1;
-  align_labels_log = floor_log2 (align_labels * 2 - 1);
-  if (align_labels_max_skip > align_labels)
-    align_labels_max_skip = align_labels - 1;
-  if (align_functions <= 0)
-    align_functions = 1;
-  align_functions_log = floor_log2 (align_functions * 2 - 1);
+  if (flag)
+    {
+      flag = read_log_maxskip (flag, name, &a[0]);
+      if (flag)
+	flag = read_log_maxskip (flag, name, &a[1]);
+#ifdef SUBALIGN_LOG
+      else
+	{
+	  /* N2[,M2] is not specified. This arch has a default for N2.
+	     Before -falign-foo=N,M,N2,M2 was introduced, x86 had a tweak.
+	     -falign-functions=N with N > 8 was adding secondary alignment.
+	     -falign-functions=10 was emitting this before every function:
+			.p2align 4,,9
+			.p2align 3
+	     Now this behavior (and more) can be explicitly requested:
+	     -falign-functions=16,10,8
+	     Retain old behavior if N2 is missing: */
+
+	  int align = 1 << a[0].log;
+	  int subalign = 1 << SUBALIGN_LOG;
+
+	  if (a[0].log > SUBALIGN_LOG && a[0].maxskip >= subalign - 1)
+	    {
+	      /* Set N2 unless subalign can never have any effect */
+	      if (align > a[0].maxskip + 1)
+		a[1].log = SUBALIGN_LOG;
+	    }
+	}
+#endif
+    }
+  if ((unsigned int)a[0].log < min_align_log)
+    {
+      a[0].log = min_align_log;
+      a[0].maxskip = (1 << min_align_log) - 1;
+    }
 }
 
+/* Minimum alignment requirements, if arch has them.  */
+unsigned int min_align_loops_log = 0;
+unsigned int min_align_jumps_log = 0;
+unsigned int min_align_labels_log = 0;
+unsigned int min_align_functions_log = 0;
+
+/* Process -falign-foo=N[,M[,N2[,M2]]] options.  */
+void
+parse_alignment_opts (void)
+{
+  parse_N_M (str_align_loops, "loops", align_loops, min_align_loops_log);
+  parse_N_M (str_align_jumps, "jumps", align_jumps, min_align_jumps_log);
+  parse_N_M (str_align_labels, "labels", align_labels, min_align_labels_log);
+  parse_N_M (str_align_functions, "functions", align_functions,
+	     min_align_functions_log);
+}
+
 /* Process the options that have been parsed.  */
 static void
 process_options (void)
@@ -1627,7 +1709,7 @@ static void
 backend_init_target (void)
 {
   /* Initialize alignment variables.  */
-  init_alignments ();
+  parse_alignment_opts ();
 
   /* This depends on stack_pointer_rtx.  */
   init_fake_stack_mems ();
Index: gcc/toplev.h
===================================================================
--- gcc/toplev.h	(revision 240663)
+++ gcc/toplev.h	(working copy)
@@ -98,6 +98,13 @@ extern bool set_src_pwd		       (const char *);
 extern HOST_WIDE_INT get_random_seed (bool);
 extern const char *set_random_seed (const char *);
 
+extern unsigned int min_align_loops_log;
+extern unsigned int min_align_jumps_log;
+extern unsigned int min_align_labels_log;
+extern unsigned int min_align_functions_log;
+
+extern void parse_alignment_opts (void);
+
 extern void initialize_rtl (void);
 
 #endif /* ! GCC_TOPLEV_H */
Index: gcc/varasm.c
===================================================================
--- gcc/varasm.c	(revision 240663)
+++ gcc/varasm.c	(working copy)
@@ -1792,8 +1792,10 @@ assemble_start_function (tree decl, const char *fn
       && optimize_function_for_speed_p (cfun))
     {
 #ifdef ASM_OUTPUT_MAX_SKIP_ALIGN
-      ASM_OUTPUT_MAX_SKIP_ALIGN (asm_out_file,
-				 align_functions_log, align_functions - 1);
+      ASM_OUTPUT_MAX_SKIP_ALIGN (asm_out_file, align_functions[0].log,
+				 align_functions[0].maxskip);
+      ASM_OUTPUT_MAX_SKIP_ALIGN (asm_out_file, align_functions[1].log,
+				 align_functions[1].maxskip);
 #else
       ASM_OUTPUT_ALIGN (asm_out_file, align_functions_log);
 #endif
Index: gcc/testsuite/gcc.target/i386/falign-functions.c
===================================================================
--- gcc/testsuite/gcc.target/i386/falign-functions.c	(nonexistent)
+++ gcc/testsuite/gcc.target/i386/falign-functions.c	(working copy)
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -falign-functions=64,8" } */
+/* { dg-final { scan-assembler ".p2align 6,,7" } } */
+
+void
+test_func (void)
+{
+}

* Re: [PATCH 3/3] Extend -falign-FOO=N to N[,M[,N2[,M2]]]
  2016-10-06  9:47   ` Bernd Schmidt
@ 2016-10-06 18:43     ` Denys Vlasenko
  0 siblings, 0 replies; 26+ messages in thread
From: Denys Vlasenko @ 2016-10-06 18:43 UTC (permalink / raw)
  To: Bernd Schmidt, gcc-patches; +Cc: Andrew Pinski, Uros Bizjak



On 10/06/2016 11:47 AM, Bernd Schmidt wrote:
> On 09/30/2016 07:54 PM, Denys Vlasenko wrote:
>> +struct target_flag_state {
>> +  /* Each falign-foo can generate up to two levels of alignment:
>> +     -falign-foo=N,M[,N2,M2] */
>> +  struct align_flags x_align_loops[2];
>> +  struct align_flags x_align_jumps[2];
>> +  struct align_flags x_align_labels[2];
>> +  struct align_flags x_align_functions[2];
>
>> +#define align_loops              (this_target_flag_state->x_align_loops)
>> +#define align_jumps              (this_target_flag_state->x_align_jumps)
>> +#define align_labels             (this_target_flag_state->x_align_labels)
>
> In addition to the points already raised, this also breaks ports which
> access variables like align_jumps and expect them to be integers. rs6000
> is one such port.

Ouch... I did forget to search for these! Will send an updated patch.

* Re: [PATCH 3/3] Extend -falign-FOO=N to N[,M[,N2[,M2]]]
  2016-09-30 17:58 ` [PATCH 3/3] Extend -falign-FOO=N to N[,M[,N2[,M2]]] Denys Vlasenko
@ 2016-10-06  9:47   ` Bernd Schmidt
  2016-10-06 18:43     ` Denys Vlasenko
  0 siblings, 1 reply; 26+ messages in thread
From: Bernd Schmidt @ 2016-10-06  9:47 UTC (permalink / raw)
  To: Denys Vlasenko, gcc-patches; +Cc: Andrew Pinski, Uros Bizjak

On 09/30/2016 07:54 PM, Denys Vlasenko wrote:
> +struct target_flag_state {
> +  /* Each falign-foo can generate up to two levels of alignment:
> +     -falign-foo=N,M[,N2,M2] */
> +  struct align_flags x_align_loops[2];
> +  struct align_flags x_align_jumps[2];
> +  struct align_flags x_align_labels[2];
> +  struct align_flags x_align_functions[2];

> +#define align_loops              (this_target_flag_state->x_align_loops)
> +#define align_jumps              (this_target_flag_state->x_align_jumps)
> +#define align_labels             (this_target_flag_state->x_align_labels)

In addition to the points already raised, this also breaks ports which 
access variables like align_jumps and expect them to be integers. rs6000 
is one such port.


Bernd

* [PATCH 3/3] Extend -falign-FOO=N to N[,M[,N2[,M2]]]
  2016-09-30 17:55 [PATCH 0/3] Extend -falign-FOO=N to N[,M[,N2[,M2]]] version 4 Denys Vlasenko
@ 2016-09-30 17:58 ` Denys Vlasenko
  2016-10-06  9:47   ` Bernd Schmidt
  0 siblings, 1 reply; 26+ messages in thread
From: Denys Vlasenko @ 2016-09-30 17:58 UTC (permalink / raw)
  To: gcc-patches; +Cc: Denys Vlasenko, Andrew Pinski, Uros Bizjak, Bernd Schmidt

falign-functions=N is too simplistic.

Ingo Molnar ran some tests and it seems that on the latest x86 CPUs, 64-byte
alignment of functions runs fastest (he tried many other possibilities):
this way, after a call the CPU can fetch a lot of insns in the first cacheline fill.

However, developers are less than thrilled by the idea of blanket 64-byte
alignment of everything. Too much waste:
        On 05/20/2015 02:47 AM, Linus Torvalds wrote:
        > At the same time, I have to admit that I abhor a 64-byte function
        > alignment, when we have a fair number of functions that are (much)
        > smaller than that.
        >
        > Is there some way to get gcc to take the size of the function into
        > account? Because aligning a 16-byte or 32-byte function on a 64-byte
        > alignment is just criminally nasty and wasteful.

This change makes it possible to align functions to 64-byte boundaries *if*
this does not introduce a huge amount of padding.

Example syntax is -falign-functions=64,9: "align to 64 by skipping fewer
than 9 bytes (that is, at most 8)". IOW: "after a call insn, the CPU will
always be able to fetch at least 9 bytes of insns".
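
A minimal standalone sketch of how an "N" or "N,M" string could map to a
log/maxskip pair, modelled on read_log_maxskip from the patch. The names
parse_n_m and ceil_log2 are illustrative only (they are not GCC functions),
and sscanf stands in for the patch's hand-written digit loop:

```c
#include <assert.h>
#include <stdio.h>

/* Same shape as the struct align_flags this patch adds to flags.h.  */
struct align_flags
{
  int log;      /* align to 2^log bytes (0 means no alignment) */
  int maxskip;  /* pad only if at most this many bytes are needed */
};

/* Smallest LOG such that (1 << LOG) >= N; 0 for N <= 1.  */
static int
ceil_log2 (int n)
{
  int log = 0;
  while (log < 30 && (1 << log) < n)
    log++;
  return log;
}

/* Hypothetical helper: parse "N" or "N,M" into A.
   Returns the number of fields read, or 0 on malformed input.  */
static int
parse_n_m (const char *s, struct align_flags *a)
{
  int n = 0, m = 0;
  int fields;

  assert (s != NULL && a != NULL);
  fields = sscanf (s, "%d,%d", &n, &m);
  if (fields < 1 || n < 0 || m < 0)
    return 0;

  a->log = ceil_log2 (n);
  if (fields == 1)
    a->maxskip = n ? n - 1 : 0;	/* plain N: pad by up to N-1 bytes */
  else
    {
      if (m > n)
	m = n;
      a->maxskip = m ? m - 1 : 0;	/* N,M: pad by up to M-1 bytes */
    }
  return fields;
}
```

Under these assumptions "64,9" gives log 6 and maxskip 8, and "64,8" gives
log 6 and maxskip 7, which is the pair behind the ".p2align 6,,7" directive.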

x86 had a tweak: -falign-functions=N with N > 8 was adding secondary alignment.
For example, falign-functions=10 was emitting this before every function:
	.p2align 4,,9
	.p2align 3
This tweak was removed by the previous patch. It is now reinstated
by the rule that if -falign-functions=N[,M] is specified and N > 8,
then the default value of N2 is 8, not 1. This can now be suppressed with
-falign-functions=N,M,1, which wasn't possible before.
In general, the optional N2,M2 pair can be used to generate any secondary
alignment the user wants.
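
As a rough illustration of the default-N2 rule, the sketch below follows the
condition used in the SUBALIGN_LOG branch of parse_N_M in this patch, with
SUBALIGN_LOG assumed to be 3 (8 bytes). The struct and function names are
invented for this example, and it takes N and M as integers instead of
parsing the option string (pass M = N for plain -falign-functions=N).

```c
#include <assert.h>

/* Hypothetical result type for this sketch (not the patch's struct).  */
struct align_sketch
{
  int log;      /* primary alignment is 2^log */
  int maxskip;  /* primary maximum padding, i.e. M-1 */
  int sublog;   /* secondary alignment is 2^sublog; 0 means none */
};

/* Smallest LOG such that 2^LOG >= N, for N >= 1.  */
int
ceil_log2 (int n)
{
  int log = 0;
  while ((1 << log) < n)
    log++;
  return log;
}

/* Compute the alignment for -falign-functions=N,M when N2 is not given,
   assuming the x86 default of an 8-byte secondary alignment.  */
struct align_sketch
x86_default_subalign (int n, int m)
{
  struct align_sketch a;

  assert (n >= 1 && m >= 1);
  if (m > n)
    m = n;
  a.log = ceil_log2 (n);
  a.maxskip = m - 1;
  a.sublog = 0;
  /* Add the 8-byte subalignment only when it can have an effect:
     the primary alignment is larger than 8 bytes and may be skipped.  */
  if (a.log > 3 && a.maxskip >= 7 && (1 << a.log) > a.maxskip + 1)
    a.sublog = 3;
  return a;
}
```

For N = M = 10 this gives log 4, maxskip 9 and sublog 3, matching the
".p2align 4,,9" / ".p2align 3" pair shown above. For N = M = 16 the primary
alignment is never skipped, so no secondary directive is needed.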

Subalignment for loops/jumps/labels is trickier to fully implement.
The implementation in this patch uses falign-labels subalignment values
for any of these three types of labels - but only if "main" alignment
triggers. With -O2 defaults, this provides a matching behavior on x86:
loops and jumps are aligned (to 16-32 bytes depending on selected CPU)
and subaligned to 8 bytes. Labels are not aligned.

Testing:

Tested that with -falign-functions=N (tried 8, 15, 16, 17...) the alignment
directives are the same before and after the patch.
Tested that -falign-functions=N,N (two equal parameters) works exactly
like -falign-functions=N.

No change from past behavior:
Tested that "-falign-functions" uses an arch-dependent alignment.
Tested that "-O2" uses an arch-dependent alignment.
Tested that "-O2 -falign-functions=N" uses explicitly given alignment.

2016-09-27  Denys Vlasenko  <dvlasenk@redhat.com>

    * doc/invoke.texi: Update option documentation.
    * common.opt (-falign-functions=): Accept a string instead of integer.
    (-falign-jumps=): Likewise.
    (-falign-labels=): Likewise.
    (-falign-loops=): Likewise.
    * flags.h (struct target_flag_state): Revamp how alignment data is stored:
    for each of four alignment types, store two pairs of log/maxskip values.
    * toplev.c (read_uint): New function.
    (read_log_maxskip): New function.
    (parse_N_M): New function.
    (init_alignments): Set align_FOO[0/1].log/maxskip from
    specified falign-FOO=N[,M[,N[,M]]] options.
    * varasm.c (assemble_start_function): Call two ASM_OUTPUT_MAX_SKIP_ALIGN
    macros, first for N,M and second time for N2,M2 from
    falign-functions=N,M,N2,M2. This generates 0, 1, or 2 align directives.
    * final.c (final_scan_insn): If a label, jump or loop target
    is being aligned, emit a secondary alignment directive.
    * config/i386/i386.c (struct ptt): Change foo_align members from
    integers to strings. Add align_label member. Set it to "0,0,8"
    on the processors which have maxskips > 7 for loops and jumps -
    this preserves existing behaviour of adding 8-byte subalign.
    * config/i386/i386.c (processor_target_table): Likewise.
    * config/aarch64/aarch64-protos.h (struct tune_params):
    Change foo_align members from integers to strings.
    * config/aarch64/aarch64.c (<cpu>_tunings):
    Change foo_align field values from integers to strings.
    * config/arm/arm.c (arm_override_options_after_change_1):
    Fix if() condition to detect that -falign-functions is specified.
    * config/i386/i386.c (ix86_default_align): Likewise.
    * testsuite/gcc.target/i386/falign-functions.c: New file.

Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 239860)
+++ gcc/doc/invoke.texi	(working copy)
@@ -337,9 +337,11 @@ Objective-C and Objective-C++ Dialects}.
 
 @item Optimization Options
 @xref{Optimize Options,,Options that Control Optimization}.
-@gccoptlist{-faggressive-loop-optimizations -falign-functions[=@var{n}] @gol
--falign-jumps[=@var{n}] @gol
--falign-labels[=@var{n}] -falign-loops[=@var{n}] @gol
+@gccoptlist{-faggressive-loop-optimizations @gol
+-falign-functions[=@var{n}[,@var{m}[,@var{n2}[,@var{m2}]]]] @gol
+-falign-jumps[=@var{n}[,@var{m}[,@var{n2}[,@var{m2}]]]] @gol
+-falign-labels[=@var{n}[,@var{m}[,@var{n2}[,@var{m2}]]]] @gol
+-falign-loops[=@var{n}[,@var{m}[,@var{n2}[,@var{m2}]]]] @gol
 -fassociative-math -fauto-profile -fauto-profile[=@var{path}] @gol
 -fauto-inc-dec -fbranch-probabilities @gol
 -fbranch-target-load-optimize -fbranch-target-load-optimize2 @gol
@@ -7952,19 +7954,36 @@ The @option{-fstrict-overflow} option is enabled a
 
 @item -falign-functions
 @itemx -falign-functions=@var{n}
+@itemx -falign-functions=@var{n},@var{m}
+@itemx -falign-functions=@var{n},@var{m},@var{n2}
+@itemx -falign-functions=@var{n},@var{m},@var{n2},@var{m2}
 @opindex falign-functions
 Align the start of functions to the next power-of-two greater than
-@var{n}, skipping up to @var{n} bytes.  For instance,
-@option{-falign-functions=32} aligns functions to the next 32-byte
-boundary, but @option{-falign-functions=24} aligns to the next
-32-byte boundary only if this can be done by skipping 23 bytes or less.
+@var{n}, skipping up to @var{m}-1 bytes.  Such alignment ensures that
+after a branch, at least @var{m} bytes can be fetched by the CPU
+without crossing the specified alignment boundary.
 
-@option{-fno-align-functions} and @option{-falign-functions=1} are
-equivalent and mean that functions are not aligned.
+If @var{m} is not specified, it defaults to @var{n}.
+Same for @var{m2} and @var{n2}.
 
+Examples: @option{-falign-functions=32} aligns functions to the next
+32-byte boundary, @option{-falign-functions=24} aligns to the next
+32-byte boundary only if this can be done by skipping 23 bytes or less,
+@option{-falign-functions=32,7} aligns to the next
+32-byte boundary only if this can be done by skipping 6 bytes or less.
+
+The second pair of @var{n2},@var{m2} values allows you to specify
+a secondary alignment: @option{-falign-functions=64,7,32,3} aligns to the next
+64-byte boundary if this can be done by skipping 6 bytes or less,
+otherwise aligns to the next 32-byte boundary if this can be done
+by skipping 2 bytes or less.
+
 Some assemblers only support this flag when @var{n} is a power of two;
 in that case, it is rounded up.
 
+@option{-fno-align-functions} and @option{-falign-functions=1} are
+equivalent and mean that functions are not aligned.
+
 If @var{n} is not specified or is zero, use a machine-dependent default.
 
 Enabled at levels @option{-O2}, @option{-O3}.
@@ -7971,12 +7990,13 @@ Enabled at levels @option{-O2}, @option{-O3}.
 
 @item -falign-labels
 @itemx -falign-labels=@var{n}
+@itemx -falign-labels=@var{n},@var{m}
+@itemx -falign-labels=@var{n},@var{m},@var{n2}
+@itemx -falign-labels=@var{n},@var{m},@var{n2},@var{m2}
 @opindex falign-labels
-Align all branch targets to a power-of-two boundary, skipping up to
-@var{n} bytes like @option{-falign-functions}.  This option can easily
-make code slower, because it must insert dummy operations for when the
-branch target is reached in the usual flow of the code.
+Align all branch targets to a power-of-two boundary.
 
+Parameters of this option are analogous to the @option{-falign-functions} option.
 @option{-fno-align-labels} and @option{-falign-labels=1} are
 equivalent and mean that labels are not aligned.
 
@@ -7990,12 +8010,15 @@ Enabled at levels @option{-O2}, @option{-O3}.
 
 @item -falign-loops
 @itemx -falign-loops=@var{n}
+@itemx -falign-loops=@var{n},@var{m}
+@itemx -falign-loops=@var{n},@var{m},@var{n2}
+@itemx -falign-loops=@var{n},@var{m},@var{n2},@var{m2}
 @opindex falign-loops
-Align loops to a power-of-two boundary, skipping up to @var{n} bytes
-like @option{-falign-functions}.  If the loops are
-executed many times, this makes up for any execution of the dummy
-operations.
+Align loops to a power-of-two boundary.  If the loops are executed
+many times, this makes up for any execution of the dummy padding
+instructions.
 
+Parameters of this option are analogous to the @option{-falign-functions} option.
 @option{-fno-align-loops} and @option{-falign-loops=1} are
 equivalent and mean that loops are not aligned.
 
@@ -8005,12 +8028,15 @@ Enabled at levels @option{-O2}, @option{-O3}.
 
 @item -falign-jumps
 @itemx -falign-jumps=@var{n}
+@itemx -falign-jumps=@var{n},@var{m}
+@itemx -falign-jumps=@var{n},@var{m},@var{n2}
+@itemx -falign-jumps=@var{n},@var{m},@var{n2},@var{m2}
 @opindex falign-jumps
 Align branch targets to a power-of-two boundary, for branch targets
-where the targets can only be reached by jumping, skipping up to @var{n}
-bytes like @option{-falign-functions}.  In this case, no dummy operations
-need be executed.
+where the targets can only be reached by jumping.  In this case,
+no dummy operations need be executed.
 
+Parameters of this option are analogous to the @option{-falign-functions} option.
 @option{-fno-align-jumps} and @option{-falign-jumps=1} are
 equivalent and mean that loops are not aligned.
 
Index: gcc/common.opt
===================================================================
--- gcc/common.opt	(revision 239860)
+++ gcc/common.opt	(working copy)
@@ -896,32 +896,32 @@ Common Report Var(flag_aggressive_loop_optimizatio
 Aggressively optimize loops using language constraints.
 
 falign-functions
-Common Report Var(align_functions,0) Optimization UInteger
+Common Report Var(flag_align_functions) Optimization
 Align the start of functions.
 
 falign-functions=
-Common RejectNegative Joined UInteger Var(align_functions)
+Common RejectNegative Joined Var(str_align_functions)
 
 falign-jumps
-Common Report Var(align_jumps,0) Optimization UInteger
+Common Report Var(flag_align_jumps) Optimization
 Align labels which are only reached by jumping.
 
 falign-jumps=
-Common RejectNegative Joined UInteger Var(align_jumps)
+Common RejectNegative Joined Var(str_align_jumps)
 
 falign-labels
-Common Report Var(align_labels,0) Optimization UInteger
+Common Report Var(flag_align_labels) Optimization
 Align all labels.
 
 falign-labels=
-Common RejectNegative Joined UInteger Var(align_labels)
+Common RejectNegative Joined Var(str_align_labels)
 
 falign-loops
-Common Report Var(align_loops,0) Optimization UInteger
+Common Report Var(flag_align_loops) Optimization
 Align the start of loops.
 
 falign-loops=
-Common RejectNegative Joined UInteger Var(align_loops)
+Common RejectNegative Joined Var(str_align_loops)
 
 fargument-alias
 Common Ignore
Index: gcc/config/aarch64/aarch64-protos.h
===================================================================
--- gcc/config/aarch64/aarch64-protos.h	(revision 239860)
+++ gcc/config/aarch64/aarch64-protos.h	(working copy)
@@ -208,9 +208,9 @@ struct tune_params
   int memmov_cost;
   int issue_rate;
   unsigned int fusible_ops;
-  int function_align;
-  int jump_align;
-  int loop_align;
+  const char *function_align;
+  const char *jump_align;
+  const char *loop_align;
   int int_reassoc_width;
   int fp_reassoc_width;
   int vec_reassoc_width;
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c	(revision 239860)
+++ gcc/config/aarch64/aarch64.c	(working copy)
@@ -521,9 +521,9 @@ static const struct tune_params generic_tunings =
   4, /* memmov_cost  */
   2, /* issue_rate  */
   AARCH64_FUSE_NOTHING, /* fusible_ops  */
-  8,	/* function_align.  */
-  8,	/* jump_align.  */
-  4,	/* loop_align.  */
+  "8",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "4",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -547,9 +547,9 @@ static const struct tune_params cortexa35_tunings
   1, /* issue_rate  */
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
-  16,	/* function_align.  */
-  8,	/* jump_align.  */
-  8,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "8",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -573,9 +573,9 @@ static const struct tune_params cortexa53_tunings
   2, /* issue_rate  */
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
-  16,	/* function_align.  */
-  8,	/* jump_align.  */
-  8,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "8",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -599,9 +599,9 @@ static const struct tune_params cortexa57_tunings
   3, /* issue_rate  */
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
-  16,	/* function_align.  */
-  8,	/* jump_align.  */
-  8,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "8",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -625,9 +625,9 @@ static const struct tune_params cortexa72_tunings
   3, /* issue_rate  */
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
-  16,	/* function_align.  */
-  8,	/* jump_align.  */
-  8,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "8",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -651,9 +651,9 @@ static const struct tune_params cortexa73_tunings
   2, /* issue_rate.  */
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
-  16,	/* function_align.  */
-  8,	/* jump_align.  */
-  8,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "8",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -676,9 +676,9 @@ static const struct tune_params exynosm1_tunings =
   4,	/* memmov_cost  */
   3,	/* issue_rate  */
   (AARCH64_FUSE_AES_AESMC), /* fusible_ops  */
-  4,	/* function_align.  */
-  4,	/* jump_align.  */
-  4,	/* loop_align.  */
+  "4",	/* function_align.  */
+  "4",	/* jump_align.  */
+  "4",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -701,9 +701,9 @@ static const struct tune_params thunderx_tunings =
   6, /* memmov_cost  */
   2, /* issue_rate  */
   AARCH64_FUSE_CMP_BRANCH, /* fusible_ops  */
-  8,	/* function_align.  */
-  8,	/* jump_align.  */
-  8,	/* loop_align.  */
+  "8",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "8",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -726,9 +726,9 @@ static const struct tune_params xgene1_tunings =
   6, /* memmov_cost  */
   4, /* issue_rate  */
   AARCH64_FUSE_NOTHING, /* fusible_ops  */
-  16,	/* function_align.  */
-  8,	/* jump_align.  */
-  16,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "16",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -752,9 +752,9 @@ static const struct tune_params qdf24xx_tunings =
   4, /* issue_rate  */
   (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK), /* fuseable_ops  */
-  16,	/* function_align.  */
-  8,	/* jump_align.  */
-  16,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "16",	/* loop_align.  */
   2,	/* int_reassoc_width.  */
   4,	/* fp_reassoc_width.  */
   1,	/* vec_reassoc_width.  */
@@ -777,9 +777,9 @@ static const struct tune_params vulcan_tunings =
   4, /* memmov_cost.  */
   4, /* issue_rate.  */
   AARCH64_FUSE_NOTHING, /* fuseable_ops.  */
-  16,	/* function_align.  */
-  8,	/* jump_align.  */
-  16,	/* loop_align.  */
+  "16",	/* function_align.  */
+  "8",	/* jump_align.  */
+  "16",	/* loop_align.  */
   3,	/* int_reassoc_width.  */
   2,	/* fp_reassoc_width.  */
   2,	/* vec_reassoc_width.  */
Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 239860)
+++ gcc/config/arm/arm.c	(working copy)
@@ -2899,9 +2899,10 @@ static GTY(()) tree init_optimize;
 static void
 arm_override_options_after_change_1 (struct gcc_options *opts)
 {
-  if (opts->x_align_functions <= 0)
-    opts->x_align_functions = TARGET_THUMB_P (opts->x_target_flags)
-      && opts->x_optimize_size ? 2 : 4;
+  /* -falign-functions without argument: supply one */
+  if (opts->x_flag_align_functions && !opts->x_str_align_functions)
+    opts->x_str_align_functions = TARGET_THUMB_P (opts->x_target_flags)
+      && opts->x_optimize_size ? "2" : "4";
 }
 
 /* Implement targetm.override_options_after_change.  */
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 239860)
+++ gcc/config/i386/i386.c	(working copy)
@@ -2626,45 +2626,47 @@ struct ptt
 {
   const char *const name;			/* processor name  */
   const struct processor_costs *cost;		/* Processor costs */
-  const int align_loop;				/* Default alignments.  */
-  const int align_loop_max_skip;
-  const int align_jump;
-  const int align_jump_max_skip;
-  const int align_func;
+  const char *const align_loop;			/* Default alignments.  */
+  const char *const align_jump;
+  const char *const align_label;
+  const char *const align_func;
 };
 
-/* This table must be in sync with enum processor_type in i386.h.  */ 
+/* This table must be in sync with enum processor_type in i386.h.  */
 static const struct ptt processor_target_table[PROCESSOR_max] =
 {
-  {"generic", &generic_cost, 16, 10, 16, 10, 16},
-  {"i386", &i386_cost, 4, 3, 4, 3, 4},
-  {"i486", &i486_cost, 16, 15, 16, 15, 16},
-  {"pentium", &pentium_cost, 16, 7, 16, 7, 16},
-  {"lakemont", &lakemont_cost, 16, 7, 16, 7, 16},
-  {"pentiumpro", &pentiumpro_cost, 16, 15, 16, 10, 16},
-  {"pentium4", &pentium4_cost, 0, 0, 0, 0, 0},
-  {"nocona", &nocona_cost, 0, 0, 0, 0, 0},
-  {"core2", &core_cost, 16, 10, 16, 10, 16},
-  {"nehalem", &core_cost, 16, 10, 16, 10, 16},
-  {"sandybridge", &core_cost, 16, 10, 16, 10, 16},
-  {"haswell", &core_cost, 16, 10, 16, 10, 16},
-  {"bonnell", &atom_cost, 16, 15, 16, 7, 16},
-  {"silvermont", &slm_cost, 16, 15, 16, 7, 16},
-  {"knl", &slm_cost, 16, 15, 16, 7, 16},
-  {"skylake-avx512", &core_cost, 16, 10, 16, 10, 16},
-  {"intel", &intel_cost, 16, 15, 16, 7, 16},
-  {"geode", &geode_cost, 0, 0, 0, 0, 0},
-  {"k6", &k6_cost, 32, 7, 32, 7, 32},
-  {"athlon", &athlon_cost, 16, 7, 16, 7, 16},
-  {"k8", &k8_cost, 16, 7, 16, 7, 16},
-  {"amdfam10", &amdfam10_cost, 32, 24, 32, 7, 32},
-  {"bdver1", &bdver1_cost, 16, 10, 16, 7, 11},
-  {"bdver2", &bdver2_cost, 16, 10, 16, 7, 11},
-  {"bdver3", &bdver3_cost, 16, 10, 16, 7, 11},
-  {"bdver4", &bdver4_cost, 16, 10, 16, 7, 11},
-  {"btver1", &btver1_cost, 16, 10, 16, 7, 11},
-  {"btver2", &btver2_cost, 16, 10, 16, 7, 11},
-  {"znver1", &znver1_cost, 16, 10, 16, 7, 11}
+/* The "0,0,8" label alignment specified for some processors generates
+   secondary 8-byte alignment only for those label/jump/loop targets
+   which have primary alignment.  */
+  {"generic",    &generic_cost, "16,11", "16,11", "0,0,8", "16"},
+  {"i386",       &i386_cost,    "4",     "4",     NULL,    "4" },
+  {"i486",       &i486_cost,    "16",    "16",    "0,0,8", "16"},
+  {"pentium",    &pentium_cost, "16,8",  "16,8",  "0,0,8", "16"},
+  {"lakemont",   &lakemont_cost,"16,8",  "16,8",  "0,0,8", "16"},
+  {"pentiumpro", &pentiumpro_cost,"16",  "16,11", "0,0,8", "16"},
+  {"pentium4",   &pentium4_cost,NULL,    NULL,    NULL,    NULL},
+  {"nocona",     &nocona_cost,  NULL,    NULL,    NULL,    NULL},
+  {"core2",      &core_cost,    "16,11", "16,11", "0,0,8", "16"},
+  {"nehalem",    &core_cost,    "16,11", "16,11", "0,0,8", "16"},
+  {"sandybridge",&core_cost,    "16,11", "16,11", "0,0,8", "16"},
+  {"haswell",    &core_cost,    "16,11", "16,11", "0,0,8", "16"},
+  {"bonnell",    &atom_cost,    "16",    "16,8",  "0,0,8", "16"},
+  {"silvermont", &slm_cost,     "16",    "16,8",  "0,0,8", "16"},
+  {"knl",        &slm_cost,     "16",    "16,8",  "0,0,8", "16"},
+  {"skylake-avx512", &core_cost,"16,11", "16,11", "0,0,8", "16"},
+  {"intel",      &intel_cost,   "16",    "16,8",  "0,0,8", "16"},
+  {"geode",      &geode_cost,   NULL,    NULL,    NULL,    NULL},
+  {"k6",         &k6_cost,      "32,8",  "32,8",  "0,0,8", "32"},
+  {"athlon",     &athlon_cost,  "16,8",  "16,8",  "0,0,8", "16"},
+  {"k8",         &k8_cost,      "16,8",  "16,8",  "0,0,8", "16"},
+  {"amdfam10",   &amdfam10_cost,"32,25", "32,8",  "0,0,8", "32"},
+  {"bdver1",     &bdver1_cost,  "16,11", "16,8",  "0,0,8", "11"},
+  {"bdver2",     &bdver2_cost,  "16,11", "16,8",  "0,0,8", "11"},
+  {"bdver3",     &bdver3_cost,  "16,11", "16,8",  "0,0,8", "11"},
+  {"bdver4",     &bdver4_cost,  "16,11", "16,8",  "0,0,8", "11"},
+  {"btver1",     &btver1_cost,  "16,11", "16,8",  "0,0,8", "11"},
+  {"btver2",     &btver2_cost,  "16,11", "16,8",  "0,0,8", "11"},
+  {"znver1",     &znver1_cost,  "16,11", "16,8",  "0,0,8", "11"}
 };
 \f
 static unsigned int
@@ -4695,20 +4697,23 @@ set_ix86_tune_features (enum processor_type ix86_t
 static void
 ix86_default_align (struct gcc_options *opts)
 {
-  if (opts->x_align_loops == 0)
+  /* -falign-foo without argument: supply one */
+  if (opts->x_flag_align_loops && !opts->x_str_align_loops)
     {
-      opts->x_align_loops = processor_target_table[ix86_tune].align_loop;
-      align_loops_max_skip = processor_target_table[ix86_tune].align_loop_max_skip;
+      opts->x_str_align_loops = processor_target_table[ix86_tune].align_loop;
     }
-  if (opts->x_align_jumps == 0)
+  if (opts->x_flag_align_jumps && !opts->x_str_align_jumps)
     {
-      opts->x_align_jumps = processor_target_table[ix86_tune].align_jump;
-      align_jumps_max_skip = processor_target_table[ix86_tune].align_jump_max_skip;
+      opts->x_str_align_jumps = processor_target_table[ix86_tune].align_jump;
     }
-  if (opts->x_align_functions == 0)
+  if (opts->x_flag_align_labels && !opts->x_str_align_labels)
     {
-      opts->x_align_functions = processor_target_table[ix86_tune].align_func;
+      opts->x_str_align_labels = processor_target_table[ix86_tune].align_label;
     }
+  if (opts->x_flag_align_functions && !opts->x_str_align_functions)
+    {
+      opts->x_str_align_functions = processor_target_table[ix86_tune].align_func;
+    }
 }
 
 /* Implement TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE hook.  */
Index: gcc/final.c
===================================================================
--- gcc/final.c	(revision 239860)
+++ gcc/final.c	(working copy)
@@ -2415,6 +2415,12 @@ final_scan_insn (rtx_insn *insn, FILE *file, int o
 	    {
 #ifdef ASM_OUTPUT_MAX_SKIP_ALIGN
 	      ASM_OUTPUT_MAX_SKIP_ALIGN (file, align, max_skip);
+	      /* Above, we don't know whether a label, jump or loop
+		 alignment was used. Conservatively apply
+		 label subalignment, not jump or loop
+		 subalignment (they are almost always larger).  */
+	      ASM_OUTPUT_MAX_SKIP_ALIGN (file, align_labels[1].log,
+					 align_labels[1].maxskip);
 #else
 #ifdef ASM_OUTPUT_ALIGN_WITH_NOP
               ASM_OUTPUT_ALIGN_WITH_NOP (file, align);
Index: gcc/flags.h
===================================================================
--- gcc/flags.h	(revision 239860)
+++ gcc/flags.h	(working copy)
@@ -43,19 +43,22 @@ extern bool final_insns_dump_p;
 /* Other basic status info about current function.  */
 
 /* Target-dependent global state.  */
-struct target_flag_state {
+struct align_flags {
   /* Values of the -falign-* flags: how much to align labels in code.
-     0 means `use default', 1 means `don't align'.
-     For each variable, there is an _log variant which is the power
-     of two not less than the variable, for .align output.  */
-  int x_align_loops_log;
-  int x_align_loops_max_skip;
-  int x_align_jumps_log;
-  int x_align_jumps_max_skip;
-  int x_align_labels_log;
-  int x_align_labels_max_skip;
-  int x_align_functions_log;
+     log is "align to 2^log" (so 0 means no alignment).
+     maxskip is the maximum allowed amount of padding to insert. */
+  int log;
+  int maxskip;
+};
 
+struct target_flag_state {
+  /* Each falign-foo can generate up to two levels of alignment:
+     -falign-foo=N,M[,N2,M2] */
+  struct align_flags x_align_loops[2];
+  struct align_flags x_align_jumps[2];
+  struct align_flags x_align_labels[2];
+  struct align_flags x_align_functions[2];
+
   /* The excess precision currently in effect.  */
   enum excess_precision x_flag_excess_precision;
 };
@@ -67,20 +70,21 @@ extern struct target_flag_state *this_target_flag_
 #define this_target_flag_state (&default_target_flag_state)
 #endif
 
-#define align_loops_log \
-  (this_target_flag_state->x_align_loops_log)
-#define align_loops_max_skip \
-  (this_target_flag_state->x_align_loops_max_skip)
-#define align_jumps_log \
-  (this_target_flag_state->x_align_jumps_log)
-#define align_jumps_max_skip \
-  (this_target_flag_state->x_align_jumps_max_skip)
-#define align_labels_log \
-  (this_target_flag_state->x_align_labels_log)
-#define align_labels_max_skip \
-  (this_target_flag_state->x_align_labels_max_skip)
-#define align_functions_log \
-  (this_target_flag_state->x_align_functions_log)
+#define align_loops              (this_target_flag_state->x_align_loops)
+#define align_jumps              (this_target_flag_state->x_align_jumps)
+#define align_labels             (this_target_flag_state->x_align_labels)
+#define align_functions          (this_target_flag_state->x_align_functions)
+#define align_loops_log          (align_loops[0].log)
+#define align_jumps_log          (align_jumps[0].log)
+#define align_labels_log         (align_labels[0].log)
+#define align_functions_log      (align_functions[0].log)
+#define align_loops_max_skip     (align_loops[0].maxskip)
+#define align_jumps_max_skip     (align_jumps[0].maxskip)
+#define align_labels_max_skip    (align_labels[0].maxskip)
+#define align_functions_max_skip (align_functions[0].maxskip)
+/* String representations of the above options are available in
+   const char *str_align_foo. NULL if not set. */
+
 #define flag_excess_precision \
   (this_target_flag_state->x_flag_excess_precision)
 
Index: gcc/toplev.c
===================================================================
--- gcc/toplev.c	(revision 239860)
+++ gcc/toplev.c	(working copy)
@@ -1177,29 +1177,96 @@ target_supports_section_anchors_p (void)
   return true;
 }
 
-/* Default the align_* variables to 1 if they're still unset, and
-   set up the align_*_log variables.  */
+/* Read a decimal number from string FLAG, up to end of line or comma.
+   Emit error message if number ends with any other character.
+   Return pointer past comma, or NULL if end of line.  */
+static const char *
+read_uint (const char *flag, const char *name, int *np)
+{
+  const char *flag_start = flag;
+  int n = 0;
+  char c;
+
+  while ((c = *flag++) >= '0' && c <= '9')
+    n = n*10 + (c-'0');
+  *np = n & 0x3fffffff; /* avoid accidentally negative numbers */
+  if (c == '\0')
+    return NULL;
+  if (c == ',')
+    return flag;
+
+  error_at (UNKNOWN_LOCATION, "-falign-%s parameter is bad at '%s'",
+            name, flag_start);
+  return NULL;
+}
+
+/* Parse "N[,M][,...]" string FLAG into struct align_flags A.
+   Return pointer past second comma, or NULL if end of line.  */
+static const char *
+read_log_maxskip (const char *flag, const char *name, struct align_flags *a)
+{
+  int n, m;
+  flag = read_uint (flag, name, &a->log);
+  n = a->log;
+  if (n != 0)
+    a->log = floor_log2 (n * 2 - 1);
+  if (!flag)
+    {
+      a->maxskip = n ? n - 1 : 0;
+      return flag;
+    }
+  flag = read_uint (flag, name, &a->maxskip);
+  m = a->maxskip;
+  if (m > n) m = n;
+  if (m > 0) m--; /* -falign-foo=N,M means M-1 max bytes of padding, not M */
+  a->maxskip = m;
+  return flag;
+}
+
+/* Parse "N[,M[,N2[,M2]]]" string FLAG into a pair of struct align_flags.  */
 static void
+parse_N_M (const char *flag, const char *name, struct align_flags a[2])
+{
+  if (flag)
+    {
+      flag = read_log_maxskip (flag, name, &a[0]);
+      if (flag)
+	flag = read_log_maxskip (flag, name, &a[1]);
+#ifdef SUBALIGN_LOG
+      else
+	{
+	  /* N2[,M2] is not specified. This arch has a default for N2.
+	     Before -falign-foo=N,M,N2,M2 was introduced, x86 had a tweak.
+	     -falign-functions=N with N > 8 was adding secondary alignment.
+	     -falign-functions=10 was emitting this before every function:
+			.p2align 4,,9
+			.p2align 3
+	     Now this behavior (and more) can be explicitly requested:
+	     -falign-functions=16,10,8
+	     Retain old behavior if N2 is missing: */
+
+	  int align = 1 << a[0].log;
+	  int subalign = 1 << SUBALIGN_LOG;
+
+	  if (a[0].log > SUBALIGN_LOG && a[0].maxskip >= subalign - 1)
+	    {
+	      /* Set N2 unless subalign can never have any effect */
+	      if (align > a[0].maxskip + 1)
+		a[1].log = SUBALIGN_LOG;
+	    }
+	}
+#endif
+    }
+}
+
+/* Process -falign-foo=N[,M[,N2[,M2]]] options.  */
+static void
 init_alignments (void)
 {
-  if (align_loops <= 0)
-    align_loops = 1;
-  if (align_loops_max_skip > align_loops)
-    align_loops_max_skip = align_loops - 1;
-  align_loops_log = floor_log2 (align_loops * 2 - 1);
-  if (align_jumps <= 0)
-    align_jumps = 1;
-  if (align_jumps_max_skip > align_jumps)
-    align_jumps_max_skip = align_jumps - 1;
-  align_jumps_log = floor_log2 (align_jumps * 2 - 1);
-  if (align_labels <= 0)
-    align_labels = 1;
-  align_labels_log = floor_log2 (align_labels * 2 - 1);
-  if (align_labels_max_skip > align_labels)
-    align_labels_max_skip = align_labels - 1;
-  if (align_functions <= 0)
-    align_functions = 1;
-  align_functions_log = floor_log2 (align_functions * 2 - 1);
+  parse_N_M (str_align_loops, "loops", align_loops);
+  parse_N_M (str_align_jumps, "jumps", align_jumps);
+  parse_N_M (str_align_labels, "labels", align_labels);
+  parse_N_M (str_align_functions, "functions", align_functions);
 }
 
 /* Process the options that have been parsed.  */
Index: gcc/varasm.c
===================================================================
--- gcc/varasm.c	(revision 239860)
+++ gcc/varasm.c	(working copy)
@@ -1790,8 +1790,10 @@ assemble_start_function (tree decl, const char *fn
       && optimize_function_for_speed_p (cfun))
     {
 #ifdef ASM_OUTPUT_MAX_SKIP_ALIGN
-      ASM_OUTPUT_MAX_SKIP_ALIGN (asm_out_file,
-				 align_functions_log, align_functions - 1);
+      ASM_OUTPUT_MAX_SKIP_ALIGN (asm_out_file, align_functions[0].log,
+				 align_functions[0].maxskip);
+      ASM_OUTPUT_MAX_SKIP_ALIGN (asm_out_file, align_functions[1].log,
+				 align_functions[1].maxskip);
 #else
       ASM_OUTPUT_ALIGN (asm_out_file, align_functions_log);
 #endif
Index: gcc/testsuite/gcc.target/i386/falign-functions.c
===================================================================
--- gcc/testsuite/gcc.target/i386/falign-functions.c	(nonexistent)
+++ gcc/testsuite/gcc.target/i386/falign-functions.c	(working copy)
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -falign-functions=64,8" } */
+/* { dg-final { scan-assembler ".p2align 6,,7" } } */
+
+void
+test_func (void)
+{
+}


end of thread, other threads:[~2018-07-04  0:20 UTC | newest]

Thread overview: 26+ messages
2017-04-18 18:30 [PATCH 0/3] Extend -falign-FOO=N to N[,M[,N2[,M2]]] version 8 Denys Vlasenko
2017-04-18 18:30 ` [PATCH 2/3] Temporary remove "at least 8 byte alignment" code from x86 Denys Vlasenko
2017-04-18 18:30 ` [PATCH 1/3] Remove support for obsolete x86 -malign-foo options Denys Vlasenko
2017-05-06  7:22   ` Uros Bizjak
2017-05-11 12:24     ` Denys Vlasenko
2018-02-12 10:07     ` Martin Liška
2017-04-18 18:46 ` [PATCH 3/3] Extend -falign-FOO=N to N[,M[,N2[,M2]]] Denys Vlasenko
2017-04-18 19:12   ` Sandra Loosemore
2017-05-05 14:40 ` [PATCH 0/3] Extend -falign-FOO=N to N[,M[,N2[,M2]]] version 8 Denys Vlasenko
  -- strict thread matches above, loose matches on Subject: below --
2018-05-25 11:04 [PATCH 0/3] Extend -falign-FOO=N to N[,M[,N2[,M2]]] version 9 marxin
2018-05-25 11:04 ` [PATCH 3/3] Extend -falign-FOO=N to N[:M[:N2[:M2]]] marxin
2018-06-29 19:05   ` Jeff Law
2018-07-03  8:53     ` Martin Liška
2018-07-03  9:55       ` Segher Boessenkool
2018-07-03 10:16         ` Martin Liška
2018-07-03 10:58           ` Segher Boessenkool
2018-07-03 12:51             ` Martin Liška
2018-07-03 13:23               ` Segher Boessenkool
2018-07-03 19:12       ` Martin Liška
2018-07-04  0:20         ` Jeff Law
2017-04-17 15:57 [PATCH 0/3] Extend -falign-FOO=N to N[,M[,N2[,M2]]] version 7 Denys Vlasenko
2017-04-17 16:20 ` [PATCH 3/3] Extend -falign-FOO=N to N[,M[,N2[,M2]]] Denys Vlasenko
2017-04-17 20:02   ` Sandra Loosemore
2017-04-18 18:30     ` Denys Vlasenko
2016-10-12 20:53 [PATCH 0/3] Extend -falign-FOO=N to N[,M[,N2[,M2]]] version 4 Denys Vlasenko
2016-10-12 20:53 ` [PATCH 3/3] Extend -falign-FOO=N to N[,M[,N2[,M2]]] Denys Vlasenko
2016-09-30 17:55 [PATCH 0/3] Extend -falign-FOO=N to N[,M[,N2[,M2]]] version 4 Denys Vlasenko
2016-09-30 17:58 ` [PATCH 3/3] Extend -falign-FOO=N to N[,M[,N2[,M2]]] Denys Vlasenko
2016-10-06  9:47   ` Bernd Schmidt
2016-10-06 18:43     ` Denys Vlasenko
