[RFC PATCH] .bundle_align

public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed

* [RFC PATCH] .bundle_align_mode
@ 2012-02-15  2:50 Roland McGrath
  2012-02-15 10:20 ` nick clifton
  2012-02-21 20:05 ` DJ Delorie
  0 siblings, 2 replies; 10+ messages in thread
From: Roland McGrath @ 2012-02-15  2:50 UTC (permalink / raw)
  To: binutils; +Cc: David Sehr

This patch is a bit of a straw-man to get some advice on how best to
implement the feature.  Rest assured a real patch will have log
entries, cleaner code, more comments, testsuite coverage, etc.  So
don't bother yet with telling me to do those obvious things.

The as.texinfo patch should explain what functionality I'm trying to
achieve.  (There's an additional variant of the feature I'd like to
figure out how to implement, but I won't go into that before I get
feedback on what I've done so far.)  I'm entirely open to changing
the names and detailed semantics of the directives and so forth to
comport with the consensus among gas maintainers, of course
constrained to details that actually address the needs I'm after.

I'm looking for feedback from folks who grok the gas internals well,
both on the general implementation approach and specifically on the
appropriate interface for backend hooks (md_max_size in this patch).
(After that I'll of course also want the backend maintainers' advice
on the optimal implementation of the hooks in each backend I'm
tackling, which in the near term will be only x86 and ARM/Thumb.)

This is to meet the requirements of particular targets (various
Native Client ABIs, of which there will be more in the future), but
I think it's more maintainable and less work to implement it as a
generic (optional) feature.  Though it makes the feature always
available for any target, what I've done here has no effect
whatsoever when you don't use the .bundle_align_mode directive to
switch it on.

I'm aware that the basic approach is not good for assembler
performance, essentially creating a frag or two for every
instruction.  I'm not really concerned about that problem.
(Assembly-time performance is just not a practical bottleneck.)  But
if you can suggest different approaches that are not much harder to
implement but scale better, of course I'd be glad to see those.

I'm looking for a plan that is not desperately difficult or invasive
to implement, minimizes the new machine-dependent code required, and
that I can understand.  (I don't grok gas internals especially well
despite 20+ years of occasionally fiddling with them, but I'd be
happy to be taught.)  If there is a more clever method that involves
even less machine-dependent code than this, by being closely tied in
with relaxation or something, then I'd be very glad to find it.

The approach I've taken here is pessimistic in relaxation cases.
For example, if a jump instruction would appear two bytes before an
instruction bundle alignment boundary, then it will always be bumped
to that boundary on the assumption that it may need more than two
bytes to encode.  If relaxation in fact delivers a two-byte
encoding, then this was unnecessary but we can't undo it.  I think
this suboptimality is an acceptable trade-off for making the
implementation tractable.  But I'd be thrilled to find a tractable
plan that is more optimal in even some subset of cases.


Thanks,
Roland


diff --git a/gas/config/tc-arm.c b/gas/config/tc-arm.c
index 02a63a6..7634d74 100644
--- a/gas/config/tc-arm.c
+++ b/gas/config/tc-arm.c
@@ -1,7 +1,5 @@
 /* tc-arm.c -- Assemble for the ARM
-   Copyright 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003,
-   2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011
-   Free Software Foundation, Inc.
+   Copyright 1994-2012 Free Software Foundation, Inc.
    Contributed by Richard Earnshaw (rwe@pegasus.esprit.ec.org)
 	Modified by David Taylor (dtaylor@armltd.co.uk)
 	Cirrus coprocessor mods by Aldy Hernandez (aldyh@redhat.com)
@@ -18998,6 +18996,12 @@ md_chars_to_number (char * buf, int n)
 
 /* MD interface: Sections.  */
 
+int
+arm_max_size (fragS *fragp ATTRIBUTE_UNUSED)
+{
+  return 4;
+}
+
 /* Estimate the size of a frag before relaxing.  Assume everything fits in
    2 bytes.  */
 
diff --git a/gas/config/tc-arm.h b/gas/config/tc-arm.h
index 2916ae1..a7ebb40 100644
--- a/gas/config/tc-arm.h
+++ b/gas/config/tc-arm.h
@@ -1,6 +1,5 @@
 /* This file is tc-arm.h
-   Copyright 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003,
-   2004, 2005, 2006, 2007, 2008, 2009  Free Software Foundation, Inc.
+   Copyright 1994-2012 Free Software Foundation, Inc.
    Contributed by Richard Earnshaw (rwe@pegasus.esprit.ec.org)
 	Modified by David Taylor (dtaylor@armltd.co.uk)
 
@@ -81,6 +80,9 @@ struct fix;
 
 #define TC_FORCE_RELOCATION(FIX) arm_force_relocation (FIX)
 
+#define md_max_size(fragp) arm_max_size(fragp)
+extern int arm_max_size (struct frag *);
+
 #define md_relax_frag(segment, fragp, stretch) \
   arm_relax_frag (segment, fragp, stretch)
 extern int arm_relax_frag (asection *, struct frag *, long);
diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 97cb68e..3bbb867 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -1,8 +1,5 @@
 /* tc-i386.c -- Assemble code for the Intel 80386
-   Copyright 1989, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999,
-   2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011,
-   2012
-   Free Software Foundation, Inc.
+   Copyright 1989,1991-2012 Free Software Foundation, Inc.
 
    This file is part of GAS, the GNU Assembler.
 
@@ -7715,6 +7712,15 @@ i386_att_operand (char *operand_string)
   return 1;			/* Normal return.  */
 }
 \f
+/* Called in between md_assemble calls, to calculate the maximum size an
+   rs_machine_dependent frag will have after relaxation.  */
+int
+i386_max_size (fragS *frag)
+{
+  /* The only relaxable frags are for jumps, and these add at most 5 bytes.  */
+  return frag->fr_fix + 5;
+}
+
 /* md_estimate_size_before_relax()
 
    Called just before relax() for rs_machine_dependent frags.  The x86
diff --git a/gas/config/tc-i386.h b/gas/config/tc-i386.h
index 688c69a..846a5cc 100644
--- a/gas/config/tc-i386.h
+++ b/gas/config/tc-i386.h
@@ -1,7 +1,5 @@
 /* tc-i386.h -- Header file for tc-i386.c
-   Copyright 1989, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000,
-   2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012
-   Free Software Foundation, Inc.
+   Copyright 1989,1992-2012 Free Software Foundation, Inc.
 
    This file is part of GAS, the GNU Assembler.
 
@@ -215,6 +213,9 @@ if (fragP->fr_type == rs_align_code) 					\
 void i386_print_statistics (FILE *);
 #define tc_print_statistics i386_print_statistics
 
+extern int i386_max_size (fragS *);
+#define md_max_size i386_max_size
+
 #define md_number_to_chars number_to_chars_littleendian
 
 enum processor_type
diff --git a/gas/doc/as.texinfo b/gas/doc/as.texinfo
index 7a41f02..fd8f360 100644
--- a/gas/doc/as.texinfo
+++ b/gas/doc/as.texinfo
@@ -1,7 +1,5 @@
 \input texinfo @c                               -*-Texinfo-*-
-@c  Copyright 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000,
-@c  2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011
-@c  Free Software Foundation, Inc.
+@c Copyright (C) 1991-2012 Free Software Foundation, Inc.
 @c UPDATE!!  On future updates--
 @c   (1)   check for new machine-dep cmdline options in
 @c         md_parse_option definitions in config/tc-*.c
@@ -3970,6 +3968,7 @@ Some machine configurations provide additional directives.
 * Ascii::                       @code{.ascii "@var{string}"}@dots{}
 * Asciz::                       @code{.asciz "@var{string}"}@dots{}
 * Balign::                      @code{.balign @var{abs-expr} , @var{abs-expr}}
+* Bundle directives::           @code{.bundle_align_mode @var{abs-expr}}, @code{.bundle_lock}, @code{.bundle_unlock}
 * Byte::                        @code{.byte @var{expressions}}
 * CFI directives::		@code{.cfi_startproc [simple]}, @code{.cfi_endproc}, etc.
 * Comm::                        @code{.comm @var{symbol} , @var{length} }
@@ -4292,6 +4291,49 @@ filled in with the value 0x368d (the exact placement of the bytes depends upon
 the endianness of the processor).  If it skips 1 or 3 bytes, the fill value is
 undefined.
 
+@node Bundle directives
+@section @code{.bundle_align_mode @var{abs-expr}}
+@cindex @code{bundle_align_mode} directive
+@cindex bundle
+@cindex instruction bundle
+@cindex aligned instruction bundle
+@code{.bundle_align_mode} enables or disables @defn{aligned instruction bundle}
+mode.  In this mode, sequences of adjacent instructions are grouped into
+fixed-sized @defn{bundles}.  If the argument is zero, this mode is disabled
+(which is the default state).  If the argument it not zero, it gives the size
+of an instruction bundle as a power of two (as for the @code{.p2align}
+directive, @pxref{P2align}).
+
+A @defn{bundle} is simply a sequence of instructions that starts on an aligned
+boundary.  For example, if @var{abs-expr} is @code{5} then the bundle size is
+32, so each aligned chunk of 32 bytes is a bundle.  When aligned instruction
+bundle mode is in effect, no single instruction may span a boundary between
+bundles.  If an instruction would start too close to the end of a bundle for
+the length of that particular instruction to fit within the bundle, then the
+space at the end of that bundle is filled with no-op instructions so the
+instruction starts in the next bundle.  As a corollary, it's an error if any
+single instruction's encoding is longer than the bundle size.
+
+@section @code{.bundle_lock} and @code{.bundle_unlock}
+@cindex @code{bundle_lock} directive
+@cindex @code{bundle_unlock} directive
+The @code{.bundle_lock} and directive @code{.bundle_unlock} directives allow
+explicit control over instruction bundle padding.  These directives are only
+valid when @code{.bundle_align_mode} has been used to enable aligned
+instruction bundle mode.  It's an error if they appear when
+@code{.bundle_align_mode} has not been used at all, or when the last directive
+was @w{@code{.bundle_align_mode 0}}.
+
+@cindex bundle-locked
+A pair of @code{.bundle_lock} and @code{.bundle_unlock} directives define a
+@defn{bundle-locked} instruction sequence.  For purposes of aligned instruction
+bundle mode, a sequence starting with @code{.bundle_lock} and ending with
+@code{.bundle_unlock} is treated as a single instruction.  That is, the entire
+sequence must fit into a single bundle and may not span a bundle boundary.  If
+necessary, no-op instructions will be inserted before the first instruction of
+the sequence so that the whole sequence starts on an aligned bundle boundary.
+It's an error if the sequence is longer than the bundle size.
+
 @node Byte
 @section @code{.byte @var{expressions}}
 
diff --git a/gas/read.c b/gas/read.c
index 445caa1..df7702c 100644
--- a/gas/read.c
+++ b/gas/read.c
@@ -1,7 +1,5 @@
 /* read.c - read a source file -
-   Copyright 1986, 1987, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997,
-   1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009,
-   2010, 2011  Free Software Foundation, Inc.
+   Copyright 1986,1987,1990-2012 Free Software Foundation, Inc.
 
    This file is part of GAS, the GNU Assembler.
 
@@ -209,6 +207,19 @@ static int dwarf_file_string;
 #endif
 #endif
 
+#ifdef md_max_size
+# define HANDLE_BUNDLE
+#endif
+
+#ifdef HANDLE_BUNDLE
+/* .bundle_align_mode sets this.
+   XXX say more */
+static unsigned int bundle_align_p2;
+
+static frchainS *bundle_lock_frchain;
+static fragS *bundle_lock_frag;
+#endif
+
 static void do_s_func (int end_p, const char *default_prefix);
 static void do_align (int, char *, int, int);
 static void s_align (int, int);
@@ -277,6 +288,11 @@ static const pseudo_typeS potable[] = {
   {"balignw", s_align_bytes, -2},
   {"balignl", s_align_bytes, -4},
 /* block  */
+#ifdef HANDLE_BUNDLE
+  {"bundle_align_mode", s_bundle_align_mode, 0},
+  {"bundle_lock", s_bundle_lock, 0},
+  {"bundle_unlock", s_bundle_unlock, 0},
+#endif
   {"byte", cons, 1},
   {"comm", s_comm, 0},
   {"common", s_mri_common, 0},
@@ -583,6 +599,99 @@ try_macro (char term, const char *line)
   return 0;
 }
 
+#ifdef HANDLE_BUNDLE
+static unsigned int
+pending_bundle_size (fragS *frag)
+{
+  unsigned int size = 0;
+
+  while (frag != frag_now)
+    {
+      /* This should only happen in what will later become an error case.  */
+      if (frag == NULL)
+        return 0;
+
+      if (frag->fr_type == rs_machine_dependent)
+        size += md_max_size (frag);
+      else
+        size += frag->fr_fix;
+
+      frag = frag->fr_next;
+    }
+
+  size += frag_now_fix ();
+  return size;
+}
+
+static unsigned int
+close_bundle_frag (fragS *frag)
+{
+  unsigned int bundle_size = pending_bundle_size (frag);
+
+  if (bundle_size <= (1U << bundle_align_p2))
+    {
+      if (bundle_size > 1)
+        {
+          frag->fr_offset = bundle_align_p2;
+          frag->fr_subtype = bundle_size - 1;
+        }
+      return 0;
+    }
+
+  return bundle_size;
+}
+
+/* Assemble one instruction.  */
+static void
+assemble_one (char *line)
+{
+  fragS *insn_start_frag = frag_now;
+
+  if (bundle_lock_frchain != NULL && bundle_lock_frchain != frchain_now)
+    {
+      as_bad (_("cannot change section or subsection inside .bundle_lock"));
+      /* Clearing this serves as a marker that we have already complained.  */
+      bundle_lock_frchain = NULL;
+    }
+
+  if (bundle_lock_frag == NULL && bundle_align_p2 > 0)
+    frag_align_code (0, 0);
+
+  md_assemble (line);
+
+  if (bundle_lock_frchain != NULL)
+    {
+      /* Make sure this hasn't pushed the locked sequence
+         past the bundle size.  */
+      unsigned int bundle_size = pending_bundle_size (bundle_lock_frag);
+      if (bundle_size > (1U << bundle_align_p2))
+        as_bad (_("\
+.bundle_lock sequence at %u bytes but .bundle_align_mode limit is %u bytes"),
+                bundle_size, 1U << bundle_align_p2);
+    }
+  else if (bundle_align_p2 > 0)
+    {
+      addressT insn_size = pending_bundle_size (insn_start_frag);
+
+      if (insn_size > (1U << bundle_align_p2))
+        as_bad (_("\
+single instruction is %u bytes long but .bundle_align_mode limit is %u"),
+                (unsigned int) insn_size, 1U << bundle_align_p2);
+
+      if (insn_size > 1)
+        {
+          insn_start_frag->fr_offset = bundle_align_p2;
+          insn_start_frag->fr_subtype = insn_size - 1;
+        }
+    }
+}
+
+#else
+
+# define assemble_one(line) md_assemble(line)
+
+#endif
+
 /* We read the file, putting things into a web that represents what we
    have been reading.  */
 void
@@ -947,7 +1056,7 @@ read_a_source_file (char *name)
 			    }
 			}
 
-		      md_assemble (s);	/* Assemble 1 instruction.  */
+		      assemble_one (s); /* Assemble 1 instruction.  */
 
 		      *input_line_pointer++ = c;
 
@@ -1128,6 +1237,16 @@ read_a_source_file (char *name)
  quit:
   symbol_set_value_now (&dot_symbol);
 
+#ifdef HANDLE_BUNDLE
+  if (bundle_lock_frag != NULL)
+    {
+      as_bad_where (bundle_lock_frag->fr_file, bundle_lock_frag->fr_line,
+                    _(".bundle_lock with no matching .bundle_unlock"));
+      bundle_lock_frag = NULL;
+      bundle_lock_frchain = NULL;
+    }
+#endif
+
 #ifdef md_cleanup
   md_cleanup ();
 #endif
@@ -5867,6 +5986,78 @@ do_s_func (int end_p, const char *default_prefix)
   demand_empty_rest_of_line ();
 }
 \f
+#ifdef HANDLE_BUNDLE
+
+void
+s_bundle_align_mode (int arg ATTRIBUTE_UNUSED)
+{
+  unsigned int align = get_absolute_expression ();
+  SKIP_WHITESPACE ();
+  demand_empty_rest_of_line ();
+
+  if (align > (unsigned int) TC_ALIGN_LIMIT)
+    as_fatal (_("alignment too large"));
+
+  if (bundle_lock_frag != NULL)
+    {
+      as_bad (_("cannot change .bundle_align_mode inside .bundle_lock"));
+      return;
+    }
+
+  bundle_align_p2 = align;
+
+  if (align != 0)
+    record_alignment (now_seg, align - OCTETS_PER_BYTE_POWER);
+  /* XXX what about section changes? */
+}
+
+void
+s_bundle_lock (int arg ATTRIBUTE_UNUSED)
+{
+  demand_empty_rest_of_line ();
+
+  if (bundle_align_p2 == 0)
+    {
+      as_warn (_(".bundle_lock is meaningless without .bundle_align_mode"));
+      return;
+    }
+
+  if (bundle_lock_frag != NULL)
+    {
+      as_bad (_("second .bundle_lock without .bundle_unlock"));
+      return;
+    }
+
+  bundle_lock_frchain = frchain_now;
+  bundle_lock_frag = frag_now;
+  frag_align_code (0, 0);
+}
+
+void
+s_bundle_unlock (int arg ATTRIBUTE_UNUSED)
+{
+  unsigned int size;
+
+  demand_empty_rest_of_line ();
+
+  if (bundle_lock_frag == NULL)
+    {
+      as_bad (_(".bundle_unlock without preceding .bundle_lock"));
+      return;
+    }
+
+  gas_assert (bundle_align_p2 > 0);
+
+  size = close_bundle_frag (bundle_lock_frag);
+  bundle_lock_frag = NULL;
+  bundle_lock_frchain = NULL;
+  if (size > 0)
+    as_bad (_(".bundle_lock sequence is %u bytes, but bundle size only %u"),
+            size, 1 << bundle_align_p2);
+}
+
+#endif
+\f
 void
 s_ignore (int arg ATTRIBUTE_UNUSED)
 {
diff --git a/gas/read.h b/gas/read.h
index 82ccc75..ba43fa0 100644
--- a/gas/read.h
+++ b/gas/read.h
@@ -1,7 +1,5 @@
 /* read.h - of read.c
-   Copyright 1986, 1990, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999,
-   2000, 2001, 2002, 2003, 2004, 2005, 2007, 2008, 2009
-   Free Software Foundation, Inc.
+   Copyright 1986,1990,1992-2012 Free Software Foundation, Inc.
 
    This file is part of GAS, the GNU Assembler.
 
@@ -143,6 +141,9 @@ extern symbolS *s_lcomm_internal (int, symbolS *, addressT);
 extern void s_app_file_string (char *, int);
 extern void s_app_file (int);
 extern void s_app_line (int);
+extern void s_bundle_align_mode (int);
+extern void s_bundle_lock (int);
+extern void s_bundle_unlock (int);
 extern void s_comm (int);
 extern void s_data (int);
 extern void s_desc (int);

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH] .bundle_align_mode
  2012-02-15  2:50 [RFC PATCH] .bundle_align_mode Roland McGrath
@ 2012-02-15 10:20 ` nick clifton
  2012-02-15 20:04   ` Roland McGrath
  2012-02-21 20:05 ` DJ Delorie
  1 sibling, 1 reply; 10+ messages in thread
From: nick clifton @ 2012-02-15 10:20 UTC (permalink / raw)
  To: Roland McGrath; +Cc: binutils, David Sehr

Hi Roland,

   A few thoughts on this feature:

* It might be useful to mention in the documentation why this feature is 
needed.

>  When aligned instruction
> +bundle mode is in effect, no single instruction may span a boundary between
> +bundles.  If an instruction would start too close to the end of a bundle for
> +the length of that particular instruction to fit within the bundle, then the
> +space at the end of that bundle is filled with no-op instructions so the
> +instruction starts in the next bundle.

* What about instructions with delay slots ?

* Would it be considered an error, or just unusual, if the 
.bundle_align_mode directive was used inside a non-executable section ?

> +@section @code{.bundle_lock} and @code{.bundle_unlock}

* Can these directives be nested ?

* Presumably once a bundle has been created by the assembler, it needs 
to be preserved by the linker.  Thus it seems to me that you are going 
to need a reloc or two to tell the linker about the bundle.

Cheers
   Nick

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH] .bundle_align_mode
  2012-02-15 10:20 ` nick clifton
@ 2012-02-15 20:04   ` Roland McGrath
  2012-02-16 11:03     ` nick clifton
  0 siblings, 1 reply; 10+ messages in thread
From: Roland McGrath @ 2012-02-15 20:04 UTC (permalink / raw)
  To: nick clifton; +Cc: binutils, David Sehr

On Wed, Feb 15, 2012 at 2:18 AM, nick clifton <nickc@redhat.com> wrote:
> * It might be useful to mention in the documentation why this feature is
> needed.

Oh, in the documentation?  I was all set to explain it to you here.  But
for the documentation, I think it's appropriate to say not a whole lot more
than exactly how the assembler behaves.  (The GAS manual is not the place
to describe the details of target ABIs.)  The short version is, "For some
targets, it's an ABI requirement that no instruction may span a certain
aligned boundary."  Is that what you had in mind?

To explain it here, indeed the short version is that it's a target ABI
requirement that no instruction may span a bundle boundary.

The background is that the system disassembles the code at load time and
validates it against a set of ABI constraints.  The constraints include
that only a certain subset of instructions is valid.  On machines with
variable-sized instructions, this is nontrivial in the face of computed
jumps.  To ensure that only recognized instruction boundaries can be jump
targets, the system enforces that a computed jump must target an aligned
address.  Hence the rule that instructions may not span a boundary (so you
can never jump into the middle of what the validator thought was an
instruction).

The system also enforces that certain instructions may appear only as part
of a constrained multi-instruction sequence.  For example, to enforce the
aforementioned constraint on jumps, a computed jump on x86 must look like:
	and $-32, %reg
	jmp *%reg
That constraint would be defeated if you could jump to the second
instruction of the sequence.  Hence the bundle-lock feature is required to
ensure that such multi-instruction sequences stick together and don't span
bundle boundaries.  (I also intend to use it for prescribed cases such as
the TLS sequences, where the linker expects to see exact known instruction
sequences and would be thrown off by implicit nops inserted in the middle.)

> * What about instructions with delay slots ?

That's a good question.  I hadn't thought about it, since so far I've only
considered CPUs that don't have delay slots.  The only machine with delay
slots that I know anything about is sparc, so if my logic doesn't hold for
some other machine you know about, you'll have to educate me.

On machines like sparc, where all instructions are the same size, I don't
think there is any problem.  If an instruction with a delay slot is the
last instruction in a bundle, then there won't be any padding and so the
delay slot will be the first instruction of the next bundle.  If instead
it's not the last instruction in the bundle, then there is always room for
the delay slot inside the bundle.  In no case would a nop be inserted
between an instruction and its delay slot.  It doesn't actually matter to
the ABI constraints whether an instruction and its delay slot fall in the
same bundle or not.  (That is, except if it's a particular case where the
ABI says the latter instruction can only be used as part of a prescribed
multi-instruction sequence, in which case you'd use .bundle_lock for the
instruction and its delay slot just like any other multi-instruction
sequence.)

If there is a machine that has delay slots and variable-sized instructions,
then it becomes an issue.  It wouldn't matter to the ABI constraints
whether an instruction in a delay slot got pushed into the next bundle.
But it would matter to correctness, since a nop in the delay slot wouldn't
do the right thing if the first instruction were a branch.  If there is
such a machine, then it might make sense to automagically bundle-lock a
delay-slot-using instruction with its following instruction, so the code
wouldn't have to use .bundle_lock explicitly.  But I don't feel the need to
worry about that before we actually define such an ABI for such a machine.

> * Would it be considered an error, or just unusual, if the
> .bundle_align_mode directive was used inside a non-executable section ?

I think I'd call it just unusual.  It doesn't seem worth trying to make it
an error.  For the purpose of the target ABIs in question, it only matters
for instructions, not miscellaneous data.  But one could imagine using
assembled instructions in a non-executable section to produce a code blob
that will be copied somewhere to be made executable later.  In that case,
you could well want the same features for your instructions in a
non-executable section.

>> +@section @code{.bundle_lock} and @code{.bundle_unlock}
>
> * Can these directives be nested ?

As I've specified it, no.  I can't really see why it would be useful.  The
inner directive wouldn't mean anything, since the whole sequence inside the
outermost pair has to go into a single bundle.  I guess I could imagine
writing some macros where it becomes relevant:

	.macro foo
	  .bundle_lock
	    insn 1
	    insn 2
	  .bundle_unlock
	.endm
	.macro bar
	  .bundle_lock
	  foo
	  insn 3
	  .bundle_unlock
	.endm

But for the actual ABIs in question I can't actually imagine a case like
that being useful.  If it were, it would be trivial to support, since the
inner ones have no effect but to increment/decrement a nesting count so as
to notice when you've hit the outermost .bundle_unlock.  I can add that if
the need ever arises.  For now, I've added a couple of sentences to the
documentation to make it explicit that nesting is not allowed.

> * Presumably once a bundle has been created by the assembler, it needs to be
> preserved by the linker.  Thus it seems to me that you are going to need a
> reloc or two to tell the linker about the bundle.

I don't think there's any need for this, but perhaps you have something in
mind that I don't know about.  Even if the linker does some instruction
rewriting, I wouldn't expect it to move instructions around.  It always
just rewrites a sequence to one of the same length, or a shorter one and
then pads it with nops, doesn't it?  If it moved things around, then there
would need to be relocs for every local jump (not to mention more obscure
cases), which there aren't.

Thanks,
Roland

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH] .bundle_align_mode
  2012-02-15 20:04   ` Roland McGrath
@ 2012-02-16 11:03     ` nick clifton
  2012-02-16 15:13       ` John Reiser
  2012-02-21 19:29       ` Roland McGrath
  0 siblings, 2 replies; 10+ messages in thread
From: nick clifton @ 2012-02-16 11:03 UTC (permalink / raw)
  To: Roland McGrath; +Cc: binutils, David Sehr

Hi Roland,

>> * It might be useful to mention in the documentation why this feature is
>> needed.
>
> Oh, in the documentation?  I was all set to explain it to you here.

Actually both is good.

> The short version is, "For some
> targets, it's an ABI requirement that no instruction may span a certain
> aligned boundary."  Is that what you had in mind?

For the documentation, yes, thanks.

>> * What about instructions with delay slots ?
>
> That's a good question.  I hadn't thought about it, since so far I've only
> considered CPUs that don't have delay slots.  The only machine with delay
> slots that I know anything about is sparc, so if my logic doesn't hold for
> some other machine you know about, you'll have to educate me.
>
> On machines like sparc, where all instructions are the same size, I don't
> think there is any problem.

Good point - I do not know of any architectures with both variable sized 
insns and delay slots.  Although... what about anulling instructions ? 
The RL78 for example has variable length instructions, one of which is 
the SKC instruction (skip following if carry set).  Note: I have no 
problems with you specifying that .bundle_align_mode will not work with 
such architectures.  I just wanted to raise the possibility of awkward 
targets.

>>> +@section @code{.bundle_lock} and @code{.bundle_unlock}
>>
>> * Can these directives be nested ?
>
> As I've specified it, no.
[snip]
> For now, I've added a couple of sentences to the
> documentation to make it explicit that nesting is not allowed.

Fair enough.

>> * Presumably once a bundle has been created by the assembler, it needs to be
>> preserved by the linker.  Thus it seems to me that you are going to need a
>> reloc or two to tell the linker about the bundle.
>
> I don't think there's any need for this, but perhaps you have something in
> mind that I don't know about.

I was thinking specifically of linker relaxation where the linker can 
replace longer instruction sequences with shorter ones.   Targets that 
support this feature already have special relocs to handle normal 
alignment requirements, so I expect that it would be fairly easy to add 
another one to cope with bundle lengths.  Or just refuse to support 
bundle_align_mode and linker relaxation in the same target.

I was also considering the case where a linker script (or user) 
specifies an alignment for a section that is less than a bundle's 
alignment.  Or when the user links with the --nmagic command line 
option.  Both of these can break bundle alignment.   Hmm, but they break 
section alignment requirements in general and the linker does not 
complain about that.  So I guess that it is caveat-user.  If you request 
a reduced alignment then you should know what you are doing.

Cheers
   Nick

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH] .bundle_align_mode
  2012-02-16 11:03     ` nick clifton
@ 2012-02-16 15:13       ` John Reiser
  2012-02-21 19:29       ` Roland McGrath
  1 sibling, 0 replies; 10+ messages in thread
From: John Reiser @ 2012-02-16 15:13 UTC (permalink / raw)
  To: binutils

On 02/16/2012 03:01 AM, nick clifton wrote:

> I was also considering the case where a linker script (or user) specifies an alignment for a section that is less than a bundle's alignment.  Or when the user links with the --nmagic command line option.  Both of these can break bundle alignment.   Hmm, but they break section alignment requirements
> in general and the linker does not complain about that.  So I guess that it is caveat-user.  If you request a reduced alignment then you should know what you are doing.

The linker ought to check and ought to warn if the specified output alignment
is less than the specified input alignment, even if by accident the input
alignment happens to be met by the current instantiated output.  The assembler
already maximizes the operands of all .{b,p2,}align in order to compute the
ElfXX_Shdr.sh_addralign. If the linker does not check (or similarly maximize)
when aggregating Sections, then that's a violation of the implicit contract.
Such a violation can have mysterious effects and be hard to find.  Detecting
such a mismatch is exactly the kind of inexpensive-but-powerful check
that robust software should perform.

-- 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH] .bundle_align_mode
  2012-02-16 11:03     ` nick clifton
  2012-02-16 15:13       ` John Reiser
@ 2012-02-21 19:29       ` Roland McGrath
  2012-02-22 16:52         ` nick clifton
  1 sibling, 1 reply; 10+ messages in thread
From: Roland McGrath @ 2012-02-21 19:29 UTC (permalink / raw)
  To: nick clifton; +Cc: binutils, David Sehr

Thanks for the feedback and sorry for the delay.
(I was out sick and then it was a long weekend in the US.)

On Thu, Feb 16, 2012 at 3:01 AM, nick clifton <nickc@redhat.com> wrote:
>> The short version is, "For some
>> targets, it's an ABI requirement that no instruction may span a certain
>> aligned boundary."  Is that what you had in mind?
>
> For the documentation, yes, thanks.

Done.

> Good point - I do not know of any architectures with both variable sized
> insns and delay slots.  Although... what about anulling instructions ? The
> RL78 for example has variable length instructions, one of which is the SKC
> instruction (skip following if carry set).  Note: I have no problems with
> you specifying that .bundle_align_mode will not work with such
> architectures.  I just wanted to raise the possibility of awkward targets.

Thanks for pointing it out.  It might be sufficient just to expect people
to use .bundle_lock explicitly around such sequences.  Or perhaps for such
targets it will make sense to have special cases for certain instructions.
I think for now I'm happy to conclude that adding the bundle support for
some CPUs may require more implementation work than I'll do in the initial
set of ports.

> I was thinking specifically of linker relaxation where the linker can
> replace longer instruction sequences with shorter ones.  Targets that
> support this feature already have special relocs to handle normal alignment
> requirements, so I expect that it would be fairly easy to add another one to
> cope with bundle lengths.  Or just refuse to support bundle_align_mode and
> linker relaxation in the same target.

Ah, I'm not familiar with such targets, unless there is a reloc like that
for ARM that I don't know about.  So again I think I'm satisfied to leave
that concern for whenever the time comes to do a port to such a target.

> [...]  If you request a reduced alignment then you should know what you
> are doing.

Indeed.

Thanks very much for the feedback so far, Nick.  But I still haven't heard
anything in the areas in which I was hoping to receive some wisdom.  That
is, the general implementation plan with the alignment frags (is there a
better way?), the hook interface for the CPU-specific code (is md_max_size
a good name and/or signature?), and optimal implementations of that hook
for x86 and ARM.

Thanks,
Roland

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH] .bundle_align_mode
  2012-02-15  2:50 [RFC PATCH] .bundle_align_mode Roland McGrath
  2012-02-15 10:20 ` nick clifton
@ 2012-02-21 20:05 ` DJ Delorie
  1 sibling, 0 replies; 10+ messages in thread
From: DJ Delorie @ 2012-02-21 20:05 UTC (permalink / raw)
  To: Roland McGrath; +Cc: binutils

On the subject of alignment vs relaxing, look at the rx-elf linker.  I
had to add TWO relocs for each alignment gap, one for the "before the
gap" and one for the "after the gap" marks, and deal with them specially
in the linker during relaxation.  That way, you know when relaxation has
opened the gap enough to redo the alignment.

Also, for rx and rl78, any relaxable insn gets is own frag all to itself
anyway.  I haven't noticed any performance issues.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH] .bundle_align_mode
  2012-02-21 19:29       ` Roland McGrath
@ 2012-02-22 16:52         ` nick clifton
  2012-02-22 19:21           ` Roland McGrath
  0 siblings, 1 reply; 10+ messages in thread
From: nick clifton @ 2012-02-22 16:52 UTC (permalink / raw)
  To: Roland McGrath; +Cc: binutils, David Sehr

Hi Roland,

>    But I still haven't heard
> anything in the areas in which I was hoping to receive some wisdom.  That
> is, the general implementation plan with the alignment frags (is there a
> better way?),

No.  Well yes - a rewrite of gas, but that is not practical right now. 
So for now, frags are the way.

> the hook interface for the CPU-specific code (is md_max_size
> a good name and/or signature?),

Personally I think that the name is a little too vague.  When I see 
"md_max_size" I think "max size of what ?".  Why not make it a little 
bit more feature specific, eg "md_max_bundle_length".

> and optimal implementations of that hook
> for x86 and ARM.

I have no problems with that.

Cheers
   Nick

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH] .bundle_align_mode
  2012-02-22 16:52         ` nick clifton
@ 2012-02-22 19:21           ` Roland McGrath
  2012-02-22 19:57             ` John Reiser
  0 siblings, 1 reply; 10+ messages in thread
From: Roland McGrath @ 2012-02-22 19:21 UTC (permalink / raw)
  To: nick clifton; +Cc: binutils, David Sehr

On Wed, Feb 22, 2012 at 8:50 AM, nick clifton <nickc@redhat.com> wrote:
> No.  Well yes - a rewrite of gas, but that is not practical right now. So
> for now, frags are the way.

:-)  I also meant if there is any feasible method for avoiding the
pessimistic relaxation issue.  That is, that I wind up over-padding
unnecessarily when a relaxable instruction is close enough to a boundary
that the longest encoding doesn't fit, but relaxation winds up using a
shorter encoding that would have fit.  I take it none comes to mind.

> Personally I think that the name is a little too vague.  When I see
> "md_max_size" I think "max size of what ?".  Why not make it a little bit
> more feature specific, eg "md_max_bundle_length".

Its concrete meaning is the maximum size a given rs_machine_dependent frag
might have after relaxation.  So how about "md_frag_max_size"?

>> and optimal implementations of that hook for x86 and ARM.
>
> I have no problems with that.

I take it you mean you don't mind my suboptimal implementations.  I was
hoping for advice on how best to make them actually optimal.  (For ARM it
might well be optimal as it is.  Is there a case of a relaxable Thumb
instruction that is known a priori never to become a 4-byte instruction?)

The x86 one is definitely suboptimal, because for an unconditional jump it
says it could grow to 6 bytes, but it can in fact only ever grow to 5 bytes
(conditional jumps can grow to 6 bytes).  Ok, I think I can figure that out
with TYPE_FROM_RELAX_STATE (fr_subtype) == UNCOND_JUMP.  I'll try that and
maybe someone can confirm that it's actually reliable.

Thanks,
Roland

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH] .bundle_align_mode
  2012-02-22 19:21           ` Roland McGrath
@ 2012-02-22 19:57             ` John Reiser
  0 siblings, 0 replies; 10+ messages in thread
From: John Reiser @ 2012-02-22 19:57 UTC (permalink / raw)
  To: binutils

On 02/22/2012 11:21 AM, Roland McGrath wrote:
> On Wed, Feb 22, 2012 at 8:50 AM, nick clifton <nickc@redhat.com> wrote:
>> No.  Well yes - a rewrite of gas, but that is not practical right now. So
>> for now, frags are the way.
> 
> :-)  I also meant if there is any feasible method for avoiding the
> pessimistic relaxation issue.  That is, that I wind up over-padding
> unnecessarily when a relaxable instruction is close enough to a boundary
> that the longest encoding doesn't fit, but relaxation winds up using a
> shorter encoding that would have fit.  I take it none comes to mind.

This problem was solved decades ago.  Search for "span-dependent instruction".
The main article [Thomas G Szymanski, 1978] even appeared in CACM (Communications
of the Association for Computing Machinery), which is about as prominent
and wide-spread as possible.

Although in general it's equivalent to 3-SAT (three satisfiability) and
thus is NP-complete [all the naive algorithms are at least quadratic
for the worst case], in practice it's a non-issue.

-- 

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2012-02-22 19:57 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-02-15  2:50 [RFC PATCH] .bundle_align_mode Roland McGrath
2012-02-15 10:20 ` nick clifton
2012-02-15 20:04   ` Roland McGrath
2012-02-16 11:03     ` nick clifton
2012-02-16 15:13       ` John Reiser
2012-02-21 19:29       ` Roland McGrath
2012-02-22 16:52         ` nick clifton
2012-02-22 19:21           ` Roland McGrath
2012-02-22 19:57             ` John Reiser
2012-02-21 20:05 ` DJ Delorie

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).