public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* Re: linkonce and exceptions on Solaris
       [not found] <199802182041.VAA00469.cygnus.egcs@mira.isdn.cs.tu-berlin.de>
@ 1998-02-19 10:47 ` Jason Merrill
  0 siblings, 0 replies; 4+ messages in thread
From: Jason Merrill @ 1998-02-19 10:47 UTC (permalink / raw)
  To: Martin von Loewis, egcs, tot

>>>>> Martin von Loewis <martin@mira.isdn.cs.tu-berlin.de> writes:

> When I run a g++ 2.8 compiled image on Solaris, exception handling may
> consume a lot of computing power. In one particular case, I have an
> image which has 9MB of text size. Throwing the first exception will
> take about 10 seconds on an UltraSparc 1.

> So it seems that the bubble sort is the expensive operation
> here. The assumption of O(n) is apparently broken.

Here's a patch to make frame.c use qsort.  I didn't put this in because not
all targets have qsort, and quicksort does badly on already-sorted input.
Anyone want to adapt introsort (see libstdc++/stl/stl_algo.h) for frame.c?

1997-11-25  Teemu Torma  <tot@trema.com>

	* frame.c (fde_compare_sort): New function #if ! DONT_USE_QSORT.
	(fde_compare): Define only #if DONT_USE_QSORT.
	(fde_insert): Insertion sort is now #if DONT_USE_QSORT.
	(add_fdes): Use qsort to sort the array #if ! DONT_USE_QSORT.

Index: frame.c
===================================================================
RCS file: /trema/cvs/gnu/egcs/gcc/frame.c,v
retrieving revision 1.6
diff -u -u -p -r1.6 frame.c
--- frame.c	1997/11/17 09:03:22	1.6
+++ frame.c	1997/11/25 10:17:08
@@ -201,11 +201,19 @@ read_8byte (void *p)
 /* Ordering function for FDEs.  Functions can't overlap, so we just compare
    their starting addresses.  */
 
+#if DONT_USE_QSORT
 static inline saddr
 fde_compare (fde *x, fde *y)
 {
   return (saddr)x->pc_begin - (saddr)y->pc_begin;
 }
+#else
+static int
+fde_compare_sort (fde **x, fde **y)
+{
+  return (saddr)(*x)->pc_begin - (saddr)(*y)->pc_begin;
+}
+#endif
 
 /* Return the address of the FDE after P.  */
 
@@ -220,17 +228,19 @@ next_fde (fde *p)
    O(n) performance.  If this turns out to be overly optimistic, we can have
    the linker sort the FDEs so we don't have to do it at run time.  */
 
-static void
+static inline void
 fde_insert (fde **array, size_t i, fde *this_fde)
 {
   array[i] = this_fde;
 
+#if DONT_USE_QSORT
   for (; i > 0 && fde_compare (array[i], array[i-1]) < 0; --i)
     {
       this_fde = array[i];
       array[i] = array[i-1];
       array[i-1] = this_fde;
     }
+#endif
 }
 
 static size_t
@@ -271,6 +281,10 @@ add_fdes (fde *this_fde, fde **array, si
       if (this_fde->pc_begin + this_fde->pc_range > pc_end)
 	pc_end = this_fde->pc_begin + this_fde->pc_range;
     }
+
+#if ! DONT_USE_QSORT
+  qsort (array, i, sizeof *array, fde_compare_sort);
+#endif
 
   *i_ptr = i;
   *beg_ptr = pc_begin;


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: linkonce and exceptions on Solaris
  1998-02-19 10:47 ` Bruno Haible
@ 1998-02-19 12:42   ` Martin von Loewis
  0 siblings, 0 replies; 4+ messages in thread
From: Martin von Loewis @ 1998-02-19 12:42 UTC (permalink / raw)
  To: haible, jason; +Cc: egcs, tot

Jason, Bruno,

Thanks for your patches. I have not tried them, yet, but I will.

Meanwhile, I produced my own patch. It is a two-pass algorithm, and
guesses the border between normal and linkonce entries by choosing
a symbol from libgcc. This gives a reasonable speed-up on the Ultra
I use.

In any case, I think it is important that one of these approaches is
implemented in egcs, and also 2.8.1.

Martin

1998-02-19 Martin von Loewis <loewis@informatik.hu-berlin.de>

	* frame.c (add_fdes, frame_init): Insert FDEs in two passes.

--- frame.c	1998/02/19 14:18:18	1.1
+++ frame.c	1998/02/19 20:29:15
@@ -219,9 +233,13 @@
   return count;
 }
 
+/* Ideally, this should come from crtend */
+extern void __empty();
+static void* last_text_function = __empty;
+
 static void
 add_fdes (fde *this_fde, fde **array, size_t *i_ptr,
-	  void **beg_ptr, void **end_ptr)
+	  void **beg_ptr, void **end_ptr, int skiponce, int *skip_ptr)
 {
   size_t i = *i_ptr;
   void *pc_begin = *beg_ptr;
@@ -233,6 +251,14 @@
       if (this_fde->CIE_delta == 0 || this_fde->pc_begin == 0)
 	continue;
 
+      if(skiponce && this_fde->pc_begin > last_text_function)
+        {
+          *skip_ptr = 1;
+          continue;
+        }
+      else if(!skiponce && this_fde->pc_begin <= last_text_function)
+        continue;
+
       fde_insert (array, i++, this_fde);
 
       if (this_fde->pc_begin < pc_begin)
@@ -254,7 +280,7 @@
 frame_init (struct object* ob)
 {
   fde *this_fde;
-  size_t count;
+  size_t count, skipped;
   fde **array;
   void *pc_begin, *pc_end;
 
@@ -273,15 +299,23 @@
   pc_begin = (void*)(uaddr)-1;
   pc_end = 0;
   count = 0;
+  skipped = 0;
 
   if (ob->fde_array)
     {
       fde **p = ob->fde_array;
       for (; *p; ++p)
-	add_fdes (*p, array, &count, &pc_begin, &pc_end);
+	add_fdes (*p, array, &count, &pc_begin, &pc_end, 1, &skipped);
+      if (skipped)
+        for (; *p; ++p)
+          add_fdes (*p, array, &count, &pc_begin, &pc_end, 0, &skipped);        
     }
   else
-    add_fdes (ob->fde_begin, array, &count, &pc_begin, &pc_end);
+    {
+      add_fdes (ob->fde_begin, array, &count, &pc_begin, &pc_end, 1, &skipped);
+      if (skipped)
+        add_fdes (ob->fde_begin, array, &count, &pc_begin, &pc_end, 0, &skipped);
+    }
 
   ob->fde_array = array;
   ob->pc_begin = pc_begin;

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: linkonce and exceptions on Solaris
  1998-02-18 17:15 Martin von Loewis
@ 1998-02-19 10:47 ` Bruno Haible
  1998-02-19 12:42   ` Martin von Loewis
  0 siblings, 1 reply; 4+ messages in thread
From: Bruno Haible @ 1998-02-19 10:47 UTC (permalink / raw)
  To: Martin von Loewis; +Cc: egcs

> When investigating this, I found that the out-of-order FDEs come from
> gnu.linkonce sections. All linkonce functions are after all regular
> functions, but their FDEs are spread all over the place. As more an
> more of these FDEs are added, inserting regular FDEs will get more and
> more expensive.

Very interesting analysis. Does the following patch help?

Bruno



Sat Jan 10 04:23:00 1998  Bruno Haible  <bruno@linuix.mathematik.uni-karlsruhe.de>

        * frame.c (fde_insert, add_fdes, frame_init): Use two-stack insertion
          sort algorithm.

*** egcs-971225/gcc/frame.c.bak	Fri Jan  9 01:41:32 1998
--- egcs-971225/gcc/frame.c	Sat Jan 10 03:51:27 1998
*************** next_fde (fde *p)
*** 192,212 ****
  }
  
  /* One iteration of an insertion sort, for adding new FDEs to the array.
!    Usually the new FDE will go in at the end, so we can expect close to
!    O(n) performance.  If this turns out to be overly optimistic, we can have
!    the linker sort the FDEs so we don't have to do it at run time.  */
  
  static void
! fde_insert (fde **array, size_t i, fde *this_fde)
  {
!   array[i] = this_fde;
  
!   for (; i > 0 && fde_compare (array[i], array[i-1]) < 0; --i)
      {
-       this_fde = array[i];
        array[i] = array[i-1];
!       array[i-1] = this_fde;
      }
  }
  
  static size_t
--- 192,253 ----
  }
  
  /* One iteration of an insertion sort, for adding new FDEs to the array.
!    This is a special kind of insertion sort, optimized for data sets that
!    look like  101 102 103 201 202 105 203 204 106 107 205 206 207.
!    Usually the new FDE will go in at the end or at the last insertio point,
!    so we can expect O(n) performance, instead of the worst-case O(n^2).
!    If this turns out to be overly optimistic, we can have the linker sort
!    the FDEs so we don't have to do it at run time, or we can use an AVL
!    tree for O(n log(n)) performance.  */
! 
! typedef struct fde_accumulator
! {
!   fde **array1;
!   size_t count1;
!   fde **array2;
!   size_t count2;
! } fde_accumulator;
  
  static void
! fde_insert (fde_accumulator *accu, fde *this_fde)
  {
!   fde **array;
!   size_t i;
  
!   if (accu->count2 == 0 || fde_compare (this_fde, accu->array2[0]) < 0)
!     {
!       array = accu->array1;
!       i = accu->count1;
!     }
!   else
!     {
!       array = accu->array2;
!       i = accu->count2;
!     }
! 
!   while (i > 0 && fde_compare (this_fde, array[i-1]) < 0)
      {
        array[i] = array[i-1];
!       i--;
      }
+   if (array == accu->array1)
+     {
+       if (i < accu->count1 && accu->count2 == 0)
+ 	{
+ 	  fde **ptr1 = accu->array1 + i + 1;
+ 	  fde **ptr2 = accu->array2;
+ 	  size_t count = accu->count1 - i;
+ 	  accu->count2 = count;
+ 	  do *ptr2++ = *ptr1++; while (--count > 0);
+ 	  accu->count1 = i;
+ 	}
+       accu->count1++;
+     }
+   else
+     {
+       accu->count2++;
+     }
+   array[i] = this_fde;
  }
  
  static size_t
*************** count_fdes (fde *this_fde)
*** 227,236 ****
  }
  
  static void
! add_fdes (fde *this_fde, fde **array, size_t *i_ptr,
! 	  void **beg_ptr, void **end_ptr)
  {
-   size_t i = *i_ptr;
    void *pc_begin = *beg_ptr;
    void *pc_end = *end_ptr;
  
--- 268,275 ----
  }
  
  static void
! add_fdes (fde *this_fde, fde_accumulator *accu, void **beg_ptr, void **end_ptr)
  {
    void *pc_begin = *beg_ptr;
    void *pc_end = *end_ptr;
  
*************** add_fdes (fde *this_fde, fde **array, si
*** 240,246 ****
        if (this_fde->CIE_delta == 0 || this_fde->pc_begin == 0)
  	continue;
  
!       fde_insert (array, i++, this_fde);
  
        if (this_fde->pc_begin < pc_begin)
  	pc_begin = this_fde->pc_begin;
--- 279,285 ----
        if (this_fde->CIE_delta == 0 || this_fde->pc_begin == 0)
  	continue;
  
!       fde_insert (accu, this_fde);
  
        if (this_fde->pc_begin < pc_begin)
  	pc_begin = this_fde->pc_begin;
*************** add_fdes (fde *this_fde, fde **array, si
*** 248,254 ****
  	pc_end = this_fde->pc_begin + this_fde->pc_range;
      }
  
-   *i_ptr = i;
    *beg_ptr = pc_begin;
    *end_ptr = pc_end;
  }
--- 287,292 ----
*************** add_fdes (fde *this_fde, fde **array, si
*** 260,268 ****
  static void
  frame_init (struct object* ob)
  {
-   fde *this_fde;
    size_t count;
!   fde **array;
    void *pc_begin, *pc_end;
  
    if (ob->fde_array)
--- 298,305 ----
  static void
  frame_init (struct object* ob)
  {
    size_t count;
!   fde_accumulator accu;
    void *pc_begin, *pc_end;
  
    if (ob->fde_array)
*************** frame_init (struct object* ob)
*** 275,296 ****
      count = count_fdes (ob->fde_begin);
  
    ob->count = count;
!   array = (fde **) malloc (sizeof (fde *) * count);
  
    pc_begin = (void*)(uaddr)-1;
    pc_end = 0;
-   count = 0;
  
    if (ob->fde_array)
      {
        fde **p = ob->fde_array;
        for (; *p; ++p)
! 	add_fdes (*p, array, &count, &pc_begin, &pc_end);
      }
    else
!     add_fdes (ob->fde_begin, array, &count, &pc_begin, &pc_end);
  
!   ob->fde_array = array;
    ob->pc_begin = pc_begin;
    ob->pc_end = pc_end;
  }
--- 312,346 ----
      count = count_fdes (ob->fde_begin);
  
    ob->count = count;
!   accu.array1 = (fde **) malloc (sizeof (fde *) * count);
!   accu.array2 = (fde **) alloca (sizeof (fde *) * count);
!   accu.count1 = 0;
!   accu.count2 = 0;
  
    pc_begin = (void*)(uaddr)-1;
    pc_end = 0;
  
    if (ob->fde_array)
      {
        fde **p = ob->fde_array;
        for (; *p; ++p)
! 	add_fdes (*p, &accu, &pc_begin, &pc_end);
      }
    else
!     add_fdes (ob->fde_begin, &accu, &pc_begin, &pc_end);
! 
!   if (accu.count1 + accu.count2 != count)
!     abort ();
! 
!   if (accu.count2 > 0)
!     {
!       fde **ptr1 = accu.array1 + accu.count1;
!       fde **ptr2 = accu.array2;
!       size_t c = accu.count2;
!       do *ptr1++ = *ptr2++; while (--c > 0);
!     }
  
!   ob->fde_array = accu.array1;
    ob->pc_begin = pc_begin;
    ob->pc_end = pc_end;
  }

^ permalink raw reply	[flat|nested] 4+ messages in thread

* linkonce and exceptions on Solaris
@ 1998-02-18 17:15 Martin von Loewis
  1998-02-19 10:47 ` Bruno Haible
  0 siblings, 1 reply; 4+ messages in thread
From: Martin von Loewis @ 1998-02-18 17:15 UTC (permalink / raw)
  To: egcs

When I run a g++ 2.8 compiled image on Solaris, exception handling may
consume a lot of computing power. In one particular case, I have an
image which has 9MB of text size. Throwing the first exception will
take about 10 seconds on an UltraSparc 1.

When analysing this, I found that these 10 seconds are spend inside
add_fdes. So I checked the size of the eh_frame section:

size protocol
text    data    bss     dec     hex     filename
8908320 2110138 170200  11188658        aab9b2  solarisg/protocol

objdump --headers solarisg/protocol |grep eh_frame
SECTION 13 [.eh_frame]  : size 0010dddc vma 009937c0 align 2**3

Next, I modified fde_insert to count various things:

static void
fde_insert (fde **array, size_t i, fde *this_fde)
{
  int this_bubble=0;
  array[i] = this_fde;
  mvl_fde_count++;

  for (; i > 0 && fde_compare (array[i], array[i-1]) < 0; --i)
    {
      mvl_fde_bubble++;
      this_bubble++;
      this_fde = array[i];
      array[i] = array[i-1];
      array[i-1] = this_fde;
    }
  if(this_bubble>mvl_fde_bubble_max)
    mvl_fde_bubble_max = this_bubble;
}

For this image, I got the following results:
Total number of FDEs:                 33618
Total number of bubble operations: 26896088
Maximum offset to move one FDE:        2469

So it seems that the bubble sort is the expensive operation
here. The assumption of O(n) is apparently broken.

When investigating this, I found that the out-of-order FDEs come from
gnu.linkonce sections. All linkonce functions are after all regular
functions, but their FDEs are spread all over the place. As more an
more of these FDEs are added, inserting regular FDEs will get more and
more expensive.

This happens with both the GNU linker (2.8.2) as well as
/usr/ccs/bin/ld, although GNU ld behaves somewhat better.

I know I can get rid of the linkonce by saying -fno-weak, but the
resulting code size would not be acceptable.

Are there any proposals which I could try?

Short of that, I can propose two possible solutions:

1. The heuristics 'insert at the end' is bad. I found that a better
heuristics would be 'insert one after the previous one'.

2. Add the FDEs in two passes. Both the regular FDEs, and the linked
once FDEs appear to be in order, on their own. So if there is a symbol
saying where the linked-once sections start, one can add all regular
entries in the first pass (skipping everything above this symbol), and
the linked onces in the second pass. This might get you back to O(n).

Please comment.

Martin

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~1998-02-19 12:42 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <199802182041.VAA00469.cygnus.egcs@mira.isdn.cs.tu-berlin.de>
1998-02-19 10:47 ` linkonce and exceptions on Solaris Jason Merrill
1998-02-18 17:15 Martin von Loewis
1998-02-19 10:47 ` Bruno Haible
1998-02-19 12:42   ` Martin von Loewis

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).