public inbox for gdb-patches@sourceware.org
* [RFC 0/2] Read memory in multiple lines in dcache_xfer_memory
@ 2013-10-18  3:22 Yao Qi
  2013-10-18  3:22 ` [PATCH 2/2] " Yao Qi
  2013-10-18  3:22 ` [PATCH 1/2] Separate read and write " Yao Qi
  0 siblings, 2 replies; 11+ messages in thread
From: Yao Qi @ 2013-10-18  3:22 UTC (permalink / raw)
  To: gdb-patches

Hi,
This patch series improves dcache performance when reading from target
memory.  The rationale is to reduce the number of calls to
target_read by reading as much as possible in one go.  Details can be
found in patch 2/2.

Patch 1/2 simply splits the read and write paths in
dcache_xfer_memory, and patch 2/2 focuses on optimizing reads of target
memory, because GDB reads a lot of data from the target but writes little.

The performance gain is measured and listed in patch 2/2.  The whole
series is regression tested on x86_64-linux.  Comments?

Yao Qi (2):
  Separate read and write in dcache_xfer_memory
  Read memory in multiple lines in dcache_xfer_memory.

 gdb/dcache.c |  356 ++++++++++++++++++++++++++++++++++++++++++----------------
 1 files changed, 260 insertions(+), 96 deletions(-)

-- 
1.7.7.6


* [PATCH 2/2] Read memory in multiple lines in dcache_xfer_memory.
  2013-10-18  3:22 [RFC 0/2] Read memory in multiple lines in dcache_xfer_memory Yao Qi
@ 2013-10-18  3:22 ` Yao Qi
  2013-10-21 20:38   ` Doug Evans
                     ` (2 more replies)
  2013-10-18  3:22 ` [PATCH 1/2] Separate read and write " Yao Qi
  1 sibling, 3 replies; 11+ messages in thread
From: Yao Qi @ 2013-10-18  3:22 UTC (permalink / raw)
  To: gdb-patches

Hi, this is an optimization to how dcache reads contents from target
memory.  Currently, when GDB requests to read target memory and the
request goes through dcache, dcache reads in one cache line at a
time, regardless of the size of the requested data.  If GDB reads a
large amount of data from the target, dcache reads from target memory
multiple times (one cache line per read).  In remote debugging, that
means multiple RSP packets to transfer memory from GDBserver, which
is slow.

This patch teaches dcache to read as much contiguous target memory as
possible in one go, and to update multiple cache lines when the
contents are read in.  This is done in several steps:

 1.  When GDB requests to read data [memaddr, memaddr + len), a
collection of ranges is created to record readable ranges, because
some memory may be marked as write-only.
 2.  Then, we'll check the cache state of these readable ranges.  Some
of them are cached, and some are not.  We record the uncached ranges.
 3.  Iterate over the collection of uncached ranges, issuing
target_read to read each uncached range from target memory and update
the corresponding cache lines.  For cached ranges, read from the cache
lines directly.
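
The steps above can be sketched in C.  This is a toy model under
assumptions of my own (a flat array of cached line addresses instead of
the dcache splay tree, and no memory-region handling), not the patch
itself; the names are illustrative:

```c
#include <assert.h>
#include <stddef.h>

struct mem_range
{
  unsigned long start;
  unsigned long length;
};

static unsigned long
align_down_addr (unsigned long addr, unsigned long line_size)
{
  return addr - addr % line_size;
}

static int
line_cached (const unsigned long *cached, size_t ncached,
             unsigned long line_addr)
{
  size_t i;

  for (i = 0; i < ncached; i++)
    if (cached[i] == line_addr)
      return 1;
  return 0;
}

/* Fill RANGES (capacity MAX) with the uncached, line-aligned spans
   covering [MEMADDR, MEMADDR + LEN); adjacent uncached lines are
   merged so each span becomes a single multi-line read.  Returns the
   number of spans emitted.  */

static size_t
uncached_ranges (unsigned long memaddr, unsigned long len,
                 unsigned long line_size,
                 const unsigned long *cached, size_t ncached,
                 struct mem_range *ranges, size_t max)
{
  size_t n = 0;
  unsigned long line = align_down_addr (memaddr, line_size);
  unsigned long end = memaddr + len;

  for (; line < end; line += line_size)
    {
      if (line_cached (cached, ncached, line))
        continue;
      if (n > 0 && ranges[n - 1].start + ranges[n - 1].length == line)
        ranges[n - 1].length += line_size;      /* merge with previous */
      else if (n < max)
        {
          ranges[n].start = line;
          ranges[n].length = line_size;
          n++;
        }
    }
  return n;
}
```

Each emitted span then costs one target_read (one 'm' packet in remote
debugging) instead of one per line, which is where the packet-count
reduction below comes from.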

I am using the perf test case 'backtrace' to measure the speed-up from
this patch.  For each run, I 'set dcache line-size N' and 'set dcache
size' to 4096 * 64 / N, to make sure the total size of the dcache is
unchanged.

With this patch, the number of 'm' RSP packets is reduced
dramatically:

cache line size:	Original        Patched
2			4657894         31224
4       		2317896         28616
8			158948          21462
16			579474          14308
32			293314          14308
64			150234          14308
128			78694           10738
256			42938           8960
512			25046           8064
1024			16100           7616
2048			9184            7392

Performance comparison:

                        cache line size  Patched Original
backtrace        cpu_time        2       4.44    33.83
backtrace        cpu_time        4       3.88    14.27
backtrace        cpu_time        8       3.1     7.92
backtrace        cpu_time        16      2.48    4.79
backtrace        cpu_time        32      2.25    2.51
backtrace        cpu_time        64      1.16    1.93
backtrace        cpu_time        128     1.02    1.69
backtrace        cpu_time        256     1.06    1.37
backtrace        cpu_time        512     1.11    1.17
backtrace        cpu_time        1024    1.1     1.22
backtrace        cpu_time        2048    1.13    1.17
backtrace        wall_time       2       5.49653506279   74.0839848518
backtrace        wall_time       4       4.70916986465   29.94830513
backtrace        wall_time       8       4.11279582977   15.6743021011
backtrace        wall_time       16      3.68633985519   8.83114910126
backtrace        wall_time       32      3.63511800766   5.79059791565
backtrace        wall_time       64      1.61371517181   3.67003703117
backtrace        wall_time       128     1.50599694252   2.60381913185
backtrace        wall_time       256     1.47533297539   2.05611109734
backtrace        wall_time       512     1.48193001747   1.80505800247
backtrace        wall_time       1024    1.50955080986   1.69646501541
backtrace        wall_time       2048    1.54235315323   1.61461496353
backtrace        vmsize          2       104568  104576
backtrace        vmsize          4       100556  102388
backtrace        vmsize          8       95384   97540
backtrace        vmsize          16      94092   94092
backtrace        vmsize          32      93348   93276
backtrace        vmsize          64      93148   92928
backtrace        vmsize          128     93148   93100
backtrace        vmsize          256     93148   93100
backtrace        vmsize          512     93148   93100
backtrace        vmsize          1024    93148   93100
backtrace        vmsize          2048    93148   93100

gdb:

2013-10-18  Yao Qi  <yao@codesourcery.com>

	* dcache.c: Include "memrange.h".
	Update comments.
	(dcache_read_line): Remove.
	(dcache_peek_byte): Remove.
	(dcache_ranges_readable): New function.
	(dcache_ranges_uncached): New function.
	(dcache_xfer_memory): Read multiple cache lines from target
	memory in one time.
---
 gdb/dcache.c |  331 ++++++++++++++++++++++++++++++++++++++++++---------------
 1 files changed, 244 insertions(+), 87 deletions(-)

diff --git a/gdb/dcache.c b/gdb/dcache.c
index 316f3dd..65bbad1 100644
--- a/gdb/dcache.c
+++ b/gdb/dcache.c
@@ -25,6 +25,7 @@
 #include "target.h"
 #include "inferior.h"
 #include "splay-tree.h"
+#include "memrange.h"
 
 /* Commands with a prefix of `{set,show} dcache'.  */
 static struct cmd_list_element *dcache_set_list = NULL;
@@ -60,8 +61,8 @@ static struct cmd_list_element *dcache_show_list = NULL;
 /* NOTE: Interaction of dcache and memory region attributes
 
    As there is no requirement that memory region attributes be aligned
-   to or be a multiple of the dcache page size, dcache_read_line() and
-   dcache_write_line() must break up the page by memory region.  If a
+   to or be a multiple of the dcache page size, dcache_xfer_memory must
+   break up the page by memory region.  If a
    chunk does not have the cache attribute set, an invalid memory type
    is set, etc., then the chunk is skipped.  Those chunks are handled
    in target_xfer_memory() (or target_xfer_memory_partial()).
@@ -122,8 +123,6 @@ typedef void (block_func) (struct dcache_block *block, void *param);
 
 static struct dcache_block *dcache_hit (DCACHE *dcache, CORE_ADDR addr);
 
-static int dcache_read_line (DCACHE *dcache, struct dcache_block *db);
-
 static struct dcache_block *dcache_alloc (DCACHE *dcache, CORE_ADDR addr);
 
 static void dcache_info (char *exp, int tty);
@@ -305,56 +304,6 @@ dcache_hit (DCACHE *dcache, CORE_ADDR addr)
   return db;
 }
 
-/* Fill a cache line from target memory.
-   The result is 1 for success, 0 if the (entire) cache line
-   wasn't readable.  */
-
-static int
-dcache_read_line (DCACHE *dcache, struct dcache_block *db)
-{
-  CORE_ADDR memaddr;
-  gdb_byte *myaddr;
-  int len;
-  int res;
-  int reg_len;
-  struct mem_region *region;
-
-  len = dcache->line_size;
-  memaddr = db->addr;
-  myaddr  = db->data;
-
-  while (len > 0)
-    {
-      /* Don't overrun if this block is right at the end of the region.  */
-      region = lookup_mem_region (memaddr);
-      if (region->hi == 0 || memaddr + len < region->hi)
-	reg_len = len;
-      else
-	reg_len = region->hi - memaddr;
-
-      /* Skip non-readable regions.  The cache attribute can be ignored,
-         since we may be loading this for a stack access.  */
-      if (region->attrib.mode == MEM_WO)
-	{
-	  memaddr += reg_len;
-	  myaddr  += reg_len;
-	  len     -= reg_len;
-	  continue;
-	}
-      
-      res = target_read (&current_target, TARGET_OBJECT_RAW_MEMORY,
-			 NULL, myaddr, memaddr, reg_len);
-      if (res < reg_len)
-	return 0;
-
-      memaddr += res;
-      myaddr += res;
-      len -= res;
-    }
-
-  return 1;
-}
-
 /* Get a free cache block, put or keep it on the valid list,
    and return its address.  */
 
@@ -395,28 +344,6 @@ dcache_alloc (DCACHE *dcache, CORE_ADDR addr)
   return db;
 }
 
-/* Using the data cache DCACHE, store in *PTR the contents of the byte at
-   address ADDR in the remote machine.  
-
-   Returns 1 for success, 0 for error.  */
-
-static int
-dcache_peek_byte (DCACHE *dcache, CORE_ADDR addr, gdb_byte *ptr)
-{
-  struct dcache_block *db = dcache_hit (dcache, addr);
-
-  if (!db)
-    {
-      db = dcache_alloc (dcache, addr);
-
-      if (!dcache_read_line (dcache, db))
-         return 0;
-    }
-
-  *ptr = db->data[XFORM (dcache, addr)];
-  return 1;
-}
-
 /* Write the byte at PTR into ADDR in the data cache.
 
    The caller is responsible for also promptly writing the data
@@ -473,6 +400,105 @@ dcache_init (void)
   return dcache;
 }
 
+/* Check the readability of memory range [MEMADDR, MEMADDR + LEN) and
+   return the readable ranges; the caller is responsible for releasing it.  */
+
+static VEC(mem_range_s) *
+dcache_ranges_readable (CORE_ADDR memaddr, int len)
+{
+  VEC(mem_range_s) *readable_memory = NULL;
+
+  while (len > 0)
+    {
+      struct mem_range *r;
+      int reg_len;
+      /* Don't overrun if this block is right at the end of the region.  */
+      struct mem_region *region = lookup_mem_region (memaddr);
+
+      if (region->hi == 0 || memaddr + len < region->hi)
+	reg_len = len;
+      else
+	reg_len = region->hi - memaddr;
+
+      /* Skip non-readable regions.  The cache attribute can be ignored,
+	 since we may be loading this for a stack access.  */
+      if (region->attrib.mode == MEM_WO)
+	{
+	  memaddr += reg_len;
+	  len -= reg_len;
+	  continue;
+	}
+
+      r = VEC_safe_push (mem_range_s, readable_memory, NULL);
+      r->start = memaddr;
+      r->length = reg_len;
+
+      memaddr += reg_len;
+      len -= reg_len;
+    }
+
+  return readable_memory;
+}
+
+/* Return the uncached ranges from RANGES.   */
+
+static VEC(mem_range_s) *
+dcache_ranges_uncached (DCACHE *dcache, VEC(mem_range_s) *ranges)
+{
+  int b;
+  struct mem_range *rb;
+  VEC(mem_range_s) *uncached = NULL;
+
+  for (b = 0; VEC_iterate (mem_range_s, ranges, b, rb); b++)
+    {
+      CORE_ADDR memaddr_start = rb->start;
+      CORE_ADDR memaddr_end = rb->start;
+
+      while (memaddr_end < rb->start + rb->length)
+	{
+	  struct dcache_block *db = dcache_hit (dcache, memaddr_end);
+
+	  if (db != NULL)
+	    {
+	      /* Set MEMADDR_END to the start address of this cache line.  */
+	      memaddr_end = align_down (memaddr_end, dcache->line_size);
+
+	      if (memaddr_end > memaddr_start)
+		{
+		  struct mem_range *r;
+
+		  r = VEC_safe_push (mem_range_s, uncached, NULL);
+		  r->start = memaddr_start;
+		  r->length = memaddr_end - memaddr_start;
+		}
+	    }
+
+	  /* Increase memaddr_end to a dcache->line_size-aligned value.  */
+	  if (memaddr_end < align_up (memaddr_end, dcache->line_size))
+	    memaddr_end = align_up (memaddr_end, dcache->line_size);
+	  else
+	    memaddr_end += dcache->line_size;
+
+	  if (db != NULL)
+	    memaddr_start = memaddr_end;
+	}
+
+      if (memaddr_end > rb->start + rb->length)
+	memaddr_end = rb->start + rb->length;
+
+      if (memaddr_start < memaddr_end)
+	{
+	  struct mem_range *r;
+
+	  r = VEC_safe_push (mem_range_s, uncached, NULL);
+
+	  r->start = memaddr_start;
+	  r->length = memaddr_end - memaddr_start;
+	}
+    }
+
+  return uncached;
+}
 
 /* Read or write LEN bytes from inferior memory at MEMADDR, transferring
    to or from debugger address MYADDR.  Write to inferior if SHOULD_WRITE is
@@ -489,9 +515,6 @@ dcache_xfer_memory (struct target_ops *ops, DCACHE *dcache,
 		    CORE_ADDR memaddr, gdb_byte *myaddr,
 		    int len, int should_write)
 {
-  int i;
-  int res;
-
   /* If this is a different inferior from what we've recorded,
      flush the cache.  */
 
@@ -506,8 +529,10 @@ dcache_xfer_memory (struct target_ops *ops, DCACHE *dcache,
 
   if (should_write)
     {
-      res = target_write (ops, TARGET_OBJECT_RAW_MEMORY,
-			  NULL, myaddr, memaddr, len);
+      int res = target_write (ops, TARGET_OBJECT_RAW_MEMORY,
+			      NULL, myaddr, memaddr, len);
+      int i;
+
       if (res <= 0)
 	return res;
       /* Update LEN to what was actually written.  */
@@ -527,16 +552,148 @@ dcache_xfer_memory (struct target_ops *ops, DCACHE *dcache,
     }
   else
     {
-      for (i = 0; i < len; i++)
+      int i;
+      struct mem_range *r;
+      /* The starting address of each cached range.  */
+      CORE_ADDR cached_addr = memaddr;
+
+      VEC(mem_range_s) *memory;
+      VEC(mem_range_s) *uncached = NULL;
+
+      /* Find readable ranges in range [MEMADDR, MEMADDR + LEN),
+	 supposing write-only regions are wo1 and wo2.  Then,
+	 readable ranges are r1, r2 and r3.
+
+	 MEMADDR                               MEMADDR + LEN
+	 |<------------------------------------------------->|
+		|<-- wo1 -->|        |<-- wo2 -->|
+
+	 |<-r1->|           |<--r2-->|           |<---r3---->|  */
+      memory = dcache_ranges_readable (memaddr, len);
+
+      /* GDB will read from these three readable ranges, r1, r2 and r3.
+	 GDB has to check the corresponding cache lines' state (cached
+	 or uncached) to determine whether to read from the target
+	 memory or the cache lines.
+
+	 MEMADDR                               MEMADDR + LEN
+	 |<------------------------------------------------->|
+		|<-- wo1 -->|        |<-- wo2 -->|
+
+	 |<-r1->|           |<--r2-->|           |<---r3---->|
+
+	 -u-|-----c----|-----u----|-----c----|-----c----|--u--
+	 'u' stands for uncached, 'c' stands for cached.
+
+	 |u1|-c1-|          |  u2 |c2|           |--c3--| u3  |
+
+	 Uncached ranges are u1, u2 and u3, and cached ranges are c1,
+	 c2 and c3.  */
+      uncached = dcache_ranges_uncached (dcache, memory);
+
+      VEC_free (mem_range_s, memory);
+
+      /* Iterate each uncached range.  Read memory from cache lines if
+	 memory address is not within the uncached range, otherwise, read
+	 from the target memory and update corresponding cache lines.  */
+
+      for (i = 0; VEC_iterate (mem_range_s, uncached, i, r); i++)
 	{
-	  if (!dcache_peek_byte (dcache, memaddr + i, myaddr + i))
+	  int j;
+
+	  if (cached_addr < r->start)
 	    {
-	      /* That failed.  Discard its cache line so we don't have a
-		 partially read line.  */
-	      dcache_invalidate_line (dcache, memaddr + i);
-	      return i;
+	      /* Read memory [cached_addr, MIN (r->start, MEMADDR + LEN))
+		 from cache lines.  */
+
+	      for (; cached_addr < r->start && cached_addr < (memaddr + len);
+		   cached_addr++)
+		{
+		  struct dcache_block *db = dcache_hit (dcache, cached_addr);
+
+		  gdb_assert (db != NULL);
+
+		  myaddr[cached_addr - memaddr]
+		    = db->data[XFORM (dcache, cached_addr)];
+		}
+	    }
+	  cached_addr = r->start + r->length;
+
+	  /* Part of the memory range [MEMADDR, MEMADDR + LEN) is
+	     not cached.  */
+	  if (r->start < len + memaddr)
+	    {
+	      /* MEMADDR_START and MEMADDR_END are aligned on
+		 dcache->line_size, because dcache->line_size is the
+		 minimal unit to update cache and fetch from the target
+		 memory.  */
+	      CORE_ADDR memaddr_start
+		= align_down (r->start, dcache->line_size);
+	      CORE_ADDR memaddr_end
+		= align_up (r->start + r->length, dcache->line_size);
+	      int res;
+	      int len1 = memaddr_end - memaddr_start;
+	      int len2;
+	      gdb_byte *buf = xmalloc (len1);
+
+	      /* Read multiple cache lines to cover memory range
+		 [r->start, r->start + MIN (r->length,
+		 LEN + MEMADDR - r->start)) from target.  */
+
+	      res = target_read (&current_target, TARGET_OBJECT_RAW_MEMORY,
+				 NULL, buf, memaddr_start, len1);
+
+	      if (res == -1)
+		{
+		  VEC_free (mem_range_s, uncached);
+		  xfree (buf);
+		  return r->start - memaddr;
+		}
+
+	      /* Copy contents to MYADDR.  */
+	      len2 = r->length;
+	      if (len2 > len + memaddr - r->start)
+		len2 = len + memaddr - r->start;
+
+	      memcpy ((r->start - memaddr) + myaddr,
+		      buf + (r->start - memaddr_start),
+		      len2);
+
+	      /* Update cache lines in range
+		 [MEMADDR_START, MEMADDR_START + LEN1).  */
+	      for (j = 0; j < (len1 / dcache->line_size); j++)
+		{
+		  struct dcache_block *db
+		    = dcache_hit (dcache, memaddr_start + j * dcache->line_size);
+
+		  gdb_assert (db == NULL);
+
+		  db = dcache_alloc (dcache, memaddr_start + j * dcache->line_size);
+
+		  memcpy (db->data, &buf[j * dcache->line_size], dcache->line_size);
+		}
+
+	      xfree (buf);
+
+	      if (res < len1)
+		{
+		  VEC_free (mem_range_s, uncached);
+		  return r->start - memaddr + res;
+		}
 	    }
 	}
+
+      VEC_free (mem_range_s, uncached);
+
+      for (; cached_addr < (memaddr + len); cached_addr++)
+	{
+	  struct dcache_block *db = dcache_hit (dcache, cached_addr);
+
+	  gdb_assert (db != NULL);
+
+	  myaddr[cached_addr - memaddr]
+	    = db->data[XFORM (dcache, cached_addr)];
+	}
     }
 
   return len;
-- 
1.7.7.6


* [PATCH 1/2] Separate read and write in dcache_xfer_memory
  2013-10-18  3:22 [RFC 0/2] Read memory in multiple lines in dcache_xfer_memory Yao Qi
  2013-10-18  3:22 ` [PATCH 2/2] " Yao Qi
@ 2013-10-18  3:22 ` Yao Qi
  1 sibling, 0 replies; 11+ messages in thread
From: Yao Qi @ 2013-10-18  3:22 UTC (permalink / raw)
  To: gdb-patches

This patch refactors dcache_xfer_memory to separate the code for
reading and writing target memory.

gdb:

2013-10-18  Yao Qi  <yao@codesourcery.com>

	* dcache.c (dcache_xfer_memory): Separate memory read and
	write.
---
 gdb/dcache.c |   37 ++++++++++++++++++++++---------------
 1 files changed, 22 insertions(+), 15 deletions(-)

diff --git a/gdb/dcache.c b/gdb/dcache.c
index acb9de4..316f3dd 100644
--- a/gdb/dcache.c
+++ b/gdb/dcache.c
@@ -491,9 +491,6 @@ dcache_xfer_memory (struct target_ops *ops, DCACHE *dcache,
 {
   int i;
   int res;
-  int (*xfunc) (DCACHE *dcache, CORE_ADDR addr, gdb_byte *ptr);
-
-  xfunc = should_write ? dcache_poke_byte : dcache_peek_byte;
 
   /* If this is a different inferior from what we've recorded,
      flush the cache.  */
@@ -515,23 +512,33 @@ dcache_xfer_memory (struct target_ops *ops, DCACHE *dcache,
 	return res;
       /* Update LEN to what was actually written.  */
       len = res;
+
+      for (i = 0; i < len; i++)
+	{
+	  if (!dcache_poke_byte (dcache, memaddr + i, myaddr + i))
+	    {
+	      /* That failed.  Discard its cache line so we don't have a
+		 partially read line.  */
+	      dcache_invalidate_line (dcache, memaddr + i);
+	      /* We still wrote LEN bytes.  */
+	      return len;
+	    }
+	}
     }
-      
-  for (i = 0; i < len; i++)
+  else
     {
-      if (!xfunc (dcache, memaddr + i, myaddr + i))
+      for (i = 0; i < len; i++)
 	{
-	  /* That failed.  Discard its cache line so we don't have a
-	     partially read line.  */
-	  dcache_invalidate_line (dcache, memaddr + i);
-	  /* If we're writing, we still wrote LEN bytes.  */
-	  if (should_write)
-	    return len;
-	  else
-	    return i;
+	  if (!dcache_peek_byte (dcache, memaddr + i, myaddr + i))
+	    {
+	      /* That failed.  Discard its cache line so we don't have a
+		 partially read line.  */
+	      dcache_invalidate_line (dcache, memaddr + i);
+	      return i;
+	    }
 	}
     }
-    
+
   return len;
 }
 
-- 
1.7.7.6


* Re: [PATCH 2/2] Read memory in multiple lines in dcache_xfer_memory.
  2013-10-18  3:22 ` [PATCH 2/2] " Yao Qi
@ 2013-10-21 20:38   ` Doug Evans
  2013-10-22  0:50     ` Yao Qi
  2013-10-21 23:16   ` Doug Evans
  2013-11-18 18:17   ` Pedro Alves
  2 siblings, 1 reply; 11+ messages in thread
From: Doug Evans @ 2013-10-21 20:38 UTC (permalink / raw)
  To: Yao Qi; +Cc: gdb-patches

Yao Qi writes:
 > Hi, this is an optimization to dcache reading contents from target
 > memory.  [...]
 > 
 > With this patch, the number of 'm' RSP packet is reduced
 > dramatically:  [...]
 > 
 > Performance comparison:  [...]
 > 
 > gdb:
 > 
 > 2013-10-18  Yao Qi  <yao@codesourcery.com>
 > 
 > 	* dcache.c: Include "memrange.h".
 > 	Update comments.
 > 	(dcache_read_line): Remove.
 > 	(dcache_peek_byte): Remove.
 > 	(dcache_ranges_readable): New function.
 > 	(dcache_ranges_uncached): New function.
 > 	(dcache_xfer_memory): Read multiple cache lines from target
 > 	memory in one time.

Hi.
While I understand the improvement is nice, it does add complexity to the code.
[I'm not suggesting this patch isn't justified.
But I think we do need to weigh complexity vs actual speed-ups, and
I'm in the "Need more data." state at the moment.]

The data for cache line sizes less than the default (64 bytes)
is nice but not compelling.
E.g., What if we just increase the cache line size to 256, say?
Or 1024?  And/or maybe auto-adjust it based on
"remote memory-read-packet-size" or some such?
[I'm not arguing for anything in particular.
Again, I'm just collecting data.]


* Re: [PATCH 2/2] Read memory in multiple lines in dcache_xfer_memory.
  2013-10-18  3:22 ` [PATCH 2/2] " Yao Qi
  2013-10-21 20:38   ` Doug Evans
@ 2013-10-21 23:16   ` Doug Evans
  2013-10-22 12:27     ` Yao Qi
  2013-11-18 18:17   ` Pedro Alves
  2 siblings, 1 reply; 11+ messages in thread
From: Doug Evans @ 2013-10-21 23:16 UTC (permalink / raw)
  To: Yao Qi; +Cc: gdb-patches

Yao Qi writes:
 > [...]
 > 2013-10-18  Yao Qi  <yao@codesourcery.com>
 > 
 > 	* dcache.c: Include "memrange.h".
 > 	Update comments.
 > 	(dcache_read_line): Remove.
 > 	(dcache_peek_byte): Remove.
 > 	(dcache_ranges_readable): New function.
 > 	(dcache_ranges_uncached): New function.
 > 	(dcache_xfer_memory): Read multiple cache lines from target
 > 	memory in one time.
 > [...]
 > +      for (i = 0; VEC_iterate (mem_range_s, uncached, i, r); i++)
 >  	{
 > -	  if (!dcache_peek_byte (dcache, memaddr + i, myaddr + i))
 > +	  int j;
 > +
 > +	  if (cached_addr < r->start)
 >  	    {
 > -	      /* That failed.  Discard its cache line so we don't have a
 > -		 partially read line.  */
 > -	      dcache_invalidate_line (dcache, memaddr + i);
 > -	      return i;
 > +	      /* Read memory [cached_addr, MIN (r->start, MEMADDR + LEN))
 > +		 from cache lines.  */
 > +
 > +	      for (; cached_addr < r->start && cached_addr < (memaddr + len);
 > +		   cached_addr++)
 > +		{
 > +		  struct dcache_block *db = dcache_hit (dcache, cached_addr);
 > +
 > +		  gdb_assert (db != NULL);

Hi.
Going through the code ...
This assert will trigger if the request ends up filling the cache
and existing lines get pushed out, right?
[There's a similar assert later in the function too.]
Could be missing something of course.
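
The hazard described here can be shown with a toy eviction model (all
names are mine, and the two-line capacity matches the small "dcache
size" reproduction discussed later in the thread): once a single
request touches more lines than the cache holds, lines filled early in
the request are already gone by the time the copy-out loop calls
dcache_hit and asserts a hit.

```c
#include <assert.h>
#include <stddef.h>

/* Toy cache that, like dcache_alloc, evicts the least recently
   allocated line when full.  */

#define CACHE_LINES 2

static unsigned long toy_cache[CACHE_LINES];
static size_t toy_used;
static size_t toy_oldest;

static void
toy_insert (unsigned long line_addr)
{
  if (toy_used < CACHE_LINES)
    toy_cache[toy_used++] = line_addr;
  else
    {
      /* Evict the least recently allocated line.  */
      toy_cache[toy_oldest] = line_addr;
      toy_oldest = (toy_oldest + 1) % CACHE_LINES;
    }
}

static int
toy_hit (unsigned long line_addr)
{
  size_t i;

  for (i = 0; i < toy_used; i++)
    if (toy_cache[i] == line_addr)
      return 1;
  return 0;
}
```

Inserting lines 0, 64 and then 128 into this two-line cache evicts
line 0, so a later lookup of line 0 misses where the patch's
gdb_assert (db != NULL) expects a hit.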


* Re: [PATCH 2/2] Read memory in multiple lines in dcache_xfer_memory.
  2013-10-21 20:38   ` Doug Evans
@ 2013-10-22  0:50     ` Yao Qi
  2013-10-25 17:08       ` Doug Evans
  0 siblings, 1 reply; 11+ messages in thread
From: Yao Qi @ 2013-10-22  0:50 UTC (permalink / raw)
  To: Doug Evans; +Cc: gdb-patches

On 10/22/2013 04:38 AM, Doug Evans wrote:
> While I understand the improvement is nice, it does add complexity to the code.
> [I'm not suggesting this patch isn't justified.
> But I think we do need to weigh complexity vs actual speed-ups, and

Agreed.  The code is more complex than I expected.  It took me two
days to fix a bug.

> I'm in the "Need more data." state at the moment.]
>
> The data for cache line sizes less than the default (64 bytes)
> is nice but not compelling.
> E.g., What if we just increase the cache line size to 256, say?
> Or 1024?  And/or maybe auto-adjust it based on

In this case, it is useful to increase the cache line size.
Increasing the cache line size to 256 can save 0.1 (1.16 -> 1.06) in
my experiment, which is a good improvement.

However, when people are doing remote debugging and find it slow, the
reason for the slowness is usually unclear, so it may not be useful to
advise them to increase the data cache line size.

> "remote memory-read-packet-size" or some such?

"remote memory-read-packet-size" is not useful in this case, I am 
afraid.  GDB reads one data cache line by a single 'm' packet, and the 
size (data cache line size) is less than the memory read packet size. 
Changing "remote memory-read-packet-size" doesn't have effects.

-- 
Yao (齐尧)


* Re: [PATCH 2/2] Read memory in multiple lines in dcache_xfer_memory.
  2013-10-21 23:16   ` Doug Evans
@ 2013-10-22 12:27     ` Yao Qi
  0 siblings, 0 replies; 11+ messages in thread
From: Yao Qi @ 2013-10-22 12:27 UTC (permalink / raw)
  To: Doug Evans; +Cc: gdb-patches

On 10/22/2013 07:15 AM, Doug Evans wrote:
> This assert will trigger if the request ends up filling the cache
> and existing lines get pushed out, right?
> [There's a similar assert later in the function too.]
> Could be missing something of course.

I can reproduce it by setting "dcache size" to a small number, 2 for
example, and typing the commands "up" and "down".

In dcache_xfer_memory, we assume that the cache lines in DCACHE are not
pushed out; IOW, new cache lines can be added, but existing lines can't
be evicted.  In the updated patch, when adding a new cache line, if
DCACHE is not full, add it; otherwise, save the line in
POSTPONE_INSERT.  When the loop is finished, move lines from
POSTPONE_INSERT to DCACHE.  In this way, we don't break the order of
evicting cache lines.
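
A minimal sketch of that postponed-insert scheme (my names and a toy
two-line cache, not the actual updated patch): while a request is in
flight, insert directly only while the cache has room; once full, park
new lines aside and commit them after the loop, so lines the loop
still expects to hit are never evicted mid-request.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINES 2
#define MAX_POSTPONED 8

static unsigned long cache_addrs[CACHE_LINES];
static size_t cache_used;
static size_t cache_oldest;
static unsigned long postponed_addrs[MAX_POSTPONED];
static size_t n_postponed;

static int
cache_is_full (void)
{
  return cache_used >= CACHE_LINES;
}

static int
cache_hit (unsigned long addr)
{
  size_t i;

  for (i = 0; i < cache_used; i++)
    if (cache_addrs[i] == addr)
      return 1;
  return 0;
}

/* Insert ADDR, evicting the least recently allocated line if full.  */

static void
cache_insert (unsigned long addr)
{
  if (!cache_is_full ())
    cache_addrs[cache_used++] = addr;
  else
    {
      cache_addrs[cache_oldest] = addr;
      cache_oldest = (cache_oldest + 1) % CACHE_LINES;
    }
}

/* During a request: add directly only while there is room; otherwise
   park the line so existing lines survive until the request is done.  */

static void
request_insert (unsigned long addr)
{
  if (!cache_is_full ())
    cache_insert (addr);
  else if (n_postponed < MAX_POSTPONED)
    postponed_addrs[n_postponed++] = addr;
}

/* After the request loop: flush parked lines, now allowing eviction.  */

static void
commit_postponed (void)
{
  size_t i;

  for (i = 0; i < n_postponed; i++)
    cache_insert (postponed_addrs[i]);
  n_postponed = 0;
}
```

Because eviction is deferred to commit_postponed, every dcache_hit made
during the request still sees the lines it filled, and the normal
least-recently-allocated eviction order is preserved.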

Regression tested on x86_64-linux.  Performance shouldn't be affected.

-- 
Yao (齐尧)

gdb:

2013-10-22  Yao Qi  <yao@codesourcery.com>

	* dcache.c: Include "memrange.h".
	Update comments.
	(dcache_is_full): New function.
	(dcache_read_line): Remove.
	(dcache_peek_byte): Remove.
	(dcache_ranges_readable): New function.
	(dcache_ranges_uncached): New function.
	(copy_block): New function.
	(dcache_xfer_memory): Read multiple cache lines from target
	memory in one time.
---
 gdb/dcache.c |  375 +++++++++++++++++++++++++++++++++++++++++++++-------------
 1 files changed, 292 insertions(+), 83 deletions(-)

diff --git a/gdb/dcache.c b/gdb/dcache.c
index 316f3dd..a089b46 100644
--- a/gdb/dcache.c
+++ b/gdb/dcache.c
@@ -25,6 +25,7 @@
 #include "target.h"
 #include "inferior.h"
 #include "splay-tree.h"
+#include "memrange.h"
 
 /* Commands with a prefix of `{set,show} dcache'.  */
 static struct cmd_list_element *dcache_set_list = NULL;
@@ -60,8 +61,8 @@ static struct cmd_list_element *dcache_show_list = NULL;
 /* NOTE: Interaction of dcache and memory region attributes
 
    As there is no requirement that memory region attributes be aligned
-   to or be a multiple of the dcache page size, dcache_read_line() and
-   dcache_write_line() must break up the page by memory region.  If a
+   to or be a multiple of the dcache page size, dcache_xfer_memory must
+   break up the page by memory region.  If a
    chunk does not have the cache attribute set, an invalid memory type
    is set, etc., then the chunk is skipped.  Those chunks are handled
    in target_xfer_memory() (or target_xfer_memory_partial()).
@@ -122,8 +123,6 @@ typedef void (block_func) (struct dcache_block *block, void *param);
 
 static struct dcache_block *dcache_hit (DCACHE *dcache, CORE_ADDR addr);
 
-static int dcache_read_line (DCACHE *dcache, struct dcache_block *db);
-
 static struct dcache_block *dcache_alloc (DCACHE *dcache, CORE_ADDR addr);
 
 static void dcache_info (char *exp, int tty);
@@ -305,54 +304,12 @@ dcache_hit (DCACHE *dcache, CORE_ADDR addr)
   return db;
 }
 
-/* Fill a cache line from target memory.
-   The result is 1 for success, 0 if the (entire) cache line
-   wasn't readable.  */
+/* Return true if DCACHE is full.  */
 
 static int
-dcache_read_line (DCACHE *dcache, struct dcache_block *db)
+dcache_is_full (DCACHE *dcache)
 {
-  CORE_ADDR memaddr;
-  gdb_byte *myaddr;
-  int len;
-  int res;
-  int reg_len;
-  struct mem_region *region;
-
-  len = dcache->line_size;
-  memaddr = db->addr;
-  myaddr  = db->data;
-
-  while (len > 0)
-    {
-      /* Don't overrun if this block is right at the end of the region.  */
-      region = lookup_mem_region (memaddr);
-      if (region->hi == 0 || memaddr + len < region->hi)
-	reg_len = len;
-      else
-	reg_len = region->hi - memaddr;
-
-      /* Skip non-readable regions.  The cache attribute can be ignored,
-         since we may be loading this for a stack access.  */
-      if (region->attrib.mode == MEM_WO)
-	{
-	  memaddr += reg_len;
-	  myaddr  += reg_len;
-	  len     -= reg_len;
-	  continue;
-	}
-      
-      res = target_read (&current_target, TARGET_OBJECT_RAW_MEMORY,
-			 NULL, myaddr, memaddr, reg_len);
-      if (res < reg_len)
-	return 0;
-
-      memaddr += res;
-      myaddr += res;
-      len -= res;
-    }
-
-  return 1;
+  return dcache->size >= dcache_size;
 }
 
 /* Get a free cache block, put or keep it on the valid list,
@@ -363,7 +320,7 @@ dcache_alloc (DCACHE *dcache, CORE_ADDR addr)
 {
   struct dcache_block *db;
 
-  if (dcache->size >= dcache_size)
+  if (dcache_is_full (dcache))
     {
       /* Evict the least recently allocated line.  */
       db = dcache->oldest;
@@ -395,28 +352,6 @@ dcache_alloc (DCACHE *dcache, CORE_ADDR addr)
   return db;
 }
 
-/* Using the data cache DCACHE, store in *PTR the contents of the byte at
-   address ADDR in the remote machine.  
-
-   Returns 1 for success, 0 for error.  */
-
-static int
-dcache_peek_byte (DCACHE *dcache, CORE_ADDR addr, gdb_byte *ptr)
-{
-  struct dcache_block *db = dcache_hit (dcache, addr);
-
-  if (!db)
-    {
-      db = dcache_alloc (dcache, addr);
-
-      if (!dcache_read_line (dcache, db))
-         return 0;
-    }
-
-  *ptr = db->data[XFORM (dcache, addr)];
-  return 1;
-}
-
 /* Write the byte at PTR into ADDR in the data cache.
 
    The caller is responsible for also promptly writing the data
@@ -473,6 +408,122 @@ dcache_init (void)
   return dcache;
 }
 
+/* Check the readability of memory range [MEMADDR, MEMADDR + LEN) and return
+   the readable ranges.  The caller is responsible for freeing them.  */
+
+static VEC(mem_range_s) *
+dcache_ranges_readable (CORE_ADDR memaddr, int len)
+{
+  VEC(mem_range_s) *readable_memory = NULL;
+
+  while (len > 0)
+    {
+      struct mem_range *r;
+      int reg_len;
+      /* Don't overrun if this block is right at the end of the region.  */
+      struct mem_region *region = lookup_mem_region (memaddr);
+
+      if (region->hi == 0 || memaddr + len < region->hi)
+	reg_len = len;
+      else
+	reg_len = region->hi - memaddr;
+
+      /* Skip non-readable regions.  The cache attribute can be ignored,
+	 since we may be loading this for a stack access.  */
+      if (region->attrib.mode == MEM_WO)
+	{
+	  memaddr += reg_len;
+	  len -= reg_len;
+	  continue;
+	}
+
+      r = VEC_safe_push (mem_range_s, readable_memory, NULL);
+      r->start = memaddr;
+      r->length = reg_len;
+
+      memaddr += reg_len;
+      len -= reg_len;
+    }
+
+  return readable_memory;
+}
+
+/* Return the ranges within RANGES that are not cached in DCACHE.  */
+
+static VEC(mem_range_s) *
+dcache_ranges_uncached (DCACHE *dcache, VEC(mem_range_s) *ranges)
+{
+  int b;
+  struct mem_range *rb;
+  VEC(mem_range_s) *uncached = NULL;
+
+  for (b = 0; VEC_iterate (mem_range_s, ranges, b, rb); b++)
+    {
+      CORE_ADDR memaddr_start = rb->start;
+      CORE_ADDR memaddr_end = rb->start;
+
+      while (memaddr_end < rb->start + rb->length)
+	{
+	  struct dcache_block *db = dcache_hit (dcache, memaddr_end);
+
+	  if (db != NULL)
+	    {
+	      /* Set MEMADDR_END to the start address of this cache line.  */
+	      memaddr_end = align_down (memaddr_end, dcache->line_size);
+
+	      if (memaddr_end > memaddr_start)
+		{
+		  struct mem_range *r;
+
+		  r = VEC_safe_push (mem_range_s, uncached, NULL);
+		  r->start = memaddr_start;
+		  r->length = memaddr_end - memaddr_start;
+		}
+	    }
+
+	  /* Increase memaddr_end to a dcache->line_size-aligned value.  */
+	  if (memaddr_end < align_up (memaddr_end, dcache->line_size))
+	    memaddr_end = align_up (memaddr_end, dcache->line_size);
+	  else
+	    memaddr_end += dcache->line_size;
+
+	  if (db != NULL)
+	    memaddr_start = memaddr_end;
+	}
+
+      if (memaddr_end > rb->start + rb->length)
+	memaddr_end = rb->start + rb->length;
+
+      if (memaddr_start < memaddr_end)
+	{
+	  struct mem_range *r;
+
+	  r = VEC_safe_push (mem_range_s, uncached, NULL);
+
+	  r->start = memaddr_start;
+	  r->length = memaddr_end - memaddr_start;
+	}
+    }
+
+  return uncached;
+}
+
+/* Helper function to copy BLOCK to dcache represented by PARAM.  */
+
+static void
+copy_block (struct dcache_block *block, void *param)
+{
+  DCACHE *dcache = param;
+  struct dcache_block *db = dcache_hit (dcache, block->addr);
+
+  /* This function is only used when DCACHE is full and BLOCK doesn't
+     exist in DCACHE.  */
+  gdb_assert (dcache_is_full (dcache));
+  gdb_assert (db == NULL);
+
+  db = dcache_alloc (dcache, block->addr);
+  memcpy (db->data, block->data, dcache->line_size);
+}
 
 /* Read or write LEN bytes from inferior memory at MEMADDR, transferring
    to or from debugger address MYADDR.  Write to inferior if SHOULD_WRITE is
@@ -489,9 +540,6 @@ dcache_xfer_memory (struct target_ops *ops, DCACHE *dcache,
 		    CORE_ADDR memaddr, gdb_byte *myaddr,
 		    int len, int should_write)
 {
-  int i;
-  int res;
-
   /* If this is a different inferior from what we've recorded,
      flush the cache.  */
 
@@ -506,8 +554,10 @@ dcache_xfer_memory (struct target_ops *ops, DCACHE *dcache,
 
   if (should_write)
     {
-      res = target_write (ops, TARGET_OBJECT_RAW_MEMORY,
-			  NULL, myaddr, memaddr, len);
+      int res = target_write (ops, TARGET_OBJECT_RAW_MEMORY,
+			      NULL, myaddr, memaddr, len);
+      int i;
+
       if (res <= 0)
 	return res;
       /* Update LEN to what was actually written.  */
@@ -527,16 +577,175 @@ dcache_xfer_memory (struct target_ops *ops, DCACHE *dcache,
     }
   else
     {
-      for (i = 0; i < len; i++)
+      int i;
+      struct mem_range *r;
+      /* The starting address of each cached range.  */
+      CORE_ADDR cached_addr = memaddr;
+      /* A list of cache blocks whose insertion into dcache is
+	 postponed.  */
+      struct dcache_block *postpone_insert = NULL;
+
+      VEC(mem_range_s) *memory;
+      VEC(mem_range_s) *uncached = NULL;
+
+      /* Find readable ranges in range [MEMADDR, MEMADDR + LEN),
+	 supposing write-only regions are wo1 and wo2.  Then,
+	 readable ranges are r1, r2 and r3.
+
+	 MEMADDR                               MEMADDR + LEN
+	 |<------------------------------------------------->|
+		|<-- wo1 -->|        |<-- wo2 -->|
+
+	 |<-r1->|           |<--r2-->|           |<---r3---->|  */
+      memory = dcache_ranges_readable (memaddr, len);
+
+      /* GDB will read from these three readable ranges, r1, r2 and r3.
+	 GDB has to check the corresponding cache lines' state (cached
+	 or uncached) to determine whether to read from the target
+	 memory or the cache lines.
+
+	 MEMADDR                               MEMADDR + LEN
+	 |<------------------------------------------------->|
+		|<-- wo1 -->|        |<-- wo2 -->|
+
+	 |<-r1->|           |<--r2-->|           |<---r3---->|
+
+	 -u-|-----c----|-----u----|-----c----|-----c----|--u--
+	 'u' stands for uncached, 'c' stands for cached.
+
+	 |u1|-c1-|          |  u2 |c2|           |--c3--| u3  |
+
+	 Uncached ranges are u1, u2 and u3, and cached ranges are c1,
+	 c2 and c3.  */
+      uncached = dcache_ranges_uncached (dcache, memory);
+
+      VEC_free (mem_range_s, memory);
+
+      /* Iterate over each uncached range.  Read from the cache lines if
+	 the memory address is not within an uncached range; otherwise, read
+	 from the target memory and update the corresponding cache lines.  */
+
+      for (i = 0; VEC_iterate (mem_range_s, uncached, i, r); i++)
 	{
-	  if (!dcache_peek_byte (dcache, memaddr + i, myaddr + i))
+	  int j;
+
+	  if (cached_addr < r->start)
 	    {
-	      /* That failed.  Discard its cache line so we don't have a
-		 partially read line.  */
-	      dcache_invalidate_line (dcache, memaddr + i);
-	      return i;
+	      /* Read memory [cached_addr, MIN (r->start, MEMADDR + LEN))
+		 from cache lines.  */
+
+	      for (; cached_addr < r->start && cached_addr < (memaddr + len);
+		   cached_addr++)
+		{
+		  struct dcache_block *db = dcache_hit (dcache, cached_addr);
+
+		  gdb_assert (db != NULL);
+
+		  myaddr[cached_addr - memaddr]
+		    = db->data[XFORM (dcache, cached_addr)];
+		}
+	    }
+	  cached_addr = r->start + r->length;
+
+	  /* Part of the memory range [MEMADDR, MEMADDR + LEN) is
+	     not cached.  */
+	  if (r->start < len + memaddr)
+	    {
+	      /* MEMADDR_START and MEMADDR_END are aligned on
+		 dcache->line_size, because dcache->line_size is the
+		 minimal unit to update cache and fetch from the target
+		 memory.  */
+	      CORE_ADDR memaddr_start
+		= align_down (r->start, dcache->line_size);
+	      CORE_ADDR memaddr_end
+		= align_up (r->start + r->length, dcache->line_size);
+	      int res;
+	      int len1 = memaddr_end - memaddr_start;
+	      int len2;
+	      gdb_byte *buf = xmalloc (len1);
+
+	      /* Read multiple cache lines to cover memory range
+		 [r->start, r->start + MIN (r->length,
+		 LEN + MEMADDR - r->start)) from target.  */
+
+	      res = target_read (&current_target, TARGET_OBJECT_RAW_MEMORY,
+				 NULL, buf, memaddr_start, len1);
+
+	      if (res == -1)
+		{
+		  VEC_free (mem_range_s, uncached);
+		  xfree (buf);
+		  return r->start - memaddr;
+		}
+
+	      /* Copy contents to MYADDR.  */
+	      len2 = r->length;
+	      if (len2 > len + memaddr - r->start)
+		len2 = len + memaddr - r->start;
+
+	      memcpy ((r->start - memaddr) + myaddr,
+		      buf + (r->start - memaddr_start),
+		      len2);
+
+	      /* Update cache lines in range
+		 [MEMADDR_START, MEMADDR_START + LEN1).  */
+	      for (j = 0; j < (len1 / dcache->line_size); j++)
+		{
+		  struct dcache_block *db = NULL;
+
+		  if (dcache_is_full (dcache))
+		    {
+		      /* DCACHE is full, don't put dcache_block in it,
+			 otherwise one dcache_block is evicted, which
+			 may be used later in the loop.  Save the
+			 dcache_block temporarily in POSTPONE_INSERT
+			 and use it to update DCACHE outside the
+			 loop.  */
+		      db = xmalloc (offsetof (struct dcache_block, data)
+				    + dcache->line_size);
+		      db->addr = MASK (dcache,
+				       memaddr_start + j * dcache->line_size);
+		      append_block (&postpone_insert, db);
+		    }
+		  else
+		    {
+		      /* DCACHE is not full, it is OK to put dcache_block in
+			 it.  */
+		      db = dcache_hit (dcache, memaddr_start + j * dcache->line_size);
+
+		      gdb_assert (db == NULL);
+
+		      db = dcache_alloc (dcache, memaddr_start + j * dcache->line_size);
+		    }
+
+		  memcpy (db->data, &buf[j * dcache->line_size], dcache->line_size);
+		}
+
+	      xfree (buf);
+
+	      if (res < len1)
+		{
+		  VEC_free (mem_range_s, uncached);
+		  return r->start - memaddr + res;
+		}
 	    }
 	}
+
+      VEC_free (mem_range_s, uncached);
+
+      for (; cached_addr < (memaddr + len); cached_addr++)
+	{
+	  struct dcache_block *db = dcache_hit (dcache, cached_addr);
+
+	  gdb_assert (db != NULL);
+
+	  myaddr[cached_addr - memaddr]
+	    = db->data[XFORM (dcache, cached_addr)];
+	}
+
+      /* Copy dcache_blocks in POSTPONE_INSERT to DCACHE.  */
+      for_each_block (&postpone_insert, copy_block, dcache);
+      for_each_block (&postpone_insert, free_block, NULL);
     }
 
   return len;
-- 
1.7.7.6

* Re: [PATCH 2/2] Read memory in multiple lines in dcache_xfer_memory.
  2013-10-22  0:50     ` Yao Qi
@ 2013-10-25 17:08       ` Doug Evans
  2013-11-06  9:24         ` Yao Qi
  2013-11-18 17:08         ` Pedro Alves
  0 siblings, 2 replies; 11+ messages in thread
From: Doug Evans @ 2013-10-25 17:08 UTC (permalink / raw)
  To: Yao Qi; +Cc: gdb-patches

On Mon, Oct 21, 2013 at 5:48 PM, Yao Qi <yao@codesourcery.com> wrote:
>> The data for cache line sizes less than the default (64 bytes)
>> is nice but not compelling.
>> E.g., What if we just increase the cache line size to 256, say?
>> Or 1024?  And/or maybe auto-adjust it based on
>
>
> In this case, it is useful to increase the cache line size.  Increasing the
> cache line size to 256 can save 0.1 (1.16 -> 1.06) in my experiment, which
> is a good improvement.
>
> When people are doing remote debugging and find it is slow, the reason for
> the slowness is usually unclear, so it may not be useful to advise them to
> increase the data cache line size.

If the cache line size is set based on the remote packet size (and all
else being equal, that's a reasonable thing to do),
then we don't have to tell them anything.
OTOH, I'm not entirely thrilled with changing the cache-line size automagically.

OTOOH, how many users even know that they *can* change the cache line size?
We don't have to advise them to change the cache line size (with the
word "advise" implying the end result is necessarily an improvement),
but I'm totally ok with suggesting it as something to *try*.


>> "remote memory-read-packet-size" or some such?
>
>
> "remote memory-read-packet-size" is not useful in this case, I am afraid.
> GDB reads one data cache line by a single 'm' packet, and the size (data
> cache line size) is less than the memory read packet size.  Changing "remote
> memory-read-packet-size" doesn't have any effect.

You misparsed what I wrote, sorry for being unclear.

Putting the sentence back together:
And/or maybe auto-adjust it based on "remote memory-read-packet-size"
or some such?
["it" being the cache line size]

When I connect to gdbserver on my amd64-linux box the cache line size
is 64 but the memory-read-packet-size is 16383.
Those are wildly different numbers.  I'm not suggesting setting the
cache line size to 16K, but it does suggest that 64 is a bit small.
OTOH, there's no point in having a 256-byte line size if the
memory-read-packet-size is 64.
OTOOH, with your patch to read multiple cache lines at once the cost
of smaller cache line sizes is mitigated.

It's not my favorite choice, I just mentioned it for discussion's sake.
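
For discussion's sake, one hypothetical shape of that auto-adjustment
(clamp_line_size is made up for illustration, not an actual gdb
function) would cap the line size at the largest power of two that
fits in the read-packet size:

```c
#include <assert.h>

/* Hypothetical policy sketch: cap the dcache line size at the largest
   power of two not exceeding the remote memory-read-packet size.  */

static unsigned int
clamp_line_size (unsigned int line_size, unsigned int packet_size)
{
  unsigned int p = 1;

  /* Find the largest power of two <= packet_size.  */
  while (p * 2 <= packet_size)
    p *= 2;

  return line_size < p ? line_size : p;
}
```

With the numbers above, a 256-byte line against a 64-byte packet would
be clamped to 64, while a 64-byte line against a 16383-byte packet is
left alone.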

* Re: [PATCH 2/2] Read memory in multiple lines in dcache_xfer_memory.
  2013-10-25 17:08       ` Doug Evans
@ 2013-11-06  9:24         ` Yao Qi
  2013-11-18 17:08         ` Pedro Alves
  1 sibling, 0 replies; 11+ messages in thread
From: Yao Qi @ 2013-11-06  9:24 UTC (permalink / raw)
  To: Doug Evans; +Cc: gdb-patches

On 10/26/2013 01:08 AM, Doug Evans wrote:
> If the cache line size is set based on the remote packet size (and all
> else being equal, that's a reasonable thing to do),
> then we don't have to tell them anything.
> OTOH, I'm not entirely thrilled with changing the cache-line size automagically.
>

The data cache line size should be related to the pattern in which GDB
reads target data, not to the remote packet size.  The data cache is
quite general, and we may not have to tie it to the remote packet size.

"Setting cache line size based on the remote packet size" helps
performance, but IMO it is a different sort of optimization from what I
proposed here.

> OTOOH, how many users even know that they *can* change the cache line size?
> We don't have to advise them to change the cache line size (with the
> word "advise" implying the end result is necessarily an improvement),
> but I'm totally ok with suggesting it as something to *try*.
>

Of course.  However, I still prefer to make GDB smarter, to avoid
performance issues as much as possible :).

>
>>> "remote memory-read-packet-size" or some such?
>>
>>
>> "remote memory-read-packet-size" is not useful in this case, I am afraid.
>> GDB reads one data cache line by a single 'm' packet, and the size (data
>> cache line size) is less than the memory read packet size.  Changing "remote
>> memory-read-packet-size" doesn't have any effect.
> You misparsed what I wrote, sorry for being unclear.
>
> Putting the sentence back together:
> And/or maybe auto-adjust it based on "remote memory-read-packet-size"
> or some such?
> ["it" being the cache line size]
>
> When I connect to gdbserver on my amd64-linux box the cache line size
> is 64 but the memory-read-packet-size is 16383.
> Those are wildly different numbers.  I'm not suggesting setting the
> cache line size to 16K, but it does suggest that 64 is a bit small.
> OTOH, there's no point in having a 256 byte line size of the
> memory-read-packet-size is 64.

Yeah, I agree that these two sizes should match each other to some extent.

> OTOOH, with your patch to read multiple cache lines at once the cost
> of smaller cache line sizes is mitigated.

Right.

-- 
Yao (齐尧)

* Re: [PATCH 2/2] Read memory in multiple lines in dcache_xfer_memory.
  2013-10-25 17:08       ` Doug Evans
  2013-11-06  9:24         ` Yao Qi
@ 2013-11-18 17:08         ` Pedro Alves
  1 sibling, 0 replies; 11+ messages in thread
From: Pedro Alves @ 2013-11-18 17:08 UTC (permalink / raw)
  To: Doug Evans; +Cc: Yao Qi, gdb-patches

On 10/25/2013 06:08 PM, Doug Evans wrote:

> OTOH, there's no point in having a 256-byte line size if the
> memory-read-packet-size is 64.

That's an excellent point.  I think that we could also address that
in another way, without having the dcache code query the target
(or targets!) for maximum sizes -- currently, each cache line has a
fixed size, but instead, dcache could treat the line
size as the _maximum_ line size.  That is, 'struct dcache_block' would
gain a new field that recorded how much data actually is stored
in the block.  Then, when filling the block, dcache would use
target_xfer_partial instead of target_read/write, and if the target
returns less than the block size, record how much was read in the
block, and return to the dcache caller.  Further partial transfers
would fill the rest of the dcache line (up until it was full).  In
essence, dcache works at the target_xfer level, but has its own
loop that sort of "breaks" the partial transfer logic.
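
A rough sketch of that scheme (struct line, line_fill, and the
simulated partial transfer below are illustrative stand-ins, not gdb's
real struct dcache_block or target_xfer_partial):

```c
#include <assert.h>

#define LINE_SIZE 64	/* Stand-in for the maximum line size.  */

struct line
{
  unsigned long addr;		 /* Start address of the line.  */
  int filled;			 /* How many bytes of DATA are valid.  */
  unsigned char data[LINE_SIZE];
};

/* Simulated partial transfer: reads at most MAX bytes per call and
   fabricates the "target" bytes from the address.  */

static int
fake_xfer_partial (unsigned char *buf, unsigned long addr, int len, int max)
{
  int n = len < max ? len : max;
  int i;

  for (i = 0; i < n; i++)
    buf[i] = (unsigned char) ((addr + i) & 0xff);
  return n;
}

/* Fill LINE until the byte at OFFSET is valid, looping over partial
   transfers; return 1 on success, 0 on transfer failure.  */

static int
line_fill (struct line *line, int offset, int max_per_xfer)
{
  while (line->filled <= offset)
    {
      int res = fake_xfer_partial (line->data + line->filled,
				   line->addr + line->filled,
				   LINE_SIZE - line->filled,
				   max_per_xfer);
      if (res <= 0)
	return 0;
      line->filled += res;
    }
  return 1;
}
```

The point is only that the line's "filled" mark advances across calls,
so a short transfer leaves a usable, partially valid line rather than
an error.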

-- 
Pedro Alves

* Re: [PATCH 2/2] Read memory in multiple lines in dcache_xfer_memory.
  2013-10-18  3:22 ` [PATCH 2/2] " Yao Qi
  2013-10-21 20:38   ` Doug Evans
  2013-10-21 23:16   ` Doug Evans
@ 2013-11-18 18:17   ` Pedro Alves
  2 siblings, 0 replies; 11+ messages in thread
From: Pedro Alves @ 2013-11-18 18:17 UTC (permalink / raw)
  To: Yao Qi; +Cc: gdb-patches

Hmm, do we really need all this memory range merging and stuff?

+	 MEMADDR                               MEMADDR + LEN
+	 |<------------------------------------------------->|
+		|<-- wo1 -->|        |<-- wo2 -->|
+
+	 |<-r1->|           |<--r2-->|           |<---r3---->|
+
+	 -u-|-----c----|-----u----|-----c----|-----c----|--u--
+	 'u' stands for uncached, 'c' stands for cached.
+
+	 |u1|-c1-|          |  u2 |c2|           |--c3--| u3  |
+
+	 Uncached ranges are u1, u2 and u3, and cached ranges are c1,
+	 c2 and c3.  */

It feels like the dcache code only needs to find the next
boundary, and read/write up to it.  That is, say, for a read,
the first dcache_xfer_memory would find the u1/c1 boundary,
then read [MEMADDR,u1/c1], and return.  memory_xfer_partial
will then request another chunk, starting at u1/c1.  That starts
at a cached range, so dcache would return c1.  memory_xfer_partial
then requests more data, starting at the end of c1.  That hits
a cached, non-readable range, so dcache returns error.  Etc.
I think finding the next cached/uncached boundary address should
be simpler than getting vecs of ranges and merging them all,
and handling reads/writes separately.  If the next boundary
spans multiple dcache lines, then you'd still read multiple
lines in one go.
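
A toy model of that alternative (cached_p below is a fake predicate
standing in for a dcache_hit lookup, and LINE_SIZE for
dcache->line_size; this is illustration, not gdb code):

```c
#include <assert.h>

#define LINE_SIZE 64

/* Fake cached-state predicate: pretend every other cache line is
   cached.  In gdb this would be a dcache_hit lookup.  */

static int
cached_p (unsigned long addr)
{
  return ((addr / LINE_SIZE) % 2) == 0;
}

/* Return the first address in [ADDR, ADDR + LEN) whose cached state
   differs from ADDR's, or ADDR + LEN if the state never changes.
   A transfer would then read or write just [ADDR, boundary) and
   return, letting the caller re-request the rest.  */

static unsigned long
next_boundary (unsigned long addr, int len)
{
  int state = cached_p (addr);
  unsigned long p = (addr / LINE_SIZE + 1) * LINE_SIZE;

  for (; p < addr + len; p += LINE_SIZE)
    if (cached_p (p) != state)
      return p;

  return addr + len;
}
```

If several uncached lines sit between two boundaries, they still get
fetched in one target read, which is how this keeps the multi-line
benefit without the range vectors.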

-- 
Pedro Alves

Thread overview: 11+ messages
2013-10-18  3:22 [RFC 0/2] Read memory in multiple lines in dcache_xfer_memory Yao Qi
2013-10-18  3:22 ` [PATCH 2/2] " Yao Qi
2013-10-21 20:38   ` Doug Evans
2013-10-22  0:50     ` Yao Qi
2013-10-25 17:08       ` Doug Evans
2013-11-06  9:24         ` Yao Qi
2013-11-18 17:08         ` Pedro Alves
2013-10-21 23:16   ` Doug Evans
2013-10-22 12:27     ` Yao Qi
2013-11-18 18:17   ` Pedro Alves
2013-10-18  3:22 ` [PATCH 1/2] Separate read and write " Yao Qi
