public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed
* [PATCH] Optimize memory_xfer_partial for remote
@ 2016-06-03 19:02 Don Breazeal
  2016-06-20 15:31 ` [PING] " Don Breazeal
  2016-06-20 19:25 ` Pedro Alves
  0 siblings, 2 replies; 13+ messages in thread
From: Don Breazeal @ 2016-06-03 19:02 UTC (permalink / raw)
  To: gdb-patches

Some analysis we did here showed that increasing the cap on the
transfer size in target.c:memory_xfer_partial could give 20% or more
improvement in remote load across JTAG.  Transfer sizes are capped
to 4K bytes because of performance problems encountered with the
restore command, documented here:

https://sourceware.org/ml/gdb-patches/2013-07/msg00611.html

and with commit hash: 67c059c29e1fb0cdeacdd2005f955514d8d1fb34

The 4K cap was introduced because in a case where the restore command
requested a 100MB transfer, memory_xfer_partial would allocate and copy
an entire 100MB buffer in order to properly handle breakpoint shadow
instructions, even though memory_xfer_partial would actually only
write a small portion of the buffer contents.

A couple of alternative solutions were suggested:
* change the algorithm for handling the breakpoint shadow instructions
* throttle the transfer size up or down based on the previous actual
  transfer size

I tried implementing the throttling approach, and my implementation
reduced the performance in some cases.

This patch takes a simple approach: instead of hard-coding the cap on
transfer requests to 4096, use a variable and allow the target to set it.
This allows the remote target to set the cap to its packetsize.

Here is the performance for a 100MB restore command using an srec file
(minutes:seconds), where the remote has a packetsize of 16K bytes:
* existing implementation:   7:50
* proposed implementation:   5:34

We could make a similar change in target_read_alloc_1 and
target_fileio_read_alloc_1, but I left that alone for now.

I considered making target_set_memory_xfer_limit a function in the target
vector, but concluded that was overkill.  In this patch it is an external
function in target.c.

Tested on x86_64 Linux with native and native-gdbserver, and manually
tested 'restore' on a Windows 7 host with a bare-metal ARM board.

Thanks,
--Don

gdb/ChangeLog:
2016-06-03  Don Breazeal  <donb@codesourcery.com>

	* remote.c (remote_start_remote): Call
	target_set_memory_xfer_limit.
	* target.c (memory_xfer_limit): New static variable.
	(target_set_memory_xfer_limit): New function.
	(memory_xfer_partial): Use memory_xfer_limit in place of
	constant.
	* target.h (target_set_memory_xfer_limit): Declare new function.

---
 gdb/remote.c |  3 +++
 gdb/target.c | 19 +++++++++++++++++--
 gdb/target.h |  6 ++++++
 3 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/gdb/remote.c b/gdb/remote.c
index 1f0d67c..028d555 100644
--- a/gdb/remote.c
+++ b/gdb/remote.c
@@ -4079,6 +4079,9 @@ remote_start_remote (int from_tty, struct target_ops *target, int extended_p)
       getpkt (&rs->buf, &rs->buf_size, 0);
     }
 
+  /* Set the cap on memory transfer requests to our packet size.  */
+  target_set_memory_xfer_limit (get_remote_packet_size ());
+
   /* Let the target know which signals it is allowed to pass down to
      the program.  */
   update_signals_program_target ();
diff --git a/gdb/target.c b/gdb/target.c
index c0ce46d..dde71c2 100644
--- a/gdb/target.c
+++ b/gdb/target.c
@@ -162,6 +162,10 @@ int may_insert_fast_tracepoints = 1;
 
 int may_stop = 1;
 
+/* Limit on size of memory transfers, see memory_xfer_partial.  */
+
+static ULONGEST memory_xfer_limit = 4096;
+
 /* Non-zero if we want to see trace of target level stuff.  */
 
 static unsigned int targetdebug = 0;
@@ -1235,6 +1239,16 @@ memory_xfer_partial_1 (struct target_ops *ops, enum target_object object,
   return res;
 }
 
+/* Set the cap on actual memory transfer requests.  This prevents
+   repeated requests to transfer much more than the transport
+   mechanism can accommodate.  See memory_xfer_partial.  */
+
+void
+target_set_memory_xfer_limit (ULONGEST new_limit)
+{
+  memory_xfer_limit = new_limit;
+}
+
 /* Perform a partial memory transfer.  For docs see target.h,
    to_xfer_partial.  */
 
@@ -1269,8 +1283,9 @@ memory_xfer_partial (struct target_ops *ops, enum target_object object,
 	 by memory_xfer_partial_1.  We will continually malloc
 	 and free a copy of the entire write request for breakpoint
 	 shadow handling even though we only end up writing a small
-	 subset of it.  Cap writes to 4KB to mitigate this.  */
-      len = min (4096, len);
+	 subset of it.  Cap writes to the value of memory_xfer_limit
+	 to mitigate this.  */
+      len = min (memory_xfer_limit, len);
 
       buf = (gdb_byte *) xmalloc (len);
       old_chain = make_cleanup (xfree, buf);
diff --git a/gdb/target.h b/gdb/target.h
index 6b5b6e0..b511746 100644
--- a/gdb/target.h
+++ b/gdb/target.h
@@ -266,6 +266,12 @@ enum target_xfer_status
 			   const gdb_byte *writebuf, ULONGEST memaddr,
 			   LONGEST len, ULONGEST *xfered_len);
 
+/* Set the cap on actual memory transfer requests.  This prevents
+   repeated requests to transfer much more than the transport
+   mechanism can accommodate.  See memory_xfer_partial.  */
+
+extern void target_set_memory_xfer_limit (ULONGEST new_limit);
+
 /* Request that OPS transfer up to LEN addressable units of the target's
    OBJECT.  When reading from a memory object, the size of an addressable unit
    is architecture dependent and can be found using
-- 
2.8.1

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PING] [PATCH] Optimize memory_xfer_partial for remote
  2016-06-03 19:02 [PATCH] Optimize memory_xfer_partial for remote Don Breazeal
@ 2016-06-20 15:31 ` Don Breazeal
  2016-06-20 19:25 ` Pedro Alves
  1 sibling, 0 replies; 13+ messages in thread
From: Don Breazeal @ 2016-06-20 15:31 UTC (permalink / raw)
  To: gdb-patches

Ping

On 6/3/2016 12:02 PM, Don Breazeal wrote:
> Some analysis we did here showed that increasing the cap on the
> transfer size in target.c:memory_xfer_partial could give 20% or more
> improvement in remote load across JTAG.  Transfer sizes are capped
> to 4K bytes because of performance problems encountered with the
> restore command, documented here:
> 
> https://sourceware.org/ml/gdb-patches/2013-07/msg00611.html
> 
> and with commit hash: 67c059c29e1fb0cdeacdd2005f955514d8d1fb34
> 
> The 4K cap was introduced because in a case where the restore command
> requested a 100MB transfer, memory_xfer_partial would allocate and copy
> an entire 100MB buffer in order to properly handle breakpoint shadow
> instructions, even though memory_xfer_partial would actually only
> write a small portion of the buffer contents.
> 
> A couple of alternative solutions were suggested:
> * change the algorithm for handling the breakpoint shadow instructions
> * throttle the transfer size up or down based on the previous actual
>   transfer size
> 
> I tried implementing the throttling approach, and my implementation
> reduced the performance in some cases.
> 
> This patch takes a simple approach: instead of hard-coding the cap on
> transfer requests to 4096, use a variable and allow the target to set it.
> This allows the remote target to set the cap to its packetsize.
> 
> Here is the performance for a 100MB restore command using an srec file
> (minutes:seconds), where the remote has a packetsize of 16K bytes:
> * existing implementation:   7:50
> * proposed implementation:   5:34
> 
> We could make a similar change in target_read_alloc_1 and
> target_fileio_read_alloc_1, but I left that alone for now.
> 
> I considered making target_set_memory_xfer_limit a function in the target
> vector, but concluded that was overkill.  In this patch it is an external
> function in target.c.
> 
> Tested on x86_64 Linux with native and native-gdbserver, and manually
> tested 'restore' on a Windows 7 host with a bare-metal ARM board.
> 
> Thanks,
> --Don
> 
> gdb/ChangeLog:
> 2016-06-03  Don Breazeal  <donb@codesourcery.com>
> 
> 	* remote.c (remote_start_remote): Call
> 	target_set_memory_xfer_limit.
> 	* target.c (memory_xfer_limit): New static variable.
> 	(target_set_memory_xfer_limit): New function.
> 	(memory_xfer_partial): Use memory_xfer_limit in place of
> 	constant.
> 	* target.h (target_set_memory_xfer_limit): Declare new function.
> 
> ---
>  gdb/remote.c |  3 +++
>  gdb/target.c | 19 +++++++++++++++++--
>  gdb/target.h |  6 ++++++
>  3 files changed, 26 insertions(+), 2 deletions(-)
> 
> diff --git a/gdb/remote.c b/gdb/remote.c
> index 1f0d67c..028d555 100644
> --- a/gdb/remote.c
> +++ b/gdb/remote.c
> @@ -4079,6 +4079,9 @@ remote_start_remote (int from_tty, struct target_ops *target, int extended_p)
>        getpkt (&rs->buf, &rs->buf_size, 0);
>      }
>  
> +  /* Set the cap on memory transfer requests to our packet size.  */
> +  target_set_memory_xfer_limit (get_remote_packet_size ());
> +
>    /* Let the target know which signals it is allowed to pass down to
>       the program.  */
>    update_signals_program_target ();
> diff --git a/gdb/target.c b/gdb/target.c
> index c0ce46d..dde71c2 100644
> --- a/gdb/target.c
> +++ b/gdb/target.c
> @@ -162,6 +162,10 @@ int may_insert_fast_tracepoints = 1;
>  
>  int may_stop = 1;
>  
> +/* Limit on size of memory transfers, see memory_xfer_partial.  */
> +
> +static ULONGEST memory_xfer_limit = 4096;
> +
>  /* Non-zero if we want to see trace of target level stuff.  */
>  
>  static unsigned int targetdebug = 0;
> @@ -1235,6 +1239,16 @@ memory_xfer_partial_1 (struct target_ops *ops, enum target_object object,
>    return res;
>  }
>  
> +/* Set the cap on actual memory transfer requests.  This prevents
> +   repeated requests to transfer much more than the transport
> +   mechanism can accommodate.  See memory_xfer_partial.  */
> +
> +void
> +target_set_memory_xfer_limit (ULONGEST new_limit)
> +{
> +  memory_xfer_limit = new_limit;
> +}
> +
>  /* Perform a partial memory transfer.  For docs see target.h,
>     to_xfer_partial.  */
>  
> @@ -1269,8 +1283,9 @@ memory_xfer_partial (struct target_ops *ops, enum target_object object,
>  	 by memory_xfer_partial_1.  We will continually malloc
>  	 and free a copy of the entire write request for breakpoint
>  	 shadow handling even though we only end up writing a small
> -	 subset of it.  Cap writes to 4KB to mitigate this.  */
> -      len = min (4096, len);
> +	 subset of it.  Cap writes to the value of memory_xfer_limit
> +	 to mitigate this.  */
> +      len = min (memory_xfer_limit, len);
>  
>        buf = (gdb_byte *) xmalloc (len);
>        old_chain = make_cleanup (xfree, buf);
> diff --git a/gdb/target.h b/gdb/target.h
> index 6b5b6e0..b511746 100644
> --- a/gdb/target.h
> +++ b/gdb/target.h
> @@ -266,6 +266,12 @@ enum target_xfer_status
>  			   const gdb_byte *writebuf, ULONGEST memaddr,
>  			   LONGEST len, ULONGEST *xfered_len);
>  
> +/* Set the cap on actual memory transfer requests.  This prevents
> +   repeated requests to transfer much more than the transport
> +   mechanism can accommodate.  See memory_xfer_partial.  */
> +
> +extern void target_set_memory_xfer_limit (ULONGEST new_limit);
> +
>  /* Request that OPS transfer up to LEN addressable units of the target's
>     OBJECT.  When reading from a memory object, the size of an addressable unit
>     is architecture dependent and can be found using
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Optimize memory_xfer_partial for remote
  2016-06-03 19:02 [PATCH] Optimize memory_xfer_partial for remote Don Breazeal
  2016-06-20 15:31 ` [PING] " Don Breazeal
@ 2016-06-20 19:25 ` Pedro Alves
  2016-06-21 10:15   ` Yao Qi
  2016-06-24 21:21   ` [PATCH v2] " Don Breazeal
  1 sibling, 2 replies; 13+ messages in thread
From: Pedro Alves @ 2016-06-20 19:25 UTC (permalink / raw)
  To: Don Breazeal, gdb-patches

On 06/03/2016 08:02 PM, Don Breazeal wrote:

> I considered making target_set_memory_xfer_limit a function in the target
> vector, but concluded that was overkill.  In this patch it is an external
> function in target.c.

Sorry, but that doesn't make sense.  If in the same session you
switch to another target (e.g., core or native debugging), you'll 
continue using the limit set up by the previous remote connection.

There's also an effort to teach gdb about connecting to 
multiple remote targets at the same time.  A global like this
would need to be adjusted to per-connection anyway.

So seems to be we should have a real target_get_memory_xfer_limit()
(or some such) target method.

>  
> +  /* Set the cap on memory transfer requests to our packet size.  */
> +  target_set_memory_xfer_limit (get_remote_packet_size ());

Shouldn't this be based on get_memory_write_packet_size() instead?

Thanks,
Pedro Alves

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Optimize memory_xfer_partial for remote
  2016-06-20 19:25 ` Pedro Alves
@ 2016-06-21 10:15   ` Yao Qi
  2016-06-24 21:21   ` [PATCH v2] " Don Breazeal
  1 sibling, 0 replies; 13+ messages in thread
From: Yao Qi @ 2016-06-21 10:15 UTC (permalink / raw)
  To: Pedro Alves; +Cc: Don Breazeal, gdb-patches

Pedro Alves <palves@redhat.com> writes:

> So seems to be we should have a real target_get_memory_xfer_limit()
> (or some such) target method.

Alternatively, we can move breakpoint_xfer_memory down to each target
(remote, native, etc), on which we know what limit it is.

-- 
Yao (齐尧)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2] Optimize memory_xfer_partial for remote
  2016-06-20 19:25 ` Pedro Alves
  2016-06-21 10:15   ` Yao Qi
@ 2016-06-24 21:21   ` Don Breazeal
  2016-06-24 22:23     ` Pedro Alves
  1 sibling, 1 reply; 13+ messages in thread
From: Don Breazeal @ 2016-06-24 21:21 UTC (permalink / raw)
  To: gdb-patches, qiyaoltc, palves

Pedro, Yao,
Here is version two of the patch to optimize the limit on the size of
memory transfers, specifically memory writes.  Sorry it took a little
while to turn this around.  The patch has been rewritten to use a
target function as Pedro suggested.  Responses to both Pedro and Yao's
comments follow, then the revised patch.

Note that performance numbers in the patch description were generated on
hardware different than the hardware used for the previous patch, so
one can't compare them with the previous patch's performance numbers.

Thanks,
--Don

On 6/20/2016 12:25 PM, Pedro Alves wrote:
> On 06/03/2016 08:02 PM, Don Breazeal wrote:
> 
>> I considered making target_set_memory_xfer_limit a function in the target
>> vector, but concluded that was overkill.  In this patch it is an external
>> function in target.c.
> 
> Sorry, but that doesn't make sense.  If in the same session you
> switch to another target (e.g., core or native debugging), you'll 
> continue using the limit set up by the previous remote connection.
> 
> There's also an effort to teach gdb about connecting to 
> multiple remote targets at the same time.  A global like this
> would need to be adjusted to per-connection anyway.
> 
> So seems to be we should have a real target_get_memory_xfer_limit()
> (or some such) target method.

Ah, I see.  The patch has been revised to do this.

On 6/21/2016 3:15 AM, Yao Qi wrote:
> Alternatively, we can move breakpoint_xfer_memory down to each target
> (remote, native, etc), on which we know what limit it is.

I went with the get_memory_xfer_limit approach, lacking any technical
reason to choose one approach over the other.

>> +  /* Set the cap on memory transfer requests to our packet size.  */
>> +  target_set_memory_xfer_limit (get_remote_packet_size ());
> 
> Shouldn't this be based on get_memory_write_packet_size() instead?

The patch has been revised to use this as well.

Thanks,
--Don
----------
Some analysis we did here showed that increasing the cap on the
transfer size in target.c:memory_xfer_partial could give 20% or more
improvement in remote load across JTAG.  Transfer sizes are capped
to 4K bytes because of performance problems encountered with the
restore command, documented here:

https://sourceware.org/ml/gdb-patches/2013-07/msg00611.html

and with commit hash: 67c059c29e1fb0cdeacdd2005f955514d8d1fb34

The 4K cap was introduced because in a case where the restore command
requested a 100MB transfer, memory_xfer_partial would allocate and copy
an entire 100MB buffer in order to properly handle breakpoint shadow
instructions, even though memory_xfer_partial would actually only
write a small portion of the buffer contents.

A couple of alternative solutions were suggested:
* change the algorithm for handling the breakpoint shadow instructions
* throttle the transfer size up or down based on the previous actual
  transfer size

I tried implementing the throttling approach, and my implementation
reduced the performance in some cases.

This patch implements a new target function that returns that target's
limit on memory transfer size.  It defaults to 4K bytes as before, but
for remote it uses remote.c:get_memory_write_packet_size.

The performance differences that I saw were as follows (in seconds),
using an artificially large application and a 100MB srec file:

USB  load:     orig   53.2 patch  18.9
USB  restore:  orig 1522.4 patch 543.6
Enet load:     orig   12.2 patch  10.0
Enet restore:  orig  368.0 patch 294.3

Tested on x86_64 Linux with native and native-gdbserver, and manually
tested 'load' and 'restore' on a Windows 7 host with a bare-metal ARM
board.

gdb/ChangeLog:
2016-06-24  Don Breazeal  <donb@codesourcery.com>

	* remote.c (remote_get_memory_xfer_limit): New function.
	* target-delegates.c (delegate_get_memory_xfer_limit,
	debug_get_memory_xfer_limit, install_delegators,
	install_dummy_methods, init_debug_target): New functions
	and target_ops initialization from regenerating the file.
	* target.c (default_get_memory_xfer_limit): New function and
	forward declaration.
	(memory_xfer_partial): Call target_ops.to_get_memory_xfer_limit.
	* target.h (struct target_ops)<to_get_memory_xfer_limit>: New
	member.

---
 gdb/remote.c           |  7 +++++++
 gdb/target-delegates.c | 25 +++++++++++++++++++++++++
 gdb/target.c           | 18 ++++++++++++++++--
 gdb/target.h           |  6 ++++++
 4 files changed, 54 insertions(+), 2 deletions(-)

diff --git a/gdb/remote.c b/gdb/remote.c
index 501f3c6..03c7ab7 100644
--- a/gdb/remote.c
+++ b/gdb/remote.c
@@ -10160,6 +10160,12 @@ remote_xfer_partial (struct target_ops *ops, enum target_object object,
   return TARGET_XFER_OK;
 }
 
+static ULONGEST
+remote_get_memory_xfer_limit (struct target_ops *ops)
+{
+  return get_memory_write_packet_size ();
+}
+
 static int
 remote_search_memory (struct target_ops* ops,
 		      CORE_ADDR start_addr, ULONGEST search_space_len,
@@ -13073,6 +13079,7 @@ Specify the serial device it is connected to\n\
   remote_ops.to_interrupt = remote_interrupt;
   remote_ops.to_pass_ctrlc = remote_pass_ctrlc;
   remote_ops.to_xfer_partial = remote_xfer_partial;
+  remote_ops.to_get_memory_xfer_limit = remote_get_memory_xfer_limit;
   remote_ops.to_rcmd = remote_rcmd;
   remote_ops.to_pid_to_exec_file = remote_pid_to_exec_file;
   remote_ops.to_log_command = serial_log_command;
diff --git a/gdb/target-delegates.c b/gdb/target-delegates.c
index 03aa2cc..18f22b5 100644
--- a/gdb/target-delegates.c
+++ b/gdb/target-delegates.c
@@ -2064,6 +2064,27 @@ debug_xfer_partial (struct target_ops *self, enum target_object arg1, const char
   return result;
 }
 
+static ULONGEST
+delegate_get_memory_xfer_limit (struct target_ops *self)
+{
+  self = self->beneath;
+  return self->to_get_memory_xfer_limit (self);
+}
+
+static ULONGEST
+debug_get_memory_xfer_limit (struct target_ops *self)
+{
+  ULONGEST result;
+  fprintf_unfiltered (gdb_stdlog, "-> %s->to_get_memory_xfer_limit (...)\n", debug_target.to_shortname);
+  result = debug_target.to_get_memory_xfer_limit (&debug_target);
+  fprintf_unfiltered (gdb_stdlog, "<- %s->to_get_memory_xfer_limit (", debug_target.to_shortname);
+  target_debug_print_struct_target_ops_p (&debug_target);
+  fputs_unfiltered (") = ", gdb_stdlog);
+  target_debug_print_ULONGEST (result);
+  fputs_unfiltered ("\n", gdb_stdlog);
+  return result;
+}
+
 static VEC(mem_region_s) *
 delegate_memory_map (struct target_ops *self)
 {
@@ -4223,6 +4244,8 @@ install_delegators (struct target_ops *ops)
     ops->to_get_thread_local_address = delegate_get_thread_local_address;
   if (ops->to_xfer_partial == NULL)
     ops->to_xfer_partial = delegate_xfer_partial;
+  if (ops->to_get_memory_xfer_limit == NULL)
+    ops->to_get_memory_xfer_limit = delegate_get_memory_xfer_limit;
   if (ops->to_memory_map == NULL)
     ops->to_memory_map = delegate_memory_map;
   if (ops->to_flash_erase == NULL)
@@ -4454,6 +4477,7 @@ install_dummy_methods (struct target_ops *ops)
   ops->to_goto_bookmark = tdefault_goto_bookmark;
   ops->to_get_thread_local_address = tdefault_get_thread_local_address;
   ops->to_xfer_partial = tdefault_xfer_partial;
+  ops->to_get_memory_xfer_limit = default_get_memory_xfer_limit;
   ops->to_memory_map = tdefault_memory_map;
   ops->to_flash_erase = tdefault_flash_erase;
   ops->to_flash_done = tdefault_flash_done;
@@ -4610,6 +4634,7 @@ init_debug_target (struct target_ops *ops)
   ops->to_goto_bookmark = debug_goto_bookmark;
   ops->to_get_thread_local_address = debug_get_thread_local_address;
   ops->to_xfer_partial = debug_xfer_partial;
+  ops->to_get_memory_xfer_limit = debug_get_memory_xfer_limit;
   ops->to_memory_map = debug_memory_map;
   ops->to_flash_erase = debug_flash_erase;
   ops->to_flash_done = debug_flash_done;
diff --git a/gdb/target.c b/gdb/target.c
index 6f69ac3..57202b4 100644
--- a/gdb/target.c
+++ b/gdb/target.c
@@ -60,6 +60,8 @@ static int default_region_ok_for_hw_watchpoint (struct target_ops *,
 
 static void default_rcmd (struct target_ops *, const char *, struct ui_file *);
 
+static ULONGEST default_get_memory_xfer_limit (struct target_ops *self);
+
 static ptid_t default_get_ada_task_ptid (struct target_ops *self,
 					 long lwp, long tid);
 
@@ -623,6 +625,17 @@ default_terminal_info (struct target_ops *self, const char *args, int from_tty)
   printf_unfiltered (_("No saved terminal information.\n"));
 }
 
+/* The default implementation for the to_get_memory_xfer_limit method.
+   The hard-coded limit here was determined to be a reasonable default
+   that eliminated exponential slowdown on very large transfers without
+   unduly compromising performance on smaller transfers.  */
+
+static ULONGEST
+default_get_memory_xfer_limit (struct target_ops *self)
+{
+  return 4096;
+}
+
 /* A default implementation for the to_get_ada_task_ptid target method.
 
    This function builds the PTID by using both LWP and TID as part of
@@ -1301,8 +1314,9 @@ memory_xfer_partial (struct target_ops *ops, enum target_object object,
 	 by memory_xfer_partial_1.  We will continually malloc
 	 and free a copy of the entire write request for breakpoint
 	 shadow handling even though we only end up writing a small
-	 subset of it.  Cap writes to 4KB to mitigate this.  */
-      len = min (4096, len);
+	 subset of it.  Cap writes to a limit specified by the target
+	 to mitigate this.  */
+      len = min (ops->to_get_memory_xfer_limit (ops), len);
 
       buf = (gdb_byte *) xmalloc (len);
       old_chain = make_cleanup (xfree, buf);
diff --git a/gdb/target.h b/gdb/target.h
index 6b5b6e0..84f12a9 100644
--- a/gdb/target.h
+++ b/gdb/target.h
@@ -745,6 +745,12 @@ struct target_ops
 						ULONGEST *xfered_len)
       TARGET_DEFAULT_RETURN (TARGET_XFER_E_IO);
 
+    /* Return the limit on the size of any single memory transfer
+       for the target.  */
+
+    ULONGEST (*to_get_memory_xfer_limit) (struct target_ops *)
+      TARGET_DEFAULT_FUNC (default_get_memory_xfer_limit);
+
     /* Returns the memory map for the target.  A return value of NULL
        means that no memory map is available.  If a memory address
        does not fall within any returned regions, it's assumed to be
-- 
2.8.1

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] Optimize memory_xfer_partial for remote
  2016-06-24 21:21   ` [PATCH v2] " Don Breazeal
@ 2016-06-24 22:23     ` Pedro Alves
  2016-06-27 20:23       ` Don Breazeal
  0 siblings, 1 reply; 13+ messages in thread
From: Pedro Alves @ 2016-06-24 22:23 UTC (permalink / raw)
  To: Don Breazeal, gdb-patches, qiyaoltc

On 06/24/2016 10:21 PM, Don Breazeal wrote:
> 
> and with commit hash: 67c059c29e1fb0cdeacdd2005f955514d8d1fb34
> 

Write:

 ... with commit 67c059c29e1f ("Improve performance of large restore
 commands") ...

so the reader has a clue what the commit is about without having
to check.


> gdb/ChangeLog:
> 2016-06-24  Don Breazeal  <donb@codesourcery.com>
> 
> 	* remote.c (remote_get_memory_xfer_limit): New function.
> 	* target-delegates.c (delegate_get_memory_xfer_limit,
> 	debug_get_memory_xfer_limit, install_delegators,
> 	install_dummy_methods, init_debug_target): New functions
> 	and target_ops initialization from regenerating the file.

The standard practice is to just say:

 	* target-delegates.c: Regenerate.

> 	* target.c (default_get_memory_xfer_limit): New function and
> 	forward declaration.
> 	(memory_xfer_partial): Call target_ops.to_get_memory_xfer_limit.
> 	* target.h (struct target_ops)<to_get_memory_xfer_limit>: New
> 	member.

Space between ")<".

> 
> diff --git a/gdb/remote.c b/gdb/remote.c
> index 501f3c6..03c7ab7 100644
> --- a/gdb/remote.c
> +++ b/gdb/remote.c
> @@ -10160,6 +10160,12 @@ remote_xfer_partial (struct target_ops *ops, enum target_object object,
>    return TARGET_XFER_OK;
>  }
>  
> +static ULONGEST
> +remote_get_memory_xfer_limit (struct target_ops *ops)

Intro comment, something like "Implementation of ... method.".

>  
> +/* The default implementation for the to_get_memory_xfer_limit method.
> +   The hard-coded limit here was determined to be a reasonable default
> +   that eliminated exponential slowdown on very large transfers without
> +   unduly compromising performance on smaller transfers.  */

Where's this coming from?  Is this new experimentation you did,
or are you talking about Anton's patch?

> @@ -1301,8 +1314,9 @@ memory_xfer_partial (struct target_ops *ops, enum target_object object,
>  	 by memory_xfer_partial_1.  We will continually malloc
>  	 and free a copy of the entire write request for breakpoint
>  	 shadow handling even though we only end up writing a small
> -	 subset of it.  Cap writes to 4KB to mitigate this.  */
> -      len = min (4096, len);
> +	 subset of it.  Cap writes to a limit specified by the target
> +	 to mitigate this.  */
> +      len = min (ops->to_get_memory_xfer_limit (ops), len);
>  

Does this still work if remote is not the top-most target?

E.g., what happens if you do "record" to push a record_statum
target on top?  Do we get the 4KB default limit, or the
remote limit?

Thanks,
Pedro Alves

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] Optimize memory_xfer_partial for remote
  2016-06-24 22:23     ` Pedro Alves
@ 2016-06-27 20:23       ` Don Breazeal
  2016-06-30 17:06         ` Pedro Alves
  0 siblings, 1 reply; 13+ messages in thread
From: Don Breazeal @ 2016-06-27 20:23 UTC (permalink / raw)
  To: gdb-patches, palves, qiyaoltc

On 6/24/2016 3:23 PM, Pedro Alves wrote:
> On 06/24/2016 10:21 PM, Don Breazeal wrote:
>>
>> and with commit hash: 67c059c29e1fb0cdeacdd2005f955514d8d1fb34
>>
> 
> Write:
> 
>  ... with commit 67c059c29e1f ("Improve performance of large restore
>  commands") ...
> 
> so the reader has a clue what the commit is about without having
> to check.

Hi Pedro,
Thanks for your comments; sorry about the coding convention and clarity
issues. The one above and the others are fixed in the attached patch below.

> 
> 
>> gdb/ChangeLog:
>> 2016-06-24  Don Breazeal  <donb@codesourcery.com>
>>
>> 	* remote.c (remote_get_memory_xfer_limit): New function.
>> 	* target-delegates.c (delegate_get_memory_xfer_limit,
>> 	debug_get_memory_xfer_limit, install_delegators,
>> 	install_dummy_methods, init_debug_target): New functions
>> 	and target_ops initialization from regenerating the file.
> 
> The standard practice is to just say:
> 
>  	* target-delegates.c: Regenerate.

Fixed in the attached patch.

> 
>> 	* target.c (default_get_memory_xfer_limit): New function and
>> 	forward declaration.
>> 	(memory_xfer_partial): Call target_ops.to_get_memory_xfer_limit.
>> 	* target.h (struct target_ops)<to_get_memory_xfer_limit>: New
>> 	member.
> 
> Space between ")<".

Fixed in the attached patch.

> 
>>
>> diff --git a/gdb/remote.c b/gdb/remote.c
>> index 501f3c6..03c7ab7 100644
>> --- a/gdb/remote.c
>> +++ b/gdb/remote.c
>> @@ -10160,6 +10160,12 @@ remote_xfer_partial (struct target_ops *ops, enum target_object object,
>>    return TARGET_XFER_OK;
>>  }
>>  
>> +static ULONGEST
>> +remote_get_memory_xfer_limit (struct target_ops *ops)
> 
> Intro comment, something like "Implementation of ... method.".

Fixed in the attached patch.

> 
>>  
>> +/* The default implementation for the to_get_memory_xfer_limit method.
>> +   The hard-coded limit here was determined to be a reasonable default
>> +   that eliminated exponential slowdown on very large transfers without
>> +   unduly compromising performance on smaller transfers.  */
> 
> Where's this coming from?  Is this new experimentation you did,
> or are you talking about Anton's patch?

Both.  I did some experimentation to verify that things were significantly
slower without any memory transfer limit, which they were, although I never
reproduced the extreme scenario Anton had reported.  Presumably the
performance differences were due to hardware and environment differences.
Regarding the comment, I thought some explanation of the hard-coded number
was appropriate.  Is there a better or more preferable way to do this, e.g.
refer to the commit hash, or does it just seem superfluous?

> 
>> @@ -1301,8 +1314,9 @@ memory_xfer_partial (struct target_ops *ops, enum target_object object,
>>  	 by memory_xfer_partial_1.  We will continually malloc
>>  	 and free a copy of the entire write request for breakpoint
>>  	 shadow handling even though we only end up writing a small
>> -	 subset of it.  Cap writes to 4KB to mitigate this.  */
>> -      len = min (4096, len);
>> +	 subset of it.  Cap writes to a limit specified by the target
>> +	 to mitigate this.  */
>> +      len = min (ops->to_get_memory_xfer_limit (ops), len);
>>  
> 
> Does this still work if remote is not the top-most target?

Yes, see next comment.

> 
> E.g., what happens if you do "record" to push a record_statum
> target on top?  Do we get the 4KB default limit, or the
> remote limit?

I ran this experiment and verified that the target delegation mechanism
worked as expected in this case. It walks down the target stack, skipping
targets that don't implement to_get_memory_xfer_limit, and returning after
calling the function in the first target that does implement it. I also
verified that it walks all the way down the target stack and calls the
default function if no target has defined its own version of
to_get_memory_xfer_limit.

Output of the experiment with record and remote is below, followed by the
revised patch.

Thanks
--Don

[build5-trusty-cs(5.68) 694] bin$ gdb ./gdb
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
... GDB banner stuff...
Reading symbols from ./gdb...done.
(gdb) set prompt (top)
(top) b target.c:1319
Breakpoint 1 at 0x65985b: file ../../binutils-gdb/gdb/target.c, line 1319.
(top) run ~/test/big
Starting program: /scratch/dbreazea/gdb-5301/install/bin/gdb ~/test/big
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
GNU gdb (GDB) 7.11.50.20160624-git
...GDB banner stuff...
Reading symbols from /home/dbreazea/test/big...done.
(gdb) tar rem localhost:51111
Remote debugging using localhost:51111
...Reading symbols msgs...
0x00007ffff7ddb2d0 in ?? () from target:/lib64/ld-linux-x86-64.so.2
(gdb) b main
Breakpoint 1 at 0x40051e: file /home/dbreazea/test/big.c, line 9.
(gdb) c
Continuing.
Reading /lib/x86_64-linux-gnu/libc.so.6 from remote target...
Reading /lib/x86_64-linux-gnu/libc-2.19.so from remote target...
Reading /lib/x86_64-linux-gnu/.debug/libc-2.19.so from remote target...

Breakpoint 1, main () at /home/dbreazea/test/big.c:9
9 printf ("starting...\n");
(gdb) record full <---<<< push the target on top of the remote target
(gdb) b 12
Breakpoint 2 at 0x40054d: file /home/dbreazea/test/big.c, line 12.
(gdb) c
Continuing.

Breakpoint 2, main () at /home/dbreazea/test/big.c:12
12 printf ("done, a[%d] is %d\n", 42, a[42]);
(gdb) restore ../../build/gdb/a.srec <---<<< 'restore' for big writes
Restoring section .sec1 (0x6009a0 to 0x6089a0)

Breakpoint 1, memory_xfer_partial (ops=0xe07e20 <record_full_ops>,
object=TARGET_OBJECT_MEMORY, readbuf=0x0, writebuf=0x212c480 "",
memaddr=6293920, len=32768, xfered_len=0x7fffffffdea8)
at ../../binutils-gdb/gdb/target.c:1319
1319 len = min (ops->to_get_memory_xfer_limit (ops), len);
(top) p ops->to_longname
$1 = 0x9ddc00 "Process record and replay target"
(top) s
delegate_get_memory_xfer_limit (self=0xe07e20 <record_full_ops>)
at ../../binutils-gdb/gdb/target-delegates.c:2070
2070 self = self->beneath;
(top) display self->to_longname
1: self->to_longname = 0x9ddc00 "Process record and replay target"
(top) s
2071 return self->to_get_memory_xfer_limit (self);
1: self->to_longname = 0x8fcd00 "Remote serial target in gdb-specific protocol"
(top) s
remote_get_memory_xfer_limit (ops=0xdf0360 <remote_ops>)
at ../../binutils-gdb/gdb/remote.c:10166
10166 return get_memory_write_packet_size ();
(top) n
10167 }
(top) n
delegate_get_memory_xfer_limit (self=0xdf0360 <remote_ops>)
at ../../binutils-gdb/gdb/target-delegates.c:2072
2072 }
1: self->to_longname = 0x8fcd00 "Remote serial target in gdb-specific protocol"
(top) n
memory_xfer_partial (ops=0xe07e20 <record_full_ops>,
object=TARGET_OBJECT_MEMORY, readbuf=0x0, writebuf=0x212c480 "",
memaddr=6293920, len=16383, xfered_len=0x7fffffffdea8)
at ../../binutils-gdb/gdb/target.c:1321
1321 buf = (gdb_byte *) xmalloc (len);
(top) p len
$2 = 16383 <---<<< 16K (approx) len from remote target function

---------- revised patch -----------
Some analysis we did here showed that increasing the cap on the
transfer size in target.c:memory_xfer_partial could give 20% or more
improvement in remote load across JTAG.  Transfer sizes are capped
to 4K bytes because of performance problems encountered with the
restore command, documented here:

https://sourceware.org/ml/gdb-patches/2013-07/msg00611.html

and in commit 67c059c29e1f ("Improve performance of large restore
commands").

The 4K cap was introduced because in a case where the restore command
requested a 100MB transfer, memory_xfer_partial would repeatedy
allocate and copy an entire 100MB buffer in order to properly handle
breakpoint shadow instructions, even though memory_xfer_partial would
actually only write a small portion of the buffer contents.

A couple of alternative solutions were suggested:
* change the algorithm for handling the breakpoint shadow instructions
* throttle the transfer size up or down based on the previous actual
  transfer size

I tried implementing the throttling approach, and my implementation
reduced the performance in some cases.

This patch implements a new target function that returns that target's
limit on memory transfer size.  It defaults to 4K bytes as before, but
for remote it uses remote.c:get_memory_write_packet_size.

The performance differences that I saw were as follows (in seconds),
using an artificially large application and a 100MB srec file:

USB  load:     orig   53.2 patch  18.9
USB  restore:  orig 1522.4 patch 543.6
Enet load:     orig   12.2 patch  10.0
Enet restore:  orig  368.0 patch 294.3

Tested on x86_64 Linux with native and native-gdbserver, and manually
tested 'load' and 'restore' on a Windows 7 host with a bare-metal ARM
board.

gdb/ChangeLog:
2016-06-27  Don Breazeal  <donb@codesourcery.com>

	* remote.c (remote_get_memory_xfer_limit): New function.
	* target-delegates.c: Regenerate.
	* target.c (default_get_memory_xfer_limit): New function and
	forward declaration.
	(memory_xfer_partial): Call target_ops.to_get_memory_xfer_limit.
	* target.h (struct target_ops) <to_get_memory_xfer_limit>: New
	member.

---
 gdb/remote.c           |  9 +++++++++
 gdb/target-delegates.c | 25 +++++++++++++++++++++++++
 gdb/target.c           | 18 ++++++++++++++++--
 gdb/target.h           |  6 ++++++
 4 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/gdb/remote.c b/gdb/remote.c
index 501f3c6..dfa41ef 100644
--- a/gdb/remote.c
+++ b/gdb/remote.c
@@ -10160,6 +10160,14 @@ remote_xfer_partial (struct target_ops *ops, enum target_object object,
   return TARGET_XFER_OK;
 }
 
+/* Implementation of to_get_memory_xfer_limit.  */
+
+static ULONGEST
+remote_get_memory_xfer_limit (struct target_ops *ops)
+{
+  return get_memory_write_packet_size ();
+}
+
 static int
 remote_search_memory (struct target_ops* ops,
 		      CORE_ADDR start_addr, ULONGEST search_space_len,
@@ -13073,6 +13081,7 @@ Specify the serial device it is connected to\n\
   remote_ops.to_interrupt = remote_interrupt;
   remote_ops.to_pass_ctrlc = remote_pass_ctrlc;
   remote_ops.to_xfer_partial = remote_xfer_partial;
+  remote_ops.to_get_memory_xfer_limit = remote_get_memory_xfer_limit;
   remote_ops.to_rcmd = remote_rcmd;
   remote_ops.to_pid_to_exec_file = remote_pid_to_exec_file;
   remote_ops.to_log_command = serial_log_command;
diff --git a/gdb/target-delegates.c b/gdb/target-delegates.c
index 03aa2cc..18f22b5 100644
--- a/gdb/target-delegates.c
+++ b/gdb/target-delegates.c
@@ -2064,6 +2064,27 @@ debug_xfer_partial (struct target_ops *self, enum target_object arg1, const char
   return result;
 }
 
+static ULONGEST
+delegate_get_memory_xfer_limit (struct target_ops *self)
+{
+  self = self->beneath;
+  return self->to_get_memory_xfer_limit (self);
+}
+
+static ULONGEST
+debug_get_memory_xfer_limit (struct target_ops *self)
+{
+  ULONGEST result;
+  fprintf_unfiltered (gdb_stdlog, "-> %s->to_get_memory_xfer_limit (...)\n", debug_target.to_shortname);
+  result = debug_target.to_get_memory_xfer_limit (&debug_target);
+  fprintf_unfiltered (gdb_stdlog, "<- %s->to_get_memory_xfer_limit (", debug_target.to_shortname);
+  target_debug_print_struct_target_ops_p (&debug_target);
+  fputs_unfiltered (") = ", gdb_stdlog);
+  target_debug_print_ULONGEST (result);
+  fputs_unfiltered ("\n", gdb_stdlog);
+  return result;
+}
+
 static VEC(mem_region_s) *
 delegate_memory_map (struct target_ops *self)
 {
@@ -4223,6 +4244,8 @@ install_delegators (struct target_ops *ops)
     ops->to_get_thread_local_address = delegate_get_thread_local_address;
   if (ops->to_xfer_partial == NULL)
     ops->to_xfer_partial = delegate_xfer_partial;
+  if (ops->to_get_memory_xfer_limit == NULL)
+    ops->to_get_memory_xfer_limit = delegate_get_memory_xfer_limit;
   if (ops->to_memory_map == NULL)
     ops->to_memory_map = delegate_memory_map;
   if (ops->to_flash_erase == NULL)
@@ -4454,6 +4477,7 @@ install_dummy_methods (struct target_ops *ops)
   ops->to_goto_bookmark = tdefault_goto_bookmark;
   ops->to_get_thread_local_address = tdefault_get_thread_local_address;
   ops->to_xfer_partial = tdefault_xfer_partial;
+  ops->to_get_memory_xfer_limit = default_get_memory_xfer_limit;
   ops->to_memory_map = tdefault_memory_map;
   ops->to_flash_erase = tdefault_flash_erase;
   ops->to_flash_done = tdefault_flash_done;
@@ -4610,6 +4634,7 @@ init_debug_target (struct target_ops *ops)
   ops->to_goto_bookmark = debug_goto_bookmark;
   ops->to_get_thread_local_address = debug_get_thread_local_address;
   ops->to_xfer_partial = debug_xfer_partial;
+  ops->to_get_memory_xfer_limit = debug_get_memory_xfer_limit;
   ops->to_memory_map = debug_memory_map;
   ops->to_flash_erase = debug_flash_erase;
   ops->to_flash_done = debug_flash_done;
diff --git a/gdb/target.c b/gdb/target.c
index 6f69ac3..57202b4 100644
--- a/gdb/target.c
+++ b/gdb/target.c
@@ -60,6 +60,8 @@ static int default_region_ok_for_hw_watchpoint (struct target_ops *,
 
 static void default_rcmd (struct target_ops *, const char *, struct ui_file *);
 
+static ULONGEST default_get_memory_xfer_limit (struct target_ops *self);
+
 static ptid_t default_get_ada_task_ptid (struct target_ops *self,
 					 long lwp, long tid);
 
@@ -623,6 +625,17 @@ default_terminal_info (struct target_ops *self, const char *args, int from_tty)
   printf_unfiltered (_("No saved terminal information.\n"));
 }
 
+/* The default implementation for the to_get_memory_xfer_limit method.
+   The hard-coded limit here was determined to be a reasonable default
+   that eliminated exponential slowdown on very large transfers without
+   unduly compromising performance on smaller transfers.  */
+
+static ULONGEST
+default_get_memory_xfer_limit (struct target_ops *self)
+{
+  return 4096;
+}
+
 /* A default implementation for the to_get_ada_task_ptid target method.
 
    This function builds the PTID by using both LWP and TID as part of
@@ -1301,8 +1314,9 @@ memory_xfer_partial (struct target_ops *ops, enum target_object object,
 	 by memory_xfer_partial_1.  We will continually malloc
 	 and free a copy of the entire write request for breakpoint
 	 shadow handling even though we only end up writing a small
-	 subset of it.  Cap writes to 4KB to mitigate this.  */
-      len = min (4096, len);
+	 subset of it.  Cap writes to a limit specified by the target
+	 to mitigate this.  */
+      len = min (ops->to_get_memory_xfer_limit (ops), len);
 
       buf = (gdb_byte *) xmalloc (len);
       old_chain = make_cleanup (xfree, buf);
diff --git a/gdb/target.h b/gdb/target.h
index 6b5b6e0..84f12a9 100644
--- a/gdb/target.h
+++ b/gdb/target.h
@@ -745,6 +745,12 @@ struct target_ops
 						ULONGEST *xfered_len)
       TARGET_DEFAULT_RETURN (TARGET_XFER_E_IO);
 
+    /* Return the limit on the size of any single memory transfer
+       for the target.  */
+
+    ULONGEST (*to_get_memory_xfer_limit) (struct target_ops *)
+      TARGET_DEFAULT_FUNC (default_get_memory_xfer_limit);
+
     /* Returns the memory map for the target.  A return value of NULL
        means that no memory map is available.  If a memory address
        does not fall within any returned regions, it's assumed to be
-- 
2.8.1

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] Optimize memory_xfer_partial for remote
  2016-06-27 20:23       ` Don Breazeal
@ 2016-06-30 17:06         ` Pedro Alves
  2016-06-30 17:45           ` Don Breazeal
  0 siblings, 1 reply; 13+ messages in thread
From: Pedro Alves @ 2016-06-30 17:06 UTC (permalink / raw)
  To: Don Breazeal, gdb-patches, qiyaoltc

On 06/27/2016 09:22 PM, Don Breazeal wrote:

>>> +/* The default implementation for the to_get_memory_xfer_limit method.
>>> +   The hard-coded limit here was determined to be a reasonable default
>>> +   that eliminated exponential slowdown on very large transfers without
>>> +   unduly compromising performance on smaller transfers.  */
>>
>> Where's this coming from?  Is this new experimentation you did,
>> or are you talking about Anton's patch?
> 
> Both.  I did some experimentation to verify that things were significantly
> slower without any memory transfer limit, which they were, although I never
> reproduced the extreme scenario Anton had reported.  Presumably the
> performance differences were due to hardware and environment differences.
> Regarding the comment, I thought some explanation of the hard-coded number
> was appropriate.  Is there a better or more preferable way to do this, e.g.
> refer to the commit hash, or does it just seem superfluous?

OK, you didn't mention this experimentation, which left me wondering.
Particularly, the mention of "exponential" is what most made me pause,
as it's a qualifier not mentioned elsewhere.

I guess my main problem with the comment is that by reading it in
isolation, one has no clue of how what would cause the slowdown (normally
transferring more at a time is faster!), and thus how to reevaluate
the default in the future.  How about extending to something like:

/* The default implementation for the to_get_memory_xfer_limit method.
   The hard-coded limit here was determined to be a reasonable default
   that eliminated exponential slowdown on very large transfers without
   unduly compromising performance on smaller transfers.  
   This slowdown is mostly caused by memory writing routines doing
   unnecessary work upfront when large requests end up usually
   only partially satisfied.  See memory_xfer_partial's handling of
   breakpoint shadows.  */

Actually, I was going to approve this with that change, but another
another thought crossed my mind, sorry...

I assume you did this experimentation with remote targets?  But this default
will never be used with those, so that experimentation is meaningless for
native targets?  Actually, the whole capping is probably pointless with
native targets, since there's really no marshalling and thus no limit.
That'd suggest making the target method return "-1" or some such
to indicate there's no limit.  WDTY?

Thanks,
Pedro Alves

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] Optimize memory_xfer_partial for remote
  2016-06-30 17:06         ` Pedro Alves
@ 2016-06-30 17:45           ` Don Breazeal
  2016-06-30 18:40             ` Pedro Alves
  0 siblings, 1 reply; 13+ messages in thread
From: Don Breazeal @ 2016-06-30 17:45 UTC (permalink / raw)
  To: Pedro Alves, gdb-patches, qiyaoltc

On 6/30/2016 10:06 AM, Pedro Alves wrote:
> On 06/27/2016 09:22 PM, Don Breazeal wrote:
> 
>>>> +/* The default implementation for the to_get_memory_xfer_limit method.
>>>> +   The hard-coded limit here was determined to be a reasonable default
>>>> +   that eliminated exponential slowdown on very large transfers without
>>>> +   unduly compromising performance on smaller transfers.  */
>>>
>>> Where's this coming from?  Is this new experimentation you did,
>>> or are you talking about Anton's patch?
>>
>> Both.  I did some experimentation to verify that things were significantly
>> slower without any memory transfer limit, which they were, although I never
>> reproduced the extreme scenario Anton had reported.  Presumably the
>> performance differences were due to hardware and environment differences.
>> Regarding the comment, I thought some explanation of the hard-coded number
>> was appropriate.  Is there a better or more preferable way to do this, e.g.
>> refer to the commit hash, or does it just seem superfluous?
> 
> OK, you didn't mention this experimentation, which left me wondering.
> Particularly, the mention of "exponential" is what most made me pause,
> as it's a qualifier not mentioned elsewhere.

I should have used something like "significant" or "extreme" instead of
exponential.

> 
> I guess my main problem with the comment is that by reading it in
> isolation, one has no clue of how what would cause the slowdown (normally
> transferring more at a time is faster!), and thus how to reevaluate
> the default in the future.  How about extending to something like:
> 
> /* The default implementation for the to_get_memory_xfer_limit method.
>    The hard-coded limit here was determined to be a reasonable default
>    that eliminated exponential slowdown on very large transfers without
>    unduly compromising performance on smaller transfers.  
>    This slowdown is mostly caused by memory writing routines doing
>    unnecessary work upfront when large requests end up usually
>    only partially satisfied.  See memory_xfer_partial's handling of
>    breakpoint shadows.  */
> 
> Actually, I was going to approve this with that change, but another
> another thought crossed my mind, sorry...
> 
> I assume you did this experimentation with remote targets?  But this default
> will never be used with those, so that experimentation is meaningless for
> native targets?  Actually, the whole capping is probably pointless with
> native targets, since there's really no marshalling and thus no limit.
> That'd suggest making the target method return "-1" or some such
> to indicate there's no limit.  WDTY?

That makes sense to me.  If it returns ULONGEST_MAX then the rest of the
patch can stay as-is.  Something like this?

+/* The default implementation for the to_get_memory_xfer_limit method.
+   The default limit is essentially "no limit".  */
+
+static ULONGEST
+default_get_memory_xfer_limit (struct target_ops *self)
+{
+  return ULONGEST_MAX;
+}
+

Thanks
--Don

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] Optimize memory_xfer_partial for remote
  2016-06-30 17:45           ` Don Breazeal
@ 2016-06-30 18:40             ` Pedro Alves
  2016-06-30 22:40               ` [PATCH v3] " Don Breazeal
  0 siblings, 1 reply; 13+ messages in thread
From: Pedro Alves @ 2016-06-30 18:40 UTC (permalink / raw)
  To: Don Breazeal, gdb-patches, qiyaoltc

On 06/30/2016 06:45 PM, Don Breazeal wrote:

> That makes sense to me.  If it returns ULONGEST_MAX then the rest of the
> patch can stay as-is.  Something like this?
> 
> +/* The default implementation for the to_get_memory_xfer_limit method.
> +   The default limit is essentially "no limit".  */
> +
> +static ULONGEST
> +default_get_memory_xfer_limit (struct target_ops *self)
> +{
> +  return ULONGEST_MAX;
> +}

Agreed.  Though if you use TARGET_DEFAULT_RETURN, then you don't
even need that function:

    /* Return the limit on the size of any single memory transfer
       for the target.  The default limit is essentially "no limit".  */

    ULONGEST (*to_get_memory_xfer_limit) (struct target_ops *)
      TARGET_DEFAULT_RETURN (ULONGEST_MAX);

Thanks,
Pedro Alves

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v3] Optimize memory_xfer_partial for remote
  2016-06-30 18:40             ` Pedro Alves
@ 2016-06-30 22:40               ` Don Breazeal
  2016-06-30 23:44                 ` Pedro Alves
  0 siblings, 1 reply; 13+ messages in thread
From: Don Breazeal @ 2016-06-30 22:40 UTC (permalink / raw)
  To: gdb-patches, palves

Hi Pedro,
Here is v3 of the patch.  Differences from v2 include returning
ULONGEST_MAX instead of 4096 from the default target function
and using TARGET_DEFAULT_RETURN instead of TARGET_DEFAULT_FUNCTION
in the definition of to_get_memory_xfer_limit.

I re-ran the testsuite with native and native-gdbserver and saw no
regressions.  I also verified that the behavior of the memory transfers
was as expected for the remote and native targets.

Does this one look good-to-go?
thanks
--Don

---------- revised patch -----------
Some analysis we did here showed that increasing the cap on the
transfer size in target.c:memory_xfer_partial could give 20% or more
improvement in remote load across JTAG.  Transfer sizes were capped
to 4K bytes because of performance problems encountered with the
restore command, documented here:

https://sourceware.org/ml/gdb-patches/2013-07/msg00611.html

and in commit 67c059c29e1f ("Improve performance of large restore
commands").

The 4K cap was introduced because in a case where the restore command
requested a 100MB transfer, memory_xfer_partial would repeatedy
allocate and copy an entire 100MB buffer in order to properly handle
breakpoint shadow instructions, even though memory_xfer_partial would
actually only write a small portion of the buffer contents.

A couple of alternative solutions were suggested:
* change the algorithm for handling the breakpoint shadow instructions
* throttle the transfer size up or down based on the previous actual
  transfer size

I tried implementing the throttling approach, and my implementation
reduced the performance in some cases.

This patch implements a new target function that returns that target's
limit on memory transfer size.  It defaults to ULONGEST_MAX bytes,
because for native targets there is no marshaling and thus no limit is
needed.

The performance differences that I saw were as follows (in seconds),
using a remote target with an artificially large application and a
100MB srec file:

USB  load:     orig   53.2 patch  18.9
USB  restore:  orig 1522.4 patch 543.6
Enet load:     orig   12.2 patch  10.0
Enet restore:  orig  368.0 patch 294.3

Tested on x86_64 Linux with native and native-gdbserver, and manually
tested 'load' and 'restore' on a Windows 7 host with a bare-metal ARM
board.

gdb/ChangeLog:
2016-06-30  Don Breazeal  <donb@codesourcery.com>

	* remote.c (remote_get_memory_xfer_limit): New function.
	* target-delegates.c: Regenerate.
	* target.c (memory_xfer_partial): Call
	target_ops.to_get_memory_xfer_limit.
	* target.h (struct target_ops) <to_get_memory_xfer_limit>: New
	member.

---
 gdb/remote.c           |  9 +++++++++
 gdb/target-delegates.c | 31 +++++++++++++++++++++++++++++++
 gdb/target.c           |  5 +++--
 gdb/target.h           |  6 ++++++
 4 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/gdb/remote.c b/gdb/remote.c
index 501f3c6..dfa41ef 100644
--- a/gdb/remote.c
+++ b/gdb/remote.c
@@ -10160,6 +10160,14 @@ remote_xfer_partial (struct target_ops *ops, enum target_object object,
   return TARGET_XFER_OK;
 }
 
+/* Implementation of to_get_memory_xfer_limit.  */
+
+static ULONGEST
+remote_get_memory_xfer_limit (struct target_ops *ops)
+{
+  return get_memory_write_packet_size ();
+}
+
 static int
 remote_search_memory (struct target_ops* ops,
 		      CORE_ADDR start_addr, ULONGEST search_space_len,
@@ -13073,6 +13081,7 @@ Specify the serial device it is connected to\n\
   remote_ops.to_interrupt = remote_interrupt;
   remote_ops.to_pass_ctrlc = remote_pass_ctrlc;
   remote_ops.to_xfer_partial = remote_xfer_partial;
+  remote_ops.to_get_memory_xfer_limit = remote_get_memory_xfer_limit;
   remote_ops.to_rcmd = remote_rcmd;
   remote_ops.to_pid_to_exec_file = remote_pid_to_exec_file;
   remote_ops.to_log_command = serial_log_command;
diff --git a/gdb/target-delegates.c b/gdb/target-delegates.c
index 03aa2cc..2887033 100644
--- a/gdb/target-delegates.c
+++ b/gdb/target-delegates.c
@@ -2064,6 +2064,33 @@ debug_xfer_partial (struct target_ops *self, enum target_object arg1, const char
   return result;
 }
 
+static ULONGEST
+delegate_get_memory_xfer_limit (struct target_ops *self)
+{
+  self = self->beneath;
+  return self->to_get_memory_xfer_limit (self);
+}
+
+static ULONGEST
+tdefault_get_memory_xfer_limit (struct target_ops *self)
+{
+  return ULONGEST_MAX;
+}
+
+static ULONGEST
+debug_get_memory_xfer_limit (struct target_ops *self)
+{
+  ULONGEST result;
+  fprintf_unfiltered (gdb_stdlog, "-> %s->to_get_memory_xfer_limit (...)\n", debug_target.to_shortname);
+  result = debug_target.to_get_memory_xfer_limit (&debug_target);
+  fprintf_unfiltered (gdb_stdlog, "<- %s->to_get_memory_xfer_limit (", debug_target.to_shortname);
+  target_debug_print_struct_target_ops_p (&debug_target);
+  fputs_unfiltered (") = ", gdb_stdlog);
+  target_debug_print_ULONGEST (result);
+  fputs_unfiltered ("\n", gdb_stdlog);
+  return result;
+}
+
 static VEC(mem_region_s) *
 delegate_memory_map (struct target_ops *self)
 {
@@ -4223,6 +4250,8 @@ install_delegators (struct target_ops *ops)
     ops->to_get_thread_local_address = delegate_get_thread_local_address;
   if (ops->to_xfer_partial == NULL)
     ops->to_xfer_partial = delegate_xfer_partial;
+  if (ops->to_get_memory_xfer_limit == NULL)
+    ops->to_get_memory_xfer_limit = delegate_get_memory_xfer_limit;
   if (ops->to_memory_map == NULL)
     ops->to_memory_map = delegate_memory_map;
   if (ops->to_flash_erase == NULL)
@@ -4454,6 +4483,7 @@ install_dummy_methods (struct target_ops *ops)
   ops->to_goto_bookmark = tdefault_goto_bookmark;
   ops->to_get_thread_local_address = tdefault_get_thread_local_address;
   ops->to_xfer_partial = tdefault_xfer_partial;
+  ops->to_get_memory_xfer_limit = tdefault_get_memory_xfer_limit;
   ops->to_memory_map = tdefault_memory_map;
   ops->to_flash_erase = tdefault_flash_erase;
   ops->to_flash_done = tdefault_flash_done;
@@ -4610,6 +4640,7 @@ init_debug_target (struct target_ops *ops)
   ops->to_goto_bookmark = debug_goto_bookmark;
   ops->to_get_thread_local_address = debug_get_thread_local_address;
   ops->to_xfer_partial = debug_xfer_partial;
+  ops->to_get_memory_xfer_limit = debug_get_memory_xfer_limit;
   ops->to_memory_map = debug_memory_map;
   ops->to_flash_erase = debug_flash_erase;
   ops->to_flash_done = debug_flash_done;
diff --git a/gdb/target.c b/gdb/target.c
index bb86adf..c88607e 100644
--- a/gdb/target.c
+++ b/gdb/target.c
@@ -1301,8 +1301,9 @@ memory_xfer_partial (struct target_ops *ops, enum target_object object,
 	 by memory_xfer_partial_1.  We will continually malloc
 	 and free a copy of the entire write request for breakpoint
 	 shadow handling even though we only end up writing a small
-	 subset of it.  Cap writes to 4KB to mitigate this.  */
-      len = min (4096, len);
+	 subset of it.  Cap writes to a limit specified by the target
+	 to mitigate this.  */
+      len = min (ops->to_get_memory_xfer_limit (ops), len);
 
       buf = (gdb_byte *) xmalloc (len);
       old_chain = make_cleanup (xfree, buf);
diff --git a/gdb/target.h b/gdb/target.h
index 6b5b6e0..68ef470 100644
--- a/gdb/target.h
+++ b/gdb/target.h
@@ -745,6 +745,12 @@ struct target_ops
 						ULONGEST *xfered_len)
       TARGET_DEFAULT_RETURN (TARGET_XFER_E_IO);
 
+    /* Return the limit on the size of any single memory transfer
+       for the target.  */
+
+    ULONGEST (*to_get_memory_xfer_limit) (struct target_ops *)
+      TARGET_DEFAULT_RETURN (ULONGEST_MAX);
+
     /* Returns the memory map for the target.  A return value of NULL
        means that no memory map is available.  If a memory address
        does not fall within any returned regions, it's assumed to be
-- 
2.8.1

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] Optimize memory_xfer_partial for remote
  2016-06-30 22:40               ` [PATCH v3] " Don Breazeal
@ 2016-06-30 23:44                 ` Pedro Alves
  2016-07-01 18:24                   ` [pushed] " Don Breazeal
  0 siblings, 1 reply; 13+ messages in thread
From: Pedro Alves @ 2016-06-30 23:44 UTC (permalink / raw)
  To: Don Breazeal, gdb-patches

On 06/30/2016 11:40 PM, Don Breazeal wrote:
> Hi Pedro,
> Here is v3 of the patch.  Differences from v2 include returning
> ULONGEST_MAX instead of 4096 from the default target function
> and using TARGET_DEFAULT_RETURN instead of TARGET_DEFAULT_FUNCTION
> in the definition of to_get_memory_xfer_limit.
> 
> I re-ran the testsuite with native and native-gdbserver and saw no
> regressions.  I also verified that the behavior of the memory transfers
> was as expected for the remote and native targets.
> 
> Does this one look good-to-go?

Yes, looks good.

Thanks,
Pedro Alves

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [pushed] Re: [PATCH v3] Optimize memory_xfer_partial for remote
  2016-06-30 23:44                 ` Pedro Alves
@ 2016-07-01 18:24                   ` Don Breazeal
  0 siblings, 0 replies; 13+ messages in thread
From: Don Breazeal @ 2016-07-01 18:24 UTC (permalink / raw)
  To: Pedro Alves, gdb-patches

On 6/30/2016 4:37 PM, Pedro Alves wrote:
> On 06/30/2016 11:40 PM, Don Breazeal wrote:
>> Hi Pedro,
>> Here is v3 of the patch.  Differences from v2 include returning
>> ULONGEST_MAX instead of 4096 from the default target function
>> and using TARGET_DEFAULT_RETURN instead of TARGET_DEFAULT_FUNCTION
>> in the definition of to_get_memory_xfer_limit.
>>
>> I re-ran the testsuite with native and native-gdbserver and saw no
>> regressions.  I also verified that the behavior of the memory transfers
>> was as expected for the remote and native targets.
>>
>> Does this one look good-to-go?
> 
> Yes, looks good.
> 
> Thanks,
> Pedro Alves
> 
I've pushed this in.
Thanks
--Don

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2016-07-01 18:24 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-03 19:02 [PATCH] Optimize memory_xfer_partial for remote Don Breazeal
2016-06-20 15:31 ` [PING] " Don Breazeal
2016-06-20 19:25 ` Pedro Alves
2016-06-21 10:15   ` Yao Qi
2016-06-24 21:21   ` [PATCH v2] " Don Breazeal
2016-06-24 22:23     ` Pedro Alves
2016-06-27 20:23       ` Don Breazeal
2016-06-30 17:06         ` Pedro Alves
2016-06-30 17:45           ` Don Breazeal
2016-06-30 18:40             ` Pedro Alves
2016-06-30 22:40               ` [PATCH v3] " Don Breazeal
2016-06-30 23:44                 ` Pedro Alves
2016-07-01 18:24                   ` [pushed] " Don Breazeal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).