public inbox for gcc@gcc.gnu.org
* LTO+profiled enabled builds
@ 2019-07-04 11:14 Matthias Klose
  2019-09-14 18:39 ` Jason Merrill
  0 siblings, 1 reply; 3+ messages in thread
From: Matthias Klose @ 2019-07-04 11:14 UTC (permalink / raw)
  To: gcc Development

[-- Attachment #1: Type: text/plain, Size: 2064 bytes --]

I'm running into some issues building LTO+profiled enabled configurations on
constrained build machines (the buildds), which have four cores and 16GB of
RAM.

GCC is configured for all frontends (the maximum number of LTO links) with

  --enable-bootstrap \
  --with-build-config=bootstrap-lto-lean \
  --enable-link-mutex

and built with the make profiledbootstrap-lean target.

Most builds time out after 150 minutes.

A typical LTO link runs for around one minute on this hardware; an LTO link
with -fprofile-use, however, runs for up to three hours.

So gcc/lock-and-run.sh runs the first LTO link, makes all the others wait for
300 seconds, then removes the "stale" lock and runs everything in parallel.
Surprisingly, that goes well: because -flto=jobserver is in effect, I don't see
any memory constraints yet.

The machine then starts building all front-ends, but apparently is not
overloaded, as -flto=jobserver is in effect.  However, there is no output, and
that triggers the timeout.  Richi mentioned on IRC that the LTO links only have
buffered output (unless you run in debug mode), and that it is only emitted
once the link finishes.  But even with unbuffered output, couldn't there be
long stretches where nothing is printed, e.g. when there are no warnings?

I'm currently experimenting with a modified lock-and-run.sh, which basically
sets the delay for releasing the "stale" locks to 30 min instead of 5 min, runs
the LTO link in the background, and polls the status of the background job,
emitting "running ..." messages until it finishes.  I'm still adjusting some
parameters, but at least that succeeds on some of my configurations.
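
For illustration only, the background-job idea described above could look
roughly like the sketch below.  This is not the attached script: the function
name run_with_keepalive and the KEEPALIVE_INTERVAL variable are hypothetical,
and the one-second polling interval is an arbitrary choice.

```shell
# Hypothetical helper, not part of GCC: run a command in the background and
# print a heartbeat on stderr while it is still running, so that a build
# daemon watching for output does not see a silent job.
# KEEPALIVE_INTERVAL (seconds between messages) is an assumed knob.
run_with_keepalive () {
    interval="${KEEPALIVE_INTERVAL:-60}"

    "$@" &
    job=$!

    elapsed=0
    # kill -0 delivers no signal; it only tests whether the process exists.
    while kill -0 "$job" 2>/dev/null; do
        sleep 1
        elapsed=$((elapsed + 1))
        if [ $((elapsed % interval)) -eq 0 ]; then
            echo "running ... ($elapsed sec)" >&2
        fi
    done

    # Hand the exit status of the background job back to the caller.
    wait "$job"
}
```

A caller would wrap the link step as run_with_keepalive $prog "$@", so the
exit status of the link is still what the makefile sees.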

The locking mechanism was introduced in 2013,
https://gcc.gnu.org/ml/gcc-patches/2013-05/msg00001.html

lock-and-run.sh should probably be modified not to release the "stale" locks
based on a fixed timeout value.  How?
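
One possible direction, sketched under assumptions rather than proposed as a
patch: have the lock holder record its PID inside the lock directory, and let
waiters treat the lock as stale only when that process no longer exists.  The
function names lock_owner_alive and acquire_lock are hypothetical.  Known gaps
in this sketch: a waiter can see the directory between the holder's mkdir and
the creation of the PID file and wrongly judge it stale, kill -0 also fails
for a live process owned by another user, and PIDs can be reused.

```shell
# Hypothetical sketch: a lock directory whose owner is identified by a file
# named after the owner's PID.  Not the script shipped with GCC.

# Succeed if some PID recorded in the lock directory is still alive.
lock_owner_alive () {
    lockdir="$1"
    for f in "$lockdir"/*; do
        # An unmatched glob stays literal, which means the directory is empty.
        [ -e "$f" ] || return 1
        pid=`basename "$f"`
        # kill -0 delivers no signal; it only tests whether the process exists.
        kill -0 "$pid" 2>/dev/null && return 0
    done
    return 1
}

# Block until the lock is ours, stealing it only from a dead owner.
acquire_lock () {
    lockdir="$1"
    until mkdir "$lockdir" 2>/dev/null; do
        if lock_owner_alive "$lockdir"; then
            sleep 1
        else
            echo "removing stale $lockdir" >&2
            rm -rf "$lockdir"
        fi
    done
    # Record ourselves as the owner.
    : > "$lockdir/$$"
}
```

With this shape, a three-hour link keeps its lock for as long as its process
exists, and a crashed holder is detected on the next poll instead of after a
fixed delay.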

While the "no-output" problem can also be worked around in the lock script
(attached), that doesn't help third-party apps.  Having unbuffered output
and/or an option to print progress would be beneficial.

Matthias




[-- Attachment #2: lock-and-run.sh --]
[-- Type: application/x-shellscript, Size: 1738 bytes --]


* Re: LTO+profiled enabled builds
  2019-07-04 11:14 LTO+profiled enabled builds Matthias Klose
@ 2019-09-14 18:39 ` Jason Merrill
  2019-09-21 23:09   ` Jason Merrill
  0 siblings, 1 reply; 3+ messages in thread
From: Jason Merrill @ 2019-09-14 18:39 UTC (permalink / raw)
  To: Matthias Klose; +Cc: gcc Development

[-- Attachment #1: Type: text/plain, Size: 2300 bytes --]

How does this do for you?


[-- Attachment #2: lock-check-pid.diff --]
[-- Type: text/x-patch, Size: 1455 bytes --]

commit c570f3f4751385153292c85c2f38dc78e9443923
Author: Jason Merrill <jason@redhat.com>
Date:   Sat Sep 14 14:02:20 2019 -0400

            * lock-and-run.sh: Check for process existence rather than timeout.

diff --git a/gcc/lock-and-run.sh b/gcc/lock-and-run.sh
index 3a6a84c253a..b1a4a4c8220 100644
--- a/gcc/lock-and-run.sh
+++ b/gcc/lock-and-run.sh
@@ -5,29 +5,28 @@ lockdir="$1" prog="$2"; shift 2 || exit 1
 
 # Remember when we started trying to acquire the lock.
 count=0
-touch lock-stamp.$$
 
-trap 'rm -r "$lockdir" lock-stamp.$$' 0
+trap 'rm -rf "$lockdir"' 0
 
 until mkdir "$lockdir" 2>/dev/null; do
     # Say something periodically so the user knows what's up.
     if [ `expr $count % 30` = 0 ]; then
-	# Reset if the lock has been renewed.
-	if [ -n "`find \"$lockdir\" -newer lock-stamp.$$`" ]; then
-	    touch lock-stamp.$$
-	    count=1
-	# Steal the lock after 5 minutes.
-	elif [ $count = 300 ]; then
-	    echo removing stale $lockdir >&2
-	    rm -r "$lockdir"
+	# Check for stale lock.
+	pid="`(cd $lockdir; echo *)`"
+	if ps "$pid" >/dev/null; then
+	    echo waiting $count sec to acquire $lockdir from PID $pid>&2
+	    found=$pid
 	else
-	    echo waiting to acquire $lockdir >&2
+	    echo PID $pid is dead, removing stale $lockdir >&2
+	    rm -r "$lockdir"
 	fi
     fi
     sleep 1
     count=`expr $count + 1`
 done
 
+touch $lockdir/$$
+echo acquired $lockdir after $count seconds >&2
 echo $prog "$@"
 $prog "$@"
 


* Re: LTO+profiled enabled builds
  2019-09-14 18:39 ` Jason Merrill
@ 2019-09-21 23:09   ` Jason Merrill
  0 siblings, 0 replies; 3+ messages in thread
From: Jason Merrill @ 2019-09-21 23:09 UTC (permalink / raw)
  To: Matthias Klose; +Cc: gcc Development

Have you had a chance to try this?


