From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <luis.machado@linaro.org>
Received: from mail-qv1-xf30.google.com (mail-qv1-xf30.google.com
 [IPv6:2607:f8b0:4864:20::f30])
 by sourceware.org (Postfix) with ESMTPS id 693273857011
 for <gdb@sourceware.org>; Thu, 15 Oct 2020 12:55:25 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 693273857011
Received: by mail-qv1-xf30.google.com with SMTP id t20so1096923qvv.8
 for <gdb@sourceware.org>; Thu, 15 Oct 2020 05:55:25 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:subject:to:references:from:message-id:date
 :user-agent:mime-version:in-reply-to:content-language
 :content-transfer-encoding;
 bh=pgTsHb/p22GepDoIXRlJjtsMdk/6BozIizAZFsYs3qQ=;
 b=P/vdp2JLXwzh3Y1ZrHnfMYKrcQXujSTtP4VQCYApLVX07PV6repcvr6+99U9vGA/lI
 Ks5U0aJ9FJrKIAK1hTrazS89H3GLAl4AQT9uqRc1NYZgan7cBIgylET2V8t+WsdEBLgk
 PexOK+w/AIL4CLuddvOFJd0tBYt2czAY5HOT+aDjqWrAwlCysYooB+3GaF2cwuVDmS/7
 P2sw+uF9255KXTSBQ4nkgvWTE2C2t/NBDqU/WHFdg19Jpe4AolZoSDU0bSZCspflZQlX
 sX8w+VXHXDCDQhU/tDGoQ3wSgTXCWhigPlY3spilg2wVmxaYIZMwiFUwdSVG7l5eSxqi
 HPJQ==
X-Gm-Message-State: AOAM530NZfL722ww1aatQ68W+Ci+WWtN7Lsb2Q0Updlngww+bxYUhK91
 crkeOWbs7kShE6ysFjrXP2EMJhTQ6pWydA==
X-Google-Smtp-Source: ABdhPJzuQTbFK1bDOF4vG39VGYhrucreA81xDTTNZliSb1V+1sJBFgmeQAQMUCOxAlC6RnjPR8Bu1w==
X-Received: by 2002:a05:6214:14b4:: with SMTP id
 bo20mr4481499qvb.24.1602766524640; 
 Thu, 15 Oct 2020 05:55:24 -0700 (PDT)
Received: from ?IPv6:2804:7f0:8283:fe4b:a815:361f:f688:d8a1?
 ([2804:7f0:8283:fe4b:a815:361f:f688:d8a1])
 by smtp.gmail.com with ESMTPSA id b33sm1136865qtk.38.2020.10.15.05.55.23
 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
 Thu, 15 Oct 2020 05:55:24 -0700 (PDT)
Subject: Re: Regressions getting more common
To: Simon Marchi <simark@simark.ca>, "gdb@sourceware.org" <gdb@sourceware.org>
References: <d1d61d0f-7f9e-2cf8-2f05-908638c2faab@linaro.org>
 <d3e5bce3-42a0-71b3-b22a-c0f698dfd0e4@simark.ca>
From: Luis Machado <luis.machado@linaro.org>
Message-ID: <5218c203-aa6d-6d00-f8e7-18420fa3d55b@linaro.org>
Date: Thu, 15 Oct 2020 09:55:21 -0300
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101
 Thunderbird/68.10.0
MIME-Version: 1.0
In-Reply-To: <d3e5bce3-42a0-71b3-b22a-c0f698dfd0e4@simark.ca>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
X-Spam-Status: No, score=-5.0 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, NICE_REPLY_A, RCVD_IN_DNSWL_NONE,
 SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on
 server2.sourceware.org
X-BeenThere: gdb@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gdb mailing list <gdb.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/gdb>,
 <mailto:gdb-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/gdb/>
List-Post: <mailto:gdb@sourceware.org>
List-Help: <mailto:gdb-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/gdb>,
 <mailto:gdb-request@sourceware.org?subject=subscribe>
X-List-Received-Date: Thu, 15 Oct 2020 12:55:26 -0000

Hi,

On 10/14/20 11:41 AM, Simon Marchi wrote:
> On 2020-10-13 1:05 p.m., Luis Machado via Gdb wrote:
>> Hi,
>>
>> I don't know about other non-x86 architectures, but over the past year I've been noticing more and more regressions being introduced, unnoticed, for ARM/AArch64. This is not good and causes a lot of pain if you have to keep tracking things manually, like we do now.
>>
>> The buildbots worked great for this very purpose, but Sergio has moved on to other duties (thanks for all the work!) and can't maintain it anymore. The builders are still there though, sitting mostly idle.
>> We have a beefy ARM/AArch64 builder, which I can maintain for others to use.
>>
>> We can do better than to declare things OK after a single round of tests under x86, which has been the trend unfortunately.
>>
>> The subject of better CI has come up multiple times on IRC, with sad memories of the gerrit experiment's demise. Now we're left with review by e-mail and no broad testing.
>>
>> I think we need to discuss better validation pre-commit and possible CI solutions for GDB. It is pretty easy to exercise x86, but it doesn't sound fair to other architectures to have to keep cleaning up after things that have only been validated on that architecture.
>>
>> It would be great to establish a roadmap so we can get GDB's testing to today's standards, and maybe revisit the use of more modern patch review tools while at it.
>>
>> What do you think?
> 
> I agree with all of this.
> 
> It all comes down to:
> 
> - human resources: a system like that is not fire and forget, there's
>    always something to look into to make sure it runs smoothly
> - hardware resources: it takes a lot of CPU time, that takes some
>    dedicated machines

Indeed. There will most definitely be some maintenance burden. At least 
from ARM/AArch64's side, I can offer to keep the builders in working 
condition and reasonably updated.

I have to say I'm not terribly skilled with the buildbot 
infrastructure, so I may not be able to help much there. I can learn 
though. So far I've only dealt with Jenkins/Gerrit.

> 
> I think it would be good to reboot the buildbot, but start by focusing
> on what delivers the best "bang for the buck".  I remember workers
> lagging a lot behind, meaning we would get notifications of breakage
> quite a lot after a commit was pushed.  I think it would be good to not
> make the workers build all commits on master.  Either do one build a
> day, or have them constantly build the current master (as a background /
> low priority task).  If there's a regression, then there's a window of
> commits that might be responsible for it.  But in any case, I would see
> having a stable post-commit CI as the first step.

That sounds reasonable to me. That is already a good improvement.

> 
> Then, we can look at having try jobs work again.  I never really liked
> submitting try jobs through the buildbot command line tool, I always
> found there was too little feedback (did my upload work? what jobs did I
> initiate?).  I would receive test results by email and would have a hard
> time figuring out which result was for which patch.

I think it needed more verbosity, or better ways to follow the status of 
that particular build. Like, "here, take this key and use it to monitor 
this particular build request".

> 
> One idea would be to re-use the now abandoned Gerrit instance for this.
> Those who have used Gerrit will probably agree that it integrates quite
> well with CI.  After pushing your patch for review, you can ask the CI
> to test a given patch (often by setting a label on it).  The CI posts a
> comment on the patch with a link to the build, so you can follow it if
> you want.  Once it has run, the CI posts comments on the patch to say
> that version N of the patch has this result, again with a link to the
> build.

That resembles the workflow I experimented with in Jenkins/Gerrit. I quite 
like it and it kept things pretty organized.

Like I said, though, I wouldn't know how to configure this properly. 
I'll ask around at Linaro to see if we can spare someone who can help.

> 
> I would be happy to help take over maintenance of the buildbot master
> from Sergio, but I wouldn't want to be alone in this.

I'm willing to help here. I just need to get more familiar with some 
admin steps so I can get things unstuck when needed.