From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
References: <20240513022737.3105192-1-hongtao.liu@intel.com>
In-Reply-To: <20240513022737.3105192-1-hongtao.liu@intel.com>
From: Richard Biener
Date: Mon, 13 May 2024 09:40:24 +0200
Subject: Re: [PATCH] Don't reduce estimated unrolled size for innermost loop.
To: liuhongt
Cc: gcc-patches@gcc.gnu.org
Content-Type: multipart/mixed; boundary="0000000000003931b906185100f0"

--0000000000003931b906185100f0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Mon, May 13, 2024 at 4:29 AM liuhongt wrote:
>
> As testcase in the PR, O3 cunrolli may prevent vectorization for the
> innermost loop and increase register pressure.
> The patch removes the 1/3 reduction of unr_insn for innermost loop for UL_ALL.
> ul != UR_ALL is needed since some small loop complete unrolling at O2 relies
> the reduction.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> No big impact for SPEC2017.
> Ok for trunk?

This removes the 1/3 reduction when unrolling a loop nest (the case I was
concerned about). Unrolling of a nest is done by iterating in
tree_unroll_loops_completely, so the to-be-unrolled loop appears innermost.
So I think you need a new parameter on tree_unroll_loops_completely_1
indicating whether we're in the first iteration (or whether to assume
innermost loops will "simplify").

A few comments below.

> gcc/ChangeLog:
>
> PR tree-optimization/112325
> * tree-ssa-loop-ivcanon.cc (estimated_unrolled_size): Add 2
> new parameters: loop and ul, and remove unr_insns reduction
> for innermost loop.
> (try_unroll_loop_completely): Pass loop and ul to
> estimated_unrolled_size.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/pr112325.c: New test.
> * gcc.dg/vect/pr69783.c: Add extra option --param
> max-completely-peeled-insns=300.
> ---
> gcc/testsuite/gcc.dg/tree-ssa/pr112325.c | 57 ++++++++++++++++++++++++
> gcc/testsuite/gcc.dg/vect/pr69783.c | 2 +-
> gcc/tree-ssa-loop-ivcanon.cc | 16 +++++--
> 3 files changed, 71 insertions(+), 4 deletions(-)
> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr112325.c
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr112325.c b/gcc/testsuite/gcc.dg/tree-ssa/pr112325.c
> new file mode 100644
> index 00000000000..14208b3e7f8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr112325.c
> @@ -0,0 +1,57 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-cunrolli-details" } */
> +
> +typedef unsigned short ggml_fp16_t;
> +static float table_f32_f16[1 << 16];
> +
> +inline static float ggml_lookup_fp16_to_fp32(ggml_fp16_t f) {
> + unsigned short s;
> + __builtin_memcpy(&s, &f, sizeof(unsigned short));
> + return table_f32_f16[s];
> +}
> +
> +typedef struct {
> + ggml_fp16_t d;
> + ggml_fp16_t m;
> + unsigned char qh[4];
> + unsigned char qs[32 / 2];
> +} block_q5_1;
> +
> +typedef struct {
> + float d;
> + float s;
> + char qs[32];
> +} block_q8_1;
> +
> +void ggml_vec_dot_q5_1_q8_1(const int n, float * restrict s, const void * restrict vx, const void * restrict vy) {
> + const int qk = 32;
> + const int nb = n / qk;
> +
> + const block_q5_1 * restrict x = vx;
> + const block_q8_1 * restrict y = vy;
> +
> + float sumf = 0.0;
> +
> + for (int i = 0; i < nb; i++) {
> + unsigned qh;
> + __builtin_memcpy(&qh, x[i].qh, sizeof(qh));
> +
> + int sumi = 0;
> +
> + for (int j = 0; j < qk/2; ++j) {
> + const unsigned char xh_0 = ((qh >> (j + 0)) << 4) & 0x10;
> + const unsigned char xh_1 = ((qh >> (j + 12)) ) & 0x10;
> +
> + const int x0 = (x[i].qs[j] & 0xF) | xh_0;
> + const int x1 = (x[i].qs[j] >> 4) | xh_1;
> +
> + sumi += (x0 * y[i].qs[j]) + (x1 * y[i].qs[j + qk/2]);
> + }
> +
> + sumf += (ggml_lookup_fp16_to_fp32(x[i].d)*y[i].d)*sumi + ggml_lookup_fp16_to_fp32(x[i].m)*y[i].s;
> + }
> +
> + *s = sumf;
> +}
> +
> +/* { dg-final { scan-tree-dump {(?n)Not unrolling loop [1-9] \(--param max-completely-peel-times limit reached} "cunrolli"} } */
> diff --git a/gcc/testsuite/gcc.dg/vect/pr69783.c b/gcc/testsuite/gcc.dg/vect/pr69783.c
> index 5df95d0ce4e..a1f75514d72 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr69783.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr69783.c
> @@ -1,6 +1,6 @@
> /* { dg-do compile } */
> /* { dg-require-effective-target vect_float } */
> -/* { dg-additional-options "-Ofast -funroll-loops" } */
> +/* { dg-additional-options "-Ofast -funroll-loops --param max-completely-peeled-insns=300" } */

If we rely on unrolling of a loop, can you put #pragma GCC unroll N before
the respective loop instead?

> #define NXX 516
> #define NYY 516
> diff --git a/gcc/tree-ssa-loop-ivcanon.cc b/gcc/tree-ssa-loop-ivcanon.cc
> index bf017137260..5e0eca647a1 100644
> --- a/gcc/tree-ssa-loop-ivcanon.cc
> +++ b/gcc/tree-ssa-loop-ivcanon.cc
> @@ -444,7 +444,9 @@ tree_estimate_loop_size (class loop *loop, edge exit, edge edge_to_cancel,
>
> static unsigned HOST_WIDE_INT
> estimated_unrolled_size (struct loop_size *size,
> - unsigned HOST_WIDE_INT nunroll)
> + unsigned HOST_WIDE_INT nunroll,
> + enum unroll_level ul,
> + class loop* loop)
> {
> HOST_WIDE_INT unr_insns = ((nunroll)
> * (HOST_WIDE_INT) (size->overall
> @@ -453,7 +455,15 @@ estimated_unrolled_size (struct loop_size *size,
> unr_insns = 0;
> unr_insns += size->last_iteration - size->last_iteration_eliminated_by_peeling;
>
> - unr_insns = unr_insns * 2 / 3;
> + /* For innermost loop, loop body is not likely to be simplied as much as 1/3.
> + and may increase a lot of register pressure.
> + UL != UL_ALL is need to unroll small loop at O2. */
> + class loop *loop_father = loop_outer (loop);
> + if (loop->inner || !loop_father

Do we ever get here for !loop_father? We shouldn't.

> + || loop_father->latch == EXIT_BLOCK_PTR_FOR_FN (cfun)

This means you exempt all loops that are direct children of the root of the
loop tree. That doesn't make much sense.

> + || ul != UL_ALL)

This is also quite odd - we're being more optimistic for UL_NO_GROWTH
than for UL_ALL? This doesn't make much sense.

Overall I think this means that simply removing the optimism doesn't work
so well?

If we need some extra leeway for UL_NO_GROWTH for what we expect to unroll,
it might be better to add something like --param
nogrowth-completely-peeled-insns specifying a fixed surplus size? Or we
need to look at what the problem is with the regressing testcases or with
the one you are trying to fix.

I did experiment at some point with better estimating the cleanup done
(see attached), but didn't get to finishing that (and as said, as we're
running VN on the result we'd ideally do that as part of the estimation
somehow).

Richard.

> + unr_insns = unr_insns * 2 / 3;
> +
> if (unr_insns <= 0)
> unr_insns = 1;
>
> @@ -837,7 +847,7 @@ try_unroll_loop_completely (class loop *loop,
>
> unsigned HOST_WIDE_INT ninsns = size.overall;
> unsigned HOST_WIDE_INT unr_insns
> - = estimated_unrolled_size (&size, n_unroll);
> + = estimated_unrolled_size (&size, n_unroll, ul, loop);
> if (dump_file && (dump_flags & TDF_DETAILS))
> {
> fprintf (dump_file, " Loop size: %d\n", (int) ninsns);
> --
> 2.31.1
>
--0000000000003931b906185100f0
Content-Type: application/octet-stream; name=p
Content-Disposition: attachment; filename=p
Content-Transfer-Encoding: base64
Content-ID:
X-Attachment-Id: f_lw4nmekb0

RnJvbSAwMDkwMDI2MzNlM2FjY2U2MDMxMDk0YzhkMTkxMGI3ZDk5MzE0ZjlmIE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBSaWNoYXJkIEJpZW5lciA8cmd1ZW50aGVyQHN1c2UuZGU+CkRh dGU6IE1vbiwgMTQgQXVnIDIwMjMgMTI6MDI6NDEgKzAyMDAKU3ViamVjdDogW1BBVENIXSB0ZXN0 IHVucm9sbApUbzogZ2NjLXBhdGNoZXNAZ2NjLmdudS5vcmcKCi0tLQogLi4uL2djYy5kZy9mc3Rh Y2stcHJvdGVjdG9yLXN0cm9uZy5jICAgICAgICAgIHwgICA0ICstCiBnY2MvdHJlZS1zc2EtbG9v
cC1pdmNhbm9uLmNjICAgICAgICAgICAgICAgICAgfCAxNTggKysrKysrKysrKysrLS0tLS0tCiAy IGZpbGVzIGNoYW5nZWQsIDExMyBpbnNlcnRpb25zKCspLCA0OSBkZWxldGlvbnMoLSkKCmRpZmYg LS1naXQgYS9nY2MvdGVzdHN1aXRlL2djYy5kZy9mc3RhY2stcHJvdGVjdG9yLXN0cm9uZy5jIGIv Z2NjL3Rlc3RzdWl0ZS9nY2MuZGcvZnN0YWNrLXByb3RlY3Rvci1zdHJvbmcuYwppbmRleCA5NGRj MzUwOGYxYS4uZmFmYTE5MTc0NDkgMTAwNjQ0Ci0tLSBhL2djYy90ZXN0c3VpdGUvZ2NjLmRnL2Zz dGFjay1wcm90ZWN0b3Itc3Ryb25nLmMKKysrIGIvZ2NjL3Rlc3RzdWl0ZS9nY2MuZGcvZnN0YWNr LXByb3RlY3Rvci1zdHJvbmcuYwpAQCAtMjgsNyArMjgsNyBAQCBmb28xICgpCiBzdHJ1Y3QgQXJy YXlTdHJ1Y3QKIHsKICAgaW50IGE7Ci0gIGludCBhcnJheVsxMF07CisgIGludCBhcnJheVsxOF07 CiB9OwogCiBzdHJ1Y3QgQUEKQEAgLTQzLDcgKzQzLDcgQEAgZm9vMiAoKQogewogICBzdHJ1Y3Qg QUEgYWE7CiAgIGludCBpOwotICBmb3IgKGkgPSAwOyBpIDwgMTA7ICsraSkKKyAgZm9yIChpID0g MDsgaSA8IDE4OyArK2kpCiAgICAgewogICAgICAgYWEuYXMuYXJyYXlbaV0gPSBpICogKGktMSkg KyBpIC8gMjsKICAgICB9CmRpZmYgLS1naXQgYS9nY2MvdHJlZS1zc2EtbG9vcC1pdmNhbm9uLmNj IGIvZ2NjL3RyZWUtc3NhLWxvb3AtaXZjYW5vbi5jYwppbmRleCBiZjAxNzEzNzI2MC4uNThlNDE2 OGMyMDkgMTAwNjQ0Ci0tLSBhL2djYy90cmVlLXNzYS1sb29wLWl2Y2Fub24uY2MKKysrIGIvZ2Nj L3RyZWUtc3NhLWxvb3AtaXZjYW5vbi5jYwpAQCAtMTU5LDYgKzE1OSw3IEBAIHN0cnVjdCBsb29w X3NpemUKICAgaW50IG51bV9icmFuY2hlc19vbl9ob3RfcGF0aDsKIH07CiAKKyNpZiAwCiAvKiBS ZXR1cm4gdHJ1ZSBpZiBPUCBpbiBTVE1UIHdpbGwgYmUgY29uc3RhbnQgYWZ0ZXIgcGVlbGluZyBM T09QLiAgKi8KIAogc3RhdGljIGJvb2wKQEAgLTI0Niw2ICsyNDcsNyBAQCBjb25zdGFudF9hZnRl cl9wZWVsaW5nICh0cmVlIG9wLCBnaW1wbGUgKnN0bXQsIGNsYXNzIGxvb3AgKmxvb3ApCiAgICAg fQogICByZXR1cm4gdHJ1ZTsKIH0KKyNlbmRpZgogCiAvKiBDb21wdXRlcyBhbiBlc3RpbWF0ZWQg bnVtYmVyIG9mIGluc25zIGluIExPT1AuCiAgICBFWElUIChpZiBub24tTlVMTCkgaXMgYW4gZXhp dGUgZWRnZSB0aGF0IHdpbGwgYmUgZWxpbWluYXRlZCBpbiBhbGwgYnV0IGxhc3QKQEAgLTI3Nyw2 ICsyNzksMzEgQEAgdHJlZV9lc3RpbWF0ZV9sb29wX3NpemUgKGNsYXNzIGxvb3AgKmxvb3AsIGVk Z2UgZXhpdCwgZWRnZSBlZGdlX3RvX2NhbmNlbCwKIAogICBpZiAoZHVtcF9maWxlICYmIChkdW1w X2ZsYWdzICYgVERGX0RFVEFJTFMpKQogICAgIGZwcmludGYgKGR1bXBfZmlsZSwgIkVzdGltYXRp 
bmcgc2l6ZXMgZm9yIGxvb3AgJWlcbiIsIGxvb3AtPm51bSk7CisKKyAgc3RhdGljIGhhc2hfbWFw PHRyZWUsIHRyZWU+ICp2YWxzOworICB2YWxzID0gbmV3IGhhc2hfbWFwPHRyZWUsIHRyZWU+Owor ICBlZGdlIHBlID0gbG9vcF9wcmVoZWFkZXJfZWRnZSAobG9vcCk7CisgIGZvciAoYXV0byBzaSA9 IGdzaV9zdGFydF9waGlzIChsb29wLT5oZWFkZXIpOworICAgICAgICFnc2lfZW5kX3AgKHNpKTsg Z3NpX25leHQgKCZzaSkpCisgICAgeworICAgICAgaWYgKHZpcnR1YWxfb3BlcmFuZF9wIChnaW1w bGVfcGhpX3Jlc3VsdCAoKnNpKSkpCisJY29udGludWU7CisgICAgICB0cmVlIHZhbCA9IGdpbXBs ZV9waGlfYXJnX2RlZl9mcm9tX2VkZ2UgKCpzaSwgcGUpOworICAgICAgaWYgKENPTlNUQU5UX0NM QVNTX1AgKHZhbCkpCisJeworCSAgdmFscy0+cHV0IChnaW1wbGVfcGhpX3Jlc3VsdCAoKnNpKSwg dmFsKTsKKwkgIHRyZWUgZXYgPSBhbmFseXplX3NjYWxhcl9ldm9sdXRpb24gKGxvb3AsIGdpbXBs ZV9waGlfcmVzdWx0ICgqc2kpKTsKKwkgIGlmICghY2hyZWNfY29udGFpbnNfdW5kZXRlcm1pbmVk IChldikKKwkgICAgICAmJiAhY2hyZWNfY29udGFpbnNfc3ltYm9scyAoZXYpKQorCSAgICBzaXpl LT5jb25zdGFudF9pdiA9IHRydWU7CisJfQorICAgIH0KKworICBhdXRvIGVsc192YWx1ZWl6ZSA9 IFtdICh0cmVlIG9wKSAtPiB0cmVlCisgICAgeyBpZiAodHJlZSAqdmFsID0gdmFscy0+Z2V0IChv cCkpIHJldHVybiAqdmFsOyByZXR1cm4gb3A7IH07CisKKyAgYXV0byBwcm9jZXNzX2xvb3AgPSBb Jl0gKCkgLT4gYm9vbAorICAgIHsKICAgZm9yIChpID0gMDsgaSA8IGxvb3AtPm51bV9ub2Rlczsg aSsrKQogICAgIHsKICAgICAgIGlmIChlZGdlX3RvX2NhbmNlbCAmJiBib2R5W2ldICE9IGVkZ2Vf dG9fY2FuY2VsLT5zcmMKQEAgLTMyMyw1NyArMzUwLDUxIEBAIHRyZWVfZXN0aW1hdGVfbG9vcF9z aXplIChjbGFzcyBsb29wICpsb29wLCBlZGdlIGV4aXQsIGVkZ2UgZWRnZV90b19jYW5jZWwsCiAJ CQkgICAgICJpbiBsYXN0IGNvcHkuXG4iKTsKIAkJICBsaWtlbHlfZWxpbWluYXRlZF9sYXN0ID0g dHJ1ZTsKIAkJfQotCSAgICAgIC8qIFNldHMgb2YgSVYgdmFyaWFibGVzICAqLwotCSAgICAgIGlm IChnaW1wbGVfY29kZSAoc3RtdCkgPT0gR0lNUExFX0FTU0lHTgotCQkgICYmIGNvbnN0YW50X2Fm dGVyX3BlZWxpbmcgKGdpbXBsZV9hc3NpZ25fbGhzIChzdG10KSwgc3RtdCwgbG9vcCkpCisJICAg ICAgLyogU3RvcmVzIGFyZSBub3QgZWxpbWluYXRlZC4gICovCisJICAgICAgaWYgKGdpbXBsZV92 ZGVmIChzdG10KSkKKwkJZ290byBhY2NvdW50OworCSAgICAgIC8qIEJlbG93IHdlIGFyZSB1c2lu ZyBjb25zdGFudCBmb2xkaW5nIHRvIGRlY2lkZSB3aGV0aGVyCisJCSB3ZSBjYW4gZWxpZGUgYSBz 
dG10LiAgV2hpbGUgZm9yIHRoZSBmaXJzdCBpdGVyYXRpb24gd2UKKwkJIGNvdWxkIHVzZSB0aGUg YWN0dWFsIHZhbHVlIGZvciB0aGUgcmVzdCB3ZSBoYXZlIHRvCisJCSBhdm9pZCB0aGUgc2l0dWF0 aW9uIHJlLXVzaW5nIGEgKiAxIG9yICsgMCBvcGVyYW5kLCBzbworCQkgcmVxdWlyZSBhbGwgU1NB IG9wZXJhbmRzIHRvIGJlIGNvbnN0YW50cyBoZXJlLiAgKi8KKwkgICAgICBib29sIGZhaWwgPSBm YWxzZTsKKwkgICAgICBzc2Ffb3BfaXRlciBpdGVyOworCSAgICAgIHVzZV9vcGVyYW5kX3AgdXNl X3A7CisJICAgICAgRk9SX0VBQ0hfU1NBX1VTRV9PUEVSQU5EICh1c2VfcCwgc3RtdCwgaXRlciwg U1NBX09QX1VTRSkKKwkJaWYgKCF2YWxzLT5nZXQgKFVTRV9GUk9NX1BUUiAodXNlX3ApKSkKKwkJ ICB7CisJCSAgICBmYWlsID0gdHJ1ZTsKKwkJICAgIGJyZWFrOworCQkgIH0KKwkgICAgICBpZiAo ZmFpbCkKKwkJZ290byBhY2NvdW50OworCSAgICAgIHRyZWUgdmFsOworCSAgICAgIC8qIFN3aXRj aGVzIGFyZSBub3QgaGFuZGxlZCBieSBmb2xkaW5nLiAgKi8KKwkgICAgICBpZiAoZ2ltcGxlX2Nv ZGUgKHN0bXQpID09IEdJTVBMRV9TV0lUQ0gKKwkJICAmJiAhIGlzX2dpbXBsZV9taW5faW52YXJp YW50CisJCSAgKGdpbXBsZV9zd2l0Y2hfaW5kZXggKGFzX2EgPGdzd2l0Y2ggKj4gKHN0bXQpKSkK KwkJICAmJiB2YWxzLT5nZXQgKGdpbXBsZV9zd2l0Y2hfaW5kZXgKKwkJCQkoYXNfYSA8Z3N3aXRj aCAqPiAoc3RtdCkpKSkKIAkJewogCQkgIGlmIChkdW1wX2ZpbGUgJiYgKGR1bXBfZmxhZ3MgJiBU REZfREVUQUlMUykpCi0JCSAgICBmcHJpbnRmIChkdW1wX2ZpbGUsICIgICBJbmR1Y3Rpb24gdmFy aWFibGUgY29tcHV0YXRpb24gd2lsbCIKLQkJCSAgICAgIiBiZSBmb2xkZWQgYXdheS5cbiIpOwor CQkgICAgZnByaW50ZiAoZHVtcF9maWxlLCAiICAgQ29uc3RhbnQgY29uZGl0aW9uYWwuXG4iKTsK IAkJICBsaWtlbHlfZWxpbWluYXRlZCA9IHRydWU7CiAJCX0KLQkgICAgICAvKiBBc3NpZ25tZW50 cyBvZiBJViB2YXJpYWJsZXMuICAqLwotCSAgICAgIGVsc2UgaWYgKGdpbXBsZV9jb2RlIChzdG10 KSA9PSBHSU1QTEVfQVNTSUdOCi0JCSAgICAgICAmJiBUUkVFX0NPREUgKGdpbXBsZV9hc3NpZ25f bGhzIChzdG10KSkgPT0gU1NBX05BTUUKLQkJICAgICAgICYmIGNvbnN0YW50X2FmdGVyX3BlZWxp bmcgKGdpbXBsZV9hc3NpZ25fcmhzMSAoc3RtdCksCi0JCQkJCQkgIHN0bXQsIGxvb3ApCi0JCSAg ICAgICAmJiAoZ2ltcGxlX2Fzc2lnbl9yaHNfY2xhc3MgKHN0bXQpICE9IEdJTVBMRV9CSU5BUllf UkhTCi0JCQkgICB8fCBjb25zdGFudF9hZnRlcl9wZWVsaW5nIChnaW1wbGVfYXNzaWduX3JoczIg KHN0bXQpLAotCQkJCQkJICAgICAgc3RtdCwgbG9vcCkpCi0JCSAgICAgICAmJiBnaW1wbGVfYXNz 
aWduX3Joc19jbGFzcyAoc3RtdCkgIT0gR0lNUExFX1RFUk5BUllfUkhTKQorCSAgICAgIGVsc2Ug aWYgKCh2YWwgPSBnaW1wbGVfZm9sZF9zdG10X3RvX2NvbnN0YW50IChzdG10LCBlbHNfdmFsdWVp emUpKQorCQkgICAgICAgJiYgQ09OU1RBTlRfQ0xBU1NfUCAodmFsKSkKIAkJewotCQkgIHNpemUt PmNvbnN0YW50X2l2ID0gdHJ1ZTsKIAkJICBpZiAoZHVtcF9maWxlICYmIChkdW1wX2ZsYWdzICYg VERGX0RFVEFJTFMpKQogCQkgICAgZnByaW50ZiAoZHVtcF9maWxlLAogCQkJICAgICAiICAgQ29u c3RhbnQgZXhwcmVzc2lvbiB3aWxsIGJlIGZvbGRlZCBhd2F5LlxuIik7CiAJCSAgbGlrZWx5X2Vs aW1pbmF0ZWQgPSB0cnVlOwotCQl9Ci0JICAgICAgLyogQ29uZGl0aW9uYWxzLiAgKi8KLQkgICAg ICBlbHNlIGlmICgoZ2ltcGxlX2NvZGUgKHN0bXQpID09IEdJTVBMRV9DT05ECi0JCQkmJiBjb25z dGFudF9hZnRlcl9wZWVsaW5nIChnaW1wbGVfY29uZF9saHMgKHN0bXQpLCBzdG10LAotCQkJCQkJ ICAgbG9vcCkKLQkJCSYmIGNvbnN0YW50X2FmdGVyX3BlZWxpbmcgKGdpbXBsZV9jb25kX3JocyAo c3RtdCksIHN0bXQsCi0JCQkJCQkgICBsb29wKQotCQkJLyogV2UgZG9uJ3Qgc2ltcGxpZnkgYWxs IGNvbnN0YW50IGNvbXBhcmVzIHNvIG1ha2Ugc3VyZQotCQkJICAgdGhleSBhcmUgbm90IGJvdGgg Y29uc3RhbnQgYWxyZWFkeS4gIFNlZSBQUjcwMjg4LiAgKi8KLQkJCSYmICghIGlzX2dpbXBsZV9t aW5faW52YXJpYW50IChnaW1wbGVfY29uZF9saHMgKHN0bXQpKQotCQkJICAgIHx8ICEgaXNfZ2lt cGxlX21pbl9pbnZhcmlhbnQKLQkJCQkgKGdpbXBsZV9jb25kX3JocyAoc3RtdCkpKSkKLQkJICAg ICAgIHx8IChnaW1wbGVfY29kZSAoc3RtdCkgPT0gR0lNUExFX1NXSVRDSAotCQkJICAgJiYgY29u c3RhbnRfYWZ0ZXJfcGVlbGluZyAoZ2ltcGxlX3N3aXRjaF9pbmRleCAoCi0JCQkJCQkJYXNfYSA8 Z3N3aXRjaCAqPgotCQkJCQkJCSAgKHN0bXQpKSwKLQkJCQkJCSAgICAgIHN0bXQsIGxvb3ApCi0J CQkgICAmJiAhIGlzX2dpbXBsZV9taW5faW52YXJpYW50Ci0JCQkJICAgKGdpbXBsZV9zd2l0Y2hf aW5kZXgKLQkJCQkgICAgICAoYXNfYSA8Z3N3aXRjaCAqPiAoc3RtdCkpKSkpCi0JCXsKLQkJICBp ZiAoZHVtcF9maWxlICYmIChkdW1wX2ZsYWdzICYgVERGX0RFVEFJTFMpKQotCQkgICAgZnByaW50 ZiAoZHVtcF9maWxlLCAiICAgQ29uc3RhbnQgY29uZGl0aW9uYWwuXG4iKTsKLQkJICBsaWtlbHlf ZWxpbWluYXRlZCA9IHRydWU7CisJCSAgaWYgKHRyZWUgbGhzID0gZ2ltcGxlX2dldF9saHMgKHN0 bXQpKQorCQkgICAgaWYgKFRSRUVfQ09ERSAobGhzKSA9PSBTU0FfTkFNRSkKKwkJICAgICAgdmFs cy0+cHV0IChsaHMsIHZhbCk7CiAJCX0KIAkgICAgfQogCithY2NvdW50OgogCSAgc2l6ZS0+b3Zl 
cmFsbCArPSBudW07CiAJICBpZiAobGlrZWx5X2VsaW1pbmF0ZWQgfHwgbGlrZWx5X2VsaW1pbmF0 ZWRfcGVlbGVkKQogCSAgICBzaXplLT5lbGltaW5hdGVkX2J5X3BlZWxpbmcgKz0gbnVtOwpAQCAt Mzg2LDExICs0MDcsNTUgQEAgdHJlZV9lc3RpbWF0ZV9sb29wX3NpemUgKGNsYXNzIGxvb3AgKmxv b3AsIGVkZ2UgZXhpdCwgZWRnZSBlZGdlX3RvX2NhbmNlbCwKIAkgIGlmICgoc2l6ZS0+b3ZlcmFs bCAqIDMgLyAyIC0gc2l6ZS0+ZWxpbWluYXRlZF9ieV9wZWVsaW5nCiAJICAgICAgLSBzaXplLT5s YXN0X2l0ZXJhdGlvbl9lbGltaW5hdGVkX2J5X3BlZWxpbmcpID4gdXBwZXJfYm91bmQpCiAJICAg IHsKLSAgICAgICAgICAgICAgZnJlZSAoYm9keSk7Ci0JICAgICAgcmV0dXJuIHRydWU7CisJICAg ICAgZnJlZSAoYm9keSk7CisJICAgICAgZGVsZXRlIHZhbHM7CisJICAgICAgdmFscyA9IG51bGxw dHI7CisJICAgICAgcmV0dXJuIGZhbHNlOwogCSAgICB9CiAJfQogICAgIH0KKyAgcmV0dXJuIHRy dWU7CisgIH07CisKKyAgLyogRXN0aW1hdGUgdGhlIHNpemUgb2YgdGhlIHVucm9sbGVkIGZpcnN0 IGl0ZXJhdGlvbi4gICovCisgIGlmICghcHJvY2Vzc19sb29wICgpKQorICAgIHJldHVybiB0cnVl OworCisgIC8qIERldGVybWluZSB3aGV0aGVyIHRoZSBJVnMgd2lsbCBzdGF5IGNvbnN0YW50ICh3 ZSBzaW1wbHkgYXNzdW1lIHRoYXQKKyAgICAgaWYgdGhlIDJuZCBpdGVyYXRpb24gcmVjZWl2ZXMg YSBjb25zdGFudCB2YWx1ZSB0aGUgdGhpcmQgYW5kIGFsbAorICAgICBmdXJ0aGVyIHdpbGwgc28g YXMgd2VsbCkuICAqLworICBmb3IgKGF1dG8gc2kgPSBnc2lfc3RhcnRfcGhpcyAobG9vcC0+aGVh ZGVyKTsKKyAgICAgICAhZ3NpX2VuZF9wIChzaSk7IGdzaV9uZXh0ICgmc2kpKQorICAgIHsKKyAg ICAgIGlmICh2aXJ0dWFsX29wZXJhbmRfcCAoZ2ltcGxlX3BoaV9yZXN1bHQgKCpzaSkpKQorCWNv bnRpbnVlOworICAgICAgdHJlZSBkZWYgPSBnaW1wbGVfcGhpX2FyZ19kZWZfZnJvbV9lZGdlICgq c2ksIGxvb3BfbGF0Y2hfZWRnZSAobG9vcCkpOworICAgICAgaWYgKENPTlNUQU5UX0NMQVNTX1Ag KGRlZikgfHwgdmFscy0+Z2V0IChkZWYpKQorCS8qID8/PyAgSWYgd2UgY29tcHV0ZSB0aGUgZmly c3QgaXRlcmF0aW9uIHNpemUgc2VwYXJhdGVseSB3ZQorCSAgIGNvdWxkIGFsc28gaGFuZGxlIGFu IGludmFyaWFudCBiYWNrZWRnZSB2YWx1ZSBtb3JlCisJICAgb3B0aW1pc3RpY2FsbHkuCisJICAg Pz8/ICBOb3RlIHRoZSBhY3R1YWwgdmFsdWUgd2UgbGVhdmUgaGVyZSBtYXkgc3RpbGwgaGF2ZSBh bgorCSAgIGVmZmVjdCBvbiB0aGUgY29uc3RhbnQtbmVzcy4gICovCisJOworICAgICAgZWxzZQor CXZhbHMtPnJlbW92ZSAoZ2ltcGxlX3BoaV9yZXN1bHQgKCpzaSkpOworICAgIH0KKworICAvKiBS 
ZXNldCBzaXplcyBhbmQgY29tcHV0ZSB0aGUgc2l6ZSBiYXNlZCBvbiB0aGUgYWRqdXN0bWVudCBh Ym92ZS4KKyAgICAgPz8/ICBXZSBjb3VsZCBrZWVwIHRoZSBtb3JlIHByZWNpc2UgYW5kIG9wdGlt aXN0aWMgY291bnRzIGZvcgorICAgICB0aGUgZmlyc3QgaXRlcmF0aW9uLiAgKi8KKyAgc2l6ZS0+ b3ZlcmFsbCA9IDA7CisgIHNpemUtPmVsaW1pbmF0ZWRfYnlfcGVlbGluZyA9IDA7CisgIHNpemUt Pmxhc3RfaXRlcmF0aW9uID0gMDsKKyAgc2l6ZS0+bGFzdF9pdGVyYXRpb25fZWxpbWluYXRlZF9i eV9wZWVsaW5nID0gMDsKKyAgc2l6ZS0+bnVtX3B1cmVfY2FsbHNfb25faG90X3BhdGggPSAwOwor ICBzaXplLT5udW1fbm9uX3B1cmVfY2FsbHNfb25faG90X3BhdGggPSAwOworICBzaXplLT5ub25f Y2FsbF9zdG10c19vbl9ob3RfcGF0aCA9IDA7CisgIHNpemUtPm51bV9icmFuY2hlc19vbl9ob3Rf cGF0aCA9IDA7CisgIHNpemUtPmNvbnN0YW50X2l2ID0gMDsKKyAgaWYgKCFwcm9jZXNzX2xvb3Ag KCkpCisgICAgcmV0dXJuIHRydWU7CisKICAgd2hpbGUgKHBhdGgubGVuZ3RoICgpKQogICAgIHsK ICAgICAgIGJhc2ljX2Jsb2NrIGJiID0gcGF0aC5wb3AgKCk7CkBAIC00MTIsMTMgKzQ3NywxMCBA QCB0cmVlX2VzdGltYXRlX2xvb3Bfc2l6ZSAoY2xhc3MgbG9vcCAqbG9vcCwgZWRnZSBleGl0LCBl ZGdlIGVkZ2VfdG9fY2FuY2VsLAogCSAgZWxzZSBpZiAoZ2ltcGxlX2NvZGUgKHN0bXQpICE9IEdJ TVBMRV9ERUJVRykKIAkgICAgc2l6ZS0+bm9uX2NhbGxfc3RtdHNfb25faG90X3BhdGgrKzsKIAkg IGlmICgoKGdpbXBsZV9jb2RlIChzdG10KSA9PSBHSU1QTEVfQ09ORAotCSAgICAgICAgJiYgKCFj b25zdGFudF9hZnRlcl9wZWVsaW5nIChnaW1wbGVfY29uZF9saHMgKHN0bXQpLCBzdG10LCBsb29w KQotCQkgICAgfHwgIWNvbnN0YW50X2FmdGVyX3BlZWxpbmcgKGdpbXBsZV9jb25kX3JocyAoc3Rt dCksIHN0bXQsCi0JCQkJCQlsb29wKSkpCisJCSYmICghdmFscy0+Z2V0IChnaW1wbGVfY29uZF9s aHMgKHN0bXQpKQorCQkgICAgfHwgIXZhbHMtPmdldCAoZ2ltcGxlX2NvbmRfcmhzIChzdG10KSkp KQogCSAgICAgICB8fCAoZ2ltcGxlX2NvZGUgKHN0bXQpID09IEdJTVBMRV9TV0lUQ0gKLQkJICAg JiYgIWNvbnN0YW50X2FmdGVyX3BlZWxpbmcgKGdpbXBsZV9zd2l0Y2hfaW5kZXggKAotCQkJCQkJ IGFzX2EgPGdzd2l0Y2ggKj4gKHN0bXQpKSwKLQkJCQkJICAgICAgIHN0bXQsIGxvb3ApKSkKKwkJ ICAgJiYgIXZhbHMtPmdldCAoZ2ltcGxlX3N3aXRjaF9pbmRleCAoYXNfYSA8Z3N3aXRjaCAqPiAo c3RtdCkpKSkpCiAJICAgICAgJiYgKCFleGl0IHx8IGJiICE9IGV4aXQtPnNyYykpCiAJICAgIHNp emUtPm51bV9icmFuY2hlc19vbl9ob3RfcGF0aCsrOwogCX0KQEAgLTQzMCw2ICs0OTIsOCBAQCB0 
cmVlX2VzdGltYXRlX2xvb3Bfc2l6ZSAoY2xhc3MgbG9vcCAqbG9vcCwgZWRnZSBleGl0LCBlZGdl IGVkZ2VfdG9fY2FuY2VsLAogCSAgICAgc2l6ZS0+bGFzdF9pdGVyYXRpb25fZWxpbWluYXRlZF9i eV9wZWVsaW5nKTsKIAogICBmcmVlIChib2R5KTsKKyAgZGVsZXRlIHZhbHM7CisgIHZhbHMgPSBu dWxscHRyOwogICByZXR1cm4gZmFsc2U7CiB9CiAKLS0gCjIuMzUuMwoK --0000000000003931b906185100f0--
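
The size estimate under discussion in the review can be sketched as a small
standalone C model, to make the effect of the 2/3 factor concrete. This is
only an illustration, not the GCC code: struct loop_size_model, the function
name and the assume_simplify flag are hypothetical, and the subtraction of
eliminated_by_peeling in the first step is an assumption, since those lines
fall between the two quoted hunks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for GCC's struct loop_size; only the fields that
   the quoted hunks of estimated_unrolled_size read are modelled.  */
struct loop_size_model
{
  long overall;
  long eliminated_by_peeling;
  long last_iteration;
  long last_iteration_eliminated_by_peeling;
};

/* Model of the unrolled-size estimate.  ASSUME_SIMPLIFY is a hypothetical
   flag standing in for whatever condition ends up deciding whether the
   optimistic 1/3 reduction is applied.  */
static long
estimated_unrolled_size_model (const struct loop_size_model *size,
                               long nunroll, bool assume_simplify)
{
  assert (size != NULL && nunroll >= 0);

  /* NUNROLL copies of the body, minus what peeling is expected to remove
     (assumed; this part is elided between the quoted hunks).  */
  long unr_insns = nunroll * (size->overall - size->eliminated_by_peeling);

  /* Add the last iteration, which is accounted separately.  */
  unr_insns += size->last_iteration
               - size->last_iteration_eliminated_by_peeling;

  /* Optimistic guess that later cleanups remove a third of the code.  */
  if (assume_simplify)
    unr_insns = unr_insns * 2 / 3;

  /* Never report an empty or negative size.  */
  if (unr_insns <= 0)
    unr_insns = 1;
  return unr_insns;
}
```

For example, with overall = 12, eliminated_by_peeling = 3, last_iteration = 10,
last_iteration_eliminated_by_peeling = 4 and 16 copies, the model yields 150
without the reduction and 100 with it, so any size limit between those two
values flips the unrolling decision.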