From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 22126 invoked by alias); 19 Oct 2018 03:10:57 -0000 Mailing-List: contact gsl-discuss-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gsl-discuss-owner@sourceware.org Received: (qmail 22105 invoked by uid 89); 19 Oct 2018 03:10:56 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-26.6 required=5.0 tests=AWL,BAYES_00,FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.2 spammy=HX-Received:e28a, website, HTo:U*gsl-discuss X-HELO: mail-oi1-f177.google.com Received: from mail-oi1-f177.google.com (HELO mail-oi1-f177.google.com) (209.85.167.177) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 19 Oct 2018 03:10:55 +0000 Received: by mail-oi1-f177.google.com with SMTP id 22-v6so25757501oiz.2 for ; Thu, 18 Oct 2018 20:10:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:from:date:message-id:subject:to; bh=H79osXwwEJkLwV/qRLcLx6mROxFPbEmkwgfcuePOleY=; b=XJdK7+cwFTW/6evIj/FSGRq3mh7IS+XPT9lLOKu8+XxY0hXK7htBYP1O2TAwmAK2CY PTJoGuX1DtBcgnqAa+xv/kA4yX9XmQN5bdFuCGHA0NxL8zhrdrYZUFDcxaGWJ60KtYRI h19WF2C0NboYMx8ysnkisw2MUpF8EosOnf1BYsz6mRC+lrHi/PjrURP9aWwo0hDtNnTM 2+k1vdoaxNuPKNXFlDkHclFabHSffJcaV4lbgYtE/SUgCv26QV0/+uTFxDdQ4V89vDS3 uAFZ1d9iX507jNsZ1e1tVs32iF01rDjjagkgME6x6Gu6gTh3qY3fBwi3YI8CqhJXDAQE Xsmg== MIME-Version: 1.0 From: Max Bruce Date: Fri, 19 Oct 2018 03:10:00 -0000 Message-ID: Subject: Reduce cache misses for source_gemm_r To: gsl-discuss@sourceware.org Content-Type: multipart/mixed; boundary="00000000000056747505788c40d5" X-SW-Source: 2018-q4/txt/msg00000.txt.bz2 --00000000000056747505788c40d5 Content-Type: text/plain; charset="UTF-8" Content-length: 437 Hey guys, I'm new to contributing to GNU projects, but... 
I'm guessing I send commits through here? Would appreciate some sort of note on the procedure on the website. I noticed that your matrix multiplication code had bad cache performance due to a misordering of a loop. In a replicated version of my change, I saw about 20% performance gains on my AMD FX CPU. Do let me know if this is not the correct contribution procedure. -Max --00000000000056747505788c40d5 Content-Type: text/x-patch; charset="US-ASCII"; name="0001-Reduce-cache-misses-for-source_gemm_r.patch" Content-Disposition: attachment; filename="0001-Reduce-cache-misses-for-source_gemm_r.patch" Content-Transfer-Encoding: base64 Content-ID: X-Attachment-Id: f_jnffss7q0 Content-length: 2201 RnJvbSAwMzQ1ZWFmMmViNDg5OTdmYTNkMDBmYWUyYjM3Y2Y0MTZkMzcxM2Q0 IE1vbiBTZXAgMTcgMDA6MDA6MDAgMjAwMQpGcm9tOiBKYXZhUHJvcGhldCA8 bWF4LmJydWNlMTJAZ21haWwuY29tPgpEYXRlOiBUaHUsIDE4IE9jdCAyMDE4 IDIwOjAwOjQ3IC0wNzAwClN1YmplY3Q6IFtQQVRDSF0gUmVkdWNlIGNhY2hl IG1pc3NlcyBmb3Igc291cmNlX2dlbW1fcgoKLS0tCiBjYmxhcy9zb3VyY2Vf Z2VtbV9yLmggfCAxMCArKysrKy0tLS0tCiAxIGZpbGUgY2hhbmdlZCwgNSBp bnNlcnRpb25zKCspLCA1IGRlbGV0aW9ucygtKQoKZGlmZiAtLWdpdCBhL2Ni bGFzL3NvdXJjZV9nZW1tX3IuaCBiL2NibGFzL3NvdXJjZV9nZW1tX3IuaApp bmRleCBhMDA4ZDIyLi43Yzk4NDhlIDEwMDY0NAotLS0gYS9jYmxhcy9zb3Vy Y2VfZ2VtbV9yLmgKKysrIGIvY2JsYXMvc291cmNlX2dlbW1fci5oCkBAIC03 MSw4ICs3MSw4IEBACiAKICAgICAvKiBmb3JtICBDIDo9IGFscGhhKkEqQiAr IEMgKi8KIAotICAgIGZvciAoayA9IDA7IGsgPCBLOyBrKyspIHsKLSAgICAg IGZvciAoaSA9IDA7IGkgPCBuMTsgaSsrKSB7CisgICAgZm9yIChpID0gMDsg aSA8IG4xOyBpKyspIHsKKyAgICAgIGZvciAoayA9IDA7IGsgPCBLOyBrKysp IHsKICAgICAgICAgY29uc3QgQkFTRSB0ZW1wID0gYWxwaGEgKiBGW2xkZiAq IGkgKyBrXTsKICAgICAgICAgaWYgKHRlbXAgIT0gMC4wKSB7CiAgICAgICAg ICAgZm9yIChqID0gMDsgaiA8IG4yOyBqKyspIHsKQEAgLTg2LDggKzg2LDgg QEAKIAogICAgIC8qIGZvcm0gIEMgOj0gYWxwaGEqQSpCJyArIEMgKi8KIAor ICBmb3IgKGogPSAwOyBqIDwgbjI7IGorKykgewogICAgIGZvciAoaSA9IDA7 IGkgPCBuMTsgaSsrKSB7Ci0gICAgICBmb3IgKGogPSAwOyBqIDwgbjI7IGor KykgewogICAgICAgICBCQVNFIHRlbXAgPSAwLjA7CiAgICAgICAgIGZvciAo 
ayA9IDA7IGsgPCBLOyBrKyspIHsKICAgICAgICAgICB0ZW1wICs9IEZbbGRm ICogaSArIGtdICogR1tsZGcgKiBqICsga107CkBAIC05OCw4ICs5OCw4IEBA CiAKICAgfSBlbHNlIGlmIChUcmFuc0YgPT0gQ2JsYXNUcmFucyAmJiBUcmFu c0cgPT0gQ2JsYXNOb1RyYW5zKSB7CiAKKyAgZm9yIChpID0gMDsgaSA8IG4x OyBpKyspIHsKICAgICBmb3IgKGsgPSAwOyBrIDwgSzsgaysrKSB7Ci0gICAg ICBmb3IgKGkgPSAwOyBpIDwgbjE7IGkrKykgewogICAgICAgICBjb25zdCBC QVNFIHRlbXAgPSBhbHBoYSAqIEZbbGRmICogayArIGldOwogICAgICAgICBp ZiAodGVtcCAhPSAwLjApIHsKICAgICAgICAgICBmb3IgKGogPSAwOyBqIDwg bjI7IGorKykgewpAQCAtMTExLDggKzExMSw4IEBACiAKICAgfSBlbHNlIGlm IChUcmFuc0YgPT0gQ2JsYXNUcmFucyAmJiBUcmFuc0cgPT0gQ2JsYXNUcmFu cykgewogCisgIGZvciAoaiA9IDA7IGogPCBuMjsgaisrKSB7CiAgICAgZm9y IChpID0gMDsgaSA8IG4xOyBpKyspIHsKLSAgICAgIGZvciAoaiA9IDA7IGog PCBuMjsgaisrKSB7CiAgICAgICAgIEJBU0UgdGVtcCA9IDAuMDsKICAgICAg ICAgZm9yIChrID0gMDsgayA8IEs7IGsrKykgewogICAgICAgICAgIHRlbXAg Kz0gRltsZGYgKiBrICsgaV0gKiBHW2xkZyAqIGogKyBrXTsKLS0gCjIuNy40 Cgo= --00000000000056747505788c40d5--
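The loop-order point made in the message can be illustrated with a small standalone sketch. This is not the GSL source: the function names `gemm_kij` and `gemm_ikj`, the `double` element type, the `size_t` indices, and the row-major layout are all assumptions made for illustration. Both functions compute C := alpha*A*B + C and differ only in whether `k` or `i` is the outermost loop.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative sketch only: hypothetical names, not the GSL cblas code.
   Row-major storage: A is M x K, B is K x N, C is M x N. */

/* k outermost: for a fixed k, the i loop reads A[lda * i + k], i.e. it
   walks down a column of A with stride lda, and it touches a different
   row of C on every iteration. */
void gemm_kij(size_t M, size_t N, size_t K, double alpha,
              const double *A, size_t lda,
              const double *B, size_t ldb,
              double *C, size_t ldc)
{
  assert(lda >= K && ldb >= N && ldc >= N);
  for (size_t k = 0; k < K; k++) {
    for (size_t i = 0; i < M; i++) {
      const double temp = alpha * A[lda * i + k];
      if (temp != 0.0) {
        for (size_t j = 0; j < N; j++) {
          C[ldc * i + j] += temp * B[ldb * k + j];
        }
      }
    }
  }
}

/* i outermost: for a fixed i, the k loop reads A[lda * i + k] along a
   row of A (unit stride) and keeps updating the same row of C. */
void gemm_ikj(size_t M, size_t N, size_t K, double alpha,
              const double *A, size_t lda,
              const double *B, size_t ldb,
              double *C, size_t ldc)
{
  assert(lda >= K && ldb >= N && ldc >= N);
  for (size_t i = 0; i < M; i++) {
    for (size_t k = 0; k < K; k++) {
      const double temp = alpha * A[lda * i + k];
      if (temp != 0.0) {
        for (size_t j = 0; j < N; j++) {
          C[ldc * i + j] += temp * B[ldb * k + j];
        }
      }
    }
  }
}
```

For each element of C, both orderings add the same terms in the same k order, so the results should agree exactly. Only the memory access pattern changes. Whether the interchange pays off, and by how much, depends on the matrix sizes and the machine, so it is worth timing on the target hardware.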