From: Duke Abbaddon @ 2024-01-16 11:18 UTC
To: opencode

Matrix Method (c)RS

Any GPU & CPU SiMD unit can do a form of matrix maths as an
array-parallel load & run of consecutive tasks..

Like so:

Matrix Formulas : (c)RS

SiMD Array A to X, Usually 8, 16, 32, 64 Parallel Groups

Grouped Parallel Runs
A 1, 2, 3, N
B 1, 2, 3, N
to
X 1, 2, 3, N
Y 1, 2, 3, N
Run 1 {A1, B1 to X1, Y1}, Run 2 {A2, B2 to X2, Y2}, ... Run N {An, Bn to Xn, Yn}
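
A minimal sketch of the grouped run in C, assuming GCC/Clang vector
extensions (on-topic for gcc-help) and an illustrative multiply & add
body for the {A, B to X, Y} step; the real per-lane formula is whatever
the workload needs:

#include <stdio.h>

/* 8 parallel lanes, one of the 8/16/32/64 group sizes above. */
typedef float v8sf __attribute__((vector_size(32)));

int main(void)
{
    /* Run n handles element n of every array: {An, Bn to Xn, Yn},
     * all 8 lanes issued as one SIMD operation. */
    v8sf A = {1, 2, 3, 4, 5, 6, 7, 8};
    v8sf B = {8, 7, 6, 5, 4, 3, 2, 1};

    v8sf X = A * B;   /* X1..X8 in one vector op (assumed formula) */
    v8sf Y = A + B;   /* Y1..Y8 in one vector op (assumed formula) */

    for (int n = 0; n < 8; n++)
        printf("Run %d: X=%g Y=%g\n", n + 1, X[n], Y[n]);
    return 0;
}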

Matrix Processor Method: Synchronous Cube Map, Usually 8x8, 16x16,
32x32, 64x64 Parallel Quad++ Groups

2D:3D Cube

A 1, 2, 3, N
B 1, 2, 3, N
C 1, 2, 3, N
D 1, 2, 3, N

Run 1 2D:3D Cube {
A 1, 2, 3, N
B 1, 2, 3, N
C 1, 2, 3, N
D 1, 2, 3, N
}

Run N 2D:3D Cube {
A 1, 2, 3, N
B 1, 2, 3, N
C 1, 2, 3, N
D 1, 2, 3, N
}
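
One way to realise the cube-map runs in plain C, as a hedged sketch:
row-major square matrices, order n a multiple of the 8x8 tile, output C
zero-initialised; each tile is one synchronous "Run" that a matrix
engine would execute in lockstep:

#include <stddef.h>

#define TILE 8   /* one of the 8x8 .. 64x64 quad group sizes above */

/* C += A * B for the TILE x TILE tile whose top-left corner is (ti, tj). */
static void tile_mac(const float *A, const float *B, float *C,
                     size_t n, size_t ti, size_t tj)
{
    for (size_t i = ti; i < ti + TILE; i++)
        for (size_t j = tj; j < tj + TILE; j++) {
            float acc = C[i * n + j];
            for (size_t k = 0; k < n; k++)
                acc += A[i * n + k] * B[k * n + j];
            C[i * n + j] = acc;
        }
}

/* Run 1, Run 2, ... Run N: one call per tile of the output cube. */
void matmul_tiled(const float *A, const float *B, float *C, size_t n)
{
    for (size_t ti = 0; ti < n; ti += TILE)
        for (size_t tj = 0; tj < n; tj += TILE)
            tile_mac(A, B, C, n, ti, tj);
}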

Rupert Summerskill

ML Batch Matrix MAP in FPGA
https://drive.google.com/file/d/1hdxeK1r8LIhvpn7poOm3MfXmGr9Tq-ni/view?usp=sharing

ML Compressed Dynamic 16-bit to 8-bit: hardware-friendly compression
and hardware acceleration for ML Transformers
https://aimspress.com/article/doi/10.3934/era.2022192
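
As a rough illustration of the dynamic 16-bit to 8-bit idea, a generic
per-tensor scheme (not the paper's exact method): take the scale from
the live maximum, store signed 8-bit values, keep the scale around for
dequantisation:

#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* Dynamic per-tensor quantization: scale = max|x| / 127. */
float quantize_int8(const float *x, int8_t *q, size_t n)
{
    float amax = 0.0f;
    for (size_t i = 0; i < n; i++)
        if (fabsf(x[i]) > amax) amax = fabsf(x[i]);

    float scale = (amax > 0.0f) ? amax / 127.0f : 1.0f;
    for (size_t i = 0; i < n; i++)
        q[i] = (int8_t)lrintf(x[i] / scale);   /* round to nearest */
    return scale;                              /* needed to dequantize */
}

/* Recover the approximation: x ~= q * scale. */
void dequantize_int8(const int8_t *q, float *x, float scale, size_t n)
{
    for (size_t i = 0; i < n; i++)
        x[i] = (float)q[i] * scale;
}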

Matrix Processors - Memory & command - All-Digital Compute-In-Memory
FPGA Architecture for Deep Learning Acceleration
https://dl.acm.org/doi/pdf/10.1145/3640469

Matrix Processors - Inline Ram & Command { CMD : RAM }:{NET}
https://www.xilinx.com/content/dam/xilinx/support/documents/white_papers/wp506-ai-engine.pdf
https://www.xilinx.com/content/dam/xilinx/support/documents/white_papers/EW2020-Deep-Learning-Inference-AICore.pdf

TAC (Tiny Anomaly Compression)
https://pypi.org/project/Conect2ai/

Inference on any device with a C99 compiler
https://pypi.org/project/emlearn/

To run without a C99 toolchain, emlearn-micropython installs under Python 3.10+:
https://github.com/emlearn/emlearn-micropython
https://github.com/emlearn/emlearn-micropython/releases
git clone https://github.com/emlearn/emlearn-micropython
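
For flavour, the kind of dependency-free C99 inference code such tools
generate; this is not emlearn's actual generated API, the node table
and names here are illustrative only:

#include <stdio.h>

/* Hypothetical exported model: one node per row; a negative child
 * index encodes a leaf, with class = -(index) - 1. */
typedef struct { int feature; float threshold; int left, right; } node_t;

static const node_t tree[] = {
    {0, 0.5f,  1,  2},    /* node 0: x[0] < 0.5 ? node 1 : node 2 */
    {1, 1.5f, -1, -2},    /* node 1: leaf class 0 or class 1 */
    {1, 2.5f, -2, -3},    /* node 2: leaf class 1 or class 2 */
};

static int tree_predict(const float *x)
{
    int i = 0;
    while (i >= 0) {
        const node_t *n = &tree[i];
        i = (x[n->feature] < n->threshold) ? n->left : n->right;
    }
    return -i - 1;   /* decode the leaf class */
}

int main(void)
{
    float sample[2] = {0.2f, 2.0f};
    printf("class %d\n", tree_predict(sample));   /* prints: class 1 */
    return 0;
}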

Rupert S

https://is.gd/LEDSource

https://science.n-helix.com/2023/06/ptp.html
https://science.n-helix.com/2023/06/map.html
https://science.n-helix.com/2023/06/tops.html
https://science.n-helix.com/2022/01/ntp.html
https://science.n-helix.com/2023/02/pm-qos.html

https://science.n-helix.com/2023/07/3dchiplet.html

https://science.n-helix.com/2018/01/integer-floats-with-remainder-theory.html
https://science.n-helix.com/2021/02/multi-operation-maths.html
https://science.n-helix.com/2021/11/parallel-execution.html
https://science.n-helix.com/2022/12/math-error-solve.html
https://science.n-helix.com/2021/03/brain-bit-precision-int32-fp32-int16.html
https://science.n-helix.com/2022/10/ml.html

https://drive.google.com/file/d/1li5MDf5FFPMEdpsgX6OEpn79aWZE19PW/view?usp=drive_link
https://is.gd/HuffBrotliAE

ML tensor + ONNX Learner libraries & files
Model examples in models folder

https://is.gd/DictionarySortJS
https://is.gd/UpscalerUSB_ROM
https://is.gd/UpscaleWinDL
https://is.gd/HPC_HIP_CUDA

https://is.gd/OpenStreamingCodecs

ML With USB: Stress-Testing USB Accelerators for Efficient Edge
https://drive.google.com/file/d/1s2DORhFyvg0jT7AMhtTPdyPk0Aimdemi/view?usp=drive_link
https://github.com/raphischer/edge-acc

Tensor-light? With DOT4 4x8-bit packed U32 you can now tensor-light;
the effect on tri-linear upscaling is truly epic! You can use F32, F64,
float, SiMD & INT8, all in the same bag: ONNX Optimiser & Chrome
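
DOT4 in plain C for reference, assuming four signed 8-bit lanes packed
per U32 and a 32-bit accumulator; GPUs issue this as one instruction:

#include <stdint.h>

/* acc += dot(a, b) where a and b each pack 4 x int8 into a uint32_t. */
static int32_t dot4_i8_packed(uint32_t a, uint32_t b, int32_t acc)
{
    for (int i = 0; i < 4; i++) {
        int8_t ai = (int8_t)(a >> (8 * i));   /* sign-extend byte i */
        int8_t bi = (int8_t)(b >> (8 * i));
        acc += (int32_t)ai * (int32_t)bi;
    }
    return acc;
}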

Batch size scales with power & RAM: 240W>65W with 32 GB {64, 16};
15W>5W with 4 GB {16, 1}. 16, 8, 4 seems optimal; time taken is comparable:

Device                         Processor             RAM    Power  Batch
Desktop (Dell OptiPlex 7040)   Intel Core i7-6700    32 GB  240W   64
Laptop (Dell Latitude 7410)    Intel Core i7-5650U   32 GB  65W    16
RasPi (Raspberry Pi 4)         ARM Cortex-A72        4 GB   15W    16
Google Coral Accelerator       Edge TPU              —      4.5W   1
Intel Neural Compute Stick 2   Movidius Myriad VPU   4 GB   4.5W   1
