public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [hsa-branch 0/5] Gridification support for tiling algorithms
@ 2016-06-09 14:01 Martin Jambor
  2016-06-09 14:01 ` [hsa-branch 4/5] New HSA builtins needed for tiling Martin Jambor
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Martin Jambor @ 2016-06-09 14:01 UTC (permalink / raw)
  To: GCC Patches

Hi,

this patch series, currently intended just for the branch but
eventually also for trunk in time for gcc 7, enables gridification,
that is the expansion of OpenMP loops for HSA GPUs, to work with
separate distribute and loop construct, provided that the step size of
the distribute loop is equal to the iteration size of iteration space
of the  "normal" loops in it.

It also allows the HSA back-end to emit group-segment variables and
expands any variables private to a distribute construct as such.

Apart from increased flexibility, one of the main motivations is to
enable tiling.  The patches enable the compiler to grok the matrix
multiplication code example and and emit it to HSA, which than runs
2.5 times faster (in my very non-scientific settings) than a naive
implementation (compiled for HSA).

Thanks,

Martin


#define BLOCK_SIZE 16

void tiled_sgemm_tt(const int M, const int N, const int K, const float alpha, const float*A, const int LDA,
   const float*B, const int LDB, const float beta, float*C, const int LDC){

#pragma omp target teams map(to:A[M*K],B[K*N]) map(from:C[M*N])
#pragma omp distribute collapse(2)
   for (int C_row_start=0 ; C_row_start < M ; C_row_start+=BLOCK_SIZE)
      for (int C_col_start=0 ; C_col_start < N ; C_col_start+=BLOCK_SIZE)
	{
         /* Each team has a local copy of these mini matrices */
         float As[BLOCK_SIZE][BLOCK_SIZE];
         float Bs[BLOCK_SIZE][BLOCK_SIZE];
#pragma omp parallel
	 {
         int C_row, C_col;
         float Cval = 0.0;

         for (int kblock = 0; kblock  < K ; kblock += BLOCK_SIZE )
	   {
#pragma omp for collapse(2)
	     for (int row=0 ; row < BLOCK_SIZE ; row++)
               for (int col=0 ; col < BLOCK_SIZE ; col++)
		 {
		   C_row = C_row_start + row;
		   C_col = C_col_start + col;
		   if ((C_row < M) && (kblock + col < K))
		     As[row][col] = A[(C_row*LDA)+ kblock + col];
		   else
		     As[row][col] = 0;
		   if ((kblock + row < K) && C_col < N)
		     Bs[row][col] = B[((kblock+row)*LDB)+ C_col];
		   else
		     Bs[row][col] = 0;
		 }

#pragma omp for collapse(2)
	     for (int row=0 ; row < BLOCK_SIZE ; row++)
	       for (int col=0 ; col < BLOCK_SIZE ; col++)
		 {
		   for (int e = 0; e < BLOCK_SIZE; ++e)
                     Cval += As[row][e] * Bs[e][col];
		 }
	   }  /* End for kblock .. */


#pragma omp for collapse(2)
         for (int row=0 ; row < BLOCK_SIZE ; row++)
	   for (int col=0 ; col < BLOCK_SIZE ; col++)
	     {
               C_row = C_row_start + row;
               C_col = C_col_start + col;
	       if ((C_row < M) && (C_col < N))
		 C[(C_row*LDC)+C_col] = alpha*Cval + beta*C[(C_row*LDC)+C_col];

	     }
         } /* end parallel */
      }	   /* end target teams distribute */
}

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2016-06-09 14:01 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-09 14:01 [hsa-branch 0/5] Gridification support for tiling algorithms Martin Jambor
2016-06-09 14:01 ` [hsa-branch 4/5] New HSA builtins needed for tiling Martin Jambor
2016-06-09 14:01 ` [hsa-branch 3/5] Reorganize HSA branches representation Martin Jambor
2016-06-09 14:01 ` [hsa-branch 5/5] OMP lowering/expansion changes to gridify tiled loops Martin Jambor
2016-06-09 14:01 ` [hsa-branch 1/5] Allow putting local variables into group and global segments Martin Jambor
2016-06-09 14:01 ` [hsa-branch 2/5] Make emit_insn_operands handle zero operands Martin Jambor

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).