hw1: done ex1

2022-09-27 10:39:48 +02:00 · 2022-09-27 10:39:48 +02:00 · 262701b276
commit 262701b276
parent 27fc66cf14
5 changed files with 6398 additions and 2 deletions
--- a/Project1/generic_cluster.pdf
+++ b/Project1/generic_cluster.pdf
--- a/Project1/generic_macos.pdf
+++ b/Project1/generic_macos.pdf
--- a/Project1/project_1_maggioni_claudio.pdf
+++ b/Project1/project_1_maggioni_claudio.pdf
--- a/Project1/project_1_maggioni_claudio.tex
+++ b/Project1/project_1_maggioni_claudio.tex
@@ -18,6 +18,8 @@ on the ICS Cluster .

 \section{Explaining Memory Hierarchies \punkte{25}}

+\subsection{Memory Hierarchy Parameters of the Cluster}
+
 By identifying the memory hierarchy parameters through \texttt{likwid-topology} 
 for the cache topology and \texttt{free -g} for the amount of primary memory I
 find the following values:
@@ -70,6 +72,44 @@ Socket 1:
 +---------------------------------------------------------------------------------------------------------------+
 \end{Verbatim}

+\subsection{Memory Access Pattern of \texttt{membench.c}}
+
+The benchmark \texttt{membench.c} measures the average time of repeated read and
+write operations across a set of indices of a stack-allocated array of 32-bit
+signed integers. The indices vary according to the access pattern used, which in
+turn is defined by two variables, \texttt{csize} and \texttt{stride}.
+\texttt{csize} is an exclusive upper bound on the index value, i.e. one more than
+the highest index that the pattern may access. \texttt{stride} is the difference
+between consecutive array indices in the pattern, i.e. a \texttt{stride} of 1
+accesses every array index, a \texttt{stride} of 2 skips every other index, a
+\texttt{stride} of 4 accesses one index and then skips 3, and so on.
+
+Therefore, for \texttt{csize = 128} and \texttt{stride = 1} the benchmark will
+access all indices between 0 and 127 sequentially, and for \texttt{csize =
+$2^{20}$} and \texttt{stride = $2^{10}$} the benchmark will access index 0, then
+index $2^{10}$, then index $2 \cdot 2^{10}$, and so on in steps of $2^{10}$ while
+the index stays below $2^{20}$.
+
+\subsection{Analyzing Benchmark Results}
+
+The \texttt{membench.c} benchmark results for my personal laptop (MacBook Pro
+2018 with a Core i7-8750H CPU) and for the cluster are shown below, in that order:
+
+\begin{center}
+\includegraphics[width=12cm]{generic_macos.pdf}
+\includegraphics[width=12cm]{generic_cluster.pdf}
+\end{center}
+
+The memory access graph for the cluster's benchmark results shows that temporal
+locality is best for small array sizes and for small \texttt{stride} values.
+In particular, for arrays of 16 MB or smaller (\texttt{csize} of $4 \cdot 2^{20}$
+or lower, at 4 bytes per element) and \texttt{stride} values of 2048 or lower,
+the mean read+write time is less than 10 nanoseconds. Temporal locality is worst
+for large sizes and strides, although the largest values of \texttt{stride} for
+each size (like \texttt{csize / 2} and \texttt{csize / 4}) achieve better mean
+times because the pattern accesses only a few elements (this observation also
+holds for the largest strides of each size series shown in the graph).
+
 \section{Optimize Square Matrix-Matrix Multiplication  \punkte{60}}


--- a/Project1/project_1_maggioni_claudio/membench/generic_macos.ps
+++ b/Project1/project_1_maggioni_claudio/membench/generic_macos.ps
@@ -1650,11 +1650,11 @@ LTb
 LCb setrgbcolor
 LCb setrgbcolor
 3774 4829 M
-[ [(Helvetica) 140.0 0.0 true true 0 (10-Core Intel\(R\) Xeon\(R\) CPU E3-1585L v5 )]
+[ [(Helvetica) 140.0 0.0 true true 0 (6-Core Intel\(R\) Core\(R\) CPU i7-8750H )]
 XYsave
 [(Helvetica) 140.0 0.0 true true 0 ( )]
 XYrestore
-[(Helvetica) 140.0 0.0 true true 0 (3.00GHz Read+Write \(ns\) Versus Stride)]
+[(Helvetica) 140.0 0.0 true true 0 (4.10GHz Read+Write \(ns\) Versus Stride)]
 ] -46.7 MCshow
 /Helvetica findfont 140 scalefont setfont
 LTb