hw1: done ex1
parent 27fc66cf14
commit 262701b276
5 changed files with 6398 additions and 2 deletions
3183 Project1/generic_cluster.pdf (Normal file; file diff suppressed because it is too large)
3173 Project1/generic_macos.pdf (Normal file; file diff suppressed because it is too large)
Binary file not shown.
@@ -18,6 +18,8 @@ on the ICS Cluster .
\section{Explaining Memory Hierarchies \punkte{25}}

\subsection{Memory Hierarchy Parameters of the Cluster}

Using \texttt{likwid-topology} for the cache topology and \texttt{free -g} for
the amount of primary memory, I find the following memory hierarchy parameters:

@@ -70,6 +72,44 @@ Socket 1:
+---------------------------------------------------------------------------------------------------------------+
\end{Verbatim}

\subsection{Memory Access Pattern of \texttt{membench.c}}

The benchmark \texttt{membench.c} measures the average time of repeated read and
write operations across a set of indices of a stack-allocated array of 32-bit
signed integers. The indices vary according to the access pattern used, which in
turn is defined by two variables, \texttt{csize} and \texttt{stride}.
\texttt{csize} is an exclusive upper bound on the index value, i.e. one more
than the highest index the pattern may use to access the array. \texttt{stride}
determines the difference between consecutive array indices, i.e. a
\texttt{stride} of 1 will access every array index, a \texttt{stride} of 2 will
skip every other index, a \texttt{stride} of 4 will access one index and then
skip 3, and so on.

Therefore, for \texttt{csize = 128} and \texttt{stride = 1} the benchmark will
access all indices between 0 and 127 sequentially, and for \texttt{csize =
$2^{20}$} and \texttt{stride = $2^{10}$} the benchmark will access index 0, then
index $2^{10}$, then index $2 \cdot 2^{10}$, and so on in steps of $2^{10}$ up
to index $2^{20} - 2^{10}$.

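The set of indices visited in one sweep can also be written out explicitly. The
following is only a sketch, assuming that a sweep starts at index 0 and that
consecutive accesses are exactly \texttt{stride} apart while the index stays
below \texttt{csize}; the symbols $I$ and $k$ are introduced here for
illustration and do not appear in \texttt{membench.c}:
\[
  I = \{\, k \cdot \texttt{stride} \;:\; k = 0, 1, 2, \dots \ \text{and}\ k \cdot \texttt{stride} < \texttt{csize} \,\}
\]
Under this assumption, \texttt{csize = $2^{20}$} and \texttt{stride = $2^{10}$}
give $I = \{0,\ 2^{10},\ 2 \cdot 2^{10},\ \dots,\ 1023 \cdot 2^{10}\}$, i.e.
$2^{10}$ accesses per sweep, the last one at index $2^{20} - 2^{10}$.
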
\subsection{Analyzing Benchmark Results}

The \texttt{membench.c} benchmark results for my personal laptop (MacBook Pro
2018 with a Core i7-8750H CPU) and for the cluster are shown below, in that
order:

\begin{center}
\includegraphics[width=12cm]{generic_macos.pdf}
\includegraphics[width=12cm]{generic_cluster.pdf}
\end{center}

The memory access graph for the cluster's benchmark results shows that temporal
locality is best for small array sizes and for small \texttt{stride} values.
In particular, for array memory sizes of 16~MB or lower (\texttt{csize} of $4
\cdot 2^{20}$ or lower) and \texttt{stride} values of 2048 or lower the mean
read+write time is less than 10 nanoseconds. Temporal locality is worst for
large sizes and strides, although the largest values of \texttt{stride} for each
size (like \texttt{csize / 2} and \texttt{csize / 4}) achieve better mean times
because the pattern accesses only a few elements (this observation also holds
for the largest strides of each size series shown in the graph).

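The figures quoted in this analysis follow from simple arithmetic. As a rough
sketch, assuming 4 bytes per 32-bit integer and one sweep from index 0 in steps
of \texttt{stride}, with $S$ (array footprint) and $N$ (elements touched per
sweep) as illustrative symbols only:
\[
  S = 4\,\mathrm{B} \cdot \texttt{csize} = 4\,\mathrm{B} \cdot 4 \cdot 2^{20} = 2^{24}\,\mathrm{B},
  \qquad
  N = \left\lceil \frac{\texttt{csize}}{\texttt{stride}} \right\rceil
\]
Here $2^{24}$ bytes is the 16~MB array size mentioned above. For
\texttt{stride = csize / 2} this gives $N = 2$ and for
\texttt{stride = csize / 4} it gives $N = 4$, which is consistent with the
better mean times of the largest strides being explained by the small number of
elements accessed.
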
\section{Optimize Square Matrix-Matrix Multiplication \punkte{60}}


@@ -1650,11 +1650,11 @@ LTb
 LCb setrgbcolor
 LCb setrgbcolor
 3774 4829 M
-[ [(Helvetica) 140.0 0.0 true true 0 (10-Core Intel\(R\) Xeon\(R\) CPU E3-1585L v5 )]
+[ [(Helvetica) 140.0 0.0 true true 0 (6-Core Intel\(R\) Core\(R\) CPU i7-8750H )]
 XYsave
 [(Helvetica) 140.0 0.0 true true 0 ( )]
 XYrestore
-[(Helvetica) 140.0 0.0 true true 0 (3.00GHz Read+Write \(ns\) Versus Stride)]
+[(Helvetica) 140.0 0.0 true true 0 (4.10GHz Read+Write \(ns\) Versus Stride)]
 ] -46.7 MCshow
 /Helvetica findfont 140 scalefont setfont
 LTb