Added fixed alloc size graphs

Zhengyi Chen 2024-03-20 13:30:24 +00:00
parent fc777526ce
commit 7b475ae100
6 changed files with 42 additions and 6 deletions

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.


@@ -39,6 +39,9 @@
 % -> href (LOAD LAST)
 \usepackage{hyperref}
 % <- href
+% -> subfigures
+\usepackage{subcaption}
+% <- subfigures
 \begin{document}
 \begin{preliminary}
@@ -71,7 +74,7 @@
 \date{\today}
 \abstract{
-\textcolor{red}{To be done\dots}
+\textcolor{red}{[TODO] \dots}
 }
 \maketitle
@@ -105,6 +108,8 @@ from the Informatics Research Ethics committee.
 \begin{acknowledgements}
+\textcolor{red}{[TODO]:}
 \textcolor{red}{For unbounded peace and happiness among all peoples of the world.}
 \textcolor{red}{May we, one day, be able to see each other as equals.}
@@ -390,7 +395,7 @@ void dma_sync_single_for_cpu(
 if (!dev_is_dma_coherent(dev)) {
 arch_sync_dma_for_cpu(paddr, size, dir);
-arch_sync_dma_for_cpu_all(); // MIPS quirks...
+arch_sync_dma_for_cpu_all(); // MIPS quirks, nop for ARM64
 }
 /* Miscellaneous cases...*/
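(Context for the comment above: arch_sync_dma_for_cpu_all has a real body only on MIPS; every other architecture, ARM64 included, gets an empty inline stub, and the per-range hook does the actual cache maintenance. A sketch paraphrasing the kernel's own v6.x-era definitions follows; exact signatures and file locations vary by kernel version.)

#include <linux/dma-map-ops.h>

/* arch/arm64/mm/dma-mapping.c: the per-range hook does the real work. */
void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
			   enum dma_data_direction dir)
{
	unsigned long start = (unsigned long)phys_to_virt(paddr);

	/* CPU-to-device only: the CPU's view is already current. */
	if (dir == DMA_TO_DEVICE)
		return;

	/* Invalidate to the Point of Coherency before the CPU reads. */
	dcache_inval_poc(start, start + size);
}

/*
 * Generic fallback: only MIPS selects ARCH_HAS_SYNC_DMA_FOR_CPU_ALL,
 * so on ARM64 this compiles to nothing -- hence "nop for ARM64".
 */
#ifndef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL
static inline void arch_sync_dma_for_cpu_all(void)
{
}
#endif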
@@ -593,7 +598,7 @@ The primary source of experimental data come from a virtualized machine: a virtu
 \label{table:star}
 \end{table}
-\footnotetext[3]{As reported from \texttt{lscpu}.}
+\footnotetext[3]{As reported from \texttt{lscpu}. Likely not reflective of actual emulation performance.}
 \begin{table}[h]
 \centering
@@ -655,7 +660,7 @@ The primary source of experimental data come from a virtualized machine: a virtu
 \label{table:rose}
 \end{table}
-Additional to virtualized testbench, I have had the honor to access \texttt{rose}, a ARMv8 server rack system hosted by the \textcolor{red}{Systems Group} at the \textit{Informatics Forum}, through the invaluable assistance of my primary advisor, \textit{Amir Noohi}, for instrumentation of similar experimental setups on server-grade bare-metal systems.
+In addition to the virtualized testbench, I have had the honor of accessing \texttt{rose}, an ARMv8 server rack system hosted by the \textcolor{red}{[TODO] PLACEHOLDER} at the \textit{Informatics Forum}, through the invaluable assistance of my primary advisor, \textit{Amir Noohi}, for instrumentation of similar experimental setups on server-grade bare-metal systems.
 The specifications of \texttt{rose} are listed in Table~\ref{table:rose}.
@@ -929,7 +934,7 @@ Several implementation quirks that warrant attention are as follows:
 \item {\label{quirk:__my_shmem_fault_remap}
 \texttt{\_\_my\_shmem\_fault\_remap} serves as the inner logic for when the outer page-fault-handling (allocation) logic deems that a sufficient number of pages exist for handling the current page fault. As its name suggests, it finds and remaps the correct allocation into the page fault's parent VMA (assuming, of course, that such an allocation exists).
-The logic of this function is similar to \hyperref[para:file_operations]{\texttt{my\_shmem\_fops\_mmap}}. For a code excerpt listing, refer to \textcolor{red}{Appendix ???}.
+The logic of this function is similar to \hyperref[para:file_operations]{\texttt{my\_shmem\_fops\_mmap}}. For a code excerpt listing, refer to \textcolor{red}{[TODO] Appendix ???}.
 }
 \end{enumerate}
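(The quirk item above defers its code excerpt to an appendix; as a stopgap illustration, here is a hypothetical sketch of such a remap path built from stock kernel primitives. struct vm_fault and vmf_insert_page are real kernel APIs, while my_alloc and its fields are invented bookkeeping, not the module's actual names.)

#include <linux/mm.h>

/* Hypothetical per-mapping bookkeeping; not the module's real layout. */
struct my_alloc {
	struct page **pages;		/* pre-allocated backing pages */
	unsigned long nr_pages;
};

/*
 * Inner remap path: the outer fault handler has already ensured that
 * enough pages exist, so we only locate the right page and map it back
 * into the faulting VMA, much like an fops->mmap would.
 */
static vm_fault_t my_shmem_fault_remap_sketch(struct vm_fault *vmf)
{
	struct my_alloc *alloc = vmf->vma->vm_private_data;
	pgoff_t off = vmf->pgoff;

	if (off >= alloc->nr_pages)	/* no such allocation after all */
		return VM_FAULT_SIGBUS;

	/* Install the pre-allocated page at the faulting address. */
	return vmf_insert_page(vmf->vma, vmf->address, alloc->pages[off]);
}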
@@ -1006,11 +1011,42 @@ Because we do not inline \texttt{\_\_dcache\_clean\_poc}, we are able to include
 \texttt{bcc-tools}, on the other hand, provides an array of handy instrumentation tools that are compiled just-in-time into \textit{BPF} programs and run inside an in-kernel virtual machine. How BPF programs are parsed and run inside the Linux kernel is documented in the kernel documentation \cite{N/A.Kernelv6.7-libbpf.2023}. The ability of \texttt{bcc}/\texttt{libbpf} programs to interface with both userspace and kernelspace function-tracing mechanisms makes \texttt{bcc-tools} ideal as an easy tracing interface for both userspace and kernelspace tracing.
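(For illustration, a minimal BPF-side C fragment of the sort bcc compiles just-in-time: it histograms the range sizes seen by the non-inlined __dcache_clean_poc. The (start, end) parameter list mirrors the kernel's dcache_clean_poc and is an assumption here, as is the symbol being visible to kprobes.)

#include <uapi/linux/ptrace.h>

/* log2 histogram of maintained-range sizes, filled from the kprobe. */
BPF_HISTOGRAM(range_size);

/*
 * bcc auto-attaches functions named kprobe__<symbol>; parameters after
 * ctx are rewritten into the probed function's own arguments.
 */
int kprobe____dcache_clean_poc(struct pt_regs *ctx,
			       unsigned long start, unsigned long end)
{
	range_size.increment(bpf_log2l(end - start));
	return 0;
}

(The Python side would load this with BPF(text=...) and dump it with b["range_size"].print_log2_hist("bytes").)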
 \subsection{Userspace Programs}
-Finally, two simple userspace programs are written to invoke the corresponding kernelspace callback operations -- namely, allocation and cleaning of kernel buffers for simulating DMA behaviors. To achieve this, it simply \texttt{mmap}s the amount of pages passed in as argument and either reads or writes the entirety of the buffer (which differentiates the two programs). A listing of their logic is at \textcolor{red}{Appendix ???}.
+Finally, two simple userspace programs are written to invoke the corresponding kernelspace callback operations -- namely, allocation and cleaning of kernel buffers to simulate DMA behavior. To achieve this, each simply \texttt{mmap}s the number of pages passed in as an argument and either reads or writes the entirety of the buffer (which is what differentiates the two programs). A listing of their logic is at \textcolor{red}{[TODO] Appendix ???}.
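(A hedged sketch of what one such driver program can look like; the device node path is a placeholder, and the write variant is shown -- a read variant would sum the buffer instead of memset-ting it.)

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	long pages = (argc > 1) ? atol(argv[1]) : 1;
	size_t len = (size_t)pages * (size_t)sysconf(_SC_PAGESIZE);
	int fd = open("/dev/my_shmem", O_RDWR);	/* hypothetical node */

	if (fd < 0) {
		perror("open");
		return 1;
	}

	unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
				  MAP_SHARED, fd, 0);
	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	memset(buf, 0xA5, len);	/* write variant: dirty every page */

	munmap(buf, len);
	close(fd);
	return 0;
}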
 \section{Results}\label{sec:sw-coherency-results}
 \subsection{Controlled Allocation Size; Variable Page Count}
+\textcolor{red}{[TODO] See \ref{fig:coherency-op-per-page-alloc}, \ref{fig:coherency-op-tlb}.}
+\begin{figure}[h]
+\centering
+\begin{subfigure}{.8\textwidth}
+\centering
+\includegraphics[width=\textwidth]{graphics/out-95p-new.pdf}
+\end{subfigure}
+\begin{subfigure}{.8\textwidth}
+\centering
+\includegraphics[width=\textwidth]{graphics/out-log-new.pdf}
+\end{subfigure}
+\caption{Per-page allocation, coherency operations}
+\label{fig:coherency-op-per-page-alloc}
+\end{figure}
+\begin{figure}[h]
+\centering
+\begin{subfigure}{.8\textwidth}
+\centering
+\includegraphics[width=\textwidth]{graphics/tlb-95p.pdf}
+\end{subfigure}
+\begin{subfigure}{.8\textwidth}
+\centering
+\includegraphics[width=\textwidth]{graphics/tlb-log.pdf}
+\end{subfigure}
+\caption{Per-page allocation, TLB operations}
+\label{fig:coherency-op-tlb}
+\end{figure}
 \subsection{Controlled Page Count; Variable Allocation Size}
+\textcolor{red}{[TODO] Didn't make the graphs yet\dots Run some bcc-tools capture and draw a gnuplot.}
 \section{Discussion}\label{sec:sw-coherency-discuss}
 % - you should also measure the access latency after coherency operation, though this is impl-specific (e.g., one vendor can have a simple PoC mechanism where e.g. you have a shared L2-cache that is snooped by DMA engine, hence flush to L2-cache and call it a day for PoC; but another can just as well call main mem the PoC, dep. on impl.)