diff --git a/tex/draft/graphics/out-95p-new.pdf b/tex/draft/graphics/out-95p-new.pdf
new file mode 100644
index 0000000..3342380
Binary files /dev/null and b/tex/draft/graphics/out-95p-new.pdf differ
diff --git a/tex/draft/graphics/out-log-new.pdf b/tex/draft/graphics/out-log-new.pdf
new file mode 100644
index 0000000..33f93e9
Binary files /dev/null and b/tex/draft/graphics/out-log-new.pdf differ
diff --git a/tex/draft/graphics/tlb-95p.pdf b/tex/draft/graphics/tlb-95p.pdf
new file mode 100644
index 0000000..8fdc6e5
Binary files /dev/null and b/tex/draft/graphics/tlb-95p.pdf differ
diff --git a/tex/draft/graphics/tlb-log.pdf b/tex/draft/graphics/tlb-log.pdf
new file mode 100644
index 0000000..91055e5
Binary files /dev/null and b/tex/draft/graphics/tlb-log.pdf differ
diff --git a/tex/draft/skeleton.pdf b/tex/draft/skeleton.pdf
index 22b5350..e2b8bad 100644
Binary files a/tex/draft/skeleton.pdf and b/tex/draft/skeleton.pdf differ
diff --git a/tex/draft/skeleton.tex b/tex/draft/skeleton.tex
index c5f40c8..5d1c29b 100644
--- a/tex/draft/skeleton.tex
+++ b/tex/draft/skeleton.tex
@@ -39,6 +39,9 @@
 % -> href (LOAD LAST)
 \usepackage{hyperref}
 % <- href
+% -> subfigures
+\usepackage{subcaption}
+% <- subfigures
 
 \begin{document}
 \begin{preliminary}
@@ -71,7 +74,7 @@
 \date{\today}
 
 \abstract{
-    \textcolor{red}{To be done\dots}
+    \textcolor{red}{[TODO] \dots}
 }
 
 \maketitle
@@ -105,6 +108,8 @@ from the Informatics Research Ethics committee.
 \begin{acknowledgements}
+\textcolor{red}{[TODO]:}
+
 \textcolor{red}{For unbounded peace and happiness among all peoples of the world.}
 
 \textcolor{red}{May we, one day, be able to see each other as equals.}
@@ -390,7 +395,7 @@ void dma_sync_single_for_cpu(
     if (!dev_is_dma_coherent(dev)) {
         arch_sync_dma_for_cpu(paddr, size, dir);
-        arch_sync_dma_for_cpu_all(); // MIPS quirks...
+        arch_sync_dma_for_cpu_all(); // MIPS quirks, nop for ARM64
     }
 
     /* Miscellaneous cases...*/
@@ -593,7 +598,7 @@ The primary source of experimental data come from a virtualized machine: a virtu
     \label{table:star}
 \end{table}
 
-\footnotetext[3]{As reported from \texttt{lscpu}.}
+\footnotetext[3]{As reported by \texttt{lscpu}. Likely not reflective of actual emulation performance.}
 
 \begin{table}[h]
     \centering
@@ -655,7 +660,7 @@ The primary source of experimental data come from a virtualized machine: a virtu
     \label{table:rose}
 \end{table}
 
-Additional to virtualized testbench, I have had the honor to access \texttt{rose}, a ARMv8 server rack system hosted by the \textcolor{red}{Systems Group} at the \textit{Informatics Forum}, through the invaluable assistance of my primary advisor, \textit{Amir Noohi}, for instrumentation of similar experimental setups on server-grade bare-metal systems.
+In addition to the virtualized testbench, I have had the honor to access \texttt{rose}, an ARMv8 server rack system hosted by the \textcolor{red}{[TODO] PLACEHOLDER} at the \textit{Informatics Forum}, through the invaluable assistance of my primary advisor, \textit{Amir Noohi}, for instrumentation of similar experimental setups on server-grade bare-metal systems.
 
 The specifications of \texttt{rose} is listed in table \ref{table:rose}.
@@ -929,7 +934,7 @@ Several implementation quirks that warrant attention are as follows:
     \item {\label{quirk:__my_shmem_fault_remap}
         \texttt{\_\_my\_shmem\_fault\_remap} serves as inner logic for when outer page fault handling (allocation) logic deems that a sufficient number of pages exist for handling the current page fault. As its name suggests, it finds and remaps the correct allocation into the page fault's parent VMA (assuming that such allocation, of course, exists).
 
-        The logic of this function is similar to \hyperref[para:file_operations]{\texttt{my\_shmem\_fops\_mmap}}. For a code excerpt listing, refer to \textcolor{red}{Appendix ???}.
+        The logic of this function is similar to \hyperref[para:file_operations]{\texttt{my\_shmem\_fops\_mmap}}. For a code excerpt listing, refer to \textcolor{red}{[TODO] Appendix ???}.
     }
 \end{enumerate}
@@ -1006,11 +1011,42 @@ Because we do not inline \texttt{\_\_dcache\_clean\_poc}, we are able to include
 \texttt{bcc-tools}, on the other hand, provide an array of handy instrumentation tools that is compiled just-in-time into \textit{BPF} programs and ran inside a in-kernel virtual machine. Description of how BPF programs are parsed and run inside the Linux kernel is documented in the kernel documentations \cite{N/A.Kernelv6.7-libbpf.2023}. The ability of \texttt{bcc}/\texttt{libbpf} programs to interface with both userspace and kernelspace function tracing mechanisms make \texttt{bcc-tools} ideal as a easy tracing interface for both userspace and kernelspace tracing.
 
 \subsection{Userspace Programs}
-Finally, two simple userspace programs are written to invoke the corresponding kernelspace callback operations -- namely, allocation and cleaning of kernel buffers for simulating DMA behaviors. To achieve this, it simply \texttt{mmap}s the amount of pages passed in as argument and either reads or writes the entirety of the buffer (which differentiates the two programs). A listing of their logic is at \textcolor{red}{Appendix ???}.
+Finally, two simple userspace programs are written to invoke the corresponding kernelspace callback operations -- namely, allocation and cleaning of kernel buffers for simulating DMA behaviors. To achieve this, each program simply \texttt{mmap}s the number of pages passed in as an argument and either reads or writes the entirety of the buffer (which differentiates the two programs). A listing of their logic is given in \textcolor{red}{[TODO] Appendix ???}.
 \section{Results}\label{sec:sw-coherency-results}
 \subsection{Controlled Allocation Size; Variable Page Count}
+\textcolor{red}{[TODO] See \ref{fig:coherency-op-per-page-alloc}, \ref{fig:coherency-op-tlb}.}
+
+\begin{figure}[h]
+    \centering
+    \begin{subfigure}{.8\textwidth}
+        \centering
+        \includegraphics[width=\textwidth]{graphics/out-95p-new.pdf}
+    \end{subfigure}
+    \begin{subfigure}{.8\textwidth}
+        \centering
+        \includegraphics[width=\textwidth]{graphics/out-log-new.pdf}
+    \end{subfigure}
+    \caption{Per-page allocation, coherency operations}
+    \label{fig:coherency-op-per-page-alloc}
+\end{figure}
+
+\begin{figure}[h]
+    \centering
+    \begin{subfigure}{.8\textwidth}
+        \centering
+        \includegraphics[width=\textwidth]{graphics/tlb-95p.pdf}
+    \end{subfigure}
+    \begin{subfigure}{.8\textwidth}
+        \centering
+        \includegraphics[width=\textwidth]{graphics/tlb-log.pdf}
+    \end{subfigure}
+    \caption{Per-page allocation, TLB operations}
+    \label{fig:coherency-op-tlb}
+\end{figure}
+
 \subsection{Controlled Page Count; Variable Allocation Size}
+\textcolor{red}{[TODO] Didn't make the graphs yet\dots Run some bcc-tools capture and draw a gnuplot.}
 
 \section{Discussion}\label{sec:sw-coherency-discuss}
 % - you should also measure the access latency after coherency operation, though this is impl-specific (e.g., one vendor can have a simple PoC mechanism where e.g. you have a shared L2-cache that is snooped by DMA engine, hence flush to L2-cache and call it a day for PoC; but another can just as well call main mem the PoC, dep. on impl.)