Can begin Results

This commit is contained in:
Zhengyi Chen 2024-03-19 23:34:19 +00:00
parent f9adbf1f1d
commit fc777526ce
4 changed files with 44 additions and 8 deletions

View file

@ -661,7 +661,7 @@ The specifications of \texttt{rose} is listed in table \ref{table:rose}.
\section{Methodology}\label{sec:sw-coherency-method}
\subsection{Exporting \texttt{dcache\_clean\_poc}}
As established in subsection \ref{subsec:armv8a-swcoherency}, software cache-coherence maintenance operations (e.g., \texttt{dcache\_[clean|inval]\_poc}) are wrapped behind DMA API function calls and are hence unavailable for direct use in drivers. Moreover, instrumentation of assembly code becomes non-trivial when compared to instrumenting C function symbols, likely due to automatically stripped assembly symbols during kernel linkage. Consequently, it becomes impossible to utilize the existing instrumentation tools available in the Linux kernel (e.g., \texttt{ftrace}) to trace assembly routines.
As established in subsection \ref{subsec:armv8a-swcoherency}, software cache-coherence maintenance operations (e.g., \texttt{dcache\_[clean|inval]\_poc}) are wrapped behind DMA API function calls and are hence unavailable for direct use in drivers. Moreover, instrumentation of assembly code becomes non-trivial when compared to instrumenting C function symbols, likely due to automatically stripped assembly symbols in C object files. Consequently, it becomes impossible to utilize the existing instrumentation tools available in the Linux kernel (e.g., \texttt{ftrace}) to trace assembly routines.
In order to convert \texttt{dcache\_clean\_poc} to a traceable equivalent, a wrapper function \texttt{\_\_dcache\_clean\_poc} is created as follows:
\begin{minted}[mathescape, linenos, bgcolor=code-bg]{c}
@ -988,10 +988,25 @@ $ echo 2 > \
/sys/module/my_shmem/parameters/max_contiguous_alloc_order
\end{minted}
Consequently, all allocations occuring after this change will be allocated with a 4-page contiguous granularity.
Consequently, all allocations occuring after this change will be allocated with a 4-page contiguous granularity. Upon further testing, the maximum value allowed here is 10 (i.e., $2^{10} = 1024$ 4K pages).
\subsection{Instrumentation: \texttt{ftrace} and \texttt{bcc-tools}}
We use two instrumentation frameworks to evaluate the latency of software-initiated coherency operations. \texttt{ftrace} is the primary kernel tracing mechanism across multiple (supporting) architectures, which supports both \textit{static} tracing of tracepoints and \textit{dynamic} tracing of function symbols:
\begin{itemize}
\item {
\textbf{Static} tracepoints describe tracepoints compiled into the Linux kernel. They are defined by kernel programmers and is otherwise known as \textit{event tracing}.
}
\item {
\textbf{Dynamic} \texttt{ftrace} support is enabled by self-modifying the kernel code to replace injected placeholder nop-routines with \texttt{ftrace} infrastructure calls. This allows for function tracing of all function symbols present in C object files created for linkage. \cite{Rostedt.Kernelv6.7-ftrace.2023}
}
\end{itemize}
Because we do not inline \texttt{\_\_dcache\_clean\_poc}, we are able to include its symbol inside compiled C object files and hence expose its internals for dynamic tracing.
\texttt{bcc-tools}, on the other hand, provide an array of handy instrumentation tools that is compiled just-in-time into \textit{BPF} programs and ran inside a in-kernel virtual machine. Description of how BPF programs are parsed and run inside the Linux kernel is documented in the kernel documentations \cite{N/A.Kernelv6.7-libbpf.2023}. The ability of \texttt{bcc}/\texttt{libbpf} programs to interface with both userspace and kernelspace function tracing mechanisms make \texttt{bcc-tools} ideal as a easy tracing interface for both userspace and kernelspace tracing.
\subsection{Instrumentation: \texttt{ftrace} and \textit{eBPF}}
\subsection{Userspace Programs}
Finally, two simple userspace programs are written to invoke the corresponding kernelspace callback operations -- namely, allocation and cleaning of kernel buffers for simulating DMA behaviors. To achieve this, it simply \texttt{mmap}s the amount of pages passed in as argument and either reads or writes the entirety of the buffer (which differentiates the two programs). A listing of their logic is at \textcolor{red}{Appendix ???}.
\section{Results}\label{sec:sw-coherency-results}
\subsection{Controlled Allocation Size; Variable Page Count}