Maintenance & added text

Zhengyi Chen 2024-03-13 21:25:53 +00:00
parent 6a444ccf89
commit 017006fe4e
3 changed files with 49 additions and 14 deletions


@@ -237,6 +237,7 @@ movement manually when working with shared memory over network to maximize
their performance metrics of interest.
\subsection{Message Passing}
\label{sec:msg-passing}
\textit{Message Passing} remains the predominant programming model for
parallelism between loosely-coupled nodes within a computer system, much as it
is ubiquitous in supporting all levels of abstraction within any concurrent
@@ -550,37 +551,37 @@ data representation over disaggregated memory over network when compared to
contemporary DSM approaches.
\subsection{Coherence Protocol}
Coherence protocols hence become the means by which DSM systems implement their consistency model guarantees. As table \ref{table:1} shows, DSM studies tend to implement write-invalidate coherence either via a \textit{home-based} or \textit{directory-based} protocol, while a subset of DSM studies sought to reduce communication overheads and/or improve data persistence by offering write-update protocol extensions \cites{Carter_Bennett_Zwaenepoel.Munin.1991}{Shan_Tsai_Zhang.DSPM.2017}.
Coherence protocols hence become the means by which DSM systems implement their consistency model guarantees. As table \ref{table:1} shows, DSM studies tend to implement write-invalidate coherence under a \textit{home-based} or \textit{directory-based} protocol framework, while a subset of DSM studies sought to reduce communication overheads and/or improve data persistence by offering write-update protocol extensions \cites{Carter_Bennett_Zwaenepoel.Munin.1991}{Shan_Tsai_Zhang.DSPM.2017}.
% The concepts of \textit{home-based} vs. \textit{directory-based} protocols are not parallels, however, but instead differentiates the perspective
\subsubsection{Home-Based Protocols}
\textit{Home-based} protocols assign each shared memory object a corresponding ``home'' node, under the assumption that a many-node network would distribute home-node ownership of shared memory objects across all hosts \cite{Hu_Shi_Tang.JIAJIA.1999}. On top of home-node ownership, each mutable shared memory object may additionally be cached by other nodes within the network, which creates the coherence problem. To our knowledge, in addition to the studies in table \ref{table:1}, this protocol and its derivatives have been adopted by \cites{Fleisch_Popek.Mirage.1989}{Schaefer_Li.Shiva.1989}{Hu_Shi_Tang.JIAJIA.1999}{Nelson_etal.Grappa_DSM.2015}{Shan_Tsai_Zhang.DSPM.2017}{Endo_Sato_Taura.MENPS_DSM.2020}.
We identify that home-based protocols are conceptually straightforward when compared to directory-based protocols, centering communications over storage of distributed metadata (in this case, regarding the manager node for each shared memory object). This leads to
We identify that home-based protocols are conceptually straightforward compared to directory-based protocols, centering communication on the storage of global metadata (in this case, the ownership of each shared memory object). This leads to greater flexibility in implementing coherence protocols. A shared memory object may, at its creation, be made known globally via broadcast, or made known to only a subset of nodes (zero or more) via multicast. Likewise, metadata could be cached locally at each node and invalidated alongside object invalidation, or fetched from a node that is fixed for a given object. This implementation flexibility is further exploited in \textit{Hotpot} \cite{Shan_Tsai_Zhang.DSPM.2017}, which refines the ``home node'' concept into an \textit{owner node} to provide replication and persistence, in addition to adopting a dynamic home protocol similar to that of \cite{Endo_Sato_Taura.MENPS_DSM.2020}.
\subsubsection{Directory-Based Protocols}
To our knowledge, in addition to the studies in table \ref{table:1}, this protocol and its derivatives have been adopted by \cites{Carter_Bennett_Zwaenepoel.Munin.1991}{Amza_etal.Treadmarks.1996}{Schoinas_etal.Sirocco.1998}{Eisley_Peh_Shang.In-net-coherence.2006}{Hong_etal.NUMA-to-RDMA-DSM.2019}.
\textit{Directory-based} protocols instead take a shared database approach by associating each shared memory object with a globally shared entry describing its ownership and sharing status. In its non-distributed form (e.g., \cite{Wang_etal.Concordia.2021}), a global, central directory holds ownership information for all nodes in the network: the directory hence becomes a bottleneck, imposing latency and bandwidth constraints on parallel processing systems. Comparatively, a distributed directory scheme may delegate responsibilities across all nodes in the network, mostly in accordance with a sharded address space \cites{Hong_etal.NUMA-to-RDMA-DSM.2019}{Cai_etal.Distributed_Memory_RDMA_Cached.2018}. Though theoretically sound, this scheme performs no dynamic load-balancing for commonly shared memory objects, so in the worst case it would function exactly like a non-distributed directory coherence scheme. To our knowledge, in addition to the studies in table \ref{table:1}, this protocol and its derivatives have been adopted by \cites{Carter_Bennett_Zwaenepoel.Munin.1991}{Amza_etal.Treadmarks.1996}{Schoinas_etal.Sirocco.1998}{Eisley_Peh_Shang.In-net-coherence.2006}{Hong_etal.NUMA-to-RDMA-DSM.2019}.
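As a minimal illustration of the worst case above (a sketch under our own simplifying assumptions, not the exact scheme of any cited system), suppose $N$ nodes share an address space divided into pages of $P$ bytes, and the directory entry for address $a$ is statically assigned to node
\[
  d(a) = \left\lfloor a / P \right\rfloor \bmod N .
\]
The symbols $N$, $P$, $a$, and $d$ are introduced here purely for illustration. If a fraction $f$ of all coherence requests target pages whose directory entries reside on a single node $k$ (i.e., pages with $d(a) = k$), then node $k$ serves a share $f$ of all directory traffic irrespective of $N$. Uniformly distributed accesses give $f = 1/N$, whereas a single heavily shared object gives $f \to 1$, at which point node $k$ behaves like a central directory.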
\subsection{DMA and Cache Coherence}
% Because this thesis specifically studies cache coherence in ARMv8, we
The advent of high-speed RDMA-capable network interfaces introduces opportunities for designing more performant DSM systems over RDMA (as established in \ref{sec:msg-passing}). Orthogonally, RDMA-capable NICs fundamentally perform direct memory access on main memory to achieve one-sided RDMA operations, which reduces the effect of OS jitter on RDMA latencies. For modern computer systems with cached multiprocessors, this poses a potential cache coherence problem at the local level. RDMA operations happen concurrently with memory accesses by CPUs, which store copies of memory data in cache lines that may \cites{Kjos_etal.HP-HW-CC-IO.1996}{Ven.LKML_x86_DMA.2008} or may not \cites{Giri_Mantovani_Carloni.NoC-CC-over-SoC.2018}{Corbet.LWN-NC-DMA.2021} be kept fully coherent by the DMA mechanism. Consequently, any DMA operation performed by the RDMA NIC may be incoherent with the cached copy of the same data inside the CPU caches (as is the case for accelerators, etc.). This issue is of particular concern to the kernel development community, which needs to ensure that the behavior of DMA operations remains identical across architectures regardless of support for cache-coherent DMA \cite{Corbet.LWN-NC-DMA.2021}. As with existing RDMA implementations, which make heavy use of architecture-specific DMA memory allocation implementations, implementing RDMA-based DSM systems in the kernel also requires careful use of kernel API functions that ensure cache coherency as necessary.
\subsection{Cache Coherence in ARMv8}
We specifically focus on the implementation of cache coherence in ARMv8. Unlike x86, which guarantees cache-coherent DMA \cites{Ven.LKML_x86_DMA.2008}{Corbet.LWN-NC-DMA.2021}, the ARMv8 architecture (and many other popular ISAs, e.g., RISC-V) \emph{does not} guarantee cache coherency of DMA operations across vendor implementations.
% Experiment: ...
% Discussion: (1) Linux and DMA and RDMA (2) replacement and other ideas...
% (I need to read more into this. Most of the contribution comes from CPU caches,
% less so for DSM systems.) \textbf{[Talk about JIAJIA and Treadmark's coherence
% protocol.]}
% Consistency and communication protocols naturally affect the cost for each faulted
% memory access \dots
(I need to read more into this. Most of the contribution comes from CPU caches,
less so for DSM systems.) \textbf{[Talk about JIAJIA and Treadmark's coherence
protocol.]}
Consistency and communication protocols naturally affect the cost for each faulted
memory access \dots
\textbf{[Talk about directory, transactional, scope, and library cache coherence,
which allow for multi-casted communications at page fault but all with different
levels of book-keeping.]}
% \textbf{[Talk about directory, transactional, scope, and library cache coherence,
% which allow for multi-casted communications at page fault but all with different
% levels of book-keeping.]}
\printbibliography
\end{document}