Compiles on laptop, weird
This commit is contained in:
parent a6d78ffc04
commit b3c3ec961b
2 changed files with 45 additions and 44 deletions
Binary file not shown.
@@ -303,8 +303,8 @@ a $k$-partitioned object, $k$ global pointers are recorded in the runtime each
pointing to the same object, with different offsets and (intuitively)
independently-chosen virtual addresses. Note that this design naturally requires
virtual addresses within each node to be \emph{pinned} -- the allocated object
cannot be re-addressed to a different virtual address, thus preventing the
global pointer that records the local virtual address from becoming
spontaneously invalidated.
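As a rough illustration, the record the runtime keeps for each partition can be
thought of as the plain C struct below. The type and field names are
hypothetical, not drawn from any particular PGAS runtime; the point is that the
stored local virtual address stays valid only while the owning node keeps the
allocation pinned at that address.
\begin{verbatim}
#include <stdint.h>
#include <stdio.h>

/* Hypothetical runtime record for one partition of a k-partitioned
 * object: which node owns the partition, at which (pinned) local
 * virtual address it lives, and which slice of the object it holds. */
struct global_ptr {
    uint32_t  node;         /* rank of the owning node                */
    uintptr_t local_vaddr;  /* pinned virtual address on that node    */
    size_t    offset;       /* offset of this partition in the object */
};

int main(void) {
    /* For a k-partitioned object, the runtime records k of these. */
    struct global_ptr gp = { .node = 3,
                             .local_vaddr = (uintptr_t)0x7f3a20000000,
                             .offset = 2 * 4096 };
    printf("partition on node %u at %#zx (+%zu bytes)\n",
           (unsigned)gp.node, (size_t)gp.local_vaddr, gp.offset);
    return 0;
}
\end{verbatim}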
Similar schemes can be observed in other PGAS backends/runtimes, although they may
@@ -316,54 +316,55 @@ movement manually when working with shared memory over network to maximize
their performance metrics of interest.
\subsection{Message Passing}
\textit{Message Passing} remains the predominant programming model for
parallelism between loosely-coupled nodes within a computer system, much as it
is ubiquitous across all levels of abstraction of the concurrent components of
a computer system. Specific to cluster computing systems is the message-passing
programming model, in which parallel programs (or instances of the same
parallel program) on different nodes communicate by exchanging messages over
the network between those nodes. Such models trade programming productivity for
finer-grained control over the messages passed, as well as a more explicit
separation between the communication and computation stages of a programming
subproblem.
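To make this explicitness concrete, the sketch below uses the standard
two-sided MPI primitives \texttt{MPI\_Send} and \texttt{MPI\_Recv}; it is a
minimal illustration, not code from any of the systems discussed here. The
programmer spells out which rank sends, which rank receives, and the message
envelope (count, datatype, tag), in exchange for full control over when
communication happens relative to computation.
\begin{verbatim}
#include <mpi.h>
#include <stdio.h>

/* Minimal two-sided exchange: rank 0 sends one integer to rank 1.
 * Run with, e.g.:  mpirun -np 2 ./a.out */
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int value = 42;
    if (rank == 0) {
        /* The sender explicitly initiates the transfer. */
        MPI_Send(&value, 1, MPI_INT, /*dest=*/1, /*tag=*/0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* The receiver must post a matching receive. */
        MPI_Recv(&value, 1, MPI_INT, /*source=*/0, /*tag=*/0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}
\end{verbatim}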
Commonly, message-passing backends function as \textit{middlewares} --
communication runtimes -- to aid distributed software development
\cite{AST_Steen.Distributed_Systems-3ed.2017}. Such a message-passing backend
exposes facilities for inter-application communication to frontend developers
while transparently providing security, accounting, and fault tolerance, much
like how an operating system may provide resource management, scheduling, and
security to traditional applications \cite{AST_Steen.Distributed_Systems-3ed.2017}.
This is the case for implementations of the PGAS programming model, which mostly
rely on common message-passing backends to facilitate orchestrated data
manipulation across distributed nodes. Likewise, message-passing backends,
including RDMA APIs, form the backbone of many research-oriented DSM systems
\cites{Endo_Sato_Taura.MENPS_DSM.2020}{Hong_etal.NUMA-to-RDMA-DSM.2019}
{Cai_etal.Distributed_Memory_RDMA_Cached.2018}{Kaxiras_etal.DSM-Argos.2015}.

Message-passing between network-connected nodes may be \textit{two-sided} or
\textit{one-sided}. The former models an intuitive workflow for sending and
receiving datagrams over the network -- the sender initiates a transfer; the
receiver copies a received packet from the network card into a kernel buffer;
the receiver's kernel filters the packet and (optionally)
\cite{FreeBSD.man-BPF-4.2021} copies the internal message into the
message-passing runtime/middleware's address space; the receiver's middleware
inspects the copied message and performs some procedure accordingly, likely
also copying slices of the message data into some registered distributed
shared-memory buffer for the distributed application to access. Despite being
a highly intuitive model of data manipulation over the network, this poses a
fundamental performance issue: upon reception of each message, both the
receiver's kernel and its userspace must proactively expend CPU time to move
the received data from the bytes read off the NIC into userspace. Because this
happens concurrently with other kernel and userspace routines in a
multi-processing system, a preemptible kernel may incur significant latency if
the kernel routine for packet filtering is preempted by another kernel routine,
userspace, or IRQs.
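The per-message cost on the receiving side is visible even at the plain
sockets level on which many two-sided middlewares ultimately sit; the sketch
below is illustrative only and not tied to any particular runtime. Each
\texttt{recvfrom} call is a point at which the receiver's kernel copies packet
payload out of its own buffers into a userspace buffer, spending receiver CPU
time on every message.
\begin{verbatim}
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Minimal UDP receive loop; the port number is arbitrary. */
int main(void) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(9000);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind");
        return 1;
    }

    char buf[2048];
    for (;;) {
        /* recvfrom() blocks until the kernel has a datagram, then
         * copies its payload into this userspace buffer. */
        ssize_t n = recvfrom(fd, buf, sizeof(buf), 0, NULL, NULL);
        if (n < 0) { perror("recvfrom"); break; }
        printf("received %zd bytes\n", n);
    }

    close(fd);
    return 0;
}
\end{verbatim}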
Comparatively, a ``one-sided'' message-passing scheme, notably \textit{RDMA},
allows the network interface card to bypass in-kernel packet filters and
perform DMA on registered memory regions. The NIC can hence notify the CPU via
interrupts only after the data is already in place, thus allowing the kernel
and userspace programs to perform callbacks at reception time with reduced
latency. Because of this advantage, many recent studies attempt to leverage
RDMA APIs \dots
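To indicate what ``registered memory regions'' look like in practice, the
sketch below performs only the registration step with the libibverbs API; it
assumes an RDMA-capable NIC is present, and the queue-pair setup and the
out-of-band exchange of the region's address and \texttt{rkey} (which a peer
needs to issue one-sided reads and writes) are deliberately omitted.
\begin{verbatim}
#include <infiniband/verbs.h>
#include <stdio.h>
#include <stdlib.h>

/* Register a buffer as an RDMA memory region and print the keys a
 * remote peer would use to target it with one-sided operations.
 * Build with: cc rdma_reg.c -libverbs   (file name is illustrative) */
int main(void) {
    int num = 0;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) {
        fprintf(stderr, "no RDMA-capable device found\n");
        return 1;
    }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    if (!ctx) { fprintf(stderr, "ibv_open_device failed\n"); return 1; }
    struct ibv_pd *pd = ibv_alloc_pd(ctx);
    if (!pd) { fprintf(stderr, "ibv_alloc_pd failed\n"); return 1; }

    size_t len = 1 << 20;                 /* 1 MiB region */
    void *buf = malloc(len);
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);
    if (!mr) { perror("ibv_reg_mr"); return 1; }

    /* A peer needs (addr, rkey) to access this region directly; the
     * exchange (e.g. over TCP) is omitted here. */
    printf("registered %zu bytes at %p, lkey=%u rkey=%u\n",
           len, buf, mr->lkey, mr->rkey);

    ibv_dereg_mr(mr);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    free(buf);
    return 0;
}
\end{verbatim}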