Compiles on laptop, weird

Zhengyi Chen 2024-02-28 19:07:58 +00:00
parent a6d78ffc04
commit b3c3ec961b
2 changed files with 45 additions and 44 deletions


@@ -303,8 +303,8 @@ a $k$-partitioned object, $k$ global pointers are recorded in the runtime each
pointing to the same object, with different offsets and (intuitively)
independently-chosen virtual addresses. Note that this design naturally requires
virtual addresses within each node to be \emph{pinned} -- the allocated object
cannot be re-addressed to a different virtual address, thus preventing the
global pointer that records the local virtual address from becoming
spontaneously invalidated.
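To make this concrete, the following is a minimal sketch of what one such
global pointer might record, assuming a hypothetical runtime in which a
partition is identified by the rank of its home node, its pinned local virtual
address, and its offset within the logical object (the struct and field names
are illustrative, not taken from any particular PGAS implementation):
\begin{verbatim}
#include <stdint.h>

/* Illustrative global pointer for one partition of a k-partitioned object.
 * Field names are hypothetical; real PGAS runtimes differ. */
typedef struct global_ptr {
    uint32_t node;        /* rank of the node holding this partition       */
    uint64_t local_vaddr; /* pinned virtual address of the partition there */
    uint64_t offset;      /* offset of this partition within the object    */
} global_ptr_t;

/* A k-partitioned object is then described by k such records, one per node;
 * because local_vaddr is pinned, each record stays valid for the object's
 * lifetime without further coordination. */
\end{verbatim}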
Similar schemes can be observed in other PGAS backends/runtimes, albeit they may
@@ -316,54 +316,55 @@ movement manually when working with shared memory over network to maximize
their performance metrics of interest.
\subsection{Message Passing}
\textit{Message Passing} remains the predominant programming model for
parallelism between loosely-coupled nodes, much as it underpins the concurrent
components of a computer system at every level of abstraction. In cluster
computing systems, parallel programs (or instances of the same parallel
program) running on different nodes communicate by exchanging messages over
the network. Such models trade programming productivity for finer-grained
control over the messages passed, as well as a more explicit separation
between the communication and computation stages of a programming subproblem.
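As a minimal illustration of the model (assuming nothing beyond a standard MPI
installation; the example is not drawn from any system discussed here), the
following fragment exchanges a single message between two ranks, with the
explicit \texttt{MPI\_Send}/\texttt{MPI\_Recv} pair marking the communication
stage:
\begin{verbatim}
/* Minimal two-rank message exchange; illustrative only. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {            /* communication stage: explicit send */
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {     /* communication stage: explicit receive */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);   /* computation stage */
    }

    MPI_Finalize();
    return 0;
}
\end{verbatim}
The matching send and receive calls on the two ranks are exactly the two
``sides'' referred to in the two-sided scheme discussed below.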
Commonly, message-passing backends function as \textit{middlewares} --
communication runtimes -- to aid distributed software development
\cite{AST_Steen.Distributed_Systems-3ed.2017}. Such a message-passing backend
exposes facilities for inter-application communication to frontend developers
while transparently providing security, accounting, and fault-tolerance, much
like how an operating system may provide resource management, scheduling, and
security to traditional applications \cite{AST_Steen.Distributed_Systems-3ed.2017}.
This is the case for implementations of the PGAS programming model, which
mostly rely on common message-passing backends to facilitate orchestrated data
manipulation across distributed nodes. Likewise, message-passing backends,
including RDMA APIs, form the backbone of many research-oriented DSM systems
\cites{Endo_Sato_Taura.MENPS_DSM.2020}{Hong_etal.NUMA-to-RDMA-DSM.2019}
{Cai_etal.Distributed_Memory_RDMA_Cached.2018}{Kaxiras_etal.DSM-Argos.2015}.
Message-passing between network-connected nodes may be \textit{two-sided} or
\textit{one-sided}. The former models an intuitive workflow for sending and
receiving datagrams over the network -- the sender initiates a transfer; the
receiver copies a received packet from the network card into a kernel buffer; the
receiver's kernel filters the packet and (optionally)\cite{FreeBSD.man-BPF-4.2021}
copies the internal message into the message-passing runtime/middleware's
address space; the receiver's middleware inspects the copied message and acts
on it accordingly, likely also copying slices of the message data into some
registered distributed shared memory buffer for the distributed application to
access. Despite being a highly intuitive model of data manipulation over the
network, this workflow poses a fundamental performance issue: upon reception
of each message, both the receiver's kernel and its userspace must proactively
spend CPU time moving the received data from the bytes read off the NIC into
userspace buffers. Because this happens concurrently with other kernel and
userspace routines in a multi-processing system, a preemptible kernel may
incur significant latency if the kernel routine for
packet filtering is pre-empted by another kernel routine, userspace, or IRQs.
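A schematic userspace half of this receive path, sketched here over an
ordinary UDP socket (the ``registered shared buffer'' is a hypothetical
stand-in for whatever region the middleware actually exposes), makes the copy
costs explicit: each message incurs at least the kernel-to-user copy inside
\texttt{recvfrom()} plus the middleware's own copy into the shared region.
\begin{verbatim}
/* Schematic receiver loop for a two-sided scheme over UDP; the shared
 * buffer below stands in for a registered distributed-shared-memory region. */
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

#define MSG_MAX 4096

static char shared_buf[1 << 20];  /* stand-in for a registered shared region */

void receive_loop(int sock) {
    char msg[MSG_MAX];
    size_t used = 0;
    for (;;) {
        /* Copy #1: kernel socket buffer -> userspace, burning receiver CPU. */
        ssize_t n = recvfrom(sock, msg, sizeof msg, 0, NULL, NULL);
        if (n <= 0)
            break;
        /* Copy #2: the middleware inspects the message and copies the
         * payload into the buffer the distributed application reads from. */
        if (used + (size_t)n <= sizeof shared_buf) {
            memcpy(shared_buf + used, msg, (size_t)n);
            used += (size_t)n;
        }
    }
}
\end{verbatim}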
Comparatively, a ``one-sided'' message-passing scheme, notably \textit{RDMA},
allows the network interface card to bypass in-kernel packet filters and
perform DMA directly on registered memory regions. The NIC can then notify the
CPU via interrupts, allowing kernel and userspace programs to run their
callbacks at reception time with reduced latency. Because of this advantage,
many recent studies attempt to leverage RDMA APIs \dots
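For a flavour of one-sided semantics without descending to the verbs-level
RDMA API itself, the following sketch uses MPI's one-sided (RMA) interface,
which implementations may map onto RDMA writes where the interconnect supports
it; only rank 0 runs any transfer code, while rank 1 merely exposes a window:
\begin{verbatim}
/* One-sided update sketch using MPI RMA; illustrative only. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, local = 0, payload;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Each rank exposes one int as a remotely accessible (registered) region. */
    MPI_Win_create(&local, sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);
    if (rank == 0) {
        payload = 42;
        /* Rank 0 writes into rank 1's window; rank 1 posts no receive. */
        MPI_Put(&payload, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
    }
    MPI_Win_fence(0, win);        /* completes the access epoch on both sides */

    if (rank == 1)
        printf("rank 1's window now holds %d\n", local);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
\end{verbatim}
The contrast with the two-sided sketch earlier is that rank 1 executes no
per-message receive call; the data lands in its exposed window without a
matching receive, which is precisely the property RDMA-based systems exploit.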