Added stuff, somehow biblatex fails to compile?

This commit is contained in:
Zhengyi Chen 2024-02-27 17:06:31 +00:00
parent 8e430d13f2
commit a6d78ffc04
5 changed files with 93 additions and 24 deletions

@@ -5,7 +5,7 @@ CC += ${MY_CFLAGS}
KDIR := /lib/modules/$(shell uname -r)/build
KDIR_CROSS := ${HOME}/Git/linux
-KDIR_UOE := /disk/scratch/s2018374/linux
+KDIR_UOE := /tmp/s2018374/linux
KDIR_SSHFS := /tmp/inf-sshfs/linux
PWD := $(shell pwd)

@@ -54,9 +54,13 @@ const char* DEV_NAME = "my_shmem";
*/
static void my_shmem_vmops_close(struct vm_area_struct *vma)
{
-pr_info("[%s] Entered.\n", __func__);
+size_t nr_pages_in_cache = list_count_nodes(&my_shmem_pages);
+size_t nr_pages_of_vma = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
+pr_info(
+	"[%s] Entered. vma size: %zu; cached pages: %zu.\n",
+	__func__, nr_pages_of_vma, nr_pages_in_cache
+);
size_t nr_pages_offset = vma->vm_pgoff;
struct my_shmem_page *entry;
// u64 clean_time_bgn, clean_time_end;

@@ -344,8 +344,7 @@
url={https://lkml.org/lkml/2008/4/29/480},
journal={lkml.org},
author={van de Ven, Arjan},
-year={2008},
-month={Apr}
+year={2008}
}
@inproceedings{Li_etal.RelDB_RDMA.2016,
@@ -356,3 +355,38 @@
year={2016}
}
@article{Hong_etal.NUMA-to-RDMA-DSM.2019,
title={Scaling out NUMA-aware applications with RDMA-based distributed shared memory},
author={Hong, Yang and Zheng, Yang and Yang, Fan and Zang, Bin-Yu and Guan, Hai-Bing and Chen, Hai-Bo},
journal={Journal of Computer Science and Technology},
volume={34},
pages={94--112},
year={2019},
publisher={Springer}
}
@inproceedings{Kaxiras_etal.DSM-Argos.2015,
author = {Kaxiras, Stefanos and Klaftenegger, David and Norgren, Magnus and Ros, Alberto and Sagonas, Konstantinos},
title = {Turning Centralized Coherence and Distributed Critical-Section Execution on their Head: A New Approach for Scalable Distributed Shared Memory},
year = {2015},
isbn = {9781450335508},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/2749246.2749250},
doi = {10.1145/2749246.2749250},
abstract = {A coherent global address space in a distributed system enables shared memory programming in a much larger scale than a single multicore or a single SMP. Without dedicated hardware support at this scale, the solution is a software distributed shared memory (DSM) system. However, traditional approaches to coherence (centralized via "active" home-node directories) and critical-section execution (distributed across nodes and cores) are inherently unfit for such a scenario. Instead, it is crucial to make decisions locally and avoid the long latencies imposed by both network and software message handlers. Likewise, synchronization is fast if it rarely involves communication with distant nodes (or even other sockets). To minimize the amount of long-latency communication required in both coherence and critical section execution, we propose a DSM system with a novel coherence protocol, and a novel hierarchical queue delegation locking approach. More specifically, we propose an approach, suitable for Data-Race-Free programs, based on self-invalidation, self-downgrade, and passive data classification directories that require no message handlers, thereby incurring no extra latency. For fast synchronization we extend Queue Delegation Locking to execute critical sections in large batches on a single core before passing execution along to other cores, sockets, or nodes, in that hierarchical order. The result is a software DSM system called Argo which localizes as many decisions as possible and allows high parallel performance with little overhead on synchronization when compared to prior DSM implementations.},
booktitle = {Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing},
pages = {3--14},
numpages = {12},
location = {Portland, Oregon, USA},
series = {HPDC '15}
}
@misc{FreeBSD.man-BPF-4.2021,
title={FreeBSD manual pages},
url={https://man.freebsd.org/cgi/man.cgi?query=bpf&manpath=FreeBSD+14.0-RELEASE+and+Ports},
journal={BPF(4) Kernel Interfaces Manual},
publisher={The FreeBSD Project},
author={{The FreeBSD Project}},
year={2021}
}

Binary file not shown.

@@ -300,11 +300,12 @@ context of some user-defined group of associated nodes. Comparatively, a
\textit{collective} PGAS object is allocated such that a partition of the object
(i.e., a sub-array of the representation) is stored in each of the associated
nodes -- for a $k$-partitioned object, $k$ global pointers are recorded in the
runtime, each
-pointing to the same object, with different offsets and (naturally)
+pointing to the same object, with different offsets and (intuitively)
independently-chosen virtual addresses. Note that this design naturally requires
virtual addresses within each node to be \emph{pinned} -- the allocated object
-cannot be re-addressed to a different virtual address i.e., the global pointer
-that records the local virtual address cannot be auto-invalidated.
+cannot be re-addressed to a different virtual address, thus preventing the
+global pointer that records the local virtual address from becoming
+spontaneously invalidated.
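For concreteness, the per-partition bookkeeping might be sketched as follows
(plain C; the names are illustrative and not taken from any particular PGAS
runtime):
\begin{verbatim}
#include <stddef.h>
#include <stdint.h>

/* One global pointer per partition of a k-partitioned collective
 * object (hypothetical layout). */
struct gptr {
    int       node;   /* node holding this partition                */
    uintptr_t vaddr;  /* pinned local virtual address on that node  */
    size_t    offset; /* offset of this partition within the object */
};

/* The runtime records k such pointers per collective object; each
 * vaddr must stay pinned, or the recorded pointer would silently
 * become invalid. */
struct collective_obj {
    size_t      nr_parts;
    struct gptr parts[]; /* one entry per associated node */
};
\end{verbatim}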
Similar schemes can be observed in other PGAS backends/runtimes, although they
may opt to use a map-like data structure for addressing instead. In general, despite
@@ -315,27 +316,57 @@ movement manually when working with shared memory over network to maximize
their performance metrics of interest.
\subsection{Message Passing}
\textit{Message Passing} remains the predominant programming model for
parallelism between loosely-coupled nodes, much as it is ubiquitous in
supporting all levels of abstraction within the concurrent components of a
computer system. In cluster computing systems specifically, parallel programs
(or instances of the same parallel program) on different nodes communicate by
exchanging messages over the network. Such models trade programming
productivity for finer-grained control over the messages passed, as well as a
more explicit separation between the communication and computation stages of a
programming subproblem.
Commonly, message-passing backends function as \textit{middlewares} --
communication runtimes -- that aid distributed software development
\cite{AST_Steen.Distributed_Systems-3ed.2017}. Such a message-passing backend
exposes facilities for inter-application communication to frontend developers
while transparently providing security, accounting, and fault-tolerance, much
like how an operating system provides resource management, scheduling, and
security to traditional applications \cite{AST_Steen.Distributed_Systems-3ed.2017}.
This is the case for implementations of the PGAS programming model, which mostly
rely on common message-passing backends to facilitate orchestrated data
manipulation across distributed nodes. Likewise, message-passing backends,
including the RDMA API, form the backbone of many research-oriented DSM systems
\cites{Endo_Sato_Taura.MENPS_DSM.2020}{Hong_etal.NUMA-to-RDMA-DSM.2019}{Cai_etal.Distributed_Memory_RDMA_Cached.2018}{Kaxiras_etal.DSM-Argos.2015}.
% \dots
Message-passing between network-connected nodes may be \textit{two-sided} or
\textit{one-sided}. The former models an intuitive workflow for sending and
receiving datagrams over the network -- the sender initiates a transfer; the
receiver copies a received packet from the network card into a kernel buffer;
the receiver's kernel filters the packet and (optionally)
\cite{FreeBSD.man-BPF-4.2021} copies the contained message into the
message-passing runtime/middleware's address space; the receiver's middleware
inspects the copied message and performs some procedure accordingly, likely
also copying slices of message data into some registered distributed shared
memory buffer for the distributed application to access. Despite being a
highly intuitive model of data manipulation over the network, this workflow
poses a fundamental performance issue: upon reception of each message, both
the receiver's kernel and its userspace must proactively exert CPU-time to
move the received data from the NIC into userspace. Because this happens
concurrently with other kernel and userspace routines in a multi-processing
system, a preemptible kernel may incur significant latency if the kernel
routine for packet filtering is pre-empted by another kernel routine,
userspace, or IRQs.
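To make the cost concrete, the following minimal sketch (POSIX sockets; the
buffer names are hypothetical) walks the same receive path, where each step is
a copy paid for with the receiver's CPU-time:
\begin{verbatim}
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical registered DSM buffer exposed to the application. */
extern char shm_region[4096];

ssize_t handle_message(int sock)
{
    char msg[4096];

    /* Kernel -> middleware copy: the kernel has already copied the
     * packet off the NIC; recv() copies it again into our address
     * space. */
    ssize_t n = recv(sock, msg, sizeof msg, 0);
    if (n <= 0)
        return n;

    /* Middleware -> application copy: after inspecting the message,
     * place its payload into the registered shared-memory buffer. */
    memcpy(shm_region, msg, (size_t)n);
    return n;
}
\end{verbatim}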
% Improvement in NIC bandwidth and transfer rate benefits DSM applications that expose
% global address space, and those that leverage single-writer capabilities over hierarchical memory nodes. \textbf{[GAS and PGAS (Partitioned GAS)
% technologies for example Openshmem, OpenMPI, Cray Chapel, etc. that leverage
% specially-linked memory sections and \texttt{/dev/shm} to abstract away RDMA access]}.
Comparatively, a ``one-sided'' message-passing scheme, notably \textit{RDMA},
allows the network interface card to bypass in-kernel packet filters and
perform DMA directly on registered memory regions. The NIC can then notify the
CPU via interrupts, allowing the kernel and userspace programs to perform
callbacks at reception time with reduced latency. Because of this advantage,
many recent studies attempt to leverage RDMA APIs \dots
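The sketch below illustrates such a one-sided transfer through the libibverbs
API, assuming an already-established queue pair \texttt{qp} and a registered
local memory region \texttt{mr} (connection setup omitted); servicing the
write involves neither the remote kernel nor remote userspace:
\begin{verbatim}
#include <infiniband/verbs.h>
#include <stdint.h>

int rdma_write(struct ibv_qp *qp, struct ibv_mr *mr,
               uint64_t remote_addr, uint32_t rkey, uint32_t len)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)mr->addr, /* local registered buffer  */
        .length = len,
        .lkey   = mr->lkey,
    };
    struct ibv_send_wr wr = {
        .sg_list    = &sge,
        .num_sge    = 1,
        .opcode     = IBV_WR_RDMA_WRITE, /* no recv() on the far end */
        .send_flags = IBV_SEND_SIGNALED, /* completion raised locally */
    };
    struct ibv_send_wr *bad_wr = NULL;

    /* Remote address and rkey come from the peer's earlier memory
     * registration, exchanged out-of-band during setup. */
    wr.wr.rdma.remote_addr = remote_addr;
    wr.wr.rdma.rkey        = rkey;

    return ibv_post_send(qp, &wr, &bad_wr);
}
\end{verbatim}
Completion is then detected locally, e.g., by polling the completion queue
with \texttt{ibv\_poll\_cq}.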
% Contemporary works on DSM systems focus more on leveraging hardware advancements
% to provide fast and/or seamless software support. Adrias \cite{Masouros_etal.Adrias.2023},
% for example, implements a complex system for memory disaggregation over multiple
% compute nodes connected via the \textit{ThymesisFlow}-based RDMA fabric, where
% they observed significant performance improvements over existing data-intensive
% processing frameworks, for example APACHE Spark, Memcached, and Redis, over
% no-disaggregation (i.e., using node-local memory only, similar to cluster computing)
% systems.
% \subsection{Programming Model}
\subsection{Data to Process, or Process to Data?}
(TBD -- The former is costly for data-intensive computation, but the latter may
be impossible for certain tasks and greatly complicates the replacement problem.)