profile image

Bart Cox

I am a PhD student in the Data Intensive Systems (DIS) group at TU Delft. My research focuses on Distributed Machine Learning, in particular Federated Learning, Distributed Systems, and Edge AI. I love building scalable, production-ready ML frameworks and optimizing models to run efficiently across diverse hardware environments.

email: bart [at] bcox.nl

Client transfers between servers

Accelerating Geo-distributed Learning with Client Transfers

Nomad is the first dynamic client transfer framework for multi-server FL, reallocating clients based on network conditions and data alignment to reduce latency and improve learning. Unlike static assignments, Nomad enables flexible migration during training. Experiments show accuracy improvements of up to 31.8 points in join-only settings and 18.8 points under churn, consistently surpassing strong baselines and scaling well across geographic deployments.
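The core reassignment decision can be sketched as follows. This is a minimal, hypothetical simplification (the function name and inputs are illustrative): Nomad also weighs data alignment and migration cost, which this ignores.

```python
def assign_clients(latency, current):
    """Propose client transfers: move each client to its lowest-latency server.

    latency: dict client -> dict server -> measured latency (ms)
    current: dict client -> currently assigned server
    Returns only the clients whose assignment should change.
    """
    transfers = {}
    for client, lats in latency.items():
        best = min(lats, key=lats.get)   # nearest server right now
        if best != current.get(client):
            transfers[client] = best
    return transfers
```

Re-running a decision like this whenever network measurements change is what yields the dynamic migration behavior described above, in contrast to a one-shot static assignment.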

Crash-recovery during decentralized training of an LLM

Go With The Flow: Churn-Tolerant Decentralized Training of Large Language Models

GWTF is the first practical, crash-tolerant decentralized framework for collaboratively training LLMs on heterogeneous volunteer clients. It handles node churn and unstable networks through a novel decentralized flow algorithm that optimizes microbatch routing. Evaluations on GPT- and LLaMA-like models show that GWTF reduces training time by up to 45% in challenging, geographically distributed settings.

Asynchronous Byzantine Federated Learning

We propose an asynchronous, Byzantine-resilient FL algorithm that avoids straggler delays and requires no server dataset. By updating after a safe number of client contributions, it outperforms state-of-the-art methods, achieving faster training and higher accuracy under multiple attack types.
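The "safe number of client contributions" idea can be sketched roughly as below. The class name, buffer rule, and the coordinate-wise trimmed-mean aggregator are illustrative assumptions, not the paper's exact algorithm:

```python
def trimmed_mean(updates, f):
    """Coordinate-wise mean after dropping the f smallest and f largest values,
    a standard way to tolerate up to f Byzantine contributions."""
    dim = len(updates[0])
    agg = []
    for i in range(dim):
        vals = sorted(u[i] for u in updates)
        kept = vals[f:len(vals) - f]
        agg.append(sum(kept) / len(kept))
    return agg

class BufferedAsyncServer:
    def __init__(self, model, f, buffer_size):
        assert buffer_size > 2 * f, "need enough honest updates to trim"
        self.model = model            # current global model (flat list of floats)
        self.f = f                    # assumed max number of Byzantine clients
        self.buffer_size = buffer_size
        self.buffer = []

    def receive(self, update):
        """Called whenever any client finishes; there is no round barrier,
        so stragglers never block progress."""
        self.buffer.append(update)
        if len(self.buffer) >= self.buffer_size:      # "safe" number reached
            delta = trimmed_mean(self.buffer, self.f)
            self.model = [w + d for w, d in zip(self.model, delta)]
            self.buffer = []
```

Because aggregation fires as soon as the buffer fills, fast clients keep the model moving while robust aggregation bounds the damage a few malicious updates can do.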

Topology Update

Dynamic Topology Optimization for Non-IID Data in Decentralized Learning

Morph is a decentralized learning topology optimizer that adapts peer selection based on model dissimilarity to overcome non-IID data and static communication limits. By reshaping the graph through gossip-based discovery, it boosts robustness and performance. Experiments on CIFAR-10 and FEMNIST show Morph outperforming static and epidemic baselines, achieving higher accuracy, faster convergence, and more stable learning with fewer communication rounds.
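A minimal sketch of dissimilarity-driven peer selection, using L2 distance between flattened models as a stand-in for Morph's actual dissimilarity metric (function names are hypothetical):

```python
def select_peers(my_model, peer_models, k):
    """Pick the k peers whose models differ most from ours.

    my_model: flat list of floats; peer_models: dict peer -> flat model.
    Preferring dissimilar peers injects the "missing" data distributions
    that a non-IID node would otherwise never see.
    """
    def dist(model):
        return sum((a - b) ** 2 for a, b in zip(my_model, model)) ** 0.5

    ranked = sorted(peer_models, key=lambda p: dist(peer_models[p]), reverse=True)
    return ranked[:k]
```

In a gossip-based setting, each node would periodically rerun a selection like this over the models it has discovered, gradually reshaping the communication graph.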

Example of a network where not forwarding signatures after delivering a message based on dissemination paths would prevent some nodes from authenticating it.

Reliable Communication in Hybrid Authentication and Trust Models

This work extends two classical reliable communication protocols to combine authenticated links and processes, introducing DualRC. It leverages trusted nodes (e.g., gateways) and components (e.g., Intel SGX) to improve communication reliability, with methods to validate network implementation.

OPODIS · January 2025 · Rowdy Chotkan, Bart Cox, Vincent Rahli, Jérémie Decouchant
Flat Multi-Server

Asynchronous Multi-Server Federated Learning for Geo-Distributed Clients

Spyker is the first fully asynchronous multi-server FL system, eliminating server idle time and single-server bottlenecks. Clients communicate only with their nearest server, while servers also update each other asynchronously. This continuously active design improves scalability and performance across MNIST, CIFAR-10, and WikiText-2.

Diffusion Process

Training Diffusion Models with Federated Learning

We introduce a federated diffusion framework that allows independent, privacy-preserving training of DDPMs without exposing local data. By adapting FedAvg and leveraging the UNet backbone efficiently, our method cuts parameter exchange by up to 74% compared to naive FedAvg, while preserving image quality close to centralized training, as measured by FID.
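For reference, plain FedAvg, which our method adapts, is just a sample-size-weighted average of client parameters. The sketch below shows only that baseline step; it does not attempt to show how the UNet backbone is handled, which is where the parameter-exchange savings come from:

```python
def fedavg(client_weights, client_sizes):
    """Sample-size-weighted average of client parameter vectors (plain FedAvg).

    client_weights: list of flat parameter lists, one per client.
    client_sizes: number of local training samples per client.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]
```

Clients with more local data pull the average harder, which is what keeps the aggregate consistent with centralized training on the pooled data.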

Reduced catastrophic forgetting effect

Parameterizing federated continual learning for reproducible research

We present Freddie, the first fully configurable framework for Federated Continual Learning (FCL), designed to reproduce complex, evolving learning scenarios. It supports large-scale deployments via containerization and Kubernetes, enabling precise experimentation. Demonstrations on CIFAR-100 and heterogeneous task sequences show Freddie's effectiveness and uncover persistent performance challenges in realistic FCL settings.

Model training phases during a local training

Aergia: leveraging heterogeneity in federated learning systems

To speed up Federated Learning, training tasks can be offloaded to other clients. Using similarity metrics and a resource-aware scheduler, Aergia accelerates the overall training process.

Layer size distribution

Memory-aware and context-aware multi-DNN inference on the edge

Masa is a memory-aware multi-DNN scheduling framework for edge devices that ensures low response times without modifying models. It leverages inter/intra-network dependencies and context to cut latency by up to 90% on low-memory devices.

Pervasive and Mobile Computing · July 2022 · Bart Cox, Robert Birke, Lydia Y. Chen
Relaxed loading policy ordering

MemA: Fast Inference of Multiple Deep Models

The paper introduces EdgeCaffe, a framework for exploring scheduling policies in multi-inference DNN jobs on resource-constrained edge devices. It proposes MemA, a memory-aware policy that improves execution time by up to 5x without additional resources, based on layer-specific memory demands.
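To give a flavor of a memory-aware policy, here is a toy simplification (not MemA's actual algorithm, and the job format is invented): layers from several DNN jobs are interleaved so that each step stays under the device's memory cap.

```python
def mema_schedule(jobs, mem_cap):
    """Greedy, memory-aware interleaving of per-layer work across DNN jobs.

    jobs: dict name -> list of per-layer memory demands (MB).
    At each step, run the next layer of whichever job fits under the cap,
    preferring the largest demand that fits so big layers are not starved.
    Returns the execution order as (job, layer_index) pairs.
    """
    progress = {name: 0 for name in jobs}
    order = []
    while any(progress[n] < len(jobs[n]) for n in jobs):
        candidates = [
            (jobs[n][progress[n]], n)
            for n in sorted(jobs)
            if progress[n] < len(jobs[n]) and jobs[n][progress[n]] <= mem_cap
        ]
        if not candidates:
            raise MemoryError("a layer exceeds the memory cap")
        _, pick = max(candidates)          # largest next layer that still fits
        order.append((pick, progress[pick]))
        progress[pick] += 1
    return order
```

The real policy additionally distinguishes loading from execution phases per layer, which is where the reported speedups come from; this sketch only conveys why per-layer memory demands matter for ordering.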

IEEE PerCom · May 2021 · Jeroen Galjaard, Bart Cox, Amirmasoud Ghiassi, Lydia Y. Chen, Robert Birke
Architecture of Masa

Masa: Responsive multi-dnn inference on the edge

Masa is a responsive, memory-aware multi-DNN execution framework: an on-device middleware that models inter- and intra-network dependencies and leverages the complementary memory usage of each layer.

IEEE PerCom · April 2021 · Bart Cox

Force processes to use swap memory

Force processes to swap memory (Ubuntu 18.04). When benchmarking processes, it can be useful to see how a process performs when its memory is swapped. Since the operating system determines the memory allocation policy, testing with limited memory can be cumbersome. One could try to fill physical memory with other processes to limit what is available, but since the operating system tends to swap out inactive processes, the benchmarked process will probably remain resident in memory. In any case, we cannot know this for certain. ...
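One workable approach is a control group with a hard cap on resident memory, so the kernel must push the process's excess pages to swap regardless of its activity. A hedged sketch: it assumes a cgroup-v2 mount at /sys/fs/cgroup and root privileges, and all names are illustrative (on Ubuntu 18.04's default cgroup v1, the file would be memory/memory.limit_in_bytes instead of memory.high).

```python
import os

CGROUP_ROOT = "/sys/fs/cgroup"   # assumption: cgroup v2 mounted here

def cgroup_paths(name):
    """Paths for a benchmark cgroup: its directory, memory cap, and member list."""
    base = os.path.join(CGROUP_ROOT, name)
    return base, os.path.join(base, "memory.high"), os.path.join(base, "cgroup.procs")

def confine(name, pid, limit_bytes):
    """Create the cgroup, cap resident memory, and move `pid` into it.

    Resident memory above the cap is reclaimed by the kernel, i.e. pushed
    to swap, which is exactly the behavior we want to benchmark.
    """
    base, mem_high, procs = cgroup_paths(name)
    os.makedirs(base, exist_ok=True)
    with open(mem_high, "w") as f:
        f.write(str(limit_bytes))
    with open(procs, "w") as f:
        f.write(str(pid))

# Usage (as root): confine("swapbench", os.getpid(), 256 * 1024 * 1024)
```

Unlike filling RAM with competing processes, this makes the limit explicit and repeatable, so we know for certain when the benchmarked process is swapping.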

March 2020 · Bart Cox

Minimizing idleness in Spark Clusters

The GDelt database, on 18-11-2019, consists of 492,618 segments. Processing the top 10 most mentioned topics for each date in the whole data set would take a long time. Using clusters in the cloud, like AWS EMR, significantly decreases the needed computation time but might be costly. To make the best use of the clusters on AWS EMR, minimizing the idle time of the machines is desired. ...