Hierarchical Split Federated Learning for Resource-Constrained Environments
The proliferation of Large Language Models (LLMs) and Generative Artificial Intelligence has fundamentally transformed the computational landscape, imposing hardware requirements that far exceed the capabilities of conventional IoT and edge devices. While centralized cloud computing offers the necessary processing power to train these models, this approach raises critical concerns regarding data privacy, transmission latency, and bandwidth consumption.
In this context, Federated Learning (FL) has established itself as the standard paradigm for collaborative, privacy-preserving training. However, traditional FL runs into a physical barrier known as the "Memory Wall": storing the weights, gradients, and optimizer states of models with billions of parameters exceeds the physical memory of most edge devices, making their direct participation in training infeasible.
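The Memory Wall can be made concrete with a back-of-envelope estimate. The sketch below assumes standard mixed-precision Adam training, where each parameter carries roughly 16 bytes of state (fp16 weights and gradients plus fp32 master weights, momentum, and variance, the accounting popularized by the ZeRO line of work); the helper name and numbers are illustrative, not taken from a specific framework.

```python
def training_memory_gb(num_params: float) -> float:
    """Rough memory footprint of mixed-precision Adam training.

    Per parameter: 2 B fp16 weights + 2 B fp16 gradients
    + 12 B fp32 optimizer state (master weights, momentum, variance),
    i.e. ~16 B/param under the usual ZeRO-style accounting.
    """
    bytes_per_param = 2 + 2 + 12
    return num_params * bytes_per_param / 1024**3

# A 1B-parameter model needs ~15 GiB of training state alone,
# far beyond typical edge devices with 1-8 GB of RAM.
print(f"{training_memory_gb(1e9):.1f} GiB")
```

Even before accounting for activations, a modest 1B-parameter model already exceeds the memory of virtually every IoT-class device, which is what motivates offloading most layers off-device.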
This project investigates Hierarchical Split Federated Learning (HSFL), an emerging paradigm designed to democratize access to AI training. By merging the layer-wise model partitioning of Split Learning (SL) with the parallel aggregation of FL, HSFL decomposes deep neural networks across the Computing Continuum. In this architecture, resource-constrained devices process only the initial layers of a model, offloading the heavy computational workload to intermediate servers or edge clusters, while a central server coordinates global aggregation.
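The hierarchical aggregation step of this architecture can be sketched in a few lines: each edge cluster averages the client-side model shards of its own clients, and the cloud then averages the cluster-level models. This is a minimal, hypothetical illustration of two-level FedAvg over flat parameter vectors, not the project's actual implementation.

```python
def fedavg(models, weights):
    """Weighted average of flat parameter vectors (FedAvg)."""
    total = sum(weights)
    dim = len(models[0])
    return [sum(w * m[i] for m, w in zip(models, weights)) / total
            for i in range(dim)]

# Level 1: each edge cluster aggregates its clients' client-side shards,
# weighting by the number of local samples.
cluster_a = fedavg([[1.0, 2.0], [3.0, 4.0]], weights=[10, 10])
cluster_b = fedavg([[5.0, 6.0]], weights=[20])

# Level 2: the cloud aggregates the edge-level models into a global model.
global_model = fedavg([cluster_a, cluster_b], weights=[20, 20])
print(global_model)  # [3.5, 4.5]
```

The same two-level structure applies to the server-side shards; the practical benefit is that only cluster-level updates cross the wide-area link to the cloud.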
Despite its potential to enable AI at the edge, deploying HSFL in real-world, heterogeneous environments presents significant technical challenges that motivate this research:
- System Heterogeneity and Stragglers: The Cloud-Edge Continuum is composed of devices with vastly different computational capabilities. In synchronous training, the slowest nodes dictate the pace of every round and can severely degrade the throughput of the entire system.
- Communication Overhead: Unlike standard FL, model partitioning requires the frequent transmission of activations and gradients between model shards. Without efficient management, network latency can negate the gains achieved through distributed computing.
- Server-Side Scalability: Naive Split Learning architectures tend to impose a one-to-one ratio between clients and servers, leading to rapid saturation of server resources as the number of clients increases.
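The communication-overhead challenge above can also be quantified with a simple estimate of the traffic crossing the cut layer of a transformer each training step: activations travel upstream and gradients of the same size travel back. The shapes below (batch size, sequence length, hidden dimension) are illustrative assumptions, not measurements from a specific deployment.

```python
def cut_layer_traffic_mb(batch, seq_len, hidden, bytes_per_val=2):
    """Per-step traffic across a transformer split point:
    fp16 activations upstream plus equally sized gradients downstream."""
    one_way = batch * seq_len * hidden * bytes_per_val
    return 2 * one_way / 1e6

# e.g. batch 8, sequence length 512, hidden dimension 2048, fp16 values:
print(f"{cut_layer_traffic_mb(8, 512, 2048):.0f} MB per training step")
```

At tens of megabytes per step, thousands of steps quickly dwarf the size of the model shard itself, which is why cut-layer placement and activation compression are central orchestration decisions in this project.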
The objective of this project is to explore novel strategies for orchestration, resource allocation, and system optimization to overcome these barriers, enabling scalable, efficient training of foundation models across distributed and constrained infrastructures.
Contact
- Name: Marcio Lopes Moraes
- Function: Research Assistant
- Email: marcio.moraeslopes@unibe.ch
- Phone: +41 31 684 8424
