<aside> 📌 This page is maintained by the DML (Distributed Machine Learning) group. It introduces the work we have accomplished and the work we plan to do. Our research interests include gradient compression algorithms and training systems for large language models (LLMs). We are seeking collaborators in DML 😊. Please feel free to reach out to Zhi Wang (mail: [email protected]) or Rongwei Lu (mail: [email protected]) for further information or potential collaboration opportunities.

</aside>

<aside> 📌 该页面由分布式机器学习小组维护,包含已经完成的工作以及即将进行的工作。我们的研究兴趣包括梯度压缩算法以及大型语言模型的训练系统。我们正在寻找DML的合作伙伴,如果需要更多信息或有意向合作,请联系王智(电子邮件:[email protected])或路荣伟(电子邮件:[email protected])。

</aside>

Research / 研究方向

Works Accomplished / 近期工作

1. Gradient Compression / 梯度压缩

<aside> ✏️ Gradient compression algorithms are widely used in DML to alleviate the communication bottleneck, but traditional gradient compression algorithms struggle in Non-IID scenarios. To address the accuracy degradation in Non-IID scenarios, we propose DAGC, a data-aware adaptive gradient compression algorithm. To address the failure of the traditional hard-threshold compressor in federated learning, we propose γ-FedHT, a step-size-aware adaptive hard-threshold compressor. In asynchronous federated learning, conventional solutions cannot optimize local updates and communication jointly, so we propose FedLuck, which improves convergence speed by jointly adjusting the local update frequency and the gradient compression rate. A hedged code sketch of the hard-threshold compression idea is given below.

</aside>
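To make the compression idea concrete, here is a minimal PyTorch-style sketch of a hard-threshold sparsifier with an error-feedback residual, the building block that γ-FedHT adapts. The function names (`hard_threshold_compress`, `decompress`), the threshold value, and the worker-loop snippet are illustrative assumptions, not the actual implementations of DAGC, γ-FedHT, or FedLuck; in particular, γ-FedHT's step-size-aware threshold rule is not reproduced here.

```python
import math
import torch


def hard_threshold_compress(grad: torch.Tensor, threshold: float):
    """Keep only entries with |g| > threshold; return (values, indices, residual).

    The residual holds the dropped entries and is normally carried to the
    next round via error feedback so that small updates are not lost.
    """
    flat = grad.flatten()
    mask = flat.abs() > threshold
    indices = mask.nonzero(as_tuple=True)[0]
    values = flat[indices]
    residual = torch.where(mask, torch.zeros_like(flat), flat).view_as(grad)
    return values, indices, residual


def decompress(values: torch.Tensor, indices: torch.Tensor, shape) -> torch.Tensor:
    """Rebuild a dense gradient from the sparse (values, indices) pair."""
    flat = torch.zeros(math.prod(shape), device=values.device, dtype=values.dtype)
    flat[indices] = values
    return flat.view(shape)


# Hypothetical single training step on one worker:
grad = torch.randn(4, 8)            # stand-in for a locally computed gradient
error = torch.zeros_like(grad)      # error-feedback buffer from earlier rounds
values, idx, error = hard_threshold_compress(grad + error, threshold=0.5)
restored = decompress(values, idx, grad.shape)  # what the receiver reconstructs
```

The residual term is what lets entries below the threshold accumulate across rounds instead of being silently dropped; γ-FedHT additionally couples the threshold to the training step size, which this sketch omits.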


2. Decentralized Federated Learning / 去中心化联邦学习算法

<aside> ✏️ Growing concern about data privacy and security has motivated federated learning, which allows nodes to synchronize only their locally trained models instead of their original data. The conventional federated learning architecture, inherited from the parameter server design, relies on highly centralized topologies and the assumption of large node-to-server bandwidth. However, in real-world federated learning scenarios the network capacities between nodes are fairly uniformly distributed and smaller than those in a datacenter, and it is challenging for conventional federated learning approaches to use these node-to-node capacities efficiently. In this work, we propose model-segment-level decentralized federated learning to tackle this problem. In particular, we propose a segmented gossip approach, which not only makes full use of node-to-node bandwidth but also achieves good training convergence. A sketch of the segment-exchange idea is given below.

</aside>
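For intuition, the sketch below illustrates the segment-exchange idea: in each gossip round, a node pulls individual model segments from randomly chosen peers instead of whole models and mixes them with its local copy. The round-robin segmenting by parameter name, the peer-sampling policy, and the plain averaging used here are simplifying assumptions for illustration, not the exact aggregation rule of the segmented gossip protocol.

```python
import random
import torch


def split_into_segments(state_dict, num_segments):
    """Assign parameter tensors to segments round-robin by name."""
    names = list(state_dict.keys())
    return [names[i::num_segments] for i in range(num_segments)]


def segmented_gossip_round(local_state, peer_states, num_segments=4, peers_per_segment=2):
    """One gossip round: for each model segment, pull that segment from a few
    randomly chosen peers and average it with the local copy."""
    segments = split_into_segments(local_state, num_segments)
    mixed = {name: tensor.clone() for name, tensor in local_state.items()}
    for segment in segments:
        chosen = random.sample(peer_states, k=min(peers_per_segment, len(peer_states)))
        for name in segment:
            stacked = torch.stack([local_state[name]] + [peer[name] for peer in chosen])
            mixed[name] = stacked.mean(dim=0)
    return mixed


# Toy usage: one node mixing with two peers that hold the same two-tensor "model".
make_model = lambda: {"layer.weight": torch.randn(4, 4), "layer.bias": torch.randn(4)}
local, peers = make_model(), [make_model(), make_model()]
local = segmented_gossip_round(local, peers, num_segments=2)
```

Exchanging segments rather than whole models spreads traffic across many node-to-node links, which is the bandwidth-utilization point made above.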