Topic: Asynchronous training algorithms for distributed data-parallel training (or federated learning)
References:
[1] Alistarh, D., Grubic, D., Li, J., Tomioka, R., & Vojnovic, M. (2017). QSGD: Communication-efficient SGD via gradient quantization and encoding. Advances in Neural Information Processing Systems, 30, 1709-1720.
[2] Koloskova, A., Stich, S. U., & Jaggi, M. (2022). Sharper convergence guarantees for asynchronous SGD for distributed and federated learning. Advances in Neural Information Processing Systems, 35, 17202-17215.
[3] Stich, S. U. (2020). On communication compression for distributed optimization on heterogeneous data. arXiv preprint arXiv:2009.02388.
[4] Stich, S. U., Cordonnier, J.-B., & Jaggi, M. (2018). Sparsified SGD with memory. arXiv preprint arXiv:1809.07599.
[5] Vogels, T., Karimireddy, S. P., & Jaggi, M. (2019). PowerSGD: Practical low-rank gradient compression for distributed optimization. Advances in Neural Information Processing Systems, 32.
[6] Zhu, L., Lin, H., Lu, Y., Lin, Y., & Han, S. (2021). Delayed Gradient Averaging: Tolerate the communication latency for federated learning. Advances in Neural Information Processing Systems, 34.
[7] Xu, H., et al. (2020). Compressed communication for distributed deep learning: Survey and quantitative evaluation.