<aside> 💡 This page is maintained by the D2I group. It introduces the work we have accomplished and the work we plan to do. Our research interest is designing algorithms and systems that provide accurate and efficient inference capabilities. We are looking for collaborators on D2I 😊. You are welcome to contact Zhi Wang (mail: [email protected]) or Qingting Jiang (mail: [email protected]).

</aside>

⭐️ Highlight

1. Introduction

With the development of pre-trained models, the last-mile problem of machine learning, i.e., making models perform well once they are actually deployed, has become an urgent issue. In the open world, deployment faces several challenges:

  1. Data: test data in the open world often mismatches the training distribution, and test-time data arrives unlabeled.
  2. Model: deployed models degrade in performance during the testing phase, and large models are costly to serve.
  3. Deployment: models must be deployed in a distributed manner across devices, where resources are expensive and limited.

To address these challenges, the Deep Model Deployment and Inference Group (D2I) works closely with collaborators to ensure that models are deployed efficiently and perform optimally in real-world scenarios. The goal is to design algorithms and systems that provide accurate and efficient inference capabilities. In general, our team approaches these challenges from three directions:

[Figure: overview of the three research directions]

2. Projects

2.1 Efficient Deployment and Inference Serving

Joint Model and Data Adaptation for Cloud Inference Serving (RTSS '21, CCF-A)

We tackle the dual challenge of the computation-bandwidth trade-off and cost-effectiveness by proposing A2, an efficient joint adaptive-model and adaptive-data deep learning serving solution across geo-distributed datacenters. Inspired by the insight that computational cost and bandwidth cost can be traded off against each other while achieving the same accuracy, we design a real-time inference serving framework that selectively places different "versions" of the deep learning models at different geolocations and schedules different data sample versions to be sent to those model versions for inference. We deploy A2 on Amazon EC2; experiments show that A2 reduces serving cost by 30%-50% compared to baselines under the same latency and accuracy requirements. A toy sketch of the underlying trade-off is shown below.
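
To make the computation-bandwidth trade-off more concrete, here is a minimal Python sketch of the joint selection idea: pick the cheapest (model version, data version) pair that still meets an accuracy target. All class names, cost figures, and the brute-force search are illustrative assumptions for exposition only, not A2's actual formulation, scheduler, or measured numbers.

```python
# Hypothetical sketch of the computation-bandwidth trade-off behind joint
# model/data adaptation. All version names, costs, and the brute-force
# search are illustrative assumptions, not the paper's formulation or data.
from dataclasses import dataclass
from itertools import product

@dataclass
class ModelVersion:
    name: str
    compute_cost: float   # $ per 1k requests at the hosting datacenter
    accuracy: float       # accuracy on full-resolution input

@dataclass
class DataVersion:
    name: str
    bandwidth_cost: float  # $ per 1k requests to ship samples to the datacenter
    accuracy_drop: float   # accuracy lost due to downsampling/compression

MODELS = [ModelVersion("resnet152", 4.0, 0.80),
          ModelVersion("resnet50", 1.5, 0.76),
          ModelVersion("mobilenet", 0.5, 0.72)]
DATA = [DataVersion("full_res", 2.0, 0.00),
        DataVersion("half_res", 0.8, 0.02),
        DataVersion("quarter_res", 0.3, 0.05)]

def cheapest_pair(accuracy_target: float):
    """Return the (model version, data version) pair that meets the
    accuracy target at the lowest combined compute + bandwidth cost."""
    feasible = [(m, d) for m, d in product(MODELS, DATA)
                if m.accuracy - d.accuracy_drop >= accuracy_target]
    if not feasible:
        return None
    return min(feasible, key=lambda p: p[0].compute_cost + p[1].bandwidth_cost)

if __name__ == "__main__":
    m, d = cheapest_pair(0.74)
    print(f"serve with {m.name} on {d.name}: "
          f"${m.compute_cost + d.bandwidth_cost:.2f} per 1k requests")
```

In this toy setting, a stronger model paired with a more compressed data version can hit the same accuracy target as a weaker model on full-resolution data, but at a different mix of compute and bandwidth spending, which is the kind of choice A2 makes per geolocation and per request.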