With the advancement of deep learning techniques, major cloud providers and niche machine learning service providers have started to offer cloud-based machine learning tools, also known as machine learning as a service (MLaaS), to the public. According to our measurements, MLaaSes from different providers perform differently on the same task because of their proprietary datasets, models, etc. Federating different MLaaSes allows us to improve analytic performance further. However, naively aggregating results from different MLaaSes not only incurs significant monetary cost but may also yield sub-optimal performance gains because it can introduce false-positive results. In this paper, we propose Armol, a framework that federates the right selection of MLaaS providers to achieve the best possible analytic performance. We first design a word grouping algorithm to unify the output labels across different providers. We then present a deep combinatorial reinforcement learning-based approach to maximize accuracy while minimizing cost. The predictions from the selected providers are then aggregated using carefully chosen ensemble strategies. A real-world trace-driven evaluation demonstrates that Armol achieves the same accuracy with 67% less inference cost.

Background

Major cloud providers and niche machine learning service providers have started to offer cloud-based machine learning tools, also known as machine learning as a service (MLaaS), to the public. These offerings include object detection services.

Screen Shot 2022-07-15 at 5.49.50 PM.png

MLaaS is a black box: the model behind a service and the dataset used to train it are confidential and inaccessible. As a result, users cannot tell which MLaaS best suits their requirements.

Screen Shot 2022-07-15 at 6.07.01 PM.png

Measurements and Related work

To tackle the above problem, we measure the accuracy of MLaaS. We divide recent work into three categories; however, none of them covers the MLaaS setting considered in our work.

Screen Shot 2022-07-15 at 6.08.52 PM.png

Screen Shot 2022-07-15 at 6.09.26 PM.png

Thus, we measure the object detection services from AWS, Azure, and Google Cloud Platform (GCP). We find that AWS achieves the best average precision overall. Nevertheless, although GCP performs worst among the providers overall, it achieves the best accuracy in the “Book” category. Hence, we conclude that the most appropriate MLaaS provider differs for inputs with different features.

Screen Shot 2022-07-15 at 6.13.01 PM.png

Here are some interesting examples. We can observe that the predictions from AWS, Azure, and Aliyun each have their own characteristics and can even complement one another. Thus, we want to federate the predictions from all providers to achieve the best accuracy.

Screen Shot 2022-07-15 at 6.14.34 PM.png
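
To make the idea of federating predictions concrete, here is a minimal sketch of one simple way to merge object detections from several providers. It is not Armol's actual ensemble strategy: the Detection type, the federate function, the IoU threshold, and the min_votes parameter are all hypothetical names chosen for illustration, and the sketch assumes that labels have already been unified across providers. With min_votes=1 every detection is kept (maximum coverage, but also every false positive); raising min_votes keeps only objects that several providers agree on.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


@dataclass
class Detection:
    provider: str  # e.g. "aws", "azure" (illustrative values)
    label: str     # assumed to be already unified across providers
    box: Box
    score: float


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def federate(detections: List[Detection],
             iou_thr: float = 0.5,
             min_votes: int = 1) -> List[Detection]:
    """Group overlapping same-label detections and keep well-supported groups.

    Assumption: a simple greedy grouping, not the strategy used by Armol.
    """
    groups: List[List[Detection]] = []
    # Visit detections from most to least confident, so the first member
    # of each group is its highest-scoring detection.
    for det in sorted(detections, key=lambda d: d.score, reverse=True):
        for group in groups:
            rep = group[0]
            if det.label == rep.label and iou(det.box, rep.box) >= iou_thr:
                group.append(det)
                break
        else:
            groups.append([det])

    merged = []
    for group in groups:
        # Count distinct providers that support this object.
        votes = len({d.provider for d in group})
        if votes >= min_votes:
            merged.append(group[0])  # keep the highest-scoring box
    return merged
```

Calling federate(dets) returns the union of all distinct objects found by any provider, while federate(dets, min_votes=2) keeps only objects reported by at least two providers. The trade-off between these two settings mirrors the tension described above: aggregating more predictions can recover objects that a single provider misses, but it can also admit more false positives.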