Analytics

There have been several efforts to apply FaaS to analytics workloads. We touch upon a few examples here and refer the reader to Werner et al. [1] for an overview and comparison of serverless data processing frameworks.

PyWren [2] demonstrated the benefits of simplified cloud programming with a simple FaaS-based framework geared toward analytics tasks. Subsequent work by IBM [3] extended it with additional constructs, and Locus [4] showed how to implement shuffling, an important analytics primitive, in a FaaS environment.

Wukong [5,6] focuses on optimizing analytics tasks, enhancing locality by minimizing data movement across tasks. In a similar vein, HASTE [7] focuses on optimizing serverless DAG execution.

In the database literature, Lambada [8] and Starling [9] both use FaaS to operate on data stored in S3. Flint [10] tackles the same problem using Apache Spark [11].

[1]Sebastian Werner, Richard Girke, and Jörn Kuhlenkamp. 2020. An Evaluation of Serverless Data Processing Frameworks. In Proceedings of the 2020 Sixth International Workshop on Serverless Computing, 19–24.
[2]Eric Jonas, Qifan Pu, Shivaram Venkataraman, Ion Stoica, and Benjamin Recht. 2017. Occupy the Cloud: Distributed Computing for the 99%. In Proceedings of the 2017 Symposium on Cloud Computing, ACM, 445–451.
[3]Josep Sampé, Gil Vernik, Marc Sánchez-Artigas, and Pedro García-López. 2018. Serverless Data Analytics in the IBM Cloud. In Proceedings of the 19th International Middleware Conference Industry, 1–8.
[4]Qifan Pu, Shivaram Venkataraman, and Ion Stoica. 2019. Shuffling, Fast and Slow: Scalable Analytics on Serverless Infrastructure. In 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI 19), 193–206.
[5]Benjamin Carver, Jingyuan Zhang, Ao Wang, Ali Anwar, Panruo Wu, and Yue Cheng. 2020. Wukong: A Scalable and Locality-Enhanced Framework for Serverless Parallel Computing. In Proceedings of the 11th ACM Symposium on Cloud Computing, 1–15.
[6]Benjamin Carver, Jingyuan Zhang, Ao Wang, and Yue Cheng. 2019. In Search of a Fast and Efficient Serverless DAG Engine. In 2019 IEEE/ACM Fourth International Parallel Data Systems Workshop (PDSW), IEEE, 1–10.
[7]Avinash Arjavalingam and Aditya Parameswaran. 2021. HASTE: Serverless DAG Execution Optimizer. (2021).
[8]Ingo Müller, Renato Marroquín, and Gustavo Alonso. 2020. Lambada: Interactive Data Analytics on Cold Data Using Serverless Cloud Infrastructure. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, 115–130.
[9]Matthew Perron, Raul Castro Fernandez, David DeWitt, and Samuel Madden. 2020. Starling: A Scalable Query Engine on Cloud Functions. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, 131–141.
[10]Youngbin Kim and Jimmy Lin. 2018. Serverless Data Analytics With Flint. In 2018 IEEE 11th International Conference on Cloud Computing (CLOUD), IEEE, 451–455.
[11]Matei Zaharia, Reynold S. Xin, Patrick Wendell, Tathagata Das, Michael Armbrust, Ankur Dave, Xiangrui Meng, Josh Rosen, Shivaram Venkataraman, Michael J. Franklin, and others. 2016. Apache Spark: A Unified Engine for Big Data Processing. Communications of the ACM 59, 11 (2016), 56–65.