Arrow is used by open-source projects like Apache Parquet, Apache Spark, pandas, and many commercial or closed-source services. It provides the following functionality: a) in-memory computing, b) a standardized columnar storage format, c) an IPC and RPC framework for data exchange between processes and nodes respectively.
I've been following Apache Arrow for a long time (2016!) and have been / continue to be bullish on it. This post is a great intro if you're not familiar, but it also has some excellent performance data in it that was new to me.
IBM measured a 53x speedup in data processing by Python and Spark after adding support for Arrow in PySpark
...heh, wow.Read more...