A Rust rewrite of a legacy Java ETL pipeline. Using Apache Arrow bindings, it reduced the memory footprint by 60% and made processing 10x faster.
Project Overview
A complete rewrite of a legacy Java-based ETL pipeline in Rust. By using Apache Arrow's columnar in-memory format for zero-copy data access, we achieved a 10x performance improvement and a 60% reduction in memory usage relative to the Java implementation. The pipeline processes terabytes of data daily, applying complex transformations and aggregations before loading the results into a data warehouse.
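To illustrate the zero-copy idea behind the memory savings, here is a minimal sketch using only the Rust standard library. It is not the pipeline's code and does not use the Arrow API: the `Float64Column` type and its methods are hypothetical names chosen for this example. It shows a column whose slices share one reference-counted buffer, so taking a sub-range for a transformation or aggregation copies no values.

```rust
use std::sync::Arc;

/// A column of f64 values backed by a shared, immutable buffer.
/// Hypothetical type for illustration only; not part of the Arrow API.
#[derive(Clone, Debug)]
pub struct Float64Column {
    buffer: Arc<[f64]>, // shared storage, reference-counted
    offset: usize,      // start of this view within the buffer
    len: usize,         // number of values visible through this view
}

impl Float64Column {
    /// Builds a column from owned values. The values are moved into the
    /// shared buffer once; later slices never copy them again.
    pub fn from_vec(values: Vec<f64>) -> Self {
        let len = values.len();
        Self { buffer: Arc::from(values), offset: 0, len }
    }

    /// Returns a view over `len` values starting at `offset`.
    /// Only the reference count is incremented; no values are copied.
    pub fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(
            offset <= self.len && len <= self.len - offset,
            "slice out of bounds"
        );
        Self {
            buffer: Arc::clone(&self.buffer),
            offset: self.offset + offset,
            len,
        }
    }

    /// Borrows the values visible through this view.
    pub fn values(&self) -> &[f64] {
        &self.buffer[self.offset..self.offset + self.len]
    }

    /// A simple aggregation over the view.
    pub fn sum(&self) -> f64 {
        self.values().iter().sum()
    }

    /// True if both views point at the same underlying allocation.
    pub fn shares_buffer_with(&self, other: &Self) -> bool {
        Arc::ptr_eq(&self.buffer, &other.buffer)
    }
}
```

For example, `Float64Column::from_vec(vec![1.0, 2.0, 3.0, 4.0]).slice(2, 2)` yields a view over the last two values that still points at the original allocation. Arrow's array types apply the same principle to typed columnar buffers, which is why slicing and handing data between stages stays cheap.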
Technology Stack
Rust
Apache Arrow
Systems