Learn more about Open Source Summit Japan and register here
Thursday, June 1 • 16:50 - 17:30
Automating Workflows for Analytics Pipelines - Sadayuki Furuhashi, Treasure Data

Learn how to leverage new workflow management tools to simplify complex data pipelines and ETL jobs spanning heterogeneous systems. In this technical deep dive from Treasure Data, I, the company's founder and chief architect, walk through the codebase of Digdag, our recently open-sourced workflow management project. I’ll show how workflows can break large, error-prone SQL statements into smaller blocks that are easier to maintain and reuse. I’ll also demonstrate how a system using ‘last good’ checkpoints can save hours of computation when restarting failed jobs, and how to use workflows to automate data lifecycle management across Apache Hadoop, PostgreSQL, Amazon S3 and Apache Spark. You'll see a few examples where SQL-as-pipeline-code gives data scientists both the right level of ownership over production processes and a comfortable abstraction from the underlying execution engines.
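
As a rough illustration of the ‘last good’ checkpoint idea mentioned above, the following minimal Python sketch records each completed task in a JSON file and skips already-completed tasks when the workflow is restarted. It is not Digdag's implementation or API; `run_workflow`, `demo`, the task names, and the checkpoint file layout are all hypothetical names chosen for this example.

```python
import json
import os
import tempfile


def run_workflow(tasks, checkpoint_path):
    """Run (name, func) tasks in order, skipping any already checkpointed.

    Hypothetical helper: the checkpoint is a JSON list of finished task names.
    """
    done = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)

    executed = []
    for name, func in tasks:
        if name in done:
            continue  # finished in an earlier run, so do not recompute
        func()  # an exception here leaves the checkpoint at the last good task
        done.append(name)
        with open(checkpoint_path, "w") as f:
            json.dump(done, f)
        executed.append(name)
    return executed


def demo():
    """Simulate a failure in the middle task, then restart the workflow."""
    calls = []
    state = {"fail": True}

    def extract():
        calls.append("extract")

    def transform():
        if state["fail"]:
            raise RuntimeError("simulated failure")
        calls.append("transform")

    def load():
        calls.append("load")

    tasks = [("extract", extract), ("transform", transform), ("load", load)]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "checkpoint.json")
        try:
            run_workflow(tasks, path)
        except RuntimeError:
            pass  # first attempt stops at "transform"
        state["fail"] = False
        resumed = run_workflow(tasks, path)  # restart from the checkpoint
    return calls, resumed
```

In this sketch the restarted run executes only the failed task and the ones after it, so the work done by `extract` is not repeated. A production system would also need to handle concurrent runs and changed task definitions, which this example ignores.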

Sadayuki Furuhashi

Founder and Software Architect, Treasure Data
Sada is the original author of Fluentd, Embulk, MessagePack, and now Digdag, an open-sourced workflow management project. Sada is a co-founder of Treasure Data, Inc., a cloud-based data warehousing and analytics service. He has been working on production distributed systems for a decade...

Thursday June 1, 2017 16:50 - 17:30 JST
Private Dining