datacoon/awesome-dataops

Awesome DataOps

Awesome list of DataOps open source software, online services, courses and use cases

Open source

Data Pipeline Orchestration

  • Apache Airflow - Airflow is a platform created by the community to programmatically author, schedule and monitor workflows.
  • Apache Oozie - Oozie is a workflow scheduler system to manage Apache Hadoop jobs.
  • Dagster - A Python library for building data applications: ETL, ML, Data Pipelines, and more.
  • dbt command-line tool - the T in ELT. Organize, cleanse, denormalize, filter, rename, and pre-aggregate the raw data in your warehouse so that it's ready for analysis.
  • Reflow - A language and runtime for distributed, incremental data processing in the cloud.

ETL tools

  • Apache Kafka - a distributed streaming platform.
  • Apache NiFi - an easy-to-use, powerful, and reliable system to process and distribute data.
  • Squirrel - a Python library for large-scale data loading, transforming and sharing.

Commercial products and services

Platforms

  • Astronomer - spin up and scale Apache Airflow clusters.
  • Databand - Databand tracks your pipeline execution metadata, so you can evaluate changes in runtimes, code, data, and critical business KPIs.
  • DataKitchen - an end-to-end DataOps platform that automates and coordinates all the people, tools, and environments in your entire data analytics organization: everything from orchestration, testing, and monitoring to development and deployment.
  • Prefect - a workflow management system designed for modern infrastructure and powered by open-source software.
  • Saagie - Saagie DataOps Orchestrator integrates commercial and open source data technologies to accelerate project delivery.
  • Unravel - helps ops engineers, app developers, and enterprise architects reduce the complexity of delivering reliable application performance by providing unified visibility and operational intelligence to optimize your entire ecosystem.

Cloud ETL

  • AWS Glue - a fully managed ETL (extract, transform, and load) service that makes it simple and cost-effective to categorize your data, clean it, enrich it, and move it reliably between various data stores.
  • Azure Data Factory - a hybrid data integration service that simplifies ETL operations.
  • Google Cloud Dataflow - unified stream and batch data processing that's serverless, fast, and cost-effective.
  • ETLWorks - a cloud-first, any-to-any data integration platform.

Data catalogs

Testing and monitoring

  • RightData - a data testing, reconciliation, and validation suite that helps stakeholders identify issues related to data consistency, quality, completeness, and gaps.
