The Wayback Machine - https://web.archive.org/web/20200811164635/https://github.com/topics/etl-pipeline
Here are 252 public repositories matching this topic...
A stream processor for mundane tasks written in Go
Updated
Jul 22, 2020
Java
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Updated
Mar 9, 2020
Python
Example project implementing best practices for PySpark ETL jobs and applications.
Updated
Jul 9, 2020
Python
A few projects related to Data Engineering, including Data Modeling, Infrastructure setup on cloud, Data Warehousing and Data Lake development.
Updated
Mar 5, 2020
Python
A simplified, lightweight ETL Framework based on Apache Spark
Updated
Aug 11, 2020
Scala
A lightweight ETL (extract, transform, load) library and data integration toolbox for .NET.
Clojure dataframe library and pipeline for data processing and machine learning
Updated
Aug 11, 2020
Clojure
Download DIG to run on your laptop or server.
csvplus extends the standard Go encoding/csv package with fluent interface, lazy stream operations, indices and joins.
A simple Spark-powered ETL framework that just works 🍺
Updated
Aug 6, 2020
Scala
SEO dashboard from Search Console data using the Google Search API, a MySQL database, a Node.js REST API (Express.js) and a React.js dashboard
Updated
Jul 7, 2020
JavaScript
Azure Data Factory Hands On Lab - Step by Step - A Comprehensive Azure Data Factory and Mapping Data Flow step by step tutorial
Apache Spark-based Data Flow (ETL) Framework which supports multiple read and write destinations of different types and also supports multiple categories of transformation rules.
Updated
May 14, 2020
Scala
🚹 💾 Script to import issues from a JIRA instance into a database.
Updated
Jul 28, 2020
Python
Updated
Aug 11, 2020
Scala
A Kafka Connect source connector that generates data for tests
Updated
Jun 26, 2019
Java
Updated
Jul 31, 2020
HTML
Updated
Apr 9, 2018
Python
Blog post on ETL pipelines with Airflow
Updated
Jun 7, 2020
Jupyter Notebook
Running an ETL pipeline with COBOL on Kubernetes
Updated
Jul 16, 2020
Shell
Ethereum Analytical Database - Ethereum data access solution that can be used for analytics and application development. The solution works on a fast DB - Clickhouse.
Updated
Oct 30, 2019
HTML
A tutorial to set up and deploy a simple Serverless Python workflow with REST API endpoints in AWS Lambda.
Updated
Apr 22, 2020
Python
Waterdrop plugin development examples.
Updated
Jun 11, 2020
Scala
ETL pipeline for the Ethereum blockchain
Updated
Feb 13, 2019
JavaScript
ETL pipeline combined with supervised learning and grid search to classify text messages sent during a disaster event
Updated
Feb 24, 2019
Python
Data monitoring tool, monitors the result, not the run
Updated
Aug 11, 2020
Scala
Parallel Streaming Transformation Loader
Updated
Apr 23, 2019
Java
Build an end-to-end Machine Learning pipeline to predict accessibility of playgrounds in NYC
Updated
Jul 9, 2020
Jupyter Notebook
🏭 Schedule a data pipeline in Google Cloud using cloud function, BigQuery, cloud storage, cloud scheduler, stack trace, cloud build, and pub/sub
Updated
Jun 4, 2019
Python