Simple and Distributed Machine Learning (Scala, updated Nov 17, 2025)
Apache Spark is an open source distributed general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
Coolplay Spark (酷玩 Spark): Spark source code analysis, Spark libraries, and more
Feathr – A scalable, unified data and AI engineering platform for enterprise
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs
This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spark jobs. It focuses on easing the collection and examination of Spark metrics, making it a practical choice for both developers and data engineers.
[PROJECT IS NO LONGER MAINTAINED] Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.
Apache Spark enhanced with a native Kubernetes scheduler back-end. NOTE: this repository is being ARCHIVED, as all new development for the Kubernetes scheduler back-end is now on https://github.com/apache/spark/
A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source.
A Spark Atlas connector to track data lineage in Apache Atlas
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
A recommender system for discovering GitHub repos, built with Apache Spark
Apache Spark on AWS Lambda
Big Data RDF Processing and Analytics Stack built on Apache Spark and Apache Jena http://sansa-stack.github.io/SANSA-Stack/
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange
Project for James' Apache Spark with Scala course
Spark Connector to read and write with Pulsar
Created by Matei Zaharia
Released May 26, 2014