Apache Spark

Unified analytics engine

Score: 82 (Project is healthy)
Funding: Stable
Maintenance: Active
Contributors: Healthy
Bus Factor: Low risk
Metrics last updated: 6 days ago (2026-02-07)

Overview

Apache Spark is a unified analytics engine for large-scale data processing with built-in modules for SQL, streaming, machine learning, and graph processing.

Importance

  • Leading big data processing framework
  • Powers data analytics worldwide
  • Foundation for data engineering
  • Critical for AI/ML pipelines

Key Features

  • In-memory computing
  • SQL support (Spark SQL)
  • Streaming (Structured Streaming)
  • MLlib for machine learning
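
As a rough illustration of the first two features, the sketch below caches a small DataFrame in memory and queries it with Spark SQL through PySpark. The sample rows, the view name `orders`, and the helper `total_per_customer` are hypothetical and chosen only for this example; it assumes the `pyspark` package is installed and runs Spark in local mode.

```python
# Minimal PySpark sketch: in-memory caching plus a Spark SQL query.
# All data and names below are illustrative assumptions, not part of Spark itself.

# Hypothetical sample data: (customer, amount)
SAMPLE_ORDERS = [("alice", 3), ("bob", 5), ("alice", 4)]

# Aggregate the amounts per customer with plain SQL.
QUERY = (
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY customer"
)


def total_per_customer(rows):
    """Return a list of (customer, total) tuples computed by Spark SQL."""
    # Imported here so the module can be loaded even where pyspark is absent.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[*]")            # run locally on all available cores
        .appName("key-features-sketch")
        .getOrCreate()
    )
    try:
        df = spark.createDataFrame(rows, ["customer", "amount"])
        df.cache()                      # mark the DataFrame for in-memory storage
        df.createOrReplaceTempView("orders")
        result = spark.sql(QUERY).collect()
        return [(row["customer"], row["total"]) for row in result]
    finally:
        spark.stop()


if __name__ == "__main__":
    print(total_per_customer(SAMPLE_ORDERS))
```

The same session object can also read streams or feed MLlib pipelines, which is what "unified" refers to here; those paths are left out to keep the sketch short.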

Sustainability

Spark is an Apache Software Foundation project with strong backing from Databricks and major cloud providers.

Dependencies

⬆️ Depends On (1)
Apache Spark
unknown

Dependency Chain

Upstream: Linux Kernel → Apache Spark

Impact Analysis

1 Direct Dependency
0 Dependent Projects
🔝 Top-level project