Apache Spark

Unified analytics engine

Project is healthy

Funding: Stable
Maintenance: Active
Contributors: Healthy
Bus Factor: Low risk

Metrics last updated: 2026-02-07

Overview

Apache Spark is a unified analytics engine for large-scale data processing with built-in modules for SQL, streaming, machine learning, and graph processing.

Importance

Widely adopted framework for large-scale data processing
Used for analytics workloads across many industries
Common foundation for data engineering (ETL and batch pipelines)
Frequently used for data preparation in AI/ML pipelines

Key Features

In-memory computing
SQL support (Spark SQL)
Streaming (Structured Streaming)
Machine learning (MLlib)
Graph processing (GraphX)

Sustainability

Spark is an Apache Software Foundation project with strong backing from Databricks and major cloud providers.

Dependencies

⬆️ Depends On (1)

Linux Kernel 88

Dependency Chain

Upstream: Linux Kernel → Apache Spark

Impact Analysis

1 Direct Dependency

0 Dependent Projects

🔝 Top-level project