About K-BERDL
KBase BER Data Lakehouse — A unified, AI-native data platform for DOE Biological and Environmental Research
The KBase BER Data Lakehouse (K-BERDL) is a multi-tenant, FAIR-compliant data platform developed to support the DOE Office of Biological and Environmental Research (BER). It brings together heterogeneous biological and environmental datasets from across the BER ecosystem into a single, governed, AI-ready infrastructure.
Mission
K-BERDL exists to accelerate scientific discovery by making BER data more accessible, interoperable, and actionable. The platform is designed to:
- Break down data silos across BER programs
- Provide a shared, scalable analytics infrastructure
- Enable AI-assisted discovery across national datasets
- Ensure data is FAIR — Findable, Accessible, Interoperable, and Reusable
Background
DOE BER programs collectively generate some of the world's most valuable biological and environmental datasets — spanning genomics, metagenomics, proteomics, environmental observations, and more. Historically, these datasets have been managed independently by individual programs, limiting cross-program discovery and integration.
K-BERDL addresses this challenge by providing a unified platform where BER programs can publish, govern, and share their data while retaining full stewardship and control.
Key BER Programs
K-BERDL serves, or is currently onboarding, the following BER programs as tenants:
| Program | Description |
|---|---|
| KBase | Narrative-driven computational biology and workflow execution |
| JGI | Genomics and metagenomics data from the Joint Genome Institute |
| NMDC | National Microbiome Data Collaborative metadata and workflows |
| EMSL | Environmental Molecular Sciences Laboratory multi-omics data |
| ESS-DIVE | Environmental Systems Science Data Infrastructure for a Virtual Ecosystem |
| ARM | Atmospheric Radiation Measurement climate research data |
Platform Principles
- Open standards — Built on Delta Lake, Apache Parquet, and Apache Atlas
- Tenant autonomy — Each program retains stewardship over its own data
- Reproducible science — Full data lineage and provenance tracking
- AI-native — Designed from the ground up to support AI agents and automated workflows
- Scalable — Supports KBase's 50,000+ user community and beyond
Contact & Access
To request access, or to onboard your BER program as a tenant, contact the K-BERDL platform team.
- Documentation: https://gazimahmud.github.io/kberdl-docs/
- GitHub: https://github.com/gazimahmud/kberdl-docs