About K-BERDL

KBase BER Data Lakehouse — A unified, AI-native data platform for DOE Biological and Environmental Research

The KBase BER Data Lakehouse (K-BERDL) is a multi-tenant, FAIR-compliant data platform developed to support the DOE Office of Biological and Environmental Research (BER). It brings together heterogeneous biological and environmental datasets from across the BER ecosystem into a single, governed, AI-ready infrastructure.


Mission

K-BERDL exists to accelerate scientific discovery by making BER data more accessible, interoperable, and actionable. The platform is designed to:

  • Break down data silos across BER programs
  • Provide a shared, scalable analytics infrastructure
  • Enable AI-assisted discovery across national datasets
  • Ensure data is FAIR — Findable, Accessible, Interoperable, and Reusable

Background

DOE BER programs collectively generate some of the world's most valuable biological and environmental datasets, spanning genomics, metagenomics, proteomics, and environmental observations, among others. Historically, individual programs have managed these datasets independently, which has limited cross-program discovery and integration.

K-BERDL addresses this challenge by providing a unified platform where BER programs can publish, govern, and share their data while retaining full stewardship and control.


Key BER Programs

K-BERDL serves or is onboarding the following BER programs as tenants:

Program     Description
KBase       Narrative-driven computational biology and workflow execution
JGI         Genomics and metagenomics data from the Joint Genome Institute
NMDC        National Microbiome Data Collaborative metadata and workflows
EMSL        Environmental Molecular Sciences Laboratory multi-omics data
ESS-DIVE    Environmental Systems Science Data Infrastructure for a Virtual Ecosystem
ARM         Atmospheric Radiation Measurement climate research data

Platform Principles

  • Open standards — Built on Delta Lake, Apache Parquet, and Apache Atlas
  • Tenant autonomy — Each program retains stewardship over its own data
  • Reproducible science — Full data lineage and provenance tracking
  • AI-native — Designed from the ground up to support AI agents and automated workflows
  • Scalable — Supports KBase's 50,000+ user community and beyond
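
As an illustration only, the tenant-autonomy and lineage principles above can be sketched in a few lines of Python. Everything here is hypothetical: the `DatasetRecord` class, its fields, the `can_modify` and `lineage` helpers, and the example dataset names are assumptions made for this sketch and are not part of any K-BERDL API or schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical catalog entry; not an actual K-BERDL schema."""
    dataset_id: str
    tenant: str               # program that stewards the data, e.g. "JGI"
    derived_from: tuple = ()  # ids of the datasets this one was built from


def can_modify(record: DatasetRecord, requesting_tenant: str) -> bool:
    """Tenant autonomy: only the stewarding tenant may change its own data."""
    return record.tenant == requesting_tenant


def lineage(catalog: dict, dataset_id: str) -> list:
    """Follow derived_from links and return upstream dataset ids, nearest first."""
    seen = []
    queue = list(catalog[dataset_id].derived_from)
    while queue:
        parent = queue.pop(0)
        if parent not in seen:
            seen.append(parent)
            queue.extend(catalog[parent].derived_from)
    return seen


# Made-up example data, used only to exercise the helpers above.
catalog = {
    "raw_reads": DatasetRecord("raw_reads", "JGI"),
    "assembly": DatasetRecord("assembly", "JGI", ("raw_reads",)),
    "annotation": DatasetRecord("annotation", "KBase", ("assembly",)),
}
```

In this sketch, `lineage(catalog, "annotation")` traces the annotation back through the assembly to the raw reads, while `can_modify` lets only the stewarding program change a record. A real deployment would rely on the platform's own governance and metadata services rather than in-memory objects.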

Contact & Access

To request access or onboard your BER program as a tenant, contact the K-BERDL platform team.