SBOM Parsers

Multi-format SBOM parsing: supported formats, parser backends, configuration, and trade-offs.

Overview

SeeBOM supports multiple SBOM formats through a format-detection dispatch layer at internal/sbom/parse.go. When a file is processed, the dispatcher:

  1. Reads the raw bytes
  2. Probes JSON fields to identify the format
  3. Routes to the appropriate parser backend
  4. Returns a unified ParseResult (SBOM metadata + packages)

This happens transparently: the parsing worker simply calls sbom.Parse(reader, sourceFile, hash).

Supported Formats

| Format | File Extension | Detection Method |
|---|---|---|
| SPDX 2.3 JSON | .spdx.json, .json | "spdxVersion" field present |
| In-toto attestation (SPDX) | .spdx.json, .json | "predicateType" contains "spdx" |
| CycloneDX 1.0–1.7 JSON | .cdx.json, .json | "bomFormat": "CycloneDX" |
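
The detection rules in the table can be sketched as a small probe that decodes only the three distinguishing top-level fields. This is an illustration, not the code in internal/sbom/parse.go: the detectFormat helper, the probe type, the returned labels, the case-insensitive "spdx" match, and the order of the checks are all assumptions, and it uses the standard library's encoding/json rather than goccy/go-json to stay self-contained.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// probe holds only the top-level fields used to tell the formats apart.
// All other fields in the document are ignored during decoding.
type probe struct {
	SPDXVersion   string `json:"spdxVersion"`
	BOMFormat     string `json:"bomFormat"`
	PredicateType string `json:"predicateType"`
}

// detectFormat is a hypothetical helper. The labels it returns are
// illustrative and are not SeeBOM's actual identifiers.
func detectFormat(data []byte) (string, error) {
	var p probe
	if err := json.Unmarshal(data, &p); err != nil {
		return "", fmt.Errorf("probe JSON: %w", err)
	}
	switch {
	case p.SPDXVersion != "":
		return "spdx", nil
	case strings.Contains(strings.ToLower(p.PredicateType), "spdx"):
		return "in-toto-spdx", nil
	case p.BOMFormat == "CycloneDX":
		return "cyclonedx", nil
	default:
		return "", errors.New("unrecognized SBOM format")
	}
}

func main() {
	samples := []string{
		`{"spdxVersion": "SPDX-2.3"}`,
		`{"predicateType": "https://spdx.dev/Document", "predicate": {}}`,
		`{"bomFormat": "CycloneDX", "specVersion": "1.5"}`,
	}
	for _, s := range samples {
		format, err := detectFormat([]byte(s))
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(format)
	}
}
```

Because both SPDX and CycloneDX files may carry a plain .json extension, a probe like this has to rely on content rather than on the file name.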

Parser Backends

Built-in (Default)

The default parsers use goccy/go-json for high-performance streaming and add zero additional dependencies beyond what SeeBOM already requires.

| Package | Format | Notes |
|---|---|---|
| internal/spdx | SPDX 2.3 + in-toto | Streaming parser, handles Go temp module names |
| internal/cyclonedx | CycloneDX 1.0–1.7 | Maps components + dependencies to parallel arrays |

Advantages:

  • ✅ Minimal memory footprint (no protobuf overhead)
  • ✅ No additional transitive dependencies
  • ✅ Optimized for the specific fields SeeBOM needs
  • ✅ Handles in-toto attestation envelopes natively
  • ✅ Handles Go-specific temp module name cleanup

Limitations:

  • ❌ Only supports SPDX 2.3 and CycloneDX JSON
  • ❌ No XML/protobuf format support
  • ❌ Manual effort to support new format versions

Protobom (Opt-in)

The protobom backend delegates parsing to protobom, a community-maintained SBOM library, to maximize format coverage. Any format protobom can read is available to SeeBOM, including formats added in later protobom releases.

| Package | Formats | Notes |
|---|---|---|
| internal/protobomparser | SPDX 2.3 + CycloneDX 1.0–1.7 | Unified graph model |

Advantages:

  • ✅ Broad format coverage (SPDX + CycloneDX all versions)
  • ✅ Community-maintained: automatic support for new spec versions
  • ✅ Unified protobuf data model for all formats
  • ✅ Future-proof: new formats (SPDX 3.0, etc.) come “for free”

Limitations:

  • ❌ Higher memory footprint (loads entire document into protobuf structs)
  • ❌ Adds ~30 transitive dependencies (protobuf, grpc, etc.)
  • ❌ Does not handle in-toto envelopes out-of-the-box
  • ❌ Slight parsing overhead vs. the tuned built-in parsers

Configuration

Environment Variable

# Enable protobom backend (replaces built-in parsers)
USE_PROTOBOM=true

Docker Compose (.env)

USE_PROTOBOM=true

Helm Values

parsingWorker:
  extraEnv:
    USE_PROTOBOM: "true"

Programmatic (tests)

import "github.com/seebom-labs/seebom/backend/internal/sbom"

sbom.SetUseProtobom(true)
result, err := sbom.Parse(reader, "file.cdx.json", "sha256hash")
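
For readers wondering how a flag like USE_PROTOBOM is typically turned into a boolean, here is a minimal sketch. The protobomEnabled helper is hypothetical, and treating unset or unparsable values as "disabled" is an assumption; SeeBOM's actual handling of the variable is not shown in this page.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// protobomEnabled (hypothetical) reports whether USE_PROTOBOM requests the
// protobom backend. The lookup function is injected so the logic can be
// exercised without touching the real process environment.
// Assumption: unset or unparsable values fall back to the built-in parsers.
func protobomEnabled(lookup func(string) (string, bool)) bool {
	raw, ok := lookup("USE_PROTOBOM")
	if !ok {
		return false
	}
	// strconv.ParseBool accepts 1, t, T, TRUE, true, True and their
	// false counterparts; anything else yields an error.
	enabled, err := strconv.ParseBool(raw)
	return err == nil && enabled
}

func main() {
	if protobomEnabled(os.LookupEnv) {
		fmt.Println("backend: protobom")
	} else {
		fmt.Println("backend: built-in")
	}
}
```

Defaulting to the built-in parsers on bad input keeps a typo in a deployment file from silently switching backends.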

Architecture

                         ┌──────────────────────┐
                         │    sbom.Parse()      │
                         │  (Format Detection)  │
                         └──────────┬───────────┘
                                    │
               USE_PROTOBOM=false   │   USE_PROTOBOM=true
          ┌─────────────────────────┼─────────────────────┐
          │                         │                     │
          ▼                         ▼                     ▼
  ┌──────────────┐          ┌────────────┐       ┌──────────────┐
  │ SPDX Parser  │          │ CycloneDX  │       │  Protobom    │
  │ (internal/   │          │ (internal/ │       │ (internal/   │
  │  spdx)       │          │  cyclonedx)│       │  protobom-   │
  │              │          │            │       │  parser)     │
  │ goccy/json   │          │ goccy/json │       │              │
  └──────────────┘          └────────────┘       └──────────────┘

CycloneDX Field Mapping

| CycloneDX Field | SeeBOM Model Field | Notes |
|---|---|---|
| specVersion | SBOM.SPDXVersion | Stored as "CycloneDX-1.5" |
| serialNumber | SBOM.DocumentNamespace | URN format |
| metadata.component.name | SBOM.DocumentName | With version appended |
| metadata.timestamp | SBOM.CreationDate | RFC3339 |
| metadata.tools[].name | SBOM.CreatorTools | Prefixed with "Tool: " |
| components[].bom-ref | PackageSPDXIDs | Used as node identifier |
| components[].name | PackageNames | |
| components[].version | PackageVersions | |
| components[].purl | PackagePURLs | |
| components[].licenses | PackageLicenses | Expression or ID |
| dependencies[].ref/dependsOn | RelSource/TargetIndices | Type: "DEPENDS_ON" |
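
The component and dependency rows of this mapping can be sketched as follows. The packages struct is a simplified, hypothetical stand-in for SeeBOM's parallel-array model (field names and the int index type are assumptions), licenses and document metadata are omitted, and silently skipping refs that do not match any component is a choice made for this sketch only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// bom decodes only the CycloneDX fields used by this sketch.
type bom struct {
	Components []struct {
		BOMRef  string `json:"bom-ref"`
		Name    string `json:"name"`
		Version string `json:"version"`
		PURL    string `json:"purl"`
	} `json:"components"`
	Dependencies []struct {
		Ref       string   `json:"ref"`
		DependsOn []string `json:"dependsOn"`
	} `json:"dependencies"`
}

// packages is a hypothetical parallel-array model: element i of every
// package slice describes the same component.
type packages struct {
	IDs, Names, Versions, PURLs []string
	RelSource, RelTarget        []int // indices into the package slices
	RelTypes                    []string
}

// mapBOM converts components and dependencies into parallel arrays.
func mapBOM(data []byte) (*packages, error) {
	var b bom
	if err := json.Unmarshal(data, &b); err != nil {
		return nil, fmt.Errorf("decode CycloneDX: %w", err)
	}
	out := &packages{}
	index := make(map[string]int, len(b.Components)) // bom-ref -> array index
	for i, c := range b.Components {
		index[c.BOMRef] = i
		out.IDs = append(out.IDs, c.BOMRef)
		out.Names = append(out.Names, c.Name)
		out.Versions = append(out.Versions, c.Version)
		out.PURLs = append(out.PURLs, c.PURL)
	}
	for _, d := range b.Dependencies {
		src, ok := index[d.Ref]
		if !ok {
			continue // ref does not point at a listed component
		}
		for _, target := range d.DependsOn {
			tgt, ok := index[target]
			if !ok {
				continue
			}
			out.RelSource = append(out.RelSource, src)
			out.RelTarget = append(out.RelTarget, tgt)
			out.RelTypes = append(out.RelTypes, "DEPENDS_ON")
		}
	}
	return out, nil
}

func main() {
	doc := `{"components":[{"bom-ref":"a","name":"liba","version":"1.0"},
	                       {"bom-ref":"b","name":"libb","version":"2.0"}],
	         "dependencies":[{"ref":"a","dependsOn":["b"]}]}`
	pkgs, err := mapBOM([]byte(doc))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(pkgs.Names, pkgs.RelSource, pkgs.RelTarget, pkgs.RelTypes)
}
```

Storing relationships as index pairs instead of bom-ref strings avoids repeating identifiers for every edge in the dependency graph.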

Recommendation

| Scenario | Backend | Reason |
|---|---|---|
| Production with CNCF S3 buckets | Built-in | All files are SPDX JSON, maximum performance |
| Mixed-format ingestion | Built-in | SPDX + CycloneDX covered with zero overhead |
| Unknown/exotic formats | Protobom | Broader format sniffing and parsing |
| Future SPDX 3.0 support | Protobom | Will be added by the protobom community |
| CI/CD with custom SBOMs | Built-in | Predictable behavior, no surprises |

Adding a New Format

To add support for a new SBOM format:

  1. Create a new parser package at internal/<format>/parser.go
  2. Implement a Parse(data []byte, sourceFile, sha256Hash string) (*ParseResult, error) function
  3. Add format detection logic to internal/sbom/parse.go (probe a distinguishing JSON field)
  4. Write tests in internal/<format>/parser_test.go
  5. Update this documentation page
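
Step 2 could look roughly like the skeleton below. Only the Parse signature comes from the steps above; the ParseResult fields shown here are a simplified stand-in for the real type, and the document format (with its "exampleFormat", "name", and "items" fields) is entirely made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ParseResult is a simplified, hypothetical stand-in for the unified
// result type; the real type carries more SBOM metadata and package data.
type ParseResult struct {
	SourceFile   string
	SHA256Hash   string
	DocumentName string
	PackageNames []string
}

// document describes a made-up SBOM format used only for this sketch.
type document struct {
	ExampleFormat string `json:"exampleFormat"` // distinguishing field
	Name          string `json:"name"`
	Items         []struct {
		Name string `json:"name"`
	} `json:"items"`
}

// Parse follows the signature from step 2 and maps the made-up format
// into the simplified ParseResult.
func Parse(data []byte, sourceFile, sha256Hash string) (*ParseResult, error) {
	var doc document
	if err := json.Unmarshal(data, &doc); err != nil {
		return nil, fmt.Errorf("%s: decode: %w", sourceFile, err)
	}
	if doc.ExampleFormat == "" {
		return nil, fmt.Errorf("%s: missing exampleFormat field", sourceFile)
	}
	res := &ParseResult{
		SourceFile:   sourceFile,
		SHA256Hash:   sha256Hash,
		DocumentName: doc.Name,
	}
	for _, item := range doc.Items {
		res.PackageNames = append(res.PackageNames, item.Name)
	}
	return res, nil
}

func main() {
	data := []byte(`{"exampleFormat":"1.0","name":"demo","items":[{"name":"liba"}]}`)
	res, err := Parse(data, "demo.example.json", "sha256hash")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(res.DocumentName, res.PackageNames)
}
```

Rejecting input that lacks the distinguishing field inside the parser as well gives a clear error if the dispatcher ever routes a file to the wrong backend.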