FAQ

Common questions about operating and troubleshooting SeeBOM: re-ingestion, database resets, GitHub token setup, example files, and day-to-day operations.

How do I force a re-ingestion?

The Ingestion Watcher runs as a CronJob (default: every 6 hours). It deduplicates files by SHA-256 hash — only new or changed SBOMs are enqueued. There are two scenarios:

Ingest new/changed files immediately

Trigger the CronJob manually instead of waiting for the next scheduled run:

kubectl create job --from=cronjob/<RELEASE>-ingestion-watcher manual-ingest-$(date +%s) -n <NAMESPACE>

Replace <RELEASE> with your Helm release name (e.g. seebom) and <NAMESPACE> with your namespace.

Example:

kubectl create job --from=cronjob/seebom-ingestion-watcher manual-ingest-$(date +%s) -n seebom

This creates a one-off Job from the CronJob spec. The Watcher scans all sources (S3 buckets + local PVC), skips already-processed files, and enqueues anything new.
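The skip-if-already-processed behavior can be illustrated with a small local sketch. It is not the Watcher's actual implementation: `scan_new_files` and the `SEEN_FILE` hash list are hypothetical stand-ins, and the sketch only looks at `*.json` files in one directory.

```shell
#!/bin/sh
# Hypothetical stand-in for wherever processed hashes are recorded.
SEEN_FILE="${SEEN_FILE:-./seen-hashes.txt}"

# Print each file whose SHA-256 has not been seen before and record
# its hash, so that the next scan skips it.
scan_new_files() {
  dir="$1"
  touch "$SEEN_FILE"
  for f in "$dir"/*.json; do
    [ -f "$f" ] || continue
    hash="$(sha256sum "$f" | cut -d' ' -f1)"
    if ! grep -qxF "$hash" "$SEEN_FILE"; then
      printf '%s\n' "$f"                    # would be enqueued
      printf '%s\n' "$hash" >> "$SEEN_FILE" # remember for next scan
    fi
  done
}
```

Running `scan_new_files sboms` twice in a row prints every file the first time and nothing the second time. A file is printed again only after its content, and therefore its hash, changes.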

Full re-ingestion from scratch

If you need to reprocess everything (e.g. after changing the license policy, enabling OSV scanning, or upgrading parsing logic), you must first truncate all data tables and then trigger re-ingestion:

Step 1 — Truncate all data tables:

kubectl exec -n <NAMESPACE> <CLICKHOUSE_POD> -c clickhouse -- \
  clickhouse-client --database=seebom --password="<PASSWORD>" --multiquery \
  --query "TRUNCATE TABLE ingestion_queue; \
           TRUNCATE TABLE license_compliance; \
           TRUNCATE TABLE vulnerabilities; \
           TRUNCATE TABLE sbom_packages; \
           TRUNCATE TABLE sboms; \
           TRUNCATE TABLE vex_statements;"

The default ClickHouse pod name follows this pattern: chi-<RELEASE>-clickhouse-<CLUSTER_NAME>-0-0-0

For a typical installation this would be:

kubectl exec -n seebom chi-seebom-clickhouse-seebom-cluster-0-0-0 -c clickhouse -- \
  clickhouse-client --database=seebom --password="$CLICKHOUSE_PASSWORD" --multiquery \
  --query "TRUNCATE TABLE ingestion_queue; \
           TRUNCATE TABLE license_compliance; \
           TRUNCATE TABLE vulnerabilities; \
           TRUNCATE TABLE sbom_packages; \
           TRUNCATE TABLE sboms; \
           TRUNCATE TABLE vex_statements;"
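To avoid retyping the pod name, the naming pattern above can be wrapped in a small helper. This is a convenience sketch: `clickhouse_pod_name` is a hypothetical function, and it assumes the default pattern with the `-0-0-0` suffix applies to your installation.

```shell
#!/bin/sh
# Build the default ClickHouse pod name:
#   chi-<RELEASE>-clickhouse-<CLUSTER_NAME>-0-0-0
clickhouse_pod_name() {
  release="$1"
  cluster="$2"
  printf 'chi-%s-clickhouse-%s-0-0-0\n' "$release" "$cluster"
}

# Usage:
#   CLICKHOUSE_POD="$(clickhouse_pod_name seebom seebom-cluster)"
#   kubectl exec -n seebom "$CLICKHOUSE_POD" -c clickhouse -- clickhouse-client ...
```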

Step 2 — Trigger re-ingestion:

kubectl create job --from=cronjob/seebom-ingestion-watcher reingest-$(date +%s) -n seebom

Step 3 — Monitor progress:

# Watch the watcher job
kubectl logs -n seebom -l job-name=reingest-<TIMESTAMP> -f

# Watch the parsing workers
kubectl logs -n seebom -l app.kubernetes.io/component=parsing-worker --tail=20 -f

# Check dashboard stats via API
kubectl exec -n seebom deploy/seebom-api-gateway -- \
  wget -qO- http://localhost:8080/api/v1/stats/dashboard | jq '.total_sboms, .total_packages'
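Because the Job name embeds a timestamp, it can help to generate the name once and reuse it for both Step 2 and Step 3. The sketch below assumes the `seebom` release and namespace used in the examples; `new_job_name` is a hypothetical helper.

```shell
#!/bin/sh
# Build a unique Job name once so the same value can be reused later.
new_job_name() {
  printf '%s-%s\n' "$1" "$(date +%s)"
}

# Usage (needs kubectl and cluster access, so it is left commented out):
#   job="$(new_job_name reingest)"
#   kubectl create job --from=cronjob/seebom-ingestion-watcher "$job" -n seebom
#   kubectl logs -n seebom -l job-name="$job" -f
```

If the `kubectl logs` call runs before the Job's pod has started, it may fail; rerunning it a few seconds later is usually enough.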

When do I need a full re-ingestion?

A full re-ingestion (truncate + re-ingest) is required when:

| Scenario | Why |
| --- | --- |
| Changed the license policy (license-policy.json) | Existing packages need reclassification |
| Enabled/disabled OSV scanning (skipOSV) | Vulnerability data needs to be fetched or cleared |
| Enabled/disabled GitHub license resolution (skipGitHubResolve) | Unknown licenses need re-resolution |
| Upgraded parsing logic (new image version) | Existing SBOMs may parse differently (e.g., new in-toto attestation support or improved license resolution with well-known Go module mappings) |
| Changed license exceptions | Exception matching is applied during ingestion |
| Added a GitHub token (GITHUB_TOKEN) | Previously rate-limited resolution may have missed packages; re-ingesting with the token resolves all licenses |

For these changes, a simple incremental re-trigger will not reprocess existing files because the SHA-256 hashes haven’t changed.


How do I check ingestion progress?

# Job queue status
kubectl exec -n seebom <CLICKHOUSE_POD> -c clickhouse -- \
  clickhouse-client --database=seebom --password="$CLICKHOUSE_PASSWORD" \
  --query "SELECT latest_status AS status, count() AS cnt \
           FROM ( \
             SELECT job_id, argMax(status, created_at) AS latest_status \
             FROM ingestion_queue \
             GROUP BY job_id \
           ) \
           WHERE latest_status != '' \
           GROUP BY latest_status \
           ORDER BY latest_status" \
  --format=PrettyCompact

# Data summary
kubectl exec -n seebom <CLICKHOUSE_POD> -c clickhouse -- \
  clickhouse-client --database=seebom --password="$CLICKHOUSE_PASSWORD" \
  --query "SELECT 'sboms' AS table, count() AS rows FROM sboms FINAL \
           UNION ALL SELECT 'packages', count() FROM sbom_packages FINAL \
           UNION ALL SELECT 'vulns', count() FROM vulnerabilities FINAL \
           UNION ALL SELECT 'licenses', count() FROM license_compliance FINAL" \
  --format=PrettyCompact

Or use the API endpoint:

curl -s http://<API_HOST>/api/v1/stats/dashboard | jq .
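To wait for ingestion to settle rather than checking by hand, the endpoint can be polled until two consecutive reads return the same counts. This is a heuristic sketch, not a SeeBOM feature: unchanged counts only suggest that processing has finished. The helper names are hypothetical, `API_HOST` must be set by you, and the field names are the ones used in the `jq` filter earlier in this FAQ.

```shell
#!/bin/sh
# Run a command repeatedly until its output is identical on two
# consecutive reads, then print that output.
# Arguments: <interval-seconds> <command> [args...]
wait_until_stable() {
  interval="$1"
  shift
  prev=""
  while :; do
    curr="$("$@")" || return 1
    if [ -n "$prev" ] && [ "$curr" = "$prev" ]; then
      printf '%s\n' "$curr"
      return 0
    fi
    prev="$curr"
    sleep "$interval"
  done
}

# Fetch the SBOM and package counts as a compact JSON array.
fetch_dashboard_counts() {
  body="$(curl -fsS "http://${API_HOST}/api/v1/stats/dashboard")" || return 1
  printf '%s' "$body" | jq -c '[.total_sboms, .total_packages]'
}

# Usage:
#   API_HOST=localhost:8080 wait_until_stable 30 fetch_dashboard_counts
```

The loop has no timeout, so interrupt it with Ctrl+C if the counts keep changing for longer than expected.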

How do I clear / reset the database?

Docker Compose (local development)

The quickest way to start from scratch:

make dev-reset

This stops all containers, deletes the ClickHouse data volume, and restarts everything. All SBOMs will be re-ingested from the sboms/ directory.

If you only want to wipe specific tables (e.g. vulnerabilities and licenses) without losing SBOM data:

make ch-shell
TRUNCATE TABLE vulnerabilities;
TRUNCATE TABLE license_compliance;

Then re-trigger processing:

make re-scan

Kubernetes

See Full re-ingestion from scratch above — it covers truncating all tables and re-triggering the Ingestion Watcher.


Should I use a GitHub token?

Yes — we strongly recommend setting GITHUB_TOKEN, especially when ingesting container-image SBOMs (e.g. those generated by Syft/BuildKit).

Why?

Many container-image SBOMs contain packages with NOASSERTION as the declared license. SeeBOM’s parsing worker resolves these by querying the GitHub API. Without a token, GitHub enforces a rate limit of 60 requests per hour — which is quickly exhausted when processing SBOMs with hundreds of packages.

| | Without token | With token |
| --- | --- | --- |
| Rate limit | 60 req/h | 5,000 req/h |
| License resolution | Partial (many remain NOASSERTION) | Complete |
| Recommended | Only for small test runs | Always |
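A rough back-of-the-envelope calculation shows what the two limits mean in practice. The sketch assumes one GitHub API request per unresolved package, which is a simplification and not a documented property of the parsing worker. `hours_needed` and `github_core_limit` are hypothetical helpers; the latter queries GitHub's public `/rate_limit` endpoint.

```shell
#!/bin/sh
# Number of one-hour rate-limit windows needed to resolve N packages,
# assuming one API request per package (an assumption, see above).
hours_needed() {
  packages="$1"
  limit="$2"
  printf '%s\n' "$(( (packages + limit - 1) / limit ))"
}

# Print the core rate limit GitHub currently grants you.
# Needs network access, curl, and jq; not called automatically.
github_core_limit() {
  if [ -n "${GITHUB_TOKEN:-}" ]; then
    curl -fsS -H "Authorization: Bearer ${GITHUB_TOKEN}" https://api.github.com/rate_limit
  else
    curl -fsS https://api.github.com/rate_limit
  fi | jq '.resources.core.limit'
}

# Usage:
#   hours_needed 500 60     # 500 packages without a token
#   hours_needed 500 5000   # 500 packages with a token
```

Under that assumption, 500 unresolved packages span 9 hourly windows without a token and fit in a single window with one.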

How to set it up

  1. Create a GitHub Personal Access Token (classic) at github.com/settings/tokens. No scopes are required, because the token is only used to read public repository metadata.

  2. Set the token:

Docker Compose (.env):

GITHUB_TOKEN=ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Kind (local Kubernetes):

# local/secrets.env
GITHUB_TOKEN=ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Kubernetes (Helm values):

github:
  token: "ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

Or via --set:

helm upgrade seebom deploy/helm/seebom/ --set github.token="ghp_..."

What example files are included?

The sboms/ directory ships with several example files for testing and demonstration:

| File | Type | Purpose |
| --- | --- | --- |
| _example.spdx.json | Plain SPDX JSON | A minimal SPDX 2.3 document with packages, licenses, and PURLs. Good starting point for testing the basic pipeline. |
| _example-intoto.spdx.json | In-toto attestation envelope | An SPDX document wrapped in an in-toto attestation predicate field, the format generated by Syft/BuildKit for container images. Tests the parser’s auto-detection and unwrapping. |
| _example-violations.spdx.json | Plain SPDX JSON | Contains packages with copyleft licenses (GPL, AGPL) to test license compliance violations and exception handling. |
| _example.openvex.json | OpenVEX | A VEX document that suppresses specific CVEs for demonstration. Tests the VEX ingestion and dashboard effective-vulnerability counts. |
| golang-common.openvex.json | OpenVEX | Real-world VEX statements for common Go ecosystem false positives. |
| otel-protobuf.openvex.json | OpenVEX | VEX statements for OpenTelemetry protobuf-related findings. |
| license-policy.json | License Policy | The active license classification policy (permissive vs. copyleft SPDX IDs). Based on the CNCF Allowed Third-Party License Policy. |
| license-exceptions.json | License Exceptions | Active license exceptions in CNCF format. Exempts specific packages from violation reporting. |
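To see the difference between the plain and the in-toto example files locally, a `jq` filter can check for the attestation envelope and extract the wrapped document. This sketch is only an illustration and may differ from how SeeBOM's parser detects the format: it assumes that a top-level `predicate` key is enough to identify an in-toto statement. `sbom_kind` and `unwrap_sbom` are hypothetical helpers.

```shell
#!/bin/sh
# Report whether a JSON file looks like an in-toto envelope or a plain
# document, judged only by the presence of a top-level "predicate" key.
sbom_kind() {
  jq -r 'if type == "object" and has("predicate") then "in-toto" else "plain" end' "$1"
}

# Print the SPDX document: the "predicate" field for an in-toto
# envelope, or the file unchanged for a plain document.
unwrap_sbom() {
  jq 'if type == "object" and has("predicate") then .predicate else . end' "$1"
}

# Usage:
#   sbom_kind sboms/_example-intoto.spdx.json
#   unwrap_sbom sboms/_example-intoto.spdx.json | jq '.spdxVersion'
```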

How do I trigger a CVE refresh?

The CVE Refresher runs as a CronJob (default: daily at 2 AM). To trigger it immediately:

kubectl create job --from=cronjob/seebom-cve-refresher manual-cve-refresh-$(date +%s) -n seebom

This checks all known PURLs against the OSV database for newly disclosed vulnerabilities.

Docker Compose:

make cve-refresh