Berserk Docs

Ingestion

How data flows into Berserk and how to configure your OpenTelemetry Collector

Berserk ingests telemetry — logs, traces, and metrics — via the OpenTelemetry Protocol (OTLP). You configure a standard OpenTelemetry Collector to send data to Berserk. Everything after that is handled automatically.

How Data Flows

  1. Your OpenTelemetry Collector sends data to Berserk's ingest component, named Tjalfe, over OTLP (gRPC or HTTP). Alternatively, Promtail or any Loki-compatible client can send logs via the Loki push API.
  2. Your collector includes an ingest token in each request. Tjalfe validates it with the Meta service, which authenticates the token. Tjalfe then batches incoming data and uploads it to S3.
  3. The query component (Nursery) follows each stream, downloads batches from S3, routes data to the correct datasets, and makes it searchable. Nursery also merges small batches into larger optimized segments in the background.

Tjalfe holds each request open until S3 confirms the upload, then returns that result to the collector. A success response means the data is durably stored. If S3 is temporarily unavailable, Tjalfe returns a failure to the collector rather than buffer locally. Your OpenTelemetry Collector is the durability layer — it is responsible for retrying failed requests and persistently queuing data until Tjalfe accepts it.

Ingest Tokens

Every request to Tjalfe must carry an ingest token for authentication and routing:

Authorization: Bearer ing_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Each token is bound to a dataset. All data sent with that token routes to that dataset. You can override routing per-record by setting the bzrk.dataset resource attribute.
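As a sketch of the per-record override, the standard OpenTelemetry Collector resource processor can set the `bzrk.dataset` attribute on everything flowing through a pipeline (the dataset name `payments` here is hypothetical):

```yaml
# Hypothetical example: route this pipeline's data to the "payments"
# dataset, overriding the dataset the ingest token is bound to.
processors:
  resource/route-payments:
    attributes:
      - key: bzrk.dataset
        value: payments        # hypothetical dataset name
        action: upsert
```

Add the processor to the pipelines whose data should be rerouted; pipelines without it continue to route by token.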

Create a token with the CLI:

bzrk ingest-token create --dataset default my-token

The token value is only shown once at creation time. Store it securely.

Default Ingest Token (Kubernetes)

When deploying with the Helm chart, the ingest service can be configured with a default ingest token via a Kubernetes Secret. This token is used to authenticate incoming data when no other token is provided.

Managed mode (recommended): Set global.ingestToken.managed: true in your Helm values. An init container will automatically create the token by calling Meta's API and store it in a Kubernetes Secret before Tjalfe starts. This is idempotent — if the secret already exists, the init container is skipped entirely.

global:
  ingestToken:
    managed: true

Manual mode: Create the secret yourself and reference it in the chart:

kubectl create secret generic ingest-token \
  --from-literal=default_ingest_token="ing_<your-token-value>"

The Helm chart references this secret by default (ingest-token with key default_ingest_token).

Streams

A stream is a sequential write path in S3. Tjalfe registers a stream with Meta on startup and writes all incoming data — from any number of collectors and ingest tokens — to that single stream. Data from different tokens targeting different datasets is batched together in the same upload; Nursery handles the routing.

In some cases Meta may assign more than one stream to a Tjalfe instance (e.g., after a restart or during scaling), but typically there is just one. Streams are created and managed automatically — you do not need to configure or interact with them directly.

Latency, Error Recovery, and Durability

| Property | Behavior |
| --- | --- |
| Ingest latency | Data is batched for up to 2 seconds (or 10 MB) in Tjalfe before S3 upload. This is configurable. End-to-end latency from collector send to searchable data is typically 1–10 seconds. |
| Durability | Data is durable once the collector receives a success response. This confirms the data has been written to S3. |
| Backpressure | If Tjalfe cannot keep up, it returns errors (503/UNAVAILABLE or 429/RESOURCE_EXHAUSTED). The collector's retry and queue settings handle this automatically. |
| Error recovery | When S3 or Meta is having problems, Tjalfe returns retryable error codes to the collector. The collector queues failed requests and retries automatically. |

Protocols

Tjalfe accepts OTLP over both gRPC and HTTP, and the Loki push API for log ingestion:

| Protocol | Default Port | Use |
| --- | --- | --- |
| OTLP gRPC | 4317 | Standard transport. Preferred. |
| OTLP HTTP | 4318 | Useful when gRPC is not available (e.g., browser, Lambda). |
| Loki push | 3100 | Promtail-compatible HTTP endpoint for log ingestion. |

The Loki receiver accepts JSON and Protobuf push requests at /loki/api/v1/push, including Loki 3.0+ structured metadata. Stream labels are mapped to resource attributes and log lines become the OTLP log body. This makes it easy to ingest logs from existing Promtail or Grafana Agent deployments without switching to an OpenTelemetry Collector.
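For example, an existing Promtail deployment could point at Berserk with a client entry like the following. This is a sketch using Promtail's standard `clients` configuration; the endpoint and token are placeholders:

```yaml
# Promtail client sketch (placeholder endpoint and token).
clients:
  - url: http://<your-endpoint>:3100/loki/api/v1/push
    bearer_token: <your-ingest-token>   # Berserk ingest token
```

Promtail's own retry and backoff settings then play the same durability role that the OpenTelemetry Collector's sending queue plays for OTLP.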

OpenTelemetry Collector Configuration

Below is the recommended default configuration for sending data to Berserk. Your setup may vary depending on your environment and use case, but these settings are a good starting point.

# Disk-backed queue so buffered data survives collector restarts.
extensions:
  file_storage/queue:
    directory: /var/lib/otel/queue

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

exporters:
  otlp/berserk:
    endpoint: "<your-endpoint>:4317"
    tls:
      insecure: true  # if not using TLS/HTTPS
    headers:
      authorization: "Bearer <your-ingest-token>"

    # OTLP payloads typically compress 3-5x with gzip (ratio depends on
    # payload redundancy). This cuts network egress, shortens per-request
    # wall time, and usually lets larger batches fit under the ingest
    # service's 16 MiB wire cap.
    compression: gzip

    # Berserk ingest service may hold each request until its batch window closes (up to 2s)
    # and the S3 upload completes. 30s stays above the ingest service's
    # internal ack timeout so it can return a real retryable error
    # instead of the collector timing out first.
    timeout: 30s

    sending_queue:
      # Persist the queue to disk so data survives collector restarts.
      # Without this, an in-memory queue loses all buffered data on restart.
      storage: file_storage/queue

      # Parallel connections to the ingest service. Each in-flight request
      # blocks until the batch window closes + S3 upload completes (2-4s),
      # so parallelism keeps throughput up during that wait.
      num_consumers: 10

      # Outage buffer. Size this for your ingest rate and the amount of
      # downtime you want to survive: peak_rate × tolerated_outage × headroom.
      # 1 GiB is a reasonable starting point.
      queue_size: 1073741824
      sizer: bytes

    retry_on_failure:
      enabled: true
      initial_interval: 1s
      max_interval: 60s
      # 0 = retry forever. The default 5min limit drops data after timeout.
      max_elapsed_time: 0
      multiplier: 2

processors:
  # Backpressure on receivers when approaching memory limit.
  # Prevents OOM before the disk queue absorbs everything.
  memory_limiter:
    check_interval: 1s
    limit_mib: 256
    spike_limit_mib: 64

service:
  extensions: [file_storage/queue]
  # No batch processor — Berserk batches per stream internally.
  # A collector-side batch processor would fragment data across streams.
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter]
      exporters: [otlp/berserk]
    logs:
      receivers: [otlp]
      processors: [memory_limiter]
      exporters: [otlp/berserk]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter]
      exporters: [otlp/berserk]

Why These Settings Matter

timeout: 30s — Berserk batches incoming data for up to 2 seconds before uploading to S3. The collector's default 5-second timeout will cause spurious failures during normal operation. 30 seconds gives headroom for S3 uploads under load and stays above the ingest service's internal 25s ack timeout — during a sustained S3 outage the ingest service responds at ~22s with an HTTP 429 + Retry-After: 30s, which the collector's exporter respects so it doesn't keep hammering the backend. Shortening the collector timeout below 25s risks dropping this signal and converting intentional throttles into generic timeouts.

file_storage/queue — Berserk's ingest service has no local durability — your collector is the durability layer. If the collector restarts with an in-memory queue, all buffered data is lost. The file_storage extension persists the queue to disk.

max_elapsed_time: 0 — Disables the default 5-minute retry limit. The ingest service returns retryable errors for backpressure (429), transient S3 failures (503), and stream recovery (503). With a disk-backed queue, the collector should retry indefinitely. Setting a limit means data is silently dropped after the timeout.

num_consumers: 10 — Each in-flight request blocks until Berserk's batch window closes and the S3 upload completes (typically 2–4 seconds). Parallel consumers keep throughput up during that wait. The ingest service merges concurrent requests per stream internally, so this does not cause redundant S3 uploads.

memory_limiter — Applies backpressure to receivers when the collector approaches its memory limit. Without this, if the ingest service is slow and the queue is filling, the collector can OOM before the disk queue absorbs everything.

No batch processor — Do not add a batch processor to pipelines sending to Berserk. The ingest service batches data per stream internally, merging concurrent requests into a single S3 upload. A collector-side batch processor splits and recombines requests on its own timer, working against that model.

compression: gzip — OTLP payloads typically compress 3–5× with gzip (ratio varies with payload redundancy). Enabling it cuts network egress, shortens per-request wall time, and usually lets larger batches fit under the per-request wire cap.

Request Size Limits

Berserk's ingest service accepts OTLP requests up to 16 MiB on the wire. With compression: gzip enabled that typically carries 40–80 MiB of uncompressed telemetry per request — ample for busy collectors.

If your collector emits requests above this ceiling, the ingest service returns 413 (HTTP) or InvalidArgument (gRPC). Options:

  • Enable compression: gzip on the exporter (see above) — this is the easy win.
  • Split into smaller batches by setting send_batch_max_size on an upstream batch processor in your pipeline (not between collector and Berserk — Berserk batches internally).
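In a tiered setup, the cap would go on an upstream (agent-tier) collector that forwards to the gateway collector exporting to Berserk. A sketch, with illustrative rather than tuned values; note that `send_batch_max_size` counts items (spans, log records, data points), not bytes:

```yaml
# Sketch: cap batch size at an upstream agent-tier collector. The
# gateway collector that exports to Berserk stays batch-processor-free.
processors:
  batch/cap:
    send_batch_size: 2048       # target items per batch
    send_batch_max_size: 4096   # hard upper bound on items per batch
```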

Ingress sizing: if you terminate TLS or proxy through nginx/Envoy/Istio in front of the ingest service, check your ingress body limits. The most common trap is nginx-ingress, whose default client_max_body_size is 1 MiB — set the annotation nginx.ingress.kubernetes.io/proxy-body-size: "16m" on the ingest Ingress resource. Other ingresses (Envoy, Contour, AWS ALB, GCP HTTP(S) LB) either have no body limit or a limit well above 16 MiB, but it is worth confirming for your specific setup.
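For nginx-ingress, the annotation goes on the Ingress resource fronting Tjalfe, e.g. this metadata fragment:

```yaml
# Fragment: raise nginx-ingress's request body limit to match the
# ingest service's 16 MiB wire cap.
metadata:
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "16m"
```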

Verifying Ingestion

After configuring your collector, verify data is flowing:

bzrk search "<your dataset> | take 10" --since "5m ago"

If no data appears, check:

  • The ingest token is correct and not revoked
  • The collector can reach Tjalfe (tjalfe:4317)
  • The collector logs for export errors or retries
