Introduction

The goals of this page are to:

  • Remain brief and up-to-date; short enough to read in one sitting, updated and versioned with the code.

  • Provide an overview and a snapshot of status for team members and newcomers.

  • Provide links to other project resources.

This page does not replace or duplicate information in JIRA, Bugzilla, enhancement proposals, or the enhancement proposal process.

Architecture Summary

Log categories

We define three logging categories:

Application

Container logs from non-infrastructure containers.

Infrastructure

Node logs and container logs from the kube-* and openshift-* namespaces.

Audit

Node logs from /var/log/audit; these are security sensitive.

Components

The logging system breaks down into four logical components:

collector

Read container log data from each node.

forwarder

Forward log data to configured outputs.

store

Store log data for analysis. This is the default output for the forwarder.

exploration

UI tools (GUI and command line) to search, query and view stored logs.

Operators and Custom Resources

Figure 1. Key to diagrams

Figure 2. Operators and APIs

The cluster logging operator (CLO) implements the following custom resources:

ClusterLogging (CL)

Deploys the collector and forwarder, which are currently both implemented by a daemonset running Fluentd on each node.

ClusterLogForwarder (CLF)

Generates Fluentd configuration to forward logs according to the user's configuration.
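
For illustration only, a minimal ClusterLogForwarder instance might look like the sketch below; the output name and URL are placeholders, and the exact fields supported depend on the API version, so consult the CLF API documentation for authoritative details.

    apiVersion: logging.openshift.io/v1
    kind: ClusterLogForwarder
    metadata:
      name: instance
      namespace: openshift-logging
    spec:
      outputs:
        - name: example-remote                    # placeholder output name
          type: fluentdForward                    # send over the fluent-forward protocol
          url: 'tcp://fluentd.example.com:24224'  # placeholder endpoint
      pipelines:
        - name: application-logs
          inputRefs:
            - application                         # one of the three log categories
          outputRefs:
            - example-remote
            - default                             # also send to the default store

The CLO turns this user-facing configuration into the Fluentd configuration that the forwarder daemonset actually runs.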

The Elasticsearch logging operator (ELO) implements the following custom resources:

Elasticsearch

Configure and deploy an Elasticsearch instance as the default log store.

Kibana

Configure and deploy a Kibana instance to search, query and view logs.

Runtime behavior

Figure 3. Collection and forwarding

The CRI-O container runtime on each node writes container logs to files. The file names include the container's UID, namespace, name and other data. We also collect per-node logs from journald.
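
As a sketch of what this looks like on disk (the exact path layout and line format can vary between versions), CRI-O writes one file per container under /var/log/pods, and each line starts with a timestamp, the stream name, and a partial/full tag:

    # log file path (placeholders in angle brackets)
    /var/log/pods/<namespace>_<pod-name>_<pod-uid>/<container-name>/0.log

    # one log entry per line: timestamp, stream, tag, message
    2021-03-30T14:21:56.123456789Z stdout F a single-line log entry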

The CLO deploys a Fluentd daemon on each node which acts both as a collector (reading log files) and as a forwarder (sending log data to configured outputs).

Log Entries

The term log is overloaded, so we'll use these terms to clarify:

Log

A stream of text containing a sequence of log entries.

Log Entry

Usually a single line of text in a log, but see Multi-line Entries.

Container Log

Produced by CRI-O, combining stdout/stderr output from processes running in a single container.

Node Log

Produced by journald or other per-node agent from non-containerized processes on the node.

Structured Log

A log where each entry is a JSON object (map), written as a single line of text.

Kubernetes does not enforce a uniform format for logs. Anything that a containerized process writes to stdout or stderr is considered a log. This "lowest common denominator" approach allows pre-existing applications to run on the cluster.

Traditional log formats write entries as ordered fields, but the order, field separator, format and meaning of the fields vary.

Structured logs write log entries as JSON objects on a single line. However, the names, types, and meaning of fields in the JSON object vary between applications.
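
For example, two hypothetical applications might report the same event with completely different field names and types:

    {"level":"info","ts":"2021-03-30T14:21:56Z","msg":"user logged in","user":"alice"}
    {"severity":"INFO","time":1617114116,"message":"user logged in","username":"alice"}

Both are valid structured logs, but a consumer needs per-application knowledge to interpret them.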

The Kubernetes Structured Logging proposal will standardize the log format for some k8s components, but there will still be diverse log formats from non-k8s applications running on the cluster.

Metadata, Envelopes and Forwarding

Metadata is additional data about a log entry (original host, container-id, namespace etc.) that we add as part of forwarding the logs. We use these terms for clarity:

Message

The original, unmodified log entry.

Envelope

Includes metadata fields and a message field containing the original message.

We usually use JSON notation for the envelope since it’s the most widespread convention.

However, we do and will implement other output formats; for example, a syslog message with its MSG and STRUCTURED-DATA sections is a different way to encode the equivalent envelope data.

Depending on the output type, we may forward entries as the message only, the full envelope, or the user's choice.
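
As an illustrative sketch (the field names here follow the spirit of the data model; the authoritative list is in the model documentation referenced below), an envelope wraps the original message with metadata such as the timestamp, host and Kubernetes coordinates:

    {
      "message": "user logged in",
      "@timestamp": "2021-03-30T14:21:56.123456789Z",
      "hostname": "worker-0.example.com",
      "kubernetes": {
        "namespace_name": "myapp",
        "pod_name": "myapp-5d8c7f9b4-x2k9p",
        "container_name": "server",
        "labels": {"app": "myapp", "tier": "backend"}
      }
    }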

The current metadata model is documented here. Model documentation is generated from a formal model.

Not all of the documented model is in active use. Review is needed. The labels field is "flattened" before forwarding to Elasticsearch.
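
To illustrate what flattening means (the exact output field name and encoding are implementation details, shown here only as an example), a labels map can be rewritten as a list of key=value strings so that arbitrary label keys do not blow up the Elasticsearch index mappings:

    before: "labels": {"app": "myapp", "tier": "backend"}
    after:  "labels": ["app=myapp", "tier=backend"]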

Multi-line Entries

Log entries are usually a single line of text, but they can consist of more than one line for several reasons:

CRI-O

CRI-O reads chunks of text from applications, not single lines. If a line gets split between chunks, CRI-O writes each part as a separate line in the log file with a "partial" flag so they can be correctly re-assembled.
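
For example (content is illustrative), a long entry split across two chunks appears in the log file as a partial line tagged P followed by a final line tagged F:

    2021-03-30T14:21:56.123456789Z stdout P first half of a long log entry that was split ...
    2021-03-30T14:21:56.223456789Z stdout F ... and the second half of the same entry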

Stack traces

Programs in languages like Java, Ruby or Python often dump multi-line stack traces into the log. The entire stack trace needs to be kept together when forwarded to be useful.

JSON Objects

A JSON object can be written on multiple lines, although structured logging libraries typically don’t do this.

Work in progress

Flow Control/Back-pressure

Status: Needs write-up as enhancement proposal(s).

TODO: Update the diagram to show CRI-O to Fluentd end-to-end.

Goals:

  • Sustain a predictable average load without log loss, up to retention limits.

  • Handle temporary load spikes predictably: drop or apply back-pressure.

  • Handle long-term overload with alerts and predictable log loss at the source.

Problems now:

  • Uncontrolled (file-at-a-time) log loss from slow collection plus node log rotation.

  • Large back-up in file buffers under load: very high latencies, slow recovery.

We propose two qualities of service:

  • fast: priority is low latency and high throughput, with no effect on applications. May drop data.

  • reliable: priority is to avoid data loss, but may allow loss outside of defined limits. May slow application progress.

In traditional terminology:

  • fast: at-most-once delivery.

  • reliable: at-least-once delivery, with limits.

Even users who want reliable logging may have a breaking point where they would rather let the application progress and lose logs. We may need configurable limits on how hard we try to be reliable.

Architecture:

  • conmon writes container log files on the node; log rotation (retention).

  • Fluentd on the node: file vs. memory buffers.

  • Forwarder target: throughput.

  • Store target: retention.

  • Future: separate normalizer/forwarder, Fluent Bit/Fluentd.

We must also consider data loss in each forwarding protocol:

  • store (Elasticsearch): review the options.

  • fluent-forward: we need to enable at-least-once acknowledgements (we currently don't).

  • other outputs: review case by case whether it is possible.

Throughput and latency:

  • Evaluate the throughput of each stage, from node log to store/target.

  • Evaluate end-to-end latency and its expected/acceptable variation.

Buffer sizes:

  • All components must maintain bounded buffers.

  • Without end-to-end back-pressure we cannot guarantee no data loss.

  • We should be able to give better sizing/capacity guidelines.

We need well-designed alerts (accurate, no floods, no noise) for log loss and back-pressure situations.

Configuration:

  • Enable back-pressure by pod label and/or namespace; we probably can't impose back-pressure everywhere.

  • Enable rate limiting in low-latency mode (back-pressure always limits rate).

Error Reporting

The logging system itself can encounter errors that need to be diagnosed, for example:

  • Invalid JSON received where structured logs are required.

  • Hard errors (no retry possible) from the store or another target, causing unavoidable log loss.

Alerts are a key component, but alerts must be actionable; they can't be used to record ongoing activity that might or might not be reviewed later. For that we need logs.

The CLO and fluentd collector logs can be captured just like any other infrastructure log. However, if the logging system itself is in trouble, users need a simple, direct path to diagnose the issue. This path might have a simpler implementation that is more likely to survive if logging is in trouble.

Proposal: add a fourth logging category, giving [application, infrastructure, audit, logging]. This category would collect logs related to errors in the logging system, including Fluentd error messages and errors logged by the CLO.

Document Metadata

Decide on the supported set of envelope metadata fields and document them.

Some of our format decisions are specific to Elasticsearch (e.g. flattening maps to lists). We need to separate the ES-specifics; either:

  • Include sufficient output format configuration to cover everything we need for ES (map flattening), OR

  • Move the ES-specific formatting into the elasticsearch output type.

Multi-line support

  • Cover common stack trace types: Java, Ruby, Python, Go.

  • Review need for multi-line JSON.

Syslog metadata

Optionally copy envelope metadata to syslog STRUCTURED-DATA.
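
For example (the values and the SD-ID are placeholders), an RFC 5424 syslog message could carry envelope metadata in its STRUCTURED-DATA section while the original entry travels in MSG:

    <134>1 2021-03-30T14:21:56.123Z worker-0.example.com myapp - - [meta@99999 namespace="myapp" pod="myapp-5d8c7f9b4-x2k9p" container="server"] user logged in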

Loki as store

  • Benchmarking & stress testing in progress

  • Configuring Loki at scale.

  • Test with back ends: S3, BoltDB.

Observability/Telemetry

TODO

Updating this page

The asciidoc source for this document is on GitHub. Create a GitHub Pull Request to request changes.

Resources

Planning and tracking
Data model