Fedora Infra Openshift Best Practices
This document aims to encourage best practices for developing and deploying containerised applications on Kubernetes/Openshift.
It's a large topic and this document can't possibly cover every element in detail, but it should be enough to act as a primer. NOTE: Should these best practices be something maintained by the Kubedev SIG [4]? If so, should we attempt to resurrect it?
References/Resources/Further Reading
- [1] Fedora Infra Flock Hackfest: https://hackmd.io/HxpzTNpITfu0OYmOGRApiw
- [2] Kubernetes health checks: https://blog.kubecost.com/blog/kubernetes-health-check/
- [3] Prometheus metrics format: https://github.com/prometheus/docs/blob/main/content/docs/instrumenting/exposition_formats.md#text-based-format
- [4] Fedora Kubedev SIG: https://fedoraproject.org/wiki/SIGs/KubeDev
- [5] Openshift oauth-proxy: https://github.com/openshift/oauth-proxy
- [6] Fedora Infra migration tracker, DeploymentConfig to Deployment: https://pagure.io/fedora-infrastructure/issue/12142
- [7] Fedora Infra ticket tracker: https://pagure.io/fedora-infrastructure/issues
- [8] 42 Prod Best Practices: The Complete Guide for Developers: https://medium.com/@mahernaija/docker-2024-docker-compose-2024-master-best-practices-the-complete-guide-for-developers-aaf851349240
- [9] Semantic Versioning: https://semver.org/
- [10] Openshift compute resource quotas: https://docs.openshift.com/container-platform/4.17/scalability_and_performance/compute-resource-quotas.html
- [11] Ansible Operator Tutorial: https://sdk.operatorframework.io/docs/building-operators/ansible/tutorial/
- [12] How pods with resource limits are run: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#how-pods-with-resource-limits-are-run
- [13] Enabling monitoring for user-defined projects: https://docs.openshift.com/container-platform/4.8/monitoring/enabling-monitoring-for-user-defined-projects.html
Fedora Infra Clusters
Fedora Infra currently manages the following three Openshift clusters:
- Staging (self-hosted in IAD2, apps deployed via ansible): https://console-openshift-console.apps.ocp.stg.fedoraproject.org/
- Production (self-hosted in IAD2, apps deployed via ansible): https://console-openshift-console.apps.ocp.fedoraproject.org/
- Communishift (RH Openshift Dedicated deployed in AWS, apps deployed by individual app maintainers in various ways): https://console-openshift-console.apps.fedora.cj14.p1.openshiftapps.com/
Access to the clusters is managed via the Fedora Account System (FAS). All Fedora users may authenticate, but access to each project is managed on a per-app basis. Open a ticket at [7] requesting access to a particular app, but ensure you first get approval from the existing app owners.
Building containers
- Use Podman over Docker when developing locally.
- How are containers currently built and updated inside Fedora Infra? Since the retirement of OSBS, they aren't automated, iirc.
- Use a service to build the containers (Konflux? Image Builder? quay.io?). Iirc, the plan is that we will use Konflux to do our container building going forward; we're starting off looking at configuring the Konflux instance to build artifacts. (If you're interested in working on that, reach out to dkirwan to look at it together.)
- Don't consume an image directly built via BuildConfig with S2I (source to image); instead:
  - Use Fedora as the base image: quay.io/fedora/fedora:latest
  - Build and push the built container image to a registry like quay.io.
  - If the application is an official app image, use the fedora namespace: quay.io/fedora/appname.
  - Inside Openshift create an ImageStream which points at quay.io/fedora/appname:v1.0.0.releasename.
  - For staging you could possibly use quay.io/fedora/appname:latest.
  - When the image changes inside quay.io the ImageStream will pull down the latest version of this image.
  - Applications should consume the container image via ImageStream within a Deployment.
  - This prevents problems which only show up during a build, such as missing dependencies.
  - Doing it this way prevents outages or service degradation, as the existing version remains operational should a build run into issues.
- Minimise the number of layers. Each instruction in the Dockerfile/Containerfile adds a new layer, which can quickly increase the build time and size of the final container. To combat this, make use of && to chain commands together so that a run of commands counts as a single layer, eg:
FROM busybox
RUN echo This is the 1 > 1 \
&& rm -f 1 \
&& echo This is the 2 > 2 \
&& rm -f 2 \
# ... for about 70 commands
# rather than
FROM busybox
RUN echo This is the 1 > 1
RUN rm -f 1
RUN echo This is the 2 > 2
RUN rm -f 2
# ... for about 70 commands
- Use specific build tags, eg: v1.0.2, which follow semantic versioning [9].
- Limit container privileges. By default, containers which run as root cannot run in Openshift: the container will not start unless elevated privileges are in place for the ServiceAccount. If you need root access, don't run that part of the application in Openshift at all (if possible).
ImageStream
- Changes to the image which an ImageStream points at will automatically cause a rollout of the applications which use that ImageStream.
- This allows a single change to the base Fedora image to cause a rollout of all applications on the clusters onto the latest image (see the sketch below).
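A minimal ImageStream sketch for this pattern (appname and the tag are placeholders); scheduled import makes Openshift periodically re-check quay.io so that image changes roll out automatically:

apiVersion: image.openshift.io/v1
kind: ImageStream
metadata:
  name: appname
spec:
  tags:
  - name: v1.0.0
    from:
      kind: DockerImage
      name: quay.io/fedora/appname:v1.0.0   # the image in quay.io to track
    importPolicy:
      scheduled: true   # re-import periodically, picking up changes in quay.io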
Handling Dependencies
- All application dependencies should be version pinned and locked within the container to help achieve reproducible builds. Make use of a dependency management system as per the language's best practices (a sketch follows this list).
- If a container image is vital to Fedora, perhaps dependencies could also be stored in a local pip/gem/nodejs/whatever/rpm repository to enable building?
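As an illustration of pinning inside the container build, a minimal sketch assuming a Python app whose requirements.txt pins exact versions (the file contents and app layout are hypothetical):

FROM quay.io/fedora/fedora:latest
RUN dnf -y install python3-pip && dnf clean all
WORKDIR /app
# requirements.txt pins exact versions, eg: flask==3.0.2
COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python3", "app.py"]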
DeploymentConfig migration to Deployments
- DeploymentConfig is deprecated and is being phased out (very) soon; we should replace all DeploymentConfigs with Deployments.
- This is being tracked with a board on pagure [6]: https://pagure.io/fedora-infrastructure/issue/12142
- We should consider breaking this epic up into smaller tickets, creating an individual ticket to track each DeploymentConfig-deployed app in Fedora Infra. A migration sketch follows.
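As a sketch of what a migration involves (the app name is hypothetical): the apiVersion/kind change, the bare spec.selector label map becomes spec.selector.matchLabels, the Rolling strategy becomes RollingUpdate, and ImageChange triggers can be replaced with the image.openshift.io/triggers annotation:

apiVersion: apps/v1              # was: apps.openshift.io/v1
kind: Deployment                 # was: DeploymentConfig
metadata:
  name: darwin-app
  annotations:
    # replaces the DeploymentConfig ImageChange trigger
    image.openshift.io/triggers: >-
      [{"from": {"kind": "ImageStreamTag", "name": "darwin-app:latest"},
        "fieldPath": "spec.template.spec.containers[?(@.name==\"darwin-container\")].image"}]
spec:
  replicas: 1
  selector:
    matchLabels:                 # DeploymentConfig used a bare label map here
      app: darwin-app
  strategy:
    type: RollingUpdate          # was: type: Rolling
  template:
    metadata:
      labels:
        app: darwin-app
    spec:
      containers:
      - name: darwin-container
        image: " "               # populated by the image trigger annotation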
Deploy/use ACS (Red Hat Advanced Cluster Security), a product that looks inside containers and reports what is in them and what the security issues are.
- The registry quay.io has such features already; perhaps use this instead? One less service we need to run and maintain.
Security
- Do secure access to applications using something like the oauth-proxy [5], especially if working with user data.
- When hosting an app within Openshift, using the oauth-proxy might be a better way to secure the app than systems like flask-oidc (a sidecar sketch follows).
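A minimal sidecar sketch based on the oauth-proxy README [5]; the service account name, upstream port, and secret paths are hypothetical and must match your app:

containers:
- name: oauth-proxy
  image: openshift/oauth-proxy:latest
  args:
  - --provider=openshift
  - --openshift-service-account=darwin-app    # SA annotated with the OAuth redirect
  - --upstream=http://localhost:8080          # the application container
  - --https-address=:8443
  - --tls-cert=/etc/tls/private/tls.crt
  - --tls-key=/etc/tls/private/tls.key
  - --cookie-secret-file=/etc/proxy/secrets/session_secret
  ports:
  - containerPort: 8443
    name: oauth-proxy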
Monitoring applications
- Do expose endpoints in the application to aid in monitoring [2]:
  - Liveness probes to detect a non-responsive application
  - Readiness probes to ensure that a service is ready to receive traffic
  - Startup probes to hold off the other checks until the application is prepared to handle requests
- Do expose a metrics endpoint in the application to display current metrics indicating health and application metadata, eg: /metrics serving the Prometheus text format [3]. The Deployment example below configures all three probe types; a sample of the metrics format follows it.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: darwin-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: darwin-app
  template:
    metadata:
      labels:
        app: darwin-app
    spec:
      containers:
      - name: darwin-container
        image: nginx:latest
        ports:
        - containerPort: 443
        readinessProbe:
          httpGet:
            path: /darwin-probes
            port: 443
            scheme: HTTPS
          initialDelaySeconds: 5
          periodSeconds: 10
          timeoutSeconds: 5
        livenessProbe:
          httpGet:
            path: /healthz
            port: 443
            scheme: HTTPS
          initialDelaySeconds: 5
          periodSeconds: 10
          timeoutSeconds: 5
        startupProbe:
          httpGet:
            path: /healthz
            port: 443
            scheme: HTTPS
          failureThreshold: 30
          periodSeconds: 10
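For reference, the Prometheus text-based exposition format [3] served from /metrics looks like this (the metric name is illustrative):

# HELP http_requests_total Total number of HTTP requests served.
# TYPE http_requests_total counter
http_requests_total{method="get",code="200"} 1027
http_requests_total{method="post",code="200"} 3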
- Use the user workload monitoring stack (whatever the released name is now that it is out of tech preview; [13] calls it monitoring for user-defined projects).
- Hook into the Openshift monitoring [13] stack and then maybe use Prometheus exporters to push metrics and alerts to Zabbix (blocked until Zabbix is more widely used within Fedora Infra). A ServiceMonitor sketch follows.
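Once monitoring for user-defined projects is enabled [13], applications can be scraped via a ServiceMonitor. A minimal sketch, assuming a Service labelled app: darwin-app with a port named web serving /metrics:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: darwin-app
spec:
  selector:
    matchLabels:
      app: darwin-app    # must match the labels on the Service, not the Pod
  endpoints:
  - port: web            # the name of the Service port serving /metrics
    path: /metrics
    interval: 30s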
Preferred methods of deploying applications within Fedora Infra
- Fedora Infra uses ansible playbooks/roles as the primary means to deploy applications.
- An ansible role should be developed to deploy the app within Fedora Infra.
- Private variables should be stored in the ansible-private repo.
- Ensure sane defaults are available within the defaults directory of the role.
- An alternative is to develop a Helm chart or an Ansible-based Kubernetes Operator [11] to do this work. A sketch of a role task follows.
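A minimal sketch of such a role task, assuming the kubernetes.core collection and a hypothetical deployment.yml.j2 template shipped with the role (Fedora Infra roles may follow different conventions):

# roles/appname/tasks/main.yml
- name: Apply the application Deployment to Openshift
  kubernetes.core.k8s:
    state: present
    namespace: "{{ app_namespace | default('appname') }}"
    definition: "{{ lookup('template', 'deployment.yml.j2') | from_yaml }}"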
Limits, requests
- When deploying the application, ensure you add resource requests and limits to the Deployment [10], see example:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: darwin-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: darwin-app
  template:
    metadata:
      labels:
        app: darwin-app
    spec:
      containers:
      - name: darwin-container
        image: nginx:latest
        ports:
        - containerPort: 443
        resources:
          requests:
            ephemeral-storage: "2Gi"
            memory: 100Mi
            cpu: 100m # millicores
          limits:
            memory: 200Mi
            cpu: 1
            ephemeral-storage: "4Gi"
- Do set limits.
- Do set resource requests.
- If a container exceeds its memory limit, it will probably be terminated [12].
- If a container exceeds its memory request, its Pod is likely to be evicted whenever the node runs out of memory [12].
Scaling
- When designing an app ensure the following:
  - It is capable of recovering from a restart/crash (eg: killed, moved and/or crashed containers).
  - The ability to scale the app is part of the architecture and design, eg: multiple instances behind a load balancer; in Kubernetes Deployments this is controlled by the replicas field.
  - The app includes high availability in its design, eg: 3 instances (replicas: 3) so the application stays up even if one instance is down; see the HorizontalPodAutoscaler sketch below.
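Beyond a fixed replica count, scaling can be automated. A minimal sketch using a HorizontalPodAutoscaler against the hypothetical darwin-app Deployment from the earlier examples:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: darwin-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: darwin-app
  minReplicas: 3   # keeps the high availability floor of three instances
  maxReplicas: 6
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80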