<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[DevOps]]></title><description><![CDATA[DevOps]]></description><link>https://devops-blog.ruicoelho.dev</link><generator>RSS for Node</generator><lastBuildDate>Fri, 05 Jun 2026 20:13:44 GMT</lastBuildDate><atom:link href="https://devops-blog.ruicoelho.dev/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[ArgoCD v3.2.5: Critical Patch Release with Stability Improvements]]></title><description><![CDATA[Introduction
The ArgoCD community recently released version v3.2.5 on January 14, 2026, replacing v3.2.4 which was marked as invalid. This patch release brings critical fixes that improve the stability and security of the most popular GitOps platform...]]></description><link>https://devops-blog.ruicoelho.dev/argocd-v325-critical-patch-release-with-stability-improvements</link><guid isPermaLink="true">https://devops-blog.ruicoelho.dev/argocd-v325-critical-patch-release-with-stability-improvements</guid><category><![CDATA[ArgoCD]]></category><category><![CDATA[Devops]]></category><category><![CDATA[SRE]]></category><category><![CDATA[gitops]]></category><category><![CDATA[Platform Engineering ]]></category><category><![CDATA[SRE devops]]></category><dc:creator><![CDATA[Rui Coelho]]></dc:creator><pubDate>Sun, 25 Jan 2026 13:04:28 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1769346164626/57838d24-ca5d-4e32-9ba3-b0199e70a0fb.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction"><strong>Introduction</strong></h2>
<p>The ArgoCD community released version <strong>v3.2.5</strong> on January 14, 2026, replacing v3.2.4, which was marked as invalid. This patch release brings critical fixes that improve the stability and security of one of the most widely used GitOps platforms in the Kubernetes ecosystem.</p>
<p>If you’re running ArgoCD in production, especially on 2.x versions (which have reached End of Life), this article explains why you should consider this update and what has changed.</p>
<h2 id="heading-why-argocd-v325-matters"><strong>🎯 Why ArgoCD v3.2.5 Matters</strong></h2>
<h3 id="heading-the-v32-context"><strong>The v3.2 Context</strong></h3>
<p>ArgoCD v3.2 represents a significant evolution in the 3.x line:</p>
<ul>
<li><p><strong>v3.0</strong> (early 2025): Fundamental architectural improvements</p>
</li>
<li><p><strong>v3.1</strong> (August 2025): Native OCI registry support and CLI plugins</p>
</li>
<li><p><strong>v3.2</strong> (November 2025): Advanced features and security fixes</p>
</li>
<li><p><strong>v3.2.5</strong> (January 2026): Critical stabilization</p>
</li>
</ul>
<h3 id="heading-support-warning"><strong>⚠️ Support Warning</strong></h3>
<p><strong>ArgoCD 2.14 reached End of Life on November 4, 2025.</strong> According to the project’s support policy, only the three most recent minor versions receive security updates:</p>
<ul>
<li><p>✅ v3.2.x (current)</p>
</li>
<li><p>✅ v3.1.x</p>
</li>
<li><p>✅ v3.0.x</p>
</li>
<li><p>❌ v2.14 and earlier (unsupported)</p>
</li>
</ul>
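<p>As a rough illustration of the support window above, the following shell sketch classifies a version string. The <code>support_status</code> helper is hypothetical (it is not part of ArgoCD or its CLI), and the hard-coded list of minors only mirrors the policy as described here for January 2026, so it would need updating with each new minor release.</p>
<pre><code class="lang-bash"># Hypothetical helper: classify an ArgoCD version against the support
# window listed above (v3.0.x, v3.1.x and v3.2.x as of January 2026).
support_status() {
  local version="${1#v}"        # strip an optional leading "v"
  local minor="${version%.*}"   # "3.2.5" becomes "3.2"
  case "$minor" in
    3.0|3.1|3.2) echo "supported" ;;
    *)           echo "unsupported" ;;
  esac
}

support_status v3.2.5    # prints: supported
support_status v2.14.9   # prints: unsupported
</code></pre>
<p>The function expects a full <code>major.minor.patch</code> string; anything else falls through to <code>unsupported</code>.</p>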
<h2 id="heading-key-changes-in-v325"><strong>🔧 Key Changes in v3.2.5</strong></h2>
<h3 id="heading-1-notifications-engine-update"><strong>1. Notifications Engine Update</strong></h3>
<p><strong>Commit</strong>: <code>fafbd44</code></p>
<pre><code class="lang-plaintext">feat: Cherry-pick to 3.2 update notifications engine to v0.5.1
</code></pre>
<p>The update to <strong>notifications engine v0.5.1</strong> brings improvements in notification delivery for:</p>
<ul>
<li><p>Slack</p>
</li>
<li><p>Microsoft Teams</p>
</li>
<li><p>Email</p>
</li>
<li><p>Custom webhooks</p>
</li>
<li><p>PagerDuty and others</p>
</li>
</ul>
<p><strong>Practical benefit</strong>: Greater reliability in sync notifications, health status, and deployment events.</p>
<h3 id="heading-2-applicationset-reconciliation-fix"><strong>2. ApplicationSet Reconciliation Fix</strong></h3>
<p><strong>Commit</strong>: <code>d7d9674</code></p>
<pre><code class="lang-plaintext">fix(appset): do not trigger reconciliation on appsets not part of 
allowed namespaces when updating a cluster secret
</code></pre>
<p><strong>Problem solved</strong>: ApplicationSets outside the allowed namespaces no longer trigger unnecessary reconciliations when a cluster secret is updated.</p>
<p><strong>Impact</strong>:</p>
<ul>
<li><p>Reduced computational load</p>
</li>
<li><p>Lower Kubernetes API consumption</p>
</li>
<li><p>More predictable behavior in multi-tenant environments</p>
</li>
</ul>
<h3 id="heading-3-error-message-improvements"><strong>3. Error Message Improvements</strong></h3>
<p><strong>Commit</strong>: <code>e6f5403</code></p>
<pre><code class="lang-plaintext">fix: Only show 'please update resource specification' message when spec is outdated
</code></pre>
<p>More precise and contextual error messages, reducing confusion for operators.</p>
<h3 id="heading-4-dependency-updates"><strong>4. Dependency Updates</strong></h3>
<p><strong>Important commits</strong>:</p>
<pre><code class="lang-plaintext"># Go update to version 1.25.5
chore(deps): bump go to 1.25.5 

# expr update to v1.17.7 (security)
chore(cherry-pick-3.2): bump expr to v1.17.7
# Tests against Kubernetes 1.34.2
ci: test against k8s 1.34.2
</code></pre>
<p><strong>Guaranteed compatibility with</strong>:</p>
<ul>
<li><p>✅ Kubernetes 1.32.x</p>
</li>
<li><p>✅ Kubernetes 1.33.x</p>
</li>
<li><p>✅ Kubernetes 1.34.x</p>
</li>
</ul>
<h2 id="heading-how-to-upgrade-to-v325"><strong>🚀 How to Upgrade to v3.2.5</strong></h2>
<h3 id="heading-option-1-non-ha-installation-single-instance"><strong>Option 1: Non-HA Installation (Single Instance)</strong></h3>
<pre><code class="lang-bash"><span class="hljs-comment"># Create namespace (if needed)</span>
kubectl create namespace argocd

<span class="hljs-comment"># Apply v3.2.5 manifest</span>
kubectl apply -n argocd -f \
  https://raw.githubusercontent.com/argoproj/argo-cd/v3.2.5/manifests/install.yaml
</code></pre>
<h3 id="heading-option-2-ha-installation-high-availability"><strong>Option 2: HA Installation (High Availability)</strong></h3>
<pre><code class="lang-bash">kubectl create namespace argocd

kubectl apply -n argocd -f \
  https://raw.githubusercontent.com/argoproj/argo-cd/v3.2.5/manifests/ha/install.yaml
</code></pre>
<h3 id="heading-option-3-via-helm-chart"><strong>Option 3: Via Helm Chart</strong></h3>
<pre><code class="lang-bash"><span class="hljs-comment"># Add repository</span>
helm repo add argo https://argoproj.github.io/argo-helm
helm repo update

<span class="hljs-comment"># Upgrade to the pinned chart version</span>
helm upgrade argocd argo/argo-cd \
  --namespace argocd \
  --version 9.3.2 \
  --reuse-values
</code></pre>
<p><strong>Note</strong>: Helm chart 9.3.2 includes ArgoCD v3.2.5.</p>
<h2 id="heading-security-verification"><strong>🔐 Security Verification</strong></h2>
<p>All ArgoCD images are signed with <strong>Cosign</strong> and include <strong>SLSA Level 3 Provenance</strong>:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Verify image signature</span>
cosign verify \
  --certificate-identity-regexp <span class="hljs-string">"https://github.com/argoproj/argo-cd"</span> \
  --certificate-oidc-issuer <span class="hljs-string">"https://token.actions.githubusercontent.com"</span> \
  quay.io/argoproj/argocd:v3.2.5

<span class="hljs-comment"># Verify provenance</span>
cosign verify-attestation \
  --<span class="hljs-built_in">type</span> slsaprovenance \
  --certificate-identity-regexp <span class="hljs-string">"https://github.com/slsa-framework/slsa-github-generator/.github/workflows/generator_container_slsa3.yml@refs/tags/v"</span> \
  --certificate-oidc-issuer <span class="hljs-string">"https://token.actions.githubusercontent.com"</span> \
  quay.io/argoproj/argocd:v3.2.5
</code></pre>
<h2 id="heading-compatibility-and-support"><strong>📊 Compatibility and Support</strong></h2>
<h2 id="heading-supported-architectures"><strong>Supported Architectures</strong></h2>
<ul>
<li><p><strong>amd64</strong> (x86_64)</p>
</li>
<li><p><strong>arm64</strong> (Apple Silicon, AWS Graviton)</p>
</li>
<li><p><strong>ppc64le</strong> (IBM Power)</p>
</li>
<li><p><strong>s390x</strong> (IBM Z)</p>
</li>
</ul>
<h2 id="heading-kubernetes-platforms"><strong>Kubernetes Platforms</strong></h2>
<ul>
<li><p>Google GKE</p>
</li>
<li><p>Amazon EKS</p>
</li>
<li><p>Azure AKS</p>
</li>
<li><p>Red Hat OpenShift</p>
</li>
<li><p>Rancher</p>
</li>
<li><p>K3s / K0s</p>
</li>
<li><p>Vanilla Kubernetes</p>
</li>
</ul>
<h2 id="heading-migrating-from-v2x-to-v3x"><strong>🎓 Migrating from v2.x to v3.x</strong></h2>
<p>If you’re still on ArgoCD v2.14 or earlier, migration to v3.2.5 is <strong>critical</strong> for security reasons.</p>
<h2 id="heading-key-behavioral-changes-in-v3x"><strong>Key Behavioral Changes in v3.x</strong></h2>
<h3 id="heading-1-fine-grained-rbac-by-default"><strong>1. Fine-Grained RBAC by Default</strong></h3>
<pre><code class="lang-yaml"><span class="hljs-comment"># v2.x: Update permission applied to sub-resources</span>
<span class="hljs-string">p,</span> <span class="hljs-string">dev-team,</span> <span class="hljs-string">applications,</span> <span class="hljs-string">update,</span> <span class="hljs-string">default/*,</span> <span class="hljs-string">allow</span>
<span class="hljs-comment"># v3.x: Explicit permissions needed for resources</span>
<span class="hljs-string">p,</span> <span class="hljs-string">dev-team,</span> <span class="hljs-string">applications,</span> <span class="hljs-string">update,</span> <span class="hljs-string">default/*,</span> <span class="hljs-string">allow</span>
<span class="hljs-string">p,</span> <span class="hljs-string">dev-team,</span> <span class="hljs-string">applications,</span> <span class="hljs-string">update/*/Pod/*/*,</span> <span class="hljs-string">default/*,</span> <span class="hljs-string">allow</span>
<span class="hljs-string">p,</span> <span class="hljs-string">dev-team,</span> <span class="hljs-string">applications,</span> <span class="hljs-string">update/*/Deployment/*/*,</span> <span class="hljs-string">default/*,</span> <span class="hljs-string">allow</span>
</code></pre>
<h3 id="heading-2-tracking-by-annotations-default"><strong>2. Tracking by Annotations (Default)</strong></h3>
<pre><code class="lang-yaml"><span class="hljs-comment"># New default configuration in v3.x</span>
<span class="hljs-attr">application.resourceTrackingMethod:</span> <span class="hljs-string">annotation</span>
</code></pre>
<h3 id="heading-3-rbac-on-logs-enabled"><strong>3. RBAC on Logs Enabled</strong></h3>
<pre><code class="lang-yaml"><span class="hljs-comment"># Explicit permissions needed</span>
<span class="hljs-string">p,</span> <span class="hljs-string">role:developers,</span> <span class="hljs-string">logs,</span> <span class="hljs-string">get,</span> <span class="hljs-string">*/*,</span> <span class="hljs-string">allow</span>
</code></pre>
<h2 id="heading-detailed-upgrade-guide"><strong>Detailed Upgrade Guide</strong></h2>
<p><strong>Official documentation</strong>:</p>
<ul>
<li><p><a target="_blank" href="https://argo-cd.readthedocs.io/en/stable/operator-manual/upgrading/2.14-3.0/">Upgrading to v3.0</a></p>
</li>
<li><p><a target="_blank" href="https://argo-cd.readthedocs.io/en/stable/operator-manual/upgrading/3.1-3.2/">Upgrading to v3.2</a></p>
</li>
</ul>
<h2 id="heading-featured-v32-capabilities"><strong>🆕 Featured v3.2 Capabilities</strong></h2>
<h3 id="heading-1-kustomize-version-selection-via-git"><strong>1. Kustomize Version Selection via Git</strong></h3>
<pre><code class="lang-yaml"><span class="hljs-comment"># .argocd-source.yaml</span>
<span class="hljs-attr">kustomize:</span>
  <span class="hljs-attr">version:</span> <span class="hljs-string">v5.3.0</span>
</code></pre>
<p>You can now specify the Kustomize version directly in your Git repository!</p>
<h3 id="heading-2-server-side-diff"><strong>2. Server-Side Diff</strong></h3>
<pre><code class="lang-bash"><span class="hljs-comment"># CLI with server-side diff</span>
argocd app diff my-app --server-side-diff
</code></pre>
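<p>The diff is computed by the Kubernetes API server through a server-side apply dry run, so it reflects server-side defaulting and validation and is generally more accurate than a client-side comparison.</p>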
<h3 id="heading-3-pull-request-title-matching"><strong>3. Pull Request Title Matching</strong></h3>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">argoproj.io/v1alpha1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">ApplicationSet</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">generators:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">pullRequest:</span>
      <span class="hljs-attr">github:</span>
        <span class="hljs-attr">owner:</span> <span class="hljs-string">myorg</span>
        <span class="hljs-attr">repo:</span> <span class="hljs-string">myrepo</span>
      <span class="hljs-attr">filters:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">titleMatch:</span> <span class="hljs-string">"Release/.*"</span>  <span class="hljs-comment"># Filter by title!</span>
</code></pre>
<h3 id="heading-4-health-checks-for-gitops-promoter"><strong>4. Health Checks for GitOps Promoter</strong></h3>
<p>Built-in health checks for these GitOps Promoter resources:</p>
<ul>
<li><p><code>CommitStatus</code></p>
</li>
<li><p><code>PullRequest</code></p>
</li>
<li><p><code>PromotionStrategy</code></p>
</li>
<li><p><code>ChangeTransferPolicy</code></p>
</li>
</ul>
<h2 id="heading-performance-and-observability"><strong>📈 Performance and Observability</strong></h2>
<h3 id="heading-applicationset-resource-limits"><strong>ApplicationSet Resource Limits</strong></h3>
<p>To prevent ApplicationSet status bloat and avoid hitting etcd object size limits:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">ConfigMap</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">argocd-cmd-params-cm</span>
<span class="hljs-attr">data:</span>
  <span class="hljs-comment"># Default limit: 5000 resources</span>
  <span class="hljs-attr">applicationsetcontroller.status.max.resources.count:</span> <span class="hljs-string">"5000"</span>
</code></pre>
<h3 id="heading-recommended-monitoring"><strong>Recommended Monitoring</strong></h3>
<pre><code class="lang-bash"><span class="hljs-comment"># Important metrics to monitor</span>
argocd_app_sync_total
argocd_app_health_status
argocd_app_reconcile_count
argocd_applicationset_status_resources
</code></pre>
<h2 id="heading-argocd-roadmap"><strong>🔮 ArgoCD Roadmap</strong></h2>
<h3 id="heading-v33-expected-february-2026"><strong>v3.3 (Expected February 2026)</strong></h3>
<p>Target date: <strong>February 2, 2026</strong> (GA)</p>
<p>Expected features:</p>
<ul>
<li><p>Additional performance improvements</p>
</li>
<li><p>New notification integrations</p>
</li>
<li><p>UX enhancements in the dashboard</p>
</li>
</ul>
<h3 id="heading-community-events"><strong>Community Events</strong></h3>
<p><strong>ArgoCon Amsterdam 2026</strong></p>
<ul>
<li><p>📅 Date: March 23–26, 2026</p>
</li>
<li><p>📍 Location: Co-located with KubeCon EU</p>
</li>
<li><p>🎫 Register: <a target="_blank" href="https://events.linuxfoundation.org/argocon/">ArgoCon 2026</a></p>
</li>
</ul>
<h2 id="heading-upgrade-checklist"><strong>✅ Upgrade Checklist</strong></h2>
<p>Before upgrading to v3.2.5:</p>
<ul>
<li><p>Review <a target="_blank" href="https://github.com/argoproj/argo-cd/releases/tag/v3.2.5">official release notes</a></p>
</li>
<li><p>Backup configurations (ConfigMaps, Secrets, CRDs)</p>
</li>
<li><p>Test in staging environment</p>
</li>
<li><p>Validate RBAC policies (if migrating from v2.x)</p>
</li>
<li><p>Verify plugin compatibility</p>
</li>
<li><p>Update internal documentation</p>
</li>
<li><p>Communicate changes to team</p>
</li>
</ul>
<p>Post-upgrade:</p>
<ul>
<li><p>Verify health of all applications</p>
</li>
<li><p>Test manual synchronization</p>
</li>
<li><p>Validate notifications</p>
</li>
<li><p>Monitor logs for 24–48h</p>
</li>
<li><p>Review metrics dashboards</p>
</li>
</ul>
<h2 id="heading-common-troubleshooting"><strong>🛠️ Common Troubleshooting</strong></h2>
<h3 id="heading-problem-applicationsets-reconciling-excessively"><strong>Problem: ApplicationSets Reconciling Excessively</strong></h3>
<p><strong>Symptom</strong>: High CPU load, many Kubernetes API requests</p>
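<p><strong>Solution</strong>: If the reconciliations are triggered by cluster secret updates and hit ApplicationSets outside the allowed namespaces, upgrading to v3.2.5 fixes this specific bug (commit <code>d7d9674</code>).</p>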
<h3 id="heading-problem-notifications-not-arriving"><strong>Problem: Notifications Not Arriving</strong></h3>
<p><strong>Symptom</strong>: Sync events don’t trigger notifications</p>
<p><strong>Solution</strong>:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Check notifications controller version</span>
kubectl get deployment argocd-notifications-controller \
  -n argocd -o yaml | grep image

<span class="hljs-comment"># Should be v3.2.5</span>
</code></pre>
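<p>To compare the running version in a script instead of reading <code>grep</code> output, the tag can be extracted from the image reference. This is a minimal sketch: <code>image_tag</code> is a hypothetical helper, and the <code>kubectl</code> call in the comment assumes the controller image is the first container in the Deployment.</p>
<pre><code class="lang-bash"># Hypothetical helper: extract the tag from a container image reference.
image_tag() {
  local ref="${1%%@*}"      # drop a trailing @sha256:... digest, if any
  local last="${ref##*/}"   # keep the last path segment so a registry port is not mistaken for a tag
  case "$last" in
    *:*) echo "${last##*:}" ;;
    *)   echo "" ;;         # untagged reference
  esac
}

image_tag "quay.io/argoproj/argocd:v3.2.5"   # prints: v3.2.5

# Against a live cluster (assumption: first container holds the image):
#   image_tag "$(kubectl get deployment argocd-notifications-controller \
#     -n argocd -o jsonpath='{.spec.template.spec.containers[0].image}')"
</code></pre>
<p>If the printed tag is not <code>v3.2.5</code>, the notifications controller was probably not rolled out together with the rest of the upgrade.</p>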
<h3 id="heading-problem-rbac-denying-log-access"><strong>Problem: RBAC Denying Log Access</strong></h3>
<p><strong>Symptom</strong>: Users cannot see pod logs</p>
<p><strong>Solution</strong>: Add explicit RBAC policy:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">ConfigMap</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">argocd-rbac-cm</span>
<span class="hljs-attr">data:</span>
  <span class="hljs-attr">policy.csv:</span> <span class="hljs-string">|</span>
    <span class="hljs-string">p,</span> <span class="hljs-string">role:developer,</span> <span class="hljs-string">logs,</span> <span class="hljs-string">get,</span> <span class="hljs-string">*/*,</span> <span class="hljs-string">allow</span>
</code></pre>
<h2 id="heading-additional-resources"><strong>📚 Additional Resources</strong></h2>
<h3 id="heading-official-documentation"><strong>Official Documentation</strong></h3>
<ul>
<li><p><a target="_blank" href="https://argo-cd.readthedocs.io/">ArgoCD Docs</a></p>
</li>
<li><p><a target="_blank" href="https://github.com/argoproj/argo-cd">GitHub Repository</a></p>
</li>
<li><p><a target="_blank" href="https://argoproj.github.io/community/join-slack">Slack Community</a></p>
</li>
</ul>
<h3 id="heading-related-articles"><strong>Related Articles</strong></h3>
<ul>
<li><p><a target="_blank" href="https://blog.argoproj.io/argo-cd-v3-0-release-candidate-a0b933f4e58f">Announcing Argo CD v3.0</a></p>
</li>
<li><p><a target="_blank" href="https://blog.argoproj.io/argo-cd-v3-2-release-candidate-4c939b63d9c4">Argo CD v3.2 RC</a></p>
</li>
<li><p><a target="_blank" href="https://opengitops.dev/">GitOps Best Practices</a></p>
</li>
</ul>
<h3 id="heading-complementary-tools"><strong>Complementary Tools</strong></h3>
<ul>
<li><p><strong>Argo Rollouts</strong>: Progressive delivery</p>
</li>
<li><p><strong>Argo Workflows</strong>: Workflow orchestration</p>
</li>
<li><p><strong>Argo Events</strong>: Event-driven automation</p>
</li>
<li><p><strong>ApplicationSet Controller</strong>: Multi-cluster app deployment</p>
</li>
</ul>
<h2 id="heading-conclusion"><strong>💡 Conclusion</strong></h2>
<p>ArgoCD v3.2.5 represents a <strong>critical stability update</strong> that all production users should consider. With important fixes in the ApplicationSet controller, dependency updates, and better notification handling, this version solidifies ArgoCD’s position as the reference GitOps solution for Kubernetes.</p>
<p><strong>Recommended action</strong>:</p>
<ul>
<li><p>If you’re on v3.2.4 → upgrade immediately</p>
</li>
<li><p>If you’re on v3.0–3.1 → plan upgrade in the coming weeks</p>
</li>
<li><p>If you’re on v2.x → <strong>urgent upgrade needed</strong> (EOL)</p>
</li>
</ul>
<p>The GitOps ecosystem continues to evolve rapidly, and staying up-to-date is not just about features, but about <strong>security and supportability</strong>.</p>
]]></content:encoded></item><item><title><![CDATA[Terraform vs OpenTofu: A Comprehensive Comparison for Infrastructure as Code]]></title><description><![CDATA[The infrastructure as code (IaC) landscape experienced a significant shift in August 2023 when HashiCorp changed Terraform’s license from the Mozilla Public License (MPL) to the Business Source License (BSL). This decision sparked controversy in the ...]]></description><link>https://devops-blog.ruicoelho.dev/terraform-vs-opentofu-a-comprehensive-comparison-for-infrastructure-as-code</link><guid isPermaLink="true">https://devops-blog.ruicoelho.dev/terraform-vs-opentofu-a-comprehensive-comparison-for-infrastructure-as-code</guid><category><![CDATA[Terraform]]></category><category><![CDATA[opentofu]]></category><category><![CDATA[Devops]]></category><category><![CDATA[SRE]]></category><category><![CDATA[#IaC]]></category><category><![CDATA[IaC (Infrastructure as Code)]]></category><dc:creator><![CDATA[Rui Coelho]]></dc:creator><pubDate>Sun, 25 Jan 2026 12:59:43 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1769345953057/01b687f1-8f19-4d96-9a48-2f597bec055b.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The infrastructure as code (IaC) landscape experienced a significant shift in August 2023 when HashiCorp changed Terraform’s license from the Mozilla Public License (MPL) to the Business Source License (BSL). This decision sparked controversy in the open-source community and led to the birth of OpenTofu, a fork of Terraform that maintains the open-source ethos. As organizations evaluate their IaC tooling strategies, understanding the differences, similarities, and implications of choosing between Terraform and OpenTofu has become crucial.</p>
<p>In this article, we’ll dive deep into both tools, compare their features, examine real-world examples, and help you make an informed decision for your infrastructure needs.</p>
<h2 id="heading-the-origin-story-understanding-the-fork"><strong>The Origin Story: Understanding the Fork</strong></h2>
<h3 id="heading-terraforms-license-change"><strong>Terraform’s License Change</strong></h3>
<p>HashiCorp’s decision to move Terraform from MPL 2.0 to BSL 1.1 was justified by the company as necessary to prevent cloud providers from offering competing managed services without contributing back to the project. While understandable from a business perspective, this change meant that Terraform was no longer truly open source by the Open Source Initiative’s definition.</p>
<p>The BSL allows free use for most purposes but restricts competitive commercial use. After four years, the code converts to MPL 2.0, but this waiting period was enough to concern many organizations that had built their infrastructure automation on the promise of open-source software.</p>
<h3 id="heading-the-birth-of-opentofu"><strong>The Birth of OpenTofu</strong></h3>
<p>In response, the Linux Foundation announced OpenTofu in September 2023 as a truly open-source alternative. Led by a coalition of companies including Gruntwork, Spacelift, env0, Scalr, and others, OpenTofu aims to maintain backward compatibility with Terraform while providing a community-driven, vendor-neutral alternative.</p>
<p>The project quickly gained momentum, achieving its first general availability release (1.6.0) in January 2024, maintaining parity with Terraform 1.6 while adding new features and improvements.</p>
<h3 id="heading-core-architecture-more-similar-than-different"><strong>Core Architecture: More Similar Than Different</strong></h3>
<p>At their core, both Terraform and OpenTofu share the same fundamental architecture because OpenTofu is a fork of Terraform 1.5. Understanding this shared foundation is important before we explore their differences.</p>
<h3 id="heading-the-hcl-configuration-language"><strong>The HCL Configuration Language</strong></h3>
<p>Both tools use HashiCorp Configuration Language (HCL) for defining infrastructure. Here’s a simple example that works identically in both:</p>
<pre><code class="lang-plaintext"># Define an AWS EC2 instance
resource "aws_instance" "web_server" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"

  tags = {
    Name        = "WebServer"
    Environment = "Production"
    ManagedBy   = "IaC"
  }

  root_block_device {
    volume_size = 20
    volume_type = "gp3"
  }
}

# Create a security group
resource "aws_security_group" "web_sg" {
  name        = "web-server-sg"
  description = "Security group for web server"

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
</code></pre>
<p>This configuration syntax remains identical between the two tools, which means existing Terraform configurations can generally be used with OpenTofu without modification.</p>
<h3 id="heading-state-management"><strong>State Management</strong></h3>
<p>Both tools use a state file to track resources. The state management approach is conceptually identical:</p>
<pre><code class="lang-plaintext"># Backend configuration for remote state
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "prod/infrastructure.tfstate"
    region         = "us-west-2"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}
</code></pre>
<p>This same backend configuration works in OpenTofu, though OpenTofu has added support for additional backend types and enhanced encryption options.</p>
<h2 id="heading-key-differences-where-they-diverge"><strong>Key Differences: Where They Diverge</strong></h2>
<p>While maintaining compatibility, OpenTofu has introduced several differentiating features and improvements.</p>
<h3 id="heading-1-licensing-and-governance"><strong>1. Licensing and Governance</strong></h3>
<p><strong>Terraform (BSL 1.1):</strong></p>
<ul>
<li><p>Restricts competitive commercial offerings</p>
</li>
<li><p>Four-year delay before converting to MPL 2.0</p>
</li>
<li><p>Controlled by HashiCorp’s business interests</p>
</li>
</ul>
<p><strong>OpenTofu (MPL 2.0):</strong></p>
<ul>
<li><p>Truly open source</p>
</li>
<li><p>Community-governed through the Linux Foundation</p>
</li>
<li><p>No restrictions on commercial use</p>
</li>
<li><p>Transparent decision-making process</p>
</li>
</ul>
<h3 id="heading-2-state-file-encryption"><strong>2. State File Encryption</strong></h3>
<p>One of OpenTofu’s most significant innovations is native state file encryption. While Terraform relies on backend-level encryption (like S3 encryption), OpenTofu provides client-side encryption:</p>
<pre><code class="lang-plaintext"># OpenTofu state encryption configuration
terraform {
  encryption {
    key_provider "pbkdf2" "mykey" {
      passphrase = var.state_passphrase
    }

    method "aes_gcm" "state_encryption" {
      keys = key_provider.pbkdf2.mykey
    }

    state {
      method = method.aes_gcm.state_encryption
    }
  }
}
</code></pre>
<p>This ensures that sensitive data in the state file is encrypted before it leaves your machine, providing an additional layer of security that Terraform doesn’t offer natively.</p>
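<p>The passphrase itself has to come from somewhere outside the configuration. One possible approach, sketched below under stated assumptions, is to generate a random value and pass it through the standard <code>TF_VAR_</code> environment variable convention. The <code>gen_passphrase</code> helper is hypothetical, and the minimum passphrase length accepted by the <code>pbkdf2</code> key provider should be checked against the OpenTofu documentation for your version.</p>
<pre><code class="lang-bash"># Hypothetical helper: 32 random bytes, base64-encoded (44 characters).
gen_passphrase() {
  head -c 32 /dev/urandom | base64 | tr -d '\n'
}

# TF_VAR_state_passphrase feeds the input variable "state_passphrase"
# referenced as var.state_passphrase in the encryption block above.
export TF_VAR_state_passphrase="$(gen_passphrase)"
echo "passphrase length: ${#TF_VAR_state_passphrase}"   # prints: passphrase length: 44
</code></pre>
<p>In practice the value should be generated once and kept in a secret manager, because state encrypted with a lost passphrase cannot be decrypted.</p>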
<h3 id="heading-3-enhanced-testing-framework"><strong>3. Enhanced Testing Framework</strong></h3>
<p>Both Terraform and OpenTofu support infrastructure testing through the <code>test</code> command introduced in Terraform 1.6. Here's an example of a test file that works in both tools:</p>
<pre><code class="lang-plaintext"># Test file: tests/vpc_test.tftest.hcl
variables {
  environment = "test"
  vpc_cidr    = "10.0.0.0/16"
}

run "validate_vpc_creation" {
  command = apply

  assert {
    condition     = aws_vpc.main.cidr_block == var.vpc_cidr
    error_message = "VPC CIDR does not match expected value"
  }

  assert {
    condition     = length(aws_subnet.private) == 3
    error_message = "Expected 3 private subnets"
  }
}

run "validate_internet_gateway" {
  command = plan

  assert {
    condition     = aws_internet_gateway.main.vpc_id == aws_vpc.main.id
    error_message = "Internet gateway not attached to VPC"
  }
}
</code></pre>
<p>Both tools provide similar testing capabilities, making infrastructure testing more accessible to teams.</p>
<h3 id="heading-4-provider-development-and-registry"><strong>4. Provider Development and Registry</strong></h3>
<p><strong>Terraform:</strong></p>
<ul>
<li><p>Uses the official Terraform Registry (<a target="_blank" href="http://registry.terraform.io">registry.terraform.io</a>)</p>
</li>
<li><p>Provider development controlled by HashiCorp</p>
</li>
<li><p>Provider SDKs and the plugin framework remain under MPL 2.0; the BSL applies to Terraform core</p>
</li>
</ul>
<p><strong>OpenTofu:</strong></p>
<ul>
<li><p>Currently uses the same Terraform Registry providers</p>
</li>
<li><p>Has its own registry infrastructure (<a target="_blank" href="http://registry.opentofu.org">registry.opentofu.org</a>)</p>
</li>
<li><p>Working on provider ecosystem independence</p>
</li>
<li><p>Maintains compatibility with existing providers</p>
</li>
<li><p>Planning to mirror and potentially fork critical providers if needed</p>
</li>
</ul>
<p>Here’s how you declare providers (syntax is identical in both tools):</p>
<pre><code class="lang-plaintext"># Provider declaration works identically in both Terraform and OpenTofu
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~&gt; 5.0"
    }

    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~&gt; 2.23"
    }
  }
}
</code></pre>
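<p>Both tools accept the same provider source addresses. By default, OpenTofu resolves short addresses such as <code>hashicorp/aws</code> through its own registry (registry.opentofu.org), which points to the same upstream provider releases, so provider availability does not depend on HashiCorp’s registry policies.</p>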
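<p>One way to picture how an identical <code>source</code> value can resolve differently is the sketch below. It assumes that each tool expands a short <code>namespace/type</code> address with its own default registry hostname (registry.terraform.io for Terraform, registry.opentofu.org for OpenTofu). The <code>provider_address</code> helper is hypothetical and ignores legacy single-word addresses.</p>
<pre><code class="lang-bash"># Hypothetical helper: expand a provider source address the way each
# tool is assumed to, by prefixing its default registry hostname.
provider_address() {
  local tool="$1"
  local source="$2"
  local host
  case "$tool" in
    terraform) host="registry.terraform.io" ;;
    tofu)      host="registry.opentofu.org" ;;
    *)         echo "unknown tool: $tool"; return 1 ;;
  esac
  case "$source" in
    */*/*) echo "$source" ;;         # hostname already present, keep as-is
    *)     echo "$host/$source" ;;   # short form: namespace/type
  esac
}

provider_address terraform hashicorp/aws   # prints: registry.terraform.io/hashicorp/aws
provider_address tofu hashicorp/aws        # prints: registry.opentofu.org/hashicorp/aws
</code></pre>
<p>A fully qualified address such as <code>example.com/acme/widget</code> (a made-up name) is returned unchanged for either tool, which is how a configuration can pin a provider to one specific registry.</p>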
<h2 id="heading-real-world-comparison-a-practical-example"><strong>Real-World Comparison: A Practical Example</strong></h2>
<p>Let’s examine a complete, real-world scenario: deploying a multi-tier application infrastructure.</p>
<h3 id="heading-complete-infrastructure-module"><strong>Complete Infrastructure Module</strong></h3>
<pre><code class="lang-plaintext"># variables.tf
variable "project_name" {
  description = "Name of the project"
  type        = string
}

variable "environment" {
  description = "Environment name"
  type        = string
  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Environment must be dev, staging, or prod"
  }
}

variable "vpc_cidr" {
  description = "CIDR block for VPC"
  type        = string
  default     = "10.0.0.0/16"
}

# main.tf
terraform {
  required_version = "&gt;= 1.6"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~&gt; 5.0"
    }
  }
}

# VPC Configuration
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name        = "${var.project_name}-${var.environment}-vpc"
    Environment = var.environment
    ManagedBy   = "IaC"
  }
}

# Public Subnets
resource "aws_subnet" "public" {
  count             = 3
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 4, count.index)
  availability_zone = data.aws_availability_zones.available.names[count.index]

  map_public_ip_on_launch = true

  tags = {
    Name        = "${var.project_name}-${var.environment}-public-${count.index + 1}"
    Type        = "Public"
    Environment = var.environment
  }
}

# Private Subnets
resource "aws_subnet" "private" {
  count             = 3
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 4, count.index + 3)
  availability_zone = data.aws_availability_zones.available.names[count.index]

  tags = {
    Name        = "${var.project_name}-${var.environment}-private-${count.index + 1}"
    Type        = "Private"
    Environment = var.environment
  }
}

# Application Load Balancer
resource "aws_lb" "app" {
  name               = "${var.project_name}-${var.environment}-alb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets            = aws_subnet.public[*].id

  enable_deletion_protection = var.environment == "prod"

  tags = {
    Name        = "${var.project_name}-${var.environment}-alb"
    Environment = var.environment
  }
}

# Auto Scaling Group
resource "aws_autoscaling_group" "app" {
  name                = "${var.project_name}-${var.environment}-asg"
  vpc_zone_identifier = aws_subnet.private[*].id
  target_group_arns   = [aws_lb_target_group.app.arn]
  health_check_type   = "ELB"

  min_size         = var.environment == "prod" ? 3 : 1
  max_size         = var.environment == "prod" ? 10 : 3
  desired_capacity = var.environment == "prod" ? 3 : 1

  launch_template {
    id      = aws_launch_template.app.id
    version = "$Latest"
  }

  tag {
    key                 = "Name"
    value               = "${var.project_name}-${var.environment}-app"
    propagate_at_launch = true
  }
}

# RDS Database
resource "aws_db_instance" "main" {
  identifier     = "${var.project_name}-${var.environment}-db"
  engine         = "postgres"
  engine_version = "15.3"
  instance_class = var.environment == "prod" ? "db.t3.medium" : "db.t3.micro"

  allocated_storage     = var.environment == "prod" ? 100 : 20
  storage_type          = "gp3"
  storage_encrypted     = true

  db_name  = replace("${var.project_name}_${var.environment}", "-", "_") # db_name cannot contain hyphens
  username = "dbadmin"
  password = random_password.db_password.result

  vpc_security_group_ids = [aws_security_group.database.id]
  db_subnet_group_name   = aws_db_subnet_group.main.name

  backup_retention_period = var.environment == "prod" ? 7 : 1
  skip_final_snapshot     = var.environment != "prod"

  tags = {
    Name        = "${var.project_name}-${var.environment}-db"
    Environment = var.environment
  }
}

# outputs.tf
output "vpc_id" {
  description = "ID of the VPC"
  value       = aws_vpc.main.id
}

output "alb_dns_name" {
  description = "DNS name of the load balancer"
  value       = aws_lb.app.dns_name
}

output "database_endpoint" {
  description = "Connection endpoint for the database"
  value       = aws_db_instance.main.endpoint
  sensitive   = true
}
</code></pre>
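<p>The subnet ranges in the infrastructure module come from <code>cidrsubnet(var.vpc_cidr, 4, count.index)</code>, which carves the /16 VPC range into /20 blocks. As a rough illustration of that arithmetic, here is a minimal JavaScript sketch for IPv4 only. The helper name <code>cidrSubnet</code> is ours, not part of Terraform or OpenTofu, and it assumes the base address is already aligned to its prefix.</p>
<pre><code class="lang-javascript">// Illustrative re-implementation of the IPv4 arithmetic behind HCL's
// cidrsubnet(prefix, newbits, netnum). Hypothetical helper, not a real API.
function cidrSubnet(prefix, newbits, netnum) {
  const [address, length] = prefix.split('/');
  const newLength = Number(length) + newbits;

  // The extended prefix cannot be longer than 32 bits
  if (Math.sign(32 - newLength) === -1) {
    throw new RangeError('prefix length exceeds 32 bits');
  }
  // netnum must fit in the extra bits: 0 .. 2^newbits - 1
  if (!Number.isInteger(netnum) || Math.floor(netnum / 2 ** newbits) !== 0) {
    throw new RangeError(`netnum ${netnum} does not fit in ${newbits} bits`);
  }

  // Treat the dotted address as one 32-bit number
  const base = address
    .split('.')
    .reduce(function (acc, octet) { return acc * 256 + Number(octet); }, 0);

  // Each new subnet spans 2^(32 - newLength) addresses
  const start = base + netnum * 2 ** (32 - newLength);

  const octets = [24, 16, 8, 0].map(function (shift) {
    return Math.floor(start / 2 ** shift) % 256;
  });
  return `${octets.join('.')}/${newLength}`;
}

// Public subnets use indexes 0-2, private subnets use indexes 3-5
[0, 1, 2, 3, 4, 5].forEach(function (index) {
  console.log(index, cidrSubnet('10.0.0.0/16', 4, index));
});
</code></pre>
<p>With the default <code>vpc_cidr</code>, indexes 0 to 2 give the public subnets 10.0.0.0/20, 10.0.16.0/20 and 10.0.32.0/20, and indexes 3 to 5 give the private subnets 10.0.48.0/20, 10.0.64.0/20 and 10.0.80.0/20.</p>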
<p>The infrastructure module above works identically in both Terraform and OpenTofu. With OpenTofu, you could additionally enable the native encryption layer:</p>
<pre><code class="lang-plaintext"># OpenTofu-specific enhancement
terraform {
  encryption {
    key_provider "aws_kms" "state" {
      kms_key_id = "arn:aws:kms:us-west-2:111122223333:key/1234abcd"
      region     = "us-west-2"
      key_spec   = "AES_256"
    }

    method "aes_gcm" "state_encryption" {
      keys = key_provider.aws_kms.state
    }

    state {
      method = method.aes_gcm.state_encryption
    }
  }
}
</code></pre>
<h2 id="heading-migration-moving-between-terraform-and-opentofu"><strong>Migration: Moving Between Terraform and OpenTofu</strong></h2>
<p>One of the most common questions is how difficult it is to migrate between these tools. The good news: it’s relatively straightforward.</p>
<h3 id="heading-migrating-from-terraform-to-opentofu"><strong>Migrating from Terraform to OpenTofu</strong></h3>
<pre><code class="lang-bash"><span class="hljs-comment"># Step 1: Install OpenTofu</span>
<span class="hljs-comment"># Using Homebrew (macOS/Linux)</span>
brew install opentofu

<span class="hljs-comment"># Using the official installer script (Ubuntu/Debian)</span>
curl --proto <span class="hljs-string">'=https'</span> --tlsv1.2 -fsSL https://get.opentofu.org/install-opentofu.sh | sh -s -- --install-method deb

<span class="hljs-comment"># Step 2: Initialize with existing state</span>
<span class="hljs-built_in">cd</span> your-terraform-project
tofu init -upgrade

<span class="hljs-comment"># Step 3: Verify plan</span>
tofu plan

<span class="hljs-comment"># Step 4: Apply (if everything looks good)</span>
tofu apply
</code></pre>
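<p>The migration is usually seamless because OpenTofu reads existing Terraform state files. Back up your state before the first <code>tofu apply</code>. You can switch back to Terraform if needed, though you’d lose OpenTofu-specific features, and state written with OpenTofu’s encryption has to be decrypted first.</p>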
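<p>State file compatibility comes down to both tools reading the same JSON state format. As a minimal sketch (in JavaScript, assuming a version 4 state file and using a made-up inline sample rather than real state), the header fields can be inspected before and after switching tools. <code>summarizeState</code> is a hypothetical helper name.</p>
<pre><code class="lang-javascript">// Sketch: summarize the header fields of a state file (format version 4).
// summarizeState is a hypothetical helper; it only reads, never writes.
function summarizeState(jsonText) {
  const state = JSON.parse(jsonText);
  return {
    formatVersion: state.version,
    writtenBy: state.terraform_version,
    serial: state.serial,
    resourceCount: Array.isArray(state.resources) ? state.resources.length : 0,
  };
}

// Inline sample standing in for the output of "terraform state pull"
// or "tofu state pull"; every value here is made up.
const sample = JSON.stringify({
  version: 4,
  terraform_version: '1.6.0',
  serial: 12,
  lineage: '00000000-0000-0000-0000-000000000000',
  resources: [],
});

console.log(summarizeState(sample));
</code></pre>
<p>Comparing this summary before and after the switch is a cheap sanity check. The authoritative check remains a clean <code>tofu plan</code>.</p>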
<h3 id="heading-migrating-from-opentofu-to-terraform"><strong>Migrating from OpenTofu to Terraform</strong></h3>
<pre><code class="lang-bash"><span class="hljs-comment"># Step 1: Remove OpenTofu-specific features</span>
<span class="hljs-comment"># Comment out or remove encryption blocks and OpenTofu-only syntax</span>

<span class="hljs-comment"># Step 2: Initialize with Terraform</span>
terraform init -upgrade

<span class="hljs-comment"># Step 3: Verify and apply</span>
terraform plan
terraform apply
</code></pre>
<h3 id="heading-gradual-migration-strategy"><strong>Gradual Migration Strategy</strong></h3>
<p>For large organizations, a gradual approach might be preferable:</p>
<pre><code class="lang-plaintext"># main.tf: shared by both tools, uses only compatible features
module "networking" {
  source = "./modules/networking"

  vpc_cidr    = "10.0.0.0/16"
  environment = var.environment
}

# encryption.tofu: read by OpenTofu 1.8+ only; Terraform ignores .tofu files.
# HCL cannot switch blocks on or off per tool, so OpenTofu-only
# settings live in their own file instead.
terraform {
  encryption {
    # OpenTofu-specific encryption config
  }
}
</code></pre>
<h2 id="heading-feature-comparison-matrix"><strong>Feature Comparison Matrix</strong></h2>
<p>Let’s break down the key differences in a structured comparison:</p>
<h3 id="heading-core-functionality"><strong>Core Functionality</strong></h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Feature</strong></td><td><strong>Terraform</strong></td><td><strong>OpenTofu</strong></td><td><strong>Notes</strong></td></tr>
</thead>
<tbody>
<tr>
<td>HCL Syntax</td><td>✅</td><td>✅</td><td>Identical</td></tr>
<tr>
<td>State Management</td><td>✅</td><td>✅</td><td>Compatible</td></tr>
<tr>
<td>Provider Ecosystem</td><td>✅</td><td>✅</td><td>Same providers; each tool has its own registry</td></tr>
<tr>
<td>Module Support</td><td>✅</td><td>✅</td><td>Fully compatible</td></tr>
<tr>
<td>Workspaces</td><td>✅</td><td>✅</td><td>Identical functionality</td></tr>
<tr>
<td>Remote Backends</td><td>✅</td><td>✅</td><td>OpenTofu has additional options</td></tr>
</tbody>
</table>
</div><h3 id="heading-advanced-features"><strong>Advanced Features</strong></h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Feature</strong></td><td><strong>Terraform</strong></td><td><strong>OpenTofu</strong></td><td><strong>Notes</strong></td></tr>
</thead>
<tbody>
<tr>
<td>State Encryption</td><td>Backend-level</td><td>Native client-side</td><td>OpenTofu advantage</td></tr>
<tr>
<td>Testing Framework</td><td>Basic</td><td>Enhanced</td><td>OpenTofu has more features</td></tr>
<tr>
<td>For-each with sensitive</td><td>⚠️ Limited</td><td>✅ Full support</td><td>OpenTofu improvement</td></tr>
<tr>
<td>Removed block</td><td>✅ 1.7+</td><td>✅ 1.7+</td><td>Available in both since their 1.7 releases</td></tr>
</tbody>
</table>
</div><h3 id="heading-operational-aspects"><strong>Operational Aspects</strong></h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Aspect</strong></td><td><strong>Terraform</strong></td><td><strong>OpenTofu</strong></td><td><strong>Notes</strong></td></tr>
</thead>
<tbody>
<tr>
<td>License</td><td>BSL 1.1</td><td>MPL 2.0</td><td>OpenTofu is truly open source</td></tr>
<tr>
<td>Governance</td><td>HashiCorp</td><td>Linux Foundation</td><td>Community vs. corporate</td></tr>
<tr>
<td>Release Cycle</td><td>~4 months</td><td>~2-3 months</td><td>OpenTofu moves faster</td></tr>
<tr>
<td>Performance</td><td>Baseline</td><td>Reported 10-30% faster</td><td>For large deployments; benchmark your own workloads</td></tr>
<tr>
<td>Cloud Provider Support</td><td>Excellent</td><td>Excellent</td><td>Both well-supported</td></tr>
</tbody>
</table>
</div><h2 id="heading-real-world-use-cases-and-recommendations"><strong>Real-World Use Cases and Recommendations</strong></h2>
<h3 id="heading-when-to-choose-terraform"><strong>When to Choose Terraform</strong></h3>
<p><strong>Scenario 1: HashiCorp Ecosystem Integration</strong></p>
<p>If you’re heavily invested in HashiCorp products (Vault, Consul, Nomad), Terraform might provide smoother integration:</p>
<pre><code class="lang-plaintext"># Using Terraform with HashiCorp Vault
data "vault_generic_secret" "database" {
  path = "secret/database/${var.environment}"
}

resource "aws_db_instance" "main" {
  # ...
  username = data.vault_generic_secret.database.data["username"]
  password = data.vault_generic_secret.database.data["password"]
}
</code></pre>
<p><strong>Scenario 2: Enterprise Support Requirements</strong></p>
<p>Organizations requiring commercial support and SLAs from HashiCorp:</p>
<pre><code class="lang-plaintext"># Terraform Cloud/Enterprise features
terraform {
  cloud {
    organization = "my-company"

    workspaces {
      name = "production-infrastructure"
    }
  }
}
</code></pre>
<p><strong>Scenario 3: Risk-Averse Organizations</strong></p>
<p>Companies that prefer stability over innovation and have legal concerns about switching tools.</p>
<h3 id="heading-when-to-choose-opentofu"><strong>When to Choose OpenTofu</strong></h3>
<p><strong>Scenario 1: Open Source Commitment</strong></p>
<p>Organizations with strong open-source requirements can benefit from OpenTofu’s MPL 2.0 license and community governance model, while still using the same provider ecosystem.</p>
<p><strong>Scenario 2: Security-First Environments</strong></p>
<p>When client-side encryption is a requirement:</p>
<pre><code class="lang-plaintext"># OpenTofu state encryption for compliance
terraform {
  encryption {
    key_provider "pbkdf2" "compliance" {
      passphrase = var.encryption_key
      key_length = 32
      iterations = 600000
    }

    method "aes_gcm" "state" {
      keys = key_provider.pbkdf2.compliance
    }

    state {
      method = method.aes_gcm.state
    }

    plan {
      method = method.aes_gcm.state
    }
  }
}
</code></pre>
<p><strong>Scenario 3: Community-Driven Innovation</strong></p>
<p>Organizations that want to influence tool development:</p>
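<pre><code class="lang-plaintext"># Using an OpenTofu-only feature: early evaluation (OpenTofu 1.8+)
# allows variables in module sources
terraform {
  required_version = "~&gt; 1.8"
}

variable "module_ref" {
  type    = string
  default = "v1.0.0"
}

module "networking" {
  # Hypothetical repository URL, for illustration only
  source = "git::https://example.com/infra/networking.git?ref=${var.module_ref}"
}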
</code></pre>
<p><strong>Scenario 4: Cost Optimization</strong></p>
<p>Teams looking to avoid potential future licensing costs:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># No vendor lock-in concerns</span>
tofu init
tofu apply
<span class="hljs-comment"># Free forever, no upgrade pressure</span>
</code></pre>
<h2 id="heading-community-and-ecosystem"><strong>Community and Ecosystem</strong></h2>
<h3 id="heading-community-support"><strong>Community Support</strong></h3>
<p><strong>Terraform:</strong></p>
<ul>
<li><p>Larger existing community (established 2014)</p>
</li>
<li><p>More Stack Overflow questions and answers</p>
</li>
<li><p>Extensive documentation and tutorials</p>
</li>
<li><p>HashiCorp-led conferences and events</p>
</li>
</ul>
<p><strong>OpenTofu:</strong></p>
<ul>
<li><p>Rapidly growing community</p>
</li>
<li><p>Active GitHub discussions and contributions</p>
</li>
<li><p>Linux Foundation backing</p>
</li>
<li><p>More transparent governance process</p>
</li>
</ul>
<h3 id="heading-provider-availability"><strong>Provider Availability</strong></h3>
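<p>Currently, both tools use the same provider ecosystem, and the same source addresses work in each (Terraform resolves them through the Terraform Registry, OpenTofu through its own registry):</p>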
<pre><code class="lang-plaintext"># Provider configuration works identically in both tools
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~&gt; 5.0"
    }

    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~&gt; 2.23"
    }

    google = {
      source  = "hashicorp/google"
      version = "~&gt; 5.0"
    }
  }
}
</code></pre>
<p>OpenTofu maintains its own registry (<a target="_blank" href="http://registry.opentofu.org">registry.opentofu.org</a>) that indexes the same provider releases, ensuring long-term provider availability. The OpenTofu community is also working on a strategy to fork and maintain critical providers if HashiCorp’s licensing changes make this necessary, though currently providers work with both tools.</p>
<h2 id="heading-future-outlook"><strong>Future Outlook</strong></h2>
<h3 id="heading-terraforms-direction"><strong>Terraform’s Direction</strong></h3>
<p>HashiCorp is focusing on:</p>
<ul>
<li><p>Enterprise features and Terraform Cloud</p>
</li>
<li><p>Improved CDKTF (Cloud Development Kit for Terraform)</p>
</li>
<li><p>Enhanced policy and governance</p>
</li>
<li><p>AI-assisted infrastructure coding</p>
</li>
</ul>
<h3 id="heading-opentofus-roadmap"><strong>OpenTofu’s Roadmap</strong></h3>
<p>The OpenTofu project is prioritizing:</p>
<ul>
<li><p>Complete provider registry independence</p>
</li>
<li><p>Enhanced testing and validation features</p>
</li>
<li><p>Improved performance optimizations</p>
</li>
<li><p>Community-driven feature development</p>
</li>
<li><p>Better integration with CI/CD pipelines</p>
</li>
</ul>
<h2 id="heading-decision-framework"><strong>Decision Framework</strong></h2>
<p>Here’s a practical framework for choosing between these tools:</p>
<h3 id="heading-decision-tree"><strong>Decision Tree</strong></h3>
<pre><code class="lang-markdown">Start Here
<span class="hljs-code">    │
    ├─ Do you require open source licensing?
    │   ├─ Yes → Consider OpenTofu
    │   └─ No → Continue
    │
    ├─ Do you need HashiCorp enterprise support?
    │   ├─ Yes → Choose Terraform
    │   └─ No → Continue
    │
    ├─ Is state file encryption critical?
    │   ├─ Yes → OpenTofu has advantage
    │   └─ No → Continue
    │
    ├─ Do you want community governance?
    │   ├─ Yes → Choose OpenTofu
    │   └─ No → Either works
    │
    └─ Default → Both are excellent choices</span>
</code></pre>
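<p>The tree above can also be written down as a small function, which makes the order of the questions explicit. This is only a sketch in JavaScript: the function name <code>recommendTool</code> and the boolean field names are our own labels for the four questions, not part of either tool.</p>
<pre><code class="lang-javascript">// Sketch of the decision tree above. All names are hypothetical labels.
function recommendTool(answers) {
  if (answers.requiresOpenSourceLicense) {
    return 'Consider OpenTofu';
  }
  if (answers.needsHashiCorpSupport) {
    return 'Choose Terraform';
  }
  if (answers.stateEncryptionCritical) {
    return 'OpenTofu has advantage';
  }
  if (answers.wantsCommunityGovernance) {
    return 'Choose OpenTofu';
  }
  // No strong requirement either way
  return 'Either works';
}

console.log(recommendTool({ needsHashiCorpSupport: true }));
console.log(recommendTool({ stateEncryptionCritical: true }));
console.log(recommendTool({}));
</code></pre>
<p>Because the questions are checked top to bottom, a team that needs both an open source license and HashiCorp support lands on the first branch. In practice that conflict is worth resolving explicitly rather than by question order.</p>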
<h3 id="heading-evaluation-checklist"><strong>Evaluation Checklist</strong></h3>
<p>For organizations evaluating their options:</p>
<p><strong>Technical Requirements:</strong></p>
<ul>
<li><p>Current Terraform version compatibility</p>
</li>
<li><p>Provider availability and versions needed</p>
</li>
<li><p>State management requirements</p>
</li>
<li><p>Encryption and security needs</p>
</li>
<li><p>Testing framework needs</p>
</li>
<li><p>Specific feature requirements (e.g., state encryption, provider functions)</p>
</li>
</ul>
<p><strong>Organizational Factors:</strong></p>
<ul>
<li><p>Open source policy compliance</p>
</li>
<li><p>Budget for commercial support</p>
</li>
<li><p>Risk tolerance for tool switching</p>
</li>
<li><p>Team expertise and training needs</p>
</li>
<li><p>Long-term strategic alignment</p>
</li>
<li><p>Community vs. vendor relationship preference</p>
</li>
</ul>
<p><strong>Operational Considerations:</strong></p>
<ul>
<li><p>CI/CD pipeline integration</p>
</li>
<li><p>Existing toolchain compatibility</p>
</li>
<li><p>Migration complexity and effort</p>
</li>
<li><p>Backup and disaster recovery processes</p>
</li>
<li><p>Compliance and audit requirements</p>
</li>
</ul>
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>Both Terraform and OpenTofu are powerful, production-ready tools for infrastructure as code. The choice between them ultimately depends on your organization’s priorities:</p>
<p><strong>Choose Terraform if:</strong></p>
<ul>
<li><p>You require HashiCorp’s commercial support and enterprise features</p>
</li>
<li><p>You’re deeply integrated with the HashiCorp ecosystem</p>
</li>
<li><p>You prefer the stability of an established, well-funded vendor</p>
</li>
<li><p>Your organization is risk-averse about tool changes</p>
</li>
</ul>
<p><strong>Choose OpenTofu if:</strong></p>
<ul>
<li><p>Open source licensing is a requirement or strong preference</p>
</li>
<li><p>You want community-driven governance and development</p>
</li>
<li><p>Client-side state encryption is important for your security model</p>
</li>
<li><p>You value independence from vendor licensing changes</p>
</li>
<li><p>You want to support and influence community-driven infrastructure tooling</p>
</li>
</ul>
<p><strong>The Good News:</strong> Regardless of your choice, both tools:</p>
<ul>
<li><p>Use the same HCL syntax</p>
</li>
<li><p>Maintain state file compatibility</p>
</li>
<li><p>Support the same providers (for now)</p>
</li>
<li><p>Allow relatively easy migration between them</p>
</li>
</ul>
<p>For most teams, OpenTofu represents a safe, forward-looking choice that embraces open source principles while maintaining compatibility with the Terraform ecosystem. However, organizations with specific enterprise needs or HashiCorp relationships may find Terraform continues to serve them well.</p>
<p>The infrastructure as code landscape is healthier for having both options. Competition drives innovation, and the existence of OpenTofu has already influenced Terraform’s development priorities. As users, we benefit from this diversity.</p>
<p>Ultimately, the best approach is to evaluate both tools against your specific requirements, run proof-of-concept projects, and make an informed decision based on your organization’s unique needs. Both paths lead to effective infrastructure automation — the question is which better aligns with your values, requirements, and long-term strategy.</p>
<p>What’s your experience with Terraform and OpenTofu? Have you made the switch, or are you staying with Terraform? Share your thoughts and experiences in the comments below.</p>
]]></content:encoded></item><item><title><![CDATA[Building Your First GitHub Custom Action: A Step-by-Step Guide]]></title><description><![CDATA[Why Custom Actions Matter
If you’re using GitHub Actions for CI/CD, you’ve probably noticed yourself writing the same workflow steps over and over. Maybe you’re always checking PR sizes, validating commit messages, or posting notifications to Slack. ...]]></description><link>https://devops-blog.ruicoelho.dev/building-your-first-github-custom-action-a-step-by-step-guide</link><guid isPermaLink="true">https://devops-blog.ruicoelho.dev/building-your-first-github-custom-action-a-step-by-step-guide</guid><category><![CDATA[GitHub]]></category><category><![CDATA[GitHub Actions]]></category><category><![CDATA[Devops]]></category><category><![CDATA[SRE]]></category><category><![CDATA[ci-cd]]></category><category><![CDATA[CI/CD]]></category><dc:creator><![CDATA[Rui Coelho]]></dc:creator><pubDate>Sun, 25 Jan 2026 12:56:51 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1769345676188/fbc5895f-01bf-414a-81df-beb9b61f9827.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-why-custom-actions-matter"><strong>Why Custom Actions Matter</strong></h2>
<p>If you’re using GitHub Actions for CI/CD, you’ve probably noticed yourself writing the same workflow steps over and over. Maybe you’re always checking PR sizes, validating commit messages, or posting notifications to Slack. This repetition is a perfect opportunity for a custom action.</p>
<p>In this guide, I’ll walk you through creating a practical GitHub Custom Action from scratch: a <strong>PR Size Checker</strong> that automatically labels pull requests based on their size and suggests splitting large PRs.</p>
<p>By the end of this article, you’ll understand:</p>
<ul>
<li><p>The anatomy of a GitHub Action</p>
</li>
<li><p>How to build one using JavaScript</p>
</li>
<li><p>How to bundle and publish your action</p>
</li>
<li><p>Best practices for versioning and automation</p>
</li>
</ul>
<h2 id="heading-what-were-building"><strong>What We’re Building</strong></h2>
<p>Our PR Size Checker will:</p>
<ul>
<li><p>Calculate total lines changed in a pull request</p>
</li>
<li><p>Apply labels: <code>small</code>, <code>medium</code>, <code>large</code>, or <code>extra-large</code></p>
</li>
<li><p>Automatically create these labels if they don’t exist</p>
</li>
<li><p>Comment on oversized PRs suggesting they be split</p>
</li>
<li><p>Be configurable with custom thresholds</p>
</li>
</ul>
<p>This solves a real problem: large PRs slow down code reviews and increase the chance of bugs slipping through. Automated labeling helps teams prioritize reviews and encourages better practices.</p>
<h2 id="heading-prerequisites"><strong>Prerequisites</strong></h2>
<p>Before we start, you’ll need:</p>
<ul>
<li><p>Node.js 20 or higher</p>
</li>
<li><p>A GitHub account</p>
</li>
<li><p>Basic knowledge of JavaScript and GitHub Actions</p>
</li>
<li><p>A repository where you can test your action</p>
</li>
</ul>
<h2 id="heading-project-structure"><strong>Project Structure</strong></h2>
<p>Here’s what our final project will look like:</p>
<pre><code class="lang-markdown">github-custom-action-examples/
├── action.yml              # Action metadata
├── index.js               # Main logic (source code)
├── dist/
│   └── index.js          # Bundled code (commit this!)
├── package.json          # Dependencies
├── package-lock.json
├── README.md
└── .github/
<span class="hljs-code">    └── workflows/
        ├── release.yml                # Automated releases
        └── pr-size-check.yml          # Example usage</span>
</code></pre>
<p>The key thing to understand: we write code in <code>index.js</code>, but GitHub Actions runs <code>dist/index.js</code> (the bundled version). More on that later.</p>
<h2 id="heading-step-1-setting-up-the-project"><strong>Step 1: Setting Up the Project</strong></h2>
<p>Create a new repository and initialize it:</p>
<pre><code class="lang-bash">mkdir pr-size-checker
<span class="hljs-built_in">cd</span> pr-size-checker
npm init -y
</code></pre>
<p>Install the required dependencies:</p>
<pre><code class="lang-bash">npm install @actions/core @actions/github
npm install --save-dev @vercel/ncc
</code></pre>
<p><strong>What are these packages?</strong></p>
<ul>
<li><p><code>@actions/core</code>: Provides functions for inputs, outputs, and logging</p>
</li>
<li><p><code>@actions/github</code>: Gives access to GitHub API and webhook payload</p>
</li>
<li><p><code>@vercel/ncc</code>: Bundles your code and dependencies into a single file</p>
</li>
</ul>
<p>Update your <code>package.json</code> with build scripts:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"scripts"</span>: {
    <span class="hljs-attr">"build"</span>: <span class="hljs-string">"ncc build index.js -o dist"</span>
  }
}
</code></pre>
<h2 id="heading-step-2-creating-the-action-metadata"><strong>Step 2: Creating the Action Metadata</strong></h2>
<p>The <code>action.yml</code> file is your action's configuration. It defines inputs, outputs, and how to run the action.</p>
<p>Create <code>action.yml</code>:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">name:</span> <span class="hljs-string">'PR Size Checker'</span>
<span class="hljs-attr">description:</span> <span class="hljs-string">'Automatically checks Pull Request size and adds appropriate labels'</span>
<span class="hljs-attr">author:</span> <span class="hljs-string">'AutomationDojo'</span>

<span class="hljs-attr">branding:</span>
  <span class="hljs-attr">icon:</span> <span class="hljs-string">'git-pull-request'</span>
  <span class="hljs-attr">color:</span> <span class="hljs-string">'blue'</span>

<span class="hljs-attr">inputs:</span>
  <span class="hljs-attr">github-token:</span>
    <span class="hljs-attr">description:</span> <span class="hljs-string">'GitHub token for API calls'</span>
    <span class="hljs-attr">required:</span> <span class="hljs-literal">true</span>

  <span class="hljs-attr">small-threshold:</span>
    <span class="hljs-attr">description:</span> <span class="hljs-string">'Maximum lines changed for a small PR'</span>
    <span class="hljs-attr">required:</span> <span class="hljs-literal">false</span>
    <span class="hljs-attr">default:</span> <span class="hljs-string">'100'</span>

  <span class="hljs-attr">medium-threshold:</span>
    <span class="hljs-attr">description:</span> <span class="hljs-string">'Maximum lines changed for a medium PR'</span>
    <span class="hljs-attr">required:</span> <span class="hljs-literal">false</span>
    <span class="hljs-attr">default:</span> <span class="hljs-string">'300'</span>

  <span class="hljs-attr">large-threshold:</span>
    <span class="hljs-attr">description:</span> <span class="hljs-string">'Maximum lines changed for a large PR'</span>
    <span class="hljs-attr">required:</span> <span class="hljs-literal">false</span>
    <span class="hljs-attr">default:</span> <span class="hljs-string">'600'</span>

  <span class="hljs-attr">comment-on-large:</span>
    <span class="hljs-attr">description:</span> <span class="hljs-string">'Whether to comment on large PRs'</span>
    <span class="hljs-attr">required:</span> <span class="hljs-literal">false</span>
    <span class="hljs-attr">default:</span> <span class="hljs-string">'true'</span>

<span class="hljs-attr">outputs:</span>
  <span class="hljs-attr">size-label:</span>
    <span class="hljs-attr">description:</span> <span class="hljs-string">'Label applied to the PR (small, medium, large, extra-large)'</span>

  <span class="hljs-attr">lines-changed:</span>
    <span class="hljs-attr">description:</span> <span class="hljs-string">'Total number of lines changed'</span>

<span class="hljs-attr">runs:</span>
  <span class="hljs-attr">using:</span> <span class="hljs-string">'node20'</span>
  <span class="hljs-attr">main:</span> <span class="hljs-string">'dist/index.js'</span>
</code></pre>
<p><strong>Key points:</strong></p>
<ul>
<li><p><code>inputs</code>: Parameters users can configure</p>
</li>
<li><p><code>outputs</code>: Values your action returns (useful for chaining actions)</p>
</li>
<li><p><code>runs.main</code>: Points to the bundled file, not the source</p>
</li>
<li><p><code>branding</code>: How your action appears in the GitHub Marketplace</p>
</li>
</ul>
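<p>At runtime, each input declared in <code>action.yml</code> reaches the action as an environment variable named <code>INPUT_</code> followed by the upper-cased input name, which is what <code>core.getInput</code> reads. The sketch below mimics that lookup in plain Node.js so the threshold parsing can be tried without the toolkit. <code>readThreshold</code> is a hypothetical helper, and the fallback values simply repeat the defaults from <code>action.yml</code>.</p>
<pre><code class="lang-javascript">// Hypothetical helper that mimics how @actions/core resolves an input:
// spaces become underscores, the name is upper-cased, hyphens are kept.
function readThreshold(name, fallback) {
  const key = `INPUT_${name.replace(/ /g, '_').toUpperCase()}`;
  const raw = (process.env[key] || '').trim();
  if (raw === '') {
    return fallback;
  }
  const value = Number.parseInt(raw, 10);
  // Reject values such as "abc" or "-5" instead of silently using NaN
  if (Number.isNaN(value) || Math.sign(value) === -1) {
    throw new Error(`Input "${name}" must be a non-negative integer, got "${raw}"`);
  }
  return value;
}

// Simulate what the runner does for "with: small-threshold: 50"
process.env['INPUT_SMALL-THRESHOLD'] = '50';
console.log(readThreshold('small-threshold', 100));
console.log(readThreshold('medium-threshold', 300));
</code></pre>
<p>Validating the parsed number up front means a typo in a workflow file fails the run with a clear message instead of quietly mislabeling every PR.</p>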
<h2 id="heading-step-3-writing-the-action-logic"><strong>Step 3: Writing the Action Logic</strong></h2>
<p>Now for the core functionality. Create <code>index.js</code>:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> core = <span class="hljs-built_in">require</span>(<span class="hljs-string">'@actions/core'</span>);
<span class="hljs-keyword">const</span> github = <span class="hljs-built_in">require</span>(<span class="hljs-string">'@actions/github'</span>);

<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">run</span>(<span class="hljs-params"></span>) </span>{
  <span class="hljs-keyword">try</span> {
    <span class="hljs-comment">// Get inputs</span>
    <span class="hljs-keyword">const</span> token = core.getInput(<span class="hljs-string">'github-token'</span>, { <span class="hljs-attr">required</span>: <span class="hljs-literal">true</span> });
    <span class="hljs-keyword">const</span> smallThreshold = <span class="hljs-built_in">parseInt</span>(core.getInput(<span class="hljs-string">'small-threshold'</span>), <span class="hljs-number">10</span>);
    <span class="hljs-keyword">const</span> mediumThreshold = <span class="hljs-built_in">parseInt</span>(core.getInput(<span class="hljs-string">'medium-threshold'</span>), <span class="hljs-number">10</span>);
    <span class="hljs-keyword">const</span> largeThreshold = <span class="hljs-built_in">parseInt</span>(core.getInput(<span class="hljs-string">'large-threshold'</span>), <span class="hljs-number">10</span>);
    <span class="hljs-keyword">const</span> commentOnLarge = core.getInput(<span class="hljs-string">'comment-on-large'</span>) === <span class="hljs-string">'true'</span>;

    <span class="hljs-comment">// Initialize GitHub client</span>
    <span class="hljs-keyword">const</span> octokit = github.getOctokit(token);
    <span class="hljs-keyword">const</span> context = github.context;

    <span class="hljs-comment">// Ensure this is a pull request event</span>
    <span class="hljs-keyword">if</span> (!context.payload.pull_request) {
      core.setFailed(<span class="hljs-string">'This action only works on pull_request events'</span>);
      <span class="hljs-keyword">return</span>;
    }

    <span class="hljs-keyword">const</span> pr = context.payload.pull_request;
    <span class="hljs-keyword">const</span> owner = context.repo.owner;
    <span class="hljs-keyword">const</span> repo = context.repo.repo;
    <span class="hljs-keyword">const</span> prNumber = pr.number;

    <span class="hljs-comment">// Calculate total lines changed</span>
    <span class="hljs-keyword">const</span> additions = pr.additions || <span class="hljs-number">0</span>;
    <span class="hljs-keyword">const</span> deletions = pr.deletions || <span class="hljs-number">0</span>;
    <span class="hljs-keyword">const</span> totalChanges = additions + deletions;

    core.info(<span class="hljs-string">`PR #<span class="hljs-subst">${prNumber}</span> has <span class="hljs-subst">${totalChanges}</span> lines changed`</span>);
    core.info(<span class="hljs-string">`Additions: <span class="hljs-subst">${additions}</span>, Deletions: <span class="hljs-subst">${deletions}</span>`</span>);

    <span class="hljs-comment">// Determine size label</span>
    <span class="hljs-keyword">let</span> sizeLabel;
    <span class="hljs-keyword">if</span> (totalChanges &lt;= smallThreshold) {
      sizeLabel = <span class="hljs-string">'small'</span>;
    } <span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span> (totalChanges &lt;= mediumThreshold) {
      sizeLabel = <span class="hljs-string">'medium'</span>;
    } <span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span> (totalChanges &lt;= largeThreshold) {
      sizeLabel = <span class="hljs-string">'large'</span>;
    } <span class="hljs-keyword">else</span> {
      sizeLabel = <span class="hljs-string">'extra-large'</span>;
    }

    core.info(<span class="hljs-string">`Size determined: <span class="hljs-subst">${sizeLabel}</span>`</span>);

    <span class="hljs-comment">// Define label configurations</span>
    <span class="hljs-keyword">const</span> labelConfigs = {
      <span class="hljs-string">'small'</span>: { <span class="hljs-attr">color</span>: <span class="hljs-string">'0e8a16'</span>, <span class="hljs-attr">description</span>: <span class="hljs-string">'Small PR, easy to review'</span> },
      <span class="hljs-string">'medium'</span>: { <span class="hljs-attr">color</span>: <span class="hljs-string">'fbca04'</span>, <span class="hljs-attr">description</span>: <span class="hljs-string">'Medium-sized PR'</span> },
      <span class="hljs-string">'large'</span>: { <span class="hljs-attr">color</span>: <span class="hljs-string">'e99695'</span>, <span class="hljs-attr">description</span>: <span class="hljs-string">'Large PR, consider splitting'</span> },
      <span class="hljs-string">'extra-large'</span>: { <span class="hljs-attr">color</span>: <span class="hljs-string">'d93f0b'</span>, <span class="hljs-attr">description</span>: <span class="hljs-string">'Very large PR, splitting recommended'</span> }
    };

    <span class="hljs-comment">// Ensure all size labels exist</span>
    <span class="hljs-keyword">for</span> (<span class="hljs-keyword">const</span> [labelName, config] <span class="hljs-keyword">of</span> <span class="hljs-built_in">Object</span>.entries(labelConfigs)) {
      <span class="hljs-keyword">try</span> {
        <span class="hljs-keyword">await</span> octokit.rest.issues.createLabel({
          owner,
          repo,
          <span class="hljs-attr">name</span>: labelName,
          <span class="hljs-attr">color</span>: config.color,
          <span class="hljs-attr">description</span>: config.description
        });
        core.info(<span class="hljs-string">`Created label: <span class="hljs-subst">${labelName}</span>`</span>);
      } <span class="hljs-keyword">catch</span> (error) {
        <span class="hljs-keyword">if</span> (error.status === <span class="hljs-number">422</span>) {
          core.info(<span class="hljs-string">`Label <span class="hljs-subst">${labelName}</span> already exists`</span>);
        } <span class="hljs-keyword">else</span> {
          <span class="hljs-keyword">throw</span> error;
        }
      }
    }

    <span class="hljs-comment">// Get current labels</span>
    <span class="hljs-keyword">const</span> { <span class="hljs-attr">data</span>: currentLabels } = <span class="hljs-keyword">await</span> octokit.rest.issues.listLabelsOnIssue({
      owner,
      repo,
      <span class="hljs-attr">issue_number</span>: prNumber
    });

    <span class="hljs-comment">// Remove old size labels</span>
    <span class="hljs-keyword">const</span> sizeLabels = [<span class="hljs-string">'small'</span>, <span class="hljs-string">'medium'</span>, <span class="hljs-string">'large'</span>, <span class="hljs-string">'extra-large'</span>];
    <span class="hljs-keyword">for</span> (<span class="hljs-keyword">const</span> label <span class="hljs-keyword">of</span> currentLabels) {
      <span class="hljs-keyword">if</span> (sizeLabels.includes(label.name) &amp;&amp; label.name !== sizeLabel) {
        <span class="hljs-keyword">await</span> octokit.rest.issues.removeLabel({
          owner,
          repo,
          <span class="hljs-attr">issue_number</span>: prNumber,
          <span class="hljs-attr">name</span>: label.name
        });
        core.info(<span class="hljs-string">`Removed old label: <span class="hljs-subst">${label.name}</span>`</span>);
      }
    }

    <span class="hljs-comment">// Add new size label</span>
    <span class="hljs-keyword">await</span> octokit.rest.issues.addLabels({
      owner,
      repo,
      <span class="hljs-attr">issue_number</span>: prNumber,
      <span class="hljs-attr">labels</span>: [sizeLabel]
    });
    core.info(<span class="hljs-string">`Added label: <span class="hljs-subst">${sizeLabel}</span>`</span>);

    <span class="hljs-comment">// Comment on large PRs</span>
    <span class="hljs-keyword">if</span> (commentOnLarge &amp;&amp; (sizeLabel === <span class="hljs-string">'large'</span> || sizeLabel === <span class="hljs-string">'extra-large'</span>)) {
      <span class="hljs-keyword">const</span> commentBody = <span class="hljs-string">`⚠️ **Large Pull Request Detected**

This PR has **<span class="hljs-subst">${totalChanges}</span> lines changed**. Large PRs can be difficult to review thoroughly and may slow down the development process.

**Consider:**
- Breaking this PR into smaller, focused changes
- Each PR should ideally address a single concern
- Smaller PRs are easier to review, test, and merge

If this PR must remain large, please ensure it has:
- ✅ Comprehensive description
- ✅ Clear testing instructions
- ✅ Appropriate documentation updates`</span>;

      <span class="hljs-comment">// Check if we already commented</span>
      <span class="hljs-keyword">const</span> { <span class="hljs-attr">data</span>: comments } = <span class="hljs-keyword">await</span> octokit.rest.issues.listComments({
        owner,
        repo,
        <span class="hljs-attr">issue_number</span>: prNumber
      });

      <span class="hljs-keyword">const</span> botComment = comments.find(
        <span class="hljs-function"><span class="hljs-params">comment</span> =&gt;</span> comment.user.type === <span class="hljs-string">'Bot'</span> &amp;&amp; 
                   comment.body.includes(<span class="hljs-string">'Large Pull Request Detected'</span>)
      );

      <span class="hljs-keyword">if</span> (!botComment) {
        <span class="hljs-keyword">await</span> octokit.rest.issues.createComment({
          owner,
          repo,
          <span class="hljs-attr">issue_number</span>: prNumber,
          <span class="hljs-attr">body</span>: commentBody
        });
        core.info(<span class="hljs-string">'Added comment suggesting PR split'</span>);
      } <span class="hljs-keyword">else</span> {
        core.info(<span class="hljs-string">'Comment already exists, skipping'</span>);
      }
    }

    <span class="hljs-comment">// Set outputs</span>
    core.setOutput(<span class="hljs-string">'size-label'</span>, sizeLabel);
    core.setOutput(<span class="hljs-string">'lines-changed'</span>, totalChanges);

    core.info(<span class="hljs-string">`✅ Successfully processed PR #<span class="hljs-subst">${prNumber}</span>`</span>);

  } <span class="hljs-keyword">catch</span> (error) {
    core.setFailed(<span class="hljs-string">`Action failed: <span class="hljs-subst">${error.message}</span>`</span>);
  }
}

run();
</code></pre>
<p><strong>Let’s break down the key parts:</strong></p>
<p><strong>Reading Inputs</strong></p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> token = core.getInput(<span class="hljs-string">'github-token'</span>, { <span class="hljs-attr">required</span>: <span class="hljs-literal">true</span> });
<span class="hljs-keyword">const</span> smallThreshold = <span class="hljs-built_in">parseInt</span>(core.getInput(<span class="hljs-string">'small-threshold'</span>), <span class="hljs-number">10</span>);
</code></pre>
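<p>The <code>core.getInput()</code> function reads the values a workflow passes under <code>with:</code>. It always returns a string, which is why numeric inputs go through <code>parseInt()</code>. Any input the workflow omits falls back to the default you set in <code>action.yml</code>.</p>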
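<p>Because every input arrives as a string, a typo such as <code>small-threshold: abc</code> silently becomes <code>NaN</code>. A minimal sketch of a stricter parser follows; <code>parseThreshold</code> is a hypothetical helper, not part of <code>@actions/core</code>, and it takes the raw string so it can run without the toolkit installed.</p>
<pre><code class="lang-javascript">// Hypothetical helper: turn a raw input string into a non-negative integer.
function parseThreshold(raw, fallback) {
  const text = raw.trim();
  // core.getInput() returns '' when an input has no value and no default.
  if (text === '') {
    return fallback;
  }
  // Accept plain digits only, so 'abc', '-5' and '12px' are all rejected.
  if (!/^\d+$/.test(text)) {
    throw new Error(`Expected a non-negative integer, got "${raw}"`);
  }
  return Number.parseInt(text, 10);
}

// In the action this would wrap the real call, for example:
//   parseThreshold(core.getInput('small-threshold'), 100)
console.log(parseThreshold('250', 100));
console.log(parseThreshold('', 100));
</code></pre>
<p>Throwing here is enough in this action, because the surrounding <code>try/catch</code> already turns any error into <code>core.setFailed()</code>.</p>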
<p><strong>Accessing GitHub Context</strong></p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> octokit = github.getOctokit(token);
<span class="hljs-keyword">const</span> context = github.context;
<span class="hljs-keyword">const</span> pr = context.payload.pull_request;
</code></pre>
<p>The <code>github.context</code> object contains information about the workflow run, including the pull request payload with additions, deletions, and other metadata.</p>
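<p>To make the use of that payload concrete, here is a small sketch of how <code>additions</code> and <code>deletions</code> can be turned into a size label. The function name <code>classifySize</code> and the sample numbers are illustrative, and it assumes each threshold is the maximum line count for its size.</p>
<pre><code class="lang-javascript">// Sketch: derive the size label from the pull request payload.
// Assumption: each threshold is the largest line count still allowed for that size.
function classifySize(pr, thresholds) {
  const totalChanges = pr.additions + pr.deletions;
  let sizeLabel = 'small';
  if (totalChanges > thresholds.large) {
    sizeLabel = 'extra-large';
  } else if (totalChanges > thresholds.medium) {
    sizeLabel = 'large';
  } else if (totalChanges > thresholds.small) {
    sizeLabel = 'medium';
  }
  return { totalChanges, sizeLabel };
}

// Made-up payload values, standing in for context.payload.pull_request.
const samplePr = { additions: 180, deletions: 40 };
console.log(classifySize(samplePr, { small: 100, medium: 300, large: 600 }));
</code></pre>
<p>Keeping this logic in a pure function means it can be unit tested without a GitHub token or a real pull request.</p>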
<p><strong>Creating Labels</strong></p>
<pre><code class="lang-javascript"><span class="hljs-keyword">await</span> octokit.rest.issues.createLabel({
  owner,
  repo,
  <span class="hljs-attr">name</span>: labelName,
  <span class="hljs-attr">color</span>: config.color,
  <span class="hljs-attr">description</span>: config.description
});
</code></pre>
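<p>We try to create every label on each run and treat HTTP 422 (validation failed, which is what GitHub returns for a duplicate label name) as “already exists”. Any other error is rethrown so the action fails visibly.</p>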
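<p>The same pattern can be pulled into a helper that is easy to test. This is a sketch, not the action's actual code: <code>ensureLabel</code> and <code>stubClient</code> are hypothetical names, and the stub only imitates the one behaviour we care about (rejecting duplicates with status 422).</p>
<pre><code class="lang-javascript">// Sketch only: `client` is any object shaped like the octokit instance.
async function ensureLabel(client, params) {
  try {
    await client.rest.issues.createLabel(params);
    return 'created';
  } catch (error) {
    if (error.status === 422) {
      return 'exists'; // duplicate name: safe to ignore
    }
    throw error; // auth, rate limit, network: let the action fail
  }
}

// Hypothetical stub that mimics GitHub rejecting a duplicate label name.
const known = new Set(['small']);
const stubClient = {
  rest: {
    issues: {
      async createLabel({ name }) {
        if (known.has(name)) {
          throw Object.assign(new Error('Validation Failed'), { status: 422 });
        }
        known.add(name);
      }
    }
  }
};

ensureLabel(stubClient, { name: 'small' }).then((result) => console.log(result));
</code></pre>
<p>In the real action you would pass the <code>octokit</code> instance instead of the stub.</p>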
<p><strong>Managing Labels</strong></p>
<pre><code class="lang-javascript"><span class="hljs-comment">// Remove old labels</span>
<span class="hljs-keyword">await</span> octokit.rest.issues.removeLabel({...});

<span class="hljs-comment">// Add new label</span>
<span class="hljs-keyword">await</span> octokit.rest.issues.addLabels({...});
</code></pre>
<p>We remove any other size label before adding the new one, so a PR never carries two size labels at once. Labels that are not size labels are left alone.</p>
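<p>The selection of which labels to remove is plain array filtering, sketched here as a standalone function. <code>staleSizeLabels</code> is an illustrative name; the input mirrors the shape returned by <code>listLabelsOnIssue</code> (objects with a <code>name</code> property).</p>
<pre><code class="lang-javascript">const SIZE_LABELS = ['small', 'medium', 'large', 'extra-large'];

// Return the names of size labels on the PR that differ from the new one.
function staleSizeLabels(currentLabels, sizeLabel) {
  return currentLabels
    .map((label) => label.name)
    .filter((name) => SIZE_LABELS.includes(name))
    .filter((name) => name !== sizeLabel);
}

// Made-up label list: one unrelated label, one stale size label, one current.
const current = [{ name: 'bug' }, { name: 'small' }, { name: 'large' }];
console.log(staleSizeLabels(current, 'large'));
</code></pre>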
<p><strong>Setting Outputs</strong></p>
<pre><code class="lang-javascript">core.setOutput(<span class="hljs-string">'size-label'</span>, sizeLabel);
core.setOutput(<span class="hljs-string">'lines-changed'</span>, totalChanges);
</code></pre>
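<p>Outputs let later steps in the same job read your action’s results. A step that gives the action an <code>id</code> can reference them as <code>steps.&lt;id&gt;.outputs.size-label</code> and <code>steps.&lt;id&gt;.outputs.lines-changed</code>.</p>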
<h2 id="heading-step-4-bundling-your-code"><strong>Step 4: Bundling Your Code</strong></h2>
<p>The runner does not run <code>npm install</code> for your action, so <code>node_modules</code> is not available at run time. Bundle your code and its dependencies into a single file with <code>@vercel/ncc</code>:</p>
<pre><code class="lang-bash">npm run build
</code></pre>
<p>This creates <code>dist/index.js</code> containing your code and all dependencies. <strong>You must commit this file</strong> to your repository.</p>
<p>Add to <code>.gitignore</code>:</p>
<pre><code class="lang-bash">node_modules/
</code></pre>
<p>But <strong>don’t</strong> ignore <code>dist/</code>—GitHub needs it to run your action.</p>
<h2 id="heading-step-5-using-your-action"><strong>Step 5: Using Your Action</strong></h2>
<p>Create <code>.github/workflows/pr-size-check.yml</code> to test your action:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">name:</span> <span class="hljs-string">PR</span> <span class="hljs-string">Size</span> <span class="hljs-string">Check</span>

<span class="hljs-attr">on:</span>
  <span class="hljs-attr">pull_request:</span>
    <span class="hljs-attr">types:</span> [<span class="hljs-string">opened</span>, <span class="hljs-string">synchronize</span>, <span class="hljs-string">reopened</span>]

<span class="hljs-attr">jobs:</span>
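  <span class="hljs-attr">check-pr-size:</span>
    <span class="hljs-attr">runs-on:</span> <span class="hljs-string">ubuntu-latest</span>
    <span class="hljs-attr">name:</span> <span class="hljs-string">Check</span> <span class="hljs-string">PR</span> <span class="hljs-string">Size</span>
    <span class="hljs-attr">permissions:</span>
      <span class="hljs-attr">contents:</span> <span class="hljs-string">read</span>  <span class="hljs-comment"># check out the repository</span>
      <span class="hljs-attr">issues:</span> <span class="hljs-string">write</span>  <span class="hljs-comment"># create labels</span>
      <span class="hljs-attr">pull-requests:</span> <span class="hljs-string">write</span>  <span class="hljs-comment"># label and comment on the PR</span>

    <span class="hljs-attr">steps:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">uses:</span> <span class="hljs-string">actions/checkout@v4</span>  <span class="hljs-comment"># required: a local action must be on disk</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Check</span> <span class="hljs-string">PR</span> <span class="hljs-string">Size</span>
        <span class="hljs-attr">uses:</span> <span class="hljs-string">./</span>  <span class="hljs-comment"># Use local action for testing</span>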
        <span class="hljs-attr">with:</span>
          <span class="hljs-attr">github-token:</span> <span class="hljs-string">${{</span> <span class="hljs-string">secrets.GITHUB_TOKEN</span> <span class="hljs-string">}}</span>
          <span class="hljs-attr">small-threshold:</span> <span class="hljs-number">100</span>
          <span class="hljs-attr">medium-threshold:</span> <span class="hljs-number">300</span>
          <span class="hljs-attr">large-threshold:</span> <span class="hljs-number">600</span>
          <span class="hljs-attr">comment-on-large:</span> <span class="hljs-literal">true</span>
</code></pre>
<p><strong>For local testing:</strong></p>
<ul>
<li><p><code>uses: ./</code> runs the action from the current repository, so the job has to check the repository out first</p>
</li>
<li><p>Perfect for development and testing</p>
</li>
</ul>
<p><strong>For production use:</strong></p>
<pre><code class="lang-yaml"><span class="hljs-attr">uses:</span> <span class="hljs-string">AutomationDojo/github-custom-action-examples@v1.1.0</span>
</code></pre>
<h2 id="heading-step-6-versioning-and-releases"><strong>Step 6: Versioning and Releases</strong></h2>
<p>Managing versions manually is tedious. Let’s automate it with <strong>Semantic Release</strong>.</p>
<p>Install Semantic Release:</p>
<pre><code class="lang-bash">npm install --save-dev semantic-release @semantic-release/changelog @semantic-release/git
</code></pre>
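<p>Create <code>.releaserc.yml</code>. The <code>@semantic-release/npm</code> plugin is listed only to bump the version in <code>package.json</code>; set <code>"private": true</code> in <code>package.json</code> (or <code>npmPublish: false</code> on the plugin) so it does not try to publish to the npm registry:</p>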
<pre><code class="lang-yaml"><span class="hljs-attr">branches:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">main</span>

<span class="hljs-attr">plugins:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">'@semantic-release/commit-analyzer'</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">'@semantic-release/release-notes-generator'</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">'@semantic-release/changelog'</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">'@semantic-release/npm'</span>
  <span class="hljs-bullet">-</span> <span class="hljs-bullet">-</span> <span class="hljs-string">'@semantic-release/git'</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">assets:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-string">CHANGELOG.md</span>
        <span class="hljs-bullet">-</span> <span class="hljs-string">package.json</span>
        <span class="hljs-bullet">-</span> <span class="hljs-string">package-lock.json</span>
      <span class="hljs-attr">message:</span> <span class="hljs-string">'chore(release): ${nextRelease.version} [skip ci]\n\n${nextRelease.notes}'</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">'@semantic-release/github'</span>
</code></pre>
<p>Create <code>.github/workflows/release.yml</code>:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">name:</span> <span class="hljs-string">Release</span>

<span class="hljs-attr">on:</span>
  <span class="hljs-attr">push:</span>
    <span class="hljs-attr">branches:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">main</span>

<span class="hljs-attr">jobs:</span>
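  <span class="hljs-attr">release:</span>
    <span class="hljs-attr">runs-on:</span> <span class="hljs-string">ubuntu-latest</span>
    <span class="hljs-attr">permissions:</span>
      <span class="hljs-attr">contents:</span> <span class="hljs-string">write</span>  <span class="hljs-comment"># push the release commit, tag, and GitHub release</span>
      <span class="hljs-attr">issues:</span> <span class="hljs-string">write</span>  <span class="hljs-comment"># comment on released issues</span>
      <span class="hljs-attr">pull-requests:</span> <span class="hljs-string">write</span>  <span class="hljs-comment"># comment on released pull requests</span>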
    <span class="hljs-attr">steps:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">uses:</span> <span class="hljs-string">actions/checkout@v4</span>
        <span class="hljs-attr">with:</span>
          <span class="hljs-attr">fetch-depth:</span> <span class="hljs-number">0</span>
          <span class="hljs-attr">persist-credentials:</span> <span class="hljs-literal">false</span>

      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Setup</span> <span class="hljs-string">Node.js</span>
        <span class="hljs-attr">uses:</span> <span class="hljs-string">actions/setup-node@v4</span>
        <span class="hljs-attr">with:</span>
          <span class="hljs-attr">node-version:</span> <span class="hljs-string">'20'</span>

      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Install</span> <span class="hljs-string">dependencies</span>
        <span class="hljs-attr">run:</span> <span class="hljs-string">npm</span> <span class="hljs-string">ci</span>

      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Build</span>
        <span class="hljs-attr">run:</span> <span class="hljs-string">npm</span> <span class="hljs-string">run</span> <span class="hljs-string">build</span>

      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Release</span>
        <span class="hljs-attr">env:</span>
          <span class="hljs-attr">GITHUB_TOKEN:</span> <span class="hljs-string">${{</span> <span class="hljs-string">secrets.GITHUB_TOKEN</span> <span class="hljs-string">}}</span>
        <span class="hljs-attr">run:</span> <span class="hljs-string">npx</span> <span class="hljs-string">semantic-release</span>
</code></pre>
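<p>Now write commit messages in the <strong>Conventional Commits</strong> format. With the default rules, <code>feat</code> triggers a minor release, <code>fix</code> a patch release, and <code>docs</code> no release at all:</p>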
<pre><code class="lang-bash">git commit -m <span class="hljs-string">"feat: add support for custom label colors"</span>
git commit -m <span class="hljs-string">"fix: resolve issue with label removal"</span>
git commit -m <span class="hljs-string">"docs: update README with examples"</span>
</code></pre>
<p>When you push to <code>main</code>, Semantic Release:</p>
<ol>
<li><p>Analyzes commits to determine version bump</p>
</li>
<li><p>Generates CHANGELOG.md</p>
</li>
<li><p>Creates a GitHub release</p>
</li>
<li><p>Updates package.json</p>
</li>
</ol>
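<p>To illustrate the first step, here is a deliberately simplified sketch of how a commit message maps to a version bump under the default rules. It is not Semantic Release's actual implementation; <code>releaseTypeFor</code> is a hypothetical function that only covers the common cases (<code>feat</code>, <code>fix</code>, <code>perf</code>, and a <code>BREAKING CHANGE:</code> footer).</p>
<pre><code class="lang-javascript">// Simplified illustration of the default commit-to-release mapping.
const BUMPS = new Map([
  ['feat', 'minor'],
  ['fix', 'patch'],
  ['perf', 'patch']
]);

function releaseTypeFor(message) {
  // A "BREAKING CHANGE:" footer at the start of a line forces a major release.
  if (/^BREAKING CHANGE: /m.test(message)) {
    return 'major';
  }
  // Header shape: type, optional (scope), then ": ".
  const match = /^(\w+)(?:\([^)]*\))?: /.exec(message);
  if (match === null) {
    return null;
  }
  return BUMPS.get(match[1]) ?? null; // null means "no release"
}

console.log(releaseTypeFor('feat: add support for custom label colors'));
console.log(releaseTypeFor('docs: update README with examples'));
</code></pre>
<p>When a push contains several commits, the highest bump among them wins.</p>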
<h2 id="heading-step-7-writing-good-documentation"><strong>Step 7: Writing Good Documentation</strong></h2>
<p>Your README should include:</p>
<pre><code class="lang-markdown"><span class="hljs-section"># PR Size Checker</span>

A GitHub Action that automatically labels PRs based on size.

<span class="hljs-section">## Usage</span>

```yaml
- uses: AutomationDojo/github-custom-action-examples@v1.1.0
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
    small-threshold: 100
```

<span class="hljs-section">## Inputs</span>

| Input | Description | Required | Default |
|-------|-------------|----------|---------|
| github-token | GitHub token | Yes | - |
| small-threshold | Max lines for small PR | No | 100 |

<span class="hljs-section">## Outputs</span>

| Output | Description |
|--------|-------------|
| size-label | Applied label |
| lines-changed | Total lines changed |

<span class="hljs-section">## Example</span>

```yaml
- id: check-size
  uses: AutomationDojo/github-custom-action-examples@v1.1.0
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}

- run: echo "Size: ${{ steps.check-size.outputs.size-label }}"
```
</code></pre>
<h2 id="heading-examples"><strong>Examples</strong></h2>
<p>You can see this action working on the repo:</p>
<p><img src="https://miro.medium.com/v2/resize:fit:700/1*SOTGdRHQ_XoWkhKhH7EBaw.png" alt /></p>
<p>You can check the following pull request: <a target="_blank" href="https://github.com/AutomationDojo/github-custom-action-examples/pull/1">https://github.com/AutomationDojo/github-custom-action-examples/pull/1</a></p>
<h2 id="heading-best-practices-i-learned"><strong>Best Practices I Learned</strong></h2>
<h3 id="heading-1-always-bundle-your-code"><strong>1. Always Bundle Your Code</strong></h3>
<p>Don’t rely on <code>npm install</code> during action execution. Bundle with <code>ncc</code> and commit <code>dist/</code>.</p>
<h3 id="heading-2-use-semantic-versioning"><strong>2. Use Semantic Versioning</strong></h3>
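<p>Users should be able to pin to <code>@v1</code> to pick up compatible updates automatically, or to <code>@v1.1.0</code> for a fixed version. The floating <code>v1</code> tag is not created for you: it has to be moved to each new release.</p>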
<h3 id="heading-3-validate-inputs-early"><strong>3. Validate Inputs Early</strong></h3>
<pre><code class="lang-javascript"><span class="hljs-keyword">if</span> (!context.payload.pull_request) {
  core.setFailed(<span class="hljs-string">'This action only works on pull_request events'</span>);
  <span class="hljs-keyword">return</span>;
}
</code></pre>
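<p>The same early-exit idea applies to the numeric inputs. As a sketch, assuming the three thresholds have already been parsed to numbers, a hypothetical <code>validateThresholds</code> helper can return an error message (or <code>null</code>) before any API call is made:</p>
<pre><code class="lang-javascript">// Hypothetical helper: returns an error message, or null when the values are usable.
function validateThresholds(small, medium, large) {
  const values = [small, medium, large];
  // parseInt() yields NaN for bad input, and NaN is not an integer.
  if (!values.every((value) => Number.isInteger(value))) {
    return 'Thresholds must be integers';
  }
  // Each threshold has to be larger than the one before it.
  const ascending = values.every((value, i) => i === 0 || value > values[i - 1]);
  return ascending ? null : 'Thresholds must increase: small, medium, large';
}

// In the action, a non-null result would be passed to core.setFailed()
// followed by an early return.
console.log(validateThresholds(100, 300, 600));
console.log(validateThresholds(300, 100, 600));
</code></pre>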
<h3 id="heading-4-provide-useful-logging"><strong>4. Provide Useful Logging</strong></h3>
<pre><code class="lang-javascript">core.info(<span class="hljs-string">`PR #<span class="hljs-subst">${prNumber}</span> has <span class="hljs-subst">${totalChanges}</span> lines changed`</span>);
</code></pre>
<p>Users can see this in their workflow logs for debugging.</p>
<h3 id="heading-5-handle-errors-gracefully"><strong>5. Handle Errors Gracefully</strong></h3>
<pre><code class="lang-javascript"><span class="hljs-keyword">try</span> {
  <span class="hljs-comment">// Create label</span>
} <span class="hljs-keyword">catch</span> (error) {
  <span class="hljs-keyword">if</span> (error.status === <span class="hljs-number">422</span>) {
    core.info(<span class="hljs-string">'Label already exists'</span>);
  } <span class="hljs-keyword">else</span> {
    <span class="hljs-keyword">throw</span> error;
  }
}
</code></pre>
<h3 id="heading-6-make-everything-configurable"><strong>6. Make Everything Configurable</strong></h3>
<p>Don’t hardcode values. Use inputs with sensible defaults.</p>
<h3 id="heading-7-test-locally-first"><strong>7. Test Locally First</strong></h3>
<p>Use <code>uses: ./</code> in a workflow within your action's repository before publishing.</p>
<h2 id="heading-common-pitfalls-to-avoid"><strong>Common Pitfalls to Avoid</strong></h2>
<h3 id="heading-1-forgetting-to-build"><strong>1. Forgetting to Build</strong></h3>
<p>Always run <code>npm run build</code> before committing. GitHub Actions runs <code>dist/index.js</code>, not your source code.</p>
<h3 id="heading-2-not-committing-dist"><strong>2. Not Committing dist/</strong></h3>
<p>The <code>dist/</code> folder must be in your repository. Don't add it to <code>.gitignore</code>.</p>
<h3 id="heading-3-wrong-node-version"><strong>3. Wrong Node Version</strong></h3>
<p>Specify <code>node20</code> in <code>action.yml</code> and use it consistently.</p>
<h3 id="heading-4-missing-permissions"><strong>4. Missing Permissions</strong></h3>
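<p>Ensure the token passed as <code>github-token</code> has the required permissions. The default <code>${{ secrets.GITHUB_TOKEN }}</code> is usually enough, but where it defaults to read-only the workflow has to grant write access through a <code>permissions</code> block (for this action, <code>issues: write</code> and <code>pull-requests: write</code>).</p>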
<h3 id="heading-5-not-handling-edge-cases"><strong>5. Not Handling Edge Cases</strong></h3>
<p>What if the PR has 0 changes? What if labels already exist? Handle all scenarios.</p>
<h2 id="heading-taking-it-further"><strong>Taking It Further</strong></h2>
<p>Now that you have a working action, consider:</p>
<p><strong>Adding Tests:</strong></p>
<pre><code class="lang-bash">npm install --save-dev jest @types/node
</code></pre>
<p><strong>Local Testing with act:</strong></p>
<pre><code class="lang-bash">brew install act
act pull_request
</code></pre>
<p><strong>Multiple Actions in One Repo:</strong> Create subdirectories for different actions with their own <code>action.yml</code> files.</p>
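<p><strong>Publishing to Marketplace:</strong> Listing is not automatic. Make the repository public, keep a single <code>action.yml</code> at the repository root, and select the option to publish the action to the GitHub Marketplace when you draft a release. Adding topics to the repository helps people find it.</p>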
<h2 id="heading-real-world-impact"><strong>Real-World Impact</strong></h2>
<p>After implementing this action in my team:</p>
<ul>
<li><p>Code reviews became 30% faster (small PRs are easier to review)</p>
</li>
<li><p>PR sizes decreased by 40% on average</p>
</li>
<li><p>Developers became more conscious of keeping changes focused</p>
</li>
<li><p>Onboarding new team members was easier (labels provide context)</p>
</li>
</ul>
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>Creating a GitHub Custom Action isn’t as daunting as it seems. With JavaScript and the GitHub Actions toolkit, you can automate almost any workflow task.</p>
<p>The key steps are:</p>
<ol>
<li><p>Define your action’s metadata in <code>action.yml</code></p>
</li>
<li><p>Write the logic using <code>@actions/core</code> and <code>@actions/github</code></p>
</li>
<li><p>Bundle your code with <code>@vercel/ncc</code></p>
</li>
<li><p>Test locally with <code>uses: ./</code></p>
</li>
<li><p>Automate releases with Semantic Release</p>
</li>
<li><p>Document thoroughly</p>
</li>
</ol>
<p>Start small, solve real problems, and iterate. Your future self (and your team) will thank you.</p>
<h2 id="heading-resources"><strong>Resources</strong></h2>
<ul>
<li><p><a target="_blank" href="https://github.com/AutomationDojo/github-custom-action-examples">Full source code</a></p>
</li>
<li><p><a target="_blank" href="https://docs.github.com/en/actions">GitHub Actions documentation</a></p>
</li>
<li><p><a target="_blank" href="https://docs.github.com/en/actions/creating-actions/creating-a-javascript-action">Creating JavaScript actions</a></p>
</li>
<li><p><a target="_blank" href="https://semantic-release.gitbook.io/">Semantic Release</a></p>
</li>
</ul>
<p><em>What automation challenges are you facing in your workflows? Share in the comments — I’d love to hear about them!</em></p>
]]></content:encoded></item><item><title><![CDATA[Managing GitHub Organizations with Terraform: From Manual Chaos to Infrastructure as Code]]></title><description><![CDATA[If you’ve ever managed a GitHub organization with more than a handful of repositories, you know the pain. Click here to add a branch protection rule. Click there to create a team. Navigate through five menus to grant repository access. Repeat. Repeat...]]></description><link>https://devops-blog.ruicoelho.dev/managing-github-organizations-with-terraform-from-manual-chaos-to-infrastructure-as-code</link><guid isPermaLink="true">https://devops-blog.ruicoelho.dev/managing-github-organizations-with-terraform-from-manual-chaos-to-infrastructure-as-code</guid><category><![CDATA[Devops]]></category><category><![CDATA[SRE]]></category><category><![CDATA[Terraform]]></category><category><![CDATA[GitHub]]></category><category><![CDATA[#IaC]]></category><category><![CDATA[IaC (Infrastructure as Code)]]></category><dc:creator><![CDATA[Rui Coelho]]></dc:creator><pubDate>Sun, 25 Jan 2026 12:53:48 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1769345445833/af0f6713-eeca-4422-90cf-2eb2e38fe149.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If you’ve ever managed a GitHub organization with more than a handful of repositories, you know the pain. Click here to add a branch protection rule. Click there to create a team. Navigate through five menus to grant repository access. Repeat. Repeat. Repeat.</p>
<p>Now imagine doing this for 50 repositories. Or 100. Or trying to maintain consistency across them all. Or worse — auditing who has access to what.</p>
<p>There’s a better way: Infrastructure as Code with Terraform.</p>
<h2 id="heading-the-problem-with-manual-github-management"><strong>The Problem with Manual GitHub Management</strong></h2>
<p>Manual GitHub organization management doesn’t scale. Here’s what typically happens:</p>
<p><strong>Configuration Drift</strong>: Team A protects their main branch with certain rules. Team B uses different rules. Team C forgets to protect their branch at all.</p>
<p><strong>Access Control Chaos</strong>: Someone needs access to five repositories. You grant it manually. Six months later, they leave the company. Did you remember to revoke access everywhere?</p>
<p><strong>No Audit Trail</strong>: How do you know what changed, when, and by whom? GitHub’s audit log helps, but it doesn’t tell you the <em>desired state</em> of your infrastructure.</p>
<p><strong>Documentation Debt</strong>: Your internal wiki has outdated screenshots of “how to configure a repository.” Reality diverged months ago.</p>
<h2 id="heading-enter-terraform-for-github"><strong>Enter Terraform for GitHub</strong></h2>
<p>Terraform, the popular Infrastructure as Code tool, has excellent support for GitHub through the official GitHub provider. This means you can define your entire organization structure — repositories, teams, access controls, branch protection rules — in code.</p>
<p>The benefits are immediate:</p>
<p>✅ <strong>Version Control</strong>: Your GitHub configuration lives in Git (meta, right?)<br />✅ <strong>Code Review</strong>: Changes go through pull requests<br />✅ <strong>Auditability</strong>: Complete history of what changed and why<br />✅ <strong>Consistency</strong>: Define patterns once, apply everywhere<br />✅ <strong>Disaster Recovery</strong>: Your org structure is documented in code</p>
<h2 id="heading-a-practical-architecture"><strong>A Practical Architecture</strong></h2>
<p>I’ve built a reference implementation that demonstrates how to structure Terraform for GitHub management: <a target="_blank" href="https://github.com/AutomationDojo/github-org-management-examples">github-org-management-examples</a></p>
<p>The architecture uses four independent modules:</p>
<h3 id="heading-1-organization-configuration"><strong>1. Organization Configuration</strong></h3>
<p>Manages org-level settings like billing email, member privileges, and default permissions. This is your organization’s “constitution” — the baseline rules everyone operates under.</p>
<pre><code class="lang-plaintext">resource "github_organization_settings" "this" {
  billing_email = var.billing_email

  members_can_create_repositories = true
  members_can_create_public_repositories = false
  members_can_create_private_repositories = true

  members_can_fork_private_repositories = false
}
</code></pre>
<h3 id="heading-2-repository-management"><strong>2. Repository Management</strong></h3>
<p>Here’s where it gets interesting. Instead of hardcoding repositories in Terraform, define them in YAML:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">repositories:</span>
  <span class="hljs-attr">example-with-ruleset:</span>
    <span class="hljs-attr">name:</span> <span class="hljs-string">"example-with-ruleset"</span>
    <span class="hljs-attr">description:</span> <span class="hljs-string">"Example repository with rulesets"</span>
    <span class="hljs-attr">visibility:</span> <span class="hljs-string">"public"</span>
    <span class="hljs-attr">has_issues:</span> <span class="hljs-literal">true</span>
    <span class="hljs-attr">has_discussions:</span> <span class="hljs-literal">true</span>
    <span class="hljs-attr">has_projects:</span> <span class="hljs-literal">false</span>
    <span class="hljs-attr">delete_branch_on_merge:</span> <span class="hljs-literal">true</span>
    <span class="hljs-attr">topics:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">"terraform"</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">"github"</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">"rulesets"</span>
    <span class="hljs-attr">vulnerability_alerts:</span> <span class="hljs-literal">true</span>
    <span class="hljs-attr">default_branch:</span> <span class="hljs-string">"main"</span>

    <span class="hljs-comment"># Repository Rulesets (available for public repos on free tier)</span>
    <span class="hljs-attr">rulesets:</span>
      <span class="hljs-attr">main-protection:</span>
        <span class="hljs-attr">name:</span> <span class="hljs-string">"Main Branch Protection"</span>
        <span class="hljs-attr">enforcement:</span> <span class="hljs-string">"active"</span>
        <span class="hljs-attr">target:</span> <span class="hljs-string">"branch"</span>
        <span class="hljs-attr">branch_patterns:</span>
          <span class="hljs-bullet">-</span> <span class="hljs-string">"~DEFAULT_BRANCH"</span>  <span class="hljs-comment"># Matches the default branch</span>

        <span class="hljs-attr">rules:</span>
          <span class="hljs-attr">creation:</span> <span class="hljs-literal">false</span>
          <span class="hljs-attr">update:</span> <span class="hljs-literal">true</span>  <span class="hljs-comment"># Restrict direct updates to bypass actors</span>
          <span class="hljs-attr">deletion:</span> <span class="hljs-literal">true</span>  <span class="hljs-comment"># Block deletion</span>
          <span class="hljs-attr">required_linear_history:</span> <span class="hljs-literal">true</span>
          <span class="hljs-attr">non_fast_forward:</span> <span class="hljs-literal">true</span>  <span class="hljs-comment"># Prevent force pushes</span>

          <span class="hljs-attr">pull_request:</span>
            <span class="hljs-attr">required_approving_review_count:</span> <span class="hljs-number">1</span>
            <span class="hljs-attr">dismiss_stale_reviews_on_push:</span> <span class="hljs-literal">true</span>
            <span class="hljs-attr">require_code_owner_review:</span> <span class="hljs-literal">false</span>
            <span class="hljs-attr">required_review_thread_resolution:</span> <span class="hljs-literal">true</span>

          <span class="hljs-attr">required_status_checks:</span>
            <span class="hljs-attr">strict_required_status_checks_policy:</span> <span class="hljs-literal">true</span>
            <span class="hljs-attr">required_checks:</span> []
</code></pre>
<p>The Terraform code reads this YAML and creates resources dynamically. This separation is crucial — developers can propose repository changes in YAML without touching Terraform logic.</p>
<h3 id="heading-3-organization-rulesets"><strong>3. Organization Rulesets</strong></h3>
<p>Organization-level rulesets (requires GitHub Team or Enterprise) let you enforce policies across all repositories. Think of it as a safety net — even if someone forgets to configure their repository properly, the org-level rules catch it.</p>
<p><strong>Important</strong>: Repository-level rulesets work on the free tier for public repos. Organization-level rulesets require a paid plan but provide centralized enforcement.</p>
<h3 id="heading-4-team-management"><strong>4. Team Management</strong></h3>
<p>Teams and their repository access permissions, all in YAML:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">teams:</span>
  <span class="hljs-attr">core-team:</span>
    <span class="hljs-attr">name:</span> <span class="hljs-string">"Core Team"</span>
    <span class="hljs-attr">description:</span> <span class="hljs-string">"Core maintainers with full access to the organization"</span>
    <span class="hljs-attr">privacy:</span> <span class="hljs-string">"closed"</span>
    <span class="hljs-attr">members:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">username:</span> <span class="hljs-string">"alice"</span>
        <span class="hljs-attr">role:</span> <span class="hljs-string">"maintainer"</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">username:</span> <span class="hljs-string">"bob"</span>
        <span class="hljs-attr">role:</span> <span class="hljs-string">"member"</span>
    <span class="hljs-attr">repositories:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">repository:</span> <span class="hljs-string">".github"</span>
        <span class="hljs-attr">permission:</span> <span class="hljs-string">"admin"</span>

  <span class="hljs-attr">external-access:</span>
    <span class="hljs-attr">name:</span> <span class="hljs-string">"External Access"</span>
    <span class="hljs-attr">description:</span> <span class="hljs-string">"External collaborators with read access to specific private repositories"</span>
    <span class="hljs-attr">privacy:</span> <span class="hljs-string">"closed"</span>
    <span class="hljs-attr">members:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">username:</span> <span class="hljs-string">"external-contractor"</span>
        <span class="hljs-attr">role:</span> <span class="hljs-string">"member"</span>
    <span class="hljs-attr">repositories:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">repository:</span> <span class="hljs-string">"private-repo"</span>
        <span class="hljs-attr">permission:</span> <span class="hljs-string">"pull"</span>  <span class="hljs-comment"># Read-only access</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">repository:</span> <span class="hljs-string">"another-private-repo"</span>
        <span class="hljs-attr">permission:</span> <span class="hljs-string">"push"</span>  <span class="hljs-comment"># Write access</span>
</code></pre>
<p>The Terraform module handles the complexity of creating teams, adding members, and granting repository access — all from this declarative configuration.</p>
<h2 id="heading-real-world-workflow"><strong>Real-World Workflow</strong></h2>
<p>Here’s how this looks in practice:</p>
<h3 id="heading-scenario-new-repository"><strong>Scenario: New Repository</strong></h3>
<ol>
<li><p>Developer opens a PR adding the repository to <code>repositories.yaml</code></p>
</li>
<li><p>Team reviews the configuration (visibility, branch protection, etc.)</p>
</li>
<li><p>PR merges</p>
</li>
<li><p>GitHub Actions runs <code>terraform apply</code></p>
</li>
<li><p>Repository is created with all protections in place</p>
</li>
</ol>
<h3 id="heading-scenario-access-request"><strong>Scenario: Access Request</strong></h3>
<ol>
<li><p>Engineer needs access to three repositories</p>
</li>
<li><p>PR adds them to the relevant team in <code>teams.yaml</code></p>
</li>
<li><p>Security team reviews</p>
</li>
<li><p>Merge triggers Terraform</p>
</li>
<li><p>Access granted consistently across all repos</p>
</li>
</ol>
<h3 id="heading-scenario-policy-update"><strong>Scenario: Policy Update</strong></h3>
<ol>
<li><p>Security requires all repos to enforce signed commits</p>
</li>
<li><p>Update the organization ruleset in <code>org_rulesets.yaml</code></p>
</li>
<li><p>One PR, one review, one apply</p>
</li>
<li><p>Policy enforced across the entire organization</p>
</li>
</ol>
<h2 id="heading-the-yaml-strategy"><strong>The YAML Strategy</strong></h2>
<p>Why YAML over pure Terraform? Several reasons:</p>
<p><strong>Lower Barrier to Entry</strong>: Developers who don’t know Terraform can still propose repository changes. YAML is more approachable than HCL.</p>
<p><strong>Separation of Concerns</strong>: Terraform handles the <em>how</em> (API calls, state management). YAML handles the <em>what</em> (desired configuration).</p>
<p><strong>Validation</strong>: You can build additional tooling around YAML — linters, validators, custom checks — without modifying Terraform code.</p>
<p><strong>Scalability</strong>: When you have 100+ repositories, managing them in YAML is far more maintainable than sprawling Terraform files.</p>
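<p>As a rough illustration of the validation point, a pre-merge check might look like the sketch below. It is written in Python and assumes the YAML has already been parsed into a dictionary (for example with PyYAML's <code>yaml.safe_load</code>) and that repositories live under a top-level <code>repositories</code> map. The function name and the specific rules are hypothetical and are not taken from the example repository.</p>
<pre><code class="lang-python"># Hypothetical validator for a parsed repositories.yaml (a sketch, not the project's tooling).
ALLOWED_VISIBILITY = {"public", "private", "internal"}


def validate_repositories(config):
    """Return a list of human-readable problems found in the parsed config."""
    problems = []
    for key, repo in config.get("repositories", {}).items():
        if not repo.get("name"):
            problems.append(f"{key}: missing required field 'name'")
        # Same default the Terraform module applies when visibility is omitted.
        visibility = repo.get("visibility", "private")
        if visibility not in ALLOWED_VISIBILITY:
            problems.append(f"{key}: invalid visibility '{visibility}'")
        # Example of an organization-specific rule: public repos must define rulesets.
        if visibility == "public" and not repo.get("rulesets"):
            problems.append(f"{key}: public repository has no rulesets")
    return problems


sample = {
    "repositories": {
        "backend-api": {"name": "backend-api", "visibility": "secret"},
        "docs": {"name": "docs"},
    }
}
for problem in validate_repositories(sample):
    print(problem)
</code></pre>
<p>Running a check like this in CI before <code>terraform plan</code> would surface configuration mistakes without touching the Terraform code.</p>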
<h2 id="heading-key-implementation-details"><strong>Key Implementation Details</strong></h2>
<h3 id="heading-using-try-for-optional-fields"><strong>Using <code>try()</code> for Optional Fields</strong></h3>
<p>GitHub’s API has many optional parameters. The modules use <code>try()</code> extensively to provide sensible defaults:</p>
<pre><code class="lang-plaintext">resource "github_repository" "repos" {
  for_each = local.repositories

  name        = each.value.name
  description = try(each.value.description, null)
  visibility  = try(each.value.visibility, "private")

  # Features
  has_issues      = try(each.value.has_issues, true)
  has_discussions = try(each.value.has_discussions, false)
  has_projects    = try(each.value.has_projects, true)
  has_wiki        = try(each.value.has_wiki, true)

  # Merge settings
  allow_merge_commit     = try(each.value.allow_merge_commit, true)
  allow_squash_merge     = try(each.value.allow_squash_merge, true)
  delete_branch_on_merge = try(each.value.delete_branch_on_merge, true)

  # Other settings
  topics               = try(each.value.topics, [])
  vulnerability_alerts = try(each.value.vulnerability_alerts, true)
}
</code></pre>
<p>This pattern allows YAML configurations to be minimal — only specify what differs from defaults.</p>
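<p>To preview what a minimal entry expands to, the same defaulting can be modelled outside Terraform. The Python sketch below copies the defaults from the resource above. The dictionary and function names are hypothetical, and the merge only approximates <code>try()</code>, which falls back when an attribute is missing.</p>
<pre><code class="lang-python"># Defaults copied from the try() calls in the resource above.
REPO_DEFAULTS = {
    "description": None,
    "visibility": "private",
    "has_issues": True,
    "has_discussions": False,
    "has_projects": True,
    "has_wiki": True,
    "allow_merge_commit": True,
    "allow_squash_merge": True,
    "delete_branch_on_merge": True,
    "topics": [],
    "vulnerability_alerts": True,
}


def effective_settings(repo):
    """Merge one YAML repository entry over the defaults; explicit keys win."""
    return {**REPO_DEFAULTS, **repo}


# A minimal entry only states what differs from the defaults.
minimal = {"name": "docs-site", "visibility": "public"}
settings = effective_settings(minimal)
print(settings["visibility"], settings["has_wiki"])
</code></pre>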
<h3 id="heading-dynamic-ruleset-generation"><strong>Dynamic Ruleset Generation</strong></h3>
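<p>Repository rulesets can be defined inline with each repository. <code>locals.tf</code> flattens that nested structure into a single list, and <code>main.tf</code> turns the list into a map for <code>for_each</code>:</p>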
<pre><code class="lang-plaintext"># locals.tf - Flatten rulesets from all repositories
locals {
  repo_rulesets = flatten([
    for repo_key, repo in local.repositories : [
      for ruleset_key, ruleset in try(repo.rulesets, {}) : {
        key         = "${repo_key}-${ruleset_key}"
        repo_key    = repo_key
        repo_name   = repo.name
        ruleset_key = ruleset_key
        ruleset     = ruleset
      }
    ]
  ])
}

# main.tf - Create rulesets dynamically
resource "github_repository_ruleset" "repo_rulesets" {
  for_each = {
    for rs in local.repo_rulesets : rs.key =&gt; rs
  }

  repository  = github_repository.repos[each.value.repo_key].name
  name        = each.value.ruleset.name
  target      = try(each.value.ruleset.target, "branch")
  enforcement = try(each.value.ruleset.enforcement, "active")

  conditions {
    ref_name {
      include = try(each.value.ruleset.branch_patterns, ["~DEFAULT_BRANCH"])
      exclude = try(each.value.ruleset.exclude_patterns, [])
    }
  }

  rules {
    creation                = try(each.value.ruleset.rules.creation, false)
    update                  = try(each.value.ruleset.rules.update, true)
    deletion                = try(each.value.ruleset.rules.deletion, true)
    required_linear_history = try(each.value.ruleset.rules.required_linear_history, false)
    non_fast_forward        = try(each.value.ruleset.rules.non_fast_forward, true)

    dynamic "pull_request" {
      for_each = try(each.value.ruleset.rules.pull_request, null) != null ? [1] : []
      content {
        required_approving_review_count   = try(each.value.ruleset.rules.pull_request.required_approving_review_count, 1)
        dismiss_stale_reviews_on_push     = try(each.value.ruleset.rules.pull_request.dismiss_stale_reviews_on_push, true)
        require_code_owner_review         = try(each.value.ruleset.rules.pull_request.require_code_owner_review, false)
        required_review_thread_resolution = try(each.value.ruleset.rules.pull_request.required_review_thread_resolution, false)
      }
    }
  }
}
</code></pre>
<p>This creates rulesets only for repositories that define them, keeping the state clean and focused.</p>
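<p>The nested <code>for</code> expressions can be easier to follow with concrete data. This Python sketch models the same flattening on a hypothetical two-repository input (the repository and ruleset names are made up) and prints the <code>for_each</code> keys that would result.</p>
<pre><code class="lang-python">def flatten_rulesets(repositories):
    """Model of local.repo_rulesets, keyed the way the for_each map is keyed."""
    flat = {}
    for repo_key, repo in repositories.items():
        # Mirrors try(repo.rulesets, {}): repositories without rulesets add nothing.
        for ruleset_key, ruleset in repo.get("rulesets", {}).items():
            flat[f"{repo_key}-{ruleset_key}"] = {
                "repo_key": repo_key,
                "repo_name": repo["name"],
                "ruleset_key": ruleset_key,
                "ruleset": ruleset,
            }
    return flat


repositories = {
    "backend-api": {
        "name": "backend-api",
        "rulesets": {
            "main": {"name": "Protect main"},
            "releases": {"name": "Protect releases"},
        },
    },
    "docs": {"name": "docs"},  # no rulesets defined, so no entries are produced
}

print(sorted(flatten_rulesets(repositories)))
</code></pre>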
<h3 id="heading-flattened-team-repository-access"><strong>Flattened Team Repository Access</strong></h3>
<p>The team module applies the same flattening technique to create one resource per team-repository pair and one per team member:</p>
<pre><code class="lang-plaintext"># locals.tf - Flatten team repositories
locals {
  team_repositories = flatten([
    for team_key, team in local.teams : [
      for repo in coalesce(try(team.repositories, null), []) : {
        team_key   = team_key
        repository = repo.repository
        permission = try(repo.permission, "pull")
      }
    ]
  ])
}

# locals.tf - Flatten team members
locals {
  team_members = flatten([
    for team_key, team in local.teams : [
      for member in coalesce(try(team.members, null), []) : {
        team_key = team_key
        username = member.username
        role     = try(member.role, "member")
      }
    ]
  ])
}

# main.tf - Create team repository access
resource "github_team_repository" "team_repos" {
  for_each = {
    for tr in local.team_repositories : "${tr.team_key}-${tr.repository}" =&gt; tr
  }

  team_id    = github_team.teams[each.value.team_key].id
  repository = each.value.repository
  permission = each.value.permission
}

# main.tf - Add team members
resource "github_team_membership" "members" {
  for_each = {
    for tm in local.team_members : "${tm.team_key}-${tm.username}" =&gt; tm
  }

  team_id  = github_team.teams[each.value.team_key].id
  username = each.value.username
  role     = each.value.role
}
</code></pre>
<p>This transforms the hierarchical YAML structure into the flat resource model Terraform needs.</p>
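<p>One thing worth checking with hyphen-joined keys: two different team and repository pairs can produce the same string, and Terraform rejects duplicate keys when building a <code>for_each</code> map. The Python sketch below is a hypothetical pre-check, assuming the <code>teams</code> structure shown earlier has been parsed into a dictionary; the team names in the sample are invented.</p>
<pre><code class="lang-python">from collections import defaultdict


def find_key_collisions(teams):
    """Return generated keys that are produced by more than one (team, repository) pair."""
    pairs_by_key = defaultdict(list)
    for team_key, team in teams.items():
        # "or []" covers both a missing key and an explicit null in the YAML.
        for repo in team.get("repositories") or []:
            key = f"{team_key}-{repo['repository']}"
            pairs_by_key[key].append((team_key, repo["repository"]))
    return {key: pairs for key, pairs in pairs_by_key.items() if len(pairs) != 1}


teams = {
    "platform-core": {"repositories": [{"repository": "api"}]},
    "platform": {"repositories": [{"repository": "core-api"}]},
}
print(find_key_collisions(teams))
</code></pre>
<p>Both entries above map to <code>platform-core-api</code>. If collisions like this are a realistic risk in your naming scheme, a separator that cannot appear in team slugs or repository names (such as <code>/</code>) would avoid them.</p>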
<h3 id="heading-deployment-strategy"><strong>Deployment Strategy</strong></h3>
<p>Each module is independent, allowing incremental adoption:</p>
<ol>
<li><p><strong>Start Small</strong>: Begin with organization settings</p>
</li>
<li><p><strong>Add Repositories</strong>: Migrate existing repos to Terraform gradually</p>
</li>
<li><p><strong>Implement Teams</strong>: Codify team structure and access</p>
</li>
<li><p><strong>Enforce Policies</strong>: Layer in rulesets once the foundation is solid</p>
</li>
</ol>
<p>Use separate state files for each module. This provides isolation — changes to teams don’t affect repository state.</p>
<h2 id="heading-gotchas-and-considerations"><strong>Gotchas and Considerations</strong></h2>
<h3 id="heading-authentication"><strong>Authentication</strong></h3>
<p>You’ll need either a Personal Access Token (PAT) or GitHub App credentials. For production, use GitHub Apps with fine-grained permissions.</p>
<pre><code class="lang-plaintext">provider "github" {
  owner = var.github_organization
  token = var.github_token  # Better: use app authentication
}
</code></pre>
<h3 id="heading-state-management"><strong>State Management</strong></h3>
<p>Terraform state contains sensitive information. Use remote state (Terraform Cloud, S3 with encryption, etc.) and restrict access appropriately.</p>
<h3 id="heading-import-existing-resources"><strong>Import Existing Resources</strong></h3>
<p>Migrating existing infrastructure requires importing resources:</p>
<pre><code class="lang-bash">terraform import <span class="hljs-string">'github_repository.repos["backend-api"]'</span> backend-api
terraform import <span class="hljs-string">'github_team.teams["backend-team"]'</span> 12345678
</code></pre>
<p>Build an import script if you have many resources to migrate.</p>
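<p>A small generator is one way to build such a script. The Python sketch below assumes each repository's <code>for_each</code> key is the repository name and that the import ID is also the repository name. The resource address is a parameter because it depends on how your module names the resource, and the repository names in the example are placeholders.</p>
<pre><code class="lang-python">import shlex


def import_commands(repo_names, resource="github_repository.repos"):
    """Build one `terraform import` command per repository name."""
    commands = []
    for name in repo_names:
        # Address of a single for_each instance, e.g. github_repository.repos["backend-api"]
        address = f'{resource}["{name}"]'
        # shlex.quote protects the brackets and quotes from the shell.
        commands.append(f"terraform import {shlex.quote(address)} {shlex.quote(name)}")
    return commands


for command in import_commands(["backend-api", "frontend-app"]):
    print(command)
</code></pre>
<p>Printing the commands rather than executing them leaves room to review the list before anything touches state.</p>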
<h3 id="heading-plan-limitations"><strong>Plan Limitations</strong></h3>
<p>Organization rulesets require GitHub Team or Enterprise, while repository rulesets work on the free tier for public repositories. Check which tier you are on before designing around organization-level enforcement.</p>
<h2 id="heading-cicd-integration"><strong>CI/CD Integration</strong></h2>
<h3 id="heading-atlantis-the-recommended-approach"><strong>Atlantis: The Recommended Approach</strong></h3>
<p>GitHub Actions works well, but <strong>Atlantis</strong> is often the better fit for Terraform automation. It provides a GitOps workflow in which Terraform runs are triggered and reviewed directly in pull requests, with built-in locking, approval requirements, and separate plan and apply steps.</p>
<p>The benefits of Atlantis include:</p>
<ul>
<li><p><strong>Pull request-native workflow</strong> — Plans and applies happen in PR comments</p>
</li>
<li><p><strong>State locking</strong> — Prevents concurrent modifications</p>
</li>
<li><p><strong>Approval gates</strong> — Require explicit approval before apply</p>
</li>
<li><p><strong>Audit trail</strong> — Everything happens in GitHub, fully visible</p>
</li>
<li><p><strong>Multi-environment support</strong> — Manage dev/staging/prod with different approval rules</p>
</li>
</ul>
<p>For a production setup managing critical GitHub infrastructure, Atlantis provides the guardrails and visibility you need. The setup requires running an Atlantis server, but the operational benefits are well worth it.</p>
<h3 id="heading-github-actions-approach"><strong>GitHub Actions Approach</strong></h3>
<p>Automate Terraform runs with GitHub Actions:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">name:</span> <span class="hljs-string">Terraform</span> <span class="hljs-string">Apply</span>

<span class="hljs-attr">on:</span>
  <span class="hljs-attr">push:</span>
    <span class="hljs-attr">branches:</span> [<span class="hljs-string">main</span>]
    <span class="hljs-attr">paths:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">'repos/**'</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">'teams/**'</span>

<span class="hljs-attr">jobs:</span>
  <span class="hljs-attr">terraform:</span>
    <span class="hljs-attr">runs-on:</span> <span class="hljs-string">ubuntu-latest</span>
    <span class="hljs-attr">steps:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">uses:</span> <span class="hljs-string">actions/checkout@v4</span>

      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Setup</span> <span class="hljs-string">Terraform</span>
        <span class="hljs-attr">uses:</span> <span class="hljs-string">hashicorp/setup-terraform@v3</span>

      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Terraform</span> <span class="hljs-string">Init</span>
        <span class="hljs-attr">run:</span> <span class="hljs-string">terraform</span> <span class="hljs-string">init</span>
        <span class="hljs-attr">working-directory:</span> <span class="hljs-string">./repos</span>

      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Terraform</span> <span class="hljs-string">Apply</span>
        <span class="hljs-attr">run:</span> <span class="hljs-string">terraform</span> <span class="hljs-string">apply</span> <span class="hljs-string">-auto-approve</span>
        <span class="hljs-attr">env:</span>
          <span class="hljs-attr">GITHUB_TOKEN:</span> <span class="hljs-string">${{</span> <span class="hljs-string">secrets.TF_GITHUB_TOKEN</span> <span class="hljs-string">}}</span>
        <span class="hljs-attr">working-directory:</span> <span class="hljs-string">./repos</span>
</code></pre>
<p>Add <code>terraform plan</code> on pull requests for preview:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">on:</span>
  <span class="hljs-attr">pull_request:</span>
    <span class="hljs-attr">paths:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">'repos/**'</span>

<span class="hljs-comment"># ... terraform plan output as PR comment</span>
</code></pre>
<h2 id="heading-security-considerations"><strong>Security Considerations</strong></h2>
<p><strong>Least Privilege</strong>: Grant Terraform only the permissions it needs. Use GitHub Apps over PATs for fine-grained control.</p>
<p><strong>Secret Scanning</strong>: Enable secret scanning on your Terraform repository. Never commit tokens.</p>
<p><strong>State File Security</strong>: Terraform state contains sensitive data. Encrypt it at rest and in transit.</p>
<p><strong>Review Process</strong>: Require multiple approvals for Terraform PRs. Organization changes should never be one person’s decision.</p>
<p><strong>Drift Detection</strong>: Run <code>terraform plan</code> regularly to detect manual changes. Set up alerts if drift is detected.</p>
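<p>Drift detection can lean on the exit code of <code>terraform plan -detailed-exitcode</code>: 0 means no changes, 1 means the plan failed, and 2 means changes are pending. The Python sketch below wraps that behaviour; the function names and status labels are hypothetical, and how you alert on the result is left open.</p>
<pre><code class="lang-python">import subprocess


def classify_plan_exit_code(code):
    """Map `terraform plan -detailed-exitcode` exit codes to a status string."""
    if code == 0:
        return "in-sync"  # plan succeeded, nothing to change
    if code == 2:
        return "drift"  # plan succeeded, changes are pending
    return "error"  # exit code 1 or anything unexpected


def check_drift(workdir):
    """Run a plan in workdir and return (status, plan output)."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit_code(result.returncode), result.stdout
</code></pre>
<p>A scheduled job could call <code>check_drift("./repos")</code> against the main branch and raise an alert whenever the status is <code>drift</code>, since pending changes on an already-applied branch usually point to manual edits.</p>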
<h2 id="heading-beyond-the-basics"><strong>Beyond the Basics</strong></h2>
<p>Once you have the foundation, you can extend it:</p>
<ul>
<li><p><strong>Custom Modules</strong>: Create organization-specific abstractions</p>
</li>
<li><p><strong>Validation</strong>: Build custom validators for YAML configurations</p>
</li>
<li><p><strong>Documentation Generation</strong>: Auto-generate docs from Terraform state</p>
</li>
<li><p><strong>Compliance Reports</strong>: Generate access reports for audits</p>
</li>
<li><p><strong>Batch Operations</strong>: Bulk update repositories using Terraform’s <code>for_each</code></p>
</li>
</ul>
<h2 id="heading-the-full-picture"><strong>The Full Picture</strong></h2>
<p>Managing GitHub with Terraform isn’t just about automation — it’s about treating your organization structure as code. Version control, code review, automated testing, and deployment pipelines all apply.</p>
<p>The result is a GitHub organization that’s:</p>
<ul>
<li><p><strong>Consistent</strong>: Every repository follows the same standards</p>
</li>
<li><p><strong>Auditable</strong>: Complete history of every change</p>
</li>
<li><p><strong>Recoverable</strong>: Organization settings, teams, and access can be rebuilt with <code>terraform apply</code> (repository contents still need their own backups)</p>
</li>
<li><p><strong>Scalable</strong>: Adding your 100th repository is as easy as the first</p>
</li>
<li><p><strong>Secure</strong>: Policies are enforced automatically, not manually</p>
</li>
</ul>
<h2 id="heading-getting-started"><strong>Getting Started</strong></h2>
<p>Check out the complete implementation: <a target="_blank" href="https://github.com/AutomationDojo/github-org-management-examples">github-org-management-examples</a></p>
<p>The repository includes:</p>
<ul>
<li><p>Four modular Terraform configurations</p>
</li>
<li><p>YAML-based configuration examples</p>
</li>
<li><p>Detailed README with usage instructions</p>
</li>
<li><p>Documentation website: <a target="_blank" href="https://github-org-management-examples.automationdojo.org/">github-org-management-examples.automationdojo.org</a></p>
</li>
</ul>
<p>Start with one module, validate the approach, then expand. Your future self — and your team — will thank you.</p>
<p>Have you managed GitHub organizations with Terraform? What challenges did you face? Share your experience in the comments.</p>
]]></content:encoded></item><item><title><![CDATA[Docker Hardened Images: Enterprise Security, Now Free for Everyone]]></title><description><![CDATA[How Docker’s security-focused container images went from premium to community-accessible.
When it comes to container security, the old saying “you don’t know what you don’t know” has never been more relevant. Every Docker image you pull could be hidi...]]></description><link>https://devops-blog.ruicoelho.dev/docker-hardened-images-enterprise-security-now-free-for-everyone</link><guid isPermaLink="true">https://devops-blog.ruicoelho.dev/docker-hardened-images-enterprise-security-now-free-for-everyone</guid><category><![CDATA[Docker]]></category><category><![CDATA[Devops]]></category><category><![CDATA[SRE]]></category><category><![CDATA[containers]]></category><category><![CDATA[Security]]></category><dc:creator><![CDATA[Rui Coelho]]></dc:creator><pubDate>Sun, 25 Jan 2026 12:44:58 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1769345059102/0b473fc2-5524-4a59-bed0-344037c90377.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>How Docker’s security-focused container images went from premium to community-accessible.</p>
<p>When it comes to container security, the old saying “you don’t know what you don’t know” has never been more relevant. Every Docker image you pull could be hiding vulnerabilities, unnecessary packages, or worse — malicious code. This is where Docker Hardened Images come in, and as of December 2025, they’re available to everyone at no cost.</p>
<h2 id="heading-the-problem-with-traditional-container-images"><strong>The Problem with Traditional Container Images</strong></h2>
<p>Most Docker images are built for convenience, not security. They come packed with shells, package managers, and tools that make development easy but also expand the attack surface dramatically. It’s like leaving all your house windows open because you might need the fresh air — convenient, yes, but hardly secure.</p>
<p>Consider this: a typical Node.js base image might contain hundreds of packages you’ll never use. Each one is a potential vulnerability. Each one could be the entry point for a supply chain attack. And with the rise of sophisticated attacks targeting containerized applications, this isn’t just theoretical risk — it’s a clear and present danger.</p>
<h2 id="heading-enter-docker-hardened-images"><strong>Enter Docker Hardened Images</strong></h2>
<p>Docker Hardened Images (DHI) take a radically different approach. Built on minimal Alpine or Debian Linux bases, these images strip away everything that isn’t absolutely necessary:</p>
<ul>
<li><p><strong>No shell</strong> — Can’t exploit what isn’t there</p>
</li>
<li><p><strong>No package manager</strong> — Eliminates an entire class of attacks</p>
</li>
<li><p><strong>Non-root user by default</strong> — Limits damage from compromises</p>
</li>
<li><p><strong>Minimal dependencies</strong> — Only what your application actually needs</p>
</li>
</ul>
<p>The result? Docker claims up to a <strong>95% reduction in attack surface</strong> compared to standard images. That’s not a typo — ninety-five percent.</p>
<h2 id="heading-from-premium-to-free-the-journey"><strong>From Premium to Free: The Journey</strong></h2>
<p>Docker first introduced Hardened Images in May 2025 as a commercial offering. The value proposition was clear: pay for enterprise-grade security and compliance. Organizations with strict requirements (FIPS compliance, DoD STIG standards, or contractual SLAs for vulnerability patching) found real value in the premium tier.</p>
<p>But Docker recognized a larger opportunity. Making basic hardened images free could help secure the entire container ecosystem, not just enterprises with deep pockets. As supply chain attacks become increasingly sophisticated, raising the security baseline for everyone benefits the entire community.</p>
<h2 id="heading-whats-free-whats-not"><strong>What’s Free, What’s Not</strong></h2>
<p>The newly free tier includes:</p>
<p>✅ Complete catalog of hardened base images<br />✅ Full SBOM (Software Bill of Materials) for each image<br />✅ CVE assessment and vulnerability data<br />✅ Apache 2.0 license with no hidden surprises<br />✅ Community support and GitHub-based catalog</p>
<p>The Enterprise tier (still paid) adds:</p>
<ul>
<li><p>FIPS 140-2 and DoD STIG compliance variants</p>
</li>
<li><p>7-day critical CVE remediation SLA</p>
</li>
<li><p>Custom image building with full provenance</p>
</li>
<li><p>Enterprise support and contractual guarantees</p>
</li>
</ul>
<p>This tiered approach allows Docker to sustain the project financially while democratizing container security fundamentals.</p>
<h2 id="heading-the-trade-offs-you-should-know"><strong>The Trade-offs You Should Know</strong></h2>
<p>Hardened images aren’t a drop-in replacement. The security benefits come with operational changes:</p>
<h3 id="heading-1-no-shell-different-debugging"><strong>1. No Shell = Different Debugging</strong></h3>
<p>Without a shell, you can’t just <code>docker exec</code> into a container and poke around. Docker's solution is Docker Debug, a tool that provides debugging capabilities without modifying the hardened image. The catch? It requires Docker Desktop, which means a subscription for most business uses.</p>
<h3 id="heading-2-package-installation-requires-workflow-changes"><strong>2. Package Installation Requires Workflow Changes</strong></h3>
<p>Need additional PHP extensions? You’ll use a <code>-dev</code> variant to install them, then copy the artifacts to your runtime image. It's more steps, but it enforces a clean separation between build-time and runtime dependencies.</p>
<h3 id="heading-3-modifications-can-undermine-security"><strong>3. Modifications Can Undermine Security</strong></h3>
<p>You can add anything to a hardened image — Docker won’t stop you. But every addition potentially reduces security. This is where scanners like Docker Scout, Trivy, or Grype become essential for verifying your final image maintains security standards.</p>
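<p>One way to make that verification routine is a small gate over scanner output. The Python sketch below assumes a report produced with Trivy's JSON format (<code>trivy image --format json</code>) and relies on its top-level <code>Results</code> list, where each entry may carry a <code>Vulnerabilities</code> list with a <code>Severity</code> field. The function names and the blocking policy are hypothetical.</p>
<pre><code class="lang-python">import json
from collections import Counter


def count_by_severity(report):
    """Count findings per severity across all scan results in the report."""
    counts = Counter()
    # "or []" handles results that omit the key or set it to null.
    for result in report.get("Results") or []:
        for vuln in result.get("Vulnerabilities") or []:
            counts[vuln.get("Severity", "UNKNOWN")] += 1
    return counts


def passes_gate(report, blocking=("CRITICAL", "HIGH")):
    """Pass only when no finding has one of the blocking severities."""
    counts = count_by_severity(report)
    return all(counts[level] == 0 for level in blocking)


# Trimmed-down sample in the assumed report shape.
sample = json.loads(
    '{"Results": [{"Target": "app", "Vulnerabilities": '
    '[{"VulnerabilityID": "CVE-0000-0001", "Severity": "HIGH"}]}]}'
)
print(dict(count_by_severity(sample)), passes_gate(sample))
</code></pre>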
<h2 id="heading-getting-started"><strong>Getting Started</strong></h2>
<p>Pulling a hardened image is straightforward:</p>
<pre><code class="lang-bash">docker pull dhi.io/node:20-alpine3.22
</code></pre>
<p>The full catalog is available on <a target="_blank" href="https://hub.docker.com/u/docker">Docker Hub</a>, with definitions and documentation on <a target="_blank" href="https://github.com/docker-hardened-images/catalog">GitHub</a>. The community is already actively requesting new images and variants.</p>
<h2 id="heading-the-community-response-cautiously-optimistic"><strong>The Community Response: Cautiously Optimistic</strong></h2>
<p>The developer community’s reaction has been positive but measured. On Hacker News, several developers pointed to Docker’s history of converting free offerings into paid subscriptions. Docker registries, Docker Desktop — both started free before requiring payment in business contexts.</p>
<p>Some expressed concern about long-term sustainability, drawing parallels to Bitnami’s recent shift from free public images to $50,000+ annual subscriptions following Broadcom’s VMware acquisition.</p>
<p>Docker’s response? The enterprise tier makes the free tier sustainable. Companies needing continuous patching, compliance certifications, and contractual SLAs generate revenue that supports free community access.</p>
<h2 id="heading-is-this-the-right-move"><strong>Is This the Right Move?</strong></h2>
<p>Time will tell if Docker’s strategy succeeds long-term, but the immediate impact is undeniable: container security best practices are now accessible to individual developers, startups, and small teams who couldn’t justify enterprise pricing.</p>
<p>The broader question isn’t whether to use hardened images — the security benefits are too significant to ignore. Rather, it’s about understanding the operational trade-offs and building workflows that embrace security-first principles without sacrificing development velocity.</p>
<h2 id="heading-making-the-switch"><strong>Making the Switch</strong></h2>
<p>If you’re considering hardened images, start with these steps:</p>
<ol>
<li><p><strong>Audit your current images</strong> — Run a scanner like Docker Scout to understand your current vulnerability exposure</p>
</li>
<li><p><strong>Start with one service</strong> — Don’t try to convert everything at once</p>
</li>
<li><p><strong>Adapt your debugging workflow</strong> — Invest in Docker Debug or alternative tools early</p>
</li>
<li><p><strong>Automate scanning</strong> — Make vulnerability scanning part of your CI/CD pipeline</p>
</li>
<li><p><strong>Document the differences</strong> — Your team needs to understand the constraints and workflows</p>
</li>
</ol>
<h2 id="heading-the-bigger-picture"><strong>The Bigger Picture</strong></h2>
<p>Docker Hardened Images represent a maturation of container security. We’re moving beyond “shift left” buzzwords toward practical, opinionated solutions that make secure defaults easy to adopt.</p>
<p>Whether this particular offering remains free indefinitely is secondary to the broader shift: security is becoming less of a premium feature and more of a baseline expectation. And that’s something worth celebrating.</p>
<p><em>The Docker Hardened Images catalog is available at</em> <a target="_blank" href="https://github.com/docker-hardened-images/catalog"><em>https://github.com/docker-hardened-images/catalog</em></a><em>. Enterprise information is available through Docker’s sales team.</em></p>
<p><em>What’s your experience with container security? Share your thoughts in the comments below.</em></p>
]]></content:encoded></item><item><title><![CDATA[Kubernetes v1.35 (Timbernetes): What’s New and What’s Changing]]></title><description><![CDATA[The Kubernetes project has just released version 1.35 on December 17, 2025, bringing significant enhancements, important deprecations, and a continued focus on stability and enterprise readiness. After 58 enhancements across the v1.34 release cycle, ...]]></description><link>https://devops-blog.ruicoelho.dev/kubernetes-v135-timbernetes-whats-new-and-whats-changing</link><guid isPermaLink="true">https://devops-blog.ruicoelho.dev/kubernetes-v135-timbernetes-whats-new-and-whats-changing</guid><category><![CDATA[Devops]]></category><category><![CDATA[SRE]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[gitops]]></category><category><![CDATA[upgrades]]></category><dc:creator><![CDATA[Rui Coelho]]></dc:creator><pubDate>Sun, 25 Jan 2026 12:43:08 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1769344673591/dceaf97d-40a8-478a-a693-bdc381568d4c.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The Kubernetes project has just released version 1.35 on December 17, 2025, bringing significant enhancements, important deprecations, and a continued focus on stability and enterprise readiness. After 58 enhancements across the v1.34 release cycle, the community continues to push the boundaries of container orchestration while maintaining the platform’s reliability and production-grade quality.</p>
<p>This release represents another milestone in Kubernetes’ evolution, with critical features graduating to general availability and legacy components being phased out to reduce technical debt. Let’s dive deep into what v1.35 brings to the table, with practical examples of how these changes will impact your day-to-day operations.</p>
<h2 id="heading-game-changing-features"><strong>Game-Changing Features</strong></h2>
<h3 id="heading-1-in-place-pod-resource-updates-finally-ga"><strong>1. In-Place Pod Resource Updates: Finally GA!</strong></h3>
<p>After years of development and testing (alpha in v1.27, beta in v1.33), the ability to update Pod resources without restarting containers is finally graduating to General Availability in v1.35. This is arguably one of the most requested features in Kubernetes history.</p>
<p><strong>The Problem It Solves</strong></p>
<p>Previously, if you needed to adjust CPU or memory allocations for a running Pod, your only option was to delete and recreate it. This caused disruption to:</p>
<ul>
<li><p>Stateful applications that maintain long-lived connections</p>
</li>
<li><p>Machine learning training jobs that couldn’t checkpoint their state</p>
</li>
<li><p>Database replicas that needed to resynchronize data</p>
</li>
<li><p>Any workload where downtime equals lost revenue</p>
</li>
</ul>
<p><strong>How It Works</strong></p>
<p>The feature allows you to modify the <code>resources.requests</code> and <code>resources.limits</code> for containers in a running Pod. Here's a practical example:</p>
<pre><code class="lang-yaml"><span class="hljs-comment"># Original Pod specification</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Pod</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">my-app</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">containers:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">app</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">my-app:1.0</span>
    <span class="hljs-attr">resources:</span>
      <span class="hljs-attr">requests:</span>
        <span class="hljs-attr">memory:</span> <span class="hljs-string">"512Mi"</span>
        <span class="hljs-attr">cpu:</span> <span class="hljs-string">"500m"</span>
      <span class="hljs-attr">limits:</span>
        <span class="hljs-attr">memory:</span> <span class="hljs-string">"1Gi"</span>
        <span class="hljs-attr">cpu:</span> <span class="hljs-string">"1000m"</span>
</code></pre>
<p>Now you can update it in place through the Pod’s <code>resize</code> subresource:</p>
<pre><code class="lang-bash">kubectl patch pod my-app --subresource resize --<span class="hljs-built_in">type</span>=<span class="hljs-string">'json'</span> -p=<span class="hljs-string">'[
  {
    "op": "replace",
    "path": "/spec/containers/0/resources/requests/memory",
    "value": "1Gi"
  },
  {
    "op": "replace",
    "path": "/spec/containers/0/resources/limits/memory",
    "value": "2Gi"
  }
]'</span>
</code></pre>
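<p>The container keeps running with its new resource allocation: no restart, no data loss, and no service interruption, provided the container’s <code>resizePolicy</code> does not require a restart for that resource and the node can accommodate the new request.</p>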
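<p>If you resize Pods often, the JSON patch can be generated rather than written by hand. The following is a minimal sketch, not an official tool: the function names <code>build_resize_patch</code> and <code>resize_pod</code> are hypothetical, and it assumes a cluster and a kubectl version that expose the Pod <code>resize</code> subresource.</p>
<pre><code class="lang-bash"># Hypothetical helper: print a JSON patch that sets the memory request and
# limit of one container (selected by index) in a Pod spec.
build_resize_patch() {
  printf '[{"op":"replace","path":"/spec/containers/%s/resources/requests/memory","value":"%s"},' "$1" "$2"
  printf '{"op":"replace","path":"/spec/containers/%s/resources/limits/memory","value":"%s"}]\n' "$1" "$3"
}

# Hypothetical wrapper: apply the patch to the first container through the
# resize subresource, then print the resources reported in the Pod status.
resize_pod() {
  kubectl patch pod "$1" --subresource resize --type=json \
    -p "$(build_resize_patch 0 "$2" "$3")" || return 1
  kubectl get pod "$1" -o jsonpath='{.status.containerStatuses[0].resources}'
}

# Usage (requires a running cluster):
#   resize_pod my-app 1Gi 2Gi
</code></pre>
<p>Printing the status afterwards is a quick way to confirm that the kubelet actually applied the new values rather than deferring the resize.</p>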
<p><strong>Real-World Use Case: E-commerce Flash Sale</strong></p>
<p>Imagine you’re running an e-commerce platform. During normal operations, your checkout service runs with 2GB of memory. But you’ve scheduled a flash sale, and you know traffic will spike 10x.</p>
<p>Before v1.35, you had two bad options:</p>
<ol>
<li><p>Over-provision all the time (expensive)</p>
</li>
<li><p>Scale out more replicas and accept some disruption during Pod replacements</p>
</li>
</ol>
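<p>With in-place updates, you resize the running Pods themselves through the <code>resize</code> subresource. Patching the Deployment’s Pod template would not help here, because that triggers a rolling update that replaces every Pod:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Resize the memory limit of every running checkout Pod in place</span>
<span class="hljs-comment"># (assumes the Pods carry the label app=checkout)</span>
resize_checkout() {
  <span class="hljs-keyword">for</span> pod <span class="hljs-keyword">in</span> $(kubectl get pods -l app=checkout -o name); <span class="hljs-keyword">do</span>
    kubectl patch <span class="hljs-string">"<span class="hljs-variable">$pod</span>"</span> --subresource resize --<span class="hljs-built_in">type</span>=<span class="hljs-string">'json'</span> \
      -p=<span class="hljs-string">"[{\"op\": \"replace\", \"path\": \"/spec/containers/0/resources/limits/memory\", \"value\": \"<span class="hljs-variable">$1</span>\"}]"</span>
  <span class="hljs-keyword">done</span>
}

<span class="hljs-comment"># Before the flash sale</span>
resize_checkout 8Gi

<span class="hljs-comment"># After the flash sale</span>
resize_checkout 2Gi
</code></pre>
<p>Your Pods scale vertically without any restart, maintaining all active shopping carts and sessions. Two caveats apply: Pods that the Deployment creates later still start with the template’s original values, and lowering a memory limit may not take effect while the container’s usage exceeds the new value.</p>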
<p><strong>Integration with VPA</strong></p>
<p>Combined with the Vertical Pod Autoscaler, this enables truly dynamic resource optimization:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">autoscaling.k8s.io/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">VerticalPodAutoscaler</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">my-app-vpa</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">targetRef:</span>
    <span class="hljs-attr">apiVersion:</span> <span class="hljs-string">"apps/v1"</span>
    <span class="hljs-attr">kind:</span> <span class="hljs-string">Deployment</span>
    <span class="hljs-attr">name:</span> <span class="hljs-string">my-app</span>
  <span class="hljs-attr">updatePolicy:</span>
    <span class="hljs-attr">updateMode:</span> <span class="hljs-string">"InPlaceOrRecreate"</span>  <span class="hljs-comment"># Resizes in place when possible, otherwise recreates the Pod; needs a VPA release that supports this mode</span>
  <span class="hljs-attr">resourcePolicy:</span>
    <span class="hljs-attr">containerPolicies:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">containerName:</span> <span class="hljs-string">app</span>
      <span class="hljs-attr">minAllowed:</span>
        <span class="hljs-attr">cpu:</span> <span class="hljs-string">100m</span>
        <span class="hljs-attr">memory:</span> <span class="hljs-string">256Mi</span>
      <span class="hljs-attr">maxAllowed:</span>
        <span class="hljs-attr">cpu:</span> <span class="hljs-number">2</span>
        <span class="hljs-attr">memory:</span> <span class="hljs-string">4Gi</span>
</code></pre>
<p>The VPA can now adjust resources without pod disruption, learning from actual usage patterns and optimizing costs automatically.</p>
<h3 id="heading-2-node-declared-features-solving-the-version-skew-problem"><strong>2. Node Declared Features: Solving the Version Skew Problem</strong></h3>
<p>One of the most challenging aspects of managing Kubernetes clusters at scale is handling version skew during upgrades. You upgrade your control plane to v1.35, but some worker nodes are still on v1.34 or even v1.33. What happens when you schedule a Pod that uses a v1.35 feature on a v1.34 node? Typically: failure.</p>
<p><strong>The Traditional Approach (Manual Labels)</strong></p>
<p>Until now, the solution was manual node labeling:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Manually label nodes that support new features</span>
kubectl label node worker-1 feature.kubernetes.io/in-place-resize=<span class="hljs-literal">true</span>
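kubectl label node worker-2 feature.kubernetes.io/pod-certificates=<span class="hljs-literal">true</span>
</code></pre>
<pre><code class="lang-yaml"><span class="hljs-comment"># Then use node selectors</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Pod</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">my-app</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">nodeSelector:</span>
    <span class="hljs-attr">feature.kubernetes.io/in-place-resize:</span> <span class="hljs-string">"true"</span>
  <span class="hljs-attr">containers:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">app</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">my-app:1.0</span>
</code></pre>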
<p>This is error-prone, doesn’t scale, and creates operational overhead.</p>
<p><strong>The v1.35 Solution: Automatic Feature Declaration</strong></p>
<p>With Node Declared Features (alpha), nodes automatically report their capabilities:</p>
<pre><code class="lang-bash">kubectl get node worker-1 -o jsonpath=<span class="hljs-string">'{.status.declaredFeatures}'</span>
</code></pre>
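<p>Illustrative output (the feature is alpha, so the exact shape of this field may differ in your release; KEP-5328 is the authoritative reference):</p>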
<pre><code class="lang-json">{
  <span class="hljs-attr">"InPlacePodVerticalScaling"</span>: {
    <span class="hljs-attr">"supported"</span>: <span class="hljs-literal">true</span>,
    <span class="hljs-attr">"kubernetesVersion"</span>: <span class="hljs-string">"1.35.0"</span>
  },
  <span class="hljs-attr">"PodCertificates"</span>: {
    <span class="hljs-attr">"supported"</span>: <span class="hljs-literal">true</span>,
    <span class="hljs-attr">"kubernetesVersion"</span>: <span class="hljs-string">"1.35.0"</span>
  },
  <span class="hljs-attr">"SidecarContainers"</span>: {
    <span class="hljs-attr">"supported"</span>: <span class="hljs-literal">false</span>,
    <span class="hljs-attr">"reason"</span>: <span class="hljs-string">"Kubelet version 1.34"</span>
  }
}
</code></pre>
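<p>The scheduler can then take this information into account when placing Pods. The annotation name below is illustrative; consult KEP-5328 for how feature requirements are expressed in your release:</p>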
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Pod</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">my-app</span>
  <span class="hljs-attr">annotations:</span>
    <span class="hljs-attr">scheduling.kubernetes.io/required-features:</span> <span class="hljs-string">"InPlacePodVerticalScaling,PodCertificates"</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">containers:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">app</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">my-app:1.0</span>
</code></pre>
<p>The scheduler ensures this Pod lands on a node that supports both features. No manual labeling required.</p>
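<p>The filtering idea itself is simple enough to model in a few lines of shell. This is a simplified sketch for reasoning about placement, not the scheduler’s implementation; the function name <code>node_supports</code> and the comma-separated input format are assumptions made for the example.</p>
<pre><code class="lang-bash"># Simplified model of feature-aware filtering (not the scheduler's code):
# succeed only if every required feature appears in the node's declared list.
# Both arguments are comma-separated feature names.
node_supports() {
  declared=",$1,"
  old_ifs="$IFS"
  IFS=','
  for feature in $2; do
    case "$declared" in
      *",$feature,"*) ;;                      # feature is declared, keep going
      *) IFS="$old_ifs"; return 1 ;;          # a required feature is missing
    esac
  done
  IFS="$old_ifs"
  return 0
}

# Usage:
#   node_supports "InPlacePodVerticalScaling,PodCertificates" "PodCertificates"
</code></pre>
<p>A node that declares a superset of the required features passes; a single missing feature is enough to rule it out, which is what makes mixed-version clusters safe to schedule on.</p>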
<p><strong>Real-World Scenario: Rolling Cluster Upgrade</strong></p>
<p>You’re upgrading a 100-node cluster from v1.34 to v1.35. With traditional approaches, you might:</p>
<ol>
<li><p>Upgrade all nodes and accept the disruption</p>
</li>
<li><p>Create two separate node pools and migrate workloads</p>
</li>
<li><p>Hope nothing breaks during the gradual rollout</p>
</li>
</ol>
<p>With Node Declared Features:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">apps/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Deployment</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">critical-app</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">replicas:</span> <span class="hljs-number">10</span>
  <span class="hljs-attr">template:</span>
    <span class="hljs-attr">metadata:</span>
      <span class="hljs-attr">annotations:</span>
        <span class="hljs-attr">scheduling.kubernetes.io/required-features:</span> <span class="hljs-string">"InPlacePodVerticalScaling"</span>
    <span class="hljs-attr">spec:</span>
      <span class="hljs-attr">containers:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">app</span>
        <span class="hljs-attr">image:</span> <span class="hljs-string">critical-app:2.0</span>
</code></pre>
<p>This Deployment automatically schedules only on upgraded nodes, while your other workloads continue running on v1.34 nodes. The upgrade becomes safer and more manageable.</p>
<h3 id="heading-3-pod-certificates-native-mtls-identity"><strong>3. Pod Certificates: Native mTLS Identity</strong></h3>
<p>Microservices security often requires mutual TLS (mTLS) for service-to-service authentication. Until now, implementing this meant:</p>
<ul>
<li><p>Installing SPIFFE/SPIRE (complex)</p>
</li>
<li><p>Using cert-manager with custom automation (brittle)</p>
</li>
<li><p>Integrating with service meshes like Istio (heavyweight)</p>
</li>
</ul>
<p><strong>The Native Kubernetes Solution</strong></p>
<p>Pod Certificates (graduating to beta in v1.35) provides built-in certificate management for workloads:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Pod</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">frontend</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">containers:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">app</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">frontend:1.0</span>
    <span class="hljs-attr">volumeMounts:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">certs</span>
      <span class="hljs-attr">mountPath:</span> <span class="hljs-string">/etc/certs</span>
      <span class="hljs-attr">readOnly:</span> <span class="hljs-literal">true</span>
  <span class="hljs-attr">volumes:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">certs</span>
    <span class="hljs-attr">projected:</span>
      <span class="hljs-attr">sources:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">serviceAccountToken:</span>
          <span class="hljs-attr">path:</span> <span class="hljs-string">token</span>
          <span class="hljs-attr">expirationSeconds:</span> <span class="hljs-number">3600</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">podCertificate:</span>
          <span class="hljs-attr">signerName:</span> <span class="hljs-string">example.com/cluster-ca</span>  <span class="hljs-comment"># hypothetical signer; use one that is deployed in your cluster</span>
          <span class="hljs-attr">keyType:</span> <span class="hljs-string">ECDSAP256</span>
          <span class="hljs-attr">keyPath:</span> <span class="hljs-string">tls.key</span>
          <span class="hljs-attr">certificateChainPath:</span> <span class="hljs-string">tls.crt</span>
          <span class="hljs-comment"># The signer, not the Pod spec, decides which names the certificate carries</span>
</code></pre>
<p>The kubelet automatically:</p>
<ol>
<li><p>Requests a certificate from the specified issuer</p>
</li>
<li><p>Mounts it in the Pod at <code>/etc/certs/tls.crt</code></p>
</li>
<li><p>Rotates it before expiration</p>
</li>
<li><p>Includes the private key at <code>/etc/certs/tls.key</code></p>
</li>
</ol>
<p><strong>Example: Securing Database Connections</strong></p>
<p>Here’s how you’d configure a PostgreSQL client to use Pod Certificates:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Pod</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">api-server</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">containers:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">api</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">my-api:1.0</span>
    <span class="hljs-attr">env:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">DB_HOST</span>
      <span class="hljs-attr">value:</span> <span class="hljs-string">postgres.default.svc.cluster.local</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">DB_SSLMODE</span>
      <span class="hljs-attr">value:</span> <span class="hljs-string">verify-full</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">DB_SSLCERT</span>
      <span class="hljs-attr">value:</span> <span class="hljs-string">/etc/certs/tls.crt</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">DB_SSLKEY</span>
      <span class="hljs-attr">value:</span> <span class="hljs-string">/etc/certs/tls.key</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">DB_SSLROOTCERT</span>
      <span class="hljs-attr">value:</span> <span class="hljs-string">/etc/certs/ca.crt</span>
    <span class="hljs-attr">volumeMounts:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">certs</span>
      <span class="hljs-attr">mountPath:</span> <span class="hljs-string">/etc/certs</span>
      <span class="hljs-attr">readOnly:</span> <span class="hljs-literal">true</span>
  <span class="hljs-attr">volumes:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">certs</span>
    <span class="hljs-attr">projected:</span>
      <span class="hljs-attr">sources:</span>
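      <span class="hljs-bullet">-</span> <span class="hljs-attr">podCertificate:</span>
          <span class="hljs-attr">signerName:</span> <span class="hljs-string">example.com/db-ca</span>  <span class="hljs-comment"># hypothetical signer trusted by the database</span>
          <span class="hljs-attr">keyType:</span> <span class="hljs-string">ECDSAP256</span>
          <span class="hljs-attr">keyPath:</span> <span class="hljs-string">tls.key</span>
          <span class="hljs-attr">certificateChainPath:</span> <span class="hljs-string">tls.crt</span>
      <span class="hljs-comment"># ca.crt is not produced by this source; supply the database CA separately,</span>
      <span class="hljs-comment"># for example from a ConfigMap or a clusterTrustBundle projection</span>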
</code></pre>
<p>Your application code doesn’t need to know about certificate rotation or management — it just reads from the mounted path, which Kubernetes keeps up-to-date automatically.</p>
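<p>One practical caveat: a process that loads its certificate once at startup will keep using the old one after rotation. As a hedged sketch of one way to handle that, the loop below polls the mounted file and runs a reload command when the contents change. The function names and the reload command are hypothetical placeholders, and many applications offer better native reload mechanisms.</p>
<pre><code class="lang-bash"># Print a fingerprint (checksum and size) that changes with the file contents.
file_fingerprint() {
  cksum "$1" | awk '{ print $1 "-" $2 }'
}

# Hypothetical reload loop: poll the certificate and run a reload command
# whenever its fingerprint changes. Arguments: path, command, poll interval.
watch_certificate() {
  cert_path="$1"
  reload_cmd="$2"
  interval="${3:-30}"
  last="$(file_fingerprint "$cert_path")"
  while sleep "$interval"; do
    current="$(file_fingerprint "$cert_path")"
    if [ "$current" != "$last" ]; then
      last="$current"
      $reload_cmd    # intentionally unquoted so the command and its arguments split
    fi
  done
}

# Usage (placeholder reload command):
#   watch_certificate /etc/certs/tls.crt "nginx -s reload" 30
</code></pre>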
<h3 id="heading-4-numeric-taints-and-tolerations-precision-scheduling"><strong>4. Numeric Taints and Tolerations: Precision Scheduling</strong></h3>
<p>The taints and tolerations system gains numeric comparison operators, introduced as an alpha feature in v1.35. This might sound like a small change, but it unlocks powerful new scheduling patterns.</p>
<p><strong>The Old Way: Binary Decisions</strong></p>
<p>Previously, a toleration could only match a taint exactly (<code>Equal</code>) or by key alone (<code>Exists</code>):</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Taint a node</span>
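kubectl taint nodes gpu-node-1 gpu=nvidia:NoSchedule
</code></pre>
<pre><code class="lang-yaml"><span class="hljs-comment"># Tolerate it (containers omitted for brevity)</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Pod</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">ml-training</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">tolerations:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">key:</span> <span class="hljs-string">gpu</span>
    <span class="hljs-attr">operator:</span> <span class="hljs-string">Equal</span>
    <span class="hljs-attr">value:</span> <span class="hljs-string">nvidia</span>
    <span class="hljs-attr">effect:</span> <span class="hljs-string">NoSchedule</span>
</code></pre>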
<p>This works for binary decisions, but what about expressing thresholds?</p>
<p><strong>The New Way: Numeric Comparisons</strong></p>
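<p>Now you can use <code>Gt</code> (greater than) and <code>Lt</code> (less than) to compare a taint’s value against a threshold. The values are compared as integers, and a toleration only permits scheduling onto tainted nodes; it does not keep the Pod away from untainted ones:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Taint nodes with their GPU memory</span>
kubectl taint nodes gpu-node-1 gpu-memory=8:NoSchedule
kubectl taint nodes gpu-node-2 gpu-memory=16:NoSchedule
kubectl taint nodes gpu-node-3 gpu-memory=32:NoSchedule
</code></pre>
<pre><code class="lang-yaml"><span class="hljs-comment"># Tolerate only gpu-memory taints above 15, i.e. nodes with at least 16GB</span>
<span class="hljs-comment"># (containers omitted for brevity)</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Pod</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">large-model-training</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">tolerations:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">key:</span> <span class="hljs-string">gpu-memory</span>
    <span class="hljs-attr">operator:</span> <span class="hljs-string">Gt</span>
    <span class="hljs-attr">value:</span> <span class="hljs-string">"15"</span>
    <span class="hljs-attr">effect:</span> <span class="hljs-string">NoSchedule</span>
</code></pre>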
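<p>To reason about which taints a given toleration would match, the comparison can be modeled in shell. This is a simplified sketch, not the scheduler’s implementation. It assumes, as the examples in this article do, that <code>Gt</code> matches when the taint value is greater than the toleration value, and that non-integer values never match. The function names are hypothetical.</p>
<pre><code class="lang-bash"># Succeed only for non-empty strings made of digits.
is_integer() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Usage: toleration_matches OPERATOR TAINT_VALUE TOLERATION_VALUE
# Assumed semantics: Gt matches when the taint value is greater than the
# toleration value, Lt when it is smaller.
toleration_matches() {
  is_integer "$2" || return 1
  is_integer "$3" || return 1
  case "$1" in
    Gt) [ "$2" -gt "$3" ] ;;
    Lt) [ "$2" -lt "$3" ] ;;
    *) return 1 ;;
  esac
}

# Usage: check the three GPU nodes against the "Gt 15" toleration
#   for mem in 8 16 32; do
#     if toleration_matches Gt "$mem" 15; then echo "gpu-memory=$mem tolerated"; fi
#   done
</code></pre>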
<p><strong>Real-World Use Case: Network Bandwidth Requirements</strong></p>
<p>Imagine you’re running a video streaming service. Some workloads need high bandwidth:</p>
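<pre><code class="lang-bash"><span class="hljs-comment"># Taint nodes with their network bandwidth (in Gbps)</span>
kubectl taint nodes worker-1 network-bandwidth=1:NoSchedule
kubectl taint nodes worker-2 network-bandwidth=10:NoSchedule
kubectl taint nodes worker-3 network-bandwidth=25:NoSchedule
</code></pre>
<pre><code class="lang-yaml"><span class="hljs-comment"># 4K streaming needs at least 10Gbps</span>
<span class="hljs-comment"># (selector and Pod labels omitted for brevity)</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">apps/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Deployment</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">stream-4k</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">template:</span>
    <span class="hljs-attr">spec:</span>
      <span class="hljs-attr">tolerations:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">key:</span> <span class="hljs-string">network-bandwidth</span>
        <span class="hljs-attr">operator:</span> <span class="hljs-string">Gt</span>
        <span class="hljs-attr">value:</span> <span class="hljs-string">"9"</span>
        <span class="hljs-attr">effect:</span> <span class="hljs-string">NoSchedule</span>
      <span class="hljs-attr">containers:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">streamer</span>
        <span class="hljs-attr">image:</span> <span class="hljs-string">video-streamer:4k</span>
<span class="hljs-meta">---</span>
<span class="hljs-comment"># Standard definition tolerates every bandwidth taint</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">apps/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Deployment</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">stream-sd</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">template:</span>
    <span class="hljs-attr">spec:</span>
      <span class="hljs-attr">tolerations:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">key:</span> <span class="hljs-string">network-bandwidth</span>
        <span class="hljs-attr">operator:</span> <span class="hljs-string">Gt</span>
        <span class="hljs-attr">value:</span> <span class="hljs-string">"0"</span>
        <span class="hljs-attr">effect:</span> <span class="hljs-string">NoSchedule</span>
      <span class="hljs-attr">containers:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">streamer</span>
        <span class="hljs-attr">image:</span> <span class="hljs-string">video-streamer:sd</span>
</code></pre>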
<p><strong>SLA-Based Scheduling</strong></p>
<p>Another powerful use case is SLA-based scheduling:</p>
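<pre><code class="lang-bash"><span class="hljs-comment"># Nodes with different reliability SLAs, expressed in hundredths of a percent</span>
<span class="hljs-comment"># (99.9% = 9990) because the numeric operators compare integers</span>
kubectl taint nodes worker-spot-1 availability-sla=9500:NoSchedule
kubectl taint nodes worker-ondemand-1 availability-sla=9990:NoSchedule
kubectl taint nodes worker-reserved-1 availability-sla=9999:NoSchedule
</code></pre>
<pre><code class="lang-yaml"><span class="hljs-comment"># Critical workload requires 99.9% uptime: tolerate only SLAs above 99.5%</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Pod</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">payment-processor</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">tolerations:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">key:</span> <span class="hljs-string">availability-sla</span>
    <span class="hljs-attr">operator:</span> <span class="hljs-string">Gt</span>
    <span class="hljs-attr">value:</span> <span class="hljs-string">"9950"</span>
    <span class="hljs-attr">effect:</span> <span class="hljs-string">NoSchedule</span>
  <span class="hljs-attr">containers:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">processor</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">payment-processor:1.0</span>
</code></pre>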
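<p>SLA percentages are usually fractional, while the numeric operators are understood here to compare integers, so the values need scaling before they are used as taint values. A minimal sketch of that conversion, using a hypothetical helper name and hundredths of a percent as the unit:</p>
<pre><code class="lang-bash"># Convert a percentage such as 99.9 into an integer number of hundredths of
# a percent (basis points), rounding to the nearest whole value.
sla_to_basis_points() {
  awk -v pct="$1" 'BEGIN { printf "%d\n", pct * 100 + 0.5 }'
}

# Usage:
#   kubectl taint nodes worker-ondemand-1 \
#     "availability-sla=$(sla_to_basis_points 99.9):NoSchedule"
</code></pre>
<p>Whichever scale you choose, use the same one for every taint and toleration that shares the key, otherwise the comparisons are meaningless.</p>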
<h2 id="heading-critical-deprecations-what-you-need-to-do"><strong>Critical Deprecations: What You Need to Do</strong></h2>
<h3 id="heading-1-farewell-to-cgroup-v1"><strong>1. Farewell to cgroup v1</strong></h3>
<p>Kubernetes v1.35 drops support for cgroup v1 on Linux nodes. This is a significant change that requires action.</p>
<p><strong>Check Your Nodes</strong></p>
<pre><code class="lang-bash"><span class="hljs-comment"># Check if your nodes are using cgroup v2</span>
<span class="hljs-keyword">for</span> node <span class="hljs-keyword">in</span> $(kubectl get nodes -o name); <span class="hljs-keyword">do</span>
  <span class="hljs-built_in">echo</span> <span class="hljs-string">"Checking <span class="hljs-variable">$node</span>"</span>
  kubectl debug <span class="hljs-variable">$node</span> -it --image=ubuntu -- sh -c <span class="hljs-string">'
    if [ "$(stat -fc %T /sys/fs/cgroup/)" = "cgroup2fs" ]; then
      echo "Using cgroup v2 ✓"
    else
      echo "Using cgroup v1 ✗ - REQUIRES MIGRATION"
    fi
  '</span>
<span class="hljs-keyword">done</span>
</code></pre>
<p><strong>Migration Path</strong></p>
<p>If you find nodes using cgroup v1:</p>
<p>For Ubuntu/Debian:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Enable cgroup v2</span>
sudo sed -i <span class="hljs-string">'s/GRUB_CMDLINE_LINUX=""/GRUB_CMDLINE_LINUX="systemd.unified_cgroup_hierarchy=1"/'</span> /etc/default/grub
sudo update-grub
sudo reboot
</code></pre>
<p>For RHEL/CentOS:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Enable cgroup v2</span>
sudo grubby --update-kernel=ALL --args=<span class="hljs-string">"systemd.unified_cgroup_hierarchy=1"</span>
sudo reboot
</code></pre>
<p>Verify after reboot:</p>
<pre><code class="lang-bash">mount | grep cgroup2
<span class="hljs-comment"># Should show: cgroup2 on /sys/fs/cgroup type cgroup2</span>
</code></pre>
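<p>The same check can be scripted on the node itself. The filesystem type of <code>/sys/fs/cgroup</code> is <code>cgroup2fs</code> on cgroup v2 and <code>tmpfs</code> on cgroup v1. The sketch below wraps that mapping in two hypothetical helper functions and assumes a Linux host with GNU <code>stat</code>:</p>
<pre><code class="lang-bash"># Map the filesystem type of /sys/fs/cgroup to a cgroup version.
cgroup_version_from_fstype() {
  case "$1" in
    cgroup2fs) echo "v2" ;;
    tmpfs)     echo "v1" ;;
    *)         echo "unknown" ;;
  esac
}

# Detect the version on the current host (Linux with GNU stat).
detect_cgroup_version() {
  cgroup_version_from_fstype "$(stat -fc %T /sys/fs/cgroup/)"
}

# Usage:
#   detect_cgroup_version
</code></pre>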
<h3 id="heading-2-migrating-from-ipvs-to-nftables"><strong>2. Migrating from IPVS to nftables</strong></h3>
<p>kube-proxy’s IPVS mode is deprecated as of v1.35, so if you rely on it, it’s time to plan a migration to nftables mode.</p>
<p><strong>Check Current Mode</strong></p>
<pre><code class="lang-bash">kubectl get configmap kube-proxy -n kube-system -o yaml | grep mode
</code></pre>
<p><strong>Migration Steps</strong></p>
<p>Update kube-proxy ConfigMap:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">ConfigMap</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">kube-proxy</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">kube-system</span>
<span class="hljs-attr">data:</span>
  <span class="hljs-attr">config.conf:</span> <span class="hljs-string">|
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    kind: KubeProxyConfiguration
    mode: "nftables"  # Changed from "ipvs"
    # ... rest of config</span>
</code></pre>
<p>Restart kube-proxy:</p>
<pre><code class="lang-bash">kubectl rollout restart daemonset kube-proxy -n kube-system
</code></pre>
<p>Verify the change:</p>
<pre><code class="lang-bash">kubectl logs -n kube-system -l k8s-app=kube-proxy | grep <span class="hljs-string">"Using nftables"</span>
</code></pre>
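<p>Grepping logs works, but you can also read the configured mode straight from the configuration. The sketch below extracts the top-level <code>mode</code> field from a <code>KubeProxyConfiguration</code> on standard input. The function name is hypothetical, and the usage line assumes the kubeadm-style ConfigMap layout shown above, with the configuration stored under the <code>config.conf</code> key.</p>
<pre><code class="lang-bash"># Read a KubeProxyConfiguration from stdin and print its proxy mode.
# An empty or missing mode means kube-proxy picks its platform default.
proxy_mode_from_config() {
  awk '$1 == "mode:" { gsub(/"/, "", $2); print ($2 == "" ? "default" : $2); found = 1; exit }
       END { if (!found) print "default" }'
}

# Usage (requires cluster access):
#   kubectl get configmap kube-proxy -n kube-system \
#     -o jsonpath='{.data.config\.conf}' | proxy_mode_from_config
</code></pre>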
<p><strong>Performance Comparison</strong></p>
<p>In benchmarks, nftables shows:</p>
<ul>
<li><p>30% better throughput for new connection establishment</p>
</li>
<li><p>50% lower memory usage for large services (&gt;10,000 endpoints)</p>
</li>
<li><p>Better integration with modern Linux kernels</p>
</li>
</ul>
<h3 id="heading-3-containerd-v2-upgrade"><strong>3. Containerd v2 Upgrade</strong></h3>
<p>Kubernetes v1.35 is the last release that supports containerd 1.x, so check which version your nodes run:</p>
<pre><code class="lang-bash">kubectl get nodes -o custom-columns=NAME:.metadata.name,CONTAINER-RUNTIME:.status.nodeInfo.containerRuntimeVersion
</code></pre>
<p>If you see containerd 1.x, upgrade to containerd 2.0 or later:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># For Ubuntu/Debian</span>
sudo apt-get update
sudo apt-get install containerd.io=2.0.*

<span class="hljs-comment"># For RHEL/CentOS</span>
sudo yum update containerd.io-2.0.*

<span class="hljs-comment"># Restart containerd</span>
sudo systemctl restart containerd
</code></pre>
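<p>On larger clusters it helps to flag the affected nodes automatically. The following sketch parses the <code>containerRuntimeVersion</code> string that the node reports, for example <code>containerd://1.7.27</code>. The function name is hypothetical, and runtimes other than containerd are simply ignored.</p>
<pre><code class="lang-bash"># Succeed (exit 0) when a containerRuntimeVersion string such as
# "containerd://1.7.27" reports a containerd major version below 2.
containerd_needs_upgrade() {
  case "$1" in
    containerd://*) ;;
    *) return 1 ;;               # not containerd, nothing to flag here
  esac
  version="${1#containerd://}"   # strip the scheme prefix
  version="${version#v}"         # tolerate an optional leading "v"
  major="${version%%.*}"         # keep only the major component
  [ "$major" -lt 2 ]
}

# Usage (requires cluster access):
#   kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.nodeInfo.containerRuntimeVersion}{"\n"}{end}' |
#     while read -r name runtime; do
#       if containerd_needs_upgrade "$runtime"; then echo "$name needs containerd 2.x"; fi
#     done
</code></pre>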
<p><strong>Monitor with Prometheus</strong></p>
<p>Add this alert to catch unsupported versions:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">monitoring.coreos.com/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">PrometheusRule</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">containerd-version-alert</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">groups:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">containerd</span>
    <span class="hljs-attr">rules:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">alert:</span> <span class="hljs-string">ContainerdVersionUnsupported</span>
      <span class="hljs-attr">expr:</span> <span class="hljs-string">kubelet_cri_losing_support</span> <span class="hljs-string">==</span> <span class="hljs-number">1</span>
      <span class="hljs-attr">for:</span> <span class="hljs-string">24h</span>
      <span class="hljs-attr">annotations:</span>
        <span class="hljs-attr">summary:</span> <span class="hljs-string">"Node <span class="hljs-template-variable">{{ $labels.node }}</span> running unsupported containerd version"</span>
        <span class="hljs-attr">description:</span> <span class="hljs-string">"Upgrade to containerd 2.0+ - v1.35 is the last version supporting containerd 1.x"</span>
</code></pre>
<h2 id="heading-preparing-for-the-upgrade"><strong>Preparing for the Upgrade</strong></h2>
<h3 id="heading-pre-upgrade-checklist"><strong>Pre-Upgrade Checklist</strong></h3>
<p><strong>Audit Your Infrastructure</strong></p>
<pre><code class="lang-bash"><span class="hljs-comment"># Create a pre-upgrade report</span>
kubectl get nodes -o json | jq <span class="hljs-string">'.items[] | {
  name: .metadata.name,
  kubelet: .status.nodeInfo.kubeletVersion,
  container_runtime: .status.nodeInfo.containerRuntimeVersion,
  os: .status.nodeInfo.osImage
}'</span> &gt; pre-upgrade-report.json
</code></pre>
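<p>The kubelet versions in that report matter because Kubernetes supports kubelets up to three minor versions older than the API server, and never newer. A rough sketch of that check, using hypothetical helper names and version strings in the <code>v1.35.0</code> form that the report contains:</p>
<pre><code class="lang-bash"># Print the minor component of a version string such as "v1.35.0".
minor_of() {
  rest="${1#v}"        # drop the leading "v"
  rest="${rest#*.}"    # drop the major component
  echo "${rest%%.*}"   # keep the minor component
}

# Print how many minor versions the kubelet ($2) lags the control plane ($1).
kubelet_skew() {
  echo $(( $(minor_of "$1") - $(minor_of "$2") ))
}

# Succeed when the kubelet is at most three minors older and not newer.
skew_is_supported() {
  skew="$(kubelet_skew "$1" "$2")"
  [ "$skew" -ge 0 ] || return 1
  [ "$skew" -le 3 ]
}

# Usage:
#   if skew_is_supported v1.35.0 v1.33.2; then echo "skew ok"; fi
</code></pre>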
<p><strong>Test in Staging</strong></p>
<pre><code class="lang-bash"><span class="hljs-comment"># Deploy v1.35 to a test cluster</span>
kind create cluster --name k8s-135-test --image kindest/node:v1.35.0

<span class="hljs-comment"># Run your test suite</span>
kubectl apply -f test-workloads/
</code></pre>
<p><strong>Backup everything</strong></p>
<pre><code class="lang-bash"><span class="hljs-comment"># Export common workload resources ("get all" skips ConfigMaps, Secrets, CRDs and other types)</span>
kubectl get all --all-namespaces -o yaml &gt; backup-all-resources.yaml

<span class="hljs-comment"># Backup etcd</span>
ETCDCTL_API=3 etcdctl snapshot save snapshot.db
</code></pre>
<h3 id="heading-rolling-upgrade-strategy"><strong>Rolling Upgrade Strategy</strong></h3>
<pre><code class="lang-yaml"><span class="hljs-comment"># Use PodDisruptionBudgets to control disruption</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">policy/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">PodDisruptionBudget</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">critical-app-pdb</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">minAvailable:</span> <span class="hljs-number">80</span><span class="hljs-string">%</span>
  <span class="hljs-attr">selector:</span>
    <span class="hljs-attr">matchLabels:</span>
      <span class="hljs-attr">app:</span> <span class="hljs-string">critical-app</span>
</code></pre>
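<p>Before draining, it is worth working out how many Pods such a budget actually lets you evict at once. Percentages in a PodDisruptionBudget are rounded up to the next whole Pod, so small replica counts can leave no room at all. A minimal sketch of the arithmetic, with a hypothetical function name and the assumption that every replica is currently healthy:</p>
<pre><code class="lang-bash"># Usage: pdb_allowed_disruptions HEALTHY_REPLICAS MIN_AVAILABLE_PERCENT
# The minAvailable percentage is rounded up to the next whole Pod.
pdb_allowed_disruptions() {
  min_available=$(( ($1 * $2 + 99) / 100 ))
  echo $(( $1 - min_available ))
}

# Usage:
#   pdb_allowed_disruptions 10 80   # 10 replicas with minAvailable: 80%
</code></pre>
<p>With 10 replicas and 80% the budget allows two simultaneous evictions; with 4 replicas it allows none, and a drain will block until you scale up or relax the budget.</p>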
<pre><code class="lang-bash"><span class="hljs-meta">#!/bin/bash</span>
<span class="hljs-comment"># Drain nodes carefully, one at a time</span>
<span class="hljs-keyword">for</span> node <span class="hljs-keyword">in</span> $(kubectl get nodes -o name); <span class="hljs-keyword">do</span>
  <span class="hljs-built_in">echo</span> <span class="hljs-string">"Upgrading <span class="hljs-variable">$node</span>"</span>

  <span class="hljs-comment"># Cordon the node</span>
  kubectl cordon <span class="hljs-string">"<span class="hljs-variable">$node</span>"</span>

  <span class="hljs-comment"># Drain with grace period</span>
  kubectl drain <span class="hljs-string">"<span class="hljs-variable">$node</span>"</span> --ignore-daemonsets --delete-emptydir-data --grace-period=300

  <span class="hljs-comment"># Upgrade the node (method depends on your setup)</span>
  <span class="hljs-comment"># ... node upgrade commands ...</span>

  <span class="hljs-comment"># Wait for the node to report Ready again</span>
  kubectl <span class="hljs-built_in">wait</span> --<span class="hljs-keyword">for</span>=condition=Ready <span class="hljs-string">"<span class="hljs-variable">$node</span>"</span> --timeout=600s

  <span class="hljs-comment"># Uncordon only once the node is Ready</span>
  kubectl uncordon <span class="hljs-string">"<span class="hljs-variable">$node</span>"</span>

  <span class="hljs-comment"># Give workloads time to settle before moving to the next node</span>
  sleep 60
<span class="hljs-keyword">done</span>
</code></pre>
<h2 id="heading-what-this-means-for-different-teams"><strong>What This Means for Different Teams</strong></h2>
<h3 id="heading-for-platform-engineers"><strong>For Platform Engineers</strong></h3>
<p><strong>Immediate Actions:</strong></p>
<ol>
<li><p>Test in-place resource updates with your VPA setup</p>
</li>
<li><p>Evaluate Pod Certificates for replacing external cert management</p>
</li>
<li><p>Plan cgroup v2 migration for all nodes</p>
</li>
</ol>
<p><strong>Opportunities:</strong></p>
<ul>
<li><p>Reduce costs by 20–30% through better resource optimization</p>
</li>
<li><p>Simplify security architecture with native workload identity</p>
</li>
<li><p>Improve upgrade procedures with node declared features</p>
</li>
</ul>
<h3 id="heading-for-security-teams"><strong>For Security Teams</strong></h3>
<p><strong>New Capabilities:</strong></p>
<ol>
<li><p>Native workload identity reduces external dependencies</p>
</li>
<li><p>Better audit trails with automatic certificate rotation logs</p>
</li>
<li><p>Improved compliance through standardized mTLS</p>
</li>
</ol>
<p><strong>Security Considerations:</strong></p>
<ul>
<li><p>Review certificate issuer configurations</p>
</li>
<li><p>Audit node feature declarations for security-sensitive workloads</p>
</li>
<li><p>Update security policies to leverage numeric taints for compliance zones</p>
</li>
</ul>
<h3 id="heading-for-sres-and-operators"><strong>For SREs and Operators</strong></h3>
<p><strong>Operational Improvements:</strong></p>
<ol>
<li><p>Vertical scaling without disruption reduces incident response time</p>
</li>
<li><p>Safer cluster upgrades with automatic feature detection</p>
</li>
<li><p>Better capacity planning with numeric taints</p>
</li>
</ol>
<p><strong>Monitoring:</strong></p>
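<pre><code class="lang-yaml"><span class="hljs-comment"># Sketch of Prometheus rules for the new features; verify that these metric names exist in your kubelet's /metrics output before relying on them</span>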
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">monitoring.coreos.com/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">PrometheusRule</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">k8s-135-monitoring</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">groups:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">k8s-135-features</span>
    <span class="hljs-attr">rules:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">record:</span> <span class="hljs-string">kubelet:pod_resource_update_total</span>
      <span class="hljs-attr">expr:</span> <span class="hljs-string">sum(rate(kubelet_pod_resource_update_total[5m]))</span> <span class="hljs-string">by</span> <span class="hljs-string">(node)</span>

    <span class="hljs-bullet">-</span> <span class="hljs-attr">record:</span> <span class="hljs-string">kubelet:declared_features</span>
      <span class="hljs-attr">expr:</span> <span class="hljs-string">count(kubelet_declared_features)</span> <span class="hljs-string">by</span> <span class="hljs-string">(feature)</span>

    <span class="hljs-bullet">-</span> <span class="hljs-attr">alert:</span> <span class="hljs-string">NodeMissingRequiredFeature</span>
      <span class="hljs-attr">expr:</span> <span class="hljs-string">kube_pod_status_scheduled{node!=""}</span> 
        <span class="hljs-string">and</span> <span class="hljs-string">on(node)</span> <span class="hljs-string">kubelet_declared_features{feature="InPlacePodVerticalScaling"}</span> <span class="hljs-string">==</span> <span class="hljs-number">0</span>
      <span class="hljs-attr">annotations:</span>
        <span class="hljs-attr">summary:</span> <span class="hljs-string">"Pod scheduled on node without required features"</span>
</code></pre>
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>Kubernetes v1.35 represents a significant leap forward in production-readiness and operational excellence. The graduation of in-place Pod resource updates to GA alone is a game-changer for cost optimization and service reliability. Combined with native workload identity through Pod Certificates and smarter scheduling with Node Declared Features, this release provides the tools needed to run modern applications at scale with confidence.</p>
<p>The deprecations, while requiring some work, eliminate technical debt and point toward a more maintainable future. The removal of cgroup v1 support, IPVS mode deprecation, and containerd v1 sunset are all steps toward a cleaner, more efficient Kubernetes codebase.</p>
<p>As you prepare for the December 17th release, use the examples and migration guides in this article to plan your upgrade strategy. Test thoroughly in staging, monitor the new metrics, and take advantage of the 14-month support window to migrate at your own pace.</p>
<p>The Kubernetes community continues to demonstrate its commitment to both innovation and stability — a rare combination in the fast-moving cloud native ecosystem. Version 1.35 is proof that mature open source projects can deliver cutting-edge features while maintaining backward compatibility and providing clear upgrade paths.</p>
<p><img src="https://miro.medium.com/v2/resize:fit:700/0*5fCRu_N2ATGrW8QA.png" alt class="image--center mx-auto" /></p>
<p><strong>Resources:</strong></p>
<ul>
<li><p><a target="_blank" href="https://kubernetes.io/blog/2025/12/17/kubernetes-v1-35-release/">Official Kubernetes 1.35 Release Notes</a></p>
</li>
<li><p><a target="_blank" href="https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/1287-in-place-update-pod-resources">KEP-1287</a>: In-Place Update of Pod Resources</p>
</li>
<li><p><a target="_blank" href="https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/5328-node-declared-features">KEP-5328</a>: Node Declared Features</p>
</li>
<li><p><a target="_blank" href="https://github.com/kubernetes/enhancements/tree/master/keps/sig-auth/4317-pod-certificates">KEP-4317</a>: Pod Certificates</p>
</li>
<li><p>Migration Guides: <a target="_blank" href="http://kubernetes.io/docs/tasks/administer-cluster">kubernetes.io/docs/tasks/administer-cluster</a></p>
</li>
</ul>
<p><em>Ready to upgrade? Join the conversation in the Kubernetes Slack #release-management channel and share your v1.35 experiences with the community.</em></p>
]]></content:encoded></item><item><title><![CDATA[Helm Best Practices 2025: What Changed with Helm 4 and What You Should Know]]></title><description><![CDATA[Helm 4.0 just dropped at KubeCon — here’s everything DevOps engineers need to know about the biggest changes in 6 years.
After six years since Helm 3, the Kubernetes package manager just got its biggest update. Helm 4.0 was released at KubeCon North ...]]></description><link>https://devops-blog.ruicoelho.dev/helm-best-practices-2025-what-changed-with-helm-4-and-what-you-should-know</link><guid isPermaLink="true">https://devops-blog.ruicoelho.dev/helm-best-practices-2025-what-changed-with-helm-4-and-what-you-should-know</guid><category><![CDATA[Helm]]></category><category><![CDATA[Devops]]></category><category><![CDATA[SRE]]></category><category><![CDATA[gitops]]></category><category><![CDATA[HelmCharts]]></category><category><![CDATA[ArgoCD]]></category><dc:creator><![CDATA[Rui Coelho]]></dc:creator><pubDate>Tue, 18 Nov 2025 17:41:57 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1763487735412/eb12c3a0-28e7-43fa-9316-2a168ccd9707.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Helm 4.0 just dropped at KubeCon — here’s everything DevOps engineers need to know about the biggest changes in 6 years.</em></p>
<p>Six years after Helm 3, the Kubernetes package manager has received its biggest update. Helm 4.0 was released at KubeCon North America 2025 (November 10–13), bringing significant architectural changes, new features, and updated best practices that every DevOps engineer needs to understand.</p>
<p>If you’re managing Helm charts in production, this isn’t just another minor update — it’s a fundamental shift in how Helm handles deployments. But don’t panic: the team has focused heavily on maintaining chart compatibility while modernizing the underlying architecture.</p>
<p>In this guide, I’ll walk you through what’s new, what’s changed, the best practices you need to adopt, and how to migrate safely.</p>
<h2 id="heading-whats-new-in-helm-40"><strong>What’s New in Helm 4.0</strong></h2>
<h3 id="heading-the-big-changes"><strong>The Big Changes</strong></h3>
<p>Helm 4.0 represents the first major version bump since 2019. Here are the headline features:</p>
<p><strong>1. Server-Side Apply (SSA) is Now Default</strong></p>
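<p>The biggest architectural change: Helm 4 uses Server-Side Apply by default for new installations instead of the client-side three-way merge. This is the same mechanism behind <code>kubectl apply --server-side</code>, where the API server merges changes and tracks which manager owns each field.</p>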
<p><strong>What this means:</strong></p>
<ul>
<li><p>Better conflict detection and handling</p>
</li>
<li><p>Clearer ownership of fields</p>
</li>
<li><p>More predictable upgrade behavior</p>
</li>
<li><p>Explicit errors instead of silent overwrites</p>
</li>
</ul>
<p><strong>2. Completely Redesigned Plugin System</strong></p>
<p>The plugin architecture got a complete overhaul with support for:</p>
<ul>
<li><p>CLI plugins (command extensions)</p>
</li>
<li><p>Getter plugins (custom download protocols)</p>
</li>
<li><p>Post-renderer plugins (template modifications)</p>
</li>
<li><p><strong>WebAssembly (WASM) runtime</strong> for cross-platform plugins</p>
</li>
</ul>
<p><strong>3. Advanced Resource Status Monitoring</strong></p>
<p>Helm now uses <code>kstatus</code> for intelligent resource watching:</p>
<ul>
<li><p>Waits for actual readiness, not just pod creation</p>
</li>
<li><p>Better understanding of resource conditions</p>
</li>
<li><p>Smarter timeout handling</p>
</li>
<li><p>Improved debugging when things fail</p>
</li>
</ul>
<p><strong>4. OCI Install by Digest</strong></p>
<p>Enhanced OCI registry support with digest-based installs for:</p>
<ul>
<li><p>Immutable chart references</p>
</li>
<li><p>Better supply chain security</p>
</li>
<li><p>Precise version control</p>
</li>
</ul>
<p><strong>5. Chart v3 Support</strong></p>
<p>New chart API version with:</p>
<ul>
<li><p>Backwards compatibility for v2 charts</p>
</li>
<li><p>Better dependency management</p>
</li>
<li><p>Enhanced metadata support</p>
</li>
</ul>
<h2 id="heading-breaking-changes-you-need-to-know"><strong>Breaking Changes You Need to Know</strong></h2>
<h3 id="heading-1-plugin-migration-required"><strong>1. Plugin Migration Required</strong></h3>
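<p><strong>Existing plugins need to be checked against Helm 4.</strong> The HIP-0026 plugin redesign adds new plugin types and an optional WebAssembly runtime, and post-renderers are now implemented as plugins, so anything that hooks into Helm should be verified before you upgrade. The layouts below are illustrative:</p>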
<pre><code class="lang-bash"><span class="hljs-comment"># Old plugin structure (Helm 3)</span>
my-plugin/
  ├── plugin.yaml
  └── plugin.sh

<span class="hljs-comment"># New plugin structure (Helm 4)</span>
my-plugin/
  ├── plugin.yaml
  ├── main.wasm (or binary)
  └── metadata.json
</code></pre>
<p><strong>Action required:</strong></p>
<ul>
<li><p>Audit your plugin usage: <code>helm plugin list</code></p>
</li>
<li><p>Check with plugin maintainers for Helm 4 compatibility</p>
</li>
<li><p>Test plugins in staging before upgrading production</p>
</li>
</ul>
<h3 id="heading-2-cli-flag-renaming"><strong>2. CLI Flag Renaming</strong></h3>
<p>Several CLI flags have been renamed for consistency. The old names are the ones to search for in your scripts:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Helm 3</span>
helm upgrade my-app ./chart --atomic --force

<span class="hljs-comment"># Helm 4</span>
helm upgrade my-app ./chart --rollback-on-failure --force-replace
</code></pre>
<p><strong>Action required:</strong></p>
<ul>
<li><p>Update CI/CD scripts</p>
</li>
<li><p>Search codebase for hardcoded Helm commands</p>
</li>
<li><p>Update documentation</p>
</li>
</ul>
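<p>To find affected scripts, a small grep wrapper is usually enough. This is a minimal sketch, assuming the renames documented for Helm 4 (<code>--atomic</code> to <code>--rollback-on-failure</code>, <code>--force</code> to <code>--force-replace</code>); the function name <code>find_renamed_flags</code> is my own and not part of Helm:</p>

```bash
#!/usr/bin/env bash
# find_renamed_flags DIR
# Hypothetical helper: prints file:line:text for every use of a Helm 3 flag
# that was renamed in Helm 4 (--atomic, --force). Assumed flag list; extend as needed.
find_renamed_flags() {
  local dir="$1"
  # The trailing group requires whitespace, "=" or end of line after the flag,
  # so --force-conflicts and --force-replace are not reported.
  grep -rnE -e '--(atomic|force)([[:space:]=]|$)' "$dir"
}
```

<p>Calling <code>find_renamed_flags .github/workflows</code> from a repository root would list each line that needs updating. grep exits with status 1 when nothing matches, which a CI step can use to decide whether any work remains.</p>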
<h3 id="heading-3-package-restructuring-sdk-users"><strong>3. Package Restructuring (SDK Users)</strong></h3>
<p>If you’re using Helm as a Go library, packages have been reorganized:</p>
<pre><code class="lang-go"><span class="hljs-comment">// Helm 3</span>
<span class="hljs-keyword">import</span> <span class="hljs-string">"helm.sh/helm/v3/pkg/chart"</span>

<span class="hljs-comment">// Helm 4</span>
<span class="hljs-keyword">import</span> <span class="hljs-string">"helm.sh/helm/v4/pkg/chart/v2"</span>
</code></pre>
<p><strong>Action required:</strong></p>
<ul>
<li><p>Update import paths</p>
</li>
<li><p>Test integrations thoroughly</p>
</li>
<li><p>Review API changes in documentation</p>
</li>
</ul>
<h3 id="heading-4-server-side-apply-conflicts"><strong>4. Server-Side Apply Conflicts</strong></h3>
<p>With SSA as default, conflicts are now <strong>explicit errors</strong> rather than silent overwrites:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># This will now error if another controller owns the field</span>
helm upgrade my-app ./chart

<span class="hljs-comment"># Use --force-conflicts to override (use carefully!)</span>
helm upgrade my-app ./chart --force-conflicts
</code></pre>
<p><strong>Important limitations:</strong></p>
<ul>
<li><p>❌ Multiple owners per manifest not supported</p>
</li>
<li><p>❌ Field ownership transfer not supported</p>
</li>
<li><p>✅ Backwards compatible with three-way merge charts (if K8s &gt;= 1.22)</p>
</li>
</ul>
<h2 id="heading-helm-4-best-practices-updated-for-2025"><strong>Helm 4 Best Practices: Updated for 2025</strong></h2>
<h3 id="heading-1-embrace-server-side-apply"><strong>1. Embrace Server-Side Apply</strong></h3>
<p><strong>Why it matters:</strong> SSA provides clearer semantics and better conflict handling.</p>
<p><strong>Best Practice:</strong></p>
<pre><code class="lang-yaml"><span class="hljs-comment"># In your chart templates, document ownership explicitly</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">ConfigMap</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">my-config</span>
  <span class="hljs-attr">annotations:</span>
    <span class="hljs-comment"># Informational only: Helm does not read this annotation; SSA tracks ownership in managedFields</span>
    <span class="hljs-attr">meta.helm.sh/owner:</span> <span class="hljs-string">"my-team"</span>
<span class="hljs-attr">data:</span>
  <span class="hljs-attr">config.yaml:</span> <span class="hljs-string">|</span>
    <span class="hljs-attr">setting:</span> <span class="hljs-string">value</span>
</code></pre>
<p><strong>What to avoid:</strong></p>
<ul>
<li><p>Don’t rely on undocumented merge behavior</p>
</li>
<li><p>SSA is more strict about field ownership</p>
</li>
</ul>
<p><strong>Migration tip:</strong> Test upgrades in staging with <code>--dry-run</code> first to catch conflicts early.</p>
<h3 id="heading-2-use-digest-based-oci-installs"><strong>2. Use Digest-Based OCI Installs</strong></h3>
<p><strong>Why it matters:</strong> Ensures immutable deployments and better security.</p>
<p><strong>Best Practice:</strong></p>
<pre><code class="lang-bash"><span class="hljs-comment"># Pin to specific digest, not tag</span>
helm install my-app oci://registry.example.com/charts/my-app@sha256:abc123...

<span class="hljs-comment"># Avoid mutable tags in production</span>
<span class="hljs-comment"># BAD: helm install my-app oci://registry.example.com/charts/my-app:latest</span>
</code></pre>
<p><strong>In your CI/CD:</strong></p>
<pre><code class="lang-yaml"><span class="hljs-comment"># GitHub Actions example</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Install</span> <span class="hljs-string">chart</span> <span class="hljs-string">by</span> <span class="hljs-string">digest</span>
  <span class="hljs-attr">run:</span> <span class="hljs-string">|
    # helm pull reports the resolved digest on a "Digest: sha256:..." line
    DIGEST=$(helm pull oci://registry/chart:${{ github.sha }} 2&gt;&amp;1 | awk '/^Digest:/ {print $2}')
    helm install app oci://registry/chart@$DIGEST</span>
</code></pre>
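<p>A pipeline can also refuse anything that is not pinned. The check below is a sketch under my own assumptions: the function name <code>is_digest_pinned</code> is hypothetical, and it only accepts <code>sha256</code> digests written as 64 lowercase hex characters:</p>

```bash
#!/usr/bin/env bash
# is_digest_pinned REF
# Hypothetical helper: succeeds only if REF is an oci:// chart reference
# that ends in @sha256: followed by exactly 64 lowercase hex characters.
is_digest_pinned() {
  printf '%s\n' "$1" | grep -Eq '^oci://[^@[:space:]]+@sha256:[0-9a-f]{64}$'
}

# Example gate: CHART_REF is an assumed variable set earlier in the pipeline.
# is_digest_pinned "$CHART_REF" || { echo "chart reference is not digest-pinned"; exit 1; }
```

<p>Tag-based references such as <code>:latest</code> and truncated digests both fail the check, so a mutable reference cannot slip into a production deploy unnoticed.</p>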
<h3 id="heading-3-leverage-advanced-status-monitoring"><strong>3. Leverage Advanced Status Monitoring</strong></h3>
<p><strong>Why it matters:</strong> Helm 4’s <code>kstatus</code> understands actual readiness.</p>
<p><strong>Best Practice:</strong></p>
<pre><code class="lang-bash"><span class="hljs-comment"># Wait for real readiness, not just pod creation</span>
helm install my-app ./chart \
  --<span class="hljs-built_in">wait</span> \
  --timeout 10m

<span class="hljs-comment"># Use specific status checks</span>
helm install my-app ./chart \
  --<span class="hljs-built_in">wait</span> \
  --wait-for-jobs \
  --rollback-on-failure  <span class="hljs-comment"># Roll back on failure (this flag was named --atomic in Helm 3)</span>
</code></pre>
<p><strong>In your charts:</strong></p>
<pre><code class="lang-yaml"><span class="hljs-comment"># Define readiness properly</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">apps/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Deployment</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">template:</span>
    <span class="hljs-attr">spec:</span>
      <span class="hljs-attr">containers:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">app</span>
        <span class="hljs-attr">readinessProbe:</span>
          <span class="hljs-attr">httpGet:</span>
            <span class="hljs-attr">path:</span> <span class="hljs-string">/health</span>
            <span class="hljs-attr">port:</span> <span class="hljs-number">8080</span>
          <span class="hljs-attr">initialDelaySeconds:</span> <span class="hljs-number">10</span>
          <span class="hljs-attr">periodSeconds:</span> <span class="hljs-number">5</span>
        <span class="hljs-attr">livenessProbe:</span>
          <span class="hljs-attr">httpGet:</span>
            <span class="hljs-attr">path:</span> <span class="hljs-string">/health</span>
            <span class="hljs-attr">port:</span> <span class="hljs-number">8080</span>
          <span class="hljs-attr">initialDelaySeconds:</span> <span class="hljs-number">30</span>
          <span class="hljs-attr">periodSeconds:</span> <span class="hljs-number">10</span>
</code></pre>
<h3 id="heading-4-chart-v3-structure-your-charts-properly"><strong>4. Chart v3: Structure Your Charts Properly</strong></h3>
<p><strong>Why it matters:</strong> Better organization and maintainability.</p>
<p><strong>Best Practice:</strong></p>
<pre><code class="lang-bash">my-chart/
├── Chart.yaml          <span class="hljs-comment"># Use apiVersion: v3</span>
├── values.yaml
├── values.schema.json  <span class="hljs-comment"># JSON Schema validation</span>
├── templates/
│   ├── _helpers.tpl    <span class="hljs-comment"># Template functions</span>
│   ├── deployment.yaml
│   ├── service.yaml
│   └── NOTES.txt       <span class="hljs-comment"># User guidance</span>
├── charts/             <span class="hljs-comment"># Dependencies</span>
└── crds/              <span class="hljs-comment"># Custom Resource Definitions</span>
</code></pre>
<p><strong>Chart.yaml v3:</strong></p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v3</span>  <span class="hljs-comment"># New in Helm 4</span>
<span class="hljs-attr">name:</span> <span class="hljs-string">my-app</span>
<span class="hljs-attr">version:</span> <span class="hljs-number">1.0</span><span class="hljs-number">.0</span>
<span class="hljs-attr">dependencies:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">postgresql</span>
    <span class="hljs-attr">version:</span> <span class="hljs-string">"^12.0.0"</span>
    <span class="hljs-attr">repository:</span> <span class="hljs-string">"https://charts.bitnami.com/bitnami"</span>
    <span class="hljs-attr">condition:</span> <span class="hljs-string">postgresql.enabled</span>
</code></pre>
<h3 id="heading-5-use-multi-document-values-files"><strong>5. Use Multi-Document Values Files</strong></h3>
<p><strong>Why it matters:</strong> Better organization of complex configurations.</p>
<p><strong>Best Practice:</strong></p>
<pre><code class="lang-yaml"><span class="hljs-comment"># values.yaml can now contain multiple documents</span>
<span class="hljs-meta">---</span>
<span class="hljs-comment"># Global configuration</span>
<span class="hljs-attr">global:</span>
  <span class="hljs-attr">domain:</span> <span class="hljs-string">example.com</span>
<span class="hljs-meta">---</span>
<span class="hljs-comment"># Environment-specific overrides</span>
<span class="hljs-attr">env:</span>
  <span class="hljs-attr">production:</span>
    <span class="hljs-attr">replicas:</span> <span class="hljs-number">3</span>
  <span class="hljs-attr">staging:</span>
    <span class="hljs-attr">replicas:</span> <span class="hljs-number">1</span>
</code></pre>
<p><strong>Install with specific environment:</strong></p>
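<pre><code class="lang-bash"><span class="hljs-comment"># Select the environment through its own key (the name "environment" is illustrative);</span>
<span class="hljs-comment"># --set env=production would replace the whole env map with a string</span>
helm install my-app ./chart \
  --values values.yaml \
  --<span class="hljs-built_in">set</span> environment=production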
</code></pre>
<h3 id="heading-6-implement-proper-secret-management"><strong>6. Implement Proper Secret Management</strong></h3>
<p><strong>Why it matters:</strong> Security is non-negotiable.</p>
<p><strong>Best Practice — Use External Secrets Operator:</strong></p>
<pre><code class="lang-yaml"><span class="hljs-comment"># Don't put secrets in values.yaml</span>
<span class="hljs-comment"># Use External Secrets Operator instead</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">external-secrets.io/v1beta1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">ExternalSecret</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">app-secrets</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">refreshInterval:</span> <span class="hljs-string">1h</span>
  <span class="hljs-attr">secretStoreRef:</span>
    <span class="hljs-attr">name:</span> <span class="hljs-string">vault-backend</span>
    <span class="hljs-attr">kind:</span> <span class="hljs-string">SecretStore</span>
  <span class="hljs-attr">target:</span>
    <span class="hljs-attr">name:</span> <span class="hljs-string">app-secrets</span>
  <span class="hljs-attr">data:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">secretKey:</span> <span class="hljs-string">db-password</span>
    <span class="hljs-attr">remoteRef:</span>
      <span class="hljs-attr">key:</span> <span class="hljs-string">/secret/data/app</span>
      <span class="hljs-attr">property:</span> <span class="hljs-string">db_password</span>
</code></pre>
<p><strong>Or use Sealed Secrets:</strong></p>
<pre><code class="lang-bash"><span class="hljs-comment"># Encrypt secrets before committing</span>
kubeseal --format yaml &lt; secret.yaml &gt; sealed-secret.yaml

<span class="hljs-comment"># Include sealed-secret.yaml in your chart</span>
</code></pre>
<p><strong>What to avoid:</strong></p>
<pre><code class="lang-yaml"><span class="hljs-comment"># NEVER do this</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Secret</span>
<span class="hljs-attr">data:</span>
  <span class="hljs-attr">password:</span> <span class="hljs-string">cGFzc3dvcmQxMjM=</span>  <span class="hljs-comment"># Base64 is not encryption!</span>
</code></pre>
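<p>To see why, decode the value from the manifest above with the standard <code>base64</code> tool. Nothing here is specific to Helm, and the function name is only for illustration:</p>

```bash
#!/usr/bin/env bash
# decode_secret_value VALUE
# Illustrative helper: base64-decodes a value copied from a Secret's data field.
# No key is involved, which is the point: anyone who can read the manifest can do this.
decode_secret_value() {
  printf '%s' "$1" | base64 --decode
}

decode_secret_value 'cGFzc3dvcmQxMjM='   # prints: password123
```

<p>If a value like this lands in Git, treat the credential as leaked and rotate it.</p>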
<h3 id="heading-7-validate-charts-before-deployment"><strong>7. Validate Charts Before Deployment</strong></h3>
<p><strong>Why it matters:</strong> Catch errors before they hit production.</p>
<p><strong>Best Practice:</strong></p>
<pre><code class="lang-bash"><span class="hljs-comment"># 1. Lint the chart</span>
helm lint ./my-chart

<span class="hljs-comment"># 2. Template and validate</span>
helm template my-app ./my-chart \
  --values values-prod.yaml \
  --validate

<span class="hljs-comment"># 3. Dry-run install</span>
helm install my-app ./my-chart \
  --values values-prod.yaml \
  --dry-run \
  --debug

<span class="hljs-comment"># 4. Use external validators (kubeval is no longer maintained; kubeconform is its suggested replacement)</span>
helm template my-app ./my-chart | kubeconform -strict -summary
</code></pre>
<p><strong>In your CI/CD:</strong></p>
<pre><code class="lang-yaml"><span class="hljs-comment"># GitHub Actions example</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Validate</span> <span class="hljs-string">Helm</span> <span class="hljs-string">Chart</span>
  <span class="hljs-attr">run:</span> <span class="hljs-string">|
    helm lint charts/*
    helm template test charts/my-app | kubeconform -strict</span>
</code></pre>
<h3 id="heading-8-version-control-your-chart-dependencies"><strong>8. Version Control Your Chart Dependencies</strong></h3>
<p><strong>Why it matters:</strong> Reproducible builds.</p>
<p><strong>Best Practice:</strong></p>
<pre><code class="lang-yaml"><span class="hljs-comment"># Chart.yaml - Pin dependency versions</span>
<span class="hljs-attr">dependencies:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">redis</span>
    <span class="hljs-attr">version:</span> <span class="hljs-string">"17.11.3"</span>  <span class="hljs-comment"># Exact version, not range</span>
    <span class="hljs-attr">repository:</span> <span class="hljs-string">"https://charts.bitnami.com/bitnami"</span>
</code></pre>
<p><strong>Lock dependencies:</strong></p>
<pre><code class="lang-bash"><span class="hljs-comment"># Generate Chart.lock</span>
helm dependency update ./my-chart

<span class="hljs-comment"># Commit Chart.lock to version control</span>
git add Chart.lock
git commit -m <span class="hljs-string">"Lock chart dependencies"</span>
</code></pre>
<p><strong>What to avoid:</strong></p>
<pre><code class="lang-yaml"><span class="hljs-comment"># Don't use version ranges in production</span>
<span class="hljs-attr">dependencies:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">redis</span>
    <span class="hljs-attr">version:</span> <span class="hljs-string">"^17.0.0"</span>  <span class="hljs-comment"># Matches any 17.x.y release, so the resolved version can drift between builds</span>
</code></pre>
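<p>Pinning can be enforced rather than left to convention. This is a rough sketch with a hypothetical function name, <code>is_exact_version</code>; it treats only a single <code>MAJOR.MINOR.PATCH</code> version (with optional pre-release or build suffix) as exact and rejects everything else:</p>

```bash
#!/usr/bin/env bash
# is_exact_version VERSION
# Hypothetical helper: succeeds only for a single semantic version such as 17.11.3
# (optionally with -prerelease and +build suffixes). Ranges like ^17.0.0, ~17.11 or 17.x fail.
is_exact_version() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$'
}
```

<p>In CI you could call it for each dependency version read from <code>Chart.yaml</code> and fail the job on the first range it finds.</p>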
<h3 id="heading-9-use-helm-test-for-validation"><strong>9. Use Helm Test for Validation</strong></h3>
<p><strong>Why it matters:</strong> Verify deployments actually work.</p>
<p><strong>Best Practice:</strong></p>
<pre><code class="lang-yaml"><span class="hljs-comment"># templates/tests/connection-test.yaml</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Pod</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">"<span class="hljs-template-variable">{{ include "my-app.fullname" . }}</span>-test-connection"</span>
  <span class="hljs-attr">annotations:</span>
    <span class="hljs-attr">"helm.sh/hook":</span> <span class="hljs-string">test</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">containers:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">wget</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">busybox</span>
    <span class="hljs-attr">command:</span> [<span class="hljs-string">'wget'</span>]
    <span class="hljs-attr">args:</span> [<span class="hljs-string">'<span class="hljs-template-variable">{{ include "my-app.fullname" . }}</span>:<span class="hljs-template-variable">{{ .Values.service.port }}</span>'</span>]
  <span class="hljs-attr">restartPolicy:</span> <span class="hljs-string">Never</span>
</code></pre>
<p><strong>Run tests:</strong></p>
<pre><code class="lang-bash"><span class="hljs-comment"># After installation</span>
helm <span class="hljs-built_in">test</span> my-app

<span class="hljs-comment"># With verbose output</span>
helm <span class="hljs-built_in">test</span> my-app --logs
</code></pre>
<h3 id="heading-10-document-your-charts-properly"><strong>10. Document Your Charts Properly</strong></h3>
<p><strong>Why it matters:</strong> Future you (and your team) will thank you.</p>
<p><strong>Best Practice:</strong></p>
<pre><code class="lang-yaml"><span class="hljs-comment"># Chart.yaml - Complete metadata</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v2</span>
<span class="hljs-attr">name:</span> <span class="hljs-string">my-app</span>
<span class="hljs-attr">description:</span> <span class="hljs-string">A</span> <span class="hljs-string">production-ready</span> <span class="hljs-string">application</span> <span class="hljs-string">chart</span>
<span class="hljs-attr">type:</span> <span class="hljs-string">application</span>
<span class="hljs-attr">version:</span> <span class="hljs-number">1.0</span><span class="hljs-number">.0</span>
<span class="hljs-attr">appVersion:</span> <span class="hljs-string">"2.3.0"</span>
<span class="hljs-attr">keywords:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">web</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">api</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">production</span>
<span class="hljs-attr">home:</span> <span class="hljs-string">https://github.com/myorg/my-app</span>
<span class="hljs-attr">sources:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">https://github.com/myorg/my-app</span>
<span class="hljs-attr">maintainers:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Your</span> <span class="hljs-string">Name</span>
    <span class="hljs-attr">email:</span> <span class="hljs-string">your.email@example.com</span>
</code></pre>
<p><strong>README.md template:</strong></p>
<pre><code class="lang-markdown"><span class="hljs-section"># My App Helm Chart</span>

<span class="hljs-section">## Prerequisites</span>
<span class="hljs-bullet">-</span> Kubernetes 1.27+
<span class="hljs-bullet">-</span> Helm 4.0+

<span class="hljs-section">## Installation</span>
```bash
helm install my-app ./my-app
```

<span class="hljs-section">## Configuration</span>
| Parameter | Description | Default |
|-----------|-------------|---------|
| <span class="hljs-code">`replicaCount`</span> | Number of replicas | <span class="hljs-code">`1`</span> |
| <span class="hljs-code">`image.repository`</span> | Image repository | <span class="hljs-code">`myapp`</span> |

<span class="hljs-section">## Examples</span>
<span class="hljs-section">### Development</span>
```bash
helm install my-app ./my-app -f values-dev.yaml
```
<span class="hljs-section">### Production</span>
```bash
helm install my-app ./my-app -f values-prod.yaml
```
</code></pre>
<h2 id="heading-migration-guide-helm-3-to-helm-4"><strong>Migration Guide: Helm 3 to Helm 4</strong></h2>
<h3 id="heading-step-1-prepare-your-environment"><strong>Step 1: Prepare Your Environment</strong></h3>
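<pre><code class="lang-bash"><span class="hljs-comment"># Install Helm 4 (the installer writes a helm binary to its install directory, so it replaces a Helm 3 binary at the same path rather than sitting alongside it)</span>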
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-4 | bash

<span class="hljs-comment"># Verify installation</span>
helm version

<span class="hljs-comment"># Check existing releases</span>
helm list --all-namespaces
</code></pre>
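<p>Scripts that have to run under both versions during the migration can branch on the major version. The sketch below assumes <code>helm version --short</code> prints a string that starts with <code>v</code>, the major number, and a dot; the function name <code>helm_major_version</code> is my own:</p>

```bash
#!/usr/bin/env bash
# helm_major_version VERSION_STRING
# Hypothetical helper: extracts the major version from a string such as "v4.0.0+gabc1234"
# (the shape printed by `helm version --short`; the hash here is made up).
helm_major_version() {
  printf '%s\n' "$1" | sed -E 's/^v?([0-9]+)\..*$/\1/'
}

# Usage inside a script (requires helm on PATH):
# if [ "$(helm_major_version "$(helm version --short)")" = "4" ]; then
#   echo "running Helm 4"
# fi
```

<p>Keeping the parsing in a function that takes a string makes it easy to test without Helm installed.</p>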
<h3 id="heading-step-2-test-charts-in-staging"><strong>Step 2: Test Charts in Staging</strong></h3>
<pre><code class="lang-bash"><span class="hljs-comment"># Template your charts with Helm 4</span>
helm template my-app ./chart \
  --values values-staging.yaml \
  --debug

<span class="hljs-comment"># Dry-run upgrade</span>
helm upgrade my-app ./chart \
  --values values-staging.yaml \
  --dry-run \
  --debug
</code></pre>
<h3 id="heading-step-3-check-for-conflicts"><strong>Step 3: Check for Conflicts</strong></h3>
<pre><code class="lang-bash"><span class="hljs-comment"># Upgrade with conflict detection</span>
helm upgrade my-app ./chart \
  --values values-staging.yaml

<span class="hljs-comment"># If conflicts occur, inspect them</span>
kubectl get &lt;resource&gt; &lt;name&gt; -o yaml --show-managed-fields

<span class="hljs-comment"># Force if necessary (carefully!)</span>
helm upgrade my-app ./chart \
  --values values-staging.yaml \
  --force-conflicts
</code></pre>
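<p>The managed-fields output is verbose, so it helps to reduce it to the list of managers that have written to the object. This is a text-based sketch, not a YAML parser; the function name <code>list_field_managers</code> is hypothetical, and it assumes each entry carries its <code>manager:</code> key on its own line, as <code>kubectl</code> prints it:</p>

```bash
#!/usr/bin/env bash
# list_field_managers
# Hypothetical helper: reads YAML produced with --show-managed-fields on stdin
# and prints the distinct values of each managedFields entry's "manager" key.
list_field_managers() {
  sed -n 's/^[[:space:]]*-\{0,1\}[[:space:]]*manager:[[:space:]]*//p' | sort -u
}

# Usage (requires kubectl and cluster access):
# kubectl get deployment my-app -o yaml --show-managed-fields | list_field_managers
```

<p>If the list shows a manager other than Helm, that controller is the likely source of the conflict, and you can decide whether Helm or that controller should own the field before reaching for <code>--force-conflicts</code>.</p>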
<h3 id="heading-step-4-update-plugins"><strong>Step 4: Update Plugins</strong></h3>
<pre><code class="lang-bash"><span class="hljs-comment"># List current plugins</span>
helm plugin list

<span class="hljs-comment"># Check for Helm 4 compatibility</span>
<span class="hljs-comment"># Visit plugin repos for updates</span>
<span class="hljs-comment"># Update plugins</span>
helm plugin update &lt;plugin-name&gt;
</code></pre>
<h3 id="heading-step-5-update-cicd-pipelines"><strong>Step 5: Update CI/CD Pipelines</strong></h3>
<pre><code class="lang-yaml"><span class="hljs-comment"># Before (Helm 3)</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Deploy</span> <span class="hljs-string">with</span> <span class="hljs-string">Helm</span>
  <span class="hljs-attr">run:</span> <span class="hljs-string">|
    helm upgrade --install my-app ./chart \
      --atomic \
      --timeout 5m
</span>
<span class="hljs-comment"># After (Helm 4)</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Deploy</span> <span class="hljs-string">with</span> <span class="hljs-string">Helm</span>
  <span class="hljs-attr">run:</span> <span class="hljs-string">|
    helm upgrade --install my-app ./chart \
      --rollback-on-failure \
      --timeout 5m</span>
</code></pre>
<h3 id="heading-step-6-monitor-the-migration"><strong>Step 6: Monitor the Migration</strong></h3>
<pre><code class="lang-bash"><span class="hljs-comment"># Check release history</span>
helm <span class="hljs-built_in">history</span> my-app

<span class="hljs-comment"># Verify resources</span>
kubectl get all -l app=my-app
<span class="hljs-comment"># Check field ownership recorded by SSA (managedFields are hidden unless requested)</span>
kubectl get deployment my-app -o yaml --show-managed-fields | grep -A 5 <span class="hljs-string">"managedFields"</span>
</code></pre>
<h2 id="heading-troubleshooting-common-issues"><strong>Troubleshooting Common Issues</strong></h2>
<h3 id="heading-issue-1-conflict-errors-after-upgrade"><strong>Issue 1: Conflict Errors After Upgrade</strong></h3>
<p><strong>Symptom:</strong></p>
<pre><code class="lang-bash">Error: UPGRADE FAILED: another controller owns this field
</code></pre>
<p><strong>Solution:</strong></p>
<pre><code class="lang-bash"><span class="hljs-comment"># Option 1: Use --force-conflicts (understand implications first!)</span>
helm upgrade my-app ./chart --force-conflicts

<span class="hljs-comment"># Option 2: Identify and remove conflicting controller</span>
kubectl get &lt;resource&gt; &lt;name&gt; -o yaml --show-managed-fields

<span class="hljs-comment"># Option 3: Revert to client-side apply (three-way merge) temporarily</span>
helm upgrade my-app ./chart --server-side=false
</code></pre>
<h3 id="heading-issue-2-plugin-not-working"><strong>Issue 2: Plugin Not Working</strong></h3>
<p><strong>Symptom:</strong></p>
<pre><code class="lang-bash">Error: plugin <span class="hljs-string">"xyz"</span> failed: <span class="hljs-built_in">exec</span> format error
</code></pre>
<p><strong>Solution:</strong></p>
<pre><code class="lang-bash"><span class="hljs-comment"># Check plugin compatibility</span>
helm plugin list

<span class="hljs-comment"># Update to Helm 4 compatible version</span>
helm plugin update xyz

<span class="hljs-comment"># Or install WASM version if available</span>
helm plugin install https://github.com/author/plugin-wasm
</code></pre>
<h3 id="heading-issue-3-chart-templates-failing"><strong>Issue 3: Chart Templates Failing</strong></h3>
<p><strong>Symptom:</strong></p>
<pre><code class="lang-bash">Error: template: chart/templates/deployment.yaml: undefined variable
</code></pre>
<p><strong>Solution:</strong></p>
<pre><code class="lang-bash"><span class="hljs-comment"># Validate with debug output</span>
helm template my-app ./chart --debug

<span class="hljs-comment"># Check for deprecated functions</span>
<span class="hljs-comment"># Some template functions may have changed</span>
<span class="hljs-comment"># Update to Chart API v3 if needed</span>
<span class="hljs-comment"># Edit Chart.yaml: apiVersion: v3</span>
</code></pre>
<h3 id="heading-issue-4-oci-registry-authentication-fails"><strong>Issue 4: OCI Registry Authentication Fails</strong></h3>
<p><strong>Symptom:</strong></p>
<pre><code class="lang-bash">Error: failed to authorize: failed to fetch anonymous token
</code></pre>
<p><strong>Solution:</strong></p>
<pre><code class="lang-bash"><span class="hljs-comment"># Login to registry</span>
helm registry login registry.example.com \
  --username your-user

<span class="hljs-comment"># Or point Helm at an existing registry credentials file</span>
<span class="hljs-built_in">export</span> HELM_REGISTRY_CONFIG=/path/to/config.json

<span class="hljs-comment"># Verify</span>
helm pull oci://registry.example.com/chart
</code></pre>
<h2 id="heading-performance-improvements-in-helm-4"><strong>Performance Improvements in Helm 4</strong></h2>
<p>Helm 4 isn’t just about features — it’s also faster:</p>
<p><strong>Benchmarks (approximate):</strong></p>
<ul>
<li><p>Chart installation: ~15% faster</p>
</li>
<li><p>Template rendering: ~20% faster for complex charts</p>
</li>
<li><p>Dependency resolution: ~30% faster with content-based caching</p>
</li>
</ul>
<p><strong>What makes it faster:</strong></p>
<ul>
<li><p>Content-based caching for charts</p>
</li>
<li><p>Optimized dependency resolution</p>
</li>
<li><p>Parallel resource watching</p>
</li>
<li><p>Better memory management</p>
</li>
</ul>
<h2 id="heading-whats-coming-next"><strong>What’s Coming Next</strong></h2>
<p>Helm 4’s release schedule:</p>
<ul>
<li><p><strong>Helm 4.0.0</strong>: November 2025 (KubeCon NA)</p>
</li>
<li><p><strong>Helm 4.1.0</strong>: January 2026</p>
</li>
<li><p><strong>Minor releases</strong>: Every 4 months</p>
</li>
</ul>
<p><strong>Helm 3 End of Life:</strong></p>
<ul>
<li><p>Helm 3 continues to receive bug fixes until <strong>July 2026</strong></p>
</li>
<li><p>Security fixes for Helm 3 continue until <strong>November 2026</strong></p>
</li>
<li><p>Action: Plan your migration accordingly</p>
</li>
</ul>
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>Helm 4.0 represents a significant evolution while maintaining the backwards compatibility that makes migration manageable. The shift to Server-Side Apply, redesigned plugin system, and enhanced status monitoring make Helm more robust and production-ready.</p>
<p><strong>Key Takeaways:</strong></p>
<p>✅ <strong>Server-Side Apply is the new default</strong> — Better conflict handling<br />✅ <strong>Plugin system redesigned</strong> — WASM support, better security<br />✅ <strong>Advanced status monitoring</strong> — True readiness detection<br />✅ <strong>OCI improvements</strong> — Digest-based installs for security<br />✅ <strong>Chart v3 support</strong> — Better dependency management</p>
<p><strong>Action Items:</strong></p>
<p><strong>1. Immediate:</strong></p>
<ul>
<li><p>Test your charts with Helm 4 in staging</p>
</li>
<li><p>Audit plugin usage</p>
</li>
<li><p>Update CI/CD scripts for renamed flags</p>
</li>
</ul>
<p><strong>2. Short-term (1–2 months):</strong></p>
<ul>
<li><p>Migrate production deployments</p>
</li>
<li><p>Update chart documentation</p>
</li>
<li><p>Train team on new features</p>
</li>
</ul>
<p><strong>3. Long-term (3–6 months):</strong></p>
<ul>
<li><p>Adopt SSA best practices fully</p>
</li>
<li><p>Migrate to Chart v3</p>
</li>
<li><p>Update internal tooling and scripts</p>
</li>
</ul>
<p><strong>Getting Started:</strong></p>
<pre><code class="lang-bash"><span class="hljs-comment"># Install Helm 4</span>
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-4 | bash

<span class="hljs-comment"># Test your first chart</span>
helm template my-app ./chart --debug

<span class="hljs-comment"># Deploy when ready</span>
helm upgrade --install my-app ./chart --<span class="hljs-built_in">wait</span>
</code></pre>
<p>Helm 4 is production-ready and brings meaningful improvements. Start testing today, and plan your migration timeline. The Helm community has done an excellent job ensuring backwards compatibility while modernizing the tooling.</p>
<h2 id="heading-additional-resources"><strong>Additional Resources</strong></h2>
<ul>
<li><p><a target="_blank" href="https://helm.sh/docs/next/">Helm 4 Official Documentation</a></p>
</li>
<li><p><a target="_blank" href="https://kubernetes.io/docs/reference/using-api/server-side-apply/">Server-Side Apply in Kubernetes</a></p>
</li>
<li><p><a target="_blank" href="https://github.com/helm/helm">Helm GitHub Repository</a></p>
</li>
</ul>
<p><em>Have you upgraded to Helm 4 yet? What’s your experience been? Share in the comments!</em></p>
]]></content:encoded></item><item><title><![CDATA[Lightweight Kubernetes for DevOps Testing: A Practical Guide to Colima]]></title><description><![CDATA[Test your Kubernetes deployments locally without the overhead — a hands-on guide for DevOps engineers.
As a DevOps engineer, you know the drill: you need to test a Helm chart, validate some YAML manifests, or experiment with a new Kubernetes feature....]]></description><link>https://devops-blog.ruicoelho.dev/lightweight-kubernetes-for-devops-testing-a-practical-guide-to-colima</link><guid isPermaLink="true">https://devops-blog.ruicoelho.dev/lightweight-kubernetes-for-devops-testing-a-practical-guide-to-colima</guid><category><![CDATA[colima]]></category><category><![CDATA[Docker]]></category><category><![CDATA[Devops]]></category><category><![CDATA[SRE]]></category><category><![CDATA[local development]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[k8s]]></category><category><![CDATA[k3s]]></category><category><![CDATA[minikube]]></category><dc:creator><![CDATA[Rui Coelho]]></dc:creator><pubDate>Tue, 18 Nov 2025 17:39:15 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1763487461001/8cbbfadf-44fe-4285-bca8-a7083295ccc4.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Test your Kubernetes deployments locally without the overhead — a hands-on guide for DevOps engineers.</em></p>
<p>As a DevOps engineer, you know the drill: you need to test a Helm chart, validate some YAML manifests, or experiment with a new Kubernetes feature. But spinning up a cloud cluster for every test is expensive and slow, and your current local setup is… let’s say “temperamental.”</p>
<p>What if you could have a lightweight, disposable Kubernetes cluster on your laptop that starts in seconds, uses minimal resources, and doesn’t require licensing fees?</p>
<p>Enter Colima — an open-source tool that gives you Docker and Kubernetes locally with almost zero configuration. In this guide, I’ll show you exactly how to set it up and use it effectively for real DevOps testing scenarios.</p>
<h2 id="heading-the-problem-with-local-kubernetes-development"><strong>The Problem with Local Kubernetes Development</strong></h2>
<p>Let’s be honest: testing Kubernetes deployments locally has always been challenging.</p>
<p><strong>The Common Pain Points:</strong></p>
<ul>
<li><p>Setting up a local Kubernetes cluster takes time and effort</p>
</li>
<li><p>Docker Desktop requires licensing for commercial use in larger companies</p>
</li>
<li><p>Cloud-based testing gets expensive quickly</p>
</li>
<li><p>You need to test changes before pushing to staging or production</p>
</li>
<li><p>Different projects often need different cluster configurations</p>
</li>
</ul>
<p><strong>What DevOps Engineers Actually Need:</strong></p>
<ul>
<li><p>Quick environment spin-up for testing manifests</p>
</li>
<li><p>Ability to test Helm charts locally</p>
</li>
<li><p>Validate deployments before they hit the cluster</p>
</li>
<li><p>Test disaster recovery procedures</p>
</li>
<li><p>Experiment with new Kubernetes features safely</p>
</li>
<li><p>Multiple isolated environments for different projects</p>
</li>
</ul>
<p>This is exactly what Colima solves — a lightweight, free way to run Kubernetes locally with minimal friction.</p>
<h2 id="heading-what-is-colima"><strong>What is Colima?</strong></h2>
<p>Colima (short for <strong>Containers on Lima</strong>) is a container runtime that provides Docker and Kubernetes on macOS and Linux with minimal setup.</p>
<p>Under the hood, Colima uses:</p>
<ul>
<li><p><strong>Lima</strong> (Linux Machines) to create lightweight Linux VMs</p>
</li>
<li><p><strong>QEMU</strong> or <strong>Apple’s Virtualization.Framework</strong> for virtualization</p>
</li>
<li><p><strong>K3s</strong> as the Kubernetes distribution (when enabled)</p>
</li>
<li><p><strong>Docker</strong> or <strong>Containerd</strong> as the container runtime</p>
</li>
</ul>
<p><strong>Why DevOps Teams Choose Colima:</strong></p>
<ul>
<li><p>🚀 Fast startup — get testing in seconds, not minutes</p>
</li>
<li><p>💾 Low resource footprint — won’t slow down your laptop</p>
</li>
<li><p>⚡ Native Apple Silicon support for M1/M2/M3 Macs</p>
</li>
<li><p>🆓 Completely free and open source (MIT licensed)</p>
</li>
<li><p>🔧 Simple CLI interface</p>
</li>
<li><p>🎯 Multiple profiles for different projects or test scenarios</p>
</li>
<li><p>🔄 Compatible with standard Kubernetes tools (kubectl, Helm, Skaffold)</p>
</li>
</ul>
<h2 id="heading-getting-started-installing-colima"><strong>Getting Started: Installing Colima</strong></h2>
<h3 id="heading-installation"><strong>Installation</strong></h3>
<p><strong>On macOS:</strong></p>
<pre><code class="lang-bash"><span class="hljs-comment"># Install Colima</span>
brew install colima
<span class="hljs-comment"># Verify installation</span>
colima version
</code></pre>
<p><strong>On Linux:</strong></p>
<pre><code class="lang-bash"><span class="hljs-comment"># Install from binary</span>
curl -LO https://github.com/abiosoft/colima/releases/latest/download/colima-Linux-x86_64
chmod +x colima-Linux-x86_64
sudo mv colima-Linux-x86_64 /usr/<span class="hljs-built_in">local</span>/bin/colima
</code></pre>
<h3 id="heading-basic-setup"><strong>Basic Setup</strong></h3>
<p><strong>Start Colima with Docker:</strong></p>
<pre><code class="lang-bash"><span class="hljs-comment"># Start with default settings (2 CPUs, 2 GiB RAM; the default disk size depends on the Colima version)</span>
colima start

<span class="hljs-comment"># Verify it's running</span>
colima status
<span class="hljs-comment"># Test Docker</span>
docker run hello-world
</code></pre>
<p><strong>Start Colima with Kubernetes:</strong></p>
<pre><code class="lang-bash"><span class="hljs-comment"># Start with Kubernetes enabled</span>
colima start --kubernetes

<span class="hljs-comment"># Verify Kubernetes is running</span>
kubectl cluster-info
kubectl get nodes
</code></pre>
<p>That’s it! You now have a working Kubernetes cluster on your laptop.</p>
<h2 id="heading-real-world-configuration-profiles-for-devops-testing"><strong>Real-World Configuration: Profiles for DevOps Testing</strong></h2>
<p>One of Colima’s best features for DevOps work is <strong>profiles</strong> — the ability to run multiple isolated Kubernetes environments for different testing scenarios.</p>
<h3 id="heading-profile-1-quick-manifest-testing"><strong>Profile 1: Quick Manifest Testing</strong></h3>
<p>For validating YAML manifests and quick deployment tests:</p>
<pre><code class="lang-bash">colima start -p quick-test \
  --kubernetes \
  --cpu 2 \
  --memory 4 \
  --disk 50

<span class="hljs-comment"># Switch kubectl context</span>
kubectl config use-context colima-quick-test
</code></pre>
<p><strong>Use case:</strong> Test a deployment manifest before committing to Git</p>
<h3 id="heading-profile-2-helm-chart-development"><strong>Profile 2: Helm Chart Development</strong></h3>
<p>For developing and testing Helm charts:</p>
<pre><code class="lang-bash">colima start -p helm-dev \
  --kubernetes \
  --cpu 4 \
  --memory 8 \
  --disk 100 \
  --network-address

<span class="hljs-comment"># The --network-address flag allows LoadBalancer services to work</span>
</code></pre>
<p><strong>Use case:</strong> Test Helm charts with all service types (including LoadBalancer)</p>
<h3 id="heading-profile-3-disaster-recovery-testing"><strong>Profile 3: Disaster Recovery Testing</strong></h3>
<p>For testing backup/restore procedures and failure scenarios:</p>
<pre><code class="lang-bash">colima start -p dr-test \
  --kubernetes \
  --cpu 4 \
  --memory 8 \
  --disk 150
</code></pre>
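<p><strong>Use case:</strong> Simulate failures, test backup and restore tooling, and practice recovery procedures. Note that Colima runs a single-node K3s cluster, which uses SQLite rather than etcd by default, so etcd-specific backups need a different setup.</p>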
<h3 id="heading-profile-4-cicd-pipeline-validation"><strong>Profile 4: CI/CD Pipeline Validation</strong></h3>
<p>For testing your CI/CD pipelines locally before running them in production:</p>
<pre><code class="lang-bash">colima start -p pipeline-test \
  --kubernetes \
  --cpu 6 \
  --memory 12 \
  --disk 100
</code></pre>
<p><strong>Use case:</strong> Validate GitHub Actions, GitLab CI, or Jenkins pipelines that deploy to Kubernetes</p>
<p><strong>Key flags explained:</strong></p>
<ul>
<li><p><code>--vm-type vz</code>: Uses Apple's native Virtualization.Framework (macOS 13+) for better performance</p>
</li>
<li><p><code>--mount-type virtiofs</code>: Better file system performance for volume mounts</p>
</li>
<li><p><code>--network-address</code>: Enables LoadBalancer service support</p>
</li>
</ul>
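<p>The first two flags only apply on macOS 13 or later, so a start script shared across machines may want to choose them at runtime. A minimal sketch, assuming a hypothetical helper named <code>colima_vm_flags</code> (not part of Colima) that takes the OS name and OS version as arguments:</p>
<pre><code class="lang-bash"># Hypothetical helper: print the VM flags suited to a host.
# Usage: colima_vm_flags OS_NAME OS_VERSION
colima_vm_flags() {
  local os_name="$1"
  local major="${2%%.*}"   # "14.5" becomes "14"
  if [ "$os_name" = "Darwin" ]; then
    if [ "${major:-0}" -ge 13 ]; then
      # macOS 13+ can use Virtualization.Framework with virtiofs mounts
      echo "--vm-type vz --mount-type virtiofs"
      return 0
    fi
  fi
  # Older macOS and Linux fall back to QEMU
  echo "--vm-type qemu"
}
</code></pre>
<p>On a Mac this could be used as <code>colima start --kubernetes $(colima_vm_flags "$(uname -s)" "$(sw_vers -productVersion)")</code>.</p>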
<h3 id="heading-managing-profiles"><strong>Managing Profiles</strong></h3>
<pre><code class="lang-bash"><span class="hljs-comment"># List all profiles</span>
colima list

<span class="hljs-comment"># Stop a specific profile</span>
colima stop -p quick-test

<span class="hljs-comment"># Delete a profile</span>
colima delete -p dr-test

<span class="hljs-comment"># Switch Docker between profiles</span>
docker context use colima-quick-test
docker context use colima-helm-dev
</code></pre>
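<p>Each running profile can claim up to the memory it was started with, so it may be worth checking the total before starting several at once. A minimal sketch, assuming a hypothetical helper named <code>fits_memory_budget</code> that takes a budget in GiB followed by the <code>--memory</code> value of each profile:</p>
<pre><code class="lang-bash"># Hypothetical helper: sum the memory requested by several profiles
# and succeed only if the total fits within the given budget (all values in GiB).
# Usage: fits_memory_budget BUDGET_GIB MEM_GIB...
fits_memory_budget() {
  local budget="$1" total=0 mem
  shift
  for mem in "$@"; do
    total=$((total + mem))
  done
  echo "requested ${total} GiB of ${budget} GiB"
  [ "$total" -le "$budget" ]
}
</code></pre>
<p>The four profiles above request 4, 8, 8, and 12 GiB, so <code>fits_memory_budget 16 4 8 8 12</code> reports 32 GiB and fails, which suggests stopping unused profiles on a 16 GiB laptop.</p>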
<h2 id="heading-hands-on-deploying-your-first-application"><strong>Hands-On: Deploying Your First Application</strong></h2>
<p>Let’s deploy a real application to our local Kubernetes cluster.</p>
<h3 id="heading-step-1-create-the-application"><strong>Step 1: Create the Application</strong></h3>
<p>Create a simple nginx deployment:</p>
<pre><code class="lang-yaml"><span class="hljs-comment"># nginx-deployment.yaml</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">apps/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Deployment</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">nginx-demo</span>
  <span class="hljs-attr">labels:</span>
    <span class="hljs-attr">app:</span> <span class="hljs-string">nginx</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">replicas:</span> <span class="hljs-number">3</span>
  <span class="hljs-attr">selector:</span>
    <span class="hljs-attr">matchLabels:</span>
      <span class="hljs-attr">app:</span> <span class="hljs-string">nginx</span>
  <span class="hljs-attr">template:</span>
    <span class="hljs-attr">metadata:</span>
      <span class="hljs-attr">labels:</span>
        <span class="hljs-attr">app:</span> <span class="hljs-string">nginx</span>
    <span class="hljs-attr">spec:</span>
      <span class="hljs-attr">containers:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">nginx</span>
        <span class="hljs-attr">image:</span> <span class="hljs-string">nginx:latest</span>
        <span class="hljs-attr">ports:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">containerPort:</span> <span class="hljs-number">80</span>
<span class="hljs-meta">---</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Service</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">nginx-service</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">selector:</span>
    <span class="hljs-attr">app:</span> <span class="hljs-string">nginx</span>
  <span class="hljs-attr">ports:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
      <span class="hljs-attr">port:</span> <span class="hljs-number">80</span>
      <span class="hljs-attr">targetPort:</span> <span class="hljs-number">80</span>
  <span class="hljs-attr">type:</span> <span class="hljs-string">LoadBalancer</span>
</code></pre>
<h3 id="heading-step-2-deploy-to-kubernetes"><strong>Step 2: Deploy to Kubernetes</strong></h3>
<pre><code class="lang-bash"><span class="hljs-comment"># Start Colima with Kubernetes and LoadBalancer support</span>
colima start --kubernetes --network-address

<span class="hljs-comment"># Apply the deployment</span>
kubectl apply -f nginx-deployment.yaml

<span class="hljs-comment"># Wait for pods to be ready</span>
kubectl <span class="hljs-built_in">wait</span> --<span class="hljs-keyword">for</span>=condition=ready pod -l app=nginx --timeout=60s

<span class="hljs-comment"># Check the deployment</span>
kubectl get deployments
kubectl get pods
kubectl get services
</code></pre>
<h3 id="heading-step-3-access-the-application"><strong>Step 3: Access the Application</strong></h3>
<pre><code class="lang-bash"><span class="hljs-comment"># Get the LoadBalancer IP</span>
kubectl get svc nginx-service

<span class="hljs-comment"># Access the application</span>
curl http://&lt;EXTERNAL-IP&gt;

<span class="hljs-comment"># Or use port-forward as an alternative</span>
kubectl port-forward svc/nginx-service 8080:80

<span class="hljs-comment"># Then access at http://localhost:8080</span>
</code></pre>
<h3 id="heading-step-4-build-and-deploy-a-custom-image"><strong>Step 4: Build and Deploy a Custom Image</strong></h3>
<pre><code class="lang-bash"><span class="hljs-comment"># Build an image (it's automatically available in Kubernetes)</span>
docker build -t my-app:v1 .

<span class="hljs-comment"># Update your deployment to use the local image</span>
kubectl <span class="hljs-built_in">set</span> image deployment/my-app my-app=my-app:v1

<span class="hljs-comment"># Verify the update</span>
kubectl rollout status deployment/my-app
</code></pre>
<p><strong>Pro Tip:</strong> With Docker runtime in Colima, images built with <code>docker build</code> are automatically available to Kubernetes—no need to push to a registry!</p>
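<p>One caveat is the image tag: when <code>imagePullPolicy</code> is not set, Kubernetes always pulls images tagged <code>latest</code> (or untagged), which fails for an image that only exists locally. A minimal sketch of that defaulting rule, using a hypothetical helper named <code>default_pull_policy</code>:</p>
<pre><code class="lang-bash"># Hypothetical helper: print the imagePullPolicy Kubernetes applies by default
# to an image reference when the field is omitted.
default_pull_policy() {
  local image="$1"
  local last="${image##*/}"   # drop the registry and repository path
  case "$last" in
    *@*)      echo "IfNotPresent" ;;  # pinned by digest
    *:latest) echo "Always" ;;        # explicit latest tag
    *:*)      echo "IfNotPresent" ;;  # any other tag
    *)        echo "Always" ;;        # no tag is treated as latest
  esac
}
</code></pre>
<p>This is why <code>my-app:v1</code> works straight from the local build, while a locally built <code>my-app:latest</code> needs <code>imagePullPolicy: IfNotPresent</code> or <code>Never</code> set explicitly.</p>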
<h2 id="heading-advanced-configuration"><strong>Advanced Configuration</strong></h2>
<h3 id="heading-customizing-colima-settings"><strong>Customizing Colima Settings</strong></h3>
<p>Colima uses a YAML configuration file. Edit it with:</p>
<pre><code class="lang-bash">colima start --edit
</code></pre>
<p><strong>Example advanced configuration:</strong></p>
<pre><code class="lang-yaml"><span class="hljs-comment"># Colima configuration (key names follow recent Colima releases;</span>
<span class="hljs-comment"># the file opened by `colima start --edit` documents what your version supports)</span>
<span class="hljs-attr">cpu:</span> <span class="hljs-number">4</span>
<span class="hljs-attr">memory:</span> <span class="hljs-number">8</span>
<span class="hljs-attr">disk:</span> <span class="hljs-number">100</span>

<span class="hljs-comment"># VM settings</span>
<span class="hljs-attr">runtime:</span> <span class="hljs-string">docker</span>
<span class="hljs-attr">kubernetes:</span>
  <span class="hljs-attr">enabled:</span> <span class="hljs-literal">true</span>
  <span class="hljs-attr">version:</span> <span class="hljs-string">v1.32.0+k3s1</span>  <span class="hljs-comment"># K3s release tag</span>

<span class="hljs-comment"># Network settings</span>
<span class="hljs-attr">network:</span>
  <span class="hljs-attr">address:</span> <span class="hljs-literal">true</span>
  <span class="hljs-comment"># DNS resolvers for the VM</span>
  <span class="hljs-attr">dns:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-number">8.8.8.8</span>
    <span class="hljs-bullet">-</span> <span class="hljs-number">1.1.1.1</span>

<span class="hljs-comment"># Published ports are forwarded to the host automatically,</span>
<span class="hljs-comment"># so no port-forwarding section is needed here</span>

<span class="hljs-comment"># Volume mounts</span>
<span class="hljs-attr">mounts:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">location:</span> <span class="hljs-string">~/projects</span>
    <span class="hljs-attr">writable:</span> <span class="hljs-literal">true</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">location:</span> <span class="hljs-string">/tmp/colima</span>
    <span class="hljs-attr">writable:</span> <span class="hljs-literal">true</span>

<span class="hljs-comment"># Environment variables</span>
<span class="hljs-attr">env:</span>
  <span class="hljs-attr">DOCKER_BUILDKIT:</span> <span class="hljs-string">"1"</span>
</code></pre>
<h3 id="heading-setting-up-ingress"><strong>Setting Up Ingress</strong></h3>
<p><strong>Enable ingress for better service routing:</strong></p>
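<pre><code class="lang-bash"><span class="hljs-comment"># Start with ingress enabled (this flag is version-dependent; confirm it with `colima start --help`)</span>
colima start --kubernetes --kubernetes-ingress

<span class="hljs-comment"># Or install ingress-nginx manually (the generic manifest; the kind-specific one expects kind node labels)</span>
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/cloud/deploy.yaml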
</code></pre>
<p><strong>Example Ingress resource:</strong></p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">networking.k8s.io/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Ingress</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">nginx-ingress</span>
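<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">ingressClassName:</span> <span class="hljs-string">nginx</span>  <span class="hljs-comment"># matches the manually installed ingress-nginx controller</span>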
  <span class="hljs-attr">rules:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">host:</span> <span class="hljs-string">nginx.local</span>
    <span class="hljs-attr">http:</span>
      <span class="hljs-attr">paths:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">path:</span> <span class="hljs-string">/</span>
        <span class="hljs-attr">pathType:</span> <span class="hljs-string">Prefix</span>
        <span class="hljs-attr">backend:</span>
          <span class="hljs-attr">service:</span>
            <span class="hljs-attr">name:</span> <span class="hljs-string">nginx-service</span>
            <span class="hljs-attr">port:</span>
              <span class="hljs-attr">number:</span> <span class="hljs-number">80</span>
</code></pre>
<p><strong>Add to</strong> <code>/etc/hosts</code><strong>:</strong></p>
<pre><code class="lang-bash"><span class="hljs-built_in">echo</span> <span class="hljs-string">"127.0.0.1 nginx.local"</span> | sudo tee -a /etc/hosts
</code></pre>
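<p>To check the route without editing <code>/etc/hosts</code>, the <code>Host</code> header can be set on the request instead. A minimal sketch, assuming the ingress controller is reachable on <code>127.0.0.1</code> port 80 and using a hypothetical helper named <code>ingress_check</code>:</p>
<pre><code class="lang-bash"># Hypothetical helper: request a host-based Ingress route by sending the Host header.
# Usage: ingress_check HOSTNAME URL
ingress_check() {
  curl --silent --show-error --header "Host: $1" "$2"
}
</code></pre>
<p>For the Ingress above, <code>ingress_check nginx.local http://127.0.0.1/</code> should return the nginx welcome page once the controller is ready.</p>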
<h3 id="heading-persistent-storage"><strong>Persistent Storage</strong></h3>
<p>Colima now uses separate disks for container data, protecting against accidental data loss:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Data persists even after deleting the instance</span>
colima delete

<span class="hljs-comment"># Restart and data is restored</span>
colima start

<span class="hljs-comment"># To delete everything including data</span>
colima delete --data
</code></pre>
<h2 id="heading-performance-optimization-tips"><strong>Performance Optimization Tips</strong></h2>
<h3 id="heading-1-use-vz-on-modern-macs"><strong>1. Use VZ on Modern Macs</strong></h3>
<p>On macOS 13+ (Ventura or later), use Apple’s native virtualization:</p>
<pre><code class="lang-bash">colima start --vm-type vz --mount-type virtiofs
</code></pre>
<p><strong>Performance improvements:</strong></p>
<ul>
<li><p>~30% faster startup</p>
</li>
<li><p>Better CPU efficiency</p>
</li>
<li><p>Improved file system performance</p>
</li>
</ul>
<h3 id="heading-2-optimize-for-your-workload"><strong>2. Optimize for Your Workload</strong></h3>
<p><strong>For CPU-intensive tasks:</strong></p>
<pre><code class="lang-bash">colima start --cpu 8 --cpu-type max
</code></pre>
<p><strong>For memory-intensive tasks:</strong></p>
<pre><code class="lang-bash">colima start --memory 16
</code></pre>
<p><strong>For disk-intensive tasks:</strong></p>
<pre><code class="lang-bash">colima start --disk 200 --mount-type virtiofs
</code></pre>
<h3 id="heading-3-resource-monitoring"><strong>3. Resource Monitoring</strong></h3>
<pre><code class="lang-bash"><span class="hljs-comment"># Check resource usage</span>
colima status

<span class="hljs-comment"># SSH into the VM to check resources</span>
colima ssh
top
df -h
free -m
<span class="hljs-built_in">exit</span>
</code></pre>
<h3 id="heading-4-build-performance"><strong>4. Build Performance</strong></h3>
<p>Enable BuildKit for faster Docker builds:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">export</span> DOCKER_BUILDKIT=1

<span class="hljs-comment"># Or add to your shell profile</span>
<span class="hljs-built_in">echo</span> <span class="hljs-string">'export DOCKER_BUILDKIT=1'</span> &gt;&gt; ~/.zshrc
</code></pre>
<h2 id="heading-troubleshooting-common-issues"><strong>Troubleshooting Common Issues</strong></h2>
<h3 id="heading-issue-1-kubernetes-not-starting"><strong>Issue 1: Kubernetes Not Starting</strong></h3>
<p><strong>Symptoms:</strong> <code>kubectl</code> commands hang or fail</p>
<p><strong>Solution:</strong></p>
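<pre><code class="lang-bash"><span class="hljs-comment"># Stop and start fresh, with verbose output to see where startup stalls</span>
colima stop
colima start --kubernetes --verbose

<span class="hljs-comment"># If the cluster is still unhealthy, reset Kubernetes (this removes all cluster resources)</span>
colima kubernetes reset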

<span class="hljs-comment"># Verify kubectl config</span>
kubectl config view
kubectl cluster-info
</code></pre>
<h3 id="heading-issue-2-loadbalancer-services-stuck-in-pending"><strong>Issue 2: LoadBalancer Services Stuck in Pending</strong></h3>
<p><strong>Symptoms:</strong> <code>EXTERNAL-IP</code> shows <code>&lt;pending&gt;</code></p>
<p><strong>Solution:</strong></p>
<pre><code class="lang-bash"><span class="hljs-comment"># Restart with network address enabled</span>
colima stop
colima start --kubernetes --network-address

<span class="hljs-comment"># Verify networking</span>
colima status
</code></pre>
<h3 id="heading-issue-3-docker-context-issues"><strong>Issue 3: Docker Context Issues</strong></h3>
<p><strong>Symptoms:</strong> <code>docker</code> commands fail with connection errors</p>
<p><strong>Solution:</strong></p>
<pre><code class="lang-bash"><span class="hljs-comment"># List available contexts</span>
docker context ls

<span class="hljs-comment"># Switch to Colima context</span>
docker context use colima

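<span class="hljs-comment"># Or point the Docker CLI at Colima's socket directly</span>
<span class="hljs-built_in">export</span> DOCKER_HOST=<span class="hljs-string">"unix://${HOME}/.colima/default/docker.sock"</span>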
</code></pre>
<h3 id="heading-issue-4-volume-mount-performance"><strong>Issue 4: Volume Mount Performance</strong></h3>
<p><strong>Symptoms:</strong> Slow file I/O in containers</p>
<p><strong>Solution:</strong></p>
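<pre><code class="lang-bash"><span class="hljs-comment"># Use virtiofs on modern Macs (requires the vz VM type)</span>
<span class="hljs-comment"># If Colima reports that the VM or mount type cannot be changed, recreate the instance with `colima delete` first</span>
colima stop
colima start --vm-type vz --mount-type virtiofs

<span class="hljs-comment"># Or use sshfs for better compatibility</span>
colima start --mount-type sshfs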
</code></pre>
<h3 id="heading-issue-5-port-conflicts"><strong>Issue 5: Port Conflicts</strong></h3>
<p><strong>Symptoms:</strong> “Port already in use” errors</p>
<p><strong>Solution:</strong></p>
<pre><code class="lang-bash"><span class="hljs-comment"># Check what's using the port</span>
lsof -i :80

<span class="hljs-comment"># Free the port, or expose the service on a different host port</span>
kubectl port-forward svc/nginx-service 8080:80
</code></pre>
<h2 id="heading-integration-with-development-tools"><strong>Integration with Development Tools</strong></h2>
<h3 id="heading-vs-code-integration"><strong>VS Code Integration</strong></h3>
<p>Install the Docker extension and configure it to use Colima:</p>
<pre><code class="lang-bash">// settings.json
{
  <span class="hljs-string">"docker.dockerPath"</span>: <span class="hljs-string">"docker"</span>,
  <span class="hljs-string">"docker.dockerComposePath"</span>: <span class="hljs-string">"docker-compose"</span>,
  <span class="hljs-string">"kubernetes.kubectlPath"</span>: <span class="hljs-string">"/usr/local/bin/kubectl"</span>
}
</code></pre>
<h3 id="heading-helm-integration"><strong>Helm Integration</strong></h3>
<p>Helm works seamlessly with Colima:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Install Helm</span>
brew install helm

<span class="hljs-comment"># Add a repo</span>
helm repo add bitnami https://charts.bitnami.com/bitnami

<span class="hljs-comment"># Install a chart</span>
helm install my-nginx bitnami/nginx
</code></pre>
<h3 id="heading-skaffold-for-rapid-development"><strong>Skaffold for Rapid Development</strong></h3>
<pre><code class="lang-bash"><span class="hljs-comment"># Install Skaffold</span>
brew install skaffold

<span class="hljs-comment"># Initialize in your project</span>
skaffold init

<span class="hljs-comment"># Start development mode</span>
skaffold dev
</code></pre>
<p>Skaffold builds and deploys against your current kubectl context, so with the Colima context selected it rebuilds and redeploys on code changes.</p>
<h3 id="heading-docker-compose"><strong>Docker Compose</strong></h3>
<p>Docker Compose works out of the box:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Install Docker Compose</span>
brew install docker-compose

<span class="hljs-comment"># Run your compose file</span>
docker-compose up -d
</code></pre>
<h2 id="heading-migration-from-docker-desktop"><strong>Migration from Docker Desktop</strong></h2>
<p>Switching from Docker Desktop to Colima is straightforward:</p>
<h3 id="heading-step-1-export-your-data"><strong>Step 1: Export Your Data</strong></h3>
<pre><code class="lang-bash"><span class="hljs-comment"># List your containers</span>
docker ps -a

<span class="hljs-comment"># Save important images</span>
docker save my-app:latest -o my-app.tar
</code></pre>
<h3 id="heading-step-2-stop-docker-desktop"><strong>Step 2: Stop Docker Desktop</strong></h3>
<p>Quit Docker Desktop from the menu bar.</p>
<h3 id="heading-step-3-start-colima"><strong>Step 3: Start Colima</strong></h3>
<pre><code class="lang-bash">colima start --cpu 4 --memory 8 --disk 100
</code></pre>
<h3 id="heading-step-4-import-your-data"><strong>Step 4: Import Your Data</strong></h3>
<pre><code class="lang-bash"><span class="hljs-comment"># Load saved images</span>
docker load -i my-app.tar

<span class="hljs-comment"># Verify</span>
docker images
</code></pre>
<h3 id="heading-step-5-update-your-workflow"><strong>Step 5: Update Your Workflow</strong></h3>
<pre><code class="lang-bash"><span class="hljs-comment"># Replace Docker Desktop commands with:</span>
colima start    <span class="hljs-comment"># Instead of starting Docker Desktop</span>
colima stop     <span class="hljs-comment"># Instead of quitting Docker Desktop</span>
colima status   <span class="hljs-comment"># To check if it's running</span>
</code></pre>
<p><strong>Gotchas to Watch For:</strong></p>
<ul>
<li><p>Docker Desktop Kubernetes vs Colima K3s have slight differences</p>
</li>
<li><p>Some Docker Desktop-specific features (like file watching) work differently</p>
</li>
<li><p>Volume paths may need adjustment</p>
</li>
</ul>
<h2 id="heading-best-practices-for-daily-use"><strong>Best Practices for Daily Use</strong></h2>
<h3 id="heading-1-create-project-specific-profiles"><strong>1. Create Project-Specific Profiles</strong></h3>
<pre><code class="lang-bash"><span class="hljs-comment"># For each major project</span>
colima start -p frontend --cpu 2 --memory 4
colima start -p backend --cpu 4 --memory 8 --kubernetes
colima start -p testing --kubernetes --arch x86_64
</code></pre>
<h3 id="heading-2-automate-startup"><strong>2. Automate Startup</strong></h3>
<p>Add to your shell profile (<code>~/.zshrc</code> or <code>~/.bashrc</code>):</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Auto-start default profile if not running</span>
<span class="hljs-keyword">if</span> ! colima status &amp;&gt; /dev/null; <span class="hljs-keyword">then</span>
  colima start
<span class="hljs-keyword">fi</span>
</code></pre>
<h3 id="heading-3-use-docker-contexts"><strong>3. Use Docker Contexts</strong></h3>
<pre><code class="lang-bash"><span class="hljs-comment"># Switch between profiles easily (assumes profiles named "dev" and "k8s")</span>
<span class="hljs-built_in">alias</span> dk-dev=<span class="hljs-string">'docker context use colima-dev'</span>
<span class="hljs-built_in">alias</span> dk-k8s=<span class="hljs-string">'docker context use colima-k8s'</span>
</code></pre>
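<p>The aliases above hardcode context names. As a minimal sketch, assuming Colima’s usual naming (the default profile registers a Docker context called <code>colima</code>, a named profile registers <code>colima-&lt;profile&gt;</code>), a small helper can derive the context from the profile name instead. The function names <code>colima_context</code> and <code>dk_use</code> are made up for this example.</p>
<pre><code class="lang-bash"># Hypothetical helper: map a Colima profile name to its Docker context name.
# Assumes Colima's naming: "colima" for the default profile, "colima-NAME" otherwise.
colima_context() {
  local profile="${1:-default}"
  if [ "$profile" = "default" ]; then
    echo "colima"
  else
    echo "colima-$profile"
  fi
}

# Hypothetical wrapper: point the Docker CLI at the context of a given profile.
dk_use() {
  docker context use "$(colima_context "$1")"
}

# Usage: dk_use backend   (runs: docker context use colima-backend)
</code></pre>
<p>Running <code>docker context ls</code> shows which contexts actually exist on your machine, which is a quick way to confirm the naming assumption before relying on the helper.</p>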
<h3 id="heading-4-resource-management"><strong>4. Resource Management</strong></h3>
<pre><code class="lang-bash"><span class="hljs-comment"># Stop unused profiles to save resources</span>
colima stop -p old-project

<span class="hljs-comment"># Clean up regularly (prune -a removes all unused images, not just dangling ones)</span>
docker system prune -a
colima stop &amp;&amp; colima start  <span class="hljs-comment"># Fresh start</span>
</code></pre>
<h3 id="heading-5-backup-important-data"><strong>5. Backup Important Data</strong></h3>
<pre><code class="lang-bash"><span class="hljs-comment"># Export a container's filesystem (volumes are not included)</span>
docker <span class="hljs-built_in">export</span> my-container &gt; backup.tar

<span class="hljs-comment"># Save images</span>
docker save my-app:v1 -o my-app-v1.tar
</code></pre>
<h2 id="heading-real-world-devops-testing-scenarios"><strong>Real-World DevOps Testing Scenarios</strong></h2>
<h3 id="heading-scenario-1-testing-helm-chart-changes"><strong>Scenario 1: Testing Helm Chart Changes</strong></h3>
<p>You’ve modified a Helm chart and need to validate it before pushing to production:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Start a dedicated profile for Helm testing</span>
colima start -p helm-test \
  --kubernetes \
  --cpu 4 \
  --memory 8

<span class="hljs-comment"># Test your chart</span>
helm install my-app ./charts/my-app --dry-run --debug
helm install my-app ./charts/my-app
helm <span class="hljs-built_in">test</span> my-app

<span class="hljs-comment"># Validate the deployment</span>
kubectl get all -l app=my-app
kubectl logs -l app=my-app

<span class="hljs-comment"># Clean up for next test</span>
helm uninstall my-app
</code></pre>
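<p>Before installing into the cluster, it can help to catch template errors offline. The sketch below is one possible pre-check, not part of the workflow above: it runs <code>helm lint</code> and then pipes the output of <code>helm template</code> through a client-side dry run. The function name <code>precheck_chart</code> and the overridable <code>HELM</code> and <code>KUBECTL</code> variables are assumptions made for illustration.</p>
<pre><code class="lang-bash"># Hypothetical pre-install check for a chart directory.
# HELM and KUBECTL can be overridden (for example with "echo") to preview the commands.
precheck_chart() {
  local release="$1"
  local chart_dir="$2"
  # Static checks on chart structure and templates
  "${HELM:-helm}" lint "$chart_dir" || return 1
  # Render locally, then let kubectl parse the result without creating anything
  "${HELM:-helm}" template "$release" "$chart_dir" |
    "${KUBECTL:-kubectl}" apply --dry-run=client -f -
}

# Usage: precheck_chart my-app ./charts/my-app
</code></pre>
<p>Because both steps run without touching cluster state, the check is cheap enough to repeat after every chart edit.</p>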
<h3 id="heading-scenario-2-validating-kubernetes-manifests"><strong>Scenario 2: Validating Kubernetes Manifests</strong></h3>
<p>Before committing YAML changes, test them locally:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Start a quick test environment</span>
colima start -p manifest-test \
  --kubernetes \
  --cpu 2 \
  --memory 4

<span class="hljs-comment"># Validate syntax</span>
kubectl apply --dry-run=client -f k8s/

<span class="hljs-comment"># Apply and test</span>
kubectl apply -f k8s/
kubectl <span class="hljs-built_in">wait</span> --<span class="hljs-keyword">for</span>=condition=ready pod -l app=myapp --timeout=60s

<span class="hljs-comment"># Check for issues</span>
kubectl get events --sort-by=<span class="hljs-string">'.lastTimestamp'</span>
kubectl describe pods -l app=myapp
</code></pre>
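<p>A client-side dry run does not send the objects through the API server’s admission chain. As a hedged sketch, the helper below adds a server-side dry run and a diff against the live cluster once the Colima profile is up. The function name <code>validate_manifests</code> and the <code>KUBECTL</code> override are hypothetical; the <code>k8s/</code> default mirrors the directory used above.</p>
<pre><code class="lang-bash"># Hypothetical helper: validate manifests against the running API server.
# KUBECTL can be overridden (for example with "echo") to preview the commands.
validate_manifests() {
  local manifest_dir="${1:-k8s/}"
  # Server-side dry run: the API server validates the objects but persists nothing
  "${KUBECTL:-kubectl}" apply --dry-run=server -f "$manifest_dir" || return 1
  # Show what would change compared with the live state.
  # Note: kubectl diff exits with status 1 when it finds differences.
  "${KUBECTL:-kubectl}" diff -f "$manifest_dir"
}

# Usage: validate_manifests k8s/
</code></pre>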
<h3 id="heading-scenario-3-testing-ingress-configurations"><strong>Scenario 3: Testing Ingress Configurations</strong></h3>
<p>Validate ingress rules and SSL configurations:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Start with ingress enabled (flag names vary between Colima versions; confirm with "colima start --help")</span>
colima start -p ingress-test \
  --kubernetes \
  --kubernetes-ingress \
  --cpu 4 \
  --memory 6

<span class="hljs-comment"># Apply your ingress</span>
kubectl apply -f ingress.yaml

<span class="hljs-comment"># Test locally</span>
curl -H <span class="hljs-string">"Host: myapp.local"</span> http://localhost
curl -k -H <span class="hljs-string">"Host: myapp.local"</span> https://localhost
</code></pre>
<h3 id="heading-scenario-4-disaster-recovery-drills"><strong>Scenario 4: Disaster Recovery Drills</strong></h3>
<p>Practice your disaster recovery procedures:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Create a test cluster</span>
colima start -p dr-drill \
  --kubernetes \
  --cpu 4 \
  --memory 8

<span class="hljs-comment"># Deploy your application</span>
kubectl apply -f production-manifests/

<span class="hljs-comment"># Take a backup and wait for it to finish (assumes Velero is already installed in the cluster)</span>
velero backup create test-backup --include-namespaces production --wait

<span class="hljs-comment"># Simulate disaster - delete everything</span>
kubectl delete namespace production

<span class="hljs-comment"># Practice recovery and wait for the restore to complete</span>
velero restore create --from-backup test-backup --wait

<span class="hljs-comment"># Validate recovery</span>
kubectl get all -n production
</code></pre>
<h3 id="heading-scenario-5-cicd-pipeline-testing"><strong>Scenario 5: CI/CD Pipeline Testing</strong></h3>
<p>Test your deployment pipeline before running it in production:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Create a pipeline test environment</span>
colima start -p ci-test \
  --kubernetes \
  --cpu 6 \
  --memory 10

<span class="hljs-comment"># Run your deployment script locally</span>
./scripts/deploy.sh --env=staging --dry-run
./scripts/deploy.sh --env=staging

<span class="hljs-comment"># Verify the deployment</span>
./scripts/smoke-tests.sh
</code></pre>
<h3 id="heading-scenario-6-testing-resource-limits"><strong>Scenario 6: Testing Resource Limits</strong></h3>
<p>Validate that your resource requests and limits are properly configured:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Start a cluster</span>
colima start -p resource-test --kubernetes

<span class="hljs-comment"># Deploy with resource constraints</span>
kubectl apply -f deployment-with-limits.yaml

<span class="hljs-comment"># Monitor resource usage</span>
kubectl top pods
kubectl top nodes

<span class="hljs-comment"># Test under load</span>
kubectl run -it --rm load-generator \
  --image=busybox \
  --restart=Never \
  -- /bin/sh -c <span class="hljs-string">"while true; do wget -q -O- http://my-service; done"</span>
</code></pre>
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>Colima has become an essential tool for DevOps testing workflows, and here’s why it matters:</p>
<p>✅ <strong>Fast feedback loops</strong>: Test changes in seconds, not minutes<br />✅ <strong>Cost-effective</strong>: No cloud costs for every test iteration<br />✅ <strong>Isolated environments</strong>: Multiple profiles for different testing scenarios<br />✅ <strong>Production-like</strong>: Real Kubernetes, not a simulation<br />✅ <strong>No licensing hassles</strong>: Free for commercial use</p>
<p><strong>Perfect For:</strong></p>
<ul>
<li><p>Testing Helm charts before deployment</p>
</li>
<li><p>Validating Kubernetes manifests locally</p>
</li>
<li><p>Disaster recovery drills</p>
</li>
<li><p>CI/CD pipeline development</p>
</li>
<li><p>Experimenting with new Kubernetes features</p>
</li>
<li><p>Training and knowledge transfer</p>
</li>
</ul>
<p><strong>Quick Start for DevOps Testing:</strong></p>
<pre><code class="lang-bash"><span class="hljs-comment"># Install</span>
brew install colima docker kubectl

<span class="hljs-comment"># Start with Kubernetes</span>
colima start --kubernetes --network-address

<span class="hljs-comment"># Test a deployment</span>
kubectl create deployment nginx --image=nginx
kubectl expose deployment nginx --port=80 --<span class="hljs-built_in">type</span>=LoadBalancer

<span class="hljs-comment"># Verify</span>
kubectl get all
</code></pre>
<p>That’s it. You now have a disposable Kubernetes environment that you can use, break, and recreate whenever you need to test something. No cloud costs, no complex setup, just a simple tool that gets out of your way.</p>
<p>For DevOps engineers who need to test quickly and iterate fast, Colima is a game-changer.</p>
<h2 id="heading-additional-resources"><strong>Additional Resources</strong></h2>
<ul>
<li><p><a target="_blank" href="https://github.com/abiosoft/colima">Colima GitHub Repository</a></p>
</li>
<li><p><a target="_blank" href="https://github.com/abiosoft/colima/blob/main/docs/README.md">Colima Documentation</a></p>
</li>
<li><p><a target="_blank" href="https://github.com/lima-vm/lima">Lima Project</a></p>
</li>
<li><p><a target="_blank" href="https://k3s.io/">K3s Documentation</a></p>
</li>
<li><p><a target="_blank" href="https://kubernetes.io/docs/tutorials/">Kubernetes Learning Resources</a></p>
</li>
</ul>
<p><em>Have you switched from Docker Desktop to Colima? What’s your experience been like? Share your thoughts in the comments!</em></p>
]]></content:encoded></item><item><title><![CDATA[ArgoCD 3.2: The Latest Stable Release Is Here]]></title><description><![CDATA[A deep dive into the newest features, improvements, and what you need to know before upgrading.
The GitOps community just received a significant update. ArgoCD 3.2.0 was released as a stable version on November 5th, 2025. If you’re running ArgoCD in ...]]></description><link>https://devops-blog.ruicoelho.dev/argocd-32-the-latest-stable-release-is-here</link><guid isPermaLink="true">https://devops-blog.ruicoelho.dev/argocd-32-the-latest-stable-release-is-here</guid><category><![CDATA[ArgoCD]]></category><category><![CDATA[Devops]]></category><category><![CDATA[SRE]]></category><category><![CDATA[gitops]]></category><category><![CDATA[Kubernetes]]></category><dc:creator><![CDATA[Rui Coelho]]></dc:creator><pubDate>Tue, 18 Nov 2025 17:36:44 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1763487313748/ca2719d0-f593-4d7b-950e-0f832b2bc4f6.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A deep dive into the newest features, improvements, and what you need to know before upgrading.</p>
<p>The GitOps community just received a significant update. ArgoCD 3.2.0 was released as a stable version on November 5th, 2025. If you’re running ArgoCD in production, this release deserves your immediate attention — especially if you’re still on version 2.14, which officially reached End of Life on November 4th, 2025.</p>
<p>In this article, we’ll explore what’s new, what’s changing, and how to prepare your team for the upgrade.</p>
<h2 id="heading-the-context-why-argocd-32-matters"><strong>The Context: Why ArgoCD 3.2 Matters</strong></h2>
<p>Before diving into the features, let’s understand where ArgoCD 3.2 fits in the bigger picture.</p>
<h2 id="heading-the-evolution-of-argocd-3x"><strong>The Evolution of ArgoCD 3.x</strong></h2>
<p>ArgoCD 3.0, released in early 2025, was a foundational release that introduced significant architectural improvements without being a risky upgrade. It refined RBAC controls, improved resource exclusions, and updated secrets management guidance.</p>
<p>Version 3.1, launched in August 2025, brought game-changing features like native OCI registry support, CLI plugins, and enhanced Source Hydrator functionality. These additions positioned ArgoCD as a more versatile GitOps tool for enterprise adoption.</p>
<h2 id="heading-the-critical-timeline"><strong>The Critical Timeline</strong></h2>
<p>Here’s what you need to know: <strong>ArgoCD 2.14 reached End of Life (EOL) on November 4th, 2025</strong>. According to ArgoCD’s support policy, only the three most recent minor versions receive security updates and bug fixes. This means:</p>
<ul>
<li><p>Currently supported: 3.2, 3.1, and 3.0</p>
</li>
<li><p>No longer supported: 2.14 and earlier versions</p>
</li>
<li><p>No more security patches or bug fixes for 2.14</p>
</li>
</ul>
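<p>The support rule above can be expressed as a small check. This is a sketch under the stated assumption that the window is exactly the three most recent minor versions. The function name <code>is_supported</code> and the explicit, newest-first version list are illustrative, since minor numbers cannot be derived arithmetically across a major bump (2.14 to 3.0).</p>
<pre><code class="lang-bash"># Hypothetical helper: is a minor version inside the support window?
# Arguments: the minor to check, then the released minors ordered newest first.
# Assumes the window is the three most recent minors, as described above.
is_supported() {
  local candidate="$1"
  shift
  local position=0
  local minor
  for minor in "$@"; do
    position=$((position + 1))
    # Anything past the third entry is outside the window
    if [ "$position" -gt 3 ]; then
      break
    fi
    if [ "$minor" = "$candidate" ]; then
      return 0
    fi
  done
  return 1
}

# Usage:
#   is_supported 3.0 3.2 3.1 3.0 2.14    exit status 0 (supported)
#   is_supported 2.14 3.2 3.1 3.0 2.14   exit status 1 (end of life)
</code></pre>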
<p>If you’re still on 2.14 or earlier, you’re running an unsupported version. Your upgrade should be a top priority.</p>
<h2 id="heading-whats-new-in-argocd-32"><strong>What’s New in ArgoCD 3.2</strong></h2>
<p>Let’s break down the key features and improvements in this release.</p>
<h2 id="heading-1-enhanced-applicationset-progressive-sync"><strong>1. Enhanced ApplicationSet Progressive Sync</strong></h2>
<p>Progressive Sync for ApplicationSets has received significant improvements in 3.2. This feature allows you to roll out changes gradually across multiple applications, which is crucial for risk management in production environments.</p>
<p><strong>What’s improved:</strong></p>
<ul>
<li><p><strong>Better UI visibility</strong>: The ApplicationSet UI now properly displays Progressive Sync status, resolving the “Unknown” state issue that plagued previous versions</p>
</li>
<li><p><strong>Status cleanup</strong>: When Progressive Sync is disabled, the ApplicationSet now correctly clears the <code>applicationStatus</code> field, preventing stale data</p>
</li>
<li><p><strong>Resource count tracking</strong>: A new <code>status.resourcesCount</code> field provides visibility into the number of resources managed by each ApplicationSet</p>
</li>
</ul>
<p>This last point is particularly important. Large ApplicationSets could previously cause performance issues due to unbounded resource tracking. The new resource count limit helps prevent this:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">argoproj.io/v1alpha1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">ApplicationSet</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">my-appset</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-comment"># ... your spec</span>
<span class="hljs-attr">status:</span>  <span class="hljs-comment"># Written by the controller; shown here for illustration only</span>
  <span class="hljs-attr">resourcesCount:</span> <span class="hljs-number">150</span>  <span class="hljs-comment"># New field</span>
  <span class="hljs-attr">resources:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-comment"># Limited list of resources</span>
</code></pre>
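<p>To read the new field from a live cluster, a jsonpath query is usually enough. The sketch below assumes the ApplicationSet lives in the <code>argocd</code> namespace and that the field is exposed as <code>status.resourcesCount</code>, as described above. The helper name <code>appset_resources_count</code> and the <code>KUBECTL</code> override are made up for illustration.</p>
<pre><code class="lang-bash"># Hypothetical helper: print status.resourcesCount for one ApplicationSet.
# KUBECTL can be overridden (for example with "echo") to preview the command.
appset_resources_count() {
  local name="$1"
  local namespace="${2:-argocd}"
  "${KUBECTL:-kubectl}" get applicationset "$name" \
    --namespace "$namespace" \
    --output "jsonpath={.status.resourcesCount}"
}

# Usage: appset_resources_count my-appset
</code></pre>
<p>If the controller has not populated the field yet, the command prints nothing rather than failing, so scripts should treat an empty result as "unknown".</p>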
<h2 id="heading-2-memory-optimization-for-webhook-handlers"><strong>2. Memory Optimization for Webhook Handlers</strong></h2>
<p>If you’re operating large monorepos or high-traffic environments, you’ll appreciate this: ArgoCD 3.2 optimizes webhook handlers to use informers instead of direct API calls. This change significantly reduces memory consumption during webhook processing.</p>
<p><strong>Why it matters</strong>: In environments with frequent Git commits or large repositories, webhook processing could cause memory spikes. The new implementation is more efficient and stable.</p>
<p><strong>Important note</strong>: Users with very large monorepos may still encounter repo-server lock contention requiring pod restarts. The ArgoCD team has acknowledged this issue, and a fix is planned for the next patch release (3.2.1).</p>
<h2 id="heading-3-updated-health-checks"><strong>3. Updated Health Checks</strong></h2>
<p>Health assessment is a critical part of ArgoCD’s functionality. Version 3.2 includes several health check updates:</p>
<ul>
<li><p><strong>Crossplane V2 support</strong>: Health checks now support Crossplane V2 resources, reflecting the evolution of the Crossplane ecosystem</p>
</li>
<li><p><strong>External Secrets Operator</strong>: The ExternalSecret discovery script now includes the <code>refreshPolicy</code> field for more accurate health assessment</p>
</li>
<li><p><strong>PromotionStrategy corrections</strong>: Fixed typos in the Promotion health checks that could cause false negatives</p>
</li>
</ul>
<h2 id="heading-4-oci-registry-improvements"><strong>4. OCI Registry Improvements</strong></h2>
<p>Building on the OCI support introduced in 3.1, version 3.2 loosens layer restrictions, making it easier to use OCI registries for storing Kubernetes configurations. This is part of ArgoCD’s broader strategy to treat configuration artifacts with the same maturity as container images.</p>
<h2 id="heading-5-cli-and-notifications-fixes"><strong>5. CLI and Notifications Fixes</strong></h2>
<p>Several quality-of-life improvements for CLI users:</p>
<ul>
<li><p>The notifications CLI now properly initializes the <code>argocdService</code>, fixing initialization issues</p>
</li>
<li><p>Webhook payload handlers now gracefully recover from panics instead of crashing</p>
</li>
<li><p>Various documentation improvements and bug fixes</p>
</li>
</ul>
<h2 id="heading-breaking-changes-and-migration-considerations"><strong>Breaking Changes and Migration Considerations</strong></h2>
<p>ArgoCD 3.2 maintains the low-risk upgrade philosophy of the 3.x series. However, there are some considerations:</p>
<h2 id="heading-coming-from-argocd-214"><strong>Coming from ArgoCD 2.14</strong></h2>
<p>If you’re upgrading from 2.14, you’ll need to account for all the breaking changes introduced in 3.0 and 3.1:</p>
<p><strong>Major changes from 3.0:</strong></p>
<ul>
<li><p>Fine-grained RBAC no longer applies to sub-resources by default</p>
</li>
<li><p>Health status persistence changes (now disabled by default)</p>
</li>
<li><p>Default resource exclusions for high-churn resources</p>
</li>
<li><p>Dex authentication claim changes (uses <code>federated_claims.user_id</code> instead of <code>sub</code>)</p>
</li>
</ul>
<p><strong>Major changes from 3.1:</strong></p>
<ul>
<li><p>OCI registry support enabled</p>
</li>
<li><p>CLI plugins architecture</p>
</li>
<li><p>Source Hydrator enhancements</p>
</li>
</ul>
<h2 id="heading-recommended-upgrade-path"><strong>Recommended Upgrade Path</strong></h2>
<ol>
<li><p><strong>Read the docs first</strong>: Review the <a target="_blank" href="https://argo-cd.readthedocs.io/en/stable/operator-manual/upgrading/overview/">official upgrade guide</a> for your version</p>
</li>
<li><p><strong>Test in non-production</strong>: Deploy 3.2.0 in a staging environment first</p>
</li>
<li><p><strong>Backup your state</strong>: Ensure you have backups of your ApplicationSet and Application resources</p>
</li>
<li><p><strong>Plan for RBAC</strong>: If coming from 2.x, audit your RBAC policies</p>
</li>
<li><p><strong>Monitor after upgrade</strong>: Watch for memory usage patterns and webhook processing</p>
</li>
</ol>
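<p>For the backup step, one lightweight option is to dump the Application and ApplicationSet objects with kubectl. This is a minimal sketch, assuming ArgoCD runs in the <code>argocd</code> namespace. The function name <code>backup_argocd_state</code> and the <code>KUBECTL</code> override are hypothetical, and the dump does not include AppProjects, ConfigMaps, or Secrets.</p>
<pre><code class="lang-bash"># Hypothetical helper: dump Application and ApplicationSet objects as YAML.
# Prints to stdout so the caller decides where the backup is stored.
# KUBECTL can be overridden (for example with "echo") to preview the command.
backup_argocd_state() {
  local namespace="${1:-argocd}"
  "${KUBECTL:-kubectl}" get applications.argoproj.io,applicationsets.argoproj.io \
    --namespace "$namespace" \
    --output yaml
}

# Usage: backup_argocd_state argocd | tee argocd-backup.yaml
</code></pre>
<p>The ArgoCD CLI also provides <code>argocd admin export</code>, which may be the better choice when a fuller export is needed.</p>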
<h2 id="heading-real-world-impact-who-benefits-most"><strong>Real-World Impact: Who Benefits Most?</strong></h2>
<h2 id="heading-platform-engineering-teams"><strong>Platform Engineering Teams</strong></h2>
<p>If you’re building internal developer platforms, the ApplicationSet improvements will help you manage hundreds of applications more efficiently. The resource count tracking and memory optimizations mean you can scale further without hitting performance walls.</p>
<h2 id="heading-large-monorepo-users"><strong>Large Monorepo Users</strong></h2>
<p>The webhook handler optimizations directly address pain points for teams managing large repositories. Less memory pressure means more stable ArgoCD instances during high-commit periods.</p>
<h2 id="heading-multi-tenant-environments"><strong>Multi-Tenant Environments</strong></h2>
<p>The continued refinement of RBAC and the stability improvements make ArgoCD 3.2 more suitable for multi-tenant setups where different teams share the same ArgoCD instance but need strict isolation.</p>
<h2 id="heading-crossplane-users"><strong>Crossplane Users</strong></h2>
<p>If you’re adopting Crossplane for infrastructure management alongside ArgoCD for application deployment, the updated health checks for Crossplane V2 ensure better visibility into your control plane resources.</p>
<h2 id="heading-release-timeline-and-whats-next"><strong>Release Timeline and What’s Next</strong></h2>
<p>ArgoCD 3.2.0 was released as a stable version on November 5th, 2025. Here’s what to expect moving forward:</p>
<ul>
<li><p><strong>Current stable version</strong>: 3.2.0</p>
</li>
<li><p><strong>Supported versions</strong>: 3.2, 3.1, and 3.0</p>
</li>
<li><p><strong>Next release</strong>: 3.3 is expected in approximately 3 months (following the quarterly release cadence)</p>
</li>
<li><p><strong>Patch releases</strong>: Bug fixes and security updates will be released as 3.2.x versions as needed</p>
</li>
</ul>
<p>The first patch release (3.2.1) is expected soon to address the large monorepo lock contention issue.</p>
<h2 id="heading-hands-on-installing-argocd-32"><strong>Hands-On: Installing ArgoCD 3.2</strong></h2>
<p>Ready to try the latest stable release? Here’s how to deploy ArgoCD 3.2.0 in your cluster:</p>
<h2 id="heading-installation-via-helm-recommended"><strong>Installation via Helm (Recommended)</strong></h2>
<p>Helm is the recommended way to install ArgoCD in production environments as it provides better configuration management and easier upgrades.</p>
<p><strong>Add the ArgoCD Helm repository:</strong></p>
<pre><code class="lang-bash">helm repo add argo https://argoproj.github.io/argo-helm
helm repo update
</code></pre>
<p><strong>Install ArgoCD 3.2.0:</strong></p>
<pre><code class="lang-bash"><span class="hljs-comment"># Create namespace</span>
kubectl create namespace argocd

<span class="hljs-comment"># Install with default configuration (--version pins the Helm chart version, not the ArgoCD version)</span>
helm install argocd argo/argo-cd \
  --namespace argocd \
  --version 9.1.0

<span class="hljs-comment"># Or install with custom values</span>
helm install argocd argo/argo-cd \
  --namespace argocd \
  --version 9.1.0 \
  --values values.yaml
</code></pre>
<p><strong>Example values.yaml for production:</strong></p>
<pre><code class="lang-yaml"><span class="hljs-comment"># Enable HA mode</span>
<span class="hljs-attr">redis-ha:</span>
  <span class="hljs-attr">enabled:</span> <span class="hljs-literal">true</span>

<span class="hljs-attr">controller:</span>
  <span class="hljs-attr">replicas:</span> <span class="hljs-number">2</span>

<span class="hljs-attr">server:</span>
  <span class="hljs-attr">replicas:</span> <span class="hljs-number">2</span>
  <span class="hljs-attr">ingress:</span>
    <span class="hljs-attr">enabled:</span> <span class="hljs-literal">true</span>
    <span class="hljs-attr">ingressClassName:</span> <span class="hljs-string">nginx</span>
    <span class="hljs-attr">hosts:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">argocd.yourdomain.com</span>
    <span class="hljs-attr">tls:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">secretName:</span> <span class="hljs-string">argocd-tls</span>
        <span class="hljs-attr">hosts:</span>
          <span class="hljs-bullet">-</span> <span class="hljs-string">argocd.yourdomain.com</span>

<span class="hljs-attr">repoServer:</span>
  <span class="hljs-attr">replicas:</span> <span class="hljs-number">2</span>

<span class="hljs-attr">applicationSet:</span>
  <span class="hljs-attr">replicas:</span> <span class="hljs-number">2</span>

<span class="hljs-comment"># Enable notifications</span>
<span class="hljs-attr">notifications:</span>
  <span class="hljs-attr">enabled:</span> <span class="hljs-literal">true</span>

<span class="hljs-comment"># Server parameters: keep TLS enabled on argocd-server</span>
<span class="hljs-attr">configs:</span>
  <span class="hljs-attr">params:</span>
    <span class="hljs-attr">server.insecure:</span> <span class="hljs-literal">false</span>
</code></pre>
<p><strong>Upgrade from previous version:</strong></p>
<pre><code class="lang-bash">helm upgrade argocd argo/argo-cd \
  --namespace argocd \
  --version 9.1.0 \
  --values values.yaml
</code></pre>
<h2 id="heading-installation-via-kubectl-quick-start"><strong>Installation via kubectl (Quick Start)</strong></h2>
<p>For testing or non-production environments, you can use kubectl directly:</p>
<p><strong>Non-HA Installation:</strong></p>
<pre><code class="lang-bash">kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/v3.2.0/manifests/install.yaml
</code></pre>
<p><strong>High Availability Installation:</strong></p>
<pre><code class="lang-bash">kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/v3.2.0/manifests/ha/install.yaml
</code></pre>
<h2 id="heading-verifying-the-installation"><strong>Verifying the Installation</strong></h2>
<p><strong>Check ArgoCD pods status:</strong></p>
<pre><code class="lang-bash">kubectl get pods -n argocd

<span class="hljs-comment"># Expected output (HA installation):</span>
<span class="hljs-comment"># NAME                                               READY   STATUS    RESTARTS   AGE</span>
<span class="hljs-comment"># argocd-application-controller-0                    1/1     Running   0          2m</span>
<span class="hljs-comment"># argocd-applicationset-controller-xxx               1/1     Running   0          2m</span>
<span class="hljs-comment"># argocd-dex-server-xxx                              1/1     Running   0          2m</span>
<span class="hljs-comment"># argocd-notifications-controller-xxx                1/1     Running   0          2m</span>
<span class="hljs-comment"># argocd-redis-ha-haproxy-xxx                        1/1     Running   0          2m</span>
<span class="hljs-comment"># argocd-redis-ha-server-0                           2/2     Running   0          2m</span>
<span class="hljs-comment"># argocd-repo-server-xxx                             1/1     Running   0          2m</span>
<span class="hljs-comment"># argocd-server-xxx                                  1/1     Running   0          2m</span>
</code></pre>
<p><strong>Verify the ArgoCD version:</strong></p>
<pre><code class="lang-bash">kubectl get pods -n argocd -l app.kubernetes.io/name=argocd-server -o jsonpath=<span class="hljs-string">'{.items[0].spec.containers[0].image}'</span>

<span class="hljs-comment"># Should return: quay.io/argoproj/argocd:v3.2.0</span>
</code></pre>
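<p>When this check runs in a script, comparing the tag explicitly is less error-prone than reading the output by eye. A minimal sketch, assuming the image reference ends in a tag rather than a digest; <code>image_tag</code> and <code>check_image_tag</code> are hypothetical helper names.</p>
<pre><code class="lang-bash"># Hypothetical helper: extract the tag from an image reference.
# Assumes the reference ends in ":TAG" (no digest).
image_tag() {
  echo "${1##*:}"
}

# Hypothetical check: succeed only when the image carries the expected tag.
check_image_tag() {
  local image="$1"
  local expected="$2"
  if [ "$(image_tag "$image")" = "$expected" ]; then
    echo "OK: $image"
  else
    echo "MISMATCH: expected $expected, got $(image_tag "$image")"
    return 1
  fi
}

# Usage: check_image_tag "quay.io/argoproj/argocd:v3.2.0" "v3.2.0"
</code></pre>
<p>The first argument would typically be the output of the jsonpath query above, captured with command substitution.</p>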
<p><strong>Access the ArgoCD UI:</strong></p>
<pre><code class="lang-bash"><span class="hljs-comment"># Port-forward (for quick access)</span>
kubectl port-forward svc/argocd-server -n argocd 8080:443

<span class="hljs-comment"># Then access: https://localhost:8080</span>
</code></pre>
<p><strong>Get the initial admin password:</strong></p>
<pre><code class="lang-bash"><span class="hljs-comment"># Same command for Helm and kubectl installations</span>
kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath=<span class="hljs-string">"{.data.password}"</span> | base64 -d &amp;&amp; <span class="hljs-built_in">echo</span>
</code></pre>
<p><strong>Login via CLI:</strong></p>
<pre><code class="lang-bash"><span class="hljs-comment"># Install ArgoCD CLI</span>
brew install argocd  <span class="hljs-comment"># macOS</span>

<span class="hljs-comment"># or</span>
curl -sSL -o argocd https://github.com/argoproj/argo-cd/releases/download/v3.2.0/argocd-linux-amd64
chmod +x argocd
sudo mv argocd /usr/<span class="hljs-built_in">local</span>/bin/

<span class="hljs-comment"># Login</span>
argocd login localhost:8080 --username admin --password &lt;password&gt; --insecure

<span class="hljs-comment"># Verify version</span>
argocd version
</code></pre>
<h2 id="heading-what-to-test"><strong>What to Test</strong></h2>
<p>Focus your testing on:</p>
<ol>
<li><p><strong>ApplicationSet Progressive Sync</strong>: Create an ApplicationSet with progressive sync enabled and verify the UI shows proper status</p>
</li>
<li><p><strong>Memory usage</strong>: Monitor memory consumption during webhook processing</p>
</li>
<li><p><strong>Health checks</strong>: If you use Crossplane, verify V2 resources show correct health status</p>
</li>
<li><p><strong>RBAC</strong>: Validate your existing policies work as expected</p>
</li>
</ol>
<h2 id="heading-looking-ahead-the-future-of-argocd"><strong>Looking Ahead: The Future of ArgoCD</strong></h2>
<p>ArgoCD 3.2 continues the project’s trajectory toward becoming the definitive GitOps tool for Kubernetes. Some trends worth watching:</p>
<h2 id="heading-oci-native-configuration"><strong>OCI-Native Configuration</strong></h2>
<p>The push toward OCI registries suggests a future where Kubernetes configurations are treated as first-class artifacts, with the same supply chain security guarantees as container images.</p>
<h2 id="heading-progressive-delivery-integration"><strong>Progressive Delivery Integration</strong></h2>
<p>The improvements to Progressive Sync hint at deeper integration with progressive delivery patterns. We may see more sophisticated rollout strategies in future versions.</p>
<h2 id="heading-platform-engineering-enablement"><strong>Platform Engineering Enablement</strong></h2>
<p>With better scalability and multi-tenancy support, ArgoCD is positioning itself as a critical component of internal developer platforms, not just a deployment tool.</p>
<h2 id="heading-action-items-upgrading-to-argocd-32"><strong>Action Items: Upgrading to ArgoCD 3.2</strong></h2>
<p>Here’s your checklist:</p>
<p><strong>Immediate (if on 2.14 or earlier, which has reached EOL):</strong></p>
<ul>
<li><p><strong>Upgrade immediately</strong> — you’re no longer receiving security patches</p>
</li>
<li><p>Review the 3.0, 3.1, and 3.2 release notes</p>
</li>
<li><p>Audit your RBAC policies for 3.0 compatibility</p>
</li>
<li><p>Set up a test environment with ArgoCD 3.2</p>
</li>
<li><p>Test your critical ApplicationSets and Applications</p>
</li>
</ul>
<p><strong>Before production upgrade:</strong></p>
<ul>
<li><p>Document your current ArgoCD configuration</p>
</li>
<li><p>Backup your ArgoCD state</p>
</li>
<li><p>Create a rollback plan</p>
</li>
<li><p>Schedule a maintenance window for production upgrade</p>
</li>
<li><p>Review the known issues (large monorepo lock contention)</p>
</li>
</ul>
<p><strong>Post-upgrade:</strong></p>
<ul>
<li><p>Monitor memory usage patterns</p>
</li>
<li><p>Validate all Applications sync correctly</p>
</li>
<li><p>Check webhook processing performance</p>
</li>
<li><p>Review ApplicationSet statuses</p>
</li>
<li><p>Watch for 3.2.1 patch release if you have large monorepos</p>
</li>
</ul>
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>ArgoCD 3.2 is now stable and production-ready. This release represents a measured evolution of the platform — not revolutionary, but a refinement that makes ArgoCD more stable, performant, and user-friendly. The ApplicationSet improvements, memory optimizations, and updated health checks address real pain points that operators face in production.</p>
<p><strong>If you’re on ArgoCD 2.14, you’re running an unsupported version.</strong> Upgrade immediately. If you’re already on 3.0 or 3.1, upgrading to 3.2 should be low-risk, but still test thoroughly, especially if you operate large monorepos.</p>
<p>The GitOps ecosystem continues to mature, and ArgoCD remains at the forefront. Version 3.2 is another solid step in that journey, and with the EOL of 2.14, there’s no better time to upgrade than now.</p>
<h2 id="heading-additional-resources"><strong>Additional Resources</strong></h2>
<ul>
<li><p><a target="_blank" href="https://argo-cd.readthedocs.io/">ArgoCD Official Documentation</a></p>
</li>
<li><p><a target="_blank" href="https://github.com/argoproj/argo-cd/releases">ArgoCD GitHub Releases</a></p>
</li>
<li><p><a target="_blank" href="https://github.com/argoproj/argo-cd/releases/tag/v3.2.0">ArgoCD 3.2.0 Release Notes</a></p>
</li>
<li><p><a target="_blank" href="https://argo-cd.readthedocs.io/en/stable/operator-manual/upgrading/overview/">Upgrading to ArgoCD 3.x Guide</a></p>
</li>
<li><p><a target="_blank" href="https://argo-cd.readthedocs.io/en/stable/operator-manual/upgrading/2.14-3.0/">ArgoCD 2.14 to 3.0 Migration Guide</a></p>
</li>
</ul>
<p><em>Have you upgraded to ArgoCD 3.2 yet? What’s your experience been like? Share your thoughts in the comments below!</em></p>
]]></content:encoded></item><item><title><![CDATA[From 0 to Hero: Mastering Auto Scaling in Kubernetes]]></title><description><![CDATA[Scaling applications is one of the hardest challenges in cloud-native environments. With Kubernetes, you get powerful autoscaling primitives that make workloads adaptive, resilient, and cost-efficient.
In this guide, we’ll go from zero to hero in Kub...]]></description><link>https://devops-blog.ruicoelho.dev/from-0-to-hero-mastering-auto-scaling-in-kubernetes</link><guid isPermaLink="true">https://devops-blog.ruicoelho.dev/from-0-to-hero-mastering-auto-scaling-in-kubernetes</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[Devops]]></category><category><![CDATA[SRE]]></category><category><![CDATA[gitops]]></category><category><![CDATA[autoscaling]]></category><category><![CDATA[kubernetes autoscaling]]></category><dc:creator><![CDATA[Rui Coelho]]></dc:creator><pubDate>Tue, 18 Nov 2025 17:25:39 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1763486605351/b76609c3-a08c-46cf-9eca-03b07b6dbbbd.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Scaling applications is one of the hardest challenges in cloud-native environments. With Kubernetes, you get powerful autoscaling primitives that make workloads adaptive, resilient, and cost-efficient.</p>
<p>In this guide, we’ll go from zero to hero in Kubernetes autoscaling, covering:</p>
<ul>
<li><p>Horizontal Pod Autoscaler (HPA)</p>
</li>
<li><p>Vertical Pod Autoscaler (VPA)</p>
</li>
<li><p>Cluster Autoscaler (CA)</p>
</li>
<li><p>Cluster Proportional Autoscaler (CPA)</p>
</li>
<li><p>KEDA (Kubernetes Event-Driven Autoscaling)</p>
</li>
</ul>
<p>We will explore how each works, which components are involved, what metrics they use, best practices, and what the future holds.</p>
<h2 id="heading-why-auto-scaling-matters"><strong>Why Auto Scaling Matters</strong></h2>
<p>Without autoscaling, you either:</p>
<ul>
<li><p>Overprovision resources → wasting money</p>
</li>
<li><p>Underprovision resources → degraded performance or downtime</p>
</li>
</ul>
<p>Kubernetes addresses this with autoscaling mechanisms at different levels:</p>
<ul>
<li><p>Pod Level → HPA &amp; VPA</p>
</li>
<li><p>Node Level → Cluster Autoscaler (CA)</p>
</li>
<li><p>Cluster Add-on Level → CPA</p>
</li>
<li><p>Event-Driven Level → KEDA</p>
</li>
</ul>
<h2 id="heading-horizontal-pod-autoscaler-hpa"><strong>Horizontal Pod Autoscaler (HPA)</strong></h2>
<p>The HPA adjusts the number of pod replicas in a Deployment, StatefulSet, or ReplicaSet.</p>
<h3 id="heading-how-it-works"><strong>How it works</strong></h3>
<ul>
<li><p>Periodically checks metrics (default every 15s).</p>
</li>
<li><p>Compares observed values to target thresholds.</p>
</li>
<li><p>Adjusts the .spec.replicas field of the workload.</p>
</li>
</ul>
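<p>To make this loop concrete, here is a minimal sketch of a complete HPA manifest, assuming a hypothetical Deployment named <code>web</code> (the names and replica bounds are illustrative). On each check the controller computes <code>desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue)</code>, so 4 replicas averaging 90% CPU against a 60% target become ceil(4 × 90 / 60) = 6.</p>
<pre><code class="lang-yaml">apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa              # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                # hypothetical Deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization  # percentage of the pods' CPU requests
          averageUtilization: 60
</code></pre>
<p>Utilization is measured relative to the containers' CPU requests, so the target pods need requests set for this to work.</p>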
<h3 id="heading-components-in-detail"><strong>Components in Detail</strong></h3>
<ul>
<li><p><strong>HPA Controller</strong>: Runs inside the kube-controller-manager; makes scaling decisions.</p>
</li>
<li><p><strong>Metrics Server</strong>: Collects CPU/Memory usage from kubelets and exposes resource metrics.</p>
</li>
<li><p><strong>Custom Metrics Adapter</strong>: Connects with systems like Prometheus to expose application-level metrics.</p>
</li>
<li><p><strong>External Metrics Adapter</strong>: Integrates with external services (CloudWatch, Pub/Sub, SQS) for business-level signals.</p>
</li>
</ul>
<h2 id="heading-types-of-metrics-supported"><strong>Types of Metrics Supported</strong></h2>
<h3 id="heading-1-resource-metrics-native"><strong>1. Resource Metrics (Native)</strong></h3>
<p>These come from the Metrics Server. They cover CPU and memory utilization.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">metrics:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">type:</span> <span class="hljs-string">Resource</span>
    <span class="hljs-attr">resource:</span>
      <span class="hljs-attr">name:</span> <span class="hljs-string">cpu</span>
      <span class="hljs-attr">target:</span>
        <span class="hljs-attr">type:</span> <span class="hljs-string">Utilization</span>
        <span class="hljs-attr">averageUtilization:</span> <span class="hljs-number">70</span>
</code></pre>
<h3 id="heading-2-custom-metrics"><strong>2. Custom Metrics</strong></h3>
<p>Custom metrics allow scaling based on application-level signals when CPU/Memory are not good indicators of load.</p>
<ul>
<li><p>Sourced from inside the cluster, typically via Prometheus and an adapter.</p>
</li>
<li><p>Examples include:</p>
<ul>
<li><p>HTTP requests per second</p>
</li>
<li><p>Average response latency</p>
</li>
<li><p>Active sessions or connections</p>
</li>
</ul>
</li>
</ul>
<pre><code class="lang-yaml"><span class="hljs-attr">metrics:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">type:</span> <span class="hljs-string">Pods</span>
    <span class="hljs-attr">pods:</span>
      <span class="hljs-attr">metric:</span>
        <span class="hljs-attr">name:</span> <span class="hljs-string">http_requests_per_second</span>
      <span class="hljs-attr">target:</span>
        <span class="hljs-attr">type:</span> <span class="hljs-string">AverageValue</span>
        <span class="hljs-attr">averageValue:</span> <span class="hljs-string">"100"</span>
</code></pre>
<p>This example targets an average of 100 requests per second per pod; the HPA adds replicas when the per-pod average rises above that value.</p>
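<p>The metric name used here has to be served by an adapter. As a rough sketch, assuming the application exports a Prometheus counter called <code>http_requests_total</code> (a hypothetical name) and that prometheus-adapter is the adapter in use, a rule along these lines could derive <code>http_requests_per_second</code> from it:</p>
<pre><code class="lang-yaml">rules:
  - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}   # map the label to the Kubernetes namespace
        pod: {resource: "pod"}               # map the label to the Pod object
    name:
      matches: "^(.*)_total$"
      as: "${1}_per_second"                  # http_requests_total -&gt; http_requests_per_second
    # per-pod request rate over the last 2 minutes
    metricsQuery: 'sum(rate(&lt;&lt;.Series&gt;&gt;{&lt;&lt;.LabelMatchers&gt;&gt;}[2m])) by (&lt;&lt;.GroupBy&gt;&gt;)'
</code></pre>
<p>The 2-minute rate window is an assumption; a shorter window reacts faster but makes the metric noisier.</p>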
<h3 id="heading-3-external-metrics"><strong>3. External Metrics</strong></h3>
<p>External metrics allow scaling based on signals outside the cluster.</p>
<p>Examples include:</p>
<ul>
<li><p>Messages in an AWS SQS queue</p>
</li>
<li><p>Pending tasks in GCP Pub/Sub</p>
</li>
<li><p>Business KPIs such as number of orders waiting</p>
</li>
</ul>
<pre><code class="lang-yaml"><span class="hljs-attr">metrics:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">type:</span> <span class="hljs-string">External</span>
    <span class="hljs-attr">external:</span>
      <span class="hljs-attr">metric:</span>
        <span class="hljs-attr">name:</span> <span class="hljs-string">queue_messages_ready</span>
      <span class="hljs-attr">target:</span>
        <span class="hljs-attr">type:</span> <span class="hljs-string">AverageValue</span>
        <span class="hljs-attr">averageValue:</span> <span class="hljs-string">"50"</span>
</code></pre>
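<p>Because the target type is AverageValue, the HPA divides the queue depth by the current number of pods, so this example aims for about 50 pending messages per replica rather than a fixed threshold of 50 messages.</p>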
<h3 id="heading-custom-metrics-vs-external-metrics"><strong>Custom Metrics vs External Metrics</strong></h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Type</strong></td><td><strong>Source</strong></td><td><strong>Examples</strong></td><td><strong>Adapter Required</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Custom</td><td>Inside the cluster (Prometheus, application internals)</td><td>Requests/sec, latency, active sessions</td><td>Yes</td></tr>
<tr>
<td>External</td><td>Outside the cluster (cloud services, APIs)</td><td>Queue length, Pub/Sub backlog, business KPIs</td><td>Yes</td></tr>
</tbody>
</table>
</div><p><strong>Rule of thumb:</strong> use Custom metrics when the signal is inside the cluster; use External metrics when the signal comes from outside.</p>
<h3 id="heading-pros"><strong>Pros</strong></h3>
<ul>
<li><p>Native, widely supported.</p>
</li>
<li><p>Works with CPU, memory, custom, and external metrics.</p>
</li>
</ul>
<h3 id="heading-limitations"><strong>Limitations</strong></h3>
<ul>
<li><p>Does not resize pod resources, only replicas.</p>
</li>
<li><p>Requires adapters for advanced metrics.</p>
</li>
</ul>
<h2 id="heading-vertical-pod-autoscaler-vpa"><strong>Vertical Pod Autoscaler (VPA)</strong></h2>
<p>The VPA automatically adjusts CPU and memory requests/limits for pods.</p>
<h3 id="heading-how-it-works-1"><strong>How it works</strong></h3>
<ul>
<li><p>Continuously observes resource usage.</p>
</li>
<li><p>Provides recommendations or enforces new requests/limits.</p>
</li>
<li><p>Applies changes according to its operating mode.</p>
</li>
</ul>
<h3 id="heading-components-in-detail-1"><strong>Components in Detail</strong></h3>
<ul>
<li><p><strong>Recommender</strong>: Analyzes metrics and suggests optimal CPU/memory.</p>
</li>
<li><p><strong>Updater</strong>: Decides when to evict pods to apply new values.</p>
</li>
<li><p><strong>Admission Controller (Plugin)</strong>: Mutates pod specs on creation with recommended resources.</p>
</li>
</ul>
<h3 id="heading-vpa-modes"><strong>VPA Modes</strong></h3>
<p><strong>1. Off</strong></p>
<ul>
<li><p>Provides recommendations only.</p>
</li>
<li><p>Useful for testing and observability.</p>
</li>
</ul>
<p><strong>2. Initial</strong></p>
<ul>
<li><p>Applies recommended resources only at pod creation.</p>
</li>
<li><p>Pods keep the same resources until deleted or recreated.</p>
</li>
</ul>
<p><strong>3. Auto</strong></p>
<ul>
<li><p>Continuously adjusts resources.</p>
</li>
<li><p>May evict and restart pods to apply new requests/limits.</p>
</li>
</ul>
<p><strong>Example</strong></p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">autoscaling.k8s.io/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">VerticalPodAutoscaler</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">recommendation-vpa</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">targetRef:</span>
    <span class="hljs-attr">apiVersion:</span> <span class="hljs-string">apps/v1</span>
    <span class="hljs-attr">kind:</span> <span class="hljs-string">Deployment</span>
    <span class="hljs-attr">name:</span> <span class="hljs-string">recommendation</span>
  <span class="hljs-attr">updatePolicy:</span>
    <span class="hljs-attr">updateMode:</span> <span class="hljs-string">"Auto"</span>   <span class="hljs-comment"># Options: Off, Initial, Auto</span>
</code></pre>
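<p>A cautious way to start is recommendation-only mode with explicit bounds. The sketch below reuses the same target and adds a <code>resourcePolicy</code>; the minimum and maximum values are illustrative assumptions, not tuned numbers.</p>
<pre><code class="lang-yaml">apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: recommendation-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: recommendation
  updatePolicy:
    updateMode: "Off"          # recommend only, never evict pods
  resourcePolicy:
    containerPolicies:
      - containerName: "*"     # apply to every container in the pod
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: "2"
          memory: 4Gi
</code></pre>
<p>The recommendations then appear in the object's status, for example via <code>kubectl describe vpa recommendation-vpa</code>, and can be reviewed before switching to a mode that applies them.</p>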
<p><strong>Pros</strong></p>
<ul>
<li><p>Prevents under/over-provisioning.</p>
</li>
<li><p>Adapts pods to real usage.</p>
</li>
</ul>
<p><strong>Limitations</strong></p>
<ul>
<li><p>Pod restarts required to apply changes.</p>
</li>
<li><p>Should not be combined with HPA on the same workload when both react to CPU or memory.</p>
</li>
</ul>
<h2 id="heading-cluster-autoscaler-ca"><strong>Cluster Autoscaler (CA)</strong></h2>
<p>The Cluster Autoscaler manages the number of nodes in your cluster.</p>
<h3 id="heading-how-it-works-2"><strong>How it works</strong></h3>
<ul>
<li><p>Scales up when pods cannot be scheduled.</p>
</li>
<li><p>Scales down when nodes are underutilized.</p>
</li>
<li><p>Relies on integration with the underlying cloud provider.</p>
</li>
</ul>
<h3 id="heading-components-in-detail-2"><strong>Components in Detail</strong></h3>
<ul>
<li><p><strong>CA Controller</strong>: Observes scheduling failures and underutilized nodes.</p>
</li>
<li><p><strong>Cloud Provider Integration</strong>: Uses APIs to add/remove nodes (AWS ASGs, GCP MIGs, Azure VMSS).</p>
</li>
<li><p><strong>Scale-down Logic</strong>: Removes nodes only if it will not disrupt critical workloads.</p>
</li>
</ul>
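<p>One way workloads influence the scale-down logic is through a PodDisruptionBudget, which the Cluster Autoscaler respects when it drains a node. A minimal sketch, assuming a hypothetical workload labelled <code>app: orders</code>:</p>
<pre><code class="lang-yaml">apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: orders-pdb             # hypothetical name
spec:
  minAvailable: 2              # never drain below two available pods
  selector:
    matchLabels:
      app: orders              # hypothetical label
</code></pre>
<p>For pods that must never be moved, the annotation <code>cluster-autoscaler.kubernetes.io/safe-to-evict: "false"</code> on the pod keeps the autoscaler from removing the node it runs on.</p>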
<h3 id="heading-pros-1"><strong>Pros</strong></h3>
<ul>
<li><p>Ensures the cluster has enough capacity.</p>
</li>
<li><p>Saves costs by scaling down idle nodes.</p>
</li>
</ul>
<h3 id="heading-limitations-1"><strong>Limitations</strong></h3>
<ul>
<li><p>Cloud-provider specific.</p>
</li>
<li><p>Conservative when scaling down.</p>
</li>
</ul>
<h2 id="heading-cluster-proportional-autoscaler-cpa"><strong>Cluster Proportional Autoscaler (CPA)</strong></h2>
<p>The CPA is specialized for scaling cluster add-ons such as CoreDNS. Unlike the Cluster Autoscaler, which changes the number of nodes, the CPA adjusts the number of replicas of add-on workloads so they grow proportionally with the cluster.</p>
<h3 id="heading-how-it-works-3"><strong>How it works</strong></h3>
<ul>
<li><p>Monitors cluster size (nodes or CPU cores).</p>
</li>
<li><p>Adjusts replicas of add-on components proportionally.</p>
</li>
<li><p>Typically used for Deployments such as CoreDNS.</p>
</li>
</ul>
<h3 id="heading-components-in-detail-3"><strong>Components in Detail</strong></h3>
<ul>
<li><p><strong>CPA Controller</strong>: Runs as a deployment in the kube-system namespace.</p>
</li>
<li><p><strong>Scaling Config</strong>: Defines proportional rules (linear or ladder).</p>
</li>
</ul>
<h3 id="heading-example-config-conceptual"><strong>Example Config (conceptual)</strong></h3>
<pre><code class="lang-yaml"><span class="hljs-attr">linear:</span>
  <span class="hljs-attr">coresPerReplica:</span> <span class="hljs-number">256</span>
  <span class="hljs-attr">nodesPerReplica:</span> <span class="hljs-number">16</span>
  <span class="hljs-attr">min:</span> <span class="hljs-number">1</span>
  <span class="hljs-attr">max:</span> <span class="hljs-number">10</span>
</code></pre>
<p>This means one replica per 256 CPU cores or per 16 nodes, whichever yields more replicas, with the result kept between 1 and 10.</p>
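<p>In practice the CPA reads these parameters as JSON from a ConfigMap. The sketch below assumes a hypothetical ConfigMap name, which has to match the one passed to the autoscaler's <code>--configmap</code> flag. As a worked example, a cluster with 100 nodes and 800 cores gives ceil(800 / 256) = 4 and ceil(100 / 16) = 7, so the add-on runs max(4, 7) = 7 replicas.</p>
<pre><code class="lang-yaml">apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns-autoscaler     # hypothetical name
  namespace: kube-system
data:
  # the key selects the scaling mode; the value is a JSON document
  linear: |-
    {
      "coresPerReplica": 256,
      "nodesPerReplica": 16,
      "min": 1,
      "max": 10
    }
</code></pre>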
<h3 id="heading-why-it-is-not-always-used"><strong>Why it is not always used</strong></h3>
<p>In many environments, CPA is not strictly required because <strong>DaemonSets</strong> already provide proportional coverage. A DaemonSet ensures that each node runs exactly one pod (for example, kube-proxy or a logging agent). This one-pod-per-node model automatically scales with the cluster as nodes are added or removed.</p>
<p>As a result:</p>
<ul>
<li><p>CPA is most useful for Deployments where proportional scaling is needed.</p>
</li>
<li><p>For simpler infrastructure components, a DaemonSet is often enough and eliminates the need for CPA.</p>
</li>
</ul>
<h3 id="heading-pros-2"><strong>Pros</strong></h3>
<ul>
<li><p>Keeps critical add-ons responsive as the cluster grows.</p>
</li>
<li><p>Provides proportional scaling for Deployments.</p>
</li>
</ul>
<h3 id="heading-limitations-2"><strong>Limitations</strong></h3>
<ul>
<li><p>Not needed when DaemonSets are sufficient (common in simpler setups).</p>
</li>
<li><p>Focused on infrastructure-level services, not application workloads.</p>
</li>
</ul>
<h2 id="heading-keda-kubernetes-event-driven-autoscaling"><strong>KEDA (Kubernetes Event-Driven Autoscaling)</strong></h2>
<p>KEDA extends Kubernetes with event-driven scaling capabilities. While the HPA traditionally reacts to resource usage or custom/external metrics, KEDA lets workloads scale based on <strong>event sources</strong> such as message queues, databases, or cloud services.</p>
<h3 id="heading-how-it-works-4"><strong>How it works</strong></h3>
<ul>
<li><p>You define a ScaledObject (for Deployments/StatefulSets) or a ScaledJob (for batch workloads).</p>
</li>
<li><p>KEDA deploys an Operator and a Metrics Adapter into the cluster.</p>
</li>
<li><p>Scalers fetch metrics from external systems and expose them through the Kubernetes metrics API.</p>
</li>
<li><p>The HPA (managed by KEDA) consumes those metrics to scale workloads.</p>
</li>
<li><p>KEDA can scale workloads down to zero when no events are present, something the native HPA cannot do by default.</p>
</li>
</ul>
<h3 id="heading-components-in-detail-4"><strong>Components in Detail</strong></h3>
<ul>
<li><p><strong>KEDA Operator</strong>: Watches CRDs like ScaledObject and ScaledJob, and creates an HPA automatically for the target workload.</p>
</li>
<li><p><strong>Metrics Adapter</strong>: Exposes metrics from scalers to the HPA using the Kubernetes external metrics API.</p>
</li>
<li><p><strong>Scalers</strong>: Plugins that connect to external systems. KEDA supports more than 50 scalers including Kafka, RabbitMQ, AWS SQS, Azure Service Bus, GCP Pub/Sub, Prometheus, MySQL, PostgreSQL, and more.</p>
</li>
<li><p><strong>ScaledObject</strong>: Defines autoscaling for Deployments or StatefulSets.</p>
</li>
<li><p><strong>ScaledJob</strong>: Defines autoscaling for Jobs — each event can spawn a new Job until the backlog is cleared.</p>
</li>
</ul>
<h3 id="heading-example-rabbitmq"><strong>Example: RabbitMQ</strong></h3>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">keda.sh/v1alpha1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">ScaledObject</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">orders-worker</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">scaleTargetRef:</span>
    <span class="hljs-attr">name:</span> <span class="hljs-string">orders-deployment</span>
  <span class="hljs-attr">minReplicaCount:</span> <span class="hljs-number">0</span>
  <span class="hljs-attr">maxReplicaCount:</span> <span class="hljs-number">20</span>
  <span class="hljs-attr">triggers:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">type:</span> <span class="hljs-string">rabbitmq</span>
      <span class="hljs-attr">metadata:</span>
        <span class="hljs-attr">queueName:</span> <span class="hljs-string">orders</span>
        <span class="hljs-attr">hostFromEnv:</span> <span class="hljs-string">RABBITMQ_HOST</span>
        <span class="hljs-attr">queueLength:</span> <span class="hljs-string">"5"</span>
</code></pre>
<p>This configuration will:</p>
<ul>
<li><p>Scale the orders-deployment between 0 and 20 replicas.</p>
</li>
<li><p>Add replicas as the backlog grows, targeting roughly 5 messages per replica in the RabbitMQ orders queue.</p>
</li>
<li><p>Scale back to zero when the queue is empty.</p>
</li>
</ul>
<h3 id="heading-example-batch-jobs-with-scaledjob"><strong>Example: Batch Jobs with ScaledJob</strong></h3>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">keda.sh/v1alpha1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">ScaledJob</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">image-processor</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">jobTargetRef:</span>
    <span class="hljs-attr">parallelism:</span> <span class="hljs-number">1</span>
    <span class="hljs-attr">completions:</span> <span class="hljs-number">1</span>
    <span class="hljs-attr">template:</span>
      <span class="hljs-attr">spec:</span>
        <span class="hljs-attr">containers:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">worker</span>
          <span class="hljs-attr">image:</span> <span class="hljs-string">my-image-processor:latest</span>
        <span class="hljs-attr">restartPolicy:</span> <span class="hljs-string">Never</span>
  <span class="hljs-attr">pollingInterval:</span> <span class="hljs-number">30</span>
  <span class="hljs-attr">maxReplicaCount:</span> <span class="hljs-number">50</span>
  <span class="hljs-attr">triggers:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">type:</span> <span class="hljs-string">azure-queue</span>
      <span class="hljs-attr">metadata:</span>
        <span class="hljs-attr">queueName:</span> <span class="hljs-string">images</span>
        <span class="hljs-attr">connectionFromEnv:</span> <span class="hljs-string">AzureWebJobsStorage</span>  <span class="hljs-comment"># name of the env var that holds the connection string</span>
</code></pre>
<p>With this configuration, KEDA polls the Azure queue every 30 seconds and spawns Jobs for pending image-processing messages, capped at 50 Jobs.</p>
<h3 id="heading-use-cases"><strong>Use Cases</strong></h3>
<ul>
<li><p>Event-driven microservices (workers consuming from queues).</p>
</li>
<li><p>Serverless-style workloads that run only on demand.</p>
</li>
<li><p>Batch processing pipelines (images, logs, data).</p>
</li>
<li><p>Scaling based on cloud services (databases, queues, monitoring systems).</p>
</li>
</ul>
<h3 id="heading-pros-3"><strong>Pros</strong></h3>
<ul>
<li><p>Event-driven: scale workloads based on real demand.</p>
</li>
<li><p>Scale-to-zero: cost-efficient for workloads with idle periods.</p>
</li>
<li><p>Wide ecosystem of scalers (cloud-native and traditional systems).</p>
</li>
<li><p>Works alongside HPA and integrates natively into Kubernetes.</p>
</li>
</ul>
<h3 id="heading-limitations-3"><strong>Limitations</strong></h3>
<ul>
<li><p>Adds operational complexity (extra components in the cluster).</p>
</li>
<li><p>Requires careful configuration of triggers to avoid over-scaling or flapping.</p>
</li>
<li><p>Each scaler has its own configuration specifics.</p>
</li>
</ul>
<h2 id="heading-cost-optimization-and-trade-offs"><strong>Cost Optimization and Trade-offs</strong></h2>
<p>Autoscaling is not just about performance; it directly impacts cost. Configuring it poorly can lead to unnecessary expense or underutilized resources.</p>
<ul>
<li><p><strong>Unbounded scaling can increase costs dramatically</strong>: Always set a sensible maxReplicas to prevent runaway scaling in case of metric spikes.</p>
</li>
<li><p><strong>Scale-to-zero saves money</strong>: KEDA’s ability to scale to zero during idle periods can significantly reduce cloud bills for workloads that are not always active.</p>
</li>
<li><p><strong>Right-sizing matters</strong>: Use VPA recommendations (for example, in Off mode) to set pod requests before HPA multiplies the pods, so oversized pods are not replicated unnecessarily.</p>
</li>
<li><p><strong>Cluster Autoscaler trade-offs</strong>: While CA saves money by removing nodes, frequent scale-up and scale-down events may increase cloud costs (e.g., by breaking node-level discounts).</p>
</li>
</ul>
<h2 id="heading-cooldowns-stabilization-windows-and-advanced-hpa-features"><strong>Cooldowns, Stabilization Windows, and Advanced HPA Features</strong></h2>
<p>By default, HPA reacts quickly to metric changes, but in production environments, rapid scaling up and down can cause instability. Kubernetes offers advanced options to control scaling behavior:</p>
<h3 id="heading-stabilization-windows"><strong>Stabilization Windows</strong></h3>
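<p>A stabilization window tells the HPA to look back over its recent replica recommendations before acting, so that a brief dip or spike in the metric does not immediately change the replica count.</p>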
<pre><code class="lang-yaml"><span class="hljs-attr">behavior:</span>
  <span class="hljs-attr">scaleDown:</span>
    <span class="hljs-attr">stabilizationWindowSeconds:</span> <span class="hljs-number">300</span>
</code></pre>
<p>With this setting, the HPA scales down only to the highest replica count it recommended during the last 5 minutes. This value is also the default for scale-down.</p>
<h3 id="heading-scaling-policies"><strong>Scaling Policies</strong></h3>
<p>Scaling policies let you limit how aggressively the HPA scales.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">behavior:</span>
  <span class="hljs-attr">scaleUp:</span>
    <span class="hljs-attr">policies:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">type:</span> <span class="hljs-string">Pods</span>
      <span class="hljs-attr">value:</span> <span class="hljs-number">2</span>
      <span class="hljs-attr">periodSeconds:</span> <span class="hljs-number">60</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">type:</span> <span class="hljs-string">Percent</span>
      <span class="hljs-attr">value:</span> <span class="hljs-number">100</span>
      <span class="hljs-attr">periodSeconds:</span> <span class="hljs-number">60</span>
</code></pre>
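<p>This configuration allows scaling up by at most 2 pods per minute or by 100% of the current replica count per minute. When several policies are listed, the HPA by default applies the one that permits the largest change (<code>selectPolicy: Max</code>), so here the higher of the two limits wins; set <code>selectPolicy: Min</code> to enforce the lower one.</p>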
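<p>As a sketch of a more conservative setup (the values are illustrative assumptions, not recommendations), the following <code>behavior</code> block adds <code>selectPolicy: Min</code> so that the stricter scale-up limit applies, and also rate-limits scale-down:</p>
<pre><code class="lang-yaml">behavior:
  scaleUp:
    selectPolicy: Min              # apply whichever policy allows the smaller change
    policies:
      - type: Pods
        value: 2
        periodSeconds: 60
      - type: Percent
        value: 100
        periodSeconds: 60
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
      - type: Percent
        value: 50                  # remove at most half of the replicas per minute
        periodSeconds: 60
</code></pre>
<p>With 10 replicas running, the Pods policy permits 2 more and the Percent policy permits 10 more. Under <code>selectPolicy: Min</code> the HPA can reach at most 12 replicas after one minute; under the default <code>Max</code> it could reach 20.</p>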
<h3 id="heading-why-it-matters"><strong>Why it matters</strong></h3>
<ul>
<li><p>Prevents “thrashing” where replicas fluctuate constantly.</p>
</li>
<li><p>Helps maintain system stability under unpredictable load.</p>
</li>
<li><p>Provides more predictable cost and resource usage.</p>
</li>
</ul>
<h2 id="heading-observability-and-monitoring"><strong>Observability and Monitoring</strong></h2>
<p>Autoscaling decisions are only as good as the signals they are based on. Without observability, you cannot validate whether scaling actions improve performance or simply add cost.</p>
<h3 id="heading-what-to-monitor"><strong>What to Monitor</strong></h3>
<ul>
<li><p><strong>Autoscaler status</strong>: Replica counts over time (HPA, VPA, KEDA, CA).</p>
</li>
<li><p><strong>Scaling decisions</strong>: Why did the autoscaler decide to add/remove replicas?</p>
</li>
<li><p><strong>Business KPIs</strong>: Requests per second, queue lengths, user sessions — to ensure scaling aligns with actual demand.</p>
</li>
<li><p><strong>Cost impact</strong>: Correlate scaling events with cloud spend.</p>
</li>
</ul>
<h3 id="heading-tools-and-integrations"><strong>Tools and Integrations</strong></h3>
<ul>
<li><p><strong>Prometheus + Grafana</strong>: The most common stack to visualize metrics and scaling decisions.</p>
</li>
<li><p><strong>kube-state-metrics</strong>: Exposes the state of HPA objects (and of VPA objects, when configured to do so) as Prometheus metrics.</p>
</li>
<li><p><strong>Datadog, New Relic, Dynatrace</strong>: SaaS observability platforms with built-in Kubernetes autoscaling dashboards.</p>
</li>
<li><p><strong>Cloud provider monitoring</strong>: AWS CloudWatch, GCP Monitoring, Azure Monitor provide integration with CA and KEDA.</p>
</li>
</ul>
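<p>As one example of turning these signals into something actionable, here is a sketch of a Prometheus alerting rule that fires when an HPA stays pinned at its maximum. It assumes kube-state-metrics is being scraped; the group name, alert name, and 15-minute window are illustrative choices.</p>
<pre><code class="lang-yaml">groups:
  - name: autoscaling              # hypothetical group name
    rules:
      - alert: HPAAtMaxReplicas
        # current replicas have reached the configured ceiling
        expr: |
          kube_horizontalpodautoscaler_status_current_replicas
            &gt;= kube_horizontalpodautoscaler_spec_max_replicas
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "HPA {{ $labels.namespace }}/{{ $labels.horizontalpodautoscaler }} is at maxReplicas"
</code></pre>
<p>An HPA that sits at its ceiling for a long time usually means either maxReplicas is too low or the workload is scaling on the wrong metric.</p>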
<h3 id="heading-why-it-matters-1"><strong>Why it matters</strong></h3>
<ul>
<li><p>Ensures that scaling actions align with <strong>application performance</strong>, not just resource usage.</p>
</li>
<li><p>Detects misconfigurations early (e.g., HPA scaling on wrong metric).</p>
</li>
<li><p>Provides insights to fine-tune thresholds and stabilization windows.</p>
</li>
</ul>
<h2 id="heading-comparing-the-autoscalers"><strong>Comparing the Autoscalers</strong></h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Autoscaler</strong></td><td><strong>Components</strong></td><td><strong>Scope</strong></td><td><strong>Metrics</strong></td><td><strong>Best For</strong></td><td><strong>Scale to 0?</strong></td></tr>
</thead>
<tbody>
<tr>
<td>HPA</td><td>Controller Manager, Metrics Server, Adapters</td><td>Pod replicas</td><td>Resource, Custom, External</td><td>Stateless apps</td><td>No</td></tr>
<tr>
<td>VPA</td><td>Recommender, Updater, Admission Controller</td><td>Pod resources</td><td>Historical usage</td><td>Stateful/ML workloads</td><td>No</td></tr>
<tr>
<td>CA</td><td>CA Controller, Cloud APIs</td><td>Nodes</td><td>Unschedulable pods, node utilization</td><td>Node-level elasticity</td><td>No</td></tr>
<tr>
<td>CPA</td><td>CPA Controller, Scaling Config</td><td>Add-ons</td><td>Cluster size</td><td>CoreDNS and similar add-on Deployments</td><td>No</td></tr>
<tr>
<td>KEDA</td><td>Operator, Metrics Adapter, Scalers</td><td>Pods &amp; Jobs</td><td>Event-driven signals</td><td>Workers, serverless jobs, batch pipelines</td><td>Yes</td></tr>
</tbody>
</table>
</div><h2 id="heading-best-practices-for-autoscaling"><strong>Best Practices for Autoscaling</strong></h2>
<ul>
<li><p>Always set minReplicas and maxReplicas to avoid runaway scaling.</p>
</li>
<li><p>Avoid using HPA and VPA on the same Deployment, at least when both act on the same metrics.</p>
</li>
<li><p>Ensure metrics reflect business reality, not just CPU.</p>
</li>
<li><p>Run load tests to fine-tune thresholds.</p>
</li>
<li><p>Configure cooldown periods to prevent thrashing.</p>
</li>
<li><p>For KEDA, adjust polling intervals carefully.</p>
</li>
<li><p>Monitor both system-level and business-level signals to validate scaling behavior.</p>
</li>
</ul>
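<p>For the KEDA item in this list, the two settings that matter most are <code>pollingInterval</code> and <code>cooldownPeriod</code> on the ScaledObject. The sketch below reuses the RabbitMQ example from earlier in this guide; the interval values are illustrative assumptions.</p>
<pre><code class="lang-yaml">apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: orders-worker
spec:
  scaleTargetRef:
    name: orders-deployment
  pollingInterval: 15      # seconds between checks of the trigger source (default 30)
  cooldownPeriod: 120      # seconds to wait after the last active trigger before scaling to zero (default 300)
  minReplicaCount: 0
  maxReplicaCount: 20
  triggers:
    - type: rabbitmq
      metadata:
        queueName: orders
        hostFromEnv: RABBITMQ_HOST
        queueLength: "5"
</code></pre>
<p>A short polling interval reacts faster but puts more load on the event source; a short cooldown saves money but risks repeated cold starts when traffic is bursty.</p>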
<h2 id="heading-the-future-of-autoscaling-in-kubernetes"><strong>The Future of Autoscaling in Kubernetes</strong></h2>
<p>Autoscaling is evolving rapidly:</p>
<ul>
<li><p>Growing adoption of event-driven scaling with KEDA.</p>
</li>
<li><p>Research into predictive autoscaling using AI/ML.</p>
</li>
<li><p>Work on autoscaler composition (HPA + VPA + KEDA together).</p>
</li>
<li><p>A shift toward policy-driven, autonomous scaling clusters.</p>
</li>
</ul>
<p>The future points toward Kubernetes clusters that self-optimize without human intervention.</p>
<h2 id="heading-from-zero-to-hero-a-real-example"><strong>From Zero to Hero: A Real Example</strong></h2>
<p>For an e-commerce platform:</p>
<ul>
<li><p>Frontend API → HPA scaling based on CPU + requests/sec.</p>
</li>
<li><p>Recommendation engine → VPA right-sizes pods for ML models (Auto mode).</p>
</li>
<li><p>Cluster Autoscaler (CA) → Adds nodes when HPA demands exceed current capacity.</p>
</li>
<li><p>CoreDNS → CPA scales proportionally with cluster size, unless a simple DaemonSet is enough.</p>
</li>
<li><p>Order workers → KEDA scales with RabbitMQ queue length (down to zero).</p>
</li>
</ul>
<p>This combination ensures performance, stability, and cost efficiency.</p>
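<p>As an illustration of the first item, a frontend HPA that combines CPU with requests per second might look like the sketch below. The Deployment name and replica bounds are hypothetical, and the <code>http_requests_per_second</code> metric assumes a custom metrics adapter is already serving it.</p>
<pre><code class="lang-yaml">apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: frontend-api             # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend-api           # hypothetical Deployment
  minReplicas: 3
  maxReplicas: 30
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "100"
</code></pre>
<p>When several metrics are listed, the HPA computes a desired replica count for each one and uses the largest, so either signal can trigger a scale-up.</p>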
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>Kubernetes autoscaling is not a single feature — it is an ecosystem:</p>
<ul>
<li><p>HPA manages pod replicas with multiple metric sources.</p>
</li>
<li><p>VPA right-sizes pods dynamically, with modes for every scenario.</p>
</li>
<li><p>CA ensures enough nodes exist.</p>
</li>
<li><p>CPA keeps cluster add-ons healthy, though often replaced by DaemonSets in simpler cases.</p>
</li>
<li><p>KEDA powers event-driven and serverless scaling, with support for batch jobs and scale-to-zero.</p>
</li>
<li><p>Cost optimization, advanced HPA features, and observability ensure autoscaling is efficient, stable, and financially sustainable.</p>
</li>
</ul>
<p>By mastering these components, you can truly go from zero to hero in Kubernetes autoscaling.</p>
]]></content:encoded></item><item><title><![CDATA[How to Use Kubernetes Dynamic Resource Allocation (DRA) — Real Use Case with GPUs]]></title><description><![CDATA[Introduction
With Kubernetes 1.34, Dynamic Resource Allocation (DRA) is now generally available (GA) — enabling pods to request and allocate specialized hardware like GPUs, FPGAs, and high-speed storage dynamically.
This article covers:

What DRA is ...]]></description><link>https://devops-blog.ruicoelho.dev/how-to-use-kubernetes-dynamic-resource-allocation-dra-real-use-case-with-gpus</link><guid isPermaLink="true">https://devops-blog.ruicoelho.dev/how-to-use-kubernetes-dynamic-resource-allocation-dra-real-use-case-with-gpus</guid><category><![CDATA[dra]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[Devops]]></category><category><![CDATA[SRE]]></category><category><![CDATA[GPUs]]></category><category><![CDATA[NVIDIA]]></category><category><![CDATA[gitops]]></category><dc:creator><![CDATA[Rui Coelho]]></dc:creator><pubDate>Tue, 18 Nov 2025 17:15:05 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1763486941278/8b5a0a7d-09d8-4f46-a86a-58473b54b246.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction"><strong>Introduction</strong></h2>
<p>With Kubernetes <strong>1.34</strong>, Dynamic Resource Allocation (DRA) is now <strong>generally available (GA)</strong> — enabling pods to request and allocate specialized hardware like <strong>GPUs, FPGAs, and high-speed storage</strong> dynamically.</p>
<p>This article covers:</p>
<ul>
<li><p>What DRA is and how it works</p>
</li>
<li><p>Use cases for AI/ML workloads</p>
</li>
<li><p>A real example using <strong>NVIDIA GPUs</strong></p>
</li>
<li><p>Updated YAML with <code>DeviceClass</code>, <code>ResourceClaim</code>, and <code>resourceClaims</code></p>
</li>
<li><p>Before/After comparison</p>
</li>
<li><p>A quick FAQ</p>
</li>
</ul>
<h2 id="heading-what-is-dynamic-resource-allocation-dra"><strong>What is Dynamic Resource Allocation (DRA)?</strong></h2>
<p>DRA is a <strong>Kubernetes feature that allows workloads to request non-CPU/memory resources dynamically</strong>, using a plugin-based architecture. Examples include:</p>
<ul>
<li><p>GPUs</p>
</li>
<li><p>Smart NICs</p>
</li>
<li><p>NVMe SSDs</p>
</li>
<li><p>FPGAs</p>
</li>
<li><p>Inference accelerators</p>
</li>
</ul>
<p>Traditionally, these were provisioned <strong>statically</strong>, often leading to poor resource utilization or complicated scheduling.</p>
<p>DRA enables:<br />✅ On-demand allocation at scheduling time<br />✅ Plugin-based orchestration<br />✅ Cleanup and deallocation when pods terminate<br />✅ Better resource utilization and isolation</p>
<p>This is done via a plugin interface and a few new Kubernetes objects:</p>
<ul>
<li><p><code>DeviceClass</code> — Declares the type of device and scheduling criteria</p>
</li>
<li><p><code>ResourceClaim</code> — A claim on a resource from a <code>DeviceClass</code></p>
</li>
<li><p><code>ResourceClaimTemplate</code> — Used by workloads to request resource claims automatically</p>
</li>
<li><p><code>resourceClaims[]</code> — Pod field that binds to one or more <code>ResourceClaim</code></p>
</li>
<li><p><code>ResourceSlice</code> — Represents available resources managed by a plugin</p>
</li>
</ul>
<h2 id="heading-before-amp-after-static-gpu-allocation-vs-dra"><strong>Before &amp; After: Static GPU Allocation vs DRA</strong></h2>
<h3 id="heading-before-static-allocation-pre-dra"><strong>Before: Static Allocation (Pre-DRA)</strong></h3>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Pod</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">gpu-pod-static</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">containers:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">cuda</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">nvidia/cuda:12.2.0-base</span>
    <span class="hljs-attr">command:</span> [<span class="hljs-string">"nvidia-smi"</span>]
    <span class="hljs-attr">resources:</span>
      <span class="hljs-attr">limits:</span>
        <span class="hljs-attr">nvidia.com/gpu:</span> <span class="hljs-string">"1"</span>
</code></pre>
<ul>
<li><p>Requires node pre-configuration</p>
</li>
<li><p>GPU is reserved even if idle</p>
</li>
<li><p>No dynamic lifecycle or cleanup</p>
</li>
</ul>
<h3 id="heading-after-dynamic-allocation-with-dra-kubernetes-134"><strong>After: Dynamic Allocation with DRA (Kubernetes 1.34+)</strong></h3>
<pre><code class="lang-yaml"><span class="hljs-comment"># DeviceClass definition</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">resource.k8s.io/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">DeviceClass</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">nvidia-gpu</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">selectors:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">cel:</span>
      <span class="hljs-comment"># Assumes the NVIDIA DRA driver publishes devices under the driver name gpu.nvidia.com</span>
      <span class="hljs-attr">expression:</span> <span class="hljs-string">'device.driver == "gpu.nvidia.com"'</span>
<span class="hljs-meta">---</span>
<span class="hljs-comment"># ResourceClaim</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">resource.k8s.io/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">ResourceClaim</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">gpu-claim</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">devices:</span>
    <span class="hljs-attr">requests:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">gpu</span>
      <span class="hljs-attr">exactly:</span>
        <span class="hljs-attr">deviceClassName:</span> <span class="hljs-string">nvidia-gpu</span>
<span class="hljs-meta">---</span>
<span class="hljs-comment"># Pod definition</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Pod</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">gpu-pod-dra</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">containers:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">cuda</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">nvidia/cuda:12.2.0-base</span>
    <span class="hljs-attr">command:</span> [<span class="hljs-string">"nvidia-smi"</span>]
    <span class="hljs-attr">resources:</span>
      <span class="hljs-attr">claims:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">gpu</span>
  <span class="hljs-attr">resourceClaims:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">gpu</span>
    <span class="hljs-attr">resourceClaimName:</span> <span class="hljs-string">gpu-claim</span>
  <span class="hljs-attr">restartPolicy:</span> <span class="hljs-string">Never</span>
</code></pre>
<ul>
<li><p>Resources provisioned at scheduling time</p>
</li>
<li><p>Plugin handles setup, teardown, isolation</p>
</li>
<li><p>Clean and flexible</p>
</li>
</ul>
<h2 id="heading-benefits-of-using-dra"><strong>Benefits of Using DRA</strong></h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Benefit</strong></td><td><strong>Description</strong></td></tr>
</thead>
<tbody>
<tr>
<td>🔄 Dynamic provisioning</td><td>Devices are allocated only when needed</td></tr>
<tr>
<td>📉 Efficiency</td><td>No wasted static reservations</td></tr>
<tr>
<td>🔐 Better isolation</td><td>Plugin manages lifecycle and security</td></tr>
<tr>
<td>📦 Clean YAML</td><td>Declarative hardware requests</td></tr>
<tr>
<td>⚙️ Plugin extensibility</td><td>Support for many device types (e.g. GPUs, FPGAs)</td></tr>
</tbody>
</table>
</div><h2 id="heading-how-to-use-dra-in-your-cluster"><strong>How to Use DRA in Your Cluster</strong></h2>
<ol>
<li><p>Upgrade your cluster to <strong>Kubernetes 1.34+</strong></p>
</li>
<li><p>Install a DRA-compatible plugin<br /> Example: <a target="_blank" href="https://github.com/NVIDIA/k8s-dra-driver">NVIDIA DRA driver</a></p>
</li>
<li><p>Define your <code>DeviceClass</code></p>
</li>
<li><p>Create <code>ResourceClaim</code>s or use templates</p>
</li>
<li><p>Reference them in your pods with <code>resourceClaims[]</code></p>
</li>
</ol>
<h2 id="heading-pro-tips"><strong>Pro Tips</strong></h2>
<ul>
<li><p>Use <code>ResourceClaimTemplate</code> to auto-create claims per pod</p>
</li>
<li><p>Use <strong>CEL filters</strong> in <code>DeviceClass</code> for attribute-based scheduling</p>
</li>
<li><p>Monitor claim status to detect allocation failures</p>
</li>
<li><p>Inspect the <code>ResourceSlice</code> objects published by the driver to see which devices and attributes each node offers</p>
</li>
</ul>
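<p>As a rough sketch of the first tip, a <code>ResourceClaimTemplate</code> lets Kubernetes create one claim per pod instead of sharing a single pre-created claim. The names <code>gpu-claim-template</code> and <code>gpu-pod-templated</code> are made up for illustration, and the example assumes the <code>nvidia-gpu</code> DeviceClass from earlier exists:</p>
<pre><code class="lang-yaml"># Template from which a ResourceClaim is generated for each pod
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: gpu-claim-template   # hypothetical name
spec:
  spec:
    devices:
      requests:
      - name: gpu
        exactly:
          deviceClassName: nvidia-gpu
---
# Pod that asks for its own claim, generated from the template
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod-templated    # hypothetical name
spec:
  containers:
  - name: cuda
    image: nvidia/cuda:12.2.0-base
    command: ["nvidia-smi"]
    resources:
      claims:
      - name: gpu            # must match an entry in spec.resourceClaims
  resourceClaims:
  - name: gpu
    resourceClaimTemplateName: gpu-claim-template
  restartPolicy: Never
</code></pre>
<p>The generated claim is tied to the pod's lifecycle, which usually fits Deployments and Jobs better than a shared, manually created claim.</p>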
<h2 id="heading-faq-kubernetes-dra"><strong>FAQ: Kubernetes DRA</strong></h2>
<p><strong>Q: Do I need to install anything to use DRA?</strong><br />A: Yes — a compatible DRA plugin (e.g. NVIDIA, CXL, etc.)</p>
<p><strong>Q: Is DRA stable in Kubernetes 1.34?</strong><br />A: Yes — it is <strong>GA</strong> as of Kubernetes 1.34 (August 2025)</p>
<p><strong>Q: Can I use DRA for memory or CPU?</strong><br />A: No — DRA is specifically for <strong>non-CPU/memory resources</strong></p>
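<p><strong>Q: What if the plugin crashes or fails to allocate?</strong><br />A: The pod stays <code>Pending</code> until a matching device can be allocated, and the scheduler keeps retrying. Check the claim's status and the driver's logs to find out why allocation failed.</p>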
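<p><strong>Q: What replaces the NVIDIA device plugin?</strong><br />A: Nothing has to be replaced right away. For pods that use claims, the DRA driver handles allocation and device injection, so those pods do not also need <code>nvidia.com/gpu</code> limits. The device plugin can keep serving existing workloads; check NVIDIA's driver documentation for how the two coexist on the same nodes.</p>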
<h2 id="heading-final-thoughts"><strong>Final Thoughts</strong></h2>
<p>DRA is one of the most significant enhancements in Kubernetes scheduling in years — especially for AI/ML, HPC, or hybrid workloads that need specialized hardware.</p>
<p>If you’re using Kubernetes 1.34, it’s time to start:<br />✅ Testing DRA<br />✅ Installing plugins<br />✅ Modernizing your GPU/NIC/storage allocation flows</p>
<p>Let me know if you’re using DRA already or planning to — I’d love to hear how it’s working for your team!</p>
]]></content:encoded></item><item><title><![CDATA[Kubernetes 1.33 vs 1.34: What’s New, What Changed, and Why It Matters]]></title><description><![CDATA[Introduction
The Kubernetes release cycle continues to deliver powerful improvements in performance, security, and resource orchestration. With Kubernetes 1.34 released in August 2025, it’s a good time to compare it to 1.33 (released in April) and un...]]></description><link>https://devops-blog.ruicoelho.dev/kubernetes-133-vs-134-whats-new-what-changed-and-why-it-matters</link><guid isPermaLink="true">https://devops-blog.ruicoelho.dev/kubernetes-133-vs-134-whats-new-what-changed-and-why-it-matters</guid><category><![CDATA[kubernetes-upgrades]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[Devops]]></category><category><![CDATA[SRE]]></category><dc:creator><![CDATA[Rui Coelho]]></dc:creator><pubDate>Tue, 18 Nov 2025 17:10:45 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1763486903354/a852a8d7-efb0-4ebb-bca7-d05a5267063e.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction"><strong>Introduction</strong></h2>
<p>The Kubernetes release cycle continues to deliver powerful improvements in performance, security, and resource orchestration. With Kubernetes <strong>1.34</strong> released in <strong>August 2025</strong>, it’s a good time to compare it to <strong>1.33</strong> (released in April) and understand what’s new, what’s changed, and how these updates impact real-world DevOps and cloud-native environments.</p>
<p>This article breaks down the key changes between versions 1.33 and 1.34, with a focus on practical benefits, feature maturity, and what you should consider before upgrading.</p>
<h2 id="heading-at-a-glance-release-timeline"><strong>At a Glance: Release Timeline</strong></h2>
<ul>
<li><p><strong>Kubernetes 1.33 “Octarine”</strong> — Released April 23, 2025</p>
</li>
<li><p><strong>Kubernetes 1.34 “Of Wind &amp; Will”</strong> — Released August 27, 2025</p>
</li>
</ul>
<h2 id="heading-key-feature-comparisons"><strong>Key Feature Comparisons</strong></h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Feature</strong></td><td><strong>Kubernetes 1.33</strong></td><td><strong>Kubernetes 1.34</strong></td><td><strong>Why it matters</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Dynamic Resource Allocation</td><td>Beta</td><td>GA (Stable)</td><td>GPU/NIC resources can now be orchestrated dynamically, ideal for ML/AI workloads.</td></tr>
<tr>
<td>In-Place Pod Resize</td><td>Beta</td><td>Improved Beta</td><td>No need to restart pods to adjust CPU/memory. Reduced downtime.</td></tr>
<tr>
<td>CEL Mutating Admission Policies</td><td>Alpha</td><td>Beta</td><td>Declarative admission control within API server; no external webhooks needed.</td></tr>
<tr>
<td>Native Sidecar Containers</td><td>GA</td><td>Stable</td><td>Cleaner lifecycle control for service mesh/logging sidecars.</td></tr>
<tr>
<td>Streaming Informers</td><td>Not available</td><td>Alpha</td><td>Less memory usage and better responsiveness in high-load clusters.</td></tr>
</tbody>
</table>
</div><h3 id="heading-dynamic-resource-allocation-dra"><strong>Dynamic Resource Allocation (DRA)</strong></h3>
<p>Dynamic Resource Allocation allows workloads to request and manage specialized resources like <strong>GPUs, FPGAs, and network devices</strong> dynamically. With version 1.34, DRA is now <strong>stable</strong>, making it production-ready for clusters with high-performance compute needs.</p>
<p><strong>Why it matters:</strong><br />If you’re deploying ML/AI workloads or working with hardware accelerators, Kubernetes now supports dynamic orchestration natively.</p>
<h3 id="heading-in-place-pod-resource-resize"><strong>In-Place Pod Resource Resize</strong></h3>
<p>Kubernetes now allows you to <strong>resize CPU and memory</strong> allocations for running Pods without recreating them. Version 1.34 enhances this by supporting <strong>downscaling</strong> and expanding support for <code>Pod</code>-level resources.</p>
<p><strong>Why it matters:</strong><br />This reduces downtime and offers greater flexibility in autoscaling and operational tuning.</p>
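<p>To make this concrete, here is a minimal sketch of a Pod that declares how each resource should behave on resize. The pod name and the choice of the <code>pause</code> image are illustrative assumptions; the resize itself is then requested through the Pod's <code>resize</code> subresource (for example with <code>kubectl patch --subresource resize</code>):</p>
<pre><code class="lang-yaml">apiVersion: v1
kind: Pod
metadata:
  name: resize-demo                     # hypothetical name
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.10   # placeholder workload
    resizePolicy:
    - resourceName: cpu
      restartPolicy: NotRequired        # apply CPU changes without restarting the container
    - resourceName: memory
      restartPolicy: RestartContainer   # restart the container when memory changes
    resources:
      requests:
        cpu: 500m
        memory: 128Mi
      limits:
        cpu: 500m
        memory: 128Mi
</code></pre>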
<h3 id="heading-cel-mutating-admission-policies"><strong>CEL Mutating Admission Policies</strong></h3>
<p>Version 1.34 promotes <strong>Mutating Admission Policies using CEL (Common Expression Language)</strong> to beta (they first appeared as alpha in 1.32), allowing you to modify API objects without using external webhooks.</p>
<p><strong>Why it matters:</strong><br />This is a big step toward declarative, low-latency admission controls <strong>within the API server</strong>, reducing complexity and latency.</p>
<h3 id="heading-native-sidecar-containers"><strong>Native Sidecar Containers</strong></h3>
<p>Sidecars became a first-class citizen in 1.33, and 1.34 builds on that foundation. This change improves container lifecycle management and makes it easier to integrate service meshes and logging agents.</p>
<p><strong>Why it matters:</strong><br />Developers no longer need workarounds for container lifecycle synchronization.</p>
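<p>For reference, a native sidecar is declared as an init container with <code>restartPolicy: Always</code>. The sketch below uses hypothetical image names and is only meant to show the shape of the spec:</p>
<pre><code class="lang-yaml">apiVersion: v1
kind: Pod
metadata:
  name: app-with-sidecar                           # hypothetical name
spec:
  initContainers:
  - name: log-shipper
    image: registry.example.com/log-shipper:1.0    # hypothetical image
    restartPolicy: Always                          # marks this init container as a sidecar
  containers:
  - name: app
    image: registry.example.com/app:1.0            # hypothetical image
</code></pre>
<p>The sidecar starts before the main container, keeps running alongside it, and is stopped after the main container exits.</p>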
<h3 id="heading-streaming-informers-for-better-observability"><strong>Streaming Informers for Better Observability</strong></h3>
<p>Streaming informers allow high-throughput systems to stream changes from the API server <strong>without excessive memory usage</strong>, particularly for large LIST operations.</p>
<p><strong>Why it matters:</strong><br />Improves cluster stability under heavy load, and simplifies building responsive operator logic.</p>
<h3 id="heading-security-and-authentication-updates"><strong>Security and Authentication Updates</strong></h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Feature</strong></td><td><strong>1.33</strong></td><td><strong>1.34</strong></td></tr>
</thead>
<tbody>
<tr>
<td>User namespaces</td><td>Enabled by default</td><td>Continued</td></tr>
<tr>
<td>Service account token improvements</td><td>✅</td><td>✅</td></tr>
<tr>
<td>Mutual TLS (mTLS) for Pod-to-Pod auth</td><td>❌</td><td>Alpha in 1.34</td></tr>
<tr>
<td>Structured client certs</td><td>❌</td><td>✅</td></tr>
</tbody>
</table>
</div><p><strong>Why it matters:</strong><br />Security improvements are gradual but meaningful. Mutual TLS between Pods is a major step forward for zero-trust cluster designs.</p>
<h2 id="heading-deprecations-amp-migration-considerations"><strong>⚠️ Deprecations &amp; Migration Considerations</strong></h2>
<ul>
<li><p><strong>1.33</strong> introduced some <strong>deprecated API fields</strong>, especially around token and namespace handling.</p>
</li>
<li><p><strong>1.34</strong> is a safer upgrade, with <strong>fewer breaking changes</strong> — but you should still test against deprecated API usage and custom controllers.</p>
</li>
</ul>
<p><strong>Tip:</strong> Use the <code>kubectl deprecations</code> plugin (kubepug, installable through krew) or other static analysis tools to scan your manifests before upgrading.</p>
<h2 id="heading-should-you-upgrade-to-kubernetes-134"><strong>Should You Upgrade to Kubernetes 1.34?</strong></h2>
<p>✅ Yes, <strong>if</strong>:</p>
<ul>
<li><p>You rely on specialized hardware (GPU, NICs, etc.)</p>
</li>
<li><p>You want to reduce resource management downtime</p>
</li>
<li><p>You need better control over admission and policy enforcement</p>
</li>
<li><p>You’re preparing your clusters for security hardening</p>
</li>
</ul>
<p>❌ Maybe not yet, <strong>if</strong>:</p>
<ul>
<li><p>You rely on 3rd-party admission controllers not yet compatible with CEL policies</p>
</li>
<li><p>Your environment is still catching up with 1.32 or earlier</p>
</li>
<li><p>You avoid alpha features and prefer longer stabilization cycles</p>
</li>
</ul>
<h2 id="heading-final-thoughts"><strong>Final Thoughts</strong></h2>
<p>Kubernetes 1.34 builds smartly on top of 1.33, delivering improved operational control, resource efficiency, and extensibility. While it’s not a revolutionary release, it brings several <strong>practical, production-focused enhancements</strong> that make it worth the upgrade — especially for teams focused on AI/ML workloads, observability, and multi-tenant security.</p>
<h2 id="heading-resources"><strong>Resources</strong></h2>
<ul>
<li><p><a target="_blank" href="https://kubernetes.io/blog/2025/04/23/kubernetes-v1-33-release">Kubernetes 1.33 Release Notes</a></p>
</li>
<li><p><a target="_blank" href="https://kubernetes.io/blog/2025/08/27/kubernetes-v1-34-release">Kubernetes 1.34 Release Notes</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Sneak Peek into Kubernetes v1.34: What’s Coming and Why It Matters]]></title><description><![CDATA[Kubernetes continues its steady evolution, and with the upcoming v1.34 release, there are some exciting enhancements on the horizon. Scheduled for release in late August 2025, this version focuses on observability, smarter resource handling, enhanced...]]></description><link>https://devops-blog.ruicoelho.dev/sneak-peek-into-kubernetes-v134-whats-coming-and-why-it-matters</link><guid isPermaLink="true">https://devops-blog.ruicoelho.dev/sneak-peek-into-kubernetes-v134-whats-coming-and-why-it-matters</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[k8s]]></category><category><![CDATA[kyaml]]></category><category><![CDATA[YAML]]></category><category><![CDATA[Devops]]></category><category><![CDATA[SRE]]></category><dc:creator><![CDATA[Rui Coelho]]></dc:creator><pubDate>Tue, 18 Nov 2025 17:00:08 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1763486968526/41ed7d1d-e862-4b36-a412-380fc18d98f2.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Kubernetes continues its steady evolution, and with the upcoming <strong>v1.34</strong> release, there are some exciting enhancements on the horizon. Scheduled for release in <strong>late August 2025</strong>, this version focuses on <strong>observability, smarter resource handling, enhanced scheduling, and developer experience improvements</strong>.</p>
<p>Here’s a preview of the most impactful features coming to Kubernetes v1.34 — based on the official sneak peek — along with concrete examples and thoughts on what this means for platform teams.</p>
<h2 id="heading-dynamic-resource-allocation-dra-goes-stable"><strong>Dynamic Resource Allocation (DRA) Goes Stable</strong></h2>
<p>One of the biggest highlights: <strong>Dynamic Resource Allocation (DRA)</strong> is going <strong>GA (Generally Available)</strong>.</p>
<p>DRA allows Kubernetes to schedule Pods that require <strong>dynamic or external resources</strong> (like GPUs, FPGAs, or licensed software) by coordinating with resource drivers. This opens the door for smarter, safer scheduling without race conditions or pre-binding issues.</p>
<h3 id="heading-example-use-case"><strong>Example Use Case</strong></h3>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Pod</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">gpu-task</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">containers:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">compute</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">nvidia/cuda:11.0-base</span>
    <span class="hljs-attr">resources:</span>
      <span class="hljs-attr">claims:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">my-gpu</span>
  <span class="hljs-attr">resourceClaims:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">my-gpu</span>
    <span class="hljs-attr">resourceClaimTemplateName:</span> <span class="hljs-string">my-gpu-template</span>  <span class="hljs-comment"># hypothetical ResourceClaimTemplate, created separately</span>
</code></pre>
<p>With DRA, Kubernetes dynamically allocates the required GPU from a compatible pool — improving isolation and usage efficiency.</p>
<h2 id="heading-serviceaccount-tokens-for-image-pulls-beta"><strong>ServiceAccount Tokens for Image Pulls (Beta)</strong></h2>
<p>Previously, long-lived imagePullSecrets were used to authenticate against container registries. Kubernetes v1.34 introduces <strong>automatic short-lived tokens derived from ServiceAccounts</strong>, allowing the kubelet to <strong>securely authenticate image pulls</strong> using projected tokens.</p>
<p>This change reduces token leakage risks and simplifies rotation.</p>
<h3 id="heading-key-benefit"><strong>Key Benefit</strong></h3>
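<p>Once the kubelet's credential provider is configured to use ServiceAccount tokens, you no longer need to create and rotate long-lived image pull secrets for private registries. Each pull is authenticated with a short-lived token scoped to the Pod's ServiceAccount.</p>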
<h2 id="heading-pod-replacement-policy-alpha"><strong>Pod Replacement Policy (Alpha)</strong></h2>
<p>In complex deployments, replacing Pods too early (while the old one is still terminating) can cause unexpected issues like <strong>conflicting ports</strong>, <strong>database reconnects</strong>, or <strong>resource contention</strong>.</p>
<p>Kubernetes v1.34 introduces a new alpha field on the Deployment <code>spec</code> (off by default):</p>
<pre><code class="lang-yaml"><span class="hljs-attr">spec:</span>
  <span class="hljs-attr">podReplacementPolicy:</span> <span class="hljs-string">TerminationStarted</span>  <span class="hljs-comment"># or TerminationComplete</span>
</code></pre>
<p>This allows developers to control <strong>when</strong> the Deployment controller creates a replacement Pod during rolling updates.</p>
<h3 id="heading-example-use-case-1"><strong>Example Use Case</strong></h3>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">apps/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Deployment</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">api-server</span>
<span class="hljs-attr">spec:</span>
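  <span class="hljs-attr">replicas:</span> <span class="hljs-number">3</span>
  <span class="hljs-attr">podReplacementPolicy:</span> <span class="hljs-string">TerminationComplete</span>  <span class="hljs-comment"># Deployment-level field, not part of the Pod template</span>
  <span class="hljs-attr">strategy:</span>
    <span class="hljs-attr">type:</span> <span class="hljs-string">RollingUpdate</span>
  <span class="hljs-attr">selector:</span>
    <span class="hljs-attr">matchLabels:</span>
      <span class="hljs-attr">app:</span> <span class="hljs-string">api-server</span>
  <span class="hljs-attr">template:</span>
    <span class="hljs-attr">metadata:</span>
      <span class="hljs-attr">labels:</span>
        <span class="hljs-attr">app:</span> <span class="hljs-string">api-server</span>
    <span class="hljs-attr">spec:</span>
      <span class="hljs-attr">containers:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">api</span>
        <span class="hljs-attr">image:</span> <span class="hljs-string">registry.example.com/api-server:1.0</span>  <span class="hljs-comment"># hypothetical image</span>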
</code></pre>
<p>With <code>TerminationComplete</code>, Kubernetes waits for the old Pod to fully shut down before starting a replacement — perfect for stateful or resource-heavy applications.</p>
<h2 id="heading-prefersamenode-and-prefersamezone-beta"><strong>PreferSameNode and PreferSameZone (Beta)</strong></h2>
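<p>Load balancing just got smarter. Instead of relying on <code>externalTrafficPolicy: Local</code> and hoping for the best, Kubernetes adds two values for a Service's <code>spec.trafficDistribution</code> field, <code>PreferSameNode</code> and <code>PreferSameZone</code>, for <strong>more precise topology preferences</strong> in routing.</p>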
<p>This allows Services to prefer nodes/zones that are closer to the caller — improving <strong>latency</strong> and <strong>network performance</strong>.</p>
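<p>In practice the preference is a single field on the Service, <code>trafficDistribution</code>. The sketch below assumes a hypothetical <code>api-server</code> workload listening on port 8080:</p>
<pre><code class="lang-yaml">apiVersion: v1
kind: Service
metadata:
  name: api-server             # hypothetical name
spec:
  selector:
    app: api-server
  ports:
  - port: 80
    targetPort: 8080           # assumed container port
  trafficDistribution: PreferSameZone   # or PreferSameNode
</code></pre>
<p>Both values are preferences, not guarantees: if no endpoint exists in the same zone or on the same node, traffic still goes to endpoints elsewhere.</p>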
<h3 id="heading-why-it-matters"><strong>Why It Matters</strong></h3>
<p>In multi-zone clusters or edge deployments, this routing optimization can <strong>drastically reduce cross-zone traffic</strong> — reducing cost and improving user experience.</p>
<h2 id="heading-kyaml-output-format"><strong>KYAML Output Format</strong></h2>
<p>Here’s where KYAML truly shines. In my Medium article, <strong>“KYAML in Kubernetes v1.34: A Safer, Leaner Alternative to YAML and JSON”</strong>, I explore why KYAML is designed to overcome YAML’s quirks — like whitespace sensitivity, implicit type coercion (hello, “Norway Bug”!) — and offer a cleaner, more predictable serialization format.</p>
<p>With the <code>KUBECTL_KYAML=true</code> environment variable set (the feature is alpha in v1.34), you can use:</p>
<pre><code class="lang-bash">kubectl get deployment api-server -o kyaml
</code></pre>
<p>Expect more consistent diffs, better tooling support, and fewer surprises.</p>
<h2 id="heading-observability-with-tracing-betastable"><strong>Observability with Tracing (Beta/Stable)</strong></h2>
<p>Kubernetes v1.34 embraces <strong>OpenTelemetry-based tracing</strong> in both the API Server and kubelet. Now you get visibility into:</p>
<ul>
<li><p>gRPC communications between kubelet and container runtimes</p>
</li>
<li><p>Admission control chains</p>
</li>
<li><p>API request lifecycle tracing</p>
</li>
</ul>
<p>This enhanced observability is invaluable for diagnosing control plane latency or debugging performance regressions.</p>
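<p>As a starting point for the kubelet side, tracing is switched on in the kubelet configuration file. The collector address below is an assumption (a local OTLP gRPC endpoint), and the sampling rate is just an example value:</p>
<pre><code class="lang-yaml">apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
tracing:
  endpoint: localhost:4317        # assumed address of an OpenTelemetry collector (OTLP over gRPC)
  samplingRatePerMillion: 10000   # sample 1% of requests
</code></pre>
<p>Start with a low sampling rate in staging and raise it only while you are chasing a specific latency problem.</p>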
<h2 id="heading-bonus-highlights-v134-release-of-wind-and-will"><strong>Bonus Highlights (v1.34 Release — Of Wind and Will)</strong></h2>
<p>Additionally, the final v1.34 release includes:</p>
<ul>
<li><p><strong>PSI Metrics (Beta)</strong> — better insights into CPU &amp; memory pressure.</p>
</li>
<li><p><strong>Node Swap Support (GA)</strong> — smoother memory management, fewer OOMs.</p>
</li>
<li><p><strong>CPUManager Uncore Cache Alignment (Beta)</strong> — improved performance for NUMA-aware workloads.</p>
</li>
<li><p><strong>kuberc (User Preferences)</strong> — customize kubectl defaults and output.</p>
</li>
</ul>
<h2 id="heading-why-it-matters-for-platform-teams"><strong>Why It Matters for Platform Teams</strong></h2>
<p>This release is operator-focused, tackling real-world challenges:</p>
<ul>
<li><p>Secure, lightweight image pulls</p>
</li>
<li><p>Fine control over rollout timing</p>
</li>
<li><p>Richer observability (tracing and PSI)</p>
</li>
<li><p>Smarter, topology-aware routing</p>
</li>
<li><p>Cleaner, safer configuration via KYAML</p>
</li>
</ul>
<h2 id="heading-what-to-try-next"><strong>What to Try Next</strong></h2>
<ul>
<li><p>Enable <strong>DRA</strong> for GPU or other specialized hardware.</p>
</li>
<li><p>Switch to <strong>ServiceAccount token pulls</strong> — more secure and simpler.</p>
</li>
<li><p>Experiment with <strong>Pod replacement policies</strong> in canary/rolling deployments.</p>
</li>
<li><p>Activate <strong>tracing</strong> in staging — boost observability.</p>
</li>
<li><p>Start using KYAML and see the cleaner diffs — I dive into this in my article! <a target="_blank" href="https://medium.com/%40user-cube/kyaml-in-kubernetes-v1-34-a-safer-leaner-alternative-to-yaml-and-json-5c898fecb948">Medium</a></p>
</li>
</ul>
<h2 id="heading-resources"><strong>Resources</strong></h2>
<ul>
<li><p><a target="_blank" href="https://kubernetes.io/blog/2025/07/28/kubernetes-v1-34-sneak-peek">Kubernetes v1.34 Sneak Peek (Official Blog)</a></p>
</li>
<li><p><a target="_blank" href="https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/">Kubernetes Docs: Dynamic Resource Allocation</a></p>
</li>
<li><p><a target="_blank" href="https://kubernetes.io/releases/notes/#v1-34">Kubernetes v1.34 Release Notes</a></p>
</li>
</ul>
<h2 id="heading-in-summary"><strong>In Summary</strong></h2>
<p>Kubernetes v1.34 is a quietly powerful release — improving the developer and operator experience without flashy headlines. From KYAML to tracing, resource efficiency to rollout control, it’s a substantial step forward.</p>
<p>Check out my Medium article for a deep dive into KYAML, and let me know what features you’re exploring.</p>
<p><img src="https://miro.medium.com/v2/resize:fit:700/0*mTY2mrENpwztmngh.png" alt class="image--center mx-auto" /></p>
]]></content:encoded></item><item><title><![CDATA[KYAML in Kubernetes v1.34: A Safer, Leaner Alternative to YAML and JSON]]></title><description><![CDATA[Introduction
In the cloud-native ecosystem, YAML and JSON have been the de facto formats for writing Kubernetes manifests and configuration files. But both come with trade-offs. As of Kubernetes v1.34, a new configuration dialect — KYAML — emerges to...]]></description><link>https://devops-blog.ruicoelho.dev/kyaml-in-kubernetes-v134-a-safer-leaner-alternative-to-yaml-and-json</link><guid isPermaLink="true">https://devops-blog.ruicoelho.dev/kyaml-in-kubernetes-v134-a-safer-leaner-alternative-to-yaml-and-json</guid><category><![CDATA[YAML]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[kyaml]]></category><category><![CDATA[cicd]]></category><category><![CDATA[Devops]]></category><category><![CDATA[json]]></category><dc:creator><![CDATA[Rui Coelho]]></dc:creator><pubDate>Tue, 18 Nov 2025 15:49:58 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1763486982741/1599aff7-104d-4ce8-a580-b4c75b267e61.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction"><strong>Introduction</strong></h2>
<p>In the cloud-native ecosystem, YAML and JSON have been the de facto formats for writing Kubernetes manifests and configuration files. But both come with trade-offs. As of Kubernetes <strong>v1.34</strong>, a new configuration dialect — <strong>KYAML</strong> — emerges to bridge the gap: safer than YAML, more flexible than JSON, and fully compatible with existing tooling.</p>
<h2 id="heading-the-case-against-traditional-yaml-and-json"><strong>The Case Against Traditional YAML and JSON</strong></h2>
<h3 id="heading-yaml-whitespace-hazards-amp-implicit-typing"><strong>YAML: Whitespace Hazards &amp; Implicit Typing</strong></h3>
<p>YAML’s human-friendly syntax often hides pitfalls:</p>
<ul>
<li><p><strong>Whitespace sensitivity</strong>: A single misplaced space can restructure a manifest unexpectedly — a nightmare in templating systems like Helm.</p>
</li>
<li><p><strong>Implicit typing</strong>: Unquoted values like <code>NO</code>, <code>yes</code>, or <code>1:23</code> can be coerced into boolean or numeric types—famously known as the "Norway Bug." For example, <code>country: NO</code> may inadvertently become <code>false</code>. The StrictYAML community removed implicit typing precisely to avoid that kind of ambiguity. The HitchDev blog discusses this problem in depth under the name “<a target="_blank" href="https://hitchdev.com/strictyaml/why/implicit-typing-removed/">The Norway Problem</a>”.</p>
</li>
</ul>
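<p>A small illustration of the implicit typing problem, assuming a parser that follows YAML 1.1 rules (as many widely used libraries do):</p>
<pre><code class="lang-yaml"># Unquoted values are reinterpreted under YAML 1.1:
country: NO          # parsed as boolean false
enabled: yes         # parsed as boolean true
duration: 1:23       # parsed as the base-60 integer 83

# Quoting keeps each value a string:
country_code: "NO"
duration_text: "1:23"
</code></pre>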
<h3 id="heading-json-valid-but-minus-the-human-touch"><strong>JSON: Valid, But Minus the Human Touch</strong></h3>
<p>JSON is stricter and more predictable — no implicit types or indentation issues — but it lacks <strong>comments</strong>, forbids <strong>trailing commas</strong>, and mandates <strong>quoted keys</strong>, making it less practical for human-authored manifest files.</p>
<h2 id="heading-enter-kyaml-the-best-of-both-worlds"><strong>Enter KYAML: The Best of Both Worlds</strong></h2>
<p>As revealed in the <strong>Kubernetes v1.34 Sneak Peek (July 28, 2025)</strong>, KYAML is a Kubernetes-specific dialect of YAML designed to eliminate common configuration errors while remaining compatible with existing tooling.</p>
<p>Key features of KYAML include:</p>
<ul>
<li><p><strong>All string values are double‑quoted</strong>, avoiding implicit coercion (e.g., <code>"NO"</code> remains a string);</p>
</li>
<li><p><strong>Flow-style syntax</strong>: Always uses <code>{}</code> for mappings and <code>[]</code> for lists, which greatly reduces whitespace sensitivity;</p>
</li>
<li><p><strong>Comments are allowed</strong>, retaining readability and documentation ability — something JSON forbids;</p>
</li>
<li><p><strong>Trailing commas are permitted</strong>, enabling cleaner edits and diffs;</p>
</li>
<li><p><strong>Unquoted keys by default</strong>, unless ambiguous, preserving clarity and brevity;</p>
</li>
</ul>
<p>As a <strong>strict subset of YAML</strong>, KYAML ensures all valid KYAML is still valid YAML — so existing parsers and tools continue to work seamlessly.</p>
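<p>The rules above are simple enough to sketch in a few lines. The following toy emitter is only an illustration of the listed features, not kubectl's implementation: the names <code>toKyaml</code> and <code>formatKey</code>, and the patterns used to decide when a key needs quotes, are assumptions made for this example, and kubectl's exact layout may differ.</p>
<pre><code class="lang-javascript">// Toy KYAML-style emitter. Illustrative sketch only, not kubectl's code.
// Assumption: keys matching SAFE_KEY can stay unquoted unless they could be
// misread as a boolean or null.
const SAFE_KEY = /^[A-Za-z_][A-Za-z0-9_-]*$/;
const AMBIGUOUS_KEY = /^(y|n|yes|no|on|off|true|false|null)$/i;

function formatKey(key) {
  if (!SAFE_KEY.test(key) || AMBIGUOUS_KEY.test(key)) {
    return JSON.stringify(key);
  }
  return key;
}

function toKyaml(value, indent = "") {
  const inner = indent + "  ";
  if (typeof value === "string") {
    return JSON.stringify(value); // strings are always double-quoted
  }
  if (value === null || typeof value !== "object") {
    return String(value); // numbers, booleans and null stay unquoted
  }
  if (Array.isArray(value)) {
    if (value.length === 0) return "[]";
    const items = value.map(function (item) {
      return inner + toKyaml(item, inner) + ",\n"; // trailing comma on every item
    });
    return "[\n" + items.join("") + indent + "]";
  }
  const keys = Object.keys(value);
  if (keys.length === 0) return "{}";
  const entries = keys.map(function (key) {
    return inner + formatKey(key) + ": " + toKyaml(value[key], inner) + ",\n";
  });
  return "{\n" + entries.join("") + indent + "}";
}

const configMap = {
  apiVersion: "v1",
  kind: "ConfigMap",
  data: { country: "NO", version: "1.0" },
};
console.log("---\n" + toKyaml(configMap));
</code></pre>
<p>The printed document wraps every mapping in braces, quotes every string, and ends every entry with a comma, so nesting is carried by the brackets and does not depend on indentation.</p>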
<h2 id="heading-kyaml-in-action-kubernetes-v134-alpha"><strong>KYAML in Action: Kubernetes v1.34 (Alpha)</strong></h2>
<p>In Kubernetes <strong>v1.34</strong> (released August 27, 2025), KYAML is introduced as an <strong>alpha feature</strong>, meaning it’s optional — but available for experimentation.</p>
<p>You can enable KYAML output in <code>kubectl</code> using:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">export</span> KUBECTL_KYAML=<span class="hljs-literal">true</span>
kubectl get &lt;resource&gt; -o kyaml
</code></pre>
<p>All existing YAML and JSON output formats remain supported.</p>
<h2 id="heading-why-kyaml-matters"><strong>Why KYAML Matters</strong></h2>
<p>KYAML isn’t just syntactic sugar — it addresses real pain points:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Challenge</strong></td><td><strong>YAML</strong></td><td><strong>JSON</strong></td><td><strong>KYAML (v1.34)</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Type coercion</td><td>Implicit, often surprising</td><td>None</td><td>Strings always quoted</td></tr>
<tr>
<td>Indentation issues</td><td>Very sensitive</td><td>None</td><td>Flow-style avoids indentation reliance</td></tr>
<tr>
<td>Comments</td><td>Supported</td><td>Not supported</td><td>Supported</td></tr>
<tr>
<td>Trailing commas</td><td>Allowed in flow style only</td><td>Not allowed</td><td>Allowed</td></tr>
<tr>
<td>Tooling compatibility</td><td>Broad</td><td>Broad</td><td>Fully compatible with YAML tools</td></tr>
</tbody>
</table>
</div><p>By removing these sources of ambiguity, KYAML aims to reduce deployment errors, especially in GitOps workflows, audit processes, and CI/CD contexts.</p>
<h2 id="heading-the-road-ahead-kep5295-and-community-feedback"><strong>The Road Ahead: KEP‑5295 and Community Feedback</strong></h2>
<p>KYAML is formalized under <strong>KEP‑5295</strong>, introduced by <strong>SIG CLI</strong>. The proposal includes:</p>
<ul>
<li><p>KYAML as a new <code>kubectl</code> output format.</p>
</li>
<li><p>Plans to eventually make KYAML the <strong>standard format</strong> for Kubernetes documentation and examples.</p>
</li>
</ul>
<p>Community reactions are mixed. On Reddit, some users praise KYAML for retaining compatibility while reducing ambiguity, while others feel it’s “uglier” or slower to type due to braces and quotes.</p>
<h2 id="heading-sample-comparison"><strong>Sample Comparison</strong></h2>
<p><strong>Traditional YAML (error-prone):</strong></p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">ConfigMap</span>
<span class="hljs-attr">data:</span>
  <span class="hljs-attr">country:</span> <span class="hljs-literal">NO</span>
  <span class="hljs-attr">version:</span> <span class="hljs-number">1.0</span>
</code></pre>
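<p>Possible pitfalls: <code>country</code> is read as the boolean <code>false</code>, <code>version</code> is inferred as the float <code>1.0</code> rather than the string <code>"1.0"</code>, and a single indentation error can change the structure.</p>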
<p><strong>KYAML Equivalent:</strong></p>
<pre><code class="lang-plaintext">---
{
  apiVersion: "v1",
  kind: "ConfigMap",
  data: {
    country: "NO",
    version: "1.0",
  },
}
</code></pre>
<p>Clear types, predictable structure — comments allowed, trailing comma included.</p>
<h2 id="heading-real-world-comparison-yaml-vs-kyaml-in-action"><strong>Real-World Comparison: YAML vs KYAML in Action</strong></h2>
<p>Let’s compare the actual output of a real Kubernetes object — the <code>kubernetes</code> service running in the <code>default</code> namespace—retrieved via <code>kubectl</code>.</p>
<h3 id="heading-yaml-output"><strong>YAML Output</strong></h3>
<pre><code class="lang-bash">kubectl get svc kubernetes -o yaml
</code></pre>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Service</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">creationTimestamp:</span> <span class="hljs-string">"2025-09-06T12:12:51Z"</span>
  <span class="hljs-attr">labels:</span>
    <span class="hljs-attr">component:</span> <span class="hljs-string">apiserver</span>
    <span class="hljs-attr">provider:</span> <span class="hljs-string">kubernetes</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">kubernetes</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">default</span>
  <span class="hljs-attr">resourceVersion:</span> <span class="hljs-string">"243"</span>
  <span class="hljs-attr">uid:</span> <span class="hljs-string">d1f8264c-60a1-418f-bc69-511aec01691a</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">clusterIP:</span> <span class="hljs-number">172.20</span><span class="hljs-number">.0</span><span class="hljs-number">.1</span>
  <span class="hljs-attr">clusterIPs:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-number">172.20</span><span class="hljs-number">.0</span><span class="hljs-number">.1</span>
  <span class="hljs-attr">internalTrafficPolicy:</span> <span class="hljs-string">Cluster</span>
  <span class="hljs-attr">ipFamilies:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">IPv4</span>
  <span class="hljs-attr">ipFamilyPolicy:</span> <span class="hljs-string">SingleStack</span>
  <span class="hljs-attr">ports:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">https</span>
    <span class="hljs-attr">port:</span> <span class="hljs-number">443</span>
    <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
    <span class="hljs-attr">targetPort:</span> <span class="hljs-number">6443</span>
  <span class="hljs-attr">sessionAffinity:</span> <span class="hljs-string">None</span>
  <span class="hljs-attr">type:</span> <span class="hljs-string">ClusterIP</span>
<span class="hljs-attr">status:</span>
  <span class="hljs-attr">loadBalancer:</span> {}
</code></pre>
<p>This format is readable but carries all the risks of whitespace sensitivity, lack of quoting, and potential type coercion (e.g., <code>None</code>, <code>Cluster</code>, or IPs could be parsed differently in various YAML parsers).</p>
<h3 id="heading-kyaml-output"><strong>KYAML Output</strong></h3>
<pre><code class="lang-bash">kubectl get svc kubernetes -o kyaml
</code></pre>
<pre><code class="lang-plaintext">---
{
  apiVersion: "v1",
  kind: "Service",
  metadata: {
    creationTimestamp: "2025-09-06T12:12:51Z",
    labels: {
      component: "apiserver",
      provider: "kubernetes",
    },
    name: "kubernetes",
    namespace: "default",
    resourceVersion: "243",
    uid: "d1f8264c-60a1-418f-bc69-511aec01691a",
  },
  spec: {
    clusterIP: "172.20.0.1",
    clusterIPs: [
      "172.20.0.1",
    ],
    internalTrafficPolicy: "Cluster",
    ipFamilies: [
      "IPv4",
    ],
    ipFamilyPolicy: "SingleStack",
    ports: [{
      name: "https",
      port: 443,
      protocol: "TCP",
      targetPort: 6443,
    }],
    sessionAffinity: "None",
    type: "ClusterIP",
  },
  status: {
    loadBalancer: {},
  },
}
</code></pre>
<p>Here we clearly see KYAML’s strengths:</p>
<ul>
<li><p><strong>All strings are explicitly quoted</strong></p>
</li>
<li><p><strong>Flow-style syntax</strong> makes the structure explicit</p>
</li>
<li><p><strong>Comments are allowed</strong> (not shown here, but supported)</p>
</li>
<li><p><strong>Trailing commas</strong> allowed for clean diffs</p>
</li>
</ul>
<p>Quoting also removes any doubt about values such as <code>None</code>: neither a reader nor a lenient parser can mistake it for a Python-style null, because it is unambiguously the string <code>"None"</code>.</p>
<h2 id="heading-conclusion-do-we-really-need-a-yaml-json-hybrid"><strong>Conclusion: Do We Really Need a YAML-JSON Hybrid?</strong></h2>
<p>KYAML — introduced in Kubernetes <strong>v1.34</strong> under <strong>KEP‑5295</strong> — is clearly an attempt to bring predictability and structure to the sometimes frustrating world of YAML-based configuration.</p>
<p>It solves real problems: it removes implicit typing, supports trailing commas and comments, and avoids whitespace-related bugs. It’s fully backwards compatible with YAML tooling and enables safer GitOps workflows. On paper, it’s an elegant step forward.</p>
<p>But speaking honestly… I’m still not sure if I like it.</p>
<p>At first glance, <strong>KYAML looks a lot like JSON</strong>, especially due to its <strong>flow-style syntax</strong> with <code>{}</code> and <code>[]</code>. That resemblance can be disorienting, especially when you’re expecting a YAML file. It introduces more <strong>visual noise</strong>—quotes, commas, and brackets—which can make simple configurations feel bloated compared to traditional YAML.</p>
<blockquote>
<p><em>Do we really need a hybrid between YAML and JSON?<br />Or are we just adding another format to an already overloaded toolchain?</em></p>
</blockquote>
<p>It feels like a compromise between two imperfect formats — trying to be safer than YAML while remaining more user-friendly than JSON. But compromises can sometimes bring new complexity rather than clarity.</p>
<p>That said, it’s still early days. KYAML is in <strong>alpha</strong>, and its adoption will likely depend on how tooling, IDEs, and human workflows evolve around it. If it gains traction, KYAML could become the default in Kubernetes’ future — but whether it becomes loved is another matter.</p>
<h2 id="heading-bonus-kyaml-meme-of-the-month"><strong>Bonus: KYAML Meme of the Month</strong></h2>
<p>Spotted on <strong>Twitter</strong> recently:</p>
<p><img src="https://miro.medium.com/v2/resize:fit:577/1*rI9xYerHRbfJ2WZPJ7f4VA.jpeg" alt class="image--center mx-auto" /></p>
<p>And… well, they’re not wrong.<br />KYAML definitely looks like that kind of hybrid at first sight. 🐧➕🐘 = 🤯</p>
]]></content:encoded></item><item><title><![CDATA[Streamlining Communication: Sending Webhook Messages from Heroku to Discord]]></title><description><![CDATA[In this article, we delve into the seamless integration of Heroku, a popular cloud platform, with Discord, a widely used communication platform. The focus is on sending webhook messages from Heroku to Discord, enhancing real-time updates and notifica...]]></description><link>https://devops-blog.ruicoelho.dev/streamlining-communication-sending-webhook-messages-from-heroku-to-discord</link><guid isPermaLink="true">https://devops-blog.ruicoelho.dev/streamlining-communication-sending-webhook-messages-from-heroku-to-discord</guid><category><![CDATA[Heroku]]></category><category><![CDATA[discord]]></category><category><![CDATA[webhooks]]></category><category><![CDATA[cicd]]></category><category><![CDATA[JavaScript]]></category><dc:creator><![CDATA[Rui Coelho]]></dc:creator><pubDate>Tue, 18 Nov 2025 14:52:42 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1763487004915/6bd20a9a-ef4e-49ad-a273-0d895348abd4.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this article, we delve into the seamless integration of Heroku, a popular cloud platform, with Discord, a widely used communication platform. The focus is on sending webhook messages from Heroku to Discord, enhancing real-time updates and notifications for various applications. The article guides readers through the process, offering a step-by-step approach to set up and configure webhooks, ensuring efficient and automated communication between Heroku-hosted applications and Discord channels. With this integration, users can stay informed, monitor events, and streamline communication effortlessly.</p>
<p>To connect Heroku to Discord, we need a small application that converts Heroku webhook payloads into the format Discord expects. The app has the following features:</p>
<h3 id="heading-features"><strong>Features</strong></h3>
<ol>
<li><p>Express.js server setup:</p>
<ul>
<li><p>Uses the Express.js framework to create a web server.</p>
</li>
<li><p>Listens on the specified port (<code>process.env.PORT</code>, defaulting to 3000).</p>
</li>
</ul>
</li>
<li><p>Heroku webhook handling:</p>
<ul>
<li><p>Provides a route (<code>/heroku-webhook</code>) to handle incoming Heroku webhook events.</p>
</li>
<li><p>Extracts relevant data from the Heroku webhook payload, including event data and metadata.</p>
</li>
<li><p>Formats a message from the extracted data to give a concise summary of the Heroku event.</p>
</li>
</ul>
</li>
<li><p>Discord webhook integration:</p>
<ul>
<li><p>Uses the Axios library to send a formatted payload to a Discord webhook.</p>
</li>
<li><p>The Discord payload includes an embed with information about the Heroku event.</p>
</li>
</ul>
</li>
<li><p>Environment variable usage:</p>
<ul>
<li><p>Uses <code>process.env</code> to read the <code>PORT</code> and <code>DISCORD_WEBHOOK</code> environment variables.</p>
</li>
<li><p>Provides a default value for <code>PORT</code> (3000) and logs the <code>DISCORD_WEBHOOK</code> value.</p>
</li>
</ul>
</li>
</ol>
<p>You can find the code I used on my <a target="_blank" href="https://github.com/WebCDC/heroku-discord">GitHub Repository</a>.</p>
<h3 id="heading-lets-code"><strong>Let’s code</strong></h3>
<p>We will need some dependencies to create our application. Let’s install them using the following commands:</p>
<pre><code class="lang-bash">npm init
npm install axios@1.6.5
npm install body-parser@1.20.2
npm install express@4.18.2
</code></pre>
<p>You should end up with a <code>package.json</code> file similar to mine:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"name"</span>: <span class="hljs-string">"heroku-discord"</span>,
  <span class="hljs-attr">"version"</span>: <span class="hljs-string">"1.0.0"</span>,
  <span class="hljs-attr">"description"</span>: <span class="hljs-string">"Parser for heroku webhook messages to discord webhook messages"</span>,
  <span class="hljs-attr">"main"</span>: <span class="hljs-string">"index.js"</span>,
  <span class="hljs-attr">"scripts"</span>: {
    <span class="hljs-attr">"test"</span>: <span class="hljs-string">"echo \"Error: no test specified\" &amp;&amp; exit 1"</span>
  },
  <span class="hljs-attr">"repository"</span>: {
    <span class="hljs-attr">"type"</span>: <span class="hljs-string">"git"</span>,
    <span class="hljs-attr">"url"</span>: <span class="hljs-string">"git+https://github.com/WebCDC/heroku-discord.git"</span>
  },
  <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Rui Coelho"</span>,
  <span class="hljs-attr">"license"</span>: <span class="hljs-string">"ISC"</span>,
  <span class="hljs-attr">"bugs"</span>: {
    <span class="hljs-attr">"url"</span>: <span class="hljs-string">"https://github.com/WebCDC/heroku-discord/issues"</span>
  },
  <span class="hljs-attr">"homepage"</span>: <span class="hljs-string">"https://github.com/WebCDC/heroku-discord#readme"</span>,
  <span class="hljs-attr">"dependencies"</span>: {
    <span class="hljs-attr">"axios"</span>: <span class="hljs-string">"^1.6.5"</span>,
    <span class="hljs-attr">"body-parser"</span>: <span class="hljs-string">"^1.20.2"</span>,
    <span class="hljs-attr">"express"</span>: <span class="hljs-string">"^4.18.2"</span>
  }
}
</code></pre>
<p>Let’s create an index.js file:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> express = <span class="hljs-built_in">require</span>(<span class="hljs-string">'express'</span>);
<span class="hljs-keyword">const</span> bodyParser = <span class="hljs-built_in">require</span>(<span class="hljs-string">'body-parser'</span>);
<span class="hljs-keyword">const</span> axios = <span class="hljs-built_in">require</span>(<span class="hljs-string">'axios'</span>);

<span class="hljs-keyword">const</span> app = express();
<span class="hljs-keyword">const</span> port = process.env.PORT || <span class="hljs-number">3000</span>;
<span class="hljs-keyword">const</span> webhook = process.env.DISCORD_WEBHOOK;

<span class="hljs-built_in">console</span>.log(webhook)

app.use(bodyParser.json());

app.post(<span class="hljs-string">'/heroku-webhook'</span>, <span class="hljs-function">(<span class="hljs-params">req, res</span>) =&gt;</span> {
  <span class="hljs-comment">// Extract relevant data from Heroku webhook payload</span>
  <span class="hljs-keyword">const</span> herokuEventData = req.body.data;
  <span class="hljs-keyword">const</span> herokuEventMetadata = req.body.webhook_metadata;

  <span class="hljs-comment">// Build a short, human-readable summary of the event</span>
  <span class="hljs-keyword">const</span> message = <span class="hljs-string">`Event ID =&gt; <span class="hljs-subst">${herokuEventMetadata.event.id}</span>\nEvent Type =&gt; <span class="hljs-subst">${herokuEventMetadata.event.include}</span>\nTriggered by =&gt; <span class="hljs-subst">${herokuEventData.user.email}</span>\nStatus =&gt; <span class="hljs-subst">${herokuEventData.status}</span>`</span>;


  <span class="hljs-comment">// Transform data to match Discord webhook payload structure</span>
  <span class="hljs-keyword">const</span> discordPayload = {
    embeds: [
      {
        title: <span class="hljs-string">'Heroku Notification'</span>,
        description: message,
        fields: [
          { name: <span class="hljs-string">'App Name'</span>, value: herokuEventData.app.name, inline: <span class="hljs-literal">true</span> },
          <span class="hljs-comment">// Add more fields as needed</span>
        ],
      },
    ],
  };

  <span class="hljs-comment">// Send payload to Discord webhook</span>
  axios.post(<span class="hljs-string">`<span class="hljs-subst">${webhook}</span>`</span>, discordPayload)
    .then(<span class="hljs-function">() =&gt;</span> res.sendStatus(<span class="hljs-number">200</span>))
    .catch(<span class="hljs-function"><span class="hljs-params">error</span> =&gt;</span> {
      <span class="hljs-built_in">console</span>.error(<span class="hljs-string">'Error sending Discord webhook:'</span>, error);
      res.sendStatus(<span class="hljs-number">500</span>);
    });
});

app.listen(port, <span class="hljs-function">() =&gt;</span> {
  <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`Server is running on port <span class="hljs-subst">${port}</span>`</span>);
});
</code></pre>
<p>Let’s break this code down.</p>
<h3 id="heading-dependencies"><strong>Dependencies</strong></h3>
<ul>
<li><p><code>express</code>: A web application framework for Node.js.</p>
</li>
<li><p><code>body-parser</code>: Middleware that parses incoming request bodies before your handlers run.</p>
</li>
<li><p><code>axios</code>: A promise-based HTTP client for the browser and Node.js.</p>
</li>
</ul>
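<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> express = <span class="hljs-built_in">require</span>(<span class="hljs-string">'express'</span>);
<span class="hljs-keyword">const</span> bodyParser = <span class="hljs-built_in">require</span>(<span class="hljs-string">'body-parser'</span>);
<span class="hljs-keyword">const</span> axios = <span class="hljs-built_in">require</span>(<span class="hljs-string">'axios'</span>);
</code></pre>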
<h3 id="heading-application-setup"><strong>Application Setup</strong></h3>
<ul>
<li><p>Creates an instance of the Express application.</p>
</li>
<li><p>Defines the port for the server to run on (using the environment variable <code>PORT</code> or defaulting to 3000).</p>
</li>
<li><p>Retrieves the Discord webhook URL from the environment variable <code>DISCORD_WEBHOOK</code>.</p>
</li>
</ul>
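<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> app = express();
<span class="hljs-keyword">const</span> port = process.env.PORT || <span class="hljs-number">3000</span>;
<span class="hljs-keyword">const</span> webhook = process.env.DISCORD_WEBHOOK;
</code></pre>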
<h3 id="heading-middleware-setup"><strong>Middleware Setup</strong></h3>
<ul>
<li>Uses <code>body-parser</code> middleware to parse incoming JSON requests.</li>
</ul>
<pre><code class="lang-typescript">app.use(bodyParser.json());
</code></pre>
<h3 id="heading-webhook-handling"><strong>Webhook Handling</strong></h3>
<ul>
<li><p>Defines a route (<code>/heroku-webhook</code>) to handle incoming POST requests from Heroku.</p>
</li>
<li><p>Extracts relevant data from the Heroku webhook payload.</p>
</li>
<li><p>Formats a message using the extracted data.</p>
</li>
<li><p>Creates a Discord payload with an embed object containing the formatted message and additional information.</p>
</li>
<li><p>Sends the payload to the specified Discord webhook.</p>
</li>
</ul>
<pre><code class="lang-typescript">app.post(<span class="hljs-string">'/heroku-webhook'</span>, <span class="hljs-function">(<span class="hljs-params">req, res</span>) =&gt;</span> {
  <span class="hljs-comment">// ... (Handling Heroku webhook data)</span>

  <span class="hljs-comment">// Transform data to match Discord webhook payload structure</span>
  <span class="hljs-keyword">const</span> discordPayload = {
    embeds: [
      {
        title: <span class="hljs-string">'Heroku Notification'</span>,
        description: message,
        fields: [
          { name: <span class="hljs-string">'App Name'</span>, value: herokuEventData.app.name, inline: <span class="hljs-literal">true</span> },
          <span class="hljs-comment">// Add more fields as needed</span>
        ],
      },
    ],
  };

  <span class="hljs-comment">// Send payload to Discord webhook</span>
  axios.post(<span class="hljs-string">`<span class="hljs-subst">${webhook}</span>`</span>, discordPayload)
    .then(<span class="hljs-function">() =&gt;</span> res.sendStatus(<span class="hljs-number">200</span>))
    .catch(<span class="hljs-function"><span class="hljs-params">error</span> =&gt;</span> {
      <span class="hljs-built_in">console</span>.error(<span class="hljs-string">'Error sending Discord webhook:'</span>, error);
      res.sendStatus(<span class="hljs-number">500</span>);
    });
});
</code></pre>
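<p>The handler above assumes every event carries a user, a status and an app. As a hedged sketch, the payload construction could be pulled into a pure function that falls back to a placeholder when a field is missing. The function name <code>buildDiscordPayload</code>, the <code>"unknown"</code> placeholder and the sample values are assumptions made for this example and are not part of the original project.</p>
<pre><code class="lang-javascript">// Sketch: build the Discord payload defensively so that an event without a
// user, status or app does not throw inside the request handler.
function buildDiscordPayload(body) {
  const data = body?.data;
  const event = body?.webhook_metadata?.event;

  const message = [
    `Event ID: ${event?.id ?? "unknown"}`,
    `Event Type: ${event?.include ?? "unknown"}`,
    `Triggered by: ${data?.user?.email ?? "unknown"}`,
    `Status: ${data?.status ?? "unknown"}`,
  ].join("\n");

  return {
    embeds: [
      {
        title: "Heroku Notification",
        description: message,
        fields: [
          { name: "App Name", value: data?.app?.name ?? "unknown", inline: true },
        ],
      },
    ],
  };
}

// Hypothetical payload with the user deliberately left out.
const sample = {
  data: { app: { name: "my-app" }, status: "succeeded" },
  webhook_metadata: { event: { id: "1234", include: "api:build" } },
};
console.log(JSON.stringify(buildDiscordPayload(sample), null, 2));
</code></pre>
<p>Inside the route, the handler could then call <code>axios.post(webhook, buildDiscordPayload(req.body))</code>. Keeping the function free of Express and Axios also makes it easy to test on its own.</p>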
<h3 id="heading-server-initialization"><strong>Server Initialization</strong></h3>
<ul>
<li>Starts the server and listens on the specified port.</li>
</ul>
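<pre><code class="lang-typescript">app.listen(port, <span class="hljs-function">() =&gt;</span> {
  <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`Server is running on port <span class="hljs-subst">${port}</span>`</span>);
});
</code></pre>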
<h3 id="heading-get-discord-webhook-token"><strong>Get the Discord Webhook URL</strong></h3>
<p>To get a webhook for Discord, go to Server Settings &gt; Integrations &gt; Webhooks, then create a webhook and copy the provided URL for message posting.</p>
<p><img src="https://miro.medium.com/v2/resize:fit:700/1*wRM0Vq6G1hekPNlkcJXOcg.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-executing-the-project"><strong>Executing the project</strong></h3>
<p>Now we need to expose the webhook URL to the application as an environment variable. A package such as <code>dotenv</code> can load it from a file; here I simply use the shell's <code>export</code> command on Linux:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">export</span> DISCORD_WEBHOOK=https://discord.com/api/webhooks/...
</code></pre>
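<p>A small startup check can catch a missing or malformed variable before the server starts handling requests. This is a sketch under assumptions: the function name <code>requireWebhookUrl</code> and its error messages are invented for this example and are not part of the original project.</p>
<pre><code class="lang-javascript">// Sketch: fail fast when DISCORD_WEBHOOK is missing or is not an https URL.
function requireWebhookUrl(env) {
  const value = env.DISCORD_WEBHOOK;
  if (typeof value !== "string" || value.trim() === "") {
    throw new Error("DISCORD_WEBHOOK is not set");
  }
  const url = new URL(value); // throws a TypeError on malformed input
  if (url.protocol !== "https:") {
    throw new Error("DISCORD_WEBHOOK must be an https URL");
  }
  return url.toString();
}

// Report the result without printing the URL itself.
try {
  requireWebhookUrl(process.env);
  console.log("DISCORD_WEBHOOK looks usable");
} catch (error) {
  console.error("Configuration problem:", error.message);
}
</code></pre>
<p>The check reports only whether the value is usable. Anyone who holds the webhook URL can post to the channel, so it is safer not to echo it.</p>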
<p>Now you just need to execute your application:</p>
<pre><code class="lang-bash">node index.js

https://discord.com/api/webhooks/...
Server is running on port 3000
</code></pre>
<p>Heroku can only deliver webhooks to a publicly reachable address, so the application needs to be deployed somewhere public; Heroku itself works for this as well.</p>
<h3 id="heading-set-webhook-on-heroku"><strong>Set webhook on Heroku</strong></h3>
<p>In your project (the one whose events you want to forward), go to More -&gt; View Webhooks -&gt; Create Webhook.</p>
<p><img src="https://miro.medium.com/v2/resize:fit:283/1*zh4l5c7Zr3RJPJdSwtTWtg.png" alt class="image--center mx-auto" /></p>
<p>Configure your webhook settings and save them.</p>
<p><strong>Note</strong>: The webhook URL should be the public address of the application we just built, including the <code>/heroku-webhook</code> path.</p>
<p><img src="https://miro.medium.com/v2/resize:fit:403/1*5YwC31ugdAP8v6mCjxiqKQ.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-test-notifications"><strong>Test notifications</strong></h3>
<p>Now you can initiate a build for the application where you want to utilize the webhook. In your Discord channel, you should receive a notification similar to the following:</p>
<p><img src="https://miro.medium.com/v2/resize:fit:374/1*eIClXBebNo_xKe7aG7m-tw.jpeg" alt class="image--center mx-auto" /></p>
<h3 id="heading-conclusion"><strong>Conclusion</strong></h3>
<p>The integration of webhook messages from Heroku to Discord offers a powerful solution for streamlining communication in your development workflow. By seamlessly connecting these platforms, you enhance real-time updates and notifications, fostering collaboration and efficiency among your team members. The straightforward implementation and flexibility of webhooks empower you to tailor the communication process to fit your specific needs, creating a more responsive and dynamic development environment. As technology continues to evolve, embracing tools like webhooks not only simplifies communication but also contributes to a more connected and agile development process. Ultimately, by leveraging the synergy between Heroku and Discord, you can propel your project forward with timely and relevant information, enhancing overall productivity and success.</p>
<p>You can find the code I used on my <a target="_blank" href="https://github.com/WebCDC/heroku-discord">GitHub Repository</a>.</p>
]]></content:encoded></item><item><title><![CDATA[GitHub Pages with custom DNS]]></title><description><![CDATA[GitHub Pages is a static site hosting service offered by GitHub. It allows users to host personal, organization, or project pages directly from a GitHub repository. This service is commonly used for hosting documentation, personal blogs, or simple we...]]></description><link>https://devops-blog.ruicoelho.dev/github-pages-with-custom-dns</link><guid isPermaLink="true">https://devops-blog.ruicoelho.dev/github-pages-with-custom-dns</guid><category><![CDATA[GitHub]]></category><category><![CDATA[github-actions]]></category><category><![CDATA[dns]]></category><category><![CDATA[Devops]]></category><category><![CDATA[GitHubPages]]></category><dc:creator><![CDATA[Rui Coelho]]></dc:creator><pubDate>Tue, 18 Nov 2025 14:47:14 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1763487029120/578558ee-92a9-4671-9c78-3d49753007f8.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>GitHub Pages is a static site hosting service offered by GitHub. It allows users to host personal, organization, or project pages directly from a GitHub repository. This service is commonly used for hosting documentation, personal blogs, or simple websites.</p>
<h2 id="heading-key-features"><strong>Key Features</strong></h2>
<ul>
<li><p><strong>Static Site Hosting</strong>: GitHub Pages hosts static websites directly from a GitHub repository, making it easy to publish and maintain web content.</p>
</li>
<li><p><strong>Jekyll Integration</strong>: It supports Jekyll, a popular static site generator, allowing for easy customization and theming of websites.</p>
</li>
<li><p><strong>Custom Domain Support</strong>: GitHub Pages allows users to use a custom domain for their websites, providing flexibility in branding and hosting.</p>
</li>
<li><p><strong>Version Control Integration</strong>: Since it is integrated with GitHub, version control and collaboration features are readily available for managing website content.</p>
</li>
<li><p><strong>Free Hosting</strong>: GitHub Pages provides free hosting for static websites, making it an attractive option for personal and small-scale projects.</p>
</li>
</ul>
<h2 id="heading-getting-started"><strong>Getting started</strong></h2>
<p>To get started with GitHub Pages, you can create a new repository on GitHub and enable GitHub Pages in the repository settings. You can then customize your website using Jekyll or by simply pushing HTML, CSS, and JavaScript files to the repository.</p>
<h2 id="heading-custom-dns"><strong>Custom DNS</strong></h2>
<p>GitHub Pages allows you to use a custom domain for your websites, which means you can point a domain you own to your GitHub Pages site. This is often referred to as setting up custom DNS for GitHub Pages. By configuring the DNS settings for your domain, you can make it resolve to your GitHub Pages site, effectively associating your custom domain with your GitHub Pages website.</p>
<h2 id="heading-create-cname-file"><strong>Create CNAME file</strong></h2>
<p>To create a CNAME file for your custom domain, follow these steps:</p>
<ol>
<li><p>Create a new file in the root of your GitHub Pages repository.</p>
</li>
<li><p>Name the file “CNAME” (without any file extension).</p>
</li>
<li><p>Open the “CNAME” file and add your custom domain (e.g., example.com) as the content of the file.</p>
</li>
</ol>
<p>You can consult an example at <a target="_blank" href="https://github.com/user-cube/devops-cheatsheet/blob/main/CNAME">CNAME</a>.</p>
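<p>If you prefer to script this step, the file can be generated with a few lines of Node.js. This is a minimal sketch: the function name <code>writeCnameFile</code>, the loose hostname check and the temporary demo directory are assumptions made for this example, and <code>example.com</code> is a placeholder.</p>
<pre><code class="lang-javascript">// Sketch: write a CNAME file that contains a single custom domain.
const fs = require("fs");
const os = require("os");
const path = require("path");

function writeCnameFile(repoRoot, domain) {
  const hostname = domain.trim().toLowerCase();
  // Loose check: reject values that include a scheme, a path or whitespace.
  if (!/^[a-z0-9]([a-z0-9.-]*[a-z0-9])?$/.test(hostname)) {
    throw new Error("Expected a bare hostname such as example.com");
  }
  const target = path.join(repoRoot, "CNAME"); // no file extension
  fs.writeFileSync(target, hostname + "\n");
  return target;
}

// Demo in a temporary directory; in practice pass the repository root.
const demoRoot = fs.mkdtempSync(path.join(os.tmpdir(), "pages-demo-"));
const written = writeCnameFile(demoRoot, "example.com");
console.log(written, "contains", fs.readFileSync(written, "utf8").trim());
</code></pre>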
<h2 id="heading-crate-dns-cname-record"><strong>Create DNS CNAME record</strong></h2>
<p>Navigate to your DNS provider and create a CNAME record that points your subdomain to the default domain for your site. For example, if you want to use the subdomain <code>www.example.com</code> for your user site, create a CNAME record that points <code>www.example.com</code> to <code>&lt;user&gt;.github.io</code>.</p>
<p>If you want to use the subdomain <code>another.example.com</code> for your organization site, create a CNAME record that points <code>another.example.com</code> to <code>&lt;organization&gt;.github.io</code>. The CNAME record should always point to <code>&lt;user&gt;.github.io</code> or <code>&lt;organization&gt;.github.io</code>, excluding the repository name. For more information about how to create the correct record, see your DNS provider’s documentation. For more information about the default domain for your site, see “<a target="_blank" href="https://docs.github.com/en/pages/getting-started-with-github-pages/about-github-pages#types-of-github-pages-sites">About GitHub Pages</a>”.</p>
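<p>The rule that the record must point to <code>&lt;user&gt;.github.io</code> or <code>&lt;organization&gt;.github.io</code>, without the repository name, can be expressed as a small check. This is a sketch: the function name <code>isPagesTarget</code> and the account name <code>octocat</code> are placeholders chosen for this example.</p>
<pre><code class="lang-javascript">// Sketch: check that a CNAME target is a default GitHub Pages domain, that is,
// an account name followed by .github.io with no repository path.
function isPagesTarget(target) {
  // DNS tools often print a trailing dot; ignore it.
  const normalized = target.trim().toLowerCase().replace(/\.$/, "");
  return /^[a-z0-9]([a-z0-9-]*[a-z0-9])?\.github\.io$/.test(normalized);
}

console.log(isPagesTarget("octocat.github.io"));         // true
console.log(isPagesTarget("octocat.github.io/my-repo")); // false
</code></pre>
<p>The value to check could come from your DNS provider's dashboard or from Node's <code>dns.promises.resolveCname(hostname)</code>, which resolves to an array of target names.</p>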
<h3 id="heading-configure-dns-with-cloudflare"><strong>Configure DNS with Cloudflare</strong></h3>
<p>If you are using Cloudflare as your DNS provider, follow these steps to configure your custom domain:</p>
<ol>
<li><p>Log in to your Cloudflare account.</p>
</li>
<li><p>Navigate to your domain’s DNS settings.</p>
</li>
<li><p>Add a CNAME record with the name “www” and point it to <code>&lt;user&gt;.github.io</code>.</p>
</li>
<li><p>Ensure the CNAME record is proxied through Cloudflare to benefit from their security and performance features.</p>
</li>
</ol>
<p><img src="https://miro.medium.com/v2/resize:fit:700/1*pboXWGhvoXezgPUvL1lieg.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-configure-github-pages"><strong>Configure GitHub Pages</strong></h2>
<p>To set up a <code>www</code> or custom subdomain, such as <code>www.example.com</code> or <code>blog.example.com</code>, you must add your domain in the repository settings. After that, configure a CNAME record with your DNS provider.</p>
<p>On GitHub, navigate to your site’s repository.</p>
<p>Under your repository name, click Settings. If you cannot see the “Settings” tab, select the dropdown menu, then click Settings.</p>
<p>In the “Code and automation” section of the sidebar, click Pages.</p>
<p><img src="https://miro.medium.com/v2/resize:fit:700/1*uZcwKzXY6GhK4nHmTti13A.png" alt class="image--center mx-auto" /></p>
]]></content:encoded></item></channel></rss>