Azure API Management – Platform & DevOps

Terraform-driven APIs, CI/CD pipelines, secure networking, policy governance & end-to-end observability

This project delivers a production-grade Azure API Management (APIM) platform with fully automated infrastructure and API deployments. Responsibilities ranged from infrastructure as code and policy management to network security, monitoring, and troubleshooting of live APIs.

Azure API Management Project Overview

Key Contributions

  • Infrastructure as Code: Designed APIM instances, Products, backends, log integrations, and API artifacts with Terraform (modules & workspaces).
  • CI/CD Pipelines: Built Azure DevOps pipelines to validate/plan/apply Terraform and deploy API revisions/versions from source.
  • Policy Governance: Auth, rate-limiting, header transforms, CORS, caching, JWT validation, IP filtering—managed via reusable policy templates.
  • Networking: Managed VNET-injected APIM within subnets, with route tables, NSGs, and Application Gateway/WAF in front.
  • Key Vault Integration: Centralized secrets, certificates, and named values; automated rotation and binding to APIM & gateway.
  • Monitoring & KQL: Hooked APIM to Log Analytics/App Insights; authored KQL dashboards for latency, failures, and policy effects.
  • Troubleshooting: Resolved TLS/cert issues, backend connectivity, 500 errors, and policy misconfigurations with structured runbooks.

Technical Specifications

  • IaC: Terraform modules for APIM, Application Gateway, diagnostics, role assignments, and networking.
  • Pipelines: Multi-stage Azure DevOps (PR → Test → Prod) with approvals, artifacts, and release gates.
  • APIM Artifacts: APIs, Revisions, Versions, Products, Backends, Named Values, Policy fragments.
  • Networking: VNET/Subnet design, UDR route tables, NSGs, Private DNS, WAF policies on App Gateway.
  • Quality: Linting & validation for Terraform, unit checks for policies, smoke tests against API revisions.

What I Managed Day-to-Day

  • Routing & Security: Ensured correct egress/ingress via UDRs; locked down subnets with NSGs; maintained WAF rules.
  • Secrets & Certificates: Created/rotated Key Vault secrets; bound TLS certs to custom domains and backends.
  • Policies at Scale: Applied inbound/outbound/Backend policies by environment through parameterized fragments.
  • Diagnostics: Investigated failures using KQL, APIM trace, and App Gateway access logs.
  • Safe Releases: Used revisions + canary routing before promoting to a new version.

Troubleshooting Scenarios Solved

  • TLS / Certificate Chain: Missing intermediates & expired certs—fixed via proper chain upload & auto-rotation from Key Vault.
  • 500s from Backends: Identified policy-induced header issues, auth token mismatches, and timeouts; added retries/circuit breakers.
  • Routing Anomalies: Corrected UDR next hops and NSG rules to restore private connectivity.
  • CORS & Auth: Unified CORS in policies; standardized OAuth/JWT validation and audience/issuer checks.

Monitoring & Observability

  • Dashboards: Requests, latency, 4xx/5xx, cache hit ratio, throttling events, and backend dependency health.
  • Logging: Centralized APIM & App Gateway logs to Log Analytics; linked to alerts and action groups.
  • Alerts: SLO-based thresholds (p95 latency, error rate) with on-call routing.

Impact

  • Faster Delivery: Reproducible environments and one-click pipeline releases.
  • Better Security: Private networking, WAF, and policy standardization.
  • Higher Reliability: Proactive alerts and data-driven troubleshooting reduced MTTR.