Questions to ask application teams as a DevOps Engineer
Sep 23, 2024
As a DevOps engineer working with application teams across various capabilities, your goal is to ensure alignment, performance, scalability, and security throughout the application lifecycle. Here are some key questions to ask to gather the right insights and serve those teams effectively:
- Infrastructure & Environment Setup
— What environments do you need (e.g., dev, staging, production)?
— Are your environments consistent across different stages (dev/prod parity)?
— How is the current infrastructure provisioned (manual, Infrastructure as Code)?
— Are there any specific cloud provider preferences (AWS, Azure, GCP)?
— What’s the expected scaling pattern (auto-scaling, static provisioning)?
- Continuous Integration (CI)
— How are your builds currently triggered (manually, scheduled, on commit)?
— What tests are automated in your CI pipeline (unit, integration, functional)?
— Are there any issues with build time or flaky tests?
— How do you handle environment-specific configurations in the CI process?
— What version control system (VCS) is used, and do you follow Git branching strategies (GitFlow, trunk-based)?
- Continuous Delivery/Deployment (CD)
— Are deployments manual or automated? What tool(s) do you use (Jenkins, GitLab CI, CircleCI, etc.)?
— How do you handle canary deployments, blue-green deployments, or feature toggles?
— What’s the rollback strategy in case of a failed deployment?
— What’s the frequency of releases, and how confident are you in release quality?
— How do you ensure environment variables and secrets are securely managed during deployment?
- Configuration Management
— Are you using any configuration management tools (Ansible, Puppet, Chef)?
— How is configuration stored and versioned?
— How often do configuration changes happen, and how are they deployed?
— How do you handle secrets management (e.g., Vault, AWS Secrets Manager)?
- Monitoring & Logging
— What tools do you use for monitoring (Prometheus, Grafana, CloudWatch)?
— Are there any specific KPIs or metrics you need to track (e.g., error rates, latencies, throughput)?
— How are logs managed (centralized, distributed)?
— What log aggregation and visualization tools are in place (ELK stack, Splunk)?
— How do you handle alerting? Are thresholds dynamic or static?
- Security & Compliance
— What security tools are in place (static code analysis, dependency scanning)?
— How do you ensure compliance with data protection regulations (GDPR, HIPAA)?
— How are secrets and credentials managed and rotated?
— What are the security practices for containerized workloads (image scanning, vulnerability assessment)?
— Do you follow any specific security frameworks (ISO, NIST, SOC2)?
- Containers & Orchestration
— Are you using containers? If so, what tools (Docker, Podman)?
— Are containers orchestrated (Kubernetes, Docker Swarm)? How is the cluster managed?
— How are container images built, stored, and secured (private registry)?
— How do you handle scaling and resource allocation in the cluster (horizontal scaling, resource limits)?
- Cost Optimization
— How is cost tracking and optimization managed across cloud and infrastructure resources?
— Are there any cost anomalies or areas where you suspect overspending?
— How are unused resources (idle VMs, unallocated storage) tracked?
— How often do you review infrastructure costs, and what tools do you use (AWS Cost Explorer, GCP Cost Management)?
- Collaboration & Workflow
— How do different teams collaborate (Dev, Ops, QA) in the delivery process?
— How do you handle code reviews, approvals, and merges?
— What documentation practices do you follow for infrastructure and deployments?
— Do you use a shared knowledge base (Confluence, Notion)?
— What challenges do you face in the collaboration between teams?
- Incident Management & Disaster Recovery
— How are incidents reported and resolved (incident tracking system)?
— What’s the current mean time to resolution (MTTR)?
— How do you handle post-mortem analyses?
— What’s the disaster recovery (DR) strategy, and is it tested regularly?
— What’s the backup strategy for critical data and services?
- Automation & Tooling
— What parts of the development and operations process are automated? Where do you see gaps?
— What tools do you use for task automation (scripts, CI/CD tools)?
— Are there any pain points where automation could improve efficiency?
— What integrations do you need with existing tools (e.g., Slack, Jira, ServiceNow)?
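Some of the questions above ask for a number rather than a description, and MTTR is the clearest case. Here is a minimal Python sketch of how it could be computed, assuming incidents can be exported as pairs of opened and resolved timestamps. The `Incident` type and the sample data are hypothetical and not tied to any particular incident tracking system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """Hypothetical incident record; real data would come from your tracker."""
    opened: datetime
    resolved: datetime


def mean_time_to_resolution(incidents: list[Incident]) -> timedelta:
    """Return the average of (resolved - opened) across the given incidents."""
    if not incidents:
        raise ValueError("need at least one resolved incident")
    total = sum((i.resolved - i.opened for i in incidents), timedelta())
    return total / len(incidents)


# Made-up sample data, for illustration only.
sample = [
    Incident(datetime(2024, 9, 1, 10, 0), datetime(2024, 9, 1, 11, 30)),  # 90 minutes
    Incident(datetime(2024, 9, 8, 14, 0), datetime(2024, 9, 8, 14, 30)),  # 30 minutes
]
mttr = mean_time_to_resolution(sample)
print(mttr)  # 1:00:00
```

A plain mean is sensitive to a single long outage, so it may be worth asking the team for the median or the distribution as well.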
By asking these questions, a DevOps engineer can assess where teams are excelling and where improvements can be made across infrastructure, deployment processes, security, monitoring, and more. This helps in crafting a tailored approach for DevOps strategies and automation efforts.
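One way to keep track of where a team stands is to record the answers per category and list what is still open. The Python sketch below assumes a simple nested dictionary, with `None` marking a question the team could not answer yet. The structure, the `open_questions` helper, and the sample answers are all hypothetical.

```python
from typing import Optional

# Hypothetical record of one team's answers: category -> {question: answer}.
# None marks a question the team could not answer yet.
answers: dict[str, dict[str, Optional[str]]] = {
    "Continuous Integration (CI)": {
        "How are your builds currently triggered?": "on commit",
        "Are there any issues with build time or flaky tests?": None,
    },
    "Incident Management & Disaster Recovery": {
        "What's the current mean time to resolution (MTTR)?": None,
        "How do you handle post-mortem analyses?": "review within a week",
    },
    "Cost Optimization": {
        "How are unused resources tracked?": "monthly report",
    },
}


def open_questions(
    answers: dict[str, dict[str, Optional[str]]]
) -> dict[str, list[str]]:
    """Return, per category, the questions that still have no answer."""
    gaps = {}
    for category, questions in answers.items():
        unanswered = [q for q, a in questions.items() if a is None]
        if unanswered:  # skip categories that are fully answered
            gaps[category] = unanswered
    return gaps


for category, questions in open_questions(answers).items():
    print(category)
    for question in questions:
        print("  open:", question)
```

Categories with no open questions are left out of the result, so the output doubles as a follow-up list for the next conversation with the team.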