From a small hypervisor running on your own racks to a high-traffic, multi-cloud enterprise, automation has become a necessity, not a luxury. If you’re looking to build or improve automation within your own organization, what are the best tools to help you accomplish this task?
You might think, based on the title of this blog post, that I believe Ansible and/or Terraform reign supreme in their spaces. But I do not. I’m even inclined to believe that Chef may have a fundamentally superior approach to configuration management. I’ve architected deployments using CloudFormation and even OpsWorks stacks. And under the right circumstances, I would do it again.
However, Ansible and Terraform are both exceptional tools that complement each other in myriad marvelous ways. Not only do they each offer fantastic advantages over their respective alternatives, but you can integrate them to accomplish more than either would be capable of alone. In this post, I’ll share some of those integrations with you, as well as some of the reasons that I like to use the two tools together to manage cloud deployments.
While Chef’s approach to configuration management might employ a more elegant technical design, I find technical superiority to be less valuable than manifest simplicity. The finest tool ever to be made is useless if nobody is skilled enough to operate it. Likewise, if you’re as clever as you can be when you write it, how will you ever debug it? In other words, if using a tool pushes your technical aptitude, how can you ever expect to troubleshoot it properly?
Ansible is simple. It makes sense. The code is terse and easy to read, which makes it more accessible. Junior developers can contribute with more confidence, web developers can get more engaged, and managers can make some sense out of the code. The learning curve is not steep. Because everything happens in a prescribed order, you won’t have to close your eyes for a second to wrap your mind around how an execution will unfold, as you might with Chef. And whereas Puppet code can be straightforward too, you could easily find yourself sinking weeks into troubleshooting connectivity issues with Puppet masters. Ansible is agentless. Any system running SSH (or WinRM) can already be managed. No binary to install, no NTP sync; it just gets the job done!
But, Ansible is far more than just a configuration management solution. It’s an orchestration tool, a superior alternative to shell scripting, and a data processing solution. The creators of Ansible staunchly maintain that it is not a programming language, yet I submit that it is a framework for architecting complex applications built upon reusable and polymorphic Python modules.
With infrastructure-as-code, the question usually boils down to choosing either a cloud-native or cloud-agnostic solution. If your organization must manage a sophisticated and truly cloud-agnostic deployment strategy, then integrating each cloud’s native tooling would be extremely cumbersome. And, Terraform is the only inherently multi-cloud IaC solution. On the other hand, organizations working in a single public cloud may be inclined to use its proprietary solution—presuming it to be safer or more robust.
Despite my own bias for Terraform, there was a point in time when I regularly worked with AWS CloudFormation, assuming it to be more robust and rich in features. However, as I delved deeper into its mechanics, I found that for anything beyond very simple deployments, CloudFormation was quite cumbersome and inflexible. Sure, you can launch infrastructure from a nice web GUI with nothing but the URL of a template. But writing a template to deploy a dynamic number of instances across a dynamic number of availability zones is damn near impossible. And, don’t even get me started on custom resources!
Terraform, by contrast, is smart. It’s dynamic and extensible, and its code is simple. You must manage your own state, but the alternative with CloudFormation is having to file tickets with AWS every time something goes wrong and hoping that they can help. Initial deployments don’t just die or roll back an entire project when one trivial error is encountered, and updates can be applied with confidence because Terraform lets you target infrastructure subsets (blast radius!). And, although I’ve admittedly never used Azure Resource Manager, it appears to look like CloudFormation did back when it only accepted JSON templates.
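To make the targeting point concrete, here is a minimal sketch of a wrapper that builds a `terraform apply` command restricted to a handful of resource addresses through the `-target` option. The helper name and the example addresses (`module.web`, `aws_security_group.web`) are hypothetical; the script only prints the command and does not run Terraform.

```python
import shlex


def targeted_apply(addresses, auto_approve=False):
    """Build a `terraform apply` command limited to the given addresses."""
    if not addresses:
        # An empty list would silently widen the blast radius to everything.
        raise ValueError("refusing to build an untargeted apply")
    command = ["terraform", "apply"]
    command += [f"-target={address}" for address in addresses]
    if auto_approve:
        command.append("-auto-approve")
    return command


if __name__ == "__main__":
    # Hypothetical resource addresses, for illustration only.
    print(shlex.join(targeted_apply(["module.web", "aws_security_group.web"])))
```

Refusing an empty target list is a deliberate choice in this sketch: the whole point of targeting is to keep a change small, so an accidental full apply should fail loudly.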
But it makes sense when you think about it. Public cloud providers like Amazon and Azure didn’t architect their IaC solutions with guiding principles learned from years of iteration and experimentation. They simply built them to work. These aren’t their flagship products; they’re second-class citizen services, provided merely as a convenience—the proverbial cheeseburger at a Chinese food restaurant.
To be fair, though, there is one thing that I’ve found CloudFormation to be particularly useful for: provisioning the secure Terraform remote state backend infrastructure you need to get you up and running!
Combining Ansible and Terraform
Like a hammer and chisel, these tools can sculpt masterpieces; Terraform can shape the deployment that you envision, and Ansible can set it free. Here are just a few ways in which you can do more with both.
Terraform is stateful and idempotent, assuming you have already established a persistent, remote state infrastructure. If you have not, Ansible can ensure that all necessary resources exist on the first run without creating redundancies on subsequent ones. Ansible can also be used to initialize backends and template out Terraform variable files dynamically. Suffice it to say, if complex logic is required to configure your deployment at run-time, I would not recommend trying to implement it with Terraform.
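As a rough illustration of templating variable files at run-time, the sketch below uses plain Python in place of an Ansible template task (a substitution made only to keep the example self-contained). It writes a `terraform.tfvars.json` file, which Terraform loads automatically from the working directory. The variable names (`environment`, `availability_zones`, `instance_count`) and the sizing rule are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def build_tfvars(environment, zones, instances_per_zone):
    """Compute Terraform input variables from run-time values.

    All variable names here are hypothetical; match them to the
    `variable` blocks declared in your own configuration.
    """
    if not zones:
        raise ValueError("at least one availability zone is required")
    return {
        "environment": environment,
        "availability_zones": sorted(zones),
        "instance_count": len(zones) * instances_per_zone,
    }


def write_tfvars(directory, tfvars):
    """Write the variables as terraform.tfvars.json and return the path."""
    path = Path(directory) / "terraform.tfvars.json"
    path.write_text(json.dumps(tfvars, indent=2) + "\n")
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        values = build_tfvars("staging", ["us-east-1b", "us-east-1a"], 2)
        print(write_tfvars(workdir, values).read_text())
```

The run-time logic lives entirely outside Terraform here, which then only consumes the resulting values.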
Terraform is also verbose. Too verbose, in fact, as it has no mechanism for redacting information. That means no certificates, keys, or other sensitive information can be responsibly managed by Terraform directly. However, Terraform supports the notion of a “null_resource” that can be used to orchestrate arbitrary code execution as part of the standard resource lifecycle. And Ansible does support redaction. That means database passwords can be generated and vaulted adjacent to the code that provisions the database itself. Certificates can be created and signed a few lines above the code that manages the secure listeners that use them. Need to ensure key rotation on a defined interval in line with the continuous delivery of a fleet of instances? These tools have got you covered!
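The generate-and-redact idea can be sketched in a few lines. This is plain Python rather than an Ansible task (where `no_log: true` would play the redaction role), and it is the kind of script a null_resource could invoke through a local-exec provisioner. The function names and the `db_password` file are hypothetical, and a real setup would encrypt the value with Ansible Vault or a secrets manager rather than rely on file permissions.

```python
import os
import secrets
import string
import tempfile

ALPHABET = string.ascii_letters + string.digits


def generate_password(length=32):
    """Return a random password drawn from a cryptographically secure source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def store_secret(path, value):
    """Create the file with owner-only permissions (0600) and write the secret."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(value)


def redact(message, secret_values):
    """Mask every known secret before a message reaches the logs."""
    for value in secret_values:
        message = message.replace(value, "********")
    return message


if __name__ == "__main__":
    password = generate_password()
    with tempfile.TemporaryDirectory() as workdir:
        # Hypothetical location; the directory is discarded when the block exits.
        store_secret(os.path.join(workdir, "db_password"), password)
    print(redact(f"created database with password {password}", [password]))
```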
But the most powerful example I have found to date dawned on me after months of contemplating a conundrum: how do I ensure the configuration management of a fleet of secure, publicly inaccessible instances with full visibility of the execution logs in real-time? How can I access that which I have designed to be inaccessible? The solution that I ultimately arrived at was to order a few null resources (Ansible playbook runs) in a dependency chain to leverage and then destroy an ephemeral bridge. Once the infrastructure needed to access the private fleet had been established, Ansible would hop in and provision the instances. The last step was to simply pass the attribute outputs of those ephemeral resources to another Ansible playbook that would burn the bridge down.
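One way to picture that dependency chain is the sketch below, which generates Terraform's JSON configuration syntax from Python. It is an assumption-laden outline, not the original implementation: the playbook file names and resource names are hypothetical, each step is modeled as a null_resource with a local-exec provisioner, and the passing of attribute outputs to the teardown playbook is left out for brevity.

```python
import json


def playbook_step(playbook, depends_on=None):
    """Build a null_resource body that runs one playbook via local-exec."""
    body = {
        "provisioner": [
            {"local-exec": {"command": f"ansible-playbook {playbook}"}}
        ]
    }
    if depends_on:
        # Explicit ordering: this step waits for the listed resources.
        body["depends_on"] = list(depends_on)
    return body


def build_chain():
    """Chain three playbook runs: open the bridge, provision, tear down."""
    resources = {
        "bridge_up": playbook_step("bridge_up.yml"),
        "provision_fleet": playbook_step(
            "provision_fleet.yml", depends_on=["null_resource.bridge_up"]
        ),
        "bridge_down": playbook_step(
            "bridge_down.yml", depends_on=["null_resource.provision_fleet"]
        ),
    }
    return {"resource": {"null_resource": resources}}


if __name__ == "__main__":
    # The output could be saved as a *.tf.json file alongside other config.
    print(json.dumps(build_chain(), indent=2))
```

Because each step names its predecessor in `depends_on`, Terraform runs the playbooks strictly in sequence, and the teardown only starts after provisioning has finished.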
Terraform and Ansible are not necessarily the best at what they do. Well, maybe Terraform is. But every tool has its use cases and there is no “one-size-fits-all”. That said, they both warrant serious consideration when evaluating your own automation. On their own merits, they are each as dynamic and robust as the code that you are able to produce. Together, they are gestalt.