Adopting Infrastructure as Code at Simplificator
08.02.2024 - Andy Pfister
You might think, “Infrastructure as Code? This is quite an old topic”. And you are right: If we check Google Trends, it started to take off in 2016. But we can also observe that the topic is still going strong today, so it’s an excellent opportunity to tell you about the Infrastructure as Code solution we adopted at Simplificator in the last three months.
Infrastructure size
Simplificator currently maintains about 30 servers. Most of them are servers where we run customer applications, but a couple are also used to support infrastructure, like our monitoring, backup, and ticketing system. The number of servers also only grows by a few each year.
The current process involves many manual steps: We log into the web interface of our hosting provider, order the vServer according to a couple of parameters documented in our internal playbook, and manually add DNS records. Then, we add the server to a provisioning repository, where we run Ansible to install base infrastructure like Docker or monitoring on the server.
The process is documented quite well, but errors still happen occasionally. Sometimes, people forget to enable IPv6 or to add the corresponding AAAA DNS record. Sometimes, an outdated SSH key is provisioned to the server, so Ansible cannot access it. These are small, avoidable things, but they add up in time spent troubleshooting. An IaC tool that automates these conventions prevents such mistakes.
In conclusion, we wanted Infrastructure as Code not because we have a large server farm but to avoid mistakes and manual work.
Terraform
Terraform might be the obvious choice when realizing Infrastructure as Code. In 2020, we realized a customer project with Terraform. They needed to manage a large number of servers, and their cloud provider offered a good Terraform provider.
While everything worked out with Terraform for our customer, it did not feel like a good solution for Simplificator. From our perspective, Terraform’s DSL felt limiting and not intuitive. For example, Terraform v0.12 was released at the time of the customer project, and we hit a limit since it was impossible to code a loop within a loop. We had to update to a beta version, convert all our data structures to maps, and manually move all the Terraform state to match the new data structure in order to avoid Terraform replacing our entire infrastructure.
I’m sure this is better nowadays, as Terraform issued new releases. But still, since we were not in a hurry to implement Infrastructure as Code and wanted something that felt more suitable for us, we passed on Terraform.
Pulumi
We discovered Pulumi during one of our hackdays. Pulumi allows us to do Infrastructure as Code using a programming language like Go, Python, or TypeScript. Pulumi inspects the resource definitions you wrote in your code and sends them to its CLI, which then checks against one or more providers to see which resources must be created, which need to be updated, and which need to be destroyed. This is similar to how Terraform works, or desired state configuration tools in general.
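To make the create/update/destroy decision more concrete, here is a minimal sketch of a desired-state comparison. It is not how Pulumi is implemented; the Resource and Plan types and the planChanges function are hypothetical names invented for this illustration, and real tools also track dependencies and replacements.

```typescript
// Hypothetical, simplified model of a resource: a name plus flat properties.
type Props = Record<string, string | number | boolean>;
type Resource = { name: string; props: Props };
type Plan = { create: string[]; update: string[]; destroy: string[] };

// True when both property bags hold the same keys and values.
function sameProps(a: Props, b: Props): boolean {
  const keys = Object.keys(a).concat(Object.keys(b));
  return keys.every((key) => a[key] === b[key]);
}

// Compare what the code declares (desired) with what already exists (actual).
function planChanges(desired: Resource[], actual: Resource[]): Plan {
  const actualByName = new Map(
    actual.map((r): [string, Resource] => [r.name, r])
  );
  const desiredNames = new Set(desired.map((r) => r.name));
  const plan: Plan = { create: [], update: [], destroy: [] };

  for (const want of desired) {
    const have = actualByName.get(want.name);
    if (have === undefined) {
      plan.create.push(want.name); // declared but missing
    } else if (!sameProps(want.props, have.props)) {
      plan.update.push(want.name); // exists but differs
    }
  }
  for (const have of actual) {
    if (!desiredNames.has(have.name)) {
      plan.destroy.push(have.name); // exists but no longer declared
    }
  }
  return plan;
}

const plan = planChanges(
  [
    { name: "web-0", props: { diskSize: 15 } },
    { name: "db-0", props: { diskSize: 10 } },
  ],
  [
    { name: "web-0", props: { diskSize: 10 } },
    { name: "old-0", props: { diskSize: 10 } },
  ]
);
// plan.create is ["db-0"], plan.update is ["web-0"], plan.destroy is ["old-0"]
```

The point of the sketch is only that the code describes the end state, and the tool works out the steps needed to get there.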
We decided to use TypeScript for our Pulumi project, a language everyone at Simplificator can write. To provision a server, we first defined a new function:
export const makeServer = async ({
  name,
  version = 0,
  diskSize = 10,
}: {
  name: string;
  version?: number;
  diskSize?: number;
}): Promise<MakeServerResponse> => {
  const templateId = await templateIdToUse(name);
  const computeInstance = new ComputeInstance(`${name}-${version}`, {
    type: "standard.tiny",
    templateId,
    zone,
    diskSize,
    ipv6: true,
    name: name,
    securityGroupIds: [http.id, ssh.id, netdata.id, internet.id, ntp.id],
    sshKey: sshKey.name,
  });
  const dnsRecordIpV4 = new DomainRecord(`${name}-${version}-ipv4`, {
    domain: internalDomain.id,
    recordType: "A",
    content: computeInstance.publicIpAddress,
    name: `${name}.s`,
  });
  const dnsRecordIpV6 = new DomainRecord(`${name}-${version}-ipv6`, {
    domain: internalDomain.id,
    recordType: "AAAA",
    content: computeInstance.ipv6Address,
    name: `${name}.s`,
  });
  return {
    computeInstance,
    dnsRecordIpV4,
    dnsRecordIpV6,
  };
};
This function encapsulates a couple of conventions, which I mentioned earlier:
- IPv6 gets enabled by default.
- Our Cloud provider relies on security groups to open ports. We always need the same ones, so this is also in the code now.
- We also provision two DNS records, one for IPv4 and one for IPv6. Both depend on the IP addresses that result from the server resource creation.
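The function also fixes a naming convention through its template strings. The sketch below pulls those strings into a small helper so the resulting names can be read at a glance. The helper namesFor is hypothetical and not part of our code base; it only mirrors the templates used in makeServer.

```typescript
// Hypothetical helper that mirrors the template strings in makeServer.
function namesFor(name: string, version: number = 0) {
  const base = `${name}-${version}`;
  return {
    instance: base, // first argument of ComputeInstance
    ipv4Record: `${base}-ipv4`, // first argument of the A record
    ipv6Record: `${base}-ipv6`, // first argument of the AAAA record
    dnsLabel: `${name}.s`, // record name within the internal domain
  };
}

const names = namesFor("simpli-example-staging-01");
// names.instance is "simpli-example-staging-01-0"
// names.dnsLabel is "simpli-example-staging-01.s"
```

Note that the version only appears in the names handed to Pulumi, not in the DNS label. In Pulumi, the first constructor argument is the logical resource name, so a plausible reading is that bumping version makes Pulumi treat the server as a new resource while the DNS name stays stable. That reading is an assumption based on the code alone.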
Then, we call this function from our index.ts file with the servers we want to create:
const servers = await Promise.all(
  [
    { name: "simplificator-example-production-02", diskSize: 15 },
    { name: "simpli-example-staging-01", diskSize: 10 },
  ].map(makeServer)
);
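Because the server list is ordinary TypeScript data, it can also be generated, including with the kind of loop within a loop that gave us trouble in Terraform. The following is a minimal sketch under assumed inputs: the project names, the environment list, and the buildSpecs function are made up for illustration and do not reflect our actual setup.

```typescript
// Shape accepted by makeServer (name is required, diskSize is optional).
type ServerSpec = { name: string; diskSize?: number };
type Environment = { env: string; diskSize: number };

// Build one server spec per project and environment with a nested loop.
function buildSpecs(projects: string[], environments: Environment[]): ServerSpec[] {
  const specs: ServerSpec[] = [];
  for (const project of projects) {
    for (const { env, diskSize } of environments) {
      specs.push({ name: `${project}-${env}-01`, diskSize });
    }
  }
  return specs;
}

// Hypothetical inputs, for illustration only.
const specs = buildSpecs(
  ["example", "demo"],
  [
    { env: "production", diskSize: 15 },
    { env: "staging", diskSize: 10 },
  ]
);
// specs holds four entries, starting with
// { name: "example-production-01", diskSize: 15 }
```

The resulting array could be passed to .map(makeServer) in the same way as the hand-written list.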
Pulumi and Terraform
An interesting detail we noted during our research was that Pulumi somewhat depends on Terraform. For most cloud providers, Pulumi uses a tool called pulumi-terraform-bridge to build Pulumi packages out of the schemas of the corresponding Terraform providers. Pulumi announced in 2021 that they had started to ship “native” packages, which are automatically built based on the API definitions of the respective Cloud provider.
In other words, if you do not use one of the big three (Google, AWS, Azure), the Pulumi package will likely lag a few days behind the Terraform provider until the package is rebuilt. For us, this has not caused any trouble with our Cloud provider so far, but it is a detail to keep in mind.
What’s next?
We decided not to mess with our existing tenant on our Cloud provider but rather opened a new one under Pulumi’s total control. We are slowly re-creating servers there and migrating systems. We need to do this anyway, as we still run a couple of Debian 10 servers that need to be updated to Debian 12.
Overall, we are pretty happy with Pulumi so far. Writing the resource definitions in TypeScript means we do not have to learn a new DSL at all. An engineer at Simplificator can now create a server in about two lines of code instead of a half-hour-long manual process. And, of course, all the infrastructure definitions are documented in Git, so changes are reviewed, and we can read up in the future why this simplificator-example-production-02 is even there.