Guest Post: Managing the Virtual Private Cloud with Terraform

Today we’re publishing a guest post written by our client Alexey Vakhov. Alexey is the CTO of Uchi.ru, a company that develops an educational platform under the same name and also hosts interactive competitions (olympiads) for schoolchildren. The entire Uchi.ru infrastructure is built on our Virtual Private Cloud.
Alexey Vakhov gives a detailed account of how he and his colleagues use Terraform to automate the setup and maintenance of their virtual infrastructure. We hope his experience will be of interest to our other VPC users.

We’ve been developing a web platform for school education for over four years now. On our platform, hundreds of thousands of students from all over Russia solve interactive problems and regularly take part in competitions (olympiads). The system now runs on more than ten production servers. Some of them serve outside visitors and handle 500-800 requests per second; others run applications for internal users.

Our flagship product is https://uchi.ru. In addition to the main site, we host online olympiads in math, business, and Russian language. Olympiad production environments, like those of any event tied to specific dates, see extremely uneven traffic. During the main round the site is under heavy load, and it’s our job to handle it; that’s why it’s important that everything technical runs smoothly. Once the awards have been handed out and the results tallied, the olympiad production environment can take it easy.

About a year ago, we moved almost our entire infrastructure from dedicated servers to the cloud, and we’re very happy with the result. Being able to quickly add as many servers as we need is especially valuable for our olympiads, since peak loads last only a few weeks: for that period we spin up a farm of a few dozen servers and then shut it down. For all of our other servers, the ability to quickly add resources is priceless.

The cloud even changed the way we organize our infrastructure. With bare-metal servers, we had to assign multiple roles to one machine because the machines are big and fairly expensive, not to mention that hooking up a new server always took a few days. Over time, the system grew fragile and lost its flexibility. Now each production environment gets an isolated project with its own subnet and a separate server for each role. For example, for VPN connections we use the smallest servers, with one core and 512 MB of RAM; for application servers and databases, we use the largest servers available.

When our number of servers grew to a few dozen, we ran into a new problem: managing configurations from the GUI was no longer convenient.
When multiple people work in the GUI, the human factor comes into play: somebody changes something and doesn’t tell their colleagues (and how would they even let everyone know?).

Say we have 8 servers that are supposed to be identical; one of them may end up with a different OS version or a slightly different amount of RAM. Cloning an existing production environment or building a new one from scratch also takes a lot of manual work. This is why we decided to automate infrastructure setup and maintenance with Terraform.

In this article, we’ll look at what Terraform is for and show you how to set up an environment and create test servers in any OpenStack cloud, using Selectel’s cloud as an example. We’ll also talk about using the tool in a live production environment.

What is Terraform

Terraform is an excellent utility from HashiCorp (the creators of Vagrant, Consul, and several other tools well known among specialists). It lets you describe, store, and change cloud infrastructure as simple templates written in HCL (HashiCorp Configuration Language). Even though inventing your own language is often not the best idea, I like HCL: it is JSON-compatible (a configuration can also be written as plain JSON) and easy to read.

Here’s a simplified example of creating a server in Selectel’s cloud (a real configuration needs a few more attributes); it shows almost all of the syntax we’ll need:

resource "openstack_blockstorage_volume_v1" "disk" {
  name   = "disk"
  region = "ru-1"
  size   = 10 # GB
}

resource "openstack_compute_instance_v2" "server" {
  name        = "server"
  flavor_name = "flavor-1024-1"
  region      = "ru-1"

  # referencing the disk's id is what makes Terraform create the disk first
  block_device {
    uuid = "${openstack_blockstorage_volume_v1.disk.id}"
  }
}

If we save this code to a .tf file, configure the necessary access credentials, and run ‘terraform apply’ in the console, we get exactly what we expect: Terraform first creates the disk and only then a server that uses it.

If we run ‘terraform apply’ again, the utility will see that the disk and server already exist and won’t do anything. Adding memory to the server is just as easy: change the ‘flavor_name’ value to ‘flavor-2048-1’ and run ‘terraform apply’ again. Terraform will predictably update only the RAM and won’t touch the disk. The ‘terraform destroy’ command deletes every resource that was created (without forgetting the DNS records of test servers, stray disks, and other leftovers that would otherwise annoy any perfectionist).

Before using Terraform, read the official documentation thoroughly (it’s good, clear, and can be read in one sitting) and spend a few days experimenting with test infrastructure, preferably under a separate account. This is very important: using Terraform without fully understanding its subtleties can have irreparable consequences. Experimenting also helps you figure out how Terraform works with state.

Let’s take an in-depth look at Terraform scripts.

Preparation

First, we have to set up an environment. I’ll show you how we do this, but if you’ve already scripted your infrastructure, you probably know what needs to be automated. I recommend looking through the full list of supported providers; some may come as a surprise, like Grafana, PostgreSQL, Heroku, and many more. Who knows, one of them may come in handy while you design your infrastructure.

For now we’ll only be managing servers and DNS records.

We’ll need:

  • a user with access to the project;
  • the OpenStack console utilities;
  • the ID of an OS image;
  • a set of prepared environment variables.

Set up a new project, user, and resource quota from the GUI (this can also be scripted, just not with Terraform). Save the project ID: you’ll need it when configuring access to your OpenStack provider. Access the project and create a local network if one hasn’t already been created (in Selectel’s cloud, local networks are created automatically if you order a floating IP address).

Next, install the OpenStack console utilities, which you’ll need to look up several internal identifiers that aren’t shown in the GUI. Detailed installation instructions can be found in Selectel’s article Virtual Private Cloud API: Console Clients. For convenience, we packaged them into a Docker container and run them like this:

docker run --rm -it \
  -e OS_AUTH_URL=https://api.selvpc.ru/identity/v3 \
  -e OS_PROJECT_ID=#{...} \
  -e OS_USER_DOMAIN_NAME=#{...} \
  -e OS_USERNAME=#{...} \
  -e OS_PASSWORD=#{...} \
  -e OS_REGION_NAME=#{...} \
  uchiru/ostack:v2

Set OS_USER_DOMAIN_NAME to your Selectel control panel login (your agreement number), OS_PROJECT_ID to the project ID, OS_REGION_NAME to the region (ru-1 for St. Petersburg, ru-2 for Moscow), and OS_USERNAME/OS_PASSWORD to the user’s login and password (remember that this user needs access to the project).

To retrieve the image ID, run the command glance image-list and find the image you need:

root@2118c4e58238:/# glance image-list |grep 16.04
eecd3d0f-6968-40ea-bed6-4c2949bbac3d | Ubuntu-16.04 LTS 32-bit
ce532860-acef-40cd-b3c7-699c22b4dfd6 | Ubuntu-16.04 LTS 64-bit
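Picking the ID by eye works for one image, but if you script this step, the listing is easy to parse. Below is a minimal Python sketch that assumes the two-column "ID | Name" layout shown above. `find_image_id` is a hypothetical helper, not part of any OpenStack client, and it ignores any line that doesn't split into two columns (such as the table borders glance also prints).

```python
from typing import Optional


def find_image_id(listing: str, name: str) -> Optional[str]:
    """Return the ID of the image whose name matches `name` exactly, or None.

    Hypothetical helper: expects lines of the form "ID | Name", with or
    without the leading and trailing pipes of glance's table output.
    """
    for line in listing.splitlines():
        # Split on the column separator and drop empty cells produced by
        # leading/trailing pipes; border rows like "+----+" yield one cell.
        cells = [cell.strip() for cell in line.split("|") if cell.strip()]
        if len(cells) >= 2 and cells[1] == name:
            return cells[0]
    return None


LISTING = """\
eecd3d0f-6968-40ea-bed6-4c2949bbac3d | Ubuntu-16.04 LTS 32-bit
ce532860-acef-40cd-b3c7-699c22b4dfd6 | Ubuntu-16.04 LTS 64-bit
"""

print(find_image_id(LISTING, "Ubuntu-16.04 LTS 64-bit"))
# prints ce532860-acef-40cd-b3c7-699c22b4dfd6
```

The match is exact on purpose: a substring match on "16.04" would return whichever of the 32-bit and 64-bit images comes first.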

Next, we need flavors. OpenStack providers usually offer a fixed set of configurations, but Selectel is more flexible here and lets you create a custom configuration for each server (in the UI, a flavor is created automatically for every new server). We’ll create flavors with the nova command. Pick a naming scheme and stick with it; something like flavor-1024-2 for a dual-core server with a gigabyte of RAM works well.
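Command format (the ID is set to auto so that one is generated, and the disk size is 0 because disks are created as separate volumes):

nova flavor-create --is-public False <name> auto <ram> 0 <vcpus>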
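A naming scheme is easiest to stick with when it lives in one place. The Python sketch below uses hypothetical helpers to build and parse names in the flavor-<RAM in MB>-<cores> form suggested above, and to assemble the matching nova flavor-create argument list (positional order: name, ID, RAM in MB, disk in GB, vCPUs). The zero disk size assumes, as throughout this article, that disks are created as separate volumes.

```python
import re
from typing import List, Tuple

# Hypothetical scheme from the text: flavor-<RAM in MB>-<number of cores>.
FLAVOR_RE = re.compile(r"^flavor-(\d+)-(\d+)$")


def flavor_name(ram_mb: int, vcpus: int) -> str:
    """Build a flavor name, e.g. 1024 MB and 2 cores -> 'flavor-1024-2'."""
    if ram_mb <= 0 or vcpus <= 0:
        raise ValueError("RAM and core count must be positive")
    return f"flavor-{ram_mb}-{vcpus}"


def parse_flavor_name(name: str) -> Tuple[int, int]:
    """Inverse of flavor_name: 'flavor-2048-1' -> (2048, 1)."""
    match = FLAVOR_RE.match(name)
    if match is None:
        raise ValueError(f"not a flavor-<ram>-<cores> name: {name!r}")
    return int(match.group(1)), int(match.group(2))


def nova_flavor_create_args(ram_mb: int, vcpus: int) -> List[str]:
    """Arguments for a private flavor with an auto-generated ID and no local disk."""
    return ["nova", "flavor-create", "--is-public", "False",
            flavor_name(ram_mb, vcpus), "auto", str(ram_mb), "0", str(vcpus)]


print(" ".join(nova_flavor_create_args(1024, 2)))
# prints nova flavor-create --is-public False flavor-1024-2 auto 1024 0 2
```

The sketch deliberately stops at building the command; passing the list to subprocess.run inside the utilities container would execute it.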

Finally, for Terraform to work properly, we have to export several environment variables (Terraform reads any variable named TF_VAR_<name> as the value of the input variable <name>):

export TF_VAR_SELECTEL_ACCOUNT=112233  # Selectel login
export TF_VAR_PROJ_ID=5b1b496..        # project ID
export TF_VAR_USER=...                 # user login
export TF_VAR_PASSWORD=...             # user password
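Because Terraform maps TF_VAR_<name> onto the input variable <name>, a typo in one of these exports only surfaces later, as an interactive prompt or an error. A small pre-flight check can catch it early. This is a hypothetical Python sketch; the required names are the four variables that provider.tf reads, and the function takes the environment as a plain mapping so it is easy to test.

```python
from typing import Dict, Iterable, List, Mapping

PREFIX = "TF_VAR_"
# The variables that provider.tf reads (declared in variables.tf).
REQUIRED = ("SELECTEL_ACCOUNT", "PROJ_ID", "USER", "PASSWORD")


def tf_vars_from_env(env: Mapping[str, str]) -> Dict[str, str]:
    """Strip the TF_VAR_ prefix, the same name mapping Terraform applies."""
    return {key[len(PREFIX):]: value
            for key, value in env.items() if key.startswith(PREFIX)}


def missing_tf_vars(env: Mapping[str, str],
                    required: Iterable[str] = REQUIRED) -> List[str]:
    """Names from `required` that are unset or empty, in the given order."""
    found = tf_vars_from_env(env)
    return [name for name in required if not found.get(name)]


example_env = {"TF_VAR_SELECTEL_ACCOUNT": "112233", "TF_VAR_PROJ_ID": "...",
               "TF_VAR_USER": "", "HOME": "/root"}
print(missing_tf_vars(example_env))  # prints ['USER', 'PASSWORD']
```

In a real run you would pass os.environ and stop before calling terraform if the returned list isn't empty.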

And that’s it! Now we have everything we need to create our first server.

Getting to Work

Create a provider.tf file with the cloud access settings (you can, of course, name it whatever you like):

provider "openstack" {
  domain_name = "${var.SELECTEL_ACCOUNT}"
  auth_url    = "https://api.selvpc.ru/identity/v3"
  tenant_name = "${var.PROJ_ID}"
  tenant_id   = "${var.PROJ_ID}"
  user_name   = "${var.USER}"
  password    = "${var.PASSWORD}"
}

Add the file variables.tf:

variable "image_list" {
  type = "map"
  default = {
    "ubuntu-x64-1604" = "2de94623-a2a2-49e3-984d-3e6ca85e2b84"
  }
}

# variables set through TF_VAR_* environment variables must still be declared
variable "PROJ_ID" {}
variable "SELECTEL_ACCOUNT" {}
variable "USER" {}
variable "PASSWORD" {}

# floating IP and network ID can be copied from the UI
variable "box01-floating-ip" { default = "..." }
variable "network-id"        { default = "..." }
variable "box01-ip"          { default = "192.168.0.4" }

And finally, a box01.tf file describing the server:

# boot disk created from the Ubuntu 16.04 image
resource "openstack_blockstorage_volume_v1" "disk-for-box01" {
  name        = "disk-for-box01"
  region      = "ru-1"
  size        = 10
  image_id    = "${var.image_list["ubuntu-x64-1604"]}"
  volume_type = "basic.ru-1a"
}

resource "openstack_compute_instance_v2" "box01" {
  name        = "box01"
  flavor_name = "flavor-1024-1"
  region      = "ru-1"

  network {
    uuid        = "${var.network-id}"
    fixed_ip_v4 = "${var.box01-ip}"
    floating_ip = "${var.box01-floating-ip}"
  }

  metadata = {
    "x_sel_server_default_addr" = "{\"ipv4\":\"\"}"
  }

  # boot from the volume defined above
  block_device {
    uuid             = "${openstack_blockstorage_volume_v1.disk-for-box01.id}"
    source_type      = "volume"
    boot_index       = 0
    destination_type = "volume"
  }
}

Now run ‘terraform apply’, and in a few seconds your first server will be ready! We assign IP addresses manually for more control; besides, unless an address is specified explicitly, OpenStack assigns a new one whenever resources are changed, which isn’t always convenient.
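With addresses assigned by hand, the usual mistakes are a typo that puts a server outside the local subnet or two servers sharing one address. Below is a minimal Python sketch of a check you could run over the values in variables.tf. The helper is hypothetical, and the 192.168.0.0/24 subnet is only an assumption based on box01’s 192.168.0.4.

```python
import ipaddress
from typing import Dict, List


def check_fixed_ips(servers: Dict[str, str], subnet: str) -> List[str]:
    """Report problems in a server-name -> fixed IP address mapping."""
    network = ipaddress.ip_network(subnet)
    problems: List[str] = []
    owners: Dict[str, str] = {}  # normalized address -> first server using it
    for name, ip in sorted(servers.items()):
        addr = ipaddress.ip_address(ip)
        if addr not in network:
            problems.append(f"{name}: {ip} is outside {network}")
        elif addr in (network.network_address, network.broadcast_address):
            problems.append(f"{name}: {ip} is the network or broadcast address")
        key = str(addr)
        if key in owners:
            problems.append(f"{name}: {ip} is already used by {owners[key]}")
        else:
            owners[key] = name
    return problems


# Hypothetical assignments; only box01's address comes from variables.tf.
planned = {"box01": "192.168.0.4", "box02": "192.168.0.4", "box03": "10.0.0.5"}
for problem in check_fixed_ips(planned, "192.168.0.0/24"):
    print(problem)
# box02: 192.168.0.4 is already used by box01
# box03: 10.0.0.5 is outside 192.168.0.0/24
```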

Terraform syntax is largely self-explanatory.
You can build whatever constructions you need from resource blocks like these. The utility works out dependencies on its own: for example, it creates a server only after its disk has been created, while anything that can be done in parallel is done in parallel.
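The ordering rule above can be illustrated with a dependency graph. This Python sketch is only a model, not Terraform’s own code: it uses the standard library’s graphlib (Python 3.9+) to group resources into "waves", where everything in a wave depends only on earlier waves and could therefore be created at the same time. The resource names are shortened for readability.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set


def creation_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group resources into waves; each wave depends only on earlier waves.

    `deps` maps a resource to the resources it references. Everything in
    one wave could be created at the same time.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular references
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Two servers, each referencing its own disk (shortened addresses).
deps = {
    "volume.disk-for-box01": set(),
    "volume.disk-for-box02": set(),
    "instance.box01": {"volume.disk-for-box01"},
    "instance.box02": {"volume.disk-for-box02"},
}
for number, wave in enumerate(creation_waves(deps), start=1):
    print(number, wave)
# 1 ['volume.disk-for-box01', 'volume.disk-for-box02']
# 2 ['instance.box01', 'instance.box02']
```

Terraform’s own graph walk is less strict than this wave model: it starts each resource as soon as its own dependencies are done, with concurrency capped by the -parallelism flag (10 by default).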

Of course, it’s much easier and faster to create one server from the GUI. I think that the real benefit of Terraform starts when you create a few dozen servers, especially if you also script DNS records, S3 buckets, and other services.

Changing Configurations

Adding or removing memory (or CPU cores) is very simple: change the flavor and run ‘terraform apply’. Be aware that this restarts the server!

Be careful when changing disk sizes: if you change the size in a script, Terraform will recreate the disk from scratch. There is a workaround, though: increase the size in the UI and then update the corresponding tf script to match. Terraform will see that the two agree and won’t touch the disk.

Adding a server is easiest of all: just copy box01.tf and update the corresponding variables. In our projects with 10-20 servers, we keep the tf files flat, one file per server, without any file hierarchy: however many servers there are, that’s how many files you’ll have.
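Copying box01.tf by hand is easy, but it’s also easy to miss one occurrence of the old name. Here is a minimal Python sketch of the copy step. `clone_server_tf` is a hypothetical helper: it renames the server only where the name isn’t part of a longer identifier, so it catches "disk-for-box01" and "var.box01-ip" but leaves a name like "box010" alone. The template is a trimmed excerpt of box01.tf.

```python
import re


def clone_server_tf(template: str, old: str, new: str) -> str:
    """Copy a per-server .tf file, renaming the server everywhere.

    Replaces `old` only where it isn't part of a longer name, so "box01"
    matches in "disk-for-box01" and "var.box01-ip" but not in "box010".
    """
    pattern = re.compile(rf"(?<![A-Za-z0-9_]){re.escape(old)}(?![A-Za-z0-9_])")
    return pattern.sub(new, template)


BOX01_EXCERPT = '''\
resource "openstack_compute_instance_v2" "box01" {
  name        = "box01"
  flavor_name = "flavor-1024-1"

  network {
    fixed_ip_v4 = "${var.box01-ip}"
    floating_ip = "${var.box01-floating-ip}"
  }

  block_device {
    uuid = "${openstack_blockstorage_volume_v1.disk-for-box01.id}"
  }
}
'''

print(clone_server_tf(BOX01_EXCERPT, "box01", "box02"))
```

The new file refers to box02-ip and box02-floating-ip, so those variables still have to be added to variables.tf by hand. ‘terraform plan’ should then show only the new disk and server being created.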

Terraform Use Case

I’ll say it again: the damage Terraform can do to a live production system (and very, very quickly at that) makes even ‘truncate table’ look like child’s play. It’s the most dangerous tool I’ve ever worked with, which is why it’s so important to really understand the principles behind it.

From a technical point of view, Terraform is a single Go binary that’s easy to build, unpack, and debug. All of the magic comes from the state file and the providers for the various services.

A typical workflow looks like this. You describe the desired configuration in tf scripts; at this point the state file is empty. You run ‘terraform plan’. This is the nicest and safest command: it doesn’t change anything, it just shows which changes the utility would make if you ran ‘terraform apply’.
Then you run ‘terraform apply’, and only then are real servers, disks, DNS records, and so on created. Information about these resources is saved in the state file (a JSON file with a fairly simple structure). From then on, the utility uses the state file, gathers information from the real world, and compares it with your scripts.

The reason for all the confusion and misunderstanding is that Terraform works with three objects simultaneously:

  • tf-scripts — what you want to see;
  • state file — how the infrastructure looks according to Terraform;
  • real state — how the infrastructure really looks.

During a plan (which differs from apply only in that nothing is actually changed), Terraform reads the state file, refreshes the properties of every object by its ID (in any service, each resource has a unique identifier, and that is what Terraform uses to identify it), and compares the resulting attributes with the tf scripts.
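The interplay of the three objects can be captured in a toy model. The Python sketch below is an illustration under simplifying assumptions, not Terraform’s actual algorithm: the state maps each resource address to a remote ID, the refresh step looks attributes up by that ID, and each address is then classified. The IDs and the "old" server are hypothetical. Note that a server created by hand in the GUI ("id-manual") is never looked at, because no state entry points to it.

```python
from typing import Dict, Mapping

Attrs = Mapping[str, object]


def plan(config: Mapping[str, Attrs],
         state: Mapping[str, str],
         real: Mapping[str, Attrs]) -> Dict[str, str]:
    """Toy model of 'terraform plan'; NOT Terraform's actual algorithm.

    config: resource address -> desired attributes (the tf scripts)
    state:  resource address -> remote ID (what Terraform believes it manages)
    real:   remote ID -> attributes reported by the cloud API (the real state)
    """
    actions: Dict[str, str] = {}
    for address, desired in config.items():
        remote_id = state.get(address)
        if remote_id is None:
            actions[address] = "create"  # nothing recorded in the state yet
        elif remote_id not in real:
            actions[address] = "missing"  # in the state, but gone from the cloud
        else:
            # "Refresh": read the current attributes by ID, then compare.
            current = real[remote_id]
            drifted = any(current.get(k) != v for k, v in desired.items())
            actions[address] = "update" if drifted else "no-op"
    for address in state:
        if address not in config:
            actions[address] = "delete"  # managed, but removed from the scripts
    return actions


config = {
    "openstack_compute_instance_v2.box01": {"flavor_name": "flavor-2048-1"},
    "openstack_compute_instance_v2.box02": {"flavor_name": "flavor-1024-1"},
}
state = {
    "openstack_compute_instance_v2.box01": "id-1",
    "openstack_compute_instance_v2.old": "id-9",
}
real = {
    "id-1": {"flavor_name": "flavor-1024-1"},
    "id-9": {"flavor_name": "flavor-1024-1"},
    "id-manual": {"flavor_name": "flavor-1024-1"},  # created by hand in the GUI
}
for address, action in sorted(plan(config, state, real).items()):
    print(f"{action:8} {address}")
# update   openstack_compute_instance_v2.box01
# create   openstack_compute_instance_v2.box02
# delete   openstack_compute_instance_v2.old
```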

Roughly speaking, the state is a list of identifiers of the resources the project manages. If you add a server or DNS record manually, Terraform won’t know about it; you’ll have to add it to the state explicitly. If you delete a server from the GUI, Terraform will likewise require you to update the state with ‘terraform state rm’ followed by the resource’s address.
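You can see this list yourself with ‘terraform state list’, or by reading the JSON directly. Below is a rough Python sketch that maps resource addresses to remote IDs. It assumes the layouts of state format version 3 (older releases, a "modules" list with "primary" IDs) and version 4 (Terraform 0.12 and later, a flat "resources" list), as far as we know them. The state file is an internal format, so the built-in ‘terraform state’ commands are the safer tool in practice. The sample state is hypothetical and trimmed to the fields the function reads.

```python
import json
from typing import Dict


def managed_ids(state_json: str) -> Dict[str, str]:
    """Map each managed resource address in a state file to its remote ID.

    Sketch only: assumes state format v4 ("resources" list) or the older v3
    ("modules" list with "primary" IDs); data sources are skipped.
    """
    state = json.loads(state_json)
    ids: Dict[str, str] = {}
    if state.get("version", 0) >= 4:
        for res in state.get("resources", []):
            if res.get("mode") != "managed":
                continue
            prefix = res["module"] + "." if res.get("module") else ""
            base = f'{prefix}{res["type"]}.{res["name"]}'
            for inst in res.get("instances", []):
                key = inst.get("index_key")  # set when count/for_each is used
                if key is None:
                    suffix = ""
                elif isinstance(key, int):
                    suffix = f"[{key}]"
                else:
                    suffix = f'["{key}"]'
                ids[base + suffix] = inst["attributes"]["id"]
    else:
        for module in state.get("modules", []):
            path = module.get("path", ["root"])
            prefix = "".join(f"module.{name}." for name in path[1:])
            for address, res in module.get("resources", {}).items():
                if address.startswith("data."):
                    continue
                ids[prefix + address] = res["primary"]["id"]
    return ids


sample = json.dumps({
    "version": 4,
    "resources": [
        {"mode": "managed", "type": "openstack_compute_instance_v2", "name": "box01",
         "instances": [{"attributes": {"id": "id-1"}}]},
        {"mode": "data", "type": "openstack_images_image_v2", "name": "ubuntu",
         "instances": [{"attributes": {"id": "img-1"}}]},
    ],
})
print(managed_ids(sample))  # prints {'openstack_compute_instance_v2.box01': 'id-1'}
```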

In the last section, I recommended increasing a disk’s size both in the UI and in the tf script. How does this work? Say a tf script initially specifies a 10 GB disk. Terraform creates it and records the new disk’s ID and size in the state file. Then you change the size to 20 GB in the UI. If you run ‘terraform plan’ now, Terraform will learn from OpenStack that the disk is 20 GB while the tf file says 10 GB, so it will propose recreating it. If you also change the size to 20 GB in the script, Terraform sees 20 GB in reality and 20 GB in the script, and does nothing. You might think there’s no point in keeping the old 10 GB value in the state file, since Terraform doesn’t use it, and you’d be right; but some providers need several additional attributes (like the region of OpenStack resources) to be saved. I suspect this is why the authors decided to store the latest values of all of a resource’s properties in the state file.
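The disk scenario reduces to a small decision: nothing differs, so do nothing; an attribute that can be changed in place differs, so update; an attribute that forces recreation differs, so replace. In this toy Python model, which attributes force recreation is an assumption passed in by the caller. Treating a volume’s size that way simply mirrors the behavior described in this article; it isn’t a statement about every provider version.

```python
from typing import Mapping, Set


def decide(script: Mapping[str, object], real: Mapping[str, object],
           force_new: Set[str]) -> str:
    """Toy model: what would a plan propose for one existing resource?"""
    changed = {key for key, value in script.items() if real.get(key) != value}
    if not changed:
        return "no-op"
    if changed & force_new:
        return "replace"  # destroy and recreate: the old disk's data is lost
    return "update in place"


DISK_FORCE_NEW = {"size"}  # assumption, matching the behavior described above

# The disk was resized to 20 GB in the UI, but the script still says 10.
print(decide({"size": 10}, {"size": 20}, DISK_FORCE_NEW))  # prints replace
# After the script is updated to 20 as well, the two agree.
print(decide({"size": 20}, {"size": 20}, DISK_FORCE_NEW))  # prints no-op
# A new flavor is applied to the existing server.
print(decide({"flavor_name": "flavor-2048-1"},
             {"flavor_name": "flavor-1024-1"}, set()))  # prints update in place
```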

For Terraform to work properly, the state file has to be stored centrally. We keep it right in the repository, but it can also be stored in S3, in HashiCorp’s commercial Atlas service, in Consul, or in several other storage backends; the full list is in the official documentation.

Conclusion

In this article, we took a brief look at some of Terraform’s features. Unfortunately, we left out one interesting topic: importing existing infrastructure. Briefly: the ‘terraform import’ command became available only recently. It isn’t implemented for every provider yet and is still fairly immature. In its current form, it writes the resource and its identifier straight into the state file and then runs ‘terraform refresh’.

So, if you actively use cloud services and often create and delete resources from the GUI, I recommend giving Terraform a try. Here you’ll find all of the scripts mentioned in this article, which are enough to start creating servers with Terraform. Although the tool is still young and has a fairly steep learning curve, once you’ve gotten your feet wet, you’ll feel much more in control. I’d be happy to answer any questions or remarks in the comments below or by e-mail at vakhov@gmail.com.

I’d like to thank the Selectel team for inviting me to write this guest post. I’d also like to invite all of today’s readers to my personal blog (in Russian). Every business day, I publish a short entry about development, concepts, tools, and the like.

Thank you for your attention!