A few years ago, the Ostrovok.ru online hotel reservation service moved its infrastructure from the Amazon Web Services cloud to Selectel servers. The migration cost paid off in just a month, and the hosting expenses halved.
At the Russian Internet Technologies 2020 festival, Denis Bozhok, head of the company’s service infrastructure department, spoke about the reasons for the switch, architectural restrictions, and the good old heavy bare metal.
On choosing Amazon Web Services
Before the move, our systems ran on a couple dozen servers in a small data center, and we had no clear plan to move to the cloud. However, as the number of hotels represented on the website grew, so did the number of pictures. By 2014, the visual content we stored amounted to about 4–5 terabytes.
To provide for further growth, we had to think about scaling up our photo storage and decide on convenient tools for storing the photos and backing them up. That’s when the idea came to try the legendary S3 storage by AWS.
Amazon data centers are located in more than 20 regions around the world. Five regions currently exist in Europe — Milan, Paris, London, Frankfurt, and Ireland.
On the lack of needed services
Six years ago, we moved into the Frankfurt region. At that time, it was new, and the services already introduced in the older regions came there with a delay. More than once we needed features of a service that either did not exist at all or was not available in our region.
For example, at one point spot instances were introduced in our region. This is an auction of server capacity that can cut your costs by up to 90%. However, there were no tools to automate ordering the required servers. You had to go to the auction console, select the instance type, set the price, and add the new virtual machines to the cluster. That meant wasted time and a high chance of missing price changes. So we created our own spot manager, which ordered the right servers at the right prices and added them to the cluster by itself. A similar tool, Spot Fleet, was introduced by Amazon Web Services only a few years later.
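The selection step of a spot manager like the one described above can be sketched roughly as follows. This is an illustration only, not Ostrovok.ru's actual tool: the `SpotOffer` fields, the `plan_spot_orders` function, and the greedy "cheapest price per vCPU first" rule are all assumptions, and the sketch makes no real AWS calls.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SpotOffer:
    """One line of a (hypothetical) spot price list."""
    instance_type: str  # placeholder label, not a real AWS type name
    vcpus: int          # vCPUs per instance
    price: float        # current price per instance-hour
    available: int      # how many instances we are willing to request


def plan_spot_orders(offers, vcpus_needed, max_price_per_vcpu):
    """Greedily pick the cheapest offers (per vCPU) under a price cap.

    Returns {instance_type: count}. The plan may fall short of
    vcpus_needed if affordable capacity runs out.
    """
    candidates = sorted(
        (o for o in offers if o.price / o.vcpus <= max_price_per_vcpu),
        key=lambda o: o.price / o.vcpus,
    )
    plan = {}
    remaining = vcpus_needed
    for offer in candidates:
        if remaining <= 0:
            break
        count = min(offer.available, math.ceil(remaining / offer.vcpus))
        if count > 0:
            plan[offer.instance_type] = count
            remaining -= count * offer.vcpus
    return plan


offers = [
    SpotOffer("type-a", vcpus=4, price=0.08, available=2),
    SpotOffer("type-b", vcpus=8, price=0.12, available=5),
    SpotOffer("type-c", vcpus=2, price=0.10, available=10),  # above the cap
]
print(plan_spot_orders(offers, vcpus_needed=50, max_price_per_vcpu=0.03))
```

A real manager would run this kind of selection on a schedule against live prices and then place the orders and register the new machines in the cluster, which is the part that had to be done by hand in the console.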
Similarly, we had to optimize how we used DynamoDB, the database management system by AWS. It has one peculiarity: you pay for IOPS and set your own limits on operations. That’s convenient, but again, there was no means of automated control. So we developed our own mechanism to set IOPS limits and adjust them as needed.
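The core of such a limit-adjusting mechanism can be sketched as a single decision function. Everything here is an assumption made for illustration (the `next_limit` name, the 70% target utilization, the deadband, and the floor and ceiling values); the source does not describe how the actual mechanism worked, and the sketch does not call the DynamoDB API.

```python
import math


def next_limit(current_limit, consumed, target=0.7,
               floor=5, ceiling=1000, deadband=0.1):
    """Suggest a new operations limit from recent consumption.

    Keeps the current limit while utilization stays within `deadband`
    of `target`, so the limit is not changed on every small fluctuation.
    Otherwise sizes the limit so that `consumed` lands at `target`
    utilization, clamped to [floor, ceiling].
    """
    utilization = consumed / current_limit
    if abs(utilization - target) <= deadband:
        return current_limit
    desired = math.ceil(consumed / target)
    return max(floor, min(ceiling, desired))


# Hypothetical readings: a busy table, a steady one, and an idle one.
for consumed in (95, 72, 10):
    print(consumed, "->", next_limit(current_limit=100, consumed=consumed))
```

Run periodically against measured consumption, a function like this raises the limit before requests start being throttled and lowers it when you would otherwise be paying for capacity you do not use.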
On top of that, for a long time, AWS did not have convenient tools for working with Docker, which we were already actively using. So we handled things on our own until Amazon Web Services introduced the services we needed.
All in all, I can’t say any of those situations were critical. As you move from the cloud to on-premises solutions, be prepared for many of the processes previously managed by the cloud provider to become your responsibility. The perfect solution is the one that best fits your requirements and your company.
On the reasons for migrating
In 2016, we began considering changing our provider and started evaluating the Russian market. There were several reasons to move from the cloud by Amazon.
First off, we wanted to optimize costs. AWS solutions are relatively convenient but cost a lot. We understood that we could use our resources more efficiently. In addition, we had to comply with Federal Law No. 152-FZ, which requires businesses and organizations to store the personal data of customers within Russia.
We chose Selectel as the new host for our infrastructure. We heard of it through the grapevine: our friends were already clients there and gave good feedback. We decided to give it a try. We got in touch with the managers, discussed the terms and conditions, and agreed that Selectel would help if we hit the capacity limit.
We liked the offer and pulled the trigger, starting the migration from the cloud to hardware.
A rough estimate says our service costs halved after moving to Selectel servers. Of course, this does not mean that every move from a foreign provider to a domestic one will be as profitable. However, in our experience, the effort spent on migration usually pays off within the first month. A nice bonus for Ostrovok.ru was that instead of virtual servers, we got physical equipment that fully met our requirements. At the moment, we have about 400 servers at Selectel.
To be fair, I’d like to add that the photos that originally made us move to Amazon Web Services are still there in the S3 storage. The space taken up by photos has ballooned to 70 terabytes since 2014. Everything else — personal databases, search clusters, and support services — was moved. We have plans to migrate the rest, but this is not going to be easy.
All I can say here is, everyone is going to find their own traps and get into them. And that’s okay. It’s all about understanding the purpose of the switch.
For example, we faced many architectural constraints. In particular, we had to set up a common network uniting the physical servers and our Virtual Private Cloud (VPC) in the Selectel Cloud platform. We move less resource-intensive services to the VPC. For instance, we have a cluster of DNS servers running in our private cloud, which is convenient and reliable. Sometimes we deploy some kind of test to the VPC and delete the cloud resources once the task is done.
However, there was no ready-made solution for connecting the servers to the VPC at the time. With the help of the Selectel team, we set up an L2 connection, which we still use. Today, this service is available to all customers out of the box.
When working with hardware, it is important to keep its inherent disadvantages in mind. Speaking from my experience, I would highlight two significant points:
- The configuration of servers in your clusters needs to be kept up to date, and the hardware replaced in a timely manner. Otherwise, you can run into a shortage of the necessary components.
- Before putting servers into operation, you should personally select and run tests that verify their compliance with the current configuration. Sometimes, servers of the same configuration differ in performance. It is better to detect this before introducing a server into the cluster.
On the good old heavy bare metal
Of course, had our infrastructure stayed in the cloud, we would not have faced many of the problems described above. However, after completing the hardware-cloud-hardware cycle, we concluded that for Ostrovok.ru, good old heavy bare metal was the best option.
The problem with any cloud is vendor lock-in. In short, you get hooked on cloud services like Amazon and lose your freedom to move. For example, moving from Amazon’s Relational Database Service to your own database without downtime is very difficult.
Add to that exotic tools that are difficult to administer and whose load is hard to track, and billing that is not very transparent. At AWS, it was nearly impossible to account for all the factors when calculating future service costs. It was also quite easy to exceed your limits. You would only find out about your mistake when you got the invoice from the provider.
The switch back to hardware was a solution dictated by our experience and the particular characteristics of our company. Our development and releases are quite predictable. When Ostrovok.ru is expecting seasonal growth or campaign influx, we provide for it by adding servers in advance. We can also discard surplus servers when the load decreases.
On the importance of support
I have experience dealing with both foreign and Russian technical support.
Foreign support follows a simple logic: if you want support, you pay. If you want fast support, you pay even more. On top of that, at AWS, we had to pay for support for each of our accounts, as one subscription could not cover several accounts, even if they were related.
It may sound unbelievable, but Russian technical support works quicker. At Selectel, there is a chat in the control panel where you can contact a specialist and get an answer within a reasonable time. All that without extra subscriptions or premium accounts.
Also, after a few years of being hosted at Selectel, we have been included in the list of the company’s key clients entitled to Customer care, a dedicated support hotline. We are immediately given updates on our issues, and we can get a quick response at any time of day without having to wait for the official beginning of the shift.
The way support works is an important nuance. It’s one thing when a specialist can’t set up a service and needs help. It’s a completely different matter when the service is simply new and there are errors on the provider’s side. We encountered this during our years at AWS. I feel that it’s wrong to charge for premium support in the latter case. After all, you end up paying to inform the cloud specialists about their own bugs.
On optimization during the pandemic
The load on our servers decreased, and so did our hardware requirements. First off, we calculated the risks, figured out how many servers we’d need when users returned to the website, and ditched what we didn’t need.
The second step was to optimize expensive services. We finally had a clear incentive to “refactor” what we had long wanted to for the sake of both architecture and economy. As a result, several more servers were discarded.
Even before the coronavirus crisis, we were reviewing our solutions for efficiency. Therefore, the epidemic did not affect our optimization methods much. On the contrary, it confirmed that we had been doing everything right.
If you’re thinking about changing your hosting provider, here’s what you should pay attention to:
- Outline the problem you want to solve. Once you are clear on it, calculate the economic feasibility, taking into account both direct service costs and the time your employees spend on administration.
- With a clear goal in mind, you can start exploring the market and looking for the company that can best help you solve your problems. If there are several options, judge by the quality and responsiveness of technical support.
- If your business is aimed at the Russian market, consider the domestic providers. This way, service costs will depend less on currency fluctuations.
- At least once every six months, reassess whether your services are efficient and in optimal shape. Do not postpone optimization. Continuous improvement of the architecture will make your business more resilient in emergency situations.
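The feasibility calculation from the first point above comes down to simple arithmetic: add administration time to the direct service bill for each option, then see how quickly the monthly saving covers the one-off migration cost. The function names and every number below are hypothetical and only show the shape of the calculation.

```python
def monthly_total(service_cost, admin_hours, hourly_rate):
    """Direct service cost plus the cost of staff time spent on administration."""
    return service_cost + admin_hours * hourly_rate


def payback_months(migration_cost, current_monthly, new_monthly):
    """Months until the monthly saving covers the one-off migration cost.

    Returns None when the new setup is not cheaper, i.e. it never pays off.
    """
    saving = current_monthly - new_monthly
    if saving <= 0:
        return None
    return migration_cost / saving


# Made-up figures: the new provider halves the bill but needs more admin time.
current = monthly_total(service_cost=10_000, admin_hours=40, hourly_rate=50)
new = monthly_total(service_cost=5_000, admin_hours=80, hourly_rate=50)
print(payback_months(migration_cost=6_000, current_monthly=current, new_monthly=new))
```

Counting administration hours on both sides matters because a cheaper bill can be partly offset by the extra work that moves from the provider to your own team.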
Ostrovok.ru is an online hotel reservation service. The company offers over 1,300,000 accommodation options in hotels, hostels, and apartments from direct suppliers and major partners. Ostrovok.ru is a part of Emerging Travel Group, which manages four travel brands: Ostrovok, B2B.Ostrovok, ZenHotels, and RateHawk.