High availability

Xavier Trilla

February 10, 2020 13:34

At Clouding we have built a high availability platform for your projects. Our goal is to always offer you the highest possible service availability, and to achieve it we use several strategies.

Monitoring

Our monitoring system is one of Clouding's core systems. We have configured tens of thousands of monitors that continuously check the status of all equipment and infrastructure.

This allows us to anticipate problems before they affect the service and to closely track the performance of the whole platform.
Listing every type of monitor we use would take too long, but the following examples give an idea of the level of monitoring:

Electrical system
- Power consumption per cabinet
- Status and consumption of the two power feeds (A and B) of each cabinet
- Status and consumption of the two power supplies of each device
Network
- Status of all network ports (on switches and servers)
- CRC errors on switch ports and network cards
- Load level of each network port
- Load on each Internet access provider
- Response time of each provider, measured from different international locations
CPU
- Load level of each physical core
- Hardware interrupts per second on each physical core
- Wait time on each physical core
- Context switches per core per second
Memory
- Memory usage percentage on all servers
- Memory fragmentation status on all servers
- Single-bit errors corrected by ECC
- Swap usage (must always be 0)
- NUMA balancing status
Disk
- Disk response times
- Disk access load percentage
- Capacity used per disk
- Wear level of solid-state drives
- Sector errors on rotational disks (any error triggers a preventive disk replacement)
Temperature
- Multiple temperature sensors per device (CPU, Disks, Chipset, etc.)
- Revolutions per minute (RPM) of each fan
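To illustrate how monitors like these can turn raw readings into alerts, here is a minimal sketch of threshold-based checks. The metric names, the `Monitor` class and the `evaluate` function are hypothetical. Only the "swap must be 0" and "replace a disk on any sector error" rules come from the list above; the SSD wear and CRC thresholds are assumptions made for the example and do not describe Clouding's actual configuration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Monitor:
    """Hypothetical monitor: a metric, a health predicate and a severity."""
    metric: str
    is_healthy: Callable[[float], bool]
    severity: str  # illustrative levels: "minor" or "urgent"


# Example monitors based on the list above. "Swap must be 0" and
# "replace a disk on any sector error" come from the article; the SSD wear
# and CRC thresholds are assumptions made only for this sketch.
MONITORS = [
    Monitor("swap_used_bytes", lambda v: v == 0, "urgent"),
    Monitor("disk_sector_errors", lambda v: v == 0, "urgent"),
    Monitor("ssd_wear_percent", lambda v: v < 80, "minor"),
    Monitor("switch_port_crc_errors", lambda v: v == 0, "minor"),
]


def evaluate(samples: dict[str, float]) -> list[tuple[str, float, str]]:
    """Return (metric, value, severity) for every sampled metric that is unhealthy."""
    alerts = []
    for monitor in MONITORS:
        value = samples.get(monitor.metric)
        # Metrics without a sample in this round are skipped, not flagged.
        if value is not None and not monitor.is_healthy(value):
            alerts.append((monitor.metric, value, monitor.severity))
    return alerts


print(evaluate({"swap_used_bytes": 4096, "ssd_wear_percent": 35}))
# [('swap_used_bytes', 4096, 'urgent')]
```

A real monitoring system would also collect samples periodically and suppress repeated alerts; the sketch shows only the threshold evaluation step.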

These and other monitors continuously report to Clouding's on-call technicians, who are responsible for keeping the platform running at all times and performing at its best. Our monitoring system sends alerts by email, SMS and even phone call, so that an important alert never gets lost among less urgent ones.
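One way to keep urgent alerts from being buried is to order them by severity and escalate through more intrusive channels as severity rises. The sketch below is hypothetical: the channels are the ones named above (email, SMS, phone call), but the severity levels and the mapping from severity to channels are assumptions for illustration, not Clouding's documented policy.

```python
# Hypothetical escalation policy. The channels (email, SMS, phone call) are the
# ones named in the article; which severity uses which channel is assumed.
CHANNELS_BY_SEVERITY = {
    "minor": ["email"],
    "urgent": ["email", "sms", "phone_call"],
}
PRIORITY = {"urgent": 0, "minor": 1}


def route(alerts: list[dict]) -> list[tuple[str, list[str]]]:
    """Order alerts urgent-first and attach the channels used to notify on-call staff."""
    # sorted() is stable, so alerts of the same severity keep their arrival order.
    ordered = sorted(alerts, key=lambda alert: PRIORITY[alert["severity"]])
    return [(alert["metric"], CHANNELS_BY_SEVERITY[alert["severity"]]) for alert in ordered]


incoming = [
    {"metric": "ssd_wear_percent", "severity": "minor"},
    {"metric": "swap_used_bytes", "severity": "urgent"},
]
for metric, channels in route(incoming):
    print(metric, channels)
# swap_used_bytes ['email', 'sms', 'phone_call']
# ssd_wear_percent ['email']
```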

Hypervisors and separate disks

This is perhaps the key feature of our platform. At Clouding, the storage of Cloud Servers is kept separate from the hardware on which they run.

The great advantage of this design over a traditional local RAID is that, if a hypervisor suffers a hardware failure, the Cloud Servers hosted on it can be restarted immediately on a different hypervisor.

This lets us recover from a hardware failure in a hypervisor in minutes, rather than the several hours it would take with a local RAID.
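Because the disks live on separate storage, recovering from a hypervisor failure is only a matter of choosing new hypervisors to run the affected Cloud Servers. No data has to be copied. The sketch below models that idea. The `Hypervisor` class, the `capacity` field and the "least loaded hypervisor" placement policy are hypothetical simplifications, not a description of Clouding's actual scheduler.

```python
from dataclasses import dataclass, field


@dataclass
class Hypervisor:
    name: str
    capacity: int  # hypothetical: maximum number of Cloud Servers it can run
    healthy: bool = True
    servers: list[str] = field(default_factory=list)


def fail_over(failed: Hypervisor, pool: list[Hypervisor]) -> dict[str, str]:
    """Restart each Cloud Server of a failed hypervisor on a healthy one.

    Disks are on shared storage rather than a local RAID, so only the compute
    placement changes; the returned dict maps server -> new hypervisor name.
    """
    failed.healthy = False
    placements = {}
    for server in list(failed.servers):  # iterate over a copy while removing
        candidates = [h for h in pool if h.healthy and len(h.servers) < h.capacity]
        if not candidates:
            raise RuntimeError(f"no capacity left to restart {server}")
        # Illustrative policy: pick the least loaded healthy hypervisor.
        target = min(candidates, key=lambda h: len(h.servers))
        target.servers.append(server)
        failed.servers.remove(server)
        placements[server] = target.name
    return placements


hv1 = Hypervisor("hv1", capacity=4, servers=["vm-a", "vm-b"])
hv2 = Hypervisor("hv2", capacity=4, servers=["vm-c"])
hv3 = Hypervisor("hv3", capacity=4)
print(fail_over(hv1, [hv1, hv2, hv3]))
# {'vm-a': 'hv3', 'vm-b': 'hv2'}
```

With a local RAID, the equivalent step would require repairing the failed machine or restoring its disks elsewhere, which is why recovery takes hours instead of minutes.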

Triple Replica

Separating disks from hypervisors would be of no use without a storage infrastructure capable of guaranteeing that data is always available.

At Clouding we use a high availability storage cluster designed to keep your data available at all times.

You can find full details about our Triple Replica system in the Triple Replica article in this section.
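As a rough illustration of why keeping three copies helps, the sketch below places each piece of data on three distinct storage nodes and checks whether it is still readable when some nodes are down. It assumes a hash-based ranking of nodes (a rendezvous-hashing style choice) purely for illustration. The node names and functions are hypothetical, and real storage clusters apply additional rules (for example, a minimum number of live replicas before accepting writes) that this simplified model ignores.

```python
import hashlib

REPLICAS = 3  # "Triple Replica": three copies of each piece of data


def place(object_id: str, nodes: list[str], replicas: int = REPLICAS) -> list[str]:
    """Choose `replicas` distinct storage nodes for an object.

    Illustrative placement: rank nodes by a hash of (object, node) and take
    the first ones, so the choice is deterministic for a given object.
    """
    if len(nodes) < replicas:
        raise ValueError("need at least as many nodes as replicas")
    ranked = sorted(
        nodes,
        key=lambda node: hashlib.sha256(f"{object_id}:{node}".encode()).hexdigest(),
    )
    return ranked[:replicas]


def readable(replica_nodes: list[str], down: set[str]) -> bool:
    """In this simplified model, data is readable while any replica is on a working node."""
    return any(node not in down for node in replica_nodes)


nodes = ["node1", "node2", "node3", "node4", "node5"]
replicas = place("disk-42", nodes)
print(len(replicas), readable(replicas, down=set(replicas[:2])))
# 3 True
```

In this model, the data survives the simultaneous loss of any two of the three nodes holding it.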

100% redundant Platform

To offer you the highest availability, it is essential that a failure in any part of the platform cannot affect the service. That is why every system that makes up the Clouding platform is redundant.

Thanks to this redundancy, a failure in any single system does not affect the service we provide.
You can find full details about the redundancy of our platform in the 100% redundant Platform article in this section.
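The benefit of redundancy can be quantified with a standard formula. If a component is available a fraction $a$ of the time and failures of its copies are independent, a set of $n$ redundant copies is down only when all of them are down at once, so its availability is $A = 1 - (1 - a)^n$. The 99% figure below is purely illustrative and is not a Clouding measurement. In practice, failures can be correlated (for example, a shared power feed), which is why the A and B feeds and dual power supplies in the monitoring list matter.

```python
def parallel_availability(component_availability: float, copies: int = 2) -> float:
    """Availability of `copies` redundant components, assuming independent failures.

    The redundant set is down only when every copy is down at once,
    so A = 1 - (1 - a) ** copies.
    """
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return 1 - (1 - component_availability) ** copies


# Illustrative figure, not a Clouding measurement: one component at 99%.
print(f"{parallel_availability(0.99):.4%}")  # two copies
# 99.9900%
```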
