Latest Updates
Oct 22
Completion, Fri 22 Oct 2021 09:25:05 PM MDT: This maintenance appears to have been a success: during testing, we could no longer reproduce the issues we had been seeing under mass live-migration traffic. All changes were made without production impact.

Tonight we'll be performing some updates on the cluster as further testing, observing carefully to confirm the fix holds: this would not be the first time initial testing gave us the impression the issue was resolved.



Update, Fri 22 Oct 2021 08:30:21 PM MDT: We are now proceeding with this maintenance, and will provide further updates as appropriate.

Update, Fri 22 Oct 2021 07:00:08 PM MDT: This maintenance has been postponed for the moment, pending completion of a pre-maintenance backup. We'll update this thread when maintenance begins.


Purpose of Work:

We will be making some after-hours changes to our primary shared Hyper-V failover cluster, which hosts a smaller share of the highly available VPS instances that customers without a dedicated private cloud may rely on.

First, a single node will be paused and drained of its workload, after which some network interface options will be changed on that node.

Second, the storage network will be configured to carry cluster communication while the cluster communication (CC) network is adjusted. Once the adjustment is done, cluster communication will be moved back to the CC network.


Once this is complete, we will stress-test the primary cluster, with sensitive workloads moved away from problem hosts.
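The ordering in the second step matters: the cluster should never be left without a network that is allowed to carry cluster communication. The toy Python model below checks that invariant for each role change. The network names are hypothetical, and the role values mirror Failover Clustering's network roles (0 = none, 1 = cluster only, 3 = cluster and client); it is a sketch of the reasoning, not the commands used during this maintenance.

```python
from enum import IntEnum
from typing import Dict, List, Tuple


class Role(IntEnum):
    NONE = 0                # network not used by the cluster
    CLUSTER_ONLY = 1        # cluster communication only
    CLUSTER_AND_CLIENT = 3  # cluster and client traffic


def has_cluster_path(networks: Dict[str, Role]) -> bool:
    """True if at least one network may still carry cluster communication."""
    return any(role != Role.NONE for role in networks.values())


def apply_steps(networks: Dict[str, Role],
                steps: List[Tuple[str, Role]]) -> List[Dict[str, Role]]:
    """Apply role changes in order, refusing any that would leave no cluster path.

    Returns the sequence of states, starting with the initial one.
    """
    state = dict(networks)
    history = [dict(state)]
    for name, role in steps:
        candidate = {**state, name: role}
        if not has_cluster_path(candidate):
            raise RuntimeError(
                f"setting {name} to {role.name} would cut off cluster communication")
        state = candidate
        history.append(dict(state))
    return history


# Hypothetical starting point: CC carries cluster traffic, storage is excluded.
START = {"CC": Role.CLUSTER_ONLY, "Storage": Role.NONE}

# Safe order: open the storage path first, adjust CC, then restore the original roles.
SAFE_STEPS = [
    ("Storage", Role.CLUSTER_ONLY),
    ("CC", Role.NONE),          # CC taken out of cluster use while it is adjusted
    ("CC", Role.CLUSTER_ONLY),
    ("Storage", Role.NONE),
]

if __name__ == "__main__":
    for state in apply_steps(START, SAFE_STEPS):
        print({name: role.name for name, role in state.items()})
```

Trying the naive order, taking CC out of cluster use first, raises an error in this model, which is exactly the situation the temporary storage-network role is meant to avoid.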


Impact of Work:

Work will begin at 7PM (MDT) tonight.


In theory, no impact should occur, but a subset of VMs may experience brief outages if this maintenance does not fix the instability we're attempting to resolve.

If that happens, we will implement mitigations immediately, possibly combining both maintenance events to help prevent further incidents.

Any customer VMs affected by this maintenance will be recovered as soon as possible, and customers will be informed individually if their VM faces an outage longer than a reboot.


We will inform you when maintenance is complete.

Please contact us with any questions / comments / concerns.



Oct 12
Purpose of Work:

October's Patch Tuesday is here. 

Surprisingly, there are no bugs this cycle that have the internet particularly spooked, but we'll still be patching promptly.

Here are a few lowlights for this month:


First, there's a locally vectored elevation-of-privilege bug in the kernel, affecting Server 2012 and up (https://msrc.microsoft.com/update-guide/vulnerability/CVE-2021-40449). It has reportedly been leveraged in at least one malware attack, as reflected in its exploit code maturity rating.

Second, there's an adjacent-network-vectored remote code execution vulnerability in Exchange Server, affecting Exchange 2013 and up (https://msrc.microsoft.com/update-guide/vulnerability/CVE-2021-26427). Because the attack vector is adjacent network, it should only be exploitable by hosts on the same layer 2 network, or perhaps only from RFC1918 subnets. Naturally, we'll be updating our Exchange server, if only as good maintenance practice.

Third, there's an adjacent-network-vectored remote code execution vulnerability in Hyper-V, affecting Server 2019 and up (https://msrc.microsoft.com/update-guide/vulnerability/CVE-2021-40461). There isn't a lot of detail about this one, but I'd guess VMs on the same layer 2 subnet as their hypervisor could break out of their sandbox in some fashion. The attack complexity is high, and no exploitation has been detected as of yet.

Fourth, there's a network-vectored remote code execution vulnerability in the Microsoft DNS server, affecting Windows 2008 and up (https://msrc.microsoft.com/update-guide/vulnerability/CVE-2021-40469). Normally this would be a high-priority concern (since any RCE vulnerability in DNS Server is effectively an EOP vulnerability as well), but Microsoft lists the privileges required to exploit it as 'high', implying an attacker would already need admin- or system-level access to the host. That rating is odd and may be a clerical error, which is reason enough on its own to drive this patching event.
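The effect of the 'privileges required' rating is easier to see in the CVSS v3.1 base-score arithmetic. Below is a minimal Python sketch of the v3.1 base-score formula, using the metric weights and Roundup function from FIRST's specification. The vectors in the usage note are illustrative only and are not claimed to be the official vectors for the CVEs listed above.

```python
import math

# CVSS v3.1 base metric weights
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}   # attack vector
AC = {"L": 0.77, "H": 0.44}                        # attack complexity
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}   # privileges required, scope unchanged
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}      # privileges required, scope changed
UI = {"N": 0.85, "R": 0.62}                        # user interaction
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}             # confidentiality/integrity/availability


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value (float-safe)."""
    int_input = round(value * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (math.floor(int_input / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Base score for a vector such as 'AV:N/AC:L/PR:H/UI:N/S:U/C:H/I:H/A:H'."""
    m = dict(part.split(":") for part in vector.split("/"))
    changed = m["S"] == "C"
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]
    if impact <= 0:
        return 0.0
    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    print(base_score("AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))  # 9.8
    print(base_score("AV:N/AC:L/PR:H/UI:N/S:U/C:H/I:H/A:H"))  # 7.2
```

Holding every other metric fixed, raising privileges required from none (N) to high (H) takes a network-vectored, full-impact RCE from 9.8 down to 7.2, which is why a 'high' rating on a DNS Server RCE stands out.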


Impact of Work:


All affected hosts running 2012 and up will be rebooted automatically, as soon as possible, to apply fixes, starting at 9:30 PM, with some exceptions.

Internal systems on Windows 2012 and up (such as the management portal) may be briefly unavailable while they reboot. Mail delivery to our helpdesk may be temporarily halted while our mail servers are updated as part of this patch cycle. If you receive a delivery failure, you can still reach us by logging into the helpdesk and submitting a ticket via the portal, or, for emergencies, by calling us at 303-414-6910 x2.


Hypervisors in a failover cluster will be rebooted on a rolling basis to avoid VPS downtime on those clusters. Hypervisors not in a failover cluster will either be updated overnight or have their updates scheduled, depending on customer policy and VM density.

Any hosts where updates are managed directly by the customer (or an approval process is required for zero-day updates) will not be impacted; the controlling organizations will be notified separately.
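To illustrate why rolling reboots of clustered hypervisors avoid VPS downtime, here is a toy Python simulation. Node names, VM names, and the per-node `capacity` limit are all hypothetical: each node is drained by live-migrating its VMs to the least-loaded remaining node, rebooted while empty, and then resumed before the next node is touched. It is a sketch of the general technique, not our actual patching tooling.

```python
from typing import Dict, List, Tuple


def rolling_reboot(placement: Dict[str, List[str]],
                   capacity: int) -> Tuple[List[str], Dict[str, List[str]]]:
    """Simulate rebooting a cluster (of at least two nodes) one node at a time.

    placement maps node -> VMs on it; capacity is a hypothetical per-node VM limit.
    Returns (reboot order, final placement). Raises RuntimeError if a node
    cannot be drained without exceeding capacity on the remaining nodes.
    """
    nodes = {name: list(vms) for name, vms in placement.items()}
    order: List[str] = []
    for node in sorted(nodes):
        others = [n for n in nodes if n != node]
        # Pause and drain: live-migrate each VM to the least-loaded other node.
        for vm in nodes[node]:
            target = min(others, key=lambda n: len(nodes[n]))
            if len(nodes[target]) >= capacity:
                raise RuntimeError(f"not enough capacity to drain {node}")
            nodes[target].append(vm)
        nodes[node] = []
        # The node is now empty, so patching and rebooting it affects no VMs.
        order.append(node)
        # Resume: the node rejoins and can absorb VMs from later drains.
    return order, nodes


if __name__ == "__main__":
    order, final = rolling_reboot(
        {"hv1": ["a", "b"], "hv2": ["c"], "hv3": ["d", "e"]}, capacity=4)
    print(order)  # ['hv1', 'hv2', 'hv3']
    print(final)
```

The capacity check reflects the real constraint: a rolling reboot only works if the remaining nodes can absorb one node's workload at a time.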


Please contact us with any questions / comments / concerns.



Oct 6
[Resolved] Phone Line Outage, Wed 06 Oct 2021.
Posted by David Cunningham on 06 October 2021 01:23 PM
[Update, Wed 06 Oct 2021 06:29:18 PM MDT] - The issue appears to be resolved at this point.   Phone service has been stable for nearly 3 hours. We have set up additional monitoring and will update this post if the issue returns.

[Update, Wed 06 Oct 2021 05:13:19 PM MDT] - Our support line has been up and stable for the last 90 minutes.  We'll continue to observe, but the issue appears to have abated.


===
Hello, all.


The upstream VoIP provider for our office phone service is currently dealing with a DDoS attack upstream of them.

Mitigations are being put in place, but phone service is still being affected at the moment.

This affects direct extensions, sales, the support line, and any of our other numbers matching the pattern 303-414-69XX.


In the meantime, please send emails directly, or email [email protected] with your support concerns.

We will update you when this condition seems to be clearing up.



Sep 14
Purpose of Work:
September's Patch Tuesday has arrived, and as usual, there are enough vulnerabilities to justify day-one overnight patching.


First, there's a zero-day RCE vulnerability in the ActiveX controls of the MSHTML component, affecting Windows Server 2008 and up (https://msrc.microsoft.com/update-guide/vulnerability/CVE-2021-40444). It requires minimal user interaction, and the attacker's code executes in the victim's own user context. It's currently being exploited in the wild, with Office documents containing malicious web content used to deliver malicious payloads. For that reason, Microsoft disclosed it out-of-band last week, along with mitigation guidance. We've already addressed this on most remote desktop environments; general server patching will follow tonight.

Second, there's a zero-interaction RCE vulnerability in the Windows WLAN AutoConfig service, affecting Windows Server 2008 and up (https://msrc.microsoft.com/update-guide/vulnerability/CVE-2021-36965). As a datacenter, we don't use Wi-Fi for our server workloads, so this fix will be applied only incidentally, as part of the monthly rollup. It's still worth announcing because it is wormable: a rogue or infected host on a Wi-Fi network can attack any connected devices running this service. Organizations running mobile workstations should take notice.

Third, there's a memory corruption vulnerability in the Windows Scripting Engine that affects Windows Server 2008+ (https://msrc.microsoft.com/update-guide/vulnerability/CVE-2021-26435). It requires user interaction, either opening a malicious file or visiting a webpage with a malicious file embedded in it. It has not been detected in active exploitation. In general, memory corruption vulnerabilities require more creative exploits to be leveraged successfully.

Fourth, there are various elevation-of-privilege vulnerabilities in several roles (plus one in the kernel), affecting Server 2008+ (https://msrc.microsoft.com/update-guide/vulnerability/CVE-2021-36974, https://msrc.microsoft.com/update-guide/vulnerability/CVE-2021-40447, https://msrc.microsoft.com/update-guide/vulnerability/CVE-2021-38671). EOP vulnerabilities are a can of worms on any web server, since a compromised website can easily turn into a compromised server.

Finally, Microsoft has disclosed several elevation-of-privilege vulnerabilities in various system components on Windows 2008 and 2008 R2 (https://msrc.microsoft.com/update-guide/vulnerability/CVE-2021-36968, https://msrc.microsoft.com/update-guide/vulnerability/CVE-2021-38625, https://msrc.microsoft.com/update-guide/vulnerability/CVE-2021-38626). They are a great demonstration of why any 2008 or older hosts should be upgraded to 2012+: these patches are not available without ESU licensing.


Impact of Work:


All affected hosts running 2012 and up will be rebooted automatically, as soon as possible, to apply fixes, starting at 9:30 PM, with some exceptions.

Internal systems on Windows 2012 and up (such as the management portal) may be briefly unavailable while they reboot. Mail delivery to our helpdesk may be temporarily halted while our mail servers are updated as part of this patch cycle. If you receive a delivery failure, you can still reach us by logging into the helpdesk and submitting a ticket via the portal, or, for emergencies, by calling us at 303-414-6910 x2.


Hypervisors in a failover cluster will be rebooted on a rolling basis to avoid VPS downtime on those clusters. Hypervisors not in a failover cluster will either be updated overnight or have their updates scheduled, depending on customer policy and VM density.

Any hosts where updates are managed directly by the customer (or an approval process is required for zero-day updates) will not be impacted; the controlling organizations will be notified separately.


Please contact us with any questions / comments / concerns.



Sep 10

Update: Fri 17 Sep 2021 09:31:45 PM MDT

During maintenance, a cluster node self-isolated, forcing its VMs to be rebooted. The reboots are complete, and the smaller subset of VMs with lingering issues is being triaged by NOC staff.

Both events appear to have occurred on the same node, so sensitive workloads are being moved off it.


We will fast-track migration of sensitive workloads to the Secondary Shared Hyper-V cluster, and make plans to switch over all workloads to it in the near future.

In the meantime, we will resume our no-change window on the Primary Shared Hyper-V cluster and have Microsoft support revisit their attempt to identify the root cause, announcing any maintenance that is required.


Update: Fri 17 Sep 2021 08:49:39 PM MDT

Maintenance and follow-up testing of the cluster is now underway.  We will gather diagnostics and cease testing if we detect any issues.

Update: Fri 17 Sep 2021 06:24:13 PM MDT

Earlier today, we had another event on the Primary Shared Hyper-V cluster, this time on a new host. We've adjusted the networking device that provides the layer 3 cluster VLAN for this cluster, and will make further adjustments to Hyper-V-specific parameters before more extensive testing tonight to see whether the issue persists.

Since last night's maintenance, there has at least been a marked decrease in outages triggered by mass live-migration traffic.


Tonight's maintenance will begin at 8:30pm.


Conclusion: Thu 16 Sep 2021 08:49:48 PM MDT

Maintenance is complete. All objectives were achieved with no production impact, and the cluster was tested and validated to no longer become unstable after a large live-migration event (such as pausing a node for maintenance). We will continue to keep a close eye on this cluster overnight, but those with highly available VMs should see no further impact.


Thu 16 Sep 2021 08:03:06 PM MDT

The second part of this maintenance (removing the problematic nodes from our Hyper-V failover cluster) will begin shortly, on schedule.


Update: Fri 10 Sep 2021 07:58:34 PM MDT

The second part of the maintenance is now rescheduled to take place at 8:00 PM on the 16th, rather than 7:00 PM, to take the availability of Microsoft escalation support into account.

We still do not expect any issues from that maintenance, but will have Microsoft escalation support on standby as a precaution.

Update: Fri 10 Sep 2021 07:30:57 PM MDT

The first part of this maintenance is completed, without incident.  We will follow up on Thursday the 16th for the next step.


Purpose of Work:


We will be making some after-hours changes to our primary shared Hyper-V failover cluster, which hosts a portion of the highly available VPS instances that customers without a dedicated private cloud may rely on.

The maintenance will have two phases:

On September 10, at 7 PM MDT, we will change the cluster role's possible owners to exclude two hypervisors that have shown intermittent issues responding to calls from our backup appliance. We expect no impact from this event, beyond backups working more reliably afterward.


On September 16, at 7 PM MDT, we will remove those two hypervisors from the cluster to determine whether they are the cause of the occasional instability on the cluster that we've been working with Microsoft to resolve.
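As a rough illustration of the first phase: as we understand the Failover Clustering model, a clustered role can only be moved or failed over to nodes listed among its possible owners, so excluding two hypervisors removes them as targets. The sketch below models that filter in Python; the node names are hypothetical (hv4 and hv5 stand in for the two problem hypervisors), and this is a simplified model rather than the Failover Clustering API.

```python
from typing import Iterable, List, Set


def exclude_owners(possible_owners: Set[str], excluded: Iterable[str]) -> Set[str]:
    """Return a new possible-owners set with the given nodes removed."""
    return set(possible_owners) - set(excluded)


def eligible_targets(possible_owners: Set[str], nodes_up: Iterable[str],
                     current: str) -> List[str]:
    """Nodes a role could move or fail over to: online, a possible owner, not its current node."""
    return sorted(n for n in nodes_up if n in possible_owners and n != current)


# Hypothetical five-node cluster; hv4 and hv5 are the nodes being excluded.
ALL_NODES = ["hv1", "hv2", "hv3", "hv4", "hv5"]
owners = exclude_owners(set(ALL_NODES), ["hv4", "hv5"])

if __name__ == "__main__":
    print(eligible_targets(owners, ALL_NODES, current="hv1"))  # ['hv2', 'hv3']
```

The excluded nodes remain cluster members in this phase; they simply stop being candidates for the role, which is why this step was expected to be low-impact.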

We have not had any of these events on this cluster since 8/12/2021. Now that a considerable share of the workload is on our Secondary Shared Hyper-V failover cluster, we'll see whether we can eliminate even the possibility of these events.



Impact of Work:

Work will begin at 7PM (MDT) each night.


For both events, no impact should occur in theory, but a subset of VMs may experience brief outages if this maintenance does not fix the instability we're attempting to resolve.

If that happens, we will implement mitigations immediately, possibly combining both maintenance events to help prevent further incidents.

Any customer VMs affected by this maintenance will be recovered as soon as possible, and customers will be informed individually if their VM faces an outage longer than a reboot.


We will inform you when maintenance is complete.

Please contact us with any questions / comments / concerns.

