Latest Updates
Completed - Network Maintenance 2017-03-11 02:00 MST
Posted by Anthony Kolka on 10 March 2017 04:33 PM

Update: This maintenance has been completed without incident.

Date:  March 11, 2017
Time: 2:00AM - 2:15AM (MST)

Purpose of Work:
We will be performing a topology change that will cause a brief interruption in communications and some packet loss.

Impact Of Work:
Customers will experience a brief period of packet loss and increased latency.





Completed - Network Maintenance - March 5, 2017 9:00PM - 2:00AM
Posted by Jay Sudowski on 01 March 2017 03:34 PM

Update: March 6, 2017 12:40AM - We have successfully completed our maintenance work this evening. We will continue to monitor the network closely for any unexpected problems.

Date:  March 5, 2017
Time: 9:00PM - 2:00AM (Local Time)

Purpose of Work:
We will be moving the uplinks for our top-of-rack (ToR) switches to our new core/distribution switches at 1801 California.

Impact Of Work:
Customers will experience brief periods of packet loss and increased latency as routes reconverge.

Special Notice:
We have identified the root cause of the outage that occurred on February 23, 2017, and the problem has been mitigated. This maintenance window should be relatively trouble-free.





Completed - Routine Power Maintenance - March 16, 2017
Posted by Pete Carstensen on 01 March 2017 01:35 PM

UPDATE: This maintenance event has been completed without issue.

UPDATE: This maintenance event will begin shortly. We will update once again upon completion.

UPDATE: Due to high winds in the area, this event has been rescheduled for Thursday, March 16, 2017.

Date: March 16, 2017
Time: 8:00AM - 4:00PM

Location: 1801 California St Suite #240

Purpose of Work:
Our critical power vendor will be performing routine, scheduled maintenance on both our "A" side and "B" side UPS units. This work is necessary to ensure the proper long-term functioning of our critical power systems.

Impact of Work: 
The primary impact of this work is a loss of redundancy. For parts of the maintenance window, it will be necessary to place our UPS units into bypass mode. During this time, a disruption of utility power could result in a loss of power to our critical load. We will take the necessary measures and precautions to minimize this risk. If there is adverse or extreme weather in the area, we will delay or cancel this maintenance. We will also not put both the "A" side and "B" side UPS units into bypass at the same time.

Please contact us with any questions you may have.





Provisional Reason For Outage Report - Feb 23, 2017 Outage
Posted by Jay Sudowski on 26 February 2017 07:50 PM

We have compiled a detailed, but provisional, reason for outage report regarding the network outage we experienced the afternoon of February 23, 2017. You can download the report here.

Please contact us with any questions, comments, or concerns.





Resolved - ONGOING NETWORK ISSUE - INTERMITTENT PACKET LOSS
Posted by Jay Sudowski on 22 February 2017 07:14 AM

UPDATE: Feb 23, 2017 7:00PM - Tonight, while performing routine, low-risk network operations (moving physical ports), our dist/core switches experienced near-simultaneous core dumps on their redundant routing engines, which left the switches in a completely broken state. The network recovered once the routing engines completed their boot-up cycle.

UPDATE: Feb 23, 2017 1:24AM - Late tonight, iBGP sessions began flapping across the network. In order to isolate the issue, we logically moved a number of OSPF interfaces, which caused a bit of trouble for some switches. At this point, we are done with our work (again) and are monitoring the situation. The diagnostic tool we are using, which runs a continuous ping / traceroute to each of our switches at 1801, has not detected any packet loss for 30 minutes.
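
For the curious, a continuous check of this kind can be approximated with a short script along the following lines. This is a minimal sketch, assuming Linux-style ping and traceroute binaries; the hostnames are hypothetical placeholders, not our actual device names or tooling.

    #!/usr/bin/env python3
    # Sketch of a continuous reachability probe: ping each switch once per
    # second and capture a traceroute when a probe fails.
    import subprocess, time
    from datetime import datetime

    SWITCHES = ["sw2-f4.mgmt.example.net", "sw2-a4.mgmt.example.net"]  # hypothetical

    def ping(host):
        # One ICMP echo with a 1-second timeout (Linux ping flags).
        return subprocess.run(["ping", "-c", "1", "-W", "1", host],
                              stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL).returncode == 0

    while True:
        for host in SWITCHES:
            if not ping(host):
                print(f"{datetime.now().isoformat()} LOSS {host}", flush=True)
                # Record the path at the moment of failure for later analysis.
                trace = subprocess.run(["traceroute", "-n", "-w", "1", host],
                                       capture_output=True, text=True).stdout
                print(trace, flush=True)
        time.sleep(1)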

UPDATE: Feb 22, 2017 6:50PM - We've made significant progress resolving the secondary issue by making some minor tweaks to our configs. However, a complete resolution will likely require us to schedule maintenance windows to upgrade our top-of-rack switches to newer versions of JunOS. We will send out another notice for that tomorrow.

UPDATE: Feb 22, 2017 12:33PM - The widespread issue has been resolved. However, it has uncovered another issue with a small subset of our top-of-rack switches at 1801 California St. We are investigating this issue as well. The switches impacted are:

  • sw2-f4
  • sw2-a4
  • sw2-g3
  • sw2-c2
  • sw2-c3
  • sw2-b2
  • sw1-i3
  • sw2-e7
  • sw2-d2

Similar to the previous issue, these switches are suffering from micro-outages that last for only very short periods of time; as such, our normal monitoring systems are not detecting these outages.
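
To illustrate why a per-second probe catches what a standard minutes-scale poller misses, here is a minimal sketch that groups consecutive failed probes into outage episodes. The hostname is a hypothetical placeholder; this is not our production monitoring.

    #!/usr/bin/env python3
    # Sketch: probe one switch every second and report micro-outage episodes.
    # A poller that checks every few minutes will miss an outage lasting a few
    # seconds; a 1-second probe interval will not.
    import subprocess, time
    from datetime import datetime

    HOST = "sw2-c2.mgmt.example.net"   # hypothetical management hostname

    def up(host):
        return subprocess.run(["ping", "-c", "1", "-W", "1", host],
                              stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL).returncode == 0

    outage_start = None
    while True:
        if not up(HOST):
            if outage_start is None:
                outage_start = time.time()          # episode begins
        elif outage_start is not None:
            duration = time.time() - outage_start   # episode just ended
            print(f"{datetime.now().isoformat()} micro-outage on {HOST}, "
                  f"~{duration:.0f}s", flush=True)
            outage_start = None
        time.sleep(1)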

UPDATE: Feb 22, 2017 10:02AM - We are no longer detecting packet loss to the numerous endpoints we are monitoring. At this time, we continue to monitor the situation. Once we reach 90 minutes with no packet loss to these endpoints, we will close this issue out.

UPDATE: Feb 22, 2017 9:32AM - We have identified a 10G connection with an excessive number of drops. We disabled this interface at 9:28AM, which caused a brief network blip. We will continue to investigate and monitor the situation.

Drops: 116,484,566
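
As an illustration of how a drop counter like this can be watched, the sketch below polls an interface discard counter and flags an abnormal rate. It assumes the pysnmp library and SNMP access to IF-MIB::ifInDiscards; the host, community string, ifIndex, and threshold are all placeholders, not our production values.

    #!/usr/bin/env python3
    # Sketch: flag an interface whose discard counter is climbing abnormally
    # fast (counter wrap is ignored for brevity).
    import time
    from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget,
                              ContextData, ObjectType, ObjectIdentity, getCmd)

    HOST, COMMUNITY, IFINDEX = "192.0.2.1", "public", 501   # hypothetical
    THRESHOLD = 1000   # discards per second considered "excessive"

    def read_discards():
        err, status, _, var_binds = next(getCmd(
            SnmpEngine(), CommunityData(COMMUNITY),
            UdpTransportTarget((HOST, 161)), ContextData(),
            ObjectType(ObjectIdentity("IF-MIB", "ifInDiscards", IFINDEX))))
        if err or status:
            raise RuntimeError(err or status.prettyPrint())
        return int(var_binds[0][1])

    prev = read_discards()
    while True:
        time.sleep(10)
        cur = read_discards()
        rate = (cur - prev) / 10          # discards per second over the window
        if rate > THRESHOLD:
            print(f"ifIndex {IFINDEX}: {rate:.0f} discards/sec - investigate")
        prev = cur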

Date: Feb 22, 2017

Time: 7:12AM

Issue:

We are tracking several reports of brief periods of packet loss in certain parts of our network, which generally result in a micro-outage of 8-15 seconds every 1 to 2 hours. During our maintenance window last night, we made adjustments that we hoped would resolve these issues, but they have not been effective.

Consequently, we will be engaging in minimally invasive testing and troubleshooting throughout the day in an effort to resolve the underlying cause of the issue.
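
A pattern like "8-15 seconds every 1 to 2 hours" can be confirmed from a log of failed probes. The sketch below is a hypothetical example: it reads one ISO timestamp per line, as a monitor like the ones above might emit, and summarizes episode durations and the gaps between them; the log filename is a placeholder.

    #!/usr/bin/env python3
    # Sketch: collapse timestamped probe failures into outage episodes and
    # report each episode's duration and the interval since the previous one.
    from datetime import datetime, timedelta

    GAP = timedelta(seconds=30)   # failures closer than this join one episode

    with open("loss_events.log") as fh:   # hypothetical log file
        times = [datetime.fromisoformat(line.split()[0])
                 for line in fh if line.strip()]

    episodes = []
    for t in sorted(times):
        if episodes and t - episodes[-1][1] <= GAP:
            episodes[-1][1] = t           # extend the current episode
        else:
            episodes.append([t, t])       # start a new episode

    for i, (start, end) in enumerate(episodes):
        dur = (end - start).total_seconds()
        since = "" if i == 0 else (
            f", {(start - episodes[i-1][1]).total_seconds() / 3600:.1f}h after previous")
        print(f"{start.isoformat()} outage ~{dur:.0f}s{since}")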

