On 3/13/15 10:01, Andrew Seybold wrote:
> Jay--Scottsdale was not involved, it was only northern Arizona, so not sure why Scottsdale ARES would have anything to do with it.
I got a reply from the Arizona SEC; my knowledge of Arizona geography isn't that great. We do have two Flagstaff customers who weren't affected.
> BTW my comment about CenturyLink having to walk the route of the fiber came from the CTO of CenturyLink, and the info about the outages from the 9-1-1 center and also the press of that area. I also don't believe, though I may be wrong, that there is another phone company in the area served.
That's a scary thought, that their CTO doesn't know OTDRs exist and are often installed as remote test unit (RTU) equipment that can take a measurement remotely without even disconnecting anything.
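For what it's worth, the math an OTDR does is nothing exotic; it's just time-of-flight of a reflected pulse. A back-of-the-envelope sketch in Python (the 1.468 group index is a typical value for single-mode fiber, not anything specific to this cable):

C = 299_792_458        # speed of light in vacuum, m/s
GROUP_INDEX = 1.468    # typical group index for single-mode fiber

def break_distance_km(round_trip_seconds):
    """Distance to a reflective event from the round-trip pulse time."""
    one_way_m = (C / GROUP_INDEX) * round_trip_seconds / 2
    return one_way_m / 1000

# A reflection seen 500 microseconds after launch puts the break
# about 51 km out, no walking the route required.
print(round(break_distance_km(500e-6), 1))   # 51.1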
Here's another quote from the press, claiming that CenturyLink first became aware of the problem from customer complaints and not from its Network Operations Center lighting up like a Christmas tree with alarm indicators. If you believe the spin, not only do they not know how to shoot the fiber with an OTDR to find the break, they don't even bother to monitor their long-haul links and have to wait for customer complaints to realize they have a problem:
The Arizona Republic reported, “Employees from CenturyLink told police they started receiving complaints of interrupted cable and internet service at about noon and, upon checking their system, determined the cause of the outage was coming from…”.
The reaction on tech mailing lists: "Really? A major link is down and these employees aren’t getting calls from one of CL’s NOCs? I would hope that the paper got it wrong, and it’s just that some employees didn’t know but that other staff was already being mobilized."
Our experience with CenturyLink is very different. We have several CenturyLink circuits around the country and get a proactive email from their NOC when even a T-1 goes down, so I really question where the press is getting their information, or whether they just make it up as they go along.
It looks like in this case it was under six hours to get to a rather remote cut, set up, and begin splicing, with total restoration in another six hours or so. That's definitely not too shabby, and it really calls the "walking mile-by-mile" story into question.
> Since my article was mostly about last-mile connectivity in the Santa Barbara area when I discussed it, how many paths do you have serving each of your customers? Not your connections to the Internet, but the pipe between your facilities and your customers' facilities?
It depends on the customer, what they're willing to pay for, and how critical connectivity is to their business. Larger customers with critical needs have two circuits from different last-mile carriers run to different Impulse points of presence. Typically these are active/active: under normal operation we run voice and video down one pipe and data down the other, with cross-failover. If the primary data pipe fails, data rides the voice pipe; if the primary voice/video pipe fails, that traffic fails over to the data pipe. Sometimes it's mixed-use active/active, sometimes active/standby.
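In rough Python pseudocode, the cross-failover policy boils down to something like this (a sketch only; in reality this is routing policy in the CPE and our edge routers, and every name here is invented):

# Sketch of the cross-failover policy described above. Circuit "A"
# carries voice/video by default, circuit "B" carries data; either
# class falls back to the surviving pipe if its primary dies.

circuits = {"A": "up", "B": "up"}

def path_for(traffic_class):
    primary = "A" if traffic_class in ("voice", "video") else "B"
    backup = "B" if primary == "A" else "A"
    if circuits[primary] == "up":
        return primary
    if circuits[backup] == "up":
        return backup        # cross-failover: ride the other pipe
    return None              # both circuits down; time for a truck roll

circuits["B"] = "down"       # the data circuit fails...
print(path_for("data"))      # ...data now rides "A" alongside voice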
The diverse circuits converge at or very close to the customer premises. A backhoe in their driveway is a real concern; one a block away, not so much.
That being said, there are always single points of failure somewhere, and failure modes nobody anticipated. In the big scheme of things, Earth is a single point of failure. Some customers are willing to take the risk of a single-homed connection. Some will have multiple circuits from the same carrier and POP, which is good enough most of the time: the backup will carry them through an optics or electronics failure or a bad copper loop, but a physical cable cut will break everything in that cable.
Classic example of an unexpected failure mode: a local telecom carrier that I won't name, but it rhymes with Tom King's callsign suffix. These guys do a good job of ensuring that their ring is pretty much uncollapsed.
Picture a ring of fiber going from a data center to customers 1, 2, 3, 4, 5, 6, 7, 8 and back from 8 to the data center in a circle, with another pair of fibers going in the opposite direction: 8, 7, 6, 5, etc. Both directions are in the same cable, but since the cable is laid out more or less in a circle, only one side will be cut in any given incident. If there's a cut, for example between 3 and 4, then the equipment at both 3 and 4 loops the traffic back in the direction it came from. It's pretty robust and would require two separate cuts to isolate anyone.
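If you want to convince yourself of the two-cuts-to-isolate property, it's easy to model. A toy sketch in Python (node names as in the example above):

# Toy model of the counter-rotating ring: the data center plus
# customers 1..8, with the last span wrapping back around to the DC.
RING = ["DC", 1, 2, 3, 4, 5, 6, 7, 8]

def span(k):
    """The cable span between ring position k and the next (wrapping)."""
    return frozenset((RING[k], RING[(k + 1) % len(RING)]))

def reachable(site, cuts):
    """A site is fine if either direction around the ring is uncut."""
    cuts = {frozenset(c) for c in cuts}
    i = RING.index(site)
    cw = all(span(k) not in cuts for k in range(0, i))
    ccw = all(span(k) not in cuts for k in range(i, len(RING)))
    return cw or ccw

# One cut between 3 and 4: everyone still reachable (traffic wraps).
print(all(reachable(s, [(3, 4)]) for s in RING[1:]))   # True
# A second cut, between 6 and 7, is what it takes to strand anyone.
print([s for s in RING[1:] if not reachable(s, [(3, 4), (6, 7)])])   # [4, 5, 6]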
During the Gap Fire, there were several power outages lasting for up to a few hours each as the smoke caused the HV lines coming down the mountain to arc over.
Imagine that you are customer 4. You have a backup generator and a robust UPS. The carrier at their data center also has a generator and a robust UPS. However, customers 3 and 7 are shops with just a small battery. When the power fails over a wide area, the batteries at both customers 3 and 7 die, their optical transceivers go dark, and this isolates 4, 5, and 6 from the data center. Further, 3 and 7 are businesses that keep the gear in an interior closet, and they have locked up and gone home because the power is out, hence no access to the gear. Whoops.

The fix is to put an optical device with a mirror and an electromagnet at every customer site. When the equipment fails or the power dies, the magnet drops out and the mirror bridges the in and out fibers. But this gear is expensive, it's mechanical and can break, the type of failure it protects against is rare, and most of your customers go home during an extended power outage anyway, so the business case for the mirror mechanism at every customer doesn't pencil out.
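Extending the same toy ring: a dark node with no bypass stops repeating light, so it acts like a cut at that position, while the mirror gadget turns it into passive glass (again, just a sketch):

# Same toy ring, now modeling node power instead of cable cuts.
# A dark node with no optical bypass blocks the light path; a dark
# node WITH the mirror bypass passes light but its own gear is off.
RING = ["DC", 1, 2, 3, 4, 5, 6, 7, 8]

def reachable(site, dark, bypassed=frozenset()):
    if site in dark:                     # the site's own gear is down either way
        return False
    blocked = set(dark) - set(bypassed)  # dark + bypass = passive pass-through
    i = RING.index(site)
    cw = all(n not in blocked for n in RING[1:i])        # intermediates, one way
    ccw = all(n not in blocked for n in RING[i + 1:])    # intermediates, the other
    return cw or ccw

# Batteries die at 3 and 7, no bypass: 4, 5, and 6 are stranded too.
print([s for s in RING[1:] if not reachable(s, dark={3, 7})])   # [3, 4, 5, 6, 7]
# With bypass mirrors at 3 and 7, only those two sites are offline.
print([s for s in RING[1:] if not reachable(s, {3, 7}, bypassed={3, 7})])   # [3, 7]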
After the fact, having been read the riot act by customer 4, you provision another pair through the ring just to customer 4, without drop-and-insert at the others. This doesn't scale from a business standpoint either, but it solves this particular problem for this customer this time (after the outage).
Bottom line: each extra "9" cuts your allowable downtime by a factor of ten, at typically ten times the cost, and there will always be gotchas out there to bite you. Murphy is alive and well.
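A quick sketch of the arithmetic behind that rule of thumb:

# Allowable downtime per year for each extra "nine" of availability.
HOURS_PER_YEAR = 24 * 365

for nines in range(2, 6):
    availability = 1 - 10 ** -nines
    downtime_min = HOURS_PER_YEAR * (1 - availability) * 60
    print(f"{availability:.4%} -> {downtime_min:7.1f} minutes/year")

# 99.0000% -> 5256.0 minutes/year (about 3.7 days)
# 99.9000% ->  525.6 minutes/year (about 8.8 hours)
# 99.9900% ->   52.6 minutes/year
# 99.9990% ->    5.3 minutes/year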
--
Jay Hennigan - CCIE #7880 - Network Engineering - jay@impulse.net
Impulse Internet Service - http://www.impulse.net/
Your local telephone and internet company - 805 884-6323 - WB6RDV