What happens when a Nimble Storage disk drive fails?

I’ve worked with Nimble Storage devices for over two years now, and they constantly amaze me, but one thing I had yet to see in the field was an actual failure. So when I received an email about a failed disk at 3am the other morning, I thought I’d walk you through what happens.

After all, if you’re going to buy a new storage device, you want to know exactly how it behaves in a failure scenario, don’t you?

Failure Alert – Infosight and Autosupport at their best

Three things happened almost instantly when the disk failed:

  • Email alert sent to the recipients configured on the Nimble Storage Array
  • Failure information sent to Nimble Storage’s Infosight website
  • A support case was automatically opened with all the necessary details

Below I’ve also included screenshots of the array status page and the event log on the Nimble Storage device, where you can see that normal operations, such as snapshots, continued.

So all I had to do in the morning was confirm the delivery address for the replacement parts and whether an engineer was needed.

Note: with Nimble’s service offerings, you can choose how replacements are handled when hardware fails: either have them sent out as soon as the failure hits Infosight, or confirm the delivery details before the parts are sent. For this environment, we chose the latter.

Nimble Disk Fail - email alert

Nimble Disk Fail - Infosight

Nimble Disk Fail - Array Status (failed disk)

Nimble Disk Fail - Event Log (failed disk, raid degraded)

So how do you resolve the issue?

You’ll have noticed in the array status screenshot above that, when the failed disk is displayed, there are no options to kick off a RAID rebuild or any other actions to take.

That’s because all you need to do is pull out the failed disk and insert the replacement sent by the Nimble Storage support team. It’s that simple! The Nimble array does the rest for you.

  • Pull out the failed disk (it will have a red LED)
  • Insert the new disk
  • Let the Nimble device rebuild the RAID automatically (no user interaction required)
  • When the rebuild completes successfully, the support case opened with Nimble Storage Support is closed automatically

Below you can see the Nimble array status page, the event log showing that the RAID rebuild took only 7 minutes, and the auto-closed support case with Nimble.

Nimble Disk Fail - Array Status (Healthy disk)

Nimble Disk Fail - Event Log (New disk, raid rebuild)

Note: This is a non-production system. The earlier alerts were caused by testing and thrashing the box, and by forgetting to tell Nimble when the “Maintenance window” was, so really we tested their support too!

Nimble Disk Fail - Auto Closed Support Case
Regards,

Dean
