Two 3PAR StoreServs running in a Peer Persistence setup lost the connection to the Quorum Witness appliance. The appliance is an important part of a 3PAR Peer Persistence setup, because it acts as a tie-breaker in a split-brain scenario.
While analyzing this issue, I saw this message in the 3PAR Management Console:
In addition to that, the customer got e-mails that the 3PAR StoreServ arrays lost the connection to the Quorum Witness appliance. In my case, the CouchDB process died. A restart of the appliance brought it back online.
How to check the Quorum Witness appliance?
You can check the status of the appliance with a simple web request. The documentation shows a simple test based on curl, which you can run directly from the Bash shell on the appliance.
[[email protected] ~]# curl http://10.0.0.99:8080
{"couchdb":"Welcome","version":"1.0.4"}
[[email protected] ~]#
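If you want to repeat this check regularly, the curl test can be wrapped in a small shell function. This is only a minimal sketch and not something taken from the documentation: the function names `is_couchdb_welcome` and `check_quorum_witness` are made up for illustration, the 5 second timeout is an arbitrary choice, and the address is the one from the example above.

```shell
#!/usr/bin/env bash
# Sketch only: the function names are hypothetical, not part of the appliance.

# Return 0 if the given response body looks like the CouchDB welcome message.
is_couchdb_welcome() {
    local body="$1"
    case "$body" in
        *'"couchdb":"Welcome"'*) return 0 ;;
        *) return 1 ;;
    esac
}

# Query the appliance and print a one-line status.
# $1 = base URL, e.g. http://10.0.0.99:8080
check_quorum_witness() {
    local url="$1" body
    # -s: no progress output, --max-time: give up after 5 seconds
    if ! body=$(curl -s --max-time 5 "$url"); then
        echo "CRITICAL: no answer from $url"
        return 2
    fi
    if is_couchdb_welcome "$body"; then
        echo "OK: CouchDB answered on $url"
        return 0
    fi
    echo "WARNING: unexpected answer from $url"
    return 1
}

# Example call (not executed here):
#   check_quorum_witness http://10.0.0.99:8080
```

Keeping the string check separate from the curl call makes it possible to test the logic without network access to the appliance.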
But you can also use the PowerShell cmdlet Invoke-WebRequest.
PS C:\Users\patrick> Invoke-WebRequest -Uri http://10.0.0.99:8080

StatusCode        : 200
StatusDescription : OK
Content           : {"couchdb":"Welcome","version":"1.0.4"}
RawContent        : HTTP/1.1 200 OK
                    Content-Length: 40
                    Cache-Control: must-revalidate
                    Content-Type: text/plain;charset=utf-8
                    Date: Mon, 30 Jan 2017 08:31:37 GMT
                    Server: CouchDB/1.0.4 (Erlang OTP/R14B04)

                    {"couchdb...
Forms             : {}
Headers           : {[Content-Length, 40], [Cache-Control, must-revalidate], [Content-Type, text/plain;charset=utf-8], [Date, Mon, 30 Jan 2017 08:31:37 GMT]...}
Images            : {}
InputFields       : {}
Links             : {}
ParsedHtml        : mshtml.HTMLDocumentClass
RawContentLength  : 40
If you add /witness to the URL, you can test access to the database that Peer Persistence uses.
PS C:\Users\patrick> Invoke-WebRequest -Uri http://10.0.0.99:8080/witness

StatusCode        : 200
StatusDescription : OK
Content           : {"db_name":"witness","doc_count":5,"doc_del_count":4,"update_seq":149557915,"purge_seq":0,"compact_
                    running":false,"disk_size":48988254,"instance_start_time":"1485763322826940","disk_format_version":
                    5,...
RawContent        : HTTP/1.1 200 OK
                    Content-Length: 234
                    Cache-Control: must-revalidate
                    Content-Type: text/plain;charset=utf-8
                    Date: Mon, 30 Jan 2017 08:36:38 GMT
                    Server: CouchDB/1.0.4 (Erlang OTP/R14B04)

                    {"db_nam...
Forms             : {}
Headers           : {[Content-Length, 234], [Cache-Control, must-revalidate], [Content-Type, text/plain;charset=utf-8], [Date, Mon, 30 Jan 2017 08:36:38 GMT]...}
Images            : {}
InputFields       : {}
Links             : {}
ParsedHtml        : mshtml.HTMLDocumentClass
RawContentLength  : 234
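The same /witness request also works with curl from a Linux shell. The following sketch pulls the `db_name` field out of the answer and compares it with `witness`. The function names `extract_db_name` and `is_witness_db` are hypothetical, and the sed expression assumes the compact single-line JSON shown in the output above, so treat it as an illustration rather than a robust JSON parser.

```shell
#!/usr/bin/env bash
# Sketch only: the function names are hypothetical, not part of the appliance.

# Print the value of "db_name" from a CouchDB database info document.
# Prints nothing if the field is missing.
extract_db_name() {
    printf '%s\n' "$1" | sed -n 's/.*"db_name":"\([^"]*\)".*/\1/p'
}

# Return 0 if the given JSON describes the database named "witness".
is_witness_db() {
    [ "$(extract_db_name "$1")" = "witness" ]
}

# Example call (not executed here):
#   is_witness_db "$(curl -s --max-time 5 http://10.0.0.99:8080/witness)" \
#       && echo "witness database reachable"
```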
If you get a connection error, check whether the beam process (the Erlang VM that runs CouchDB) is running and listening on port 8080.
[[email protected] ~]# netstat -tulpen |grep 8080
tcp 0 0 0.0.0.0:8080 0.0.0.0:* LISTEN 495 10726 1643/beam
[[email protected] ~]#
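The netstat check can be scripted as well. This sketch assumes the column layout of `netstat -tulpen` shown above (local address in column 4, state in column 6, PID/program name in the last column). The function name `has_beam_listener` is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: the function name is hypothetical, not part of the appliance.

# Return 0 if the given netstat output contains a TCP listener on the port
# whose owning process name starts with "beam" (this also matches beam.smp).
# $1 = output of "netstat -tulpen", $2 = port number
has_beam_listener() {
    local netstat_output="$1" port="$2"
    printf '%s\n' "$netstat_output" |
        awk -v port="$port" '
            $6 == "LISTEN" && $4 ~ (":" port "$") && $NF ~ /\/beam/ { found = 1 }
            END { exit !found }
        '
}

# Example call (not executed here):
#   has_beam_listener "$(netstat -tulpen 2>/dev/null)" 8080 \
#       || echo "beam is not listening on port 8080"
```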
If not, reboot the appliance. This can be done without downtime. The appliance only comes into play if a failover occurs.
Great! Useful information. But there is no information on how (for example using SNMP) to monitor the state of services on the arrays themselves. They sometimes hang when the connection to the quorum is broken.
Unfortunately, I don’t have any clue on how to monitor this. I know what you mean, because I have seen communication loss to the quorum a couple of times, especially if the quorum is located in another site. You can try something like SSH with expect and check the output of the command issued to the arrays.
Thank you for your feedback. I would like to use standard tools and not SSH, PowerShell or other crutches.