
One ZWave failure cascades to all others?


gduprey


I have 4 ZWave Kwikset locks on the house. As part of the nightly shutdown they are all locked, and later a confirmation program runs that queries each lock and retries until it is locked. Unfortunately, ZWave is not 100% reliable (it is 95% reliable: enough to consider usable, but not fully trustworthy), so such extra steps are warranted.

Recently, one lock died (it works, but no longer speaks to the ISY). It'll be replaced soon. But I noticed that, since it is the first lock to be locked and then queried, when it fails, all subsequent commands to other devices also fail when run from a program. If I have the admin console up, the first failure gives me the "can't communicate" popup error, followed shortly thereafter (but not immediately) by the rest all failing.

The thing is, those other locks are fine, and if I individually command them from the admin console (query, lock, etc.), they work fine. So this isn't some underlying communication/network problem. The issue happens only when a program queues up a bunch of ZWave commands in a row.

Is there anything I should do to fix this?  Is this a problem with the ISY firmware ZWave handling logic?  Is it OK to queue multiple ZWave commands in a program and expect to have them execute sequentially but otherwise independently?

I'm running the latest 4.7.3 firmware. The master program looks like this:
 

Secure House - [ID 000B][Parent 00D6][Not Enabled]

If
   - No Conditions - (To add one, press 'Schedule' or 'Condition')
 
Then
        Run Program 'Close East Garage Door' (If)
        Run Program 'Close West Garage Door' (If)
        Set 'Front Door Lock' Lock
        Set 'Garage Side Door Lock' Lock
        Set 'Basement Door Lock' Lock
        Set 'Mudroom Door Lock' Lock
        Wait  30 seconds
        Set 'East Garage Control Sensor' Query
        Set 'West Garage Control Sensor' Query
        Set 'Garage Side Door Lock' Query
        Set 'Basement Door Lock' Query
        Set 'Front Door Lock' Query
        Set 'Mudroom Door Lock' Query
        Wait  30 seconds
        Run Program 'Secure Front Door' (If)
        Run Program 'Secure Basement Door' (If)
        Run Program 'Secure Garage Side Door' (If)
        Run Program 'Secure Mudroom Door' (If)
        Run Program 'Secure East Garage Door' (If)
        Run Program 'Secure West Garage Door' (If)
 
Else
   - No Actions - (To add one, press 'Action')
 

And a typical follow-up program (aka "Secure XXX Door") looks like this:

 

Secure Mudroom Door - [ID 00D7][Parent 00D6][Not Enabled]

If
        Status  'Mudroom Door Lock' is not Locked
 
Then
        Set 'Mudroom Door Lock' Query
        Wait  20 seconds
        Set 'Mudroom Door Lock' Lock
        Wait  20 seconds
        Run Program 'Secure Mudroom Door' (If)
 
Else
   - No Actions - (To add one, press 'Action')

In this case, the Front door (first on all lists) is the failed lock.   Other than the lock name changing, all other Secure XXX Door programs are identical and work on their own.

Because of the looping structure, these all keep running for hours until I manually stop the "Secure Front Door" program. Once that stops running and sending commands out, the next iteration of all programs works exactly as it should. And as stated earlier, if any program is run independently (i.e. with no other queued-up commands), it works 100% of the time.
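The "Secure XXX Door" loop above can be sketched in Python as follows, with one deliberate change: a hypothetical `max_attempts` cap, so a dead lock gives up instead of looping for hours. The `status`, `query`, and `lock` callables are placeholders standing in for the ISY's Status condition and its Query and Lock actions; they are assumptions for illustration, not a real ISY API.

```python
import time
from typing import Callable


def secure_door(
    status: Callable[[], str],  # placeholder: cached lock status, like the If condition
    query: Callable[[], None],  # placeholder: Set '<lock>' Query
    lock: Callable[[], None],   # placeholder: Set '<lock>' Lock
    max_attempts: int = 6,      # assumed cap; the original program has none
    settle_s: float = 20.0,     # matches the program's Wait 20 seconds
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once the lock reports locked, False after max_attempts tries."""
    for _ in range(max_attempts):
        if status() == "locked":
            return True
        query()          # refresh the status, as the program does first
        sleep(settle_s)
        lock()           # then re-send the Lock command
        sleep(settle_s)
    # Final check so a lock that succeeded on the last attempt still counts.
    return status() == "locked"
```

Passing `sleep=lambda s: None` lets the loop be exercised against a fake lock object without actually waiting.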

While I get that having a failed device is bad (and it will be disabled until replaced), this doesn't seem right: one failed device "breaks" everything when multiple ZWave commands are queued.

Any suggestions or confirmations would be appreciated.

Gerry


With more testing, I'm finding this seems to be a problem any time I "stack"/queue ZWave commands. I removed the "dead" lock from the programs for now and, while it works better, there are still failures communicating with the locks: failures that never happen when I send individual commands to the locks or run the programs individually (so far, manual commands and/or running the discrete programs work 100% of the time).

I added a WAIT 1 second between each ZWave Lock and Query command. Things got better in the sense that instead of 3 of 4 locks being reported as not communicating, it was usually down to just 1 lock (not always the same one). So it seems to be something with stacking/queuing ZWave commands, and I can imagine that, whatever the issue, it's made worse by lock-related commands, given they are likely longer or more complex because of the secure protocols used with them.

It also occurs to me that this might not be exclusively an ISY issue; perhaps it's ZWave congestion/collision? I can see a metric-butt-tonne of ZWave traffic, and it appears the ISY is only sending one command out at a time (as you'd expect), but I think that maybe some internal state logic is expecting or waiting on something (or not expecting something) when there are multiple queued commands.

For example, when a lock fails, I see in the event viewer (level 3) that it retries twice. There is also a LOT of other ZWave traffic going on (I have power monitors, climate sensors and door sensors, and the ZWave network is rarely "quiet" for more than a second before something reports in). Again, everything works in normal usage/commands/reports. It's just when there are queued commands, and that COULD suggest the ISY state tracking is getting confused by either the amount of ZWave traffic or by tying the reports that come back to the right queued commands.

All of these are, of course, very rough and ill-informed guesses.

It's so reliable when I don't do this, and still *mostly* reliable when I do, that I don't think it's the underlying zwave network (I would never rule it out, but...).

I've added some random waits in each "Secure XXXX Door" loop, and between that and disabling the dead lock, it eventually always works now. But it can take 10 minutes, even when everything is already locked, while the "good" commands/state are resolved.
 


What repeating devices are you using? How many do you have as well?

A Zwave device sends a response back to its controller. Most likely what is happening is that when the controller sends a command, it waits on a response, and other devices are trying to communicate at the same time. It's akin to you talking to your wife while the kids are asking you questions: you're trying to focus on her, so you can't truly respond to them. The wait helps, which is why things get better.
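A toy model of the "waiting on a response" point, assuming the controller keeps exactly one command outstanding and retries a silent node twice (the retry count seen in the event viewer earlier in the thread). The 5 second `timeout_s` is a made-up placeholder, not the ISY's real value, and the model only shows delay; it does not by itself explain why later commands get reported as failed.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    name: str
    ack_delay_s: Optional[float]  # None models a dead node that never answers


def simulate_queue(
    nodes: List[Node], timeout_s: float = 5.0, retries: int = 2
) -> List[Tuple[str, float, float, bool]]:
    """Return (name, start, finish, acked) for each queued command, sent one at a time."""
    t = 0.0
    results = []
    for node in nodes:
        start = t
        acked = False
        for _ in range(1 + retries):  # first try plus retries
            if node.ack_delay_s is not None and node.ack_delay_s <= timeout_s:
                t += node.ack_delay_s  # ack arrives, so the controller moves on
                acked = True
                break
            t += timeout_s  # no ack: the controller idles until the timeout expires
        results.append((node.name, start, t, acked))
    return results


queue = [Node("Front", None), Node("Garage Side", 1.0), Node("Basement", 1.0)]
for name, start, finish, acked in simulate_queue(queue):
    print(f"{name:12} sent at {start:5.1f}s  done at {finish:5.1f}s  acked={acked}")
```

With the dead Front Door lock first in line, the healthy locks do not get a turn until 15 seconds in, which is at least consistent with the delayed (but not immediate) cascade described in the first post.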

The reason I ask about repeating devices is how beaming works. If you have repeating devices capable of beaming near each lock, they can take that load off your zwave controller. While it doesn't guarantee success 100% of the time, it can increase the likelihood of it. 


Fair starting questions.

I have over a dozen ZWave repeaters (mostly doorbells and sirens, for their ability to support repeating secured devices) and, except for this issue, a very reliable Zwave network. There are about 40 Zwave devices in total (and about 80 Insteon), and I've spent a lot of time ensuring the network is very reliable. I do communication and networking work quite a bit and understand the underlying issues with a broadcast network and message collision/sync. So while the tools for Zwave network analysis are limited (I don't do enough to warrant purchasing the hideously expensive Zwave network analyzer), I've been setting these networks up for years and have no problems sorting network-level problems from application/controller ones (with 10 ISY/ZWave homes installed now, this being the largest one and reliable for 5+ years). And yeah, setting up larger ZWave networks is not (usually) simple and requires a lot of tedious mucking about with repeaters, healing and stress-testing.

Given the otherwise very reliable state of the network, and the fact that the problem really looks like the ISY getting confused when attributing responses to the correct command (a total guess, it just "feels" like that), I think the problem is something in how the ISY handles this, in particular queued IO requests. I know there is already some "not as obvious as you'd like" handling of multiple IO requests within a single program, and that there are differing "priorities" for queued IO (device) commands when both Insteon and ZWave commands are present in the same program. 

After watching the actual actions and the event log at comms level, I have my eye on the ISY for this (and again, I don't think the ISY is dropping/losing messages, just getting "confused" when there are multiple queued commands). In reality, it seems every command actually does succeed, but the ISY doesn't always think it does (often the physical act of succeeding, locking the door for example, happens while the ISY is still reporting that it's retrying the device).
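The attribution guess can be made concrete with a toy comparison. This only illustrates the hypothesis in the post; it is not a description of how the ISY firmware actually matches reports to commands. The hypothetical `match_by_order` credits each incoming report to the oldest pending command, while `match_by_node` only credits a report to a pending command for the same device.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def match_by_order(pending: List[str], reports: Iterable[str]) -> Dict[str, str]:
    """Hypothesized failure mode: each report is credited to the oldest pending command."""
    queue = deque(pending)
    credited: Dict[str, str] = {}
    for source in reports:
        if queue:
            credited[queue.popleft()] = source
    return credited


def match_by_node(pending: List[str], reports: Iterable[str]) -> Dict[str, str]:
    """Each report is credited only to a pending command for the device that sent it."""
    waiting = set(pending)
    credited: Dict[str, str] = {}
    for source in reports:
        if source in waiting:
            waiting.discard(source)
            credited[source] = source
    return credited


def confirmed(credited: Dict[str, str]) -> Set[str]:
    """Commands whose credited report really came from the commanded device."""
    return {command for command, source in credited.items() if command == source}


pending = ["Front", "Garage Side", "Basement", "Mudroom"]
reports = ["Garage Side", "Basement", "Mudroom"]  # the dead Front Door lock never answers
print(sorted(confirmed(match_by_order(pending, reports))))
print(sorted(confirmed(match_by_node(pending, reports))))
```

Under order-based matching, one silent device at the head of the queue shifts every later report onto the wrong command, so locks that actually responded look failed. That resembles the symptom described, but it remains a guess about the cause.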


Yep, in my second post I mentioned that adding a 1 second pause between commands helped (but not completely). I know that, normally, multiple IO commands in a program are not executed where they occur in the program, but are stacked or queued until either the program completes or hits a WAIT statement. So while the actual delay may be helping, the WAIT may also simply be letting the IO requests out before the program is over.
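A minimal sketch of the batching behavior described here, assuming (as the post does, not as a documented ISY rule) that device commands are held until the next Wait or the end of the program. `io_batches` is a hypothetical helper that groups a Then block's lines into the bursts that would go out together; `Run Program` lines are ignored because they are not device commands.

```python
from typing import List


def io_batches(actions: List[str]) -> List[List[str]]:
    """Split program actions into the device-command bursts released at each Wait."""
    batches: List[List[str]] = []
    current: List[str] = []
    for action in actions:
        line = action.strip()
        if line.startswith("Wait"):
            if current:  # a Wait releases whatever has been queued so far
                batches.append(current)
                current = []
        elif line.startswith("Set "):
            current.append(line)
    if current:  # the end of the program releases the rest
        batches.append(current)
    return batches


then_block = [
    "Set 'Front Door Lock' Lock",
    "Set 'Garage Side Door Lock' Lock",
    "Wait  30 seconds",
    "Set 'Front Door Lock' Query",
]
for batch in io_batches(then_block):
    print(batch)
```

Under that assumption, the master program's four consecutive Lock lines leave as one burst, while a 1 second Wait between them turns the burst into four separate releases, which may be part of why the pauses helped.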

That said, the device that "always" fails now is the first one that sends a command out (even with delays for all the subsequent commands). Once the "queue" empties (there are no pending IO/device commands), everything works again, even with heavy Zwave traffic still going on; there is always ZWave traffic on this busy (but not overloaded) Zwave network.


I only have one Z-Wave lock (Schlage), but it takes roughly 3 seconds after issuing a command to see a status update in the AC, I suppose because of the time it takes for the motor to actuate, move the deadbolt to its new position and report the success/failure of the command. My thought is that the ISY can control the sequence of the commands it sends, but given that locks are mechanical devices, the order in which the status reports arrive may cause issues. Maybe bump the wait up to 3 or 4 seconds?

gduprey said:

It's just when there are queued commands and that COULD suggest the ISY state tracking is getting confused with either the amount of ZWave traffic or reports back being tied back to queued commands.
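As a rough illustration of the suggested spacing, the sketch below computes when each lock command would go out if consecutive commands are separated by `gap_s` seconds. The 3 second `actuation_s` default is the Schlage observation from this post, not a spec value, and `stagger` is a hypothetical helper.

```python
from typing import List, Tuple


def stagger(
    devices: List[str], gap_s: float = 4.0, actuation_s: float = 3.0
) -> List[Tuple[str, float]]:
    """Return (device, send_offset_s) with commands spaced gap_s apart."""
    if gap_s < actuation_s:
        # A shorter gap would send the next command before the previous lock reports back.
        raise ValueError("gap_s is shorter than the lock's observed actuation time")
    return [(device, i * gap_s) for i, device in enumerate(devices)]


locks = ["Front Door Lock", "Garage Side Door Lock", "Basement Door Lock", "Mudroom Door Lock"]
for device, offset in stagger(locks):
    print(f"t+{offset:4.1f}s  Set '{device}' Lock")
```

With four locks and a 4 second gap, the last Lock command goes out at 12 seconds, so the sequence still fits inside the existing 30 second Wait before the queries.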

 


So again, normally (i.e. with manually issued commands) they succeed just fine! They do take some time, as you noticed, but they work. And watching the event log in communications details mode (level 3), nothing bad happens (no retries, etc.). So I think that, in general, the ISY takes care of waiting on the change from a "slower" device. I have a water valve that takes 20+ seconds to finish moving and report back, and it works just ducky ;-)

 


While you can add locks to a scene, there are two problems:
1) It appears that you cannot lock/unlock a scene (sending ON/OFF doesn't seem to do much).
2) The ISY simulates a scene, and unlike Insteon (where one command is sent), sending a command to a ZWave scene seems to send the command to each device in turn. That is effectively the same as the whole "stacked"/queued IO thing, so, assuming I could send a lock command, I'd expect similar issues.
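The difference described in point 2 can be sketched as follows, assuming only what the post observed: an Insteon scene command goes out once, while a ZWave "scene" is emulated by sending the command to each member in turn. `expand_scene` and its message strings are hypothetical.

```python
from typing import Dict, List


def expand_scene(
    scene: str, members: Dict[str, List[str]], protocol: str, command: str
) -> List[str]:
    """Return the individual messages a scene command turns into."""
    if protocol == "insteon":
        return [f"{command} -> scene '{scene}'"]  # one group command for the whole scene
    # ZWave, as observed in the post: one message per member, sent in turn
    return [f"{command} -> '{device}'" for device in members[scene]]


members = {"All Locks": ["Front Door Lock", "Garage Side Door Lock", "Basement Door Lock", "Mudroom Door Lock"]}
print(expand_scene("All Locks", members, "insteon", "ON"))
print(expand_scene("All Locks", members, "zwave", "Lock"))
```

The ZWave case yields the same four back-to-back messages as four consecutive Set lines in a program, which is why the scene route is expected to hit the same queuing problem.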

 


Archived

This topic is now archived and is closed to further replies.

