Making a ZWave install more "reliable"


gduprey

Posted

Howdy,

 

I have four different ISY994ZW installs running now (most are on 4.x, one is 5.x).  They generally consist of one or more ZWave locks, one or more ZWave thermostats, one or more ZWave door sensors, one or more ZWave climate modules (temp/humidity sensors) and a number of ZWave Aeotec sirens/doorbells.

 

I've done my best to spread the siren/doorbells out logically around the houses to help provide strong coverage.  The smallest install has 3 siren/doorbells and the largest has 9.  All have had ZWave network heals done (and redone anytime something moves around physically).  All nodes that act as repeaters show that every ZWave device has at least 3 reachable "neighbors" (most have more and 85% of them can directly see the ISY).  As far as I can tell, ZWave mesh signal strength is pretty good to excellent everywhere (admittedly empirical as I can't find any "QoS" type ZWave measures).

 

In short, this should be a pretty reliable installation.  And mostly, it is.

 

However, about 3-5 times a week, with no obvious patterns/times/days of week/phase of the moon, my ISY will "miss" a ZWave event/report.  These range from a door being locked/unlocked to a door or window being opened/closed (I may also lose thermostat or climate reports, but such misses aren't as obvious).  There is no consistency to this (sometimes during lots of other ZWave activity, sometimes with nothing else happening).

 

While this isn't a bad success rate, exactly, I really want something a lot closer to 100%.  In particular, missing door lock/unlock events has been the most visible problem as those tend to key into other things (like turning on lights, turning them off, etc, etc).

 

My insteon network (all installs have between 4 and 75 insteon devices, including motion sensors, leak sensors and normal lights) approaches 100% reliability (such that I've never seen anything get "missed").  I really want there to be a way to get ZWave to that level.

Are there any additional steps that folks could suggest to either help strengthen the mesh and/or reliability of my ZWave installs or help detect why this happens (quite rarely, but enough)?  I would hate to have to accept that "it's just the way it is".

Thanks,

 

Gerry

Posted

Z-Wave is RF only. RF is somewhat less reliable than powerline signals. Although powerline devices can and do interfere with powerline communication, offending devices can be filtered. There is no such filter for RF problems which can come and go.

 

The only solution is to add more Z-Wave repeating devices. Most Z-Wave line powered devices do repeat the signal. Battery powered devices do not.

Posted

As I have discussed in many other forum threads, when dealing with Z-Wave I always ensure all four corners of the home are covered with at least one siren / door bell. What has worked for several installs was placing one of the sirens as central to the controller as possible.

 

This allowed pretty much full coverage in terms of routing being known . . .

 

As you noted with Insteon the benefit is dual band RF & Power Line. Given Z-Wave is RF only, this depends heavily on what brand, type, and generation of Z-Wave is in use. I've seen many of the 1st - 4th generation Z-Wave products just limp along and nothing has ever made it better.

 

Using all Generation 5 Z-Wave Plus products has increased the reliability when compared to previous generation products. But they still require a huge number of silly siren / door bells to be placed around the home. During the X-MAS season I was at two brand new homes and one home had no less than 18 sirens in place.

 

I'm sorry but that makes no sense to me and that is pretty idiotic in terms of performance vs value. The other home had 7 sirens all scattered around with no discernible logic to their placement. As you noted there isn't a Z-Wave *RF Bridging* indicator where a person can say *Well now I know it's bridged and coupled here*.

 

It's pretty much trial and error and watching for things to turn on and off consistently.

 

I suspect if and when the ISY Series Controller updates to use Generation 5 Z-Wave Plus lots of these issues will go away in terms of reliability.

Posted

I suspect if and when the ISY Series Controller updates to use Generation 5 Z-Wave Plus lots of these issues will go away in terms of reliability.

 

A controller cannot increase reliability, not even for Insteon devices. If there's a communication problem, then that needs to be solved apart from any controller. Gen 5 devices improve communication no matter the controller.

Posted

Hi Gerry,

 

This is not good! When you miss an event, do you know how long it takes before the system corrects itself? i.e. if you can catch this issue at the moment it happens, then doing a Heal Network at that moment will give us a better indication of what's failing (in that route).

 

Also, if it's always the same devices (door locks at a higher frequency), then looking at the routes from that device to ISY can indicate what else could be contributing to the issue above and beyond the door lock itself.

 

In short, I believe there should be a way of figuring out what's causing the issue (as long as you can catch it as it happens).

 

With kind regards,

Michel

Posted

Howdy,

 

I generally only catch it after the fact (i.e. a door was locked and notice later the lights that should be turned off when it's locked weren't).  It happens to all the devices that I have actions tied to (door locks, window/door sensors, primarily) - no one specific device seems more prone to it than any others.

 

A typical week will see, for example, a door lock be used about 30 times.  Of that 30 times, the lock/unlock event might be missed once (and many weeks, not at all).  In my larger home, I have 4 ZWave locks and 15 door/window sensors that react to events.  While a particular device may go weeks without losing a report, as a whole (house), I tend to lose a few events/reports each week (randomly).  As a percentage of events, it's probably about 1%, but again, no consistency.
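To put rough numbers on that estimate, here is a small sketch of the arithmetic. The 1% loss rate and the 30 lock uses per week come from the paragraph above; the house-wide figure of 300 events per week is an assumed, illustrative number, not a measurement from any of these installs.

```python
def expected_misses(events: int, loss_rate: float) -> float:
    """Average number of lost reports for a given number of events."""
    return events * loss_rate


def prob_at_least_one_miss(events: int, loss_rate: float) -> float:
    """Chance that at least one report is lost, assuming independent losses."""
    return 1.0 - (1.0 - loss_rate) ** events


LOSS_RATE = 0.01          # "probably about 1%" per the estimate above
LOCK_EVENTS = 30          # one lock, one week
HOUSE_EVENTS = 300        # ASSUMED house-wide weekly total, for illustration only

print(f"one lock, one week:  P(miss >= 1) = {prob_at_least_one_miss(LOCK_EVENTS, LOSS_RATE):.2f}")
print(f"whole house, a week: expected misses = {expected_misses(HOUSE_EVENTS, LOSS_RATE):.1f}")
```

Under these assumptions a single device can plausibly go weeks without a miss while the house as a whole still loses a few reports every week, which matches the pattern described.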

Is there any way to extract anything close to a "strength" indication from each node/repeater neighbor from the ISY ZWave dongle?

Again, I have a "lot" (relative term) of siren/doorbells scattered around to help.  No device is more than one wall/20 feet from a repeater.

The few times I've caught anything have been on the door/window sensors (Ecolink).  They have a little light that flashes when they send a report and a few times I've noticed that instead of a quick flash (less than a second), the light stays on for like 30 seconds and often, the report/event is not reported.  I assume that means it's having a hard time reaching a repeater/getting confirmation (even though the unit I've noticed this on is literally 7 feet, line-of-sight/no-obstructions from a repeater).

Gerry

Posted

So some articles I've been reading suggest that ZWave devices often "find" their closest neighbor only once (speaking particularly about battery powered devices).  After they do, they won't automatically switch to a newly added (and possibly closer/stronger) repeater.  I'm not sure this really is the case, but it brings up a few questions:

 

1) Is this the case (i.e. hanging on to a weaker but "older" device instead of using a newer/recent/stronger device)?

2) Does the network heal provide a chance to change the closest neighbor (again, for battery devices -- I can see it works for powered devices/repeaters)?

3) If #1 is true and #2 doesn't force/let battery devices re-find the best repeater/receiver, is there some way to force a battery powered device to "re-scan" for its closest/strongest repeater/receiver?

I have no doubts about the ability of the network "heal" to fix/re-do neighbors and such for powered devices, it's the battery powered ones that I'm having problems with and having doubts about.

If this was the case, it would explain a bit why my addition of various repeaters hasn't solved the problem (even after multiple network heals).

Gerry

Posted

Based on observing Gen4 zwave rf traffic, it appears that routing is determined during "healing" (which probably should have been better named "route discovery").  After "healing", the controller routing table appears to never change, perhaps, only incrementally when a new device is added, until a new "healing" event.  I did not look into that yet so I am not sure if a new node's neighbors are discovered or not.  I would guess not.

 

So, yes, if a "better" node is included, most likely the controller would not know about its neighbors and may be using old sub-optimal routing.  However, "healing" is supposed to rediscover all available routes.

 

According to what I observed, a node never attempts to use an alternative route if it does not get an ack packet from the controller.  After 3 attempts, the node just silently gives up. It is always the controller that tries an alt route if it does not get an ack from the device.  This behavior might have changed in Gen5, I don't know.

 

Added: Actually,  battery operated devices should be no different: their neighbors discovery is forced by the controller so that the controller should know about main and alt routes to the device.
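As an illustration of why a stale routing table matters, here is a minimal sketch that picks the fewest-hop route from a neighbor table using a plain breadth-first search. This is not the actual Z-Wave routing algorithm (which is not documented in this thread); the node names and neighbor tables are made up for the example.

```python
from collections import deque
from typing import Dict, List, Optional


def shortest_route(neighbors: Dict[str, List[str]], source: str, dest: str) -> Optional[List[str]]:
    """Return the fewest-hop path from source to dest, or None if unreachable."""
    if source == dest:
        return [source]
    previous = {source: None}          # node -> node we reached it from
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt in previous:
                continue               # already visited
            previous[nxt] = node
            if nxt == dest:
                path = [dest]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None


# Table as the controller knew it before a new repeater "n3" was included.
stale = {"C": ["n1"], "n1": ["C", "n2"], "n2": ["n1", "dest"], "dest": ["n2"]}

# Same network after a heal has rediscovered neighbors, including "n3".
healed = {"C": ["n1", "n3"], "n1": ["C", "n2"], "n2": ["n1", "dest"],
          "n3": ["C", "dest"], "dest": ["n2", "n3"]}

print(shortest_route(stale, "C", "dest"))
print(shortest_route(healed, "C", "dest"))
```

With the stale table the only known route runs through two repeaters; once the table includes the new node, a shorter route becomes available. That is the sense in which a newly added repeater helps only after route discovery has run.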

Posted

Hi Gerry,

 

I think vjk answered your routing questions. Yes, Heal Network should find the best path. That said, I am not convinced the issue is routing, except perhaps for the Ecolink. It's also possible that ISY has too many events and is discarding some. Please do be kind enough to check your error log for things such as UDQFull. Something like this:

UDQ:Queue Full: MP : Task[N] SOCKZW SOCK-PROC pty=P
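If the error log is long, a short script can do the searching. The sketch below assumes the log has been exported as plain text and simply looks for lines resembling the sample above; the pattern is derived from that one sample line only, and the other sample lines in the demo are invented, so adjust both to match what the log actually contains.

```python
import re
from typing import Iterable, List

# Matches "UDQ:Queue Full" (as in the sample line) or the shorthand "UDQFull".
QUEUE_FULL = re.compile(r"UDQ\s*:?\s*Queue\s*Full|UDQFull", re.IGNORECASE)


def find_queue_full(lines: Iterable[str]) -> List[str]:
    """Return every log line that looks like a queue-full report."""
    return [line.rstrip("\n") for line in lines if QUEUE_FULL.search(line)]


# Demo on in-memory text; in practice pass an open file object instead.
sample_log = [
    "some unrelated entry\n",                                      # invented line
    "UDQ:Queue Full: MP : Task[N] SOCKZW SOCK-PROC pty=P\n",       # sample from the post
    "another unrelated entry\n",                                   # invented line
]

hits = find_queue_full(sample_log)
print(f"{len(hits)} queue-full entries found")
for hit in hits:
    print(hit)
```

Zero hits would suggest the missed events are not being dropped inside the controller's queue.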

 

With kind regards,

Michel

Posted

Howdy,

I really do understand the basics of ZWave and RF tech (I do hardware design and software too), so I get the basics of ZWave, repeaters, healing,  etc.  

 

I feel that my primary concern is how battery powered devices participate in the network heal and, specifically, how/if they can determine there is a newer/closer/stronger repeater to talk to than they have had in the past.  

 

I say this because when watching the ZWave communications during a heal, most battery devices do not get updated (they don't "wake up" so after a time, the ISY skips them).  

 

How do battery powered devices participate in a network heal?

 

As for too much zwave traffic, I seriously doubt it.  I only have door/window sensors (events only when door/window opens/closes - rarely), climate sensors (when things change and once every 4 hours -- low volume) and locks.  This happens occasionally in my smallest install which has 4 door/window sensors, 2 climate sensors and 2 door locks and ZWave is quiescent 99+% of the time.  A quick review of both ISY controllers does not show any full/overflow/lost messages.

 

I really think the focus of this is when/how/if battery powered devices ever reset/refresh their associated repeater/receiver and I'd be curious about how a network heal affects battery powered devices that stay asleep during the heal process.

 

Gerry

Posted

Hi Gerry,

 

Thanks for the feedback.

 

During network heal, there needs to be devices that support beaming and thus wake up the battery operated devices. If your battery operated devices remain asleep while healing, then that's a big problem.

 

With kind regards,

Michel

Posted
How do battery powered devices participate in a network heal?

 

 

A battery powered zwave device wakes up about once a second to listen for a burst of special beaming packets (about 250 packets containing hex 0x55 and the node id, lasting about 1.1s). After the device wakes up, route discovery proceeds the same way as for an ordinary device.

 

As I wrote before, a Gen 4 device does not appear to make its own routing choices; it blindly obeys the routing instructions coming from the controller, i.e. the controller lays out a route, say, C->n1->n2->dest, and the "dest" node responds back dest->n2->n1->C with an ack packet.  In the case of unsolicited frames, the node uses the last route the controller provided. At least, that's what I observed in my network.

 

For example, in summer, one of my controller-to-thermostat routes was flapping between a direct one and an intermediary node route, perhaps due to humidity level changes, with the controller making the decision. Once, when the thermostat tried to notify about the temp change, it did not get an ack packet from the controller, tried two more times and silently gave up without trying any alt routes.
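The three-attempts-then-give-up behavior can be turned into a rough loss estimate. This is a sketch only: it assumes each attempt succeeds independently with the same probability, which real RF conditions will not strictly satisfy, and the per-attempt success rates shown are illustrative values rather than measurements.

```python
def silent_loss_probability(p_attempt_ok: float, attempts: int = 3) -> float:
    """Chance that every attempt fails, so the report is silently dropped.

    Assumes attempts are independent and all use the same (last known) route.
    """
    if not 0.0 <= p_attempt_ok <= 1.0:
        raise ValueError("p_attempt_ok must be between 0 and 1")
    return (1.0 - p_attempt_ok) ** attempts


# Illustrative per-attempt success rates over the single route the node uses.
for p in (0.99, 0.95, 0.90, 0.80):
    print(f"per-attempt success {p:.2f} -> report lost {silent_loss_probability(p):.4%} of the time")
```

Under these assumptions a route has to be fairly marginal before three consecutive failures become common, but because the node never tries an alternative route, a temporarily bad route translates directly into lost reports.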

 

If a battery operated device cannot be woken up, then there is a possibility that the controller cannot reach the device directly *and* the intermediary device cannot generate the special beaming packet on behalf of the controller.

Posted

And that was my concern.  My door locks do support beaming so I assume they are waking up and getting reconfigured (which takes that excuse off the table for why I lose the occasional door lock/unlock event - sigh).

But the vast majority of my zwave devices are battery devices that do not support beaming.  They cannot be "woken" up remotely -- you have to do something physical to them.  My largest install has about 11 door/window sensors (ecolink) and 15 climate sensors (everspring) that do not wake up during a heal.  You can only wake up the door/window sensors with a battery removal/reinstall.  The climate sensors have a button on the front that, if pressed three times quickly, will wake the device up for 5-10 seconds (needed when I reconfigure ZWave parameters on them).  They are spread over 3500sqft and the idea of trying to run around and get to each one in the time that the ISY gives before it moves on is comical/impossible (some are in very hard to reach locations).

 

Other than my siren repeaters and door locks, everything else in my install is battery, non-beaming devices.

How do you get them to correctly re-configure their current best/closest/strongest repeater?

Gerry
 

Posted

How do you get them to correctly re-configure their current best/closest/strongest repeater?

Gerry

 

 

Those facing the same issues had to delete, exclude, include, and add it back. Then, finalize by completing a heal process several times to define the routes. I don't pretend to know how long it took but know it wasn't a five minute job.

Posted

I deeply hope that isn't the case (exclude/re-include as only "fix").  

 

I always "include" new devices as close to the ISY as possible based on comments made numerous times here and other ZWave places.  That would suggest nearly all my non-beamable battery operated devices are likely using sub-optimal routing/receivers.  It would also mean I really cannot move my ISY physically (which I was planning on doing).

 

Second problem is that the process of excluding and re-including a device "uses up" a ZWave device identifier (which also means it changes the identifier).  This is a huge problem for those of us using external (to the ISY) tools for integration as they have to address things via the ZWave identifier.  Changing an identifier causes havoc on such installations and starts a "search and fix" process that can take a long time and be error prone at best.  I actually asked ISY if there could be a "secondary" device identifier that would be independent of the hardware device ID so that "replaced" devices (and/or excluded/included devices) would still be referenced via the REST interface with an unchanging identifier regardless of the ISY internal one -- not much movement on that).

 

I'd be curious about UDI's response -- is this the case for non-beamable devices and, if so, what is the recommended solution when things are "moved", and does this change the recommendations for initial device inclusion?

Gerry

Posted

I deeply hope that isn't the case (exclude/re-include as only "fix").  

 

I always "include" new devices as close to the ISY as possible based on comments made numerous times here and other ZWave places.  That would suggest nearly all my non-beamable battery operated devices are likely using sub-optimal routing/receivers.  It would also mean I really cannot move my ISY physically (which I was planning on doing).

 

Second problem is that the process of excluding and re-including a device "uses up" a ZWave device identifier (which also means it changes the identifier).  This is a huge problem for those of us using external (to the ISY) tools for integration as they have to address things via the ZWave identifier.  Changing an identifier causes havoc on such installations and starts a "search and fix" process that can take a long time and be error prone at best.  I actually asked ISY if there could be a "secondary" device identifier that would be independent of the hardware device ID so that "replaced" devices (and/or excluded/included devices) would still be referenced via the REST interface with an unchanging identifier regardless of the ISY internal one -- not much movement on that).

 

I'd be curious about UDI's response -- is this the case for non-beamable devices and, if so, what is the recommended solution when things are "moved", and does this change the recommendations for initial device inclusion?

 

Gerry

 

Hello Gerry,

 

Is the highlighted reply bolded in red correct or a typo? 

 

I don't pretend to know what others do as best practices but excluding, including, and adding a device close to the controller would seem counter to the whole routing topology. The whole idea behind Z-Wave (generalizing here so bear with me) is that once a device is added, a network heal is supposed to detect where it is in relation to other devices.

 

On a high level this is supposed to define the closest path and relay the RX / TX of said devices. 

 

In my personal experience it seems there are user steps that are being taken which hinder this process. Even if we ignore for a moment what generation Z-Wave device is present. Z-Wave relies heavily on defining and knowing routes of each device to best communicate and react.

 

As you already noted this is made more complex with battery operated devices because some don't wake up or support X. Sadly, this is in part why Gen 5 was released because no one can expect a person to take a controller around the home doing a secure pairing.

 

In the limited Z-Wave installs I have been involved in we have never brought any device to the controller. As noted we always ensured all sirens were in place covering all four corners of the home, floor, zone. Once in place we simply installed the devices in their final resting place. From there we simply completed the tedious task of exclude, include, add, then complete endless heals to ensure the defined routes were done in the final location.

 

Because some devices / manufacturers don't support beaming, neighbors, or whatever other odd element, there have always been lingering questions after completion. But, I do have to say following the above steps has offered a pretty reliable network for some.

 

While for others, throwing money at the problem seemed like the only solution: 4 - 18 sirens / door bells. 

 

With respect to the whole ID issue I absolutely support your suggestion and observation. If this idea has not been noted in the product development forum please consider doing so. I am sure you will find lots of user support for such a basic concept.  

Posted

Not a typo ;-(

 

For non-battery devices, I doubt this would be a problem -- a network heal and all is "forgiven".   

 

That said, I know that for door locks/secure/encrypted devices, even the device manufacturers say to do the include within 3 feet of the primary controller.  Granted, since most of them support beaming, they can be "re-assigned" with a network heal.

 

I've also read numerous reports (again, here and on other ZWave fora) about inclusion problems that were "cured" by placing the new device close to the primary controller so it connects with it.

 

So between such reports and the directives about locks/secure devices, I likely "extrapolated" that all zwave devices are "better" when included near the primary.

 

Can UDI confirm that battery devices should NOT be included near the primary and instead in their intended/final location?

If so, what is the proper method when a non-beaming battery device moves and/or mesh topology changes (i.e. repeaters are added/removed)?  I'd really like to get an authoritative answer from the horse's mouth (UDI) and I'm keeping my fingers crossed that it doesn't involve network exclusion/re-inclusion.

 

Gerry

Posted

I am going to over generalize here for a moment but the key reason devices like locks were added in close proximity is in the belief this was a security *feature*. When in reality it was just a weakness in the Gen 1-4 hardware topology. I have never brought a Z-Wave door lock to the controller ever and it has been added without issue.

 

But as stated the basic infrastructure was already in place: Multi Siren / Bells.

 

As was stated in another thread there isn't a lot of noticeable benefit to those using Gen 1-4 hardware with Gen 5. The fact people see improvements in the so called mesh is incidental to the key problem. Which is the lack of defined routing and weak RF RX / TX from older hardware.

 

This is similar to older Insteon products which were single band vs dual band. Along with hardware that *Now* offers higher RF output to compensate for a lack thereof. Neither Z-Wave nor Insteon increased the power output for fun - it was done because both of them knew it wasn't cutting it and wanted to bolster the product line moving forward.

 

As I mentioned in this and other threads those using all Gen 5 devices have seen the most reliability in the mesh. Whether that translates to a consistent RX / TX to the end controller is up for debate unless said controller supports the very same. Those using older Insteon devices already know the whole peek / poke was extremely slow and writing to these first generation devices vs newer hardware was pretty much night and day.

 

I suspect the very same with Z-Wave in that older hardware is simply lacking the basic fundamentals of logic, power, and hand shaking.  

Posted

Inclusion is an interesting situation. For  inclusion, I imagine, the device has to be directly reachable by the controller requiring the device to be placed, at least initially, in a location directly reachable by the controller.

 

The static devices need eventually to be placed in their intended locations for the routing table to make sense. But it should not be a problem, because a route discovery (aka "healing") run after inclusion should create a more or less correct table. Not sure how mobile controllers handle routing since I do not own one.

 

Regarding incremented node ids post exclusion/inclusion.  I am pretty sure that UDI does not have direct control over that part of the zwave stack.  The zwave SoCs I saw including a gen5 chip are in fact co-processors that the main MCU communicates with over a USART.  My understanding based on reading publicly available pieces of info on the internet is that even when one signs an NDA with the sigma boys, one does not gain access to the co-processor firmware but rather only to the USART exposed serial protocol which does not appear to include API that controls node id assignment.

Posted

As I mentioned in this and other threads those using all Gen 5 devices have seen the most reliability in the mesh. Whether that translates to a consistent RX / TX to the end controller is up for debate unless said controller supports the very same. Those using older Insteon devices already know the whole peek / poke was extremely slow and writing to these first generation devices vs newer hardware was pretty much night and day.

 

I suspect the very same with Z-Wave in that older hardware is simply lacking the basic fundamentals of logic, power, and hand shaking.  

 

The gen5 chip offers several interesting improvements in comparison to the gen 4 one: several levels of Tx power, encrypted communication for devices other than locks and presumably improved routing.  In practice, most likely, people see improvements due to increased Tx power.  Sigma also offers a 100Kbit data rate which I do not find particularly interesting. Improving routing and error recovery algos followed by encryption would be the most valuable improvement. Not sure if the sigma engineering team is quite qualified to do that.  Unfortunately, however bad the zwave protocol may be from an engineering PoV, other HA RF alternatives are even worse.

 

A couple of months ago I played with a mini Gen5 network consisting of a new aeon stick, a gen 4 module and a gen5 appliance module in the basement.  According to my measurements, the gen5 module RSSI was about the same as it was with the 4-year old Intermatic module. Not sure if both gen5 devices were happy with the signal level or some other factor was at play.  Presumably, if communication was unsuccessful, one or both devices would have increased their Tx levels.

Posted

The static devices need eventually to be placed in their intended locations for the routing table to make sense. But, it should not be a problem because a subsequent to inclusion route discovery aka "healing" should create a more or less correct table.

 

Which would be fine if there was a method for handling the problem with battery powered, non-beaming devices where that heal doesn't seem to be able to "reset" the device's neighbor/receiver.  If such devices really do capture the receiver node only at inclusion, it would suggest you have to put them where they will eventually go BEFORE inclusion (which may not even be possible) and never move them and/or never change the neighbor they talk to.  That seems awfully static to me and I hope this isn't the case.

 

Regarding incremented node ids post exclusion/inclusion. I am pretty sure that UDI does not have direct control over that part of the zwave stack. The zwave SoCs I saw including a gen5 chip are in fact co-processors that the main MCU communicates with over a USART.

Yep - that matches what I've been told as well and I am really fine with it. But changing IDs for the logically "same" device (whether because of Zwave inclusion/exclusion or Insteon "replace device") breaks external access. That's why I was pushing (and will continue to advocate for) a "secondary" device identifier/attribute that can be manually set, carried over when devices are replaced and used as a device ID reference in the REST interface.  As systems get more integrated, it's becoming a deal killer (not for me yet, but it does make me utterly terrified of ZWave inclusion/exclusion as a "fix" for problems because of the cascade fallout).
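Until something like that exists in the controller, the same idea can be approximated on the client side. The sketch below keeps a small alias table in the external tool, so only one entry needs updating after an exclude/re-include. It is a workaround sketch, not an ISY feature; the class name, alias names, and node addresses are all hypothetical placeholders.

```python
import json
from typing import Dict, Optional


class AliasTable:
    """Maps stable, human-chosen aliases to the current device address."""

    def __init__(self, mapping: Optional[Dict[str, str]] = None) -> None:
        self._map: Dict[str, str] = dict(mapping or {})

    def resolve(self, alias: str) -> str:
        """Return the current address for an alias (KeyError if unknown)."""
        return self._map[alias]

    def reassign(self, alias: str, new_address: str) -> None:
        """Point an alias at a new address, e.g. after re-inclusion."""
        self._map[alias] = new_address

    def to_json(self) -> str:
        """Serialize the table so it can be stored alongside the integration."""
        return json.dumps(self._map, indent=2, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "AliasTable":
        return cls(json.loads(text))


# Hypothetical addresses, for illustration only.
aliases = AliasTable({"front_door_lock": "ZW005_1", "garage_window": "ZW011_1"})

# The lock is excluded and re-included and comes back under a new address.
aliases.reassign("front_door_lock", "ZW023_1")

print(aliases.resolve("front_door_lock"))   # external tools keep using the alias
print(aliases.to_json())
```

External scripts would then look up the address through `resolve()` before building any request, so a re-inclusion means one table edit instead of a search-and-fix pass through every integration.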

 

Gerry

Posted

I am going to over generalize here for a moment but the key reason devices like locks were added in close proximity is in the belief this was a security *feature*. When in reality it was just a weakness in the Gen 1-4 hardware topology. I have never brought a Z-Wave door lock to the controller ever and it has been added without issue.

A very likely reason.  Glad to know it works for you and I'll alter my methodology for installs going forward.  But there is that "installed base" problem (existing ZWave stuff) that is going to be a problem.  Hopefully UDI weighs in on all this.  

 

Gerry

Posted

Gerry,

 

I believe in the grand scheme of things people with older legacy hardware will have to accept some of the limitations. Or move forward and upgrade their network with newer technology like Gen 5. I myself had to accept that fact in my own Insteon network and it wasn't so much about reliability.

 

As my single band network was in my home rock solid . . .

 

The problem I saw was that, as the network grew, adding, deleting, or modifying large scenes just took forever. As some devices failed I of course replaced them with newer, more current hardware. And the same tasks I listed above started taking less time to complete.

 

A simple test for you would be to pick one lone battery device and try what I noted up top. If you see a difference well you got your answer which assumes RF repeaters are present to relay the same. Speaking for myself only upon starting this HA journey I waited a considerably long time for things to shake out in the Insteon camp.

 

Think tach switch issues . . .

 

This same view is seen in the Z-Wave adoption for me because my real world tests and observations told me - I better wait! The waiting appears to have paid off because Z-Wave Gen 5 Plus has offered lots of advancements which legacy products don't have or support.

 

Let us know what happens if you decide to do a mini test and see if going through all those hoops makes a difference. 

Posted

 

Which would be fine if there was a method for handling the problem with battery powered, non-beaming devices where that heal doesn't seem to be able to "reset" the device's neighbor/receiver. 

I am actually surprised that non-wakeable battery powered devices like that even exist. They apparently can only be used if they are in direct range of the controller.  What's the brand I wonder ?

Posted

Ecolink for the door/window sensors.  Everspring for the climate sensor.  Both are reasonably popular and work well (except for the occasional lost event, but for a given sensor, that is a once every 5-6 weeks event, if that).

 

They are really low-power devices and they are completely asleep unless "poked" (for the door sensors, only when open/closed, for the climate sensors, only when the temp/humidity changes or once every 4 hours).  As a result, door sensor battery lasts at least a year and climate sensors are 8-10 months typical.

 

Gerry
