shannong Posted October 6, 2014 (edited)

I'm having a reliability issue with a program that turns off lights and is triggered by switching off a 2477D dual-band dimmer. The targets are a variety of 2477D/S devices and a Fanlinc. The intent is that when this one particular switch is switched 'Off', a program fires that turns off all lights in the adjoining room and also a Sonos player via a network resource. This worked fine for the first 9 months after it was set up. The switch is not in any scene with the other lights and is not used to turn them On, only Off.

Recently it has become unreliable, working perhaps only 50% of the time or less. The 'Off' event is always seen by the ISY and the program fires. However, the lights often won't turn off, although the network resource always works.

The strange thing for me is that the ISY updates the status of those switches to 'Off' even though they are still on, which shouldn't happen in an Insteon environment due to ACKs, right?

Another odd thing is that if I switch it 'Off' again and the program runs again, the ISY turns them off on the second try about 50% of the time. I say "odd" because why would the ISY send an 'Off' command to a device it thinks is already 'Off'? It should do nothing, right?

I feel confident that it's not a noise or other interference issue, because if I turn these lights on/off from the ISY it works 100% of the time. I'm not having any other reliability issues.

As for adding delays, the issue is present even when there is only one device that needs to be turned off. Also, all the lights in question are currently in a scene together, so adding a delay between them isn't possible.

ISY 4.2.7. PLM is 2413S v9B. The ISY and all relevant devices are on the same panel, with numerous dual-band devices on both legs. No other issues at this time with about 130 Insteon devices, of which about 80% are dual-band.
A Level 3 log is attached. It includes the first "missed" Off and a second one that was run even though the ISY already saw the devices as 'Off'. I'd really like help dissecting the log to figure this out. Also, can anyone please comment on the devices showing 'Off' when they are in fact still on?

25.6F.7D: Switch that triggers the event
28.F8.63 and 28.E6.DA: Two devices to be turned off

Attachment: ISY-Events-Log.v4.2.7__Sun 2014.10.05 11.04.14 AM.txt

Edited October 6, 2014 by shannong
LeeG Posted October 6, 2014

Scene Responders do not ACK a Scene request.

The ISY sends the commands requested regardless of whether they would appear redundant.

Note that your post does not have an event log attached.
shannong (Author) Posted October 6, 2014

Oops. I added the log.

Well, knowing that scene requests aren't ACK'd helps clear up some of the confusion. Seems like a bad protocol design choice to me.

What about the ISY sending 'Off' commands to devices it sees are already off? I thought that didn't happen.
LeeG Posted October 6, 2014 (edited)

Device 25.6F.7D is still sending inbound traffic at the same time the three Scenes have been initiated. That could be interfering with the Scene activity but would not explain how it functioned for 9 months.

The ISY sends the commands requested. That includes sending On commands to a device that is already On and Off commands to a device that is already Off.

Edited October 6, 2014 by LeeG
shannong (Author) Posted October 6, 2014

Thanks for the quick and helpful replies.

"The ISY sends the commands requested. That includes sending On commands to a device that is already On and Off commands to a device that is already Off."

There are about 12 other devices in those scenes that were already off during the test captured in the log. Why aren't those devices sent 'Off' commands if the ISY always sends them regardless of device status?

"Device 25.6F.7D is still sending inbound traffic at the same time the three Scenes have been initiated. That could be interfering with the Scene activity but would not explain how it functioned for 9 months."

Is the inbound traffic from Device 25.6F.7D just repeated messages carried by the multitude of other devices repeating the same 'Off' event? If not, what other traffic would it be? It is not a Controller for any scenes.

Side note: the ISY should be on 4.2.15, and I plan to upgrade tonight, but the problem did not coincide with my upgrade to the 4.2.x train.
shannong (Author) Posted October 6, 2014

Looking in the log, my second attempt after the first failure started at 11:04:09. The lights were successfully turned off with that attempt, but I don't actually see any 'Off' commands sent to them. ??
LeeG Posted October 6, 2014 (edited)

"There are about 12 other devices in those scenes that are already off during that test captured in the log. Why aren't those devices sent 'Off' commands if the ISY always sends them regardless of device status?"

Why do you think the other devices did not receive Off commands?

"Is the inbound traffic from Device 25.6F.7D just repeated messages carried by the multitude other devices repeating the same 'Off' event? If not, what other traffic would it be? It is not a Controller for any scenes."

Device 25.6F.7D, like all devices that change Status, is a Controller to the PLM. That relation is established when the device is added to the ISY. The additional traffic is part of the device paddle/button press. When a device paddle/button is pressed, a number of messages are sent to the PLM. The first message received results in the Program trigger, which sends 3 Scene commands (two On and one Off). While those three Scene messages are being processed by the PLM, the additional messages from 25.6F.7D are being received.

This message is the first message from the button press. It results in the Program being triggered.
Sun 10/05/2014 11:03:43 AM : [iNST-SRX ] 02 50 25.6F.7D 00.00.01 CB 13 00 LTOFFRR(00)
Sun 10/05/2014 11:03:43 AM : [std-Group ] 25.6F.7D-->Group=1, Max Hops=3, Hops Left=2
Sun 10/05/2014 11:03:43 AM : [D2D EVENT ] Event [25 6F 7D 1] [DOF] [0] uom=0 prec=-1
Sun 10/05/2014 11:03:43 AM : [ 25 6F 7D 1] DOF 0
Sun 10/05/2014 11:03:43 AM : [D2D-CMP 00FE] CTL [25 6F 7D 1] DOF op=1 Event(val=0 uom=0 prec=-1) is Condition(val=0 uom=0 prec=-1) --> true
Sun 10/05/2014 11:03:43 AM : [D2D-CMP 0039] CTL [25 6F 7D 1] DOF op=1 Event(val=0 uom=0 prec=-1) is Condition(val=0 uom=0 prec=-1) --> true

The next three messages are the Program invoking the three Scenes

Sun 10/05/2014 11:03:43 AM : [iNST-TX-I1 ] 02 62 00 00 22 CF 11 00
Sun 10/05/2014 11:03:43 AM : [iNST-TX-I1 ] 02 62 00 00 33 CF 11 00
Sun 10/05/2014 11:03:43 AM : [iNST-TX-I1 ] 02 62 00 00 19 CF 13 00
Sun 10/05/2014 11:03:43 AM : [ Time] 11:03:46 27(0)

This is the next message from 25.6F.7D.

Sun 10/05/2014 11:03:43 AM : [iNST-SRX ] 02 50 25.6F.7D 00.00.01 C7 13 00 LTOFFRR(00)
Sun 10/05/2014 11:03:43 AM : [std-Group ] 25.6F.7D-->Group=1, Max Hops=3, Hops Left=1
Sun 10/05/2014 11:03:43 AM : [iNST-DUP ] Previous message ignored.

This message is the PLM accepting the first Scene command

Sun 10/05/2014 11:03:43 AM : [iNST-ACK ] 02 62 00.00.22 CF 11 00 06 LTONRR (00)

This is the next message from 25.6F.7D

Sun 10/05/2014 11:03:43 AM : [iNST-SRX ] 02 50 25.6F.7D 25.26.E4 41 13 01 LTOFFRR(01)
Sun 10/05/2014 11:03:43 AM : [std-Cleanup ] 25.6F.7D-->ISY/PLM Group=1, Max Hops=1, Hops Left=0
Sun 10/05/2014 11:03:43 AM : [iNST-DUP ] Previous message ignored.
This is the PLM accepting the second Scene command

Sun 10/05/2014 11:03:43 AM : [iNST-ACK ] 02 62 00.00.33 CF 11 00 06 LTONRR (00)
Sun 10/05/2014 11:03:43 AM : [D2D EVENT ] Event [28 F8 63 1] [sT] [0] uom=0 prec=-1
Sun 10/05/2014 11:03:43 AM : [ 28 F8 63 1] ST 0
Sun 10/05/2014 11:03:43 AM : [D2D EVENT ] Event [28 E6 DA 1] [sT] [0] uom=0 prec=-1
Sun 10/05/2014 11:03:43 AM : [ 28 E6 DA 1] ST 0

These are the next messages from 25.6F.7D

Sun 10/05/2014 11:03:43 AM : [iNST-SRX ] 02 50 25.6F.7D 13.01.01 CB 06 00 (00)
Sun 10/05/2014 11:03:43 AM : [std-Group ] 25.6F.7D-->13.01.01, Max Hops=3, Hops Left=2
Sun 10/05/2014 11:03:43 AM : [iNST-INFO ] Previous message ignored.
Sun 10/05/2014 11:03:43 AM : [iNST-SRX ] 02 50 25.6F.7D 13.01.01 C3 06 00 (00)
Sun 10/05/2014 11:03:43 AM : [std-Group ] 25.6F.7D-->13.01.01, Max Hops=3, Hops Left=0
Sun 10/05/2014 11:03:43 AM : [iNST-INFO ] Previous message ignored.

This is the PLM accepting the third Scene command

Sun 10/05/2014 11:03:43 AM : [iNST-ACK ] 02 62 00.00.19 CF 13 00 06 LTOFFRR(00)

"Looking in the log, my second attempt after the first failure started at 11:04:09. The lights were successfully turned off with that attempt. But I don't actually see any 'Off' commands sent to them. ??"

When a Scene is turned On/Off, the Scene command is part of the Scene invocation. Note the three lines below. All devices physically see the Scene command on the powerline. Those devices which have a Responder link record that matches the Scene number react.

Sun 10/05/2014 11:03:43 AM : [iNST-TX-I1 ] 02 62 00 00 22 CF 11 00
Sun 10/05/2014 11:03:43 AM : [iNST-TX-I1 ] 02 62 00 00 33 CF 11 00
Sun 10/05/2014 11:03:43 AM : [iNST-TX-I1 ] 02 62 00 00 19 CF 13 00

Edited October 6, 2014 by LeeG
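
For readers who want to check the annotations against the raw bytes, here is a minimal Python sketch that decodes the flags byte and the outbound `02 62` Scene frames quoted in the log. The bit layout (top three bits for message type, one bit for extended, two bits each for hops left and max hops) is my reading of the public Insteon documentation and should be treated as an assumption; it does agree with the "Max Hops" and "Hops Left" values the ISY prints. The names `decode_flags` and `decode_scene_tx` are hypothetical helpers, not part of any ISY or Insteon API, and only the On (0x11) and Off (0x13) commands seen in this log are named.

```python
from dataclasses import dataclass

# Message types keyed by the top three bits of the flags byte.
# Assumption: this table follows the public Insteon documentation.
MESSAGE_TYPES = {
    0b000: "direct",
    0b001: "direct ACK",
    0b010: "group cleanup",
    0b011: "group cleanup ACK",
    0b100: "broadcast",
    0b101: "direct NAK",
    0b110: "group broadcast",
    0b111: "group cleanup NAK",
}

# Only the two commands that appear in the log (LTONRR / LTOFFRR).
COMMANDS = {0x11: "On", 0x13: "Off"}


@dataclass
class Flags:
    message_type: str
    extended: bool
    hops_left: int
    max_hops: int


def decode_flags(flags: int) -> Flags:
    """Split one flags byte (e.g. 0xCB) into its fields."""
    return Flags(
        message_type=MESSAGE_TYPES[(flags >> 5) & 0b111],
        extended=bool(flags & 0x10),
        hops_left=(flags >> 2) & 0b11,
        max_hops=flags & 0b11,
    )


def decode_scene_tx(hex_bytes: str) -> dict:
    """Decode an outbound frame such as '02 62 00 00 22 CF 11 00'."""
    b = [int(x, 16) for x in hex_bytes.split()]
    if len(b) != 8 or b[0] != 0x02 or b[1] != 0x62:
        raise ValueError("not a standard-length 02 62 frame")
    return {
        "group": b[4],  # Scene (group) number is the low address byte
        "flags": decode_flags(b[5]),
        "command": COMMANDS.get(b[6], f"0x{b[6]:02X}"),
    }


if __name__ == "__main__":
    for frame in ("02 62 00 00 22 CF 11 00",
                  "02 62 00 00 33 CF 11 00",
                  "02 62 00 00 19 CF 13 00"):
        print(frame, "->", decode_scene_tx(frame))
    print("CB ->", decode_flags(0xCB))
    print("41 ->", decode_flags(0x41))
```

Run against the three Scene frames, this shows two On commands (groups 0x22 and 0x33) and one Off (group 0x19), matching the "two On and one Off" description above.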
LeeG Posted October 6, 2014

To the first question about not seeing Off commands, perhaps you were expecting a 02 50 ........ response from each device. That would be an ACK, which does not happen on Scenes invoked from the PLM.
shannong (Author) Posted October 6, 2014

Thanks for the deconstruction and annotation. Very helpful for understanding the logs in future troubleshooting.

In the first failed attempt, I see traffic like this associated with the two On devices during the second Scene.

Sun 10/05/2014 11:03:43 AM : [ 28 F8 63 1] ST 0

I assume that means the device stated its level is now zero, hence the ISY updating the status to Off. After the first unsuccessful attempt they are shown as Off in the ISY though in reality On. The second, successful attempt resulted in them turning Off, but I don't see any traffic with those addresses. Why not?

I discovered in my testing that I can alleviate the problem by adding a one second delay at the beginning of the Then clause. But I'd like to understand the mechanics for future troubleshooting using the logs.
LeeG Posted October 6, 2014 (edited)

"I assume that means the device stated it's level is now zero. Hence the ISY updating the status to be Off."

The ISY has no feedback from any of the devices in the Scene (no ACKs). That trace entry reflects the change in Status the ISY thinks happened as a result of the Scene execution. There are no more Insteon messages that the PLM would pass back to the ISY because there are no ACKs.

These messages do not appear the second time because the ISY already has the devices marked Off. There is no change in Status that the ISY knows about, so there is no message about a state change.

That is good news about the effect of the Wait. It means the overlap of inbound messages from 25.6F.7D with the Scene outbound messages was the issue.

Edited October 6, 2014 by LeeG
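
The behavior described here (status updated without any ACK, and no state-change entry on the second run) can be illustrated with a small toy model. This is only a sketch of the idea, not the ISY's actual implementation: the class name `AssumedStatusTable`, the method `run_scene`, and the use of 255 for "On" are all assumptions made for illustration.

```python
class AssumedStatusTable:
    """Toy model: a controller that assumes Scene responders obeyed."""

    def __init__(self, initial):
        self.status = dict(initial)  # address -> last assumed level
        self.sent = []               # every Scene command put on the wire

    def run_scene(self, members, level):
        # The command is always sent, even if it looks redundant.
        self.sent.append((tuple(members), level))
        events = []
        for address in members:
            # No ACK comes back, so the new level is simply assumed.
            if self.status.get(address) != level:
                self.status[address] = level
                events.append(f"[{address}] ST {level}")
        return events


if __name__ == "__main__":
    lights = ["28 F8 63 1", "28 E6 DA 1"]
    table = AssumedStatusTable({a: 255 for a in lights})  # 255 = On (assumed)

    print(table.run_scene(lights, 0))  # two ST 0 entries, as in the log
    print(table.run_scene(lights, 0))  # no entries: status already 0
    print(len(table.sent))             # both Scene commands were still sent
```

In this model the second run still transmits the Scene command, which is why the lights can turn off on a retry even though no status lines for them appear in the log.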
LeeG Posted October 6, 2014

Inbound messages from a device start with 02 5x ........

The only inbound messages are those from 25.6F.7D, where the button was pressed.
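
That rule of thumb is easy to apply mechanically. The sketch below, which assumes the log line layout shown earlier in this thread, classifies each line as inbound (`02 5x`) or outbound (`02 62`, including the PLM's echo on the ACK lines) and collects the source addresses of inbound messages. The function names `classify` and `inbound_sources` are hypothetical.

```python
import re

# First two bytes of a PLM frame, right after the bracketed log tag.
FRAME = re.compile(r"\]\s+02 ([0-9A-Fa-f]{2})\b")
# Source address of an inbound frame, e.g. "02 50 25.6F.7D ...".
INBOUND_FROM = re.compile(
    r"\]\s+02 5[0-9A-Fa-f] ([0-9A-Fa-f]{2}\.[0-9A-Fa-f]{2}\.[0-9A-Fa-f]{2})"
)


def classify(line: str) -> str:
    """Return 'inbound', 'outbound', or 'other' for one log line."""
    match = FRAME.search(line)
    if match is None:
        return "other"
    code = match.group(1).upper()
    if code.startswith("5"):
        return "inbound"   # 02 5x: message received from a device
    if code == "62":
        return "outbound"  # 02 62: command sent to (or echoed by) the PLM
    return "other"


def inbound_sources(lines):
    """Collect the device addresses that sent inbound messages."""
    found = set()
    for line in lines:
        match = INBOUND_FROM.search(line)
        if match:
            found.add(match.group(1).upper())
    return found


if __name__ == "__main__":
    sample = [
        "Sun 10/05/2014 11:03:43 AM : [iNST-SRX ] 02 50 25.6F.7D 00.00.01 CB 13 00 LTOFFRR(00)",
        "Sun 10/05/2014 11:03:43 AM : [iNST-TX-I1 ] 02 62 00 00 22 CF 11 00",
        "Sun 10/05/2014 11:03:43 AM : [iNST-ACK ] 02 62 00.00.22 CF 11 00 06 LTONRR (00)",
        "Sun 10/05/2014 11:03:43 AM : [ 28 F8 63 1] ST 0",
    ]
    for line in sample:
        print(classify(line), "|", line)
    print(inbound_sources(sample))
```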
shannong (Author) Posted October 6, 2014 (edited)

"That is good news about the effect of the Wait. It means the overlap of inbound messages from 25.6F.7D with the Scene outbound messages was the issue."

Mixed bag for me. Good that my issue is resolved. Bad that the Insteon protocol and implementation are designed such that the simple scenario presented here does not function properly without resorting to workarounds and hacks. It's also puzzling how this was a non-issue until now.

Thanks so much for your help in understanding the logs, the protocol, and the interactions of the various components.

Edited October 7, 2014 by shannong
LeeG Posted October 7, 2014

That was my question in the initial response. I was pretty certain a Wait would resolve the problem, as it normally does, but I do not know why the problem appeared now, unless something changed that makes the Program trigger faster and thus issue the Scenes sooner.

Insteon does not stack commands to avoid conflicts. It does not put two messages on the Insteon network at the same time, but it also does not analyze current activity and hold a command until the logical conflict is over. If an interaction creates a conflict on the Insteon network, Insteon resolves it by terminating something.

At insteon.com there is an insteondetails.pdf document that covers some of the internal workings of Insteon. I think that document covers Insteon only allowing a single Scene to run.
shannong (Author) Posted October 12, 2014 (edited)

BTW, I took out the Wait and did some more testing today. It worked 100% of the time over approximately 50 attempts, whereas before it was only 50% reliable. The only difference I can identify is my upgrade to 4.2.16. I walked the entire house looking for something that might have been plugged in or turned On, or that has changed, and couldn't identify anything. The problem was usually most noticeable in the evening, since that's when it gets used most often. I turned on everything that is normally on in the evening, including the upstairs HVAC, which isn't on a filter like the downstairs unit, although it is on a different panel. Dunno.

I'm going to leave it operating without the Wait for a while and see if it resurfaces for further testing and root cause analysis.

Edited October 12, 2014 by shannong
jgorm Posted October 24, 2014

I've had this happen a few times where stuff didn't turn off. I put in a turn off, wait 5 seconds, and then turn off again on those programs, and it fixed the issue. Not an elegant solution, but it worked.
shannong (Author) Posted October 24, 2014

Thanks for the additional suggestion. Adding the one second wait at the beginning fixed the problem and seems to validate Lee's diagnosis of a collision between the send/receive of the initial device state change and the execution of the scene commands.
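
The workaround itself lives in the ISY program (a Wait at the start of the Then clause), but the ordering it enforces can be sketched in a few lines of Python. This is purely illustrative: `run_then_clause` and the callables passed to it are hypothetical stand-ins, and the one second default simply mirrors the delay reported in this thread.

```python
import time
from typing import Callable, Iterable


def run_then_clause(
    scene_commands: Iterable[Callable[[], None]],
    initial_wait_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Wait first, then issue the Scene commands in order."""
    # Give the triggering switch time to finish its group broadcast
    # and cleanup messages before anything is transmitted.
    sleep(initial_wait_s)
    for send in scene_commands:
        send()


if __name__ == "__main__":
    log = []
    run_then_clause(
        [
            lambda: log.append("scene 0x22 On"),
            lambda: log.append("scene 0x33 On"),
            lambda: log.append("scene 0x19 Off"),
        ],
        sleep=lambda s: log.append(f"wait {s}s"),  # fake sleep for the demo
    )
    print(log)
```

Passing `sleep` as a parameter keeps the sketch testable without actually pausing; in real use the default `time.sleep` applies.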