Can control all devices but status doesn't work


bsobel

Ok, I *think* I know what's going on: I suspect either the PLM or ISY has a bug in its COM port processing code.

 

I set up HouseLinc and added the problematic switch to it. What I am seeing in the HouseLinc event log vs. the ISY one is interesting.

 

Both appear to be receiving roughly the same data. However, HouseLinc logs the data as it occurs, while ISY often waits for another event to occur before outputting the log data.

 

For example, I walk up to the switch and turn it on (from an off state). HouseLinc logs the change immediately, but sometimes the ISY doesn't show the data in the log. If I then turn the switch off, HouseLinc shows the off, and ISY's log appears to show the ON AND THE OFF coming in, both with the same timestamp. So I think internally ISY is eventually seeing everything, but some of this data is sitting in a COM buffer until another event causes it to be flushed. It probably did process the 'on' and then the 'off', but since they came in together, my code triggering on the 'on' case probably never fired.
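To make the suspicion concrete, here is a toy model of a link that forwards data only once enough bytes have piled up. It is only a sketch of the hypothesis, not how the PLM or ISY is known to work: the class name, the 16-byte threshold, and the 11-byte message size are all made-up values for illustration.

```python
class ThresholdBuffer:
    """Toy model of a link that forwards data only once enough has piled up."""

    def __init__(self, flush_at):
        self.flush_at = flush_at  # hypothetical threshold, in bytes
        self.pending = []         # (event, size) pairs not yet delivered
        self.log = []             # (delivery_time, event) as the receiver sees it

    def receive(self, now, event, size):
        self.pending.append((event, size))
        if sum(s for _, s in self.pending) >= self.flush_at:
            # Everything held so far is delivered in one go, so every
            # entry carries the same (late) timestamp.
            self.log.extend((now, e) for e, _ in self.pending)
            self.pending.clear()


link = ThresholdBuffer(flush_at=16)          # assumed threshold
link.receive(now=0, event="ON", size=11)     # held back: nothing logged yet
link.receive(now=30, event="OFF", size=11)   # pushes both out together
print(link.log)  # [(30, 'ON'), (30, 'OFF')]
```

In this model the 'on' only becomes visible when the 'off' arrives, and both carry the later timestamp, which matches what the ISY log appears to show.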

 

It's not always the 'first' switch press in a test. Sometimes the data does show up in the log on the first transmission I cause, but within a switch press or two HouseLinc is seeing the data live and the ISY log is definitely buffering at least one previous event until another one causes it all to be flushed.

 

This is *VERY* reproducible and explains most if not all of my 'weird' symptoms.


Another follow-up. I went to some other switches which are more reliable. I've found all of them are sending more data (e.g. most have links to other devices), and that seems to generate enough data that the event viewer sees it in 'real time'.

It's the new switches/KeypadLincs linked only to the PLM that generate less data and are not being tracked by ISY. I verified this by holding a keypad button to generate bright/dim commands; that causes ISY to see the traffic sooner (since enough COM data was generated to process), and it tracks these 'problematic' switches in real time in that case.

 

I'm seeing this on ALL of my new switchlincs now that I'm looking for it.


This is interesting input because my problem switch is also one of my newest ones (I wanted to be sure I had a reliable switch in this location). Because it is a relay switch I can't verify the dim test but I wonder if my test of hitting the paddle multiple times managed to trigger the buffer to process the command?


I'm presuming yes, those additional paddle punches eventually cause enough traffic that the data is read and processed. That should be fairly easy to verify: run the event viewer and have someone toggle the switch for you. I figured out what the on/off commands look like from a switch that worked (albeit one that spat out additional data, which seemed to be enough to get it processed each time), and then clearly saw the missing events when I generated additional on or off commands.


bsobel,

 

Thanks so very much for the troubleshooting.

 

A little background:

There's no intentional buffering on the ISY side. ISY reads from the serial port until either:

a. It receives a complete package (the length depends on the type), or

b. A timeout (4 seconds) is reached, in which case the message is discarded and we start fresh
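The two rules above could be sketched roughly as follows. This is not ISY's actual code: the function name, the `read_byte` and `length_for` callbacks, and the choice to start the timer at the first byte are all assumptions made for the example.

```python
import time


def read_packet(read_byte, length_for, timeout=4.0, clock=time.monotonic):
    """Collect bytes until a full package is in hand, or give up on timeout.

    read_byte()     -> one byte as an int, or None if nothing is available
    length_for(buf) -> expected total length once it is known, else None
    """
    buf = bytearray()
    started = None
    while True:
        b = read_byte()
        if b is not None:
            if not buf:
                started = clock()  # assumption: timer starts at the first byte
            buf.append(b)
            expected = length_for(buf)
            if expected is not None and len(buf) >= expected:
                return bytes(buf)  # case (a): complete package
        if buf and clock() - started > timeout:
            return None            # case (b): discard and start fresh
        # A real reader would block on the serial port here, not poll.


# Fake four-byte "package" fed from a list, purely for illustration.
data = iter([0x02, 0x50, 0xAA, 0xBB])
packet = read_packet(lambda: next(data, None), lambda buf: 4)
print(packet.hex())  # 0250aabb
```

Note what a reader like this does with a partial message: it holds the bytes it has until the rest arrive or the timeout expires, so nothing is logged in the meantime.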

 

So, a better test would be:

1. Turn on the problematic switch

2. Within 4 seconds, press and/or turn on any other non-problematic device

 

Please let me know if you still see the same behavior: receiving the event for #1 and then #2.

 

Thanks and with kind regards,

Michel

 


Michel,

 

Thanks for the quick reply. I just got back home and did some quick tests. What I'm seeing is this:

 

If I try your test and activate key 1 (load on) on the problematic KeypadLinc (it was a SwitchLinc; I swapped it out to debug this), just doing that on/off generally does not generate any traffic that ISY's event viewer sees. I say generally because it seems to 'sometimes', almost randomly, catch an event (at least, I haven't figured out the pattern).

Activating another device which is not problematic does seem to almost always force ISY to see the KeypadLinc's transmission. I tested both a RelayLinc next to it and button 'A' on the same KeypadLinc. Pressing those both quickly did seem to get the log to report the separate devices activating. HouseLinc seems to be seeing the transmissions via its USB PLC reliably.

 

Is it possible your definition of a 'complete package' is too long for a type of insteon transmission? The main thing all of these switches have in common are:

 

1) They are relatively new (installed in batches over the last few months)

2) They were added to ISY by either 'new insteon device' or 'start linking'

3) They are NOT linked to any other devices <-- This seems key. I cannot find a switch which is linked to another device (including to my USB PLM, which HouseLinc uses) which shows this behaviour. All of my 'older' switches which were used in HouseLinc seem to be reliable in ISY. It's only the new switches, which only ISY knows about, that are problematic. However, adding the switch to HouseLinc didn't resolve the issue (albeit it's in the ?? mode, as I didn't let it do its walk of that switch; I didn't want it to screw up any links ISY needed).

 

Cheers,

Bill


Hi Bill,

 

Thanks so very much for testing!

 

The main problem is that if ISY gets a timeout, then it discards the whole packet. In case of no timeout, it's very possible that there's a bug in ISY but I cannot fathom why this has gone unnoticed for so long.

 

INSTEON standard packages (both ACK and TX) have fixed lengths and, thus, the only thing that I can think of is that the PLM is buffering them and only sends them to ISY when reaching a limit.

 

Can I send you a brand new SWL in exchange for yours? I shall pay for shipping both ways. This way, I can test and experience precisely what's going on.

 

With kind regards,

Michel

 


"Can I send you a brand new SWL in exchange for yours? I shall pay for shipping both ways. This way, I can test and experience precisely what's going on. "

 

Sure thing, but I'd suspect you'd be more interested in the PLM (which I got from you, for what it's worth). Any SwitchLinc/KeypadLinc not linked to another device (other than the PLM) seems to show this. It's like there isn't enough traffic generated to get something to notice.

I'm traveling until Thursday night, but I plan to drop a terminal program on the PLM when I get back and see if I can confirm whether it's buffering the data or not.

 

Bill


Good news/bad news...

 

I hooked up a terminal program to the PLM and can confirm the PLM is not sending the data up to ISY (I'm seeing the same delays with the terminal program).
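For anyone who wants to repeat this check without a terminal program, a timestamped logger along these lines might do the job. It is a sketch only: `log_arrivals` and `read_chunk` are hypothetical names, and the serial settings mentioned in the comments are assumptions to verify against your own PLM.

```python
import time


def log_arrivals(read_chunk, clock=time.monotonic, max_reads=None):
    """Yield (seconds_since_start, hex_bytes) for every chunk that arrives."""
    start = clock()
    reads = 0
    while max_reads is None or reads < max_reads:
        chunk = read_chunk()
        reads += 1
        if chunk:
            yield round(clock() - start, 3), chunk.hex(" ")


# With pyserial (third-party), read_chunk could be something like:
#   port = serial.Serial("/dev/ttyUSB0", 19200, timeout=0.1)
#   read_chunk = lambda: port.read(64)
# The device name and the 19200 baud rate are assumptions, not taken
# from this thread.

# Stand-in data source so the sketch runs without hardware.
chunks = iter([b"", b"\x02\x50", b""])
for seconds, data in log_arrivals(lambda: next(chunks, b""), max_reads=3):
    print(seconds, data)
```

Pressing the paddle and watching when the bytes actually show up, compared with when the press happened, separates "the PLM is holding the data" from "ISY is slow to log it".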

 

I was kinda hoping it was your bug (I loathe the idea of trying to get through to the right person at Smarthome, sigh...). I'm pinging my Smarthome contact now, see if he can help me out.

 

So, those of us with status problems probably need a PLM fix....


Since the PLM does work with some switches wouldn't the problem more likely be the newer generation of switches rather than the PLM? Maybe the PLM would send the data fine if it thought it got a valid message from the switch.

 


A couple of notes:

 

There is a mix of switches here. They are 'new' to me, but not always 'newer' firmware versions.

 

The PLC sees traffic from these switches fine, so they are absolutely sending valid messages.

 

The PLM does send the data; it just doesn't until there are additional events to report.

It seems to happen to devices which don't generate more than just a one-line status change. If I link them to something else they appear to work (since that generates more data).


Hello Bill,

 

Thanks so very much for your outstanding debugging effort. As per IndyMike, could it be that some of these devices are not sending a hop count of more than 1? But that would not explain why none of your devices send status updates. In short, I must agree with you that it's the PLM.

 

Thanks again so very much,

With kind regards,

Michel

 


I ruled out the hop count when I saw that, on a KeypadLinc, bright/dim is recorded correctly but on/off isn't seen (it's one of the commands that gets buffered until sometime later). Also, on the same keypad, while on/off doesn't come through, button A, which is linked to another device, is reliable 100% of the time. I figured those both ruled out the hop count (also, the PLC is seeing the command, a third reason...).


Hi Bill,

 

Yes, you are 100% correct. And, based on your input, we can also rule out missing links in the PLM since dim/bright work.

 

So, I guess all points to the PLM.

 

With kind regards,

Michel

 


Gentlemen,

 

I'd like to respectfully disagree - I'm still hung up on hop counts, but maybe in a different way.

 

Every device that I've installed (30+) has first been bench tested by attaching it to a power cord (similar to Upstate's test). This is essentially the same configuration that you are having problems with: single device linked only to the PLM. I have never seen a problem with the ISY recognizing a local device "press".

 

I can't rule out differences in newer version switches (I do have a number of V.2C Switchlincs and KPL's) nor can I rule out a firmware difference in the PLM (I'm using a Rev 2.75).

 

What I would like to point out is the basic difference in how the switches communicate with the PLM and the Hop counts used. The following was generated with a V.2C Switchlinc Relay on a power cord plugged into my PLM.

 

The LEFT pane shows the Command/Response when using the ISY to turn on the SWL directly. It uses a 3 hop count direct command and receives a 3 hop count response from the SWL.

 

The RIGHT pane shows how the SWL communicates when its paddle is pressed. In contrast to the ISY command, the SWL communicates using the Group broadcast mode (group of 2 devices - the SWL and PLM). The big difference that I see here is that the Group cleanup is communicated with a Maximum Hop count of 1 (1 retry if you will).

 

[Image: Insteon_Communication.JPG]

 

A single hop may simply not be enough for large installations like Upstate's. What I'm fearing is that the PLM is waiting for the Group Cleanup (or other transmission) prior to passing the information to the ISY. If the PLM doesn't receive this communication, the ISY never sees the Group command.
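For reading hop counts out of a capture, the flags byte of an INSTEON message can be unpacked along these lines. This is a sketch based on my understanding of the published message format (top three bits for the message type, one bit for extended, then two bits each for hops left and max hops). The example flag values are illustrative and are not taken from the capture above.

```python
MESSAGE_TYPES = {
    0b000: "direct",
    0b001: "ACK of direct",
    0b010: "group cleanup direct",
    0b011: "ACK of group cleanup",
    0b100: "broadcast",
    0b101: "NAK of direct",
    0b110: "group broadcast",
    0b111: "NAK of group cleanup",
}


def decode_flags(flags):
    """Split an INSTEON flags byte into its fields (assumed bit layout)."""
    return {
        "type": MESSAGE_TYPES[(flags >> 5) & 0b111],
        "extended": bool(flags & 0b0001_0000),
        "hops_left": (flags >> 2) & 0b11,
        "max_hops": flags & 0b11,
    }


# Illustrative values only:
# 0x0F decodes as a direct message with 3 hops left of a maximum of 3,
# 0xCB as a group broadcast with 2 hops left of a maximum of 3,
# 0x41 as a group cleanup with 0 hops left of a maximum of 1.
for flags in (0x0F, 0xCB, 0x41):
    print(hex(flags), decode_flags(flags))
```

Running captured flag bytes through something like this makes it easy to see at a glance which messages went out with a maximum hop count of 1.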

 

Upstate, you can verify this in part by plugging your "Problem" Icon in close proximity to the PLM. If the PLM still can't see the Icon, it would seem that we have a difference in the switch itself since your Switchlinc can communicate reliably.

 

If the PLM is able to see the Icon reliably, we may have part of the answer.

The question would then be - is this a PLM firmware problem or a device communication problem?

 

IM


"I haven't dug deep enough into the protocol, so possibly a silly question. Are the hop counts programmable or fixed?"

Bill,

I wouldn't call that a silly question - I asked it myself a while ago. My understanding is that the device hop counts are programmed in the firmware. I believe the ISY uses 3 hops for all comms.

I just got finished walking my test switch around my 4500 sq foot 3 level house. I could not force a failure even though I had disconnected my accesspoints (2). Unfortunately I can't simulate what you and Upstate are seeing.

What version PLM and KPL are you running?

Ok, I took the problematic Switchlinc and wired it into an outlet near the PLM. You were right, it is functioning correctly here. When tested in other parts of the house, the problem returns.

 

I suspect you may have nailed it with the group cleanup hop count of 1 vs the directed command hop count of 3.

 

A couple of questions/comments (for Michel or whomever)

 

a) If I'm following correctly, the switch is being programmed to do a group broadcast when pressed. The PLM is a member of the group, but the switch isn't expecting or enforcing any ACK from the PLM on its broadcast. So it sends out the group cleanup with a hop count of 1 and the PLM just doesn't see it. Does this sound right?

b) Is there a way to 'link' the SwitchLinc to the PLM so a direct broadcast would be sent as well? This seems my only hope of fixing this without a firmware upgrade to all of my switches/keypads and a rip/replace of all of them.

c) Am I missing an option to work around this? I have access points in most rooms (I think I have 7 in the house). But the hop count will keep it from relaying anyhow.

d) I also installed a hardwired signal bridge yesterday. No change.

e) SwitchLinc tested: 3.3. PLM: rev 2.9.

 

Bill


Bill,

 

First and foremost, you are the one who uncovered the fact that the PLM does not relay the information to the ISY in the absence of a cleanup command. I simply sought to explain that behavior (a theory at this point).

 

Thank you.

 

"a) If I'm following correctly, the switch is being programmed to do a group broadcast when pressed. The PLM is a member of the group, but the switch isn't expecting or enforcing any ACK from the PLM on its broadcast. So it sends out the group cleanup with a hop count of 1 and the PLM just doesn't see it. Does this sound right?"

 

That's the theory at this point. Somehow, when you have other devices in the group, the PLM sees the traffic and relays the original broadcast command to the ISY. Since your linked devices are in different locations you are increasing the chances that the PLM will hear them. I'm still very fuzzy on why the PLM would complete the transmission of the original broadcast command when receiving a cleanup for a different device address (maybe Michel can help).

 

 

"c) Am I missing an option to work around this? I have access points in most rooms (I think I have 7 in the house). But the hop count will keep it from relaying anyhow."

As I understand the message "hopping", 1 hop is sufficient to make it to an accesspoint and communicate (RF) with the other accesspoints on the net. If another hop is required to make it to the PLM, you'll be short.

If you don't mind me asking, how large is your installation (devices, sq feet, home layout)? As I indicated, my 3 story home can run without accesspoints installed.

 

"d) I also installed a hardwired signal bridge yesterday. No change."

I have one installed at my electrical panel as well. My PLM is also on a dedicated circuit at the panel (basement) with my accesspoints installed at the most distant points on the second floor.

 

In my view, using the signal bridge at the panel prevents the PLM from "burning" a hop when bridging the phases. In my configuration, this is "clear" for transmissions from the PLM to other devices. When other devices are transmitting to the PLM it's a crap shoot (not sure what the dominant path is).

 

"e) SwitchLinc tested: 3.3. PLM: rev 2.9."

 

Both quite a bit newer than my devices/PLM. We could still have issues with firmware revision here.

 

In general, I have an advantage in that I initially started using X10 in my home (home is 8 years old). In order to do that I had to track down signal absorbers (4 filters in the house) and boost the X10 signal at my panel. As a result I had a pretty good idea where my problem areas were prior to installing Insteon (3 years ago).

 

If you really need all 7 of your Accesspoints for your system to be reliable, either you have a much larger installation than I, or you have many noise/absorption points. If you feel that you have noise/absorption points, you may be better served by finding the "problem" devices and filtering them. This would presumably allow the cleanup messages to make it through without the accesspoints.

 

Just an opinion (with very little information to go on),

IM


"Upstate, you can verify this in part by plugging your "Problem" Icon in close proximity to the PLM. If the PLM still can't see the Icon, it would seem that we have a difference in the switch itself since your Switchlinc can communicate reliably.

If the PLM is able to see the Icon reliably, we may have part of the answer.

The question would then be - is this a PLM firmware problem or a device communication problem?

IM"

 

I need to make a correction here. My original problem switch is a SwitchLinc, not an Icon. At one point early on in this thread I noted that I also had some Icons that seemed to exhibit the same issue as my problem Switchlinc and Michel had me go and test all of the Icons with the same firmware rev to verify a theory he had. This may or may not be part of the same problem that I had with the original SwitchLinc Relay that I eventually swapped for another Switchlinc Relay to solve the problem. The problem SwitchLinc that I swapped with my working test switch is a relatively new 2476S with a hardware version of V2.5

 

I now have this problem switch on my test lead and I can plug it directly into the PLM and test it there if you want me to. (but probably not until later this evening)


Home is about 4800sf, we have about 129 devices (switch lincs, relay lincs, keypad lincs being the majority, and some appliance/lamplincs as well).

 

I'm honestly not sure I needed 7 access points; I had two and the system seemed OK. I started adding more to debug this issue. I'm pretty sure I could remove them all and just lose the RF controllers at this point.


I tried the problem switch plugged directly into the PLM and it still operated in a flaky manner. In a series of 7 On/Off cycles only 3 commands registered.

 

At first when I tried it I noticed the X-10 address from some earlier testing was still set and those X-10 commands registered every time. So I got a string of X-10 On and Off commands with only a few Insteon ones mixed into the series.

 

I removed the X-10 address but that did not make the Insteon reporting any better.

 

Maybe this is just a bad switch and not related to the problem others are seeing.


Archived

This topic is now archived and is closed to further replies.

