
Programming in response to communication errors


porscheguy


I've searched for a while trying to find a post dealing with this issue but decided to just ask a question:

 

I have what seems to be a pretty solid Insteon-only (no X-10) system. I have no problem communicating with all devices manually, and under program control I get pretty reliable performance. But I occasionally get errors (-2) where no acknowledgement of the status occurs, and this usually means that the device does not do what it has been commanded to do.

 

I can and will continue to work at getting a better network but what I want to know is how - under program control - to deal with errors when they occur.

 

Whenever I've experienced an error, the next time I try to communicate with the same device it works fine, so it is an infrequent event. What I'd like is a way to detect the error and repeat the command until it "takes" or, if it doesn't after a few attempts, have it notify me by email.

 

I see nothing in the program capabilities that detects an error, so I can't use one in a condition.

 

What is the best and most general way to detect and respond to an infrequent error of this type automatically (or automagically) when it occurs?

 

Thanks for any help -


Hmmm -

 

Obviously I'm new at this, but I find this troubling. Insteon touts its product as very robust, but in fact it isn't. The forum is full of users who are happy to finally achieve 98% reliability. So if I have a control system that controls things most of the time, but occasionally doesn't, I at least want some way of knowing when there is a problem. And I'd like to know right away, not a week later when I return home, or when I happen to log in from a remote location and find the error in a log file.

 

At least so far, the only problems I have are due to device A (for example, a relay) being told to activate (or deactivate) and not doing it. In this case the error is that a status request goes unanswered and the ISY flags this as an error.

 

Is there no way that I can get a notification (email) whenever there is an error of any kind?


FYI - I just had another of those "infrequent" failures to communicate. A pool pump controlled by an EZIO40 failed to operate because of a communication error. It had worked for weeks, but today it decided not to communicate. I get a nice dialogue on my ISY saying that that particular device failed to communicate. It seems it would be very "easy" (for you, that is) to generate an optional notification at the same time the dialogue is generated.

 

 

However, in the log, I note something that I don't understand. I don't know yet how to cut and paste from the log and get it to look right, but here is a segment:

 

PoolEquipment / ValveDirection On 255 Tue 2011/04/05 11:00:00 AM Program Log
PoolEquipment / ValvePower On 255 Tue 2011/04/05 11:00:00 AM Program Log
] 02 02 [UNKNOWN ] 02 02 Tue 2011/04/05 11:00:01 AM System -2
PoolEquipment / PoolPump Tue 2011/04/05 11:00:05 AM System -2
PoolEquipment / ValvePower Off 0 Tue 2011/04/05 11:00:40 AM Program Log
PoolEquipment / ValveDirection Off 0 Tue 2011/04/05 11:00:40 AM Program Log
PoolEquipment / ValveDirection Status 100% Tue 2011/04/05 11:00:40 AM System Log
PoolEquipment / ValveDirection Status 0% Tue 2011/04/05 11:00:41 AM System Log

At 11:00 AM the commands to turn on ValvePower and turn ValveDirection off are correct. I see the commands going out and then, later, the status showing that they were correctly executed. But in the middle of all this there are the two -2 errors and the UNKNOWN indicator. None of this should be there; there is no communication scheduled for the PoolPump at this time. What is all this about?

 

All of these - ValvePower, ValveDirection and PoolPump are relays in the same EZIO40. Is this somehow relevant?


Hello porscheguy,

 

-2 means device communication error. UNKNOWN means ISY received a message from a device with an unknown address. Subsequent to the Unknown, there's another -2 for your EZIO ...
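Until something like this is built in, one stopgap would be to scan an exported log for the -2 entries Michel describes. Below is a minimal Python sketch, assuming the log rows look like the ones pasted above (a subject, then a "Tue 2011/04/05 11:00:05 AM" style timestamp, then the System/Program column, then a code). The regex, `find_errors`, and `LogError` are hypothetical helpers, not part of the ISY, and the real export format may differ (for example, tab-separated columns).

```python
import re
from typing import Collection, Iterable, Iterator, NamedTuple

# Hypothetical row layout, inferred from the log excerpt pasted in this thread:
#   <subject ...> <Day YYYY/MM/DD h:mm:ss AM|PM> <System|Program> <code or "Log">
LOG_ROW = re.compile(
    r"^(?P<subject>.*?)\s*"
    r"(?P<when>(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun) \d{4}/\d{2}/\d{2} \d{1,2}:\d{2}:\d{2} [AP]M)\s+"
    r"(?P<source>System|Program)\s+(?P<code>\S+)$"
)


class LogError(NamedTuple):
    subject: str  # device name, or raw bytes for an UNKNOWN sender
    when: str
    code: int


def find_errors(lines: Iterable[str], codes: Collection[int] = (-2,)) -> Iterator[LogError]:
    """Yield System rows whose numeric code is in `codes` (default: -2, device comm error)."""
    for line in lines:
        m = LOG_ROW.match(line.strip())
        if m is None or m.group("source") != "System":
            continue  # not a recognizable row, or a Program entry
        try:
            code = int(m.group("code"))
        except ValueError:
            continue  # "Log" rows are ordinary status entries, not error codes
        if code in codes:
            yield LogError(m.group("subject"), m.group("when"), code)
```

Run against the excerpt above, this would flag both the UNKNOWN row and the PoolPump row. Passing a different `codes` collection is one way to choose which errors matter; how a match then turns into an email is left open.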

 

Yes, we are sizing this requirement and hopefully we can include it in a future build.

 

With kind regards,

Michel


Thanks Michel - I think that would be very useful. Just as a thought, maybe if something like this were available, the user could select which specific errors would result in a notification; any error, or a set of certain selected errors.

 

On another tack, it would also be nice if an error could trigger a condition in a program, in which case a repeat loop could be used to try to resolve the error.

 

Thanks again -


Hello porscheguy,

 

Actually that would not be a good idea since, in case of extreme communication problems, your system is going to go into an infinite loop trying to rectify it. Although we cannot prevent this scenario, I strongly recommend limiting the actions to a notification.

 

With kind regards,

Michel


I guess what I was thinking is that in the cases I experience, the first retry seems to succeed, although currently there is a significant time lapse between an error and my finding out about it and trying to correct it. If a program condition were available, I would use a repeat with a wait of some time, maybe 10 seconds, and retry. I'd also give up and send a notification after some number (say 5) of unsuccessful retries.
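The retry-then-notify plan described above, with a hard cap that also answers Michel's infinite-loop concern, could be sketched in Python as follows. This only illustrates the control flow: `send` is a hypothetical callback that returns True once the device's status confirms the command, and `notify` is a hypothetical callback (for example, one that sends the email); neither is an ISY API.

```python
import time
from typing import Callable


def send_with_retry(
    send: Callable[[], bool],
    notify: Callable[[str], None],
    retries: int = 5,
    wait_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run `send` once, then retry up to `retries` times, `wait_s` seconds apart.

    Returns True on the first success. If every attempt fails, calls `notify`
    exactly once and returns False, so the loop always terminates.
    """
    for attempt in range(retries + 1):
        if send():
            return True
        if attempt < retries:  # no pointless wait after the final attempt
            sleep(wait_s)
    notify(f"Command still failing after {retries} retries; giving up.")
    return False


# Usage with hypothetical callbacks: a command that only "takes" on its third try.
outcomes = iter([False, False, True])
ok = send_with_retry(lambda: next(outcomes), notify=print, sleep=lambda s: None)
print(ok)  # True
```

With the defaults (5 retries, 10 s apart), the worst case is six attempts and 50 s of waiting before the notification goes out. The `sleep` parameter exists only so the waiting can be stubbed out in tests.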

 

If there is a major communication problem I'm up a creek anyway and a notification would hopefully go out and I'd get around to finding out what is going on. If it's optional - let the buyer beware!

 

But anything in this direction would be appreciated.

 

Thanks Michel - I guess that's enough on this subject.


After converting from X-10 to Insteon over a 2-year period, my installation has for the most part been pretty stable since my initial communication woes. But I have noticed the same occasional random errors pop up on my Console that porscheguy is experiencing.

porscheguy wrote: "Whenever I've experienced an error - the next time I try to communicate with the same device it works fine"

However, a query or other action on that device shows that it is working just fine. But I share porscheguy’s concern that Insteon is not the robust, dependable system Smarthome touts.

https://docs.google.com/leaf?id=0B3JB9HSAQ97hNWZmNzA5YmItYzEyYi00ZGFlLTk2M2MtYTVlYTNlMmIzOGM0&hl=en

Further, it seems to take an inordinate amount of time to resolve all the issues that may pop up. For example, I have a v1.0 Keypadlinc that has operated fine for years with no communication issues. That device finally failed and Smarthome stepped up and did replace the device due to known issues with that version. However, after swapping out the device with a new v5.1 Keypadlinc (and no other changes), I am having inconsistent communication issues with the new device.

 

I’ve been scratching my head trying to resolve the problem, but the ISY just keeps cycling me through a loop and the errors that pop up are not understandable. After linking the new device and adjusting scenes, I get a ‘cannot communicate with device’ message, or other errors pop up, every time I try to write to the device (see link above). The error log shows:

Wed 2011/04/06 05:11:39 PM System -170001 [UDSockets] RSub:29 error:0
Wed 2011/04/06 05:11:49 PM System -5012 n/a
Wed 2011/04/06 05:11:49 PM System -170001 [UDSockets] HTTP:24 error:0
Wed 2011/04/06 05:11:49 PM System -170001 [UDSockets] HTTP:28 error:0
Wed 2011/04/06 05:11:59 PM System -170001 [UDSockets] HTTP:28 error:0
Wed 2011/04/06 05:12:09 PM System -170001 [UDSockets] HTTP:28 error:0
Wed 2011/04/06 05:12:19 PM System -170001 [UDSockets] HTTP:28 error:0
Wed 2011/04/06 05:12:29 PM System -170001 [UDSockets] HTTP:28 error:0

When I tried writing to the device later this morning, I got error:6. What is weird is that all links and scenes appear to have been written to the device: I can control the lights from the console and from the KeypadLinc. But the device's status icon goes from 1101, to Writing, to a red exclamation mark (errors pop up). I can clear the red exclamation mark by querying the device, and its status goes back to 1101.

 

I am wondering if there is some element of this device (related to communication) that may be defective. But how does one figure this out?


I got these errors after updating to 3.1.2. It put me in a loop, very slowly trying to write to something over and over again. Note the 10 sec retries.

 

Since I'm not using variables I downgraded to 2.8.16 which seemed to have cleared up the problem.


sanders2222

 

We all wish there was a tool that would definitively measure and display powerline quality at particular locations.

 

The best way I have found to determine whether the problem with a particular device is the powerline or the device itself is to move the device. This is technically more difficult with a wired device, but it is possible. Home improvement stores carry a 3-wire appliance cord with a plug on one end and bare wires on the other. Connect the new KeypadLinc to the appliance cord and move the plug point close to the ISY PLM plug point.

 

Do a Query, which should start any pending write operations. If they complete at the alternate site, the KeypadLinc is okay.

 

Testing new wired devices in this manner is a good idea. Rarely is a device defective out of the box, but it does happen. Testing with the appliance cord is very easy with a device right out of the box. When that test works, any problem at the install point can be attributed to the location or the load it is controlling.

 

Lee


Thanks for the tip LeeG. I have some of those cords with plug on one end in my garage already. I've salvaged them from the past and have been looking for some use for them.

 

In another recent post, Michel wrote

V35 problems:

1. Scenes are activated intermittently

2. Intermittent network communication errors on ALL your devices (not just v35s)

 

In short, if you have V35s in your system, I am almost certain that all your communication errors are originating from those devices.

I noticed my KeypadLinc shows up as v35 on the ISY console, but the little sticker on the switch says v5.1. I also have another switch that shows v35 on the console. Is the version shown on the ISY the one Michel is referring to? My symptoms are identical to 1 and 2 above.


 

I understand that there are hardware versions and software versions, and that it is the version shown on the ISY admin console that should concern you if it is v35. It sounds like you may have an issue with the v35 devices.

As I understood it, the v35 issue was only with SwitchLinc relays (possibly some SL dimmers also?) but not with KPLs. Can anyone confirm?

 

That is also consistent with my recollection, but a more authoritative source would be better.


I'm back again. I decided to continue in this topic because of my issues with errors and wanting to be able to deal with errors in a programming methodology. But it brings to mind a question that probably is better handled in a different forum area, and maybe someone can point me to the appropriate area.

 

The question is: why do I get these errors? Here is my reason for asking. I have device D200 (the 200 means it is at least 200 ft away from the PLM and ISY), another device D100, and another device D1. (Don't ask me right now which power leg, or how many other Insteon devices are on each path, because I don't know at the moment!) But when I send on/off commands to each of these, I get the same basic result in the Event Viewer (without the timing):

 

[INST-ACK ] 02 62 16.6A.3F 0F 13 00 06 LTOFFRR(00)
[INST-SRX ] 02 50 16.6A.3F AA.AA.AA 2B 13 00 LTOFFRR(00)
[Standard-Direct Ack][16.6A.3F-->ISY/PLM Group=0] Max Hops=3, Hops Left=2

 

What this indicates to me is that in all cases, whether 200 ft away, 100 ft away, or 1 ft away, the signal gets through with 1 hop (or is it 0 hops?). No problem as far as signal strength. With such seemingly good communication, why do I get any errors at all from these devices? With additional hops available, and with the "retries" that Insteon claims as part of the "Insteon Engine", it would seem that I shouldn't have any errors.
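The hop counts come from the message-flags byte in these frames: 0F in the outgoing 02 62 line and 2B in the device's 02 50 reply. Below is a minimal Python sketch that decodes it, assuming the standard INSTEON flags layout (bits 7-5 message type, bit 4 extended, bits 3-2 hops left, bits 1-0 max hops). `decode_flags` is a hypothetical helper, and "hops used = max hops - hops left" is the usual reading, not something the Event Viewer prints.

```python
from typing import Dict, Union

# Message-type field (bits 7-5) of the INSTEON message-flags byte.
MESSAGE_TYPES = {
    0b000: "Direct",
    0b001: "ACK of Direct",
    0b010: "Group Cleanup Direct",
    0b011: "ACK of Group Cleanup Direct",
    0b100: "Broadcast",
    0b101: "NAK of Direct",
    0b110: "Group Broadcast",
    0b111: "NAK of Group Cleanup Direct",
}


def decode_flags(flags: int) -> Dict[str, Union[str, bool, int]]:
    """Split a flags byte into message type, extended bit, and hop counts."""
    max_hops = flags & 0b11
    hops_left = (flags >> 2) & 0b11
    return {
        "type": MESSAGE_TYPES[(flags >> 5) & 0b111],
        "extended": bool(flags & 0x10),
        "max_hops": max_hops,
        "hops_left": hops_left,
        # Each retransmission by a repeater decrements hops left, so this is
        # how many times the message was relayed before it was received.
        "hops_used": max_hops - hops_left,
    }


print(decode_flags(0x0F))  # the command the PLM sent (02 62 ... 0F ...)
print(decode_flags(0x2B))  # the device's reply (02 50 ... 2B ...)
```

For 0x2B this reports an ACK of Direct with max hops 3, hops left 2, and hops used 1, matching the Event Viewer's "Max Hops=3, Hops Left=2". For the outgoing 0x0F it reports a Direct message sent with all 3 hops available.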

 

Let me note that I have received errors from devices at times when there is no other communication activity anywhere near the time of the error, and early in the morning when no powerline activity at all should be present.

 

I have my event viewer set to Device Communication Events, which I think gives maximal communication info. It would seem to me that the number of hops required should be an indicator of network quality. I agree with LeeG that some device that could record or measure signal quality, noise, etc. would be helpful.

 

What I want to know is:

 

1) When an error occurs, e.g. -2, will the Event Viewer show the supposed retries that Insteon claims are part of the Insteon Engine?

2) As far as the Insteon retries, how much time is allowed to elapse between retries?

3) Does the ISY, as part of its software, try to do any communication resolution of its own?

 

Thanks anyone -


porscheguy

 

1) When an error occurs, e.g. -2, will the Event Viewer show the supposed retries that Insteon claims are part of the Insteon Engine?

 

Insteon command retry is automatic, done by the firmware in the Controller. In this case the PLM would be the Controller. It will not show up as additional commands issued by the ISY.

 

2) As far as the Insteon Retries, how much time is allowed to elapse between retries?

 

The retry would be in milliseconds. It has to do with time slots on the powerline, which is too detailed to describe here. All the gory details are in the insteondetails.pdf document on the insteon.net web site.

 

3) Does the ISY, as part of its software, try to do any communication resolution of its own?

 

Not sure what you are asking here. As noted before, Insteon command retry is automatic, done by the Controller device. If configuration commands such as updating the link database fail, the ISY does queue these for later. The device node will have a green icon to the left if pending updates are queued. Updates could also be queued because an RF device is asleep, so the green icon does not always indicate a powerline issue.

 

Powerline quality is never an absolute. If it is good at 10:00 AM it may not be good an hour later. Depends on what devices (and at times combinations of devices) are actually running/powered at the time. One individual had a new AC installed. The fan generated enough interference to create problems. I think this is an exception but it shows how a problem can be cyclic in nature.

 

Lee


Hello sanders2222,

 

You can ignore the UDSockets errors; they basically mean that the ISY is busy doing something that is holding up the HTTP task. When you do any type of writing to devices (restore, link, writing pending updates) that might take a long time, one or more HTTP tasks are held up until the process is finished. The warnings in the log can be ignored.

 

porscheguy,

 

If you cannot communicate with a device, there are multiple possible causes, one of which is the signal not reaching its destination. It's very difficult to figure out why you are having the errors without the logs for each error showing exactly what transpired.

 

To answer your questions:

1. ISY retries 3 times sending the same command

2. The number of native INSTEON retries is NOT captured in the Event Viewer

3. Above and beyond retrying, ISY does not do anything to resolve communication errors

 

With kind regards,

Michel


Michel

 

I did not think the ISY retried in addition to the Controller retry.

 

I guess I am reporting a bug under 3.1.2. I created a situation where a device does not respond to a Program initiated On command. I did get an error popup but see no retry in the event log for the On command issued by a Program. I see the same no retry for an On button click under the Admin Console.

 

Fri 04/08/2011 02:43:58 PM : [ Time] 14:44:00 0(0)
Fri 04/08/2011 02:43:58 PM : [INST-ACK ] 02 62 0D.4B.82 0F 11 FF 06 LTONRR (FF)

At this point I did a Query, which is retried, perhaps by the Admin Console since that is the app that issued the Query.

Fri 04/08/2011 02:48:28 PM : [INST-ACK ] 02 62 0D.4B.82 0F 19 00 06 LTSREQ (LIGHT)
Fri 04/08/2011 02:48:37 PM : [INST-ACK ] 02 62 0D.4B.82 0F 19 00 06 LTSREQ (LIGHT)
Fri 04/08/2011 02:48:46 PM : [INST-ACK ] 02 62 0D.4B.82 0F 19 00 06 LTSREQ (LIGHT)
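A quick way to measure the spacing of those three LTSREQ (status request) attempts is to difference their timestamps. Below is a minimal Python sketch, assuming the Event Viewer prefix format shown above ("Fri 04/08/2011 02:48:28 PM : ..."); `retry_gaps` is a hypothetical helper, not an ISY feature.

```python
from datetime import datetime
from typing import List

STAMP_FORMAT = "%a %m/%d/%Y %I:%M:%S %p"  # e.g. "Fri 04/08/2011 02:48:28 PM"


def retry_gaps(lines: List[str]) -> List[float]:
    """Seconds between consecutive Event Viewer lines, using the text before ' : '."""
    stamps = [datetime.strptime(line.split(" : ", 1)[0].strip(), STAMP_FORMAT) for line in lines]
    return [(later - earlier).total_seconds() for earlier, later in zip(stamps, stamps[1:])]


log = [
    "Fri 04/08/2011 02:48:28 PM : [INST-ACK ] 02 62 0D.4B.82 0F 19 00 06 LTSREQ (LIGHT)",
    "Fri 04/08/2011 02:48:37 PM : [INST-ACK ] 02 62 0D.4B.82 0F 19 00 06 LTSREQ (LIGHT)",
    "Fri 04/08/2011 02:48:46 PM : [INST-ACK ] 02 62 0D.4B.82 0F 19 00 06 LTSREQ (LIGHT)",
]
print(retry_gaps(log))  # [9.0, 9.0]
```

So the three Query attempts went out 9 seconds apart, far longer than the millisecond-scale retries the PLM performs on its own.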

 

Lee


Hello Lee,

 

Thanks so very much for the clarification. I was totally wrong: program execution does not go through message handlers which impose the 3 retry rule. On the other hand, anything that has to do with database write/read is retried 3 times.

 

I am so very sorry for the confusion.

 

With kind regards,

Michel


Archived

This topic is now archived and is closed to further replies.

