EDIT: NO JOY, problem recurred within 24 hours. See here, and down thread ... https://forum.universal-devices.com/topic/46764-multiple-devices-become-unresponsive-not-solved/

--

I have recovered from this, I think, but can't be certain there aren't still some issues. I am wondering if there is something more I should do, like certain diagnostic steps, or even potentially replacing the PLM. In some 15 yrs of ops, I have never seen quite this kind of failure.

The System

The system is large, 100+ devices, mostly Insteon, a bit of Zwave (700 series dongle), on a Polisy running "LTS" 5.4.4. Zwave was added during the time of Insteon troubles. I'm stuck at 5.4.4 as the Polisy OS won't support further upgrades, and the bit of Zwave now complicates a move (reflash/restore etc.), so I've been sitting on it for a bit.

The Problem

One morning I noticed that ALL of my wireless Insteon devices stopped working (ISY would not see any transmits). That includes the 2842 motions, the newer 2844 motions, and the mini two & four key 2432 remotes. In addition, a number of the 2334 eight/six button units would also not be seen by the ISY. Curiously, scenes that toggled the state of the buttons on these still functioned (so ISY saw no TX, but the units accepted RX). Zwave devices were not affected.

In the UI, no device showed as not responding or gave any indication of offline status, except that the wireless devices' pages showed blank fields (no ON or OFF status, etc.) and I was not able to QUERY or change settings. There was no "nine blink" indication on the motion devices (indicating network connection loss), nor the "three blink" showing low battery.

What I Did

Scratched my head for a while, checked the diagnostic screens, nothing obvious to me. Then on a lark, starting with the simplest thing, I put one of the wireless devices in link mode and did a simple RESTORE DEVICE command. I saw the usual activity dialogs, and then amazingly, the device started working normally!

I then tried this with one of the 2334 eight button panels, and that restored its function as well. I went through each of the devices (that I know faulted) and did the same. They all started working.

The Question

How did this happen? Do I need to do anything more? Were all of the wireless devices, or matching tables in the PLM, somehow selectively memory corrupted? But what about the eight button panels too? Is the PLM starting to fail? Certainly I've seen random devices fall off for control when that happened in the past, but never such a logically grouped set of devices, nor could I restore function as easily.

With the device restores the system seems to be back up and running. Pretty sure the PLM is a newer version, with the updated capacitors.

Thoughts??

Orest

Why did multiple devices become unresponsive?

Thursday at 06:12 PM3 days

Author

So, perhaps the old failing PLM, continually farkled up the device tables to some extent, taking a bit of time to corrupt most of the devices -- enough that device transmissions failed. A replacement of the PLM, just carried that bad data (sourced from the old PLM table) into the new PLM, much as @paulbates is suggesting.

Finally, the device RESTORE (vice just the PLM replacement procedure), from the Polisy, WITH a new PLM in place finally fixes it as there is no new/further corruption process.

Maybe wishful thinking, but so far the observations would be consistent. Fingers crossed, will report back.

Orest

Edited Thursday at 06:16 PM3 days by oskrypuch

Thursday at 10:45 PM3 days

Author

And, it is the same pattern with all the devices I've looked at. There are one or two extra lines in the device table, sometimes with no high water mark null line.

Otherwise, all the lines match.

A device restore updates the device table to match the ISY table, and then the device starts fully working. (that is its transmissions are then seen by Polisy)

I've fixed a dozen or so, and will monitor to see if they "unfix", like earlier with the old PLM still in place.

Orest

Edited Thursday at 10:47 PM3 days by oskrypuch

Friday at 12:57 PM2 days

18 hours ago, oskrypuch said:
Poking around the link tables, here is a capture of the device and ISY tables for a given device, when it is not working correctly ...
And here it is, AFTER a device RESTORE (as per above) of that device (and return to functioning) ...

They match now, and it is the device that was changed, the last three entries were deleted, and a high water mark was added.
I don't know exactly what that means, but probably someone here can decode that.
Now, to see if the disparity in the tables (and loss of function) recurs.
We might be getting somewhere.
Orest

Your link table is a bit odd, but I don't see anything that would prevent the device from communicating with the PLM

Observations:

Lines 1 and 6 (red) are responder link records for group 0 (normally the PLM) device @71.B3.6E. This is the link that allows your PLM to control the device. I've highlighted the links because they are essentially duplicates - this shouldn't happen.
Line 5 is a responder link record for group 0 device @58.23.C7. This is another controller in your system that can control your device. Again it is not normal. If this device were to change the state of the device the PLM/ISY would probably Not know about it.
Line 4 is the group 1 controller link to your PLM @71.B3.6E. Your device uses this link to communicate local changes back to the PLM/ISY.
Line 3 is a group 1 controller link to device 46.7A.2B. When you turn on your local device it will communicate with this device.
Line 2 is a group 4 responder link to device 46.7A.2B.

Bottom line, I don't see anything that would prevent this device from communicating with the PLM (unless the duplicate records are screwing things up).

Please determine what the controller device @58.23.C7 is. This may be a "leftover" from your old PLM.

Link tables don't get written to unless you ASK the ISY to write them. Link tables ARE modified when changing device backlighting and adjusting scene settings. If you have programs that modify either, you may wish to disable them to troubleshoot.

If you find your link tables again being modified, please post back.

Friday at 10:18 PM2 days

Author

@IndyMike

Thanks so much for explaining the tables. YES, 58.23.C7 is the old PLM. Not sure why it was not removed with the PLM replacement procedure, but perhaps that doesn't happen.

But, regardless, having both control devices listed in there would not be part of the issue, as the problem existed before the new PLM replaced the old.

FWIW, doing a device restore does remove that duplication in the table.

But, having the cleaned up device link table doesn't make a difference either. Even though I did device restores on several devices as a test, cleaned up the tables, the problem recurred over night in these devices, and the tables did not corrupt -- as you suggest, that shouldn't happen, and it didn't.

So, boiling it down even further...

1) with this fault condition, any device that needs to transmit state to the PLM, fails to do so.

2) performing a device restore on a given device alone, immediately allows it to function again (temporarily), and information (a keypress, motion detected, etc.) is once again seen by the Polisy from the device

3) function is restored for less than 24 hours, and then it returns to fault state "1" above -- this point in particular is a real head scratcher to me -- this is probably the key to the puzzle, understanding why this happens

4) outward (Polisy -> device, ON/OFF etc.) controls of the devices work normally and are not affected

5) grouped scenes of devices, with one device a control device, and actions independent of the PLM/Polisy work normally

6) this is not related to the state of the device table, that was a canard, even though the tables were "fixed" as of today in those devices I was testing, the fault pattern recurred.

I do have one set of routines that changes the illumination of the LED keys on the 8 button panels, but only a few of them.

The fault state affects EVERY device, wired, wireless, across the entire system, with the common point being that they transmit information/state to the Polisy.

I am now starting to wonder if there is a problem with the Polisy, perhaps some bad memory developing or similar, or a corrupted firmware, or both. And, somehow, the "device restore" cleans up something in a table in the Polisy as well.

I'm really not sure where to go. I would hate to replace the Polisy (a huge undertaking), and then find out that was not cause, but there seems little else to attack at the moment.

A bit tongue in cheek, but if I did a device restore on every affected device, once every six hours or so, that might step around the issue! Does not seem practical, and not even sure how you would do that, clearly not possible to do for the wireless devices.

Orest

Edited Friday at 10:27 PM2 days by oskrypuch

Friday at 10:30 PM2 days

If it really does this over the 24 hours following a device restore and you suspect the Polisy, would you be willing to try device restores and then powering off the Polisy for 24 hours, as a test?

Friday at 10:37 PM2 days

Author

4 minutes ago, Guy Lavoie said:
If it really does this over the 24 hours following a device restore and you suspect the Polisy, would you be willing to try device restores and then powering off the Polisy for 24 hours, as a test?

As I say, I'm game for anything, it is very frustrating and puzzling. We can do a dumb home for a day, it is half-dumb now!

But, what is your thinking on this? To confirm/exclude that it is something the Polisy is doing or internally faulting, or that the Polisy is doing/faulting after running for a while?

Orest

Edited Friday at 10:39 PM2 days by oskrypuch

Friday at 10:49 PM2 days

4 minutes ago, oskrypuch said:
As I say, I'm game for anything, it is very frustrating and puzzling. We can do a dumb home for a day, it is half-dumb now!
But, what is your thinking on this? To confirm/exclude that it is something the Polisy is doing or internally faulting, or that the Polisy is doing/faulting after running for a while?
Orest

It's so unusual that I don't really have a going theory. But a whole bunch of device link tables getting modified, seemingly all within a short time period, is not something that many different things can do. Another alternative could be to just disconnect the PLM from the Polisy, if it's also involved in doing non-Insteon things. In fact, if you try this and it tries unsuccessfully to update devices overnight, maybe you'd get telltale error messages in a log file.

Weird problems like this sometimes need to be tested by proceeding by elimination.

Friday at 11:10 PM2 days

Author

Just thinking, the other valuable data-point would be to know precisely what the RESTORE DEVICE command does, as it does reset the problem, even if only temporarily.

But, for that, I think we would need to have the developers comment.

@Michel Kohanim

Orest

Edited Friday at 11:13 PM2 days by oskrypuch

Friday at 11:15 PM2 days

Just now, oskrypuch said:
Just thinking, the other valuable data-point would be to know precisely what the RESTORE DEVICE command does, as it does reset the problem, even if only temporarily.
But, I think we would need to have the developers comment on that.
Orest

What it does is write out its saved configuration of that device's link table to the device. It's handy as a backup, but is also used with the device replacement function when replacing a bad device with a new one.

Saturday at 03:58 AM1 day

Author

Although I thought earlier, that the change in the device link table was pertinent, presently, with no change to the link table the fault occurs. There were some stray and duplicated entries previously, but they were not significant to this issue.

And now (with the device link table "fixed"), the device restore function doesn't actually end up changing anything in the device's link table, yet results in an immediate restoration of the function of the device.

I understand in a global sense what the device restore function does, but perhaps there is some internal data table or cache fix up that happens as well. Sure I am grasping at straws, to try to explain (and mitigate) what I am seeing.

I just now have "device restored" one 8 button panel, it now works, for example one of the buttons opens/closes the curtain, button pushes from this device are now seen by Polisy and the correct program is triggered with effect. With that device restore the device link table did not change, so it must have done or triggered something else to happen. This fix up effect is immediate, and so far, 100% consistent -- once the fault occurs.

I will be monitoring it for function as well as any change in the device link table. If it continues acting as devices have in the last few days, by tomorrow it will be no longer working.

I'll also look at giving the Polisy a 24hr power off "rest", to see what happens.

Orest

Edited Saturday at 04:28 AM1 day by oskrypuch

Saturday at 11:05 AM1 day

@oskrypuch , I'm going to try to be brief here (not easy for me).

I do not think the device tables are your problem. They are a symptom.

When you do a device restore, you Also write corresponding records to the PLM. I believe this is what is resetting your devices. Yes, that would mean that your PLM is again losing links and the restore process is resetting them.

Possibilities:

PLM memory exceeded. Unlikely given the fact that you seem to have a medium sized system and Many devices go offline at the same time.
SDCard failing (not sure if the Polisy still uses SDcards). Check the card and/or the SSD/hdd for errors. EDIT: The PolISY appears to use an SSD for its filesystem. Unlikely that this is the issue.
Communication errors between the ISY and PLM. Check cable

Recommendations:

Discontinue any programs that adjust backlighting or scene levels. These will cause writes to ~~both the PLM and~~ devices
Perform a PLM restore
Perform a "Show PLM Links Table". This can be difficult because network traffic will interrupt the process. Once you are satisfied that you have a valid Link Table count (perform this several times to get a consistent #) save the table.
If/When your devices stop responding, perform another "Show PLM Links Table". My guess is that it will be very different.

My guess is that, for whatever reason, an end of table record is being erroneously written to the PLM. When a device tries to communicate to the PLM, the PLM hits the erroneous End Of Record (does not find the device link address) and ignores the device.
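
To make that guess concrete, here is a toy Python model of a link table that is searched from the top until an end-of-table record is hit. It is only a sketch, not the PLM's firmware: the record layout is cut down to (flags, group, address), the flag values are assumptions, and the table contents are made up (the addresses are simply borrowed from this thread).

```python
# Toy model of a link-table search; NOT the PLM's actual firmware logic.
# Assumptions: a record whose flags byte is 0x00 marks the end of the table,
# and bit 7 of the flags byte (0x80) means "record in use".
END_OF_TABLE = 0x00
IN_USE = 0x80


def find_link(table, address, group):
    """Scan records in order and stop at the first end-of-table record."""
    for flags, rec_group, rec_address in table:
        if flags == END_OF_TABLE:
            return None  # nothing past this point is ever examined
        if flags & IN_USE and rec_group == group and rec_address == address:
            return (flags, rec_group, rec_address)
    return None


# Made-up table contents, for illustration only.
healthy = [
    (0xE2, 0, "11.B2.60"),
    (0xA2, 1, "46.7A.2B"),
    (0xA2, 1, "0E.67.7F"),
    (0x00, 0, "00.00.00"),  # legitimate end-of-table record
]

# Same records, but with a stray end-of-table record written near the top.
corrupted = [healthy[0], (0x00, 0, "00.00.00")] + healthy[1:]

print(find_link(healthy, "0E.67.7F", 1))    # the record is found
print(find_link(corrupted, "0E.67.7F", 1))  # None: the scan stopped early
```

In this model every record below the stray marker becomes invisible at once, which would fit a whole group of devices being ignored at the same moment.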

Edit: ~~After re-reading your post, I would guess that your "backlight program" is causing writes to the devices and the PLM~~. For some reason the PLM is being written with an end of record during this operation. This would account for the < 24 hour repetition of the problem.

Edit again: yet another swing and a miss. I tested the backlight and adjust scene functions. These DO NOT modify the PLM. They should not be causing what you are seeing.

It may be possible for an external device to modify things using the REST interface (polyglot, etc). I'm not sure how to capture this other than looking at the event viewer on level 3. The following snippet shows the ISY writing to the PLM during a device restore. We are looking for the "02 6F" command writing to the PLM.

If you could open the event viewer on level 3 and capture events over a 24 hour period we should be able to inspect for writes.

Sat 06/20/2026 06:36:13 AM : [MNG-LNK-RSP ] 02 6F 40 E2 00 41 29 3D 01 20 45 15

Sat 06/20/2026 06:36:13 AM : [PLM ] Group 0 : Writing Controller Link matching [41 29 3D 1 ] Link 0 : 0FF8 [A20053BC3AFF1F01]
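
For sifting a long level 3 capture, a small Python filter along these lines might help. It is a rough sketch that assumes the saved log is plain text with the same "[TAG ] hex bytes" layout as the two lines above; the function name and the sample list are made up for this example.

```python
import re

# Assumption: each event viewer line looks like
#   "<timestamp> : [TAG ] 02 6F 40 ..."
# i.e. a bracketed tag followed by space-separated hex bytes.
PAYLOAD_RE = re.compile(r"\[[^\]]*\]\s+((?:\b[0-9A-F]{2}\b\s*)+)")


def find_plm_commands(lines, prefix=("02", "6F")):
    """Return the log lines whose payload starts with the given byte pair."""
    hits = []
    for line in lines:
        match = PAYLOAD_RE.search(line)
        if match and tuple(match.group(1).split()[:2]) == tuple(prefix):
            hits.append(line.strip())
    return hits


# Lines copied from this thread, used here only as sample input.
SAMPLE = [
    "Sat 06/20/2026 06:36:13 AM : [MNG-LNK-RSP ] 02 6F 40 E2 00 41 29 3D 01 20 45 15",
    "Sat 06/20/2026 06:36:13 AM : [PLM ] Group 0 : Writing Controller Link matching [41 29 3D 1 ] Link 0 : 0FF8 [A20053BC3AFF1F01]",
    "Sat 06/20/2026 12:40:04 PM : [INST-TX-I1 ] 02 62 0E 67 7F 0F F0 E8",
]

for hit in find_plm_commands(SAMPLE):
    print(hit)
```

Any iterable of lines works as input, so an open file handle for the saved capture can be passed in place of the sample list.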

So much for short posts....

Edited Saturday at 12:49 PM1 day by IndyMike

Saturday at 02:24 PM1 day

@IndyMike Has it been established that the PLM link table is getting modified? All along my perception is that it's the device's link tables that were. It's true that a restore PLM writes to both devices and the PLM, if indeed whatever is causing the problem is updating links. Orest hasn't really indicated if scheduled commands going out from the Polisy to Insteon devices also stopped happening. The original post was mainly about wireless sensors and keypads seemingly not sending anything to the Polisy. @oskrypuch , could you tell us if sending out scheduled commands also fails?

Just so we're all seeing the same thing.

Saturday at 02:40 PM1 day

7 minutes ago, Guy Lavoie said:
@IndyMike Has it been established that the PLM link table is getting modified? All along my perception is that it's the device's link tables that were. It's true that a restore PLM writes to both devices and the PLM, if indeed whatever is causing the problem is updating links. Orest hasn't really indicated if scheduled commands going out from the Polisy to Insteon devices also stopped happening. The original post was mainly about wireless sensors and keypads seemingly not sending anything to the Polisy. @oskrypuch , could you tell us if sending out scheduled commands also fails?
Just so we're all seeing the same thing.

@Guy Lavoie , not as of yet.

I do not see anything in the device link tables that would stop it from responding to the PLM. That leaves the PLM table itself, and @oskrypuch has had issues with previous PLMs losing records. That's why I was asking @oskrypuch to perform the "Show PLM Links Table" and save. I am guessing that within 24 hours the table will have changed. This will confirm that it's a PLM table problem.

To determine why the PLM table is changing, I asked for a "event viewer capture" over a 24 hour period. This would hopefully show the actual command that is modifying the PLM table.

To be clear, I have never seen this before on my ISY994. The PolISY and EISY have many plugins and rest devices that I do not have. It's possible that one of them is causing an issue. I am also not aware of a REST command that can modify the PLM table, but I am open to learning.

Saturday at 03:14 PM1 day

Yes, the "Show PLM Links Table" would be my next move. It can also be saved to a file, for analysis and comparison later.
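
For the comparison step, something as simple as the following Python sketch could do. It deliberately assumes nothing about the layout of the saved file and just compares the two saves line by line; the function name and the sample text are placeholders, not the real export format.

```python
def compare_dumps(before_text, after_text):
    """Return (lines only in the earlier save, lines only in the later save)."""
    before = {ln.strip() for ln in before_text.splitlines() if ln.strip()}
    after = {ln.strip() for ln in after_text.splitlines() if ln.strip()}
    return sorted(before - after), sorted(after - before)


# Placeholder content standing in for two saved tables.
earlier = "record A\nrecord B\nrecord C\n"
later = "record A\n"

missing, added = compare_dumps(earlier, later)
print("gone since the first save:", missing)
print("new since the first save:", added)
```

If the later save turns out to be a small subset of the earlier one, that would point at the PLM table shrinking instead of the device tables changing.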

Saturday at 04:48 PM1 day

Author

@IndyMike Do not apologize for a loooong post, I love it!

The PLM table shows 70 entries (table is now saved out), presume that means it sees 70 insteon devices. That is about right, there are some more ZWave devices as well which obviously are not reflected there.

Cable - this fault has survived through two PLMs, and also through two different cables, one the DB9 serial cable, and now the USB cable, so that eliminates cable, connectors and plugs as an issue.

A second PLM restore (or even just a serial device restore on each device) is on the plate here for sure. I obviously did one PLM restore, when I installed the new 2413U PLM.

For now, I once again device restored one 2334 keypad (yesterday) to monitor, it is still working some 14 hrs later, will see if a full 24 hours results in the fault.

Also, today I device restored two more 2334's, four 2477S switches and one 2844 motion. Why those? ... The 2334s partly display scene status (that works fine), but some of the buttons also trigger programs so send a packet to the PLM/Polisy. The 2477S manual switch status is used as a logical trigger for some programs, so that requires status transmission to the PLM/Polisy. And, the motion of course sends packets to the PLM/Polisy for action. Those were chosen as a sampling, as anything that requires a packet from a device, to tx to the PLM and the Polisy, fails when the fault occurs. Once device restored, they resumed full normal functioning, for now.

As noted, I have now saved out the PLM table, 70 entries, but curiously no NULL end of list item, should there be one? Perhaps I don't have the full listing captured, but I did try a few times.

I am running the event viewer in mode 3, there will be a lot of data there!

The one huge advantage with this otherwise annoying issue is that it is 100% consistent, so potentially it can be debugged.

100% consistent, when the fault state occurs (no idea what triggers this) all devices in the system that need to transmit a packet to the PLM, and then Polisy, for action don't, or the packet is lost/dropped in transmission. I was watching the event viewer, and pushing buttons of devices when faulted, show no device activity, which makes sense. Because all devices that use this mode (Tx), lose this ability all at once, as has been suggested, it surely has to be something up stream, the PLM or Polisy.

Also 100% consistent, once all devices are faulted, a DEVICE RESTORE on a given device will restore it to normal function (for a while, until the next "trigger"), and it will now get its transmission packet received by the Polisy, which then carries out its action.

Nothing else, that I have been able to discern is affected. The system otherwise is working. Programs are running, executing commands, scenes external to the Polisy continue to work, as is direct control of lights from manual operation of switches.

The one obvious exception to programs running properly during the fault, are those that rely on a report of state of devices (switch positions, motions, etc.), the logic may fail as the state is not correctly updated at the PLM/Polisy.

And using the UD mobile app to trigger programs works fine, as a surrogate for example, of a curtain open/close button push of a 2334 keypad. But, you would expect that.

Will report again tomorrow, and post the logs.

Orest

Edited Saturday at 05:18 PM1 day by oskrypuch

Saturday at 05:27 PM1 day

Author

For a temporal overview of this ...

1) State: normal, all working

-> some kind of trigger occurs (seems to happen every day or so, now)

2) State: all devices can no longer successfully transmit packets to the PLM/Polisy, no other functions affected

-> individual RESTORE DEVICE command on a given device

3) State: that one given device is restored to normal function, all other devices still remain faulted

-> some kind of trigger occurs, again

4) State: ALL devices once again can no longer transmit to the PLM/Polisy.

Orest

Saturday at 05:32 PM1 day

Yup, long posts are fine when they contain lots of information 🙂

One PLM entry per device actually sounds kind of low. You must have fewer Insteon scenes than many of us.

The 100% consistency is a plus, yes! Allows testing and troubleshooting.

The next time you do a device restore, capture the device communications log, so that you can see what the messages that write to a device link table look like. This could help you in finding any unplanned writes that this problem seems to be doing.

To be clear: please tell us if scheduled commands from the Polisy to devices are still working ok, even when this problem occurs. Your mention of: "The one exception to programs running properly during the fault, those that rely on a report of state of devices (switch positions, motions, etc.), the logic may fail as the state is not updated at the PLM/Polisy." seems to indicate that outgoing Polisy commands are working, as long as they don't rely on updated device statuses.

Saturday at 06:24 PM1 day

Author

@Guy Lavoie Correct, to be absolutely clear, all scheduled commands from the Polisy to devices are working ok.

--

AND, the fault trigger just occurred, all devices can no longer transmit, including the ones that I just recently DEVICE RESTORED.

AND, looked at the PLM table, it was BLANK! Hit START a number of times, no change. I then DEVICE RESTOREd one device, some PLM links now show up in the PLM table, eleven of them. Well if one device gives eleven links, clearly the PLM table is way under populated, even when I thought it was "full".

DEVICE RESTORE a second device, now there are 29 entries in the PLM.

Something is clearly "rotten" in the PLM table (or access to the PLM table by the Polisy) when the fault occurs, and that is shutting down devices communicating!!

@IndyMike Attached is the original 70 entry PLM table, which is likely "light", and the event log that covers today, including the period when the fault occurred, which appears to be an emptying of the PLM table!

PLM Links Table.v5.4.4__Sat 2026.06.20 11.50.29 AM.xml ISY-Events-Log.v5.4.4__Sat 2026.06.20 02.05.23 PM.txt

Edited Saturday at 06:38 PM1 day by oskrypuch

Saturday at 06:30 PM1 day

Author

... and that single DEVICE RESTORE, restored the function of that one device.

So, I am thinking, that the 70 or so entries of the PLM table (copy uploaded), really just represents the PLM entries from the few devices I had just DEVICE RESTOREd. Given the number of devices I have, I might expect to properly see many hundreds of PLM entries.

But, what is causing the mass clearing of the PLM table!? Is a failing Polisy, the problem?

Orest

P.S. And, there are NO [ 02 6F ] entries in the log.

Edited Saturday at 06:39 PM1 day by oskrypuch

Saturday at 06:45 PM1 day

8 minutes ago, oskrypuch said:
But, what is causing the mass clearing of the PLM table!? Is a failing Polisy, the problem?

That's the million dollar question. Two million dollars actually because you have two different PLMs doing this.

It's almost as if there was a phantom "Delete PLM" going on, or it was being factory reset. Neither of which can be scheduled. Adding to the mystery is that you clearly say that outgoing commands keep working. Links need to be there for that too.

Hopefully @IndyMike can tell us a bit more about what he sees in the event log as being relevant.

Saturday at 09:09 PM1 day

I am looking at the event viewer. Nothing obvious at the moment.

This may take some time....

14 hours ago14 hr

Solution

@oskrypuch , not what I expected, but your PLM is definitely being reset due to a communication error with the ISY

Observations:

You have a mind numbing amount of communication to two devices that I believe are thermostats (11.B2.60 and 0E.67.7F). You appear to be requesting temperature data from each of these every few seconds. That's 12 send/receive communications in 5 seconds and it repeats roughly every 30 seconds. In 2 hours and 4 minutes you had over 5K send/receive comms or roughly 1 every 1.4 seconds.
Your communications to the devices are excellent. 96% of your communication with the two devices was received with 3 hops remaining. It doesn't get much better than that.
You did have 2 communication errors between the ISY and the PLM. The PLM will normally echo a serial command back to the ISY. Sometimes the echo is incorrect. You had two of these. We've seen rare cases where these can cause the dreaded "All-on". The errors in your log appeared innocuous at first. When I looked closer, a communication to the PLM was interpreted as a "Reset Modem" command.
You also had two serial communication timeout errors. This is where the ISY sends a command and the PLM does not echo it back. My scanning routine declares a timeout if nothing is received back within 10 seconds, or if xx commands have been executed.

The following snippet from your event viewer log shows 4 normal ISY to PLM exchanges. The ACK is the PLM acknowledging the transmission. The 5th exchange is corrupted. The PLM appears to have byte shifted the ISY command. It believes the ISY is requesting a Modem Reset

10269 Sat 06/20/2026 12:40:04 PM : [INST-TX-I1 ] 02 62 0E 67 7F 0F F0 E8

10271 Sat 06/20/2026 12:40:04 PM : [INST-ACK ] 02 62 0E 67 7F 0F F0 E8 06 (E8)

10277 Sat 06/20/2026 12:40:04 PM : [INST-TX-I1 ] 02 62 0E 67 7F 0F 6A 00

10279 Sat 06/20/2026 12:40:04 PM : [INST-ACK ] 02 62 0E 67 7F 0F 6A 00 06 (00)

10285 Sat 06/20/2026 12:40:04 PM : [INST-TX-I1 ] 02 62 0E 67 7F 0F 6B 02

10287 Sat 06/20/2026 12:40:04 PM : [INST-ACK ] 02 62 0E 67 7F 0F 6B 02 06 (02)

10293 Sat 06/20/2026 12:40:05 PM : [INST-TX-I1 ] 02 62 0E 67 7F 0F F0 49

10295 Sat 06/20/2026 12:40:05 PM : [INST-ACK ] 02 62 0E 67 7F 0F F0 49 06 (49)

10301 Sat 06/20/2026 12:40:05 PM : [INST-TX-I1 ] 02 62 0E 67 7F 0F 6A 20

10303 Sat 06/20/2026 12:40:08 PM : [RST-ACK ] 02 67 06
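
The same check can be sketched in a few lines of Python. This is not the scanning routine mentioned above, only a rough illustration of the idea: pair each INST-TX line with the next line whose tag ends in ACK, and flag the pair when the reply does not begin with the bytes that were sent. It assumes the log layout shown in the snippet, and the function name is made up.

```python
import re

# Assumption: lines look like "<n> <timestamp> : [TAG ] <hex bytes> ...".
LINE_RE = re.compile(r"\[\s*([A-Z0-9-]+)\s*\]\s+((?:\b[0-9A-F]{2}\b\s*)+)")

# Last two exchanges from the snippet above, used as sample input.
SAMPLE = """\
10293 Sat 06/20/2026 12:40:05 PM : [INST-TX-I1 ] 02 62 0E 67 7F 0F F0 49
10295 Sat 06/20/2026 12:40:05 PM : [INST-ACK ] 02 62 0E 67 7F 0F F0 49 06 (49)
10301 Sat 06/20/2026 12:40:05 PM : [INST-TX-I1 ] 02 62 0E 67 7F 0F 6A 20
10303 Sat 06/20/2026 12:40:08 PM : [RST-ACK ] 02 67 06
""".splitlines()


def find_bad_echoes(lines):
    """Return (sent_bytes, reply_tag, reply_bytes) for each mismatched exchange."""
    problems = []
    pending = None  # bytes of the last TX still waiting for its echo
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue
        tag, data = match.group(1), match.group(2).split()
        if tag.startswith("INST-TX"):
            if pending is not None:
                problems.append((pending, "NO-REPLY", []))
            pending = data
        elif tag.endswith("ACK") and pending is not None:
            # A good echo repeats the sent bytes, then appends the ACK byte.
            if data[:len(pending)] != pending:
                problems.append((pending, tag, data))
            pending = None
    if pending is not None:
        problems.append((pending, "NO-REPLY", []))
    return problems


for sent, tag, reply in find_bad_echoes(SAMPLE):
    print("sent:", " ".join(sent), "| reply:", tag, " ".join(reply))
```

On the sample, only the last exchange is flagged: the command ending in 6A 20 is answered by RST-ACK 02 67 06. A real capture has many other line types in between, so anything this flags is a candidate to inspect by hand, not a verdict.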

My 1st suggestion would be to reduce the number of queries to your thermostat by a factor of at least 100. Temperatures simply don't change that fast.
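
As a quick sanity check on those numbers, here is the arithmetic in Python. The 5000 figure is a floor taken from "over 5K", so the computed interval is an upper bound and lands a little above the 1.4 seconds quoted; the factor of 100 is the suggested reduction.

```python
# Figures quoted in this post; "over 5K" is treated as a floor of 5000.
window_s = 2 * 3600 + 4 * 60          # 2 hours 4 minutes, in seconds
exchanges = 5000                      # send/receive comms in that window

seconds_per_exchange = window_s / exchanges
print(f"about one exchange every {seconds_per_exchange:.2f} s")

# Effect of cutting the traffic by the suggested factor of 100.
reduced_exchanges = exchanges / 100
reduced_interval = window_s / reduced_exchanges
print(f"after a 100x cut: about one exchange every {reduced_interval:.0f} s")
```

So a hundredfold cut would still leave an exchange with the thermostats every couple of minutes or so over the same window.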

Currently thinking through other workarounds.

10 hours ago10 hr

Very interesting, @IndyMike. The description of a PLM Reset would certainly explain a lot of odd problems. I'm here to learn too 🙂

8 hours ago8 hr

Author

@IndyMike I am so grateful for your time and expertise on this! I think we are well on the way to solving this issue. Having the PLM (errantly) reset from time to time, would explain everything we are seeing.

Do you feel there is a good chance that this comm overload triggered the errant reception and reset?

The every thirty seconds thermostat polling was not intentional. There was a (recent) bit of code I added, and it looks like it got into a loop from time to time. I've squashed that, and checking the device comm log, the severe polling of the stats has stopped.

I also do monitor stats that are calling for heat and cool to "count" the time of operation, but this is not new, and this executes once a minute, only if a stat is calling. For good measure I've disabled this for now. If I reenable it, will move it out to five minute intervals, that is accurate enough.

Runaway code is obviously a bad thing in general, to say nothing of triggering an ALL-ON issue!

As a test, I have DEVICE RESTORED some usually affected devices, and saved out the PLM TABLE for comparison/reference. And now I know precisely what command to look for in the log, if I see that fault again. If that sorts it out, I'm just a PLM restore away from being back to normal!

Will report back, and happy to hear any other suggestions you may have.

Orest

Edited 6 hours ago6 hr by oskrypuch

8 hours ago8 hr

@oskrypuch , I have to give credit to @kclenden . Some time ago he opened my eyes to the fact that ISY to PLM errors are occurring. I have been monitoring them on my system with the ISY994.

To date I have seen 1 verified instance of an "All-on" caused by a communication error. Yours was the 1st instance of a communication error causing a PLM reset. I had not previously conceived of that.

I would say that cutting down on the fast repetitive communication would definitely help reduce the issue. Bottom line, if you didn't have issues prior to the program modification, that's most likely your answer.

As a go forward, I would recommend a full PLM restore followed by individual device restores. The PLM log that you posted only contained 8 devices.
