
ISY 994i is self-rebooting


oskrypuch

This is a mature install, with perhaps a hundred nodes of various kinds.

In the last couple of days I've noticed that I am losing system function, which is usually followed by a reboot of the ISY, two to three times within an hour, and then it works again for a while. I have a routine that emails me when the unit reboots, so I have a clear record of that. I have tried a soft reboot and a power cycle; neither has made a difference.

I am on my third modem (in 12 years), and second ISY unit. It is an ISY 994i running firmware 4.7.3.

Any idea for possible causes, or what to do next?

Thanks.

* Orest

Link to comment

Take a look at your ISY error log to see if there's anything that may be an indicator.

Another possibility is the ISY power supply and/or its connection.

If the PLM were failing or had failed, your ISY would start up in Safe Mode, which doesn't seem to be the case.

 

Link to comment

OK, got a new 5v PS, and bypassed the UPS that I normally plug it into (just in case). The unit is still rebooting, and now perhaps four times an hour. I actually witnessed one of these, as I was standing right by it. All the blue lights went on (there was no outage of the power light), and then it flickered as is typical for a reboot.

I checked the log files (from the menu File|log, download, enable macros), and the log file is empty. I expect it may be emptied with each reboot, or perhaps I just can't access it any more.

I am now unable to access the unit from the admin console reliably. Sometimes it comes up but I can't download the programs any more, and sometimes it comes up with a Java socket error and only the File menu option is available.

I have to assume the unit is toast, unless someone has some suggestions.

 

Yes, I have backups.

So, if I replace the unit, I may get the Z-Wave combo unit (ISY994IZW+IRPRO). I assume there will be no issues with restoring my current configuration/programs to it? I am running the most up-to-date v4 firmware, as noted upthread.

I also have the weather module. Am I able to transfer it to the new unit, or does that die with my current one?

Should I update the modem as well?

* Orest

 

 

Link to comment

Hi Orest,

Thank you. Based on your logs, I think you have runaway programs (perhaps seasonal). If you can reboot with the PLM and disable all your programs, then I think you can resolve this by figuring out which programs are doing this:

thermostats / MAIN - tstat - 2.2 -    Thermostat Mode    Auto    Sat 2019/04/20 07:58:06 PM    Program    Log    
thermostats / MAIN - tstat - 2.2 -    Cool Setpoint    22°    Sat 2019/04/20 07:58:07 PM    Program    Log    
thermostats / MAIN - tstat - 2.2 -    Heat Setpoint    21°    Sat 2019/04/20 07:58:08 PM    Program    Log    
thermostats / Master - tstat - 2.2 -    Thermostat Mode    Auto    Sat 2019/04/20 07:58:08 PM    Program    Log    
thermostats / Master - tstat - 2.2 -    Cool Setpoint    22°    Sat 2019/04/20 07:58:09 PM    Program    Log    
thermostats / Master - tstat - 2.2 -    Heat Setpoint    21°    Sat 2019/04/20 07:58:10 PM    Program    Log    
thermostats / UpStrs - tstat - 2.2 -    Thermostat Mode    Auto    Sat 2019/04/20 07:58:11 PM    Program    Log    
thermostats / UpStrs - tstat - 2.2 -    Cool Setpoint    22°    Sat 2019/04/20 07:58:12 PM    Program    Log    
thermostats / UpStrs - tstat - 2.2 -    Heat Setpoint    20°    Sat 2019/04/20 07:58:12 PM    Program    Log    
thermostats / SLRM - tstat - 2.2 -    Thermostat Mode    Auto    Sat 2019/04/20 07:58:13 PM    Program    Log    
thermostats / SLRM - tstat - 2.2 -    Cool Setpoint    24°    Sat 2019/04/20 07:58:14 PM    Program    Log    
thermostats / SLRM - tstat - 2.2 -    Heat Setpoint    21°    Sat 2019

then again ...

thermostats / MAIN - tstat - 2.2 -    Thermostat Mode    Auto    Sat 2019/04/20 07:59:20 PM    Program    Log    
thermostats / MAIN - tstat - 2.2 -    Cool Setpoint    22°    Sat 2019/04/20 07:59:21 PM    Program    Log    
thermostats / MAIN - tstat - 2.2 -    Heat Setpoint    21°    Sat 2019/04/20 07:59:21 PM    Program    Log    
thermostats / Master - tstat - 2.2 -    Thermostat Mode    Auto    Sat 2019/04/20 07:59:22 PM    Program    Log    
thermostats / Master - tstat - 2.2 -    Cool Setpoint    22°    Sat 2019/04/20 07:59:23 PM    Program    Log    
thermostats / Master - tstat - 2.2 -    Heat Setpoint    21°    Sat 2019/04/20 07:59:24 PM    Program    Log    
thermostats / UpStrs - tstat - 2.2 -    Thermostat Mode    Auto    Sat 2019/04/20 07:59:25 PM    Program    Log    
thermostats / UpStrs - tstat - 2.2 -    Cool Setpoint    22°    Sat 2019/04/20 07:59:25 PM    Program    Log    
thermostats / UpStrs - tstat - 2.2 -    Heat Setpoint    20°    Sat 2019/04/20 07:59:26 PM    Program    Log    
thermostats / SLRM - tstat - 2.2 -    Thermostat Mode    Auto    Sat 2019/04/20 07:59:27 PM    Program    Log    
thermostats / SLRM - tstat - 2.2 -    Cool Setpoint    24°    Sat 2019/04/20 07:59:28 PM    Program    Log    
thermostats / SLRM - tstat - 2.2 -    Heat Setpoint    21°    Sat 2019/04/20 07:59:29 PM    Program    Log    
 

As you can see, a program keeps setting the setpoints to the same values, repeatedly and in quick succession. Something is wrong.

With kind regards,
Michel

Link to comment

Good thought.

But, the thermostats, and a variety of other things are normalized when the system reboots. The ISY has been rebooting sometimes every few minutes, which is in turn triggering these frequent sequences. It started doing reboots maybe once a day a week or two ago, then several a day, and now it is every five minutes or so.

Checking the time stamps, it appears the reboots are causing these, rather than the reverse.
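One way to check that ordering is to group the log entries into bursts and see whether each burst starts shortly after a reboot. What follows is only a rough Python sketch: the log lines are copied from the excerpt above, but the two reboot times are made-up placeholders standing in for the emailed reboot notices, and the 30-second gap and 2-minute window are arbitrary assumptions.

```python
import re
from datetime import datetime, timedelta

# Matches the date/time part of a log line, e.g. "2019/04/20 07:58:06 PM".
STAMP = re.compile(r"\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} [AP]M")

def parse_stamps(log_text):
    """Pull every timestamp out of pasted log lines, in the order found."""
    return [datetime.strptime(s, "%Y/%m/%d %I:%M:%S %p")
            for s in STAMP.findall(log_text)]

def group_bursts(stamps, max_gap=timedelta(seconds=30)):
    """Split timestamps into bursts; a gap longer than max_gap starts a new one."""
    bursts = []
    for t in stamps:
        if bursts and t - bursts[-1][-1] <= max_gap:
            bursts[-1].append(t)
        else:
            bursts.append([t])
    return bursts

def reboot_preceded(burst, reboots, window=timedelta(minutes=2)):
    """True if some reboot happened within `window` before the burst began."""
    start = burst[0]
    return any(timedelta(0) <= start - r <= window for r in reboots)

# Lines taken from the log excerpt in this thread.
LOG = """\
thermostats / MAIN - tstat - 2.2 -    Thermostat Mode    Auto    Sat 2019/04/20 07:58:06 PM    Program    Log
thermostats / Master - tstat - 2.2 -    Thermostat Mode    Auto    Sat 2019/04/20 07:58:08 PM    Program    Log
thermostats / UpStrs - tstat - 2.2 -    Thermostat Mode    Auto    Sat 2019/04/20 07:58:11 PM    Program    Log
thermostats / SLRM - tstat - 2.2 -    Thermostat Mode    Auto    Sat 2019/04/20 07:58:13 PM    Program    Log
thermostats / MAIN - tstat - 2.2 -    Thermostat Mode    Auto    Sat 2019/04/20 07:59:20 PM    Program    Log
thermostats / Master - tstat - 2.2 -    Thermostat Mode    Auto    Sat 2019/04/20 07:59:22 PM    Program    Log
"""

# HYPOTHETICAL reboot times, invented for illustration only.
REBOOTS = [datetime(2019, 4, 20, 19, 57, 30), datetime(2019, 4, 20, 19, 58, 45)]

bursts = group_bursts(parse_stamps(LOG))
for burst in bursts:
    verdict = "after a reboot" if reboot_preceded(burst, REBOOTS) else "no reboot nearby"
    print(burst[0].time(), len(burst), "entries:", verdict)
```

If every burst lines up just after a reboot notice, that supports the reboots being the cause of the thermostat writes rather than the result.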

* Orest

 

Link to comment

Well, I did replace the hardware in the hope that it would fix it: a new ISY994i + Z-Wave, with a new PS of course.

Worked great for a couple of days, but then back to the same pattern!!   :-(

With a new ISY and PS, and the old programs restored, the reboots have started again. It reboots every 10 to 35 mins, never an exact time multiple, nor a precise number of minutes past the hour. I reviewed the log files again, and my emailed error and other notices that I programmed in, and nothing rings a bell.
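For anyone wanting to check the "no exact multiple, no fixed minute past the hour" observation from a list of reboot notices, a small Python sketch along these lines would do it. The timestamps below are invented examples, not the actual reboot record, and the one-minute tolerance is an assumption.

```python
from datetime import datetime
from statistics import pstdev

def intervals_minutes(reboots):
    """Minutes between consecutive reboots, after sorting by time."""
    ordered = sorted(reboots)
    return [(b - a).total_seconds() / 60 for a, b in zip(ordered, ordered[1:])]

def looks_scheduled(reboots, tolerance_min=1.0):
    """Rough test for a timer-driven pattern.

    True if the gaps are nearly constant, or if every reboot falls on the
    same minute past the hour. Needs at least three reboots to say anything.
    """
    gaps = intervals_minutes(reboots)
    if len(gaps) < 2:
        return False
    steady_gap = pstdev(gaps) <= tolerance_min
    same_minute = len({r.minute for r in reboots}) == 1
    return steady_gap or same_minute

# HYPOTHETICAL reboot times, made up for illustration.
irregular = [datetime(2019, 4, 27, 13, 2), datetime(2019, 4, 27, 13, 14),
             datetime(2019, 4, 27, 13, 47), datetime(2019, 4, 27, 14, 5),
             datetime(2019, 4, 27, 14, 31)]
regular = [datetime(2019, 4, 27, 13, 0), datetime(2019, 4, 27, 13, 15),
           datetime(2019, 4, 27, 13, 30), datetime(2019, 4, 27, 13, 45)]

print(intervals_minutes(irregular), looks_scheduled(irregular))
print(intervals_minutes(regular), looks_scheduled(regular))
```

A scheduled-looking result would point toward a timed program; an irregular one, as described here, is more consistent with something that builds up until the unit falls over.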

I deleted all of the sections having to do with the thermostats (Michel's suggestion), and that didn't stop it.

I do have sequential backups of the v4 program set, and I think what I might do is restore (to the old unit) one backup at a time, and see if and when the problem stops. Then comparing the code may give me a clue.

It appears it has got to be something in the code; everything else has been eliminated. The PLM is the same, but that seems unlikely to be the culprit?? @Michel Kohanim any speculation as to what kind of errant code (or otherwise) could lead to a memory leak or something, bad enough that it crashes the ISY?

* Orest

Link to comment

Since you're having the exact same issue with 2 different devices, I would start by disabling all programs to see if reboots still happen. If they stop, re-enable the simple programs first (basic on/offs) and see what happens. If that works, then keep enabling programs until the ISY reboots.

Link to comment
That is kind of my plan, but in reverse, going backwards with older and older restores.

Curiously, since my last restore to the factory "empty" state, and then a restore back to my current v5 program state, I have had no further reboots, just like when I first set up the new unit. It may be that the problem takes a day or two to start up. Perhaps there is something errant in my code, that takes a while to run out of swap space or something.

* Orest

 

Link to comment
Do you have any remote CPU devices that feed into your ISY's REST interface, or are you using any node servers?

I notice that NodeLink can reboot the ISY remotely, so it is possible for another external device to reboot the ISY intentionally, or even through a comm error.

Link to comment
2 hours ago, Michel Kohanim said:

@oskrypuch,

I definitely think it has to do with your thermostat programs. 

With kind regards,
Michel

That is one of the first things I tried. I completely deleted all of the programs having anything to do with thermostat control; it made no difference and did not break the cycle of reboots every 10 to 20 min.

I will try deleting other parts of the program set, to see if I can hit on an area where the culprit may be.

I am going to try going back in time through the restores, to see if some minor change is causing this, there have been no major changes to the program set in a while.

At least the reboots are fairly frequent, making it a little easier to dissect.

* Orest

Link to comment
I do not have any other CPU devices, just Insteon devices and one X-10 outside motion, and one X-10 ding-donger.

Only other thing is of course the PLM. Seems unlikely that would be the issue, but who knows.

* Orest

Link to comment

Since the fresh restore, it seems to work for a day or so and then start the reboot cycle. Well, here is a new wrinkle ...

After working fine for a day, it stopped responding/sending to devices, and I can now no longer access the ISY at all from the admin console, even with the PLM disconnected to get to safe mode. It appears to be rebooting over and over. The error light is not on, and the memory light and Rx/Tx lights flicker similar to what I remember for a normal reboot, then they stop for a bit, and the cycle continues.

I am thinking that I'm going to have to factory reset it, to get control of it again. (looked up how to do that)

* Orest

 

Link to comment
Also attached is my program set, if anyone wants to have a look. This is the dump from my original v4 install, the modules had been transferred already, so there are references to non-specific modules and null values.

* Orest

ISY_v4.txt 261.06 kB · 1 download

Looking at your programs, I can see what appears to be programming from before V5 and even from before variables were introduced. With so many programs running other empty programs, I would try something else here.

Create a good, up to date, backup.
Delete groups of programs and retest until they are all gone and ISY is clean.
   or
factory reset your ISY. Reset your time zone and other basic parameters for ISY. 
See if the problem clears up. Write a few simple programs to test.
Restore your programs.

I would look carefully for oscillating programs in your admin console. Are you getting the "busy" box a lot? Are program icons oscillating?
Variable usage and V5 feature usage could simplify and greatly reduce the number of programs, and may expose any cyclic loops that could be clobbering your ISY.

Link to comment

Oh yes, there is a lot of legacy code, that long predates the availability of variables. Used the "program" variable technique back then, like everyone else. As soon as variables appeared, I immediately switched to their use. I have long been wanting to clean all that old stuff out, but it is a matter of setting aside the time, and not wanting to break something that is working. Of course it isn't now!

I have factory reset the (new) unit, which was continually rebooting and could not be reached with the Admin console, and then restored my config that I had already migrated to v5. That ran just fine for a day or so.

I then set all the programs as disabled, and ensured there was nothing running. There isn't now. I will now let that sit for a few days, to ensure all is stable.

I often will bring up the summary view to test things, in particular just lately, and there are no oscillating programs. I sat for a while pressing the refresh button to see if I could catch what program(s) ran before a hang, but was unsuccessful in catching it. Before each reboot, there is typically a period of dead quiet.

I also brainstormed as to what program might be starting up at the intervals I noticed the reboots.

If the unit stays running, which I expect it will now, I'll start enabling folder by folder, and perhaps at the same time fixing up the code, getting the more important bits up first.

* Orest

 

Link to comment
Whatever firmware/UI version you're on (though that isn't the problem), put that on the ISY without using anything else with it. Let it sit for a day or 2 to see if it reboots on its own (it probably won't). That would confirm it's not the system but something you're running with it.

With that said, it's much easier to disable everything and re-enable programs 1 at a time than to try to guess exactly what the errant program is. This is because it may be a couple of programs causing it. By enabling them 1 at a time you can compare the problem program with others that may be the trigger.
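As a rough illustration of that one-at-a-time procedure, here is a Python sketch. It does not talk to an ISY at all: the program names are hypothetical, and `is_stable` stands in for the manual step of enabling a program and watching for a day or two to see whether the unit reboots.

```python
def find_trigger(programs, is_stable):
    """Enable programs one at a time, in the order given.

    Returns (trigger, enabled): the program whose enabling first made the
    system unstable (or None), plus everything enabled at that point, so
    the trigger can be compared against the programs it may interact with.
    """
    enabled = []
    for name in programs:
        enabled.append(name)
        if not is_stable(enabled):
            return name, list(enabled)
    return None, enabled

# HYPOTHETICAL program names, simplest first as suggested in the thread.
PROGRAMS = ["porch light on/off", "garage timer",
            "thermostat reset", "startup normalize"]

# Simulated fault that only appears when two specific programs are both
# enabled, mimicking "it may be a couple of programs causing it".
BAD_PAIR = {"thermostat reset", "startup normalize"}

def simulated_is_stable(enabled):
    return not BAD_PAIR.issubset(enabled)

trigger, enabled_at_failure = find_trigger(PROGRAMS, simulated_is_stable)
print("first unstable after enabling:", trigger)
print("enabled at that point:", enabled_at_failure)
```

Note that in the simulated case the last program enabled is only half of the problem, which is why the list of already-enabled programs is worth keeping alongside it.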

Link to comment

@oskrypuch,

1. This is not going to work (change it to Or):

        Time is 12:00:00AM
    And Time is  6:00:00AM
    And Time is 12:00:00PM
    And Time is  6:00:00PM

2. As @larryllix mentioned, all those notifications for specific temps should be changed to variables

3. Way too many programs calling other programs (very difficult to debug)
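To illustrate point 1, here is a small Python model of the logic (only the boolean reasoning, not the ISY's actual condition evaluator): a single moment cannot equal four different times at once, so the And version can never be true, while the Or version is true four times a day.

```python
from datetime import time

# The four times from the condition above.
TARGETS = [time(0, 0), time(6, 0), time(12, 0), time(18, 0)]

def and_condition(now):
    """All four 'Time is' tests joined with And."""
    return all(now == t for t in TARGETS)

def or_condition(now):
    """The same four tests joined with Or."""
    return any(now == t for t in TARGETS)

# Walk through one day at one-minute resolution.
every_minute = [time(h, m) for h in range(24) for m in range(60)]
and_hits = [t for t in every_minute if and_condition(t)]
or_hits = [t for t in every_minute if or_condition(t)]

print("And is true at:", and_hits)
print("Or is true at:", or_hits)
```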

 

With kind regards,
Michel

Link to comment

Archived

This topic is now archived and is closed to further replies.

