IoX restarts. UD and I need help to troubleshoot.


Solved by Illusion

Posted (edited)

I have an open ticket with Universal Devices. I have been troubleshooting and narrowing down this issue for close to 6 months now. The UD ticket was opened once I had closed in on cause and effect enough to have something useful to say. The ticket has been actively worked back and forth for the past month, and now we are stuck, with no ideas on how to address the cause or narrow it down further. So I come to the forum.

If my Polisy loses internet access, the IoX component will restart when access is restored.

Data points we have so far:

  1. If I unplug the ethernet cable to the Polisy, when I plug it back in the IoX will restart.
  2. If I reboot the router, when the router comes back up, IoX will restart.
  3. If I unplug power to the ONT (fiber optical network terminal), when I power the ONT back on, IoX will restart. This most closely simulates the issue that needs to be corrected. Any time my ISP goes down for a moment, when it is restored my IoX will restart. Maintenance by Frontier in the middle of the night results in an IoX restart.
  4. I have an ASUS router. It has a button with each device connected to disable access to the internet for that device. If I throw that switch in the router, my IoX shows that the portal is offline, the UD portal shows that my IoX is offline. But the IoX DOES NOT RESTART when I once again allow access from the Polisy to the internet. 
  5. My ASUS router also allows me to stop the WAN with a radio button. If I do that, everything on my network loses access to the internet, but the LAN is maintained and devices on the network can still connect to each other. Re-enabling the WAN on the router will result in an IoX restart. I do not know what the difference is between killing the WAN and killing internet access for just the Polisy. I do not understand the results of #5 vs #4.
  6. The network has to be down for 30 seconds or more to cause the behavior.
  7. The Polisy does not produce any beeps or lights when the IoX restarts after internet restoration.
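
For anyone trying to line up the outage window against the restart time, a minimal sketch of a connectivity logger follows. It is an illustration only, not something UD provided: the function names (`link_state`, `outage_class`, `watch_link`), the 5-second poll interval, and the ping target are all assumptions, and the 30-second threshold simply mirrors data point 6. It is meant to run from any machine on the same LAN that has a POSIX shell.

```shell
#!/bin/sh
# Hypothetical helpers for timestamping internet outages; names are illustrative.

# Print "up" if a single ping to the given host succeeds, otherwise "down".
link_state() {
    if ping -c 1 "$1" >/dev/null 2>&1; then
        echo up
    else
        echo down
    fi
}

# Classify an outage length in seconds against the 30 s threshold from data point 6.
outage_class() {
    if [ "$1" -ge 30 ]; then
        echo long
    else
        echo short
    fi
}

# Poll the host every 5 seconds and print a timestamped line on each state change.
# Runs until interrupted with Ctrl-C.
watch_link() {
    host=$1
    prev=$(link_state "$host")
    since=$(date +%s)
    echo "$(date '+%Y-%m-%d %H:%M:%S') start state=$prev"
    while :; do
        sleep 5
        cur=$(link_state "$host")
        if [ "$cur" != "$prev" ]; then
            now=$(date +%s)
            elapsed=$((now - since))
            if [ "$cur" = up ]; then
                # Coming back up: report roughly how long the outage lasted.
                echo "$(date '+%Y-%m-%d %H:%M:%S') down -> up after ${elapsed}s ($(outage_class "$elapsed") outage)"
            else
                echo "$(date '+%Y-%m-%d %H:%M:%S') up -> down after ${elapsed}s of uptime"
            fi
            prev=$cur
            since=$now
        fi
    done
}
```

Calling `watch_link 8.8.8.8` (any reliably reachable internet host would do) prints one line per state change. Durations are only accurate to within the poll interval, but the timestamps can be compared with the startup time IoX reports.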

Variations on those tests:

  1. Stopping PG3x does not stop IoX from restarting on loss and restoration of internet access. So I think we can exclude interaction from the few plugins I have.
  2. Disabling all programs does not stop IoX from restarting on loss and restoration of internet access.
  3. If I sever the connection between IoX and the UD Portal, IoX DOES NOT RESTART with loss and restoration of the internet. 

Parameters we are dealing with:

  • The Polisy is up to date
  • The IoX is version 5.8.4
  • The Polisy is wired network only. No WiFi
  • The only inbound connection to the system I can think of is the UD Portal.
  • The tests are 100% repeatable with results always the same.
  • Polisy and Polyglot are not restarting/rebooting during the event.

UD cannot recreate the issue on their end. It is tough to troubleshoot remotely because it requires the loss of network connection to cause the issue, so tough for them to watch happen. 

Questions:

  1. Has anyone else experienced this?
  2. Does anyone have any suggestions on possible cause and/or corrective action?
  3. Does anyone have any other tests that could give us additional data to provide more clarity on what is happening?
Edited by Illusion
Posted

@Illusion I have not experienced this problem. However, the test scenarios and results you have described suggest a hardware issue on your Polisy, specifically related to the hardwired NIC interface. When it sees an electrical status change on the port, it is somehow causing the IoX restart. Hopefully there is a way to capture IoX diagnostic logs around the NIC event to help diagnose.

-Rob

  • Like 1
Posted

@RRalston Logs have been captured and sent to UD. If your theory were correct, what is the relevant difference between having my IoX connected to the UD Portal vs not connected? Would the electrical status change not be the same either way, so that we should see the same results in those two scenarios?

Posted
29 minutes ago, Illusion said:

If I sever the connection between IoX and the UD Portal, IoX DOES NOT RESTART with loss and restoration of the internet in any of the above test modes.

This is an interesting point. What if you keep the portal connected, but turn off any system and ud communications notifications in UD Mobile? (In the notifications tab tap the 3 dots then "Mobile Devices/Groups" and turn off for "System Notifications", "UD Communications", and "Plugin Notifications" to all groups/devices). I might also stop the notification plugin. Then test the internet drop.

I would think they would target the connection to the portal, since it seems to be a big data point that IoX isn't restarting when it's not connected to your account.

Also, do you have "Query at Restart" set on the IoX Config tab?

Lastly, along the lines of what @Guy Lavoie suggested, do you have any programs that are stuck in a loop if you lose internet? Perhaps a random program checking a status of a wifi device or a plugin that requires internet and is suddenly in a fast loop if the internet drops that's causing the IoX system to fail? (though I would expect something like that to be in a log that would be seen by now.)

Very interesting situation. 

Posted

What kind of devices do you have? Insteon? Z-wave? Do you have the ZMatter board? Any other connected devices, other than the usual PLM, ethernet, power? Original power supply on the Polisy? Tried a different one?

Other than portal and admin access, is TCP/IP used by anything else (plugin, device access, networking module)? Alexa or Google home?

Just the fact that it restarts if your ISP goes down seems to make a hardware/electrical issue rather unlikely.

The fact that you can repeat the problem at will is a big plus in helping to troubleshoot it.

Posted

While I haven't had this specific issue, I have had some very strange issues with my Polisy.  At one time IoX would randomly start dumping a log message continuously until it filled up the disk and then things would start failing. Clearing the log file and restarting the system would resolve it for a few days and then it would happen again.  I don't have that issue anymore so it must have been an update that fixed it, but  no root cause was ever found.

Given the amount of debug you've already put into this I would think the next steps would be to determine if it is something in the OS or hardware.  I know it's not easy, but backing up your stuff and then restoring on new hardware or re-imaging your Polisy and restoring seems like a reasonable next step.

Posted
20 hours ago, Illusion said:
  4. I have an ASUS router. It has a button with each device connected to disable access to the internet for that device. If I throw that switch in the router, my IoX shows that the portal is offline, the UD portal shows that my IoX is offline. But the IoX DOES NOT RESTART when I once again allow access from the Polisy to the internet.
  5. My ASUS router also allows me to stop the WAN with a radio button. If I do that, everything on my network loses access to the internet, but the LAN is maintained and devices on the network can still connect to each other. Re-enabling the WAN on the router will result in an IoX restart. I do not know what the difference is between killing the WAN and killing internet access for just the Polisy. I do not understand the results of #5 vs #4.

@Illusion, toggling the WAN MAY result in a new WAN IP address being assigned by your ISP.  Disabling internet access to IoX should have no effect.  

Just pointing out the difference.  I have no ability to test.

Posted
19 hours ago, Geddy said:

This is an interesting point. What if you keep the portal connected, but turn off any system and ud communications notifications in UD Mobile? (In the notifications tab tap the 3 dots then "Mobile Devices/Groups" and turn off for "System Notifications", "UD Communications", and "Plugin Notifications" to all groups/devices). I might also stop the notification plugin. Then test the internet drop.

Ooh, that is a good one! I like it. But alas, no change. IoX still reboots. I turned off "System Notifications", "UD Communications", and "Plugin Notifications" for all groups/devices. I also stopped the Notification Plugin for this test.

19 hours ago, Geddy said:

Also, do you have "Query at Restart" set on the IoX Config tab?

Yes. I kinda have to. I did not use to, because my Polisy and my IoX programs are built to never restart unless I actively and manually do it. I have over 300 nodes, so a 'Query All' takes a long time if I am working on something that requires a reboot or restart. My Polisy is on a DC UPS, so it does not lose power unless power is off for a long time. But now that my ISY restarts an average of once a week or so, I have to have 'Query at Restart' enabled or my ISY does not know the status of any of my devices. My whole set of wake-up programs falls apart in this case. Disaster!

19 hours ago, Geddy said:

Lastly, along the lines of what @Guy Lavoie suggested, do you have any programs that are stuck in a loop if you lose internet? Perhaps a random program checking a status of a wifi device or a plugin that requires internet and is suddenly in a fast loop if the internet drops that's causing the IoX system to fail? (though I would expect something like that to be in a log that would be seen by now.)

I really feel like we can move away from the program angle. Disabling ALL programs does not stop the behavior. I admit to being a somewhat poor program writer, but I have been pretty great about never creating any kind of looping program. Surely UD would have seen that in the logs as you suspect. I like your thinking that the connection to a wifi device on the local network is involved, but the only thing I have like that is my Tempest weather station and associated plugin. But I feel like that is excluded by the stopping PG3x test. Further, my ISY does not get the weather info from the internet, it gets it locally from the LAN. And even if it did, I feel like the severing of the connection to the portal causing a change in behavior would exclude that since it has nothing to do with the portal.

Posted

Once again, it seems to be something related to the internet at large, if a problem at your ISP triggers it. Need to look at anything/everything net related. For example, it must get the time from a timeserver, though I don't see options to set that up in configuration, other than defining your timezone. Could there be an issue with not being able to reach a service like that?

Posted
20 hours ago, Guy Lavoie said:

What kind of devices do you have? Insteon? Z-wave? Do you have the ZMatter board? Any other connected devices, other than the usual PLM, ethernet, power? Original power supply on the Polisy? Tried a different one?

Other than portal and admin access, is TCP/IP used by anything else (plugin, device access, networking module)? Alexa or Google home?

I have lots of Insteon, a little bit of Z-wave. I have the ZMatter board. Power supply is an APC DC UPS (CP12142LI)

 https://www.apc.com/us/en/product/CP12142LI/network-ups-12vdc-lithium-battery19500mah-bms-4led/

I have considered altering the power supply for testing, but I cannot bring myself to make the effort because IoX does not reboot unless the portal is connected and I lose and restore internet. I cannot see how a different power supply would alter that variable.

TCP/IP is used extensively in my plugins and networking module.

Posted
20 hours ago, bpwwer said:

Given the amount of debug you've already put into this I would think the next steps would be to determine if it is something in the OS or hardware.  I know it's not easy, but backing up your stuff and then restoring on new hardware or re-imaging your Polisy and restoring seems like a reasonable next step.

There is a new OS for the Polisy coming out soon. UD and I plan to install that and continue testing, but Michel has low confidence that it will correct this particular issue. We were focused on hardware right up to the point where I found I can make it stop by severing the connection to the UD Portal. That is where we kinda hit a wall.

Unfortunately, I do not have other hardware that I could switch to, and that would be a major endeavor. Our current plan is to install the new OS soon and, if the issue continues, do some network traffic monitoring. But I started this forum post because people much smarter than me do not think that is going to fix anything.

Posted

Well, this is one of those "I couldn't do this if I tried" phenomena. It just doesn't seem possible. Someone, somewhere will come up with an outside-the-box explanation that will solve it. Bringing it to the forum is a good next step.

You would almost need to try blocking individual port numbers to try to find out what it is on the internet that's causing it.

  • Like 1
Posted
1 hour ago, IndyMike said:

@Illusion, toggling the WAN MAY result in a new WAN IP address being assigned by your ISP.  Disabling internet access to IoX should have no effect.  

That is a good one! I started to test this as a factor but stopped when I realized that the test where I unplug the ethernet cable preserves the WAN IP address. I suspect that a reboot of the router does as well since the modem retains power. But turning off my internet over and over throughout the day creates other problems that I have to limit. So much so that I really wish the 'block internet access' switch would work to cause the issue. But your suggestion does make me think that there is something there. So my Polisy has a reserved IP address. The Polisy is not set to a static IP internally, but the router always gives it the same LAN IP address. But if I 'block internet access' for just this device, the LAN IP address remains constant. But this should also be true if I remove power to the ONT. The LAN never goes down in that case. I am still unable to reconcile the difference between #4 and #5 under "Data points we have so far".

I feel like there is a piece of information here that if I could isolate it might inform people smarter than me. 

Posted

@Illusion a very random and probably insignificant option.

Have you tried using a different network cable connecting the Polisy to the router? Maybe one of different length or manufacturer. I would go as far as suggesting it be a "new to you" cable and not one that's being used elsewhere or been used in the past. 

Another suggestion would be do you have any way to connect the Polisy to the network through another device other than directly to the main router? Not sure of the fiber modem options (I have cable not fiber), but does it have multiple ports that you're able to use? Do you have another network hub somewhere you can remove the direct connection from the router to the Polisy?

Posted

My steps of troubleshooting at the point where you are would be in this order:

1. Replace network cable

2. Replace power supply

3. Replace Polisy

If none of those fix the issue, you now know that it's not the Polisy. My next steps would be swapping out hardware starting with the router, change wall outlets or a different circuit breaker in the house.

Posted

@Geddy Just tested with different network cable connected to a switch connected to a mesh node in another part of the house. Same results.

@gregkinney 1. Network cable replaced. 2. Tested on completely different power supply. A battery, so no wall outlet or circuit breaker involved at all in this test. Full isolation from home power. Same results.

New data point! I was doing all these tests by just pulling the ethernet cable out and plugging it back in, since I had taken the effort to get to the back of my Polisy to change/reroute the network cable. Most of my tests are done by shutting off the WAN in the router just because it is the easiest. But what I realized on one of the restarts was that the restart time did not seem right. Because it is a physical removal and restoration, I can know the exact second that the internet was restored.

Internet was restored at 12.06.15p. IoX System status shows a startup time of 12.05.34!!! IoX is showing a startup time 41s BEFORE the ethernet cable was plugged in! I do not know if it is relevant, but I know computers like correct time. 

[Screenshot, 2025-02-06 12:10:16 PM: IoX system status showing the startup time]

If I select the restart IoX button, or I hard reboot the Polisy, the startup time is correct. (The system time in the Admin Console is correct in all cases)
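
The 41 s figure above is simply the difference between the two clock readings. A minimal sketch of that arithmetic, for checking further readings, might look like the following; `to_seconds` and `offset_seconds` are made-up helper names, and the sketch assumes both times fall on the same day and are written as 24-hour HH:MM:SS (colons rather than the dots used above).

```shell
#!/bin/sh
# Hypothetical helpers; names are illustrative.

# Convert a clock time written as HH:MM:SS into seconds since midnight.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Print how many seconds the first time (reported startup) precedes
# the second time (observed internet restoration).
offset_seconds() {
    echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

# Reported startup 12:05:34, cable plugged back in at 12:06:15.
offset_seconds 12:05:34 12:06:15   # prints 41
```

A positive result means the reported startup time is earlier than the moment the connection actually came back.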

  • Like 1
Posted

Have you ever accessed unix through a ssh connection (UDI might have had you do that during troubleshooting)?

If you have or are willing to try, from your PC DOS prompt, do "ssh admin@polisy.local" and enter the admin password. If you get that far, then just ask for the date by entering "date<cr>". Do the date and time shown here match the ones you see in IoX?

Posted (edited)

Okay, here is what I currently have investigating the time/NTP issue.

@Guy Lavoie Thank you for that suggestion. Can confirm that Polisy has correct time as verified through your ssh suggestion. 

The time is not always so far off. I restored internet at 11.50.30am and system shows:

[Screenshot: IoX startup time as shown in the Admin Console]

I restored at 12.02.30p and system shows:

[Screenshot: IoX startup time as shown in the Admin Console]

I restored internet at 12.09.03p and system shows:

[Screenshot: IoX startup time as shown in the Admin Console]

I am not sure this strangeness has anything to do with my issue at all, but it is fascinating. I was wrong about this only happening with the internet drop and restore. There is the same time discrepancy if I reboot the Polisy or 'Restart IoX' from the configuration pane. The same discrepancy shows up if I remove and restore power to the Polisy. I had missed this by not paying attention to the AM/PM, or because all those tests were done in the AM and being off by a few seconds during a reboot is impossible to detect because of system startup. It is off by a few seconds and always seems to show only AM. I will do more testing in the afternoon.

 

Edited by Illusion
Posted

Well, there is one more test you could try to see if the timeserver process might be involved. It's to kill the ntpd process. No worries, I tried this on my spare Polisy that I use for testing. A reboot will restart it normally.

Again, ssh to the Polisy and do this:

 

ps -aux | grep ntp         (that vertical bar is the pipe character, usually over the backslash on your keyboard)

 

You should get two lines of output. One of them will start with ntpd and the next field is a number, the process id. Let's say for example that it's 1125 (what I see on mine). Then you do:

 

sudo kill 1125               (use your actual process id...)

 

You will be prompted for a password. Enter the admin password. This stops ntpd  (network time protocol daemon)

Confirm that it's not running by redoing the ps -aux... command above. Once that is done, redo your networking tests that would cause IoX to restart and see if it still happens. Then reboot your Polisy to put it back to normal.

Tell us if anything changes. 
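
If picking the process id out by eye is awkward, the lookup above can be sketched as a small filter. This is only an illustration: `ntpd_pids` is a made-up helper name, and it assumes the process id is the second column of the `ps -aux` output, as described above.

```shell
#!/bin/sh
# Hypothetical helper; the name is illustrative.

# Read `ps -aux` style output on stdin and print the PID (column 2) of every
# line that mentions ntpd, skipping any grep process that matched itself.
ntpd_pids() {
    awk '$0 ~ /ntpd/ && $0 !~ /grep/ { print $2 }'
}

# Usage on the Polisy (not run here):
#   ps -aux | ntpd_pids
#   sudo kill <the number printed>
```

On systems that ship `pgrep`, `pgrep ntpd` should give the same number directly.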

Posted

Based on what you've reported, I'd say the most likely answer is that the Admin Console has a bug and always displays "AM" for the start time. 

There's a very small possibility that if the hardware real-time clock is 12 hours off from actual time and IoX somehow is using that when the internet is down (vs. using the system software real-time clock), that when it sees a 12 hour time change, it crashes.

However, from what I've been able to find, FreeBSD is supposed to keep the hardware clock synchronized with NTP so it should never be that far off.  But maybe there's a hardware issue with the real-time clock that prevents it from being updated.

 

What do you get if you run

sysctl machdep | grep rtc

If machdep.disable_rtc_set is set to 1, it disables updating the hardware clock.
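
A minimal sketch for reading that value out of the command's output is below. `rtc_set_status` is a made-up helper name, and the sketch assumes sysctl prints the setting in its usual `name: value` form.

```shell
#!/bin/sh
# Hypothetical helper; the name is illustrative.

# Read sysctl output on stdin and report whether hardware clock updates
# are "enabled" (value 0), "disabled" (value 1), or "unknown" (not listed).
rtc_set_status() {
    awk -F': ' '
        $1 == "machdep.disable_rtc_set" {
            print ($2 == 1 ? "disabled" : "enabled")
            found = 1
        }
        END { if (!found) print "unknown" }
    '
}

# Usage on the Polisy (not run here):
#   sysctl machdep | rtc_set_status
```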

 

  • Like 1
Posted (edited)

I am going to move away from system startup time research. I am going with @bpwwer's view that this is an Admin Console bug. Reboots/restarts later in the afternoon were still 41s off, but the AM/PM differential went away. Based on the fact that the system time displayed in the Admin Console is correct, that ssh shows the Polisy knows the correct time, and that the email I get from ISY on reboot is timestamped correctly in the message body by the ISY, I do not want to spend any more time on this. I did 6 hours of tests on this issue yesterday.

I am not opposed to coming back to this, but I would like to focus my efforts on the fact that IoX does not restart if I am not connected to the UD Portal. I think the answer lies in that. Either that or figuring out what is different about stopping internet access for the IoX vs other forms of internet interruption.  I think that could give us a great data point. 

Data point #4:

On 2/5/2025 at 10:58 AM, Illusion said:

I have an ASUS router. It has a button with each device connected to disable access to the internet for that device. If I throw that switch in the router, my IoX shows that the portal is offline, the UD portal shows that my IoX is offline. But the IoX DOES NOT RESTART when I once again allow access from the Polisy to the internet. 

and 

Variations on those tests #3:

On 2/5/2025 at 10:58 AM, Illusion said:

If I sever the connection between IoX and the UD Portal, IoX DOES NOT RESTART with loss and restoration of the internet. 

@Geddy Regarding your idea of changing networking cable and topology, I thought that would be a good thing to take as far as I could while testing the startup time anomaly extensively. So I strung ethernet cable together to run my Polisy on my neighbor's network. So different WAN, different LAN, different cable, different router. Same results. IoX restarts unless I break the connection to the UD Portal. 

Edited by Illusion
