Polisy going non-responsive once a day


Sirmeili

This just started happening, but my polisy is going non-responsive once a day. It's quite frustrating.

One thing I've noticed when I do get to the logs is that they show "java.net.SocketTimeoutException" errors over and over.

I can still SSH into the device, but nothing seems to be overwhelming the system.

 

I also can't access the ISY Admin. I have zero nodes installed and basically just use this as an ISY for my Home Assistant install. It had been working for months without an issue until recently.

Any ideas?

I will also open a ticket, but thought the forums might offer faster responses on the holiday weekend.

Link to comment

Yes, I have had the same issue. Below is the text from my ticket a couple weeks ago. It was supposedly fixed in 5.7.1, however, I am still having disconnections. I have opened another ticket but Michel thinks that it's just portal network connectivity.

 
Universal Devices

Michel Kohanim

replied

17 days ago

Hi Greg,

Recent updates to Home Assistant have caused a dramatic bombardment of Polisy by HA, causing it to either crash or treat the traffic as a DoS attack. We are currently working on a solution that will handle these attacks more gracefully. We'll hopefully have a solution by the end of the week.

Greg Kinney

replied

17 days ago

Yes I have Home Assistant

Universal Devices

Michel Kohanim

replied

17 days ago
 

Hi Greg, do you have any of the following:

1. Home Assistant

2. Home Bridge

3. ELK or Harmony Node servers

Link to comment
4 minutes ago, gregkinney said:

I did not validate anything with my logs so I don't know.

I looked at the HA logs and I did see lots of these, so it could be that it is on the HA side, but not sure as I never looked before.

 

2023-11-25 11:57:12.651 DEBUG (MainThread) [pyisy.events.websocket] Starting websocket connection.
2023-11-25 11:57:12.654 ERROR (MainThread) [pyisy.events.websocket] Unexpected websocket error Session is closed
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/pyisy/events/websocket.py", line 218, in websocket
    async with self.req_session.ws_connect(
  File "/usr/local/lib/python3.11/site-packages/aiohttp/client.py", line 1141, in __aenter__
    self._resp = await self._coro
                 ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/aiohttp/client.py", line 779, in _ws_connect
    resp = await self.request(
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/aiohttp/client.py", line 400, in _request
    raise RuntimeError("Session is closed")
RuntimeError: Session is closed

I did install the HACS UD ISY/IoX plugin (which is just a newer version of the one in HA, using a beta version of pyisy). I'll see if that is any better.

Link to comment
57 minutes ago, gregkinney said:

Please report back and let me know if it helped 🤞

Will do. So far so good, and I don't see the above errors in the HA logs, but then again, I don't know whether they occurred because the Polisy freaked out. I should know in the next 24 hours and I'll report back.

Link to comment

So, just a quick update. As of this morning the Polisy is acting fine (both PG3x and the ISY). I'm not seeing any of the java.net.SocketTimeoutException errors in the Polisy logs. I'm still not going to say it's 100% working, and I'll continue to monitor, but things are looking up.

Some notes if you are using Home Assistant and want to install the beta version of the "Universal Devices ISY/IoX" integration from HACS:

  1. It is a replacement/overwrite. That is to say, you only have to install it from HACS and nothing else. It will put itself in the custom_components directory, and upon restart HA will use that version instead of the included one (so easy install/backout).
  2. If you are using any Events in HA for ISY devices, the event data has changed and broken a lot of my automations. It adds the address into the event data. It's a pretty easy fix, but something to be aware of.
  3. As far as I can tell, except #2, everything else from a device/entity perspective is the same and all my dashboards and automations (except the issue in #2 for triggers) are working as expected.

I'll report back tomorrow if it's still working, or sooner if I notice it breaks again.

Link to comment

Well, not great news. I woke up and, while HA can still control the ISY, the Polisy is slow to react and I get duplicate commands (as if HA thought the ISY didn't respond and tried again, or it was all on the ISY side).

Trying to log into the PG3x web interface just constantly kicks me back out to the login screen after logging in, but before doing so, it says there is no ISY detected.

I have no idea where to see how HA might be overloading the ISY. I've kept the Event Viewer on and I don't see a bunch of excessive commands or anything being sent to it.

I've also tried rebooting from the ISY admin interface and from the phone since this started happening, but all I get is an ISY with no lights on it. I have to power cycle it manually to get it working again.

 

Looking at the HA logs, I noticed that I was getting heartbeat messages from the ISY on Saturday, but they stopped shortly after my last restart (within an hour or two), and after that I would get these:

2023-11-25 15:40:15.170 WARNING (MainThread) [pyisyox.events.websocket] Websocket disconnected unexpectedly with code: 0

I've also seen this here and there:

2023-11-26 04:13:31.099 ERROR (MainThread) [pyisyox.events.websocket] Error during receive Received frame with non-zero reserved bits

After about 24 hours I start to see this:

2023-11-26 13:54:06.099 WARNING (MainThread) [pyisyox] Timeout while trying to connect to the ISY.

That goes on until this morning, when it was not as responsive.

Note that until this morning, HA didn't seem to have many issues talking to the ISY, but obviously in the background it did.

I still don't know if this is on the ISY or the HA side, but I can tell you that the code on the HA side for this integration does not seem to have changed since earlier this year. It's using websockets, so I'm not sure whether something changed in the base HA code, but the integration hasn't changed.

At this time, I'm not 100% sure the "beta" is the way to go. If anyone can tell me where I can see the logs that point to HA hammering the ISY, I would love to look at them so I can go back to HA and see if it can be fixed over there. Otherwise, their logs don't show a bunch of communication back to the ISY until the ISY starts to time out.
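In case it helps anyone compare timelines, here is a rough sketch for tallying the warnings and errors on the HA side by hour. It only assumes the log line format quoted above (timestamp, level, thread, logger in square brackets, message). The function name is made up for illustration, and it looks at the HA log only, so it cannot show what the ISY itself is receiving.

```python
import re
from collections import Counter

# Matches Home Assistant log lines in the format quoted in this thread, e.g.
# 2023-11-25 15:40:15.170 WARNING (MainThread) [pyisyox.events.websocket] message
LOG_LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<hour>\d{2}):\d{2}:\d{2}\.\d{3} "
    r"(?P<level>[A-Z]+) \((?P<thread>[^)]*)\) \[(?P<logger>[^\]]+)\] (?P<msg>.*)$"
)


def tally_isy_log_lines(lines, logger_prefix="pyisy"):
    """Count WARNING/ERROR lines per (hour, level, logger).

    `tally_isy_log_lines` is a hypothetical helper, not part of HA or pyisy.
    The default prefix "pyisy" also covers the "pyisyox" loggers.
    """
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match is None:
            continue  # traceback continuation lines and anything unrecognized
        if match["level"] not in ("WARNING", "ERROR"):
            continue
        if not match["logger"].startswith(logger_prefix):
            continue
        hour_bucket = f"{match['date']} {match['hour']}:00"
        counts[(hour_bucket, match["level"], match["logger"])] += 1
    return counts
```

To use it, open the HA log file (assumed here to be home-assistant.log in the HA config directory), pass the file object to `tally_isy_log_lines`, and print `sorted(counts.items())` to see when the disconnects and timeouts start piling up.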

Edited by Sirmeili
Link to comment

Michel said he was making changes on the ISY side to be able to handle HA so I would keep to that logic at the moment. I'm excited for you to share all of the above with them in your ticket. Please let me know what happens, I'm going to wait to do anything until you hopefully get somewhere with the above info.

Link to comment
3 hours ago, gregkinney said:

Michel said he was making changes on the ISY side to be able to handle HA so I would keep to that logic at the moment. I'm excited for you to share all of the above with them in your ticket. Please let me know what happens, I'm going to wait to do anything until you hopefully get somewhere with the above info.

Yeah, I'm still waiting on a first contact from my ticket (holiday weekend so I understand).

What are you doing at the moment to keep everything running? This is killing my WAF. Are you waiting for it to fail, or are you doing some kind of automated reboot?
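For the automated route, a minimal sketch of a responsiveness check might look like the following. It tests whether a web endpoint answers at all within a timeout, since SSH stayed up for me while the ISY itself was unresponsive. The function name and any URL you pass in are placeholders, not anything UD or HA provides, and what to do on failure (notify, power cycle a smart plug, etc.) is left out.

```python
import http.client
import urllib.error
import urllib.request


def http_responds(url, timeout=10.0):
    """Return True if the server at `url` sends back any HTTP response in time.

    Hypothetical helper: an error status such as 401 still counts as
    "responding", because the point is only to detect a hung service.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server answered, just not with a 2xx status
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, and malformed replies.
        return False
```

Calling this every few minutes from cron or an HA automation and only acting after several consecutive failures would avoid reacting to a single slow response.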

Link to comment

I'm not confident I'm having serious issues anymore. Before updating to 5.7.1, I would notice 2-4 crashes per day and I would confirm it indeed was crashing because it would be 5-10 minutes before things would respond again. After updating to 5.7.1, I'm getting notifications that it has disconnected and reconnected 1-2 times per day, however, I don't think it is crashing this time. Things are immediately responding after I get the notifications. It might be exactly what Michel said - that it just loses connectivity with the portal temporarily. So we might be having different issues, I'm not sure. Are you on 5.7.1?

Edited by gregkinney
Link to comment

Thought I would share my updated ticket:

Universal Devices

Michel Kohanim

replied

22 minutes ago
 

Hi Greg, the first thing you need to do is to disable HA for a day and see whether or not you still get these issues. As far as I remember, your HA was bombarding IoX with traffic. Although we may have fixed the crash, it does not mean that Polisy can handle the traffic.

Greg Kinney

replied

31 minutes ago

Yes the last runtimes are correct. Yes I am on 5.7.1. When I was on 5.7.0 and I would get a disconnect/reconnect notice on my phone from the UD app, it would be 5-10 minutes before any devices would respond because it was loading everything (I assume it had restarted or crashed). Now on 5.7.1, when I get a disconnect/reconnect notice, my devices are still immediately responsive. So if it's a network related issue, is that on my end or on your end?

Universal Devices

Michel Kohanim

replied

56 minutes ago

Thanks Greg. So, the last/next runtimes are correct? If so, the issue is network related. What's your IoX version? Is it really 5.7.1?

Link to comment

So I heard from UD support.

After requesting my system logs, they suggested I run the update once again.
This would be the third time running the update for 5.7.1.

Third time seems to be the charm, as I woke up this morning with the Polisy still online.
I'll wait a few days before claiming this is the final fix, as the problem was somewhat sporadic.

Link to comment
  • 2 weeks later...

I had some issues with my Polisy that quite possibly fall into this same category, though they seem to have been fixed. I use Automation Shack's NodeLink program, and per Michel it was overwhelming the Polisy with open sockets, causing the ISY to reboot and then, in a later firmware, causing it to freeze. Reboots were about once a day and were preferable to freezing. After I sent a bunch of logs to Michel and had him remote into my machine, it appears that a few ISY updates later the problem was solved. I do not believe NodeLink has had any changes in a long time, perhaps years, so it would appear that at some point a change in the ISY was the source of the compatibility issue.

Link to comment