bpwwer Posted May 24, 2023 Author Posted May 24, 2023 4 hours ago, drprm1 said: Since installing this upgrade my nodeservers will go offline and come back every couple of hours. Shows on UD Mobile as Fail then connected....Have 72 notifications on ud mobile from last night. The fail and connected notices are both the exact same time. Not really affecting anything that I can tell, just annoying to see them. Tried rebooting/restarting everything. Also tried installing the upgrade package again. All of them or only specific node servers? What node servers do you have installed? My Roku node server seemed to do this. What's happening is that the mqtt connection drops, the node server then re-tries and successfully reconnects. So you see a fail notification and a connect notification. I was seeing this on only one of about a dozen node servers installed. I don't know why that one specific node server was having issues with the mqtt connection.
drprm1 Posted May 24, 2023 Posted May 24, 2023 I erased all the notification in UD mobile so not sure exactly which ones were doing it.. I have MyQ, Ring, Climacell, FlumeWater, Hue, Wireless Tags, Push, Holiday, and Yolink Nodeservers. I'll watch for exact ones and post them
drprm1 Posted May 25, 2023 Posted May 25, 2023 It’s every one of my nodeservers. UD Notifications show a “fail” and “connected” at the same time. Not always fail before connect. ex.
dbwarner5 Posted May 25, 2023 Posted May 25, 2023 (edited) On 5/23/2023 at 5:09 PM, macjeff said: Thats correct. It is just a flag that turns on. I just did it on a second Polisy I have and worked fine 1. run command but I would replace polisy.local with your IP address of polisy which I had to do https://polisy.local:8443/rest/pg3x.enable 2. it will give you the normal Https insecure warning but find option to go to page anyways 3. Should ask you to login and login with polisy credentials 4. you will get the XML error you said. Its because its just setting a flag 5. Power Cycle Polisy AND WAIT. Took 30 min on mine. 6. Empty Browser cache or try adding ? to the end of your Polisy URL which should force it to load without cache. 7. Pg3 should now say pg3x. Make sure all the nodeservers are running. 8. Then I had to reboot Polisy one more time because ISY was acting strange- happened on both polisy's but a second reboot fixed. the important thing is be patient. If you restart it too soon I am not sure what will happen. did the above. no problems. gave it plenty of time to reboot. Just completed second reboot. all looks good with one exception: -Notification NS is now asking for a Portal API. didnt know what this was, but here is the info for anyone else. Edited May 25, 2023 by dbwarner5 1
macjeff Posted May 25, 2023 Posted May 25, 2023 10 minutes ago, dbwarner5 said: did the above. no problems. gave it plenty of time to reboot. Just completed second reboot. all looks good with one exception: -Notification NS is now asking for a Portal API. didnt know what this was, but here is the info for anyone else. This happened to me also but when upgrading Notification NOT the pg3x. You need to go UD Mobile app Click on notifications tab at bottom Click on the gear icon for settings Your key will be there. What I did is copy it and email it to myself so I could get it on the computer to paste into the Notification NS config 1
bpwwer Posted May 25, 2023 Author Posted May 25, 2023 7 hours ago, drprm1 said: It’s every one of my nodeservers. UD Notifications show a “fail” and “connected” at the same time. Not always fail before connect. I suspect that's because the fail/reconnect happens within a couple of seconds. You may have said earlier, but are you running PG3 or PG3x? What does the node server log show at those times for a couple of them?
drprm1 Posted May 25, 2023 Posted May 25, 2023 After upgrade to Px3……. Looked at a few logs and they don’t show back that far. I’ll try and catch a couple earlier.
macjeff Posted May 25, 2023 Posted May 25, 2023 2 hours ago, bpwwer said: I suspect that's because the fail/reconnect happens within a couple of seconds. You may have said earlier, but are you running PG3 or PG3x? What does the node server log show at those times for a couple of them? I have had a few random of these. Ecobee is one of them but there was another but not all of them. It may be related. See screenshot @bpwwer- If you want to see my logs send me a PM
bpwwer Posted May 25, 2023 Author Posted May 25, 2023 Please verify in the node server logs that the node servers are actually disconnecting and reconnecting. You'll have to do this within the same day the notification was sent. In my case, the node server was disconnecting and reconnecting so the notification system was working as intended. Also, the node server was working as intended. When the connection dropped, it reconnected and the node server continued working without issue. The question is what's causing the connection to drop in the first place? In my case it appears to be caused by network packet corruption or possibly a bug in the low level networking code. In either case, not something I can fix.
BigMojo Posted May 26, 2023 Posted May 26, 2023 I’ve been seeing the same behavior also. Would it be possible to insert a connection drop timer variable before the notification triggers? This may also be a DHCP renewal but I haven’t had a chance to look at that yet.
dbwarner5 Posted May 26, 2023 Posted May 26, 2023 23 hours ago, bpwwer said: Please verify in the node server logs that the node servers are actually disconnecting and reconnecting. You'll have to do this within the same day the notification was sent. In my case, the node server was disconnecting and reconnecting so the notification system was working as intended. Also, the node server was working as intended. When the connection dropped, it reconnected and the node server continued working without issue. The question is what's causing the connection to drop in the first place? In my case it appears to be caused by network packet corruption or possibly a bug in the low level networking code. In either case, not something I can fix. This is happening to me as well. PGx ... today alone, I have 79 messages of connect / reconnects... ~15 messages per event, had three disconnects, but only two connects: two disconnects happened back to back w/o reporting the connect. Any suggestions of why or what to look for to solve on my end would be great, as well as @BigMojos suggestion? However notice the log from Sun NS. Time corresponds with notification exactly: 2023-05-26 13:59:57,575 Command udi_interface.interface INFO interface:_handleInput: Received stop from Polyglot... Shutting Down.2023-05-26 13:59:57,586 Command udi_interface.interface INFO interface:stop: Disconnecting from MQTT... localhost:8883 2023-05-26 13:59:57,588 Command udi_interface.interface INFO interface:_disconnect: MQTT Graceful disconnection. 2023-05-26 14:00:25,539 MainThread udi_interface INFO polylogger:set_basic_config: set_basic_config: enable=True level=30 2023-05-26 14:00:31,870 MainThread udi_interface INFO __init__:<module>: UDI Python Interface for Polyglot version 3 3.0.59 Starting... 2023-05-26 14:00:32,204 MainThread udi_interface.interface INFO interface:__init__: Initialization received from Polyglot V3 3.1.27 [ISY: 5.6.0, Slot: 7] 2023-05-26 14:00:32,208 MainThread udi_interface.interface INFO interface:__init__: Connect: Network Interface: {'addr': '192.168.68.76', 'netmask': '255.255.252.0', 'broadcast': '192.168.71.255'} 2023-05-26 14:00:32,314 Interface udi_interface.interface INFO interface:_startMqtt: Connecting to MQTT... localhost:8883 2023-05-26 14:00:32,416 Interface udi_interface.interface INFO interface:_startMqtt: Using SSL certs: 000db9532a30_7.cert 000db9532a30_7.key 2023-05-26 14:00:32,480 MQTT udi_interface.interface INFO interface:_connect: MQTT Connected with result code 0 (Success) 2023-05-26 14:00:32,482 MQTT udi_interface.interface INFO interface:_connect: MQTT Subscribing to topic: udi/pg3/ns/clients/00:0d:b9:53:2a:30_7 - MID: 2 Result: 0 2023-05-26 14:00:32,526 MQTT udi_interface.interface INFO interface:_subscribe: MQTT Subscribed Succesfully for Message ID: 2 - QoS: (0,) 2023-05-26 14:00:32,573 MainThread udi_interface.interface WARNING interface:db_getNodeDrivers: sunctrl not found in database. 2023-05-26 14:00:32,577 MainThread udi_interface.interface INFO interface:addNode: Adding node Sun Position(sunctrl) [None] 2023-05-26 14:00:32,580 MainThread udi_interface.interface INFO interface:setController: Using node "sunctrl", driver "ST" for connection status. 2023-05-26 14:00:37,819 Thread-1 udi_interface.custom INFO custom:_save: Sending data notices to Polyglot. 2023-05-26 14:00:39,037 MQTT udi_interface.interface INFO interface:_message: Successfully set key = notices 2023-05-26 14:00:39,375 Thread-3 udi_interface INFO sun-poly:start: Started Sun Position And similar result from iAqualink: 2023-05-26 13:59:57,590 Command udi_interface.interface INFO interface:_handleInput: Received stop from Polyglot... Shutting Down.2023-05-26 13:59:57,603 Thread-4848 udi_interface INFO iaqualinkns:stop: iAquaLink node server is shutting down... 2023-05-26 13:59:57,629 Thread-4848 udi_interface.node DEBUG node:setDriver: service:iAquaLink Service Reporting set ST to 0 to Polyglot 2023-05-26 13:59:57,630 Thread-4848 udi_interface.node DEBUG node:reportDriver: Updating value to 0 2023-05-26 13:59:57,632 Thread-4848 udi_interface.interface DEBUG interface:send: PUBLISHING {'set': [{'address': 'service', 'driver': 'ST', 'value': '0', 'uom': 2}]} 2023-05-26 13:59:57,634 Thread-4848 udi_interface.interface INFO interface:stop: Disconnecting from MQTT... localhost:8883 2023-05-26 13:59:57,637 Thread-4848 udi_interface.interface INFO interface:_disconnect: MQTT Graceful disconnection. 2023-05-26 13:59:57,640 MQTT udi_interface.interface DEBUG interface:_startMqtt: MQTT: Done 2023-05-26 14:00:30,419 MainThread udi_interface INFO polylogger:set_basic_config: set_basic_config: enable=True level=30 2023-05-26 14:00:38,157 MainThread udi_interface INFO __init__:<module>: UDI Python Interface for Polyglot version 3 3.0.59 Starting... 2023-05-26 14:00:38,240 MainThread udi_interface.interface INFO interface:__init__: Initialization received from Polyglot V3 3.1.27 [ISY: 5.6.0, Slot: 10] 2023-05-26 14:00:38,253 MainThread udi_interface.interface INFO interface:__init__: Connect: Network Interface: {'addr': '192.168.68.76', 'netmask': '255.255.252.0', 'broadcast': '192.168.71.255'} 2023-05-26 14:00:38,254 MainThread udi_interface INFO iaqualinkns:run: Starting iAquaLink node server...
dbwarner5 Posted May 26, 2023 Posted May 26, 2023 @bpwwer Here are my PGx and system logs in case that would help? pg3_5-26-2023_62259-PM.zipsystem_5-26-2023_62303-PM.zip
bpwwer Posted May 26, 2023 Author Posted May 26, 2023 @dbwarner5 stopping and starting the node server are supposed to generate notifications. On stop, it disconnects and on start it connects. Those are normal and correct. Stopping and starting node servers should not be happening often. In fact, the only way the node server can be stopped if if you stop it or PG3 shuts down. What I'm concerned about are "failed" notifications. Those mean the node server stopped communicating with PG3 which is something that should not ever happen and if it does, it's probably a bug somewhere.
dbwarner5 Posted May 27, 2023 Posted May 27, 2023 (edited) @bpwwer yes i get all of the above. What i dont get is why is PGx telling the NS to shut down? I am not telling it to do anything. These disconnects, reconnects, as shown on the logs appear to be directed by Stopped by Polyglot. Why? Edited May 27, 2023 by dbwarner5
bpwwer Posted May 27, 2023 Author Posted May 27, 2023 15 minutes ago, dbwarner5 said: @bpwwer yes i get all of the above. What i dont get is why is PGx telling the NS to shut down? I am not telling it to do anything. These disconnects, reconnects, as shown on the logs appear to be directed by Stopped by Polyglot. Why? Like I said, there are only 2 cases where PG3 will tell a node server to shutdown. 1. Someone clicks the stop or restart buttons in the UI 2. PG3 shuts down. In your case it looks like PG3 shutdown: 5/26/2023, 13:59:57 [pg3] error: uncaughtException: read ECONNRESET Error: read ECONNRESET at TCP.onStreamRead (node:internal/stream_base_commons:217:20) 5/26/2023, 13:59:57 [pg3] error: uncaught ECONNRESET exception: Error: read ECONNRESET at TCP.onStreamRead (node:internal/stream_base_commons:217:20) 5/26/2023, 13:59:57 [pg3] info: Caught SIGTERM/SIGINT Shutting down. 5/26/2023, 13:59:57 [pg3] info: Stopping running NodeServers... That error indicates that the MQTT broker unexpectedly closed the network connection and since that connection is not supposed to close unless PG3 exits normally, PG3 simply aborts and restarts. As to why, I have no idea. The broker logs may have some information, but I've not found them to be very helpful in the past. Could also be something external disrupted the network (router reboot, etc.).
dbwarner5 Posted May 28, 2023 Posted May 28, 2023 20 hours ago, bpwwer said: Like I said, there are only 2 cases where PG3 will tell a node server to shutdown. 1. Someone clicks the stop or restart buttons in the UI 2. PG3 shuts down. In your case it looks like PG3 shutdown: 5/26/2023, 13:59:57 [pg3] error: uncaughtException: read ECONNRESET Error: read ECONNRESET at TCP.onStreamRead (node:internal/stream_base_commons:217:20) 5/26/2023, 13:59:57 [pg3] error: uncaught ECONNRESET exception: Error: read ECONNRESET at TCP.onStreamRead (node:internal/stream_base_commons:217:20) 5/26/2023, 13:59:57 [pg3] info: Caught SIGTERM/SIGINT Shutting down. 5/26/2023, 13:59:57 [pg3] info: Stopping running NodeServers... That error indicates that the MQTT broker unexpectedly closed the network connection and since that connection is not supposed to close unless PG3 exits normally, PG3 simply aborts and restarts. As to why, I have no idea. The broker logs may have some information, but I've not found them to be very helpful in the past. Could also be something external disrupted the network (router reboot, etc.). Ok Thanks Bob. I will keep an eye on it and maybe reboot the Polisy one more time. What is also interesting is that it is almost instantaneous that it shuts down and then starts the restart procedure. If it persists, will open ticket. Regards...
dbwarner5 Posted May 28, 2023 Posted May 28, 2023 @bpwwer Bob, a separate question, I did the upgrade from PG3 to PGX. Now my one RussoundRIO NS on PG2 is no longer working and I can no longer access PG2.. was this a result of the upgrade or is it something else? Best solution is to port RussoundRIO to PG3/X but havent see that addressed yet. thanks.
bpwwer Posted May 28, 2023 Author Posted May 28, 2023 3 hours ago, dbwarner5 said: @bpwwer Bob, a separate question, I did the upgrade from PG3 to PGX. Now my one RussoundRIO NS on PG2 is no longer working and I can no longer access PG2.. was this a result of the upgrade or is it something else? Best solution is to port RussoundRIO to PG3/X but havent see that addressed yet. thanks. I believe one of the recent system updates does remove PG2 from the system. I don't know if @Michel Kohanim announced that or not. I had plans to include RIO support in my Russound PG3 node server but that hasn't made it on to my active tasks yet. 1
macjeff Posted May 28, 2023 Posted May 28, 2023 1 minute ago, bpwwer said: I believe one of the recent system updates does remove PG2 from the system. I don't know if @Michel Kohanim announced that or not. I had plans to include RIO support in my Russound PG3 node server but that hasn't made it on to my active tasks yet. Mine was on the latest firmware and then I ran the PG3x update and it disappeared. Maybe that update ran at the same time but i am 100% sure it disappeared after the latest ISY firmware.
dbwarner5 Posted May 29, 2023 Posted May 29, 2023 20 hours ago, bpwwer said: I believe one of the recent system updates does remove PG2 from the system. I don't know if @Michel Kohanim announced that or not. I had plans to include RIO support in my Russound PG3 node server but that hasn't made it on to my active tasks yet. Wow.. that's a huge bummer. I wish I had known before I did the "upgrade" (whenever pg2 disappeared) as this is clearly a downgrade losing pg2 RussoundRIO. I have migrated all my other NS to PG3/X and am happily supporting the developers, but this is a very unpleasant surprise as I have lots of things tied into my whole house audio system via the RussoundRIO NS. @bpwwer hope things can settle down enough for you to port this over! let me know if you need test cases. Thanks.
BigMojo Posted May 29, 2023 Posted May 29, 2023 Quote Any suggestions of why or what to look for to solve on my end would be great, as well as @BigMojos suggestion? However notice the log from Sun NS. Time corresponds with notification exactly: @dbwarner5, I can tell you I’ll be looking at my router’s logs (Syslog’d to NAS) to see if/when anything is getting recorded for the IOX’s MAC/IP. A quickie test would be to modify your router DHCP lease time to as low a value as you can and see if that changes your failure frequency. If so, raise the lease time to a large value to correlate a greatly reduced fail frequency. I’m tied up with the holiday weekend but I’ll take a deeper look in the next few days to see if I can replicate your condition. Happy Memorial Day everyone!
dbwarner5 Posted May 30, 2023 Posted May 30, 2023 22 hours ago, BigMojo said: @dbwarner5, I can tell you I’ll be looking at my router’s logs (Syslog’d to NAS) to see if/when anything is getting recorded for the IOX’s MAC/IP. A quickie test would be to modify your router DHCP lease time to as low a value as you can and see if that changes your failure frequency. If so, raise the lease time to a large value to correlate a greatly reduced fail frequency. I’m tied up with the holiday weekend but I’ll take a deeper look in the next few days to see if I can replicate your condition. Happy Memorial Day everyone! Thanks. I opened a ticket to understand what these errors mean as this seems to be the original of the problem: 5/30/2023, 09:19:59 [pg3] error: uncaughtException: read ECONNRESET Error: read ECONNRESET at TCP. onStreamRead (node: internal/stream_base_commons:217:20) 5/30/2023, 09:19:59 [pg3] error: uncaught ECONNRESET exception: Error: read ECONNRESET at TCP.onStreamRead (node: internal/stream_base_commons:217:20) 5/30/2023, 09:19:59 [pg3] info: Caught SIGTERM/SIGINT Shutting down. 5/30/2023, 09:19:59 [pg31 info: Stopping running NodeServers...
brians Posted May 30, 2023 Posted May 30, 2023 On 5/23/2023 at 9:35 AM, kzboray said: @dbwarner5 Upgrading to PG3x Enable PG3x: https://polisy.local:8443/rest/pg3x.enable Reboot and wait for a beep + 1. Migrating node servers from PG3 to PG3x might take up to 2 minutes per node server. So, please be patient Once all node servers show connected, reboot again. That link has this error... This XML file does not appear to have any style information associated with it. The document tree is shown below.
macjeff Posted May 30, 2023 Posted May 30, 2023 20 minutes ago, brians said: That link has this error... This XML file does not appear to have any style information associated with it. The document tree is shown below. Just ignore the error because all the link does is turn on a flag that when you restart the unit it replaces PG 3 with PG 3X it’s correct that there is nothing to display. Just after you get that reboot the polisy and wait
brians Posted May 31, 2023 Posted May 31, 2023 8 minutes ago, macjeff said: Just ignore the error because all the link does is turn on a flag that when you restart the unit it replaces PG 3 with PG 3X it’s correct that there is nothing to display. Just after you get that reboot the polisy and wait Yes thanks, I guess would be too difficult for them to display something on screen Wonder how long it takes, still got this going on after about 20 min...
Recommended Posts