
PG3x down after an internet disconnect/reconnect


Solved by bpwwer


Posted

I noticed that if I disconnect from the internet for a short time and then reconnect, all node servers are in a failed state and I need to restart PG3x. Is this normal, or will PG3x reconnect by itself after some period of time?

Posted

You should be able to restart the node servers without having to restart PG3x.  However, PG3x won't try to restart them automatically.

The "failed" state means that the node server failed and disconnected from PG3x unexpectedly. 

In your case, the node servers are likely crashing when they get disconnected from the internet.  Most are not designed to recover gracefully when that happens.  This is something that would have to be fixed in each of the node servers.

Posted

Thanks Bob,

Since restarting pg3x does seem to solve the problem, is it possible to add an option/feature/flag to restart PG3x in this situation?  

Posted

PG3x doesn't know the internet connection was offline, so it can't detect that situation.  It doesn't know why the node servers failed, just that they have.

The problem is with specific node servers.  Some don't handle the internet disconnect well; others do.

Posted

OK thanks. Just my opinion but I think if a node server needs/uses/relies on an internet connection then it should be able to recover from an internet outage not just crash. I have 5 node servers and they all fail if there is an internet outage. 

I guess I could check UD Mobile notifications for a failed node server and do a restart but it would be simpler if it was all automatic

  • Solution
Posted
49 minutes ago, GTench said:

OK thanks. Just my opinion but I think if a node server needs/uses/relies on an internet connection then it should be able to recover from an internet outage not just crash. I have 5 node servers and they all fail if there is an internet outage. 

Yes, I agree, they shouldn't crash.   You can help by reporting the issues to the node server authors.

And there's a good chance that some of them are my node servers. I know many/most of mine don't handle loss of network connectivity well, and it's on my list of things to fix eventually.
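For node server authors reading along, here is a minimal sketch of one way to keep a network call from taking the whole node server down during an outage: retry with exponential backoff instead of letting the exception propagate. The names `call_with_retry` and `fetch` are hypothetical and are not part of udi_interface; `fetch` stands in for whatever function your node server uses to talk to its cloud service.

```python
import time


def call_with_retry(fetch, retries=5, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call fetch() and retry on network errors instead of crashing.

    fetch is a hypothetical zero-argument callable that performs the
    network request. ConnectionError and socket.timeout are both
    subclasses of OSError, so catching OSError covers the usual
    "network is down" failures.
    """
    delay = base_delay
    for attempt in range(1, retries + 1):
        try:
            return fetch()
        except OSError:
            if attempt == retries:
                # Out of attempts: let the caller decide what to do.
                raise
            sleep(delay)
            # Double the wait each time, up to max_delay.
            delay = min(delay * 2, max_delay)
```

A polling loop would typically wrap this in its own try/except, so that even when all retries are exhausted it only logs the failure and tries again on the next poll rather than exiting.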

Posted

OK thanks... will do. Yes, Weatherflow is one of the ones that crashed

Posted

Bob,

The node server authors are passing the issue back to PG3x. 

39 minutes ago, Goose66 said:

All those errors are from the udi_interface (the PG3/PG3x) side of the node server interface. @bpwwer would be the man to investigate here.

 

31 minutes ago, Goose66 said:

The log file contains the same errors here as in the EnvisaLink-DSC log file you posted earlier. Again, this appears to be a PG3/PG3x issue. I suggest you post a message with the log file(s) in the PG3 or PG3x support forum regarding this issue.

 

EnvisaLink-DSC_6-21-2023_10902_PM.zip YoLink_6-21-2023_11016_PM.zip

Posted (edited)

@bpwwer,

AFAIK from the two attached node server log files, both node servers are running along swimmingly, and then udi_interface logs this message:

2023-06-21 13:05:54,611 MQTT       udi_interface.interface INFO     interface:_disconnect: MQTT Unexpected disconnection. Trying reconnect. rc: 16

About 10 seconds later, the udi_interface throws this error:

2023-06-21 13:06:09,269 MQTT       udi_interface.interface ERROR    interface:_disconnect: MQTT Connection error: An exception of type timeout occured. Arguments:
('_ssl.c:1117: The handshake operation timed out',)
Traceback (most recent call last):
  File "/var/polyglot/pg3/ns/0021b90261c8_1/.local/lib/python3.9/site-packages/udi_interface/interface.py", line 460, in _disconnect
    self._mqttc.reconnect()
  File "/var/polyglot/pg3/ns/0021b90261c8_1/.local/lib/python3.9/site-packages/paho/mqtt/client.py", line 1073, in reconnect
    sock.do_handshake()
  File "/usr/local/lib/python3.9/ssl.py", line 1310, in do_handshake
    self._sslobj.do_handshake()
socket.timeout: _ssl.c:1117: The handshake operation timed out

It then appears that the node server is "restarted," or at least all of the initialization routines are performed again. Another 10 seconds later, udi_interface throws the same "handshake operation timed out" error. The node server continues to run and attempt to update the node driver values (it's initializing) but gets dozens of warnings and errors like these:

2023-06-21 13:06:37,775 MainThread udi_interface.interface WARNING  interface:send: MQTT Send waiting on connection :: {'config': {'version': '3.0.8'}}
2023-06-21 13:06:40,782 MainThread udi_interface.interface ERROR    interface:send: MQTT Send timeout :: {'config': {'version': '3.0.8'}}.

That goes on for a few minutes and then the user evidently pulled the plug. These events are almost exactly the same in both node server log files as well as having similar timestamps. Note that my node server (EnvisaLink-DSC) doesn't rely on or have a connection to an Internet service - just a direct connection to the Envisalink device over the local network. 
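For anyone who wants to compare their own node server logs against these, here is a rough sketch that counts the three MQTT events quoted above and records when each first appears. It assumes the log line layout shown in this post (timestamp, thread, logger, level, message); `summarize` and `EVENTS` are names made up for this example.

```python
import re
from collections import Counter

# Matches lines shaped like the udi_interface log lines quoted above.
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<thread>\S+)\s+(?P<logger>\S+)\s+(?P<level>[A-Z]+)\s+(?P<msg>.*)$"
)

# Substrings taken from the messages quoted in this thread.
EVENTS = {
    "disconnect": "MQTT Unexpected disconnection",
    "connection_error": "MQTT Connection error",
    "send_timeout": "MQTT Send timeout",
}


def summarize(lines):
    """Return (counts, first_seen) for the MQTT events in EVENTS."""
    counts = Counter()
    first_seen = {}
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # traceback lines and other continuation lines
        for name, needle in EVENTS.items():
            if needle in match["msg"]:
                counts[name] += 1
                first_seen.setdefault(name, match["ts"])
    return counts, first_seen
```

Running it over two log files and comparing the `first_seen` timestamps is a quick way to check whether different node servers lost their MQTT connection at the same moment, which is what the two logs here suggest.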

It appears to me that it was PG3x (or the MQTT broker) that yacked when the Internet connection went down and the node servers (specifically, udi_interface) were never able to reconnect. Perhaps you can get @GTench to give you a PG3 log file.

Edited by Goose66
Posted

I looked at the files too and noticed that. 

This doesn't appear to be an internet outage but a local network outage which is  a very different thing.

There's a lot of local network communication between components and if that is disrupted for more than milliseconds, it may be unrecoverable.

The MQTT connection between a node server and PG3x is happening on 127.0.0.1. That's not even external to the machine, so for that to get a handshake error implies that something is actually blocking network activity on the local machine.
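One rough way to test that from the machine itself while the uplink is unplugged is to try opening a plain TCP connection to the broker on the loopback address. This is only a sketch: `can_connect` is a made-up helper, and the broker port is not given anywhere in this thread, so you would have to supply it yourself.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable, and timeouts.
        return False
```

Note this only checks the TCP connect. The traceback above shows the TLS handshake timing out, which happens after the TCP connection is established, so a True result here would narrow the problem down to the handshake rather than rule out a problem.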

I'll have to do some investigation, but even unplugging the network connection should not affect that connection.

Posted

I don't know if this will help but here are a few more details concerning what I did.

I have an incoming fiber connection to a modem. The modem is connected to a unifi switch with an ethernet cable. The modem has its wifi turned off. Wifi is provided throughout the house by unifi access points connected to the switch. I never enabled wifi on the eisy. Eisy is connected directly to the unifi switch with an ethernet cable. My testing consisted of unplugging the ethernet cable from the modem to the unifi switch for a couple of minutes then plugging it back in. If the disconnect/reconnect time is, say, 15 to maybe 30 seconds, the node servers do not show fail, but I have not timed this. A disconnect/reconnect of a few minutes caused all node servers to fail.

I am using the latest releases of PG3x and IOX.

Posted

Thanks for the details on your setup and what you did to test it.

The node server to PG3 communication should not be affected by external network issues, but it clearly appears to be.  I'll have to do some investigation to see if I can figure out why.

This topic is now closed to further replies.