johnnyt Posted September 25, 2022

I've been testing my Polisy's IoP in preparation for migrating from my 994i when the new Z-Wave/Matter dongle arrives, along with the Z-Wave migration tool, in about 6-8 weeks. IoP is only running 3 node servers: OpenWeatherMap, Weatherflow and ST-Inventory. It has no PLM and no Z-Wave dongle connected, and there are no programs enabled/running, so the only activity is the node servers doing their device updates.

Yesterday I noticed the error log had been getting polluted with "UDQ: Queue(s) Full, message ignored" errors, which occurred constantly from about 4 AM on Sept 21 until 11:30 PM yesterday (the 24th), when Polisy was restarted. I've occasionally seen 'full queues' on my 994i over the past couple of years when it was overloaded, mostly during restarts, but never continuously the way it has been happening for the past few days on IoP.

I've attached the IoP log and the error log. I can provide yesterday's PG3 log file if it helps, but be warned that the 62MB file also has a ton of messages related to the 5 NS used by my 994i, and I don't see an easy way to separate them from the IoP-related entries. My usual method of importing into Excel and filtering on a column doesn't work well with these logs. (If you know how I could easily do that on Windows, please let me know.)

Why would the IoP event queue(s) get overloaded and stay that way for 3 days with nothing but device updates from 3 NS? I can certainly report the problem to UDI, but figured I would start here to rule a PG3 issue in or out before reporting this as an IoP issue. Any info would be appreciated.

IoPLogs.zip
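On separating the IoP-related entries from the 62MB PG3 log on Windows: a small script that keeps only matching lines is one option. The sketch below is a minimal Python example, assuming each relevant log line contains some text you can match, such as a node server name (OpenWeatherMap, Weatherflow, ST-Inventory). It has not been confirmed that PG3 tags every line that way, and multi-line entries such as stack traces would only be kept partially. The script name filter_pg3_log.py and the function names are hypothetical.

```python
import re
import sys
from typing import Iterable, Iterator, List


def filter_log(lines: Iterable[str], patterns: List[str]) -> Iterator[str]:
    """Yield only the lines that mention at least one of the given patterns.

    Matching is a case-insensitive substring search; patterns are treated as
    literal text (re.escape), not as regular expressions.
    """
    if not patterns:
        raise ValueError("at least one pattern is required")
    regex = re.compile("|".join(re.escape(p) for p in patterns), re.IGNORECASE)
    for line in lines:
        if regex.search(line):
            yield line


def split_log(src: str, dst: str, patterns: List[str]) -> int:
    """Stream src line by line (no need to load 62MB at once) and write matches to dst.

    Returns the number of lines written.
    """
    written = 0
    with open(src, encoding="utf-8", errors="replace") as fin:
        with open(dst, "w", encoding="utf-8") as fout:
            for line in filter_log(fin, patterns):
                fout.write(line)
                written += 1
    return written


if __name__ == "__main__" and len(sys.argv) >= 4:
    # e.g. python filter_pg3_log.py pg3.log iop_only.log OpenWeatherMap Weatherflow ST-Inventory
    count = split_log(sys.argv[1], sys.argv[2], sys.argv[3:])
    print(f"wrote {count} matching lines to {sys.argv[2]}")
```

Run it with Python 3 from a Command Prompt using the command shown in the comment; the output is a plain text file that can be opened in any editor or in Excel.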
johnnyt (Author) Posted September 25, 2022

I'm not quite seeing the resemblance, because I don't have any Z-Wave devices or dongle (or Insteon PLM), so how could there be any events related to either? Plus I'm getting full queue messages, and the error log for the other problem has none of those. That said, I do have a lot of -170001 messages too (like the other one), so I will try what you suggested there, namely:

Quote:
One test would be to disable (stop) all of the node servers and then reboot. PG3 will remember the last state of the node servers so it should start with all node servers disconnected (not running). Then slowly start them manually one by one giving it a minute or two to stabilize between each.

I'll also keep following the other thread to see how that one evolves.
bpwwer Posted September 25, 2022

I believe the -170001 messages are related to the queue being full. That was the connection I made between all these cases. Z-Wave devices not updating, Insteon devices not updating, PG3 node server devices not updating and programs not triggering could all be caused by the same root cause: something hung, new tasks were no longer being pulled from the queue, and thus the queue filled up.
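The failure mode described above (a consumer stops pulling tasks, so a bounded queue fills and new messages are discarded) can be illustrated with a toy Python model. This is not how ISY/IoP's queues are implemented; the deliver() function, the capacity and the event counts are made up purely to show why a single hung consumer can produce a continuous stream of "queue full" messages.

```python
import queue
from typing import Tuple


def deliver(events: int, capacity: int, consumer_alive: bool) -> Tuple[int, int]:
    """Offer `events` messages to a bounded queue; return (accepted, dropped).

    The producer never blocks: when the queue is full the message is discarded,
    which is what a "Queue(s) Full, message ignored" log line suggests.
    """
    if capacity < 1:
        # queue.Queue treats maxsize <= 0 as unbounded, which would hide the effect
        raise ValueError("capacity must be at least 1")
    q: "queue.Queue[int]" = queue.Queue(maxsize=capacity)
    accepted = dropped = 0
    for event in range(events):
        try:
            q.put_nowait(event)
            accepted += 1
        except queue.Full:
            dropped += 1
        if consumer_alive:
            # a healthy consumer takes each task off the queue promptly
            q.get_nowait()
    return accepted, dropped


print(deliver(events=100, capacity=10, consumer_alive=True))   # (100, 0)
print(deliver(events=100, capacity=10, consumer_alive=False))  # (10, 90)
```

Once the consumer stops, every message after the first `capacity` is dropped for as long as the hang lasts, which would be consistent with a log that keeps repeating the same "queue full" message until a restart clears it.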
johnnyt (Author) Posted September 25, 2022

So, as suggested, I stopped the 3 NS connecting to IoP, restarted IoP, waited a bit and started the OpenWeatherMap NS (only). Here are the error log entries:

Time                        User    Code     Message
Mon 1900/01/01 12:00:00 AM  System  -170001  <s:Envelope><s:Body><u:GetSystemTime xmlns:u="urn:udi-com:service:X_Polisy_Service:1"></u:GetSystemTime></s:Body></s:Envelope>
Sun 2022/09/25 03:41:07 PM  0       -170001  <s:Envelope><s:Body><u:GetSystemOptions xmlns:u="urn:udi-com:service:X_Polisy_Service:1"></u:GetSystemOptions></s:Body></s:Envelope>
Sun 2022/09/25 03:41:08 PM  0       -170001  <s:Envelope><s:Body><u:GetNetworkConfig xmlns:u="urn:udi-com:service:X_Polisy_Service:1"></u:GetNetworkConfig></s:Body></s:Envelope>
Sun 2022/09/25 03:41:20 PM  0       -170001  <s:Envelope><s:Body><u:Reboot xmlns:u="urn:udi-com:service:X_Polisy_Service:1"></u:Reboot></s:Body></s:Envelope>
Sun 2022/09/25 03:41:23 PM  0       -5       Start
Sun 2022/09/25 03:41:23 PM  0       -110026  ./FILES
Sun 2022/09/25 03:41:23 PM  0       -110026  ./FILES/TMP
Sun 2022/09/25 03:41:23 PM  0       -110026  ./FILES/LOG
Sun 2022/09/25 03:41:23 PM  0       -110026  ./FILES/WEB
Sun 2022/09/25 03:41:23 PM  0       -110026  ./FILES/CODE
Sun 2022/09/25 03:41:23 PM  0       -110026  ./FILES/CONF
Sun 2022/09/25 03:41:23 PM  0       -110026  ./FILES/CONF/D2D
Sun 2022/09/25 03:41:23 PM  0       -110026  ./FILES/CONF
Sun 2022/09/25 03:41:23 PM  0       -110026  ./FILES/CONF/MAIL
Sun 2022/09/25 03:41:23 PM  0       -110026  ./FILES/USER
Sun 2022/09/25 03:41:23 PM  0       -110026  ./FILES/USER/WEB
Sun 2022/09/25 03:41:23 PM  0       -110026  ./FILES/CONF/NET
Sun 2022/09/25 03:41:23 PM  0       -110026  ./FILES/CONF/SEP
Sun 2022/09/25 03:41:23 PM  0       -110026  ./FILES/CONF/OADR
Sun 2022/09/25 03:41:23 PM  0       -110026  ./FILES/CONF/BILLING
Sun 2022/09/25 03:41:23 PM  0       -110026  ./FILES/CONF/DEF
Sun 2022/09/25 03:41:23 PM  0       -110026  ./FILES/DEF
Sun 2022/09/25 03:41:23 PM  0       -110026  ./FILES/DEF/GLOBAL
Sun 2022/09/25 03:41:23 PM  0       -110026  ./FILES/DEF/GLOBAL/i1
Sun 2022/09/25 03:41:23 PM  0       -110026  ./FILES/DEF/GLOBAL/i1/nls
Sun 2022/09/25 03:41:23 PM  0       -110026  ./FILES/DEF/GLOBAL/i1/editor
Sun 2022/09/25 03:41:23 PM  0       -110026  ./FILES/DEF/GLOBAL/i1/nodedef
Sun 2022/09/25 03:41:23 PM  0       -110026  ./FILES/DEF/f1
Sun 2022/09/25 03:41:23 PM  0       -110026  ./FILES/DEF/f1/i1
Sun 2022/09/25 03:41:23 PM  0       -110026  ./FILES/DEF/f1/i1/nls
Sun 2022/09/25 03:41:23 PM  0       -110026  ./FILES/DEF/f1/i1/editor
Sun 2022/09/25 03:41:23 PM  0       -110026  ./FILES/DEF/f1/i1/nodedef
Sun 2022/09/25 03:41:23 PM  0       -110026  ./FILES/DEF/f1/i1/emap
Sun 2022/09/25 03:41:23 PM  0       -110026  ./FILES/CONF/DEF/f10
Sun 2022/09/25 03:41:45 PM  0       -110022  ./FILES/CONF/INSTENG.OPT
Sun 2022/09/25 03:41:45 PM  0       -110012  ./FILES/CONF/INSTENG.OPT
Sun 2022/09/25 03:52:37 PM  0       -170001  <s:Envelope><s:Body><u:GetISYConfig xmlns:u="urn:udi-com:service:X_Polisy_Service:1"></u:GetISYConfig></s:Body></s:Envelope>
Sun 2022/09/25 03:52:41 PM  0       -170001  <s:Envelope><s:Body><u:Authenticate xmlns:u="urn:udi-com:service:X_Polisy_Service:1"><name>jean</name><id>11111</id></u:Authenticate></s:Body></s:Envelope>
Sun 2022/09/25 03:52:41 PM  0       -170001  <s:Envelope><s:Body><u:GetStartupTime xmlns:u="urn:udi-com:service:X_Polisy_Service:1"></u:GetStartupTime></s:Body></s:Envelope>
Sun 2022/09/25 03:52:42 PM  0       -170001  <s:Envelope><s:Body><u:GetSysConf xmlns:u="urn:udi-com:service:X_Polisy_Service:1"><name>/CONF/INTEGER.VAR</name></u:GetSysConf></s:Body></s:Envelope>
Sun 2022/09/25 03:52:42 PM  0       -170001  <s:Envelope><s:Body><u:GetSysConf xmlns:u="urn:udi-com:service:X_Polisy_Service:1"><name>/CONF/STATE.VAR</name></u:GetSysConf></s:Body></s:Envelope>
Sun 2022/09/25 03:52:42 PM  0       -170001  <s:Envelope><s:Body><u:GetVariables xmlns:u="urn:udi-com:service:X_Polisy_Service:1"><type>1</type></u:GetVariables></s:Body></s:Envelope>
Sun 2022/09/25 03:52:42 PM  0       -170001  <s:Envelope><s:Body><u:GetVariables xmlns:u="urn:udi-com:service:X_Polisy_Service:1"><type>2</type></u:GetVariables></s:Body></s:Envelope>
Sun 2022/09/25 03:52:44 PM  0       -170001  <s:Envelope><s:Body><u:GetNodesConfig xmlns:u="urn:udi-com:service:X_Polisy_Service:1"></u:GetNodesConfig></s:Body></s:Envelope>
Sun 2022/09/25 03:52:47 PM  0       -170001  <s:Envelope><s:Body><u:GetSystemTime xmlns:u="urn:udi-com:service:X_Polisy_Service:1"></u:GetSystemTime></s:Body></s:Envelope>
Sun 2022/09/25 03:52:47 PM  0       -170001  <s:Envelope><s:Body><u:SetDebugLevel xmlns:u="urn:udi-com:service:X_Polisy_Service:1"><option>1</option></u:SetDebugLevel></s:Body></s:Envelope>
Sun 2022/09/25 03:52:47 PM  0       -170001  <s:Envelope><s:Body><u:GetSystemOptions xmlns:u="urn:udi-com:service:X_Polisy_Service:1"></u:GetSystemOptions></s:Body></s:Envelope>
Sun 2022/09/25 03:52:48 PM  0       -170001  <s:Envelope><s:Body><u:Subscribe xmlns:u="urn:udi-com:service:X_Polisy_Service:1"><reportURL>REUSE_SOCKET</reportURL><duration>infinite</duration><send>F</send></u:Subscribe></s:Body></s:Envelope>
Sun 2022/09/25 03:52:48 PM  0       -170001  <s:Envelope><s:Body><u:IsSubscribed xmlns:u="urn:udi-com:service:X_Polisy_Service:1"><SID>uuid:28</SID></u:IsSubscribed></s:Body></s:Envelope>
Sun 2022/09/25 03:52:49 PM  0       -170001  <s:Envelope><s:Body><u:RefreshDeviceStatus xmlns:u="urn:udi-com:service:X_Polisy_Service:1"><sid>uuid:28</sid></u:RefreshDeviceStatus></s:Body></s:Envelope>
Sun 2022/09/25 03:52:49 PM  0       -170001  <s:Envelope><s:Body><u:GetSystemStatus xmlns:u="urn:udi-com:service:X_Polisy_Service:1"></u:GetSystemStatus></s:Body></s:Envelope>
Sun 2022/09/25 03:52:52 PM  0       -170001  <s:Envelope><s:Body><u:GetDisclaimerStatus xmlns:u="urn:udi-com:service:X_Polisy_Service:1"></u:GetDisclaimerStatus></s:Body></s:Envelope>
Sun 2022/09/25 03:53:31 PM  0       -170001  <s:Envelope><s:Body><u:GetErrorLog xmlns:u="urn:udi-com:service:X_Polisy_Service:1"></u:GetErrorLog></s:Body></s:Envelope>

Right off the bat there's a bunch of -170001 errors, but no full queue messages.

Also, I should mention that the last time I did an 'upgrade package' (using the IoP Admin Console button) was about 10 days ago. Others seem to be suggesting that very recent updates may be part of the problem.

Should I just go ahead and start the other node servers, or is there something I should test (or wait to see) before I do that?
Michel Kohanim Posted September 25, 2022

@johnnyt, -170001 is not an error. It's information (debug).

With kind regards,
Michel
johnnyt (Author) Posted September 25, 2022

1 hour ago, Michel Kohanim said:
@johnnyt, -170001 is not an error. It's information (debug).

So @bpwwer, does this mean the -170001 messages are not related to the queue(s) being full, and the other post is a totally separate thing? If so, how can we narrow down the root cause of the issues I ran into? Also, let me know whether I should go ahead and start the other 2 node servers that are still stopped, or maybe run them one at a time for a day or two each. Thanks.
bpwwer Posted September 25, 2022

@Michel Kohanim is the expert on what the messages mean. So when the error log is showing lots of those messages, what does it mean?

@johnnyt in your case, we're more concerned about the queue full messages, so if you're not seeing those yet, keep starting node servers.
Michel Kohanim Posted September 25, 2022

@bpwwer, they are just debugging statements.

With kind regards,
Michel
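Since -170001 entries are debug information rather than errors, one way to keep them from drowning out real problems is to filter them out of an exported error log before looking for "Queue(s) Full" messages. A minimal Python sketch, assuming the export has one entry per line in the Time / User / Code / Message layout of the log posted earlier in this thread; whether the real export separates columns with tabs or spaces is an assumption, so the pattern accepts either. The names triage(), Entry and DEBUG_CODES are hypothetical.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple, Tuple

# Time, User, Code, Message, e.g. "Sun 2022/09/25 03:41:23 PM  0  -5  Start"
ROW = re.compile(
    r"^(?P<time>\w{3} \d{4}/\d{2}/\d{2} \d{1,2}:\d{2}:\d{2} [AP]M)\s+"
    r"(?P<user>\S+)\s+(?P<code>-?\d+)\s+(?P<message>.*)$"
)
DEBUG_CODES = {"-170001"}  # informational/debug per UDI, not an error
QUEUE_FULL = "Queue(s) Full"


class Entry(NamedTuple):
    time: str
    user: str
    code: str
    message: str


def parse(lines: Iterable[str]) -> Iterator[Entry]:
    """Yield one Entry per line matching the expected layout; skip anything else."""
    for line in lines:
        match = ROW.match(line.strip())
        if match:
            yield Entry(**match.groupdict())


def triage(lines: Iterable[str]) -> Tuple[List[Entry], int]:
    """Return (entries without debug codes, number of queue-full messages).

    Queue-full messages are counted before the debug filter is applied, so
    they are never hidden even if their code were in DEBUG_CODES.
    """
    entries = list(parse(lines))
    queue_full = sum(QUEUE_FULL in e.message for e in entries)
    kept = [e for e in entries if e.code not in DEBUG_CODES]
    return kept, queue_full
```

For example, opening the exported file and passing the file object to triage() gives the remaining entries plus a single number to check each time, which suits the "check periodically for full queue messages" approach.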
johnnyt (Author) Posted September 26, 2022

15 hours ago, bpwwer said:
@johnnyt in your case, we're more concerned about the queue full messages, so if you're not seeing those yet, keep starting node servers.

OK, I've restarted the other 2 node servers and will check periodically for full queue messages.

This is a bit off topic, but I notice that restarting a NS sometimes doesn't update the ISY for quite a while. If I understand correctly, it's because the NS retains the 'old' values, assumes the ISY still has them, and only sends a value when it sees an upstream change to the value it has stored, e.g. from OpenWeatherMap. Is that right? If so, that's a bit of a problem when the whole point of a restart is to refresh things everywhere (in the ISY too). I realize the benefit of the NS keeping score and not sending data the ISY already has, especially at a PG3/Polisy restart, but could an individual NS restart (as opposed to a PG3/Polisy restart) refresh the ISY data at startup? Or maybe there could be a new button to manually push everything, needed or not, regardless of where things are in the polling interval?
bpwwer Posted September 26, 2022

2 hours ago, johnnyt said:
This is a bit off topic but I notice that restarting a NS doesn't always update ISY for sometimes quite a while. If I understand correctly it's because the NS retains the 'old' values and assumes ISY still has those values, and will only change them when it sees a change upstream in the value it has stored, e.g. from OpenWeatherMap. Is that right?

No, it's not right. When a node server starts, it should send the current values to the ISY. However, this is up to the node server, so not all of them may do this. Node server startup can also take a while, depending on how many nodes it creates and how many values are associated with each node. Depending on the node server, the startup time could be anywhere from a couple of seconds to minutes.
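The two behaviours discussed here, reporting a value only when it changes versus pushing every value when the node server starts, can be sketched in Python. CachedNode, set_driver(), report_all() and the GV1 driver name are illustrative only and are not PG3's actual node server interface; the initial argument models a cache that survives a restart, which is the scenario johnnyt suspected.

```python
from typing import Callable, Dict, Optional, Union

Value = Union[int, float, str]


class CachedNode:
    """Hypothetical node that forwards a driver value only when it changes."""

    def __init__(self, send: Callable[[str, Value], None],
                 initial: Optional[Dict[str, Value]] = None) -> None:
        # send is a stand-in for whatever actually delivers a value to the ISY
        self._send = send
        # initial models values restored from a previous run (assumption)
        self._cache: Dict[str, Value] = dict(initial or {})

    def set_driver(self, name: str, value: Value, force: bool = False) -> None:
        # Skip the update when the ISY presumably already has this value.
        if force or self._cache.get(name) != value:
            self._cache[name] = value
            self._send(name, value)

    def report_all(self) -> None:
        """Push every known value, changed or not, e.g. once at startup."""
        for name, value in self._cache.items():
            self._send(name, value)
```

With change-only reporting, a restored cache can leave the ISY showing stale values until the upstream value changes; calling report_all() (or set_driver(..., force=True)) at startup avoids that, at the cost of resending values the ISY may already have.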