IoP reporting "Queue(s) Full" running only 3 node servers


johnnyt


Posted

I've been testing my Polisy's IoP in preparation for migrating from the 994i when the new Z-Wave/Matter dongle arrives, along with the Z-Wave migration tool, in about 6-8 weeks.  IoP is only running 3 node servers: OpenWeatherMap, WeatherFlow and ST-Inventory. It has no PLM and no Z-Wave dongle connected to it, and there are no programs enabled/running, so the only thing happening is that the node servers are doing their device updates.

Yesterday I noticed the error log had been getting polluted with "UDQ: Queue(s) Full, message ignored" errors, which happened constantly from about 4 AM on Sept 21 until 11:30 PM yesterday (the 24th), when Polisy was restarted.

I've seen 'full queues' on my 994i occasionally over the past couple of years when it has been overloaded, mostly during restarts, but never continuously like what has been occurring for the past couple of days on IoP.

I've attached the IoP log and error log. I can provide the PG3 log file for yesterday if it helps, but be warned that the 62MB file also has a ton of messages related to the 5 NSs used by my 994i, and I don't see a way to easily separate them out from the IoP-related entries. My usual method of importing into Excel and filtering on a column doesn't work well with these logs. (If you know how I could easily do that with Windows, please let me know.)
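
In case it helps anyone suggest something better, here's the kind of quick filter I have in mind: a minimal Python sketch that should run fine on Windows. It assumes each PG3 log line contains the node server's name as plain text (the file names below are placeholders):

KEEP = ("OpenWeatherMap", "WeatherFlow", "ST-Inventory")  # the IoP node servers

# Copy only the lines that mention one of the IoP node servers.
with open("pg3.log", encoding="utf-8", errors="replace") as src, \
     open("pg3-iop-only.log", "w", encoding="utf-8") as dst:
    for line in src:
        if any(name in line for name in KEEP):
            dst.write(line)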

Why would IoP event queue(s) have gotten overloaded and remain that way for 3 days with nothing but device updates from 3 NSs? I can certainly report the problem to UDI, but I figured I would start here first to rule in/rule out a PG3 issue before reporting this as an IoP issue.

Any info would be appreciated.

 

IoPLogs.zip

Posted

I'm not quite seeing the resemblance because I don't have any Z-Wave devices or dongle (or Insteon PLM), so how could there be any events related to either? Plus I'm getting full-queue messages, and the error log for the other problem has none of those messages.

That said, I do have a lot of -170001 messages too (like the other one), so I will try what you suggested there, namely:

Quote

One test would be to disable (stop) all of the node servers and then reboot.  PG3 will remember the last state of the node servers so it should start with all node servers disconnected (not running).  Then slowly start them manually one by one giving it a minute or two to stabilize between each.

and I'll keep tracking the other thread to see how that one evolves.

Posted

I believe -170001 messages are related to the queue being full.  That was the connection I made between all these cases. 

Issues with Z-Wave devices not updating, Insteon devices not updating, PG3 node server devices not updating, and programs not triggering could all have the same root cause: something hung, new tasks are no longer being pulled from the queue, and so the queue fills up.
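
To make that concrete, here's a minimal Python sketch of the pattern (not the actual ISY code, which isn't public): a bounded queue with a non-blocking producer. If the consumer thread hangs, the queue fills and every new message gets dropped, which matches the continuous "UDQ: Queue(s) Full, message ignored" entries.

import logging
import queue
import threading

events = queue.Queue(maxsize=100)  # bounded event queue

def handle(evt):
    pass  # stand-in for real event processing (device updates, program triggers)

def post_event(evt):
    # Producer side: never blocks; drops the event if the queue is full.
    try:
        events.put_nowait(evt)
    except queue.Full:
        logging.error("UDQ: Queue(s) Full, message ignored")

def consumer():
    # Consumer side: if handle() ever hangs, the queue fills up and
    # post_event() starts dropping everything, continuously.
    while True:
        handle(events.get())
        events.task_done()

threading.Thread(target=consumer, daemon=True).start()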

Posted

So, as suggested, I stopped the 3 NSs connected to IoP, restarted IoP, waited a bit, and started the OpenWeatherMap NS only.

Here are the error log entries:

Time                      	User     	Code	Message
Mon 1900/01/01 12:00:00 AM	System	-170001	<s:Envelope><s:Body><u:GetSystemTime xmlns:u="urn:udi-com:service:X_Polisy_Service:1"></u:GetSystemTime></s:Body></s:Envelope>	
Sun 2022/09/25 03:41:07 PM	0	-170001	<s:Envelope><s:Body><u:GetSystemOptions xmlns:u="urn:udi-com:service:X_Polisy_Service:1"></u:GetSystemOptions></s:Body></s:Envelope>
Sun 2022/09/25 03:41:08 PM	0	-170001	<s:Envelope><s:Body><u:GetNetworkConfig xmlns:u="urn:udi-com:service:X_Polisy_Service:1"></u:GetNetworkConfig></s:Body></s:Envelope>
Sun 2022/09/25 03:41:20 PM	0	-170001	<s:Envelope><s:Body><u:Reboot xmlns:u="urn:udi-com:service:X_Polisy_Service:1"></u:Reboot></s:Body></s:Envelope>
Sun 2022/09/25 03:41:23 PM	0	-5	Start
Sun 2022/09/25 03:41:23 PM	0	-110026	./FILES
Sun 2022/09/25 03:41:23 PM	0	-110026	./FILES/TMP
Sun 2022/09/25 03:41:23 PM	0	-110026	./FILES/LOG
Sun 2022/09/25 03:41:23 PM	0	-110026	./FILES/WEB
Sun 2022/09/25 03:41:23 PM	0	-110026	./FILES/CODE
Sun 2022/09/25 03:41:23 PM	0	-110026	./FILES/CONF
Sun 2022/09/25 03:41:23 PM	0	-110026	./FILES/CONF/D2D
Sun 2022/09/25 03:41:23 PM	0	-110026	./FILES/CONF
Sun 2022/09/25 03:41:23 PM	0	-110026	./FILES/CONF/MAIL
Sun 2022/09/25 03:41:23 PM	0	-110026	./FILES/USER
Sun 2022/09/25 03:41:23 PM	0	-110026	./FILES/USER/WEB
Sun 2022/09/25 03:41:23 PM	0	-110026	./FILES/CONF/NET
Sun 2022/09/25 03:41:23 PM	0	-110026	./FILES/CONF/SEP
Sun 2022/09/25 03:41:23 PM	0	-110026	./FILES/CONF/OADR
Sun 2022/09/25 03:41:23 PM	0	-110026	./FILES/CONF/BILLING
Sun 2022/09/25 03:41:23 PM	0	-110026	./FILES/CONF/DEF
Sun 2022/09/25 03:41:23 PM	0	-110026	./FILES/DEF
Sun 2022/09/25 03:41:23 PM	0	-110026	./FILES/DEF/GLOBAL
Sun 2022/09/25 03:41:23 PM	0	-110026	./FILES/DEF/GLOBAL/i1
Sun 2022/09/25 03:41:23 PM	0	-110026	./FILES/DEF/GLOBAL/i1/nls
Sun 2022/09/25 03:41:23 PM	0	-110026	./FILES/DEF/GLOBAL/i1/editor
Sun 2022/09/25 03:41:23 PM	0	-110026	./FILES/DEF/GLOBAL/i1/nodedef
Sun 2022/09/25 03:41:23 PM	0	-110026	./FILES/DEF/f1
Sun 2022/09/25 03:41:23 PM	0	-110026	./FILES/DEF/f1/i1
Sun 2022/09/25 03:41:23 PM	0	-110026	./FILES/DEF/f1/i1/nls
Sun 2022/09/25 03:41:23 PM	0	-110026	./FILES/DEF/f1/i1/editor
Sun 2022/09/25 03:41:23 PM	0	-110026	./FILES/DEF/f1/i1/nodedef
Sun 2022/09/25 03:41:23 PM	0	-110026	./FILES/DEF/f1/i1/emap
Sun 2022/09/25 03:41:23 PM	0	-110026	./FILES/CONF/DEF/f10
Sun 2022/09/25 03:41:45 PM	0	-110022	./FILES/CONF/INSTENG.OPT
Sun 2022/09/25 03:41:45 PM	0	-110012	./FILES/CONF/INSTENG.OPT
Sun 2022/09/25 03:52:37 PM	0	-170001	<s:Envelope><s:Body><u:GetISYConfig xmlns:u="urn:udi-com:service:X_Polisy_Service:1"></u:GetISYConfig></s:Body></s:Envelope>
Sun 2022/09/25 03:52:41 PM	0	-170001	<s:Envelope><s:Body><u:Authenticate xmlns:u="urn:udi-com:service:X_Polisy_Service:1"><name>jean</name><id>11111</id></u:Authenticate></s:Body></s:Envelope>
Sun 2022/09/25 03:52:41 PM	0	-170001	<s:Envelope><s:Body><u:GetStartupTime xmlns:u="urn:udi-com:service:X_Polisy_Service:1"></u:GetStartupTime></s:Body></s:Envelope>
Sun 2022/09/25 03:52:42 PM	0	-170001	<s:Envelope><s:Body><u:GetSysConf xmlns:u="urn:udi-com:service:X_Polisy_Service:1"><name>/CONF/INTEGER.VAR</name></u:GetSysConf></s:Body></s:Envelope>
Sun 2022/09/25 03:52:42 PM	0	-170001	<s:Envelope><s:Body><u:GetSysConf xmlns:u="urn:udi-com:service:X_Polisy_Service:1"><name>/CONF/STATE.VAR</name></u:GetSysConf></s:Body></s:Envelope>
Sun 2022/09/25 03:52:42 PM	0	-170001	<s:Envelope><s:Body><u:GetVariables xmlns:u="urn:udi-com:service:X_Polisy_Service:1"><type>1</type></u:GetVariables></s:Body></s:Envelope>
Sun 2022/09/25 03:52:42 PM	0	-170001	<s:Envelope><s:Body><u:GetVariables xmlns:u="urn:udi-com:service:X_Polisy_Service:1"><type>2</type></u:GetVariables></s:Body></s:Envelope>
Sun 2022/09/25 03:52:44 PM	0	-170001	<s:Envelope><s:Body><u:GetNodesConfig xmlns:u="urn:udi-com:service:X_Polisy_Service:1"></u:GetNodesConfig></s:Body></s:Envelope>
Sun 2022/09/25 03:52:47 PM	0	-170001	<s:Envelope><s:Body><u:GetSystemTime xmlns:u="urn:udi-com:service:X_Polisy_Service:1"></u:GetSystemTime></s:Body></s:Envelope>
Sun 2022/09/25 03:52:47 PM	0	-170001	<s:Envelope><s:Body><u:SetDebugLevel xmlns:u="urn:udi-com:service:X_Polisy_Service:1"><option>1</option></u:SetDebugLevel></s:Body></s:Envelope>
Sun 2022/09/25 03:52:47 PM	0	-170001	<s:Envelope><s:Body><u:GetSystemOptions xmlns:u="urn:udi-com:service:X_Polisy_Service:1"></u:GetSystemOptions></s:Body></s:Envelope>
Sun 2022/09/25 03:52:48 PM	0	-170001	<s:Envelope><s:Body><u:Subscribe xmlns:u="urn:udi-com:service:X_Polisy_Service:1"><reportURL>REUSE_SOCKET</reportURL><duration>infinite</duration><send>F</send></u:Subscribe></s:Body></s:Envelope>
Sun 2022/09/25 03:52:48 PM	0	-170001	<s:Envelope><s:Body><u:IsSubscribed xmlns:u="urn:udi-com:service:X_Polisy_Service:1"><SID>uuid:28</SID></u:IsSubscribed></s:Body></s:Envelope>
Sun 2022/09/25 03:52:49 PM	0	-170001	<s:Envelope><s:Body><u:RefreshDeviceStatus xmlns:u="urn:udi-com:service:X_Polisy_Service:1"><sid>uuid:28</sid></u:RefreshDeviceStatus></s:Body></s:Envelope>
Sun 2022/09/25 03:52:49 PM	0	-170001	<s:Envelope><s:Body><u:GetSystemStatus xmlns:u="urn:udi-com:service:X_Polisy_Service:1"></u:GetSystemStatus></s:Body></s:Envelope>
Sun 2022/09/25 03:52:52 PM	0	-170001	<s:Envelope><s:Body><u:GetDisclaimerStatus xmlns:u="urn:udi-com:service:X_Polisy_Service:1"></u:GetDisclaimerStatus></s:Body></s:Envelope>
Sun 2022/09/25 03:53:31 PM	0	-170001	<s:Envelope><s:Body><u:GetErrorLog xmlns:u="urn:udi-com:service:X_Polisy_Service:1"></u:GetErrorLog></s:Body></s:Envelope>

 

Right off the bat, a bunch of -170001 errors, but no full-queue messages.

Also, I should mention that the last time I did an 'upgrade packages' (using the IoP AC button) was about 10 days ago. Others seem to be mentioning very recent updates as maybe being part of the problem.

Should I just go ahead and start the other NSs, or is there something I should test (or wait to see) before I do that?
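
(Side note for anyone puzzling over those entries: each -170001 line above is just the body of a SOAP request the admin console sends to IoP, e.g. GetSystemTime or Subscribe. If you want to reproduce one, a quick Python sketch like the following should work. I'm assuming the standard ISY SOAP endpoint at /services; the address, port and credentials are placeholders.)

import requests

ISY = "http://polisy.local:8080"  # placeholder address/port for the IoP instance
ENVELOPE = ('<s:Envelope><s:Body>'
            '<u:GetSystemTime xmlns:u="urn:udi-com:service:X_Polisy_Service:1">'
            '</u:GetSystemTime></s:Body></s:Envelope>')

# POST the envelope to the SOAP endpoint with HTTP basic auth.
resp = requests.post(ISY + "/services", data=ENVELOPE,
                     auth=("admin", "password"),  # replace with real credentials
                     headers={"Content-Type": "text/xml"})
print(resp.status_code)
print(resp.text)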

 

Posted
1 hour ago, Michel Kohanim said:

@johnnyt, -170001 is not an error. It's information (debug).

With kind regards,

Michel

So @bpwwer, does this mean the -170001 messages would not be related to the queue(s) being full, and that the other post is a totally separate issue? If so, how can we narrow down the root cause of the issues I ran into?

Also, let me know if I should go ahead and start the other 2 node servers that are still stopped. Or maybe run them one at a time for a day or two each or something?

Thanks.

Posted

@Michel Kohanim is the expert on what the messages mean.  So when the error log is showing lots of those messages, what does it mean?

@johnnyt, in your case we're more concerned about the queue-full messages, so if you're not seeing those yet, keep starting node servers.

Posted
15 hours ago, bpwwer said:

@johnnyt, in your case we're more concerned about the queue-full messages, so if you're not seeing those yet, keep starting node servers.

Ok, I've restarted the other 2 node servers and will check periodically for full queue messages.

This is a bit off topic, but I notice that restarting a NS doesn't always update ISY for, sometimes, quite a while. If I understand correctly, it's because the NS retains the 'old' values and assumes ISY still has them, and will only push them when it sees a change upstream in the value it has stored, e.g. from OpenWeatherMap. Is that right? If so, that's a bit of a problem when the whole point of a restart is to refresh things everywhere (in ISY too). I realize the benefit of the NS keeping score and not sending data ISY already has, especially at PG3/Polisy restart, but could the logic of an individual NS restart (not at the PG3/Polisy level) be such that it refreshes the ISY data at startup? Or maybe a new button to force a manual push of everything, needed or not, regardless of where things are in the polling cycle?

 

Posted
2 hours ago, johnnyt said:

This is a bit off topic, but I notice that restarting a NS doesn't always update ISY for, sometimes, quite a while. If I understand correctly, it's because the NS retains the 'old' values and assumes ISY still has them, and will only push them when it sees a change upstream in the value it has stored, e.g. from OpenWeatherMap. Is that right?

No, it's not right.  When a node server starts, it should send the current values to the ISY.  However, this is up to the node server, so not all of them may do this.

Node server startup can also take a while depending on how many nodes it creates and how many values are associated with each node.  Depending on the node server, the startup time could be anywhere from a couple of seconds to several minutes.
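
For node servers built on the Python udi_interface, the startup push looks roughly like this. This is a sketch, not any particular node server's code, and it assumes udi_interface's Node.reportDrivers(), which re-sends every driver value to the ISY whether or not it changed:

import udi_interface

class WeatherNode(udi_interface.Node):
    # One driver for the sketch: 'ST' holding a temperature (uom 17 = degrees F).
    drivers = [{'driver': 'ST', 'value': 0, 'uom': 17}]

    def start(self):
        # Refresh from upstream, then push everything to the ISY now,
        # instead of waiting for the next upstream change or poll cycle.
        self.setDriver('ST', self.query_upstream(), report=False)
        self.reportDrivers()

    def query_upstream(self):
        return 72  # stand-in for a real API call (e.g. to OpenWeatherMap)

A node server that skips that reportDrivers() step is the one that will look 'stale' in the ISY after a restart, until a value actually changes upstream.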

This topic is now closed to further replies.
