
Positive connection indication required



Posted

In a previous thread back in 2010, Michel and I discussed the lack of any indication within the admin console when there is a connectivity issue (eg one caused by a firewall). From experience, I recognize the condition because the admin console shows the ramp times as 9 minutes, but that in itself indicates poor user interface design. Here's what Michel said:

I do understand the usability issue. At the moment, there's basically no way around it unless we make the Admin Console keep polling ISY and thus cause other problems.

 

As it stands now, when you lose connection with ISY, based on the advertised max-age information provided by ISY during authentication, Admin Console "assumes" the subscription is alive until either:

1. the keep-alive/max age duration is passed without an event

2. there's socket connection exception

 

For the percentage of the time this is indeed an issue (not to mention that the system is usually left alone once configured), the workaround (keep polling ISY) is completely and utterly a very ugly solution.

I'd like this to be revisited, because I get bitten by it so often, most recently today when trying to connect to my home network remotely, and I'm supposedly technically proficient! I believe the root cause is that not only does the admin console connect to the ISY server, but the server also makes a reverse connection to the admin console; this has firewall implications.

 

The issue to me is that at the bottom of the admin client is a green flag, supposedly showing everything is OK, when it's basically not. I understand the desire not to do ongoing polling of the ISY from the console but I think the status indications need to be clearer and sooner. This could be addressed as part of the startup handshaking. I'd recommend something like the following:


  • At startup, it should be "assumed" the subscription is not alive until positive confirmation (in the form of the reverse connection / handshake) is received. Only then should the flag in the lower left hand corner go green to indicate a positive 2-way connection is established
  • Device status information should not be filled in until the 2-way connection is verified. Currently default values are used resulting in a misleading display (eg the 9-minute ramp time I mentioned)

 

This has the following benefits:

 


  • With a responsive network with everything working OK the user will see no difference to the way things currently appear
  • When network problems exist at startup (which is probably the majority of fault situations), the correct status will be immediately displayed
  • If network problems occur sometime later (ie not at startup), then they will get detected after some delay as they are at present using Michel's criteria. That's quite usual in networks

 

So I think the key difference is that I would have the admin console assume and indicate that things aren't working until proven otherwise.
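
To make the "not working until proven otherwise" idea concrete, here is a minimal sketch in Python (for illustration only; the Admin Console itself is Java). The `LinkMonitor` class, its method names, and the three state names are all hypothetical. Only the two failure criteria (max-age passing without an event, or a socket exception) come from Michel's description; the max-age value is simply passed in.

```python
import enum


class LinkState(enum.Enum):
    UNVERIFIED = "unverified"  # subscribed, but no event received yet
    UP = "up"                  # an event arrived within the max-age window
    DOWN = "down"              # max-age expired or the socket failed


class LinkMonitor:
    """Hypothetical tracker: the link is not shown as up until an event arrives."""

    def __init__(self, max_age_seconds, now):
        self.max_age = max_age_seconds
        self.state = LinkState.UNVERIFIED
        self.started_at = now
        self.last_event_at = None

    def on_event(self, now):
        # Any event received on the subscription proves the reverse path works.
        self.last_event_at = now
        self.state = LinkState.UP

    def on_socket_error(self):
        self.state = LinkState.DOWN

    def check(self, now):
        # Apply the max-age criterion, measured from startup if nothing arrived yet.
        reference = self.started_at if self.last_event_at is None else self.last_event_at
        if self.state is not LinkState.DOWN and now - reference > self.max_age:
            self.state = LinkState.DOWN
        return self.state

    def show_green_flag(self):
        # Green only after positive confirmation, never by default.
        return self.state is LinkState.UP
```

The only behavioural change from what Michel describes is the starting state: the flag stays off (and device details could stay blank) while the state is `UNVERIFIED`.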

 

- Andrew

Posted

Hi Andrew,

 

Thanks so very much for the feedback. It's quite true that ISY needs a connection to publish events to (back to your computer).

 

Subscriptions are indeed acknowledged by a subscription ID being returned from ISY. So, the handshaking is not the problem here. The problem is that subsequent to the handshake, ISY tries to publish events to the Admin Console and for some reason, after a few retries, it assumes the Admin Console is not there.

 

On the Admin Console side, there's a max-age heartbeat (2 minutes) that the Admin Console looks for. If the heartbeat does not arrive, then the Admin Console assumes that the subscription is not there.

 

Choices are:

1. Decrease the heartbeat ... performance implications

2. Poll ISY for active subscription

 

Above and beyond these two approaches I do not see any other way of effectively addressing this issue.

 

One question I have is: can you find a pattern in which this happens more often?

 

With kind regards,

Michel

Posted

Thanks Michel,

 

Subscriptions are indeed acknowledged by a subscription ID being returned from ISY

Is that acknowledgement performed on the forward connection established by the console, or on a reverse connection established by ISY? I presume it must be something like this (at the socket level):

CONSOLE                                  ISY

   --------- Connect Request -------->
   <-------- Accept -----------------
   .
   .
   .
   --------- Subscribe Request ------->
   <-------- OK, here's your ID -------
   .
   .
   <---------- Connect Request --------

Posted

OK - not what I was expecting!

 

Forgive me for getting into this, but I'm curious because of the line of work my company and I are in. My company creates and sells Java (and other) software that runs on corporate servers, so in order for our clients to connect to the server they had to be AV/firewall friendly. We send real time updates to multiple subscribed clients, so there is a similarity to ISY. We originally utilized direct socket IO, but at some point switched to XML-RPC over HTTP, requiring us (I believe) to always have a call outstanding to the server to enable the server to asynchronously send its updates. More recent products use browsers with AJAX. Our servers also talk to other systems using a myriad of different legacy protocols and middleware, so we've quite a lot of comms experience.

 

With the ISY I can't get a clear picture of when the problem happens to me. I suspect most of the issues I encountered are AV/firewall related, though the blocking seems inconsistent, even on my own PC. But when it happens, it seems to happen immediately.

 

I'm basically coming back to my original point of positively detecting earlier whether things are OK or not, since as I mentioned above, it seems to be immediately apparent. From my limited viewpoint it seems there could be a couple of candidate approaches for earlier detection:

 


  • Deterministic. If the events are being blocked, does the blocking begin at or after a predictable number of events or even at a particular event type? Eg: the first event; the 10th event; the NAME-YOUR-EVENT-HERE event?
  • Time-based. If the deterministic method won't work, maybe your heartbeat could start off with a much shorter max-age interval for new subscriptions (a few seconds), then lengthen it back to two minutes once the link state has been determined.
  • Some combination of the above
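
A rough sketch of the time-based option, in Python for illustration. The function name and the 5-second starting value are assumptions (the suggestion only says "a few seconds"); the 120-second steady value is the two-minute heartbeat mentioned earlier in the thread.

```python
def next_max_age(link_confirmed, initial_seconds=5.0, steady_seconds=120.0):
    """Return the max-age (in seconds) to use for the next heartbeat interval.

    A new subscription starts with a short interval so a blocked link is
    noticed quickly; once the link is confirmed, the normal interval applies.
    """
    if initial_seconds > steady_seconds:
        raise ValueError("initial interval should not exceed the steady one")
    return steady_seconds if link_confirmed else initial_seconds
```

With these assumed numbers, a link that is blocked from the outset would be flagged after roughly 5 seconds rather than 2 minutes, while the steady-state heartbeat load stays the same.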

 

Two minutes is a long time at startup to be showing the wrong state with misleading detail!

 

- Andrew

Posted

Hi Andrew,

 

I am surprised that this is not what you expected. Unlike servers, you cannot have applications (such as Admin Console) act as port servers with defined ports (you never know what other applications are running on the client's machine and which ports they use). As such - and if we were to use Admin Console as a port server - for every instantiation of Admin Console, we would have to find an unbound port (random). This does not play well with firewalls and is the precise reason why we use REUSE_SOCKET in the Subscribe web service call. ISY does support publishing to a URL (instead of REUSE_SOCKET), which is used to publish events to portal clients.

 

ISY checks the TCP un-acked backlog and retries 3 times before giving up. To make ISY responsive, the number of un-acked packets is set to 2. Otherwise, assume that you have 3 clients subscribed to ISY: if one of them does not unsubscribe gracefully, ISY would be sitting there trying to publish events on the socket for much longer, and thus all clients would pay the price for one not being online.
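
The give-up behaviour can be pictured with a small Python sketch. This is not ISY's firmware logic: the `publish` function and the per-subscriber `send` callables are hypothetical, "retries 3 times" is read here as three attempts in total, and the TCP un-acked backlog check is not modelled at all.

```python
MAX_ATTEMPTS = 3  # assumption: "retries 3 times" taken as three attempts in total


def publish(event, subscribers, max_attempts=MAX_ATTEMPTS):
    """Send `event` to every subscriber and expire those that never accept it.

    `subscribers` maps a subscription ID to a callable that returns True when
    the event was accepted. Returns the list of expired subscription IDs.
    """
    expired = []
    for sub_id, send in list(subscribers.items()):
        # any() stops at the first successful attempt.
        delivered = any(send(event) for _ in range(max_attempts))
        if not delivered:
            # One undeliverable event expires the whole subscription.
            del subscribers[sub_id]
            expired.append(sub_id)
    return expired
```

Keeping the attempt count small bounds how long one unreachable client can hold up delivery to the others, which is the trade-off described above.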

 

Perhaps we can use IsSubscribed service call a couple of times at start up ...
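
A minimal sketch of that idea, in Python for illustration. The `is_subscribed` argument is a hypothetical wrapper around the IsSubscribed service call (its real request and response format is not shown here), and the number of checks and the delay between them are assumptions.

```python
import time


def confirm_subscription(is_subscribed, checks=3, delay_seconds=1.0, sleep=time.sleep):
    """Return True only if every one of `checks` spaced-out calls reports subscribed.

    Spacing the checks gives ISY time to attempt publishing and, if that fails,
    to expire the subscription, so a later check can reveal the problem.
    """
    for i in range(checks):
        if i > 0:
            sleep(delay_seconds)
        if not is_subscribed():
            return False
    return True
```

The `sleep` parameter exists only so the delay can be replaced in tests; a caller would normally leave it at the default.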

 

With kind regards,

Michel

Posted

Wrt ports / servers, I know all that. Your response was not what I expected only because of a previous comment - I guess we were talking at cross purposes.

 

So some more quick questions on the console:

 

1) Does it use the same socket for subscriptions as it does for other webservices (ie is it shared or dedicated)?

2) Does it get its initial status via a web service call, or via the events subscription in some sort of catchup manner? If the former, does the events subscription happen before or after the status request?

 

I guess I don't know at what stage the blocking is happening, and whether it impacts other service calls. Looking at your webservices docs, I do see the IsSubscribed service, and depending on the answers to the above and how the blocking manifests, I agree that it may help with earlier detection.

 

Imo a better solution that you could introduce that would help detect and resolve problems is to include a sequence number in each event. That way if the client receives (eg) event 3 & event 5, but not event 4, it would know there's a problem. You would then introduce / modify your web services to utilize the sequence number:


  • Modify the Subscribe service response to have ISY send back the most recent event sequence number ("High water mark" - HWM) so the client knows the current state
  • Do the same for the IsSubscribed response, so the client could then compare that value with its own records and take the appropriate action if it has missing entries.
  • Introduce new events to broadcast subscriptions / unsubscriptions. Not a bad thing in a multiuser system, but in this case it also forces an event to occur whenever a client subscribes so you can verify the channel operation
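
The client-side gap check could be as small as the following Python sketch. The `SequenceTracker` class is hypothetical, and it assumes sequence numbers are integers that increase by one per event.

```python
class SequenceTracker:
    """Hypothetical client-side record of the highest event sequence number seen."""

    def __init__(self, high_water_mark):
        # Start from the HWM returned when subscribing.
        self.hwm = high_water_mark

    def receive(self, seq):
        """Record event `seq` and return the sequence numbers skipped before it."""
        missing = list(range(self.hwm + 1, seq))
        if seq > self.hwm:
            self.hwm = seq
        return missing
```

For example, a tracker started at HWM 2 that receives event 3 and then event 5 reports `[4]` as missing on the second call.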

 

So the startup processing could then be something like:

#1 DO Perform the Subscribe service and obtain HWM
#2 DO Expect to receive subscription event (may not receive it if events are getting blocked)
#3 DO Perform the IsSubscribed service and compare HWM's - it should have been incremented by at least one because ISY will have sent out the subscription notification
#4 If the HWM has the correct value then mark the ISY link state as up

This is a pretty lightweight solution. You could go one step further by having ISY cache the last events. In this case the client could request a retransmission of the missing event(s) via a new service call (eg RequestEventRange), effectively providing error recovery. It would also provide seamless fallback to polled operation when there is an ongoing issue. By having the ISY cache sufficient items, if the client detects there is a problem in #3/4 it can switch to periodically issuing a service call to RequestEventRange to return all events from its previous HWM to the new HWM. The bad news is this is polling, but the good news is it's only used in a fallback situation, and the user is at least left with a working system without having to make configuration changes to their PC.
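
Steps #1 to #4 and the polled fallback might be sketched like this in Python. Everything here is hypothetical: RequestEventRange is only a proposed service, `request_event_range` stands in for it, and reading "the correct value" in #4 as "ISY's HWM has advanced and the client has received everything up to it" is an interpretation rather than something stated in the steps.

```python
def startup_link_check(hwm_at_subscribe, hwm_from_is_subscribed, last_seq_received):
    """Return True if ISY's HWM advanced and the client has seen every event up to it.

    `last_seq_received` is None when no event has arrived on the subscription.
    """
    advanced = hwm_from_is_subscribed > hwm_at_subscribe
    caught_up = (
        last_seq_received is not None
        and last_seq_received >= hwm_from_is_subscribed
    )
    return advanced and caught_up


def poll_missing(request_event_range, last_seq_received, server_hwm):
    """Fallback: fetch the events after `last_seq_received` up to `server_hwm`.

    `request_event_range(first, last)` is a stand-in for the proposed service
    and is expected to return the cached events in that inclusive range.
    """
    if server_hwm <= last_seq_received:
        return []
    return request_event_range(last_seq_received + 1, server_hwm)
```

If `startup_link_check` returns False, the console could show the link as down and call `poll_missing` on a timer, which keeps polling confined to the fault case.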

 

Hope there's something of use in there for you

 

- Andrew

Posted

Hi Andrew,

 

 

1) Does it use the same socket for subscriptions as it does for other webservices (ie is it shared or dedicated)?

Absolutely not. Once a socket is used for subscription, it's never used again till the subscription expires

 

2) Does it get its initial status via a web service call, or via the events subscription in some sort of catchup manner? If the former, does the events subscription happen before or after the status request?

Subscriptions are published. On the initial subscribe call, all statuses are published by ISY to the subscriber

 

I guess I don't know at what stage the blocking is happening, and whether it impacts other service calls. Looking at your webservices docs, I do see the IsSubscribed service, and depending on the answers to the above and how the blocking manifests, I agree that it may help with earlier detection.

I think so too. Personally, I think the problem is the Java socket layer, which might be busy and not acking the events being published

 

Imo a better solution that you could introduce that would help detect and resolve problems is to include a sequence number in each event.

Each event does indeed have a sequence number

 

That way if the client receives (eg) event 3 & event 5, but not event 4, it would know there's a problem. You would then introduce / modify your web services to utilize the sequence number:

Based on our design, that's basically an impossibility: if ISY cannot publish an event, it expires the whole subscription

 

So the startup processing could then be something like:

#1 DO Perform the Subscribe service and obtain HWM
#2 DO Expect to receive subscription event (may not receive it if events are getting blocked)
#3 DO Perform the IsSubscribed service and compare HWM's - it should have been incremented by at least one because ISY will have sent out the subscription notification
#4 If the HWM has the correct value then mark the ISY link state as up

We already do 1-3

 

This is a pretty lightweight solution. You could go one step further by having ISY cache the last events. In this case the client could request a retransmission of the missing event(s) via a new service call (eg RequestEventRange), so effectively providing error recovery.

Major synchronization issues

 

It would also provide seamless fallback to polled operation when there is an ongoing issue. By having the ISY cache sufficient items, if the client detects there is a problem in #3/4 it can switch to periodically issuing a service call to RequestEventRange to return all events from its previous HWM to the new HWM. The bad news is this is polling, but the good news is it's only used in a fallback situation, and the user is at least left with a working system without having to make configuration changes to their PC.

See above. This only solves initial statuses not showing. Assuming we get the initial statuses to show up (most probably stale in the case of a cache), we will then be left with a system that does not provide status feedback as events happen. It's best to find the root cause and fix it. Or, as you mentioned, use IsSubscribed more than once during the startup routine to make sure that Admin Console is subscribed.

 

Hope there's something of use in there for you

Yes, absolutely and I would sincerely appreciate it if you could see if there are any patterns (for instance, the computer was too busy, or too much network traffic, etc.). Thanks again so very much for your feedback.

Archived

This topic is now archived and is closed to further replies.
