
bmercier
Employees
Posts: 219
Everything posted by bmercier
-
No, there is plenty of disk space already
-
I know it's been 3 outages in just a few days, and this looks really bad on us. Let me share a few more details with you.

The first outage was due to a stupid lack of disk space on the database server. Logs suddenly started to grow quickly until we ran out of space. Clearing the logs allowed us to get back online quickly.

The second outage was more subtle and was due to a performance problem with the OAuth server. It got slower and slower until it reached a threshold at which Alexa started retrying requests to the OAuth server. The retries quickly worsened the problem by causing deadlocks on OAuth records, which ultimately drove the database server to 100% CPU in an instant and made everything terribly slow. Adding a simple index fixed the problem.

This time it is again a different problem. It appears to be an Amazon failure in which the connection to the database server was severed. I could not even SSH to the server; all communication was stopped. I have experienced this type of problem a few times over the years, and thanks to redundancy it never had an impact on uptime. But when it happens on the database server, like this time, it's a different story.

So these are really 3 different and unrelated problems that happened in a short period of time.
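To illustrate how retries can turn a gradual slowdown into a sudden overload, as described for the second outage, here is a toy Python model. It is only a sketch: the function name `simulate`, the request rate, timeout, retry count and the M/M/1-style latency formula are all assumptions made for illustration, not measurements from Portal or Alexa.

```python
def simulate(service_times_s, base_rps, timeout_s, max_retries):
    """Toy feedback loop: slower queries -> timeouts -> retries -> more load.

    service_times_s: hypothetical per-request service time at each step,
                     growing as the query gets slower.
    Returns a list of (latency_seconds, offered_load_rps) per step.
    """
    load = base_rps
    history = []
    for service_time in service_times_s:
        # Fraction of server capacity in use, capped below 1 to avoid dividing by zero.
        utilization = min(load * service_time, 0.99)
        # M/M/1-style approximation (an assumption): latency explodes near saturation.
        latency = service_time / (1.0 - utilization)
        # Once responses exceed the client timeout, every request is retried.
        if latency > timeout_s:
            load = base_rps * (1 + max_retries)
        else:
            load = base_rps
        history.append((latency, load))
    return history


result = simulate([0.02, 0.04, 0.06, 0.08, 0.08, 0.06],
                  base_rps=10, timeout_s=0.3, max_retries=2)
for latency, load in result:
    print(f"latency={latency:.2f}s load={load} req/s")
```

In this model the load stays at the base rate while the query slows down, then triples as soon as latency crosses the timeout. It stays tripled even when the service time improves slightly afterwards, because the retries themselves keep the server saturated.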
-
It may take a few minutes for all ISYs to reconnect.
-
It's up now. ISYs are reconnecting.
-
Looking into it now
-
Thanks for the suggestion. We are already planning a similar feature. It would actually test connectivity down to your ISY and send you a push notification. This way, it would detect if your ISY disconnects from Portal. Not depending on Portal, though, is a bit trickier, as the database itself is required to know the list of currently registered users. Will look into that, it's a good suggestion. Thanks, Benoit
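As a rough sketch of the kind of check described above, the Python snippet below flags devices that have not been heard from recently and hands each one to a notification callback. Everything in it is hypothetical: the names `find_disconnected` and `notify_disconnected`, the 5-minute silence threshold, and the device IDs are placeholders, and this is not how Portal implements or plans to implement the feature.

```python
from datetime import datetime, timedelta, timezone


def find_disconnected(last_seen, now, max_silence=timedelta(minutes=5)):
    """Return the IDs of devices whose last heartbeat is older than max_silence.

    last_seen: dict mapping a (hypothetical) device ID to the time it was last heard from.
    """
    return sorted(dev for dev, seen in last_seen.items() if now - seen > max_silence)


def notify_disconnected(last_seen, now, send_push):
    """Call send_push(device_id) for every device considered disconnected."""
    offline = find_disconnected(last_seen, now)
    for device_id in offline:
        send_push(device_id)  # placeholder for a real push notification service
    return offline


now = datetime(2022, 12, 1, 12, 0, tzinfo=timezone.utc)
last_seen = {
    "isy-a": now - timedelta(minutes=1),   # recently seen
    "isy-b": now - timedelta(minutes=10),  # silent for too long
}
sent = []
notify_disconnected(last_seen, now, sent.append)
print(sent)
```

A check like this still needs a list of registered devices, which is why running it independently of the Portal database is the trickier part.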
-
If you are referring to the Alexa skill or Google Home, please relink. It may have become unlinked. Portal services are all up.
-
ISY Portal was down last night for about 1 hour and 30 minutes, starting 2022-11-30 03:50 UTC (22h50 Eastern time, 19h50 Pacific time). This was caused by a performance problem with the database server. The root cause was a performance problem on a specific query related to OAuth services, and it was resolved by adding an index. The problem appeared suddenly because the queries were taking longer and longer until they reached a threshold where Alexa services were resending requests, creating deadlocks. The service was restored around 2022-11-30 05:10 UTC. During the outage, access to Portal services was prevented, including Alexa and Google Home services. After service restoration, Alexa, Google Home and other services worked again. During this outage, your Alexa skill link may have become "expired". If so, please relink the skill.
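To show why adding an index can resolve this kind of slowdown, here is a small self-contained Python sketch using an in-memory SQLite database as a stand-in. The table `oauth_tokens`, its columns and the index name are invented for illustration; the actual Portal database engine, schema and query are not described in this post.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's query plan for a statement as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The fourth column of each row holds the human-readable plan detail.
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE oauth_tokens ("
    "id INTEGER PRIMARY KEY, access_token TEXT, user_id INTEGER)"
)
conn.executemany(
    "INSERT INTO oauth_tokens (access_token, user_id) VALUES (?, ?)",
    [(f"tok{i}", i) for i in range(1000)],
)

lookup = "SELECT user_id FROM oauth_tokens WHERE access_token = ?"

# Without an index, the lookup has to scan every row of the table.
plan_before = query_plan(conn, lookup, ("tok42",))

# With an index on the filtered column, the lookup becomes a direct search.
conn.execute(
    "CREATE INDEX idx_oauth_tokens_access_token ON oauth_tokens (access_token)"
)
plan_after = query_plan(conn, lookup, ("tok42",))

print("before:", plan_before)
print("after: ", plan_after)
```

A full scan gets slower as the table grows, which is consistent with queries "taking longer and longer", while an indexed search stays fast regardless of table size.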
-
Portal is now back to normal. We will analyze the logs tomorrow, but as a preliminary conclusion, the database server was running at 100% CPU due to a missing index. Technically, Echo and Google Home should now be working too. However, I could see in the logs that the issue is related to OAuth. So if Alexa or Google Home can't communicate with any device, it is possible that the skill may have to be unlinked and relinked.
-
Currently being investigated
-
Try to delete the device in the Alexa app. Then say, "Alexa, discover my devices". The device as configured in Portal should reappear. Then try to control it. Benoit
-
Next step: find your devices in the Alexa app, and try to use them from there.
-
What was the problem?
-
ISY Web access: Log in to Portal; beside your ISY, click Select Tools. ISY Web access is the first option. The goal is to test Portal-to-ISY connectivity. Benoit
-
Can you control devices through ISY Web access?
-
There are no active problems with Portal. Alexa troubleshooting tips:
1. Make sure your ISY is online on Portal.
2. Try to control devices using ISY Web access.
3. In the Alexa app, make sure you can see your device.
4. If so, try to control it from the app.
-
ISY Portal was down last night for about 2 hours and 10 minutes, starting 2022-11-17 03:17 UTC (20h17 Eastern time, 17h17 Pacific time). The root cause was a failure of the database server, on which all services depend. The service was restored at 2022-11-17 05:27 UTC, at which point ISYs started to reconnect to Portal. This process takes about 15 minutes. During the outage, access to Portal services was prevented, including Alexa and Google Home services. After service restoration, Alexa, Google Home and other services worked again. No user action was required. Most of the Portal infrastructure is redundant, but the database server currently is not. We will be looking at options to further enhance the infrastructure uptime.
-
Routines should not be affected, unless the skill is disabled and re-enabled.
-
It's now resolved. Most ISYs have now reconnected. Benoit
-
The URL is correct. The only thing I can think of would be extensions. I tried Edge myself, and I don't get the error. You might try with InPrivate, just to see if the cache might be playing tricks. Benoit
-
my.isy.io uses TinyMCE, but this message should not appear. Could you confirm the URL the browser is seeing when you get that message? It's either a matter of the URL or of the TinyMCE API key. If it's not the URL, I'm thinking that a browser extension might be causing problems. You could try disabling extensions to see if one is causing the issue. Benoit