Far from all incidents affect Rittencom media's systems.
- Incidents involving (gs), Cluster and Grid Service do not affect our systems unless otherwise stated.
In addition, only incidents concerning the datacenter in El Segundo, CA
affect our systems.
Incidents concerning networking or router issues may also affect our systems. These are often router and connection failures in network traffic across the Atlantic, and are therefore well beyond our reach. They often affect larger parts of the internet, but in return they also tend to be fixed quickly.
Please note that all inquiries regarding help and support must be directed to
Rittencom media.
System Incidents
These are real-time system incidents, reported directly from (mt) Media Temple's datacenters within the past week.
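The entry timestamps in this feed appear to be in Danish local time, while the incident texts quote Pacific time. The following is a minimal sketch of the conversion, assuming fixed summer-time offsets for July 2010 (PDT = UTC-7, CEST = UTC+2). The function name `pdt_to_cest` is hypothetical, and the calendar date of 29 July is inferred from the feed stamp rather than stated in the notice.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed summer-time offsets for July 2010 (not stated in the feed).
PDT = timezone(timedelta(hours=-7), "PDT")   # Pacific Daylight Time
CEST = timezone(timedelta(hours=2), "CEST")  # Central European Summer Time


def pdt_to_cest(year, month, day, hour, minute):
    """Convert a Pacific Daylight Time wall-clock time to CEST."""
    pacific = datetime(year, month, day, hour, minute, tzinfo=PDT)
    return pacific.astimezone(CEST)


# "3:45 PM PDT", as quoted in the vzd020 recovery notice (date assumed: 29 July 2010).
local = pdt_to_cest(2010, 7, 29, 15, 45)
print(local.strftime("%d.%m.%Y - %H:%M"))
```

With these offsets the two clocks are nine hours apart, so an afternoon event in Pacific time shows up shortly after midnight on the following day in this feed.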
Friday, 30 July 2010 - 00:54
mt_monitor: Services on Host Server vzd020 are Unavailable http://mdtm.pl/bMbh8N
Source: Twitter / mt_monitor
Friday, 30 July 2010 - 00:48
All services have been restored to vzd020 as of 3:45 PM PDT.
After a quick analysis, our engineers confirmed that this was a temporary issue that would be resolved by a reboot of the host machine. After the reboot, system checks were performed and it was determined all services are functioning normally.
Source: (mt) weblog » System incidents
Friday, 30 July 2010 - 00:19
As of approximately 3:13 PM PDT, Host Server vzd020 has been experiencing some difficulties. This only affects services on (ve) Servers on physical host machine vzd020.
To see which host server your (ve) Server resides on, please see the Server Guide page in the AccountCenter.
(mt) Engineers are working as quickly as possible to restore all services to this host. Updates to this page will be made as soon as more information is available. We apologize for any inconvenience and we thank you for your patience.
Source: (mt) weblog » System incidents
Wednesday, 28 July 2010 - 08:15
mt_monitor: #1420 - (gs) Grid-Service Email, FTP, SSH Authentication Issues http://mdtm.pl/d77KBA
Source: Twitter / mt_monitor
Wednesday, 28 July 2010 - 07:58
This post is a summary of Incident #1420, relating to a period of authentication issues with the (gs) Grid Service.
Earlier today, the AccountCenter became unavailable for approximately 15 minutes due to MySQL Replication errors. Soon after, we began receiving reports of failed email and FTP authentication from customers on various Clusters. After some investigation, it was determined that a portion of the account authentication servers, used by each (gs) Grid-Service Cluster, were out of sync. Replication is the process by which all new password changes are stored and synced across our multi-node, clustered (gs) Grid-Service platform. These servers are replicated database slaves, which are normally self-healing.
(mt) Engineers identified the source of this issue and made the appropriate corrections to restore functionality to these servers.
Date/Time: The issue started at approximately 3:15 PM PDT on Tuesday, July 27, 2010 and was resolved by 6:30 PM PDT. Service impact was variable across the (gs) Grid-Service during this time.
Symptoms: Customers creating or modifying email addresses or updating FTP/SSH passwords may have experienced authentication errors.
Impact: All (gs) Grid-Service Clusters were affected. Email was not lost during this time.
Root Cause and Takeaways: Although our investigation will be ongoing, we have identified a point where the binary logs that are required for replication were corrupted. Going forward, we are looking into system changes which would help prevent this issue from re-occurring. We will also be looking into increasing the efficiency of our replication repair utilities. Performing this change will allow us the ability to repair replication services for all (gs) Grid-Service Clusters simultaneously.
This now concludes this System Incident. If you feel that you are still experiencing the symptoms outlined in this post, please open a support request from the (mt) AccountCenter.
Source: (mt) weblog » System incidents
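The review above describes replicated database servers falling out of sync. As a rough illustration only, the sketch below checks one replica's health from a dict shaped like a row of MySQL's `SHOW SLAVE STATUS` output. The field names are MySQL's own; the function `replication_problems`, the 60-second lag threshold, and the sample rows are hypothetical, and nothing here reflects how (mt) actually monitors or repairs replication.

```python
def replication_problems(status, max_lag_seconds=60):
    """Return a list of problems found in one replica's status row.

    `status` is assumed to be a dict built from a `SHOW SLAVE STATUS` row.
    `max_lag_seconds` is an arbitrary, illustrative threshold.
    """
    problems = []
    if status.get("Slave_IO_Running") != "Yes":
        problems.append("I/O thread is not running")
    if status.get("Slave_SQL_Running") != "Yes":
        problems.append("SQL thread is not running")
    lag = status.get("Seconds_Behind_Master")
    if lag is None:
        # MySQL reports NULL here when the lag cannot be determined.
        problems.append("lag is unknown (NULL)")
    elif lag > max_lag_seconds:
        problems.append(f"replica is {lag}s behind the master")
    if status.get("Last_Error"):
        problems.append(f"last error: {status['Last_Error']}")
    return problems


# Hypothetical sample rows, for illustration only.
healthy = {"Slave_IO_Running": "Yes", "Slave_SQL_Running": "Yes",
           "Seconds_Behind_Master": 0, "Last_Error": ""}
broken = {"Slave_IO_Running": "Yes", "Slave_SQL_Running": "No",
          "Seconds_Behind_Master": None,
          "Last_Error": "hypothetical: could not read event from log"}

for name, row in (("healthy", healthy), ("broken", broken)):
    print(name, replication_problems(row))
```

A stopped SQL thread combined with an unknown lag is a typical sign that a replica can no longer apply events from the master's binary log, which is consistent with the kind of corruption the review mentions.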
Wednesday, 28 July 2010 - 07:43
mt_monitor: #1420 - Incident Review http://mdtm.pl/9u6fKU
Source: Twitter / mt_monitor
Wednesday, 28 July 2010 - 03:37
mt_monitor: #1420 - (gs) Grid-Service Replication Services Restored http://mdtm.pl/9rnyWH
Source: Twitter / mt_monitor
Wednesday, 28 July 2010 - 03:27
As of 6:27 PM PDT, all (gs) Grid-Service clusters are operating with replication services fully restored. A full incident review will be published later this evening once we’ve examined the root cause and outlined potential takeaways moving forward.
Once again, we appreciate your patience as we worked to resolve this matter.
Source: (mt) weblog » System incidents
Wednesday, 28 July 2010 - 03:06
mt_monitor: #1420 - (gs) Grid-Service Cluster.03 Replication Services Restored http://mdtm.pl/a4cdVu
Source: Twitter / mt_monitor
Wednesday, 28 July 2010 - 02:53
As of 5:45 PM PDT, replication services for (gs) Grid-Service Cluster.03 and Cluster.04 have been restored. To recap, Cluster.01, 02, 03 and 04 should be operating normally. We will continue to repair the remaining clusters and update this status page accordingly.
Source: (mt) weblog » System incidents
Wednesday, 28 July 2010 - 02:41
mt_monitor: #1420 - (gs) Grid-Service Cluster.02 Replication Services Restored http://mdtm.pl/9C79yV
Source: Twitter / mt_monitor
Wednesday, 28 July 2010 - 02:26
mt_monitor: #1420 - (gs) Grid-Service Email, FTP, SSH Authentication Issues http://mdtm.pl/bwlMtA
Source: Twitter / mt_monitor
Wednesday, 28 July 2010 - 02:26
mt_monitor: #1420 - (gs) Grid-Service Cluster.01 Replication Services Restored http://mdtm.pl/armqNf
Source: Twitter / mt_monitor
Wednesday, 28 July 2010 - 02:24
As of 5:20 PM PDT, replication services for (gs) Grid-Service Cluster.01 and Cluster.02 have been restored. Additional work must be done to correct replication on the remaining clusters. As noted before, we will continue updating this status page as replication services normalize for each (gs) Grid-Service cluster. Please note this is not affecting any (dv) Dedicated-Virtual or (ve) Servers at this time.
Thank you for your patience and understanding in this matter.
Source: (mt) weblog » System incidents
Wednesday, 28 July 2010 - 01:55
Shortly after our last update, we received word from our engineering team that replication services for Cluster.01 have been restored. They have now moved on to repairing the rest of the (gs) Grid-Services clusters. To reiterate some common symptoms associated with this incident, you may experience issues logging in with or creating new email/ftp/ssh users. You may also have issues when attempting to update email/ftp/ssh user passwords from within the AccountCenter. This is caused by the replication issue and will be rectified as soon as possible.
Once other (gs) Grid-Service clusters have been repaired, additional updates to this status page will be provided.
Source: (mt) weblog » System incidents
Wednesday, 28 July 2010 - 01:38
As of 4:30 PM PDT, (mt) Engineers are still actively investigating this issue. The repair sequence to our replication servers is already underway; Cluster.01 should be normalizing shortly. As each (gs) Grid-Service cluster’s replication service returns to normal, we will update this status page with further information.
Source: (mt) weblog » System incidents
Wednesday, 28 July 2010 - 01:32
mt_monitor: #1420 - (gs) Grid-Service Email, FTP, SSH Authentication Issues http://mdtm.pl/bGKuYA
Source: Twitter / mt_monitor
Wednesday, 28 July 2010 - 01:32
mt_monitor: #1418 - Post-Mortem http://mdtm.pl/cW316Y
Source: Twitter / mt_monitor
Wednesday, 28 July 2010 - 01:25
This post is a summary of Incident #1418, relating to a period of excessive load and service interruption which affected Storage Segment 03 on Cluster.03 of the (gs) Grid-Service.
Details:
Date/Time: The issue started at approximately 12:12 PM on Tuesday, July 27 and was resolved by 1:15 PM, Pacific Time. Service impact was contained to a window of a little more than an hour.
Symptoms: Access to all services was interrupted. This included:
HTTP
FTP/SFTP/SSH
Email and webmail
During the period of website unavailability, affected sites would have produced a “403 Forbidden” or a “500 Internal Server Error” message.
Impact: All customers on (gs) Grid-Service Cluster.03, Storage Segment 03 were affected by this system incident. The rest of the (gs) Grid-Service, all (dv) Dedicated-Virtual Servers, and all (ve) Servers remained unaffected.
Root Cause: Our engineers have determined that the root cause of the high load was related to a very high file lock count on Storage Segment 03. The immediate fix was a reboot of the storage segment, which led to the service interruption noted above. Once the storage segment stabilized, the customers who had higher than normal file locks were notified directly and some of their services were temporarily taken offline to protect other customers on the same storage segment.
Takeaways: We are actively monitoring the entire cluster for high load and for users with abnormally high file lock counts. If we find any unusual usage, we will notify customers individually and work diligently to prevent any further service interruption.
This now concludes this System Incident. If you feel that you are still experiencing the symptoms outlined in this post, please open a support request from the (mt) AccountCenter.
Source: (mt) weblog » System incidents
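The post-mortem above attributes the load to an unusually high file lock count and mentions watching for users with abnormal counts. The post does not describe the storage platform or the tooling involved, so the following is only a sketch under the assumption of a Linux host that exposes held locks through `/proc/locks`. The helper names, the threshold, and the sample text are all hypothetical.

```python
from collections import Counter


def lock_counts_by_pid(proc_locks_text):
    """Count held file locks per process ID from /proc/locks-style text."""
    counts = Counter()
    for line in proc_locks_text.splitlines():
        fields = line.split()
        # Lines whose second field is "->" describe blocked waiters, not held locks.
        if len(fields) < 5 or fields[1] == "->":
            continue
        counts[int(fields[4])] += 1  # fifth field is the owning PID
    return counts


def heavy_lockers(counts, threshold):
    """Return the PIDs holding more than `threshold` locks, in ascending order."""
    return sorted(pid for pid, n in counts.items() if n > threshold)


# Hypothetical sample in the /proc/locks layout; on a real host one would
# read the file itself, e.g. open("/proc/locks").read().
SAMPLE = """\
1: POSIX  ADVISORY  WRITE 3568 fd:00:2531452 0 EOF
2: FLOCK  ADVISORY  WRITE 3568 fd:00:2531448 0 EOF
3: POSIX  ADVISORY  READ  4120 fd:00:2531460 0 EOF
3: -> POSIX  ADVISORY  WRITE 4188 fd:00:2531460 0 EOF
"""

counts = lock_counts_by_pid(SAMPLE)
print(heavy_lockers(counts, threshold=1))
```

Mapping the flagged process IDs back to customer accounts would be a separate step that depends on how the host assigns processes to users.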
Wednesday, 28 July 2010 - 00:32
At 3:15 PM PDT, (mt) Media Temple noticed an issue with our replication services for the (gs) Grid-Service. This is the process by which all new password changes are stored and synced across our multi-node, clustered (gs) Grid-Service platform.
If you have added or modified an email/ftp/ssh user in the last hour, you will most likely have trouble logging in with that new password. We are aware of this issue and are working to restore this functionality right now.
(mt) Engineers have already begun to correct the replication issue; however, it may take up to 2 hours until this is corrected. We will let you know as soon as this incident is completely resolved.
Source: (mt) weblog » System incidents
Tuesday, 27 July 2010 - 23:27
mt_monitor: #1419 - AccountCenter Availability http://mdtm.pl/9x7HCV
Source: Twitter / mt_monitor
Tuesday, 27 July 2010 - 23:27
mt_monitor: #1419 - AccountCenter Services Restored http://mdtm.pl/ao6IR7
Source: Twitter / mt_monitor
Tuesday, 27 July 2010 - 22:24
mt_monitor: #1418 - Status Update http://mdtm.pl/a5GQIV
Source: Twitter / mt_monitor
Tuesday, 27 July 2010 - 22:24
mt_monitor: #1418 - Status Update http://mdtm.pl/96R0ON
Source: Twitter / mt_monitor
Tuesday, 27 July 2010 - 21:52
mt_monitor: #1418 - Status Update http://mdtm.pl/ajJsEZ
Source: Twitter / mt_monitor
Tuesday, 27 July 2010 - 21:21
mt_monitor: #1418 - High Load on Cluster.03 http://mdtm.pl/b6FSLU
Source: Twitter / mt_monitor
Tuesday, 27 July 2010 - 19:16
mt_monitor: #1411 - Maintenance Completed http://mdtm.pl/c7KtNe
Source: Twitter / mt_monitor
Tuesday, 27 July 2010 - 14:18
mt_monitor: #1417 - Services have been restored http://mdtm.pl/9b8jwr
Source: Twitter / mt_monitor
Tuesday, 27 July 2010 - 13:14
mt_monitor: #1417 - Storage Segment connectivity issues http://mdtm.pl/9Urn7c
Source: Twitter / mt_monitor