System Incidents

These are real-time system incidents, reported directly from (mt) Media Temple's data centers within the past week.

Friday, 30 July 2010 - 07:26

mt_monitor: #1425 - Services Restored to vzd020 http://mdtm.pl/b6dOSj

Source: Twitter / mt_monitor

Friday, 30 July 2010 - 00:54

mt_monitor: Services on Host Server vzd020 are Unavailable http://mdtm.pl/bMbh8N

Source: Twitter / mt_monitor

Friday, 30 July 2010 - 00:48

#1425 - Services Restored to vzd020

All services have been restored to vzd020 as of 3:45 PM PDT.

After a quick analysis, our engineers confirmed that this was a temporary issue that could be resolved by rebooting the host machine. After the reboot, system checks confirmed that all services are functioning normally.

Source: (mt) weblog » System incidents

Friday, 30 July 2010 - 00:19

Services on Host Server vzd020 are Unavailable

As of approximately 3:13 PM PDT, Host Server vzd020 has been experiencing some difficulties. This only affects services on (ve) Servers on physical host machine vzd020.
To see which host server your (ve) Server resides on, please see the Server Guide page in the AccountCenter.

(mt) Engineers are working as quickly as possible to restore all services to this host. Updates to this page will be made as soon as more information is available. We apologize for any inconvenience and we thank you for your patience.

Source: (mt) weblog » System incidents

Wednesday, 28 July 2010 - 08:15

mt_monitor: #1420 - (gs) Grid-Service Email, FTP, SSH Authentication Issues http://mdtm.pl/d77KBA

Source: Twitter / mt_monitor

Wednesday, 28 July 2010 - 07:58

#1420 - Incident Review

This post is a summary of Incident #1420, relating to a period of authentication issues with the (gs) Grid Service.

Earlier today, the AccountCenter became unavailable for approximately 15 minutes due to MySQL replication errors. Soon after, we began receiving reports of failed email and FTP authentication from customers on various Clusters. Our investigation determined that a portion of the account authentication servers used by each (gs) Grid-Service Cluster were out of sync. Replication is the process by which all new password changes are stored and synced across our multi-node, clustered (gs) Grid-Service platform. These servers are replicated database slaves, which are normally self-healing.

(mt) Engineers identified the source of this issue and made the appropriate corrections to restore functionality to these servers. 

Date/Time: The issue started at approximately 3:15 PM PDT on Tuesday, July 27, 2010 and was resolved by 6:30 PM PDT. Service impact was variable across the (gs) Grid-Service during this time.

Symptoms: Customers creating or modifying email addresses or updating FTP/SSH passwords may have experienced authentication errors.

Impact: All (gs) Grid-Service Clusters were affected. Email was not lost during this time.

Root Cause and Takeaways: Although our investigation is ongoing, we have identified a point where the binary logs that are required for replication were corrupted. Going forward, we are looking into system changes that would help prevent this issue from recurring. We will also be looking into increasing the efficiency of our replication repair utilities. This change would allow us to repair replication services for all (gs) Grid-Service Clusters simultaneously.
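As an illustration of the out-of-sync condition described in this review, the following is a minimal Python sketch of a replica health check. It assumes the replicas are MySQL servers and that one row of `SHOW SLAVE STATUS` output is already available as a dictionary. The function name `replica_health` and the `max_lag_seconds` threshold are hypothetical and are not part of (mt) Media Temple's tooling.

```python
from typing import Mapping


def replica_health(status: Mapping[str, object], max_lag_seconds: int = 60) -> str:
    """Classify one SHOW SLAVE STATUS row as 'ok', 'lagging', or 'broken'.

    `status` is assumed to map column names to values, as a database
    driver would return them (strings, integers, or None for NULL).
    """
    io_running = status.get("Slave_IO_Running") == "Yes"
    sql_running = status.get("Slave_SQL_Running") == "Yes"
    lag = status.get("Seconds_Behind_Master")

    # MySQL reports NULL lag when the SQL thread is not applying events,
    # so a missing lag value is treated the same as a stopped thread.
    if not (io_running and sql_running) or lag is None:
        return "broken"
    if int(lag) > max_lag_seconds:
        return "lagging"
    return "ok"


if __name__ == "__main__":
    # Hypothetical row: the SQL thread has stopped, e.g. after a bad binary log event.
    row = {
        "Slave_IO_Running": "Yes",
        "Slave_SQL_Running": "No",
        "Seconds_Behind_Master": None,
    }
    print(replica_health(row))
```

A check of this kind only reports that a replica has stopped or fallen behind; it does not repair replication or detect corrupted binary logs on its own.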

This now concludes this System Incident. If you feel that you are still experiencing the symptoms outlined in this post, please open a support request from the (mt) AccountCenter.

Source: (mt) weblog » System incidents

Wednesday, 28 July 2010 - 07:43

mt_monitor: #1420 - Incident Review http://mdtm.pl/9u6fKU

Source: Twitter / mt_monitor

Wednesday, 28 July 2010 - 03:37

mt_monitor: #1420 - (gs) Grid-Service Replication Services Restored http://mdtm.pl/9rnyWH

Source: Twitter / mt_monitor

Wednesday, 28 July 2010 - 03:27

#1420 - (gs) Grid-Service Replication Services Restored

As of 6:27 PM PDT, all (gs) Grid-Service clusters are operating with replication services fully restored. A full incident review will be published later this evening once we’ve examined the root cause and outlined potential takeaways moving forward.

Once again, we appreciate your patience as we worked to resolve this matter.

Source: (mt) weblog » System incidents

Wednesday, 28 July 2010 - 03:06

mt_monitor: #1420 - (gs) Grid-Service Cluster.03 Replication Services Restored http://mdtm.pl/a4cdVu

Source: Twitter / mt_monitor

Wednesday, 28 July 2010 - 02:53

#1420 - (gs) Grid-Service Cluster.03 Replication Services Restored

As of 5:45 PM PDT, replication services for (gs) Grid-Service Cluster.03 and Cluster.04 have been restored. To recap, Cluster.01, 02, 03 and 04 should be operating normally. We will continue to repair the remaining clusters and update this status page accordingly.

Source: (mt) weblog » System incidents

Wednesday, 28 July 2010 - 02:41

mt_monitor: #1420 - (gs) Grid-Service Cluster.02 Replication Services Restored http://mdtm.pl/9C79yV

Source: Twitter / mt_monitor

Wednesday, 28 July 2010 - 02:26

mt_monitor: #1420 - (gs) Grid-Service Email, FTP, SSH Authentication Issues http://mdtm.pl/bwlMtA

Source: Twitter / mt_monitor

Wednesday, 28 July 2010 - 02:26

mt_monitor: #1420 - (gs) Grid-Service Cluster.01 Replication Services Restored http://mdtm.pl/armqNf

Source: Twitter / mt_monitor

Wednesday, 28 July 2010 - 02:24

#1420 - (gs) Grid-Service Cluster.02 Replication Services Restored

As of 5:20 PM PDT, replication services for (gs) Grid-Service Cluster.01 and Cluster.02 have been restored. Additional work must be done to correct replication on the remaining clusters. As noted before, we will continue updating this status page as replication services normalize for each (gs) Grid-Service cluster. Please note this is not affecting any (dv) Dedicated-Virtual or (ve) Servers at this time.

Thank you for your patience and understanding in this matter.

Source: (mt) weblog » System incidents

Wednesday, 28 July 2010 - 01:55

#1420 - (gs) Grid-Service Cluster.01 Replication Services Restored

Shortly after our last update, we received word from our engineering team that replication services for Cluster.01 have been restored. They have now moved on to repairing the rest of the (gs) Grid-Service clusters. To reiterate some common symptoms associated with this incident, you may experience issues logging in as, or creating, new email/FTP/SSH users. You may also have issues when attempting to update email/FTP/SSH user passwords from within the AccountCenter. This is caused by the replication issue and will be rectified as soon as possible.

Once other (gs) Grid-Service clusters have been repaired, additional updates to this status page will be provided.

Source: (mt) weblog » System incidents

Wednesday, 28 July 2010 - 01:38

#1420 - (gs) Grid-Service Email, FTP, SSH Authentication Issues

As of 4:30 PM PDT, (mt) Engineers are still actively investigating this issue. The repair sequence to our replication servers is already underway; Cluster.01 should be normalizing shortly. As each (gs) Grid-Service cluster’s replication service returns to normal, we will update this status page with further information.

Source: (mt) weblog » System incidents

Wednesday, 28 July 2010 - 01:32

mt_monitor: #1420 - (gs) Grid-Service Email, FTP, SSH Authentication Issues http://mdtm.pl/bGKuYA

Source: Twitter / mt_monitor

Wednesday, 28 July 2010 - 01:32

mt_monitor: #1418 - Post-Mortem http://mdtm.pl/cW316Y

Source: Twitter / mt_monitor

Wednesday, 28 July 2010 - 01:25

#1418 - Incident Review

This post is a summary of Incident #1418, relating to a period of excessive load and service interruption which affected Storage Segment 03 on Cluster.03 of the (gs) Grid-Service.

Details:

Date/Time: The issue started at approximately 12:12 PM on Tuesday, July 27 and was resolved by 1:15 PM, Pacific Time. Service impact was contained to a window of a little more than an hour.

Symptoms: Access to all services was interrupted. This included:

HTTP

FTP/SFTP/SSH

Email and webmail

During the period of website unavailability, affected sites would have produced a “403 Forbidden” or a “500 Internal Server Error” message.

Impact: All customers on (gs) Grid-Service Cluster.03, Storage Segment 03 were affected by this system incident. The rest of the (gs) Grid-Service, all (dv) Dedicated-Virtual Servers, and all (ve) Servers remained unaffected.

Root Cause: Our engineers have determined that the high load was caused by a very high file lock count on Storage Segment 03. The immediate fix was a reboot of the storage segment, which led to the service interruption noted above. Once the storage segment stabilized, the customers who had higher than normal file lock counts were notified directly and some of their services were temporarily taken offline to protect other customers on the same storage segment.

Takeaways: We are actively monitoring the entire cluster for high load and for users with abnormally high file lock counts. If we find any unusual usage, we will notify customers individually and work diligently to prevent any further service interruption.
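To make the idea of monitoring for abnormally high file lock counts concrete, here is a minimal Python sketch. It assumes a Linux host where held locks are listed in `/proc/locks`, one per line, with the owning process ID in the fifth column. The post does not say how the storage segment is actually monitored, so the data source, the function names `lock_counts_by_pid` and `heavy_lockers`, and the threshold are all hypothetical.

```python
from collections import Counter


def lock_counts_by_pid(proc_locks_text: str) -> Counter:
    """Count held file locks per process ID from /proc/locks-style text.

    Assumed line layout: id, lock type, mode, access, pid, device:inode,
    start, end. Lines marked with "->" describe blocked waiters rather
    than held locks, so they are skipped.
    """
    counts: Counter = Counter()
    for line in proc_locks_text.splitlines():
        fields = line.split()
        if len(fields) < 5 or "->" in fields:
            continue
        try:
            pid = int(fields[4])
        except ValueError:
            continue  # Not a lock entry in the assumed layout.
        counts[pid] += 1
    return counts


def heavy_lockers(counts: Counter, threshold: int) -> list:
    """Return the sorted PIDs that hold more than `threshold` locks."""
    return sorted(pid for pid, held in counts.items() if held > threshold)


if __name__ == "__main__":
    # On a Linux host the text could be read from /proc/locks instead.
    sample = (
        "1: POSIX  ADVISORY  WRITE 100 08:01:11 0 EOF\n"
        "2: FLOCK  ADVISORY  WRITE 100 08:01:12 0 EOF\n"
        "3: POSIX  ADVISORY  READ  200 08:01:13 0 EOF\n"
    )
    print(heavy_lockers(lock_counts_by_pid(sample), threshold=1))
```

Mapping process IDs back to customer accounts would be a separate step that depends on how the platform runs customer processes.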

This now concludes this System Incident. If you feel that you are still experiencing the symptoms outlined in this post, please open a support request from the (mt) AccountCenter.

Source: (mt) weblog » System incidents

Wednesday, 28 July 2010 - 00:32

#1420 - (gs) Grid-Service Email, FTP, SSH Authentication Issues

At 3:15 PM PDT, (mt) Media Temple noticed an issue with our replication services for the (gs) Grid-Service. This is the process by which all new password changes are stored and synced across our multi-node, clustered (gs) Grid-Service platform.

If you have added or modified an email/FTP/SSH user in the last hour, you will most likely have trouble logging in with that new password. We are aware of this issue and are working to restore this functionality right now.

(mt) Engineers have already begun to correct the replication issue; however, it may take up to 2 hours until this is corrected. We will let you know as soon as this incident is completely resolved.

Source: (mt) weblog » System incidents

Tuesday, 27 July 2010 - 23:27

mt_monitor: #1419 - AccountCenter Availability http://mdtm.pl/9x7HCV

Source: Twitter / mt_monitor

Tuesday, 27 July 2010 - 23:27

mt_monitor: #1419 - AccountCenter Services Restored http://mdtm.pl/ao6IR7

Source: Twitter / mt_monitor

Tuesday, 27 July 2010 - 22:24

mt_monitor: #1418 - Status Update http://mdtm.pl/a5GQIV

Source: Twitter / mt_monitor

Tuesday, 27 July 2010 - 22:24

mt_monitor: #1418 - Status Update http://mdtm.pl/96R0ON

Source: Twitter / mt_monitor

Tuesday, 27 July 2010 - 21:52

mt_monitor: #1418 - Status Update http://mdtm.pl/ajJsEZ

Source: Twitter / mt_monitor

Tuesday, 27 July 2010 - 21:21

mt_monitor: #1418 - High Load on Cluster.03 http://mdtm.pl/b6FSLU

Source: Twitter / mt_monitor

Tuesday, 27 July 2010 - 19:16

mt_monitor: #1411 - Maintenance Completed http://mdtm.pl/c7KtNe

Source: Twitter / mt_monitor

Tuesday, 27 July 2010 - 14:18

mt_monitor: #1417 Services have been restored http://mdtm.pl/9b8jwr

Source: Twitter / mt_monitor

Tuesday, 27 July 2010 - 13:14

mt_monitor: #1417 - Storage Segment connectivity issues http://mdtm.pl/9Urn7c

Source: Twitter / mt_monitor