The International Earth Rotation and Reference Systems Service (IERS) announced that a positive leap second will be introduced on the last day of June 2015 (Official Bulletin C 49), giving that day 86,401 seconds.

In 2012, a similar event caused major outages across much of the internet, and only a few companies avoided problems. See this Forbes post from July 2012: +1: Google Aces ‘Leap Second’ While Reddit, LinkedIn And More Went Down Saturday.

Why a Leap Second?

Before getting into the details of the 2012 bug, take the time to read Phil Plait’s post “Wait just a second! No really, wait JUST A SECOND”. In short:

The reason [the leap second] is done is because the atomic clock standard we use has a very slightly different rate than the rotation-of-the-Earth based Coordinated Universal Time system. To be clear: it’s not that the Earth is slowing down so much we have to add a second every couple of years! It’s that they run at different rates, so we have to compensate by throwing in the odd leap second now and again.

Decoupling civil timekeeping from Earth rotation was discussed in 2011 at a meeting of the American Astronautical Society; see the paper The Colloquium On Decoupling Civil Timekeeping From Earth Rotation. Timekeeping is such a rich topic that there is even a time standard for Mars!

The 2012 Leap Second Bug in Detail

July 1st, 2012 was far from the best day for internet companies. While many Amazon AWS customers were hit by a major outage in the US East region caused by a power failure, much of the internet was also hit by a leap second bug.

The 2012 Leap Second Bug refers to the computer glitches and outages caused by the leap second added to June 30th, 2012 to keep atomic time in line with the Earth's rotation. An extra second is periodically added to Coordinated Universal Time (UTC) to compensate for the Earth's irregular rotation speed.
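Part of what makes the extra second hard for software is that common time representations have no slot for it. The short Python sketch below (using the 2015 leap second as the example) shows that the inserted second, written 23:59:60 UTC, can be parsed as a time tuple but maps to the same POSIX timestamp as the following midnight, because POSIX time assumes every day has exactly 86,400 seconds. It only illustrates the representation problem; it is not how any particular system handles the leap.

```python
import calendar
import time
from datetime import datetime

# time.strptime accepts a seconds field of 60, so the leap second itself parses.
leap = time.strptime("2015-06-30 23:59:60", "%Y-%m-%d %H:%M:%S")
next_midnight = time.strptime("2015-07-01 00:00:00", "%Y-%m-%d %H:%M:%S")

# calendar.timegm counts every day as 86,400 seconds, so the inserted
# second has no timestamp of its own: it collapses onto the next midnight.
print(calendar.timegm(leap))           # 1435708800
print(calendar.timegm(next_midnight))  # 1435708800

# datetime cannot represent 23:59:60 at all.
try:
    datetime(2015, 6, 30, 23, 59, 60)
except ValueError as exc:
    print("datetime rejects the leap second:", exc)
```

Two distinct instants end up sharing one timestamp, which is why code that assumes time always moves forward by exactly one second per second can misbehave around the leap.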

In 2012, Mozilla identified an issue affecting all of their Java services, tracked as Bug 769972. A bug also caused high CPU load in MySQL.

The issue was seen on many systems, including SUSE Linux Enterprise and Red Hat Enterprise Linux. Ultimately, the problem was caused by a leap second bug in the Linux kernel:

This issue stemmed from the timekeeping subsystem not notifying the hrtimer subsystem that the leapsecond occurred, causing CLOCK_REALTIME hrtimers to be fired one second early, and sub-second CLOCK_REALTIME hrtimer timeouts to fire immediately (causing the load spikes).
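The arithmetic behind that description can be sketched with a simplified model. Assume the hrtimer code converts an absolute CLOCK_REALTIME expiry to monotonic time using a cached realtime-minus-monotonic offset, and that at the leap second the kernel stepped CLOCK_REALTIME back by one second without refreshing that cache. The names and numbers below (`STALE_OFFSET`, `TRUE_OFFSET`, `effective_delay`) are hypothetical; this is plain arithmetic, not kernel code.

```python
# Hypothetical realtime-minus-monotonic offset still cached by the timer code.
STALE_OFFSET = 1_000.0
# After the leap second, CLOCK_REALTIME was stepped back by one second.
TRUE_OFFSET = STALE_OFFSET - 1.0


def effective_delay(timeout: float, mono_now: float = 50.0) -> float:
    """Seconds until a 'now + timeout' CLOCK_REALTIME timer fires (0.0 = immediately)."""
    realtime_now = mono_now + TRUE_OFFSET          # the real current CLOCK_REALTIME
    realtime_expiry = realtime_now + timeout       # caller requests an absolute expiry
    mono_expiry = realtime_expiry - STALE_OFFSET   # converted with the stale offset
    return max(0.0, mono_expiry - mono_now)        # an expiry in the past fires at once


for timeout in (0.5, 5.0):
    print(timeout, effective_delay(timeout))
```

In this model a 5-second timer fires after 4 seconds (one second early), while a 0.5-second timeout is already in the past and fires immediately. A program that waits in a loop on short timeouts then wakes up over and over, which matches the load spikes described above.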

Google cleverly patched their NTP servers not to set the “Leap Indicator”, but instead to gradually add milliseconds to the time they served, a technique they call “leap smear”, so that clients never saw the extra second and were not impacted.
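To make the smear idea concrete, here is a minimal sketch assuming a linear smear spread over a hypothetical 24-hour window ending at the leap second. This is not necessarily the curve Google used in 2012; the window length and the function name `smeared_offset_ms` are assumptions for illustration. The function returns how much of the leap second has been absorbed at a given point in the window.

```python
# Hypothetical smear window: 24 hours, ending at the leap second.
WINDOW = 86_400.0


def smeared_offset_ms(elapsed: float, window: float = WINDOW) -> float:
    """Milliseconds of the leap second absorbed `elapsed` seconds into the window.

    Linear smear (an assumption): 0 ms before the window starts,
    the full 1000 ms once it ends.
    """
    fraction = min(max(elapsed / window, 0.0), 1.0)
    return 1000.0 * fraction


# Each second in the window is stretched by roughly 11.6 microseconds.
print(1000.0 / WINDOW * 1000.0, "microseconds per second")
for hours in (0, 6, 12, 24):
    print(hours, smeared_offset_ms(hours * 3600.0))
```

Because the correction is a few microseconds per second instead of one abrupt step, clients syncing to smeared servers see time that stays monotonic and never encounter 23:59:60. The trade-off is that, during the window, smeared time differs from true UTC by up to one second.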

