News
Top 5 Machine Learning and Self-Healing Techniques used by SRE
News Case Study
December 11, 2018

This is a re-post of my original LinkedIn post.

Over the past few years, I had the unique opportunity to see a start-up, TubeMogul, go through hyper-growth, an IPO, and an acquisition by a Fortune 500 company, Adobe. Along the way, I was exposed to many technical challenges and have worked on systems at an astonishing scale, i.e. over 350 billion real-time bidding requests a day. That experience shaped some strong personal opinions on the role of an SRE and how SREs can help transform an organization. I'm lucky enough to work with a talented team of SREs who keep pushing the limits of innovation while executing through chaos.

Human Cognitive Limit

As I flew back from the ML for DevOps (Houston) summit that Adobe sponsored, I took the time to reflect on some of the ways our SRE teams excel at their jobs and how they leverage machine learning and self-healing principles to scale their day-to-day operations.

IT systems, with the broad adoption of public and private cloud, grow more complex over time. The hyper-adoption of microservices and the rise of loosely coupled distributed systems are obvious factors, though IoT devices, edge computing, and similar trends also add to the mix.

The point is that it is increasingly difficult for a single individual to understand the space in which a product evolves and lives. No one can assume they know it all. Humans quickly reach their cognitive limit. So, how do SREs overcome this limit? Below is my take on the top 5 machine learning and self-healing techniques used by SREs to scale and operate increasingly complex environments.

Continue Reading
Adobe Advertising Cloud: The Reality of Cloud Bursting with OpenStack
News
May 11, 2017

Over the past decade, I had the privilege of building a massive-scale infrastructure at a small start-up called TubeMogul. We went through an IPO and an acquisition by a Fortune 500 company, Adobe. It was therefore quite a privilege to present my team's accomplishments at the OpenStack Summit 2017 in Boston. We built a fully automated infrastructure that enables our team to leverage a multi-cloud environment with cloud-bursting capabilities. Check out the presentation on SlideShare/YouTube and our interview on #TheCube.

Continue Reading
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
News
November 12, 2015

Today, I had the privilege of presenting my team's work at USENIX LISA 15. TubeMogul grew from a few servers to over two thousand servers handling over one trillion HTTP requests a month, each processed in less than 50ms. To keep up with this fast growth, the SRE team had to implement an efficient Continuous Delivery infrastructure, which supported over 10,000 Puppet deployments and 8,500 application deployments in 2014. In this presentation, we cover the nuts and bolts of the TubeMogul operations engineering team and how they overcome challenges.

Continue Reading
Puppet Camp Silicon Valley: How TubeMogul reached 10,000 Puppet deployment in one year
News
May 26, 2015

An amazing crowd gathered at Puppet Camp Silicon Valley, where I was able to present once again some of the work we did at TubeMogul to improve our Operations Engineering Continuous Delivery workflow with Git, Gerrit, Jenkins, and of course Puppet.

Continue Reading
Puppet Camp Paris: Improving Operations Efficiency with Puppet
News
April 20, 2015

During Puppet Camp Paris, I had the privilege of presenting the Continuous Delivery workflow of TubeMogul's Operations Engineering team. In a few years, we went from a few servers to over two thousand nodes fully managed by Puppet. In our presentation, we went over the challenges we faced as well as the implementation of our workflow to improve our day-to-day operations while still moving fast.

With our operations continuous delivery workflow using Git, Gerrit, and Jenkins, we were able to manage over 10,000 Puppet deployments in 2014.

Continue Reading