Google Comes to MEST Africa!

Guest Contributor | Tuesday, May 8th, 2018

This post was written by Heather Mavunga, a Zimbabwean Entrepreneur in Training (EIT) in the MEST class of 2018.

It has been an exciting two weeks filled with industry speakers, strategy advisors and C-level executives coming to MEST to teach workshops on Service Level Objectives! Recently, EITs participated in classes delivered by Site Reliability Engineers from Google, including Ruth King, MEST fellow Emmanuel Klu, and Anton Tolchanov.

System Design Considerations

Emmanuel Klu is a Site Reliability Engineer from Google and is currently with MEST as a Technology Teaching Fellow. He invited his colleagues Ruth and Anton to share their stories.

Emmanuel walked us through the considerations you need to make when designing a scalable system. The highlight of his presentation was the realization that there is a tradeoff between designing for user experience and designing for reliability. When designing for user experience, your goal is to anticipate your users’ needs and design a system that is easy to navigate and pleasing to the eye.

In systems design, the objective is to build a robust system that can withstand load and deliver on bandwidth. Sketching a skeleton design on paper and then translating it into a flow chart helps ensure that every member of the team is on the same page.

Introduction to Service Level Objectives

Service Level Agreements (SLAs) are agreements that a company enters into with its users. Service level objectives (SLOs) are the targets you set for your system, and service level indicators (SLIs) are the measurements that tell you whether those targets are being met. What you can achieve depends on how you have set up the servers that store your information, the content delivery networks that deliver content and the load balancer you have selected to manage the weight of traffic on your platform.

Anton focused on an Introduction to Service Level Objectives. This is a crucial segment of the development process. It is about designing your system in a way that is reliable. It is an objective process where you consider what would make your system slow or crash and then design it in such a way that these errors are minimised and measured. By tracking these errors you can estimate an “error budget.” This budget comes in especially handy when incidents happen.
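To make the idea of an error budget concrete, here is a minimal Python sketch. It was not part of Anton’s workshop. It assumes the budget is counted in failed requests, and the function name, the 99.9% target and the traffic numbers are all made up for illustration.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarise how much of a request-based error budget has been used.

    slo_target is the fraction of requests that should succeed, e.g. 0.999.
    All names and numbers here are illustrative assumptions.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1, e.g. 0.999")
    if failed_requests > total_requests:
        raise ValueError("failed_requests cannot exceed total_requests")

    # The budget is the number of requests that may fail while still meeting the SLO.
    allowed_failures = (1.0 - slo_target) * total_requests
    remaining = allowed_failures - failed_requests
    return {
        "allowed_failures": allowed_failures,
        "failed_requests": failed_requests,
        "remaining": remaining,
        "budget_exhausted": remaining < 0,
    }


# Hypothetical month: one million requests, 400 of which failed.
report = error_budget(slo_target=0.999, total_requests=1_000_000, failed_requests=400)
print(report)
```

In this made-up month the service may fail roughly 1,000 requests and has failed 400, so there is still room in the budget. Once the budget is used up, a team might decide to pause risky changes until reliability recovers.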

As you scale your platform, a record of errors can help you design your architecture correctly so you can keep estimating how reliable your system can be. It became clear during Anton’s workshop that the most common indicators to build Service Level Objectives around, and to measure at all costs, are error rate and latency.

Error rate is the share of requests that fail. Latency is simply delay: looking at how long it takes information to travel from Point A to Point B, which shows up in measures such as response time and load time. Anton’s talk had us thinking about the kinds of products that we wanted to build and how that could affect our choice of providers. Between text, image, video, artificial intelligence, and payments, the kind of information we would be sharing became an important concern.
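As a rough illustration of how these two measurements could be computed, here is a short Python sketch. It assumes error rate means the fraction of failed requests and that latency is summarised with a nearest-rank percentile. The Request record and the sample numbers are hypothetical, not something shown in the workshop.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """One served request (hypothetical record for this example)."""
    latency_ms: float
    ok: bool


def error_rate(requests: list[Request]) -> float:
    """Fraction of requests that failed."""
    if not requests:
        return 0.0
    failed = sum(1 for r in requests if not r.ok)
    return failed / len(requests)


def latency_percentile(requests: list[Request], percentile: float) -> float:
    """Latency at the given percentile, using the nearest-rank method."""
    if not requests:
        raise ValueError("need at least one request")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    ordered = sorted(r.latency_ms for r in requests)
    rank = math.ceil(percentile / 100 * len(ordered))
    return ordered[rank - 1]


# Made-up sample of five requests.
sample = [
    Request(120, True),
    Request(95, True),
    Request(300, False),
    Request(110, True),
    Request(2000, True),
]
print(error_rate(sample))
print(latency_percentile(sample, 50))
print(latency_percentile(sample, 90))
```

Percentiles are used here instead of an average because a single very slow request, like the 2000 ms one in the sample, can hide behind a healthy-looking mean.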

Reliable Launches at Scale

Ruth King focused on “Reliable Launches at Scale” and “Choosing the Right Cloud Platform for Your Business.” The former translates to the question of how you launch new features that seamlessly integrate into the existing user flow without disrupting the experience. The launch process needs to be adaptable and reliable. It also needs to be done in a way that is scalable. That means managing a “rapid pace of change without compromising stability of the site.” In other words, hope is not a strategy.

Ruth’s talk made it clear that small changes could cause huge errors. After going through a launch list of common questions to ask during the process, we saw that this highly volatile area of engineering could be managed by applying principles of operations to engineering.

Some of the common questions she touched upon emphasised the need to be robust. A well-crafted launch system needs to be robust enough to identify obvious errors. It also has to be thorough. All the details that members of the development team notice have to be consistently addressed. Concerns can range anywhere from different latency requirements for different users in different countries to a rollout plan. A rollout plan identifies the actions to take to launch a new service, such as gradual and staged rollouts. An example of this would be a plan that takes into account how many users and how much growth in traffic you expect during and after the launch and optimises accordingly.
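One common way to implement a gradual, staged rollout is to give every user a stable bucket and open the feature to more buckets at each stage. The Python sketch below is only an illustration under that assumption and is not the process Ruth described. The stage percentages, the function names and the user IDs are all hypothetical.

```python
import hashlib

# Hypothetical stages: share of users (in percent) who see the new feature.
ROLLOUT_STAGES = [1, 10, 50, 100]


def user_bucket(user_id: str) -> int:
    """Map a user ID to a stable bucket from 0 to 99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def sees_new_feature(user_id: str, rollout_percent: int) -> bool:
    """True if the user falls inside the current rollout percentage."""
    return user_bucket(user_id) < rollout_percent


# Made-up user IDs to show how exposure grows stage by stage.
users = [f"user-{i}" for i in range(1000)]
for percent in ROLLOUT_STAGES:
    exposed = sum(sees_new_feature(u, percent) for u in users)
    print(f"{percent}% stage: {exposed} of {len(users)} users see the feature")
```

Because the bucket depends only on the user ID, a user who gets the feature at an early stage keeps it at every later stage, and the rollout can be paused or stepped back if error rate or latency starts to climb.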

The History of the Old Reader

One of the culminating moments of the week was Anton’s talk on ‘The Old Reader’, an RSS tool that he developed with two friends when Google Reader was shut down in 2013. The Old Reader is an RSS aggregator first, a web publisher of dynamic content second, and a social site where users can like and search for other users who share their interests last. The best thing about The Old Reader is the RSS technology. RSS stands for Rich Site Summary or Really Simple Syndication. It is ad free. The RSS reader simply takes the headlines of the articles you are interested in and delivers news at a time you set.

During Anton’s presentation, we went through the system design of the project, which began as a hobby and was later sold to an American startup for a healthy offer. We examined the hardware specifications and what it cost to maintain on a monthly basis. As the number of users increased, they scaled horizontally, adding more machines to increase the available RAM.

In the end, after they reached a million monthly users and a steady increase of data, they realized they could no longer support the project as a hobby. This talk was important because it connected the dots of the conversations happening throughout the entire week.

All in all, Google week at MEST Africa was a tremendous success. EITs greatly benefited from the expertise of practicing industry professionals who have experience working on global challenges at one of the world’s largest tech companies.