Burgeoning Netflix goes to the cloud

Tuesday, 16 February 2016

News article written by Corbett Communications. The statements made or opinions expressed do not necessarily reflect the views of Engineers Australia.

Online video streaming service Netflix has now migrated every last drop of data to the Amazon cloud, a mammoth task it began in 2008 after a major database corruption. Since then, the service has been moving huge portions of its operations to Amazon Web Services (AWS), a process it finally completed in January.

So in an age of fast-moving digitalisation, why did it take Netflix so long?

Netflix's vice president of cloud and platform engineering, Yury Izrailevsky, said it “was a lot of hard work and we had to make a number of difficult choices along the way.”

“The majority of our systems, including all customer-facing services, had been migrated to the cloud prior to 2015,” he explained.

“Since then, we've been taking the time necessary to figure out a secure and durable cloud path for our billing infrastructure as well as all aspects of our customer and employee data management.”

While it would have been much easier to move to the cloud by 'forklifting' all of the systems unchanged out of the data centre and dropping them into AWS, Izrailevsky said Netflix would then have moved all the problems and limitations of the data centre along with them.

“Instead, we chose the cloud-native approach, rebuilding virtually all of our technology and fundamentally changing the way we operate the company,” he explained. “Architecturally, we migrated from a monolithic app to hundreds of micro-services, and denormalised our data model, using NoSQL databases.”

Budget approvals, centralised release coordination and multi-week hardware provisioning cycles made way for continuous delivery, with engineering teams making independent decisions using self-service tools in a loosely coupled DevOps environment. This helped accelerate innovation, although many new systems had to be built and new skills learned along the way.

Izrailevsky said Netflix has continued to evolve rapidly, incorporating many new resource-hungry features and relying on ever-growing volumes of data.

“Supporting such rapid growth would have been extremely difficult out of our own data centres; we simply could not have racked the servers fast enough,” he explained.

While operating from the cloud is far superior to data centres, it is not without its faults.

“There were a number of outages in our data centres, and while we have hit some inevitable rough patches in the cloud, especially in the earlier days of cloud migration, we saw a steady increase in our overall availability, nearing our desired goal of four nines of service uptime,” Izrailevsky said.
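For a sense of scale, "four nines" means 99.99 per cent availability. The back-of-the-envelope calculation below is a sketch that assumes a 365-day year and counts any unavailability as downtime; real service-level targets may define the measurement window differently. The function name `allowed_downtime_minutes` is illustrative only.

```python
# Allowed downtime for a given availability target.
# Assumes a 365-day year; real SLAs may measure over different windows.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(availability: float, period_minutes: int = MINUTES_PER_YEAR) -> float:
    """Return the minutes of downtime permitted over the period at the given availability."""
    return (1.0 - availability) * period_minutes


for nines, target in [(2, 0.99), (3, 0.999), (4, 0.9999)]:
    print(f"{nines} nines ({target:.2%}): {allowed_downtime_minutes(target):.1f} minutes per year")
```

At four nines, the budget works out to roughly 52.6 minutes of downtime a year, compared with more than 87 hours at two nines.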

According to the Netflix engineer, failures are unavoidable in any large-scale distributed system, including a cloud-based one (there were three late last year). He added that the cloud makes it possible to build highly reliable services out of fundamentally unreliable but redundant components.
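The arithmetic behind building reliable services from unreliable parts can be sketched with a simplified model: a service stays up while at least one of its redundant replicas is up, and replicas are assumed to fail independently. Real-world failures are often correlated, so these figures are optimistic. The function `combined_availability` is a hypothetical illustration, not anything Netflix has published.

```python
def combined_availability(replica_availability: float, replicas: int) -> float:
    """Availability of a service that stays up while at least one replica is up.

    Assumes replicas fail independently (a simplifying, hypothetical model).
    """
    if not 0.0 <= replica_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("need at least one replica")
    # Probability that every replica is down at the same moment.
    all_down = (1.0 - replica_availability) ** replicas
    return 1.0 - all_down


for n in (1, 2, 3):
    print(f"{n} replica(s) at 99%: {combined_availability(0.99, n):.6f}")
```

Under this model, two replicas that are each up 99 per cent of the time combine to 99.99 per cent, which is four nines.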

“By incorporating the principles of redundancy and graceful degradation in our architecture, and being disciplined about regular production drills using Simian Army, it is possible to survive failures in the cloud infrastructure and within our own systems without impacting the member experience,” he revealed.
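Graceful degradation means that when a dependency fails, the service returns a reduced result instead of an error. The sketch below illustrates the general pattern only; it is not Netflix's code, and the names `personalised_rows`, `with_fallback` and `DEFAULT_TITLES` are hypothetical. A simulated outage in the personalised call causes a generic default list to be served instead.

```python
from typing import Callable, List

# Hypothetical generic list shown when the personalised service is unavailable.
DEFAULT_TITLES = ["Popular title A", "Popular title B"]


def with_fallback(primary: Callable[[], List[str]], fallback: List[str]) -> List[str]:
    """Return the primary result, or a copy of the fallback if the call fails."""
    try:
        return primary()
    except (ConnectionError, TimeoutError):
        # Degrade gracefully: serve a generic list instead of an error page.
        return list(fallback)


def personalised_rows() -> List[str]:
    # Stand-in for a remote call; raises to simulate an outage.
    raise ConnectionError("recommendation service unavailable")


print(with_fallback(personalised_rows, DEFAULT_TITLES))
```

Catching only connection and timeout errors keeps genuine programming bugs visible rather than silently masking them behind the fallback.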

In early January, Netflix expanded its service to more than 130 new countries. Izrailevsky said the elasticity of the cloud, which allows Netflix to add thousands of virtual servers and petabytes of storage within minutes, made the expansion possible. The service has 74.76 million subscribers globally and expects to hit the 80 million mark this quarter.