We live in tumultuous but interesting times. The rich have gotten richer, the poor have gotten poorer, and innovators have devised new ways to work through the disruption brought about by the coronavirus pandemic. The pandemic has also changed our lifestyles: many of us have learned to cook complex dishes from scratch, others have found new hobbies, and some have spent the time learning something new about ourselves. Many of us have also finally found the time to curl up on the couch and binge-watch Netflix originals until we run out of bandwidth.

Sudden surges

Most streaming services, such as Netflix, Amazon Prime Video and many other video and audio providers, run highly scalable systems that can withstand sudden surges and spikes in usage. Even so, they can still suffer outages, which frustrate users and, in extreme cases of long outages, can drive them to abandon the platform altogether. Complex, large-scale distributed systems such as Netflix and Amazon Prime Video, which potentially serve millions of users, must be tested effectively and extensively with surges and spikes in mind.

However, spikes as heavy as those caused by the pandemic were unprecedented and were probably not part of any company's test plans.

Continuous integration, delivery and production

The problems of CI/CD in constantly engaged systems, and how to resolve them

Companies like Netflix update their systems constantly, and these updates are continuously tested and delivered to the live platform. To support this, Netflix testing teams create hundreds of thousands of tester accounts every day, each used in thousands of test scenarios to avoid any shortfalls.
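To make the idea of disposable tester accounts concrete, here is a minimal Python sketch. The `TesterAccount` type, the `make_tester_accounts` helper and the `test-` naming convention are all hypothetical, not Netflix's actual tooling; the point is simply that every generated account is unique and clearly flagged as test-only, so it can be kept out of billing and customer metrics.

```python
import uuid
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TesterAccount:
    # Hypothetical shape of a test-only account. The is_tester flag lets
    # downstream systems (billing, analytics) exclude it from customer data.
    account_id: str
    email: str
    is_tester: bool = True


def make_tester_accounts(count: int, run_id: str) -> List[TesterAccount]:
    """Create `count` disposable accounts tagged with the test run that owns them."""
    accounts = []
    for i in range(count):
        suffix = uuid.uuid4().hex[:8]  # random suffix keeps IDs unique across runs
        accounts.append(
            TesterAccount(
                account_id=f"test-{run_id}-{i}-{suffix}",
                email=f"tester+{run_id}.{i}.{suffix}@example.com",
            )
        )
    return accounts


for account in make_tester_accounts(3, run_id="nightly"):
    print(account.account_id, account.is_tester)
```

Tagging each account with the run that created it also makes it easy to clean up everything a given test run produced.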

This has moved testing at Netflix away from a manual regimen, run on a test system before changes went live, toward large-scale, distributed, automated testing of Netflix client and server applications running in production. Testing at Netflix has gone from a low-volume manual mode to a continuous, fully automated, high-volume mode where nothing is left to chance.

An imaginary scenario with real implications

Imagine this: you and millions of others are at the nail-biting climax of a story when, suddenly, boom! Netflix goes offline. Alarm bells would ring at Netflix HQ, and testing SWAT teams would practically fly in through your windows to analyse what went wrong. Thankfully, this does not happen often.

The Goal

The goal at Netflix is simple: to be online for its users 99.99% of the time. Although Netflix has a decent track record of staying online, it occasionally hits glitches that knock the system off track. In one such incident, a development team at Netflix deployed software that harmed the company's large infrastructure, causing widespread service disruption and thousands of unhappy customers.
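It helps to translate "99.99%" into an actual downtime budget. The arithmetic is simple: allowed downtime = (1 − availability) × length of the period. The small Python helper below (its name is ours, purely illustrative) shows that four nines leave roughly 52.6 minutes of downtime a year, or about 4.3 minutes in a 30-day month.

```python
def downtime_budget_minutes(availability: float, period_days: float = 365.0) -> float:
    """Minutes of downtime allowed over `period_days` at the given availability."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    total_minutes = period_days * 24 * 60
    return (1.0 - availability) * total_minutes


print(round(downtime_budget_minutes(0.9999), 2))      # 52.56 (minutes per year)
print(round(downtime_budget_minutes(0.9999, 30), 2))  # 4.32 (minutes per 30-day month)
```

Seen this way, a single multi-hour outage consumes several years' worth of a four-nines budget.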

Netflix scrambled to create a fix that resolved the issue within a few hours, but the incident also gave the company food for thought: its testing regimen was inadequate and ineffective for such a large, distributed, user-facing system.

What could go wrong?

What happened at Netflix was an oversight at several levels. A new piece of code designed to clean up unused resources was effectively being tested on the production servers. Two bugs in that code caused two major problems:

  1. The first bug caused the dry-run flag, which was meant to guard the actual cleanup, to be interpreted incorrectly, reversing its effect. A poorly written unit test let this slip through; a correct test would have caught the issue in development.
  2. The second bug was in the code that checked whether a resource was actually unused. The check overlooked some cases that existed only in production.

Together, these two bugs caused key resources to be removed in production, resulting in the actual outage at Netflix.
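The article doesn't show the actual cleanup code, so what follows is a hypothetical Python sketch of the two safeguards that were missing: a dry-run flag whose meaning is pinned down by unit tests in both directions, and an "is it unused?" check that treats unknown usage as "in use". `Resource`, `is_unused` and `cleanup` are illustrative names, not Netflix code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    name: str
    references: Optional[int]  # None means usage could not be determined


def is_unused(resource: Resource) -> bool:
    # Conservative check: only a resource positively known to have zero
    # references counts as unused; unknown usage is treated as "in use".
    return resource.references == 0


def cleanup(resources: List[Resource], dry_run: bool = True) -> List[str]:
    """Return the names of unused resources; delete them only when dry_run is False."""
    doomed = [r.name for r in resources if is_unused(r)]
    if not dry_run:  # the guard: a dry run must never mutate anything
        resources[:] = [r for r in resources if not is_unused(r)]
    return doomed


# Unit tests that pin down BOTH meanings of the flag, so an inverted
# check fails in development instead of in production.
def test_dry_run_deletes_nothing() -> None:
    pool = [Resource("a", 0), Resource("b", 3)]
    assert cleanup(pool, dry_run=True) == ["a"]
    assert [r.name for r in pool] == ["a", "b"]


def test_real_run_deletes_only_known_unused() -> None:
    pool = [Resource("a", 0), Resource("b", 3), Resource("c", None)]
    assert cleanup(pool, dry_run=False) == ["a"]
    assert [r.name for r in pool] == ["b", "c"]


test_dry_run_deletes_nothing()
test_real_run_deletes_only_known_unused()
print("cleanup tests passed")
```

Defaulting `dry_run` to `True` is a deliberate choice: forgetting the argument then errs on the side of deleting nothing.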

Preventing these problems

Preventing these incidents, or reducing how often they happen, leads to a common dilemma:

Should testing be done in a test environment or in production? Most of us would advocate testing in pre-production so that real customers are not affected, while some would advocate testing in production to make sure code runs well in both test and prod. In reality, code should be tested in all three environments: dev, test and prod. The challenge Netflix faced was to devise an effective methodology for deciding why, when and how to test in each of them.

This also raised another question:

  • Is the test environment a safe and complete mirror of our production environment?

OR

  • Is the test environment the latest build with features that others might need to integrate with?

The result was a familiar situation: numerous, overly complex test environments.

The answer

The answer that emerged while working on a fix for the existing problem was simple: end-to-end automation that could reliably replicate thousands of scenarios.

This answer, however, came with its own problem: finding a scalable way to create a production-like pre-production environment without cloning production entirely, which would require a massive investment.

Another problem was that usage patterns in pre-production and production can be completely different. Pre-production traffic is also thousands of times lower than production traffic.

Testing payments

Testing payments was another colossus altogether. Rather than testing payments in production with real money, it is better to create fake methods of payment (MOPs) and exercise fake transactions on them in sandbox accounts, so that the existing payment systems are not overburdened.
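A sandbox payment path can be as simple as an in-memory gateway that refuses anything but sandbox accounts and records fake transactions. The sketch below is a minimal illustration under assumed names (`FakeMOP`, `SandboxPaymentGateway`, a `sandbox-` account prefix); it is not a real payment provider's API.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class FakeMOP:
    # Hypothetical fake method of payment; `outcome` scripts the result
    # so tests can exercise both approvals and declines.
    token: str
    outcome: str = "approved"  # or "declined"


@dataclass
class SandboxPaymentGateway:
    # In-memory stand-in for a payment processor: no real money moves and
    # nothing reaches the production payment systems.
    ledger: list = field(default_factory=list)
    _ids: itertools.count = field(default_factory=itertools.count)

    def charge(self, account_id: str, mop: FakeMOP, amount_cents: int) -> dict:
        if not account_id.startswith("sandbox-"):
            raise PermissionError("sandbox gateway only serves sandbox accounts")
        if amount_cents <= 0:
            raise ValueError("amount must be positive")
        txn = {
            "txn_id": f"fake-{next(self._ids)}",
            "account_id": account_id,
            "amount_cents": amount_cents,
            "status": mop.outcome,
        }
        self.ledger.append(txn)
        return txn


gateway = SandboxPaymentGateway()
print(gateway.charge("sandbox-42", FakeMOP("tok-ok"), 999)["status"])              # approved
print(gateway.charge("sandbox-42", FakeMOP("tok-bad", "declined"), 999)["status"])  # declined
```

Scripting the outcome on the fake MOP keeps decline handling just as testable as the happy path.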

The approach

Of the many possible approaches, Netflix chose production capture and replay to bring its tests as close to production as possible.

A large number of requests from customer devices were captured from persistent storage, stripped of personally identifiable information (PII), and duplex-replayed in the test environment. This turned the tests into real-world scenarios and helped uncover numerous corner-case bugs that were previously unknown.
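Here is a minimal sketch of capture, scrub and replay. It assumes that "duplex" means sending each request to two versions of a service (the current one and the candidate) and diffing their responses; the article doesn't define the term. The PII field names, the key, the flat-dictionary request format and the two toy service versions are all assumptions for illustration, and nested payloads would need a recursive scrubber.

```python
import hashlib
import hmac
from typing import Callable, Dict, List

# Hypothetical field names treated as personally identifiable information.
PII_FIELDS = {"email", "name", "ip_address", "card_number"}
# Hypothetical test-only key; a keyed hash resists dictionary attacks on e-mails.
SCRUB_KEY = b"test-environment-only"


def scrub(request: Dict) -> Dict:
    """Return a copy of a flat request with PII values replaced by stable tokens."""
    clean = {}
    for key, value in request.items():
        if key in PII_FIELDS:
            token = hmac.new(SCRUB_KEY, str(value).encode(), hashlib.sha256).hexdigest()[:12]
            clean[key] = f"redacted-{token}"
        else:
            clean[key] = value
    return clean


def duplex_replay(captured: List[Dict], baseline: Callable[[Dict], Dict],
                  candidate: Callable[[Dict], Dict]) -> List[Dict]:
    """Send each scrubbed request to both versions and collect the differences."""
    diffs = []
    for raw in captured:
        request = scrub(raw)
        old, new = baseline(request), candidate(request)
        if old != new:
            diffs.append({"request": request, "baseline": old, "candidate": new})
    return diffs


def serve_v1(request: Dict) -> Dict:
    return {"status": 200}


def serve_v2(request: Dict) -> Dict:
    # Toy regression: the new version rejects title_id 0.
    return {"status": 200 if request["title_id"] else 500}


captured = [
    {"path": "/play", "title_id": 7, "email": "viewer@example.com"},
    {"path": "/play", "title_id": 0, "email": "other@example.com"},
]
print(len(duplex_replay(captured, serve_v1, serve_v2)))  # 1
```

Because the tokens are stable, the same customer maps to the same placeholder across requests, so session-level behaviour can still be replayed without exposing the original values.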

The bugs identified were routed back into functional and integration tests via a schema. This built confidence in the quality of feature migrations and helped accelerate change velocity. It also led to an interesting learning:

All the basic duplex tests could be run in production through tester accounts. However, production capture-and-replay duplex tests were limited to the test environment, because replaying them in production would reissue real requests and could harm actual customer data.
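The rule described above can be enforced with a small policy check before any suite starts. In this hypothetical Python sketch, tester accounts are recognised by a `test-` prefix (an assumed convention), and capture-and-replay suites are simply refused in production.

```python
from enum import Enum
from typing import Iterable


class Env(Enum):
    TEST = "test"
    PROD = "prod"


class SuiteKind(Enum):
    BASIC_DUPLEX = "basic_duplex"      # driven entirely by tester accounts
    CAPTURE_REPLAY = "capture_replay"  # re-issues captured customer requests


def may_run(kind: SuiteKind, env: Env, account_ids: Iterable[str]) -> bool:
    """Hypothetical policy check applied before a test suite starts."""
    if env is Env.TEST:
        return True  # the test environment holds no live customer data
    if kind is SuiteKind.CAPTURE_REPLAY:
        return False  # replaying in prod would reissue real customer requests
    # Basic suites may touch prod only through tester accounts.
    return all(acct.startswith("test-") for acct in account_ids)


print(may_run(SuiteKind.CAPTURE_REPLAY, Env.PROD, ["test-1"]))  # False
print(may_run(SuiteKind.BASIC_DUPLEX, Env.PROD, ["test-1"]))    # True
```

Putting the rule in code rather than in a runbook means a misconfigured pipeline fails closed instead of replaying customer traffic in production.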

As Netflix co-founder Reed Hastings says: "And instead tragically it is a biological one, so everybody is locked up and we had the greatest growth in the first half of this year that we ever had." With a market capitalization of around US$230 billion, Netflix has been vying with Walt Disney since March for the title of the world's most valuable entertainment group.

Masked and refreshed data could safely be used to replay requests in the test environment after a time delay. This shifted the focus to the data set rather than the production environment. Although the result was not as stable as production, it gave a good idea of how production would behave.
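A delayed replay can be modelled as a buffer that only releases masked requests once they are old enough. This is an illustrative Python sketch: `DelayedReplayBuffer` is a hypothetical name, timestamps are plain seconds passed in by the caller (which keeps it deterministic and testable), and records are assumed to arrive in timestamp order.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class DelayedReplayBuffer:
    """Releases masked requests for replay once they are at least delay_s old.

    Assumes records are added in timestamp order, so the oldest is always first.
    """

    def __init__(self, delay_s: float) -> None:
        self.delay_s = delay_s
        self._queue: Deque[Tuple[float, Dict]] = deque()

    def add(self, captured_at: float, masked_request: Dict) -> None:
        self._queue.append((captured_at, masked_request))

    def due(self, now: float) -> List[Dict]:
        """Pop and return every request whose delay has elapsed by `now`."""
        ready = []
        while self._queue and now - self._queue[0][0] >= self.delay_s:
            ready.append(self._queue.popleft()[1])
        return ready


buffer = DelayedReplayBuffer(delay_s=300)
buffer.add(0, {"id": 1})
buffer.add(100, {"id": 2})
print(buffer.due(250))  # []
print(buffer.due(300))  # [{'id': 1}]
print(buffer.due(400))  # [{'id': 2}]
```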

Failing is important in testing: failures help test teams identify real issues in downstream implementations. To surface such issues, all functional validations were also run as real canaries in production, exposing a small percentage of actual customer traffic to both versions of the API under test.
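One common way to send a small, stable slice of traffic to a canary is to hash a request or user ID into buckets. The Python sketch below also routes an equal slice to a fresh baseline running the current version, a common canary practice so that both versions are compared under the same conditions. The function name and the three-way split are assumptions, not a description of Netflix's router.

```python
import hashlib


def assign_cluster(request_id: str, canary_percent: float) -> str:
    """Deterministically route canary_percent% of requests to the canary,
    the same share to a fresh baseline, and the rest to production."""
    if not 0 <= canary_percent <= 50:
        raise ValueError("canary_percent must be between 0 and 50")
    digest = hashlib.sha256(request_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999, i.e. 0.01% steps
    threshold = canary_percent * 100
    if bucket < threshold:
        return "canary"
    if bucket < 2 * threshold:
        return "baseline"
    return "production"


print(assign_cluster("request-123", canary_percent=5))
```

Hashing instead of random sampling means the same ID always lands on the same cluster, which keeps a user's experience consistent and makes failing requests reproducible.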

Canary analysis algorithms were run on the metrics gathered from these deployments, and a compare-and-verify step checked whether client and server metrics were equivalent. This made it possible to capture the logs of failing requests from the canaries and to debug and triage issues more effectively.
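Real canary analysis tools apply statistical tests across many metrics; the following Python sketch is deliberately simpler. It is a threshold comparison of error rate and mean latency between baseline and canary that also keeps the IDs of failing canary requests for triage. The `Sample` type and the thresholds (a one-point error-rate increase, a 20% latency increase) are arbitrary placeholders.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class Sample:
    request_id: str
    latency_ms: float
    ok: bool


def compare(baseline: List[Sample], canary: List[Sample],
            max_error_delta: float = 0.01, max_latency_ratio: float = 1.2) -> dict:
    """Simplified threshold check of canary metrics against baseline metrics."""
    if not baseline or not canary:
        raise ValueError("need samples from both clusters")

    def error_rate(samples: List[Sample]) -> float:
        return sum(not s.ok for s in samples) / len(samples)

    error_delta = error_rate(canary) - error_rate(baseline)
    latency_ratio = mean(s.latency_ms for s in canary) / mean(s.latency_ms for s in baseline)
    return {
        "pass": error_delta <= max_error_delta and latency_ratio <= max_latency_ratio,
        "error_delta": error_delta,
        "latency_ratio": latency_ratio,
        # Failing canary requests are kept so their logs can be pulled for triage.
        "failing_requests": [s.request_id for s in canary if not s.ok],
    }


base = [Sample(f"b{i}", 100.0, True) for i in range(100)]
can = [Sample(f"c{i}", 105.0, i >= 5) for i in range(100)]
report = compare(base, can)
print(report["pass"], report["failing_requests"])  # False ['c0', 'c1', 'c2', 'c3', 'c4']
```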

Learnings

Learnings from such an approach are manifold. 

  • First, test and prod are different, and those differences must be embraced to make the most of both.
  • Testing in a sandboxed environment is good, but testing in production is essential for implementations like these.
  • Solving problems in either environment goes a long way toward ensuring test success.
  • Keep rethinking your testing strategy. Even if it comes at an extra cost, the end result is worth it.
  • Find a pragmatic testing shape that is right for your company rather than forcing a textbook shape to fit.
  • Start running production simulations and chaos experiments; they help validate your functional and resiliency testing capabilities for the future.
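To give a flavour of a chaos experiment, the Python sketch below injects failures into a dependency call at a chosen rate and checks a steady-state hypothesis: every request is still served, either normally or from a fallback. The wrapper, the fallback and the recommendation data are hypothetical. Real chaos tooling such as Netflix's Chaos Monkey works at the infrastructure level, terminating instances rather than wrapping function calls.

```python
import random
from typing import Callable, List, TypeVar

T = TypeVar("T")


class InjectedFault(Exception):
    """Raised by the chaos wrapper in place of a real dependency call."""


def with_chaos(call: Callable[[], T], failure_rate: float,
               rng: random.Random) -> Callable[[], T]:
    """Wrap a dependency so that roughly `failure_rate` of calls fail."""
    def wrapped() -> T:
        if rng.random() < failure_rate:
            raise InjectedFault("chaos: injected dependency failure")
        return call()
    return wrapped


def recommendations(fetch: Callable[[], List[str]]) -> List[str]:
    """Hypothetical service code under test: degrade to a static list on any error."""
    try:
        return fetch()
    except Exception:
        return ["popular-1", "popular-2"]


def run_experiment(trials: int, failure_rate: float, seed: int = 7) -> dict:
    """Steady-state hypothesis: every request still gets a non-empty answer."""
    rng = random.Random(seed)  # seeded so the experiment is reproducible
    flaky = with_chaos(lambda: ["personalised-1", "personalised-2"], failure_rate, rng)
    results = [recommendations(flaky) for _ in range(trials)]
    fallbacks = sum(r[0].startswith("popular") for r in results)
    return {"all_served": all(results), "fallback_share": fallbacks / trials}


print(run_experiment(trials=1000, failure_rate=0.3))
```

If `all_served` ever comes back `False`, the experiment has found a path where a dependency failure reaches the user, which is exactly what chaos testing is meant to expose before a real outage does.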

At Netflix, chaos testing is done at scale in production. Testing everything from fire raining from the sky to aliens taking out their servers, they leave nothing to chance. If they don't take chances, why should you? The testing teams at Volumetree are experienced, reliable and know when to raise the red flags. Give your software the quality edge it needs. Schedule a consultation with our test consultants today!
