We live in tumultuous but interesting times. The rich have gotten richer, the poor have gotten poorer, and innovators have devised new ways to work through the disruption brought about by the coronavirus pandemic. The pandemic has also changed our lifestyles: many of us have learned to cook complex dishes from scratch, picked up new hobbies, or spent time learning something new about ourselves. Many of us have also finally found the time to curl up on the couch and binge-watch Netflix originals until we run out of bandwidth.

Sudden surges

Services such as Netflix, Amazon Prime Video and many other video and audio streaming providers run highly scalable systems that can withstand sudden surges and spikes in usage. Even so, these services can experience outages, which frustrate users and, in extreme cases of long-running outages, lead them to abandon the platform. Complex, large-scale distributed systems like these, with potentially millions of users, must be tested extensively with surges and spikes in mind.

However, unusually heavy spikes such as those caused by the pandemic were unprecedented and were possibly not part of any company's testing plans.

Continuous integration, delivery and production

The problems of CI/CD in constantly engaged systems

Companies like Netflix update their systems constantly, and those updates are continuously tested and delivered to their live platforms. To support this, Netflix testing teams create hundreds of thousands of tester accounts every day, each used in thousands of test scenarios to avoid any shortfalls.

As a result, testing at Netflix has moved from a low-volume manual regimen, run on a test system before changes went live, to continuous, fully automated, distributed testing of Netflix client and server applications running at scale in production, where nothing is left to chance.

An imaginary scenario with real implications

Imagine this: you and millions of others are at the nail-biting, suspenseful climax of a story and suddenly, boom! Netflix is offline. This would set alarm bells ringing at Netflix HQ, and testing SWAT teams would fly in through your windows to analyse what went wrong. Thankfully, this does not happen often.

The Goal

The goal at Netflix is simple: to be online for their users 99.99% of the time. Although Netflix has a pretty decent track record of staying online, they do occasionally encounter glitches that put the system off track. One of these incidents occurred when a development team deployed software that negatively affected Netflix's large infrastructure, causing widespread disruption in services and thousands of unhappy customers.

Netflix scrambled to create a fix that resolved the issue within a few hours. The incident also gave Netflix some food for thought: their testing regimen was inadequate and ineffective for such a large, distributed, user-facing system.

What could go wrong?

What happened at Netflix was an oversight on several levels. A new piece of code designed to clean up unused resources was actually being tested on the production server. Two bugs in that code then caused major problems:

  1. The first bug caused the dry-run flag, which was meant to prevent the actual cleanup, to be interpreted incorrectly, reversing its effect. A poorly written unit test let this through; a better test could have caught the issue in development.
  2. The second bug was in the code that checked whether a resource was actually unused. The check overlooked some cases that existed only in production.

Together, these two bugs removed key resources in production, resulting in the actual outage at Netflix.
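
The dry-run flag and the kind of unit test that guards it can be sketched in a few lines of Python. This is a minimal illustration and not Netflix's actual code: the names `Resource`, `cleanup` and `dry_run` are hypothetical, and deletion is simulated by recording names in a list.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    name: str
    in_use: bool


@dataclass
class CleanupResult:
    would_delete: list = field(default_factory=list)  # reported only
    deleted: list = field(default_factory=list)       # actually removed


def cleanup(resources, dry_run=True):
    """Remove unused resources; with dry_run=True only report candidates."""
    result = CleanupResult()
    for res in resources:
        if res.in_use:
            continue  # never touch a resource that is still in use
        if dry_run:
            result.would_delete.append(res.name)
        else:
            result.deleted.append(res.name)
    return result


def test_dry_run_never_deletes():
    resources = [Resource("cache-a", in_use=False),
                 Resource("db-main", in_use=True)]
    result = cleanup(resources, dry_run=True)
    assert result.deleted == []
    assert result.would_delete == ["cache-a"]


def test_real_run_deletes_only_unused():
    resources = [Resource("cache-a", in_use=False),
                 Resource("db-main", in_use=True)]
    result = cleanup(resources, dry_run=False)
    assert result.deleted == ["cache-a"]
    assert result.would_delete == []
```

Testing both values of the flag matters here: a test that only exercises one branch would still pass if the flag were accidentally inverted.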

Preventing these problems

Preventing or reducing the incidence of these problems leads to a common dilemma:

Should testing be done in a test environment or in a production environment? Most of us would advocate testing in pre-production so that actual customers are not impacted, while some would advocate also testing in production to ensure that the code runs well in both test and prod. In reality, code should be tested in all three environments: dev, test and prod. The challenge faced by Netflix was to devise an effective methodology for deciding why, when and how to test in each of them.

This also led to another set of questions:

  • Is the test environment a safe and complete mirror of our production environment?

OR

  • Is the test environment the latest build with features that others might need to integrate with?

The result of this was the common scenario of having overly complex and numerous test environments.

The answer

The answer to this problem, which arose while working out a fix for the original one, was simple: end-to-end automation that could replicate thousands of scenarios without problems.

This answer, however, came with its own set of problems. The first was finding a scalable way to create a production-like pre-production environment without cloning production entirely, which would require a massive investment.

Another problem was that pre-production and production usage patterns could be completely different from each other. Pre-production traffic is also thousands of times lower than production traffic.

Testing payments

Testing payments was another colossus altogether. Instead of testing payments in production using real money, it is better to create fake methods of payment (MOPs) and exercise fake transactions on them in sandbox accounts, which does not overburden the existing payment systems.
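
As a rough sketch of the idea, a fake method of payment can be a small in-memory object that records transactions instead of calling a real payment processor. Everything below is hypothetical and for illustration only: the class name `FakeMethodOfPayment`, the `should_decline` switch and the ledger format are assumptions, not a description of Netflix's sandbox.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class FakeMethodOfPayment:
    """In-memory stand-in for a card or wallet; no real money moves."""
    token: str
    should_decline: bool = False  # lets a test simulate a declined charge
    ledger: list = field(default_factory=list)

    def charge(self, amount: Decimal) -> str:
        if amount <= 0:
            raise ValueError("amount must be positive")
        status = "declined" if self.should_decline else "approved"
        # Record the fake transaction so tests can inspect it later.
        self.ledger.append((amount, status))
        return status


good_card = FakeMethodOfPayment(token="sandbox-ok")
bad_card = FakeMethodOfPayment(token="sandbox-decline", should_decline=True)

good_status = good_card.charge(Decimal("9.99"))
bad_status = bad_card.charge(Decimal("9.99"))
```

Because the fake keeps its own ledger, a test can check both the approved and the declined path without touching a real payment system.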

The approach

Of the thousands of possible approaches, Netflix chose production capture and replay to bring their tests as close as possible to prod.

A large number of requests from customer devices were taken from persistence, stripped of personally identifiable information, and duplex-replayed in test. This turned tests into real-world scenarios and helped identify numerous corner-case bugs that were previously unknown.

The bugs identified were routed back into functional and integrated tests via a schema. This built confidence in the quality of feature migrations and helped accelerate change velocity. It also led to an interesting learning:

All the basic duplex tests could be run in production through tester accounts. However, prod capture-and-replay duplex tests were limited to the test environment, because reissuing requests in production would harm actual customer data.
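
The capture, scrub and replay steps can be sketched as follows. This is a simplified illustration under stated assumptions: the request shape, the `PII_FIELDS` set and the `handler` callable standing in for the test environment are all hypothetical, and a real pipeline would need far more careful anonymisation.

```python
import copy

# Hypothetical set of sensitive fields; a real system would define its own.
PII_FIELDS = {"email", "name", "ip_address", "card_number"}


def scrub(request: dict) -> dict:
    """Return a copy of a captured request with PII fields masked."""
    clean = copy.deepcopy(request)  # never mutate the captured original
    body = clean.get("body", {})
    for key in body:
        if key in PII_FIELDS:
            body[key] = "REDACTED"
    return clean


def replay(captured: list, handler) -> list:
    """Send each scrubbed request to the test-environment handler."""
    return [handler(scrub(req)) for req in captured]


captured = [
    {"path": "/play", "body": {"email": "a@example.com", "title_id": 42}},
]
# The handler here simply echoes the body it received.
responses = replay(captured, handler=lambda req: req["body"])
```

Scrubbing on a deep copy keeps the captured data intact, so the same capture can be replayed again against a later build.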

Hastings says. “And instead tragically it is a biological one, so everybody is locked up and we had the greatest growth in the first half of this year that we ever had.” With a market capitalization of around US$230 billion, it has been vying with Walt Disney since March for the title of the world’s most valuable entertainment group.

Masked and refreshed data could safely be used to replay requests in the test environment after a time delay. This shifted the focus to the data set rather than the production environment. Although the result was not quite as stable as production, it gave a good idea of how production would behave.

Failing is important in testing. Failures help test teams identify real issues in downstream implementations. To surface such failures safely, functional validations were run as real canaries in production, exposing a small percentage of actual customer traffic to both versions of the API under test.

Canary analysis algorithms were run on the metrics gathered from these implementations, and a compare-verify regimen checked whether client and server metrics were equivalent. This helped capture failing request logs from the canaries and made it easier to debug and triage issues.
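
A very small version of such a comparison might look like the sketch below. It is an assumption-laden illustration rather than Netflix's canary analysis: it compares only the server error rate of a baseline and a canary, and the function names and the `tolerance` threshold are hypothetical.

```python
def error_rate(status_codes):
    """Fraction of responses that were server errors (5xx)."""
    if not status_codes:
        return 0.0
    failures = sum(1 for code in status_codes if code >= 500)
    return failures / len(status_codes)


def canary_is_healthy(baseline_codes, canary_codes, tolerance=0.01):
    """Pass if the canary's error rate is not worse than baseline + tolerance."""
    return error_rate(canary_codes) - error_rate(baseline_codes) <= tolerance


def failing_requests(request_log):
    """Collect failing canary requests for debugging and triage."""
    return [entry for entry in request_log if entry["status"] >= 500]


baseline = [200] * 99 + [500]        # 1% errors
bad_canary = [200] * 95 + [500] * 5  # 5% errors

healthy = canary_is_healthy(baseline, bad_canary)
```

In this example `healthy` is `False`, because the canary's error rate exceeds the baseline by more than the one percentage point allowed.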

Learnings

Learnings from such an approach are manifold. 

  • The first one would be to understand that test and prod are different, but their differences must be embraced to utilize the capability of both.
  • Although testing is good in a sandboxed environment, testing in production is important for such implementations.
  • Solving the problems in either environment can go a long way in ensuring test success.
  • Stay on the lookout for opportunities to rethink your testing strategy. Even if it comes at an extra cost, the end result will be worth it.
  • Find a pragmatic testing shape that is right for your company—do not look for a textbook shape that fits in.
  • Start production simulation and chaos experiments—these will help to validate your functional and resiliency testing capabilities for the future.

At Netflix, chaos testing is done at scale in production. Testing everything from fire raining from the sky to aliens killing their servers, they leave nothing to chance. If they don't, why should you? The testing teams at Volumetree are experienced and reliable, and they know where to raise the red flags. Give your software the quality edge it needs. Schedule a consultation with our test consultants today!
