Seattle Node.js - Node.js in Production

Node.js in Production

Seattle Node.js, May 8th, 2013

Ryan Roemer (@ryan_roemer)

Overview

  • Production!
  • Well, what our production looks like.
  • Five Node.js-related things we've learned.
  • Some additional resources.

What is "Production"?

Joyent

Curiosity Media

SpanishDict
fluencia

Brought to you by...

  • A team of three engineers
  • Who are full-time developers
  • Running everything in the cloud
  • With minimal time available for ops

SpanishDict.com

SpanishDict

Demo

A Spanish-English Dictionary

SpanishDict.com is the world’s largest Spanish-English dictionary, translation, and language learning website. We develop and provide reliable, accurate, easy-to-use resources for learning Spanish.

Our visitors

A quick glance into our data and usage:

  • 6,000,000 Unique visitors every month
  • 1,000,000 Translations
  • 100,000 Questions and answers
  • 25,000 Flashcards
  • 5,000 Video pronunciations
  • 90 Lessons

Our Services

Node                     Other
API server               Web site
Auto-suggest server      Data mining
Translation server       Operations
Text-to-speech server

Our Traffic

Very low latency for our DB-backed services.

Service                  Reqs/min
API server               35K
Auto-suggest server      15K
Translation server       2.5K
Text-to-speech server    400

Five Node.js production tips

  1. Know when to Node
  2. Keep up with Node
  3. Design for failure
  4. Isolate services
  5. Analyze everything

1. Know when to Node

Should you use Node.js?

Yes

  • Small apps (think JSON APIs)
  • "Glue" for services or data
    • Proxies
    • Concurrent data
    • Use the stream module
  • Lots of connections
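
As a rough illustration of the "glue" idea, here is a minimal sketch of a Transform stream that turns tab-separated lines into newline-delimited JSON. The record format and the names toRecord and lineToJson are hypothetical, invented for this example; they are not from the talk or from any SpanishDict service.

```javascript
"use strict";
const { Readable, Transform } = require("stream");

// Hypothetical record format: one "word<TAB>translation" pair per line.
function toRecord(line) {
  const [word, translation] = line.split("\t");
  return { word, translation };
}

// Transform stream: text chunks in, newline-delimited JSON out.
function lineToJson() {
  let buffered = "";
  return new Transform({
    transform(chunk, _encoding, callback) {
      buffered += chunk.toString();
      const lines = buffered.split("\n");
      buffered = lines.pop(); // keep the trailing partial line for the next chunk
      for (const line of lines) {
        if (line) this.push(JSON.stringify(toRecord(line)) + "\n");
      }
      callback();
    },
    flush(callback) {
      // Emit whatever is left when the source ends without a final newline.
      if (buffered) this.push(JSON.stringify(toRecord(buffered)) + "\n");
      callback();
    },
  });
}

// Demo: the second line is deliberately split across two chunks.
Readable.from(["hola\thello\nga", "to\tcat\n"])
  .pipe(lineToJson())
  .pipe(process.stdout);
```

The same pipe chain works with any readable source (a file, a socket, an HTTP response), which is what makes streams convenient for proxies and data plumbing.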

Maybe not

  • Computation
  • Legacy applications
  • "Solved" problems (fuzzy search, NLP, etc.)

2. Keep up with Node

Bleeding edge, lots of breakage.

Stay up to date with Node.js and libraries.

Infrastructure

Amazon Web Services
Opscode Chef

Node.js deployments

  • PaaS: Often the easier way.

  • IaaS: Expect some DIY

    • Build custom Node.js versions
    • Install modules from scratch
    • Get ready to roll back

3. Design for failure

Fail and recover at multiple levels.

App-level

  • Errors
    • Handle: uncaughtException
    • Listen: foo.on("error")
  • Use the cluster module
    • Workers: die early on errors
    • Master: monitor and kill workers
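
The app-level points above could be sketched roughly as follows. This is one possible arrangement, not the setup described in the talk: the port, the worker count, the CLUSTER_DEMO environment variable, and the respawn backoff (respawnDelayMs) are all assumptions added for illustration. Detecting and killing hung workers is not covered here.

```javascript
"use strict";
const cluster = require("cluster");
const http = require("http");

const PORT = 8000;      // assumption: any free port
const WORKER_COUNT = 2; // assumption: tune to the number of cores

// Assumption (not from the talk): back off between respawns so a worker that
// crashes on startup cannot put the master into a tight fork loop.
function respawnDelayMs(recentCrashes) {
  return Math.min(1000 * 2 ** recentCrashes, 30000);
}

function startWorker() {
  const server = http.createServer((req, res) => res.end("ok\n"));

  // Listen: an emitter with no "error" listener throws when it emits "error".
  server.on("error", (err) => console.error("server error:", err));

  // Handle: log, stop accepting connections, then die early.
  process.on("uncaughtException", (err) => {
    console.error("uncaught exception, worker exiting:", err);
    server.close(() => process.exit(1));
    setTimeout(() => process.exit(1), 5000).unref(); // hard deadline
  });

  server.listen(PORT);
}

function startMaster() {
  let recentCrashes = 0;
  for (let i = 0; i < WORKER_COUNT; i += 1) cluster.fork();

  // Monitor: replace every worker that exits, with a growing delay.
  cluster.on("exit", (worker, code, signal) => {
    console.error(`worker ${worker.process.pid} exited (${signal || code})`);
    setTimeout(() => cluster.fork(), respawnDelayMs(recentCrashes));
    recentCrashes += 1;
    setTimeout(() => { recentCrashes -= 1; }, 60000).unref(); // forget old crashes
  });
}

// Only start servers when explicitly asked to (hypothetical switch).
if (process.env.CLUSTER_DEMO === "1") {
  if (cluster.isWorker) startWorker();
  else startMaster();
}
```

Run with CLUSTER_DEMO=1 to start the master and its workers. A process supervisor is still needed to restart the master itself if it dies.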

Server-level

  • Use monit or alternatives
  • Restart the Node.js master

Service-level

  • Have lots of small apps
  • Stateless, fungible servers
  • Hot failover wherever possible

4. Isolate services

Separate resource and failure classes.

Resources

  • CPU/Load: Run out of this and it's over.
  • Also, memory, I/O, etc.
  • ... and combinations thereof

Our pains

Node.js apps aren't necessarily good neighbors.

  • Suggest (DB) and translate (HTTP)
  • Backend (DB) and web site (CPU/load, memory)
  • Read and write servers

Takeaways

  • Always preserve CPU
  • Monitor system stats for cross-pressure
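
A minimal sketch of watching system stats from inside a Node.js process, using the built-in os module. The thresholds and the names snapshot and pressure are assumptions chosen for illustration; in practice an external monitoring agent would usually collect these numbers.

```javascript
"use strict";
const os = require("os");

// Take a point-in-time reading of CPU load and free memory.
function snapshot() {
  const cpuCount = os.cpus().length || 1; // guard against an empty CPU list
  const [load1] = os.loadavg();           // 1-minute load average (always 0 on Windows)
  return {
    loadPerCpu: load1 / cpuCount,
    freeMemRatio: os.freemem() / os.totalmem(),
  };
}

// Assumed limits: warn above 70% load per core or below 10% free memory.
function pressure(stats, limits = { loadPerCpu: 0.7, freeMemRatio: 0.1 }) {
  const warnings = [];
  if (stats.loadPerCpu > limits.loadPerCpu) warnings.push("cpu");
  if (stats.freeMemRatio < limits.freeMemRatio) warnings.push("memory");
  return warnings;
}

const current = snapshot();
console.log(current, pressure(current));
```

Seeing "cpu" and "memory" warnings on the same box at the same time is one sign that co-located services are competing for resources and may need to be separated.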

5. Analyze everything

How well are we addressing lessons 1-4?

Data drives problem discovery and action.

Log, Monitor, Mine


Scout
PagerDuty
Pingdom
Loggly
AWS Elastic MapReduce / Hadoop

Things to look for

Some metrics that affect Node.js apps

Type       Metrics                             Uses
System     CPU, I/O, memory, network           Alert
Server     Throughput, latency                 Alert, Report
Traffic    Peaks (weeks, months)               Report
Errors     Quantitative, qualitative           Alert, Report
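
For the "Server" row, here is a small sketch of how throughput and latency could be summarized from raw request timings. The function names and the nearest-rank percentile method are choices made for this example, not something the talk prescribes.

```javascript
"use strict";

// Nearest-rank percentile over an ascending array of latencies (ms).
function percentile(sortedMs, p) {
  if (sortedMs.length === 0) return 0;
  const rank = Math.ceil((p / 100) * sortedMs.length);
  return sortedMs[Math.max(rank, 1) - 1];
}

// Summarize the latencies observed during a window of `windowSeconds`.
function summarize(latenciesMs, windowSeconds) {
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  return {
    reqsPerMin: (sorted.length * 60) / windowSeconds,
    p50: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    max: sorted.length ? sorted[sorted.length - 1] : 0,
  };
}

console.log(summarize([10, 20, 30, 40, 50, 60, 70, 80, 90, 100], 30));
```

Percentiles tend to be more useful than averages for alerting, because a few slow requests can hide behind a healthy-looking mean.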

Decisions, Goals

  • Identify
    • Resource pressure
    • Bugs
  • Decide
    • Scale up, scale down?
    • Separate?

Demo

Recap

  1. Know when to Node
  2. Keep up with Node
  3. Design for failure
  4. Isolate services
  5. Analyze everything

Further Reading

Thanks!

Ryan Roemer (@ryan_roemer)

@SeattleNode