Come join the SRE team at Stack Overflow! As one of the top 50 websites by traffic volume worldwide, we hit some unique challenges. We recently launched Stack Overflow for Enterprise and Stack Overflow for Teams, which give organizations a private experience on the platform they already know and love. The success of these new products requires us to rethink our infrastructure strategy for supporting on-prem, cloud, and remote deployments.
We’re looking for someone with Linux and Windows Server experience (3+ years). Experience with managing internet-facing services is a plus. We don’t expect you to know everything about all of the technologies we use, so you’ll work with other members of the team to learn and develop your skills.
As an SRE, you’ll bring a developer mindset to system administration, always looking for ways to automate manual work and create repeatable, scalable systems and processes. We are wiki-centric and prefer to document and automate in small increments as we work.
We are a remote-first team with members in many time zones. Candidates who live near one of our data centers in Jersey City, NJ or Denver, CO will occasionally visit them in person to keep our infrastructure running.
What you’ll do:
- Maintain the services and infrastructure platform used by the Stack Overflow websites.
- Be deeply involved in our move from .NET Framework to .NET Core and then to Linux containers in Kubernetes.
- Lead an initiative to adopt monitoring-centric operations (KPI or Google SRE error budgets) for our applications and internal data services.
- Be part of our on-call rotation (approximately 1 week out of 5).
- Act as a subject matter expert on our IIS infrastructure and automation.
- (If you are located in NJ/NY) Occasionally visit our Jersey City datacenter when remote-hands are insufficient.
Technologies you’ll work with:
- Our application stack is IIS, .NET Framework and Core, and Microsoft SQL Server on Windows; Redis, Elasticsearch, and HAProxy on Linux (CentOS)
- Our control plane is Puppet for both Windows and Linux, moving towards Kubernetes
- Hardware platforms: Dell servers and EqualLogic storage, Fortinet and Cisco network devices
- In the future: We are in the middle of a multi-year move to Kubernetes.
Some projects that we've recently completed or are working on:
- Created an automated pipeline for SSL certificates with Let’s Encrypt via HashiCorp Vault
- Built our first Kubernetes clusters with associated CI/CD pipelines
- Upgraded 14 production SQL servers without service downtime
- Improved Windows automation by deploying Puppet and Chocolatey
- Created a secure replica of our infrastructure for storing private Q&A data
- Reinvented how DNS is managed
- Implemented autonomous OS upgrades for both Windows and Linux servers
- Upgraded hardware with zero downtime across a variety of services
- Migrated to a new CDN
Skills & Requirements
We’re looking for:
- Experience working in a mixed Linux / Windows environment
- A love of monitoring and data-driven operations
- A love of Infrastructure as Code
- Experience with HTTP, load balancers, and CDNs
- A track record of taking on challenges and delivering thorough, stable, and maintainable systems
- Strong written communication skills and a strong inclination to “document as you go”
Not required, but please let us know if you have experience with:
- Microsoft SQL Server administration and query tuning
- Security, or working in a SOC2 or PCI environment
What you’ll get in return:
- Flexible hours
- 20 days paid vacation + holidays
- Completely free health insurance - no copay, no premiums (US residents)
- Generous parental leave (10-16 weeks at 100% pay), family care leave, and unlimited sick days
- Employees will never be poked with a sharp stick
When you work in our office… You’ll get your own private office in our headquarters in New York City, and enjoy additional benefits like free lunch every day prepared by our own in-house chefs, transportation reimbursement, and all the espresso you can drink.
If you want to work remotely… (US time zones) We’ll help you set up a great home office, with an ergonomic chair, standing desk, and any other equipment you need to do your job.