Site Reliability Engineer

Remote
Full Time
Mid Level

The Company

Cerbo is a high-growth healthcare SaaS company, doing our part in the medical market to support holistic lifestyles and personalized medicine. Our software – Cerbo EHR – is a cloud-based electronic health records (EHR) and patient portal system. Healthcare offices across the country – and some around the world – use Cerbo for almost everything they do in their day-to-day operations. Cerbo started as a developer’s nights-and-weekends project and has grown into one of the leading EHR systems for functional or “root cause” medicine and membership- or cash-based clinics. Because of our unique origins, we often approach things a bit differently: success for us is not just about the bottom line. It’s more about providing a great product, operating with integrity, and supporting our clients and our team. Over the past four years our team has grown, and thousands of practitioners and patients now use our product. To keep up with that growth, we’re looking for a Site Reliability Engineer to join our team.

What You’ll Do

As a Site Reliability Engineer (SRE), you will play a pivotal role in managing the future of our technology. You will work with our current SRE and engineering team to tune, optimize, and enhance our Amazon Web Services (AWS) infrastructure. If you're passionate about building and maintaining highly available, scalable systems and thrive in a fast-paced environment, we'd love to hear from you!


Primary Responsibilities

  • Design, implement, and maintain scalable and reliable cloud infrastructure on AWS
  • Manage and optimize Kubernetes clusters using Amazon EKS
  • Develop and maintain Infrastructure as Code using Terraform
  • Implement and improve CI/CD pipelines using GitHub Actions and ArgoCD
  • Ensure system security and implement best practices
  • Monitor and optimize system performance using Grafana and Prometheus
  • Track our AWS spending and suggest ways to cut operating costs
  • Troubleshoot and resolve complex issues in production environments
  • Collaborate with development teams to improve application reliability and performance
  • Participate in the on-call rotation with other SREs and engineering team members

Required Skills

  • Extensive experience with AWS services and best practices
  • Proficiency in managing Kubernetes clusters, particularly Amazon EKS
  • Strong knowledge of Helm for Kubernetes package management
  • Extensive experience with Infrastructure as Code, specifically Terraform
  • Familiarity with CI/CD pipelines, particularly GitHub Actions
  • Advanced Linux administration skills
  • Solid understanding of networking concepts and protocols
  • Experience in implementing and maintaining security best practices
  • Proficiency in using monitoring and observability tools, especially Grafana and Prometheus

Qualifications

  • Bachelor's degree in Computer Science, Engineering, or related field (or equivalent experience)
  • 3+ years of experience in a Site Reliability Engineering or similar role
  • Strong problem-solving skills and attention to detail
  • Excellent communication skills and ability to work in a team environment
  • Certifications in AWS, Kubernetes, or other relevant technologies are a plus

Compensation & Benefits

  • Competitive compensation based on experience
  • Comprehensive health, dental and vision benefits
  • 401(k) plan with matching company contribution
  • Short-term disability & long-term disability insurance
  • Paid Time Off and company holidays 
  • Full suite of remote working tools and processes

Location: 100% Remote

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. 
