Senior DevOps Engineer / Site Reliability Engineer (SRE)

Raleigh, North Carolina, United States
Full-Time
Remote

Job Description:

A leading B2B SaaS platform in the cross-border e-commerce sector is expanding its North American operations. We're seeking a Senior DevOps Engineer / Site Reliability Engineer (SRE) to architect and maintain our unified global O&M (operations and maintenance) platform.

This is a newly created role on our North America team. You'll work directly with our Middle Platform Director, Technical Experts, and CEO in a collaborative, remote-first environment.

KEY RESPONSIBILITIES:

• Design, develop, and maintain unified operations and platform management systems covering resource management, monitoring & alerting, configuration management, and automated O&M

• Build and operate observability platforms and CI/CD pipelines; develop self-healing systems and automated incident response processes to enable intelligent O&M

• Establish DevOps standards and best practices; promote standardization of DevOps toolchains (technology selection, version management)

• Provide platform-level technical support for product and engineering teams; resolve complex system issues, reduce technical debt, and lead infrastructure and architecture upgrades

• Promote SRE concepts and engineering practices; organize technical sharing and training; build a reliability engineering practice

• Conduct technical research and innovation; track cloud-native/DevOps industry trends; evaluate new technologies and drive continuous modernization of O&M platforms

REQUIRED QUALIFICATIONS:

• Currently residing in California or North Carolina, USA

• US Green Card or US Citizenship (work authorization; no sponsorship available)

• Fluent in Mandarin Chinese (working language; close collaboration with Mandarin-speaking R&D teams required)

• Bachelor's degree or above in Computer Science or related field

• 4-6 years of hands-on experience in DevOps/SRE/Platform Engineering

• Proficient in at least one major cloud platform (AWS/Azure/GCP), with a deep understanding of core services such as VPC, EC2, EKS/K8s, RDS, and IAM

• Proficient in Linux, networking, containers (Docker/Kubernetes), load balancing, and service governance

• Skilled in IaC (Infrastructure as Code) tools: Terraform, Ansible, Helm

• Experience building CI/CD pipelines: Jenkins, Argo CD, CodeBuild, etc.

• Familiar with monitoring/logging/tracing: Prometheus, Grafana, ELK, OpenTelemetry

• Proficient in at least one development/scripting language: Python, Shell, Go

• Excellent system design, analysis, and troubleshooting skills

• Strong cross-team communication and collaboration abilities

PREFERRED QUALIFICATIONS:

• Master's degree in Computer Science or related field

• Experience with global platforms, cross-border SRE, or multi-cloud O&M

• Led platform reconstruction, self-healing systems, or observability initiatives

• Experience with Go development, service mesh, chaos engineering, or capacity planning

• Demonstrated success in improving system availability, reducing incident rates, and increasing automation

• Global technical vision and cross-cultural collaboration experience

• Results-oriented, self-driven, and experienced in technical evangelism/sharing

COMPENSATION:

• Base Salary: $140,000 - $160,000 annually (top candidates may receive 5-10% upward adjustment)

• 401(k): Dollar-for-dollar match, up to 4% of salary

• Medical Insurance

• PTO: 12 days annually

• Social Security: Employer contributions per US legal requirements

WORK ENVIRONMENT:

• Location: Silicon Valley, CA or Raleigh, NC (home base available)

• Department: Tech O&M Department

• Working Style: Remote-first

• Hours: 8 hours per day, weekends off

• Travel: No business travel required

• Expected Start: ASAP

INTERVIEW PROCESS:

• Round 1 (Online): Middle Platform Director + Technical Expert

• Round 2 (Online): Head of HR

• Round 3 (Online): CEO/Founder