Senior DevOps Engineer / Site Reliability Engineer (SRE)
Job Description:
A leading B2B SaaS platform in the cross-border e-commerce sector is expanding its North America operations. We're seeking a Senior DevOps Engineer / Site Reliability Engineer (SRE) to architect and maintain our unified global O&M (operations and maintenance) platform.
This is a newly created role that strengthens our North America team's contribution to that platform. You'll work directly with our Middle Platform Director, Technical Experts, and CEO in a collaborative, remote-first environment.
KEY RESPONSIBILITIES:
• Design, develop, and maintain unified operations and platform management systems covering resource management, monitoring & alerting, configuration management, and automated O&M
• Build and operate observability platforms and CI/CD pipelines; develop self-healing systems and automated incident response processes to enable intelligent O&M
• Establish DevOps standards and best practices; promote standardization of DevOps toolchains (technology selection, version management)
• Provide platform-level technical support for product and engineering teams; resolve complex system issues, reduce technical debt, and lead infrastructure and architecture upgrades
• Promote SRE concepts and engineering practices; organize technical sharing and training; build a reliability engineering system
• Conduct technical research and innovation; track cloud-native/DevOps industry trends; evaluate new technologies and drive continuous modernization of O&M platforms
REQUIRED QUALIFICATIONS:
• Currently residing in California or North Carolina, USA
• US Green Card or US Citizenship (work authorization; no sponsorship available)
• Fluent in Mandarin Chinese (working language; close collaboration with the Mandarin-speaking R&D team required)
• Bachelor's degree or above in Computer Science or related field
• 4-6 years of hands-on experience in DevOps/SRE/Platform Engineering
• Proficient in at least one major cloud platform (AWS/Azure/GCP), with deep understanding of core services such as VPC, EC2, EKS/K8s, RDS, and IAM (or their equivalents on other clouds)
• Proficient in Linux, networking, containers (Docker/Kubernetes), load balancing, and service governance
• Skilled in IaC (Infrastructure as Code) tools: Terraform, Ansible, Helm
• Experience building CI/CD pipelines: Jenkins, Argo CD, CodeBuild, etc.
• Familiar with monitoring/logging/tracing: Prometheus, Grafana, ELK, OpenTelemetry
• Proficient in at least one development/scripting language: Python, Shell, Go
• Excellent system design, analysis, and troubleshooting skills
• Strong cross-team communication and collaboration abilities
PREFERRED QUALIFICATIONS:
• Master's degree in Computer Science or related field
• Experience with global platforms, cross-border SRE, multi-cloud O&M
• Experience leading platform reconstruction, self-healing systems, or observability initiatives
• Experience with Go development, service mesh, chaos engineering, or capacity planning
• Demonstrated success improving system availability, reducing incident rates, and increasing automation
• Global technical vision and cross-cultural collaboration experience
• Results-oriented, self-driven, and experienced in technical evangelism/sharing
COMPENSATION:
• Base Salary: $140,000 - $160,000 annually (top candidates may receive 5-10% upward adjustment)
• 401(k): Dollar-for-dollar match, up to 4% of salary
• Medical Insurance
• PTO: 12 days annually
• Social Security: Contributed per US legal requirements
WORK ENVIRONMENT:
• Location: Silicon Valley, CA or Raleigh, NC (home-based work available)
• Department: Tech O&M Department
• Working Style: Remote-first
• Hours: 8 hours per day, weekends off
• Travel: No business travel required
• Expected Start: ASAP
INTERVIEW PROCESS:
• Round 1 (Online): Middle Platform Director + Technical Expert
• Round 2 (Online): Head of HR
• Round 3 (Online): CEO/Founder