Tesla’s Vehicle Engineering team is currently seeking a site reliability engineer (SRE) to focus on improving our in-house manufacturing and supply chain computer vision platform. This role will contribute to many of our other purpose-built applications that automate and improve the process of designing, building, and delivering Tesla products, globally.
Tesla’s Vehicle Engineering team, in general, owns the centralized technical aspects of our multi-continent Gigafactory network and product line. The systems we work on include Tesla’s most important programs, such as Cybertruck, cell production, Robotaxi, and new factories. The software team is responsible for making these systems and operations every bit as intelligent and dynamic as the products themselves.
As a member of the Engineering Automation team, you will lead our efforts to deploy software in Texas, Berlin, and beyond. We are currently looking for an experienced SRE to scale our infrastructure to multiple datacenters while improving our existing CI pipeline, CD flow, GitOps, and other DevOps-related tasks on our high-availability and data-intensive applications. In SCOA, engineers have the ownership and freedom to push forward their recommendations, experiment, and ultimately drive new technology into the production line to positively impact the company and our customers.
Responsibilities
Derive functional (application) and non-functional (availability, performance, security, maintainability, and cost) requirements for global deployments and scaling needs
Collaborate with stakeholders across regions to understand deployment requirements
Deliver on deployment requirements with proactive monitoring tools and dashboards
Maintain existing infrastructure deployed both on AWS and private Kubernetes clusters
Recommend the latest ops best practices, with a focus on automation
Write technical documentation and runbooks
Requirements
BS in Computer Science/related field, or equivalent industry experience
Fluency with tools for orchestration in AWS or a private cloud, especially with Kubernetes, using tools like Terraform, Ansible, ArgoCD, and Docker
Demonstrated ability to respond to production outages and implement long-term fixes
Familiarity with machine-to-machine communication concepts, e.g., TCP/IP, WebSockets, and RPC
Understanding of when to use and how to set up message brokers, e.g., Kafka and AWS Kinesis
Ability to choose the right storage system, e.g., relational DBs (AWS RDS for Postgres, CockroachDB), cache (Redis, Elasticsearch), S3 (public and private clouds), time-series DBs (InfluxDB, Prometheus), based on the application's needs
Relevant experience with Go, Python, JS, or other programming languages
Employee Benefits
As a full-time Tesla employee, you will receive full benefits from day 1 for you and your dependents.
Kaiser and UnitedHealthcare PPO and HSA plans (including infertility coverage)
3 medical plan choices with $0 paycheck contribution
Vision & dental plans (including orthodontic coverage)
Company-paid life, AD&D, short-term and long-term disability
401(k), Employee Stock Purchase Plans, and other financial benefits
Employee Assistance Program, Paid Time Off, and Paid Holidays
Back-up childcare and employee discounts