Microsoft
Office 365 is at the center of Microsoft’s cloud first, devices first strategy as it brings together cloud versions of our most trusted communication and collaboration products like Exchange, SharePoint, and Teams with our cross-platform desktop suites and mobile apps. The Office 365 Enterprise Cloud team works with Microsoft’s largest enterprise and government customers to deliver features that meet their specific needs and enable cloud adoption. As you would expect, our customers have the highest expectations for feature quality, security, reliability, availability, and performance.
The Site Reliability Engineering (SRE) team provides leadership, direction and accountability for application architecture, system design, and end-to-end implementation. As a Site Reliability Engineer, you will identify and deliver software improvements using your expertise in software development, complexity analysis, and scalable system design. Strong collaboration skills will be required to work closely with other engineering teams to ensure services/systems are highly stable and performant, meeting the expectations of our government customers and users.
At Microsoft, we can offer you a strong team, exciting challenges, and a fun place to work. The work environment empowers you to have a positive impact on millions of end users.
Responsibilities
Independently develops code or scripts that automate the performance of repetitive and easily scalable operations processes (e.g., monitoring, alerting, deploying products and updates) across components and features of products operating at scale.
Responds to incidents during regular on-call rotations by identifying the level of impact, troubleshooting issues, and deploying appropriate fixes to resolve root cause(s); alerts product teams and owners to major customer impacting issues and escalates resolution of highly impactful issues affecting multiple components or features to other engineers or engineering teams as needed.
Shares details related to incidents and their resolution through post-mortem reports and during regular review meetings.
Develops and tests basic changes to optimize code and improve the observability, reliability and operability of a defined range of platform, system, or product components or features with direction from other engineers.
Leverages technical expertise in large scale distributed systems and specific products, as well as objective insights drawn from analyses of production telemetry data to suggest changes or add-ons to product features or code to improve the availability, reliability, efficiency, observability, and performance of product components or features supported by their team.
Demonstrates expertise in distributed systems design, interactions between cloud technology layers and components, common dependencies at scale, and the code that defines infrastructures. Can identify and recommend configurations optimal of cloud technology solutions and modify the code base that defines systems or cloud technologies to improve the reliability and operability of supported products with minimal guidance from other engineers.
Qualifications
Required/Minimum Qualifications
4+ years technical experience in software engineering, network engineering, or systems administration
OR Bachelor’s Degree in Computer Science, Information Technology, or related field AND 1+ year(s) technical experience in software engineering, network engineering, or systems administration
OR Master’s Degree in Computer Science, Information Technology, or related field.
Additional or Preferred Qualifications
5+ years technical experience in software engineering, network engineering, or systems administration
OR Bachelor’s Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration
OR Master’s Degree in Computer Science, Information Technology, or related field AND 1+ year(s) technical experience in software engineering, network engineering, or systems administration.
Familiarity with one or more general purpose programming languages including but not limited to: Java, C/C++, C#, Python, JavaScript, PowerShell
Experience with the Microsoft cloud and/or stack including O365, Azure, Windows or other Microsoft software/services
Experience leveraging cloud architecture, applying site reliability principles, and/or demonstrating sensitivity to operational concerns
Demonstrated ability to debug, fix, and optimize code
Full-stack troubleshooting skills across network, application, hardware, management fabric, and distributed services layers
Excellent communications skills, both verbal and written
Security Clearance Requirements: Candidates must be able to meet Microsoft, customer and/or government security screening requirements are required for this role. These requirements include, but are not limited to the following specialized security screenings:
Candidates must have an active TS and be willing to upgrade to TS/SCI (with polygraph) or have an active TS/SCI and be willing to upgrade to TS/SCI (with polygraph). This role will require candidates to maintain the TS/SCI (with polygraph) clearance.
Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Ability to meet Microsoft, customer and/or government security screening requirements are required for this role. Failure to maintain or obtain the appropriate U.S. Government clearance and/or customer screening requirement may result in employment action up to and including termination.
Site Reliability Engineering IC3 – The typical base pay range for this role across the U.S. is USD $94,300 – $182,600 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $120,900 – $198,600 per year.
Microsoft has different base pay ranges for different work locations within the United States, which allows us to pay employees competitively and consistently in different geographic markets (see below). The range above reflects the potential base pay across the U.S. for this role (except as noted below); the applicable base pay range will depend on what ultimately is determined to be the candidate’s primary work location. Individual base pay depends on various factors, in addition to primary work location, such as complexity and responsibility of role, job duties/requirements, and relevant experience and skills. Base pay ranges are reviewed and typically updated each year. Offers are made within the base pay range applicable at the time.
At Microsoft certain roles are eligible for additional rewards, including merit increases, annual bonus and stock. These awards are allocated based on individual performance. In addition, certain roles also have the opportunity to earn sales incentives based on revenue or utilization, depending on the terms of the plan and the employee’s role. Benefits/perks listed here may vary depending on the nature of employment with Microsoft and the country work location. U.S.-based employees have access to medical, dental, and vision insurance, a 401(k) plan and company match, short-term and long-term disability coverage, basic life insurance, and wellbeing benefits, among others. U.S.-based employees also receive, per calendar year, up to 10 scheduled paid holidays, and up to 80 hours Holistic Health Time Off. Additionally, hourly/non-exempt employees accrue up to 120 hours paid vacation time, and salaried/exempt employees have Discretionary Time Off (DTO).
Our Commitment to Pay Equity
We are committed to the principle of pay equity – paying employees equitably for substantially similar work. To learn more about pay equity and our other commitments to increase representation and strengthen our culture of inclusion, check out our annual Diversity & Inclusion Report. ( https://www.microsoft.com/en-us/diversity/inside-microsoft/annual-report )
Understanding Roles at Microsoft
The top of this page displays the role for which the base pay ranges apply – Site Reliability Engineering IC3.
The way we define roles includes two things: discipline (the type of work) and career stage (scope and complexity). The career stage has two parts – the first identifies whether the role is a manager (M), an individual contributor (IC), an admin-technician-retail (ATR) job, or an intern. The second part identifies the relative seniority of the role – a higher number (or later letter alphabetically in the case of ATR) indicates greater scope and complexity.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the Accommodation request form.
Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.