Senior System Software Engineer - DC Platform Software Tools
Company: NVIDIA Corporation
Location: Santa Clara
Posted on: April 13, 2025
Job Description:
We are looking for a Senior System Software Engineer - DC Platform
Software Tools. NVIDIA's invention of the GPU in 1999 sparked the growth of
the PC gaming market, redefined modern computer graphics, and
revolutionized parallel computing. More recently, GPU deep learning
ignited modern deep learning - the next era of computing - with the
GPU acting as the brain of computers, robots, and self-driving cars
that can perceive and understand the world. Today, we are
increasingly known as "the AI computing company." We're looking to
grow our company and establish teams with the most thoughtful
people in the world. NVIDIA Grace and GPU superchips provide the
performance and productivity required for strong scaling of HPC
and generative AI workloads. Scale-out is inherent to the design of
this massive superchip.
We are looking for a Senior System Software
Engineer to join our Data Center Platform Software Tools team. You
will be responsible for the design, development, enhancement, and
deployment of tools in large-scale AI data centers. The primary
focus of these tools is to provide a simple user experience across
the data center manageability life cycle, from deployment and
production through service and repair workflows. You will work closely with
cross-functional teams, including hardware engineers, system
architects, software developers, and customers to gather
requirements, create solutions, and provide an end-to-end simplified
manageability experience. Are you ready to change the next
generation of computing? Join us at the forefront of technological
advancement.
What you'll be doing:
- Drive next-generation GPU server software manageability
workflows for scaling AI infrastructure in data centers. This
infrastructure includes DGX, HGX, or MGX products. You will be
involved in ensuring proper tools are built for managing server
software and firmware across the data center lifecycle.
- Work with internal and external customers to understand
requirements for various tools to improve debuggability,
serviceability and runtime of data center firmware and
software.
- Contribute to all phases of product development, from product
definition, architecture, and design, through implementation,
debugging, testing and early customer support.
- Maintain detailed documentation of tool designs, capabilities,
and usage guidelines. Provide regular reports and technical
insights to internal teams on the effectiveness and improvements of
developed tools.
- Define KPIs for tools and work across various stakeholders to
improve them over time.
What we need to see:
- BS, MS, or PhD in EE/CS or related field of education (or
equivalent experience) with 10+ years of experience.
- Proven record of working on management solutions for
large-scale clusters in data centers.
- Strong and demonstrable skill in Python.
- Programming and debugging experience for large-scale
data centers.
- Experience with SCM tools (e.g., Git, Perforce) and project
management tools such as Jira.
- Excellent written and oral communication skills, a strong work
ethic, a deep sense of teamwork, a love of producing quality work,
and a commitment to finishing your tasks every single day.
- You are a self-starter who loves to find creative solutions to
complicated problems and is hands-on with coding.
Ways to stand out from the crowd:
- Worked on data center deployment and management projects.
- Hands-on experience with x86 or ARM system architecture.
- Familiarity with processor microarchitecture concepts such as
caches, pipelining, memory hierarchy, and instruction set
architecture (ISA).
- Experience with code coverage and static analysis tools.
NVIDIA is widely considered to be one of the technology
world's most desirable employers. We have some of the most
forward-thinking and hardworking people on the planet working for
us. If you're creative and autonomous, we want to hear from you!
The base salary range is 184,000 USD - 356,500 USD. Your base salary
will be determined based on your location, experience, and the pay
of employees in similar positions. You will also be eligible for
equity and benefits. NVIDIA accepts applications on an ongoing
basis.
NVIDIA is committed to fostering a diverse work environment
and proud to be an equal opportunity employer. As we highly value
diversity in our current and future employees, we do not
discriminate (including in our hiring and promotion practices) on
the basis of race, religion, color, national origin, gender, gender
expression, sexual orientation, age, marital status, veteran
status, disability status or any other characteristic protected by
law.
About Us
NVIDIA is the world leader in accelerated computing. NVIDIA
pioneered accelerated computing to tackle
challenges no one else can solve. Our work in AI and digital twins
is transforming the world's largest industries and profoundly
impacting society.
Keywords: NVIDIA Corporation, Santa Clara, Senior System Software Engineer - DC Platform Software Tools, IT / Software / Systems, Santa Clara, California