Machine Learning Engineer, Staff - Model Factory
Company: d-Matrix
Location: Santa Clara
Posted on: March 29, 2025
Job Description:
At d-Matrix, we are focused on unleashing the potential of
generative AI to power the transformation of technology. We are at
the forefront of software and hardware innovation, pushing the
boundaries of what is possible. Our culture is one of respect and
collaboration. We value humility and believe in direct
communication. Our team is inclusive, and our differing
perspectives allow for better solutions. We are seeking individuals
who are passionate about tackling challenges and driven by execution.
Ready to come find your playground? Together, we can help shape the
endless possibilities of AI.
Location: Hybrid, working onsite at our Santa Clara, CA headquarters
3-5 days per week.
Job Title: Machine Learning Engineer, Staff - d-Matrix Model Factory
What You Will Do:
d-Matrix is a pioneering company specializing in data center AI
inferencing solutions. Utilizing innovative in-memory computing
techniques, d-Matrix develops cutting-edge hardware and software
platforms designed to enhance the efficiency and scalability of
generative AI applications. The Model Factory team at d-Matrix is at
the heart of cutting-edge AI and ML model development and
deployment. We focus on building, optimizing, and deploying
large-scale machine learning models with a deep emphasis on
efficiency, automation, and scalability for the d-Matrix hardware.
If you're excited about working on state-of-the-art AI
architectures, model deployment, and optimization, this is the
perfect opportunity for you!
What You Will Bring:
- Design, build, and optimize machine learning deployment
pipelines for large-scale models.
- Implement and enhance model inference frameworks.
- Develop automated workflows for model development,
experimentation, and deployment.
- Collaborate with research, architecture, and engineering teams
to improve model performance and efficiency.
- Work with distributed computing frameworks (e.g., PyTorch/XLA,
JAX, TensorFlow, Ray) to optimize model parallelism and
deployment.
- Implement scalable KV caching and memory-efficient inference
techniques for transformer-based models.
- Monitor and optimize infrastructure performance across
different levels of the custom hardware hierarchy (cards, servers,
and racks), all powered by d-Matrix custom AI chips.
- Ensure best practices in ML model versioning, evaluation, and
monitoring.
Required Qualifications:
- BS in Computer Science with 7+ years or MS in Computer Science
with 4+ years of experience.
- Strong programming skills in Python and experience with ML
frameworks like PyTorch, TensorFlow, or JAX.
- Hands-on experience with model optimization, quantization, and
inference acceleration.
- Deep understanding of Transformer architectures, attention
mechanisms, and distributed inference (Tensor Parallel, Pipeline
Parallel, Sequence Parallel).
- Knowledge of quantization and reduced-precision formats (INT8,
BF16, FP16) and memory-efficient inference techniques.
- Solid grasp of software engineering best practices, including
CI/CD, containerization (Docker, Kubernetes), and MLOps.
- Strong problem-solving skills and ability to work in a
fast-paced, iterative development environment.
Preferred Qualifications:
- Experience working with cloud-based ML pipelines (AWS, GCP, or
Azure).
- Experience with LLM fine-tuning, LoRA, PEFT, and KV cache
optimizations.
- Contributions to open-source ML projects or research
publications.
- Experience with low-level optimizations using CUDA, Triton, or
XLA.
Why Join Model Factory?
- Work at the intersection of AI software and custom AI hardware,
enabling cutting-edge model acceleration.
- Collaborate with world-class engineers and researchers in a
fast-moving AI-driven environment.
- Freedom to experiment, innovate, and build scalable
solutions.
- Competitive compensation, benefits, and opportunities for
career growth.
This role is ideal for a self-motivated engineer
interested in applying advanced memory management techniques in the
context of large-scale machine learning inference. If you're
passionate about implementing and optimizing machine learning
models for custom silicon, and are excited to explore cutting-edge
solutions in model inference, we encourage you to apply.
Equal Opportunity Employment Policy
d-Matrix is proud to be an equal
opportunity workplace and affirmative action employer. We're
committed to fostering an inclusive environment where everyone
feels welcomed and empowered to do their best work. We hire the
best talent for our teams, regardless of race, religion, color,
age, disability, sex, gender identity, sexual orientation,
ancestry, genetic information, marital status, national origin,
political affiliation, or veteran status. Our focus is on hiring
teammates with humble expertise, kindness, dedication, and a
willingness to embrace challenges and learn together every
day.
d-Matrix does not accept resumes or candidate submissions from
external agencies. We appreciate the interest and effort of
recruitment firms, but we kindly request that individuals
interested in opportunities with d-Matrix apply directly through
our official channels. This approach allows us to streamline our
hiring processes and maintain a consistent and fair evaluation of
all applicants. Thank you for your understanding and
cooperation.