Josef Doornink

Site Reliability Engineer | AI Infrastructure & MLOps

Site Reliability Engineer with 12+ years specializing in distributed systems, cloud infrastructure, and large-scale Kubernetes environments (CKS/CKA certified). Focused on building platform tooling that drives engineering productivity and eliminates toil. Currently bridging SRE and AI by scaling MLOps pipelines and model serving infrastructure. Deep expertise in observability, performance tuning, and infrastructure automation.

Certifications & Courses

Machine Learning Specialization
Stanford / Coursera • September 2025
HashiCorp Certified Terraform Associate
HashiCorp • July 2022
Production Machine Learning Systems
Google Cloud / Coursera • April 2026

Aggregated Skills

Core Technologies
Kubernetes (AKS), Docker, Terraform, Azure, AWS, GCP, Helm, YAML, Python, Go/Golang, Bash, C#, SQL, Kafka, Redis, Elasticsearch
AI/ML Engineering
LLM Model Serving, ML Pipelines, Distributed Training Systems, RLHF Infrastructure, Azure Machine Learning, Vertex AI, Drift Detection, Model Finetuning Systems
Observability & SRE
New Relic, Azure Monitor, Distributed Tracing, CI/CD Pipelines, GitHub Actions, Performance Profiling, Incident Response, Prometheus, Grafana, Azure DevOps

Professional Experience

REASON BENEFIT AI CORPORATION

🚀 Startup
Lead MLOps Engineer
October 2025 — Present
  • Architect and maintain a large-scale Azure Kubernetes Service (AKS) production environment for ML model training and serving, supporting distributed model inference.
  • Integrate with observability frameworks for model performance tracking, latency monitoring, and resource utilization across distributed training systems.
  • Collaborate with research teams to translate experimental model architectures into production-ready systems with focus on reliability and scalability.
  • Led distributed training pipeline optimization: profiled reinforcement learning (RL) pipeline bottlenecks and introduced automated pipeline scripts and validations to improve training cycle time.

Trimble/Viewpoint

Lead Site Reliability Engineer (SRE) I -> II -> III
January 2019 — Present
  • Architected AKS production environments handling 10M+ requests/day across 30+ microservices at 99.9% uptime SLA.
  • Built automation tooling in Python and Go eliminating 80+ hours/month of toil and accelerating deployment velocity 3x.
  • Developed YAML-based ML pipeline orchestration tools, reducing manual overhead by 70%.
  • Reduced P99 latency 45% and improved throughput 60% through systematic profiling of distributed systems.
  • Developed Go CLI tooling (Cobra) adopted by 50+ engineers for streamlined infrastructure workflows.
  • Implemented New Relic observability stack with distributed tracing, cutting MTTR by 50%.
  • Led Kubernetes capacity planning and auto-scaling strategies supporting 200% traffic growth.
  • Built CI/CD pipelines (GitHub Actions, Azure DevOps) with automated testing and rollback mechanisms.
  • Managed 500+ cloud resources via Terraform IaC; implemented CKS security controls for SOC 2 compliance.

Viewpoint

Software Developer
March 2018 — January 2019
  • Developed cloud-based SaaS applications using .NET and Angular, migrating on-premises software solutions to the Azure cloud platform.
  • Built RESTful APIs for multi-tenant applications serving thousands of users with focus on performance and scalability.

Onfulfillment

Software Developer I
March 2014 — March 2018
  • Engineered multi-tenant e-commerce platform using Microsoft Stack (.NET, C#, SQL Server) integrated with third-party SaaS APIs.
  • Led 'uplift' initiative migrating legacy codebase to modern greenfield platform, improving response times by 40% as measured by New Relic APM.

Legacy Biomechanics Research Lab

Biomechanical Research Engineer II
2007 — 2013
  • Lead test and development engineer for NIH-funded multimillion-dollar research project focused on bone fixation solutions.
  • Managed successful implant creation, delivery, and test methodology producing multiple US FDA-approved implants.

Publications (Subset of 11)

The Journal of Bone and Joint Surgery (JBJS) • 2009

M Bottlang, J Doornink, DC Fitzpatrick, SM Madey

The Journal of Bone and Joint Surgery (JBJS) • 2010

M Bottlang, M Lesser, J Koerber, J Doornink, S Mueller, DC Fitzpatrick...

The Journal of Bone and Joint Surgery (JBJS) • 2010

M Bottlang, J Doornink, TJ Lujan, DC Fitzpatrick, PV Marsh...

Journal of Orthopaedic Trauma • 2011

J Doornink, DC Fitzpatrick, SM Madey, M Bottlang

★ First Author
Journal of Trauma and Acute Care Surgery • 2010

J Doornink, DC Fitzpatrick, S Boldhaus, SM Madey, M Bottlang

★ First Author

Education

University of California, Davis

Master of Science
Class of 2006

California State University, Chico

Bachelor of Science, Mechanical Engineering
Class of 2003