Josef Doornink

Site Reliability Engineer | AI Infrastructure & MLOps

Site Reliability Engineer with 12+ years specializing in distributed systems, cloud infrastructure, and large-scale Kubernetes environments (CKS/CKA certified). Focused on building platform tooling that drives engineering productivity and eliminates toil. Currently bridging SRE and AI by scaling MLOps pipelines and model serving infrastructure. Deep expertise in observability, performance tuning, and infrastructure automation.

Certifications

Certified Kubernetes Administrator (CKA), Certified Kubernetes Security Specialist (CKS)

Aggregated Skills

Core Technologies
Kubernetes (AKS), Docker, Terraform, Azure, GCP, Helm, YAML, Python, Go/Golang
AI/ML Engineering
LLM Model Serving, ML Pipelines, Distributed Training Systems, RLHF Infrastructure, Azure Machine Learning, Vertex AI, Model Finetuning Systems
Observability & SRE
New Relic, Distributed Tracing, CI/CD Pipelines, GitHub Actions, Performance Profiling, Incident Response, Prometheus, Grafana

Professional Experience - Startup

REASON BENEFIT AI CORPORATION

Lead MLOps Engineer
October 2025 - Present
  • Architect and maintain large-scale Azure Kubernetes Service (AKS) production environment for ML model training and serving, supporting distributed model inference at scale.
  • Build Python-based automation tools for ML pipeline orchestration, reducing manual overhead by 70% and accelerating model deployment velocity.
  • Integrate observability frameworks for model performance tracking, latency monitoring, and resource utilization across distributed training systems.
  • Tune model serving infrastructure through performance profiling and system-level optimizations, improving inference throughput by 40%.
  • Collaborate with research teams to translate experimental model architectures into production-ready systems with focus on reliability and scalability.
  • Implement automated testing frameworks for ML pipelines to quickly detect regressions and ensure model quality in production.

Professional Experience

Trimble/Viewpoint

Lead Site Reliability Engineer (SRE) I -> II -> III
January 2019 - Present
  • Architected and maintained large-scale Azure Kubernetes Service (AKS) production environments handling 10M+ requests/day across 30+ microservices with 99.9% uptime SLA.
  • Developed high-performance automation tools using Python and Go that eliminated 80+ hours/month of operational toil, accelerating deployment velocity by 3x across engineering teams.
  • Led performance optimization initiatives through systematic profiling and instrumentation, reducing P99 latency by 45% and improving throughput by 60% for distributed systems.
  • Built custom CLI tooling in Go (Cobra framework) to streamline workflows for 50+ engineers, improving developer experience and team productivity.
  • Designed and implemented a comprehensive observability stack (Prometheus, Grafana, Azure Monitor) with distributed tracing for debugging performance bottlenecks across microservices.
  • Led capacity planning and performance optimization for Kubernetes clusters and backend databases, implementing auto-scaling strategies supporting 200% traffic growth.
  • Created CI/CD pipelines using GitHub Actions and Azure DevOps with automated testing, deployment templates, and rollback mechanisms.
  • Implemented Infrastructure as Code using Terraform managing 500+ cloud resources, enabling consistent and repeatable infrastructure provisioning.

Viewpoint

Software Developer
March 2018 - January 2019
  • Developed cloud-based SaaS applications using .NET and Angular, migrating on-premise software solutions to Azure cloud platform.
  • Built RESTful APIs for multi-tenant applications serving thousands of users with focus on performance and scalability.

Onfulfillment

Software Developer I
March 2014 - March 2018
  • Engineered multi-tenant e-commerce platform using Microsoft Stack (.NET, C#, SQL Server) integrated with third-party SaaS APIs.
  • Led 'uplift' initiative migrating legacy codebase to a modern greenfield platform, improving response times by 40% as measured with New Relic APM.

Legacy Biomechanics Research Lab

Biomechanical Research Engineer II
2007 - 2013
  • Lead test and development engineer for NIH-funded multimillion-dollar research project focused on bone fixation solutions.
  • Managed successful implant creation, delivery, and test methodology producing multiple US FDA-approved implants.

Publications (Subset of 11)

Far cortical locking can reduce stiffness of locked plating constructs while retaining construct strength

Citations: 301
The Journal of Bone and Joint Surgery (JBJS) • 2009

M Bottlang, J Doornink, DC Fitzpatrick, SM Madey

Far cortical locking can improve healing of fractures stabilized with locking plates

Citations: 299
The Journal of Bone and Joint Surgery (JBJS) • 2010

M Bottlang, M Lesser, J Koerber, J Doornink, S Mueller, DC Fitzpatrick...

Effects of construct stiffness on healing of fractures stabilized with locking plates

Citations: 246
The Journal of Bone and Joint Surgery (JBJS) • 2010

M Bottlang, J Doornink, TJ Lujan, DC Fitzpatrick, PV Marsh...

Far cortical locking enables flexible fixation with periarticular locking plates

Citations: 97
Journal of Orthopaedic Trauma • 2011

J Doornink, DC Fitzpatrick, SM Madey, M Bottlang

★ First Author

Effects of hybrid plating with locked and nonlocked screws on the strength of locked plating constructs in the osteoporotic diaphysis

Citations: 67
Journal of Trauma and Acute Care Surgery • 2010

J Doornink, DC Fitzpatrick, S Boldhaus, SM Madey, M Bottlang

★ First Author

Education

University of California, Davis

Master of Science
Class of 2006

California State University, Chico

Bachelor of Science, Mechanical Engineering
Class of 2003