Model Optimization Engineer (Junior/Senior)

  • Hanoi / Ho Chi Minh City
  • Full-time

Overview

You’ll lead the end-to-end optimization of large perception/action models (VLA/VLM, control policies, planners) for real-time robotics. Your work spans algorithmic compression (quantization, pruning, distillation, low-rank adaptation) and system-level acceleration (kernels, compilers, runtimes) to deliver lower latency, a smaller memory/energy footprint, and deployment-grade stability on edge hardware. You’ll partner closely with the RL, Controls, and Platform teams to (1) define accuracy/latency/energy budgets, (2) run disciplined ablations and calibration, and (3) work with Embedded and Edge Optimization Engineers to ship robust inference on edge devices (e.g., Jetson Orin).

Key Responsibilities

  • Apply quantization (PTQ/QAT with INT8/FP8/INT4), pruning (unstructured/structured), low-rank adaptation (LoRA/DoRA), and distillation (task- and policy-level).

  • Design loss functions and fine-tuning schedules to meet accuracy-drop budgets (task success rate, safety margins).

  • Define performance metrics, test learned policies, and evaluate their effectiveness in real-world scenarios.

Required Qualifications

  • Core skills:

    • Hands-on with model compression: PTQ/QAT (INT8/FP8/INT4), structured/unstructured pruning, sparsity, distillation (task/policy-level), LoRA/DoRA.

    • Inference stacks & compilers: TensorRT, ONNX Runtime, TVM, Torch-Dynamo/Inductor, Triton, CUDA; profiling & kernel-level bottleneck hunting.

    • Calibration & evaluation rigor: representative set selection, loss/metric design, drift checks, ablation matrices (method × precision × sparsity).

  • Deployment: Comfort with mixed precision, memory tiling, operator fusion, and runtime graph capture.

  • Software: Strong Python/C++ engineering, clean interfaces, benchmarking harnesses, CI-friendly experiment scripts; clear reporting (plots/tables/videos).

Preferred Qualifications

  • Quantization-aware fine-tuning for sequential/causal models (transformers, diffusion-style policies), token sparsity, KV-cache optimization.

  • Experience with sensor pipelines (RGB-D, event/depth/IMU) and perception-to-action latency budgeting.

  • Hardware-specific tuning (SM scheduling, occupancy, tensor cores, DMA overlap), or DSP/NPU toolchains.

  • Publications or open-source contributions in model optimization & compilers.
