Model Optimization Engineer (Junior/Senior)

  • Hanoi / Ho Chi Minh City
  • Full-time

Overview

You’ll lead the end-to-end optimization of large perception/action models (VLA/VLM, control policies, planners) for real-time robotics. Your work spans algorithmic compression (quantization, pruning, distillation, low-rank adaptation) and system-level acceleration (kernels, compilers, runtimes) to deliver lower latency, a smaller memory/energy footprint, and deployment-grade stability on edge hardware. You’ll partner closely with the RL, Controls, and Platform teams to (1) define accuracy/latency/energy budgets, (2) run disciplined ablations and calibration, and (3) work with Embedded and Edge Optimization Engineers to ship robust inference on edge devices (e.g., Jetson Orin).

Key Responsibilities

  • Apply quantization (PTQ/QAT with INT8/FP8/INT4), pruning (unstructured/structured), low-rank adaptation (LoRA/DoRA), and distillation (task- and policy-level).

  • Design loss functions and fine-tuning schedules to meet accuracy-drop budgets (task success rate, safety margins).

  • Define performance metrics, test learned policies, and evaluate their effectiveness in real-world scenarios.

Required Qualifications

  • Core skills:

    • Hands-on with model compression: PTQ/QAT (INT8/FP8/INT4), structured/unstructured pruning, sparsity, distillation (task/policy-level), LoRA/DoRA.

    • Inference stacks & compilers: TensorRT, ONNX Runtime, TVM, Torch-Dynamo/Inductor, Triton, CUDA; profiling & kernel-level bottleneck hunting.

    • Calibration & evaluation rigor: representative set selection, loss/metric design, drift checks, ablation matrices (method × precision × sparsity).

  • Deployment: Comfort with mixed precision, memory tiling, operator fusion, and runtime graph capture.

  • Software: Strong Python/C++ engineering, clean interfaces, benchmarking harnesses, CI-friendly experiment scripts; clear reporting (plots/tables/videos).

Preferred Qualifications

  • Quantization-aware fine-tuning for sequential/causal models (transformers, diffusion-style policies), token sparsity, KV-cache optimization.

  • Experience with sensor pipelines (RGB-D, event/depth/IMU) and perception-to-action latency budgeting.

  • Hardware-specific tuning (SM scheduling, occupancy, tensor cores, DMA overlap), or DSP/NPU toolchains.

  • Publications or open-source contributions in model optimization & compilers.
