Edge Optimization Engineer (Junior/Senior)

  • Hanoi / Ho Chi Minh City
  • Full-time

Overview

You’ll own end-to-end performance on the robot edge: building deep-dive profilers, pinpointing kernel/op/memory bottlenecks across the full data path (model → dataloader/IPC → I/O → control loops), and shipping deterministic builds that stay rock-solid under real duty cycles (thermal, brownouts, EMI). Your scope spans GPU/NPU edge targets (e.g., Nvidia Jetson Orin, TI EVMs) with clear latency budgets per subsystem (perception/VLA, policy/RL, control).

Key Responsibilities

  • Own end-to-end profiling. Run Nsight/PyTorch Profiler/tegrastats traces across the full perception → policy → control path; attribute stalls to specific ops/kernels, memory moves, cache misses, or IPC (see the profiler sketch after this list).

  • Set and enforce budgets. Define per-module latency/throughput/energy targets (e.g., camera→VLA ≤X ms, policy step ≤Y ms, control loop 1 kHz) and guard them with automated checks (see the budget-gate sketch after this list).

  • Kill bottlenecks fast. Apply op fusion, graph capture, async pipelines, memory tiling, pinned buffers, and stream concurrency; rewrite hot paths in CUDA/Triton when needed (see the graph-capture sketch after this list).

  • Tune per device. Generate deterministic builds for edge devices; measure kernel occupancy, tensor-core usage, DMA overlap; lock clocks when appropriate.

  • Package optimized artifacts. Emit TensorRT engines/ONNX bundles, quantized weights, calibration sets, and config manifests; version everything for rollback (see the packaging sketch after this list).

  • On-call for performance. Triage field logs, root-cause perf regressions, and ship hotfix engines without violating safety or RT guarantees.

  • Continuously raise the bar. Track new compiler/runtime features (TensorRT/TVM/Inductor), add them behind flags, and land wins once validated against budgets.
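
The profiler sketch referenced above is a minimal, illustrative example of the attribution workflow: capture a torch.profiler trace around one inference step and rank ops by GPU time. The toy model, shapes, and trace filename are placeholders, not project code.

    import torch
    from torch.profiler import profile, ProfilerActivity

    # Toy stand-in for a perception/policy module; shapes are arbitrary.
    model = torch.nn.Sequential(
        torch.nn.Conv2d(3, 64, 3, padding=1),
        torch.nn.ReLU(),
    ).cuda().eval()
    x = torch.randn(1, 3, 224, 224, device="cuda")

    with torch.no_grad():
        for _ in range(5):  # warm-up keeps one-time CUDA init out of the trace
            model(x)
        with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
            model(x)
            torch.cuda.synchronize()

    # Rank ops by GPU time to see which kernels dominate the step.
    print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
    # Export a Chrome trace to inspect alongside an Nsight Systems timeline.
    prof.export_chrome_trace("policy_step_trace.json")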
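
The budget-gate sketch referenced above shows one way an automated per-module latency check could fail a build when a p99 measurement exceeds its budget. The budget numbers and the p99_latency_ms/check helpers are illustrative assumptions, not the team's real harness.

    import statistics
    import sys
    import time

    # Example budgets in milliseconds; real values come from the subsystem owners.
    BUDGETS_MS = {"camera_to_vla": 40.0, "policy_step": 10.0}

    def p99_latency_ms(fn, iters=200):
        samples = []
        for _ in range(iters):
            t0 = time.perf_counter()
            fn()
            samples.append((time.perf_counter() - t0) * 1e3)
        # Gate on a high percentile rather than the mean so rare stalls still fail.
        return statistics.quantiles(samples, n=100)[98]

    def check(name, fn):
        p99 = p99_latency_ms(fn)
        ok = p99 <= BUDGETS_MS[name]
        print(f"{name}: p99={p99:.2f} ms, budget={BUDGETS_MS[name]:.1f} ms {'OK' if ok else 'FAIL'}")
        return ok

    if __name__ == "__main__":
        # Replace the lambda with the real subsystem entry point under test.
        passed = check("policy_step", lambda: time.sleep(0.004))
        sys.exit(0 if passed else 1)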
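
The graph-capture sketch referenced above demonstrates one of the named techniques, CUDA graph capture in PyTorch for a fixed-shape hot path, using the standard warm-up/capture/replay pattern. The two-layer model is a stand-in for a real policy network.

    import torch

    # Toy stand-in for a fixed-shape policy network on the hot path.
    model = torch.nn.Sequential(torch.nn.Linear(256, 256), torch.nn.ReLU()).cuda().eval()
    static_in = torch.zeros(1, 256, device="cuda")

    # Warm up on a side stream so capture does not record lazy initialization.
    s = torch.cuda.Stream()
    s.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(s), torch.no_grad():
        for _ in range(3):
            static_out = model(static_in)
    torch.cuda.current_stream().wait_stream(s)

    # Capture one step into a CUDA graph; replaying it removes per-kernel
    # launch overhead from the steady-state loop.
    g = torch.cuda.CUDAGraph()
    with torch.no_grad(), torch.cuda.graph(g):
        static_out = model(static_in)

    # Steady state: copy fresh inputs into the captured buffer, then replay.
    static_in.copy_(torch.randn(1, 256, device="cuda"))
    g.replay()
    print(static_out.shape)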
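
The packaging sketch referenced above illustrates one possible artifact flow: export ONNX, build a TensorRT engine with trtexec, and write a hash manifest so a bad engine can be rolled back to a known-good version. The file names, manifest layout, and FP16 choice are assumptions; only the torch.onnx.export call and the trtexec flags shown are standard tooling.

    import hashlib
    import json
    import subprocess

    import torch

    # Toy stand-in for an exported policy head.
    model = torch.nn.Linear(256, 7).eval()
    dummy = torch.zeros(1, 256)
    torch.onnx.export(model, dummy, "policy.onnx", input_names=["obs"], output_names=["action"])

    # trtexec ships with TensorRT on Jetson images; build an FP16 engine from the ONNX file.
    subprocess.run(
        ["trtexec", "--onnx=policy.onnx", "--saveEngine=policy.plan", "--fp16"],
        check=True,
    )

    def sha256(path):
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()

    # Record what was shipped so field rollbacks can target a known-good hash.
    manifest = {
        "model": "policy",
        "onnx": {"file": "policy.onnx", "sha256": sha256("policy.onnx")},
        "engine": {"file": "policy.plan", "sha256": sha256("policy.plan"), "precision": "fp16"},
    }
    with open("manifest.json", "w") as f:
        json.dump(manifest, f, indent=2)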

Required Qualifications

  • Profiling & diagnosis. Expert with Nsight, PyTorch Profiler, tegrastats; you attribute latency/throughput regressions to exact kernels/ops and memory traffic, then present a fix plan with a clear ROI.

  • Real-time mindset. You set/meet per-module latency budgets and protect hard real-time control paths during optimization and deployment.

  • Deployment engineering. You ship containerized, deterministic builds that emit TRT engines and quantized weights, with repeatable, one-command device bring-up.

  • Hardware breadth. Hands-on with Nvidia Jetson Orin, TI EVMs, etc.; comfortable validating under thermal/brownout/EMI stress so behavior stays stable.

Preferred Qualifications

  • Compiler/runtime chops (TensorRT, ONNX Runtime, TVM, Triton, ExecuTorch) and op-fusion/graph-capture to reduce stalls and memory traffic (aligns with packaging TRT engines and deterministic pipelines).

  • VLA/RL awareness to set subsystem budgets and validate perception-to-action latency end-to-end.
