Cudaq Guide Test Skill
CUDA-Q onboarding guide for installation, test programs, GPU simulation, QPU hardware, and quantum applications.
This tag contains 155 items.
CUDA-Q onboarding guide for installation, test programs, GPU simulation, QPU hardware, and quantum applications.
Modify, build, test, debug, and contribute to NVIDIA cuOpt (C++/CUDA, Python, server, CI).
Install cuOpt for Python, C, or as a server (pip, conda, Docker): system requirements, install commands, and verification.
LP, MILP, and QP (beta) with cuOpt: C API only.
LP, MILP, and QP (beta) with cuOpt: CLI only (MPS files, cuopt_cli).
Solve Linear Programming (LP), Mixed-Integer Linear Programming (MILP), and Quadratic Programming (QP, beta) with the Python API.
Vehicle routing (VRP, TSP, PDP) with cuOpt: Python API only.
Assists with Cuopt Server API Python tasks, completing setup and execution according to the original skill description.
cuOpt REST server: what it does and how requests flow.
Base rules for end users calling NVIDIA cuOpt (routing/LP/MILP/QP/install/server).
Numerical optimization (LP, MILP, QP): concepts, problem-text parsing, and formulation patterns.
Vehicle routing (VRP, TSP, PDP): problem types and data requirements.
After solving a non-trivial problem, detect generalizable learnings and propose skill updates so future interactions benefit automatically.
Use when writing DALI data loading or preprocessing code with `nvidia.dali.experimental.dynamic` (ndd), or when converting DALI pipeline-mode code to dynamic mode, or when the user...
NVIDIA DeepStream SDK 9.0 development with the Python pyservicemaker API.
Use this skill to bring any vision model from HuggingFace or NVIDIA NGC into an NVIDIA DeepStream pipeline with end-to-end automation: ONNX download, SafeTensors export, TRT engi...
Guide for adding support for new LLM or VLM models in Megatron-Bridge.
Dev environment setup for Megatron Bridge: container-based development, uv package management, lockfile regeneration, adding dependencies, Slurm container usage, and common build...
Bump a pinned dependency (TransformerEngine, Megatron-LM, NRX, etc.), regenerate the lockfile, open a PR, and drive it to green by attaching a watchdog to the "CICD NeMo" workflow...
CI/CD reference for Megatron Bridge: pipeline structure, commit and PR workflow, CI failure investigation, and common failure patterns.
Code style and quality rules for Megatron Bridge: ruff configuration, naming conventions, type hints, mypy rules, docstrings, copyright headers, logging, and the code review check...
Run Megatron-LM (MLM) and Megatron Bridge training with mock or real data.
Convert single-node scripts to multi-node Slurm sbatch jobs and debug common multi-node failures.
External NeMo-RL end-to-end validation workflow for Megatron-Bridge model/provider changes, including downstream compatibility checks, external RL lifecycle behavior, Megatron poli...
Structured framework for verifying numerical parity of HF<->MCore weight conversions.
Validate and use selective and full activation recompute in Megatron Bridge to reduce GPU memory usage at the cost of extra compute.
Validate and use CPU offloading in Megatron Bridge, including layer-level activation offloading and fractional optimizer state offloading with HybridDeviceOptimizer.
Validate and use CUDA graph capture in Megatron Bridge, including local full-iteration graphs and Transformer Engine scoped graphs for attention, MLP, and MoE modules.
Validate and use MoE expert-parallel communication overlap in Megatron-Bridge, including overlap_moe_expert_parallel_comm, delay_wgrad_compute, and flex dispatcher backends such as...
Operational guide for enabling hierarchical context parallelism in Megatron-Bridge, including config knobs, code anchors, pitfalls, and verification.
Operational guide for enabling Megatron FSDP in Megatron-Bridge, including config knobs, code anchors, pitfalls, and verification.
Techniques for reducing peak GPU memory in Megatron Bridge: expandable segments, parallelism resizing, activation recompute, CPU offloading constraints, and common OOM fixes.
Assists with PERF MOE COMM Overlap tasks, completing setup and execution according to the original skill description.
Choose the right MoE token dispatcher (`alltoall`, DeepEP, or HybridEP) for the hardware, EP degree, and optimization stage.
Representative MoE training playbooks by hardware platform and model family.
Long-context MoE training guidance for Megatron Bridge.
Systematic workflow for MoE training optimization in Megatron Bridge, based on the Megatron-Core MoE paper.
Practical guidance for training MoE VLMs in Megatron Bridge.
Operational guide for choosing and combining parallelism strategies in Megatron Bridge, including sizing rules, hardware topology mapping, and combined parallelism configuration.
Validate and use packed sequences and long-context training in Megatron-Bridge, distinguishing offline packed SFT for LLMs from in-batch packing for VLMs, and applying the right CP...
Operational guide for enabling TP, DP, and PP communication overlap in Megatron-Bridge, including config knobs, code anchors, pitfalls, and verification.
Recommend and customize Megatron Bridge recipes for a user's model, GPU count, and training goal.
Resiliency features in Megatron Bridge, including fault tolerance, straggler detection, in-process restart, preemption, and re-run state machine.
Testing reference for Megatron Bridge: unit and functional test layout, tier semantics (L0/L1/L2/flaky), script conventions, running tests locally, adding/moving/disabling tests,...
External verl end-to-end validation workflow for Megatron-Bridge model/provider changes.
Container-based dev environment setup and dependency management for Megatron-LM.
Assists with BUMP BASE Image Design tasks, completing setup and execution according to the original skill description.
CI/CD reference for Megatron-LM.
Investigate a failing GitHub Actions run or job and create a GitHub issue for the failure.
Linting and formatting for Megatron-LM.
Domain knowledge for the nightly main-to-dev sync workflow.
Onboard 1-node GitHub MR functional tests for GB200 from existing mr-scoped 2-node tests.
Research and draft a response to a GitHub issue or question from an external contributor.
Assists with RUN ON Slurm tasks, completing setup and execution according to the original skill description.
Assists with Split PR Design tasks, completing setup and execution according to the original skill description.
Test system for Megatron-LM.
Refresh golden values from a GitHub Actions workflow run (failing-only or all jobs), score the change with average normalized relative differences, and produce a PR-ready summary.
Query and browse evaluation results stored in MLflow.
Assists with Debug Cloud Deployment tasks, completing setup and execution according to the original skill description.
Serve a quantized or unquantized LLM checkpoint as an OpenAI-compatible API endpoint using vLLM, SGLang, or TRT-LLM.
Evaluates accuracy of quantized or unquantized LLMs using NeMo Evaluator Launcher (NEL).
Run, monitor, analyze, and debug LLM evaluations via nemo-evaluator-launcher.
Monitor submitted jobs (PTQ, evaluation, deployment) on SLURM clusters.
Assists with PTQ tasks, completing setup and execution according to the original skill description.
Cherry-pick merged PRs labeled for a release branch into that branch, then open a PR and apply the cherry-pick-done label.
Create custom LLM evaluation benchmarks using the BYOB decorator framework.
Query and browse evaluation results stored in MLflow.
Run, monitor, analyze, and debug LLM evaluations via nemo-evaluator-launcher.
Interactive config wizard for NeMo Evaluator Launcher (NEL).
Guide for adding a new benchmark or training environment to NeMo-Gym.
Use when debugging a Nemo Gym run or reward profiling job.
Maintain the NeMo Gym Fern docs site: add, update, move, or remove pages under fern/.
Use when creating, validating, or documenting Nemo Gym pivot datasets from rollout, trajectory, chat-completion, Responses API, or tool-call artifacts.
Use to help users get started with Nemo Gym reward profiling.
Autonomous NeMo-RL research agent workflow for directed hypothesis testing and open-ended discovery.
Brev instance operating guidance for NeMo-RL agents working in /home/ubuntu/RL with limited workspace disk, a larger /ephemeral volume, and optional /home/ubuntu/RL/.env secrets.
Build and dependency management for NeMo-RL.
CI/CD reference for NeMo-RL.
Configuration conventions for NeMo-RL.
Contribution conventions for NeMo-RL.
NVIDIA copyright header requirements for NeMo-RL.
Documentation conventions for NeMo-RL.
Error handling guidelines for NeMo-RL.
Playbook for launching, monitoring, stopping, and debugging NeMo-RL recipes on a Kubernetes cluster via the nrl-k8s CLI.
Code style guidelines for NeMo-RL (Python and shell).
Interactive code review for NVIDIA-NeMo/RL pull requests.
Manage durable working-session memory for coding agents.
Testing conventions for NeMo-RL.
Create GitHub pull requests that follow the NemoClaw PR template.
Scan recent git commits for changes that affect user-facing behavior, then draft or update the corresponding documentation pages and refresh generated user skills for release prep.
Scans other open issues to find ones a given PR may also fix or accidentally break.
Cut a new semver release: bump all version strings via bump-version.ts, open a release PR, and after merge tag main and push.
Runs the daytime maintainer loop for NemoClaw, prioritizing items labeled with the current version target.
Runs the end-of-day maintainer handoff for NemoClaw.
Finds open GitHub PRs with security and priority-high labels, links each to its issue, detects duplicates (multiple PRs fixing the same issue), and presents a table of review candi...
Runs the morning maintainer standup for NemoClaw.
Normalizes GitHub issue and PR titles by removing any bracketed [NemoClaw] tag case-insensitively, even when the tag appears later in the title.
Compares competing PRs that target the same issue and recommends which one to merge.
Performs a comprehensive security review of code changes in a GitHub PR or issue.
AI-assisted label triage for NVIDIA/NemoClaw issues and PRs.
Assists with Nemoclaw Skills Guide tasks, completing setup and execution according to the original skill description.
Describes the agent skills shipped with NemoClaw and how to access them by cloning the repository.
Assists with Nemoclaw USER Configure Inference tasks, completing setup and execution according to the original skill description.
Presents a risk framework for every configurable security control in NemoClaw.
Explains how to run NemoClaw on a remote GPU instance, including the deprecated Brev compatibility path and the preferred installer plus onboard flow.
Installs NemoClaw, launches a sandbox, and runs the first agent prompt.
Adds, removes, or modifies allowed endpoints in the sandbox policy.
Explains operational tasks after the quickstart: listing sandboxes, status and health checks, logs, diagnostics, port forwards, multiple sandboxes, credential reset, rebuilds, netw...
Inspects sandbox health, traces agent behavior, and diagnoses problems.
Explains how OpenClaw, OpenShell, and NemoClaw form the ecosystem, NemoClaw's position in the stack, what NemoClaw adds beyond the community sandbox, and when to prefer NemoClaw ve...
Describes the NemoClaw plugin and blueprint architecture and how they orchestrate the OpenClaw sandbox.
Deploy Nemotron Voice Agent on Workstation (x86), Jetson Thor, or Cloud NIMs.
NVIDIA RAG Blueprint: deploy, configure, troubleshoot, and manage.
Debug AutoDeploy accuracy regressions vs a reference score (PyTorch backend or published baseline).
Claude Code skill (trtllm-agent-toolkit): implement or extend TensorRT-LLM AutoDeploy fusion transforms under transform/library/ in a TensorRT-LLM checkout.
Check whether AutoDeploy YAML configs were actually applied by analyzing server logs and optionally graph dumps (AD_DUMP_GRAPHS_DIR).
Enable and interpret TensorRT-LLM AutoDeploy FX graph text dumps via AD_DUMP_GRAPHS_DIR.
Assists with AD Layer Visualizer Cloud Deployment tasks, completing setup and execution according to the original skill description.
Translates a HuggingFace model into a prefill-only AutoDeploy custom model using reference custom ops and validates it with hierarchical equivalence tests.
Assists with EXEC Local Compile Cloud Deployment tasks, completing setup and execution according to the original skill description.
Assists with EXEC Slurm Compile tasks, completing setup and execution according to the original skill description.
Write and implement GPU kernels using NVIDIA CuTe DSL (CUTLASS 4.x Python API); NOT for Triton, CUDA C++, or conceptual explanations.
Optimize existing Triton kernels for NVIDIA TileIR backend on Blackwell GPUs (sm_100+).
ONLY for OpenAI Triton (@triton.jit) kernel development.
Performance analysis coordination workflow.
Assists with PERF HOST Analysis tasks, completing setup and execution according to the original skill description.
Profiles and optimizes TensorRT-LLM host/CPU overhead using line_profiler (with nsys support planned).
Assists with PERF Nsight Compute Analysis tasks, completing setup and execution according to the original skill description.
Nsight Systems (nsys) CLI for system-level timeline profiling.
Assists with PERF Optimization tasks, completing setup and execution according to the original skill description.
Assists with PERF Torch CUDA Graphs tasks, completing setup and execution according to the original skill description.
Identify and eliminate host-device synchronizations in PyTorch code.
Code instrumentation for timing workloads.
Best practices for contributing code to TensorRT-LLM.
Systematic approach to exploring the TensorRT-LLM codebase before implementing new features or optimizations.
Assists with Trtllm Flashinfer Upgrade tasks, completing setup and execution according to the original skill description.
Review, design, and refactor TensorRT-LLM PyTorch MoE code for architecture fit, clean code, maintainability, and testability.
Generate a source-backed starting `trtllm-serve --config` YAML for basic aggregate single-node PyTorch serving, aligned with checked-in TensorRT-LLM configs and deployment docs.
Assists with Adding Cutile Kernel tasks, completing setup and execution according to the original skill description.
Assists with Converting Cutile TO Julia Design tasks, completing setup and execution according to the original skill description.
Assists with Converting Cutile TO Triton tasks, completing setup and execution according to the original skill description.
Use when adding, modifying, optimizing, or debugging CuTile autotuning code.
Assists with Cutile Python tasks, completing setup and execution according to the original skill description.
Iteratively optimize cuTile kernel performance through systematic profiling, bottleneck analysis, IR comparison, and targeted tuning.
Integrate TileGym kernels into Hugging Face `transformers` models by replacing the library's submodule(s) and certain class(es)' implementations, and patching certain class(es)' in...
Manage and monitor VSS alerts after the alerts profile is deployed.
Deploy, debug, or tear down any VSS profile using a compose-centric workflow: config (dry-run) with env overrides, review resolved compose, then compose up.
Produce video analysis reports by discovering the deployed VSS agent, querying POST /generate for a timestamped captioned summary of the clip, then formatting the agent reply as th...
Use this skill when working with the RTVI VLM or RT-VLM microservice API on VSS 3.1.
Query video analytics data and metrics from Elasticsearch via the VA-MCP server (port 9901).
Search video archives using natural language: find events, objects, actions, and people across recorded video using fusion search (Cosmos Embed1 semantic search + CV attribute sea...
Summarize a video by calling the VLM NIM or the Long Video Summarization (LVS) microservice directly.
Assists with Video Understanding Design tasks, completing setup and execution according to the original skill description.
Query VIOS REST APIs: sensor list, recording timelines, video clip extraction, snapshot capture, add/delete sensors and streams.
Generate video summary reports using the VSS video_search_frag extension with Long Video Summarization (LVS), Enterprise RAG knowledge retrieval, and human-in-the-loop parameter co...