100+ Free NVIDIA NCP-AII Practice Questions

Pass your NVIDIA-Certified Professional: AI Infrastructure exam on the first try — instant access, no signup required.

✓ No registration ✓ No credit card ✓ No hidden fees ✓ Start practicing immediately

Key Facts: NVIDIA NCP-AII Exam

  • Exam Questions: 70-75 (source: NVIDIA NCP-AII page)
  • Exam Duration: 120 min (source: NVIDIA NCP-AII page)
  • Passing Score: 70% (source: NVIDIA Training)
  • Exam Fee: $400 USD (source: NVIDIA Training)
  • Delivery: Online, remotely proctored
  • Credential Validity: 2 years (source: NVIDIA Certification policy)

NVIDIA-Certified Professional: AI Infrastructure (NCP-AII) is a 70-75 question, 120-minute, remotely proctored exam that costs $400 USD and requires a 70% score to pass. It tests system and server bring-up (31%), control-plane install and configuration (19%), cluster test and verification (33%), troubleshoot and optimize (12%), and physical-layer management (5%) on Hopper, Blackwell, NVLink, BlueField, Quantum-2, and Spectrum-X platforms.

Sample NVIDIA NCP-AII Practice Questions

Try these sample questions to test your NVIDIA NCP-AII exam readiness. Each question includes a detailed explanation. Start the interactive quiz above for the full 100+ question experience with AI tutoring.

1. Which NVIDIA GPU architecture introduced the FP8 Transformer Engine and is the GPU inside the DGX H100 platform?
A. Hopper
B. Ampere
C. Volta
D. Turing
Explanation: Hopper (H100/H200) introduced the fourth-generation Tensor Cores with the Transformer Engine and FP8 (E4M3 and E5M2) data types, and is the GPU used in DGX H100. Ampere (A100) preceded Hopper and supported FP16/BF16 plus TF32 but not FP8 with the Transformer Engine. Volta and Turing are older architectures used in V100/T4-class products.
2. An engineer is bringing up a DGX H100 and wants confirmation that all eight GPUs are linked through the on-chassis NVSwitch fabric at full bandwidth. Which command-line tool reports NVLink status and per-link bandwidth between the GPUs?
A. nvidia-smi nvlink --status
B. lspci -vv | grep CUDA
C. ipmitool sensor list
D. ethtool -S eth0
Explanation: nvidia-smi exposes NVLink topology and per-link state through the 'nvidia-smi nvlink --status' subcommand, which prints each link, its peer, and the negotiated bandwidth. lspci only shows PCIe enumeration, ipmitool reports BMC sensors, and ethtool is for Ethernet interfaces.
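
For hands-on practice, a minimal bring-up check along the lines of this question looks like the sketch below; flag spellings can vary slightly across driver branches, so confirm with nvidia-smi nvlink -h on your system.

    # Per-link state and speed for every GPU on the node
    nvidia-smi nvlink --status

    # Same report scoped to a single GPU (GPU 0 here)
    nvidia-smi nvlink --status -i 0

    # NVLink error counters; rising replay/recovery counts point at a marginal link
    nvidia-smi nvlink -e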
3. Which NVIDIA reference architecture uses 72 Blackwell B200 GPUs and 36 Grace CPUs interconnected by fifth-generation NVLink and NVLink Switch into a single coherent compute domain?
A. GB200 NVL72
B. DGX H100
C. HGX A100 8-GPU
D. EGX Edge
Explanation: GB200 NVL72 is the rack-scale Blackwell platform that uses fifth-generation NVLink and NVLink Switch (NVL72) to connect 72 B200 GPUs and 36 Grace CPUs as one shared NVLink domain. DGX H100 is an 8-GPU Hopper server. HGX A100 is the previous-generation Ampere baseboard. EGX is for edge inference, not large-scale training.
4. When a DGX H100 boots, which subsystem performs out-of-band power, thermal, and firmware management and exposes Redfish APIs to administrators?
A. BMC
B. BlueField DPU
C. GPU SMI driver
D. Slurm controller
Explanation: The DGX baseboard management controller (BMC) handles out-of-band hardware management, including chassis power, fans, temperatures, and firmware update workflows, and exposes Redfish for automation. BlueField is an in-band data-path DPU; the SMI driver is for in-band GPU telemetry; Slurm is workload orchestration software.
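
As a sketch of what "exposes Redfish APIs" means in practice, the standard DMTF service root and collections below can be queried with curl; the BMC address and credentials are placeholders, and the exact resource tree under them depends on the BMC firmware.

    # Redfish service root (standard DMTF path)
    curl -k -u admin:PASSWORD https://<bmc-ip>/redfish/v1/

    # Enumerate compute systems and chassis; power and thermal data hang off Chassis
    curl -k -u admin:PASSWORD https://<bmc-ip>/redfish/v1/Systems
    curl -k -u admin:PASSWORD https://<bmc-ip>/redfish/v1/Chassis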
5. Which firmware bundle is officially used to update DGX and HGX H100 baseboards, including GPU, NVSwitch, and HMC firmware components, in one coordinated package?
A. NVIDIA Firmware Update Container (FW container) / nvfwupd
B. BIOS-only flash via dmidecode
C. Linux kernel modules from kernel.org
D. NVIDIA CUDA toolkit installer
Explanation: NVIDIA ships a Firmware Update Container (and nvfwupd tool) that updates GPU, NVSwitch, and HMC components together so versions remain consistent across the HGX baseboard. dmidecode is an inventory tool, kernel.org provides Linux kernels, and the CUDA toolkit ships compilers and libraries, not firmware.
6. On a DGX H100, you need to verify that each GPU is healthy, sees expected memory size, and is in persistence mode before enrolling it in a cluster. Which single tool reports all of those states by default?
A. nvidia-smi
B. iostat
C. perf top
D. tcpdump
Explanation: nvidia-smi prints per-GPU model, memory size, ECC status, persistence mode, and utilization in a single output, which is the standard quick health check during bring-up. iostat measures block I/O, perf top profiles CPU, and tcpdump captures packets.
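
A compact version of that pre-enrollment check might look like the following; the query fields are standard nvidia-smi names, and dcgmi diag is the deeper DCGM follow-up that the cluster-test domain covers.

    # One CSV row per GPU: model, memory, current ECC mode, persistence mode
    nvidia-smi --query-gpu=index,name,memory.total,ecc.mode.current,persistence_mode --format=csv

    # Enable persistence mode on all GPUs (root required)
    sudo nvidia-smi -pm 1

    # Quick DCGM sanity diagnostic before cluster enrollment
    dcgmi diag -r 1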
7. An H100 SXM5 module on a DGX node uses HBM3 memory. What feature does HBM3 provide compared with the HBM2e used on A100?
A. Higher per-stack bandwidth and capacity
B. Lower bandwidth but lower power
C. Switching to GDDR6 packaging
D. Removal of ECC support
Explanation: HBM3 raises both bandwidth per stack and total capacity over HBM2e, which is why H100 advertises substantially higher GPU memory bandwidth than A100. It is not GDDR6, it is not lower bandwidth, and it retains ECC.
8. An HGX H100 server has eight H100 SXM5 GPUs and four NVSwitches on the baseboard. What is the role of NVSwitch in this design?
A. Provide non-blocking all-to-all NVLink bandwidth across the eight GPUs
B. Replace the PCIe root complex for storage devices
C. Act as a Layer-3 router for Ethernet traffic
D. Manage BMC sensor data
Explanation: NVSwitch creates a fully connected NVLink fabric so any GPU can reach any other GPU at full NVLink bandwidth, which is critical for collectives such as all-reduce. It is not a PCIe replacement, not an Ethernet router, and not part of BMC telemetry.
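
To see this fabric from the host, nvidia-smi can print the GPU-to-GPU connectivity matrix; on an NVSwitch baseboard every GPU pair should show an NV# entry (NV18 on H100-class parts, which carry 18 NVLink links) rather than a PCIe path.

    # Connectivity matrix; NV# entries mean NVLink/NVSwitch paths, PIX/PXB/SYS mean PCIe
    nvidia-smi topo -m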
9. During bring-up of a new H100 node you find the GPUs are present in lspci but not in nvidia-smi. What is the most likely first cause to investigate?
A. The NVIDIA GPU driver is not installed or is the wrong version for the kernel
B. The InfiniBand fabric is misconfigured
C. Slurm is not installed yet
D. The Ezoic ad library failed to load
Explanation: If lspci sees the cards but nvidia-smi cannot enumerate them, the most common cause is a missing, unsigned, or kernel-mismatched NVIDIA driver. InfiniBand and Slurm are unrelated to whether nvidia-smi can talk to the GPU. Ezoic is a web-ad product and is not part of cluster bring-up.
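
One reasonable first-pass triage for that symptom, using only standard tools:

    # 1. Confirm the kernel enumerated the devices (mirrors the lspci observation)
    lspci | grep -i nvidia

    # 2. Check that the nvidia kernel module is loaded and what version it reports
    lsmod | grep nvidia
    cat /proc/driver/nvidia/version

    # 3. NVRM messages in the kernel log usually name a driver/kernel mismatch outright
    dmesg | grep -i nvrm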
10. A team is repurposing eight DGX H100 nodes for a new cluster and wants identical, repeatable OS images that include the right kernel, drivers, and CUDA stack. Which NVIDIA-supported approach is recommended?
A. Use the DGX OS image (or Base Command Manager image) as the golden image
B. Compile a custom Linux kernel from kernel.org per node
C. Install Windows 11 and rely on default WSL drivers
D. Run only the BMC stock firmware without any host OS
Explanation: NVIDIA maintains DGX OS, an Ubuntu-based image qualified with the right kernel, drivers, CUDA, and management tooling for DGX systems; Base Command Manager can deploy this image at scale. Hand-built kernels lose vendor support, Windows 11 is not the qualified DGX host, and the BMC alone does not run user workloads.
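
As a rough illustration of the golden-image idea, assuming Base Command Manager's Bright-derived cmsh shell on the head node, software images are listed and assigned centrally; treat the exact command shape as an assumption to verify against the BCM manual.

    # List the software images BCM can provision to nodes (cmsh mode, then command)
    cmsh -c "softwareimage; list"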

About the NVIDIA NCP-AII Exam

The NVIDIA NCP-AII exam validates the ability to deploy, configure, test, and optimize NVIDIA-based AI infrastructure end to end: GPU server bring-up (DGX H100, HGX, GB200 NVL72), control-plane installation (Base Command Manager, GPU Operator, Slurm with Pyxis/Enroot), AI networking (Quantum-2 InfiniBand, Spectrum-X Ethernet, BlueField-3), storage (GPUDirect Storage, parallel filesystems), and performance characterization with NCCL tests, HPL, DCGM, and MLPerf.

  • Questions: 75 scored questions
  • Time Limit: 120 minutes
  • Passing Score: 70%
  • Exam Fee: $400 (NVIDIA)

NVIDIA NCP-AII Exam Content Outline

System and Server Bring-up (31%)
Hopper/Blackwell GPU architecture, DGX H100/B200/GB200 NVL72 reference designs, NVLink/NVSwitch fabric, HBM3/HBM3e memory, MIG, firmware updates, BMC/Redfish, and DGX OS image deployment.

Control Plane Installation and Configuration (19%)
Base Command Manager, NVIDIA GPU Operator and Network Operator, NVIDIA Container Toolkit, NGC registry, Slurm with Pyxis/Enroot, Kubernetes device plugin and DRA, Run:ai, NVIDIA AI Enterprise.

Cluster Test and Verification (33%)
NCCL tests (all_reduce_perf, all_gather_perf), HPL Linpack, perftest (ib_write_bw), ClusterKit, DCGM diagnostics, MLPerf Training, storage benchmarks, and end-to-end acceptance criteria.

Troubleshoot and Optimize (12%)
XID error triage, ECC trends, thermal throttling, Nsight Systems profiling, NCCL tuning, GPUDirect RDMA validation, MFU and tokens-per-second analysis, and incident response.

Physical Layer Management (5%)
BlueField DPU mode, ConnectX-7/-8 link state (ibstat, ibdiagnet), Quantum-2 InfiniBand BER, cable and transceiver inventory, liquid cooling commissioning, and power/cooling validation. (A spot-check sketch using these fabric tools follows this outline.)
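
The fabric tools named in the last two domains all run from a standard Linux host with the InfiniBand userspace installed; a minimal link-and-bandwidth spot check, with hostnames as placeholders, looks like this sketch.

    # Port state and rate for each ConnectX HCA; expect State: Active, Physical state: LinkUp
    ibstat

    # Fabric-wide diagnostic sweep (link errors, BER-related counters, topology)
    ibdiagnet

    # perftest point-to-point RDMA write bandwidth: start the server, then point the client at it
    ib_write_bw                 # on node A (server)
    ib_write_bw nodeA           # on node B (client)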

How to Pass the NVIDIA NCP-AII Exam

What You Need to Know

  • Passing score: 70%
  • Exam length: 75 questions
  • Time limit: 120 minutes
  • Exam fee: $400

Keys to Passing

  • Complete 500+ practice questions
  • Score 80%+ consistently before scheduling
  • Focus on highest-weighted sections
  • Use our AI tutor for tough concepts

NVIDIA NCP-AII Study Tips from Top Performers

1. Practice in NVIDIA LaunchPad or a small DGX/HGX testbed; many questions test what you would actually run during bring-up (nvidia-smi, ibstat, dcgmi, NCCL tests).
2. Memorize the five domain weights (31/19/33/12/5) and let them drive your study time per topic.
3. Be fluent with the difference between Quantum-2 InfiniBand (with SHARP) and Spectrum-X Ethernet (with adaptive routing/congestion control) for AI east-west traffic.
4. Drill MIG profiles on H100 (1g.10gb, 2g.20gb, 3g.40gb up to 7g.80gb) and how MIG interacts with the Kubernetes device plugin / DRA (see the sketch after this list).
5. Run nccl-tests at scale and learn what busbw (bus bandwidth) values look healthy versus degraded; understand NCCL_IB_HCA and NCCL_SOCKET_IFNAME (also shown below).
6. Know the NVIDIA AI Enterprise stack and where GPU Operator, Network Operator, and the Container Toolkit fit on Kubernetes.
7. Practice reading XID error codes (e.g., 13, 79) and turning them into a triage plan.
8. Understand liquid-cooling commissioning checks (coolant flow, leak sensors, inlet temperature) as part of GB200 NVL72 deployments.
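
For tips 4 and 5, a minimal sketch of the commands involved; message sizes and the interface names mlx5_0 and eth0 are placeholders for your own fabric.

    # Tip 4: list supported MIG GPU-instance profiles, then enable MIG mode on GPU 0
    nvidia-smi mig -lgip
    sudo nvidia-smi -i 0 -mig 1

    # Tip 5: single-node all-reduce sweep, 8 bytes to 8 GB doubling each step, 8 GPUs
    NCCL_DEBUG=INFO ./build/all_reduce_perf -b 8 -e 8G -f 2 -g 8

    # Multi-node jobs usually pin the HCAs and the bootstrap interface explicitly
    export NCCL_IB_HCA=mlx5_0
    export NCCL_SOCKET_IFNAME=eth0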

Frequently Asked Questions

What is on the NVIDIA NCP-AII exam?

NCP-AII tests the ability to deploy and operate AI infrastructure on NVIDIA platforms. Core topics include DGX H100, HGX, and GB200 NVL72 bring-up; NVLink and NVSwitch validation; MIG configuration; BlueField-3 DPU and Quantum-2 InfiniBand fabric; Spectrum-X Ethernet; control-plane installation with Base Command Manager, GPU Operator, Slurm with Pyxis/Enroot, and Kubernetes; cluster validation with NCCL tests, HPL, perftest, and DCGM; performance optimization, MFU, and troubleshooting XID errors and ECC trends.

How long is the exam and how many questions does it have?

NCP-AII is 70 to 75 questions delivered in 120 minutes. The format uses multiple-choice and multi-select items, and the exam is delivered online with remote proctoring.

What is the passing score for NCP-AII?

NVIDIA publishes a 70% passing score for NCP-AII. Your raw score is computed across all five domains, weighted by their published percentages, so a balanced study plan that mirrors the 31/19/33/12/5 domain weights is the safest path to passing.

How much does the NCP-AII exam cost?

The current NVIDIA NCP-AII exam fee is $400 USD per attempt, which includes the remote proctored assessment and the official digital credential upon passing. Pricing is set by NVIDIA Training and may vary by region or partner program.

Who should take NCP-AII?

NCP-AII is built for AI infrastructure engineers, data center engineers, and SREs who deploy and run NVIDIA-based AI clusters in production. NVIDIA recommends two to three years of operational experience with NVIDIA hardware in a data center environment, including GPU drivers, cluster management, and AI fabric work.

How is NCP-AII different from NCA-AIIO and NCP-AIO?

NCA-AIIO (AI Infrastructure and Operations Associate) is the entry-level credential covering AI infrastructure concepts. NCP-AIO focuses on day-2 operations of AI infrastructure. NCP-AII focuses on the deploy and configure side: bringing up servers, installing the control plane, validating with NCCL tests and HPL, and optimizing performance on NVIDIA platforms.