AI Hardware Interconnect

Posted by SZFRS Engineering Team

AI hardware cable demand has shifted dramatically in the past two years. Pre-2023 data center work was mostly 100G with some 400G in newer halls; the AI training boom compressed the migration to 400G and 800G into roughly 24 months. The scale change is real: a single AI training cluster running NVIDIA H100 or H200 GPUs at scale needs thousands of high-speed interconnects, and the GPU-to-GPU bandwidth requirements drive cable selection in ways traditional Ethernet networks never did. At the same time, edge AI inference devices have grown into a separate market segment with completely different cable requirements — small embedded NVIDIA Jetson modules, Hailo accelerators, Google Coral, and similar chips need compact embedded cabling that looks more like industrial vision than data center networking.

TL;DR — Quick Answer

AI training clusters use a mix of PCIe Gen5/6 inside GPU servers, NVLink for direct GPU-to-GPU links within a node, and InfiniBand NDR (400 Gbps), increasingly alongside 800 Gbps, for cross-rack networking. Cable form factor is OSFP or QSFP-DD800, with DAC for short runs (1-3 m) and AOC for longer runs (5-30 m). Edge AI inference devices use compact embedded cabling — MIPI CSI-2 for cameras, USB 3.0 / USB-C for sensor and accelerator connections, and M.2 connector cables for storage and accelerator boards. AI vision systems blur the line between AI hardware and traditional industrial vision, using GigE Vision or USB3 Vision cable for camera connections plus internal AI accelerator routing. The sections below cover each segment in practical detail.

AI Training Cluster Cable — The High End

AI training cluster networking is the most demanding cable application currently in volume production. NVIDIA H100 and H200 GPUs are the dominant training accelerators, with AMD MI300X and Intel Gaudi 2/3 picking up secondary share. The GPUs talk to each other through several different paths:

  • PCIe Gen5 (32 GT/s) and Gen6 (64 GT/s). The standard interconnect for GPU-to-CPU and GPU-to-NVMe storage within a server. PCIe Gen5 delivers roughly 128 GB/s bidirectional per x16 slot; Gen6 doubles this (see the sanity check after this list). Internal cables are typically short (under 30 cm) and use specialized PCIe extension cable where the GPU sits in an add-in card position away from the motherboard slot.
  • NVLink and NVSwitch. Direct GPU-to-GPU within a server node. NVIDIA’s proprietary high-speed link runs 900 GB/s aggregate on H100 (NVLink 4) and 1.8 TB/s on B100/B200 (NVLink 5). NVLink Bridge connections are external cables on some PCIe-form-factor configurations; SXM module-based servers route NVLink through on-board NVSwitch with no external cable.
  • InfiniBand HDR (200 Gbps) and NDR (400 Gbps). The dominant inter-node networking for AI training clusters. NVIDIA Networking (formerly Mellanox) supplies the switches and NICs; cable connections use OSFP or QSFP-DD form factors. NDR is the current frontier; XDR (800 Gbps) is starting to ship.
  • Ethernet for management and storage. Standard 25G, 100G, or 400G Ethernet for cluster management, BMC traffic, and storage area networking — different from the high-speed compute fabric.
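
The per-slot PCIe figures above are the commonly quoted bidirectional numbers. A quick sanity check of the Gen5 figure, as a minimal sketch assuming 128b/130b line coding (Gen6's FLIT/PAM4 scheme has broadly similar efficiency):

```python
# Back-of-envelope check of the PCIe x16 figures quoted above.
# Gen5 signals at 32 GT/s per lane; Gen6 doubles the raw rate to 64 GT/s.

def pcie_x16_gb_s(gt_per_s: float, efficiency: float = 128 / 130) -> float:
    """Usable one-direction bandwidth of a x16 link, in GB/s."""
    lanes = 16
    return gt_per_s * lanes * efficiency / 8  # bits per second -> bytes

gen5 = pcie_x16_gb_s(32)  # ~63 GB/s per direction
gen6 = pcie_x16_gb_s(64)  # ~126 GB/s per direction
print(f"Gen5 x16: {2 * gen5:.0f} GB/s bidirectional")  # ~126, quoted as 128
print(f"Gen6 x16: {2 * gen6:.0f} GB/s bidirectional")
```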

For our work, the practical choice is between DAC (Direct Attached Copper) and AOC (Active Optical Cable). DAC is cheapest at short distances — 1 to 3 meters within a single rack — but the cable is heavier, stiffer, and bulkier. AOC handles longer runs (5 to 30 meters) with thinner, more flexible cable. Beyond 30 meters, the link moves to separate pluggable transceivers with structured fiber rather than an integrated cable assembly. Our telecom solutions page covers data center interconnect in more detail.

DAC vs AOC — The Practical Choice

For a typical AI training rack with GPU server uplinks at 400G or 800G, DAC handles the within-rack runs and AOC handles the rack-to-spine runs. The split typically falls around the 2-3 meter mark (a minimal decision sketch follows this list):

  • DAC at 400G. Up to 3 meter passive runs are typical, possibly 5 meters with active DAC. Above 5 meters, copper cannot hold signal integrity at 400G data rates.
  • DAC at 800G. 1-2 meter passive runs typical. 800G stresses copper signaling more than 400G; longer DAC requires active equalization.
  • AOC at 400G or 800G. 3-30 meter runs handle most rack-to-rack scenarios. The optical engines come from specialty providers — Innolight, Hisense Broadband, Eoptolink, Source Photonics, and a few others. Cable assembly happens around the optical engine.
  • Pluggable optical (separate transceiver plus fiber). Beyond 30 meters or where structured cabling is preferred. The transceivers (OSFP or QSFP-DD800) plug into the switch port; the fiber runs to a patch panel or another transceiver.
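
Here is that decision sketch. The thresholds are this post's rule-of-thumb values, not a standard, and the function is ours for illustration:

```python
# Rule-of-thumb media selection for 400G/800G links, per the list above.

def pick_media(run_m: float, rate_gbps: int) -> str:
    """Choose DAC, AOC, or pluggable optics from run length and data rate."""
    passive_dac_limit_m = 3 if rate_gbps <= 400 else 2  # 800G stresses copper more
    if run_m <= passive_dac_limit_m:
        return "passive DAC"
    if rate_gbps <= 400 and run_m <= 5:
        return "active DAC"
    if run_m <= 30:
        return "AOC"
    return "pluggable transceiver + structured fiber"

for run_m, rate in [(2, 800), (4, 400), (15, 800), (60, 400)]:
    print(f"{run_m} m at {rate}G -> {pick_media(run_m, rate)}")
```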

The cost split is significant. DAC at 800G runs roughly $300-600 per cable depending on length and vendor; AOC at the same rate runs $1,500-3,000 per cable. Pluggable optics plus fiber cost more still. AI training cluster operators therefore maximize DAC use within each rack and reserve AOC for the longer rack-to-spine runs.
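
To see why operators push that split hard, a rough per-rack budget using mid-band prices from the ranges above; the cable counts are illustrative assumptions, not a reference design:

```python
# Illustrative per-rack cable spend at 800G, mid-band unit prices (USD).

dac_count, dac_price = 24, 450    # within-rack runs, ~$300-600 band
aoc_count, aoc_price = 8, 2250    # rack-to-spine runs, ~$1,500-3,000 band

total = dac_count * dac_price + aoc_count * aoc_price
print(f"DAC: ${dac_count * dac_price:,}, AOC: ${aoc_count * aoc_price:,}")
print(f"Total: ${total:,}")  # eight AOCs outcost two dozen DACs
```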

Edge AI Inference — A Different Market

Edge AI inference devices share the AI label with training clusters but have completely different cable requirements. NVIDIA Jetson modules (Jetson Nano, Jetson Xavier NX, Jetson Orin Nano/NX/AGX) range from $99 dev kits through $2,000 high-end inference modules. Other edge AI accelerators (Hailo-8, Google Coral USB Accelerator, Intel Movidius VPU) target lower-power applications with different connector approaches.

Where edge AI cable work appears:

  • Camera input via MIPI CSI-2. Edge AI vision applications attach cameras directly to the inference module via CSI-2 lanes. MIPI cable for these connections runs short distances (5-30 cm) on FPC or thin micro-coax (a lane-budget sketch follows this list).
  • USB 3.0 / USB-C accelerator and storage connections. Coral USB Accelerator, Hailo USB stick, USB cameras, USB SSDs all use standard USB cable. Distance-limited but easy integration.
  • M.2 form factor accelerator cables. Edge AI cards in M.2 form factor (Hailo-8 M.2, Coral M.2) plug directly into a host board socket. Cable assemblies appear when M.2 cards mount in carriers or extension chassis.
  • GigE Vision for industrial AI inspection. AI-enabled inspection cameras use GigE Vision the same as traditional vision cameras. The AI happens at the host PC or edge gateway, not in the camera. Cable construction matches industrial vision, covered in our robot vision cable selection blog.
  • Embedded power and signal harness. Edge AI devices in industrial enclosures need power input, signal I/O, status LED indicators, and sometimes serial debug ports. Standard JST or M12 connectors, depending on the application class.
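
For the CSI-2 attach in the first bullet, a lane-budget sketch: the lane rate assumes the ~4.5 Gbps/lane D-PHY figure used later in this post, and the sensor parameters are illustrative:

```python
import math

def csi2_lanes_needed(mpix: float, bits_per_px: int, fps: int,
                      lane_gbps: float = 4.5, overhead: float = 1.2) -> int:
    """Minimum D-PHY lanes for a raw sensor stream, assuming ~20% overhead."""
    stream_gbps = mpix * 1e6 * bits_per_px * fps * overhead / 1e9
    return math.ceil(stream_gbps / lane_gbps)

# 8 MP RAW10 at 30 fps: ~2.9 Gbps with overhead -> fits one lane
print(csi2_lanes_needed(mpix=8, bits_per_px=10, fps=30))    # -> 1
# 8.3 MP (4K) RAW12 at 60 fps: ~7.2 Gbps -> needs two lanes
print(csi2_lanes_needed(mpix=8.3, bits_per_px=12, fps=60))  # -> 2
```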

AI Vision Systems — The Bridge

Many AI applications sit between training and edge — AI-enabled industrial vision systems. A factory inspection program runs cameras with AI processing in a nearby edge gateway or PC. The camera connections are GigE Vision or USB3 Vision, identical to traditional non-AI vision systems. The AI portion lives in software running on a Jetson AGX, an industrial PC with a discrete GPU, or a dedicated AI inference appliance. Cable selection follows traditional industrial vision principles, with standard M12 X-coded for GigE Vision, USB-C or USB Micro-B for USB3 Vision.

Where AI vision cable design differs from traditional vision: AI applications often need higher-resolution cameras (12-25 megapixel becoming common), which pushes from GigE to 10GigE Vision or even CoaXPress. The compute-heavy nature of AI inference benefits from low-latency networking; some installations use 25GigE between cameras and the inference appliance to minimize buffer delays.
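A quick check of that resolution pressure, as a sketch with an assumed 12 MP, 8-bit mono camera at 30 fps, uncompressed:

```python
# Uncompressed camera throughput vs. vision interface capacity.

def camera_gbps(mpix: float, bits_per_px: int, fps: int) -> float:
    return mpix * 1e6 * bits_per_px * fps / 1e9

needed = camera_gbps(12, 8, 30)  # ~2.9 Gbps
for iface, line_rate in [("GigE Vision", 1.0), ("USB3 Vision", 5.0),
                         ("10GigE Vision", 10.0)]:
    usable = 0.9 * line_rate  # assume ~90% of line rate is usable payload
    verdict = "fits" if needed <= usable else "does not fit"
    print(f"{iface}: {verdict} ({needed:.1f} of {usable:.1f} Gbps)")
```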

Side-by-Side Comparison Table

| Application | Cable Type | Speed | Distance | Cost Range |
| --- | --- | --- | --- | --- |
| AI training rack-internal GPU networking | OSFP DAC at 400G | 400 Gbps | 1-3 m | $200-400 |
| AI training rack-internal high-speed | OSFP DAC at 800G | 800 Gbps | 1-2 m | $300-600 |
| AI training rack-to-spine | OSFP AOC at 400G | 400 Gbps | 3-30 m | $1,000-2,000 |
| AI training rack-to-spine high-speed | OSFP AOC at 800G | 800 Gbps | 3-30 m | $1,500-3,000 |
| GPU-to-CPU within server | PCIe Gen5/Gen6 cable | 128 GB/s (Gen5) | under 30 cm | Specialty pricing |
| GPU-to-GPU within server | NVLink Bridge or SXM motherboard | 900 GB/s+ | centimeters | Bundled with server |
| InfiniBand NDR | OSFP DAC or AOC | 400 Gbps | 1-30 m | $300-2,500 |
| Edge AI camera (Jetson) | MIPI CSI-2 FPC | 4.5 Gbps/lane | 5-30 cm | $5-25 |
| Coral USB Accelerator | USB 3.0 cable | 5 Gbps | under 1 m | $5-15 |
| Hailo M.2 to host | M.2 form factor | PCIe Gen3 x4 | direct mount | Bundled |
| Industrial AI vision camera | GigE / 10GigE / USB3 Vision | 1-10 Gbps | 5-100 m | $50-500 |
| AI server power supply | C19/C20 high-current AC | n/a (AC power) | 1-3 m | $30-100 |

Power and Thermal — The Hidden Challenge

AI training racks pull tremendous power. A single H100 SXM module draws up to 700W; a single B200 module up to 1,000W. An eight-GPU training server pulls 5-10 kW continuous. Rack-level power can exceed 50 kW for high-density AI training racks. This drives heavy AC power cable to each rack and high-current DC busbar systems within racks.
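
The arithmetic behind those rack numbers, as a sketch; the ~40% server overhead and six-server density are assumptions, not measurements:

```python
# Rack power estimate for H100 SXM training servers.

gpus_per_server, gpu_w = 8, 700                    # H100 SXM at full TDP
server_kw = gpus_per_server * gpu_w * 1.4 / 1000   # +40% CPUs, NICs, fans, losses
servers_per_rack = 6                               # assumed high-density layout
rack_kw = server_kw * servers_per_rack

print(f"Per server: {server_kw:.1f} kW")  # ~7.8 kW, inside the 5-10 kW band
print(f"Per rack:   {rack_kw:.0f} kW")    # ~47 kW, near the 50 kW figure
```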

The thermal duty cycle stresses cable. AI training clusters run at full power for days or weeks during model training runs. Cable jacket compounds need to handle 24/7 elevated temperatures inside dense racks. Standard PVC jacket on AC power cable can soften and embrittle over a 5-7 year service life under these conditions; higher-temperature jacket compounds extend cable life accordingly. The same applies to networking cable: AOC optical engines and DAC cable bundles run warmer than typical Ethernet cable, and jacket compound matters for long-term reliability.

A Common Mistake — Treating AI Like Traditional Data Center

AI training infrastructure is not a scaled-up traditional data center. The traditional 100G Ethernet leaf-spine that worked for cloud and enterprise workloads becomes a bottleneck for AI training where every microsecond of inter-GPU latency matters. Programs that try to retrofit AI training onto 100G networks discover the fabric becomes the bottleneck and training runs take 2-3x longer than necessary. The premium for 400G or 800G InfiniBand is real but the productivity gain is much larger.

The opposite mistake — over-investing in AI training cabling for non-training workloads — also happens. Inference workloads need high-bandwidth networking less than training does; spending on InfiniBand NDR for an inference-only deployment when 100G or 400G Ethernet would have been adequate wastes budget. The cable selection should match the actual workload profile, not the perception of “AI = highest speed.”

Application Selection Framework

| Application | Recommended Cable | Reasoning |
| --- | --- | --- |
| NVIDIA H100/H200 training cluster | InfiniBand NDR 400G with OSFP DAC/AOC mix | Standard for current-gen AI training |
| B100/B200 next-gen training | 800G OSFP-XD with NVLink 5 | Higher bandwidth requirement |
| AMD MI300X cluster | InfiniBand or 400G/800G Ethernet | Competitive alternative platform |
| Inference cluster (Jetson AGX, A100) | 25G or 100G Ethernet | Lower bandwidth requirement |
| Edge AI vision (Jetson) | MIPI CSI-2 + USB-C | Embedded form factor |
| Coral / Hailo USB accelerator | USB 3.0 cable | Standard interface |
| Industrial AI inspection | GigE Vision M12 X-coded | Standard vision cable |
| AI security camera | PoE Cat6A or fiber | Power-over-Ethernet integration |
| Embedded AI in robot | USB-C or MIPI CSI-2 | Compact form factor |
| AI inference at retail POS | USB 3.0 or USB-C | POS terminal integration |

Bottom Line

AI hardware cable splits cleanly between training cluster networking (high-end OSFP DAC/AOC at 400G or 800G with InfiniBand NDR) and edge AI inference (compact MIPI CSI-2, USB 3.0, M.2 form factor cables). Traditional industrial vision cable applies to AI-enabled vision systems. Power and thermal duty cycle drive AC power cable selection in dense AI racks. Matching cable to actual workload — training versus inference, dense AI rack versus distributed edge — keeps the cost-performance balance reasonable.

AI Hardware Cable Program?

Send us your application — training cluster scale, GPU model, edge inference target, or AI vision integration. We’ll match cable selection to workload and quote within 48 hours for standard cable, longer for specialty optical.
