Thomson Explains: SMPTE 2110 - Part 4

This is where you truly become a functional engineer rather than just someone who learned definitions.

Tier 4: Applied Troubleshooting & Tools

Tier 4 is all about practical skills – using tools and techniques to diagnose and fix issues in a SMPTE 2110 environment. This is where theory meets reality. A junior engineer who masters Tier 4 will be highly valuable, because IP systems, like any complex system, will have problems – and you need to solve them efficiently when the pressure is on. We cover key tools (e.g. Wireshark, network switch diagnostics) and typical troubleshooting scenarios, including specific examples like identifying a rogue PTP clock or mis-routed stream. The focus is on strategy: where to look first, how to interpret what you see, and understanding the “symptoms” of common problems.

Network and Stream Analysis with Wireshark

Wireshark is the go-to network analyzer. If you’re not familiar with it, start using it now. It can capture and decode packets on the network, which is invaluable for IP media.

  • Setting Up: You’ll often need a computer with a 10Gb NIC (at least) and appropriate access (maybe connect to a monitor port on the switch or use a network TAP) to capture the high-bandwidth 2110 traffic. Configure Wireshark to enable the RTP and PTP protocols for decoding. Also, use capture filters (like udp port 319 or udp port 320 to capture only PTP, or the multicast IP of a stream) to limit data; otherwise a full 10G capture will overwhelm your PC.

  • Analyzing PTP: One common use – verify PTP is working. Apply a filter for PTP (which uses UDP ports 319 and 320). You should see regular Sync and Announce messages from the Grandmaster. Check the domain field in the PTP packets – are all devices using the same domain? If you suspect a rogue PTP master, you might suddenly see Announce messages with a different source clock ID or a winning priority (in PTP, a lower priority1 value wins). For example, if a camera accidentally became master, you’d see its ID in the packets. Using Wireshark, you could confirm “Yes, there are two masters on domain 127 – the legit GM and another device – causing BMCA fights.” Then you know to locate and reconfigure that rogue device (e.g. set it to slave-only). This exact scenario has bitten many – a “rogue PTP Grandmaster” configured to win can take over, and BMCA alone won’t save you. The fix might be enabling authentication on PTP or simply proper configuration – but Wireshark is how you see it happening in real time.

  • Analyzing RTP Streams: Wireshark can decode RTP headers. You can see the sequence numbers, timestamps, and payload type. Key things to check: Are packets incrementing in sequence without drops? (If not, you have packet loss – network issue.) Is the timing consistent? Wireshark won’t directly tell you jitter in an obvious way, but you can export timestamps and analyze if needed. However, a simpler approach: many broadcast vendors provide 2110-specific monitoring tools (see below) that give a nicer readout of inter-packet timing. Still, Wireshark is the lowest common denominator if those aren’t around.

  • SIP/SDP Inspection: If using NMOS or any control, you might also capture HTTP API calls (e.g. NMOS commands) to debug control issues, but that’s advanced. At least, you can capture an SDP exchange. If a device offers SDP via Session Announcement Protocol (SAP) or an API call, you could see it and verify the content. Some systems periodically multicast SDP info (though this is more in legacy or certain modes, since NMOS largely replaces that).
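The “BMCA fight” in the rogue-master example above comes down to a field-by-field comparison of Announce data. Here is a minimal sketch of that dataset comparison – it omits the foreign-master qualification and topology tie-breakers of the full IEEE 1588 algorithm, and the device values are made up:

```python
from dataclasses import dataclass

# Simplified sketch of the IEEE 1588 Best Master Clock Algorithm (BMCA)
# dataset comparison: fields are compared in order, and the LOWER value
# wins. This shows why a rogue clock with a lower priority1 beats a
# GNSS-locked house grandmaster before clock quality is even considered.

@dataclass
class AnnounceData:
    priority1: int
    clock_class: int       # 6 = locked to a primary reference (e.g. GNSS)
    clock_accuracy: int
    variance: int
    priority2: int
    clock_id: str          # EUI-64-style identity, final tie-breaker

    def key(self):
        return (self.priority1, self.clock_class, self.clock_accuracy,
                self.variance, self.priority2, self.clock_id)

def best_master(candidates):
    """Return the Announce dataset that would win the BMCA comparison."""
    return min(candidates, key=AnnounceData.key)

house_gm = AnnounceData(128, 6, 0x21, 0x4E5D, 128, "00:1c:73:ff:fe:00:00:01")
rogue    = AnnounceData(100, 248, 0xFE, 0xFFFF, 128, "ac:de:48:ff:fe:12:34:56")

# The rogue's priority1=100 beats the house GM's 128, so its free-running
# clockClass 248 never even gets compared against the GM's class 6.
print(best_master([house_gm, rogue]).clock_id)  # prints the rogue's ID
```

This is exactly why “set priority1 deliberately on your GM, and slave-only everywhere else” is such common advice: priority1 is compared first, before clock quality.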
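The sequence-number check from the RTP bullet above is easy to automate once you’ve exported sequence numbers from a capture (e.g. via tshark’s rtp.seq field). A small sketch, including the 16-bit wraparound that trips people up:

```python
# Sketch: count lost RTP packets from a list of captured sequence numbers,
# handling the 16-bit wraparound (65535 -> 0). The input lists here are
# hand-made examples standing in for a real capture export.

def count_lost(seqs):
    """Count missing RTP packets across a capture, modulo 2**16."""
    lost = 0
    for prev, cur in zip(seqs, seqs[1:]):
        gap = (cur - prev) % 65536
        if gap == 0:
            continue          # duplicate packet, not a loss
        lost += gap - 1       # a gap of exactly 1 means no loss
    return lost

print(count_lost([65533, 65534, 65535, 0, 1]))   # clean wrap -> 0 lost
print(count_lost([10, 11, 14, 15]))              # seq 12 and 13 missing -> 2
```

If this counter is non-zero at the receiver but zero at a tap near the sender, you’ve localized the loss to the network in between.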

Practical tip: Have handy Wireshark display filters for common protocols: ptp, rtp, icmp (for pings), etc. This lets you quickly drill down. For example, if a receiver isn’t getting a stream, you might first ping it (ICMP) – does it respond? If yes, basic network connectivity is fine. Then IGMP – did it send a join? (Wireshark on the switch monitor port could see an IGMP join from the receiver’s IP for that group.) If there’s no join, the issue might be the control system not telling it to subscribe. If the join is there but still no traffic, perhaps the sender isn’t sending (check the source), or the network didn’t propagate the join (IGMP querier issue). At each step, tools like Wireshark give you evidence.

Time is critical in live operations, and knowing how to quickly capture and interpret with Wireshark can save a broadcast. For example, an unexplained audio desync might be traced to a slight PTP offset – Wireshark could show that one device’s delay requests are not getting responses (meaning it’s not fully synced). That tells you to examine that device’s PTP config or the switch’s handling of 224.0.1.129 on that segment.
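That ping → IGMP → source process of elimination can be written down as a decision list. A sketch – the boolean inputs are hypothetical stand-ins for the evidence you’d gather at each step (a ping result, a join seen on the monitor port, and so on):

```python
# Sketch of the "receiver isn't getting a stream" checklist, encoded in
# the order you'd actually test it. Each input represents evidence
# gathered with ping, Wireshark, or the switch CLI.

def diagnose(ping_ok, igmp_join_seen, sender_transmitting, traffic_at_receiver):
    if not ping_ok:
        return "basic connectivity: check link, VLAN, IP config"
    if not igmp_join_seen:
        return "receiver never joined: check control system / subscription"
    if not sender_transmitting:
        return "sender silent: check the source device"
    if not traffic_at_receiver:
        return "join sent but no traffic: suspect IGMP querier / snooping"
    return "network looks fine: check the receiver's own decode/config"

# Ping works, join was sent, sender is active, yet nothing arrives:
print(diagnose(True, True, True, False))
# -> join sent but no traffic: suspect IGMP querier / snooping
```

The point isn’t the code – it’s that each branch corresponds to one piece of capturable evidence, so you never guess two steps ahead.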

Switch Configuration and Monitoring (IGMP, QoS, PTP Settings)

Your network switches are as important as any video router ever was. A junior engineer should be capable of logging into switches (or using their management GUI) and checking critical settings for a 2110 system:

  • IGMP Snooping/Querier: Ensure IGMP snooping is on for all media VLANs, and ensure one device (switch or router) is acting as the IGMP Querier on each VLAN (to send membership queries). On some enterprise switches, the querier is off by default if no router is present, so you might need to enable an “IGMP querier” function. A quick check if IGMP is working: the switch’s IGMP group table should list the multicast groups and which ports have members. Learn to display that (usually show ip igmp snooping groups or similar). If a receiver isn’t getting a stream and you see that group isn’t in the table, it means the join never reached the switch or was not processed.

  • Multicast Routing (if applicable): If you have PIM running (for multi-subnet), check that your Rendezvous Point (RP) is configured and that the multicast route ((*, G) or (S, G)) exists on the router for that stream. This can be complex, but a simple scenario: OB Truck A sending to Studio B network – the router in between must have PIM and know the groups. If you type show ip pim rp you should see the RP. If streams aren’t passing, maybe the RP is misconfigured or not elected. This is advanced, but awareness counts – if you run into this issue, a specialized IP engineer will likely need to help, but as a junior engineer you should at least understand their language.

  • Quality of Service (QoS): PTP packets are tiny but hyper-important. In congested scenarios, they must take priority. Best practice is to put PTP in the highest QoS class. Many media switches auto-classify PTP (e.g. based on IPv4 multicast 224.0.1.129 or UDP ports 319/320) into an “express” queue. Ensure your switches do this. Also consider prioritizing audio over video if needed, since audio is lower bandwidth and more sensitive to timing disruption (though in a well-engineered network, video flows shouldn’t interfere – except perhaps during failover events). A junior engineer will likely not need to set up QoS from scratch, but must verify it’s in place. If PTP packets are delayed too much (jitter), devices can lose lock or drift, evidenced by audio cutting in and out sporadically. Audio is always your “canary in the coal mine” – if you’re experiencing AES67/Ravenna/intercom dropouts, check your PTP!

  • PTP Settings in Switches: If your switches support Boundary Clock or Transparent Clock, know how they are set. Boundary Clock is generally recommended (each switch becomes a timing relay, as noted earlier). Transparent Clock is another mode (seldom used in broadcast) where switches simply add their residence delay into a field in the packet; or the switch might be non-PTP-aware entirely (then it’s all plain multicast forwarding). The configuration depends on your design. But an example troubleshooting step: if a certain switch is not in BC mode and you see PTP issues on devices connected there, maybe that switch is introducing jitter or delay. The solution might be to enable BC mode (if supported) or replace it with a switch that does. This is why architects say to choose switches with PTP support to “simplify the troubleshooting process” – it reduces weird timing issues.

  • Traffic Policing and Capacity: Check that no interfaces are dropping packets due to oversubscription. For instance, if you inadvertently send three UHD streams (each ~10 Gbps) out of a 10G link, obviously two and a half of those streams won’t make it. Monitor interface counters for drops or errors. In IP, unlike SDI, there’s a temptation to overload links because everything is just packets. A junior engineer must keep a mental or literal tally of how much bandwidth is on each link (including redundancy overhead). Many sites use a “spine-leaf” network topology where each leaf switch uplinks to spines at maybe 100 Gbps, and endpoints at 25 Gbps – even then, bandwidth planning is key. I’ve found Cisco NBM (Non-Blocking Multicast) to be implemented extensively in the highest-end deployments. A quick note, though: if you have NBM configured on a particular VLAN, don’t change the traffic profile it expects, or the switch will drop those packets. In other words, if you configure a VLAN for Dante traffic, don’t suddenly start pushing 12-bit 4:4:4 4K 2110-20 traffic onto it. Even if all the underlying links can support the traffic, NBM was expecting low-bandwidth Dante flows and will prevent the multicast from passing through. This has caused many headaches during commissioning phases in my career.
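The PTP auto-classification described in the QoS bullet above boils down to a match on PTP’s well-known addresses and ports. A toy sketch of that classification logic – the queue names and the RTP media port are illustrative, not taken from any vendor’s config:

```python
# Sketch of "classify PTP into the express queue": match on the PTP
# multicast destination 224.0.1.129 or UDP ports 319/320. Queue names
# are made up for illustration; real switches use QoS classes/queues.

PTP_MCAST = "224.0.1.129"
PTP_PORTS = {319, 320}   # event and general PTP messages

def egress_queue(dst_ip: str, dst_port: int) -> str:
    if dst_ip == PTP_MCAST or dst_port in PTP_PORTS:
        return "strict-priority"   # PTP must never queue behind video
    if dst_port == 5004:           # a common RTP port in 2110 systems
        return "media"
    return "best-effort"

print(egress_queue("224.0.1.129", 319))   # strict-priority
print(egress_queue("239.10.1.5", 5004))   # media
```

When verifying a real switch, you’re checking that some equivalent of this rule exists and that its counters show PTP actually landing in the priority queue.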
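The “mental or literal tally” of link bandwidth from the capacity bullet is worth making literal. A sketch with made-up flow bitrates, flagging a 25G link that is nearly full:

```python
# Sketch: tally the flows planned onto one link and flag oversubscription.
# Flow names and bitrates are illustrative round numbers; in a 2022-7
# design, remember each path carries its own full copy of the streams.

LINK_CAPACITY_GBPS = 25.0

flows = [
    ("cam1 UHD 2110-20", 10.5),
    ("cam2 UHD 2110-20", 10.5),
    ("replay HD 2110-20", 1.5),
    ("audio + anc", 0.1),
]

total = sum(rate for _, rate in flows)
headroom = LINK_CAPACITY_GBPS - total
print(f"total {total:.1f} Gbps, headroom {headroom:.1f} Gbps")
if headroom < 0.1 * LINK_CAPACITY_GBPS:
    print("WARNING: link oversubscribed or nearly full")
```

A spreadsheet does the same job; the point is that the tally exists somewhere other than your head before the third UHD camera gets patched in.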

Real example: Suppose video on one particular multicast intermittently glitches. Using the switch CLI, you check the egress interface counter on the sender’s switch port – it shows drops increasing. That indicates output drops (maybe due to bursts or a mis-sized egress queue). Combine that with Wireshark, which might show sequence gaps at the receiver. Now you know it’s a network drop issue. You might alleviate it by enabling a larger egress buffer, by reconfiguring the sender to a narrow transmission profile (2110-21 compliance), or – if multiple flows are congesting a link – by re-routing one flow via a different path or upgrading that link from 25 or 40 GbE to 100 GbE.

PTP Monitoring and Troubleshooting (Clock Domains and Drift)

We can’t emphasize enough: keep an eye on PTP. Many vendors offer PTP monitoring tools:

  • PTP Track Hound (Meinberg) or similar utilities can listen to PTP and show offset, delay, etc. These are handy to have on a laptop that can join the PTP domain (even via management network if allowed).

  • Switch SNMP or Telemetry: Some switches can report the offset of their BC clock from the GM, etc. If you see those values creeping out of spec (more than a few microseconds), something’s wrong (perhaps high delay, network congestion or a second master conflict).

  • PTP Logging on Devices: Cameras or gateways often have a status page showing “PTP Lock: Yes” and the measured offset to master (like 100 ns or such). During troubleshooting, quickly scan a few devices: if all devices on one switch have higher offsets or are unlocked, the problem is likely on that segment (maybe that switch lost connection to GM).

  • Multiple PTP Domains (intentional): In complex environments, separate PTP domains are sometimes run on purpose (e.g. audio on domain 0, video on 127). I would recommend avoiding this unless absolutely needed, because it complicates things. If it is done, it should be clearly documented which devices are on which domain. If you find a device on the wrong domain, you’ve found the cause of “it’s not locking”. Remember that every PTP message includes the domain number so nodes can accept or discard it accordingly – meaning devices will ignore a master on a different domain. Good for isolation, bad if misconfigured. Always verify domain consistency.
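Verifying domain consistency is mechanical enough to script against your documentation. A sketch with a hypothetical device inventory – in practice the list would come from your master spreadsheet or from polling device status pages:

```python
# Sketch: audit PTP domain consistency across an inventory. The device
# names and domain values here are invented for illustration.

devices = {
    "camera-1": 127,
    "camera-2": 127,
    "audio-console": 0,   # misconfigured: left on the IEEE 1588 default
    "gateway-a": 127,
}

expected = 127  # the house domain (the SMPTE ST 2059-2 default)

wrong = {name: dom for name, dom in devices.items() if dom != expected}
for name, dom in wrong.items():
    print(f"{name} is on domain {dom}, expected {expected} - "
          f"it will ignore the house grandmaster's Announce messages")
```

One mismatched integer is all it takes for a device to sit unlocked forever while everything around it is fine.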

A rogue PTP GM scenario example: Let’s say a visiting OB truck plugs into your network to exchange feeds and they left their PTP on, with the same domain (127, the SMPTE 2059-2 default). Suddenly half your gear switches to that truck’s clock (perhaps it had a winning priority1 value set – lower wins – or a better clock class). Now you have two unsynchronized islands – a disaster for timing. PTP’s BMCA won’t prevent it if the rogue is configured to win.

As the on-site engineer, you must detect and fix this fast. Monitoring tools or Wireshark will reveal two masters. The immediate remedy: disconnect that device or change its domain. The long-term fix: coordinate with the truck to only allow one master (maybe have them follow yours or vice versa) or use boundary isolation (route their media but not PTP across a boundary router). This example underscores why understanding PTP configuration at Tier 3 and monitoring at Tier 4 is so crucial.

2110-Aware Monitoring Tools (Probes, Multiviewers, Analyzers)

Beyond Wireshark, broadcast-specific tools can greatly aid troubleshooting:

  • IP Stream Analyzers: e.g. Telestream (formerly Tektronix) PRISM, Telestream Inspect 2110, Leader/Phabrix Qx/QxL/QxP WFMs, Bridge Technologies VB series, etc. These tools subscribe to 2110 streams and give human-friendly readouts: video payload formats, PTP lock status, timing offset, packet inter-arrival plots, etc. They often can alarm on issues (packet loss, out-of-spec timing). If your facility has one, learn how to use it. It can save you from manual analysis. For instance, a PRISM or QxP can show if a video flow violates 2110-21 timings or if audio packets are slipping. As a junior engineer, you might be tasked with setting up such a monitoring system for a critical feed, and interpreting its results for your team.

  • 2110-Capable Multiviewers: A multiviewer is a system that displays many video/audio feeds on a screen for operations monitoring. In the IP world, “2110-capable” multiviewers subscribe directly to the streams (instead of needing SDI inputs). Examples: Evertz MVP-IP, Grass Valley AMPP MV, etc. From an engineering perspective, the multiviewer is both a consumer of streams and a diagnostic tool – if it can’t display a stream, that’s a clue something is wrong (maybe it didn’t receive the multicast or SDP). Multiviewers often also read basic metadata (like signal format) and can alert if a feed is down. You should know how to add a stream to the multiviewer layout (i.e. subscribe it) – this often intersects with NMOS or other control.

  • Signal Generators and Test Patterns: Don’t forget the simple tools: an IP signal generator (e.g. a camera emulator or test pattern generator that outputs 2110) is useful for testing. Both the PRISM and Qx WFMs have IP generators built in, if you’ve paid for that license. If you inject a known test pattern into the network, you can methodically check if each receiver can get it. If some can and some can’t, you isolate the issue to those segments. Many vendors have test pattern apps for 2110, even software-based ones on a PC with a suitable NIC.

Example problem & tool usage: Audio channels are swapped on the output of a production. Using a 2110 analyzer, you see that the stream labeled “Camera1 Audio 1-2” actually contains silence – the audio console’s output stream perhaps wasn’t routed correctly. You then check NMOS – it shows that the camera’s audio is routed to the console, but maybe the console isn’t sending it back out. Alternatively, using a multiviewer, you solo the audio channels and watch the meters to confirm which source has audio. This is more of an operational debug, but it uses the tools to identify where the audio disappeared. The idea is: become comfortable reading RTP packet counters, waveform monitor displays (even if virtual), audio level meters, and PTP status lights – all simultaneously if needed.

Configuration Management & Documentation

While not as flashy, an important troubleshooting aspect is having (and understanding) documentation: IP address lists, multicast assignments, device inventories, and system diagrams (signal flow charts). In an IP facility, sometimes the only way to trace a path is via the documentation (since it’s not as simple as following a physical cable). Make sure as a junior engineer you keep notes – e.g., if you assign a new multicast for a new source, update the master spreadsheet. It will help you later when something collides or if someone asks “which address is cam 5 using?”.

Mindset for Tier 4: Always form a hypothesis and test it.

  • Is the problem network-wide or isolated?

    • (Check multiple devices: if all streams on a switch break, likely a network issue. If one stream from one device is bad, likely that device or its link.)

  • Use binary search in troubleshooting: divide and conquer. For instance, subscribe to the multicast on a test PC at various points – at the source switch, at the destination switch – find where it breaks. The tools above let you do these tests without physically moving gear.

Remember, operational continuity is key. In a real incident, you might implement a quick workaround (e.g. switch to backup GM, or route the signal via a different path) to get things on air, then later investigate root cause in detail.

The knowledge in Tiers 1-3 combined with Tier 4 tools should enable you to come up with those workarounds quickly (because you understand the system enough to know alternative paths or sources).

If Tier 3 gave you the “map” of the system, Tier 4 is the ability to navigate that map under pressure, pinpointing issues and resolving them. This is where you truly become a functional engineer rather than just someone who learned definitions.
