KitPlus | Article: IP Based Remote Production

Over the last few years, Suitcase TV has been getting involved in remote production at the software layer, and I want to talk about some of the ways we have done that, including a specific trial that we did last year with BBC Sport for the Euro 2016 tournament in Paris.

Before I get into the detail of remote production, I need to review some of the basics of IP-based studios.

What enables a lot of the capabilities of an IP studio is something called the JT-NM Reference Architecture. JT-NM is the Joint Task Force on Networked Media, set up by a number of large organisations, and in 2015 it documented what it believed to be the core elements that any fully IP-based studio would need to operate successfully with no SDI in sight.

That JT-NM document drives a lot of the standardisation work being done to solidify SMPTE 2110 but, perhaps as importantly, it also looks beyond that. SMPTE 2110 is actually the satnav rather than the final destination.

A couple of years ago, SMPTE 2022-6 established a basic, all-in-one standard for SDI over IP, and it's as simple as that. You take an SDI feed, encapsulate it in real time, and transport it over IP. Job done. The all-in-one approach means that video, embedded audio (if there is any), and any data are all transferred together. However, that also means carrying a lot of blank signal (the SDI blanking intervals), which is essentially empty space that could be used for information, so it is not as data efficient as it could be.
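To put a rough number on that empty space, here is a back-of-envelope sketch. It assumes one common format (1080-line HD at 25 frames per second, 10-bit 4:2:2) and the nominal 1.485 Gbit/s HD-SDI line rate; the function and constant names are mine, purely for illustration.

```python
# Rough arithmetic: how much of an HD-SDI signal is active picture?
# Assumed format: 1920x1080 at 25 frames/s, 10-bit 4:2:2.
SDI_RATE_BPS = 1_485_000_000   # nominal HD-SDI line rate

ACTIVE_WIDTH = 1920
ACTIVE_HEIGHT = 1080
BITS_PER_PIXEL = 20            # 10-bit luma plus 10 bits of shared chroma (4:2:2)
FRAMES_PER_SECOND = 25


def active_video_bps(width, height, bits_per_pixel, fps):
    """Bit rate of the visible picture only, ignoring blanking."""
    return width * height * bits_per_pixel * fps


active = active_video_bps(ACTIVE_WIDTH, ACTIVE_HEIGHT,
                          BITS_PER_PIXEL, FRAMES_PER_SECOND)

# Everything that is not active picture: blanking, which may or may not
# hold embedded audio and ancillary data.
non_picture_fraction = 1 - active / SDI_RATE_BPS

print(f"active picture: {active / 1e9:.4f} Gbit/s")
print(f"non-picture share of the link: {non_picture_fraction:.1%}")
```

On those assumptions, roughly 30% of the link is not picture. Some of that space does carry embedded audio and ancillary data, so treat the figure as an upper bound on what is genuinely empty.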

As I see it, the other issue with SMPTE 2022-6 is that if you want to funnel the all-in-one feed to an audio mixer, it's a problem because it's a very high bandwidth feed.

So one of the first steps on the roadmap to an IP studio is to separate audio transport so that those workflows can be done in a different way. That is where TR-04 comes into play: it still carries video in the same way, and that can include embedded audio. Perhaps the better solution for an IP studio, though, is TR-03, which fully separates the essence, meaning that the audio is kept fully separate from any video or metadata.

TR-03 and TR-04 have been around for a few years, and a lot of the work around SMPTE 2110 is based on standardising those two concepts. That work is now at the final draft stage and will form a definitive, documented standard. Until now, they have just been recommendations that have been under close scrutiny and regularly tweaked to determine precisely how they should be implemented. There have been slight changes, but the important bit is that interoperable systems using these protocols exist already. They are here, and working, now, and plenty of manufacturers already produce technology and services that support the new SMPTE 2110 standard, which makes up the majority of what an IP studio is built on. So what, exactly, is an IP studio?

For a start, JT-NM recognised that for an IP studio to be viable, you need separate transport streams for video and audio essence because of the need to feed different elements in a system, and all of those essence streams have to be referenced to a common time source. Systems of this nature are referenced over a network using PTP (SMPTE 2059), which replaces traditional genlock reference and LTC house clock methods. PTP enables transport streams to take advantage of the new protocols, which in turn means that the protocols, collectively, enable you to build an IP-only studio instead of SDI.
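As a rough illustration of what referencing every essence stream to a common time source can look like, the sketch below stamps one capture instant onto two different media clocks. It assumes the usual RTP arrangement of a 90 kHz clock for video and a 48 kHz clock for audio, with a 32-bit timestamp counted from the PTP epoch. The function name is hypothetical, and this is a simplification rather than the text of any standard.

```python
RTP_WRAP = 2 ** 32  # RTP timestamps are 32-bit and wrap around

VIDEO_CLOCK_HZ = 90_000  # assumed video media clock rate
AUDIO_CLOCK_HZ = 48_000  # assumed audio media clock rate


def rtp_timestamp(ptp_ns, clock_rate_hz):
    """Convert a PTP time (integer nanoseconds since the epoch) to a
    32-bit timestamp on a media clock of the given rate."""
    ticks = ptp_ns * clock_rate_hz // 1_000_000_000
    return ticks % RTP_WRAP


# One capture instant, expressed on both clocks. A receiver that knows
# the clock rates can map both values back to the same moment, even if
# the audio and video took different routes through the network.
capture_ns = 1_500_000_000 * 1_000_000_000  # an arbitrary example instant
video_ts = rtp_timestamp(capture_ns, VIDEO_CLOCK_HZ)
audio_ts = rtp_timestamp(capture_ns, AUDIO_CLOCK_HZ)
```

Integer nanoseconds are used throughout so that the conversion does not pick up floating-point rounding errors.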

But what, I hear you ask, is the point if all you're doing is replacing SDI interconnections with IP?

If that's all you think you're doing, you're missing the point. Moving to IP-only is just the first step. To effectively enable an IP-based system, you need automatic device discovery so that anything plugged into the system is detected, recognised, and configured automatically without the user having to mess around with configuring often-confusing IP addresses.

An IP-based studio should also allow for hybrid workflows, and what I mean by that is a hybrid of real-time and non-real-time workflows. It's not about replicating traditional linear workflows, but it is not about ignoring or abandoning such workflows, either. An IP studio needs to be able to do both.

And once everything is in a network, you need to be able to track media processes and actions against time, which is where the PTP clock comes into its own. For example, if you're recording the switching decisions in the studio so you can link up iPhones in an edit suite later, PTP timing is extremely valuable for tracking the volumes of associated metadata.

And, as has been said many times before, using these protocols means that you can build an IP studio with readily available, commercial off-the-shelf hardware and standard IP switches. However, you do need to work closely with IP switch manufacturers to ensure they understand, or can be made to understand, the needs of broadcast video, which is not always the case.

Now to achieve this, organisations like the Advanced Media Workflow Association (AMWA) have developed a collection of protocols called the Networked Media Open Specifications (NMOS), primarily IS-04, which looks at registration and discovery within an IP studio environment, but again, that's one step.

So, what's the next one?

The heart. At the heart of an IP studio, instead of a video router, you have an IP switch, hence the need to ensure you get one from someone who understands broadcast. Ideally, within the IP fabric, you need a registration protocol and a time reference, which can now be built into the switches. They no longer need to be separate processes.

So as devices come online, they register with the switch, perhaps discover the timing reference of the system so they can capture at the right rate, announce which multicast streams they are making available to the system, identify the streams of other services that are flowing into the IP system, and get everything up and running.
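The register-then-discover sequence can be sketched with a toy, in-memory registry. To be clear, this is not the IS-04 API: the class, method names, PTP domain value and multicast addresses are all hypothetical, chosen only to show the order of events.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Hypothetical in-memory stand-in for a registration service."""
    ptp_domain: int = 127                        # assumed timing domain
    senders: dict = field(default_factory=dict)  # "device/stream" -> multicast group

    def register(self, device_id, streams):
        """Record the streams a device offers and tell it which timing
        reference the system uses."""
        for label, group in streams.items():
            self.senders[f"{device_id}/{label}"] = group
        return {"ptp_domain": self.ptp_domain}

    def discover(self, exclude_device=None):
        """List streams on offer, optionally hiding the caller's own."""
        prefix = None if exclude_device is None else exclude_device + "/"
        return {
            name: group
            for name, group in self.senders.items()
            if prefix is None or not name.startswith(prefix)
        }


registry = Registry()

# A camera comes online, learns the timing reference, and announces its streams.
camera_timing = registry.register(
    "camera-1", {"video": "239.0.0.1", "audio": "239.0.0.2"}
)

# A mixer comes online later and finds out what it can subscribe to.
registry.register("mixer-1", {"programme": "239.0.0.10"})
available_to_mixer = registry.discover(exclude_device="mixer-1")
```

In a real system the registry would be a network service and the devices would find it automatically; the point here is only that nobody types in an IP address by hand.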

In the near future - and I'm talking the very near future - the entire studio system should take any device in, recognise it, configure it and proceed automatically. SMPTE 2110 will enable video and audio data to take different paths through the system, which will introduce different latencies, so time-stamping against those PTP clocks will be essential to enable audio and video synchronisation.

But let's talk about remote production.

There are already IP-based remote production facilities deployed in broadcast trucks, typically for high profile events, but deploying the trucks themselves is still an expensive proposition, not just for the logistics of the truck itself, but for the associated on-site engineering and support. IP or not, it's still costly.

Our focus at Suitcase TV is how to leverage IP techniques to do remote production in a different way. As I implied, the biggest cost for remote production has always been people, who quite rightly need to be fed, housed and watered. Most like to be paid, too. However, if we can pay some of them to be just as productive and creative from a central location, it saves a considerable amount on production costs, and possibly saves a few marriages, too.

The upshot of the cost savings is the ability to actually do more productions, but on a lower budget, and it's difficult to find a downside to that.

But isn't remote production a case of just bringing back a load of signals from an event site to a fixed, central gallery in a production centre?

Well, yes and no. It works, but it's not really "true remote production". As we see it, it's just a case of transporting all of the sources, but as the number of source signals increases, so do the costs, and it can soon become very costly. It makes sense, for example, with high profile sporting venues that have installed dedicated fibre links, but for the lower end of the production market, it's not really practical, and sometimes not economically feasible, to bring back all the signals.

The idea, which has been proven in practice, is to relocate a significant portion of the crew traditionally required on site back to a central production facility, or reassign them to another production, hence more for the money.

However, moving the heavy iron such as a vision mixer to the event site, so high bandwidth signals can be processed there rather than be transported back to a remote production site, reduces the link bandwidth transport costs.

That sounds like a solution, right? Well, it can be, but there's a major problem with it. That old nemesis, latency. The operator at the remote production centre, who is making decisions on when to switch between sources, can be looking at images that are at least four frames, and often more, in the past. This clearly won't do for live, fast-moving sports or events of any kind.

Of course, for some events, a few frames of delay on the switching is not going to worry anyone too much, but with so much happening so quickly these days, sport or otherwise, we believe it's better to screw latency right down to the absolute floor if possible. You can't get rid of latency entirely, and anyone who says they can also has several million pounds in a dormant bank account, just waiting for you to claim. The only truly real-time video is analogue, and we got rid of that years ago. Even SDI has latency, it's just very small.

So, to fully unleash the massive benefits of IP-based studio and/or remote production, we have to embrace it fully, and that means dealing with its inherent latency. How do we manage it? How do we mitigate it to enable frame-accurate remote switching?

We've done it by taking the sources, making lower resolution proxy versions of them, time-stamping them accurately against that PTP clock, and sending them back to wherever the operators are for display on a user interface that incorporates some rational delay.

Coupled with that, we introduce some buffering at the event site, where all the high resolution content is stored uncompressed in memory (not committed to disc because that would reduce quality) and fed into the vision mixing and audio processes, but with some time offset. That means that when the operator switches between sources at the point they're seeing the action on screen, there's ample time for that switch message to get back to the event site and connect to the appropriate multicast stream that's running in a delayed mode. With that compensation, the switch happens on the correct frame.
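The timing logic can be sketched in a few lines. What follows is a simplified illustration of delayed, time-stamped switching in general, not Suitcase TV's implementation: frames are identified by a plain frame number standing in for a PTP timestamp, and the class, function names and delay values are all hypothetical.

```python
class DelayedMixer:
    """Event-site mixer whose output runs a fixed number of frames behind live."""

    def __init__(self, delay_frames, initial_source):
        self.delay_frames = delay_frames
        self.current = initial_source
        self.pending = {}       # frame number -> source to cut to on that frame
        self.last_emitted = -1  # most recent frame already sent to programme

    def receive_cut(self, stamped_frame, source):
        """Schedule a cut for the frame the operator was looking at.
        Returns False if that frame has already gone out."""
        if stamped_frame <= self.last_emitted:
            return False
        self.pending[stamped_frame] = source
        return True

    def emit(self, live_frame):
        """Output the buffered frame that is delay_frames behind live."""
        frame = live_frame - self.delay_frames
        if frame < 0:
            return None  # buffer still filling
        if frame in self.pending:
            self.current = self.pending.pop(frame)
        self.last_emitted = frame
        return frame, self.current


def run(delay_frames, cut_frame, arrival_live_frame, total_frames=40):
    """Simulate one cut to 'cam2', stamped cut_frame, whose message reaches
    the event site while frame arrival_live_frame is being captured."""
    mixer = DelayedMixer(delay_frames, "cam1")
    programme = {}
    accepted = None
    for live in range(total_frames):
        if live == arrival_live_frame:
            accepted = mixer.receive_cut(cut_frame, "cam2")
        out = mixer.emit(live)
        if out is not None:
            frame, source = out
            programme[frame] = source
    return accepted, programme


# The operator cuts on frame 20; the message arrives 6 frames later.
accepted, programme = run(delay_frames=10, cut_frame=20, arrival_live_frame=26)
# With a 10-frame buffer the cut lands on frame 20 itself.

# With only a 4-frame buffer the same message arrives too late.
late_accepted, late_programme = run(delay_frames=4, cut_frame=20, arrival_live_frame=26)
```

The design choice is that the buffer depth has to exceed the worst-case time for the proxy to reach the operator plus the time for the switch message to come back. A deeper buffer tolerates a longer link, at the cost of the programme output running further behind live.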

Of course, that final mix is several frames behind real time, which means that with the additional latency on the fibre link back to the studio, the final programme contribution is clearly well behind. How do we resolve this and make it usable for the operator?

Simple, right?

Actually, to us, it is. And it should be to you, too.

We resolve it by doing another mix using the proxy sources we've stored locally, but we're doing it in software. By using the proxy sources it's easy to simulate the timing of the switch so that the programme operator perceives the changes happening immediately, but the real, high resolution switching is happening at the event site just a few frames later.

With this method, any link latency can be handled. And we have another process, using the same proxy version approach, for processes like graphics.

We did exactly this in a trial with BBC Sport during Euro 2016, and it was flawless.

What all this ultimately means is that what goes to air in an IP-based world is exactly what the operator intended, exactly when the operator intended it.

And in a fast-paced world, you never get a second chance to make a first intention.