So, what is the difference between a stream and a file? It may seem like a dumb and obvious question, but it's crucial in the design of many file-based workflows and the applications that drive them.
Let's start with the basic and obvious answer. "You can random access a file but you sequentially process a stream". This used to be important back in the days when CPUs were slow and storage was expensive, but now that we can process UHD in software in real time and my phone can hold a visually lossless HD movie on its internal storage, this answer seems rather dated.
The reality is that a good application ought to be able to treat media like a stream when needed and like a file when needed. For example, if you're pulling a big file from offline or cloud storage, it will be arriving locally as though it were a stream. It might be going slower than real time or maybe faster than real time, but if it's a good file format then the software ought to be able to handle it like a stream. This is often called the growing file or the while scenario.
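To make the growing file idea concrete, here is a minimal Python sketch of reading a file sequentially while something else is still appending to it. The function name, the polling interval and the idle timeout are all assumptions made for illustration. A real application would more likely learn that the transfer has finished from the file format itself or from the transfer tool, rather than by guessing from a period of silence.

```python
import time
from typing import Iterator


def read_growing_file(path: str,
                      chunk_size: int = 65536,
                      poll_interval: float = 0.05,
                      idle_timeout: float = 0.5) -> Iterator[bytes]:
    """Yield data from a file that may still be growing.

    Assumption: the writer only ever appends. The reader gives up once no
    new bytes have appeared for roughly `idle_timeout` seconds.
    """
    idle = 0.0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if chunk:
                # New data arrived: hand it on exactly as a stream would.
                idle = 0.0
                yield chunk
            elif idle >= idle_timeout:
                # Nothing new for a while: assume the transfer has ended.
                return
            else:
                # At the current end of file: wait and try again.
                time.sleep(poll_interval)
                idle += poll_interval
```

The caller simply iterates over `read_growing_file("incoming.mxf")` (a hypothetical filename) and never needs to know whether the data is arriving slower or faster than real time.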
Another example might be the reception of a live IP stream in a well-known codec (e.g. JPEG 2000) into a facility. While it's arriving, there is an expectation of good, live stream behaviour, i.e. all the synchronised elements are close to each other in the stream and correctly labelled, so that a device joining the stream mid-transmission is able to identify what the elements are. It would also be very nice for a device joining mid-stream to know how much of the stream it had missed. These requirements are also shared with the growing file scenario.
In fact, when you list the requirements for a good live stream and a good growing file, you find that the real differences come down to:
Latency (time delay) between acquiring the stream and displaying / using it
Knowledge of the file's history and its future e.g. index tables
How the timing and synchronisation references are used
Let's take these one by one.
Latency. In a live streaming application, the goal is almost always to minimise this. There is always a balance between cost, complexity and latency but lower latency for live streaming is nearly always better. In a growing file scenario, there is usually more tolerance (sometimes minutes rather than seconds) for the latency of using the received data when compared to a live stream. This fact can be used in the internal structure of the file when designing the multiplex or partitions or index tables to reduce overheads and improve efficiency.
History. In a growing file scenario, the complete file may exist somewhere in the cloud or on external storage and an index table for the entire file may exist. In the MXF world, the index tables can be chunked to allow the file to be used in 10 second, 1 minute or 10 minute chunks depending on the application, with the full index table being transmitted at the end of the file. In contrast, a live stream multiplex will often omit index tables, which can be recreated when the file is ingested or rewrapped on reception.
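The chunked index idea can be sketched in a few lines of Python. This is only an illustration of the principle, not the real MXF index table structure: `IndexSegment`, `build_index_segments` and `locate_frame` are hypothetical names, and the byte offsets count essence bytes only, ignoring any container overhead.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IndexSegment:
    first_frame: int      # frame number of the first entry in this segment
    offsets: List[int]    # byte offset of each frame, in frame order


def build_index_segments(frame_sizes: List[int],
                         frames_per_segment: int) -> List[IndexSegment]:
    """Split a frame index into segments of at most `frames_per_segment`."""
    segments: List[IndexSegment] = []
    current: List[int] = []
    first = 0
    offset = 0
    for n, size in enumerate(frame_sizes):
        if not current:
            first = n
        current.append(offset)
        offset += size
        if len(current) == frames_per_segment:
            segments.append(IndexSegment(first, current))
            current = []
    if current:
        # Final, possibly shorter, segment.
        segments.append(IndexSegment(first, current))
    return segments


def locate_frame(segments: List[IndexSegment], frame: int) -> Optional[int]:
    """Return the byte offset of `frame`, or None if it is not indexed yet."""
    for seg in segments:
        if seg.first_frame <= frame < seg.first_frame + len(seg.offsets):
            return seg.offsets[frame - seg.first_frame]
    return None
```

A reader that has received only the first few segments can random access everything those segments cover, and gets `None` for frames that have not arrived yet. The segment length is the knob that trades index overhead against how soon the content becomes usable.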
Timing & Sync. In a file, the timing and synchronisation are usually relative to the internals of the file, whereas in a live stream, the timing could be relative to the clock on the wall or to the start of transmission. In the work being carried out on the synchronisation of component IP streams, this becomes important when different elements in the stream might be sent via different routes and then re-synchronised.
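As a rough illustration of the two timing models, the Python sketch below converts between a file-relative frame count and a wall-clock timestamp. The function names are hypothetical, and it assumes a constant frame rate and a known wall-clock time for frame zero, which a real system would have to obtain from the stream's own timing reference.

```python
from datetime import datetime, timedelta
from fractions import Fraction


def to_wall_clock(frame_index: int, frame_rate: Fraction,
                  start: datetime) -> datetime:
    """Map a file-relative frame number to a wall-clock time.

    Assumption: frame 0 was captured at `start` and the rate is constant.
    """
    microseconds = round(Fraction(frame_index) / frame_rate * 1_000_000)
    return start + timedelta(microseconds=microseconds)


def to_frame_index(timestamp: datetime, frame_rate: Fraction,
                   start: datetime) -> int:
    """Map a wall-clock time back to the nearest file-relative frame number."""
    elapsed_us = (timestamp - start) // timedelta(microseconds=1)
    return round(Fraction(elapsed_us, 1_000_000) * frame_rate)
```

Using `Fraction` for the frame rate keeps rates such as 30000/1001 exact, so rounding only happens once, at the microsecond resolution of `datetime`.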
Why is any of this important? It's because we are living in a world where moving data over IP networks is less of a technological barrier and the economics of the data move are rising higher in the list of things to consider. One approach for HTTP streaming uses chunks of transport stream (a live streaming multiplex) stored as files on a server and then delivered back-to-back to achieve a streaming experience. Many live-highlights editing workflows use MXF files growing on disc, randomly accessed by editing software to build a package while the file is coming in.
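The "stream stored as files" approach can be sketched as follows in Python. MPEG transport stream packets are 188 bytes long and start with the sync byte 0x47, so a stream can be cut on packet boundaries into chunks and rejoined without loss. The function name and the fixed packets-per-chunk rule are assumptions for illustration only. Real HTTP streaming packagers typically cut on keyframes to a target duration instead.

```python
from typing import List

TS_PACKET_SIZE = 188   # bytes per MPEG transport stream packet
TS_SYNC_BYTE = 0x47    # first byte of every transport stream packet


def segment_transport_stream(data: bytes,
                             packets_per_segment: int) -> List[bytes]:
    """Cut a transport stream into packet-aligned chunks.

    Simplification: chunks hold a fixed number of packets rather than
    being cut at keyframes.
    """
    if len(data) % TS_PACKET_SIZE != 0:
        raise ValueError("data is not a whole number of TS packets")
    for pos in range(0, len(data), TS_PACKET_SIZE):
        if data[pos] != TS_SYNC_BYTE:
            raise ValueError(f"missing sync byte at offset {pos}")
    step = packets_per_segment * TS_PACKET_SIZE
    return [data[pos:pos + step] for pos in range(0, len(data), step)]
```

Each chunk could be saved as its own file on a server, and a client that fetches the chunks in order and concatenates them gets back the original byte stream.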
If you're designing a workflow where you need to manipulate the content while it's arriving, the decision of the format on the wire won't be a technological one for much longer. It will be a commercial decision based on the requirements of the workflow and the equipment available.
Is it a file? Is it a stream? Who cares! Put a blue cape on it and call it Superman. It's getting the right results at the right time for the right cost that counts.