Streaming MediaStreams

A Collection of Interesting Ideas,

Editor:
Domenic Denicola (Google)
Participate:
GitHub domenic/streaming-mediastreams (new issue, open issues)
IRC: #whatwg on Freenode
Commits:
https://github.com/domenic/streaming-mediastreams/commits

Abstract

This specification allows the creation of a readable stream derived from a MediaStream object, to allow recording or other direct manipulation of the MediaStream’s contents.

Table of Contents

1. Introduction

This section is non-normative.

MediaStream objects act as opaque handles to a stream of audio and video data. These can be consumed in a variety of ways by various platform APIs, as discussed in [GETUSERMEDIA]. This specification defines a way of consuming them by creating a readable stream, whose chunks are Blobs of encoded audio/video data recorded from the stream in a standard container format.

The resulting readable stream, known as a MediaStream recorder and embodied by the MediaStreamRecorder interface, can then be read from directly by author code which wishes to manipulate these blobs. Alternately, it may be piped to another destination, or consumed by other code that takes a readable stream.
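Because a MediaStream recorder is an ordinary readable stream, it can be drained with the standard [STREAMS] reader loop. The following sketch works with any readable stream; consumeAll is an illustrative helper name, not part of this specification:

```javascript
// Drain any readable stream and collect its chunks. For a MediaStream
// recorder, the chunks would be Blobs of encoded audio/video data.
// consumeAll is an illustrative helper, not defined by this specification.
async function consumeAll(readableStream) {
  const reader = readableStream.getReader();
  const chunks = [];
  for (;;) {
    const { value, done } = await reader.read();
    if (done) {
      return chunks;
    }
    chunks.push(value);
  }
}
```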

2. Example Usage

This section is non-normative.

To read six seconds of audio-video input from a user’s webcam as a single Blob, the following code could be used:

function getSixSecondsOfVideo() {
  return navigator.mediaDevices.getUserMedia({ video: true }).then(mediaStream => {
    const recorder = new MediaStreamRecorder(mediaStream, { timeSlice: 6 * 1000 });
    const reader = recorder.getReader();

    return reader.read().then(({ value }) => {
      reader.cancel();
      return value;
    });
  });
}

getSixSecondsOfVideo().then(blob => ...);

This uses the timeSlice option to ensure that each chunk read from the MediaStreamRecorder is at least six seconds long. Upon receiving the first chunk, it immediately cancels the readable stream, as no more recording is necessary.

If the ultimate destination for the streaming audio-video input were somewhere else, say an [INDEXEDDB] database, then it would be more prudent to let the user agent choose the time slice, and to store the chunks as they are available:

navigator.mediaDevices.getUserMedia({ video: true }).then(mediaStream => {
  const recorder = new MediaStreamRecorder(mediaStream);
  writeReadableStreamToIndexedDBForSixSeconds(recorder);
});

function writeReadableStreamToIndexedDBForSixSeconds(rs) {
  const reader = rs.getReader();
  const startTime = Date.now();
  return pump();

  function pump() {
    return reader.read().then(({ value, done }) => {
      if (done) {
        return;
      }

      writeBlobToIndexedDB(value); // gory details omitted

      if (Date.now() - startTime > 6 * 1000) {
        reader.cancel();
      } else {
        return pump();
      }
    });
  }
}

If you were writing to a destination which had a proper writable stream representing it, this would of course become much easier:

navigator.mediaDevices.getUserMedia({ video: true }).then(mediaStream => {
  const recorder = new MediaStreamRecorder(mediaStream);

  const dest = getIndexedDBWritableStream(); // using hypothetical future capabilities

  const piping = recorder.pipeTo(dest);
  setTimeout(() => piping.cancel(), 6 * 1000); // XXX depends on cancelable promises
});

Alternately, your destination may accept readable streams, as is planned for [FETCH]. This example will continually stream video from the user’s video camera directly to a server endpoint, using standard [STREAMS] and [FETCH] idioms that work with any readable stream:

navigator.mediaDevices.getUserMedia({ video: true }).then(mediaStream => {
  const recorder = new MediaStreamRecorder(mediaStream, { type: "video/mp4" });

  return fetch("/storage/user-video.mp4", {
    method: "PUT",
    body: recorder,
    headers: {
      "Content-Type": "video/mp4"
    }
  });
});
Issue #3 on GitHub: “Alternate design”

Instead of a new class subclassing ReadableStream, with a bunch of useless-seeming getters, we could just have mediaStream.recordAsReadableStream(options) returning a vanilla ReadableStream. You'd presumably add MediaStream.canRecord(...) as well.

This design seems much simpler, if there is no use case for reading the options after creation. Which I can't imagine there is...

<https://github.com/domenic/streaming-mediastreams/issues/6>

3. The MediaStreamRecorder API

[Constructor(MediaStream stream, optional MediaStreamRecorderOptions options)]
interface MediaStreamRecorder : ReadableStream {
  readonly attribute MediaStream mediaStream;
  readonly attribute DOMString type;
  readonly attribute boolean ignoreMutedMedia;
  readonly attribute unsigned long long timeSlice;
  readonly attribute unsigned long long bitRate;

  static CanPlayTypeResult canRecordType(DOMString type);
};

dictionary MediaStreamRecorderOptions {
  DOMString type;
  boolean ignoreMutedMedia = false;
  [EnforceRange] unsigned long long timeSlice = 0;
  [EnforceRange] unsigned long long bitRate;
};

All MediaStreamRecorder instances have [[mediaStream]], [[type]], [[ignoreMutedMedia]], [[timeSlice]], and [[bitRate]] internal slots.

3.1. new MediaStreamRecorder(stream, options)

  1. If options.type is present but is not a supported MIME type for media stream recording, throw a NotSupportedError DOMException.

  2. If options.type is present, let type be options.type. Otherwise, let type be a user-agent chosen default recording MIME type.

  3. If options.bitRate is present, let bitRate be options.bitRate, clamped within a range deemed acceptable by the user agent. Otherwise, let bitRate be a default bit rate, perhaps dependent on type or timeSlice.

  4. Let timeSlice be the greater of options.timeSlice and some minimum recording time slice imposed by the user agent.

  5. Call the superconstructor with appropriate underlying source and queuing strategy arguments so as to record mediaStream according to the following requirements:

    1. All data from mediaStream must be recorded as Blob chunks that are enqueued into this readable stream.

      The choice of Blob instead of, e.g., ArrayBuffer, is to allow the data to be kept in a place that is not immediately accessible to the main thread. For example, Firefox separates its media subsystem from the main thread via asynchronous dispatch. See #5 for more discussion.
    2. All such chunks must represent at least timeSlice milliseconds of data, except potentially the last one if the MediaStream ends before that much data can be recorded. Any excess length beyond timeSlice milliseconds for each chunk should be minimized.

    3. The resulting chunks must be created such that the original tracks of the MediaStream can be retrieved at playback time by standard software meant for replaying the container format specified by type. When multiple Blob chunks are enqueued, the individual Blobs need not be playable, but the concatenation of all the Blobs from a completed recording must be playable.

    4. The resulting chunks must be encoded using bitRate as the bit rate for encoding.

    5. If any track within the MediaStream is muted at any time, then either:

      1. If options.ignoreMutedMedia is true, nothing must be recorded for those tracks.

      2. Otherwise, the chunks enqueued to represent those tracks must be recorded as black frames or silence (as appropriate) while the track remains muted.

    6. If at any point mediaStream’s isolation properties change so that access to it is no longer allowed, this readable stream must be errored with a SecurityError DOMException.

    7. If recording cannot be started or at any point cannot continue (for reasons other than a security violation),

      1. A chunk containing any currently-recorded but not-yet-enqueued data must be enqueued into this readable stream.

      2. This readable stream must be errored with a TypeError.

    8. If mediaStream ends, then this readable stream must be closed.

  6. Set this@[[mediaStream]] to mediaStream, this@[[type]] to type, this@[[ignoreMutedMedia]] to options.ignoreMutedMedia, this@[[timeSlice]] to timeSlice, and this@[[bitRate]] to bitRate.
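Steps 3 and 4 above can be illustrated with the following sketch. The minimum time slice, the acceptable bit-rate range, and the default bit rate are all user-agent-chosen; the constants below are hypothetical placeholders, not values mandated by this specification:

```javascript
// Hypothetical user-agent-chosen bounds, for illustration only.
const UA_MIN_TIME_SLICE = 100;                   // ms
const UA_BIT_RATE_RANGE = [8_000, 10_000_000];   // bps
const UA_DEFAULT_BIT_RATE = 2_500_000;           // bps

// Step 4: the greater of options.timeSlice and the user agent's minimum.
function resolveTimeSlice(requested = 0) {
  return Math.max(requested, UA_MIN_TIME_SLICE);
}

// Step 3: clamp a supplied bit rate to the acceptable range; otherwise
// fall back to a user-agent default.
function resolveBitRate(requested, uaDefault = UA_DEFAULT_BIT_RATE) {
  if (requested === undefined) {
    return uaDefault;
  }
  const [min, max] = UA_BIT_RATE_RANGE;
  return Math.min(Math.max(requested, min), max);
}
```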

<https://github.com/domenic/streaming-mediastreams/issues/1>

<https://github.com/domenic/streaming-mediastreams/issues/4>

3.2. get MediaStreamRecorder.prototype.mediaStream

  1. Return this@[[mediaStream]].

3.3. get MediaStreamRecorder.prototype.type

  1. Return this@[[type]].

<https://github.com/domenic/streaming-mediastreams/issues/2>

3.4. get MediaStreamRecorder.prototype.ignoreMutedMedia

  1. Return this@[[ignoreMutedMedia]].

3.5. get MediaStreamRecorder.prototype.timeSlice

  1. Return this@[[timeSlice]].

3.6. get MediaStreamRecorder.prototype.bitRate

  1. Return this@[[bitRate]].

3.7. MediaStreamRecorder.canRecordType(type)

  1. If the user agent knows that it cannot record type, return "".

  2. If the user agent is confident that it can record type, return "probably".

  3. Return "maybe".

Implementers are encouraged to return "maybe" unless the type can be confidently established as being supported or not.
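A caller can use these answers for feature detection, preferring "probably" over "maybe". The following sketch takes the canRecordType function as a parameter so it applies to any implementation; pickRecordingType is a hypothetical helper, not part of this specification:

```javascript
// Pick the first candidate type the user agent reports it can record,
// preferring a "probably" answer and falling back to the first "maybe".
// pickRecordingType is an illustrative helper, not defined by this spec.
function pickRecordingType(candidates, canRecordType) {
  let fallback;
  for (const type of candidates) {
    const answer = canRecordType(type);
    if (answer === "probably") {
      return type;
    }
    if (answer === "maybe" && fallback === undefined) {
      fallback = type;
    }
  }
  return fallback; // undefined if every answer was ""
}

// In a browser this would be called as:
//   pickRecordingType(["video/webm", "video/mp4"],
//                     t => MediaStreamRecorder.canRecordType(t));
```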

Acknowledgments

The editor would like to thank Jim Barnett and Travis Leithead for their original [MEDIASTREAM-RECORDING] specification. This document is largely a reframing of their work on top of [STREAMS].

This specification is written by Domenic Denicola (Google, d@domenic.me).

Per CC0, to the extent possible under law, the editor has waived all copyright and related or neighboring rights to this work.

Conformance

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

Index

Terms defined by this specification

Terms defined by reference

References

Normative References

[ECMASCRIPT]
Allen Wirfs-Brock. ECMA-262 6th Edition, The ECMAScript 2015 Language Specification. June 2015. Standard. URL: https://people.mozilla.org/~jorendorff/es6-draft.html
[HTML]
Ian Hickson. HTML Standard. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[STREAMS]
Domenic Denicola. Streams Standard. Living Standard. URL: https://streams.spec.whatwg.org/
[WebIDL]
Cameron McCormack. Web IDL. 19 April 2012. CR. URL: https://heycam.github.io/webidl/
[MEDIASTREAM-RECORDING]
Travis Leithead; James Barnett. MediaStream Recording. 27 January 2015. WD. URL: https://w3c.github.io/mediacapture-record/MediaRecorder.html
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119

Informative References

[FETCH]
Anne van Kesteren. Fetch Standard. Living Standard. URL: https://fetch.spec.whatwg.org/
[GETUSERMEDIA]
Daniel Burnett; et al. Media Capture and Streams. 14 April 2015. LCWD. URL: https://w3c.github.io/mediacapture-main/
[IndexedDB]
Nikunj Mehta; et al. Indexed Database API. 8 January 2015. REC. URL: http://dvcs.w3.org/hg/IndexedDB/raw-file/tip/Overview.html

IDL Index

[Constructor(MediaStream stream, optional MediaStreamRecorderOptions options)]
interface MediaStreamRecorder : ReadableStream {
  readonly attribute MediaStream mediaStream;
  readonly attribute DOMString type;
  readonly attribute boolean ignoreMutedMedia;
  readonly attribute unsigned long long timeSlice;
  readonly attribute unsigned long long bitRate;

  static CanPlayTypeResult canRecordType(DOMString type);
};

dictionary MediaStreamRecorderOptions {
  DOMString type;
  boolean ignoreMutedMedia = false;
  [EnforceRange] unsigned long long timeSlice = 0;
  [EnforceRange] unsigned long long bitRate;
};