Network Working Group                                        C. Holmberg
Internet-Draft                                                  Ericsson
Intended status: Informational                          14 February 2024
Expires: 17 August 2024


    Session Initiation Protocol (SIP) Conference Server for Publish/
                               Subscribe
                draft-holmberg-conference-pubsub-latest

Abstract

   This document describes how a Session Initiation Protocol (SIP)
   [RFC3261] conference server [RFC4353] can be used to realize a Pub/
   Sub broker to distribute non-audiovisual data (e.g., IoT sensor
   readings).  SIP agents are used to realize publishers and
   subscribers.  The main advantage of the solution is the possibility
   to use existing SIP-based audiovisual conferencing infrastructure and
   protocols to realize a Pub/Sub solution.

About This Document

   This note is to be removed before publishing as an RFC.

   The latest revision of this draft can be found at
   https://cdh4u.github.io/conference-pubsub/draft-holmberg-conference-
   pubsub.html.  Status information for this document may be found at
   https://datatracker.ietf.org/doc/draft-holmberg-conference-pubsub/.

   Source for this draft and an issue tracker can be found at
   https://github.com/cdh4u/conference-pubsub.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 17 August 2024.

Copyright Notice

   Copyright (c) 2024 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions and Definitions
   3.  Generic Conference Considerations
     3.1.  Conference Creation
       3.1.1.  Conference Duration
       3.1.2.  Conference Termination
       3.1.3.  Sending and Receiving Data
       3.1.4.  Simultaneous Senders
       3.1.5.  Number of Conference Participants
       3.1.6.  Data Sending Frequency
       3.1.7.  Data Synchronization
   4.  SIP Signalling Considerations
     4.1.  SIP Subject Header Field
   5.  SDP Considerations
     5.1.  SDP Direction Attribute
   6.  SIP Event Package Considerations
     6.1.  SIP Event Package for Conference State
       6.1.1.  Extensions for PubSub
     6.2.  SIP Event Package for PublishSubscribe
       6.2.1.  Example
     6.3.  RTP Considerations
       6.3.1.  RTP Timestamp
       6.3.2.  Keep-alive and Heartbeat
       6.3.3.  Data Aggregation
     6.4.  RFC 4103 Considerations
       6.4.1.  Redundancy
       6.4.2.  Packet Loss Detection
       6.4.3.  RFC 9071 Considerations
       6.4.4.  Idle Period
     6.5.  Time Considerations
     6.6.  RTCP Considerations
     6.7.  RTP Mixing or Translating
       6.7.1.  Translating
       6.7.2.  Mixing
     6.8.  Single vs multiple RTP sessions
     6.9.  Sender Timestamp
     6.10. Data Mixing
   7.  RTP Considerations
     7.1.  RTP Payload Type (PT) for PubSub
     7.2.  RTP Header Marker Bit
     7.3.  RTP Extensions for PubSub
       7.3.1.  Simulcast and Resolution
       7.3.2.  Retransmission
   8.  RTCP Considerations
     8.1.  RTCP FB
   9.  Standardization Considerations
   10. Security Considerations
   11. IANA Considerations
   12. Normative References
   Acknowledgments
   Author's Address

1.  Introduction

   Publish/Subscribe (Pub/Sub) is a communication pattern for
   asynchronous transport of messages between endpoints.  Endpoints that
   create and send messages are referred to as publishers.  Endpoints
   that receive and consume messages are referred to as subscribers.
   Publishers do not send messages directly to subscribers.  Instead,
   publishers send messages to an intermediary, referred to as a broker.
   A publisher will associate the message with a topic, and the broker
   will forward the message to each subscriber that has subscribed to
   the topic.  As the messages are sent to, and forwarded by, the
   broker, publishers and subscribers do not need to be aware of each
   other.  This enables flexible and scalable systems.

   A topic typically describes the semantics of the data (e.g., "water-
   temperature-data") or identifies the publisher (e.g., "water-pump-
   123").  The structure and syntax of the topic depend on the Pub/Sub
   framework.  Some Pub/Sub frameworks define tree-structured topics
   (e.g., "factory/temperature/sensor-123"), and allow topic wildcarding
   (e.g., "factory/temperature/*").  This document does not define topic
   syntax or structure.  Topics are simply treated as token or string
   values.

   Some Pub/Sub frameworks do not use brokers (broker-less Pub/Sub).
   Instead, the distribution of messages is realized using network
   features, e.g., IP multicast.  Broker-less Pub/Sub is outside the
   scope of this document.

   When a publisher publishes data it associates it with a topic.

        .-----------.          .----------.  subscribe  .------------.
        |           |   data   |          |<------------+            |
        | Publisher +--------->+          |    data     | Subscriber |
        |           |          |          +------------>|            |
        '-----------'          |          |             '------------'
             ...               |  Broker  |                ...
             ...               |          |                ...
        .-----------.          |          |  subscribe  .------------.
        |           |   data   |          |<------------+            |
        | Publisher +--------->+          |    data     | Subscriber |
        |           |          |          +------------>|            |
        '-----------'          '----------'             '------------'

                  Figure 1: Publish/Subscribe Architecture
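
   The forwarding behavior shown in Figure 1 can be illustrated with a
   minimal in-memory sketch.  It is not a SIP or RTP implementation; the
   class name Broker and its methods are hypothetical names chosen only
   for illustration, and topics are treated as plain strings.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# A handler receives the topic and the published data.
Handler = Callable[[str, bytes], None]


class Broker:
    """Toy broker (hypothetical): forwards data to a topic's subscribers."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        # Register interest in a topic; the publisher is never involved.
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, data: bytes) -> int:
        """Forward data to each subscriber of the topic; return the count."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(topic, data)
        return len(handlers)


broker = Broker()
received = []
broker.subscribe("water-temperature-data",
                 lambda topic, data: received.append((topic, data)))
delivered = broker.publish("water-temperature-data", b"21.5")
ignored = broker.publish("air-temperature-data", b"18.0")
```

   Note that the publisher only names a topic.  It never learns how
   many subscribers exist, which is the decoupling described above.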

   This document describes how a Session Initiation Protocol (SIP)
   [RFC3261] conference server [RFC4353] can be used to realize a Pub/
   Sub broker to distribute non-audiovisual data (e.g., IoT sensor
   readings).  SIP agents are used to realize publishers and
   subscribers.  This is referred to as a Pub/Sub conference.

   The main advantage of the solution is the possibility to use existing
   SIP-based audiovisual conferencing infrastructure and protocols to
   realize a Pub/Sub solution.

   The real-time transport protocol (RTP) [RFC3550] is used to transport
   the data.  Within this document the RTP payload for T.140 text
   conversation [RFC4103] is used to transport the data.  The examples
   use Sensor Measurement Lists (SenML) [RFC8428] to encode the data.
   However, other payloads and encoding mechanisms can also be used.
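
   As an illustration of the encoding step, the sketch below builds a
   single-record SenML pack in JSON, using the "bn", "bt", "n", "u" and
   "v" fields defined in [RFC8428], and encodes it as UTF-8 text that
   could be carried as the payload.  The function name, the device name
   and the measurement values are assumptions made for this example.

```python
import json


def senml_pack(base_name: str, name: str, unit: str,
               value: float, base_time: float) -> bytes:
    """Encode one measurement as a SenML JSON pack (UTF-8 bytes)."""
    record = {
        "bn": base_name,   # base name, prepended to "n"
        "bt": base_time,   # base time, seconds since the Unix epoch
        "n": name,         # measurement name
        "u": unit,         # unit, e.g. "Cel" for degrees Celsius
        "v": value,        # numeric value
    }
    # A SenML pack is a JSON array of records.
    return json.dumps([record], separators=(",", ":")).encode("utf-8")


# Hypothetical sensor name and reading.
payload = senml_pack("urn:dev:ow:10e2073a01080063:", "temperature",
                     "Cel", 23.1, 1276020076.0)
decoded = json.loads(payload)
```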

   NOTE: This document is based on the generic SIP conferencing
   procedures defined in [RFC4353] and [RFC4579].

   NOTE: While RTP is a generic data transport protocol, the main usage
   has been for transport of audiovisual data (and real-time text)
   between human users.  However, at the time of writing this document,
   there is work in the IETF on RTP payloads also for transport of
   non-audiovisual data.

2.  Conventions and Definitions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   TO BE REMOVED BEGIN

   Synchronization source (SSRC): The source of a stream of RTP packets,
   identified by a 32-bit numeric SSRC identifier carried in the RTP
   header so as not to be dependent upon the network address.  All
   packets from a synchronization source form part of the same timing
   and sequence number space, so a receiver groups packets by
   synchronization source for playback.  Examples of synchronization
   sources include the sender of a stream of packets derived from a
   signal source such as a microphone or a camera, or an RTP mixer.  A
   synchronization source may change its data format, e.g., audio
   encoding, over time.  The SSRC identifier is a randomly chosen value
   meant to be globally unique within a particular RTP session.  A
   participant need not use the same SSRC identifier for all the RTP
   sessions in a multimedia session; the binding of the SSRC identifiers
   is provided through RTCP.  If a participant generates multiple
   streams in one RTP session, for example from separate video cameras,
   each MUST be identified as a different SSRC.

   Contributing source (CSRC): A source of a stream of RTP packets that
   has contributed to the combined stream produced by an RTP mixer.  The
   mixer inserts a list of the SSRC identifiers of the sources that
   contributed to the generation of a particular packet into the RTP
   header of that packet.  This list is called the CSRC list.  An
   example application is audio conferencing where a mixer indicates all
   the talkers whose speech was combined to produce the outgoing packet,
   allowing the receiver to indicate the current talker, even though all
   the audio packets contain the same SSRC identifier (that of the
   mixer).

   Mixer: An intermediate system that receives RTP packets from one or
   more sources, possibly changes the data format, combines the packets
   in some manner and then forwards a new RTP packet.  Since the timing
   among multiple input sources will not generally be synchronized, the
   mixer will make timing adjustments among the streams and generate its
   own timing for the combined stream.  Thus, all data packets
   originating from a mixer will be identified as having the mixer as
   their synchronization source.

   Translator: An intermediate system that forwards RTP packets with
   their synchronization source identifier intact.  Examples of
   translators include devices that convert encodings without mixing,
   replicators from multicast to unicast, and application-level filters
   in firewalls.

   TO BE REMOVED END

   This document uses the SIP terminology defined in [RFC4353] and the
   RTP terminology defined in [RFC3550].

3.  Generic Conference Considerations

   This section discusses generic (non-protocol specific) differences
   between audiovisual conferences and Pub/Sub conferences.

3.1.  Conference Creation

   A conference server might host multiple audiovisual conferences that
   share the same conference name, as the name is not used to uniquely
   identify the conference.  It is not uncommon that multiple
   audiovisual conferences share the same name, as the conference
   creators might not be aware of each other.

   When a conference server is hosting Pub/Sub conferences, it is not
   practical to host multiple Pub/Sub conferences that share the same
   topic.  Because of that, if a participant tries to create a Pub/Sub
   conference with a topic for which a Pub/Sub conference already
   exists, the conference server might choose to either reject the
   conference creation request, redirect the participant to the existing
   conference, or simply add the endpoint to the existing conference.
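
   The three options above can be sketched as a simple policy decision.
   The names DuplicatePolicy and ConferenceServer, and the returned
   result strings, are hypothetical and only illustrate the choice a
   conference server might make; they are not defined by this document.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class DuplicatePolicy(Enum):
    REJECT = "reject"      # refuse the creation request
    REDIRECT = "redirect"  # point the participant at the existing URI
    JOIN = "join"          # add the participant to the existing conference


class ConferenceServer:
    """Toy model: at most one Pub/Sub conference per topic."""

    def __init__(self, policy: DuplicatePolicy) -> None:
        self.policy = policy
        self.conferences: Dict[str, str] = {}  # topic -> conference URI

    def create(self, topic: str, uri: str) -> Tuple[str, Optional[str]]:
        existing = self.conferences.get(topic)
        if existing is None:
            self.conferences[topic] = uri
            return ("created", uri)
        if self.policy is DuplicatePolicy.REJECT:
            return ("rejected", None)
        if self.policy is DuplicatePolicy.REDIRECT:
            return ("redirected", existing)
        return ("joined", existing)


server = ConferenceServer(DuplicatePolicy.REDIRECT)
first = server.create("water temperature", "sip:pubsubconf123@example.com")
second = server.create("water temperature", "sip:pubsubconf999@example.com")
```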

3.1.1.  Conference Duration

   The duration of an audiovisual conference may vary.  While long-lived
   conferences might exist, the duration is typically measured in
   minutes or hours.

   The interest in a PubSub topic might last for a very long time.
   Because of that, a PubSub conference associated with the topic might
   last for days, months, or indefinitely.  While there might be times
   when there are no PubSub participants within a conference, the
   conference server might still keep the PubSub conference "alive", as
   new participants are expected to join in the near future.  One
   advantage of keeping the conference "alive" is that participants can
   use the same conference URI whenever they re-join the conference.

3.1.2.  Conference Termination

   An AV conference is typically terminated once the last participant
   leaves the conference, when the conference creator leaves the
   conference, or at a pre-configured clock time.

   Within a PubSub conference, participants might join the conference
   only when they want to send or receive data.  In between they might
   leave the conference.  Because of this there might be periods when
   there are no participants within the conference.  However,
   participants might re-join later, and new participants might join the
   conference.  Therefore, as long as it can be assumed that there is an
   interest in the Topic associated with the conference, the conference
   might be kept alive even if there are no participants.

3.1.3.  Sending and Receiving Data

   Within an AV Conference, participants typically both send and receive
   media.  This can also be the case in a PubSub Conference, if the
   Publish/Subscribe traffic pattern is used to realize bi-directional
   data exchange between conference participants.  However, typically
   participants within a PubSub Conference will either send (Publish) or
   receive (Subscribe) data.  For example, sensors will typically only
   publish data, while analytics applications will only subscribe to
   data.

   Note that a PubSub Conference participant might publish data to one
   Topic, while subscribing to another Topic.

3.1.4.  Simultaneous Senders

   Within an AV Conference typically only one participant sends speech
   audio at any given time.  Within a PubSub Conference, multiple
   Publishers might publish data simultaneously, as data is often
   published as soon as it becomes available.  In addition, a Publisher
   typically has no idea when other Publishers are publishing data.  A
   Publisher might not even be aware of the other Publishers (if any).
   Because of this, the conference server might receive published data
   from multiple Publishers simultaneously.  Depending on how quickly
   the conference server can forward the data to the subscribers it
   might have to buffer the data, which can cause delays.

3.1.5.  Number of Conference Participants

   Within an AV Conference, the number of participants is relatively
   constant throughout the lifetime of the conference.  The number of
   participants might also be known before the conference begins, e.g.,
   based on the number of participants that have accepted an invitation
   to the conference.

   Within a PubSub Conference, the number of participants might vary
   widely throughout the lifetime of the conference.  A Publisher might
   join a conference only when it is publishing data, then leave the
   conference and re-join later again.  If a Subscriber is only
   interested in receiving data at specific times, it might also join
   the conference only for those times.

3.1.6.  Data Sending Frequency

   In an AV Conference, audio and video data is typically sent
   constantly, even though there are ways to temporarily stop the
   sending of data (e.g., by turning off the camera or muting the
   microphone).

   In a PubSub Conference, the data publishing frequency can vary
   widely.  In some cases, a Publisher will publish data very frequently
   (with intervals measured in milliseconds).  In other cases, a
   Publisher might publish data less frequently: once a minute, once an
   hour, once a day, etc.

3.1.7.  Data Synchronization

   Within an AV Conference, if the conference server is mixing the audio
   from all participants, the conference server needs to be able to
   ensure that the audio packets that are mixed together have been
   generated at the same time.  The conference server does not
   necessarily need to know the real clock value, only that the packets
   have been generated at the same time.

   Within a PubSub Conference, the conference server will not mix data
   from different Publishers.  In some cases, for network optimization
   purposes, the conference server might forward data from multiple
   Publishers in a single packet towards the Subscribers.

4.  SIP Signalling Considerations

   The discussions within this Section are based on the procedures and
   concepts defined in [RFC4353] and [RFC4579].

4.1.  SIP Subject Header Field

   The SIP Subject header field can be used to indicate the topic
   associated with the PubSub Conference.  When a new conference is
   created (using SIP signalling) the conference creator uses the
   Subject header field to indicate the Topic of the conference.

                     pubsub-conferences-info
                          |
                          |-- pubsub-conferences
                               |-- pubsub-conference
                               |    |-- conference-URI
                               |    |-- topic
                               |-- pubsub-conference
                               .    |-- conference-URI
                               .    |-- topic
                               .

                Figure 2: SIP Subject header field for Topic
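
   To make the use of the Subject header field concrete, the sketch
   below assembles a SIP INVITE request that carries the Topic.  All
   URIs, tags, the branch value and the Call-ID are made-up example
   values, the helper name build_invite is hypothetical, and the SDP
   body is omitted for brevity.

```python
def build_invite(conference_uri: str, from_uri: str,
                 topic: str, call_id: str) -> str:
    """Build a body-less INVITE whose Subject header carries the Topic."""
    lines = [
        f"INVITE {conference_uri} SIP/2.0",
        # Example transport, host and branch values (assumptions).
        "Via: SIP/2.0/UDP client.example.com;branch=z9hG4bK776asdhds",
        "Max-Forwards: 70",
        f"To: <{conference_uri}>",
        f"From: <{from_uri}>;tag=1928301774",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Subject: {topic}",          # Topic of the Pub/Sub conference
        f"Contact: <{from_uri}>",
        "Content-Length: 0",
    ]
    # SIP header lines end with CRLF; an empty line ends the header.
    return "\r\n".join(lines) + "\r\n\r\n"


invite = build_invite("sip:pubsubconf123@example.com",
                      "sip:sensor1@example.com",
                      "water-temperature-data",
                      "a84b4c76e66710@client.example.com")
```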

5.  SDP Considerations

5.1.  SDP Direction Attribute

   The SDP direction attributes can be used to indicate the PubSub role
   of a participant.

   A participant can use the SDP 'sendonly' attribute to indicate that
   it is acting as a Publisher.

   A participant can use the SDP 'recvonly' attribute to indicate that
   it is acting as a Subscriber.

   A participant can use the SDP 'sendrecv' attribute to indicate that
   it is acting as both a Publisher and a Subscriber.

   A participant can use the SDP 'inactive' attribute to indicate that
   it is acting as neither a Publisher nor a Subscriber.  A participant
   can use the attribute, e.g., to temporarily indicate to the
   conference server that it neither wants to receive nor will send data
   associated with the conference.
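
   The mapping between PubSub role and SDP direction attribute can be
   sketched as follows.  The function names, the port and the dynamic
   payload type number 98 are assumptions for this example; the
   "m=text" line and the "t140/1000" rtpmap follow [RFC4103].

```python
def direction_for(publishes: bool, subscribes: bool) -> str:
    """Return the SDP direction attribute for a PubSub role."""
    if publishes and subscribes:
        return "sendrecv"
    if publishes:
        return "sendonly"   # Publisher only
    if subscribes:
        return "recvonly"   # Subscriber only
    return "inactive"       # neither, e.g. temporarily paused


def media_description(port: int, payload_type: int,
                      publishes: bool, subscribes: bool) -> str:
    """Build an SDP media description for a t140 stream."""
    lines = [
        f"m=text {port} RTP/AVP {payload_type}",
        f"a=rtpmap:{payload_type} t140/1000",
        f"a={direction_for(publishes, subscribes)}",
    ]
    return "\r\n".join(lines) + "\r\n"


# A sensor that only publishes data.
sensor_sdp = media_description(11000, 98, publishes=True, subscribes=False)
```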

6.  SIP Event Package Considerations

   This section describes extensions for the SIP Event Package for
   Conference State [RFC4575] that are useful for a PubSub Conference.
   In addition, this section describes a new SIP Event Package, SIP
   Event Package for PublishSubscribe, that can be used to inform
   participants about the PubSub Conferences hosted by a conference
   server, e.g., the Topic and conference-uri associated with each
   conference.

6.1.  SIP Event Package for Conference State

   [RFC4575] defines a SIP Event Package [RFC3265] for Conference State.
   A conference participant can subscribe to the event package, and
   retrieve conference state information, including information about
   the conference itself, and information about other conference
   participants.  For a PubSub conference, the "subject" child element
   of the "conference-description" element can be used to indicate the
   Topic associated with the conference.

6.1.1.  Extensions for PubSub

6.1.1.1.  Maximum Number of Conference Participants

   The "maximum-user-count" element is used to indicate the maximum
   number of conference participants.  For an audiovisual conference,
   where participants typically both send and receive media, a single
   element will be enough.  However, in a PubSub conference, a majority
   of the conference participants might be either subscribers or
   publishers.  There might be a large variation in how many publishers
   and how many subscribers a conference server is able to handle.
   Therefore it could be useful to have separate elements to indicate
   that, e.g., "maximum-user-count-publisher" and "maximum-user-count-
   subscriber".

6.2.  SIP Event Package for PublishSubscribe

   While the SIP Event Package for Conference State provides information
   and state information for a given conference, it does not provide
   information about other conferences that are hosted by the conference
   server.

   This section suggests a new SIP Event Package, SIP Event Package for
   PublishSubscribe.  The event package is not associated with a
   specific PubSub conference, but provides information about the PubSub
   conferences hosted by the conference server.  For each PubSub
   conference hosted by the conference server, the event package
   contains the conference URI and the associated topic.  It might also
   contain additional information about each PubSub conference.  In
   addition, if the topic is associated with some namespace or
   dictionary, there might be information about that.

                     pubsub-conferences-info
                          |
                          |-- pubsub-conferences
                               |-- pubsub-conference
                               |    |-- conference-URI
                               |    |-- topic
                               |-- pubsub-conference
                               .    |-- conference-URI
                               .    |-- topic
                               .

              Figure 3: SIP Event Package for PublishSubscribe

6.2.1.  Example

   <?xml version="1.0" encoding="UTF-8"?>
   <pubsub-conferences-info
    xmlns="urn:ietf:params:xml:ns:pubsub-conferences-info"
    entity="sips:confserver@example.com"
    state="full" version="1">
   <!--
     PUBSUB CONFERENCES
   -->
    <pubsub-conferences>
     <pubsub-conference entity="sip:pubsubconf123@example.com" state="full">
      <topic>water temperature</topic>
     </pubsub-conference>
     <pubsub-conference entity="sip:pubsubconf456@example.com" state="full">
      <topic>air temperature</topic>
     </pubsub-conference>
    </pubsub-conferences>
   </pubsub-conferences-info>

   NOTE: The conference-uri is defined as a pubsub-conference element
   attribute.  As an option, it could be defined as a separate element.

   NOTE: As an option, the topic could also be defined as a pubsub-
   conference element attribute.

      Figure 4: Example: SIP Event Package for PublishSubscribe
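
   A participant that receives such a document could extract the
   mapping from conference URI to Topic as sketched below.  The sketch
   assumes the element names, the "entity" attribute and the namespace
   used in the example above, none of which are final; the function
   name topics_by_uri is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Namespace taken from the example document (not yet registered).
NS = {"ps": "urn:ietf:params:xml:ns:pubsub-conferences-info"}

DOCUMENT = b"""<?xml version="1.0" encoding="UTF-8"?>
<pubsub-conferences-info
 xmlns="urn:ietf:params:xml:ns:pubsub-conferences-info"
 entity="sips:confserver@example.com"
 state="full" version="1">
 <pubsub-conferences>
  <pubsub-conference entity="sip:pubsubconf123@example.com" state="full">
   <topic>water temperature</topic>
  </pubsub-conference>
  <pubsub-conference entity="sip:pubsubconf456@example.com" state="full">
   <topic>air temperature</topic>
  </pubsub-conference>
 </pubsub-conferences>
</pubsub-conferences-info>
"""


def topics_by_uri(document: bytes) -> Dict[str, Optional[str]]:
    """Map each conference URI to its Topic."""
    root = ET.fromstring(document)
    path = "ps:pubsub-conferences/ps:pubsub-conference"
    result: Dict[str, Optional[str]] = {}
    for conference in root.findall(path, NS):
        uri = conference.get("entity")
        if uri is not None:
            result[uri] = conference.findtext("ps:topic", namespaces=NS)
    return result


topics = topics_by_uri(DOCUMENT)
```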

6.3.  RTP Considerations

6.3.1.  RTP Timestamp

   The RTP Timestamp, together with the RTCP SR (Sender Report), can be
   used by Conference servers and participants to synchronize audio and
   video received from multiple participants.

   Note that the RTCP SR messages might be terminated by the Conference
   server.
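
   The mapping that an RTCP SR enables can be sketched as a small
   calculation: the SR pairs one RTP Timestamp with a wall-clock time,
   and any other RTP Timestamp from the same source is converted using
   the clock rate.  The function name and the example values are
   assumptions; the wall-clock time is given here as seconds in
   floating point rather than in NTP format.

```python
def rtp_to_wallclock(rtp_ts: int, sr_rtp_ts: int,
                     sr_wallclock: float, clock_rate: int) -> float:
    """Convert an RTP Timestamp to wall-clock seconds using one SR."""
    # Signed 32-bit difference, so that Timestamp wraparound is handled.
    delta = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF
    if delta >= 0x80000000:
        delta -= 0x100000000
    return sr_wallclock + delta / clock_rate


# Example with a 1000 Hz clock, as used by the t140 payload format.
later = rtp_to_wallclock(7500, 5000, 1700000000.0, 1000)
# The RTP Timestamp wrapped from 0xFFFFFF00 past zero to 0xF4.
wrapped = rtp_to_wallclock(0x000000F4, 0xFFFFFF00, 1700000000.0, 1000)
```

   If the conference server terminates the SR, a Subscriber does not
   have the values sr_rtp_ts and sr_wallclock and cannot perform this
   conversion.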

   Google RTP extension

6.3.1.1.  Payload

   In this case, the sampling timestamp is carried in the payload data,
   instead of the RTP/RTCP packets.  The disadvantage of this mechanism
   is that one needs to ensure that the data payload format always
   supports the transport of the sampling timestamp.

6.3.2.  Keep-alive and Heartbeat

   In a non-AV PubSub Conference, Publishers might stay within a PubSub
   conference for long periods without publishing any data.
   Subscribers will not publish any data at all.  There are a couple of
   possible implications that must be considered: NAT binding keep-
   alives and heartbeats (i.e., informing other PubSub Participants that
   a PubSub Participant is still 'alive').  NAT binding keep-alives are
   needed in cases where a PubSub Participant needs to send periodic
   data in order to maintain NAT bindings between itself and the PubSub
   Conference server.  This is important for Subscribers, but also for
   Publishers that publish data with very long intervals.

   [RFC6263] describes different RTP/RTCP mechanisms to send NAT keep-
   alives.

   The 'Empty (0-Byte) Transport Packet' mechanism does not use RTP/
   RTCP.  Instead, a participant will send an empty transport packet
   (e.g., UDP packet).  Note that this mechanism is useful mainly for
   NAT traversal purposes.  The conference server application will
   typically not be informed about these packets, and will not forward
   the packets to other conference participants.  This mechanism is
   applicable to non-AV data in PubSub Conferences.

   The 'RTP Packet with Comfort Noise Payload' mechanism uses a specific
   RTP payload format for comfort noise [RFC3389].  The payload format
   has been defined for audio data, and must be supported by both
   senders and receivers.  It is not applicable to non-AV data in PubSub
   conferences.

   The 'RTCP Packets Multiplexed with RTP Packets' mechanism uses RTP/
   RTCP multiplexing, where the same 5-tuple is used for the RTP and
   RTCP packets.  When there is no data to be sent, an RTCP packet can
   be sent as a keep-alive.  Note that Subscribers that do not send RTP
   packets can still send RTCP packets.

   The 'RTP Packet with Incorrect Version Number' and 'RTP Packet with
   Unknown Payload Type' mechanisms use an RTP packet with an invalid
   RTP version number and an RTP packet with a non-negotiated payload
   type, respectively.  Receivers are expected to ignore and discard
   these types of RTP packets.  These mechanisms are applicable to
   non-AV data in PubSub Conferences.
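
   As an illustration of the 'RTP Packet with Unknown Payload Type'
   mechanism, the sketch below builds a 12-byte RTP header with no
   payload.  The function name is hypothetical, and payload type 120 is
   only an example of a value assumed not to have been negotiated for
   the session.

```python
import struct


def rtp_keepalive(payload_type: int, sequence_number: int,
                  timestamp: int, ssrc: int) -> bytes:
    """Build an RTP packet that consists of the fixed header only."""
    if not 0 <= payload_type <= 127:
        raise ValueError("payload type must fit in 7 bits")
    first_byte = 2 << 6          # version 2, no padding/extension, CC=0
    second_byte = payload_type   # marker bit cleared
    return struct.pack("!BBHII",
                       first_byte,
                       second_byte,
                       sequence_number & 0xFFFF,
                       timestamp & 0xFFFFFFFF,
                       ssrc & 0xFFFFFFFF)


# Payload type 120 is assumed to be unused in this session.
packet = rtp_keepalive(120, 4711, 90000, 0x1234ABCD)
version = packet[0] >> 6
fields = struct.unpack("!BBHII", packet)
```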

   In an AV conference, most participants will typically both send and
   receive data.  However, in a PubSub Conference many of the PubSub
   Participants will only send data (Publisher) or receive data
   (Subscriber).  Even though a Subscriber does not publish any data, it
   might still use the mechanisms above if needed, and a PubSub
   Conference Server needs to be prepared to receive such RTP/RTCP
   packets from both Publishers and Subscribers.

   NOTE: One of the ideas behind the Publish/Subscribe traffic pattern
   is that Publishers and Subscribers do not need to be aware of each
   other.  In such cases there is no need to use an end-to-end heartbeat
   mechanism between Publishers and Subscribers.  Note that there might
   still be a heartbeat mechanism used between Publishers/Subscribers
   and the Broker.  However, there might be cases where Publishers and
   Subscribers are tightly coupled, and where a heartbeat mechanism is
   required.

   NOTE: Some data formats might define their own methods for sending
   heartbeats.  For example, there might be a way to indicate that the
   payload is only used for heartbeat purpose, and does not contain any
   additional data.

6.3.3.  Data Aggregation

   NOTE: SenML supports aggregation.  Mutli

6.4.  RFC 4103 Considerations

   https://datatracker.ietf.org/doc/html/rfc4103

   [RFC4103] defines an RTP payload format ('text/t140') to transport
   T.140 real-time text.  Within this document, it is assumed that the
   t140 payload format is used in the RTP packets to transport the
   published data.  The reason for this is the possibility to re-use
   existing conference servers that support the payload format to
   realize the Broker.

6.4.1.  Redundancy

   https://datatracker.ietf.org/doc/html/rfc2198

   By default, when no other redundancy mechanism is supported, the
   usage of the redundancy mechanism defined in [RFC2198] is required,
   unless the network conditions can guarantee that all text will always
   be delivered from the sender to the receiver.  Additional mechanisms,
   e.g., Forward Error Correction (FEC) [RFC2733] might also be used.

   In a Publish/Subscribe network, the data delivery requirements will
   determine whether redundancy or error correction mechanisms are
   needed.  In some cases, data loss might be tolerated, while in other
   cases there might be requirements that all published data reaches
   each Subscriber that has subscribed to the Topic of the data.

6.4.2.  Packet Loss Detection

   t140 packets are only sent when there is new text to send.  Because
   the time between sent packets may therefore vary, the Timestamp
   cannot be used by a receiver to detect packet loss.  Instead, the
   Sequence Number (SN) is used to detect packet loss.

   Note that, if there is no new text to send, an RTP packet that only
   contains redundant data might be sent.  This can be useful to e.g.,
   maintain NAT bindings, and as a generic heartbeat mechanism to
   indicate that the sender is still alive.
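
   Loss detection based on the Sequence Number can be sketched as
   follows.  The 16-bit Sequence Number wraps around, so the difference
   is computed modulo 65536.  The function name is hypothetical, and the
   sketch treats any backward step as a duplicate or reordered packet
   rather than as loss, which is a simplification.

```python
def lost_between(previous_sn: int, current_sn: int) -> int:
    """Return how many packets are missing between two received SNs."""
    gap = (current_sn - previous_sn) & 0xFFFF
    if gap == 0 or gap >= 0x8000:
        # Duplicate, or an older packet that arrived out of order.
        return 0
    return gap - 1


in_order = lost_between(10, 11)
three_lost = lost_between(10, 14)
across_wrap = lost_between(65534, 1)   # 65535 and 0 are missing
reordered = lost_between(14, 10)
```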

6.4.3.  RFC 9071 Considerations

   [RFC9071] provides guidance on mixing of real-time text (RTT) by a
   conference server.  This section discusses considerations to take
   into account when T.140 is used to transport Publish/Subscribe data.
   Note that some of the considerations have also been addressed from a
   generic conference perspective.

6.4.3.1.  Simultaneous Senders

   As described in Section 3.1 of [RFC9071], in RTT conferences
   typically only one participant writes and sends text (at least long
   pieces of text) at any given time.  Within a Publish/Subscribe
   network, multiple Publishers might publish data simultaneously, as
   data might be published as soon as it has been sampled, and as
   Publishers are not aware of when other Publishers are publishing
   data.  Because of this, the conference server might receive published
   data from multiple Publishers simultaneously.  If the conference
   server is not able to simultaneously forward all published data, it
   will have to buffer the data.

6.4.4.  Idle Period

   [RFC9071] gives guidance on how to process real-time text in a
   conference server.

6.4.4.1.  Mode

   Section 1.2 of [RFC9071] describes different "modes".

6.4.4.2.  Sending Frequency

   Within a Publish/Subscribe network, the data publishing interval can
   vary widely, depending on the use-case.  In some constrained
   environments, a Publisher may publish data e.g., once a day, while
   in an industrial environment a Publisher may publish data all the
   time, with a very short publishing interval.

   Received real-time text is often read by humans.  Because of that, it
   is important that text that was sent simultaneously by different
   senders is also received simultaneously by the receivers.

   RTP-mixer-based method for multiparty-aware endpoints:

   [RFC9071] makes assumptions regarding how participants are sending
   text.  It assumes that in a typical scenario only one participant
   will send text at any given time.  In a PubSub Conference, the same
   assumption cannot be made, as multiple publishers might
   simultaneously publish data to the same topic.  In addition, unless
   a publisher also acts as a subscriber, it does not even know when and
   if other publishers are publishing data.

   [RFC9071] focuses on two mixing solutions: 'The RTP-Mixer-Based
   Solution for Multiparty-Aware Endpoints' (Section 2.2) and 'Mixing
   for Multiparty-Unaware Endpoints' (Section 2.3).

   In the 'RTP-Mixer-Based Solution for Multiparty-Aware Endpoints', the
   receivers will receive the text from all senders within a single RTP
   stream from the conference server.

6.5.  Time Considerations

   In a Publish/Subscribe network, as publishers publish data
   independently from each other, there is typically no need for
   subscribers to synchronize or "lip-synch" the data.

6.6.  RTCP Considerations

   The RTP Control Protocol (RTCP) is used in conjunction with RTP.
   While RTP carries the actual data, RTCP carries information (e.g.,
   statistics, control information) within an RTP session.

   NOTE: As RTCP messages sent by a Publisher might be terminated by a
   conference server (performing data mixing), essential information
   might not reach the Subscribers.  For example, an RTCP Sender Report
   (SR) that provides the mapping between the absolute time and the RTP
   Timestamp might not reach the Subscribers.

   Note that if there is a large number of participants within a PubSub
   Conference, and if there is a need to send RTCP messages frequently,
   the RTCP messages might consume a large portion of the network and
   conference server capacity.
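
   The scaling concern above can be illustrated with simple arithmetic.
   The following Python sketch is not part of any specification; it
   assumes, as a simplification, that every participant sends one RTCP
   report of a fixed size at a fixed interval.  The function name and
   the numbers used are hypothetical.

```python
def rtcp_load_bps(participants: int, report_bytes: int, interval_s: float) -> float:
    """Aggregate RTCP bit rate arriving at the conference server.

    Assumes every participant sends one report of `report_bytes` bytes
    every `interval_s` seconds (a simplification; real RTCP intervals
    are adaptive and randomized).
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return participants * report_bytes * 8 / interval_s


# Hypothetical numbers: 10,000 participants, 100-byte reports.
slow = rtcp_load_bps(10_000, 100, 5.0)  # one report every 5 seconds
fast = rtcp_load_bps(10_000, 100, 0.5)  # one report every 0.5 seconds
print(f"{slow / 1e6:.1f} Mbit/s vs {fast / 1e6:.1f} Mbit/s")
```

   Under these assumptions the load grows linearly with the number of
   participants and inversely with the reporting interval.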

6.7.  RTP Mixing or Translating

   In an AV conference, the actual time when the AV data was created is
   typically not that important to the receiver.  It is more important
   that the conference server is able to mix and lip-sync AV data that
   has been created at the same time by the senders.

6.7.1.  Translating

   By default, when a Broker receives published data to a Topic, it
   forwards the data to each Subscriber that has subscribed to the data,
   without any processing of the data.

   Aggregation: instead of forwarding each published data directly to a
   Subscriber, the PubSub Conference server might choose to store the
   published data in an aggregated manner, and forward all stored data
   in a single RTP packet towards the Subscriber based on different
   policies, e.g., depen

   NOTE: The mechanisms used by a PubSub Conference server to determine
   the constraints (network bandwidth, etc.) of a Subscriber are outside
   the scope of this document.

   In case of translation, the original SSRC and the Timestamp will not
   be replaced by the translator.

6.7.2.  Mixing

   As a mixer is considered a source by itself, it will often terminate
   received RTCP packets.  Because of this, subscribers might not
   receive the RTCP SR packets that contain the mapping between the RTP
   Timestamp and the wall-clock time.

   In an RTP packet, the Timestamp value typically indicates when the
   payload data has been sampled.  The exact details depend on the
   media type and payload format.

   In case of mixing, the conference server will insert its own SSRC and
   Timestamp in the outgoing RTP packets with the mixed media.  While
   the CSRC field can provide the SSRCs of the RTP packets used to
   create the mix, the Timestamp values will be lost.  In a
   Publish/Subscribe scenario, if the Subscribers need to know when data
   has been published, they cannot rely on getting that information from
   RTP.
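
   The difference between translating and mixing, with respect to the
   SSRC, CSRC and Timestamp fields, can be sketched as follows.  This
   Python sketch is illustrative only: the RtpPacket class models just
   the fields discussed here, the function names are hypothetical, and
   payload concatenation is a stand-in for real mixing.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RtpPacket:
    ssrc: int
    timestamp: int
    payload: bytes
    csrc: List[int] = field(default_factory=list)


def translate(pkt: RtpPacket) -> RtpPacket:
    """Translator: forward with the original SSRC and Timestamp."""
    return RtpPacket(pkt.ssrc, pkt.timestamp, pkt.payload, list(pkt.csrc))


def mix(pkts: List[RtpPacket], mixer_ssrc: int, mixer_ts: int) -> RtpPacket:
    """Mixer: emit one packet with the mixer's own SSRC and Timestamp.

    The contributing SSRCs survive in the CSRC list; the contributing
    Timestamps do not. Payloads are simply concatenated here, which is
    an illustrative stand-in for real mixing.
    """
    return RtpPacket(
        ssrc=mixer_ssrc,
        timestamp=mixer_ts,
        payload=b"".join(p.payload for p in pkts),
        csrc=[p.ssrc for p in pkts],
    )


a = RtpPacket(ssrc=0xA, timestamp=1000, payload=b"a")
b = RtpPacket(ssrc=0xB, timestamp=2500, payload=b"b")
mixed = mix([a, b], mixer_ssrc=0xFF, mixer_ts=9000)
print(mixed.csrc, mixed.timestamp)  # contributing SSRCs kept, Timestamps gone
```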

   POTENTIAL STANDARDIZATION WORK: In addition to contributing SSRCs,
   also include contributing Timestamps.

   POTENTIAL STANDARDIZATION WORK: Add data sampling timestamp to RTP

6.8.  Single vs multiple RTP sessions

   An RTP Stream is identified by the data source (SSRC).

   Within a PubSub Conference, the number of publishers might be very
   large.  In addition, the number of publishers might vary quite
   frequently, as publishers join and leave the PubSub Conference.  For
   that reason, it is not convenient for a subscriber to negotiate a
   separate RTP session for each publisher with the conference server.
   In addition, one of the ideas behind the Publish/Subscribe pattern is
   that a Subscriber does not need to know, or be impacted by, the
   number of Publishers.

   In the case of real-time text, receivers need to be able to identify
   the sender of each RTP packet, so that the text from all senders is
   not mixed together.

   In some PubSub Conferences, Subscribers might not need to know which
   Publisher has published a specific set of data.

   However, the SSRC

6.9.  Sender Timestamp

   RTP does not provide a mechanism to indicate the time when a packet
   is sent, or when the RTP payload was sampled.

   The following experimental RTP header extensions can be used to
   include the absolute packet send time, or the absolute capture time,
   in an RTP packet:

   https://webrtc.googlesource.com/src/+/refs/heads/main/docs/native-
   code/rtp-hdrext/abs-send-time

   https://webrtc.googlesource.com/src/+/refs/heads/main/docs/native-
   code/rtp-hdrext/abs-capture-time/
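
   As an illustration, the first of the linked documents describes the
   absolute send time as a 24-bit value in 6.18 fixed-point format,
   that is, 6 bits of seconds and 18 bits of fraction, which wraps
   around every 64 seconds.  The Python sketch below shows that
   encoding under this reading of the linked document; the function
   name is hypothetical, and the input is assumed to be a time in
   seconds on the sender's clock.

```python
def abs_send_time_24(seconds: float) -> int:
    """Encode a send time as a 24-bit, 6.18 fixed-point value.

    18 fractional bits give a resolution of 1/262144 s; only the low
    6 bits of the integer seconds are kept, so the value wraps every
    64 seconds.
    """
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    return int(seconds * (1 << 18)) & 0xFFFFFF


print(abs_send_time_24(1.0))   # one second
print(abs_send_time_24(64.0))  # wraps around to zero
```

   Because of the wrap-around, such a value can only be used to compare
   send times that are close to each other, not to recover a date.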

   NOTE: In addition to the RTP SSRC value, the data format used in the
   RTP packet payload might have a source indicator that tells
   Subscribers

6.10.  Data Mixing

   The way a conference server mixes data depends on the media type.
   For example, in an AV conference, all incoming audio data is
   typically mixed together, so that everyone can hear everyone else.
   In case of video, the conference server typically forwards the video
   stream of the participant currently speaking.

   In the case of non-AV data, the default behavior within a PubSub
   Conference is to forward the data from a PubSub Publisher to each
   PubSub Subscriber, without performing any mixing or selection of what
   data is forwarded.

        .-----------.          .----------.             .------------.
        |           |  data X  |          |             |            |
        | Publisher +--------->+          |    data X   | Subscriber |
        |           |          |          +------------>|            |
        '-----------'          |          |             '------------'
                               |  Broker  |                ...
                               |          |                ...
                               |          |             .------------.
                               |          |             |            |
                               |          |    data X   | Subscriber |
                               |          +------------>|            |
                               '----------'             '------------'

                Figure 5: PubSub Conference Data Forwarding

   As an optimization, if the Broker receives data from multiple
   Publishers, it may forward data from multiple Publishers in a single
   RTP payload towards the Subscribers.

   For example, if the Broker receives data in a SenML Record from
   Publisher A and Publisher B at the same time, it might choose to
   place the SenML Records in a single SenML Pack and forward that Pack.

       SenML Record from Publisher A:

          [
            {"n":"urn:dev:ow:10e2073a01080063","u":"Cel","v":23.1}
          ]

       SenML Record from Publisher B:

          [
            {"n":"urn:dev:ow:F0ea673a01000036","u":"Cel","v":25.7}
          ]


       SenML Pack forwarded by the Broker towards the Subscribers:

          [
            {"n":"urn:dev:ow:10e2073a01080063","u":"Cel","v":23.1},
            {"n":"urn:dev:ow:F0ea673a01000036","u":"Cel","v":25.7}
          ]

                   Figure 6: SenML Pack created by broker
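
   The aggregation shown in Figure 6 can be sketched in Python as
   follows.  This is a minimal sketch, not a definitive Broker
   implementation: the function name merge_senml is hypothetical, and
   it assumes that the incoming data does not use SenML base fields
   (such as "bn" or "bt"), since those apply to subsequent Records in a
   Pack and would need to be resolved before Records from different
   Publishers are combined.

```python
import json
from typing import List


def merge_senml(packs: List[str]) -> str:
    """Concatenate the records of several SenML JSON packs into one pack."""
    merged = []
    for text in packs:
        records = json.loads(text)
        if not isinstance(records, list):
            raise ValueError("a SenML pack must be a JSON array")
        merged.extend(records)
    return json.dumps(merged, separators=(",", ":"))


from_a = '[{"n":"urn:dev:ow:10e2073a01080063","u":"Cel","v":23.1}]'
from_b = '[{"n":"urn:dev:ow:F0ea673a01000036","u":"Cel","v":25.7}]'
print(merge_senml([from_a, from_b]))
```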

7.  RTP Considerations

7.1.  RTP Payload Type (PT) for PubSub

7.2.  RTP Header Marker Bit

   This document does not define usage of the RTP header Marker bit.

7.3.  RTP Extensions for PubSub

   A large number of extensions have been specified for RTP.  Many of
   the extensions have been specified with an audiovisual use-case in
   mind.  However, many of them can also be applied when RTP is used to
   transport non-audiovisual data.  While an extensive study of the
   usage of RTP extensions for transport of non-audiovisual data is
   outside the scope of this document, this section describes a few
   extensions that have been studied in the work that led to this
   document.

7.3.1.  Simulcast and Resolution

   While it is quite clear what "resolution" means for audio and video,
   there is no unique definition for non-AV data.

   For example, resolution can refer to the number of digits that are
   included when the payload contains numeric values.

   For example, resolution can refer to how often the publisher is
   sending data.  The more frequent, the higher the resolution.

   For example, resolution can refer to the amount of data (e.g.,
   metadata) that a publisher is sending.

   A publisher can publish the same data using different "resolutions".
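
   As an illustration of the first interpretation (number of digits),
   the following Python sketch derives a lower-resolution version of a
   record by rounding its numeric value.  The function name and the
   sample record are hypothetical, and the record layout (a dictionary
   with a "v" value field) is only an assumption made for the example.

```python
def at_resolution(record: dict, digits: int) -> dict:
    """Return a copy of the record with its value rounded to `digits` decimals."""
    out = dict(record)
    out["v"] = round(record["v"], digits)
    return out


full = {"n": "sensor-1", "u": "Cel", "v": 23.14159}
low = at_resolution(full, 1)
print(full["v"], low["v"])
```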

7.3.2.  Retransmission

   The RTP header carries both a sequence number and a timestamp to
   allow a receiver to distinguish between lost packets and periods of
   time when no data was transmitted.
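
   How a receiver might use the two fields can be sketched as follows.
   This Python sketch is a simplification: it ignores reordered and
   duplicated packets, and the function names and the
   expected_ts_step parameter (the Timestamp increment expected
   between back-to-back packets) are hypothetical.

```python
def lost_between(prev_seq: int, cur_seq: int) -> int:
    """Number of packets missing between two consecutively received packets.

    RTP sequence numbers are 16 bits wide and wrap around, so the
    difference is taken modulo 2**16. Reordering and duplicates are
    ignored in this sketch.
    """
    return (cur_seq - prev_seq - 1) % (1 << 16)


def classify(prev_seq: int, prev_ts: int, cur_seq: int, cur_ts: int,
             expected_ts_step: int) -> str:
    """Tell packet loss apart from a period where nothing was sent."""
    missing = lost_between(prev_seq, cur_seq)
    if missing:
        return f"{missing} packet(s) lost"
    # Consecutive sequence numbers but a large Timestamp jump:
    # nothing was lost, the sender simply had no data to send.
    if (cur_ts - prev_ts) % (1 << 32) > expected_ts_step:
        return "no loss, sender was idle"
    return "no loss"


print(classify(10, 1000, 14, 5000, 1000))
print(classify(10, 1000, 11, 90000, 1000))
```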

   By default, RTP provides unreliable transport.  The same applies to
   different PubSub frameworks that use UDP as transport protocol.
   However, some PubSub frameworks use TCP transport, or use other
   mechanisms in order to provide reliable data delivery.  There are RTP
   extensions that have been used to provide reliable data delivery, or
   to simply inform the sender that data has been lost.

   XXX specifies an RTP extension where RTP packets are re-transmitted
   by default.

7.3.2.1.  Redundant Audio Data

   [RFC2198] defines an RTP payload format for encoding redundant audio
   data.  If a data packet is lost, it might be possible to reconstruct
   the information of the lost packet from the redundant data that is
   included in the subsequent packets.  The mechanism can also be used
   for non-AV data.  However, in case of time-critical data, where the
   lost data would be considered "expired" if it arrived as redundant
   data in a subsequent packet, the mechanism might not be useful
   (except for logging purposes, etc.).
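
   The principle of redundancy can be sketched as follows.  This Python
   sketch does not implement the payload format of [RFC2198]; it only
   assumes, for illustration, that each packet carries its own data
   plus a copy of the data from the immediately preceding packet.  All
   names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (sequence number, primary data, redundant copy of the previous data)
Packet = Tuple[int, str, Optional[str]]


def build(data: List[str]) -> List[Packet]:
    """Attach to each packet a copy of the previous packet's data."""
    return [(i, d, data[i - 1] if i > 0 else None) for i, d in enumerate(data)]


def reconstruct(received: List[Packet]) -> Dict[int, str]:
    """Rebuild data by sequence number, filling gaps from redundant copies."""
    out: Dict[int, str] = {}
    for seq, primary, redundant in received:
        out[seq] = primary
        if redundant is not None:
            # Only fill the slot if the original packet was not received.
            out.setdefault(seq - 1, redundant)
    return out


packets = build(["a", "b", "c"])
survivors = [packets[0], packets[2]]  # packet 1 is lost in transit
print(reconstruct(survivors))
```

   Note that the data of packet 1 only becomes available once packet 2
   arrives, which is the delay the paragraph above refers to.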

7.3.2.2.  Forward Error Correction (FEC)

   [RFC2733]

   Multiple data packets are used to create a single FEC packet.  The
   FEC payload contains information about which data packets have been
   used to create the FEC packet.
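
   The parity principle behind this kind of FEC can be sketched as
   follows.  This Python sketch is not the packet format of [RFC2733]:
   it assumes equal-length payloads, protects the payload only, and
   uses hypothetical function names.  It shows how a single lost
   payload can be rebuilt by XORing the received payloads with the
   parity payload.

```python
from typing import List


def xor_parity(payloads: List[bytes]) -> bytes:
    """XOR equal-length payloads together into one parity payload."""
    if not payloads:
        raise ValueError("at least one payload is required")
    length = len(payloads[0])
    if any(len(p) != length for p in payloads):
        raise ValueError("payloads must have equal length in this sketch")
    parity = bytearray(length)
    for p in payloads:
        for i, byte in enumerate(p):
            parity[i] ^= byte
    return bytes(parity)


def recover(received: List[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing payload from the survivors and the parity."""
    return xor_parity(received + [parity])


p1, p2, p3 = b"abcd", b"efgh", b"ijkl"
fec = xor_parity([p1, p2, p3])
print(recover([p1, p3], fec))  # p2 was lost and is rebuilt
```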

8.  RTCP Considerations

8.1.  RTCP FB

   The RTCP Feedback (FB) [RFC4585]

9.  Standardization Considerations

   This document does not formally standardize any new protocol
   extensions, SIP event packages, etc.  Each extension, event package,
   etc. described in the document would need to be standardized
   following the normal standardization procedures.  The protocol
   extensions and event packages described in this document are collated
   and listed below.

   SIP Event Package for Conference State (Section XXX):  Extension to
      the event package for providing separate information about
      publishers and subscribers.

   SIP Event Package for PublishSubscribe (Section XXX):  New SIP event
      package for providing information about PubSub Conferences
      (conference URI and topic) hosted by a Conference Server.

10.  Security Considerations

   The Security Considerations for SIP Conferencing apply to this
   document.

11.  IANA Considerations

   This document has no IANA actions.

12.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/rfc/rfc2119>.

   [RFC2198]  Perkins, C., Kouvelas, I., Hodson, O., Hardman, V.,
              Handley, M., Bolot, J.C., Vega-Garcia, A., and S. Fosse-
              Parisis, "RTP Payload for Redundant Audio Data", RFC 2198,
              DOI 10.17487/RFC2198, September 1997,
              <https://www.rfc-editor.org/rfc/rfc2198>.

   [RFC2733]  Rosenberg, J. and H. Schulzrinne, "An RTP Payload Format
              for Generic Forward Error Correction", RFC 2733,
              DOI 10.17487/RFC2733, December 1999,
              <https://www.rfc-editor.org/rfc/rfc2733>.

   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
              A., Peterson, J., Sparks, R., Handley, M., and E.
              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
              DOI 10.17487/RFC3261, June 2002,
              <https://www.rfc-editor.org/rfc/rfc3261>.

   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
              with Session Description Protocol (SDP)", RFC 3264,
              DOI 10.17487/RFC3264, June 2002,
              <https://www.rfc-editor.org/rfc/rfc3264>.

   [RFC3265]  Roach, A. B., "Session Initiation Protocol (SIP)-Specific
              Event Notification", RFC 3265, DOI 10.17487/RFC3265, June
              2002, <https://www.rfc-editor.org/rfc/rfc3265>.

   [RFC3389]  Zopf, R., "Real-time Transport Protocol (RTP) Payload for
              Comfort Noise (CN)", RFC 3389, DOI 10.17487/RFC3389,
              September 2002, <https://www.rfc-editor.org/rfc/rfc3389>.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550,
              July 2003, <https://www.rfc-editor.org/rfc/rfc3550>.

   [RFC4103]  Hellstrom, G. and P. Jones, "RTP Payload for Text
              Conversation", RFC 4103, DOI 10.17487/RFC4103, June 2005,
              <https://www.rfc-editor.org/rfc/rfc4103>.

   [RFC4353]  Rosenberg, J., "A Framework for Conferencing with the
              Session Initiation Protocol (SIP)", RFC 4353,
              DOI 10.17487/RFC4353, February 2006,
              <https://www.rfc-editor.org/rfc/rfc4353>.

   [RFC4575]  Rosenberg, J., Schulzrinne, H., and O. Levin, Ed., "A
              Session Initiation Protocol (SIP) Event Package for
              Conference State", RFC 4575, DOI 10.17487/RFC4575, August
              2006, <https://www.rfc-editor.org/rfc/rfc4575>.

   [RFC4579]  Johnston, A. and O. Levin, "Session Initiation Protocol
              (SIP) Call Control - Conferencing for User Agents",
              BCP 119, RFC 4579, DOI 10.17487/RFC4579, August 2006,
              <https://www.rfc-editor.org/rfc/rfc4579>.

   [RFC4585]  Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
              "Extended RTP Profile for Real-time Transport Control
              Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585,
              DOI 10.17487/RFC4585, July 2006,
              <https://www.rfc-editor.org/rfc/rfc4585>.

   [RFC6263]  Marjou, X. and A. Sollaud, "Application Mechanism for
              Keeping Alive the NAT Mappings Associated with RTP / RTP
              Control Protocol (RTCP) Flows", RFC 6263,
              DOI 10.17487/RFC6263, June 2011,
              <https://www.rfc-editor.org/rfc/rfc6263>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/rfc/rfc8174>.

   [RFC8428]  Jennings, C., Shelby, Z., Arkko, J., Keranen, A., and C.
              Bormann, "Sensor Measurement Lists (SenML)", RFC 8428,
              DOI 10.17487/RFC8428, August 2018,
              <https://www.rfc-editor.org/rfc/rfc8428>.

   [RFC9071]  Hellström, G., "RTP-Mixer Formatting of Multiparty Real-
              Time Text", RFC 9071, DOI 10.17487/RFC9071, July 2021,
              <https://www.rfc-editor.org/rfc/rfc9071>.

Acknowledgments

   This document is based on the Master's thesis of Trung Van.

Author's Address

   Christer Holmberg
   Ericsson
   Email: christer.holmberg@ericsson.com