Network Working Group                                        C. Holmberg
Internet-Draft                                                  Ericsson
Intended status: Informational                          14 February 2024
Expires: 17 August 2024

 Session Initiation Protocol (SIP) Conference Server for Publish/Subscribe
                draft-holmberg-conference-pubsub-latest

Abstract

This document describes how a Session Initiation Protocol (SIP) [RFC3261] conference server [RFC4353] can be used to realize a Pub/Sub broker to distribute non-audiovisual data (e.g., IoT sensor readings). SIP agents are used to realize publishers and subscribers.

The main advantage of the solution is the possibility to use existing SIP-based audiovisual conferencing infrastructure and protocols to realize a Pub/Sub solution.

About This Document

This note is to be removed before publishing as an RFC.

The latest revision of this draft can be found at https://cdh4u.github.io/conference-pubsub/draft-holmberg-conference-pubsub.html. Status information for this document may be found at https://datatracker.ietf.org/doc/draft-holmberg-conference-pubsub/.

Source for this draft and an issue tracker can be found at https://github.com/cdh4u/conference-pubsub.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 17 August 2024.

Copyright Notice

Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions and Definitions
   3.  Generic Conference Considerations
     3.1.  Conference Creation
       3.1.1.  Conference Duration
       3.1.2.  Conference Termination
       3.1.3.  Sending and Receiving Data
       3.1.4.  Simultaneous Senders
       3.1.5.  Number of Conference Participants
       3.1.6.  Data Sending Frequency
       3.1.7.  Data Synchronization
   4.  SIP Signalling Considerations
     4.1.  SIP Subject Header Field
   5.  SDP Considerations
     5.1.  SDP Direction Attribute
   6.  SIP Event Package Considerations
     6.1.  SIP Event Package for Conference State
       6.1.1.  Extensions for PubSub
     6.2.  SIP Event Package for PublishSubscribe
       6.2.1.  Example
     6.3.  RTP Considerations
       6.3.1.  RTP Timestamp
       6.3.2.  Keep-alive and Heartbeat
       6.3.3.  Data Aggregation
     6.4.  RFC 4103 Considerations
       6.4.1.  Redundancy
       6.4.2.  Packet Loss Detection
       6.4.3.  RFC 9071 Considerations
       6.4.4.  Idle Period
     6.5.  Time Considerations
     6.6.  RTCP Considerations
     6.7.  RTP Mixing or Translating
       6.7.1.  Translating
       6.7.2.  Mixing
     6.8.  Single vs multiple RTP sessions
     6.9.  Sender Timestamp
     6.10. Data Mixing
   7.  RTP Considerations
     7.1.  RTP Payload Type (PT) for PubSub
     7.2.  RTP Header Marker Bit
     7.3.  RTP Extensions for PubSub
       7.3.1.  Simulcast and Resolution
       7.3.2.  Retransmission
   8.  RTCP Considerations
     8.1.  RTCP FB
   9.  Standardization Considerations
   10. Security Considerations
   11. IANA Considerations
   12. Normative References
   Acknowledgments
   Author's Address

1.  Introduction

Publish/Subscribe (Pub/Sub) is a communication pattern for asynchronous transport of messages between endpoints. Endpoints that create and send messages are referred to as publishers. Endpoints that receive and consume messages are referred to as subscribers. Publishers do not send messages directly to subscribers. Instead, publishers send messages to an intermediary, referred to as a broker. A publisher will associate the message with a topic, and the broker will forward the message to each subscriber that has subscribed to the topic.

As the messages are sent to, and forwarded by, the broker, publishers and subscribers do not need to be aware of each other. This enables flexible and scalable systems.

A topic typically describes the semantics of the data (e.g., "water-temperature-data") or identifies the publisher (e.g., "water-pump-123"). The structure and syntax of the topic depend on the Pub/Sub framework. Some Pub/Sub frameworks define tree-structured topics (e.g., "factory/temperature/sensor-123"), and allow topic wildcarding (e.g., "factory/temperature/*"). This document does not define topic syntax or structure. Topics are simply seen as token or string values.

Some Pub/Sub frameworks do not use brokers (broker-less Pub/Sub). Instead, the distribution of messages is realized using network features, e.g., IP multicast. Broker-less Pub/Sub is outside the scope of this document.

When a publisher publishes data it associates it with a topic.

.-----------.          .----------.  subscribe  .------------.
|           |  data    |          |<------------+            |
| Publisher +--------->+          |    data     | Subscriber |
|           |          |          +------------>|            |
'-----------'          |          |             '------------'
     ...               |  Broker  |                  ...
     ...               |          |                  ...
.-----------.          |          |  subscribe  .------------.
|           |  data    |          |<------------+            |
| Publisher +--------->+          |    data     | Subscriber |
|           |          |          +------------>|            |
'-----------'          '----------'             '------------'

            Figure 1: Publish/Subscribe Architecture

This document describes how a Session Initiation Protocol (SIP) [RFC3261] conference server [RFC4353] can be used to realize a Pub/Sub broker to distribute non-audiovisual data (e.g., IoT sensor readings). SIP agents are used to realize publishers and subscribers. This is referred to as a Pub/Sub conference.

The main advantage of the solution is the possibility to use existing SIP-based audiovisual conferencing infrastructure and protocols to realize a Pub/Sub solution.

The Real-time Transport Protocol (RTP) [RFC3550] is used to transport the data. Within this document the RTP payload for T.140 text conversation [RFC4103] is used to transport the data. The examples use Sensor Measurement Lists (SenML) [RFC8428] to encode the data. However, other payloads and encoding mechanisms can also be used.

NOTE: This document is based on the generic SIP conferencing procedures defined in [RFC4353] and [RFC4579].

NOTE: While RTP is a generic data transport protocol, the main usage has been for transport of audiovisual data (and real-time text) between human users. However, at the time of writing this document, there is work in IETF on RTP payloads also for transport of non-audiovisual data.

2.  Conventions and Definitions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

TO BE REMOVED BEGIN

Synchronization source (SSRC): The source of a stream of RTP packets, identified by a 32-bit numeric SSRC identifier carried in the RTP header so as not to be dependent upon the network address.
All packets from a synchronization source form part of the same timing and sequence number space, so a receiver groups packets by synchronization source for playback. Examples of synchronization sources include the sender of a stream of packets derived from a signal source such as a microphone or a camera, or an RTP mixer. A synchronization source may change its data format, e.g., audio encoding, over time. The SSRC identifier is a randomly chosen value meant to be globally unique within a particular RTP session. A participant need not use the same SSRC identifier for all the RTP sessions in a multimedia session; the binding of the SSRC identifiers is provided through RTCP. If a participant generates multiple streams in one RTP session, for example from separate video cameras, each MUST be identified as a different SSRC.

Contributing source (CSRC): A source of a stream of RTP packets that has contributed to the combined stream produced by an RTP mixer. The mixer inserts a list of the SSRC identifiers of the sources that contributed to the generation of a particular packet into the RTP header of that packet. This list is called the CSRC list. An example application is audio conferencing where a mixer indicates all the talkers whose speech was combined to produce the outgoing packet, allowing the receiver to indicate the current talker, even though all the audio packets contain the same SSRC identifier (that of the mixer).

Mixer: An intermediate system that receives RTP packets from one or more sources, possibly changes the data format, combines the packets in some manner and then forwards a new RTP packet. Since the timing among multiple input sources will not generally be synchronized, the mixer will make timing adjustments among the streams and generate its own timing for the combined stream. Thus, all data packets originating from a mixer will be identified as having the mixer as their synchronization source.

Translator: An intermediate system that forwards RTP packets with their synchronization source identifier intact. Examples of translators include devices that convert encodings without mixing, replicators from multicast to unicast, and application-level filters in firewalls.

TO BE REMOVED END

This document uses the SIP terminology defined in [RFC4353] and the RTP terminology defined in [RFC3550].

3.  Generic Conference Considerations

This section discusses generic (non-protocol specific) differences between audiovisual conferences and Pub/Sub conferences.

3.1.  Conference Creation

A conference server might host multiple audiovisual conferences that share the same conference name, as the name is not used to uniquely identify the conference. It is not uncommon that multiple audiovisual conferences share the same name, as the conference creators might not be aware of each other.

When a conference server is hosting Pub/Sub conferences, it is not practical to host multiple Pub/Sub conferences that share the same topic. Because of that, if a participant tries to create a Pub/Sub conference with a topic for which a Pub/Sub conference already exists, the conference server might choose to reject the conference creation request, redirect the participant to the existing conference, or simply add the endpoint to the existing conference.

3.1.1.  Conference Duration

The duration of an audiovisual conference may vary. While long-lived conferences might exist, the duration is typically measured in minutes or hours.

The interest in a PubSub topic might last for a very long time. Because of that, a PubSub conference associated with the topic might last for days, months, or indefinitely. While there might be times when there are no PubSub participants within a conference, the conference server might still keep the PubSub conference "alive", as new participants are expected to join in the near future.
One advantage of keeping the conference "alive" is that participants can use the same conference URI whenever they re-join the conference.

3.1.2.  Conference Termination

An AV conference is typically terminated once the last participant leaves the conference, when the conference creator leaves the conference, or at a pre-configured clock time.

Within a PubSub conference, participants might join the conference only when they want to send or receive data. In between they might leave the conference. Because of this there might be periods when there are no participants within the conference. However, participants might re-join later, and new participants might join the conference. Therefore, as long as it can be assumed that there is an interest in the Topic associated with the conference, the conference might be kept alive even if there are no participants.

3.1.3.  Sending and Receiving Data

Within an AV Conference, participants typically both send and receive media. This can also be the case in a PubSub Conference, if the Publish/Subscribe traffic pattern is used to realize bi-directional data exchange between conference participants. However, typically participants within a PubSub Conference will either send (Publish) or receive (Subscribe) data. For example, sensors will typically only publish data, while applications such as analytics will only subscribe to data.

Note that a PubSub Conference participant might publish data to one Topic, while subscribing to another Topic.

3.1.4.  Simultaneous Senders

Within an AV Conference typically only one participant sends speech audio at any given time.

Within a PubSub Conference, multiple Publishers might publish data simultaneously, as data is often published as soon as it becomes available. In addition, a Publisher typically has no idea when other Publishers are publishing data. A Publisher might not even be aware of the other Publishers (if any).
Because of this, the conference server might receive published data from multiple Publishers simultaneously. Depending on how quickly the conference server can forward the data to the subscribers it might have to buffer the data, which can cause delays.

3.1.5.  Number of Conference Participants

Within an AV Conference, the number of participants is relatively constant throughout the lifetime of the conference. The number of participants might also be known before the conference begins, e.g., based on the number of participants that have accepted an invitation to the conference.

Within a PubSub Conference, the number of participants might vary widely throughout the lifetime of the conference. A Publisher might join a conference only when it is publishing data, then leave the conference and re-join later. If a Subscriber is only interested in receiving data at specific times, it might also join the conference only for those times.

3.1.6.  Data Sending Frequency

In an AV Conference, audio and video data is typically sent constantly, even though there are ways to temporarily stop the sending of data (e.g., by turning off the camera or muting the microphone).

In a PubSub Conference, the data publishing frequency can vary widely. In some cases, a Publisher will publish data very frequently (with intervals measured in milliseconds). In other cases, a Publisher might publish data less frequently: once a minute, once an hour, once a day, etc.

3.1.7.  Data Synchronization

Within an AV Conference, if the conference server is mixing the audio from all participants, the conference server needs to be able to ensure that the audio packets that are mixed together have been generated at the same time. The conference server does not necessarily need to know the real clock value, only that the packets have been generated at the same time.

Within a PubSub Conference, the conference server will not mix data from different Publishers.
In some cases, for network optimization purposes, the conference server might forward data from multiple Publishers in a single packet towards the Subscribers.

4.  SIP Signalling Considerations

The discussions within this section are based on the procedures and concepts defined in [RFC4353] and [RFC4579].

4.1.  SIP Subject Header Field

The SIP Subject header field can be used to indicate the topic associated with the PubSub Conference.

When a new conference is created (using SIP signalling) the conference creator uses the Subject header field to indicate the Topic of the conference.

pubsub-conferences-info
   |
   |-- pubsub-conferences
         |-- pubsub-conference
         |     |-- conference-URI
         |     |-- topic
         |-- pubsub-conference
         .     |-- conference-URI
         .     |-- topic
         .

         Figure 2: SIP Subject header field for Topic

5.  SDP Considerations

5.1.  SDP Direction Attribute

The SDP direction attributes can be used to indicate the PubSub role of a participant.

A participant can use the SDP 'sendonly' attribute to indicate that it is acting as a Publisher.

A participant can use the SDP 'recvonly' attribute to indicate that it is acting as a Subscriber.

A participant can use the SDP 'sendrecv' attribute to indicate that it is acting as both a Publisher and a Subscriber.

A participant can use the SDP 'inactive' attribute to indicate that it is acting as neither a Publisher nor a Subscriber. A participant can use the attribute, e.g., to temporarily indicate to the conference server that it does not want to receive data associated with the conference, and that it will not send data associated with the conference.

6.  SIP Event Package Considerations

This section describes extensions for the SIP Event Package for Conference State [RFC4575] that are useful for a PubSub Conference.
In addition, this section describes a new SIP Event Package, SIP Event Package for PublishSubscribe, that can be used to inform participants about the PubSub Conferences hosted by a conference server, e.g., the Topic and conference-uri associated with each conference.

6.1.  SIP Event Package for Conference State

[RFC4575] defines a SIP Event Package [RFC3265] for Conference State. A conference participant can subscribe to the event package, and retrieve conference state information, including information about the conference itself, and information about other conference participants.

For a PubSub conference, the "subject" child element of the "conference-description" element can be used to indicate the Topic associated with the conference.

6.1.1.  Extensions for PubSub

6.1.1.1.  Maximum Number of Conference Participants

The "maximum-user-count" element is used to indicate the maximum number of conference participants. For an AudioVisual conference, where participants typically both send and receive media, a single element will be enough. However, in a PubSub conference, a majority of the conference participants might be either subscribers or publishers. There might be a large variation in how many publishers and how many subscribers a conference server is able to handle. Therefore it could be useful to have separate elements to indicate that, e.g., "maximum-user-count-publisher" and "maximum-user-count-subscriber".

6.2.  SIP Event Package for PublishSubscribe

While the SIP Event Package for Conference State provides information and state information for a given conference, it does not provide information about other conferences that are hosted by the conference server.

This section suggests a new SIP Event Package, SIP Event Package for PublishSubscribe. The event package is not associated with a specific PubSub conference, but provides information about the PubSub conferences hosted by the conference server.
For each PubSub conference hosted by the conference server, the event package contains the conference URI and the associated topic. It might also contain additional information about each PubSub conference. In addition, if the topic is associated with some namespace or dictionary, there might be information about that.

pubsub-conferences-info
   |
   |-- pubsub-conferences
         |-- pubsub-conference
         |     |-- conference-URI
         |     |-- topic
         |-- pubsub-conference
         .     |-- conference-URI
         .     |-- topic
         .

         Figure 3: SIP Event Package for PublishSubscribe

6.2.1.  Example

[Example body; surviving topic values: "water temperature", "air temperature"]

NOTE: The conference-uri is defined as a pubsub-conference element attribute. As an option, it could be defined as a separate element.

NOTE: As an option, the topic could also be defined as a pubsub-conference element attribute.

         Figure 4: Example: SIP Event Package for PublishSubscribe

6.3.  RTP Considerations

6.3.1.  RTP Timestamp

The RTP Timestamp, together with the RTCP SR (Sender Report), can be used by Conference servers and participants to synchronize audio and video received from multiple participants. Note that the RTCP SR messages might be terminated by the Conference server.

Google RTP extension

6.3.1.1.  Payload

In this case, the sampling timestamp is carried in the payload data, instead of the RTP/RTCP packets. The disadvantage of this mechanism is that one needs to ensure that the data payload format always supports the transport of the sampling timestamp.

6.3.2.  Keep-alive and Heartbeat

In a non-AV PubSub Conference, Publishers might stay within a PubSub conference for long periods without publishing any data. Subscribers will not publish any data at all. There are a couple of possible implications that must be considered: NAT binding keep-alives and heartbeats (i.e., informing other PubSub Participants that a PubSub Participant is still 'alive').

NAT binding keep-alives are needed in cases where a PubSub Participant needs to send periodic data in order to maintain NAT bindings between itself and the PubSub Conference server. This is important for Subscribers, but also for Publishers that publish data with very long intervals.

[RFC6263] describes different RTP/RTCP mechanisms to send NAT keep-alives.

The 'Empty (0-Byte) Transport Packet' mechanism does not use RTP/RTCP. Instead, a participant will send an empty transport packet (e.g., a UDP packet). Note that this mechanism is useful mainly for NAT traversal purposes. The conference server application will typically not be informed about these packets, and will not forward the packets to other conference participants. This mechanism is applicable to non-AV data in PubSub Conferences.

The 'RTP Packet with Comfort Noise Payload' mechanism uses a specific RTP payload format for comfort noise [RFC3389]. The payload format has been defined for audio data, and must be supported by both senders and receivers. It is not applicable to non-AV data in PubSub conferences.

The 'RTCP Packets Multiplexed with RTP Packets' mechanism uses RTP/RTCP multiplexing, where the same 5-tuple is used for the RTP and RTCP packets. When there is no data to be sent, an RTCP packet can be sent as a keep-alive. Note that Subscribers that do not send RTP packets can still send RTCP packets.

The 'RTP Packet with Incorrect Version Number' and 'RTP Packet with Unknown Payload Type' mechanisms use an RTP packet with an invalid RTP version number and an RTP packet with a non-negotiated payload type, respectively. Receivers are expected to ignore and discard these types of RTP packets. These mechanisms are applicable to non-AV data in PubSub Conferences.

In an AV conference, most participants will typically both send and receive data. However, in a PubSub Conference many of the PubSub Participants will only send data (Publisher) or receive data (Subscriber).
However, even though a Subscriber does not publish any data, it might still use the mechanisms above if needed, and a PubSub Conference Server needs to be prepared to receive such RTP/RTCP packets from both Publishers and Subscribers.

NOTE: One of the ideas behind the Publish/Subscribe traffic pattern is that Publishers and Subscribers do not need to be aware of each other. In such cases there is no need to use an end-to-end heartbeat mechanism between Publishers and Subscribers. Note that there might still be a heartbeat mechanism used between Publishers/Subscribers and the Broker. However, there might be cases where Publishers and Subscribers are tightly coupled, and where a heartbeat mechanism is required.

NOTE: Some data formats might define their own methods for sending heartbeats. For example, there might be a way to indicate that the payload is only used for heartbeat purposes, and does not contain any additional data.

6.3.3.  Data Aggregation

NOTE: SenML supports aggregation.

Multi

6.4.  RFC 4103 Considerations

https://datatracker.ietf.org/doc/html/rfc4103

[RFC4103] defines an RTP payload format ('text/t140') to transport T.140 real-time text. Within this document, it is assumed that the t140 payload format is used in the RTP packets to transport the published data. The reason for this is the possibility to re-use existing conference servers that support the payload format to realize the Broker.

6.4.1.  Redundancy

https://datatracker.ietf.org/doc/html/rfc2198

By default, when no other redundancy mechanism is supported, the usage of the redundancy mechanism defined in [RFC2198] is required, unless the network conditions can guarantee that all text will always be delivered from the sender to the receiver. Additional mechanisms, e.g., Forward Error Correction (FEC) [RFC2733], might also be used.

In a Publish/Subscribe network, the data delivery requirements will determine whether redundancy or error correction mechanisms are needed.
In some cases, data loss might be tolerated, while in other cases there might be requirements that all published data reaches each Subscriber that has subscribed to the Topic of the data.

6.4.2.  Packet Loss Detection

t140 packets are only sent when there is new text to send. Because the time between sent packets may therefore vary, the Timestamp cannot be used by a receiver to detect packet loss. Instead, the Sequence Number (SN) is used to detect packet loss.

Note that, if there is no new text to send, an RTP packet that only contains redundant data might be sent. This can be useful, e.g., to maintain NAT bindings, and as a generic heartbeat mechanism to indicate that the sender is still alive.

6.4.3.  RFC 9071 Considerations

[RFC9071] provides guidance on mixing of real-time text (RTT) by a conference server. This section discusses considerations to take into account when T.140 is used to transport Publish/Subscribe data. Note that some of the considerations have also been addressed from a generic conference perspective.

6.4.3.1.  Simultaneous Senders

As described in Section 3.1 of [RFC9071], in RTT conferences typically only one participant writes and sends text (at least long pieces of text) at any given time.

Within a Publish/Subscribe network, multiple Publishers might publish data simultaneously, as data might be published as soon as it has been sampled, and as Publishers are not aware of when other Publishers are publishing data. Because of this, the conference server might receive published data from multiple Publishers simultaneously. If the conference server is not able to simultaneously forward all published data, it will have to buffer the data.

6.4.4.  Idle Period

[RFC9071] gives guidance on how to process real-time text in a conference server.

6.4.4.1.  Mode

Section 1.2 of [RFC9071] describes different "modes".

6.4.4.2.  Sending Frequency

Within a Publish/Subscribe network, the data publishing interval can vary widely, depending on the use-case.
In some constrained environments, a Publisher may publish data, e.g., once a day, while in an industrial environment a Publisher may publish data all the time, with a very short publishing interval.

Received real-time text is often read by humans. Because of that, it is important that text that was sent simultaneously by different senders is also received simultaneously by the receivers.

RTP-mixer-based method for multiparty-aware endpoints:

[RFC9071] makes assumptions regarding how participants are sending text. It assumes that in a typical scenario only one participant will send text at any given time. In a PubSub Conference, the same assumption cannot be made, as multiple publishers might simultaneously publish data to the same topic. In addition, unless a publisher also acts as a subscriber, it does not even know when and if other publishers are publishing data.

[RFC9071] focuses on two mixing solutions: 'The RTP-Mixer-Based Solution for Multiparty-Aware Endpoints' (Section 2.2) and 'Mixing for Multiparty-Unaware Endpoints' (Section 2.3).

In 'The RTP-Mixer-Based Solution for Multiparty-Aware Endpoints', the receivers will receive the text from all senders within a single RTP stream from the conference server.

6.5.  Time Considerations

In a Publish/Subscribe network, as publishers publish data independently from each other, there is typically no need for subscribers to synchronize or "lip-sync" the data.

6.6.  RTCP Considerations

The RTP Control Protocol (RTCP) is used in conjunction with RTP. While RTP carries the actual data, RTCP carries information (e.g., statistics, control information) within an RTP session.

NOTE: As RTCP messages sent by a Publisher might be terminated by a conference server (performing data mixing), essential information might not reach the Subscribers. For example, an RTCP Sender Report (SR) that provides the mapping between the absolute time and the RTP Timestamp might not reach the Subscribers.
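
As an illustration of what is lost when an SR does not reach a Subscriber, the following is a minimal sketch of how a receiver could use the SR timing fields to map an RTP Timestamp to absolute time. The class and function names are hypothetical and not defined by this document. The clock rate of 1000 Hz is the rate used by the text/t140 payload format, and the NTP time is assumed to have already been converted to seconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SenderReport:
    """Timing fields of an RTCP SR (hypothetical container for this sketch)."""
    ntp_seconds: float   # absolute time of the report, already in seconds
    rtp_timestamp: int   # RTP Timestamp that corresponds to ntp_seconds


def rtp_to_wallclock(sr: SenderReport, rtp_timestamp: int, clock_rate: int) -> float:
    """Map an RTP Timestamp to absolute time using the most recent SR."""
    # Compute a signed 32-bit difference so that timestamp wrap-around
    # and packets sampled slightly before the SR are both handled.
    delta = (rtp_timestamp - sr.rtp_timestamp) & 0xFFFFFFFF
    if delta >= 0x80000000:
        delta -= 0x100000000
    return sr.ntp_seconds + delta / clock_rate


T140_CLOCK_RATE = 1000  # Hz, as used by the text/t140 payload format

sr = SenderReport(ntp_seconds=1000.0, rtp_timestamp=5000)
sampled_at = rtp_to_wallclock(sr, 7500, T140_CLOCK_RATE)
```

Without the SR, a Subscriber only has the relative RTP Timestamp and cannot perform this conversion.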

Note that if there is a large number of participants within a PubSub Conference, and if there is a need to send RTCP messages frequently, the RTCP messages might consume a large portion of network and conference server capacity.

6.7.  RTP Mixing or Translating

In an AV conference, the actual time when the AV data was created is typically not that important to the receiver. It is more important that the conference server is able to mix and lip-sync AV data that has been created at the same time by the senders.

6.7.1.  Translating

By default, when a Broker receives published data to a Topic, it forwards the data to each Subscriber that has subscribed to the data, without any processing of the data.

Aggregation: instead of forwarding each piece of published data directly to a Subscriber, the PubSub Conference server might choose to store the published data in an aggregated manner, and forward all stored data in a single RTP packet towards the Subscriber based on different policies, e.g., depen

NOTE: The mechanisms used by a PubSub Conference server to determine the constraints (network bandwidth, etc.) of a Subscriber are outside the scope of this document.

In case of translation, the original SSRC and the Timestamp will not be replaced by the translator.

6.7.2.  Mixing

As a mixer is considered a source by itself, it will often terminate received RTCP packets. Because of this, subscribers might not receive the RTCP SR packets that contain the mapping between the RTP Timestamp and the real clock.

In an RTP packet, the Timestamp value typically indicates when the payload data has been sampled. The exact details depend on the media type and payload format.

In case of mixing, the conference server will insert its own SSRC and Timestamp in the outgoing RTP packets with the mixed media. While the CSRC field can provide the SSRCs of the RTP packets used to create the mix, the Timestamp values will be lost.
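
The effect described above can be illustrated with a minimal sketch. The RtpPacket type and the mix() function are hypothetical names used only for this illustration, and concatenating the payloads is a deliberate simplification that stands in for whatever combining a real mixer performs.

```python
from dataclasses import dataclass
from typing import List, Tuple

MAX_CSRC = 15  # the CSRC count field in the RTP header is 4 bits wide


@dataclass(frozen=True)
class RtpPacket:
    """Simplified RTP packet model (hypothetical, for illustration only)."""
    ssrc: int
    timestamp: int
    payload: bytes
    csrcs: Tuple[int, ...] = ()


def mix(packets: List[RtpPacket], mixer_ssrc: int, mixer_timestamp: int) -> RtpPacket:
    """Combine packets the way a mixer would label the outgoing packet."""
    csrcs: List[int] = []
    for packet in packets:
        if packet.ssrc not in csrcs:
            csrcs.append(packet.ssrc)  # each source SSRC becomes a CSRC
    if len(csrcs) > MAX_CSRC:
        raise ValueError("too many contributing sources for one RTP packet")
    # Simplification: payloads are concatenated instead of truly mixed.
    payload = b"".join(packet.payload for packet in packets)
    # The outgoing packet carries only the mixer's SSRC and Timestamp;
    # there is no header field for the per-source Timestamp values.
    return RtpPacket(mixer_ssrc, mixer_timestamp, payload, tuple(csrcs))


from_a = RtpPacket(ssrc=0x1111, timestamp=100, payload=b"A")
from_b = RtpPacket(ssrc=0x2222, timestamp=900, payload=b"B")
mixed = mix([from_a, from_b], mixer_ssrc=0x9999, mixer_timestamp=5000)
```

In this sketch, mixed.csrcs still identifies both sources, but the Timestamp values 100 and 900 cannot be recovered from the mixed packet.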
In a Publish/Subscribe scenario, if the Subscribers need to know when data has been published, they cannot rely on getting that information from RTP.

POTENTIAL STANDARDIZATION WORK: In addition to contributing SSRCs, also include contributing Timestamps.

POTENTIAL STANDARDIZATION WORK: Add data sampling timestamp to RTP

6.8.  Single vs multiple RTP sessions

An RTP Stream is identified by the data source (SSRC).

Within a PubSub Conference, the number of publishers might be very large. In addition, the number of publishers might vary quite frequently, as publishers join and leave the PubSub Conference. For that reason, it is not convenient for a subscriber to negotiate a separate RTP session for each publisher with the conference server. In addition, one of the ideas behind the Publish/Subscribe pattern is that a Subscriber does not need to know, or be impacted by, the number of Publishers.

In the case of real-time text, receivers need to be able to identify the sender of each RTP packet, so that the text from all senders is not mixed together. In some PubSub Conferences, Subscribers might not need to know which Publisher has published a specific set of data. However, the SSRC

6.9.  Sender Timestamp

RTP does not provide a mechanism to indicate the absolute (wall-clock) time when a packet is sent, or when the RTP payload was sampled.

The following experimental RTP header extensions can be used to include such absolute time information in an RTP packet:

https://webrtc.googlesource.com/src/+/refs/heads/main/docs/native-code/rtp-hdrext/abs-send-time

https://webrtc.googlesource.com/src/+/refs/heads/main/docs/native-code/rtp-hdrext/abs-capture-time/

NOTE: In addition to the RTP SSRC value, the data format used in the RTP packet payload might have a source indicator that tells Subscribers

6.10.  Data Mixing

The way a conference server processes data depends on the media type. In an AV conference, the audio ... For example, all incoming audio data is typically mixed together, so that everyone can hear everyone else.
In case of video, the conference server typically forwards the video streams of the participants currently speaking.

In the case of non-AV data, the default behavior within a PubSub Conference is to forward the data from a PubSub Publisher to each PubSub Subscriber, without performing any mixing or selection of what data is forwarded.

   .-----------.          .----------.             .------------.
   |           |  data X  |          |             |            |
   | Publisher +--------->+          |   data X    | Subscriber |
   |           |          |          +------------>|            |
   '-----------'          |          |             '------------'
                          |  Broker  |                  ...
                          |          |                  ...
                          |          |             .------------.
                          |          |             |            |
                          |          |   data X    | Subscriber |
                          |          +------------>|            |
                          '----------'             '------------'

           Figure 5: PubSub Conference Data Forwarding

As an optimization, if the Broker receives data from multiple Publishers, it may forward data from multiple Publishers in a single RTP payload towards the Subscribers. For example, if the Broker receives a SenML Record [RFC8428] from Publisher A and from Publisher B at the same time, it might choose to place the SenML Records in a SenML Pack and forward the Pack.

   SenML Record from Publisher A:

   [
     {"n":"urn:dev:ow:10e2073a01080063","u":"Cel","v":23.1}
   ]

   SenML Record from Publisher B:

   [
     {"n":"urn:dev:ow:F0ea673a01000036","u":"Cel","v":25.7}
   ]

   SenML Pack forwarded by the Broker towards the Subscribers:

   [
     {"n":"urn:dev:ow:10e2073a01080063","u":"Cel","v":23.1},
     {"n":"urn:dev:ow:F0ea673a01000036","u":"Cel","v":25.7}
   ]

              Figure 6: SenML Pack created by broker

7. RTP Considerations

7.1. RTP Payload Type (PT) for PubSub

7.2. RTP Header Marker Bit

This document does not define usage of the RTP header Marker bit.

7.3. RTP Extensions for PubSub

A large number of extensions have been specified for RTP. Many of the extensions have been specified with an audiovisual use case in mind. However, many of them can also be applied when RTP is used to transport non-audiovisual data.
While an extensive study of the usage of RTP extensions for transport of non-audiovisual data is outside the scope of this document, this section describes a few extensions that have been studied in the work that led to this document.

7.3.1. Simulcast and Resolution

While it is quite clear what "resolution" means for audio and video, there is no unique definition for non-AV data.

For example, resolution can refer to the number of digits that are included when the payload contains numeric values.

For example, resolution can refer to how often the publisher is sending data. The more frequent, the higher the resolution.

For example, resolution can refer to the amount of data (e.g., metadata) that a publisher is sending.

A publisher can publish the same data using different "resolutions".

7.3.2. Retransmission

The RTP header carries both a sequence number and a timestamp to allow a receiver to distinguish between lost packets and periods of time when no data was transmitted.

By default, RTP provides unreliable transport. The same applies to PubSub frameworks that use UDP as the transport protocol. However, some PubSub frameworks use TCP transport, or use other mechanisms in order to provide reliable data delivery.

There are RTP extensions that have been used to provide reliable data delivery, or to simply inform the sender that data has been lost.

XXX specifies an RTP extension where RTP packets are re-transmitted by default.

7.3.2.1. Redundant Audio Data

[RFC2198] defines an RTP payload format for encoding redundant audio data. If a data packet is lost, it might be possible to reconstruct the information of the lost packet from the redundant data that is included in the subsequent packets.

The mechanism can also be used for non-AV data. However, in case of time-critical data, where the lost data would be considered "expired" if it arrived as redundant data in a subsequent packet, the mechanism might not be useful (other than for purposes such as logging).

7.3.2.2. Forward Error Correction (FEC)

[RFC2733] defines an RTP payload format for generic FEC. Multiple data packets are used to create a single FEC packet. The FEC payload contains information about which data packets have been used to create the FEC packet.

8. RTCP Considerations

8.1. RTCP FB

The RTCP Feedback (FB) [RFC4585]

9. Standardization Considerations

This document does not formally standardize any new protocol extensions, SIP event packages, etc. Each extension, event package, etc. described in the document would need to be standardized following the normal standardization procedures.

The protocol extensions and event packages described in this document are collated and listed below.

SIP Event Package for Conference State (Section XXX): Extension to the event package for providing separate information about publishers and subscribers.

SIP Event Package for PublishSubscribe (Section XXX): New SIP event package for providing information about PubSub Conferences (conference URI and topic) hosted by a Conference Server.

10. Security Considerations

The Security Considerations for SIP Conferencing apply to this document.

11. IANA Considerations

This document has no IANA actions.

12. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

[RFC2198] Perkins, C., Kouvelas, I., Hodson, O., Hardman, V., Handley, M., Bolot, J.C., Vega-Garcia, A., and S. Fosse-Parisis, "RTP Payload for Redundant Audio Data", RFC 2198, DOI 10.17487/RFC2198, September 1997.

[RFC2733] Rosenberg, J. and H. Schulzrinne, "An RTP Payload Format for Generic Forward Error Correction", RFC 2733, DOI 10.17487/RFC2733, December 1999.

[RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, DOI 10.17487/RFC3261, June 2002.

[RFC3264] Rosenberg, J. and H.
Schulzrinne, "An Offer/Answer Model with Session Description Protocol (SDP)", RFC 3264, DOI 10.17487/RFC3264, June 2002.

[RFC3265] Roach, A. B., "Session Initiation Protocol (SIP)-Specific Event Notification", RFC 3265, DOI 10.17487/RFC3265, June 2002.

[RFC3389] Zopf, R., "Real-time Transport Protocol (RTP) Payload for Comfort Noise (CN)", RFC 3389, DOI 10.17487/RFC3389, September 2002.

[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550, July 2003.

[RFC4103] Hellstrom, G. and P. Jones, "RTP Payload for Text Conversation", RFC 4103, DOI 10.17487/RFC4103, June 2005.

[RFC4353] Rosenberg, J., "A Framework for Conferencing with the Session Initiation Protocol (SIP)", RFC 4353, DOI 10.17487/RFC4353, February 2006.

[RFC4575] Rosenberg, J., Schulzrinne, H., and O. Levin, Ed., "A Session Initiation Protocol (SIP) Event Package for Conference State", RFC 4575, DOI 10.17487/RFC4575, August 2006.

[RFC4579] Johnston, A. and O. Levin, "Session Initiation Protocol (SIP) Call Control - Conferencing for User Agents", BCP 119, RFC 4579, DOI 10.17487/RFC4579, August 2006.

[RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey, "Extended RTP Profile for Real-time Transport Control Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, DOI 10.17487/RFC4585, July 2006.

[RFC6263] Marjou, X. and A. Sollaud, "Application Mechanism for Keeping Alive the NAT Mappings Associated with RTP / RTP Control Protocol (RTCP) Flows", RFC 6263, DOI 10.17487/RFC6263, June 2011.

[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017.

[RFC8428] Jennings, C., Shelby, Z., Arkko, J., Keranen, A., and C. Bormann, "Sensor Measurement Lists (SenML)", RFC 8428, DOI 10.17487/RFC8428, August 2018.
[RFC9071] Hellström, G., "RTP-Mixer Formatting of Multiparty Real-Time Text", RFC 9071, DOI 10.17487/RFC9071, July 2021.

Acknowledgments

This document is based on the Master's thesis of Trung Van.

Author's Address

Christer Holmberg
Ericsson
Email: christer.holmberg@ericsson.com