protocol.pod

=head1 NAME

DXSpiderWeb Orthogonal Communications Protocol

=head1 SYNOPSIS

 <Origin>,<TimeSeq>,<Hop>,<FrmUser>,<To>,<ToUser>|<Tag>,<Data>...

=head1 ABSTRACT

For many years DX Clusters have used a protocol which was designed
for a non-looped tree of nodes. Such a tree has probably never been
reliably achieved in practice; certainly not recently. This document
describes a complete replacement for that protocol. It allows a
fully looped network, is inherently extensible and should be simple
to implement (especially in Perl).

All implementations of this protocol shall B<only> use this protocol
for inter-node communications.

=head1 DESCRIPTION

This protocol is encoded in UTF-8 with HTTP-style escaping. It is
designed to be an extensible basis for any type of one-to-many,
"instant", line-based communications task.
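
This document does not list which characters must be escaped. The
following sketch assumes that "HTTP-style escaping" means %XX
percent-encoding (as used in URLs) of the UTF-8 bytes of any character
that would otherwise break the framing: '%', ',', '|' and the control
characters, which include CR and LF. The helpers escape_field and
unescape_field are hypothetical names, not part of any existing library.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# ASSUMPTION: "HTTP style escaping" is taken to mean %XX percent-encoding
# of the UTF-8 bytes of '%', ',', '|' and control characters (incl. CR/LF).
# escape_field/unescape_field are hypothetical helper names.
sub escape_field {
    my ($text) = @_;
    my $bytes = encode('UTF-8', $text);    # characters -> UTF-8 octets
    $bytes =~ s/([%,|\x00-\x1f\x7f])/sprintf('%%%02X', ord $1)/ge;
    return $bytes;
}

sub unescape_field {
    my ($bytes) = @_;
    $bytes =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;    # %XX -> octet
    return decode('UTF-8', $bytes);                 # octets -> characters
}
```

Escaping ',' and '|' in this way means a receiver can always split a
line on the first '|' and on every ','. Under this assumption all other
characters, including non-ASCII ones, pass through as raw UTF-8.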

This protocol is designed to be flood-routed in a meshed network as
efficiently as possible.

Each message consists of a L<Routing Section> and a L<Command Section>,
separated by the '|' character.
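
As an illustration only, a line can be split into its two sections at
the first '|', and each section then split on ','. The sketch assumes
that any literal '|' or ',' inside a field has already been escaped, so
neither separator can appear within a field. split_message is a
hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: split one received line into the Routing Section
# fields and the Command Section (tag plus data fields).
# Returns an empty list if there is no '|' separator.
sub split_message {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;                            # one line is one message
    my ($routing, $command) = split /\|/, $line, 2;  # split at the first '|'
    return unless defined $command;
    my @route = split /,/, $routing, -1;             # -1 keeps empty fields
    my ($tag, @data) = split /,/, $command, -1;
    return (\@route, $tag, \@data);
}

my ($route, $tag, $data) = split_message("GB7TLH,0C0E4D0000,0|ANN,hello\n");
print "origin=$route->[0] hop=$route->[2] tag=$tag data=@$data\n";
# prints: origin=GB7TLH hop=0 tag=ANN data=hello
```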

Most of this document is concerned with the L<Routing Section>; however,
some L<Standard Commands>, which all implementations should issue and
must accept, are also described.

=head2 Routing Section

The application that implements this protocol is essentially a
line-oriented message router. One line equals one message, and each
line is effectively a datagram.

It is assumed that nodes are connected to each other using a
"reliable" streaming protocol such as TCP/IP or AX25. That said, where
the context allows, elements of the protocol could be multicast or
broadcast, either "as is" or wrapped in some other framing protocol.

Because this is an unreliable, best-effort, "please route my packets
through your node" protocol, there is no guarantee that a message
will get to the other side of a mesh of nodes. There may be a
discontinuity caused either by an outage or by deliberate filtering.

However, it is envisaged that most messages will be flood-routed or,
in the case of directed messages (those that have a L<To> or
L<ToUser> field), sent down all interfaces showing a route in that
direction, so it is unlikely that messages will be lost in practice.

=head3 Field Description

Only the first three fields of the L<Routing Section> are compulsory.
A message carrying only these fields is a broadcast from the
L<Origin> to all nodes. If the message needs to be identified as
coming from a user on a node, then the L<FrmUser> field is added.

Adding a L<To> and/or L<ToUser> field will restrict the destinations
or recipients that receive this message.

The L<Hop> field is incremented on receipt of a message on a node.

Fields are separated by the comma ',' character, and the last field
present is followed by the vertical bar '|' character.

If trailing fields are left out then their superfluous commas can also
be left out. If an intervening field is empty, its separating comma
must still be present, but nothing (not even a space) needs to be
placed between the commas; so a message with a L<To> field but no
L<FrmUser> field has two adjacent commas after the L<Hop>.

The characters allowed in the routing section are restricted. Any
invalid character in any field will cause the whole message to be
silently dropped.
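
The field layout rules can be sketched as follows. routing_section and
parse_routing are hypothetical helpers; they treat an empty field as
absent. Character-set checking, whose failure leads to a silent drop,
is deliberately left out here for brevity.

```perl
use strict;
use warnings;

# Positional order of the Routing Section fields.
my @FIELDS = qw(Origin TimeSeq Hop FrmUser To ToUser);

# Hypothetical helper: join the fields in order. A missing intervening
# field leaves an empty slot between two commas; trailing empty fields
# (after the three compulsory ones) are dropped along with their commas.
sub routing_section {
    my (%f) = @_;
    my @vals = map { defined $f{$_} ? $f{$_} : '' } @FIELDS;
    pop @vals while @vals > 3 && $vals[-1] eq '';
    return join(',', @vals) . '|';
}

# Hypothetical inverse: map positional values back to field names and
# treat empty fields as absent. Returns undef for a wrong field count.
sub parse_routing {
    my ($routing) = @_;
    my @vals = split /,/, $routing, -1;
    return if @vals < 3 || @vals > @FIELDS;
    my %f;
    @f{ @FIELDS[0 .. $#vals] } = @vals;
    delete $f{$_} for grep { $f{$_} eq '' } keys %f;
    return \%f;
}

my $line = routing_section(Origin => 'GB7TLH', TimeSeq => '0C0E4D0000',
                           Hop => 0, To => 'GB7DJK');
print "$line\n";    # prints: GB7TLH,0C0E4D0000,0,,GB7DJK|
```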

More detailed descriptions of the fields follow:

=over

=item Origin

This is a compulsory field. It is the name of the originating node.
The field can contain up to 12 characters in the set [-A-Z0-9_] in
any order. Higher layers may restrict this further.

The field must not be changed by any other node.

=item TimeSeq

This is a compulsory field. It is a 10-digit hexadecimal string made
up of a 6 hex digit date portion, which encodes the day of the month
(1-31) and the seconds within that day (0-86399), concatenated with
a 4 hex digit sequence number (0-65535).

The date portion is constructed as:

  my $t    = time;                                    # sample the clock once
  my $date = ((gmtime($t))[3] << 18) | ($t % 86400);  # day << 18 | seconds in day

Because 86399 fits in 17 bits, the day and the seconds never overlap,
and the largest possible value, (31 E<lt>E<lt> 18) | 86399 = 0x7D517F,
still fits in 6 hex digits.

The sequence number is simply an unsigned short (16-bit) number
starting at 0.

Each message originated at this node increments the sequence
number.

=item Hop

This is a compulsory field. It is the number of hops from the
originating node. It is incremented immediately on receipt, before
its value is examined.

So the originating node sends a message with a L<Hop> of 0, and the
neighbouring nodes must increment this field before passing
it on to higher layers for onward processing.

Implementations may have an upper limit for this field and may
silently drop incoming messages with a L<Hop> count greater than that
limit.

=item FrmUser

This field is optional. It is the identifier of the originating
user. If it is missing, the message is assumed to come from the
originating node itself.

It can consist of up to 12 characters in the set [-A-Z0-9_]
in any order. Higher layers may restrict this further.

=item To

This field is optional. It is a string of up to 12 characters
in the set [-A-Z0-9_] in any order.

This field is used either to indicate a particular destination node
or to differentiate this broadcast in some way by making the
message a member of a L<Channel>. Any message can be sent
down any L<Channel>. The names of L<Channel>s and their usage
are entirely up to the implementor.

It is assumed that node names can be distinguished from user
names and L<Channel> names.

If the field is set to a particular destination node, the message will
be routed (rather than broadcast) to that node. However, any
intervening nodes are free to duplicate the message and send
it down more than one likely-looking interface, depending on any
network policies that may apply.

=item ToUser

This field is optional. It is a string of up to 12 characters
in the set [-A-Z0-9_] in any order. Higher layers may restrict
this further.

Conventionally this field is used to indicate the user to whom
this message is directed. Ideally the L<To> field
will be set, by the originating node, to the identifier of the node
on which this user resides.

If the L<To> field is not set then this message will be
broadcast. However, should the user's node become apparent en route,
nodes are free to fill in the L<To> field and proceed
with a more directed approach.

If it becomes apparent en route that there may be more than
one possible L<To> destination for a L<ToUser>, then a node
may duplicate the message (keeping the same L<TimeSeq>) and
route it onwards. Because of the L<DeDuplication> inherent in
the system, it is indeterminate which destination will
receive the message: all, or just some, of the
destinations may receive it. The tuple (L<Origin>,
L<TimeSeq>) determines uniqueness.

Where L<To> is set to the name of a node, this field can
instead be set to a L<Channel>. In that case the message
is sent to that L<Channel> on the L<To> node only.

=back
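
The following sketch puts the field rules above together: it builds
and decodes a L<TimeSeq> using the date formula given earlier, and
checks a received routing section before incrementing L<Hop>. The
function names, the hop limit of 32 and the choice of upper-case hex
digits are all assumptions for illustration; this document fixes none
of them.

```perl
use strict;
use warnings;

# Illustrative sketch only: next_timeseq, decode_timeseq, accept_routing,
# $MAX_HOP and upper-case hex output are assumptions, not part of the spec.
my $seq     = 0;                          # per-node 16-bit sequence number
my $MAX_HOP = 32;                         # ASSUMPTION: arbitrary local hop limit
my $NAME    = qr/\A[-A-Z0-9_]{1,12}\z/;   # Origin, FrmUser, To, ToUser

# Build a TimeSeq: 6 hex digits of (day of month << 18 | seconds in day)
# followed by 4 hex digits of sequence number. $t defaults to "now".
sub next_timeseq {
    my ($t) = @_;
    $t = time unless defined $t;
    my $date = ((gmtime($t))[3] << 18) | ($t % 86400);
    my $ts   = sprintf '%06X%04X', $date, $seq;
    $seq = ($seq + 1) & 0xFFFF;           # wraps like an unsigned short
    return $ts;
}

# Split a TimeSeq back into (day of month, seconds in day, sequence).
sub decode_timeseq {
    my ($ts) = @_;
    return unless $ts =~ /\A([0-9A-Fa-f]{6})([0-9A-Fa-f]{4})\z/;
    my $date = hex $1;
    return ($date >> 18, $date & 0x3FFFF, hex $2);
}

# Check the routing fields of a received message (a hash ref), then
# increment Hop. A false return means: drop the message silently.
sub accept_routing {
    my ($f) = @_;
    return 0 unless defined $f->{Origin}  && $f->{Origin}  =~ $NAME;
    return 0 unless defined $f->{TimeSeq} && $f->{TimeSeq} =~ /\A[0-9A-Fa-f]{10}\z/;
    return 0 unless defined $f->{Hop}     && $f->{Hop}     =~ /\A[0-9]+\z/;
    for my $opt (qw(FrmUser To ToUser)) {
        return 0 if defined $f->{$opt} && $f->{$opt} !~ $NAME;
    }
    $f->{Hop}++;                            # incremented immediately on receipt
    return $f->{Hop} <= $MAX_HOP ? 1 : 0;   # over the limit: drop
}
```

next_timeseq samples the clock once, so the day and the seconds always
come from the same instant, even across midnight.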

=head3 Channel

Channels are a concept very similar to channels on IRC: a
way of segregating data flows in a network. In principle, subject
to local policy or application requirements, any data (or
L<Command Section>) can be sent down any channel.

It is up to the implementation whether to use this feature or not.

=head3 Routing

It is assumed that nodes will be connected in a looped network with
more than one route available (in many cases) to another node.

In any case, most traffic is not directed but broadcast to all users
on all nodes.

Each message is uniquely identified by the (L<Origin>,L<TimeSeq>)
tuple. The basic system learns which interfaces can see which nodes
by looking at this tuple and combining it with the L<Hop> count.
For each L<Origin> that arrives on an interface, that interface
remembers the latest L<TimeSeq> together with the lowest L<Hop>
seen. It also remembers the number of messages from that L<Origin>
that have been received on that interface.

Any message for onward broadcast is duplicated and sent out on all
interfaces other than the one it came in on.

Any message that is directed to a particular node will be sent out on
the "best" interface based on the routing information gathered so far.
If there is more than one possible route then, depending on network or
local policy, the message may be duplicated and sent on other
interfaces as well.
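
A minimal sketch of the route learning and interface selection
described above, assuming interfaces are identified by name strings.
learn_route, broadcast_ifaces and best_iface are hypothetical. The
sketch compares L<TimeSeq> values as strings, which is a
simplification: that only orders them correctly while neither the day
of the month nor the sequence number has wrapped. Taking "best" to mean
the lowest L<Hop>, with the message count as a tie-breaker, is one
possible policy, not one this document prescribes.

```perl
use strict;
use warnings;

# Hypothetical route table, one entry per (interface, Origin):
#   $routes{$iface}{$origin} = { ts => latest TimeSeq, hop => its lowest Hop,
#                                count => messages from $origin seen on $iface }
my %routes;

# Record a message from $origin that arrived on $iface.
sub learn_route {
    my ($iface, $origin, $ts, $hop) = @_;
    my $r = $routes{$iface}{$origin} ||= { ts => '', hop => $hop, count => 0 };
    $r->{count}++;
    if ($ts gt $r->{ts}) {                   # newer TimeSeq: take its Hop
        @$r{qw(ts hop)} = ($ts, $hop);
    } elsif ($ts eq $r->{ts} && $hop < $r->{hop}) {
        $r->{hop} = $hop;                    # same message over a shorter path
    }
}

# Broadcast: every interface except the one the message came in on.
sub broadcast_ifaces {
    my ($from, @ifaces) = @_;
    return grep { $_ ne $from } @ifaces;
}

# Directed: the interface with the lowest Hop to $node (ties broken by the
# larger message count), never the arrival interface. undef if no route.
sub best_iface {
    my ($node, $from) = @_;
    my @cand = sort {
        $routes{$a}{$node}{hop} <=> $routes{$b}{$node}{hop}
            || $routes{$b}{$node}{count} <=> $routes{$a}{$node}{count}
    } grep { $_ ne $from && exists $routes{$_}{$node} } keys %routes;
    return $cand[0];
}
```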

=head3 DeDuplication

On receipt of a message, its unique tuple (L<Origin>,L<TimeSeq>) is
checked against a hash table. If the tuple is already present, the
message is silently dropped; otherwise the tuple is added.

The hash table is periodically cleaned, removing tuples that
have expired. The length of time a tuple remains in the hash table
is implementation dependent but could easily be several days, if
required. Because L<TimeSeq> only records the day of the month, this
period should be kept well under a month; otherwise a new message that
happens to reuse an old tuple could be wrongly dropped.

This mechanism only ensures that a message broadcast around the network
travels the least distance and through the fewest nodes possible. It
is up to higher layers to make sure that the data carried is not,
itself, duplicated!
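
A minimal sketch of the deduplication table, using a Perl hash keyed
on the (L<Origin>,L<TimeSeq>) tuple and storing the time each tuple was
first seen. The three-day retention period and the helper names
first_sighting and expire_seen are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical deduplication table: key "Origin,TimeSeq" -> time first seen.
my %seen;
my $KEEP = 3 * 86400;    # ASSUMPTION: retain tuples for three days

# True the first time a tuple is seen; false for a duplicate, which the
# caller should silently drop. $now defaults to the current time.
sub first_sighting {
    my ($origin, $ts, $now) = @_;
    my $key = "$origin,$ts";    # ',' cannot occur in either field
    return 0 if exists $seen{$key};
    $seen{$key} = defined $now ? $now : time;
    return 1;
}

# Periodic clean-up: forget tuples first seen more than $KEEP seconds ago.
sub expire_seen {
    my ($now) = @_;
    $now = time unless defined $now;
    delete @seen{ grep { $now - $seen{$_} > $KEEP } keys %seen };
}
```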

=head2 Command Section

The Command Section of the message contains the actual data being
passed. It is called the Command Section because every command
is identified by a L<Command Tag>, which is implemented by
the software using this protocol.

=head3 Command Tag

The Command Tag consists of a string of uppercase letters and digits,
starting with an uppercase letter. Tags should be as short as is
meaningful.

Valid tags would be:

 DX
 PC23
 ANN

Invalid tags include:

 1AAA
 dx
 Ann
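
The tag rule above amounts to a single regular expression. valid_tag
is a hypothetical helper; the example checks it against the valid and
invalid tags listed above.

```perl
use strict;
use warnings;

# A Command Tag: one leading uppercase letter, then uppercase letters
# and digits only. valid_tag is a hypothetical helper name.
sub valid_tag {
    my ($tag) = @_;
    return defined $tag && $tag =~ /\A[A-Z][A-Z0-9]*\z/;
}

print join(' ', map { valid_tag($_) ? "$_:ok" : "$_:bad" }
                    qw(DX PC23 ANN 1AAA dx Ann)), "\n";
# prints: DX:ok PC23:ok ANN:ok 1AAA:bad dx:bad Ann:bad
```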

There are a number of standard commands which must be accepted by
all implementations.

=head1 AUTHOR

Dirk Koopman, G1TLH, E<lt>djk@tobit.co.ukE<gt>

=head1 COPYRIGHT AND LICENSE

Copyright 2004 by Dirk Koopman, G1TLH

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.

=cut