\documentclass[twoside,a4paper,11pt]{article}

\usepackage[T1]{fontenc}
\usepackage[utf8x]{inputenc}
\usepackage{reqlist}

\newcommand{\url}[1]{\texttt{<#1>}}
\newcommand{\unix}{\textsc{Unix}}

\title{Dolda Connect protocol}
\author{Fredrik Tolf\\\texttt{<fredrik@dolda2000.com>}}

\begin{document}

\maketitle

\section{Introduction}
Dolda Connect consists of two parts: a daemon (a.k.a.\ server) that
runs in the background and carries out all the actual work, and a
number of client programs (a.k.a.\ user interfaces) that connect to
the daemon in order to tell it what to do. For the daemon and the
clients to be able to talk to each other, a protocol is needed. This
document describes that protocol, so that third parties can write
their own client programs.

Note that there is a library, called \texttt{libdcui}, that carries
out much of the low-level work of speaking the protocol, which
facilitates the creation of new client programs. \texttt{libdcui}
itself is written in the C programming language and is intended to be
used by other programs written in C, but there are also wrapper
libraries for both GNU Guile (the GNU project's Scheme interpreter)
and Python. The former is distributed with the main Dolda Connect
source tree, while the latter is distributed separately (for technical
reasons). To get a copy, please refer to Dolda Connect's homepage at
\url{http://www.dolda2000.com}.

\section{Transport format}
Note: Everything covered in this section is handled by the
\texttt{libdcui} library. Thus, if you are reading this only because
you want to write a client, and are using the library (or one of the
wrapper libraries), you can safely skip this section. It may still be
interesting to read in order to understand the semantics of the
protocol, however.

The protocol can be spoken over any channel that provides a reliable,
byte-oriented virtual (or not) circuit. Usually, it is spoken over a
TCP connection or a byte-oriented \unix\ socket. The usual port number
for TCP connections is 1500, but any port could be
used\footnote{However, port 1500 is what the \texttt{libdcui} library
  uses if no port is explicitly stated, so it is probably preferable.}.

\subsection{Informal description}

On top of the provided byte-oriented connection, the most basic level
of the protocol is a stream of Unicode characters, encoded with
UTF-8. The Unicode stream is then grouped on two levels: lines, which
consist of words (a.k.a.\ tokens). Lines are separated by CRLF
sequences (\emph{not} just CR or LF), and words are separated by
whitespace. Both whitespace and CRLFs can be quoted, however,
overriding their normal interpretation as separators and allowing them
to be parts of words. NUL characters may not be transferred at all,
but all other Unicode codepoints are allowed.

Lines transmitted from the daemon to the client are slightly
different, however. They all start with a three-digit code, followed
by either a space or a dash\footnote{Yes, this is inspired by FTP and
  SMTP.}, followed by the normal sequence of words. The three-digit
code identifies the type of the line. Overall, the protocol is a
lock-step protocol, where the client sends one line that is
interpreted as a request, and the daemon replies with one or more
lines. In a multi-line response, all lines except the last have the
three-digit code followed by a dash. The last line of a multi-line
response and the only line of a single-line response have the
three-digit code followed by a space. All lines of a multi-line
response have the same three-digit code. The client is not allowed to
send another request until the last line of the previous response has
been received.

The exception to the lock-step rule is that the daemon may send (but
only if the client has requested it to do so) sporadic lines of
asynchronous notification messages. Notification message lines are
distinguished by having three-digit codes that always begin with the
digit 6. Otherwise, the first digit of the three-digit code indicates
the overall success or failure of a request. Codes beginning with 2
indicate that the request to which they belong succeeded. Codes
beginning with 3 indicate that the request succeeded in itself, but
that it is part of a sequence of commands, and that the sequence
still requires additional interaction before it is considered
successful. Codes beginning with 5 indicate errors. The remaining two
digits merely distinguish between different outcomes. Note that
notification message lines may arrive at \emph{any} time, even in the
middle of multi-line responses (though not in the middle of another
line). There are no multi-line notifications.
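
As an illustration, the following C sketch shows one way a client
might decode the start of a line received from the daemon: it extracts
the three-digit code, checks whether a space or a dash follows it, and
classifies the line by the first digit of the code. The function
\texttt{dc\_parse\_header} and the type \texttt{enum dc\_kind} are
hypothetical and not part of \texttt{libdcui}; the sketch also assumes
that the line has already been read and stripped of its CRLF.

\begin{verbatim}
#include <ctype.h>

/* Hypothetical classification of a daemon line by its first digit. */
enum dc_kind {
    DC_BAD,      /* not a well-formed response or notification line */
    DC_SUCCESS,  /* 2xx: the request succeeded */
    DC_CONTINUE, /* 3xx: succeeded, but more interaction is needed */
    DC_ERROR,    /* 5xx: the request failed */
    DC_NOTIFY    /* 6xx: asynchronous notification */
};

/*
 * Decodes the header of a line from the daemon (CRLF already removed).
 * Stores the three-digit code in *code and sets *last to 1 if the code
 * is followed by a space (last or only line of a response) or to 0 if
 * it is followed by a dash (more lines follow). *rest is set to point
 * at the words after the separator.
 */
enum dc_kind dc_parse_header(const char *line, int *code, int *last,
                             const char **rest)
{
    /* Stops at the first non-digit, so short lines are never overrun. */
    for (int i = 0; i < 3; i++) {
        if (!isdigit((unsigned char)line[i]))
            return DC_BAD;
    }
    if (line[3] == ' ')
        *last = 1;
    else if (line[3] == '-')
        *last = 0;
    else
        return DC_BAD;
    *code = (line[0] - '0') * 100 + (line[1] - '0') * 10 + (line[2] - '0');
    *rest = line + 4;
    switch (line[0]) {
    case '2':
        return DC_SUCCESS;
    case '3':
        return DC_CONTINUE;
    case '5':
        return DC_ERROR;
    case '6':
        /* There are no multi-line notifications. */
        return *last ? DC_NOTIFY : DC_BAD;
    default:
        return DC_BAD;
    }
}
\end{verbatim}

A client would typically call such a function on every incoming line,
hand lines classified as notifications to its notification handler,
and keep collecting the remaining lines of the current response until
one with the last-line flag set arrives.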

The act of connecting to the daemon is itself considered a request,
soliciting a success or failure response, so it is the daemon that
first transmits actual data. A failure response may be provoked, for
example, by a client connecting from a prohibited source.

Quoting of special characters in words may be done in two ways. First,
the backslash character escapes any special interpretation of the
character that follows it, no matter where it appears or what the
following character is (it is not even required to be a special
character). Thus, the only way to include a backslash in a word is to
escape it with another backslash. Second, the interpretation of
whitespace as a separator may be suppressed by enclosing a string
containing whitespace in citation marks (only the ASCII citation mark,
U+0022, not any other Unicode quotes). Note that the citation marks
need not be placed at word boundaries, so the string
``\texttt{a"b c"d}'' is parsed as the single word ``\texttt{ab cd}''.
This dual layer of quoting may seem like a liability when implementing
the protocol, but it is quite convenient when talking directly to the
daemon with a program such as \texttt{telnet}.
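
Both quoting mechanisms can be handled in a single pass over a line.
The following C sketch is one possible decoder, not the
\texttt{libdcui} implementation; the function name
\texttt{dc\_tokenize} and its calling convention are hypothetical. It
works on bytes rather than decoded characters, which is safe for UTF-8
because all special characters are ASCII, and it assumes that the line
has already been separated from the stream and its terminating CRLF
removed.

\begin{verbatim}
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: splits one line (terminating CRLF already
 * removed) into words, honouring both quoting mechanisms. The decoded
 * words are stored NUL-terminated in buf, which never needs more than
 * strlen(line) + 1 bytes, and pointers to them in words[].
 * Returns the number of words, or -1 on a dangling backslash, an
 * unterminated citation mark, too small a buffer or too many words.
 */
int dc_tokenize(const char *line, char *buf, size_t bufsize,
                char **words, int maxwords)
{
    const char *p = line;
    char *out = buf;
    int n = 0;

    if (bufsize < strlen(line) + 1)
        return -1;
    for (;;) {
        /* Skip separating whitespace (SPACE and TAB only). */
        while (*p == ' ' || *p == '\t')
            p++;
        if (*p == '\0')
            return n;
        if (n == maxwords)
            return -1;
        words[n++] = out;
        int quoted = 0;
        /* Collect one word; citation marks may appear anywhere in it. */
        while (*p != '\0' && (quoted || (*p != ' ' && *p != '\t'))) {
            if (*p == '\\') {
                /* A backslash escapes whatever character follows. */
                if (p[1] == '\0')
                    return -1;
                *out++ = p[1];
                p += 2;
            } else if (*p == '"') {
                /* A citation mark toggles whitespace quoting. */
                quoted = !quoted;
                p++;
            } else {
                *out++ = *p++;
            }
        }
        if (quoted)
            return -1;
        *out++ = '\0';
    }
}
\end{verbatim}

Applied to ``\texttt{a"b c"d}'', this function returns 1 and yields
the single word ``\texttt{ab cd}'', as described above.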

\subsection{Formal description}

Formally, the syntax of the protocol may be defined with the following
BNF rules. Note that they all operate on Unicode characters, not bytes.

\begin{tabular}{lcl}
<session> & ::= & <SYN> <response> \\
 & & | <session> <transaction> \\
 & & | <session> <notification> \\
<transaction> & ::= & <request> <response> \\
<request> & ::= & <line> \\
<response> & ::= & <resp-line-last> \\
 & & | <resp-line-not-last> <response> \\
 & & | <notification> <response> \\
<resp-line-last> & ::= & <resp-code> <SPACE> <line> \\
<resp-line-not-last> & ::= & <resp-code> <DASH> <line> \\
<notification> & ::= & <notification-code> <SPACE> <line> \\
<resp-code> & ::= & ``\texttt{2}'' <digit> <digit> \\
 & & | ``\texttt{3}'' <digit> <digit> \\
 & & | ``\texttt{5}'' <digit> <digit> \\
<notification-code> & ::= & ``\texttt{6}'' <digit> <digit> \\
<line> & ::= & <CRLF> \\
 & & | <word> <CRLF> \\
 & & | <word> <ws> <line> \\
<word> & ::= & <COMMON-CHAR> \\
 & & | ``\texttt{$\backslash$}'' <CHAR> \\
 & & | ``\texttt{"}'' <quoted-word> ``\texttt{"}'' \\
 & & | <word> <word> \\
<quoted-word> & ::= & ``'' \\
 & & | <COMMON-CHAR> <quoted-word> \\
 & & | <ws> <quoted-word> \\
 & & | ``\texttt{$\backslash$}'' <CHAR> <quoted-word> \\
<ws> & ::= & <1ws> | <1ws> <ws> \\
<1ws> & ::= & <SPACE> | <TAB> \\
<digit> & ::= & ``\texttt{0}'' | ``\texttt{1}'' | ``\texttt{2}'' | ``\texttt{3}'' | ``\texttt{4}'' \\
 & & | ``\texttt{5}'' | ``\texttt{6}'' | ``\texttt{7}'' | ``\texttt{8}'' | ``\texttt{9}''
\end{tabular}

As for the terminal symbols, <SPACE> is U+0020, <TAB> is U+0009,
<CRLF> is the sequence of U+000D and U+000A, <DASH> is U+002D, <CHAR>
is any Unicode character except U+0000, <COMMON-CHAR> is any
Unicode character except U+0000, U+0009, U+000A, U+000D, U+0020,
U+0022 and U+005C, and <SYN> is the out-of-band message that
establishes the communication channel\footnote{This means that the
  communication channel must support such a message. For example, raw
  RS-232 would be hard to support.}. The following constraints also
apply:
\begin{itemize}
\item <SYN> and <request> must be sent from the client to the daemon.
\item <response> and <notification> must be sent from the daemon to
  the client.
\end{itemize}
Note that the definition of <word> means that the only way to
represent an empty word is by a pair of citation marks.
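
Conversely, a client must quote words before sending them. The C
sketch below shows one simple strategy, assuming NUL-terminated UTF-8
input: every character that may not appear as a <COMMON-CHAR> is
preceded by a backslash, and the empty word is sent as a pair of
citation marks, since that is its only representation. The function
name \texttt{dc\_quote} is hypothetical.

\begin{verbatim}
#include <string.h>

/*
 * Hypothetical sketch: quotes one word for transmission. Every
 * character that may not appear as a <COMMON-CHAR> (TAB, LF, CR,
 * SPACE, the citation mark and the backslash) is preceded by a
 * backslash, and the empty word becomes a pair of citation marks.
 * out must have room for at least 2 * strlen(word) + 3 bytes.
 */
char *dc_quote(const char *word, char *out)
{
    char *o = out;

    if (*word == '\0') {
        /* The only way to represent an empty word. */
        strcpy(out, "\"\"");
        return out;
    }
    for (const char *p = word; *p != '\0'; p++) {
        if (strchr(" \t\r\n\"\\", *p) != NULL)
            *o++ = '\\';
        *o++ = *p;
    }
    *o = '\0';
    return out;
}
\end{verbatim}

Escaping each special character individually, rather than wrapping
words in citation marks, keeps the encoder to a single pass and also
covers CR and LF, which the grammar does not allow between citation
marks unless they are preceded by a backslash.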

Each request line should contain at least one word, but it is not
considered a syntax error if it does not. The first word in each
request line is considered the name of the command to be carried out
by the daemon. An empty line is thus a valid request as such, but
since it names no command, it provokes the same kind of error response
as a request naming any other non-existent command. Any remaining
words on the line are considered arguments to the command.

\section{Requests}
For each arriving request, the daemon checks that the request passes
a number of tests before carrying it out. First, it matches the name
of the command against the list of known commands to see whether the
request calls a valid command. If the command is not valid, the daemon
sends a response with code 500. Then, it checks that the request has
the minimum required number of parameters for the given command. If it
does not, it responds with a 501 code. Finally, it checks that the
user account issuing the request has the necessary permissions to have
the request carried out. If it does not, it responds with a 502
code. Beyond these, the possible responses are specific to each
command. The intention of this section is to list them all.
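
The order of these checks can be sketched in C as follows. Only the
order and the codes 500, 501 and 502 are taken from the description
above; the command table, its fields, the bit-mask representation of
permissions, the \texttt{PERM\_} constants (named after the
permissions in Table~\ref{tab:perm}) and the function name
\texttt{dc\_check\_request} are all hypothetical. The words are
assumed to be already decoded, with the command name first.

\begin{verbatim}
#include <stddef.h>
#include <string.h>

/* Hypothetical permission bits, named after the permissions table. */
enum {
    PERM_ADMIN   = 1 << 0,
    PERM_FNETCTL = 1 << 1,
    PERM_TRANS   = 1 << 2,
    PERM_TRANSCU = 1 << 3,
    PERM_CHAT    = 1 << 4,
    PERM_SRCH    = 1 << 5
};

/* Hypothetical command table entry. */
struct dc_command {
    const char *name;
    int minargs;    /* minimum number of arguments after the name */
    unsigned perms; /* permissions required, 0 if none */
};

/*
 * Runs the generic checks on a decoded request, in the order the
 * daemon performs them. Returns 500, 501 or 502 on failure, or 0 if
 * the request may be passed on to the command itself.
 */
int dc_check_request(const struct dc_command *cmds, size_t ncmds,
                     char **words, int nwords, unsigned granted)
{
    const struct dc_command *cmd = NULL;

    /* 1: the first word must name a known command; an empty line never does. */
    if (nwords > 0) {
        for (size_t i = 0; i < ncmds; i++) {
            if (strcmp(cmds[i].name, words[0]) == 0) {
                cmd = &cmds[i];
                break;
            }
        }
    }
    if (cmd == NULL)
        return 500;
    /* 2: enough arguments must follow the command name. */
    if (nwords - 1 < cmd->minargs)
        return 501;
    /* 3: the connection must hold every required permission. */
    if ((cmd->perms & ~granted) != 0)
        return 502;
    return 0;
}
\end{verbatim}

Because the command name is checked first, a misspelled command
yields 500 even on a connection that has not yet authenticated.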

\subsection{Permissions}

It is outside the scope of this document to describe the
administration of the permissions mentioned above\footnote{Please see
  the \texttt{doldacond.conf(5)} man page for more information on that
  topic.}, but since some commands require certain permissions, the
permissions at least need to be listed. When a connection is
established, it is associated with no permissions. At that point,
only requests that do not require any permissions can be successfully
issued. Normally, the first thing a client does is to authenticate to
the daemon. At the end of a successful authentication, the daemon
associates the proper permissions with the connection over which
authentication took place. The possible permissions are listed in
Table~\ref{tab:perm}.

\begin{table}
  \begin{tabular}{rl}
    Name & General description \\
    \hline
    \texttt{admin} & Required for all commands that administer the
    daemon. \\
    \texttt{fnetctl} & Required for all commands that alter the state of
    connected hubs. \\
    \texttt{trans} & Required for all commands that alter the state of
    file transfers. \\
    \texttt{transcu} & Required specifically for cancelling uploads. \\
    \texttt{chat} & Required for exchanging chat messages. \\
    \texttt{srch} & Required for issuing and querying searches. \\
  \end{tabular}
  \caption{The list of available permissions}
  \label{tab:perm}
\end{table}

\subsection{Protocol revisions}

Since Dolda Connect is still under development, its command set may
change occasionally. Sometimes new commands are added, sometimes
commands change argument syntax, and sometimes commands are
removed. In order for clients to be able to cope cleanly with such
changes, the protocol is revisioned. When a client connects to the
daemon, the daemon indicates in the first response it sends the range
of protocol revisions it supports, and each command listed below
specifies the revision number from which its current specification is
valid. A client should check that the revision range sent by the
daemon includes the revision that incorporates all commands it wishes
to use.

Whenever the protocol changes at all, it is given a new revision
number. If the entire protocol is backwards compatible with the
previous revision, the revision range sent by the daemon is updated to
extend forward to the new revision. If the protocol is in any way
incompatible with the previous revision, the revision range is moved
entirely to the new revision. Therefore, a client can check that a
certain revision lies within the range and be sure that everything it
wants is supported by the daemon.
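
The two rules can be modelled in a few lines of C. The function names
are hypothetical, the starting revision used below is arbitrary, and
the sketch assumes that the bounds of the range have already been
extracted from the daemon's first response, whose exact format is not
covered in this section.

\begin{verbatim}
/*
 * Hypothetical sketch of the revision rules. The daemon advertises a
 * range [lo, hi]; the bounds are taken here as plain integers.
 */

/* Client side: can a daemon offering [lo, hi] serve revision needed? */
int dc_revision_ok(int lo, int hi, int needed)
{
    return lo <= needed && needed <= hi;
}

/*
 * Models how the advertised range changes when a new revision is
 * released: a backwards compatible change only extends the upper
 * bound, while an incompatible one moves the whole range to the new
 * revision.
 */
void dc_next_revision(int *lo, int *hi, int compatible)
{
    *hi += 1;
    if (!compatible)
        *lo = *hi;
}
\end{verbatim}

For instance, starting from the range $[1, 1]$, a compatible change
yields $[1, 2]$, so a client written for revision 1 still passes the
check; a subsequent incompatible change yields $[3, 3]$, and the same
client's check then fails, as intended.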

\subsection{List of commands}

What follows is a (hopefully) exhaustive list of all commands valid
for a request. For each possible request, it includes the name of the
command, the permissions required, the syntax of the entire request
line, and the possible responses.

The syntax of the request and response lines is described in a format
similar to that traditionally used in \unix\ man pages, with a number
of terms, each corresponding to a word in the line. Each term in the
syntax description is either a literal string, written in lower case;
an argument, written in upper case and meant to be replaced by some
other text as described; an optional term, enclosed in brackets
(``\texttt{[}'' and ``\texttt{]}''); or a list of alternatives,
enclosed in braces (``\texttt{\{}'' and ``\texttt{\}}'') and separated
by pipes (``\texttt{|}''). Possible repetition of a term is indicated
by three dots (``\texttt{...}''), and, for the purpose of repetition,
terms may be grouped with parentheses (``\texttt{(}'' and
``\texttt{)}'').

Two things should be noted regarding the responses. First, in the
syntax description of responses, the response code is given as the
first term, even though it is not actually considered a word. Second,
more words may follow after the specified syntax, and a client should
discard them. Many responses use this to include a human-readable
string describing the outcome of the request.

\subsubsection{Connection}
As mentioned above, the act of connecting to the daemon is itself
considered a request, soliciting a response. Such a request obviously
has no command name and no syntax, but needs a description
nonetheless.

\noperm

\begin{responses}
  \response{200}
  The old response given by daemons not yet using the revisioned
  protocol. Clients receiving this response should consider it an
  error.
\end{responses}

\end{document}