\documentclass[twoside,a4paper,11pt]{article}

\usepackage[T1]{fontenc}
\usepackage[utf8x]{inputenc}
\usepackage[ps2pdf]{hyperref}
\usepackage{reqlist}
\usepackage{longtable}

\newcommand{\urlink}[1]{\texttt{<#1>}}
\newcommand{\unix}{\textsc{Unix}}

\title{Dolda Connect protocol}
\author{Fredrik Tolf\\\texttt{<fredrik@dolda2000.com>}}

\begin{document}

\maketitle

\tableofcontents

\section{Introduction}
Dolda Connect consists partly of a daemon (a.k.a. server) that runs in
the background and carries out all the actual work, and partly of a
number of client programs (a.k.a. user interfaces) that connect to the
daemon in order to tell it what to do. For the daemon and the clients
to be able to talk to each other, a protocol is needed. This document
describes that protocol, so that third parties can write their own
client programs.

It is worth noting that there exists a library, called
\texttt{libdcui}, that carries out much of the low-level work of
speaking the protocol, facilitating the creation of new client
programs. \texttt{libdcui} itself is written in the C programming
language and is intended to be used by other programs written in C,
but wrapper libraries also exist for both GNU Guile (the GNU
project's Scheme interpreter) and Python. The former is
distributed with the main Dolda Connect source tree, while the latter
is distributed separately (for technical reasons). To get a copy,
please refer to Dolda Connect's homepage at
\urlink{http://www.dolda2000.com}.

\section{Transport format}
Note: Everything covered in this section is handled by the
\texttt{libdcui} library. Thus, if you are reading this because you
just want to write a client and are using the library (or any of the
wrapper libraries), you can safely skip this section. It may still be
interesting to read in order to understand the semantics of the
protocol, however.

The protocol can be spoken over any channel that provides a reliable,
byte-oriented circuit, virtual or not. Usually, it is spoken over a
TCP connection or a \unix\ stream socket. The usual port number for
TCP connections is 1500, but any port could be
used\footnote{However, port 1500 is what the \texttt{libdcui} library
  uses if no port is explicitly specified, so it is probably to be
  preferred.}.
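
As a minimal sketch in Python (one of the languages for which a
\texttt{libdcui} wrapper exists), opening the underlying circuit
without the library might look as follows. The function name
\texttt{open\_channel} and its parameters are hypothetical, and this
document does not prescribe any particular path for the \unix\ socket;
only the default TCP port 1500 is taken from the text above.

\begin{verbatim}
import socket

DEFAULT_PORT = 1500  # the port libdcui uses when none is given

def open_channel(host, port=DEFAULT_PORT, unix_path=None):
    """Open a reliable byte stream to the daemon (hypothetical helper).

    If unix_path is given, a Unix stream socket is used (the path is a
    site-specific assumption); otherwise a TCP connection is made.
    """
    if unix_path is not None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.connect(unix_path)
        except OSError:
            sock.close()
            raise
        return sock
    return socket.create_connection((host, port))
\end{verbatim}

Once the circuit is open, the client should wait for the daemon's
first response before sending anything, since connecting is itself
considered a request.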

\subsection{Informal description}

On top of the provided byte-oriented connection, the most basic level
of the protocol is a stream of Unicode characters, encoded with
UTF-8. This character stream is then structured at two levels: it is
divided into lines, and each line is divided into words (a.k.a.
tokens). Lines are separated by CRLF sequences (\emph{not} just CR or
LF), and words are separated by whitespace. Both whitespace and CRLFs
can be quoted, however, overriding their normal interpretation as
separators and allowing them to be parts of words. NUL characters may
not be transferred at all, but all other Unicode codepoints are
allowed.
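
Since the byte stream may be delivered in arbitrary chunks, a
multi-byte UTF-8 sequence can be split across two reads. A minimal
sketch of this character layer, assuming Python's standard
\texttt{codecs} module, buffers such partial sequences with an
incremental decoder and rejects NUL as required above. The class name
\texttt{CharStream} is hypothetical.

\begin{verbatim}
import codecs

class CharStream:
    """Decode received byte chunks into text (hypothetical helper)."""

    def __init__(self):
        # The incremental decoder holds back an incomplete multi-byte
        # sequence until the rest of it arrives.
        self._decoder = codecs.getincrementaldecoder("utf-8")()

    def feed(self, data):
        """Return the text decodable so far.

        Invalid UTF-8 raises UnicodeDecodeError, a ValueError subclass.
        """
        text = self._decoder.decode(data)
        if "\0" in text:
            raise ValueError("NUL characters are not allowed")
        return text
\end{verbatim}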

Lines transmitted from the daemon to the client are slightly
different, however. They all start with a three-digit code, followed
by either a space or a dash\footnote{Yes, this is inspired by FTP and
  SMTP.}, followed by the normal sequence of words. The three-digit
code identifies the type of the line. Overall, the protocol is a
lock-step protocol, where the client sends one line that is
interpreted as a request, and the daemon replies with one or more
lines. In a multi-line response, all lines except the last have the
three-digit code followed by a dash. The last line of a multi-line
response and the only line of a single-line response have the
three-digit code followed by a space. All lines of a multi-line
response have the same three-digit code. The client is not allowed to
send another request until the last line of the previous response has
been received. The one exception to this lock-step pattern is that the
daemon may send (but only if the client has requested it to do so)
sporadic lines of asynchronous notification messages.

Notification message lines are distinguished by having their
three-digit codes always begin with the digit 6. For all other lines,
the first digit of the three-digit code indicates the overall success
or failure of a request. Codes beginning with 2 indicate that the
request to which they belong succeeded. Codes beginning with 3
indicate that the request succeeded in itself, but that it is
considered part of a sequence of commands, and that the sequence still
requires additional interaction before being considered
successful. Codes beginning with 5 indicate errors. The remaining two
digits merely distinguish between different outcomes. Note that
notification message lines may come at \emph{any} time, even in the
middle of multi-line responses (though not in the middle of another
line). There are no multi-line notifications.
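
The line-level rules above can be sketched as follows, assuming each
daemon line has already been separated from the stream and stripped
of its CRLF. The sketch splits off the three-digit code and its
separator, sets notifications aside whenever they arrive, and groups
the remaining lines into complete responses. The function names are
hypothetical, and the words after the code are left unparsed.

\begin{verbatim}
import re

CATEGORIES = {"2": "success", "3": "continue",
              "5": "error", "6": "notification"}

def category(code):
    """Classify a three-digit code by its first digit."""
    return CATEGORIES.get(code[0], "unknown")

def split_code(line):
    """Split a daemon line into (code, is_last, rest)."""
    if not re.match(r"[0-9]{3}[ -]", line):
        raise ValueError("malformed daemon line: %r" % line)
    return line[:3], line[3] == " ", line[4:]

def collect(lines):
    """Group daemon lines into responses, setting notifications aside.

    Returns (responses, notifications, pending); pending holds the
    lines of a response whose last line has not been seen yet.
    """
    responses, notifications, pending = [], [], []
    for line in lines:
        code, last, rest = split_code(line)
        if code[0] == "6":
            if not last:
                raise ValueError("notifications are always single-line")
            notifications.append((code, rest))  # may arrive mid-response
            continue
        if pending and pending[0][0] != code:
            raise ValueError("code changed within a multi-line response")
        pending.append((code, rest))
        if last:
            responses.append(pending)
            pending = []
    return responses, notifications, pending
\end{verbatim}

Keeping notifications out of the response grouping mirrors the rule
that they may interleave with a multi-line response without being
part of it.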

The act of connecting to the daemon is itself considered a request,
soliciting a success or failure response, so it is the daemon that
first transmits actual data. A failure response may be provoked, for
example, by a client connecting from a prohibited source.

Quoting of special characters in words may be done in two ways. First,
the backslash character escapes any special interpretation of the
character that comes after it, no matter where or what the following
character is (it is not even required to be a special
character). Thus, the only way to include a backslash in a word is to
escape it with another backslash. Second, any interpretation of
whitespace may be escaped using the citation mark character (only the
ASCII one, U+0022, not any other Unicode quotes), by enclosing a
string containing whitespace in citation marks. (Note that the
citation marks need not be placed at the word boundaries, so the
string ``\texttt{a"b c"d}'' is parsed as a single word ``\texttt{ab
  cd}''.) Admittedly, this dual layer of quoting may seem like a
liability when implementing the protocol, but it is quite convenient
when talking directly to the daemon with a program such as
\texttt{telnet}.
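
A minimal tokenizer for these quoting rules might look as follows. It
is a sketch under a few assumptions: it works on already-decoded text,
it is lenient in treating everything between citation marks (other
than backslash escapes) as literal, and it hands back any incomplete
trailing line so that it can be re-parsed once more data has arrived.
The function name \texttt{parse\_lines} is hypothetical.

\begin{verbatim}
def parse_lines(buf):
    """Split decoded text into complete lines, each a list of words.

    Returns (lines, rest), where rest is the unconsumed tail holding an
    incomplete line; prepend it to the next chunk before parsing again.
    """
    if "\0" in buf:
        raise ValueError("NUL is not allowed")
    lines, words, word = [], [], []
    in_word = quoted = False
    start = i = 0  # start: index where the current line began
    while i < len(buf):
        c = buf[i]
        if c == "\\":
            if i + 1 == len(buf):
                break  # the escaped character has not arrived yet
            word.append(buf[i + 1])  # taken literally, whatever it is
            in_word = True
            i += 2
            continue
        if c == '"':
            quoted = not quoted
            in_word = True  # so that "" yields an empty word
        elif quoted:
            word.append(c)
        elif c in " \t":
            if in_word:
                words.append("".join(word))
                word, in_word = [], False
        elif c == "\r":
            if i + 1 == len(buf):
                break  # the LF may still be on its way
            if buf[i + 1] != "\n":
                raise ValueError("bare CR outside quotes")
            if in_word:
                words.append("".join(word))
            lines.append(words)
            words, word, in_word = [], [], False
            i += 2
            start = i
            continue
        elif c == "\n":
            raise ValueError("bare LF outside quotes")
        else:
            word.append(c)
            in_word = True
        i += 1
    return lines, buf[start:]
\end{verbatim}

For example, the text \texttt{a"b c"d e} followed by CRLF yields the
two words ``\texttt{ab cd}'' and ``\texttt{e}''. Since a backslash
escapes exactly one character, a CRLF inside a word has to be sent
as a backslash before each of its two characters.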

\subsection{Formal description}

Formally, the syntax of the protocol may be defined with the following
BNF rules. Note that they all operate on Unicode characters, not bytes.

\begin{longtable}{lcl}
<session> & ::= & <SYN> <response> \\
 & & | <session> <transaction> \\
 & & | <session> <notification> \\
<transaction> & ::= & <request> <response> \\
<request> & ::= & <line> \\
<response> & ::= & <resp-line-last> \\
 & & | <resp-line-not-last> <response> \\
 & & | <notification> <response> \\
<resp-line-last> & ::= & <resp-code> <SPACE> <line> \\
<resp-line-not-last> & ::= & <resp-code> <DASH> <line> \\
<notification> & ::= & <notification-code> <SPACE> <line> \\
<resp-code> & ::= & ``\texttt{2}'' <digit> <digit> \\
 & & | ``\texttt{3}'' <digit> <digit> \\
 & & | ``\texttt{5}'' <digit> <digit> \\
<notification-code> & ::= & ``\texttt{6}'' <digit> <digit> \\
<line> & ::= & <CRLF> \\
 & & | <word> <CRLF> \\
 & & | <word> <ws> <line> \\
<word> & ::= & <COMMON-CHAR> \\
 & & | ``\texttt{$\backslash$}'' <CHAR> \\
 & & | ``\texttt{"}'' <quoted-word> ``\texttt{"}'' \\
 & & | <word> <word> \\
<quoted-word> & ::= & ``'' \\
 & & | <COMMON-CHAR> <quoted-word> \\
 & & | <ws> <quoted-word> \\
 & & | ``\texttt{$\backslash$}'' <CHAR> <quoted-word> \\
<ws> & ::= & <1ws> | <1ws> <ws> \\
<1ws> & ::= & <SPACE> | <TAB> \\
<digit> & ::= & ``\texttt{0}'' |
``\texttt{1}'' | ``\texttt{2}'' |
``\texttt{3}'' | ``\texttt{4}'' \\
& & | ``\texttt{5}'' | ``\texttt{6}'' |
``\texttt{7}'' | ``\texttt{8}'' |
``\texttt{9}''
\end{longtable}

As for the terminal symbols, <SPACE> is U+0020, <TAB> is U+0009,
<CRLF> is the sequence of U+000D and U+000A, <DASH> is U+002D, <CHAR>
is any Unicode character except U+0000, <COMMON-CHAR> is any
Unicode character except U+0000, U+0009, U+000A, U+000D, U+0020,
U+0022 and U+005C, and <SYN> is the out-of-band message that
establishes the communication channel\footnote{This means that the
  communication channel must support such a message. For example, raw
  RS-232 would be hard to support.}. The following constraints also
apply:
\begin{itemize}
\item <SYN> and <request> must be sent from the client to the daemon.
\item <response> and <notification> must be sent from the daemon to
  the client.
\end{itemize}
Note that the definition of <word> means that the only way to
represent an empty word is by a pair of citation marks.

In each request line, there should be at least one word, but it is not
considered a syntax error if there is not. The first word in each
request line is considered the name of the command to be carried out
by the daemon. An empty line is a valid request as such, but since it
matches no command, it will provoke the same kind of error response as
if a request with any other non-existent command were sent. Any
remaining words on the line are considered arguments to the command.
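
Conversely, a client has to quote words before sending them. A
minimal sketch, using the hypothetical helper names
\texttt{quote\_word} and \texttt{format\_request}, backslash-escapes
every character that is not a <COMMON-CHAR>, sends an empty word as a
pair of citation marks, and follows every word with a single space
before the terminating CRLF, a shape that the <line> rule above
accepts. Other encodings, such as enclosing whitespace in citation
marks, are equally valid.

\begin{verbatim}
def quote_word(word):
    """Encode one word according to the quoting rules (hypothetical)."""
    if word == "":
        return '""'  # the only way to represent an empty word
    out = []
    for c in word:
        if c == "\0":
            raise ValueError("NUL cannot be transmitted")
        if c in ' \t\r\n"\\':
            out.append("\\")  # escape everything that is not a COMMON-CHAR
        out.append(c)
    return "".join(out)

def format_request(words):
    """Build a request line: each word followed by a space, then CRLF."""
    return "".join(quote_word(w) + " " for w in words) + "\r\n"
\end{verbatim}

An empty list of words produces a bare CRLF, which, as described
above, the daemon answers like any other unknown command.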

\section{Requests}
For each arriving request, the daemon checks that the request passes a
number of tests before carrying it out. First, it matches the name of
the command against the list of known commands to see if the request
calls a valid command. If the command is not valid, the daemon sends a
response with code 500. Then, it checks that the request has the
minimum required number of parameters for the given command. If it
does not, it responds with a 501 code. Last, it checks that the user
account issuing the request has the necessary permissions to have the
request carried out. If it does not, it responds with a 502
code. After that, any responses are specific to the command in
question. The intention of this section is to list them all.
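
The order of these checks can be summarized in a short sketch. It is
written from the daemon's point of view for illustration only and does
not reflect the actual daemon source; the command table passed in,
mapping each command name to its minimum number of arguments and the
set of permissions it requires, is a hypothetical data structure.

\begin{verbatim}
def check_request(words, commands, perms):
    """Return the error code a request fails with, or None if it passes.

    words:    the words of the request line (command name first)
    commands: hypothetical table, name -> (min_args, required_perms)
    perms:    set of permissions associated with the connection
    """
    if not words or words[0] not in commands:
        return 500  # unknown command; an empty line counts as one too
    min_args, required = commands[words[0]]
    if len(words) - 1 < min_args:
        return 501  # too few arguments
    if not required <= perms:
        return 502  # a required permission is missing
    return None  # passed; any further response is command-specific
\end{verbatim}

Because a new connection has no permissions, any command whose entry
requires a permission would yield 502 until the client has
authenticated.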

\subsection{Permissions}

Regarding the permissions mentioned above, it is outside the scope of
this document to describe the administration of
permissions\footnote{Please see the \texttt{doldacond.conf(5)} man
  page for more information on that topic.}, but since some commands
require certain permissions, the permissions themselves need at least
to be specified. When a connection is established, it is associated
with no permissions. At that point, only requests that do not require
any permissions can be successfully issued. Normally, the first thing
a client would do is to authenticate to the daemon. At the end of a
successful authentication, the daemon associates the proper
permissions with the connection over which authentication took
place. The possible permissions are listed in table \ref{tab:perm}.

\begin{table}
  \begin{tabular}{rl}
    Name & General description \\
    \hline
    \texttt{admin} & Required for all commands that administer the
    daemon. \\
    \texttt{fnetctl} & Required for all commands that alter the state of
    connected hubs. \\
    \texttt{trans} & Required for all commands that alter the state of
    file transfers. \\
    \texttt{transcu} & Required specifically for cancelling uploads. \\
    \texttt{chat} & Required for exchanging chat messages. \\
    \texttt{srch} & Required for issuing and querying searches. \\
  \end{tabular}
  \caption{The list of available permissions}
  \label{tab:perm}
\end{table}

\subsection{Protocol revisions}
\label{rev}
Since Dolda Connect is under development, its command set may change
occasionally. Sometimes new commands are added, sometimes commands
change argument syntax, and sometimes commands are removed. In order
for clients to be able to cleanly cope with such changes, the protocol
is revisioned. When a client connects to the daemon, the daemon
indicates in the first response it sends the range of protocol
revisions it supports, and each command listed below specifies the
revision number from which its current specification is valid. A
client should check that the revision range announced by the daemon
includes the revision that incorporates all commands that it wishes
to use.

Whenever the protocol changes at all, it is given a new revision
number. If the entire protocol is backwards compatible with the
previous revision, the revision range sent by the daemon is updated to
extend forward to the new revision. If the protocol is in any way
incompatible with the previous revision, the revision range is moved
entirely to the new revision. Therefore, a client can check for a
certain revision and be sure that everything it wants is supported by
the daemon.

At the time of this writing, the latest protocol revision is 2. Please
see the file \texttt{doc/protorev} that comes with the Dolda Connect
source tree for a full list of revisions and what changed between
them.

\subsection{List of commands}

What follows is a (hopefully) exhaustive listing of all commands valid
for a request. For each possible request, it includes the name of the
command for the request, the protocol revision from which its current
specification is valid, the permissions required, the syntax for the
entire request line, and the possible responses.

The syntax of the request and response lines is described in a format
similar to that traditionally used in \unix\ man pages, with a number
of terms, each corresponding to a word in the line. Each term in the
syntax description is either a literal string, written in lower case;
an argument, written in upper case and meant to be replaced by some
other text as described; an optional term, enclosed in brackets
(``\texttt{[}'' and ``\texttt{]}''); or a list of alternatives,
enclosed in braces (``\texttt{\{}'' and ``\texttt{\}}'') and separated
by pipes (``\texttt{|}''). Possible repetition of a term is indicated
by three dots (``\texttt{...}''), and, for the purpose of repetition,
terms may be grouped with parentheses (``\texttt{(}'' and
``\texttt{)}'').

Two things should be noted regarding the responses. First, in the
syntax description of responses, the response code is given as the
first term, even though it is not actually considered a word. Second,
more words may follow after the specified syntax, and these should be
discarded by a client. Many responses use this to include a
human-readable string describing the outcome of the request.

\subsubsection{Connection}
As mentioned above, the act of connecting to the daemon is itself
considered a request, soliciting a response. Such a request obviously
has no command name and no syntax, but needs a description
nonetheless.

\revision{1}

\noperm

\begin{responses}
  \response{200}
  The old response given by daemons not yet using the revisioned
  protocol. Clients receiving this response should consider it an
  error.
  \response{201 LOREV HIREV}
  Indicates that the connection is accepted. The \param{LOREV} and
  \param{HIREV} parameters specify the range of supported protocol
  revisions, as described in section \ref{rev}.
  \response{502 REASON}
  The connection is refused by the daemon and will be closed. The
  \param{REASON} parameter states the reason for the refusal in
  English\footnote{So it is probably not suitable for display in
    localized programs.}.
\end{responses}
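
A client might handle this first response along the following lines.
This is a minimal sketch: it assumes the response code and the words
following it have already been extracted, the function name
\texttt{check\_greeting} and the revision the client needs are
hypothetical, and extra words after \texttt{HIREV} are ignored as
described above.

\begin{verbatim}
def check_greeting(code, words, wanted_rev):
    """Validate the daemon's first response; return (lorev, hirev).

    code is the three-digit response code as a string, words are the
    words following it, and wanted_rev is the protocol revision whose
    commands the client intends to use.
    """
    if code == "201":
        lorev, hirev = int(words[0]), int(words[1])  # extra words ignored
        if not lorev <= wanted_rev <= hirev:
            raise RuntimeError("daemon supports revisions %d-%d, not %d"
                               % (lorev, hirev, wanted_rev))
        return lorev, hirev
    if code == "200":
        raise RuntimeError("daemon predates the revisioned protocol")
    if code == "502":
        raise ConnectionRefusedError(" ".join(words))  # English reason
    raise RuntimeError("unexpected response code " + code)
\end{verbatim}

Since a backwards-compatible change only extends \texttt{HIREV}, a
single range check like this suffices for the client to know that
everything it needs is supported.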

\input{commands}

\end{document}