\documentclass[twoside,a4paper,11pt]{article}

\usepackage[T1]{fontenc}
\usepackage[utf8x]{inputenc}
\usepackage[ps2pdf]{hyperref}
\usepackage{reqlist}

\newcommand{\urlink}[1]{\texttt{<#1>}}
\newcommand{\unix}{\textsc{Unix}}

\title{Dolda Connect protocol}
\author{Fredrik Tolf\\\texttt{<fredrik@dolda2000.com>}}

\begin{document}

\maketitle

\section{Introduction}
Dolda Connect consists partly of a daemon (a.k.a. server) that runs in
the background and carries out all the actual work, and a number of
client programs (a.k.a. user interfaces) that connect to the daemon in
order to tell it what to do. In order for the daemon and the clients
to be able to talk to each other, a protocol is needed. This document
describes that protocol, so that third parties can write
their own client programs.

It is worth noting that there exists a library called
\texttt{libdcui} that carries out much of the low-level work of
speaking the protocol, facilitating the creation of new client
programs. In itself, \texttt{libdcui} is written in the C programming
language and is intended to be used by other programs written in C,
but there also exist wrapper libraries for both GNU Guile (the GNU
project's Scheme interpreter) and for Python. The former is
distributed with the main Dolda Connect source tree, while the latter
is distributed separately (for technical reasons). To get a copy,
please refer to Dolda Connect's homepage at
\urlink{http://www.dolda2000.com}.

\section{Transport format}
Note: Everything covered in this section is handled by the
\texttt{libdcui} library. Thus, if you are reading this because you just
want to write a client, and are using the library (or any of the wrapper
libraries), you can safely skip over this section. It may still be
interesting to read in order to understand the semantics of the
protocol, however.

The protocol can be spoken over any channel that features a
byte-oriented, reliable virtual (or not) circuit. Usually, it is
spoken over a TCP connection or a byte-oriented \unix\ socket. The
usual port number for TCP connections is 1500, but any port could be
used\footnote{However, port 1500 is what the \texttt{libdcui} library
  uses if no port is explicitly stated, so it is probably to be
  preferred.}.

\subsection{Informal description}

On top of the provided byte-oriented connection, the most basic level
of the protocol is a stream of Unicode characters, encoded with
UTF-8. The Unicode stream is then grouped in two levels: lines
consisting of words (a.k.a. tokens). Lines are separated by CRLF
sequences (\emph{not} just CR or LF), and words are separated by
whitespace. Both whitespace and CRLFs can be quoted, however,
overriding their normal interpretation as separators and allowing them
to be parts of words. NUL characters are not allowed to be transferred
at all, but all other Unicode codepoints are allowed.

Lines transmitted from the daemon to the client are slightly
different, however. They all start with a three-digit code, followed
by either a space or a dash\footnote{Yes, this is inspired by FTP and
  SMTP.}, followed by the normal sequence of words. The three-digit
code identifies the type of line. Overall, the protocol is a
lock-step protocol, where the client sends one line that is
interpreted as a request, and the daemon replies with one or more
lines. In a multi-line response, all lines except the last have the
three-digit code followed by a dash. The last line of a multi-line
response and the only line of a single-line response have the
three-digit code followed by a space. All lines of a multi-line
response have the same three-digit code. The client is not allowed to
send another request until the last line of the previous response has
been received. The exception is that the daemon might send (but only
if the client has requested it to do so) sporadic lines of
asynchronous notification messages. Notification message lines are
distinguished by having their three-digit codes always begin with the
digit 6. Otherwise, the first digit of the three-digit code indicates
the overall success or failure of a request. Codes beginning with 2
indicate that the request to which they belong succeeded. Codes
beginning with 3 indicate that the request succeeded in itself, but
that it is considered part of a sequence of commands, and that the
sequence still requires additional interaction before it is considered
successful. Codes beginning with 5 indicate errors. The
remaining two digits merely distinguish between different
outcomes. Note that notification message lines may come at \emph{any}
time, even in the middle of multi-line responses (though not in the
middle of another line). There are no multi-line notifications.

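As an illustration of the rules above, the following Python sketch
splits one daemon line into its three-digit code, its separator and
the remaining text, and classifies the line by the first digit of the
code. It is not taken from \texttt{libdcui}; the names
\texttt{DaemonLine} and \texttt{parse\_daemon\_line} and the category
labels are invented for this example, and the line is assumed to have
been decoded from UTF-8 and stripped of its CRLF already.

```python
from typing import NamedTuple


class DaemonLine(NamedTuple):
    code: int    # the three-digit code
    last: bool   # True if the code is followed by a space, False for a dash
    rest: str    # remainder of the line, not yet split into words

    @property
    def kind(self) -> str:
        """Classify the line by the first digit of its code."""
        labels = {2: "success", 3: "continue", 5: "error", 6: "notification"}
        return labels.get(self.code // 100, "unknown")


def parse_daemon_line(line: str) -> DaemonLine:
    """Split one line (CRLF already removed) into code, separator and rest."""
    digits = line[:3]
    if (len(line) < 4
            or any(c not in "0123456789" for c in digits)
            or line[3] not in " -"):
        raise ValueError("malformed daemon line: %r" % line)
    return DaemonLine(int(digits), line[3] == " ", line[4:])
```

A client reading a response would keep collecting lines until it sees
one whose \texttt{last} field is true, while handing any line of kind
\texttt{notification} to a separate handler, since such lines may be
interleaved with a multi-line response.
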
The act of connecting to the daemon is itself considered a request,
soliciting a success or failure response, so it is the daemon that
first transmits actual data. A failure response may be provoked by a
client connecting from a prohibited source.

Quoting of special characters in words may be done in two ways. First,
the backslash character escapes any special interpretation of the
character that comes after it, no matter where or what the following
character is (it is not even required to be a special
character). Thus, the only way to include a backslash in a word is to
escape it with another backslash. Second, any interpretation of
whitespace may be escaped using the quotation mark character (only the
ASCII one, U+0022, not any other Unicode quotes), by enclosing a
string containing whitespace in quotation marks. (Note that the quotation
marks need not necessarily be placed at the word boundaries, so the
string ``\texttt{a"b c"d}'' is parsed as a single word ``\texttt{ab
  cd}''.) Technically, this dual layer of quoting may seem like a
liability when implementing the protocol, but it is quite convenient
when talking directly to the daemon with a program such as
\texttt{telnet}.

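The two quoting mechanisms can be sketched as a small word splitter in
Python. This is only an illustration written for this document, not
the code used by the daemon or by \texttt{libdcui}; the function name
\texttt{split\_words} is invented, the input is assumed to be a single
line with its terminating CRLF already removed, and raising an error
for a dangling backslash or an unterminated quotation is a choice made
for the example rather than documented daemon behaviour.

```python
from typing import List


def split_words(line: str) -> List[str]:
    """Split one protocol line (CRLF already removed) into words."""
    words: List[str] = []
    current: List[str] = []   # characters of the word being built
    in_word = False           # True once a word has started, even if empty
    in_quotes = False
    chars = iter(line)
    for ch in chars:
        if ch == "\\":
            # A backslash makes the next character literal, whatever it is.
            nxt = next(chars, None)
            if nxt is None:
                raise ValueError("dangling backslash at end of line")
            current.append(nxt)
            in_word = True
        elif ch == '"':
            # Quotation marks toggle quoting and may appear mid-word.
            in_quotes = not in_quotes
            in_word = True
        elif ch in " \t" and not in_quotes:
            # Unquoted whitespace ends the current word, if there is one.
            if in_word:
                words.append("".join(current))
                current, in_word = [], False
        else:
            current.append(ch)
            in_word = True
    if in_quotes:
        raise ValueError("unterminated quotation")
    if in_word:
        words.append("".join(current))
    return words
```

With this sketch, the string \texttt{a"b c"d} yields the single word
\texttt{ab cd}, and a bare pair of quotation marks yields an empty
word.
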
\subsection{Formal description}

Formally, the syntax of the protocol may be defined with the following
BNF rules. Note that they all operate on Unicode characters, not bytes.

\begin{tabular}{lcl}
<session> & ::= & <SYN> <response> \\
  & & | <session> <transaction> \\
  & & | <session> <notification> \\
<transaction> & ::= & <request> <response> \\
<request> & ::= & <line> \\
<response> & ::= & <resp-line-last> \\
  & & | <resp-line-not-last> <response> \\
  & & | <notification> <response> \\
<resp-line-last> & ::= & <resp-code> <SPACE> <line> \\
<resp-line-not-last> & ::= & <resp-code> <DASH> <line> \\
<notification> & ::= & <notification-code> <SPACE> <line> \\
<resp-code> & ::= & ``\texttt{2}'' <digit> <digit> \\
  & & | ``\texttt{3}'' <digit> <digit> \\
  & & | ``\texttt{5}'' <digit> <digit> \\
<notification-code> & ::= & ``\texttt{6}'' <digit> <digit> \\
<line> & ::= & <CRLF> \\
  & & | <word> <ws> <line> \\
<word> & ::= & <COMMON-CHAR> \\
  & & | ``\texttt{$\backslash$}'' <CHAR> \\
  & & | ``\texttt{"}'' <quoted-word> ``\texttt{"}'' \\
  & & | <word> <word> \\
<quoted-word> & ::= & ``'' \\
  & & | <COMMON-CHAR> <quoted-word> \\
  & & | <ws> <quoted-word> \\
  & & | ``\texttt{$\backslash$}'' <CHAR> <quoted-word> \\
<ws> & ::= & <1ws> | <1ws> <ws> \\
<1ws> & ::= & <SPACE> | <TAB> \\
<digit> & ::= & ``\texttt{0}'' |
``\texttt{1}'' | ``\texttt{2}'' |
``\texttt{3}'' | ``\texttt{4}'' \\
  & & | ``\texttt{5}'' | ``\texttt{6}'' |
``\texttt{7}'' | ``\texttt{8}'' |
``\texttt{9}''
\end{tabular}

As for the terminal symbols, <SPACE> is U+0020, <TAB> is U+0009,
<CRLF> is the sequence of U+000D and U+000A, <DASH> is U+002D, <CHAR>
is any Unicode character except U+0000, <COMMON-CHAR> is any
Unicode character except U+0000, U+0009, U+000A, U+000D, U+0020,
U+0022 and U+005C, and <SYN> is the out-of-band message that
establishes the communication channel\footnote{This means that the
  communication channel must support such a message. For example, raw
  RS-232 would be hard to support.}. The following constraints also
apply:
\begin{itemize}
\item <SYN> and <request> must be sent from the client to the daemon.
\item <response> and <notification> must be sent from the daemon to
  the client.
\end{itemize}
Note that the definition of <word> means that the only way to
represent an empty word is by a pair of quotation marks.

In each request line, there should be at least one word, but it is not
considered a syntax error if there is not. The first word in each
request line is considered the name of the command to be carried out
by the daemon. An empty line is a valid request as such, but since
there is no matching command, it will provoke the same kind of error
response as if a request with any other non-existing command were
sent. Any remaining words on the line are considered arguments to the
command.

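Going in the other direction, a client has to quote its words before
sending a request line. The Python sketch below uses only backslash
escapes, plus a pair of quotation marks for the empty word, which is
one of several encodings the syntax permits. The function names
\texttt{quote\_word} and \texttt{build\_request} are invented for this
example, and the command name used in the usage note is a placeholder,
not a real Dolda Connect command.

```python
def quote_word(word: str) -> str:
    """Quote one word so that it survives the protocol's word splitting."""
    if "\0" in word:
        raise ValueError("NUL characters cannot be transferred")
    if word == "":
        return '""'   # the only way to represent an empty word
    out = []
    for ch in word:
        # Escape everything that is not a <COMMON-CHAR>.
        if ch in ' \t\r\n"\\':
            out.append("\\")
        out.append(ch)
    return "".join(out)


def build_request(*words: str) -> bytes:
    """Join quoted words into one CRLF-terminated, UTF-8 encoded line."""
    line = " ".join(quote_word(w) for w in words) + "\r\n"
    return line.encode("utf-8")
```

For instance, \texttt{build\_request("cmd", "a b", "")} produces the
bytes of the line \texttt{cmd a\textbackslash\ b ""} followed by CRLF.
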
\section{Requests}
For each arriving request, the daemon checks that the request
passes a number of tests before carrying it out. First, it matches the
name of the command against the list of known commands to see if the
request calls a valid command. If the command is not valid, the daemon
sends a response with code 500. Then, it checks that the request has
the minimum required number of parameters for the given command. If it
does not, it responds with a 501 code. Last, it checks that the
user account issuing the request has the necessary permissions to have
the request carried out. If it does not, it responds with a 502
code. After that, any responses are individual to the command in
question. The intention of this section is to list them all.

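The order of these three tests can be summarised in a short Python
sketch. It is a model of the description above, not the daemon's
actual code: the names \texttt{CommandSpec} and \texttt{precheck} are
invented, the command table is a hypothetical stand-in, and it is
assumed that the parameter count excludes the command name itself and
that each command needs at most one permission.

```python
from typing import Dict, List, NamedTuple, Optional, Set


class CommandSpec(NamedTuple):
    min_args: int          # minimum number of parameters after the name
    perm: Optional[str]    # required permission, or None if none is needed


def precheck(words: List[str],
             commands: Dict[str, CommandSpec],
             perms: Set[str]) -> Optional[int]:
    """Return 500, 501 or 502 if a generic test fails, else None."""
    # An empty line has no command name and therefore matches nothing.
    spec = commands.get(words[0]) if words else None
    if spec is None:
        return 500    # unknown command
    if len(words) - 1 < spec.min_args:
        return 501    # too few parameters
    if spec.perm is not None and spec.perm not in perms:
        return 502    # connection lacks the required permission
    return None       # command-specific handling takes over from here
```

Because the tests run in this order, a request for an unknown command
is answered with 500 even on a connection that has no permissions at
all.
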
\subsection{Permissions}

As for the permissions mentioned above, it is outside the scope of
this document to describe the administration of
permissions\footnote{Please see the \texttt{doldacond.conf(5)} man
  page for more information on that topic.}, but since some commands
require certain permissions, these need at least to be listed. When a
connection is established, it is associated with no permissions. At
that point, only requests that do not require any permissions can be
successfully issued. Normally, the first thing a client would do is to
authenticate to the daemon. At the end of a successful authentication,
the daemon associates the proper permissions with the connection over
which authentication took place. The possible permissions are listed in
table \ref{tab:perm}.

\begin{table}
  \begin{tabular}{rl}
    Name & General description \\
    \hline
    \texttt{admin} & Required for all commands that administer the
    daemon. \\
    \texttt{fnetctl} & Required for all commands that alter the state of
    connected hubs. \\
    \texttt{trans} & Required for all commands that alter the state of
    file transfers. \\
    \texttt{transcu} & Required specifically for cancelling uploads. \\
    \texttt{chat} & Required for exchanging chat messages. \\
    \texttt{srch} & Required for issuing and querying searches. \\
  \end{tabular}
  \caption{The list of available permissions}
  \label{tab:perm}
\end{table}

\subsection{Protocol revisions}
\label{rev}
Since Dolda Connect is developing, its command set may change
occasionally. Sometimes new commands are added, sometimes commands
change argument syntax, and sometimes commands are removed. In order
for clients to be able to cleanly cope with such changes, the protocol
is revisioned. When a client connects to the daemon, the daemon
indicates in the first response it sends the range of protocol
revisions it supports, and each command listed below specifies the
revision number from which its current specification is valid. A
client should check that the revision range sent by the daemon
includes the revision that incorporates all commands that it wishes
to use.

Whenever the protocol changes at all, it is given a new revision
number. If the entire protocol is backwards compatible with the
previous revision, the revision range sent by the server is updated to
extend forward to the new revision. If the protocol in any way is not
compatible with the previous revision, the revision range is moved
entirely to the new revision. Therefore, a client can check for a
certain revision and be sure that everything it wants is supported by
the daemon.

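In code, the check a client needs to make is a simple range test. The
following Python sketch assumes that revisions are plain integers and
that the client was written against one particular revision; the
function name \texttt{revision\_supported} is invented for this
example.

```python
def revision_supported(lorev: int, hirev: int, wanted: int) -> bool:
    """Return True if the daemon's range [lorev, hirev] includes wanted."""
    return lorev <= wanted <= hirev
```

A daemon announcing the range 1 to 2 thus serves clients written for
revision 1 or 2, while a daemon whose range has moved entirely to 2
after an incompatible change no longer serves a revision 1 client.
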
At the time of this writing, the latest protocol revision is 2. Please
see the file \texttt{doc/protorev} that comes with the Dolda Connect
source tree for a full list of revisions and what changed between
them.

\subsection{List of commands}

What follows is a (hopefully) exhaustive listing of all commands valid
for a request. For each possible request, it includes the name of the
command for the request, the permissions required, the syntax for the
entire request line, and the possible responses.

The syntax of the request and response lines is described in a format
like that traditionally used in \unix\ man pages, with a number of terms,
each corresponding to a word in the line. Each term in the syntax
description is either a literal string, written in lower case; an
argument, written in upper case and meant to be replaced by some other
text as described; an optional term, enclosed in brackets
(``\texttt{[}'' and ``\texttt{]}''); or a list of alternatives,
enclosed in braces (``\texttt{\{}'' and ``\texttt{\}}'') and separated
by pipes (``\texttt{|}''). Possible repetition of a term is indicated
by three dots (``\texttt{...}''), and, for the purpose of repetition,
terms may be grouped with parentheses (``\texttt{(}'' and
``\texttt{)}'').

Two things should be noted regarding the responses. First, in the
syntax description of responses, the response code is given as the
first term, even though it is not actually considered a word. Second,
more words may follow after the specified syntax, and should be
discarded by a client. Many responses use that to include a
human-readable string to indicate the conclusion of the request.

\subsubsection{Connection}
As mentioned above, the act of connecting to the daemon is itself
considered a request, soliciting a response. Such a request obviously
has no command name and no syntax, but needs a description
nonetheless.

\revision{1}

\noperm

\begin{responses}
  \response{200}
  The old response given by daemons not yet using the revisioned
  protocol. Clients receiving this response should consider it an
  error.
  \response{201 LOREV HIREV}
  Indicates that the connection is accepted. The \param{LOREV} and
  \param{HIREV} parameters specify the range of supported protocol
  revisions, as described in section \ref{rev}.
  \response{502 REASON}
  The connection is refused by the daemon and will be closed. The
  \param{REASON} parameter states the reason for the refusal in
  English\footnote{So it is probably not suitable for localized
    programs.}.
\end{responses}

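A client might handle these three greeting responses along the
following lines. This Python sketch is illustrative only: the names
\texttt{GreetingError} and \texttt{parse\_greeting} are invented, the
greeting is assumed to have been split into its code and its words
already, and \param{LOREV} and \param{HIREV} are assumed to be decimal
integers.

```python
from typing import List, Tuple


class GreetingError(Exception):
    """Raised when the daemon's greeting does not accept the connection."""


def parse_greeting(code: int, words: List[str]) -> Tuple[int, int]:
    """Return (lorev, hirev) for a 201 greeting, or raise GreetingError."""
    if code == 201:
        if len(words) < 2:
            raise GreetingError("201 greeting lacks a revision range")
        # Any words after LOREV and HIREV are ignored.
        return int(words[0]), int(words[1])
    if code == 502:
        reason = words[0] if words else "no reason given"
        raise GreetingError("connection refused: " + reason)
    if code == 200:
        raise GreetingError("daemon predates the revisioned protocol")
    raise GreetingError("unexpected greeting code %d" % code)
```

The returned pair is the revision range the client should compare
against the revisions of the commands it intends to use.
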
\input{commands}

\end{document}