Blob Blame History Raw
\documentclass[12pt,fleqn]{amsart}

\usepackage{amsmath}
\usepackage{xspace}
\usepackage{amssymb}
\usepackage{bcprules}

\newcommand{\ensmath}[1]{\ensuremath{#1}\xspace}

\newcommand{\opnam}[1]{\ensmath{\operatorname{\mathit{#1}}}}
\newcommand{\nget}{\opnam{get}}
\newcommand{\nput}{\opnam{put}}
\newcommand{\nparse}{\opnam{parse}}
\newcommand{\ncreate}{\opnam{create}}
\newcommand{\nkey}{\opnam{key}}
\newcommand{\lget}[1]{\opnam{get}{#1}}
\newcommand{\lparse}[1]{\nparse{#1}}
\newcommand{\lput}[2]{\opnam{put}{#1}\,{(#2)}}
\newcommand{\lcreate}[2]{\opnam{create}{#1}\,{(#2)}}
\newcommand{\lkey}[1]{\nkey{#1}}

\newcommand{\suff}{\ensmath{\operatorname{suff}}}
\newcommand{\pref}{\ensmath{\operatorname{pref}}}
\newcommand{\lenstype}[3][K]{\ensmath{{#2}\stackrel{#1}{\Longleftrightarrow}{#3}}}
\newcommand{\tree}[1]{\ensmath{[#1]}}
\newcommand{\niltree}{\ensmath{[]}}
\newcommand{\nildict}{\ensmath{\{\!\}}}
\newcommand{\Regexp}{\ensmath{\mathcal R}}
\newcommand{\reglang}[1]{\ensmath{[\![{#1}]\!]}}
\newcommand{\lens}[1]{\opnam{#1}}
\newcommand{\eps}{\ensmath{\epsilon}}
\newcommand{\conc}[2]{\ensmath{#1\cdot #2}}
\newcommand{\uaconc}[2]{\ensmath{#1\cdot^{!} #2}}
\newcommand{\xconc}[2]{\ensmath{#1\odot #2}}
\newcommand{\dconc}[2]{\ensmath{#1\oplus #2}}
\newcommand{\alt}[2]{\ensmath{#1\,|\,#2}}
\newcommand{\cstar}[1]{\ensmath{#1^*}}
\newcommand{\uastar}[1]{\ensmath{#1^{!*}}}
\newcommand{\Trees}{\ensmath{\mathcal T}}
\newcommand{\Words}{\ensmath{\Sigma^*_{\rhd}}}
\newcommand{\tmap}[2]{\ensmath{#1\mapsto #2}}
\newcommand{\tmapv}[3]{\ensmath{#1 = #2\mapsto #3}}
\newcommand{\tmaptt}[2]{\ensmath{{\mathtt #1}\mapsto {\mathtt #2}}}
\newcommand{\dom}[1]{\ensmath{\mathrm{dom}(#1)}}
\newcommand{\List}[1]{\ensmath{\mathtt{List(#1)}}}
\newcommand{\redigits}{\ensmath{\mathtt{D}}}
\newcommand{\Skel}{\ensmath{\mathcal{S}}}
\newcommand{\Powerset}{\ensmath{\mathcal{P}}}
\newcommand{\Keys}{\ensmath{\mathcal{K}}}
\newcommand{\key}[1]{\ensmath{\kappa(#1)}}
\newcommand{\lto}{\ensmath{\longrightarrow}}
\newcommand{\Dictlangs}{\ensmath{\List{\Powerset(\Skel)}}}
\newcommand{\lookup}{\opnam{lookup}}

\newtheorem{theorem}{Theorem}
\newtheorem{lemma}[theorem]{Lemma}
{\theoremstyle{definition}
  \newtheorem{defn}{Definition}
  \newtheorem*{remark}{Remark}
  \newtheorem*{example}{Example}}

\begin{document}

\section{Trees}
We use trees as the abstract view for our lenses; since the concrete data
structure, strings, is ordered, and to support some properties of lenses
that seem sensible intuitively, the trees differ from garden-variety trees
in a number of ways:
\begin{itemize}
  \item a tree node consists of three pieces of data: a \emph{label}, a
    \emph{value} and an ordered list of \emph{children}, each of them a
    tree by themselves.
  \item the labels for tree nodes are either words not containing a slash
    {\tt /} or the special symbol $\rhd$; in the implementation $\rhd$
    corresponds to {\tt NULL}. The latter is used to indicate
    that an entry in the tree corresponds to text that was deleted.
  \item the children of a tree node form a list of subtrees, i.e. are
    ordered. In addition, several subtrees in such a list may use the same
    label. This makes it possible to accommodate concrete files where
    entries that are logically connected are stored scattered between
    unrelated entries like the {\tt AcceptEnv} entries in {\tt
      sshd\_config}.
\end{itemize}

We write $\tmapv{k}{v}{t}$ for a tree node with label $k$, value $v$ and
children $t$. If it is clear from the context, or unimportant, $v$ will
often be omitted.

\subsection{Tree labels}
We take the tree labels from the set of \emph{path components} $\Keys =
(\Sigma \setminus \{ {\mathtt /} \})^+ \cup \{ \rhd \}$, that is,
a tree label is any word not containing a backslash or the special symbol
$\rhd$. For tree labels, we define a partial concatenation operator
$\odot$, as
\begin{equation*}
  \xconc{k_1}{k_2} = \begin{cases}
    k_1 & \text{if } k_2 = \rhd\\
    k_2 & \text{if } k_1 = \rhd\\
    \text{undefined} & \text{otherwise}
  \end{cases}
\end{equation*}

Defining tree labels in this way (1) guarantees that there is a
one--to--one correspondence between a tree label and the word it came from
in the concrete text and (2) avoids any pain in splitting tree labels in
the \nput direction.

\subsection{Trees}
The set of \emph{ordered trees} \Trees over $\Words$ is recursively
defined as
\begin{itemize}
\item The empty tree $\rhd$
\item For any words $k\in\Keys$, $v\in\Words$ and any tree $t\in\Trees$,
  $\tree{\tmapv{k}{v}{t}}$ is in \Trees
\item For any $n$ and trees $\tree{\tmapv{k_i}{v_i}{t_i}} \in Trees$,
  the list
  $[\tmapv{k_1}{v_1}{t_1};\tmapv{k_2}{v_2}{t_2};\ldots;\tmapv{k_n}{v_n}{t_n}]$
  is in \Trees
\end{itemize}

Note that this allows the same key to be used multiple times in a tree; for
example, $[\tmaptt{a}{x}; \tmaptt{a}{x}]$ is a valid tree and
different from $[\tmaptt{a}{x}]$.

The domain of a tree $\dom{t}$ is the list of all its labels, i.e. an
element of $\List{\Keys}$; for a tree $t =
\tree{\tmap{k_1}{t_1};\ldots;\tmap{k_n}{t_n}}$, $\dom{t} =
     [k_1;\ldots;k_n].$

%% Why ?
%A tree $t$ can also be viewed as a total map from \Words to
%$\List{\Trees\cup\Words}$, defined as
%\begin{equation*}
%  t(k) =
%  \begin{cases}
%    v & \text{for } t = \tree{v}\\
%    [t_1] :: t_2(k) & \text{for } t = t' :: t_2
%                      \text{ and } t' = \tree{\tmap{k}{t_1}}\\
%    t_2(k) & \text{for } t = t' :: t_2
%             \text{ and } t' = \tree{\tmap{k'}{t_1}} \text{with } k\neq k'
%  \end{cases}
%\end{equation*}
%
%Note that this definition implies that for a tree $t = \tree{\tmap{k}{v}}$
%with $k,v\in\Words$, $t(k)=[v].$ For the tree $t_{\mathtt{aa}} = [\tmaptt{a}{x};
%  \tmaptt{a}{x}]$, $t_{\mathtt{aa}}(\mathtt a) = [\mathtt x; \mathtt x]$.

The concatenation of trees $\conc{t_1}{t_2}$ is simply list concatenation.

For sets $K\subset\Keys$ and $T\subset\Trees$, $\tree{\tmap{K}{T}}$
denotes the set of all trees $t = \tree{\tmap{k}{t'}}$ with $k\in K$, $t'
\in T$.

\subsection{Concatenation and iteration}
For a tree $t\in \Trees$, we define its underlying \emph{key language}
$\key{t}$ by
\begin{equation*}
  \key{t} = \begin{cases}
    {\tt /} & \text{if } t = \rhd \text{ or } t = \tree{\tmap{\rhd}{t_1}}\\
    k \cdot {\mathtt /} & \text{if } t = \tree{\tmap{k}{t_1}}\\
    k\cdot {\mathtt /}\cdot \key{t_2} & \text{for } t = \tree{\tmap{k}{t_1};t_2}
  \end{cases}
\end{equation*}
where $\conc{k_1}{k_2}$ is normal string concatenation. The key language of
a set of trees $\key{T}$ is defined as $\{\key{t} | t \in T\}$.

In analogy to languages, we call two tree sets $T_1, T_2 \subseteq \Trees$
\emph{unambiguously concatenable} if the key languages $\key{T_1}$ and
$\key{T_2}$ are unambiguously concatenable. A tree set $T\in\Trees$ is
\emph{unambiguously iterable} if the underlying key language $\key{T}$ is
unambiguously iterable.

\subsection{Public tree operations}
We need the public API to support the following operations. The set
$P\subseteq \Sigma^*$ are paths

\begin{itemize}
  \item $\opnam{lookup}(p, t): P \times \Trees \lto \Trees$ finds the tree
    with path $p$
  \item $\opnam{assign}(p, v, t): P \times \Sigma^* \times \Trees \lto
    \Trees$ assigns the value $v$ to the tree node $p$
  \item $\opnam{remove}(p, t): P \times \Trees \lto \Trees$ removes the
    subtree denoted by $p$
  \item $\opnam{get}(p, t): P \times \Trees \lto \Sigma^*$ looks up the
    value associated with $p$
    \item $\opnam{ls}(p, t): P \times \Trees \lto \List{\Trees}$ lists all
      the subtrees underneath $p$
\end{itemize}

\section{Lenses}

Lenses map between strings in the regular language $C$ and trees
$T\subseteq\Trees$. They can also produce keys from a regular language $K$;
these keys are used by the $\lens{subtree}$ lens to construct new
trees.

A lens $l$ consists of the functions $\nget$, $\nput$, $\ncreate$, and
$\nparse$.

Lenses here are written as $l:\lenstype[K,S,L]{C}{T}$ where $K$ and $C$ are
regular languages and $T\subseteq\Trees$.  The skeletons $S\subseteq\Skel$
and dictionary type specifications $L$ are as for Boomerang (really ??)
Intuitively, the notation says that $l$ is a lens that takes strings from
$C$ and transforms them to trees in $T$. Generally,

\infrule{C\subseteq\Words \andalso K\subseteq\List{\Keys}
           \andalso T\subseteq\Trees
           \andalso S\subseteq\Skel \andalso L \in \Dictlangs}
        {\lens{l} \in \lenstype[K,S,L]{C}{T}}

\begin{align*}
  \nget &\in C \lto T\\
  \nparse &\in C \lto K \times S \times D(L)\\
  \nput &\in T \lto K \times S \times D(L) \lto C \times D(L)\\
  \ncreate &\in T \lto K \times D(L) \lto C \times D(L)
\end{align*}

\subsection{const}

The $\lens{const} E\,t\,v$ maps words matching $E$ in the \nget direction
to a fixed tree $t$ and maps that fixed tree $t$ back in the \nput
direction. When text needs to be created from $t$, it produces the default
word $v$.

\infrule{E\in\Regexp \andalso t\in T\andalso u\in\reglang{E} \andalso L \in \Dictlangs}
        {\lens{const} E\,t\,u \in \lenstype[\rhd,\reglang{E},L]{\reglang{E}}{\{t\}}}

\begin{align*}
  \lget{c} &= t\\
  \lparse{c} &= \rhd, c, \nildict\\
  \lput{t}{k,s,d} &= s, d\\
  \lcreate{t}{k,d} &= u, d
\end{align*}

The \lens{del} lens is syntactic sugar: $\lens{del} E\,u = \lens{const}
E\,\niltree\,u$.

\subsection{copy}

Copies a word into a leaf.

\infrule{E\in\Regexp \andalso L\in\Dictlangs}
        {\lens{copy}\in\lenstype[\rhd,\reglang{E},L]{\reglang{E}}{\tree{\reglang{E}}}}

\begin{align*}
  \lget{c} &= \tree{c}\\
  \lparse{c} &= \rhd, c, \nildict\\
  \lput{\tree{v}}{k,s,d} &= v, d\\
  \lcreate{\tree{v}}{k,d} &= v, d
\end{align*}

\subsection{seq}

Gets the next value from a sequence as the key. We assume there's a
generator $\mathtt{nextval}: \Sigma^* \to \mathbb{N}$ that returns
successive numbers on each invocation. \redigits is the regular expression
{\tt [0-9]+} that matches positive numbers.

\infrule{w\in\Sigma^* \andalso L\in\Dictlangs \andalso n = \mathtt{nextval}(w)}
        {\lens{seq} w\in\lenstype[\reglang{\redigits},\eps,L]{\eps}{\niltree}}

\begin{align*}
  \lget{\eps} &= \niltree\\
  \lparse{\eps} &= n, \eps, \nildict\\
  \lput{\niltree}{k,\eps,d} &= \eps, d\\
  \lcreate{\niltree}{k,d} &= \eps,d
\end{align*}

\subsection{label}

Uses a fixed tree label

\infrule{w\in\Sigma^* \andalso L\in\Dictlangs}
        {\lens{label} w\in\lenstype[w,\eps,L]{\eps}{\niltree}}

\begin{align*}
  \lget{\eps} &= \niltree\\
  \lparse{\eps} &= w,\eps,\nildict\\
  \lput{\niltree}{w,\eps,d} &= \eps,d\\
  \lcreate{\niltree}{k,d} &= \eps,d
\end{align*}

\subsection{key}

Uses a parsed tree label

\infrule{E\in\Regexp \andalso L\in\Dictlangs}
        {\lens{key} E\in\lenstype[\reglang{E},\eps,L]{\reglang{E}}{\niltree}}

\begin{align*}
  \lget{c} &= \niltree\\
  \lparse{c} &= c,\eps,\nildict\\
  \lput{\niltree}{c,\eps,d} &= c,d\\
  \lcreate{\niltree}{c, d} &= c,d
\end{align*}

\subsection{subtree}

The subtree combinator $[l]$ constructs a subtree from $l$

\infrule{l\in\lenstype[K,S,L]{C}{T}}
        {[l]\in\lenstype[\rhd,\square,S::L]{C}{\tree{\tmap{K}{T}}}}

\begin{align*}
  \lget{c} &= \tree{\tmap{l.\lkey{c}}{l.\lget{c}}}\\
  \lparse{c} &= \rhd, \square, \{\tmap{l.\lkey{c}}{[l.\lparse{c}]}\}\\
  \lput{\tree{\tmap{k}{t}}}{k',\square,d} &=
  \begin{cases}
    \pi_1\left(l.\lput{t}{k,\bar{s},\bar{d}}\right),d' & \text{if } (\bar{k}, \bar{s}, \bar{d}), d' = \lookup(k, d)\\
    \pi_1\left(l.\lcreate{t}{k,\nildict}\right), d & \text{if } \lookup(k,
    d) \text{ undefined}
  \end{cases}\\
  \lcreate{\tree{\tmap{k}{t}}}{k',d} &= \lput{\tree{\tmap{k}{t}}}{k',\square,d}
\end{align*}

We store a triple $(k,s,d)$ in dictionaries, but we don't use the stored
key $k$.

\subsection{concat}

The concat combinator $\conc{l_1}{l_2}$ joins two trees.

\infrule{l_1\in\lenstype[K_1,S_1,L]{C_1}{T_1}
         \andalso l_2\in\lenstype[K_2,S_2,L]{C_2}{T_2}
         \andalso \uaconc{C_1}{C_2}
         \andalso \uaconc{\key{T_1}}{\key{T_2}}}
        {\conc{l_1}{l_2}\in\lenstype[\conc{K_1}{K_2},S_1\times S_2, L]{\conc{C_1}{C_2}}{\conc{T_1}{T_2}}}

\begin{align*}
  \lget{(\conc{c_1}{c_2})} &= \conc{(l_1.\lget{c_1})}{(l_2.\lget{c_2})}\\
  \lparse{\conc{c_1}{c_2}} &= \xconc{k_1}{k_2}, (s_1,s_2), \dconc{d_1}{d_2} \\
  \lput{\conc{t_1}{t_2}}{k, (s_1,s_2), d_1} &= \conc{c_1}{c_2}, d_3 \\
  & \text{where } c_i, d_{i+1} = l_i.\lput{t_i}{k, s_i, d_i} \\
  \lcreate{\conc{t_1}{t_2}}{k, d_1} &= \conc{c_1}{c_2}, d_3\\
    & \text{where } c_i, d_{i+1} = l_i.\lcreate{t_i}{k, d_i}
\end{align*}

\subsection{union}

The union combinator $\alt{l_1}{l_2}$ chooses.

\infrule{l_i\in\lenstype[K_i,S_i,L]{C_i}{T_i}\text{ for } i=1,2
  \andalso C_1 \cap C_2 = \emptyset
  \andalso S_1 \cap S_2 = \emptyset
  \andalso \key{T_1} \cap \key{T_2} = \emptyset}
  {\alt{l_1}{l_2}\in\lenstype[K_1\cup K_2, S_1\cup S_2, L]{C_1 \cup C_2}{T_1 \cup T_2}}

\begin{align*}
  \lget{c} &=
     \begin{cases}
       l_1.\lget{c} & \text{if } c \in C_1\\
       l_2.\lget{c} & \text{if } c \in C_2\\
     \end{cases}
  \\
  \lparse{c} &=
     \begin{cases}
       l_1.\lparse{c} & \text{if } c \in C_1\\
       l_2.\lparse{c} & \text{if } c \in C_2\\
     \end{cases}
  \\
  \lput{t}{k, s, d} &=
     \begin{cases}
       l_1.\lput{t}{k, s, d} & \text{if } t, s \in T_1 \times S_1 \\
       l_2.\lput{t}{k, s, d} & \text{if } t, s \in T_2 \times S_2 \\
       l_1.\lcreate{t}{k, d} & \text{if } t, s \in (T_1\setminus T_2)
       \times S_2\\
       l_2.\lcreate{t}{k, d} & \text{if } t, s \in (T_2\setminus T_1)
       \times S_1
     \end{cases}\\
  \lcreate{t}{k, d} &=
     \begin{cases}
       l_1.\lcreate{t}{k, d} & \text{if } t\in T_1\\
       l_2.\lcreate{t}{k, d} & \text{if } t\in T_2\setminus T_1
     \end{cases}
\end{align*}

\subsection{star}

The star combinator $\cstar{l}$ iterates.

\infrule{l\in\lenstype{C}{T} \andalso \uastar{C} \andalso \uastar{\key{T}}}
        {\cstar{l}\in\lenstype[\cstar{K}]{\cstar{C}}{\cstar{T}}}

\begin{align*}
  \lget{c_1\cdots c_n} &= (l.\lget{c_1}) \cdots (l.\lget{c_n})\\
  \lparse{c_1\cdots c_n} &= k_1\odot\ldots\odot k_n, [s_1;\ldots;s_n],
                            d_1\oplus\ldots\oplus d_n\\
                            & \text{where } k_i,s_i,d_i = l.\lparse{c_i}\\
  \lput{t_1\cdots t_n}{k, [s_1;\cdots;s_m], d_1} &= (c_1 \cdots c_n), d_{n+1}\\
  \text{where } &c_i, d_{i+1} =
    \begin{cases}
      l.\lput{t_i}{k, s_i, d_i} & \text{for } 1 \leq i \leq \min(m,n)\\
      l.\lcreate{t_i}{k, d_i} & m+1 \leq i \leq n
    \end{cases}\\
  \lcreate{t_1 \cdots t_n}{k, d_1} &= (c_1\cdots c_n), d_{n+1}\\
  & \text{where } c_i, d_{i+1} = l.\lcreate{t_i}{k, d_i}
\end{align*}

Want reordering and insertion in the middle to be reflected. If
$\lget{\conc{c_1}{c_2}} = \conc{t_1}{t_2}$, want
$\lput{(\conc{t_2}{t_1})}{\conc{c_1}{c_2}} =
\conc{l.\lput{t_2}{c_2}}{l.\lput{t_1}{c_1}}$ This can only happen if the
information to be reordered is in subtrees. In particular, comment lines
need to become their own subtree, with some support from the language to
create `hidden' entries. Simplest: allow {\tt NULL} as the key for a
subtree and ignore such tree entries in the public API.

Need to split a tree $t\in T$ into subtrees according to ?? Keeping a fake
`slot' $\rhd$ around for text that didn't produce a tree should help with
that.

For $K^*$ to make any sense, must have $\rhd\in K$ and the application of
$(l^*).\nparse$ must return $\rhd$ for all except at most one application.

\section{Regular expressions and languages}

For type checking, we need to compute the following properties of regular
languages $R, R_1, R_2$
\begin{itemize}
\item decide unambiguous concatenation $\uaconc{R_1}{R_2}$ and compute
  $\conc{R_1}{R_2}$
\item decide unambiguous iteration $\uastar{R}$ and compute $R^*$
\item disjointness $R_1 \cap R_2 = \emptyset$ (we don't need general
  intersection, though I don't know of a quicker way to decide
  disjointness)
\item compute the regular language $R = \reglang{E}$ for a regular
  expression $E\in\Regexp$
\end{itemize}
\end{document}

%% TeX-parse-self: t
%% TeX-auto-save: t

%% Local Variables:
%% TeX-master: "lenses.tex"
%% compile-command: "pdflatex -file-line-error -halt-on-error lenses.tex"
%% End: