In this paper, we consider a synchronization problem between nodes A and B that are connected through a two-way communication channel. Node A contains a binary file X of length n, and node B contains a binary file Y that is generated by randomly deleting bits from X at a small deletion rate β. The locations of the deleted bits are not known to either node A or node B. We offer a deterministic synchronization scheme between nodes A and B that needs a total of O(nβ log(1/β)) transmitted bits and reconstructs X at node B with a probability of error that is exponentially low in the size of X. Orderwise, the rate of our scheme matches the optimal rate for this channel.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

have been many improvements over the baseline approach. For example, Suel et al. [9] proposed a protocol that in certain cases can save up to 50% of bandwidth over RSYNC. There are also more specialized synchronization tools, such as VSYNC [10], which synchronizes video files.
B. Our Contribution

While most of the previous work has concentrated on synchronizing from a fixed number of edits between two files X and Y, in this paper we are interested in a more practical scenario: synchronizing from a fixed rate of edits between two files. We study only synchronization from deletions, and discuss possible extensions to the more general case of deletions and insertions at the end of the paper. More specifically, we consider synchronization between node A and node B, where node A has a binary string X generated by an i.i.d. Bernoulli process with parameter 1/2. Node B has a binary string Y generated from X by deleting each bit of X independently with a very small probability β. We are interested in an optimal transmission protocol for synchronizing between nodes A and B when n, the length of X, is large.

We remark that, throughout the paper, by small β we implicitly mean that there exists β_0 > 0 such that our discussion is valid for all β < β_0. Furthermore, by large n we implicitly mean that for every β < β_0 there exists a positive integer n_β such that our discussion is valid for all n > n_β.

In order to evaluate a lower bound on the optimal number of transmitted bits between nodes A and B, suppose that node A has access to string Y. Then the optimal number of bits transmitted to node B needed for reconstructing X is H(X|Y), the conditional entropy of string X given string Y. Ma et al. [11] considered a more general set-up where the deletion pattern follows a stationary Markov chain. By applying the result of [11] to our model, for small values of β, the entropy H(X|Y) can be estimated as follows