Journal of the ACM (JACM)

Latest Articles

Subcubic Equivalences Between Path, Matrix, and Triangle Problems

We say an algorithm on n × n matrices with integer entries in [−M,M] (or n-node graphs with edge weights from [−M,M])... (more)

Ontology-Mediated Queries: Combined Complexity and Succinctness of Rewritings via Circuit Complexity

We give solutions to two fundamental computational problems in ontology-based data access with the W3C standard ontology language OWL 2 QL: the succinctness problem for first-order rewritings of ontology-mediated queries (OMQs) and the complexity problem for OMQ answering. We classify OMQs according to the shape of their conjunctive... (more)

General Belief Revision

In artificial intelligence, a key question concerns how an agent may rationally revise its beliefs in light of new information. The standard (AGM) approach to belief revision assumes that the underlying logic contains classical propositional logic. This is a significant limitation, since many representation schemes in AI don’t subsume... (more)

Weakest Precondition Reasoning for Expected Runtimes of Randomized Algorithms

This article presents a wp--style calculus for obtaining bounds on the expected runtime of randomized algorithms. Its application includes determining... (more)

The Cost of Unknown Diameter in Dynamic Networks

For dynamic networks with unknown diameter, we prove novel lower bounds on the time complexity of a range of basic distributed computing problems. Together with trivial upper bounds under dynamic networks with known diameter for these problems, our lower bounds show that the complexities of all these problems are sensitive to whether the diameter... (more)

On Algebraic Branching Programs of Small Width

In 1979, Valiant showed that the complexity class VPe of families with polynomially bounded formula size is contained in the class VPs of families that have algebraic branching programs (ABPs) of polynomially bounded size. Motivated by the problem of separating these classes, we study the topological closure &overline;VPe, i.e., the class of... (more)


About JACM

The Journal of the ACM (JACM) provides coverage of the most significant work on principles of computer science, broadly construed. The scope of research we cover encompasses contributions of lasting value to any area of computer science. To be accepted, a paper must be judged to be truly outstanding in its field.  JACM is interested  in work in core computer science and at the boundaries, both the boundaries of subdisciplines of computer science and the boundaries between computer science and other fields.  READ MORE

Editorial Process

The Journal of the ACM begins the refereeing process with a "quick review", to check whether the manuscript has a plausible chance of meeting JACM's high standards, even if all the claimed results are correct. JACM tries to cover a broad spectrum of areas, and can only accept 4-5 papers in any given area every year. Thus, we try to focus on the most significant papers in each area, that would be of interest to the broad community, and reject many papers that would be accepted by other journals. READ MORE

Important Note on P/NP

Some submissions purport to solve a long-standing open problem in complexity theory, such as the P/NP problem. Many of these turn out to be mistaken, and such submissions tax JACM volunteer editors and reviewers.  READ MORE

Fast Learning Requires Good Memory: A Time-Space Lower Bound for Parity Learning

We prove that any algorithm for learning parities requires either a memory of quadratic size or an exponential number of samples. This proves a recent conjecture of Steinhardt, Valiant and Wager and shows that for some learning problems a large storage space is crucial. More formally, in the problem of parity learning, an unknown string $x \in \{0,1\}^n$ was chosen uniformly at random. A learner tries to learn $x$ from a stream of samples $(a_1, b_1), (a_2, b_2)... $, where each $a_t$ is uniformly distributed over $\{0,1\}^n$ and $b_t$ is the inner product of $a_t$ and $x$, modulo 2. We show that any algorithm for parity learning, that uses less than $n^2/25$ bits of memory, requires an exponential number of samples. Previously, there was no non-trivial lower bound on the number of samples needed, for any learning problem, even if the allowed memory size is $O(n)$ (where $n$ is the space needed to store one sample). We also give an application of our result in the field of bounded-storage cryptography. We show an encryption scheme that requires a private key of length $n$, as well as time complexity of $n$ per encryption/decryption of each bit, and is provenly and unconditionally secure as long as the attacker uses less than $n^2/25$ memory bits and the scheme is used at most an exponential number of times. Previous works on bounded-storage cryptography assumed that the memory size used by the attacker is at most linear in the time needed for encryption/decryption.

Computing the Homology of Basic Semialgebraic Sets in Weak Exponential Time

We describe and analyze an algorithm for computing the homology (Betti numbers and torsion coefficients) of basic semialgebraic sets which works in weak exponential time. That is, out of a set of exponentially small measure in the space of data the cost of the algorithm is exponential in the size of the data. All algorithms previously proposed for this problem have a complexity which is doubly exponential (and this is so for almost all data).

Approximate counting, the Lovasz local lemma, and inference in graphical models

In this paper we introduce a new approach for approximately counting in bounded degree systems with higher-order constraints. Our main result is an algorithm to approximately count the number of solutions to a CNF formula ¦ when the width is logarithmic in the maximum degree. This closes an exponential gap between the known upper and lower bounds. Moreover our algorithm extends straightforwardly to approximate sampling, which shows that under Lovasz Local Lemma-like conditions it is not only possible to find a satisfying assignment, it is also possible to generate one approximately uniformly at random from the set of all satisfying assignments. Our approach is a significant departure from earlier techniques in approximate counting, and is based on a framework to bootstrap an oracle for computing marginal probabilities on individual variables. Finally, we give an application of our results to show that it is algorithmically possible to sample from the posterior distribution in an interesting class of graphical models.

Shuffles and Circuits (On Lower Bounds for Modern Parallel Computation)

The goal of this paper is to identify fundamental limitations on how efficiently algorithms implemented on platforms such as MapReduce and Hadoop can compute the central problems in the motivating application domains, such as graph connectivity problems. We introduce an abstract model of massively parallel computation, where essentially the only restrictions are that the fan-in of each machine is limited to s bits, where s is smaller than the input size n, and that computation proceeds in synchronized rounds, with no communication between different ma- chines within a round. Lower bounds on the round complexity of a problem in this model apply to every computing platform that shares the most basic design principles of MapReduce-type systems. We prove that computations in our model that use few rounds can be represented as low-degree polynomials over the reals. This connection allows us to translate a lower bound on the (approximate) polynomial degree of a Boolean function to a lower bound on the round complexity of every (randomized) massively parallel computation of that function. These lower bounds apply even in the unbounded width version of our model, where the number of machines can be arbitrarily large. As one example of our general results, computing any non-trivial monotone graph property  such as connectivity  requires a super-constant number of rounds when every machine can accept only a sub-polynomial (in n) number of input bits s. Finally, we prove that, in two senses, our lower bounds are the best one could hope for. For the unbounded-width model, we prove a matching upper bound. Restricting to a polynomial number of machines, we show that asymptotically better lower bounds would separate P from NC1.

Scaling Exponential Backoff: Constant Throughput, Polylogarithmic Channel-Access Attempts, and Robustness

Randomized exponential backoff is a widely deployed technique for coordinating access to a shared resource. A good backoff protocol should, arguably, satisfy three natural properties: (i) it should provide constant throughput, wasting as little time as possible; (ii) it should require few failed access attempts, minimizing the amount of wasted effort; and (iii) it should be robust, continuing to work efficiently even if some of the access attempts fail for spurious reasons. Unfortunately, exponential backoff has some well-known limita- tions in two of these areas: it provides poor (sub-constant) throughput (in the worst case), and is not robust (to adversarial disruption). The goal of this paper is to fix exponential backoff by making it scalable, particularly focusing on the case where processes arrive in an on-line, worst-case fashion. We present a relatively simple backoff proto- col, RE-BACKOFF, that has, at its heart, a version of exponential backoff. It guarantees expected constant throughput with dynamic process arrivals and requires only an expected polylogarithmic number of access attempts per process. RE-BACKOFF is also robust to periods where the shared resource is unavailable for a period of time. If it is unavailable for D time slots, RE-BACKOFF provides the following guarantees. When the number of packets is a finite n, the average expected number of access attempts for successfully sending a packet is O(log2(n + D)). In the infinite case, the average expected number of access attempts for successfully sending a packet is O(log2(· + D)) where · is the maximum number of processes that are ever in the system concurrently.

Near-optimal linear decision trees for k-SUM and related problems

We construct near optimal linear decision trees for a variety of decision problems in combinatorics and discrete geometry. For example, for any constant $k$, we construct linear decision trees that solve the $k$-SUM problem on $n$ elements using $O(n \log^2 n)$ linear queries. Moreover, the queries we use are comparison queries, which compare the sums of two $k$-subsets; when viewed as linear queries, comparison queries are $2k$-sparse and have only $\{-1,0,1\}$ coefficients. We give similar constructions for sorting sumsets $A+B$ and for solving the SUBSET-SUM problem, both with optimal number of queries, up to poly-logarithmic terms. Our constructions are based on the notion of ``inference dimension", recently introduced by the authors in the context of active classification with comparison queries. This can be viewed as another contribution to the fruitful link between machine learning and discrete geometry, which goes back to the discovery of the VC dimension.

Capacity Upper Bounds for Deletion-Type Channels

We develop a systematic approach, based on convex programming and real analysis, for obtaining upper bounds on the capacity of the binary deletion channel and, more generally, channels with i.i.d. insertions and deletions. Other than the classical deletion channel, we give a special attention to the Poisson-repeat channel introduced by Mitzenmacher and Drinea (IEEE Transactions on Information Theory, 2006). Our framework can be applied to obtain capacity upper bounds for any repetition distribution (the deletion and Poisson-repeat channels corresponding to the special cases of Bernoulli and Poisson distributions). Our techniques essentially reduce the task of proving capacity upper bounds to maximizing a univariate, real-valued, and often concave function over a bounded interval. The corresponding univariate function is carefully designed according to the underlying distribution of repetitions and the choices vary depending on the desired strength of the upper bounds as well as the desired simplicity of the function (e.g., being only efficiently computable versus having an explicit closed-form expression in terms of elementary, or common special, functions). Among our results, we show the following: 1. The capacity of the binary deletion channel with deletion probability $d$ is at most $(1-d) \log \varphi$ for $d \geq 1/2$, and, assuming the capacity function is convex, is at most $1-d \log(4/\varphi)$ for $d<1/2$, where $\varphi=(1+\sqrt{5})/2$ is the golden ratio. This is the first nontrivial capacity upper bound for any value of $d$ outside the limiting case $d \to 0$ that is fully explicit and proved without computer assistance. 2. We derive the first set of capacity upper bounds for the Poisson-repeat channel. Our results uncover further striking connections between this channel and the deletion channel, and suggest, somewhat counter-intuitively, that the Poisson-repeat channel is actually analytically simpler than the deletion channel and may be of key importance to a complete understanding of the deletion channel. 3. We derive several novel upper bounds on the capacity of the deletion channel. All upper bounds are maximums of efficiently computable, and concave, univariate real functions over a bounded domain. In turn, we upper bound these functions in terms of explicit elementary and standard special functions, whose maximums can be found even more efficiently (and sometimes, analytically, for example for $d=1/2$). Along the way, we develop several new techniques of potentially independent interest. For example, we develop systematic techniques to study channels with mean constraints over the reals. Furthermore, we motivate the study of novel probability distributions over non-negative integers, as well as novel special functions which could be of interest to mathematical analysis.

Deterministic Edge Connectivity in Near-Linear Time

We present a deterministic algorithm that computes the edge-connectivity of a graph in near-linear time. This is for a simple undirected unweighted graph $G$ with $n$ vertices and $m$ edges. This is the first $o(mn)$ time deterministic algorithm for the problem. Our algorithm is easily extended to find a concrete minimum edge-cut. In fact, we can construct the classic cactus representation of all minimum cuts in near-linear time. The previous fastest deterministic algorithm by Gabow from STOC'91 took $\tO(m+\lambda^2 n)$, where $\lambda$ is the edge connectivity, but $\lambda$ can be as big as $n-1$. At STOC'96 Karger presented a randomized near linear time Monte Carlo algorithm for the minimum cut problem. As he points out, there is no better way of certifying the minimality of the returned cut than to use Gabow's slower deterministic algorithm and compare sizes. Our main technical contribution is a near-linear time algorithm that contracts vertex sets of a simple input graph $G$ with minimum degree $\delta$, producing a multigraph $\bbar G$ with $\tO(m/\delta)$ edges which preserves all minimum cuts of $G$ with at least two vertices on each side. In our deterministic near-linear time algorithm, we will decompose the problem via low-conductance cuts found using PageRank a la Brin and Page (1998), as analyzed by Andersson, Chung, and Lang at FOCS'06. Normally such algorithms for low-conductance cuts are randomized Monte Carlo algorithms, because they rely on guessing a good start vertex. However, in our case, we have so much structure that no guessing is needed.

On the decidability of membership in matrix-exponential semigroups

Given k+1 square matrices A_1, ..., A_k, C, all of the same dimension and with real algebraic coordinates, we examine the problem of deciding whether there exist matrices of the form exp(A_i t_j) whose product equals C. Our results have applications to reachability problems for linear hybrid automata. Our decidability proof relies on a number of theorems from algebraic and transcendental number theory, most notably those of Baker, Lindemann, and Masser, as well as some useful geometric and linear-algebraic results, including the Minkowski-Weyl theorem and a new (to the best of our knowledge) result about the uniqueness of strictly upper triangular matrix logarithms of upper unitriangular matrices. On the other hand, our undecidability results are shown by reduction from Hilbert's Tenth Problem, through a series of matrix constructions.

Invited Article Foreword for 65.6

Settling the query complexity of non-adaptive junta testing

We prove that any non-adaptive algorithm that tests whether an unknown Boolean function f:{0,1}n’{0,1} is a k-junta or µ-far from every k-junta must make ©(k3/2/µ) many queries for a wide range of parameters k and µ. Our result dramatically improves previous lower bounds from [BGSMdW13, STW15], and is essentially optimal given Blais's non-adaptive junta tester from [Bla08], which makes O(k3/2)/µ queries. Combined with the adaptive tester of [Bla09] which makes O(k log k + k/µ) queries, our result shows that adaptivity enables polynomial savings in query complexity for junta testing.

The PCL Theorem. Transactions cannot be Parallel, Consistent and Live.

We establish the PCL theorem which states that it is impossible to design a transactional memory algorithm which ensures (1) parallelism, i.e. transactions do not need to synchronize unless they access the same application objects, (2) very little consistency, i.e. a consistency condition, called weak adaptive consistency, introduced here and which is weaker than snapshot isolation, processor consistency, and any other consistency condition stronger than them (such as opacity, serializability, causal serializability, etc.), and (3) very little liveness, i.e. that transactions eventually commit if they run solo.

Engineering with Logic: Rigorous Test-Oracle Specification and Validation for TCP/IP and the Sockets API

Conventional computer engineering relies on test-and-debug development processes, with the behaviour of common interfaces described (at best) with prose specification documents. But prose specifications cannot be used in test-and-debug development in any automated way, and prose is a poor medium for expressing complex (and loose) specifications. The TCP/IP protocols and Sockets API are a good example of this: they play a vital role in modern communication and computation, and interoperability between implementations is essential. But what exactly they are is surprisingly obscure: their original development focussed on ``rough consensus and running code'', augmented by prose RFC specifications that do not precisely define what it means for an implementation to be correct. Ultimately, the actual standard is the de facto one of the common implementations, including, for example, the 15\,000--20\,000 lines of the BSD implementation --- optimised and multithreaded C code, time-dependent, with asynchronous event handlers, intertwined with the operating system, and security-critical. This paper reports on work done in the Netsem project to develop lightweight mathematically rigorous techniques that can be applied to such systems: to specify their behaviour precisely (but loosely enough to permit the required implementation variation) and to test whether these specifications and the implementations correspond, with specifications that are executable as test oracles. We developed post-hoc specifications of TCP, UDP, and the Sockets API, both of the service that they provide to applications (in terms of TCP bidirectional stream connections), and of the internal operation of the protocol (in terms of TCP segments and UDP datagrams), together with a testable abstraction function relating the two. These specifications are rigorous, detailed, readable, with broad coverage, and are rather accurate. Working within a general-purpose proof assistant (HOL4), we developed language idioms (within higher-order logic) in which to write the specifications: operational semantics with nondeterminism, time, system calls, monadic relational programming, etc. We followed an experimental semantics approach, validating the specifications against several thousand traces captured from three implementations (FreeBSD, Linux, and WinXP). Many differences between these were identified, and a number of bugs. Validation was done using a special-purpose symbolic model checker programmed above HOL4. Having demonstrated that our logic-based engineering techniques suffice for handling real-world protocols, we argue that similar techniques could be applied to future critical software infrastructure at design time, leading to cleaner designs and (via specification-based testing) more robust and predictable implementations. In cases where specification looseness can be controlled, this should be possible with lightweight techniques, without the need for a general-purpose proof assistant, at relatively little cost.

Near Optimal Online Algorithms and Fast Approximation Algorithms for Resource Allocation Problems

We present prior robust algorithms for a large class of resource allocation problems where requests arrive one-by-one (online), drawn independently from an {\em unknown} distribution at every step. We design a single algorithm that, for every possible underlying distribution, obtains a $1-\epsilon$ fraction of the profit obtained by an algorithm that knows the entire request sequence ahead of time. The factor $\epsilon$ approaches $0$ when no single request consumes/contributes a significant fraction of the global consumption/contribution by all requests together. We show that the trade-off we obtain here that determines how fast $\epsilon$ approaches $0$, is near optimal: we give a nearly matching lower bound showing that the trade-off cannot be improved much beyond what we obtain. Going beyond the model of a static underlying distribution, we introduce the {\em adversarial stochastic input} model, where an adversary, possibly in an adaptive manner, controls the distributions from which the requests are drawn at each step. Placing no restriction on the adversary, we design an algorithm that obtains a $1-\epsilon$ fraction of the optimal profit obtainable w.r.t. the worst distribution in the adversarial sequence. Further, if the algorithm is given one number per distribution, namely, the optimal profit possible for each of the adversary's distribution, we design an algorithm that achieves a $1-\epsilon$ fraction of the weighted average of the optimal profit of each distribution the adversary picks. In the offline setting we give a fast algorithm to solve very large LPs with both packing and covering constraints. We give algorithms to approximately solve (within a factor of $1+\epsilon$) the mixed packing-covering problem with $O(\frac{\gamma m \log (n/\delta)}{\epsilon^2})$ oracle calls where the constraint matrix of this LP has dimension $n\times m$, the success probability of the algorithm is $1-\delta$, and $\gamma$ quantifies how significant a single request is when compared to the sum total of all requests. We discuss implications of our results to several special cases including online combinatorial auctions, network routing and the adwords problem.

Circuit complexity, proof complexity, and polynomial identity testing: The ideal proof system

We introduce a new and natural algebraic proof system, whose complexity measure is essentially the algebraic circuit size of Nullstellensatz certificates. This enables us to exhibit close connections between effective Nullstellensatzë, proof complexity, and (algebraic) circuit complexity. In particular, we show that any super-polynomial lower bound on any Boolean tautology in our proof system implies that the permanent does not have polynomial-size algebraic circuits (VNP ` VP). We also show that super-polynomial lower bounds on the number of lines in Polynomial Calculus proofs imply the Permanent versus Determinant Conjecture. Note that there was no proof system prior to ours for which lower bounds on an arbitrary tautology implied any computational lower bound. Our proof system helps clarify the relationships between previous algebraic proof systems, and begins to shed light on why proof complexity lower bounds for various proof systems have been so much harder than lower bounds on the corresponding circuit classes. In doing so, we highlight the importance of polynomial identity testing (PIT) in proof complexity. In particular, we use PIT to illuminate AC0[p]-Frege lower bounds, which have been open for nearly 30 years, with no satisfactory explanation as to their apparent difficulty. Finally, we explain the obstacles that must be overcome in any attempt to extend techniques from algebraic circuit complexity to prove lower bounds in proof complexity. Using the algebraic structure of our proof system, we propose a novel route to such lower bounds. Although such lower bounds remain elusive, this proposal should be contrasted with the difficulty of extending AC0[p] circuit lower bounds to AC0[p]-Frege lower bounds.

Exact Algorithms via Monotone Local Search

We give a new general approach for designing exact exponential-time algorithms for subset prob- lems. In a subset problem the input implicitly describes a family of sets over a universe of size n and the task is to determine whether the family contains at least one set. A typical example of a subset problem is Weighted d-SAT. Here, the input is a CNF-formula with clauses of size at most d, and an integer W. The universe is the set of variables and the variables have integer weights. The family contains all the subsets S of variables such that the total weight of the vari- ables in S does not exceed W, and setting the variables in S to 1 and the remaining variables to 0 satisfies the formula. Our approach is based on monotone local search, where the goal is to extend a partial solution to a solution by adding as few elements as possible. More formally, in the extension problem we are also given as input a subset X of the universe and an integer k. The task is to determine whether one can add at most k elements to X to obtain a set in the (implicitly defined) family. Our main result is that a c^kn^{O(1)} time algorithm for the extension problem immediately yields a randomized algorithm for finding a solution of any size with running time O((2 1/c)^n). c In many cases, the extension problem can be reduced to simply finding a solution of size at most k. Furthermore, efficient algorithms for finding small solutions have been extensively studied in the field of parameterized algorithms. Directly applying these algorithms, our theorem yields in one stroke significant improvements over the best known exponential-time algorithms for several well-studied problems, including d-Hitting Set, Feedback Vertex Set, Node Unique Label Cover, and Weighted d-SAT. Our results demonstrate an interesting and very concrete connection between parameterized algorithms and exact exponential-time algorithms. We also show how to derandomize our algorithms at the cost of a subexponential multiplicative factor in the running time. Our derandomization is based on an efficient construction of a new pseudo-random object that might be of independent interest. Finally, we extend our methods to establish new combinatorial upper bounds and develop enumeration algorithms.

Parallel Metric Tree Embedding based on an Algebraic View on Moore-Bellman-Ford

A metric tree embedding of expected stretch ± e 1 maps a weighted n-node graph G = (V, E, É) to a weighted tree T = (V_T, E_T , É_T) with V † V_T such that, for all v,w  V, dist(v, w, G) d dist(v, w, T) and E[dist(v, w, T)] d ± dist(v, w, G). Such embeddings are highly useful for designing fast approximation algorithms, as many hard problems are easy to solve on tree instances. However, to date the best parallel (polylog n)-depth algorithm that achieves an asymptotically optimal expected stretch of ±  O(log n) requires ©(n^2) work and a metric as input. In this paper, we show how to achieve the same guarantees using polylog n depth and weak-O(m^(1+µ)) work, where m = |E| and µ > 0 is an arbitrarily small constant. Moreover, one may further reduce the work to weak-O(m + n^(1+µ)) at the expense of increasing the expected stretch to O(µ^(-1) log n). Our main tool in deriving these parallel algorithms is an algebraic characterization of a generalization of the classic Moore-Bellman-Ford algorithm. We consider this framework, which subsumes a variety of previous ``Moore-Bellman-Ford-like'' algorithms, to be of independent interest and discuss it in depth. In our tree embedding algorithm, we leverage it for providing efficient query access to an approximate metric that allows sampling the tree using polylog n depth and weak-O(m) work. We illustrate the generality and versatility of our techniques by various examples and a number of additional results. Specifically, we - improve the state of the art for determining metric tree embeddings in the Congest model, - determine a (1 + µ')-approximate metric regarding the distances in a graph G in polylogarithmic depth and weak-O(nm^(1+µ)) work, and - improve upon the state of the art regarding the k-median and the the buy-at-bulk network design problems.

From Real-Time Logic to Timed Automata

We show how to transform formulae written in the real-time temporal logic MITL into timed automata recognizing their satisfying models, taken to be continuous-time Boolean signals. This compositional construction is much simpler than previously known; it supports both past and future operators and can easily be extended.

All ACM Journals | See Full Journal Index

Search JACM
enter search term and/or author name