We study learning problems involving arbitrary classes of functions $F$, distributions $X$ and targets $Y$. Because \emph{proper} learning procedures, i.e., procedures that are only allowed to select functions in $F$, tend to perform poorly unless the problem satisfies some additional structural property (e.g., that $F$ is convex), we consider \emph{unrestricted learning procedures} that are free to choose functions outside the given class. We present a new unrestricted procedure that is optimal in a very strong sense: the required sample complexity is essentially the best one can hope for, and the estimate holds for (almost) any problem, including heavy-tailed situations. Moreover, the sample complexity coincides with the what one would expect if $F$ were convex, even when $F$ is not. And if $F$ is convex, the procedure turns out to be proper. Thus, the unrestricted procedure is actually optimal in both realms, for convex classes as a proper procedure and for arbitrary classes as an unrestricted procedure.

We study the problem of deterministically exploring an undirected and initially unknown graph with n vertices either by a single agent equipped with a set of pebbles, or by a set of collaborating agents. The vertices of the graph are unlabeled and cannot be distinguished by the agents, but the edges incident to a vertex have locally distinct labels. The graph is explored when all vertices are visited by at least one agent. In this setting, it is known that for a single agent without pebbles (log n) bits of memory are necessary and sufficient to explore any graph with at most n vertices. We are interested in how the memory requirement decreases as the agent may mark vertices by dropping and retrieving distinguishable pebbles, or when multiple agents jointly explore the graph. We give tight results for both questions showing that for a single agent with constant memory (log log n) pebbles are necessary and sufficient for exploration. We further prove that using collaborating agents instead of pebbles does not help as (log log n) agents with constant bits of memory each are necessary and sufficient for exploration. For the upper bounds, we devise an algorithm for a single agent with constant memory that explores any n-vertex graph using O(log log n) pebbles, even when n is not known a priori. The algorithm terminates after polynomial time and returns to the starting vertex. Since an additional agent is at least as powerful as a pebble, this implies that O(log log n) agents with constant memory can explore any n-vertex graph. For the lower bound, we show that the number of agents needed for exploring any graph with at most n vertices is already ©(log log n) when we allow each agent to have at most O(log n^(1-µ)) bits of memory for some µ>0. This also implies that a single agent with sublogarithmic memory needs (log log n) pebbles to explore any n-vertex graph.

Query answering over incomplete data invariably relies on the standard notion of certain answers which gives a very coarse classification of query answers into those that are certain and those that are not. Our goal is to refine it by measuring how close an answer is to certainty. This measure is defined as the probability that the query is true under a random interpretation of missing information in a database. Since there are infinitely many such interpretations, to pick one at random we adopt the approach used in the study of asymptotic properties and 0--1 laws for logical sentences, and define the measure as the limit of a sequence. We prove that without any restrictions imposed, the standard model of missing data admits the 0--1 law. That is, the limit always exists and can be only 0 or 1 for a very large class of queries. In other words, query answers are either almost certainly true, or almost certainly false. We show that almost certainly true answers are precisely those returned by the naive evaluation of the query. When restrictions are imposed and databases are required to satisfy constraints, the measure is the conditional probability of the query being true if the constraints are true. This too is defined as a limit; we prove that it always exists, can be an arbitrary rational number, and is computable. For some constraints, such as functional dependencies, the 0-1 law continues to hold. We also look at evaluation procedures based on many-valued logics, as used in relational database systems that handle incomplete information. We identify conditions when such evaluation procedures return almost certainly true answers, and explain reasons why real-life DBMSs break such conditions and can thus return arbitrarily bad answers. As another refinement of the notion of certainty, we introduce a comparison of query answers: an answer with a larger set of interpretations that make it true is better. We identify the precise complexity of such comparisons, and of finding sets of best answers, for first-order queries.

We prove that the Weisfeiler-Leman (WL) dimension of the class of all finite planar graphs is at most 3. In particular, every finite planar graph is definable in first-order logic with counting using at most 4 variables. The previously best known upper bounds for the dimension and number of variables were 14 and 15, respectively. First we show that, for dimension 3 and higher, the WL-algorithm correctly tests isomorphism of graphs in a minor-closed class whenever it determines the orbits of the automorphism group of any arc-colored 3-connected graph belonging to this class. Then we prove that, apart from several exceptional graphs (which have WL-dimension at most 2), the individualization of two correctly chosen vertices of a colored 3-connected planar graph followed by the 1-dimensional WL-algorithm produces the discrete vertex partition. This implies that the 3-dimensional WL-algorithm determines the orbits of a colored 3-connected planar graph. As a byproduct of the proof, we get a classification of the 3-connected planar graphs with fixing number 3.

The geometric intersection number of a curve on a surface is the minimal number of self-intersections of any homotopic curve, i.e. of any curve obtained by continuous deformation. Given a curve $c$ represented by a closed walk of length at most $\ell$ on a combinatorial surface of complexity $n$ we describe simple algorithms to (1) compute the geometric intersection number of $c$ in $O(n+ \ell^2)$ time, (2) construct a curve homotopic to $c$ that realizes this geometric intersection number in $O(n+\ell^4)$ time, (3) decide if the geometric intersection number of $c$ is zero, i.e. if $c$ is homotopic to a simple curve, in $O(n+\ell\log\ell)$ time. The algorithms for (2) and (3) are restricted to orientable surfaces, but the algorithm for (1) is also valid on non-orientable surfaces. To our knowledge, no exact complexity analysis had yet appeared on those problems. An optimistic analysis of the complexity of the published algorithms for problems (1) and (3) gives at best a $O(n+g^2\ell^2)$ time complexity on a genus $g$ surface without boundary. No polynomial time algorithm was known for problem (2) for surfaces without boundary. Interestingly, our solution to problem (3) provides a quasi-linear algorithm to a problem raised by Poincar\'e more than a century ago. Finally, we note that our algorithm for problem (1) extends to computing the geometric intersection number of two curves of length at most $\ell$ in $O(n+ \ell^2)$ time.

Contexts are terms with one `hole', i.e. a place in which we can substitute an~argument. In context unification we are given an equation over terms with variables representing contexts and ask about the satisfiability of this equation. Context unification is a natural subvariant of second-order unification, which is undecidable, and a generalization of word equations, which are decidable, at the same time. It is the unique problem between those two whose decidability remained unknown (for already almost two decades). In this paper we show that the context unification is in PSPACE and in EXPTIME, when tree regular constraints are also allowed. Those results are obtained by extending the recompression technique, recently developed by the author and used in particular to obtain a new PSPACE algorithm for satisfiability of word equations, to context unification. The recompression is based on performing simple compression rules (replacing pairs of neighbouring function symbols), which are (conceptually) applied on the solution of the context equation and modifying the equation in a way so that such compression steps can be performed directly on the equation, without the knowledge of the actual solution. The crucial property is that when the compression operation is chosen in appropriate way, the instance stays polynomial-size.