Discrepancy of stratified samples from partitions of the unit cube

We extend the notion of jittered sampling to arbitrary partitions and study the discrepancy of the related point sets. Let $\boldsymbol{\Omega}=(\Omega_1,\ldots,\Omega_N)$ be a partition of $[0,1]^d$ and let the $i$th point in $\mathcal{P}$ be chosen uniformly in the $i$th set of the partition (and stochastically independent of the other points), $i=1,\ldots,N$. For the study of such sets we introduce the concept of a uniformly distributed triangular array and compare this notion to related notions in the literature. We prove that the expected $\mathcal{L}_p$-discrepancy, $\mathbb{E} \mathcal{L}_p(\mathcal{P}_{\boldsymbol{\Omega}})^p$, of a point set $\mathcal{P}_{\boldsymbol{\Omega}}$ generated from any equivolume partition $\boldsymbol{\Omega}$ is always strictly smaller than the expected $\mathcal{L}_p$-discrepancy of a set of $N$ uniform random samples for $p>1$. For fixed $N$ we consider classes of stratified samples based on equivolume partitions of the unit cube into convex sets or into sets with a uniform positive lower bound on their reach. It is shown that these classes contain at least one minimizer of the expected $\mathcal{L}_p$-discrepancy. We illustrate our results with explicit constructions for small $N$. In addition, we present a family of partitions that seems to improve the expected discrepancy of Monte Carlo sampling by a factor of 2 for every $N$.


Introduction
Given a finite set P = {x_1, . . . , x_N} of N points in [0,1]^d, one way to quantify how well-spread these points are is to calculate the L_p-discrepancy, i.e. the L_p-norm of the so-called discrepancy function. For an infinite sequence S the L_p-discrepancy L_p(S_N) is the L_p-discrepancy of the first N elements, S_N, of S. Another important irregularity measure is the star-discrepancy, defined as D*(P) := sup_{x ∈ [0,1]^d} | #(P ∩ [0, x[)/N − |[0, x[| |. The L_2-discrepancy is a particularly well studied and understood measure for the irregularities of point sets. We refer to the book [5] and the excellent survey [6] for further details. In particular, and in contrast to other measures such as the star-discrepancy, it is known how to construct deterministic point sets with the optimal order of magnitude of the L_2-discrepancy; see [1,6,7]. For d = 2 the optimal order of the L_2-discrepancy for finite point sets is known to be O(√(log N)/N), which already goes back to a result of Davenport [4]. The optimality of these constructions follows from a seminal result of Roth [25], who derived a general lower bound for the L_2-discrepancy of arbitrary sets of N points in [0,1]^d; see e.g. [5, Theorem 3.20]. While deterministic point sets with small discrepancy are widely used in the context of numerical integration, simulations of different real-world phenomena may require an element of randomness. The expected discrepancy of a set P_N of N i.i.d. uniform random points in [0,1]^d is of order O(1/√N) and as such independent of the dimension. For two-dimensional point sets of N i.i.d. uniform random points, we thus also have an expected discrepancy of order O(1/√N), similar to the two-dimensional regular grid (whose discrepancy is known to get worse as the dimension increases). Randomized quasi-Monte Carlo (RQMC) sampling is a popular method to randomize deterministic point sets; see [9] for an excellent introduction.
Clever constructions of deterministic point sets, so-called quasi-Monte Carlo (QMC) sampling, can significantly improve the asymptotic order of integration errors when compared to classical Monte Carlo sampling. RQMC basically takes a deterministic QMC point set as an input and uses a randomisation technique (e.g. a random shift modulo 1 or a so-called digital shift) to generate a new point set, which can be shown to have improved uniform distribution properties compared to Monte Carlo samples, while still enjoying the advantages of being 'random' in theoretical analysis; see [2,8,11,21,22,23]. The starting point for our work can also be considered a basic RQMC technique, which was already discussed in [11] in a slightly more general form. Classical jittered sampling for N = m^d combines the simplicity of grids with uniform random sampling by partitioning [0,1]^d into m^d axis-aligned congruent cubes and placing a random point inside each of them; see Fig. 1. Jittered sampling is sometimes referred to as 'stratified sampling' in the literature, but we will use the term 'stratified sampling' in a broader sense, as outlined below. Motivated by recent progress [19,20], the aim of this paper is to take a systematic look at jittered sampling and its extension based on more general partitions Ω = (Ω_1, . . . , Ω_N) of [0,1]^d. We consider stratified sampling, where [0,1]^d is partitioned into N subsets Ω_1, . . . , Ω_N and the ith point in P is chosen uniformly in the ith set of the partition (and stochastically independent of the other points), i = 1, . . . , N. If N = m^d and the partition consists of the above-mentioned axis-aligned congruent cubes, we obtain jittered sampling as a special case. Besides results for fixed N, we are also interested in the behavior of stratified samples derived from sequences of partitions when N becomes large.
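As an aside, the jittered construction is straightforward to implement. The following Python sketch (ours, for illustration only; the function name is hypothetical) draws one uniform point in each of the m^d congruent subcubes:

```python
import itertools
import random

def jittered_sample(m, d=2, rng=random):
    """Classical jittered sampling: partition [0,1]^d into m^d congruent
    axis-aligned cubes of side 1/m and draw one uniform point per cube."""
    points = []
    for cell in itertools.product(range(m), repeat=d):
        # the cell with integer coordinates (k_1,...,k_d) is
        # [k_1/m, (k_1+1)/m] x ... x [k_d/m, (k_d+1)/m]
        points.append(tuple((k + rng.random()) / m for k in cell))
    return points

pts = jittered_sample(3, d=2)  # N = 9 points, one per subsquare
```

By construction every subcube receives exactly one point, in contrast to a Monte Carlo sample of the same size, which may leave some subcubes empty.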
At this point, we would like to emphasize that sequences of partitions that can be used in stratified sampling are more general than those in Kakutani's splitting procedure and its variants [16,24,30]. Apart from the obvious difference that these procedures restrict considerations to d = 1, the partitions in the present paper need not be nested. This means that the partition in step N + 1 is not necessarily obtained as a refinement of the partition in step N .
It was shown in [19] that the asymptotic order of the expected star-discrepancy of a point set obtained from jittered sampling is O(N^{−1/2−1/(2d)}). Thus, taking partitions can significantly improve the expected discrepancy of (random) point sets in small dimensions d ≥ 2. We are interested in the following main questions: (1) In which sense are sequences of stratified sample points uniformly distributed as their number N increases? What is the connection to similar notions for partitions in the literature? (2) Does stratified sampling yield smaller or larger mean discrepancy than Monte Carlo sampling with N i.i.d. uniform random points? Are there discrepancy notions and assumptions assuring that stratified sampling is strictly better?
(3) Is there a 'best' partition for a given N in terms of a chosen mean discrepancy? (4) Is there a simple family of partitions {Ω (N ) } N ≥1 that gives reasonable results for all N and not just for square numbers of points as in the case of classical jittered sampling? (5) Can we improve classical jittered sampling with stratified sampling? Section 2 presents our answers to these questions in the order given. In Section 3 we prove our main theoretical results and illustrate them with examples. Section 4 introduces and explores an infinite family of partitions and contains more examples as well as numerical results.
A partition Ω = (Ω_1, . . . , Ω_N) of [0,1]^d consists of N Lebesgue-measurable sets of positive volume whose union is [0,1]^d, and the sets do not overlap in the L_1-sense, so Ω_i ∩ Ω_j is a Lebesgue-null set for all i ≠ j in {1, . . . , N}. It should be emphasized that we prefer this condition to the stronger one requiring pairwise disjoint sets, as we later want to work with closed sets. When the sets Ω_i and Ω_j are convex, |Ω_i ∩ Ω_j| = 0 is equivalent to saying that Ω_i and Ω_j do not have any interior points in common. In contrast to stratified sampling in classical sampling theory (see, e.g., [29]), we sample only one point in each of the strata. For the moment, we pose no other geometric conditions on the partition; in particular, the sets Ω_i need not be connected. Any such partition gives rise to an N-element stratified sample P_Ω of N random points derived from the partition by picking a uniform random point (random w.r.t. the normalized Lebesgue measure) from each Ω_i in a stochastically independent manner. As a ground model for comparison we will often consider the Monte Carlo sample P_N consisting of N i.i.d. (independent and identically distributed) uniform random points in the unit cube.
For a set P ⊂ [0,1]^d of finitely many points in the unit cube let Z_x(P) := #(P ∩ [0, x[)/N be the proportion of points falling in a test cube [0, x[ with x ∈ [0,1]^d. For a Monte Carlo sample, Z_x(P_N) is a random variable with mean |[0, x[|, but Z_x(P_Ω) need not be unbiased for |[0, x[| when Ω is a partition of the unit cube. We show in Proposition 2 below that Z_x(P_Ω) has mean |[0, x[| for all x ∈ [0,1]^d if and only if the partition is equivolume, that is, if |Ω_1| = · · · = |Ω_N|. This corresponds to the concept of self-weighting stratifications in classical sampling of finite populations: as the samples in all strata are equally large (just one point per stratum), the strata must be equal in size. The assumption of equivolume partitions is often convenient, as it allows us to interpret the mean of L_p^p(P_Ω) as an integrated centered pth mean; see Equations (8) and (10). Now let {Ω^(N)}_{N≥1} be a sequence of partitions and let P_{Ω^(N)} = {X_1^(N), . . . , X_N^(N)} be the stratified sample associated to the Nth partition. Note that we use capital letters whenever points are random. The fact that partitions for different N need not be related to one another implies that the set of all sampling points forms a triangular array, and we are thus led to define a uniform distribution property for those.
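To make the role of the equivolume condition concrete, here is a small Python sketch (our illustration; the helper name is hypothetical) that evaluates E Z_x(P_Ω) = (1/N) Σ_i q_i(x) for partitions of [0,1]² into vertical strips: for strips of equal width the mean equals the volume of the test box, while unequal widths introduce a bias.

```python
def mean_Z_strips(widths, x1, x2):
    """E Z_x(P_Omega) for a partition of [0,1]^2 into vertical strips of the
    given widths (summing to 1).  Stratum i contributes
    q_i(x) = |Omega_i ∩ [0,x[| / |Omega_i| to the sum, divided by N."""
    n = len(widths)
    left = 0.0
    total = 0.0
    for w in widths:
        overlap = max(0.0, min(x1, left + w) - left) * x2  # |Omega_i ∩ [0,x[|
        total += overlap / w                               # q_i(x)
        left += w
    return total / n

unbiased = mean_Z_strips([0.5, 0.5], 0.3, 0.8)  # equivolume: equals 0.3 * 0.8
biased = mean_Z_strips([0.3, 0.7], 0.3, 1.0)    # not equal to 0.3 * 1.0
```

In the second call the box [0, (0.3, 1)[ is covered entirely by the first (small) stratum, whose point falls inside it with probability 1, so the proportion is overestimated.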
A sequence (x_i) in the unit cube is uniformly distributed in the usual sense if and only if the triangular array (x_1, . . . , x_N)_{N∈N} is uniformly distributed in the sense of Definition 1. Hence, our definition generalizes the usual one. As in the classical case, uniform distribution of triangular arrays is equivalent to the weak convergence of the sequence of 'empirical distributions', where the Nth of those distributions sits on the points x_1^(N), . . . , x_N^(N), giving equal mass 1/N to each of them; in other words, (1/N) Σ_{i=1}^N f(x_i^(N)) → ∫_{[0,1]^d} f(x) dx as N → ∞ for all continuous functions f on the unit cube.
The well-known fact that a sequence of Monte Carlo samples (P N ) is almost surely uniformly distributed follows directly from the strong law of large numbers. A corresponding statement for the sequences in Definition 1 is based on the strong law of large numbers for triangular arrays. Note that it does not require that the sampling points for different partitions Ω (N ) are independent. This is why we do not introduce this assumption, although it is typically satisfied in the applications we have in mind.
In particular, if {Ω^(N)}_{N≥1} is a sequence of finite partitions of the unit cube such that all partitions are equivolume, then |Ω_i^(N)| = 1/N, so (4) is satisfied even without taking the limit. As a consequence, sequences of equivolume partitions are uniformly distributed; this is implication (c) in Fig. 2. We conclude this section with a comparison of Definition 1 with the notion of equidistributed partitions from [3], which we now recall using our notation. A sequence {Ω^(N)}_{N≥1} of finite partitions of the unit cube is called equidistributed if for any choice of x_i^(N) ∈ Ω_i^(N) the triangular array (x_1^(N), . . . , x_N^(N))_{N∈N} is uniformly distributed (Definition 1). Actually, in [3] this notion is introduced and exploited in larger generality, replacing the unit cube with a general separable metric space. Consider now the sequence of partitions of the unit square into N vertical strips depicted in Fig. 3. For all N these partitions are equivolume, and thus the stratified sample based on them is a.s. uniformly distributed. However, this sequence of partitions is not equidistributed, as one can choose x_i ∈ Ω_i^(N) such that its second coordinate is 1. Then, for the test box [x, y[ with x = (0, 0) and arbitrary y ∈ [0,1]², as in Fig. 3, the left hand side of (3) is always zero.

2.2. The strong partition principle for stratified sampling. Discrepancy measures can be used to compare sets of sampling points. In the case of a set P of random sampling points, the mean L_p-discrepancy E L_p^p(P) is often employed, where E denotes the probabilistic expectation. One should correctly call E L_p^p(P) the 'mean pth power L_p-discrepancy', but we prefer the shorter, slightly misleading form for brevity.
Certainly, a stratified sample need not be better than a Monte Carlo sample. Consider for instance a partition Ω with N sets in [0,1]² where the N − 1 partitioning sets Ω_1, . . . , Ω_{N−1} are all subsets of [δ, 1]² for some δ ∈ ]0, 1[. Then the mean L_2-discrepancy satisfies E L_2(P_Ω)² > E L_2(P_N)² for all δ > (5/9)^{1/4} ≈ 0.86 and N ≥ 2, where the last inequality uses (15). In contrast to this, stratified samples from equivolume partitions are never worse than Monte Carlo samples in terms of the mean L_2-discrepancy according to the Partition Principle in [19, Theorem 1.2]. We strengthen this result in two directions, showing firstly that stratified samples from equivolume partitions are strictly better, and secondly that the L_2-discrepancy can be replaced by the L_p-discrepancy with arbitrary p > 1. The main ingredient of our proof is a result by Hoeffding [14] stating that among all Poisson-binomial distributions with given mean, the classical binomial distribution is the most spread-out.

Theorem 1. Let Ω be an equivolume partition of [0,1]^d into N sets. Then

(5) E L_p(P_Ω)^p < E L_p(P_N)^p for all p > 1.
The proof of this theorem will be given in Section 3.2. One can understand (5) as a continuous analog and extension of the statement in finite population sampling theory that self-weighted stratified sampling is always better (in terms of variance) than simple random sampling, both taken with replacement.
This strict inequality holds for all N ≥ 2. Both sampling schemes have the same asymptotic order 1/N, but stratified sampling has a better leading constant; see Section 3.2 for details.
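For a concrete check in d = 2, the following sketch (ours, not part of the original analysis) estimates E L_2² for the vertical strip partition by quadrature, using the fact that for an equivolume partition the bias of Z_x vanishes, so E L_2² = ∫ Var(Z_x) dx; the Monte Carlo benchmark (2^{−d} − 3^{−d})/N = 5/(36N) is classical.

```python
def strip_expected_L2sq(N, grid=200):
    """E L_2^2 for a stratified sample from N equal vertical strips of
    [0,1]^2, via E L_2^2 = integral of Var(Z_x), evaluated with a
    midpoint rule on a grid x grid mesh."""
    h = 1.0 / grid
    total = 0.0
    for a in range(grid):
        x1 = (a + 0.5) * h
        for b in range(grid):
            x2 = (b + 0.5) * h
            var = 0.0
            for i in range(N):
                # q_i(x) = N * |strip_i ∩ [0,x]| for strip_i = [i/N,(i+1)/N] x [0,1]
                q = N * max(0.0, min(x1, (i + 1) / N) - i / N) * x2
                var += q * (1.0 - q)
            total += var / N**2
    return total * h * h

strips = strip_expected_L2sq(2)  # exact value for N = 2 is 1/18 ≈ 0.0556
mc = 5.0 / (36 * 2)              # Monte Carlo, N = 2, d = 2: 5/72 ≈ 0.0694
```

The stratified value is strictly below the Monte Carlo value, in line with Theorem 1.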

2.3. Partitions with best average discrepancy. For a given N ≥ 2, it is an open problem whether there exists a partition whose associated stratified sample has lowest mean discrepancy among all partitions consisting of N sets. We will show such existence results for certain classes of equivolume partitions. These results are all based on the standard topological argument that a continuous function attains its minimum on a compact set. To ensure that E L_p(P_Ω)^p is continuous as a function of the partitioning sets in a suitable metric, we will assume certain regularity conditions. More precisely, we assume that there is r > 0 such that the sets Ω_1, . . . , Ω_N of the partition have reach at least r, meaning that for any point x with distance less than r from Ω_i there is a unique closest point to x in Ω_i, i = 1, . . . , N; see Fig. 4. This class is very general and contains for instance all closed convex sets (which actually have infinite reach). Also, any compact set in R² whose boundary is a piecewise C²-curve such that its finitely many vertices are 'convex' has positive reach.
Theorem 2. Let r > 0, N ∈ N, and p ≥ 1 be given. Let P_N(r) denote the set of all equivolume partitions of [0,1]^d into N sets of reach at least r. Then there exists (at least) one partition Ω* ∈ P_N(r) that minimizes the mean L_p-discrepancy on P_N(r); i.e.

min_{Ω ∈ P_N(r)} E L_p(P_Ω)^p = E L_p(P_{Ω*})^p.
A corresponding statement holds true for the mean star-discrepancy.
Also smaller classes of partitions can be treated. We mention here the case of convex partitions, which might be relevant for applications, as all sets constituting a convex partition of [0,1]^d are actually polytopes, with their numbers of vertices uniformly bounded when N is given. Hence, convex partitions can be described using finitely many parameters and thus are, at least in principle, computationally tractable.

Theorem 3. Let N ∈ N and p ≥ 1 be given. Let P_N^conv denote the set of all equivolume partitions of [0,1]^d into N convex sets. Then there exists (at least) one partition Ω* ∈ P_N^conv that minimizes the mean L_p-discrepancy on P_N^conv; i.e.

min_{Ω ∈ P_N^conv} E L_p(P_Ω)^p = E L_p(P_{Ω*})^p.

A corresponding statement holds true for the mean star-discrepancy.
The proofs of Theorems 2 and 3 in Section 3.4 reveal that these existence statements also hold if L_p^p or D*_N is replaced by any discrepancy notion that is a measurable bounded function of the sample points. This indicates that the properties of the discrepancy are only of marginal importance, while the regularity assumptions on the partitioning sets are crucial for our method of proof. We do not claim nor expect that the strong regularity conditions in the above theorems are necessary for an existence statement to hold. For illustration we determine the optimal convex partition of [0,1]² for N = 2 in Subsection 3.5.
2.4. Explicit stratification strategies for arbitrary N. Next, we suggest and motivate a general and versatile construction of partitions for arbitrary N. We define partitions of the unit square generated by parallel lines which are orthogonal to the diagonal of the square. As a special case we consider the equivolume partitions Ω^(N)_* within this family. Importantly, this construction also enables us to systematically study the role of the equivolume property. In a first step, in Example 3 in Section 4.2, we improve the minimal convex equivolume partition for N = 2 obtained in Example 2 by shifting the separating line along the diagonal. In Section 4.3 we extend this analysis to the case N = 3. We parametrise all such partitions into three sets and determine the minimal partition among them. It turns out that these partitions into three sets have a rich and interesting global structure with respect to their expected discrepancy.
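For the reader's convenience, here is a short sketch (ours; the function names are hypothetical) computing the equivolume cut levels for this family: the region {x + y ≤ t} of the unit square has area t²/2 for t ≤ 1 and 1 − (2 − t)²/2 for t ≥ 1, and inverting this area function yields cuts enclosing area i/N each.

```python
import math

def diag_area(t):
    """Area of {(x, y) in [0,1]^2 : x + y <= t} for 0 <= t <= 2."""
    if t <= 1.0:
        return 0.5 * t * t
    return 1.0 - 0.5 * (2.0 - t) ** 2

def equivolume_thresholds(N):
    """Levels t_1 < ... < t_{N-1} of x + y such that the N regions cut out
    by the lines x + y = t_i (orthogonal to the diagonal) have area 1/N."""
    ts = []
    for i in range(1, N):
        a = i / N
        if a <= 0.5:
            ts.append(math.sqrt(2.0 * a))                # invert t^2/2 = a
        else:
            ts.append(2.0 - math.sqrt(2.0 * (1.0 - a)))  # invert 1-(2-t)^2/2 = a
    return ts

ts3 = equivolume_thresholds(3)  # two cuts, three regions of area 1/3 each
```

For N = 2 the single cut is the anti-diagonal x + y = 1, and the construction works for every N, not just for square numbers of points.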
Finally, we have examples of partitions within this family and for small N that show that it is possible to improve classical jittered sampling by relaxing the equivolume constraint.

Conclusions and open questions.
In conclusion, our results suggest that, if partitions are needed to generate stratified samples for arbitrary N, one should use partitions generated by lines orthogonal to the diagonal of the unit square. Within this family, the equivolume partitions Ω^(N)_* seem to be a reasonably good pick; see Section 4.5 for details. Secondly, our examples for N = 2, 3, 4 show that the expected discrepancy can be improved if we drop the equivolume property. This is in line with the results from [20] and deserves further attention. It certainly relates to the well-known general observation that the L_2-discrepancy exaggerates the importance of points lying close to the origin (see [18, p. 13f]). Question 1. Are there properties of sequences of partitions, other than equivolume, that asymptotically improve the expected discrepancy of Monte Carlo sampling?
Our example for N = 4 supports the idea brought forward in [19] that classical jittered sampling may not give the lowest expected discrepancy for large N . Question 2. Is there an infinite family of partitions that generates point sets with a smaller expected discrepancy than classical jittered sampling for large N ?

Proofs and examples
3.1. Proofs for Section 2.1. We now give proofs of the results in Subsection 2.1 using the notations and notions introduced there. On several occasions we will need the following lemma, which essentially is a reformulation of the fact that a distribution (in the probabilistic sense) is uniquely determined by its cumulative distribution function, also in the multivariate case; see e.g. [17,Example 1.44]. Indeed, the proof of the following lemma is based on this fact, if the function involved is split into positive and negative part and the total integrals are normalized.
We are now in a position to show the announced characterization of equivolume partitions in terms of the unbiasedness of the proportions Z_x(P) in (2).
Proposition 2. For a partition Ω of [0,1]^d into N Lebesgue-measurable sets Ω_1, . . . , Ω_N of positive volume the following three statements are equivalent: (i) the partition is equivolume, i.e. |Ω_1| = · · · = |Ω_N| = 1/N; (ii) E Z_x(P_Ω) = |[0, x[| for all x ∈ [0,1]^d; (iii) E Z_x(P_Ω) = |[0, x[| for almost all x ∈ [0,1]^d.
Proof. The bias E Z_x(P_Ω) − |[0, x[| is given by (6). If (i) holds, the vector u = (u_1, . . . , u_N) is the zero vector, and the bias (6) vanishes for all x ∈ [0,1]^d. Clearly (ii) implies (iii), so it remains to assume (iii) and deduce (i). Assumption (iii) implies that (6) vanishes for almost all x, so Lemma 1 implies f = 0 almost everywhere on [0,1]^d, and as Ω is a partition, u = 0. Hence the partition is equivolume.
Proof of Proposition 1. For x, y ∈ [0,1]^d with each component of y at least as large as the corresponding component of x, consider the random variables Y_i^(N) = 1_{[x,y[}(X_i^(N)), i = 1, . . . , N, where P_{Ω^(N)} = {X_1^(N), . . . , X_N^(N)} is the stratified sample based on the partition Ω^(N). The Y's are row-wise independent random variables with uniformly bounded variances, and their averaged means are a_N(x, y) = (1/N) Σ_{i=1}^N E Y_i^(N). Therefore the strong law of large numbers for triangular arrays (see [15, Theorem 2.2] with p = 1, ψ(t) = t² and a_n = n) implies (7). If assumption (4) holds, we have lim_{N→∞} a_N(x, y) = |[x, y[|, so (7) implies (3) for almost every realization. Hence, the stratified sample points form almost surely a uniformly distributed triangular array.
If (4) is violated, there must be x, y ∈ [0,1]^d such that the limit in (4) does not exist or differs from |[x, y[|; in this case (7) shows that (3) fails almost surely.

Proof of Theorem 1. Let p > 1 be given. Using the variable Z_x = Z_x(P) from (2) and applying Tonelli's theorem we see that E L_p(P)^p = ∫_{[0,1]^d} E |Z_x − |[0, x]||^p dx, so the assumption of equivolume partitions and Proposition 2 yield E L_p(P_Ω)^p = N^{−p} ∫_{[0,1]^d} E |N Z_x − E(N Z_x)|^p dx. Here, N Z_x = Σ_{i=1}^N 1_{[0,x]}(X_i) is a sum of independent Bernoulli variables with success probabilities q_1(x), . . . , q_N(x). The distribution of N Z_x is usually called a Poisson-binomial distribution with N trials and parameters q_1(x), . . . , q_N(x). Repeating these computations for the Monte Carlo sample P_N of N i.i.d. uniform points X_1, . . . , X_N in [0,1]^d and using similar arguments as above yields correspondingly E L_p(P_N)^p = N^{−p} ∫_{[0,1]^d} E |N U_x − E(N U_x)|^p dx with U_x := Z_x(P_N). The variable N U_x has a binomial distribution with N trials and success probability |[0, x]|. Its mean N |[0, x]| therefore coincides with the mean of N Z_x.
We now use the fact that among all Poisson-binomial distributions with given mean, the binomial is the largest one in convex order. This is formalized in [14, Theorem 3] (see also the paragraph directly after the statement of this theorem) and implies (11), with equality if and only if Z_x has a classical binomial distribution, that is, if and only if q_1(x) = · · · = q_N(x) = |[0, x]|. Integrating (11) with respect to x now yields (5) if we can exclude the equality case. Equality in (5) would require equality in (11) for almost all x, which is impossible for a partition into N ≥ 2 sets, so (5) holds with strict inequality. It is worth emphasizing the special case p = 2 of (8), which has been used more or less explicitly and generally in the existing literature, as it implies that the mean L_2-discrepancy can be described as a sum of contributions from the individual sample points. Also for p = 4 an explicit integral representation can be stated.
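Hoeffding's comparison can be checked directly for small N by enumerating the Poisson-binomial law. The sketch below (ours, for illustration) verifies that for a fixed mean the centered second moment, a convex functional, is largest in the equal-probability (binomial) case.

```python
from itertools import product

def abs_central_moment(probs, p):
    """E|S - ES|^p for S a sum of independent Bernoulli(probs[i]) variables
    (a Poisson-binomial law), computed by enumerating all 2^n outcomes."""
    mu = sum(probs)
    moment = 0.0
    for bits in product((0, 1), repeat=len(probs)):
        weight = 1.0
        for b, q in zip(bits, probs):
            weight *= q if b else 1.0 - q
        moment += weight * abs(sum(bits) - mu) ** p
    return moment

# two Poisson-binomial laws with the same mean 1.0 over N = 2 trials
unequal = abs_central_moment([0.2, 0.8], 2)  # variance 0.16 + 0.16 = 0.32
binom = abs_central_moment([0.5, 0.5], 2)    # variance 0.25 + 0.25 = 0.50
```

Equal success probabilities produce the most spread-out law, which is exactly the mechanism behind the strict inequality in (5).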

Proposition 3.
Let Ω be an equivolume partition and let q_i(x), i = 1, . . . , N, be defined by (9). Then

E L_2(P_Ω)² = (1/N²) ∫_{[0,1]^d} Σ_{i=1}^N q_i(x)(1 − q_i(x)) dx,

and an analogous explicit integral representation holds for E L_4(P_Ω)^4.

Proof. According to (8) with p = 2, we have

(12) E L_2(P_Ω)² = (1/N²) ∫_{[0,1]^d} Var(N Z_x(P)) dx,

where Z_x(P) is given in (2). We have already seen that N Z_x(P) is the sum of the independent Bernoulli variables Y_i = 1_{[0,x[}(X_i) with success probabilities q_1(x), . . . , q_N(x), respectively, so Var(N Z_x(P)) = Σ_{i=1}^N q_i(x)(1 − q_i(x)). This can be inserted into (12) to obtain the first claim. For the second claim, let W_i = Y_i − E Y_i, i = 1, . . . , N, and note that, by independence, E(W_1 + · · · + W_N)^4 expands into the moments E W_i² and E W_i^4 as in (13). Evaluating these moments for the Bernoulli variables, inserting the result into (13) and applying (8) with p = 4 yields the second assertion.
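The p = 2 representation E L_2(P_Ω)² = (1/N²) ∫ Σ_i q_i(x)(1 − q_i(x)) dx lends itself to numerical evaluation. The following sketch (ours, not from the paper) applies it to the 2 × 2 jittered partition in d = 2 and compares the result with the classical Monte Carlo value 5/(36N).

```python
def jittered_expected_L2sq(m, grid=200):
    """E L_2^2 for jittered sampling on the m x m grid partition of [0,1]^2,
    via (1/N^2) * integral of sum_i q_i(x)(1 - q_i(x)) dx with
    q_i(x) = N * |Omega_i ∩ [0,x]| (midpoint-rule quadrature)."""
    N = m * m
    h = 1.0 / grid
    total = 0.0
    for a in range(grid):
        x1 = (a + 0.5) * h
        for b in range(grid):
            x2 = (b + 0.5) * h
            s = 0.0
            for i in range(m):
                lx = max(0.0, min(x1, (i + 1) / m) - i / m)
                for j in range(m):
                    ly = max(0.0, min(x2, (j + 1) / m) - j / m)
                    q = N * lx * ly
                    s += q * (1.0 - q)
            total += s
    return total * h * h / N**2

jit = jittered_expected_L2sq(2)  # N = 4 jittered points
mc4 = 5.0 / (36 * 4)             # N = 4 Monte Carlo points
```

The jittered value comes out well below the Monte Carlo value, illustrating the strong partition principle for a concrete equivolume partition.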
The latter integral, ∫_{[0,1]^d} (∏_{i=1}^d x_i − ∏_{i=1}^d x_i²) dx, can be evaluated explicitly. One obtains (14), E L_2(P_N)² = (2^{−d} − 3^{−d})/N; in particular, for d = 2, we get E L_2(P_N)² = 5/(36N). We now compare this with the mean L_2-discrepancy of a stratified sample P_vert based on the vertical strip partition mentioned in Section 2.1. We generalize the partition depicted in Fig. 3 to arbitrary d ≥ 2 by putting Ω_i = [(i−1)/N, i/N] × [0,1]^{d−1}, i = 1, . . . , N. Due to independence, the variance of the relative number of points Z_x(P_vert) given by (2) can be computed explicitly, and (8) then reduces E L_2(P_vert)² to a one-dimensional integral, which evaluates in closed form. Putting things together, we arrive at E L_2(P_vert)² < E L_2(P_N)², where (14) and N ≥ 2 were used. This confirms the general result that equivolume stratification is always strictly better than Monte Carlo sampling. It also shows that this stratification scheme has the same asymptotic order (namely 1/N) as Monte Carlo sampling, but a better leading constant: for instance, when d = 2 we get E L_2(P_vert)² = (3N + 2)/(36N²), compared with E L_2(P_N)² = 5/(36N).

3.4. Proofs for Section 2.3. Fix A ⊂ R^d. We let int A and bd A denote the interior and the boundary of A, respectively. For ε > 0 the ε-parallel set A_ε consists of all points with distance at most ε from A. We recall that a set A ⊂ R^d is said to have reach r > 0 if for any 0 < ε < r and x ∈ A_ε there is a point y ∈ A such that ‖x − y‖ < ‖x − z‖ for all z ∈ A \ {y}.
The family of non-empty compact sets will be endowed with the Hausdorff metric d_H given by

d_H(K, K′) = max( max_{x∈K} min_{y∈K′} ‖x − y‖, max_{y∈K′} min_{x∈K} ‖x − y‖ ),

where ∅ ≠ K, K′ ⊂ R^d are compact. Let C be the family of nonempty compact sets in [0,1]^d, and for r > 0 let R_r be the subfamily of sets with reach at least r. The latter contains K, the family of non-empty compact convex subsets of [0,1]^d. Crucial for our line of argument is the fact that all three families are compact in the Hausdorff metric. This statement for K is the famous Blaschke selection theorem [26, Theorem 1.8.7]; a proof for C and R_r can be found in [26, Theorem 1.8.5] and [10, Remark 4.14], respectively. As a reference for the convex geometric notions used in this section, we recommend [26]. Importantly, the volume functional is not continuous on C (for instance, [0,1]^d can be approximated by finite sets in the Hausdorff metric), but it is continuous on both R_r and K. This can be seen by means of a Steiner-type result stating for K ∈ R_r that

(17) |K_ε| = Σ_{k=0}^{d} ε^{d−k} κ_{d−k} V_k(K)

holds for 0 ≤ ε < r; see [10]. Here, κ_{d−k} is the volume of the Euclidean unit ball in R^{d−k}, and V_k(K) ∈ R is the kth total curvature measure (also called intrinsic volume when applied to convex sets). We use repeatedly that K ↦ V_k(K) is continuous on R_r. These and more results on sets of positive reach can be found in [10]; see also the survey [28], where an outline of the history, newer results and additional references on the matter can be found.

Proposition 4. For all r > 0 and N ∈ N the families P_N(r) and P_N^conv are compact in the Hausdorff metric.

Proof. Clearly, the family of equivolume partitions into sets from R_r is a subset of the Cartesian product R_r^N; more precisely,

(18) P_N(r) = { (K_1, . . . , K_N) ∈ R_r^N : K_1 ∪ · · · ∪ K_N = [0,1]^d and |K_i| = 1/N for i = 1, . . . , N }.

In fact, assume that (K_1, . . . , K_N) is an element of the right hand side of (18). If there was a set K_j overlapping with ∪_{i≠j} K_i in a set of positive volume, we would have |K_1 ∪ · · · ∪ K_N| < Σ_{i=1}^N |K_i| = 1, contradicting the first condition; hence the sets form a partition. From the definitions it is clear that (K, M) ↦ K ∪ M is Lipschitz continuous if the product space is endowed with the maximum norm of the marginal metrics. Hence, the right hand set in (18) is closed in R_r^N, as it is defined by means of continuous functions.
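For finite point sets the Hausdorff metric can be computed directly. The sketch below (ours, for illustration) implements the definition and also echoes the remark on the volume functional: a finite set can be Hausdorff-close to the full square while having volume zero.

```python
import math

def hausdorff(K, L):
    """Hausdorff distance between two non-empty finite subsets of R^d:
    the larger of the two one-sided worst-case point-to-set distances."""
    def dist(p, S):
        return min(math.dist(p, s) for s in S)
    return max(max(dist(p, L) for p in K),
               max(dist(q, K) for q in L))

corners = [(0, 0), (0, 1), (1, 0), (1, 1)]
centre = [(0.5, 0.5)]
# every corner is at distance sqrt(1/2) from the centre and vice versa
dH = hausdorff(corners, centre)
```

Refining a grid of sample points makes the Hausdorff distance to the square arbitrarily small even though the volume of the finite set stays zero, which is why the volume functional fails to be continuous on C.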
As R_r^N is compact, P_N(r) is compact too. Finally, K^N is compact, implying the compactness of P_N^conv; this completes the proof.

We now show that an average discrepancy of a stratified sample is continuous as a function of the partitioning sets. In the following, ∆ stands for a measure of discrepancy; it can be the L_p-discrepancy or any other notion satisfying the rather weak assumptions below.
Proposition 6. Let ∆ be a measurable and bounded function of the N sample points. Then the map Ω ↦ E ∆(P_Ω) is continuous on P_N(r).

By Steiner's formula (17), the volume of the ε-parallel set of a set K ∈ R_r exceeds |K| by at most a constant multiple of ε. As V_k is continuous and R_r is compact, |M_i| ≤ cε for all i = 1, . . . , N, with a constant c that does not depend on (K_1, . . . , K_N, K′_1, . . . , K′_N). Thus, by (19) and (20), the two expectations differ by at most a constant multiple of ε. This implies the claimed continuity.
As P N (r) and its subset P conv N are both compact by Proposition 4, EL p (P Ω ) p attains minima on either set. This shows the assertion for the mean L p -discrepancy.
Similarly, as the star-discrepancy ∆(x_1, . . . , x_N) = sup_x |Z_x({x_1, . . . , x_N}) − |[0, x[|| is measurable and bounded, E D*(P_Ω) is continuous on P_N(r) by Proposition 6, so using the compactness of P_N(r) and P_N^conv gives the asserted existence of a best partition.
It should be remarked that the above arguments do not rely on the choice of the unit cube as reference set. The results and proofs continue to hold with minor modifications if the set [0,1]^d is replaced by any fixed compact convex set K ⊂ R^d.

We next observe that for any convex equivolume partition of [0,1]² into two sets, the separating line must pass through the midpoint of the square.

Proof. Let Ω_1 and Ω_2 be two convex sets that partition [0,1]² and have the same content. By convexity, the intersection of Ω_1 and Ω_2 is contained in a line ℓ. The midpoint p = (1/2, 1/2) of [0,1]² must be contained in one of these sets, so let us assume p ∈ Ω_1. As the reflection of ℓ at p is parallel to ℓ, the reflection Ω′_1 of Ω_1 at p must contain Ω_2. By the equivolume property we have |Ω_2| = |Ω′_1| = |Ω_1|, so Ω_2 = Ω′_1 up to a Lebesgue-null set. As Ω_2 and Ω′_1 are closed and convex, we must even have Ω_2 = Ω′_1, so Ω_2 is the reflection of Ω_1 at p, and we conclude p ∈ ℓ.

Example 2: Convex equivolume partitions into two sets. Recall that P_2^conv denotes the set of all equivolume partitions of [0,1]² into two convex sets.
As Ω^(2)_* and E L_2(P_Ω)² are unaltered if all sets in the partition are reflected at the main diagonal of [0,1]², we may assume from now on that the separating line ℓ hits the x-axis in a point (A, 0) with A ∈ [0, 1]; see Fig. 5 (left). Note that Ω^(2)_* corresponds to A = 1. For fixed A, we assume from now on that Ω_1 is the partitioning set that contains the left, vertical edge of the unit square.
To calculate the expected L_2-discrepancy, we use formula (21) from [20], in which f(x) = |Ω_1 ∩ [0, x]|, and where we used for the second equality that the integrand vanishes whenever x ∈ Ω_1.
We distinguish the three cases A ∈ [0, 1/2), A ∈ (1/2, 1] and A = 1/2. The special case A = 1/2 gives the vertical strip partition for N = 2 with expected discrepancy 1/18 = 0.0555 . . . according to (16). Now assume that A ∈ (1/2, 1]. In this case, the separating line has the equation y = (x − A)/(1 − 2A). We partition Ω_2 into two sets Ω_2^+ and Ω_2^− on which the integrand admits different closed forms. In total, we obtain for A ∈ (1/2, 1], by inserting the contributions of both cases into (21), a closed-form expression for the expected discrepancy. As a side remark, we also mention the partition generated by a set of equidistant points, i.e. v_i = √2 i/N for a given N > 1. This is a simple example of a family of partitions that is not equivolume for any N. However, by pairing complementary sets such that each union has volume 2/N, it is possible to obtain, for every even N, an equivolume partition into N/2 non-connected sets.

Example 3:
Relaxing the volume constraint I. We have seen in Example 2 that Ω^(2)_* gives the lowest expected discrepancy among all convex equivolume partitions into two sets.
We will now show that this partition can be improved if the equivolume condition is dropped by shifting the line along the diagonal, that is, by considering the one-parameter family of partitions depicted in Fig. 7.
A comparison with Lemma 2 shows that Ω^v_* has a smaller mean L_2-discrepancy than any convex equivolume partition when N = 2. The two partitioning sets have volumes 1 − (1 − A)²/2 and (1 − A)²/2, and points on the separating line satisfy y = −x + 1 + A. In total, we obtain for A ∈ [0, 1], in a similar fashion as in the proof of Lemma 2, a closed-form expression for the expected discrepancy. This function attains its minimum for A = 0.122034 . . . with value 0.04904 . . .; see Fig. 6.
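These values can be cross-checked numerically. The sketch below (ours; grid quadrature, so values are approximate) evaluates E L_2² for the two-set partition cut by the line x + y = 1 + A, using E L_2² = ∫ Var(Z_x) + (E Z_x − x_1x_2)² dx, which also covers the non-equivolume case.

```python
def diag_pair_L2sq(A, grid=160):
    """E L_2^2 for a stratified pair from the partition of [0,1]^2 by the
    line x + y = 1 + A (0 <= A < 1), with Omega_2 = {x + y > 1 + A}."""
    t = 1.0 + A
    def upper_area(x1, x2):
        # area of {(u, v) in [0,x1] x [0,x2] : u + v > t}
        s = x1 + x2 - t
        if s <= 0.0:
            return 0.0
        area = 0.5 * s * s
        if s > x1:
            area -= 0.5 * (s - x1) ** 2
        if s > x2:
            area -= 0.5 * (s - x2) ** 2
        return area
    vol2 = upper_area(1.0, 1.0)  # = (1 - A)^2 / 2
    vol1 = 1.0 - vol2
    h = 1.0 / grid
    total = 0.0
    for a in range(grid):
        x1 = (a + 0.5) * h
        for b in range(grid):
            x2 = (b + 0.5) * h
            a2 = upper_area(x1, x2)
            q1 = (x1 * x2 - a2) / vol1
            q2 = a2 / vol2
            mean = 0.5 * (q1 + q2)
            var = 0.25 * (q1 * (1.0 - q1) + q2 * (1.0 - q2))
            total += var + (mean - x1 * x2) ** 2
    return total * h * h

near_optimal = diag_pair_L2sq(0.122)  # text: minimum ≈ 0.04904 near A ≈ 0.122
anti_diag = diag_pair_L2sq(0.0)       # the anti-diagonal cut, value 0.05
```

The shifted cut beats the equivolume anti-diagonal cut, in agreement with the minimisation above.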
The two partitioning sets have volumes B²/2 and 1 − B²/2, and points on the separating line satisfy y = −x + B. Similarly to the previous considerations, we split the integral into subintegrals and distinguish the different cases in which the intersections can be described by the same function. We obtain a closed-form expression for B ∈ [0, 1], which attains its minimum for B = 1 with value 0.05; i.e. the minimum is attained for the anti-diagonal; see Fig. 6. This concludes the proof.

Example 4:
Relaxing the volume constraint II. In this section we extend the results of Examples 2 and 3 to the case N = 3. The partitions can still be analysed explicitly, and it turns out that there is a unique partition that minimises the expected discrepancy; see Fig. 11. However, the full analysis consists of a tedious case-by-case study, and hence we do not fully outline the proof of this assertion in the following, but report the most interesting cases only. We leave the analysis of the remaining cases (which we carried out and which follow along the same lines as our proof) to the interested reader. In general, let 0 < v_1 < v_2 < √2 and denote the three sets of the partition by Ω_1, Ω_2 and Ω_3, as described in Subsection 4.1. Associated to the three sets there are three indicator functions χ_1, χ_2 and χ_3 with χ_j(x, y) = 1{X_j ∈ [0, x] × [0, y]}, where X_j is the random point in Ω_j. Setting #(x, y) := χ_1(x, y) + χ_2(x, y) + χ_3(x, y), we get for the expected L_2-discrepancy

E L_2(P_Ω)² = ∫_0^1 ∫_0^1 (1/9) E(#(x, y)²) − (2/3) xy E(#(x, y)) + x²y² dy dx,

and since #(x, y) is a Poisson-binomial distributed random variable we have that E(#(x, y)) = q_1(x, y) + q_2(x, y) + q_3(x, y), in which, as in (9), q_j(x, y) = |Ω_j ∩ ([0, x] × [0, y])| / |Ω_j|. The analysis proceeds now via two levels of case distinctions. The first level concerns the actual partitions into three sets that we are considering; see Fig. 8. Within each of the four cases, we partition the unit square, in dependence of v_1 and v_2, into sets on which the probabilities q_j have the same closed form in x, y, thus providing closed formulas for E(#(x, y)) and E(#(x, y)²); see Fig. 9. Next, we compute the expected discrepancy, which is an integral over the unit square, as a sum of integrals of the different closed-form expressions according to the subcase partition. In the following, we first focus on the third case in Fig. 8. In this case we have 0 ≤ 2A − 1 ≤ B ≤ A ≤ 1. It turns out that we can calculate the expected discrepancy as a rational function of A and B, which has a unique minimum.
For simplicity we restrict considerations to 1/2 ≤ A ≤ 1 in Lemma 4.

Proof. We set

To calculate the expected discrepancy we subdivide the unit square into six sets, S_1, . . . , S_6, as illustrated in Fig. 9, and we use the symmetry along the diagonal in Cases II–VI. For a point (x, y) ∈ S_i we denote the expected discrepancy function by f_i(x, y), and we calculate the expected discrepancy for the whole partition as the sum of the integrals of the f_i over the sets S_i. For illustration, we give the calculations for the first case and refer to the appendix for the other (equally elementary, but much more technical) cases.

Case I. Let (x, y) ∈ Ω_1 = S_1. Then q_2(x, y) = q_3(x, y) = 0 and q_1(x, y) = 2xy/A^2. Since #(x, y) is in this case a Bernoulli random variable with success probability q_1(x, y), we have E(#(x, y)^2) = E(#(x, y)) = q_1(x, y), and hence we get

f_1(x, y) = q_1(x, y)/9 − (2/3) xy q_1(x, y) + x^2 y^2 = 2xy/(9A^2) − 4x^2y^2/(3A^2) + x^2 y^2.

Cases II–VI. With similar calculations as in Case I we obtain expressions f_2, . . . , f_6 for the expected value of the discrepancy function for points (x, y) in each of the sets, in dependence on the parameters A and B; see the appendix for details. Integrating these functions over their respective domains and summing the values gives the following rational function in A and B, which we can minimise over 1/2 ≤ A ≤ 1 and 2A − 1 ≤ B ≤ A in order to obtain the two parameters A and B that generate the partition with the smallest expected discrepancy in this family.
This function can be minimised using a standard computer algebra system. In a similar fashion we can analyse the second case in Fig. 8.
Lemma 5. Let Ω^(3)_{v_1, v_2} be the corresponding partition of the unit square into three sets. Then the expected discrepancy within this family is minimised for v_1^* = 0.5329 . . . and v_2^* = 1.06582 . . .

Proof. The proof follows the same lines as that of Lemma 4; the only differences are the subdivision of the unit square and the range of B. We set again

To calculate the expected discrepancy we subdivide the unit square into six sets, S_1, . . . , S_6, as illustrated in Fig. 9, and we use again the symmetry along the diagonal in Cases II–VI. For a point (x, y) ∈ S_i we denote the expected discrepancy function by f_i(x, y), and we calculate the expected discrepancy for the whole partition as the sum of the integrals of the f_i over the sets S_i. As before, explicit expressions for f_i on S_i, i = 1, . . . , 6, can be obtained; they depend on the parameters A and B. Integrating these functions over their respective domains and summing the values gives again a rational function in A and B, which we can minimise over 1/2 ≤ A ≤ 1 and 0 ≤ B ≤ 2A − 1 in order to obtain the two parameters A and B that generate the partition with the smallest expected discrepancy in this family. Using again a computer algebra system we obtain the minimum of this function.

The two lemmas provide two interesting insights. Firstly, we can now easily analyse the equivolume partition within this family.
Proof. In the equivolume case we have A = √(2/3), and the minimal expected discrepancy within this family is thus only slightly smaller than the equivolume value.
Interestingly, the minimum for a given A can be attained in either of the two cases analysed in Lemmas 4 and 5, as can be seen from the examples. As a rule of thumb, the global minimum within this family is obtained for parameters A and B for which the minimum for fixed A lies almost at the interval boundary, i.e. for which B_min ≈ 2A − 1.
This observation relates to Question 1 of Section 2.5. It illustrates that the equivolume property appears to have no particular significance within this simple family of partitions; rather, other geometric features seem to drive the minimisation.

The algorithm is based on a simple geometric consideration. Assume 0 ≤ y ≤ x ≤ 1. As we have seen, we need to determine the probabilities q_j(x, y) that the random point X_j falls into the box [0, x] × [0, y]. This elementary algorithm leaves us with two conclusions. On the one hand, it is rather straightforward to calculate the expected value of the discrepancy function for a given box [0, x] × [0, y]. On the other hand, it is incredibly tedious to do so. While this calculation can be carried out algorithmically in a straightforward fashion, there is little hope of computing the expectation analytically, since we have a different set of success probabilities for each box generated by a vector (x, y).
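To make the algorithmic route concrete, the following Python sketch (our own illustration, not the authors' code) integrates E[(#(x, y)/N − xy)^2] = Var(#)/N^2 + (E(#)/N − xy)^2 over a midpoint grid, for the jittered m × m partition where the probabilities q_j have an obvious closed form. For the 2 × 2 case it reproduces the value 0.01909 . . . of jittered sampling with four points.

```python
import numpy as np

def expected_L2sq_grid_partition(m_cells=2, m_grid=800):
    """Estimate E[L2^2] of stratified sampling from the jittered
    m x m partition by integrating the Poisson-binomial mean-square
    error of #(x, y) over a midpoint grid."""
    N = m_cells**2
    t = (np.arange(m_grid) + 0.5) / m_grid
    x, y = np.meshgrid(t, t, indexing="ij")
    total_mean = np.zeros_like(x)
    total_var = np.zeros_like(x)
    h = 1.0 / m_cells
    for a in range(m_cells):
        gx = np.clip(x - a * h, 0.0, h)          # overlap length in x
        for b in range(m_cells):
            gy = np.clip(y - b * h, 0.0, h)      # overlap length in y
            q = gx * gy / h**2                    # P(cell's point lands in box)
            total_mean += q
            total_var += q * (1 - q)
    integrand = total_var / N**2 + (total_mean / N - x * y) ** 2
    return integrand.mean()                       # midpoint-rule integral

print(expected_L2sq_grid_partition())  # close to 11/576 = 0.01909...
```

For a partition without closed-form overlap volumes, the q_j in the inner loop would have to be computed geometrically for each grid point, which is exactly the tedious part described above.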

4.5. Numerical results. In this final section, we present the results of two different sets of experiments. In all our experiments we generated many instances of stratified point sets for a given fixed partition, calculated the L_2-discrepancy of each point set and approximated the expected discrepancy of the partition by the mean over the experiment.
We used Warnock's formula [31], as presented in [5, Proposition 2.15], to calculate the L_2-discrepancy of a given point set. That is, for any point set P = {x_0, . . . , x_{N−1}} ⊂ [0, 1]^d we have

L_2(P)^2 = 1/3^d − (2/N) Σ_{n=0}^{N−1} Π_{i=1}^{d} (1 − x_{n,i}^2)/2 + (1/N^2) Σ_{m=0}^{N−1} Σ_{n=0}^{N−1} Π_{i=1}^{d} (1 − max(x_{m,i}, x_{n,i})),

in which x_{n,i} is the i-th component of x_n. We refer to [12, 13] for fast implementations of this formula.

First, we present a numerical observation which could, in principle, be proven along the same lines as Lemma 4. However, given the number of case distinctions such an analysis, based on our elementary method, would require, we only provide numerical evidence and state the result as a conjecture.
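A direct implementation of Warnock's formula is short; the following Python version (function name ours) runs in O(d N^2) time and is checked against the hand-computable one-dimensional example of a single point at 1/2, for which L_2^2 = 1/12.

```python
import numpy as np

def warnock_L2sq(P):
    """Squared L2 star-discrepancy of a point set via Warnock's formula."""
    P = np.asarray(P, dtype=float)
    N, d = P.shape
    term1 = 3.0**-d
    term2 = 2.0 / N * np.prod((1.0 - P**2) / 2.0, axis=1).sum()
    # pairwise products of (1 - max(x_{m,i}, x_{n,i})) over the coordinates
    M = np.maximum(P[:, None, :], P[None, :, :])
    term3 = np.prod(1.0 - M, axis=2).sum() / N**2
    return term1 - term2 + term3

# Sanity check in d = 1: the single point 1/2 has L2^2 = 1/12
print(warnock_L2sq([[0.5]]))
```

As a further check, for N i.i.d. uniform points in [0, 1]^2 the expectation E L_2^2 = (1/4 − 1/9)/N = 5/(36N) is matched by the empirical mean of this function over many random samples.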

Conjecture 1. There exists a vector v of values 0 < v_1 < v_2 < v_3 ≤ √2 such that E L_2^2(P_{Ω^(4)_v}) < E L_2^2(P_{jit4}) = 0.01909 . . .

We obtained various instances of partitions that seem to improve on classical jittered sampling by perturbing the three values of the vector (√2/4)(1, 2, 3). We obtained the best numerical results for a perturbation of roughly −0.02; see Fig. 11. We simulated 10^6 instances of stratified sets for this particular partition and calculated the discrepancy in each case with Warnock's formula. Independently, we used our algorithm to estimate the expected discrepancy using 10^4 grid points. Both methods indicate that the first digits after the decimal point of the expected discrepancy are 0.0188 . . . , which would be clearly better than the mean discrepancy of jittered sampling.

Next, we use Warnock's formula to empirically study the discrepancy of different point sets and constructions; i.e. for given N we generate 500 samples and calculate the L_2-discrepancy of each of these samples using Warnock's formula. The empirical mean of this sample approximates the expected value of the discrepancy. We collect our numerical results in Table 1. They suggest that the expected discrepancy of the partitions Ω^(N)_* is about a factor of 2 smaller than the expected discrepancy of a set of random points.

Table 1. Expected L_2-discrepancy of different point sets, in which N stands for the number of points. The empirical values are calculated as the mean of the discrepancy of 500 samples. We calculated the discrepancy of individual samples with Warnock's formula.
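The experimental pipeline described above can be sketched as follows (a simplified illustration in the 2 × 2 jittered case, function names ours): generate stratified samples, evaluate Warnock's formula for each, and average.

```python
import numpy as np

def warnock_L2sq(P):
    # Squared L2 star-discrepancy (Warnock's formula), P an (N, d) array
    N, d = P.shape
    M = np.maximum(P[:, None, :], P[None, :, :])
    return (3.0**-d
            - 2.0 / N * np.prod((1.0 - P**2) / 2.0, axis=1).sum()
            + np.prod(1.0 - M, axis=2).sum() / N**2)

rng = np.random.default_rng(42)

def jittered_2x2():
    # one uniform point in each cell of the 2 x 2 grid partition
    offsets = np.array([[a, b] for a in (0.0, 0.5) for b in (0.0, 0.5)])
    return offsets + 0.5 * rng.random((4, 2))

vals = [warnock_L2sq(jittered_2x2()) for _ in range(20_000)]
print(np.mean(vals))  # close to 0.01909...
```

The empirical mean converges to the expected discrepancy of the partition; with 20000 samples the statistical error is already well below the third decimal place.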