CONFIDENTIAL CELLS

Statistics Directorate    
French Equivalent: Cellules confidentielles

Definition:
The cells of a table which are non-publishable due to the risk of statistical disclosure are referred to as confidential cells.

Context:
There are three types of confidential data where disclosure might occur, and therefore three types of confidential cells:

- Small counts. A tabular cell is confidential if fewer than m entities contribute to the total of that cell. The value of m is called the threshold and is usually set by the statistical authority according to the desired degree of confidentiality protection: m is at least 3, but m=5 is sometimes used. With a threshold of m=3, a cell is confidential if the figure in the cell shows the data of only one unit, or if the figure is the sum of two entities, so that one respondent could disclose the figure of the other respondent by subtracting his own figure from the sum. This is also known as the threshold rule.

- Dominance or case of predominance. (a) dominance rule, concentration rule, (n,k) rule: A cell is regarded as confidential if the n largest units contribute more than k% to the cell total. The values of n and k are set by the statistical authority and vary considerably; for example, with n=2 and k=85, a cell is confidential if the two largest units contribute more than 85% to the cell total. (b) prior posterior ambiguity rule, p/q rule: it is assumed that, from publicly available information, the contribution of one individual to the cell total can be estimated to within p per cent (p = error before publication); after publication of the statistic, the value can be estimated to within q per cent (q = error after publication). In the p/q rule, the information gain through publication is represented by the ratio p/q; in the prior posterior ambiguity rule, it is represented by the difference p-q. If the information gain is unacceptable, the cell is declared confidential. The values of p and q, and thus the acceptable level of information gain, are set by the statistical authority.

- Secondary confidentiality/derivation: Even if all confidential cells containing small counts or cases of predominance are protected by disclosure control methods (=primary protection), disclosure might be possible by recalculating confidential cells as the difference between a total and the sum of cells corresponding to that total. This recalculation of primary protected cells is called derivation. Derivation can occur (a) within one two-dimensional table or higher-dimensional tables, when margin totals are given in the lines, the columns or in a set of lines or columns; (b) between tables and subtables in the case of three or more dimensions e.g. between geographic levels or between aggregation levels (total economy, sector); (c) between different tables on the same aggregation or geographical level containing different sorts of information.
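
The derivation case can be illustrated with a minimal Python sketch. It assumes a single table row whose margin total is published, and it marks a suppressed cell with None. The function name derive_by_subtraction and the sample figures are hypothetical, chosen only for illustration.

```python
def derive_by_subtraction(row, total):
    """Return {index: value} for a suppressed cell recoverable from the row total.

    row   -- published cell values; None marks a suppressed (protected) cell
    total -- the published margin total for the row
    """
    hidden = [i for i, value in enumerate(row) if value is None]
    if len(hidden) != 1:
        # With no hidden cell there is nothing to derive; with two or more,
        # this single row alone does not pin down an exact value.
        return {}
    known = sum(value for value in row if value is not None)
    # The one hidden cell equals the total minus the sum of the published cells.
    return {hidden[0]: total - known}


# Illustrative figures: one suppressed cell in a row with a published total.
print(derive_by_subtraction([120, None, 45], 200))   # one cell hidden
print(derive_by_subtraction([120, None, None], 200))  # two cells hidden
```

In this sketch, suppressing a second cell in the row prevents exact recalculation from that row alone. It does not rule out derivation through column totals, subtables or other tables, which are cases (a) to (c) above.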

Note: Small counts and dominance are collectively referred to as primary confidentiality.
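
As a rough illustration of primary confidentiality, the threshold rule and the (n,k) rule can be sketched in Python as follows. The function names are hypothetical, the defaults m=3, n=2 and k=85 are simply the example values quoted above, and contributions are assumed to be non-negative. An actual statistical authority would set its own parameters.

```python
def is_small_count(contributions, m=3):
    """Threshold rule: fewer than m entities contribute to the cell total."""
    return len(contributions) < m


def is_dominated(contributions, n=2, k=85):
    """(n,k) rule: the n largest units contribute more than k% of the cell total."""
    total = sum(contributions)
    if total <= 0:
        # Assumption of this sketch: empty or zero cells are not flagged here.
        return False
    top_n = sum(sorted(contributions, reverse=True)[:n])
    # Compare top_n / total > k / 100 without dividing.
    return top_n * 100 > k * total


def is_primary_confidential(contributions, m=3, n=2, k=85):
    """A cell is primary confidential if either rule flags it."""
    return is_small_count(contributions, m) or is_dominated(contributions, n, k)


# Illustrative cells, each given as a list of respondent contributions.
print(is_primary_confidential([10, 20]))          # only two contributors
print(is_primary_confidential([50, 40, 5, 5]))    # two largest hold 90%
print(is_primary_confidential([30, 30, 20, 20]))  # neither rule applies
```

The p/q rule is left out of this sketch because it depends on how the authority models the estimation error before and after publication.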

Source Publication:
Eurostat, "Manual on disclosure control methods", Office for Official Publications of the European Communities, Luxembourg, 1996, pp. 8-9.

Statistical Theme: Methodological information (metadata)

Created on Thursday, July 07, 2005

Last updated on Monday, April 15, 2013