Just What Are We Censoring?

Introduction

There has been a lot of discussion about the use or abuse of the Internet for the transmission of obscene or otherwise undesirable pictures and text. In the USA there is now a law that makes this unlawful, and that appears to render culpable even those whose only involvement in the process is providing the communication medium.

What Do We Mean by Offensive Data?

Little attention appears to have been paid to the question of what we actually mean when we say that one or other piece of data is unacceptable on these grounds. There seems to be the unchallenged assumption that a set of data necessarily contains a specific item, or items, that can be tagged as offensive, or not.

It should be remembered that what is transmitted over the net is not text, or pictures, but encodings thereof. Everything has been reduced to a stream of bits, and constructing the text or picture requires that that stream be decoded in a particular way. It might seem that I'm splitting hairs here - each data stream can sensibly be processed in only one way, yielding its semantic content.

The Mechanism

The purpose of this page is to show that the idea of a data stream containing particular semantic information is simplistic, and that it doesn't have to be true. To do this, I present a mechanism whereby it becomes impossible meaningfully to attribute any particular content to a specific data stream.

By way of example, suppose that we want a WWW site that holds pictures. For the sake of simplicity, assume that each picture requires exactly 1MB of data. Suppose we have picture files named a, b, c, d and e. We proceed as follows:

  1. Construct a file consisting of 1MB of random data. Call this A.

  2. Perform a byte-wise exclusive OR (XOR) of A and the picture file a, to produce the file B.

  3. Now do the same with B, and the picture file b, to produce the file C.

  4. Continue with this scheme to produce files D, E and F.

Now, clearly we can get back any of the pictures by obtaining two consecutive files in the set and exclusive ORing them together (for example, B XOR C yields picture b, because exclusive OR is its own inverse), but no individual file can be said to contain a picture. Further, all the files have the property that they appear to consist of random data. File A (our original) is not qualitatively different from the others.
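
The procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (xor_bytes, build_chain, recover) are invented here, short byte strings stand in for the 1MB picture files, and every picture is assumed to have been padded to the same length.

```python
import os

def xor_bytes(x, y):
    """Byte-wise exclusive OR of two equal-length byte strings."""
    if len(x) != len(y):
        raise ValueError("inputs must be the same length")
    return bytes(p ^ q for p, q in zip(x, y))

def build_chain(pictures):
    """Return the stored files [A, B, C, ...] for the given pictures."""
    size = len(pictures[0])
    chain = [os.urandom(size)]            # file A: random data
    for picture in pictures:
        # Each new file is the previous file XORed with the next picture.
        chain.append(xor_bytes(chain[-1], picture))
    return chain

def recover(chain, index):
    """Recover picture number `index` from two consecutive stored files."""
    return xor_bytes(chain[index], chain[index + 1])

# Hypothetical stand-ins for the picture files a..e, padded to 16 bytes.
pictures = [name.encode().ljust(16, b".") for name in ("a", "b", "c", "d", "e")]
files = build_chain(pictures)             # files A..F
restored = [recover(files, i) for i in range(len(pictures))]
```

Here files holds six byte strings corresponding to A through F, and restored matches the original pictures. Nothing in the code marks files[0] as special: once the chain exists, every file plays the same role.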

Of course, we don't have to stick to this linear scheme. We could derive other sequences of files from any of the new files (including A). Given such a set of files, it would be impossible to determine the starting point.
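
As a rough sketch of such branching, the following Python derives a second file from B using a further picture. The names (pic_x, file_X) and the tiny 16-byte "pictures" are hypothetical, chosen only for illustration.

```python
import os

def xor_bytes(x, y):
    """Byte-wise exclusive OR of two equal-length byte strings."""
    return bytes(p ^ q for p, q in zip(x, y))

size = 16
pic_a = b"picture a".ljust(size, b".")
pic_b = b"picture b".ljust(size, b".")
pic_x = b"picture x".ljust(size, b".")    # hypothetical extra picture

file_A = os.urandom(size)                 # the random starting file
file_B = xor_bytes(file_A, pic_a)
file_C = xor_bytes(file_B, pic_b)         # the linear chain continues
file_X = xor_bytes(file_B, pic_x)         # a branch that also starts from B

# B now pairs with three different files to give three different pictures.
from_A = xor_bytes(file_A, file_B)        # picture a
from_C = xor_bytes(file_B, file_C)        # picture b
from_X = xor_bytes(file_B, file_X)        # picture x

# The relation is symmetric: A can equally be "derived" from B and picture a.
rebuilt_A = xor_bytes(file_B, pic_a)
```

Because rebuilt_A equals file_A, there is nothing to show whether A was generated first and B derived from it, or the other way round.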

One can imagine that in time the data stored on the WWW sites around the world would consist of:

In such a scenario, it would not be possible to assert that a particular site was storing a particular picture, or that someone was downloading one.

The only way to censor undesirable material in this scheme would be to prohibit the publication of information about which file pairs produce which pictures. I wonder if they'd dare.

Is This a Practical Scheme?

There are a number of objections to implementing this scheme.


Contacting Me

If you have any comments, email me on sylviaw@cryogenic.net


Regards, Sylvia.