There has been a lot of discussion about the use or abuse of the Internet for the transmission of obscene or otherwise undesirable pictures and text. In the USA there is now a law that makes this unlawful, and that appears to render culpable even those whose only involvement is providing the communication medium.
Little attention appears to have been paid to the question of what we actually mean when we say that one or other piece of data is unacceptable on these grounds. There seems to be the unchallenged assumption that a set of data necessarily contains a specific item, or items, that can be tagged as offensive, or not.
It should be remembered that what is transmitted over the net is not text, or pictures, but encodings thereof. Everything has been reduced to a stream of bits, and constructing the text or picture requires that that stream be decoded in a particular way. It might seem that I'm splitting hairs here, since each data stream can apparently be processed sensibly in only one way, yielding its semantic content. But that need not be so.
By way of example, suppose that we want a WWW site that holds pictures. For the sake of simplicity, assume that each picture requires 1MB of data. Suppose we have picture files named a, b, c, d and e. We proceed as follows:
Construct a file consisting of 1MB of random data. Call this A.
Perform byte-wise exclusive ORs of A and the picture file a, to produce the file B.
Now do the same with B, and the picture file b, to produce the file C.
Continue with this scheme to produce files D, E and F.
Now, clearly we can get back any of the pictures by obtaining two consecutive files in the set and exclusive ORing them together, but no individual file can be said to contain a picture. Further, all the files have the property that they appear to consist of random data. File A (our original) is not qualitatively different from the others.
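The procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (xor_bytes, build_chain, recover) are made up for the example, and short placeholder byte strings stand in for the real 1MB picture files.

```python
import os


def xor_bytes(x: bytes, y: bytes) -> bytes:
    """Byte-wise exclusive OR of two equal-length byte strings."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return bytes(p ^ q for p, q in zip(x, y))


def build_chain(pictures, size):
    """Return the files A, B, C, ... for a list of equal-sized pictures."""
    files = [os.urandom(size)]  # file A: random data
    for picture in pictures:
        # Each new file is the previous file XORed with the next picture.
        files.append(xor_bytes(files[-1], picture))
    return files


def recover(files, index):
    """Recover picture number `index` from two consecutive files."""
    return xor_bytes(files[index], files[index + 1])


SIZE = 1024  # small stand-in for the 1MB of the example (assumption)
# Placeholder "pictures" a..e; real picture files would be read from disk.
pictures = [bytes([i]) * SIZE for i in range(5)]
files = build_chain(pictures, SIZE)  # files A..F

# XORing B and C gives back picture b.
picture_b = recover(files, 1)
```

Recovery works because XORing a file with itself cancels out: B XOR C is B XOR (B XOR b), which leaves just b.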
Of course, we don't have to stick to this linear scheme. We could derive other sequences of files from any of the new files (including A). Given such a set of files, it would be impossible to determine the starting point.
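A branch of this kind might look as follows in Python. Again this is a sketch rather than a definitive implementation: the extra picture f and the derived file X are hypothetical names introduced for the illustration, and short placeholder byte strings stand in for real picture files.

```python
import os


def xor_bytes(x: bytes, y: bytes) -> bytes:
    """Byte-wise exclusive OR of two equal-length byte strings."""
    return bytes(p ^ q for p, q in zip(x, y))


SIZE = 1024  # small stand-in for 1MB (assumption)

# Placeholder pictures; f is a hypothetical extra picture for the branch.
a = bytes([1]) * SIZE
b = bytes([2]) * SIZE
f = bytes([3]) * SIZE

# The linear scheme: A is random, B and C are derived from it.
A = os.urandom(SIZE)
B = xor_bytes(A, a)
C = xor_bytes(B, b)

# A branch: derive a new file X from B using picture f.
X = xor_bytes(B, f)

# B now pairs with A, C and X to yield three different pictures.
from_A = xor_bytes(A, B)  # picture a
from_C = xor_bytes(B, C)  # picture b
from_X = xor_bytes(B, X)  # picture f
```

Nothing in the files themselves marks A as the random starting point. The same set of files would result if B had been the random file and A had been derived from it.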
One can imagine that in time the data stored on the WWW sites around the world would consist of:
A vast array of apparently random data files.
A similarly huge list of file pairs which are used to derive specific pictures.
In such a scenario, it would not be possible to assert that a particular site was storing a particular picture, or that someone was downloading one.
The only way to censor undesirable material in this scheme would be to prohibit the publication of information about which file pairs produce which pictures. I wonder if they'd dare.
There are a number of objections to implementing this scheme.
In practice, to obtain a particular item, one would need to download more than double the amount of data needed at present.
The data files would tend to get duplicated, or replicated.
Software complexity would be increased.
If you have any comments, email me at sylviaw@cryogenic.net
Regards, Sylvia.