Have I Given You Permission to Do That?

I have written this article on the understanding that you will be able to access it for free.

In order to increase the chances of you finding it, the publisher (Principia Scientific) has included a number of tags and expects Google and other search engines to ‘crawl’ the content in order to classify and index it to help you find it. That’s all well and good.

If you were to copy all of the text and republish it in your own name that would be a breach of copyright. Not good.

On the other hand, if you were to use AI to ‘re-write’ the article and publish it, it’s not clear where we stand.

In this article I ask: what technical options do we currently have as authors and publishers, and could there be a better way to facilitate ‘fair usage’? At the moment publishers need to explicitly ‘disallow’ access to their content for each AI operator individually.
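To illustrate the operator-by-operator problem, here is a sketch of what a publisher’s robots.txt might look like today. The crawler names used are tokens the respective operators have published (GPTBot for OpenAI, CCBot for Common Crawl), but the list is illustrative, not exhaustive — which is precisely the limitation: every new AI operator must be named separately.

```
# robots.txt — illustrative only; each AI crawler must be listed by name

# Block OpenAI's training crawler
User-agent: GPTBot
Disallow: /

# Block Common Crawl, whose archives are widely used for ML training
User-agent: CCBot
Disallow: /

# Everyone else, including ordinary search indexers, may crawl
User-agent: *
Allow: /
```

Any operator not on the list is, by default, allowed in, and compliance is voluntary in any case.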

Colin Hayhurst, CEO of privacy-focused search engine Mojeek, clearly regards this approach as having ‘limitations.’ In a recent blog post he adds: “loopholes are very likely already being exploited by several big and many small players.”

https://blog.mojeek.com/2023/10/noml-proposal-and-open-letter.html

Colin’s proposed solution to this problem is simple: a noml value in the already-existing robots meta tag and X-Robots-Tag HTTP header. In simple terms, this would let publishers say ‘yes’ to search indexing and ‘no’ to machine-learning use for any web page, for all parties at once, through an existing and widely used mechanism.
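As I read the proposal, noml would sit alongside existing robots values such as index and noindex. A sketch of how a publisher might apply it, assuming the value were adopted:

```html
<!-- Page-level: allow search indexing and link following,
     but opt the content out of machine-learning use -->
<meta name="robots" content="index, follow, noml">
```

For non-HTML resources such as PDFs, the same signal could presumably travel in the HTTP response header (e.g. `X-Robots-Tag: noml`), just as noindex does today — one declaration covering all operators, rather than a per-crawler list.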

This is an initiative I support and one which you can too by signing the open letter on the Mojeek blog.

Why Does This Matter?

We are concerned here with the use of computing power to read articles from one or more sources and produce a summary or an alternative version. At its simplest, someone may ask an AI to rewrite a piece in a different style, to summarise several articles together, or to draw on unlimited sources to answer a particular question. This may be acceptable for private use, but what if they then publish the result?

Whether or not this breaches copyright is currently a matter of some speculation. As a result, some tech investors are looking for ways in which AI developers can be indemnified against copyright infringement.

On the opposing ‘bench’ the International News Media Association (INMA.org) and others are taking an interest in this subject given the scope for abuse of copyright of their content and long-standing ‘issues’ with Big Tech.

In a ‘real-world’ parallel, a campaign group I work with recently published a report. It was covered by one national newspaper with full credit given to the campaign group. This made them happy. It was then covered by two others with no mention of the campaign group.

Whilst the extra coverage was seen as good (the campaign group got their message across) the lack of credit was frustrating.

The second and third articles were not exact copies, but they were clearly based on the first (a long-standing media industry practice). Perhaps this is mimicry, if it is not plagiarism. It probably did not involve AI, but it could easily have done so.

AI is making such mimicry-publishing easier. It will increase the frequency of such frustration for ‘authors’ (who, at the very least, want credit) and for readers (who may not be able to readily follow up the report in detail).

It is not yet clear whether this will result in an avalanche of legal cases, against whom those actions would be taken, or whether copyright abuse can be proven.

In an example from the commercial world, an investment advisory service disallows access to their content on grounds that it is a key part of their USP (unique selling proposition).

Meanwhile a scientific publisher has recognised the issue but not yet formed a view. There are clearly going to be some ‘publishers’ who will want to limit how their content can be scanned and used.

An existential threat?

The ‘existential’ problem is, in my view, a much greater concern. The general consensus on this new type of AI seems to be that it can provide an average summary of average things but is not so good at figuring out unknowns.

It will rewrite what is known, but it will not write anything genuinely new. Based on existing content it may, therefore, ‘regress to the mean’ (or, more precisely, the ‘mode’), reinforcing the consensus and repeating what gets written about most.

It may do less well at questioning the consensus, which is an important part of scientific progress. If we look to the future, and assume that more content will be published which is generated using AI, how will the computer know whether it is regurgitating something that has already been regurgitated?

This existential threat is my principal concern about this use of AI, notwithstanding the fact that plagiarism is wrong and mimicry is dull!

My reading is that the amount of content which merits serious restrictions in this area is limited, and so most content will remain ‘open’ to AI. This makes the existential threat even greater: more ‘new’ AI-written garbage based on only slightly older AI-written garbage.

This, in turn, could increase the extent to which users (people) experience the ‘experiential’ problem described above. No doubt proponents of AI will say this can be solved. But then, they always say that, don’t they?

About the author: Nigel Jacklin is a statistician and market researcher who has worked with both traditional media and Big-Tech. He can be found on X @TheGoodStatsMan.


PRINCIPIA SCIENTIFIC INTERNATIONAL, legally registered in the UK as a company incorporated for charitable purposes. Head Office: 27 Old Gloucester Street, London WC1N 3AX. 
