US cancer research centre published dozens of error-filled papers

The other day I learned – via a post on X – of this astonishing story, which had previously passed me by entirely.

The press release from the US Justice Department can be found here.

In summary, this is what happened:

In January 2024, a molecular biologist called Sholto David – working from his home in the small Welsh town of Pontypridd – exposed systemic data manipulation in 58 oncology research papers from the Dana-Farber Cancer Institute (affiliated with Harvard).

His audit, in which he found “amateurish” image forgery embedded within the papers, resulted in many retractions and corrections. In 2025, Dana-Farber settled a federal fraud lawsuit for $15m, of which David received $2.6m.

There’s quite a decent Guardian article about it here:

A New York Times piece can be read here.

What legal mechanisms were engaged?

This legal action came about through the qui tam provisions of the USA’s False Claims Act, under which whistleblowers can sue on the government’s behalf over fraud involving federal money (contracts or, as here, research grants) and then receive a share of whatever is recovered.
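As a rough check on the figures quoted earlier, the whistleblower’s cut can be expressed as a fraction of the settlement. The sketch below uses the $15m and $2.6m figures from this article. The 15 to 25 per cent band is the statutory range for cases in which the government joins the suit, and treating this case as falling within that band is an assumption on my part, as are the function and variable names.

```python
def relator_share(settlement_usd: float, award_usd: float) -> float:
    """Return the whistleblower's award as a fraction of the settlement."""
    if settlement_usd <= 0:
        raise ValueError("settlement must be positive")
    return award_usd / settlement_usd


# Figures as quoted in this article.
SETTLEMENT_USD = 15_000_000
AWARD_USD = 2_600_000

# Assumption: the band that applies when the government intervenes.
ASSUMED_BAND = (0.15, 0.25)

share = relator_share(SETTLEMENT_USD, AWARD_USD)
within_band = ASSUMED_BAND[0] <= share <= ASSUMED_BAND[1]

print(f"Share: {share:.1%}, within assumed band: {within_band}")
# Share: 17.3%, within assumed band: True
```

On those numbers David’s award works out at a little over 17 per cent of the settlement, towards the lower end of the assumed band.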

This is the same route taken by Brook Jackson regarding wrongdoing uncovered in relation to the clinical trials for the Pfizer / BioNTech “covid vaccine”.

The contrast in the US government response could not be starker. In essence, the government is doing everything in its power to obstruct Brook’s case.

The DOJ has argued that the “burdens of continued litigation” outweighed any potential benefits. Moreover, it argued that since the FDA was aware of the issues and still authorized the vaccine and paid Pfizer, those issues were immaterial to the contract between the government and Pfizer.

How far-reaching are the “errors”?

This is where I decided to continue my experiments with using AI – in my case Gemini. I should emphasise that my cynicism about AI’s capacity to reason, or to come up with anything innovative, is as strong as it ever was, but I still think it is a potentially useful tool for summarising documents and performing laborious repetitive tasks on a well-circumscribed set of information.

Essentially, I would like to know what impact the studies containing these errors might have had. Looking through each paper, assessing the significance of the falsified / misleading data, determining how many times each paper had been cited, and so on, is an incredibly laborious process, and I wondered if Gemini could help me with it.

Therefore, I “fed” Gemini with all the material listed above – the press release, Guardian article, and settlement agreement, and asked it to create a table listing all the flagged papers, and for each the drug and pharma company involved (if any), the citations, the journal impact factor (expressed as a percentile), a summary of the errors observed, and a new measure which was my invention – the Conclusion Impact Index (“CII”).

The idea of the CII – which is expressed between 1 and 10 – came about from my desire to get a feel for how significant the errors were in the context of the main conclusions to the paper; were they central to it, or just an adornment?

This is how Gemini expressed the scoring system:

Interestingly – and unprompted – rather than listing individually ALL the studies Sholto David identified (which I asked it to do), by default Gemini decided to group the studies together by lead protagonist. This is what it has done in the table below.

As can be seen, these papers have been cited thousands of times and appeared in extremely high-impact journals. In the case of Kenneth Anderson, the fabrications (which cover 25 articles) have contributed to the success of drugs worth billions to pharma. (Without delving into this more deeply it’s impossible to say what the extent of that contribution was.)

In other cases, the misrepresentations have led to years of pointless scientific endeavour, no doubt costing many millions.

In the above table, the CII is represented as an average per cluster, but the range shows us there are several papers with scores of 9 and 10, which are extracted and listed individually in the table below:

Recall that a score of 10 means that the paper’s claims are entirely dependent on the data / images which were falsified.
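The aggregation described above (an average and range of CII per cluster, plus a separate list of the papers scoring 9 or 10) is simple enough to redo by hand as a spot check on Gemini’s tables. A minimal Python sketch follows. The paper IDs, cluster labels and scores in it are invented placeholders, not values from the tables, and the function names are my own.

```python
from collections import defaultdict
from statistics import mean

# Placeholder rows: (paper_id, cluster, cii).
# All values are invented for illustration only.
papers = [
    ("paper-01", "cluster-A", 4),
    ("paper-02", "cluster-A", 9),
    ("paper-03", "cluster-B", 10),
    ("paper-04", "cluster-B", 6),
    ("paper-05", "cluster-B", 2),
]


def summarise_by_cluster(rows):
    """Return the mean, minimum and maximum CII for each cluster."""
    by_cluster = defaultdict(list)
    for _paper_id, cluster, cii in rows:
        if not 1 <= cii <= 10:
            raise ValueError(f"CII out of range: {cii}")
        by_cluster[cluster].append(cii)
    return {
        cluster: {"mean": mean(scores), "min": min(scores), "max": max(scores)}
        for cluster, scores in by_cluster.items()
    }


def high_impact(rows, threshold=9):
    """Return the rows whose CII is at or above the threshold."""
    return [row for row in rows if row[2] >= threshold]


summary = summarise_by_cluster(papers)
flagged = high_impact(papers)

for cluster, stats in summary.items():
    print(cluster, stats)
print("CII 9 or 10:", [paper_id for paper_id, _, _ in flagged])
```

Keeping the per-paper rows matters because an average can hide outliers: in the placeholder data, a cluster averaging 6 still contains a paper scoring 10.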

Should we trust the tables Gemini generated?

To be honest, I don’t know. I am well aware of the capacity of AI to make things up to please its user. However, you would expect any bias it does have to favour the establishment position rather than a critique of it.

If I get the time, I may do some spot checks on some of its findings – or maybe any of you dear readers could help me out with that? Regardless, even if these tables above are totally wrong, the core of the story still stands, and it is a remarkable and quite disturbing episode.

Tip of the iceberg?

What is so shocking about this incident is that it took a blogger in a small Welsh town to uncover a huge number of long-standing systematic errors in mainstream, institutionalised and pharma-funded science.

Without his efforts, it seems unlikely that these errors would ever have been identified. The faulty data would have been left in the corpus of scientific knowledge, supporting the sales of several products and guiding the direction of future scientific research.

This all raises the question: why wasn’t this discovered by anyone else in the chain which is meant to guarantee integrity and quality in science?

Amongst those who could / should surely have spotted this are:

  • The scientists themselves
  • Their colleagues who might have assisted – did they really suspect nothing?
  • Internal QA systems
  • The sponsoring pharma companies – don’t they audit the work they fund?
  • The NIH (in the case of the NIH grant-funded studies) – don’t they do any QA / audit of work paid for by their grants?
  • The journal reviewers / editors
  • Readers of the articles once published, especially those citing them

The answer, of course, is that nobody in this chain really has an interest in calling this out. They are motivated by maintaining grants, getting publications out and receiving pharma funding.

Anyone who thinks the primary motivation is science or health is – in my view – quite naive.

Concluding questions:

  • How widespread are these kinds of “mistakes”?
  • How many would be found if every publication were audited on the default assumption that anything in it could be fabricated, rather than a starting position that everything should be taken at face value?
  • How much of “science” – which is a never-ending iterative process – has been guided and influenced by similar fabrications and misrepresentations?
  • What are the implications for the inclusion of these papers in the training set of modern AI systems?

See more here substack.com

