Fake Text and Scientific Writing

Scientist Writing a Paper Using AI in the Style of Goya

Introduction

As technology continues to advance, it has become easier for people to create fake blogs and pass them off as legitimate sources of information. One tool that is being used to create these fake blogs is ChatGPT, a machine learning algorithm that can generate text in a conversational style.

While ChatGPT has many legitimate uses, it is also being used to create fake blogs that are designed to manipulate readers and spread false information. These fake blogs often take on the appearance of real news sources, using similar design elements and language to give the impression of credibility.

One problem with fake blogs written by ChatGPT is that they can be difficult to detect. The text generated by the algorithm is often indistinguishable from human writing, making it hard for readers to know if they are reading a genuine article or a fake one.

One way to spot a fake blog written by ChatGPT is to look for repetitive or formulaic language. ChatGPT is programmed to generate text based on specific prompts, so it may reuse the same phrases or structure in multiple articles. Another clue is a lack of human-like mistakes or variations in tone and style.

It is important for readers to be aware of the existence of fake blogs and to be cautious about the information they consume online. While it is not always easy to spot a fake blog, there are ways to increase your chances of avoiding them. One way is to look for multiple sources to verify the information being presented. It is also important to be critical of the sources you read and to be wary of blogs that seem too good to be true.

Ok, so the above was “fake”, written by ChatGPT. It does not follow my usual style of both poor readability and spelling. ChatGPT is free and accessible at the moment and can generate human-readable and accurate text from a relatively short input string. In this case I provided “A blog post about ChatGPT fake blog post”.
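The generated post's own advice, to look for repetitive or formulaic language, can be turned into a rough check. The Python sketch below counts how often three-word phrases recur in a passage. The function names and the three-word window are my own illustrative choices, not an established detector, and a high score is at best a hint, never proof, that text was machine-generated.

```python
import re
from collections import Counter


def word_ngrams(text: str, n: int = 3) -> list[tuple[str, ...]]:
    """Lower-case the text, split it into words and return every run of n consecutive words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def repeated_phrases(text: str, n: int = 3, min_count: int = 2) -> dict[str, int]:
    """Return each n-word phrase that occurs at least min_count times, with its count."""
    counts = Counter(word_ngrams(text, n))
    return {" ".join(gram): c for gram, c in counts.most_common() if c >= min_count}


def repetition_ratio(text: str, n: int = 3) -> float:
    """Fraction of n-gram occurrences that belong to a phrase seen more than once.

    0.0 means no phrase repeats. This is an illustrative heuristic only:
    any threshold for "suspicious" would need calibrating on real text.
    """
    grams = word_ngrams(text, n)
    if not grams:
        return 0.0
    counts = Counter(grams)
    # Sum the occurrences of every phrase that appears more than once.
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(grams)


if __name__ == "__main__":
    sample = ("One way to spot a fake blog is to look for repetition. "
              "Another way to spot a fake blog is to check sources.")
    print(repeated_phrases(sample))
    print(round(repetition_ratio(sample), 2))
```

In that short sample, six three-word phrases (such as "a fake blog") each appear twice, giving a ratio of 12/21, or about 0.57. Human writers repeat themselves too, so a score like this only means something when compared against known human-written text from the same field.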

This technology has been touted as a panacea, from replacing Google to offering new scientific insights. While this has not yet been realised, its allure is beginning to snag some of those within the profession. People already claim to use ChatGPT and similar tools to get feedback on paper titles and to generate writing prompts.

ChatGPT, and the underlying technology, is not going anywhere, and it has hit the mainstream. With Microsoft’s recent announcement of a further multibillion-dollar investment in OpenAI (reportedly $10 billion), we are going to see these tools baked into a range of Microsoft products. These tools are only going to get better, and their use by both scrupulous and unscrupulous people will continue to rise. This is clearly an issue within academia, where similar problems have been seen before, but what about other areas of scientific exploration?

Industry

Within industry, a scientist’s time is expensive. You want to pay the scientist to do science; after all, that is what makes the business money. Return on R&D investment is a key driver. One time sink for scientists within industry is the production of written reports and text for peers.

The rise of the internet has given students (and employees) the ability to approach tasks in a range of ways. It has been reported that students who are not taught about plagiarism are more likely to commit it. Academia has tackled this, often with software, and plagiarism is still a punishable offence. Academia has its own battle to face, and it is one it will overcome. Its goal, after all, is to teach students both the right thing and the right way to approach problems.

But AI is a hydra, its many heads vast and varied. Is the spell checker baked into Word AI? It could be argued that it is; however, its use is ubiquitous.

I have, for a long time, used the Langtools plugin for Emacs, which gives me real-time spell checking and grammar advice; similar tools like Grammarly may be more familiar to readers. I have not mentioned it before, not out of a need to hide its use but because I did not think it worth mentioning. However, it is AI, and it does give me an edge when producing technical writing. It is like a jacked-up spell checker.

I would not, however, feel duped if it, or tools like it, had been used and not cited, in the same way that you do not cite Word’s spell checker at the end of every paper. But this leads us to an interesting point. AI is powerful and useful, but at what point does its use become unacceptable?

Long form reporting

We write reports for one of two reasons: to transfer knowledge from one worker to another, or to inform management of something they need to be aware of. In both cases this is done with the ultimate aim of earning a return on investment for the enterprise. Given that aim, we need not retreat to the ivory tower of academic worry. Why can we not speed up this process using AI?

Indeed, do we shy away from using AI tools within our data analysis? Tools like Scikit-learn let researchers train models and gain insight from their data in a way that would have been unimaginably approachable only a few short years ago, and AI has contributed substantially to many scientific fields in recent years. Of course, scientists use everything available for this type of analysis; however, we state how everything was produced and give those following us the ability to reproduce our work.

The same, I believe, applies to the use of AI within scientific writing. Below is a portion of an abstract produced with the prompt “abstract for paper on the use of metal organic frameworks for catalysis”:

Metal-organic frameworks (MOFs) are a class of porous materials that have gained attention in recent years due to their high surface area, tunable pore size, and excellent stability. These properties make MOFs attractive candidates for use in catalysis, where they can improve the efficiency and selectivity of chemical reactions. In this paper, we review the use of MOFs in catalysis, including their synthesis and functionalization, their ability to host various metal catalysts, and their performance in a variety of reactions.

The above is correct and indistinguishable from the first paragraphs of many MOF papers; indeed, some of my own publications come quite close to this. It is only lacking citations. As someone who has written enough on this topic, I can review this text and sign it off as acceptable. In this situation we have written half of an executive summary for a report in about a tenth of the time. The time saving could be even greater for authors whose first language is not English.

Heck, it’s even quite good when it comes to experimental procedures, for example with the prompt “experimental procedure for the production of metal organic frameworks”:

First, gather all necessary materials and equipment, including metal ions, organic ligands, solvents, stirring equipment, and any necessary templates or catalysts.

In a glass flask or reaction vessel, combine the desired metal ions and organic ligands in the desired molar ratio.

Add a solvent to the mixture, such as water or an organic solvent, until the desired concentration is reached.

…

This is not a bad first shot and certainly not a bad place to start. Clearly it needs training on more specific scientific work, but this is damn close to what is needed. With minor additions you have a serviceable experimental method.

Peer review is also common in industrial report writing. It does not work in the same way as for a published paper, but line managers or scientific peers often review work and offer corrections or comments before anything is disseminated. Expert review is key to avoiding mistakes and improving knowledge transfer.

I think real power can also be gained when this approach is coupled with internal data on what a specific institution expects from its reports. Much of this critical process of knowledge transfer could then be automated, and in a way that has value to the business.

Grey areas

I touched on spell checking above. Tools exist that make the writing and editing process easier. As a rule, these are not cited as tools we use to do science, and we would not expect full transparency about their use in any report.

Tools like this are everywhere (LaTeX, Word and so on), and we do not cite them. AI has been used in similar ways to act as a sounding board for new ideas, in the same way a blank sheet of paper and a pen would. This leads us to two extremes, although the line between them is perhaps undefined. At one extreme, the use of AI is of no concern to the reader and is simply used to aid the production of written text. At the other, AI is used to produce large amounts of a given text. The middle ground is where AI is used to produce some, but perhaps not all, of a text, or perhaps to edit or provide feedback on large amounts of work or to derive new insight. This list is neither complete nor static. Transparency from the author is needed here on a case-by-case basis.

IP

The biggest hurdle to using these types of tools in the current workplace is the uncertainty around IP. While it may be quite easy to agree that any IP produced or derived from work done using such tools belongs to the respective party, this needs to be clear before such tools are used in any industrial application.

Similarly, companies will need to be sure that any data being handed over is either 1) non-proprietary or 2) handled correctly under the relevant NDAs. If this is not done, the integrity of any resulting IP could be impaired, making the use of AI not worth the hassle.

Wrap-up

So what am I proposing? I believe that AI-generated text should be used when academic integrity is not at risk, when experts can review what is written, and when its use is clearly marked. These rules, if applied, would allow these technologies to be used without harming the process of knowledge transfer; indeed, they would facilitate it and improve the return on investment for the institution.

The sooner we embrace this technology the sooner we can all move on with what is really important - pushing scientific knowledge to its limits and discovering new insights.

I also pledge that no AI-generated writing will appear on this blog, or in any other of my work, unless it meets these criteria.


