Good start on a hard question — how or whether to use #AI tools in #PeerReview.
https://www.researchsquare.com/article/rs-2587766/v1

"For the moment, we recommend that if #LLMs are used to write scholarly reviews, reviewers should disclose their use and accept full responsibility for their reports’ accuracy, tone, reasoning and originality."

PS: "For the moment" these tools can help reviewers string words together, not judge quality. We have good reasons to seek evaluative comments from human experts.
Update. I acknowledge that there's no bright line between using these tools to polish one's language and using them to shape one's judgments of quality. I also acknowledge that these tools are steadily getting better at "knowing the field". That's why this is a hard problem.

One way to ensure that reviewers take #responsibility for their judgments is #attribution.

#PeerReview #OpenPeerReview
Update. I'm pulling a few other comments into this thread, in preparation for extending it later.

1. I have mixed feelings on #attribution in peer review. I see the benefits, but I also see the benefits of #anonymity.
https://twitter.com/petersuber/status/1412455826397204487

2. For #AI today, good #reviews are a harder problem than good #summaries.
https://fediscience.org/@petersuber/109954904433171308

3. Truth detection is a deep, hard problem. Automating it is even harder.
https://fediscience.org/@petersuber/109921214854932516

#PeerReview #OpenPeerReview
Update. I'm pulling in two of my Twitter threads on using #AI or #PredictionMarkets to estimate quality-surrogates (not quality itself). I should have kept them together in one thread, but it's too late now.

https://twitter.com/petersuber/status/1259521012196167681

https://twitter.com/petersuber/status/1196908657717342210
Update. I'm sure this has occurred to #AI / #LLM tool builders. Determining whether an assertion is #true is a hard problem, and we don't expect an adequate software solution any time soon, if ever. But determining whether a #citation points to a real publication, and whether it's #relevant to the passage citing it, is comparatively easy. (Just comparatively.)

Some tools already cite sources. But when will tools promise that their citations are real and relevant — and deliver on that promise?
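
PS: to make the point concrete, here's a minimal sketch of what a "real and relevant" check could look like. It assumes the public Crossref REST API for the existence check and uses crude word overlap as a relevance proxy; both choices are my illustrations, not any shipping tool's actual method.

    # Toy citation checker: (1) does the cited work exist? (2) is it
    # plausibly relevant to the citing passage? Uses the public Crossref
    # API for (1) and naive word overlap for (2). Illustrative only.
    import requests

    def citation_exists(title):
        """Ask Crossref whether an indexed work closely matches the title."""
        r = requests.get(
            "https://api.crossref.org/works",
            params={"query.bibliographic": title, "rows": 1},
            timeout=10,
        )
        r.raise_for_status()
        items = r.json()["message"]["items"]
        if not items:
            return False
        found = (items[0].get("title") or [""])[0].lower()
        return title.lower() in found or found in title.lower()

    def roughly_relevant(passage, abstract, threshold=0.1):
        """Crude relevance proxy: share of passage words found in the abstract."""
        p, a = set(passage.lower().split()), set(abstract.lower().split())
        return len(p & a) / max(len(p), 1) >= threshold

    print(citation_exists("Attention Is All You Need"))  # a real, well-indexed title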
Update. I've been playing with #Elicit, one of the new #AI #search engines. Apart from answering your questions in full sentences, it cites peer-reviewed sources. When you click on one, Elicit helps you evaluate it. Quoting from a real example:

"Can I trust this paper?
• No mention found of study type
• No mention found of funding source
• No mention found of participant count
• No mention found of multiple comparisons
• No mention found of intent to treat
• No mention found of preregistration"
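
PS: Elicit's actual checks are its own. But a crude version of a "no mention found" screen is easy to sketch: scan the full text for phrases tied to common reporting items. The patterns below are my guesses, purely illustrative.

    # Toy "no mention found" screen: scan a paper's text for phrases
    # tied to common reporting items. The patterns are my guesses, not
    # Elicit's actual approach.
    import re

    CHECKLIST = {
        "study type": [r"randomi[sz]ed", r"cohort study", r"case[- ]control"],
        "funding source": [r"funded by", r"supported by", r"\bgrant\b"],
        "participant count": [r"\bn\s*=\s*\d+", r"\d+\s+participants"],
        "preregistration": [r"preregist", r"clinicaltrials\.gov", r"\bOSF\b"],
    }

    def screen(fulltext):
        """Return checklist items with no matching phrase in the text."""
        return [
            "No mention found of " + item
            for item, patterns in CHECKLIST.items()
            if not any(re.search(p, fulltext, re.IGNORECASE) for p in patterns)
        ]

    print("\n".join(screen("We recruited 120 participants for a randomized trial.")))
    # -> No mention found of funding source
    # -> No mention found of preregistration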
Update. Found in the wild: A peer reviewer used #AI to write comments on a manuscript. The AI tool recommended that the author review certain sources, but nearly all of the recommended works were fake.
https://www.linkedin.com/feed/update/urn:li:share:7046083155149103105/

#Misconduct #NotHypothetical
Update. The US #NIH and Australian Research Council (#ARC) have banned the use of #AI tools for the #PeerReview of grant proposals. The #NSF is studying the question.
https://www.science.org/content/article/science-funding-agencies-say-no-using-ai-peer-review
(#paywalled)

Apart from #quality, one concern is #confidentiality. If grant proposals become part of a tool's training data, there's no telling (in the NIH's words) “where data are being sent, saved, viewed, or used in the future.”

#Funders
Update. If you *want* to use #AI for #PeerReview:

"Several publishers…have barred researchers from uploading manuscripts…[to] #AI platforms to produce #PeerReview reports, over fears that the work might be fed back into an #LLM’s training data set [&] breach contractual terms to keep work confidential…[But with] privately hosted [and #OpenSource] LLMs…one can be confident that data are not fed back to the firms that host LLMs in the cloud."
https://www.nature.com/articles/d41586-023-03144-w
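
PS: for the curious, "privately hosted" can be as simple as the sketch below: the manuscript text goes to a local server and never leaves your machine. It assumes an Ollama server on the default port with an open-weights model already pulled; the model name and prompt are mine.

    # Review assistance against a locally hosted LLM, so the manuscript
    # never leaves your machine. Assumes an Ollama server on the default
    # port with an open-weights model already pulled (ollama pull llama3).
    import requests

    def local_language_notes(manuscript_text):
        resp = requests.post(
            "http://localhost:11434/api/generate",
            json={
                "model": "llama3",  # hypothetical choice of open-weights model
                "prompt": ("Suggest language and clarity fixes, not a quality "
                           "verdict, for this excerpt:\n\n" + manuscript_text),
                "stream": False,
            },
            timeout=300,
        )
        resp.raise_for_status()
        return resp.json()["response"]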
Update. "Avg scores from multiple ChatGPT-4 rounds seems more effective than individual scores…If my weakest articles are removed… correlation with avg scores…falls below statistical significance, suggesting that [it] struggles to make fine-grained evaluations…Overall, ChatGPT [should not] be trusted for…formal or informal research quality evaluation…This is the first pub'd attempt at post-publication expert review accuracy testing for ChatGPT."
https://arxiv.org/abs/2402.05519

#AI #PeerReview
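
PS: why would averaging rounds help at all? If each round is a weak signal plus independent noise, averaging shrinks the noise but not the signal. A toy simulation with synthetic data (my parameters, not the study's):

    # Each simulated "round" is a weak reading of true quality plus
    # independent noise. Averaging rounds cuts the noise, so correlation
    # with true quality rises, with no gain in underlying judgment.
    import numpy as np

    rng = np.random.default_rng(0)
    true_quality = rng.normal(size=200)

    def one_round():
        return 0.3 * true_quality + rng.normal(size=200)  # weak signal, heavy noise

    single = one_round()
    averaged = np.mean([one_round() for _ in range(15)], axis=0)

    print("single round r: ", np.corrcoef(true_quality, single)[0, 1])
    print("15-round avg r: ", np.corrcoef(true_quality, averaged)[0, 1])

Note that the correlation rises without the model judging any better. Averaging only removes noise, which fits the study's finding that fine-grained evaluation still fails.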
Update. Lancet Infectious Diseases on why it does not permit #AI in #PeerReview:
https://www.thelancet.com/journals/laninf/article/PIIS1473-3099(24)00160-9/fulltext

1. In an experimental peer review report, #ChatGPT "made up statistical feedback and non-existent references."

2. "Peer review is confidential, and privacy and proprietary rights cannot be guaranteed if reviewers upload parts of an article or their report to an #LLM."
Update. "Our results suggest that between 6.5% and 16.9% of text submitted as peer reviews to these [#CS] conferences could have been substantially modified by #LLMs, i.e. beyond spell-checking or minor writing updates."
https://arxiv.org/abs/2403.07183

#AI #PeerReview
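
PS: as I read it, the estimate works at the corpus level, not per review: treat the corpus as a mixture of human and LLM word distributions and fit the mixture weight by maximum likelihood. A toy version with invented word probabilities (the real distributions are estimated from data):

    # Corpus-level mixture estimate: model each review word as drawn from
    # (1 - a) * P_human + a * P_llm and choose the weight a that maximizes
    # the log likelihood. Word probabilities below are invented.
    import numpy as np

    p_human = {"notable": 0.002, "findings": 0.010, "commendable": 0.0005}
    p_llm   = {"notable": 0.008, "findings": 0.009, "commendable": 0.0060}

    def log_likelihood(a, corpus_words):
        return sum(np.log((1 - a) * p_human[w] + a * p_llm[w]) for w in corpus_words)

    corpus = ["notable", "commendable", "findings", "commendable", "notable"]
    best = max(np.linspace(0, 1, 101), key=lambda a: log_likelihood(a, corpus))
    print(f"estimated LLM-modified fraction: {best:.2f}")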
Update. "We demonstrate how increased availability and access to #AI technologies through recent emergence of chatbots may be misused to write or conceal plagiarized peer-reviews."
https://link.springer.com/article/10.1007/s11192-024-04960-1

#PeerReview
Update. "Researchers should not be using tools like #ChatGPT to automatically peer review papers, warned organizers of top #AI conferences and academic publishers…Some researchers, however, might argue that AI should automate peer reviews since it performs quite well and can make academics more productive."
https://www.semafor.com/article/05/08/2024/researchers-warned-against-using-ai-to-peer-review-academic-papers

#PeerReview
Update. The @CenterforOpenScience (#COS) and partners are starting a new project (Scaling Machine Assessments of Research Trustworthiness, #SMART) in which researchers voluntarily submit papers to both human and #AI reviewers, and then give feedback on the reviews. The project is now calling for volunteers.
https://www.cos.io/smart-prototyping

#PeerReview
Update. These researchers built an #AI system to predict #REF #assessment scores from a range of data points, including #citation rates. For individual works, the system was not very accurate. But for total institutional scores, it was 99.8% accurate. "Despite this, we are not recommending this solution because in our judgement, its benefits are marginally outweighed by the perverse incentive it would generate for institutions to overvalue journal impact factors."
https://blogs.lse.ac.uk/impactofsocialsciences/2023/01/16/can-artificial-intelligence-assess-the-quality-of-academic-journal-articles-in-the-next-ref/

#DORA #JIFs #Metrics
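
PS: part of the individual-vs-institutional gap is plain averaging: independent per-paper errors mostly cancel in an institution's total, while stable between-institution differences remain. A toy illustration with synthetic scores (not the study's model):

    # Independent per-paper prediction errors mostly cancel in an
    # institution's total, while real between-institution differences
    # remain. Synthetic scores, not the study's model.
    import numpy as np

    rng = np.random.default_rng(1)
    n_inst, n_papers = 50, 300

    inst_mean = rng.normal(loc=3.0, scale=0.5, size=(n_inst, 1))  # institutions differ
    true = inst_mean + rng.normal(scale=0.9, size=(n_inst, n_papers))
    pred = true + rng.normal(scale=2.0, size=true.shape)          # noisy per-paper model

    print("per-paper r:      ", np.corrcoef(true.ravel(), pred.ravel())[0, 1])
    print("per-institution r:", np.corrcoef(true.sum(axis=1), pred.sum(axis=1))[0, 1])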
Update. This editorial sketches a fantasy of #AI-assisted #PeerReview, then argues that it's "not far-fetched".
https://www.nature.com/articles/s41551-024-01228-0

PS: I call it far-fetched. And you?
Update. New study: "The majority of human reviewers’ comments (78.5 %) lacked equivalents in #ChatGPT's comments."
https://www.sciencedirect.com/science/article/abs/pii/S0169260724003067

#AI #LLM #PeerReview
Update. #AI researchers are among those pissed when #PeerReview of their work is outsourced to AI.
https://www.chronicle.com/article/ai-scientists-have-a-problem-ai-bots-are-reviewing-their-work
(#paywalled)

One complained, “If I wanted to know what #ChatGPT thought of our paper, I could have asked myself.”
