First of all, this question relates to an issue that affects several areas of science, but since StackExchange doesn’t have a meta science section I’ll make it specific to computer science, which is pertinent because the solution to the bigger problem may actually come from computer science.
In the context of the news that Springer and IEEE published more than 120 nonsense papers, my question is as follows:

What rigorous set of methods can we apply to the process of publishing scientific papers so that we can quickly verify the reproducibility of the experiments?
We already have systems like Turnitin that are highly effective at detecting plagiarism, yet I don’t know of any system that can score a piece of work on its scientific soundness.
Is there any ongoing work related to this? I found out about Semantic Publishing whilst composing this question, but I have no idea what other approaches, if any, are being actively worked on.
This is a complex issue that affects all areas of science but has been getting higher visibility as the mainstream media has reported some cases in headlines. One answer seems to be better review systems. However, one might argue that nonsense papers are not necessarily a failure of the peer review system. All peer review systems must be human to some degree, and all human systems are fallible. Any peer review system will produce both false positives and false negatives: papers that were accepted but, with 20/20 hindsight, should not have been, and papers that were rejected but were of acceptable quality. There is increasing awareness and sociological study of peer review systems. Cyberspace can in fact aid the process in some ways, by increasing the number of reviewers, increasing the visibility of reviews, adding rating systems, etc., but it also has downsides, such as the computational ease of creating fake submissions and increasing AI capabilities to fool humans.
An example of a CS-specific peer review (meta-)analysis can be found in the recent NIPS experiment, in which peer reviewers were split into two groups, the same papers were given to each, and the overlap in acceptance/rejection decisions was measured. Unsurprisingly to many, the results showed quite high variance. Researchers overcome "false negatives" by resubmitting papers to other conferences. Unfortunately this NIPS experiment never seems to have been formally documented except across a number of CS blogs, and there is already some "link rot" among key links. It was announced informally at the conference, and many insiders, including participants, blogged about it. Full documentation might be considered "airing dirty laundry".
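The measurement in that experiment is simple to state: for papers reviewed independently by two committees, count how often the accept/reject decisions differ. A minimal sketch of that calculation, using made-up decisions rather than the actual NIPS data:

```python
# Sketch of the agreement measurement in a two-committee review experiment.
# The decision lists below are illustrative placeholders, not the real NIPS figures.

def disagreement_rate(decisions_a, decisions_b):
    """Fraction of papers on which the two committees' accept/reject decisions differ."""
    assert len(decisions_a) == len(decisions_b)
    disagreements = sum(a != b for a, b in zip(decisions_a, decisions_b))
    return disagreements / len(decisions_a)

# Hypothetical decisions for 10 papers (True = accept, False = reject).
committee_1 = [True, True, False, False, True, False, False, True, False, False]
committee_2 = [True, False, False, True, True, False, False, False, False, False]

print(disagreement_rate(committee_1, committee_2))  # 0.3
```

Even this toy version makes the variance point concrete: a high disagreement rate on the same set of papers means acceptance depends substantially on which committee a paper happened to draw.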