The long march to open science

02/Sep/2016

Many researchers are positive about the new, burgeoning science culture, but they still hesitate to enter into an open exchange of knowledge. There are many reasons why – such as a lack of knowledge about data management and the fear of intellectual property theft. By Sven Titz

(From "Horizons" no. 110, September 2016)

Scientists disclose all their study plans and experimental designs; they write daily blogs about their progress in the lab, revealing every detail; and then they publish in open-access journals that are assessed through an open peer-review process. Their results are stored in databases that are openly accessible to everyone. This is the utopia of open science.

Are we about to attain such a state of transparent research? Well, things are unlikely to develop quite so straightforwardly. Sometimes it's because there's just not enough money. Sometimes people aren't in a position to set up the required databases. And sometimes scientists hesitate to reveal their data because they fear that competitors could steal their ideas and publish them first.

Tougher than expected

The successes achieved by Big Science in laying open research data are deceptive – whether at the nuclear research centre CERN or in genetics research. In many fields outside the big projects, there are still major obstacles to progress. It's easy to insist that data should be made freely available. But individual researchers who have neither the means nor the expertise can be driven to despair over it. And because individuals and small research groups find it difficult to place their data on open access, other scientists in turn find it difficult to use that data. "Many researchers lack the time and the knowledge they need to be able to document their data adequately and make it available", says Benedikt Fecher. He is currently doing his doctorate at the German Institute for Economic Research and the Alexander von Humboldt Institute for Internet and Society in Berlin, and he has been investigating researchers' attitudes to open science.

In the USA and Europe, the research funding organisations have proclaimed their support for data disclosure. But this isn't enough to enforce the standards of open science. Researchers also need organisational, financial and personnel support. This is just what is offered by the Swiss Centre of Expertise in the Social Sciences (FORS), for example. It helps with processing, documenting and storing research data in the social sciences, and provides the necessary infrastructure for it. Researchers can attend workshops to gain further skills and get access to online tools for data management, for instance.

Open data is already established in the natural sciences, but the social sciences are still shy about the concept. This is partly because they usually work with personal data that is subject to data-protection laws. But this isn't the only reason. According to Alexandra Stam, the head of the Data Promotion group at FORS, one of the problems is that social scientists are in general not used to documenting their data in a standardised fashion. "Many researchers don't realise that their data can have an existence after their own work is completed". This means that much potentially valuable data and many important details are lost unnecessarily.

This predicament is in part a result of how people are trained today. Data management isn't taught formally as part of degree courses, says Stam. And often, researchers simply fail to document their data during their project, but instead only start the documentation process when it's nearing completion.

In some countries, such as the USA and the UK, a data management plan often has to be provided when an application for research funding is submitted. In Switzerland, this is not yet the case. Stam hopes that it will happen soon. Furthermore, it's essential that data should be stored in permanent databases after it's been documented and placed on open access. Otherwise, the issue of data maintenance can be left up in the air after the end of a project.

The online blackboard

Mathematicians such as Emmanuel Kowalski of ETH Zurich present problems, discuss them and solve them together. The Polymath Project functions like a blog: it's open to all and is nourished by discussion in the form of comments. Instead of keeping their research projects secret, researchers are spontaneously pooling their strengths.

Photo: Valérie Chételat

In principle yes, but …

We still can't be overly optimistic about the prospects for open data – despite institutional assistance such as that offered by FORS. Even when researchers aren't left to cope on their own, many still hesitate to reveal their data. In his survey, Fecher has discerned a discrepancy between researchers' generally positive attitude to open science and their personal reluctance to reveal their own data.

Often, the fear of intellectual property theft is what holds researchers back. This risk might be heavily overstated, but it can't be denied that there have indeed been such cases. The genetics researcher Titus Brown at the University of California, Davis, reports that competitors once used his data for their own articles after he had placed it on open access – articles he could have written himself. Nevertheless, he remains an advocate of open science. Brown is convinced that it's of use to research.

Naturally, there are also other reasons for this hesitation. People might be in favour of transparency, but it can still be held back by certain established rights. In empirical medical research, for example, it is still the custom today – however antiquated it might seem – for the author of data to be listed as the co-author of any new studies based on it, says Fecher.

Few incentives

Many observers complain that there is a general lack of incentives to disclose data. Today, researchers are measured by the quality and quantity of their publications. But there is no similar academic recognition for datasets. "Researchers would welcome it", says Fecher. Stam also believes that such an incentive would be significant. "It's important that people recognise the usefulness of good data management for their own research – above and beyond the act of sharing data".

Nevertheless, in recent years, many so-called data journals have emerged that place their main focus on publishing new datasets. The best known of these is probably 'Scientific Data' from the Nature publishing group. But archaeology, the geosciences, chemistry and other branches of science have also seen the emergence of subject-specific data journals. These specialised media will fill the gap until the day when research data is given formal recognition.

The frivolous candour of notebooks

When it comes to revealing the research process itself, things are a little different, as the example of 'open lab notebooks' shows. The ecology researcher Carl Boettiger of the University of California, Berkeley, began putting his research notes online when he was still a doctoral student. As he admits today, he was simply lucky: he went about it quite naively, and none of his bosses took umbrage at his notebook. But this isn't usually the case. Some young researchers irritate their colleagues by being a bit too candid about what they put on open online access. In some cases, they can even damage their careers.

Boettiger uses his notebook primarily as an aide-memoire and to exchange information with colleagues, whom he can refer directly to specific content. Now and then, his co-authors of articles ask him to hold back sensitive information, he says. But otherwise, he always writes up everything straight away. None of his ideas have as yet been stolen from his open notebook. Along with the assorted vague worries that exist about open science, there is, however, a real problem with open lab notebooks: they can eat up too much time. According to Boettiger, you have to familiarise yourself with special programs, and how long that takes depends on your degree of IT knowledge. Because he is keen to simplify open science in all its facets, Boettiger founded the project 'rOpenSci' a few years ago. It's a platform that offers software for processing and disclosing scientific data, and which is also useful for keeping lab notebooks.

Parchment 2.0

manuscripts so that researchers all over the world can study them. But it's even better to be able to comment on them from a distance. At the University of Bern, the historian Tara Andrews makes her annotations with T-PEN, and shares them online.

Photo: Valérie Chételat

Companies holding back hardware

Naturally, open science isn't confined to data and communication. In open-source projects, hardware and software are also transparent. Circuit diagrams and construction plans are placed on open access – just like the source code of open-source software, explains Lorenz Meier, a doctoral student at the Institute for Visual Computing at ETH Zurich. Meier has worked on several projects together with outside companies. He was usually able to insist on working with open hardware and software. In the case of open-source software, that also meant companies were often prepared to pass on the improvements made during the course of a project.

Together with his colleagues, for example, Meier developed the autopilot software 'PX4' that allows you to control drones and miniature airplanes. The software and the instructions for the hardware are offered as free downloads. Nothing else would make any sense, says Meier. "For drones, the open-source solutions are even better than military software". No company is any longer in a position to prevent the development of better software.

Meier thinks that his collaborations with companies work well – though not always right from the outset. In his experience, companies tend to block open access when problems occur – for example, if they feel that their business model is under threat. In order to dispel such resistance, you have to explain where a company can actually make money from a project. And it's rarely the construction plans or the software that bring in income. Instead, it comes from marketing the company's expertise and services.

Oliver Gassmann works at the Institute of Technology Management at the University of St. Gallen. He confirms that models such as Linux, where the source code is openly accessible and enjoys no protection, have proven themselves in the marketplace. Companies have recognised their advantages to the extent that they sometimes even donate patents to the open-source movement. "That means new standards prevail much quicker than they would with protected solutions", says Gassmann. In such cases, the company's task is to seek added value elsewhere.

Basically, Gassmann finds that the cooperation between research institutes and private companies is a positive thing. The companies get access to fundamental knowledge, and the researchers get additional funding. Open science could cause conflict, however, if the researchers were to publish so early that they reveal the current state of the technology while a patent application is still being processed. But this is a fundamental problem that also occurs in classical collaborations between universities and business partners, says Gassmann. With open science, the problem is simply exacerbated.

From the lab bench to the world

Experiments, successes and failures – biologists store all their observations on either paper or computers. The researchers of the international Open Source Malaria Project go one step further and open up their lab journals online to everyone – as does Volker Heussler of the University of Bern. This is the best way to document progress and to prevent other scientists from repeating the same mistakes.

Photo: Valérie Chételat

The problem of the private sphere

This call for transparency can be pushed too far, however, if open information is misused in order to damage the reputation of scientists. Climate researchers – especially those in the English-speaking world – can testify to the frustrating impact of data disclosure such as is allowed by the 1967 US Freedom of Information Act. Often, such information has subsequently been used to disparage mainstream climate research. Michael Mann of Pennsylvania State University is probably the most prominent victim of such activities.

So it's not so easy to decide just how far researchers should go in disclosing their work. The pressure to be overly transparent can also lead to unwanted results. Self-censorship, for example, can lead to conformist behaviour. And that in turn would be counterproductive to open science's hopes for success.

When the rights of third parties are involved, the private sphere can become a minefield for open science – such as when patient data from clinical or genetic studies is made accessible to third-party medics. The consequences can be really exasperating. Doctors with patients suffering from very rare diseases need to know of concrete, comparable cases to help them find the best possible therapy – but data protection laws have until now often stood in their way.

Nevertheless, solutions do exist, even for such difficult cases. In 2013, for example, the 'Global Alliance for Genomics and Health' was founded – a worldwide association with more than 380 institutional members. It develops sophisticated procedures so that patient data can be shared safely and effectively on a voluntary basis. It has devised a carefully differentiated consent model for the release of data by patients, and has developed algorithms to aid data access. Ultimately, such an exchange of patient data should support research into cancer and into rare and infectious diseases.

There is still a lot to be done if we are going to realise the cultural shift towards open science, despite all the obstacles.

Sven Titz is a science journalist in Berlin.