Easy chair:Topics For Inviting Speakers

From GenBioWiki

Science Commons and Open Science

The problem

One good way to define this topic is to quote the first three paragraphs of the Science Commons website:


"There are terabytes of research data being produced in laboratories around the world, but the best web search tools available can’t help us make sense of it. Why? Because more stands between basic research and meaningful discovery than the problem of search."

"Many scientists today work in relative isolation, left to follow blind alleys and duplicate existing research. Data is balkanized — trapped behind firewalls, locked up by contracts or lost in databases that can’t be accessed or integrated...."

"The consequences in many cases are no less than tragic. The time it takes to go from identifying a gene to developing a drug currently stands at 17 years — forever, for people suffering from disease."


Two main factors should be considered when trying to understand the Science Commons statement above: copyright of published data, and data formats and standards.


Copyrights for published data

Peter Murray-Rust has written an excellent article on Nature Precedings documenting, from personal experience and with good examples from others, what happens when you "hand" your article to a peer-reviewed journal for publication.

The central idea is that for many (if not most) traditional, non-open-access journals, whenever you publish an article the publisher acquires copyright over the text, the data figures in the main text, and the supplemental material (as part of the publishing process, you have to sign a "Copyright Transfer Agreement" with the publishing house). To demonstrate the implications, Murray-Rust reports the real case of a student who blogged some of the supplemental material of an article in order to comment on conclusions she had reached from the data, and then started receiving notices of legal action from the journal for copyright violation!

Doesn't this contradict the fundamental spirit of science: sharing results in order to compare them with those of our peers, get feedback, and improve our own work? If we think about it, we all commit copyright infringement each time we put a graph from a paper on our presentation slides so that we can compare it with our results.

Just imagine reading an article and seeing a pattern in a graph that the authors did not notice or write about (or perhaps noticed but chose not to write about). You probably cannot publish a new article on just an observation (you can try sending a letter to the journal, but will they publish it?). You can contact the authors to let them know, and you can also include a slide about what you noticed in the article's graph in a presentation at your institute's seminar (the journal cannot see the copyright infringement of putting the graph on your slides and showing it within a conference room). But how far do these last two options get you?

What if you want to post your observations on the medium that will spread your word the most, the web? After the real-world example above, would you do it, and then perhaps wait for the journal to contact you because you posted copyrighted material on the web?

Murray-Rust's paper also argues that while it seems logical for the text of an article to be copyrightable (because it is the author's creative work, like a novel), the same is not true of the data. The data belong to the public because they are facts measured from nature.

Data formats and standards

Primarily, the problem comes from the fact mentioned in the first section: data are deposited locally, behind the firewall of each research group, in all sorts of formats. We have probably all seen "data available upon request from the authors" in research papers, and most of us have requested and received the data from a publication, only to find that it takes quite some time to figure out the on-the-fly-quickly-designed-for-my-current-application format before we can build upon the data.

The whole point is to make scientific data reusable and to enable integration of fragmented, disparately formatted data. Just being able to easily integrate data from a set of experiments on even a small set of objects of biological research (for example, a few proteins that have been linked to cancer) would probably bring forward a completely new set of insights. There is a whole range of research in this area, the Semantic Web for example. But to connect with our journal copyright example above, I would like to focus on a much simpler, everyday example.

Say you are conducting research on one topic, and there are maybe 25, 50, or 150 (if you take a broader scope) related papers. You can probably read them all during the first year of your PhD and keep referring back to them (you cannot memorize all of the content) as you advance with your experiments in subsequent years. But is there anything more boring than going back to a paper and trying to spot the section that you feel gives a clue to a result you got from an experiment?

That is where data standards come to the rescue. Over at NCBI, PubMed Central (PMC) is an open-access repository of peer-reviewed articles whose full text is deposited in XML. In a few words, the XML tags every part of an article: the title (<article-title>), the abstract (<abstract>), the main-text sections (<sec>), each section's paragraphs (<p>), the figures (<fig>), and so on. Having journal articles in a standard XML format therefore means you can hand the tedious "let's go and find those couple of sections I remember reading somewhere in these 15 papers" task over to your computer. As a hint on how to do that, one approach is to build something that parses XML in Perl or any other programming language you like (google it; there is bound to be something out there already). With the right code, you simply have every chunk of the article easily accessible, and you can perform operations like "bring me the paragraphs of these 25 articles that contain such and such keywords".
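As a rough illustration of the "parse the XML and pull out matching paragraphs" idea, here is a minimal sketch using Python's standard `xml.etree.ElementTree` instead of Perl. It assumes a simplified, JATS-like layout (<article-title>, <sec>, <title>, <p>); real PMC files also carry a DOCTYPE, namespace declarations, and many more elements. The sample article and the helper names `paragraph_text` and `find_paragraphs` are hypothetical, and matching is a plain case-insensitive substring test, not real text mining.

```python
import xml.etree.ElementTree as ET

# A tiny, hypothetical article in a simplified JATS-like layout.
SAMPLE_ARTICLE = """
<article>
  <front>
    <article-meta>
      <title-group><article-title>A hypothetical study of protein X</article-title></title-group>
      <abstract><p>We study protein X in tumour cells.</p></abstract>
    </article-meta>
  </front>
  <body>
    <sec>
      <title>Results</title>
      <p>Protein X binds <italic>kinase Y</italic> in cancer cell lines.</p>
      <p>No effect was seen in healthy tissue.</p>
    </sec>
    <sec>
      <title>Discussion</title>
      <p>Kinase Y may be a drug target.</p>
    </sec>
  </body>
</article>
"""


def paragraph_text(p: ET.Element) -> str:
    """Join all text inside a <p>, including inline markup such as <italic>,
    and collapse runs of whitespace into single spaces."""
    return " ".join("".join(p.itertext()).split())


def find_paragraphs(xml_text: str, keywords: list[str]) -> list[tuple[str, str]]:
    """Return (article title / section title, paragraph) pairs for every
    body-section paragraph that mentions any of the keywords."""
    root = ET.fromstring(xml_text.strip())
    title = root.findtext(".//article-title", default="(untitled)")
    wanted = [k.lower() for k in keywords]
    hits = []
    # root.iter("sec") visits every section, including nested ones.
    for sec in root.iter("sec"):
        sec_title = sec.findtext("title", default="(no title)")
        # Only direct <p> children, so nested sub-sections are not counted twice.
        for p in sec.findall("p"):
            text = paragraph_text(p)
            if any(k in text.lower() for k in wanted):
                hits.append((f"{title} / {sec_title}", text))
    return hits


if __name__ == "__main__":
    for where, text in find_paragraphs(SAMPLE_ARTICLE, ["kinase y"]):
        print(f"{where}: {text}")
```

To cover "the 25 articles", you would read each downloaded XML file into a string and call `find_paragraphs` on it in a loop. Note that this sketch only searches body sections, so the abstract is skipped.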


A second approach would be to use something that aggregates and processes data from around the web, like Yahoo! Pipes (have a look at this, even though I think it's broken; feel free to clone it and play with the workflow's source). And of course you can go all the way to the other end: literature mining (here are some links to such tools).

To conclude, PubMed Central enables all of that because it uses a standard XML format and keeps the text of the articles open and free. But only a very small fraction of peer-reviewed journals deposit there. For all the rest, you will have to reopen all those papers and search for the section of interest. Thinking one step further, we can see how much this slow process of manually collecting information from the literature can hinder research in mission-critical areas such as drug research.

Proposed Speakers

One proposed speaker for Science Commons is James Boyle. I have put a link to his web page for reference, but don't look at it until we need to call him about giving the talk.

Instead, look at this video of his talk at Google to get the whole picture of open science. The talk gives a brief introduction to the Creative Commons idea and then explains how it is applied to Science Commons. It also includes a dose of technical material on the problem of published biomedical data being locked behind database HTML front-ends, and how this slows down research. The Open/Commons Science solution is (free) access to standardized information (we are not talking about the Semantic Web here, even if it is part of the deal).

In addition, this candidate is a good choice because he is located close by, in Durham, NC.


A second proposed speaker is John Wilbanks; his talk at MIT gives some more perspective on Open Science and a sample of what he could talk about.


References

Science 2.0: Great New Tool, or Great Risk?

Freeing the Dark Data in Scientific Experiments

The Future of Science is Open, Part 1: Open Access (see also Part 2 and Part 3)

Open Data, Wikipedia

Open source drug discovery, The Economist

Another reason for opening access to research, BMJ 2006;333(7582):1306

Synthetic Biology

Proposed Speakers

Personalized Genomics in Healthcare and Beyond

Companies such as 23andMe have arisen, offering personalized genomic services. These have many important implications, particularly for the future of medicine, but also for patient privacy, insurance policies, and the general societal acceptance of this technology.

Proposed Speakers

  • Deepak Singh, a medical informatics expert and prolific blogger

Science from Science Fiction

Proposed Speakers

Alexander G. Volkov is the editor of Plant Electrophysiology. I cannot find his personal website, but I do have his email address: [1].

I suggested Prof. Volkov because I am interested in how to generate energy directly from plants, not algae. The primary idea is: plant + electric eel = electric plant.

We do know that some plants, like the Venus flytrap, can generate strong action potentials, and that the electric eel can generate discharges of about 500 volts. The eel's remarkable electric organ is actually made up of thousands of stacked electroplaque cells. We know this electric current derives from voltage-gated ion channels in the cell membrane. Theoretically, plant electrochemistry could do a similar job. So I think plant leaves might act as solar photovoltaic panels, generating electric power that could be used directly instead of storing solar energy in the chemical bonds of ATP.
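The stacking is what makes the high voltage possible: cells connected in series add their voltages, like batteries in a flashlight. A back-of-the-envelope estimate, assuming each electroplaque cell contributes roughly 0.15 V (a commonly cited figure, not one from the text above) and that all cells fire in series:

```latex
V_{\text{total}} \approx N \, V_{\text{cell}}
\quad\Longrightarrow\quad
N \approx \frac{V_{\text{total}}}{V_{\text{cell}}}
  = \frac{500\ \text{V}}{0.15\ \text{V}}
  \approx 3.3 \times 10^{3}\ \text{cells in series}
```

This is consistent with the "thousands of stacked cells" figure; any plant-based version of the idea would presumably need a similar series arrangement to reach useful voltages.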

The ideas above are admittedly closer to sci-fi, but if you are interested as well, contact me [2].
