Wikipedia:Featured article candidates/Folding@home/archive3
- The following is an archived discussion of a featured article nomination. Please do not modify it. Subsequent comments should be made on the article's talk page or in Wikipedia talk:Featured article candidates. No further edits should be made to this page.
The article was promoted by Ian Rose 07:13, 25 October 2012 [1].
- You may be looking for a different page: see https://en.wikipedia.org/wiki/User_talk:SandyGeorgia#Folding@home SandyGeorgia (Talk) 18:42, 6 December 2020 (UTC)[reply]
Folding@home
- Nominator(s): • Jesse V.(talk) 01:08, 17 September 2012 (UTC)[reply]
I am nominating this article for Featured Article because I believe that it meets the FA criteria. It uses a multitude of reliable citations, is GA-class, has undergone a peer review, and thoroughly describes the subject in an encyclopedic fashion. This article should stand a very good chance at becoming a Featured Article, and I look forward to your comments and suggestions! • Jesse V.(talk) 01:08, 17 September 2012 (UTC)[reply]
- This page is getting lengthy, so as a TL;DR, here is a summary of the current situation: discussion has been very extensive, support appears to be slowly growing in favor of promotion, and the following editors have significantly contributed:
- PumpkinSky has voiced support following an image check and citation comments.
- Crisco 1492 supports promotion following prose comments and a request for an improved image, which have both been resolved and moved to the Talk page.
- Emw is supporting after identifying some clarity weaknesses in the prose and asking for the insertion of several explanatory images. He has knowledge and background in the relevant fields, and is the primary editor of the FA-class Rosetta@home article.
- EdJohnston left some suggestions on prose and proposed a new lead, but has not responded to discussion about them despite several reminders. He also has a background in the relevant fields.
- Hekerui left prose comments which should now be all resolved.
- Tonystewart14 identified some link issues, which have been taken care of.
- Ian Rose identified some minor technical issues, which were addressed.
- Comments from PSKY
- A well written lead is a summary. As such, it should need few, if any, refs. If it's ref'd in the body, you don't need it in the lead unless it's an exceptional claim. This is the rule of thumb, though of course sometimes mileage varies. I highly doubt your lead needs 12 refs. You should be able to remove most, possibly all refs from the lead. See the FA Yogo sapphire, which has a longer lead, but with no refs.
- I have removed citations from the lead where there was supporting information in the article. The information in the info template is not supported, so I left the citations there. • Jesse V.(talk) 02:28, 17 September 2012 (UTC)[reply]
- That should be one of your top priorities then. Information like the software release date should be included somewhere in the article. It is currently mentioned in the lead but not later in the article which isn't in line with WP:LEAD. I haven't had a chance to read the whole article yet, but it seems to be lacking history. There certainly isn't a section. What led to its creation? I've got a source here, Jane McGonigal's Reality is Broken, which says that "Since 2001, anyone in the world has been able to connect their personal computer to the Folding@home network." PS3 came later, specifically because they were more powerful. Granted, the information is available (that it started in 2007) in the Playstation 3 section, but I feel that is poor organization. Should some of that not be mentioned in the Client section or a base history section that doesn't currently exist? I hope to get a chance to read it fully soon, but I'm a bit concerned about the amount of technical language used. Of course an article like this will have technical language, but this seems to have an overwhelming amount of it. This is an interesting topic, and I wouldn't like the technical information in it to make reading a bore. Ryan Vesey 02:47, 17 September 2012 (UTC)[reply]
- I may have missed something in the policies, but I don't think the infobox is considered part of the lead. I consider it part of the body. Take for example the FA article acetic acid, which has all sorts of numbers and things in its infobox that are not in the body. Duplicating version numbers in the article also makes it harder for other editors to update the version number, and it's possible that the two could get out of sync, which wouldn't look good. • Jesse V.(talk) 03:09, 17 September 2012 (UTC)[reply]
- Regarding the reasons that led to F@h's creation: the article presents information on why this project is important. From the early publications on F@h, it seems that the project was developed for these reasons. If a change is in fact needed, I'm not sure how to present the information. Those early papers seem less scientific and a bit more opinionated. The article does mention the first screensaver, I could put a release date on that. • Jesse V.(talk) 16:10, 18 September 2012 (UTC)[reply]
- I mentioned the launch date in body of the article like you wanted. It's tied into the line about the screensaver in the Client section. • Jesse V.(talk) 20:22, 18 September 2012 (UTC)[reply]
- The article does cover F@h's history. It presents a history of the GPU, SMP, and PS3 clients in their respective sections, since they all developed separately. They all started from a screensaver, which is also discussed in the Client section. With this in mind, I believe the organization makes sense. • Jesse V.(talk) 03:09, 17 September 2012 (UTC)[reply]
- The subject is technical in nature and in order to meet FA's criteria I thought details were important. An expert shouldn't go wanting. Please see this section in the article's Talk page for some past comments on this from Dmitrij D. Czarkoff, the GA reviewer. If you have specific areas that you think I can improve, or you have suggestions on rewording, please let me know and I'll do my best to address them. • Jesse V.(talk) 03:09, 17 September 2012 (UTC)[reply]
- Much better on lead refs. Yes, infoboxes are on their own. I didn't notice some refs were in it. But again, much of it will be in the body. PumpkinSky talk 10:01, 17 September 2012 (UTC)[reply]
- "are no longer in active service.[146][4][147]", "usage is unaffected.[40][2]" refs are out of sequence. Your referential duckies must all be lined up.
- The referential duckies appear to be all lined up and floating nicely now. Good catch. • Jesse V.(talk) 02:28, 17 September 2012 (UTC)[reply]
- Ref 118, the Chinese one, is misformatted
- all for tonight. It's too late for me to go through this thoroughly, so if you don't see me here again after two days, ping my talk page. PumpkinSky talk 01:50, 17 September 2012 (UTC)[reply]
- Image check by PSKY
- Noting that the F@H Logo has an OTRS ticket, so it should be fine.
- File:F@h_v7_novice_shot.png is there a page on that website that says this is free software?
- PumpkinSky talk 23:48, 18 September 2012 (UTC)[reply]
- Regarding File:F@h_v7_novice_shot.png, that's a picture of FAHControl, which is GPL 3.0. I'm running V7, and if I click the About button (you can see it in the upper right-hand corner of the pic) the About screen says "GPL v3.0" and has a link to the GPL3 page. See this page for additional confirmation. • Jesse V.(talk) 00:07, 19 September 2012 (UTC)[reply]
- GPL confirmation is important so I've added that link to the image licensing section. PumpkinSky talk 01:58, 19 September 2012 (UTC)[reply]
- So I see. Thanks. • Jesse V.(talk) 02:01, 19 September 2012 (UTC)[reply]
- Support now
Leaning support now, pending work on table and source check.PumpkinSky talk 14:08, 28 September 2012 (UTC)[reply]
- "Work on table"? Sorry, what exactly would you like to have fixed? • Jesse V.(talk) 14:19, 28 September 2012 (UTC)[reply]
- The stuff Crisco mentioned on 24 Sep. Is that done now?PumpkinSky talk 16:41, 30 September 2012 (UTC)[reply]
- I think so. • Jesse V.(talk) 17:59, 30 September 2012 (UTC)[reply]
- Supporting now. I was waiting for someone like Emw to support as he seems to know a lot on the tech side of this. The other aspects are fine now IMHO.PumpkinSky talk 18:36, 20 October 2012 (UTC)[reply]
Comment Remove some external links per WP:ELNO. TBrandley 14:17, 19 September 2012 (UTC)[reply]
- Can you please specify which ones you would like me to remove? • Jesse V.(talk) 14:30, 19 September 2012 (UTC)[reply]
- Based on relevance, generally. I'd trim Folding@home support forum, "Futures In Biotech" episode about Folding@home, and Video of 1.5ms folding of NTL9 for starters. — Crisco 1492 (talk) 23:08, 19 September 2012 (UTC)[reply]
- Addressed comments from Crisco 1492 moved to talk
- Support now that Emw has finished his review.
Leaning support; currently waiting on Emw and Ed to vet this (I'm not comfortable supporting without someone better versed in the topic than me going over it)— Crisco 1492 (talk) 04:44, 5 October 2012 (UTC)[reply]
Comments from Emw: I am knowledgeable about biochemistry and have some background in this article's general topic area. I did an intensive review of this article almost a year ago. I plan to review it again here within a week or so. Emw (talk) 02:19, 20 September 2012 (UTC)[reply]
- Project significance
- This is the article's most advanced section and it's important to simplify the subject matter as much as possible. Here are some points along those lines:
"Due to the complexity of proteins' conformation space and limitations in computational power, all-atom molecular dynamics simulations have been severely limited in the timescales which they can study."
- Although it's wikilinked, there should be some sort of description of what "conformation space" is, and even what a "conformation" is. Conformations are pretty central to understanding what Folding@home does, so a brief note would be valuable to virtually all readers. A note in the body of the article as well as the protein folding image caption would work well. Emw (talk) 03:24, 27 September 2012 (UTC)[reply]
- All right. I'll think about how best to do this. • Jesse V.(talk) 20:32, 27 September 2012 (UTC)[reply]
- Added a note, and expanded the caption. One alternative is to add an explanation to the Notes section at the bottom of the page. • Jesse V.(talk) 00:15, 29 September 2012 (UTC)[reply]
"General-purpose supercomputers have been used to simulate protein folding, but such systems are intrinsically expensive and typically shared between many different research groups, and because the computations in kinetic models are serial in nature, strong scaling of traditional molecular simulations to these architectures is exceptionally difficult."
- This is a long sentence -- please break it into at least two sentences. Emw (talk) 03:24, 27 September 2012 (UTC)[reply]
"Instead, proteins spend the majority of their folding time – nearly 96% in some cases[18] – "waiting" in various intermediate conformational states, each a local thermodynamic free energy minimum in the protein's energy landscape."
- I think this article and Wikipedia more broadly would benefit from a graphic illustrating what an "energy landscape" is. Without an image I think the idea is difficult to conceptualize and will strike a lot of readers as academic hand-waving. An image like this energy landscape would be great. I'm not aware of any free alternative, but a bumpy 3D funnel with those axis labels would help a lot of people have the "ah-ha" moment needed to understand most of this section. Such a graphic would help dozens of other articles, too.
If you're handy with Python, there's an example of something close to an energy landscape plot in the bottom of the matplotlib gallery. Such plots are also doable in R or in JavaScript with D3 or WebGL. I might be able to make this if you don't have the time or inclination. Emw (talk) 03:24, 27 September 2012 (UTC)[reply]
- There's an article on the folding funnel but it lacks an illustration. I'll look into wikilinking to that article and mentioning it in the article. • Jesse V.(talk) 20:32, 27 September 2012 (UTC)[reply]
- Alright, I'll take a look into making the image. If you plan to do that though, let me know so our efforts aren't duplicated. Emw (talk) 03:20, 28 September 2012 (UTC)[reply]
- Thanks very much. You're probably better able to make a proper folding funnel illustration than I am. • Jesse V.(talk) 00:15, 29 September 2012 (UTC)[reply]
- With the recently added MSM image, a reasonably sized folding funnel graphic would sandwich the text too much. Emw (talk) 04:28, 5 October 2012 (UTC)[reply]
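For readers trying to picture the "energy landscape" discussed in this thread, a toy numerical model may help. The sketch below is only an illustration under assumed values: `toy_energy`, its `ruggedness` parameter, and the one-dimensional coordinate are hypothetical and are not taken from the article, from Folding@home, or from any force field. It builds a smooth bowl with a ripple on top and lists the local minima, which play the role of the intermediate states where a protein "waits".

```python
import math

def toy_energy(x, ruggedness=1.0):
    """Toy 1D 'rugged funnel': a smooth bowl plus a ripple (not a physical model)."""
    return x * x - ruggedness * math.cos(4.0 * x)

def local_minima(xs, energies):
    """Return (x, energy) pairs that are strictly lower than both grid neighbours."""
    return [
        (xs[i], energies[i])
        for i in range(1, len(xs) - 1)
        if energies[i] < energies[i - 1] and energies[i] < energies[i + 1]
    ]

# Hypothetical conformational coordinate sampled on a grid from -3 to 3.
GRID = [i / 100.0 - 3.0 for i in range(601)]
ENERGIES = [toy_energy(x) for x in GRID]
MINIMA = local_minima(GRID, ENERGIES)

if __name__ == "__main__":
    for x, e in MINIMA:
        print(f"local minimum near x = {x:+.2f}, energy = {e:.3f}")
```

The same grid of values could be handed to a plotting library to draw the funnel. The point of the sketch is only that a landscape can have one deep minimum and several shallower ones around it.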
"As the simulations discover more conformations, the trajectories are restarted from them, and a Markov state model (MSM) is gradually created from this cyclic process. MSMs are discrete-time master equation models which map out a biomolecule's conformational and energy landscape by describing its set of distinct structures and the transition rates between them."
Markov state models are in the same boat as energy landscapes: understanding them is fairly central to understanding this article, but generally only upper-level undergraduate students in a relevant field (e.g. math or computer science) or above will know what the concept is. The explanation of MSMs given in this article isn't bad, but it will leave most readers bewildered. Do any of the journal articles illustrate how Folding@home uses MSMs? If so, I think it would be a major boon to this article's accessibility if similar diagrams could be adapted from published ones and included in this article. Emw (talk) 03:24, 27 September 2012 (UTC)[reply]
- Yes, but the explanation in the journals is often really technical and I was struggling to understand them. This page seems to have the simplest explanation, and includes two diagrams. I really like the illustration of NTL9's MSM; should I pursue getting a Creative Commons license on that? • Jesse V.(talk) 20:32, 27 September 2012 (UTC)[reply]
- Getting one of the MSM illustrations in the What do MSMs look like? section of that page freely licensed would be superb. However, I'm not optimistic about the chances of that happening, since both images are in articles copyrighted by the American Chemical Society. But asking couldn't hurt. The more efficient path would likely be to get the PDB files for the various transition states for the "MSM for the ACBP protein" image (on the right), quickly load and export images of them in PyMOL, then draw lines between them in such a way that the resulting image is notably different from the copyrighted version. If you could ask a Folding@home researcher for those PDBs then I could take it from there. If you can't get the PDB files, then substituting the transition state images with letters or numbers seems like the next best alternative. Emw (talk) 03:21, 28 September 2012 (UTC)[reply]
- I sent an email to the ACS about the NTL9 MSM image, and another email to Drs. Voelz and Bowman for the PDB files for the other protein. I have no experience with PyMOL or anything like that. • Jesse V.(talk) 00:15, 29 September 2012 (UTC)[reply]
- Yesterday afternoon Dr. Voelz replied saying "Recreating a graph with new PDBs seems like a lot of work. Attached is a "side view" version of the ACBP TPT diagram, that should avoid any copyright issues. Let me know if this works for you." and he attached this PDF. I have yet to receive a response from the ACS about the NTL9 MSM image though. • Jesse V.(talk) 07:03, 2 October 2012 (UTC)[reply]
- I got a free copyright from him, and the image has been added to the Project Significance section. • Jesse V.(talk) 20:28, 3 October 2012 (UTC)[reply]
- The MSM image is really great! I think it improves the article a lot, and also provides a comprehension aid and visual enrichment of the corresponding paragraph. Emw (talk) 04:28, 5 October 2012 (UTC)[reply]
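As a companion to the MSM discussion above, here is a minimal sketch of what "a set of distinct structures and the transition rates between them" can look like numerically. Everything in it is an assumption made for illustration: the three state names, the transition probabilities, and the helper names `step` and `propagate` are hypothetical and do not come from MSMBuilder or from any Folding@home publication.

```python
STATES = ["unfolded", "intermediate", "folded"]

# Hypothetical row-stochastic transition matrix: entry [i][j] is the assumed
# probability of moving from state i to state j during one time step.
TRANSITIONS = [
    [0.90, 0.10, 0.00],
    [0.05, 0.85, 0.10],
    [0.00, 0.02, 0.98],
]

def step(populations, transitions):
    """Advance the state populations by one discrete time step."""
    n = len(populations)
    return [
        sum(populations[i] * transitions[i][j] for i in range(n))
        for j in range(n)
    ]

def propagate(populations, transitions, steps):
    """Apply `step` repeatedly and return the resulting populations."""
    for _ in range(steps):
        populations = step(populations, transitions)
    return populations

if __name__ == "__main__":
    # Start with every molecule unfolded and watch the populations relax.
    final = propagate([1.0, 0.0, 0.0], TRANSITIONS, 1000)
    for name, share in zip(STATES, final):
        print(f"{name}: {share:.3f}")
```

With these assumed numbers the populations settle toward roughly 1/13, 2/13 and 10/13, so most of the population ends up folded. A real MSM would have far more states and would estimate its transition probabilities from simulation trajectories rather than choosing them by hand.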
"In 2010, Folding@home researcher Greg Bowman was awarded the Thomas Kuhn Paradigm Shift Award from the American Chemical Society for the instrumental development of the open-source MSMBuilder software and for attaining quantitative agreement between theory and experiment."
- Please cite the assertion that MSMBuilder is open source with a link to the source code. Emw (talk) 22:47, 11 October 2012 (UTC)[reply]
- Biomedical research
"Protein folding is normally tightly regulated to ensure that it proceeds smoothly."
This statement strikes me as incorrect. To my understanding, "regulation" in the context of protein folding implies some involvement of chaperone proteins, but most proteins fold spontaneously and without the aid of chaperones. Changing "normally" to "often" would make this more correct to my knowledge, but I'm not sure if "sometimes" might be most correct. If you can't find a reliable source stating whether "normally", "often" or "sometimes" is most accurate to describe how frequently protein folding is regulated (I haven't been able to find one), then this sounds like a question for http://biology.stackexchange.com. Or maybe another lead sentence for this section would work. Emw (talk) 03:20, 28 September 2012 (UTC)[reply]
- Ah yes, that does seem to contradict the article's earlier statements. I've improved the line while staying within the provided sources. • Jesse V.(talk) 00:15, 29 September 2012 (UTC)[reply]
- The revised sentence ("Protein folding often spontaneously but can be tightly regulated to ensure that it proceeds smoothly.") seems clunky. I think the second sentence is a better introduction to the section, though listing 11 diseases there seems excessive. Here's my proposed revision; let me know what you think: "Protein misfolding can contribute to a variety of diseases including Alzheimer's disease, cancer, cystic fibrosis, Huntington's disease, sickle-cell anemia, and type II diabetes.[11][30][31] Once protein misfolding is better understood, therapies could be developed that augment cells' natural ability to regulate protein folding. Such therapies could use engineered molecules to alter the production of a certain protein, help destroy a misfolded protein, or assist in the folding process." Emw (talk) 16:20, 30 September 2012 (UTC)[reply]
- That looks a lot better. I replaced the current text with that, though I added Creutzfeldt–Jakob disease since it's a big player. I also changed "can contribute" to "can result" and "could be developed" to "can be developed". Thanks. • Jesse V.(talk) 17:59, 30 September 2012 (UTC)[reply]
"The Pande lab is a non-profit organization and does not sell the results generated by Folding@home."
The first part of this sentence is imprecise. Stanford University, not the Pande lab, is the 501(c)(3) non-profit entity (see http://folding.stanford.edu/English/Donate). I think wording like the following would be accurate: "The Pande lab is part of Stanford University, a non-profit entity, and does not sell the results generated by Folding@home."Emw (talk) 03:21, 28 September 2012 (UTC)[reply]
"In 2011 they released the open-source Copernicus software..."
Please cite this with a link to the source code. Emw (talk) 23:27, 28 September 2012 (UTC)[reply]
- There's a citation to the paper, which says in its abstract that it has a "publicly available implementation". • Jesse V.(talk) 20:43, 29 September 2012 (UTC)[reply]
- Yes, my suggestion is to add a citation to publicly available implementation, i.e. its source code. Emw (talk) 16:20, 30 September 2012 (UTC)[reply]
"The full publications are available online from an academic library."
The fact that relevant journal publications are available online from an academic library holds for virtually all subjects represented on Wikipedia, and I haven't seen it mentioned in any other featured articles. So I think this is unnecessary detail and should be removed. Emw (talk) 23:27, 28 September 2012 (UTC)[reply]
- Good point. Removed. • Jesse V.(talk) 20:43, 29 September 2012 (UTC)[reply]
"It accounts for more than half of all cases of dementia, and as of 2008 it affects more than 24 million people worldwide, with 4.6 million new cases reported each year."
The sentence previous to the one above seems like enough context. I really like that each section is put into context, but I think there's often a little too much background information. This amount of exposition ends up making 'Biomedical research' lean too much toward being a primer on the various diseases rather than a discussion about what Folding@home is doing in biomedical research on those diseases. Emw (talk) 23:27, 28 September 2012 (UTC)[reply]
- I've removed most of that statement. • Jesse V.(talk) 20:43, 29 September 2012 (UTC)[reply]
"Its exact cause remains unknown, but the disease is identified as a protein misfolding disease and is associated with toxic aggregations of the amyloid beta (Aβ) peptide, a fragment of the larger amyloid precursor protein. High concentrations of misfolded Aβ42 causes protein oligomer growth leading to aggregation that in turn contributes to Aβ misfolding. This cyclic process appears to be toxic and leads to neuronal cell death. The oligomer aggregates then collect into dense nontoxic formations known as senile plaques, a pathological marker of Alzheimer's.[45][46][47] Due to the heterogeneous nature of Aβ oligomer aggregates, experimental techniques such as X-ray crystallography and NMR have had difficulty characterizing their structures. Moreover, atomistic simulations are extremely computationally demanding due to their size and complexity."
- I think this level of jargon and detail on the etiology of Alzheimer's unnecessarily bogs down almost all readers, and distracts from the overall subject of the article. Some content seems rather extraneous, and removable. I suggest something simpler and more focused, along the lines of the following:
"Its exact cause remains unknown, but the disease is identified as a protein misfolding disease. Alzheimer's is associated with toxic aggregations of the amyloid beta (Aβ) peptide, which are caused by Aβ misfolding and clumping together with other Aβ peptides. These Aβ aggregates eventually grow big enough to form significantly larger senile plaques, a key marker of Alzheimer's disease." Emw (talk) 04:15, 5 October 2012 (UTC)[reply]
- I guess the details better belong in the AD article. It should be much simpler now like you suggested. • Jesse V.(talk) 19:04, 6 October 2012 (UTC)[reply]
I made some changes to the Alzheimer's section. Please review this diff against the references for accuracy. In particular, in the following sentence, does the cited literature indicate that it would be permissible to interchange "dynamics of Aβ" and "folding of Aβ"?: "Folding@home simulated the dynamics of Aβ in atomic detail over timescales of the order of tens of seconds. This was significant because previous studies were only able to simulate about 10 microseconds—in other words, Folding@home was able to simulate Aβ folding for six orders of magnitude longer than had previously been possible." If so, I think it would be much preferable to substitute "dynamics" for "folding", or maybe say "folding dynamics". (I ask to ensure they weren't looking at some other non-folding types of molecular dynamics.) Emw (talk) 04:15, 5 October 2012 (UTC)[reply]
- The sentence should have been more specific. (I have fixed this.) It was simulating the dynamics of Aβ aggregation, not folding. • Jesse V.(talk) 19:04, 6 October 2012 (UTC)[reply]
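The "six orders of magnitude" figure in the quoted sentence can be checked with a line of arithmetic. The sketch below assumes the lower end of "tens of seconds", i.e. 10 s, compared against the 10 microseconds quoted for earlier studies. The function name `orders_of_magnitude` is hypothetical and is introduced here only for illustration.

```python
import math

def orders_of_magnitude(longer_seconds, shorter_seconds):
    """Return log10 of the ratio between two timescales given in seconds."""
    return math.log10(longer_seconds / shorter_seconds)

# Assumed inputs: 10 s (lower end of "tens of seconds") versus 10 microseconds.
FOLDING_AT_HOME_TIMESCALE = 10.0
PREVIOUS_TIMESCALE = 10e-6

if __name__ == "__main__":
    gap = orders_of_magnitude(FOLDING_AT_HOME_TIMESCALE, PREVIOUS_TIMESCALE)
    print(f"{gap:.1f} orders of magnitude")
```

Under that assumption the ratio is 10^6, which matches the quoted sentence. A longer simulated timescale within "tens of seconds" would give a slightly larger figure.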
The caption of the image in the Alzheimer's disease section should mention how Folding@home relates to the image, and focus less on explaining the steps in the molecular development of the disease.Emw (talk) 00:37, 6 October 2012 (UTC)[reply]
- I added some information. • Jesse V.(talk) 19:04, 6 October 2012 (UTC)[reply]
"...and in 2010, Folding@home researcher Veena Thomas proposed a novel therapeutic strategy for Huntington's which may be funded by the National Institutes of Health."
Has there been an update on the status of this proposal? Emw (talk) 00:37, 6 October 2012 (UTC)[reply]
- No. There are no further updates in the citation and there have been no announcements in the blog about it. I suppose I could email Veena Thomas about it, but then I couldn't cite the reply. • Jesse V.(talk) 19:04, 6 October 2012 (UTC)[reply]
- Alright, thanks for checking. Although the potential research is likely interesting with regard to the subsection, I think the facts that the research proposal has presumably not been funded and hasn't been updated in two years means that this content doesn't meet the encyclopedia's notability standards. Emw (talk) 02:51, 10 October 2012 (UTC)[reply]
- Maybe it has been funded and there has been an update, but AFAIK there hasn't been anything published. What you said makes sense, so I removed the statement. • Jesse V.(talk) 05:38, 10 October 2012 (UTC)[reply]
- "In 2004, Folding@home was used to perform the first molecular dynamics study of the refolding of p53's protein dimer in explicit water which revealed insights that were previously unobtainable..."
- I think the background discussion that precedes the sentence should be trimmed to make room for some brief mention of what a 'protein dimer' is and how it relates to p53; same for 'explicit water'. Also, a brief mention of what those previously unobtainable insights were would be good. Emw (talk) 00:37, 6 October 2012 (UTC)[reply]
- Unlike you I don't have a background in biochemistry or computational biology, so I'm struggling to figure out key statements like "Dimerization of the p53 oligomerization domain involves coupled folding and binding of monomers". I completed most of what you said, but please feel free to help with the background when you have a moment. • Jesse V.(talk) 19:04, 6 October 2012 (UTC)[reply]
The first paragraph of the Virus section falls into the same issue as the Alzheimer's disease section: too much background discussion of the disease. The 9-sentence paragraph goes into 7 sentences of detailed discussion about the molecular biology and molecular dynamics of viruses and membrane fusion. I suggest trimming the background discussion in this section to about half of what it currently is. Emw (talk) 00:37, 6 October 2012 (UTC)[reply]
- I did some trimming and rewording. • Jesse V.(talk) 19:04, 6 October 2012 (UTC)[reply]
The second paragraph of the Viruses section has one sentence about HIV research in the middle of a paragraph about influenza research. I suggest improving the focus of the second paragraph by making it only about influenza, e.g. by somehow integrating the sentence about HIV into the first paragraph, or removing it altogether. If the HIV sentence needs to be kept and can't be incorporated nicely into the first paragraph, then I'd suggest expanding the article's coverage of HIV research into a new paragraph. Emw (talk) 02:51, 10 October 2012 (UTC)[reply]
- I incorporated the sentence into the first paragraph and did a bit of copyediting there too. • Jesse V.(talk) 18:33, 12 October 2012 (UTC)[reply]
- Participation
"In 2007 Guinness recognized Folding@home as the most powerful distributed computing network in the world."
Does Guinness still assign Folding@home that distinction? In any case, I think this statement should be qualified by the fact that Folding@home is no longer the most powerful distributed computing network. The figures are made available in subsequent sentences for readers to deduce that, but I think it should be made apparent up front. Any idea/sources on when this swap of most powerful distributed computing network occurred? Emw (talk) 02:51, 10 October 2012 (UTC)[reply]
- I don't know, I suppose they need to reevaluate, if they're even aware that they need to. Can you be more specific as to which distributed computing project you think is more powerful? The only one I can think of is Bitcoin, which does purely integer operations and so by definition its FLoating-point Operations Per Second measurement is precisely zero, not 274.72 petaFLOPS as bitcoinwatch.com claims. https://bitcointalk.org/index.php?topic=50720.0 was top on the list in Google for me. • Jesse V.(talk) 05:38, 10 October 2012 (UTC)[reply]
- My mistake. Reading again, I was reminded that BOINC is not itself a distributed computing project, but rather just a platform for such projects. The Bitcoin note is interesting. At http://bitcoincharts.com/bitcoin/ I see "Network Hashrate PetaFLOPS 279.23", but I see that https://bitcointalk.org/index.php?topic=38064.msg550161#msg550161 works through some apparent, and possibly questionable, assumptions made by bitcoincharts.com. So given the unclear/unestablished rank of the Bitcoin network's computational power, I think it makes sense for the claim in the article to remain. Emw (talk) 01:36, 11 October 2012 (UTC)[reply]
As can be seen in the recently-added graph and the FLOPS table in this section, Folding@home's participation and overall computing performance have both decreased significantly in the last two years or so. This seems notable, and at least mentioning it in the article would offer a nice point of balance to what is otherwise a very positive article on the subject. If there has been any decent coverage of its possible causes, that would be good to include too. Emw (talk) 02:51, 10 October 2012 (UTC)[reply]
- I see that too, but I haven't been able to find any reliable causes for it. The closest I've got is a foldingforum.org thread on the subject. Seasonal fluctuation was brought up a few times in that thread and in this thread, and I think that's a big part of it. If you look at the stats graphs, particularly the performance one, you'll see that an upward trend is starting. It's tied to the season and thus cyclic, and it may be that the project's participation and performance have stabilized overall, so that's why it looks like there's a big drop compared to the rest of the graph. • Jesse V.(talk) 20:37, 18 October 2012 (UTC)[reply]
- Alright, the granularity of that newly-added performance graph reveals that the drop in performance is not as drastic as I was inferring from the participation graph -- eyeballing things, the annual maximum and minimum for 2012 performance are both roughly 10% lower than 2011, while number of active processors (participation) has decreased about 25% since the beginning of 2011. The graphs tell the story well enough. Emw (talk) 17:47, 20 October 2012 (UTC)[reply]
"Active participation in Folding@home has grown steadily since its launch."
- The graph in the beginning of the same section contradicts this statement. Emw (talk) 02:51, 10 October 2012 (UTC)[reply]
- Fixed. • Jesse V.(talk) 05:38, 10 October 2012 (UTC)[reply]
- The new version of the sentence -- "Active participation in Folding@home increased steadily between its launch and 2010" -- seems slightly awkward by restricting itself to 2010 and not mentioning what's happened in the last two years or so. And the Folding@home participation over time graph shows a decrease between just before 2006 and just after 2007, and sputtering growth from around 4/2009 and the global maximum around 11/2010 -- not a steady increase. So the story is more complex than what's presented, and what's presented is still a little too rosy. Maybe this sentence can be fixed in the same swoop as the fix for the previous point. Emw (talk) 01:36, 11 October 2012 (UTC)[reply]
- I removed the sentence as 1) it didn't belong as the lead sentence of that paragraph and 2) it's an incorrect statement as you pointed out. The caption to the performance graph seems to do a better job anyway. • Jesse V.(talk) 20:37, 18 October 2012 (UTC)[reply]
Presenting the longitudinal performance data in tabular form as is done now is cumbersome and difficult to analyze. I think it would be immensely better to put into a graph. The graph here would probably work well. Emw (talk) 02:51, 10 October 2012 (UTC)[reply]
- Software:
"Specialized molecular dynamics programs, referred to as "FahCores" and often abbreviated "cores", perform the calculations on the work unit behind the scenes."
"Behind the scenes" sounds too colloquial to me, especially given the degree of jargon the rest of the article (fairly reasonably) uses. I'd stick with "as a background process", which will be just about as self-explanatory for the vast majority of users, but also more technically correct. Same with the phrase's usage in the 'Clients' subsection.Emw (talk) 02:51, 10 October 2012 (UTC)[reply]
"Folding@home software developers have put significant work into minimizing security issues." .... "Thus from a security standpoint it behaves in a similar fashion to a web browser, but is even more secure."
The sources cited for these statements are either from people directly associated with Folding@home or from a community-edited wiki. I understand that such entities are often the only sources to get some kinds of information, but I don't think security information should be one of those kinds. Please either find conventional, independent, reliable third-party sources for these sentences or substantially temper/remove them. Emw (talk) 02:51, 10 October 2012 (UTC)[reply]
- I see; those are opinions/claims that I agree should come from sources stronger than F@h staff. I rewrote the paragraph and presented facts instead. Until I can find a reliable source for statements like those, I'll let the readers draw their own conclusions however they will. • Jesse V.(talk) 20:37, 18 October 2012 (UTC)[reply]
"BOINC's fixed architecture limits the types of project it can accommodate and thus was not appropriate for Folding@home."
I think this would be worth expanding on a bit. For example, what is the meaning of "fixed architecture" here? Is it still in place at BOINC? If not, then I think "limits" should be put in the past tense. How is/was it limiting, inappropriate and unworkable? Emw (talk) 02:51, 10 October 2012 (UTC)[reply]
- I would expand if I could. The publication only says "The Berkeley Open Infrastructure for Network Computing (BOINC) [16] now used by SETI@home and many others, launched in 2002. BOINC provides a standard client, server, and statistics system, but with this fixed architecture comes limitations on the types of projects it can accommodate." I was unable to find any additional information, so since I can't elaborate on the claim I felt it was appropriate to remove the line, but I did add a detail that F@h utilizes Cosm. • Jesse V.(talk) 22:43, 12 October 2012 (UTC)[reply]
"Folding@home and distributed.net use the Cosm libraries"
This newly-added sentence about Cosm needs a bit of context. Briefly, what does Folding@home use Cosm for? I'd also suggest removing the mention of distributed.net; it's not mentioned anywhere else in the article, and seems superfluous. Emw (talk) 17:43, 14 October 2012 (UTC)[reply]
- I only found another reference, which had only a little additional information. Nevertheless, I've used it to improve the sentence as well as I could. The word "cosm" appears briefly in the F@h client logs, but I don't know why. • Jesse V.(talk) 20:37, 18 October 2012 (UTC)[reply]
"While this approach is not only scientifically valuable, the resulting publications would not have been possible without this computing power."
Please improve this sentence's syntax. Also, "resulting publications" is vague -- which ones are being talked about? Emw (talk) 03:25, 11 October 2012 (UTC)[reply]
- I copyedited the whole paragraph and I gave specifics about the paper that the blog post was talking about. • Jesse V.(talk) 18:33, 12 October 2012 (UTC)[reply]
- The 'scientific value' sentence looks good. However, the summary of the Journal of Molecular Biology paper was impenetrable for non-experts. Please review this diff and let me know if my attempt at simplification more or less preserves the original version's point. Emw (talk) 17:43, 14 October 2012 (UTC)[reply]
- Thanks for that. Your version looks good as far as I can tell. Technically they were simulating HP-35 NleNle, "a variant of the villin headpiece subdomain". • Jesse V.(talk) 20:37, 18 October 2012 (UTC)[reply]
"This was the first time a distributed computing project had utilized MPI, as it had previously been reserved only for supercomputers, and SMP1 represented a landmark in the simulation of protein folding."
This sentence could be improved in a few ways. The first problem is that it's a run-on. The SMP1 clause seems gratuitous. And I don't think MPI was "reserved" for supercomputers, since A) it has historically been commonly used in much smaller, non-supercomputer clusters, and B) no one was reserving MPI for supercomputers, it had simply not been implemented in a distributed computing project. Emw (talk) 03:25, 11 October 2012 (UTC)[reply]
- I removed the supercomputer clause because you're right. The SMP1 clause is only cited to Dr. Pande, and such a bold statement would need a third-party reference. • Jesse V.(talk) 18:33, 12 October 2012 (UTC)[reply]
"The user typically interacts with V7's open-source GUI, known as FAHControl."
Please cite the open-source assertion with a link to the source code for the FAHControl GUI. Emw (talk) 03:25, 11 October 2012 (UTC)[reply]
- There are several solid references supporting this: [3], [4], and [5]. I added in the last one, but I can swap it out for one of the others if there's a necessity to link to the source code. • Jesse V.(talk) 18:33, 12 October 2012 (UTC)[reply]
- I think a direct link to the source code is the most appropriate citation for assertions that software is open source, so I'd include a link to https://fah-web.stanford.edu/svn/pub/trunk/control/. Conveniently, that resource also shows the actual GPL license in LICENSE.txt. Emw (talk) 17:43, 14 October 2012 (UTC)[reply]
- I've incorporated the link to the source code. Emw (talk) 01:02, 16 October 2012 (UTC)[reply]
- I noticed. Thanks! • Jesse V.(talk) 20:37, 18 October 2012 (UTC)[reply]
Support from Emw: This is a well-written and impressively comprehensive article. It does a good job of combining coverage of different domains the subject encompasses -- from the molecular biology of diseases and protein folding to the software and performance aspects of distributed computing -- into a coherent narrative. Jesse has done a commendable job of building the article from a basic state to its current developed form over more than a year of work. It looks like some comments from other reviewers are still being worked through, but for me this article has clearly passed the threshold of quality needed to be considered one of Wikipedia's best articles. I support promoting Folding@home to featured article status. Emw (talk) 17:57, 20 October 2012 (UTC)[reply]
Comments from EdJohnston: I have some relevant background, having studied biophysics years ago and written programs to simulate polypeptide conformations.
- General comments
- This is an important topic and deserves an article of FA quality. At present the prose gets boosterish in places, and it may repeat some of its favorite themes too often: (a) how hard it is to simulate large proteins in realistic detail, (b) how fast this particular simulator is. In my opinion the importance of the work will shine through better if we use more neutral prose. Also, it will save the reader from getting fatigued by repetition if we can find a way to make each important point just once. I would like to find reliable sources for all the faster-than or invented-first claims or drop them from the article. Did they pioneer the use of the PlayStation 3? In a quick look, I couldn't find an external source for that. I know that Sony worked with the project, and that ought to be mentioned.
- The article could stand to be 20% shorter. I agree with Emw's point about too much background information. I would prune the disease sections to focus in on one or two areas where the project has generated peer-reviewed results that indicate practical importance. One of these is the 'superkine,' the variant of Interleukin-2 that has potential in cancer treatment and has been licensed for investigation by a drug company. It will take a while to collect all my notes, but I'd like to begin with a suggested revision of the lead. More explanation of these proposed changes will follow. EdJohnston (talk) 06:30, 29 September 2012 (UTC)[reply]
- Thank you for the help with the prose. I would have liked to have other editors help me with this article, but that's how it is sometimes. If you check the PlayStation 3 section, it mentions the collaboration between Sony and the Pande lab, and citation 176 (the Post-Gazette.com one) confirms that F@h pioneered the use of PS3s. Numerous scientists are using Folding@home to do disease research, and some of them work on one particular disease, so overall F@h has helped generate results in a number of areas. You raise a good point about focusing on practical importance; I haven't included sections on everything in the Diseases FAQ but I tried to focus on some of the bigger ones per WP:DUE. I'd like to be concise, but I want to make sure that there's enough explanatory information, though I realize that more can be found in the articles on each disease. • Jesse V.(talk) 20:43, 29 September 2012 (UTC)[reply]
- Proposed new draft of the lead
Folding@home (FAH or F@h) is a distributed computing project for simulation of protein folding, computational drug design, and other molecular dynamics for disease research. Folding@home is powered by the idle processing resources of thousands of personal computers and PlayStation 3s from volunteers who have installed the software on these systems. The project primarily attempts to determine the mechanisms of protein folding (the process by which proteins reach their final three-dimensional structure) and the causes of protein misfolding. This is of significant academic interest and has major has implications for medical research into Alzheimer's disease, Huntington's disease and cancer., and many forms of cancer, among other diseases. To a lesser extent, Folding@home also tries to predict a protein's final structure and determine how other molecules may interact with it, which has applications in drug design The program may be useful in drug design, since it can simulate the steps of protein-ligand docking. Folding@home is developed and operated by the Pande laboratory at Stanford University, under the leadership of Vijay Pande. [1]
The project uses statistical simulation methodology that represents a paradigm shift from traditional computational approaches. ref name="10.1016/j.ymeth.2010.06.002"/ distributed parallel calculations which are combined statistically to determine the folding behavior of the protein. As part of the project's client-server distributed computing architecture, the volunteered machines receive simulation work units, complete them, and return them to database servers where they are compiled into an overall simulation. Volunteers can track their contributions on the Folding@home website, which can make participation competitive and encourages long-term involvement. The project has pioneered the uses of GPUs, PlayStation 3s, and Message Passing Interface (used for computing on multi-core processors) for distributed computing and scientific research. The project has employed PlayStation 3s and GPUs and pressed into service the multi-core processors that are now found on many client machines for the benefit of scientific research.
Folding@home remains one of the world's fastest computing systems, The combined power of the machines in the Folding@home network compares favorably with a supercomputer and currently operates at a computational performance nearly equal to almost equals the total power of all distributed computing projects under BOINC combined. The project is also the world's most powerful molecular dynamics simulator. This performance from its large-scale computing network has allowed researchers to run computationally expensive atomic-level simulations thousands of times longer than previously achieved. Since its launch on October 1, 2000, the Pande lab has produced 100 scientific research papers as a direct result of the project.[2] These simulations have demonstrated accuracy compared to experimental observations.[7][8] The Folding@home simulator has produced folding times and equilibrium constants that can be compared with experiment [3]
- It's suggested to not use templates on FAC pages, so I'd suggest putting the suggestion either in hidden text or fully visible. — Crisco 1492 (talk) 08:22, 29 September 2012 (UTC)[reply]
- I removed the collapse box. When you say 'no templates' do you also mean no citation templates? And what about the green highlighting? If you think more reformatting of my post is needed, please go ahead. Thx, EdJohnston (talk) 13:35, 29 September 2012 (UTC)[reply]
- Above there is "Use of graphics or templates including graphics (such as {{done}} and {{not done}}) is discouraged, as they slow down the page load time.". I don't think the green text is a problem as it's not achieved graphically. The refs probably aren't needed. — Crisco 1492 (talk) 13:39, 29 September 2012 (UTC)[reply]
- Your proposed lead is interesting, but I see several disadvantages. The current lead more accurately summarizes the body, and I think has a better tone. "The program may be useful in drug design, since it can simulate the steps of protein-ligand docking" sounds a bit like WP:OR, whereas the current lead uses more factual phrasing to describe F@h's work in drug design. The paradigm shift statement is confirmed by the contents of the in-line citation (since distributed computing is a unique/unusual approach to the protein folding problem) and the award described in the article. It does use a client-server architecture, that should be clear from the body, but if it's necessary to add a citation then it is confirmed in "Lessons From Eight Years of Volunteer Distributed Computing". I think the "Pressing into service" phrase could be changed for something else, because the phrase reminds me of slavery. However, the last line of your proposed lead is interesting, though the conformational states from the MSM can also favorably compare to experiment. I will have to check some publications to be sure of the details. • Jesse V.(talk) 03:25, 30 September 2012 (UTC)[reply]
- As I replied to Emw below, I have added details to that last statement in the lead. • Jesse V.(talk) 04:51, 3 October 2012 (UTC)[reply]
- Also, the lead was reworked a bit by Montanabw, who knows very little about this subject. This discussion is related. • Jesse V.(talk) 18:47, 20 October 2012 (UTC)[reply]
- Suggest omitting the 'paradigm shift' statement from the lead
- "The project uses statistical simulation methodology that represents a paradigm shift from traditional computational approaches.[4]"
- This sentence is given in the lead as a summary of the fuller version in the body of the article:
- "Folding@home researcher Greg Bowman was awarded the 2010 Kuhn Paradigm Shift Award from the American Chemical Society (ACS) for his talk on two paradigm shifts resulting from Folding@home: 1) the new methods that Folding@home uses to simulate protein folding, misfolding, etc, and 2) the results themselves, which suggest a significant change in protein folding theory."
- This award carries a $1,000 award from the ACS, and there were five applicants for the award. WP does not have a free-standing article on this award, and does not have a list of winners of this award anywhere. There is no list kept on ACS's web site. There does not seem to be any award citation published by ACS. Every year somebody is going to win the 'paradigm shift' award so I'm not sure how much stock we should put in this. (It's like the 'best revolution of 2010'). The fullest statement of this seems to be a reprinted press release from ACS. The wording included in our article (as quoted above) says 'a significant change in protein folding theory' but I can't even find those words in the probable ACS press release, and not in any reliable source. Material like this, and the phrase "paradigm shift", may strike the reader as promotional and buzzwordy. Bowman's work is clearly important but maybe we can find other ways of showing that. EdJohnston (talk) 16:05, 29 September 2012 (UTC)[reply]
- You've brought up some interesting points, I hadn't really thought about it that way before. The exact quote from the body of the article is "In 2010, Folding@home researcher Greg Bowman was awarded the Thomas Kuhn Paradigm Shift Award from the American Chemical Society for the instrumental development of the open-source MSMBuilder software and for attaining quantitative agreement between theory and experiment.[25]" which summarizes the source's quote you provided above. A number of months ago I emailed Bowman and the ACS asking for the original source of the phrasing, but IIRC I was told that it doesn't exist on the ACS website anymore and this might be the best source. SimTK is a completely different organization than the Pande lab or Stanford University. One might even consider it third-party. I'll look more into his work. • Jesse V.(talk) 03:25, 30 September 2012 (UTC)[reply]
- Suggest omitting 'demonstrated accuracy compared to experimental observations' from the lead:
- The sentence is a bit vague, and the extent of 'demonstrated accuracy' remains to be determined. I suggested in my above draft that this be replaced by "The Folding@home simulator has produced folding times and equilibrium constants that can be compared to experiment". As a reference for this I propose the Nature paper, Snow, Nguyen, Pande and Gruebele (2002), "Absolute comparison of simulated and experimental protein-folding dynamics". This is the most heavily cited paper from the Pande group, is well argued, and tries to link theory and experiment in close detail. Careful reading might show that the 2002 paper can support a stronger statement than just 'can be compared to experiment', but I haven't done that yet. EdJohnston (talk) 16:28, 29 September 2012 (UTC)[reply]
- Good point. I too will have to review the literature and make that statement more specific as you suggest. • Jesse V.(talk) 03:25, 30 September 2012 (UTC)[reply]
- I have added some details to that statement. • Jesse V.(talk) 04:51, 3 October 2012 (UTC)[reply]
- Suggest reducing or combining some of the repetitious speed-power claims: (including here the claims that simulating protein folding is difficult)
- Here are some of the repeated claims (maybe not all of these are redundant).
- "Folding@home remains one of the world's fastest computing systems"
- "currently operates at a computational performance nearly equal to all distributed computing projects under BOINC combined"
- "The project is also the world's most powerful molecular dynamics simulator."
- "..allowed researchers to run computationally expensive atomic-level simulations thousands of times longer than previously achieved."
- "all-atom molecular dynamics simulations have been severely limited in the timescales which they can study."
- "Between 2000 and 2010, the timescales over which Folding@home simulates protein folding have increased by six orders of magnitude" -- actually this claim is very interesting and may deserve to be expanded and fully sourced. I wouldn't propose dropping this claim. At present it is sourced only to the project's blog.
- "Using the Markov state model approach, Folding@home achieves strong scaling across its user base and gains a near-linear speedup for every additional processor"
- "This large and powerful network allows Folding@home to do work not possible any other way."
- "These sites are attractive drug targets, but locating them is very computationally expensive."
- The first two are combined in the lead, but I remember trying to append the third statement but it sounded like a run-on sentence that way. I'll look around for a better source for the claim about a sixfold increase in simulation timescales, but if you look at the blog post itself, there is supporting numerical data from which the claim is fairly easy to see. I have added some more information to that statement. These are all related statements, but they all say different things. They describe F@h's prominence and why it is important in what it is doing. I'm not sure how I could combine them without damaging the prose, and they seem like important information to me. • Jesse V.(talk) 03:25, 30 September 2012 (UTC)[reply]
- Repeated claims that the distributed approach is better than standalone supercomputers:
- (referring to supercomputers:) "..strong scaling of traditional molecular simulations to these architectures is exceptionally difficult"
- "a limited number of long simulations are not sufficient for comprehensive views of protein folding" (a claim cited only to Pande-group papers)
- "This complexity and timescale makes standard computer simulations exceptionally computationally demanding,.."
- In the Notes section there are some qualifiers about LINPACK. They admit there that LINPACK 'more efficiently maps to supercomputer hardware.' It would be good to see some of these comparisons worked through in a balanced way, since F@H does have many advantages even after you allow for the LINPACK problem.
- I'll look around for better sources. Note that the "Why China's New Supercomputer Is Only Technically the World's Fastest" citation used in the LINPACK section is entirely third-party. I'll work on improving these statements. • Jesse V.(talk) 03:25, 30 September 2012 (UTC)[reply]
- Comparison to other molecular systems:
- This section can hopefully be expanded to include third-party assessments of F@H versus Anton and other systems. The majority of the references provided are to the papers of the Pande group or to the Folding@home project's own blog. Including outside references would be good. Anton is a very impressive project and it would be fair to mention some of Anton's best results in this paragraph.
- There are several references to papers from Anton researchers. I agree Anton is very impressive, and the paragraph mentions some of Anton's simulation accomplishments and how long trajectories can be useful. • Jesse V.(talk) 20:43, 29 September 2012 (UTC)[reply]
- Drop the statements about internal project events and predictions of possible future activities:
- "The goal of the first five years of the project was to make significant advances in understanding folding, while the current goal is to understand misfolding and related disease, especially Alzheimer's disease." Regardless of goals, it would be better to report actual peer-reviewed contributions. If you check the citation counts, the citing authors greatly appreciate the 'nuts-and-bolts' work of F@H in better simulation methods but F@H's disease-related work is not heavily cited as yet.
- "Following these studies, the Pande lab expanded their efforts to other p53-related diseases.."
- "This strategy could be used to bring the results from Folding@home directly to a therapeutic drug."
- "From simulations of this protein, they hope to accelerate research efforts to modify it to identify other diseases or to bind to drugs"
- "The Pande lab is focusing their research on Alzheimer's with the goal of predicting the aggregate structure and how it develops for drug design approaches as well as developing methods to stop the aggregation process"
- "Later that year, Folding@home began simulations of various Aβ fragments in order to determine how various natural enzymes affect the structure and folding of Aβ". This statement is cited only to internal project documents.
- "Although researchers have used Folding@home to study collagen folding and misfolding, the interest stands as a pilot project compared to Alzheimer's and Huntington's research." This statement is cited only to the project's own web site.
- "As of 2012, Folding@home continues to simulate the folding and interactions of hemagglutinin, complementing experimental studies at the University of Virginia." This is cited only to the project's own web site.
- "They hope to be better able to design drugs to deactivate them." Cited only to the project's own web site.
- "In 2007 the Pande lab received a grant to study and design new antibiotics"
- "Ribosomal research has helped the Pande lab prepare for larger and more complex biomedical problems."
- "In June 2011 Folding@home began additional sampling of an Anton simulation in an effort to better determine how its techniques compare to Anton's methods". Cited only to the project's own web site.
- – EdJohnston (talk) 18:00, 29 September 2012 (UTC)[reply]
- Good points. I removed many of them per WP:CRYSTAL. However, some of the statements you listed there are not opinions but rather non-controversial statements of fact, so I'm not convinced they should be removed. • Jesse V.(talk) 20:43, 29 September 2012 (UTC)[reply]
Comments from Hekerui
- Hi and thanks for working on this interesting article. Some comments/suggestions:
- Project significance
- instead of using "stochastic" and linking to its meaning, why not run with the explanation itself - perhaps that would make the sentence clearer (it's not obvious to me how a stochastic folding makes using long simulations a challenge)
- as it is now, understanding "discrete-time master equation" requires the user to leave the article twice, why not explain shortly - it would make the article more self-contained (I printed this out to read it and was left hanging)
- it's not clear how the sentence about "near-linear parallelization" explains the previous sentence in other words, because it requires an explanation on its own (why near-linear?)
- "pathways from the protein's phase space" remains puzzling because it is not explained, starting with what a phase space is
- "... can represent these states at an arbitrary resolution." - arbitrary in what way? determined by the user's wish or determined by chance? the use of "can" implies the first choice, but I can't tell
- "Between 2000 and 2010, the length ..." - this sentence belongs into the next paragraph contentwise, no?
- "... for the instrumental development ..." - what "instrumental" is meant? as in "regarding the instruments" or as in "important" (then it can be removed)
- Fixed.
- I puzzled over that sentence for a while and tweaked it a little bit, but I'm not sure how to properly adjust it. "Discrete-time master equation" is the precise definition and it seems like it is, at least partially, explained in the lines preceding and following that line.
- I double-checked the journal references and they use the term "linear", so I changed it to that instead. The phrase "near-linear" came from folding.stanford.edu, and I'm not sure why they say that. Fixed.
- Reworded.
- Reworded.
- I moved the sentence.
- I removed the term. • Jesse V.(talk) 05:48, 20 October 2012 (UTC)[reply]
- Biomedical research
- sentences two and three don't work together, two says therapies "can be" developed while three says they "could use" certain results of this project - one is left wondering whether therapies are really being developed or not
- the sentences about "Computer-assisted drug design ..." and "The combination of computational molecular modeling ...." have a very similar content and sound repetitive, one could easily merge them
- "Folding@home is dedicated to producing many results ..." - this sentence repeats information already given in more specifics before and sounds like promotional material
- "... relationships to disease that are exceptionally difficult to observe ..." - "exceptionally" is POV unless attributed/explained
- "For example, in 2011 Folding@home continued simulations of folding ..." - continued from what/when?
- the text does not show why Protein L, which is not explained, was used or why it was chosen - if the sentence is about how good the predictions match experiment then being this specific only confuses, but if Protein L was an interesting case for some reason, this remains unclear
- There is no guarantee that therapies will be developed. However, I did improve the wording a bit here, hopefully it is better.
- So merged.
- Yeah, I thought about it more and you're right that the statement isn't really all that necessary, so I removed it.
- Term removed.
- Fixed.
- I tried a few rephrases of this sentence, but none seemed to work right. I'm not sure what to do here. • Jesse V.(talk) 05:48, 20 October 2012 (UTC)[reply]
- Alzheimer's disease
- "Moreover, atomistic simulations of Aβ aggregation ..." - "atomistic" suggests a connection to atomism, perhaps "atomic" would be better as this is used later on anyway (and is the name of the field of atomic physics)? -> on the other hand, would simulations like this not necessarily be molecular?
- "Preventing Aβ aggregation using small molecules is regarded as ..." - by whom? this is a weasel word, we must be more specific
- "Soon after that study into Aβ's folding, ..." - why not give the specific date/year?
- "... from the test tube ..." - that part is gratuitous, the sentence works just the same without it
- "oligomers" need to be explained, it's an uncommon term
- Best regards Hekerui (talk) 23:54, 14 October 2012 (UTC)[reply]
- I used "atomic" instead. Simulations are on the molecular level here.
- Oh, I didn't realize it was a weasel word! I know that it needs to be fixed, but how do I do that? I could name the authors or the publication, and that would be specific, but it would add an unusual amount of highlighting, and the article doesn't do that anywhere else. Any ideas?
- Good idea. I replaced that with a more specific date.
- Removed.
- Until recently, that paragraph contained details that helped explain oligomers, but the information was highly technical and was removed/summarized. I made some minor changes to the statement and wikilinked the term. This should make it a bit clearer. • Jesse V.(talk) 05:48, 20 October 2012 (UTC)[reply]
- Regarding the weasel word, I came up with two alternatives and I'm debating between them:
- In a literature review article, Drs. Naeem and Fazili regard the use of small molecules to prevent Aβ aggregation as a promising approach to the development of therapeutic drugs for treating Alzheimer's patients.
- Preventing Aβ aggregation using small molecules may be a promising approach to the development of therapeutic drugs for treating Alzheimer's patients.
- I used the first one; it is lengthier but seemed the better choice. • Jesse V.(talk) 22:33, 21 October 2012 (UTC)[reply]
- After a discussion with my English professor, I changed it to "Preventing Aβ aggregation is a promising approach to the development of therapeutic drugs for Alzheimer's disease, according to Drs. Naeem and Fazili in a literature review article." • Jesse V.(talk) 01:38, 24 October 2012 (UTC)[reply]
- Cancer
- "Inhibiting these specific chaperones are seen as potential modes ..." - weird prose, I think it means "Inhibitions to these specific chaperones are seen as potential modes ...", otherwise the sentence doesn't make sense
- the link to "antineoplastic" does not explain the word properly, so I suggest explaining or paraphrasing it right in this article
- Engrailed homeodomain is not explained at all and one would have to read up on it outside of the article, which is a hassle - I wonder whether one can add a short explanation that does not disrupt the text flow? also, Engrailed should not be capitalized, should it?
- Yeah that prose is weird. I replaced it as you suggested.
- Chemotherapy is a more common term so I used that instead.
- I clarified it a bit. The main source capitalizes it, but since the engrailed (gene) article doesn't, I made it lowercase. • Jesse V.(talk) 16:39, 20 October 2012 (UTC)[reply]
- I'd propose more extensive revision of the Cancer section. It has three paragraphs. In my opinion the third paragraph, the one which mentions Interleukin 2, offers the major contribution by Folding@home. The p53 work published by Chong, Swope, Pitera and Pande only gets 14 citations in Google Scholar. The last paragraph about Interleukin 2 describes work that led to licensing by a drug company and has received much more recognition by other scientists. EdJohnston (talk) 16:54, 20 October 2012 (UTC)[reply]
- The cancer section contains a lot of noteworthy information, and IMO the article should describe the research areas that F@h has helped out in. When I do a Google Scholar search for the p53 paper, I see "cited by 31" numerous times. The first paragraph has the line "this was the first peer reviewed publication on cancer from a distributed computing project." and for that to make sense, the paper and its background are described. Then the second paragraph is mainly about protein chaperones, also noteworthy when talking about cancer research. • Jesse V.(talk) 17:23, 20 October 2012 (UTC)[reply]
- Viruses
- "... makes standard computer simulations exceptionally computationally demanding, so they are typically limited to ..." - what about this project? this sentence is too general, we ought to be specific
- "Using Folding@home for detailed simulations of vesicle fusion ..." vesicle is an uncommon word, I suggest a short explanation
- "... for measuring fusion intermediate topology." - does this mean the structure of intermediate stages? even knowing what topology is does not make this clear and I suggest a rewrite, otherwise readers must guess
- I trimmed this sentence a bit, it should be more specific.
- Rewrote.
- Rewrote. • Jesse V.(talk) 22:33, 21 October 2012 (UTC)[reply]
- Drug design
- "... and causing a certain desired change." - vague, what change is it? a change to their function?
- "... within 1.8 Å RMSD ..." - even if readers know what Ångström is, RMSD is an unexplained abbreviation - at the very least it needs to be introduced ("... within X root-mean-square deviation (RMSD) ..." or something similar)
- "This may be important to ..." - "this" is not a good way to start a sentence following a sentence like the one that precedes it because it's not clear what is meant - if the closeness between prediction and experiment is meant, that should be plainly stated - also, who determined that this "may be important"? it's POV unless attributed
- I added more specifics.
- I expanded the abbreviation similar to your suggestion.
- Fixed. • Jesse V.(talk) 22:33, 21 October 2012 (UTC)[reply]
- Software
- SSE and API are uncommon abbreviations that should be introduced imo
- the explanation of work units starts with a reference to the client - maybe the structure would profit from putting the explanation for the client first (I don't think the flow would be inhibited by this, maybe others disagree?)
- "Although limited in generality, this makes GPUs one ..." - what is meant by generality?
- I added their names before their abbreviation. Now they read as "... with Streaming SIMD Extensions (SSE)." and "... two Application programming interface (API) levels ..."
- The three components are related to each other. The references are needed to show the relationships. The "work unit" section refers briefly to the client, the "cores" refers to the work units and the client, and the "client" section refers to work units and the cores. Of the three, the client is the most intuitive component, since it's just a regular program and all other distributed computing projects have clients. Thus a small reference to a client isn't a big deal. The current ordering of the section does seem like the preferable option. When I wrote the section I really thought about the ordering.
- I meant that GPUs only accelerate certain types of calculations and are different than general-purpose CPUs. This clause is actually redundant due to the preceding sentence, so I have removed it. • Jesse V.(talk) 01:38, 24 October 2012 (UTC)[reply]
- Comparison to other molecular systems
- a program calculating molecular systems is meant, not the molecular system itself. wouldn't something like "computing system" or "molecular simulators" be better?
- Retitled to "molecular simulators". • Jesse V.(talk) 02:10, 23 October 2012 (UTC)[reply]
Comments from Tonystewart14
- Broken links
- I noticed the following four links are broken:
- "Nanomedicine Center for Protein Folding" in the third paragraph under "Alzheimer's disease".
- "Protein Folding Center" in the second paragraph of "Cancer".
- "5292-5309" (referring to PMID) in reference 25.
- "the original" in reference 101.
- If these could be fixed, either by changing the link destination or unlinking the text, it would be appreciated.
- Emw put those redlinks in, and I'm not sure on the policy on redlinks in an FA-class article, which this article may soon be. Reference 101 should be kept because the archive link works while the original site is now gone. This is useful because the information contained in the reference is still preserved, so the citation still holds. Brilliant catch on Reference 25 though, I fixed that one. • Jesse V.(talk) 20:37, 18 October 2012 (UTC)[reply]
- Redlinks are peachy in an FA. This is not the Indonesian Wikipedia (for example) where FA writers are expected to fill redlinks before nominating an FA. — Crisco 1492 (talk) 22:21, 18 October 2012 (UTC)[reply]
- A red link is not a broken link, it just means the page has not been created yet. I think the occasional well-placed and relevant red link is a great invitation to readers to contribute to Wikipedia by creating a new article.
- On a more concrete note, the "Nanomedicine Center for Protein Folding" and the "Protein Folding Center" seem very likely to refer to the same thing -- the Center for Protein Folding Machinery. I've corrected the name and removed the second (almost certainly redundant) red link. Emw (talk) 01:18, 19 October 2012 (UTC)[reply]
Delegate notes
- Just from a quick scan for the moment, citation #30 doesn't seem to point anywhere. Cheers, Ian Rose (talk) 00:22, 21 October 2012 (UTC)[reply]
- Well hello! That template should have been removed at the conclusion of the discussions in the Talk page. Fixed. • Jesse V.(talk) 00:52, 21 October 2012 (UTC)[reply]
- Don't see the necessity for including items under See also that have been linked in the body of the article, i.e. Computational biology, Molecular dynamics, Rosetta@home. Cheers, Ian Rose (talk) 04:25, 21 October 2012 (UTC)[reply]
- Fixed. • Jesse V.(talk) 05:10, 21 October 2012 (UTC)[reply]
- Couple of housekeeping issues before we wrap this up. Firstly, you seem to be employing at least three styles of dash for the same purpose:
- Emdash with spaces: However, due to a protein's chemical properties or other factors, proteins may misfold — that is, fold down the wrong pathway and end up misshapen.
- Emdash without spaces: Due to the complexity of proteins' conformation space—the set of possible shapes a protein can take—and limitations in computational power, all-atom molecular dynamics simulations have been severely limited in the timescales which they can study.
- Endash with spaces: Instead, proteins spend the majority of their folding time – nearly 96% in some cases[20] – "waiting" in various intermediate conformational states...
- All these dashes should be the same format, generally the second example is preferred (emdash without surrounding spaces). Pls check throughout the article.
- The article contains quite a few duplicate links that should be dealt with. If you don't have Ucucha's dup link tool installed to highlight them all, you can find it here. Cheers, Ian Rose (talk) 04:41, 25 October 2012 (UTC)[reply]
- Fixed. Emdashes without spaces are now used consistently throughout the body of the article. Regarding the wikilinks, I have that script installed and have used it and AWB to reduce the number of duplicate wikilinks. However, please see this comment from Czarkoff in the GA review. I've thought about this too, and I agree that multiple wikilinks actually assist the reader. Nevertheless, I've tried to keep the duplicate links to a minimum. • Jesse V.(talk) 05:04, 25 October 2012 (UTC)[reply]
- I agree that duplicate links can occasionally be justified, particularly in longer articles, but suggest you walk through the article again. Just two examples: PlayStation 3s is linked twice in the lead, and Alzheimer's disease is linked twice in successive sections. This suggests that there'd be others that could be eliminated without unduly affecting the readers' comprehension... Cheers, Ian Rose (talk) 05:36, 25 October 2012 (UTC)[reply]
- All right. I ran the script and double-checked all the duplications. I removed the ones that I felt were redundant or unnecessary. • Jesse V.(talk) 06:08, 25 October 2012 (UTC)[reply]
- The above discussion is preserved as an archive. Please do not modify it. No further edits should be made to this page.