Wikipedia:Bots/Requests for approval/WildBot 3
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.
Operator: Josh Parris
Automatic or Manually assisted: Automatic
Programming language(s): Python, pywikipedia
Source code available: https://svn.toolserver.org/svnroot/josh/ (revision 6)
Function overview: Add checking of #section anchors for existence to existing bot
Links to relevant discussions (where appropriate): Guideline: Wikipedia:Linking#Checking links as they are created
Edit period(s): Continuous
Estimated number of pages affected: I'd guess less than 5% of new pages have #section links, and perhaps 20% of those would be wrong. At 1000 new pages a day, this would be about 10 edits/day. Hard figures show that 4% of new pages have #section links and 32.5% of these are wrong; at 1000 new pages a day, this would be about 13 edits/day.
Exclusion compliant (Y/N): Y, standard in pywikipedia
Already has a bot flag (Y/N): Y
Function details: At the same time as checking a new page's wiki markup for links to dab pages, the bot will also check links containing a #section anchor to ensure the anchor appears on the target page. Normally the anchor is a section heading, but some techniques (templates like {{Anchor}} and raw HTML tags) create an anchor without a ==section==; to detect these cases, the bot will download the target page's HTML and search it for the anchor.
WildBot found one or more links in this article with broken #section; for more information on #section links see Wikipedia:Linking#Piped links to sections of articles. The broken #section links found were: Broadway#Golden years, New York#Histery
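The link-checking step in the function details (find every link carrying a #section anchor, then report those whose anchor is missing from the target page) could be sketched in Python roughly as follows. This is a minimal illustration, not WildBot's actual code (that lives in the SVN repository linked above): the regular expression and the helpers `find_section_links` and `broken_section_links` are hypothetical, the sketch ignores title normalization, templates and <nowiki> blocks, and collecting each target page's anchors is assumed to happen elsewhere.

```python
import re

# Hypothetical pattern for [[Target#Section]] and [[Target#Section|label]].
# An empty target ([[#Section]]) points at a section of the linking page itself.
SECTION_LINK = re.compile(r"\[\[([^\[\]|#]*)#([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def find_section_links(wikitext):
    """Return (target, section) pairs for every wikilink that carries a #section."""
    return [
        (m.group(1).strip(), m.group(2).strip())
        for m in SECTION_LINK.finditer(wikitext)
    ]


def broken_section_links(links, anchors_by_target):
    """Return "Target#Section" strings whose section is not a known anchor of the target.

    anchors_by_target maps each target title to the set of anchors found on it
    (headings plus {{Anchor}}/HTML ids); self-links would be keyed under "".
    """
    return [
        f"{target}#{section}"
        for target, section in links
        if section not in anchors_by_target.get(target, set())
    ]


text = "See [[Broadway#Golden years|the golden age]] and [[New York#Histery]]."
links = find_section_links(text)
anchors = {"Broadway": {"History"}, "New York": {"History"}}
print(broken_section_links(links, anchors))
# prints ['Broadway#Golden years', 'New York#Histery']
```

The list of broken links produced this way is what a notice like the sample above would enumerate.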
Discussion
This is certainly a good idea. @harej 16:22, 17 January 2010 (UTC)
- Isn't there an inline template for this, similar to {{deadlink}} or something? If there is, it would certainly be more helpful. Would you mind telling me why you think this should be limited to new pages? You could do a dump scan for the whole project. Tim1357 (talk) 16:57, 17 January 2010 (UTC)
- Bandwidth; I just don't have it. Running the bot as is consumes a solid 20% of the bandwidth I have available. Unless I get a Toolserver account, recent changes or a database scan is off the cards. Additionally, I've got plans to make the bot smarter and more helpful, so I don't want to bomb every broken page link in the 'pedia with a mere advisory. Josh Parris 22:20, 17 January 2010 (UTC)
- You could get a Toolserver account if you'd like; it would probably help with the running of your bot and it's not very difficult to get one if you can demonstrate need. @harej 00:34, 18 January 2010 (UTC)
- DaB said he'll look at my application from 29th Dec on Sunday. Today's Sunday in Germany, I believe. Or has it just finished? Anyway, WildBot's approval may help things along there. Josh Parris 00:53, 18 January 2010 (UTC)
- I've had a look; there's nothing for inline work. Inline tagging may be inappropriate anyway, because the link still kind of works: it just goes to the target page rather than to a part of it. Josh Parris 00:58, 18 January 2010 (UTC)
- On another note, I would appreciate it if the bot worked only in the main namespace. Tim1357 (talk) 16:57, 17 January 2010 (UTC)
- I was thinking along these lines but couldn't think of a reason not to check the other namespaces the bot currently patrols. What difficulties do you foresee outside of mainspace? Josh Parris 22:20, 17 January 2010 (UTC)
- Damn, one other thing. It is generally frowned upon for bots to download the HTML markup. If I may suggest a more server-friendly version: use http://en.wikipedia.org/w/index.php?title=<title>&action=raw&templates=expand. That solves the problem of the {{Anchor}} template. Tim1357 (talk) 17:06, 17 January 2010 (UTC)
- That's pretty much what I've done; I called the API version (which, having seen your suggestion, I'm not sure is the best idea). Josh Parris 22:20, 17 January 2010 (UTC)
- So in order of questions:
- That's OK; if you only want to do new pages, that's fine. You could look into the Toolserver idea if you want.
- My reasoning is that there really is no need for notifications outside of the mainspace. Plus there are no "talk" pages for talk pages, if you know what I mean.
- OK, if you really need to download the HTML, that's fine. I just thought the templates=expand bit would be helpful; I only just found out about it myself. Tim1357 (talk) 00:55, 18 January 2010 (UTC)
- Yes, I prefer your method over my API call. WildBot task 1 doesn't do talk pages, so no probs there. Toolserver account is in process. Josh Parris 01:07, 18 January 2010 (UTC)
- Nice, this gets the thumbs up from me as long as this acts in the manner that the disambiguation WildBot does. Tim1357 (talk) 01:10, 18 January 2010 (UTC)
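Tim's suggestion of reading a target page through `action=raw&templates=expand` instead of downloading rendered HTML might look something like the sketch below. It assumes that templates such as {{Anchor}} expand to HTML elements carrying an `id` attribute, and that the index.php parameters behave as described in the thread; neither assumption is verified here, and current MediaWiki versions may not honor `templates=expand`. The helper names, regular expressions and User-Agent string are hypothetical, headings are taken verbatim (markup inside them is not stripped), and `fetch_expanded_wikitext` is defined but not called in the example.

```python
import re
import urllib.parse
import urllib.request

# A wikitext heading ("== Title ==" up to six "=") on a line of its own.
HEADING = re.compile(r"^(={1,6})\s*(.+?)\s*\1\s*$", re.MULTILINE)
# id="..." attributes: raw HTML anchors, or what {{Anchor}} is assumed to expand to.
ID_ATTR = re.compile(r"""\bid\s*=\s*["']([^"']+)["']""")


def raw_expanded_url(title, base="https://en.wikipedia.org/w/index.php"):
    """Build the index.php URL for a page's wikitext with templates expanded."""
    query = urllib.parse.urlencode({"title": title, "action": "raw", "templates": "expand"})
    return f"{base}?{query}"


def fetch_expanded_wikitext(title):
    """Download the expanded wikitext; the User-Agent string is a placeholder."""
    request = urllib.request.Request(
        raw_expanded_url(title),
        headers={"User-Agent": "section-anchor-sketch/0.1 (hypothetical)"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def collect_anchors(expanded_wikitext):
    """Return every anchor a #section link could target: headings plus id attributes."""
    anchors = {m.group(2) for m in HEADING.finditer(expanded_wikitext)}
    anchors.update(ID_ATTR.findall(expanded_wikitext))
    return anchors


sample = '== History ==\nSome text.<span id="Golden years"></span>\n=== Early days ===\n'
print(sorted(collect_anchors(sample)))
# prints ['Early days', 'Golden years', 'History']
```

Returning a set keeps each lookup cheap when a new page links to many sections of the same target.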
Approved for trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. @harej 03:30, 23 January 2010 (UTC)
Coding... adding this functionality has demanded a substantial internal redesign of WildBot, as it's no longer making one edit to a talk page (at least, not internally). The hard figures above were produced by a very rough draft. Josh Parris 22:20, 24 January 2010 (UTC)
Doing... The trial has commenced, with some preliminary results available in this seeded group of #section checks with nine hits. The rest of the results will be spread out through the normal run of WildBot. There's code to limit it to 50 #section edits per run. Josh Parris 01:01, 26 January 2010 (UTC)
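A per-run cap like the 50 #section edits mentioned above could be enforced lazily with `itertools.islice`, so that candidates stop being pulled once the cap is reached. This is a hypothetical sketch: `pages_to_tag`, `MAX_SECTION_EDITS_PER_RUN` and the shape of the candidates (pairs of a page title and its list of broken links) are assumptions, not WildBot's design.

```python
from itertools import islice

MAX_SECTION_EDITS_PER_RUN = 50  # the trial cap described above


def pages_to_tag(candidates, limit=MAX_SECTION_EDITS_PER_RUN):
    """Return a lazy iterator over at most `limit` (page, broken_links) pairs.

    `candidates` is any iterable of (page_title, list_of_broken_links); pages with
    no broken links are skipped and do not count towards the cap.
    """
    needing_notice = ((page, broken) for page, broken in candidates if broken)
    return islice(needing_notice, limit)


queue = [("A", ["B#x"]), ("C", []), ("D", ["E#y"])]
print(list(pages_to_tag(queue, limit=1)))
# prints [('A', ['B#x'])]
```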
- Thus far 25 edits have been made, and I've discovered a number of things. Turns out that pywikipedia has code to detect valid section references - but it doesn't work correctly when there's markup; the common case being an article link in a section header. People put all kinds of crazy stuff into section headers. I won't bore you with the stories. I seem to have bitten off quite a large, chewy part of the world. The internal re-coding has been shaken-out, so I'll soon be tidying up the code and running that in production. Josh Parris 13:57, 27 January 2010 (UTC)
- 33 edits Josh Parris 04:33, 28 January 2010 (UTC)
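One plausible way to handle section headers that contain markup, such as the article links mentioned above, is to reduce the heading's wikitext to roughly the text MediaWiki displays before comparing it with a link's #section. The sketch below assumes that replacing links by their visible text, dropping runs of bold/italic quote marks and treating underscores as spaces covers the common cases; the helper names are hypothetical, and templates, HTML tags and MediaWiki's own fragment encoding are deliberately not handled.

```python
import re

PIPED_LINK = re.compile(r"\[\[[^\[\]|]*\|([^\[\]]*)\]\]")  # [[Target|label]] -> label
PLAIN_LINK = re.compile(r"\[\[([^\[\]|]*)\]\]")  # [[Target]] -> Target
QUOTE_RUN = re.compile(r"'{2,}")  # ''italic'' and '''bold''' markers


def heading_display_text(heading):
    """Approximate the text MediaWiki shows (and anchors) for a heading's wikitext."""
    text = PIPED_LINK.sub(r"\1", heading)
    text = PLAIN_LINK.sub(r"\1", text)
    text = QUOTE_RUN.sub("", text)
    return " ".join(text.split())


def section_matches_heading(link_section, heading):
    """Compare a link's #section with a heading, treating "_" and " " as equal."""
    def normalize(s):
        return " ".join(s.replace("_", " ").split())
    return normalize(link_section) == normalize(heading_display_text(heading))


print(heading_display_text("History of [[New York City|New York]]"))
# prints History of New York
print(section_matches_heading("Golden_years_of_Broadway", "'''Golden''' years of [[Broadway]]"))
# prints True
```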
Trial complete. I'll be posting links to the edits in a few hours. Josh Parris 02:45, 29 January 2010 (UTC)
If I might add, this has been terribly buggy. I'm going to keep a very close eye on it in its early life; the multitude of problems that turned up during the trial haven't endeared the code to me. Josh Parris 11:48, 29 January 2010 (UTC)
- Approved. Seems good to me. Tim1357 (talk) 00:54, 30 January 2010 (UTC)
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.