Wikipedia talk:WikiProject JavaScript
User script: User:The Transhumanist/anno.js
This script makes the annotations of bullet list entries disappear and reappear with the click of a toggle, or the press of a hot key.
But, it has a problem. When material above the reader's current position on the page is removed, it displaces the reader from what he was reading. The text shifts up relative to the viewport and may even go off the screen.
If you know how to adjust the positioning of the viewport, please let me know.
Thank you. The Transhumanist 06:49, 6 April 2017 (UTC)
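One possible approach to the viewport problem, offered as a rough sketch rather than a tested fix for anno.js: pick an element the reader is currently looking at, measure its distance from the top of the viewport before and after the annotations are toggled, and scroll by the difference. The function name and its three parameters (anchor, mutate, win) are hypothetical and not taken from the script; only getBoundingClientRect() and scrollBy() are standard browser calls.

```javascript
// Sketch: keep a chosen element at the same spot in the viewport while
// content above it is hidden or shown.
// `anchor`, `mutate` and `win` are hypothetical names, not from anno.js.
function keepAnchorInPlace(anchor, mutate, win) {
  // Distance of the anchor from the top of the viewport before the change.
  var before = anchor.getBoundingClientRect().top;
  mutate(); // e.g. hide or show the annotations
  // Distance after the change; it shrinks if content above was removed.
  var after = anchor.getBoundingClientRect().top;
  var delta = after - before;
  if (delta !== 0) {
    win.scrollBy(0, delta); // scroll so the anchor returns to its old spot
  }
  return delta;
}
```

In a browser this might be called as keepAnchorInPlace(someListItem, toggleAnnotations, window), where someListItem is any element near the reader's position (document.elementFromPoint is one way to find such an element) and toggleAnnotations is whatever function the script already uses to hide or show the annotations.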
User script: User:The Transhumanist/OLUtils.js
This is going to have a bunch of features optimized for working on outlines.
Its first and only feature at this time is a redlink remover. I've adapted a pre-existing script and added a section for removing redlinked list entries from outlines, which is currently in the testing and bug-fixing phase. It doesn't work yet, but I'm hopeful.
I'm in the process of documenting the code liberally with comments, and with extensive explanatory notes on the script's talk page. Commentary and questions are welcome! The Transhumanist 01:10, 14 April 2017 (UTC)
What scripts are you working on or maintaining, or would like to build?
Feel free to share the details of your user script projects, your problems, your ideas, and your JavaScript-related technological fantasies here. :) The Transhumanist 01:10, 14 April 2017 (UTC)
Yeoman (computing) listed at Requested moves
A requested move discussion has been initiated for Yeoman (computing) to be moved to Yeoman (software). This page is of interest to this WikiProject and interested members may want to participate in the discussion here. —RMCD bot 18:32, 4 May 2017 (UTC)
- To opt out of RM notifications on this page, transclude {{bots|deny=RMCD bot}}, or set up Article alerts for this WikiProject.
Nested RegExp
I'm working on a script (User:The Transhumanist/OLUtils.js) to remove redlinks from outlines, and I've run into a problem with regular expressions:
var nodeScoop2 = new RegExp('('+RegExp.quote(redlinks[i])+')','i');
var matchString2 = wpTextbox1.value.match(nodeScoop2);
alert(matchString2);
The above returns two matches, when I was expecting one. The second one is coming from the nested RegExp constructor.
Is there another way to specify a variable within a regular expression? If so, what?
Also, I can't find any documentation on the plus signs as used here. Can you explain them, or point me to an explanation?
What would the RegExp look like in literal notation?
Thank you. The Transhumanist 11:07, 5 May 2017 (UTC)
- This is the way Twinkle specifies variables in a regular expression; to my knowledge it's the only way to do it. The plus signs are acting as string concatenation operators (string + string = concatenation). And you couldn't express this in literal notation, because literal notation can't accept variables (it is literal after all).
- As an example of using new RegExp, this regexp in literal notation: /^Hello\s+/gi is entirely equivalent to new RegExp('^Hello\\s+', 'gi'). Note the double escaping! This is because character escapes in regular expressions are processed separately from character escapes in strings.
- As to why it is returning two matches instead of one, I really couldn't tell you. Could you provide a simplified test case or example? — This, that and the other (talk) 12:40, 5 May 2017 (UTC)
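To make the double-escaping point concrete, here is a small sketch comparing the literal and the constructor forms. The variable names are made up for illustration.

```javascript
// The same pattern written as a literal and built from a string.
var literal = /^Hello\s+/gi;
var built = new RegExp('^Hello\\s+', 'gi');

// Both end up with the same pattern source and the same flags.
console.log(literal.source === built.source); // true
console.log(built.flags);                     // "gi"

// With a single backslash, the string escape is processed first:
// '\s' inside a string literal is just the letter "s".
var wrong = new RegExp('^Hello\s+', 'gi');
console.log(wrong.source); // "^Hellos+"
console.log(wrong.test('Hello world'));  // false
console.log(built.test('Hello world'));  // true
```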
- @This, that and the other: Thank you for the explanation. In answer to your question, "yes". Run the script User:The Transhumanist/OLUtils.js on any article with "Outline of" in the title, and that has red links in it, and the alerts will show you. The Transhumanist 15:35, 5 May 2017 (UTC)
- (edit conflict)@The Transhumanist: It's difficult to quickly assess exactly what's going on without seeing the data it's being run against and the matches you are seeing. Is it possible that there's actually multiple matches in the input text? E.g. if you look for "apple" in "apple, orange, pineapple", two matches is the expected result. You would need to look for "\bapple\b" to restrict both ends to word boundaries, but that would still give multiple matches against "red apple, green apple, orange". There is nothing about that code snippet which suggests that multiple matches should be unexpected behaviour.
- I think your problem here is that you need to deal with the text before and after the thing the regexp is supposed to match. Looking at Alex's original script, I believe you need to use something like his original regular expressions, as it looks like they already deal with the beginning and end of the string. I don't see why you appear to be reinventing the wheel here, as it looks like Alex's script already deals with that issue.
- As for "plus signs as used here", do you mean the string concatenation operators? If you don't recognise basic JS operators and string concatenation, I suggest that you may need to learn fundamental JS programming before continuing. Try the tutorials and guides at https://developer.mozilla.org/en-US/docs/Web/JavaScript.
- Literal notation? If you feed "apple" into the above snippet, via the "redlinks" array, you'd get the equivalent of /(apple)/i. That's very basic stuff, so you should probably be doing some reading on Mozilla's MDN site (or some other JS learning resource). - Murph9000 (talk) 12:55, 5 May 2017 (UTC)
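The "apple" examples above can be checked directly. This is only an illustration; countMatches is a hypothetical helper, and the g flag is added here so that match() returns every occurrence. The snippet in the original question has no g flag, in which case match() returns the first match followed by its captured groups, not a list of occurrences.

```javascript
// Hypothetical helper: how many times does a global pattern match?
function countMatches(text, pattern) {
  var found = text.match(pattern); // with the g flag: array of all matches, or null
  return found ? found.length : 0;
}

// "apple" also occurs inside "pineapple".
console.log(countMatches('apple, orange, pineapple', /apple/g));           // 2
// Word boundaries on both ends exclude "pineapple".
console.log(countMatches('apple, orange, pineapple', /\bapple\b/g));       // 1
// But two separate "apple" words are still two matches.
console.log(countMatches('red apple, green apple, orange', /\bapple\b/g)); // 2
```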
- @Murph9000: Thank you for the input. I've been having much difficulty with this script. The answer is "no" on the multiple matches. The original statement was
var nodeScoop2 = new RegExp('\\[\\[\\s*('+RegExp.quote(redlinks[i])+')\\s*\\]\\]','i');
- which for example returns [[Geography of France]], Geography of France
- So I figure it's the nested RegExp that is the second match. The Transhumanist 15:33, 5 May 2017 (UTC)
- Ok, now it's clearer exactly what you are talking about. This is expected behaviour, it's standard regexp group stuff as Syockit explained below. Don't use the term "nested RegExp" like that, as that's not what it is and that term just adds to the confusion here. Murph9000 (talk) 20:50, 5 May 2017 (UTC)
- The parentheses create a capturing group. The first match is the whole matched string, while the second one is the captured group. Try with RegExp(RegExp.quote(redlinks[i]),'i') and see if it works. Syockit (talk) 12:57, 5 May 2017 (UTC)
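A minimal sketch of the capturing-group behaviour described above. RegExp.quote is not a built-in, so escapeRegExp below is a hypothetical stand-in for whatever helper the script defines; the sample text and title are also made up.

```javascript
// Hypothetical stand-in for the script's RegExp.quote helper:
// escapes characters that have a special meaning in a regex.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

var title = 'Geography of France';
var text = 'See [[Geography of France]] for more.';

// With parentheses around the title: one capturing group.
var withGroup = new RegExp('\\[\\[\\s*(' + escapeRegExp(title) + ')\\s*\\]\\]', 'i');
// Without them: no capturing group.
var withoutGroup = new RegExp('\\[\\[\\s*' + escapeRegExp(title) + '\\s*\\]\\]', 'i');

var a = text.match(withGroup);
var b = text.match(withoutGroup);

console.log(a.length); // 2: a[0] is the whole match, a[1] is the group
console.log(a[0]);     // "[[Geography of France]]"
console.log(a[1]);     // "Geography of France"
console.log(b.length); // 1: only the whole match
// alert() converts the array to a string, joining the entries with a comma,
// which is why the alert appears to show "two matches".
console.log(String(a)); // "[[Geography of France]],Geography of France"
```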
Wow. It's been many moons since anyone has asked me for JS help- I thought I'd become just a mostly-faded memory for a few editors. With that being said, Syockit is right as far as I can tell in that the parentheses create a capturing group. I'm not entirely sure why they're there at all- I'd use the same nodeScoop2 you currently have without the parentheses around the RegExp.quote; i.e. try:
var nodeScoop2 = new RegExp('\\[\\[\\s*'+RegExp.quote(redlinks[i])+'\\s*\\]\\]','i');
Best, Kangaroopowah 20:09, 5 May 2017 (UTC)
- @Kangaroopower: I tried what you suggested in User:The Transhumanist/redlinkstest.js, and it doesn't seem to work. I'll keep at it, though. The Transhumanist 20:34, 5 May 2017 (UTC)
- @Kangaroopower: I forgot the quotes. So I put those back, and adjusted the replace strings to account for the removal of the control group delimiters, and it worked. Now to try it on the current script... The Transhumanist 02:29, 6 May 2017 (UTC)
- @The Transhumanist: Glad I could help. Best, --03:34, 6 May 2017 (UTC)
- Perhaps you are looking for String.indexOf(). Oftentimes people discover regular expressions and somehow convince themselves that everything must be expressed in terms of regexes. If regex is not working for you, it is ok not to use it. 91.155.195.247 (talk) 20:07, 5 May 2017 (UTC)
- According to https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/indexOf , that returns a number. I'm looking for specific strings, not the position (index) of a string. Thank you, as I was unaware of what this method does. The Transhumanist 11:00, 6 May 2017 (UTC)
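For what it's worth, the position returned by indexOf can still answer a "does this exact string occur?" question, since it is -1 when the string is absent. A small sketch, with made-up sample text and a hypothetical containsLink helper; note that, unlike a regex with the i flag, indexOf is case-sensitive.

```javascript
var source = '* [[Geography of France]]\n* [[Geology]]';

// Hypothetical helper: true if the wikitext contains a direct link to title.
function containsLink(wikitext, title) {
  return wikitext.indexOf('[[' + title + ']]') !== -1;
}

console.log(source.indexOf('[[Geography of France]]')); // 2 (position of the link)
console.log(source.indexOf('[[Missing]]'));             // -1 (not found)
console.log(containsLink(source, 'Geology'));           // true
console.log(containsLink(source, 'geology'));           // false (case-sensitive)
```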
- I cannot clearly see what you want to achieve, but I find this code overkill. MediaWiki adds the title of the actual destination as the title attribute on links, and adds the class new to red links.
- This jQuery one-liner simply unlinks all red links. It inserts the link text before each link and then removes the link.
$("a.new").before(function(){ return this.textContent }).remove();
- The function passed to before returns what should remain after the link is removed. Inside it, this refers to the currently iterated element, by jQuery's design. To remove a link completely, make the function return nothing. The following example completely removes red category links and treats other red links as usual.
$("a.new").before(function(){ if (!this.title.startsWith("Category:")) return this.textContent; }).remove();
- Chen-Pang He (talk) —Preceding undated comment added 07:40, 6 May 2017 (UTC)
Sorry, but I don't understand what you are trying to achieve. If you want to remove red links from the DOM (in the generated code of the view), then you can use JavaScript (faster) or jQuery (slower) to remove or replace all of them at once, or to do more things on each of them in a loop. With JavaScript you need to use either "getElementsByClassName" (for example applied to class="new") or "getElementsByTagName" for all <a> elements, and then you can apply styles ('color', 'cursor', …) or replace them with your own content such as their "innerHTML" values. With jQuery >= 1.2 you can use something like $(".new").replaceWith(function() { return $(this).text(); }); or $(".new").replaceWith(function() { return this.innerHTML; });, while with jQuery >= 1.4 you can use the unwrap function like this: $(".new").contents().unwrap();. jQuery seems to be shorter, but this is because you do not see the whole code that is behind the execution of it, and it is much slower than doing it in native JavaScript (when it is well written, of course). All of them, JavaScript and jQuery, should be wrapped in a document-ready function (via JavaScript or jQuery), a setTimeout function, or both. If you need to store their values, then you can use a for or a while loop over them and then do whatever you want with each one. Of course, if you are working on the source code, then the above does not apply at all. About the regex, I need more information about the data, plus tests and examples. The reason for its multiple matches has been well explained above. Just a note: if you are sending and parsing a huge quantity of data, for example the whole content of an article, then something like Perl is always the faster and better solution, because it was conceived for reporting on big log files such as those generated by a server. AWK and sed are also good at this. Unfortunately, I do not think that they are available here. –pjoef (talk • contribs) 12:18, 6 May 2017 (UTC)
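As a rough illustration of the native (non-jQuery) route mentioned above, the following sketch replaces each red link in the rendered page with a plain text node. The function name unlinkRedLinks and its root parameter are hypothetical; it assumes, as stated earlier in this thread, that red links carry the class new. It only changes the displayed page, not the wikitext.

```javascript
// Sketch: replace every <a class="new"> under `root` with its text.
// `unlinkRedLinks` and `root` are hypothetical names for illustration.
function unlinkRedLinks(root) {
  // Copy the NodeList into an array so the set is fixed while we edit the DOM.
  var links = Array.prototype.slice.call(root.querySelectorAll('a.new'));
  links.forEach(function (link) {
    // Build a text node holding the link's visible text...
    var text = link.ownerDocument.createTextNode(link.textContent);
    // ...and put it where the link was.
    link.parentNode.replaceChild(text, link);
  });
  return links.length; // number of links that were unlinked
}
```

In a browser this might be called as unlinkRedLinks(document) once the page has loaded.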
- @Pjoef and Jdh8: The script is User:The Transhumanist/OLUtils.js, and the section we are working on here is for processing outlines, and starts with this: if (document.title.indexOf("Outline ") != -1) {
- For outlines, the script is supposed to remove list item entries (including bullet and carriage return) that are comprised entirely of redlinks, but only if they have no children. Red end nodes. It goes through several iterations, just in case the removal of a red end node renders other red entries into end nodes. After all those have been removed, then the script deletes any red category links, and finally delinks the remaining embedded red links. I've provided a more in-depth explanation below under #What the script is supposed to do. For non-outlines, it just deletes red cats and delinks the rest of the redlinks. The Transhumanist 04:09, 7 May 2017 (UTC)
The whole regex
@Jdh8, Kangaroopower, Syockit, Murph9000, SMcCandlish, and TheDJ:
The sample I posted at the beginning of this thread was simplified to show the problem that it was returning 2 matches instead of the expected 1. So, I thought the script might do unexpected replacements, but that has not happened (yet). But I've run into other problems...
The regex from the script is more involved than the sample, and is for matching the line the key topic (redlinks[i]) is included on plus the whole next line:
var nodeScoop2 = new RegExp('\\n((\\*)+)[ ]*?\\[\\[\\s*'+(RegExp.quote(redlinks[i]))+'\\s*\\]\\].*?\\n(.*?\\n)','i');
The reason the whole next line is included is because I'd like to delete entries based upon the type of line that follows (or more accurately, does not follow). If the entry is not followed by a child, then it gets deleted, but should be kept if it does have a child. The weird thing is, that the part matching the whole next line is in the 4th set of parentheses, so you would expect $4 to back reference that. In practice, it is $3 that accesses that capturing group. And I don't know why. Though the solution (ignoring the parentheses around the embedded RegExp, when counting the capturing groups) seems to be working. But, I've run into a worse problem...
// Here is the regular expression for matching the scoop target (to "scoop up" the redlinked entry with direct (non-piped) link, plus the whole next line)
var nodeScoop2 = new RegExp('\\n((\\*)+)[ ]*?\\[\\[\\s*'+(RegExp.quote(redlinks[i]))+'\\s*\\]\\].*?\\n(.*?\\n)','i');
// To actualize the search string above, we create a variable with method:
var matchString2 = wpTextbox1.value.match(nodeScoop2);
alert(matchString2); // for testing
// Declare match patterns
var patt1 = new RegExp(":");
var patt2 = new RegExp(" – ");
var patt3 = /$1\*/;
// Here's the fun part. We use a big set of nested ifs to determine if matchString2 does not match criteria. If it does not match, delete the entry:
// If matchString2 isn't empty
if (matchString2 !== null) {
    // If has no coloned annotation (that is, does not have a ":")
    if (patt1.test(matchString2) === false) {
        // If has no hyphenated annotation (that is, does not have " – ")
        if (patt2.test(matchString2) === false) {
            // ...and if the succeeding line is not a child (that is, does not have more asterisks)
            if (patt3.test(matchString2) === false) {
                // ... then replace nodeScoop2 with the last line in it, thereby removing the end node entry
                wpTextbox1.value = wpTextbox1.value.replace(nodeScoop2,"\n$3");
                incrementer++;
                alert("removed entry");
            }
        }
    }
}
The problem is patt3. I'm trying to check for the asterisks at the beginning of the second line. If there is one more asterisk on that line than in the line before it, it means it is a child. In which case I do not want to delete the parent. But, the above code deletes the parents anyways.
In the example below, $1 should match the asterisk at the beginning of the parent line, and $1\* (patt3) should match the asterisks at the beginning of the child line. But it doesn't seem to be working. And when I add an alert to test for the value of patt3 or $1, the script crashes!
* Parent
** Child
If $1 includes asterisks in it, does it return those asterisks escaped?
Any ideas on how to solve my patt3 problem? The Transhumanist 12:14, 6 May 2017 (UTC)
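On the $3 versus $4 puzzle above, one likely explanation is that the parentheses around RegExp.quote(redlinks[i]) are ordinary JavaScript grouping in the string concatenation, not characters in the pattern, so the finished regex contains only three capturing groups. The sketch below shows this with a made-up title and sample text; escapeRegExp is a hypothetical stand-in for the script's RegExp.quote helper.

```javascript
// Hypothetical stand-in for the script's RegExp.quote helper.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

var title = 'Redlink 3';
// Same shape as nodeScoop2. The parentheses around escapeRegExp(title)
// belong to the JavaScript expression and never reach the pattern string.
var nodeScoop = new RegExp(
  '\\n((\\*)+)[ ]*?\\[\\[\\s*' + (escapeRegExp(title)) + '\\s*\\]\\].*?\\n(.*?\\n)',
  'i'
);

var sample = '\n** [[Redlink 3]]\n*** [[Redlink 4]]\n';
var m = sample.match(nodeScoop);

console.log(m.length); // 4: the whole match plus three groups
console.log(m[1]);     // "**"  (all the parent's asterisks)
console.log(m[2]);     // "*"   (the last asterisk matched by the inner group)
console.log(m[3]);     // "*** [[Redlink 4]]\n" (the whole next line)
```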
- Try to double-escape the asterisk \\* in a RegExp constructor or in this way /\* . –pjoef (talk • contribs) 12:26, 6 May 2017 (UTC)
- @Pjoef: I did. See the RegExp below. Notice that the double escaped asterisk is inside a capturing group. When you use $1 to refer to that capturing group, will the asterisks in there still be escaped? When I try to use alert to test for $1, it crashes the script.
var nodeScoop2 = new RegExp('\\n((\\*)+)[ ]*?\\[\\[\\s*'+(RegExp.quote(redlinks[i]))+'\\s*\\]\\].*?\\n(.*?\\n)','i');
- I look forward to your reply. The Transhumanist 13:58, 6 May 2017 (UTC)
- "*" is a quantifier (a special character) and, like all other special characters, it needs to be escaped when it is part of the pattern of characters that you want to find or replace. See: w3schools.com/jsref/jsref_obj_regexp.asp. As for using alert for debugging purposes, I suggest you use the console.log() method to display data directly within the debugger of the browser. More at: w3schools.com/js/js_debugging.asp. The debugger itself should also be able to show you what the error is and where it is within your code. About the editing of the article and the DOM manipulation: it doesn't save the changes, but if a user is in the editor window/view and presses the save button, all changes that have been made to the content will be saved. –pjoef (talk • contribs) 09:26, 7 May 2017 (UTC)
- P.S.: I haven't tested it out but probably $1 is "undefined". In this case you need to check for this before you use it: if ($1) …. –pjoef (talk • contribs) 09:34, 7 May 2017 (UTC)
- Running the code in the generated document seems to be easier because we can make use of the HTML structure. A leaf link that is safe to remove is the only child of its li.
$("a.new").replaceWith(function(){ if (this.title.startsWith("Category:")) return null; if (this.matches("li > :only-child")) return null; return this.textContent; });
- Cheers, Chen-Pang He (talk) —Preceding undated comment added 15:19, 6 May 2017 (UTC)
- @Jdh8: Hi. Thanks for the suggestions. I have some questions for you: Would the code you provided edit the article, or just affect the view? I'm looking for editing solutions. How could a script remove children list items in the edit window? The Transhumanist 03:57, 7 May 2017 (UTC)
I got your message. It looks like you may have gotten the help you need. When working with RegExp, I like to try them on some sample strings to see what each one is actually matching, and what it's returning. There's a great website for doing that: regex101. Nathanm mn (talk) 16:12, 6 May 2017 (UTC)
- @Nathanm mn: We still haven't figured it out. The problem I'm trying to solve is how to identify when a list item has a child. A child list item will have one more asterisk at the beginning than the parent. So, I set up a capturing group for the asterisks at the beginning of the parent (so $1 would be the back reference), and then try to match that number of asterisks plus one more in the child (using $1\*). But it isn't working. I am stuck. There are other criteria which the entries to be removed must fail, otherwise I wish to keep them. So simply getting rid of all children isn't what I'm after. We already know they are red linked entries, because the first half of the program puts all redlinks into an array, which we process in the second half of the program. Then the nested if structure checks first for whether the current redlink in the array has no entry. If it doesn't, then we check to see if it has no colon annotation. If it doesn't have a colon separator, then we check to see if it doesn't have a hyphenated annotation. If it doesn't have an en dash separator, then we check to see if it has no children. If it doesn't have a child, then we delete it from the wiki source, modifying the actual article itself.
- Once all redlinked entries that fail our tests are removed, then the rest of the program mops up, deleting red category links, and delinking all redlinks that still remain after that. We know, due to the extensive filtering we just subjected them to, that they are all embedded redlinks, the content of which we want to keep. I'll make a sample below that presents examples of the data instances to be processed. The Transhumanist 22:12, 6 May 2017 (UTC)
What the script is supposed to do
@Jdh8, Kangaroopower, Syockit, Murph9000, SMcCandlish, TheDJ, and Nathanm mn:
Here is a sample item list:
- Geography
- Geology – this text is an annotation. And here is an embedded redlink 1. After all the end node (dead end) redlinked entries are removed, this redlink will be delinked.
- Redlink 2
- Redlink 3
- Redlink 5: this text is also an annotation. So I want to keep this entry.
- Redlink 6
- Redlink 10
What we want to do is remove the list entries for which the topic is a redlink, but which do not have annotations, and which do not have children. Then we delete redlinked categories, and delink whatever redlinks are leftover — those will be by definition embedded, such as redlink 1 and redlink 3. Redlink 3 is embedded by virtue of having children.
Redlink 2 is a dead end. It is an end node in the tree structure that contains only a redlink. It gets deleted.
The script goes through the list multiple times, until it no longer finds dead end redlinks. This is because when it removes a redlinked end node, that may cause its redlinked parent to become a dead end node (such as when it has no other children). Multiple iterations catch these. So the entire branch starting with Redlink 10 will be deleted.
Here is the problem I've run into: the script currently and erroneously deletes the Redlink 3 list item. Because $1\* or $1\\* do not seem to be identifying the Redlink 4 list item as having more asterisks in the wikisource than the Redlink 3 list item. I do not know why. What should happen is that Redlink 3 would be retained because of Redlink 4, and after Redlink 4 is removed, then Redlink 3 is checked again and is kept by virtue of having Psychology as a child. But, when Redlink 3 is deleted in error, it makes Psychology a child of Geology, thus ruining the tree structure.
All this processing is to be done in the editor, so that the redlinked entries are actually removed from the article.
I'm stuck! I look forward to your replies. The Transhumanist 23:00, 6 May 2017 (UTC)
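The behaviour described in this section could be sketched line by line instead of with one large regex. The following is only an illustration under stated assumptions, not the OLUtils.js code: pruneRedLeaves and depthOf are hypothetical names, the redlink titles are passed in as a plain array, only direct (non-piped) links are handled, and an "annotation" is taken to be any text after the closing brackets.

```javascript
// Number of leading asterisks on a line (0 if it is not a list item).
function depthOf(line) {
  var m = /^\*+/.exec(line);
  return m ? m[0].length : 0;
}

// Sketch: repeatedly remove list entries that consist only of a redlink
// and have no child on the following line.
function pruneRedLeaves(wikitext, redlinks) {
  var red = Object.create(null);
  redlinks.forEach(function (t) { red[t.toLowerCase()] = true; });

  // Asterisks, then a single direct link, then nothing else on the line.
  var bareEntry = /^(\*+)\s*\[\[\s*([^\]|]+?)\s*\]\]\s*$/;
  var lines = wikitext.split('\n');
  var changed = true;

  while (changed) { // keep going until a full pass removes nothing
    changed = false;
    for (var i = 0; i < lines.length; i++) {
      var m = bareEntry.exec(lines[i]);
      if (!m || !red[m[2].toLowerCase()]) continue; // annotated or not red
      var next = i + 1 < lines.length ? lines[i + 1] : '';
      if (depthOf(next) > m[1].length) continue;    // next line is a child
      lines.splice(i, 1); // remove the dead-end entry
      i--;                // re-check the line that moved into this slot
      changed = true;
    }
  }
  return lines.join('\n');
}
```

With a made-up outline in which "Redlink 3" has the children "Redlink 4" and "Psychology", this keeps "Redlink 3" after "Redlink 4" is removed, because the next line is still one level deeper. A branch whose entries are all red and unannotated is removed over successive passes.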
- Your patt3 is off for a couple of reasons. First, with the $n regex matches, in general you access them using RegExp.$1 (which will be a string containing the match), not just $1 – except within the String.replace function, when just $1 is used in the replacement string [1]. Secondly, with regex literals, what you type is literally what you get as the regex string. So var patt3 = /$1\*/; will literally be interpreted as /$1\*/ (where $ asserts position at the end of the string; 1 matches the character 1; \* matches the character *).
- What you could use instead is var patt3 = new RegExp("\\*{"+(RegExp.$1.length+1)+"}"); which, for example, will give you the regex /\*{3}/ when the RegExp.$1 match is "**" - Evad37 [talk] 04:59, 7 May 2017 (UTC)
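A small sketch of the same idea, building the child test from the parent's asterisks. It takes the asterisks from the match array rather than from RegExp.$1, and anchors the pattern to the start of the line; both are choices made for this illustration, and the function names are hypothetical.

```javascript
// Build a pattern that matches a line starting with at least one more
// asterisk than the parent has. `parentStars` is e.g. "**".
function childPattern(parentStars) {
  return new RegExp('^\\*{' + (parentStars.length + 1) + '}');
}

// True if nextLine is a child of an entry with the given asterisks.
function nextLineIsChild(parentStars, nextLine) {
  return childPattern(parentStars).test(nextLine);
}

console.log(childPattern('**').source);                  // "^\*{3}"
console.log(nextLineIsChild('**', '*** [[Redlink 4]]')); // true
console.log(nextLineIsChild('**', '** [[Sibling]]'));    // false

// For comparison, the literal from the script can never match: "$" asserts
// the end of the string, and then a "1" would still have to follow.
console.log(/$1\*/.test('*** [[Redlink 4]]'));           // false
```

In the script, parentStars would correspond to the first captured group of the match (matchString2[1]) and nextLine to the third (matchString2[3]).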
Use of Wikipedian programmer categories
@The Transhumanist: Was it really appropriate to spam over 500 users with a notice of this discussion using WP:AWB? That seems to me to be an inappropriate use of that tool. Murph9000 (talk) 13:21, 5 May 2017 (UTC)
- 90% of them haven't logged in for months or years, but were still listed in the js users categories. It's the only feasible way I could think of to reach the other 10%. You wouldn't happen to know of a script that can sort or screen a user list by the date of their last edit, would you? That would be very helpful. The Transhumanist 15:07, 5 May 2017 (UTC)
- Done Found a way. See #Tracking down recent editors, below. The Transhumanist 17:10, 5 May 2017 (UTC)
- Yeah, OP shouldn't have done that. If you want to recruit programming helpers, try Stack Overflow! Tomalak Geret'kal (talk) 13:29, 5 May 2017 (UTC)
- Just contacted users in the js wikipedian categories, which is what those categories are for, per WP:UCAT. For help with a script to improve the encyclopedia. I'll look into a way to filter out dead user accounts from a list. The Transhumanist 15:19, 5 May 2017 (UTC)
- Done See #Tracking down recent editors, below. The Transhumanist 19:26, 5 May 2017 (UTC)
- For real. Wikipedia is not StackOverflow. Julesmazur (talk) 14:00, 5 May 2017 (UTC)
- The programming userboxes are to enable Wikipedians to contact each other about programming, per WP:UCAT. That's why we list our JavaScript skill-level, right? The Transhumanist 15:07, 5 May 2017 (UTC)
- Consider that I've deleted several pages that you created through these mass messages that were not, in fact, user talk pages. Examples (not exhaustive, and I got bored of deleting them after a few): User talk:X!/egapresu siht tide t'nod esaelP, User talk:Vanished user 98wiejfno34tijsfoiwefjlok5y/infobox, User talk:Godlvall2/UserBoxes. Those userboxes are for identification, sure, but they're not for automated mass messaging. Writ Keeper ⚇♔ 15:35, 5 May 2017 (UTC)
- Thanks. My bad. I didn't notice them right away, and then started skipping them. The Transhumanist 16:52, 5 May 2017 (UTC)
- Also, was it really necessary to use AWB to post an invitation to 300+ mainspace talk pages? I don't think that's where they go. Writ Keeper ⚇♔ 14:11, 5 May 2017 (UTC)
- @Writ Keeper: I've converted those to informational notices concerning upkeep of JavaScript articles, with a more encyclopedic tone. Thank you for the feedback. The Transhumanist 16:51, 5 May 2017 (UTC)
Tracking down recent editors
I'd like to contact recent editors (say, in the past month) from the users listed at:
Many of the users listed here haven't logged in for years.
Any ideas? The Transhumanist 11:43, 5 May 2017 (UTC)
- Hi The Transhumanist. Quarry is awesome for stuff like this. Here you go :) https://quarry.wmflabs.org/query/18396 --EpochFail (talk • contribs) 16:05, 5 May 2017 (UTC)
- Perfect. I ran it again for the other cats. Thank you for the script! The Transhumanist 19:08, 5 May 2017 (UTC)
Data type section needed in JavaScript article
In my eyes the article misses a section describing the available data types. Can somebody with knowledge about the subject add this? --79.213.185.195 (talk) 06:37, 22 April 2017 (UTC)