This script processes bulleted lists, repeatedly removing redlinked end nodes until none are left. (A redlinked end node is a list item that consists of nothing more than a redlink and has no children.) After that, the script delinks the remaining red links and deletes red category links. It doesn't remove list items that have annotations or that have children (indented entries beneath them).
This semi-automated editing script is currently alpha software. It is still new and should be used with caution to ensure the results are as expected. Please check changes carefully before saving, and report any problems.
The redlink remover has two major uses (but it is not limited to these):
Important: this script was developed for use with the Vector skin (it's Wikipedia's default skin), and might not work with other skins. See the top of your Preferences appearance page, to be sure Vector is the chosen skin for your account.
To install this script, add this line to your vector.js page:
importScript("User:The Transhumanist/RedlinksRemover.js");
Save the page and bypass your cache to make sure the changes take effect. By the way, only logged-in users can install scripts.
This section explains the source code, in detail. It is for JavaScript programmers, and for those who want to learn how to program in JavaScript. Hopefully, this will enable you to adapt existing source code into new user scripts with greater ease, and perhaps even compose user scripts from scratch.
You can only use so many comments in the source code before you start to choke or bury the programming itself. So, I've put short summaries in the source code, and have provided in-depth explanations here.
My intention is threefold:
In addition to plain vanilla JavaScript code, this script relies heavily on the jQuery library.
If you have any comments or questions, feel free to post them at the bottom of this page under Discussions. Be sure to {{ping}} me when you do.
An alias is one string defined to mean another. Another term for "alias" is "shortcut". In the script, the following aliases are used:
$
is the alias for jQuery (the jQuery library)
mw
is the alias for mediawiki (the mediawiki library)
These two aliases are set up like this:
( function ( mw, $ ) {}( mediaWiki, jQuery ) );
That also happens to be a "bodyguard function", which is explained in the section below...
The bodyguard function assigns an alias for a name within the function, and reserves that alias for that purpose only. For example, you could use it to make "t" be interpreted only as "transhumanist" inside the function.
Since the script uses jQuery, we want to defend jQuery's alias, the "$". The bodyguard function makes it so that "$" means only "jQuery" inside the function, even if it means something else outside the function. That is, it prevents other JavaScript libraries from overwriting the $() shortcut for jQuery within the function. It does this via scoping: the function's parameters shadow any variables of the same name outside it.
The bodyguard function is used like a wrapper, with the alias-containing source code inside it, typically, wrapping the whole rest of the script. Here's what a jQuery bodyguard function looks like:
( function($) {
    // you put the body of the script here
} ) ( jQuery );
See also: bodyguard function solution.
To extend that to lock in "mw" to mean "mediawiki", use the following (this is what the script uses):
( function(mw, $) {
    // you put the body of the script here
} ) (mediaWiki, jQuery);
For the best explanation of the bodyguard function I've found so far, see: Solving "$(document).ready is not a function" and other problems (Long live Spartacus!)
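To see the scoping at work outside a browser, here is a minimal sketch in which two plain objects, fakeMediaWiki and fakeJQuery (hypothetical stand-ins for the real mediaWiki and jQuery globals), are passed into a bodyguard function while an outer $ means something else:

```javascript
// Hypothetical stand-ins for the real mediaWiki and jQuery globals,
// which only exist in the browser.
var fakeMediaWiki = { libraryName: 'mediaWiki' };
var fakeJQuery = { libraryName: 'jQuery' };

// Pretend some other library has claimed "$" outside the function.
var $ = { libraryName: 'someOtherLibrary' };

var seenInside = ( function ( mw, $ ) {
    // These parameters shadow any outer "mw" or "$", so inside this
    // function they mean exactly what was passed in below.
    return [ mw.libraryName, $.libraryName ];
}( fakeMediaWiki, fakeJQuery ) );

console.log( seenInside );    // [ 'mediaWiki', 'jQuery' ]
console.log( $.libraryName ); // 'someOtherLibrary' (the outer $ is untouched)
```

The outer $ keeps its own meaning; only code inside the function sees the "guarded" version.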
Many of my scripts create menu items using mw.util.addPortletLink, which is provided in a resource module. Therefore, in those scripts it is necessary to make sure the supporting resource module (mediawiki.util) is loaded, otherwise the script could fail (though it could still work if the module happened to already be loaded by some other script). To load the module, use mw.loader, like this:
// For support of mw.util.addPortletLink
mw.loader.using( 'mediawiki.util', function () {
    // Body of script goes here.
} );
mw.loader.using is explained at mw:ResourceLoader/Core modules#mw.loader.using.
For more information, see the API Documentation for mw.loader.
The ready() event listener/handler makes the rest of the script wait until the page (and its DOM) is loaded and ready to be worked on. If the script tries to do its thing before the page is loaded, there won't be anything there for it to work on (for example, a script that adds a menu item with mw.util.addPortletLink would have no menu to place it in), and the script will fail.
In jQuery, it looks like this: $( document ).ready(function() {});
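You can do that in jQuery shorthand, like this:
$().ready( function() {} );
Or even like this:
$(function() {});
As of jQuery 3.0, that last form is the recommended one; the other forms still work but are deprecated.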
The part of the script that is being made to wait goes inside the curly brackets. But you would generally start that on the next line, and put the ending curly bracket, closing parenthesis, and semicolon on a line of their own, like this:
$(function() {
    // Body of function (or even the rest of the script) goes here, such as a click handler.
});
This is all explained further at the jQuery page for .ready()
For the plain vanilla version see: http://docs.jquery.com/Tutorials:Introducing_$(document).ready()
This is the reserved word var, which is used to declare variables. A variable is a container you can put a value in. To declare the variable portletlink, write this:
var portletlink;
A declared variable has the value undefined until you assign it one, for example like this:
portletlink = "yo mama";
You can combine declaration and assignment in the same statement, like this:
var portletlink = mw.util.addPortletLink('p-tb', '#', 'Remove red links');
Caveat: if you assign a value to a variable that has not been declared, the variable will be created automatically as a global variable, even if the assignment is inside a function (in strict mode, this throws a ReferenceError instead). For user scripts used on Wikipedia, having a variable of global scope means the variable may affect other scripts that are running, as the scripts are technically part of the same program, being called via import from a .js page (.js pages are programs). So, be careful, and declare your variables with var. Here are some scope-related resources:
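A minimal sketch of the two behaviors described above: a variable declared with var inside a function stays inside it, and adding 'use strict' turns a forgotten declaration into an error instead of a silent global. The names keepLabelLocal, forgetToDeclare, menuLabel, and leakedTotal are hypothetical:

```javascript
function keepLabelLocal() {
    var menuLabel = 'Remove red links'; // declared: local to this function
    return menuLabel;
}

function forgetToDeclare() {
    'use strict';
    // No var: without 'use strict' this would silently create a global;
    // with it, the assignment throws a ReferenceError.
    leakedTotal = 5;
}

keepLabelLocal();
console.log( typeof menuLabel ); // 'undefined': the variable never escaped

var strictCaughtIt = false;
try {
    forgetToDeclare();
} catch ( err ) {
    strictCaughtIt = err instanceof ReferenceError;
}
console.log( strictCaughtIt ); // true
```

Putting 'use strict'; as the first statement inside the bodyguard function applies strict mode to everything inside it.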
This adds a menu item to one of MediaWiki's menus. Use "p-tb" to signify the toolbox menu on the sidebar menu.
First you stick it in a variable, for example, "portletlink":
var portletlink = mw.util.addPortletLink('p-tb', '#', 'Remove redlinks');
It has up to 7 parameters. Only 3 are used above.
General usage:
mw.util.addPortletLink( 'portletId', 'href', 'text', 'id', 'tooltip', 'accesskey', 'nextnode');
Its components:
mw.util.addPortletLink
: the function (provided by the mediawiki.util module) that adds links to the portlets.
portletId
: the id of the portlet (that is, menu) where the new menu item is to be placed. The various menus ("portlets") are:
  p-navigation
  : Navigation section in left sidebar
  p-interaction
  : Interaction section in left sidebar
  p-tb
  : Toolbox section in left sidebar
  coll-print_export
  : Print/export section in left sidebar
  p-personal
  : Personal toolbar at the top of the page
  p-views
  : Upper right tabs in Vector only (read, edit, history, watch, etc.)
  p-cactions
  : Drop-down menu containing move, etc. (in Vector); subject/talk links and action links in other skins
href
: Link to a Wikipedia or external page (the initial purpose of portletlink was to link somewhere)
text
: Text that displays in the menu (the title of the menu item)
id
: HTML id (optional)
tooltip
: Tooltip to display on mouseover (optional)
accesskey
: Shortcut key press (optional)
nextnode
: id of the existing portlet link to place the new portlet link before (optional). When you pass it as a selector string, don't forget the leading "#".
The optional fields must be included in the above order. To skip a field, use the value null (or an empty string, '', with no space between the quotes).
To place the menu items in alphabetical order, and so that they don't move around in the menu, give your last menu item the id of an existing menu item as its nextnode, to anchor it. Then set nextnode for the next-to-last item to the id of the menu item you just placed, and so on.
See the complete documentation at https://www.mediawiki.org/wiki/ResourceLoader/Modules#addPortletLink and Help:Customizing toolbars.
Important: All we've done so far is add the menu item and store what mw.util.addPortletLink returns (the new menu item's element) in a variable. The menu item won't do anything useful until we bind a click handler to it via that variable (see below).
To make a menu item that does something when you click on it, you have to "bind" the element returned by mw.util.addPortletLink, via its variable, to a handler. Like this:
(The variable used in this example is "portletlink".)
$(portletlink).click( function(e) {
    e.preventDefault();
    // do some stuff
} );
The "handler" is the part between the curly brackets.
To read about function(e), see what does e mean in this function definition?
jQuery's event objects are explained here: http://api.jquery.com/category/events/event-object/
e.preventDefault() calls the preventDefault() method of the event object (named e, short for "event") that jQuery passes to the handler.
What is the default being prevented? Portletlink's default action is to follow its link somewhere. We don't want it to do that, and that is what e.preventDefault(); is for.
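Under the hood, jQuery's e.preventDefault() forwards the call to the browser's native event. The sketch below shows that native mechanism using the standard EventTarget and Event classes (available in browsers and in Node.js 15 and later); fakeMenuLink is a hypothetical stand-in for the menu item's link:

```javascript
// A plain EventTarget stands in for the menu item's <a> element.
const fakeMenuLink = new EventTarget();
let handlerRan = false;

fakeMenuLink.addEventListener( 'click', function ( e ) {
    e.preventDefault(); // cancel the default action (following the link)
    handlerRan = true;
} );

// Only cancelable events can have their default action prevented.
const clickEvent = new Event( 'click', { cancelable: true } );
// dispatchEvent returns false when a handler called preventDefault().
const defaultWillRun = fakeMenuLink.dispatchEvent( clickEvent );

console.log( handlerRan, clickEvent.defaultPrevented, defaultWillRun ); // true true false
```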
In JavaScript, a function is a subroutine: essentially, a program within the main program. Functions are often placed at the end of the program, after its core, but can also be located in a library, like jQuery. You call a function by its name. The function "example" is called like this:
example();
See also: JavaScript Function Invocation.
window.location.href returns the current URL.
The window object represents the current window in the browser, and is at the top of the Browser Object Model hierarchy.
The location object pertains to the URL of the current document, and href is one of its properties.
This applies the indexOf method to the URL, to return the index (starting position) of a given string within it. This can be used to check whether the URL contains a specific string.
if (window.location.href.indexOf('action') >= 0)
essentially means "if 'action' is in the URL". That is, its position in the URL is equal to or greater than 0 (0 represents the first spot, 1 is the second spot, etc.), telling us that it is in there. If it is not there, indexOf returns -1.
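Here is the same check wrapped in a small function, urlMentions (a hypothetical name), so it can be tried on sample URLs. It also shows a pitfall: indexOf finds 'action' anywhere, including inside a page title such as Transaction, so the standard URL object and its searchParams give a stricter test of whether an action parameter is actually present (urlHasParam is also hypothetical):

```javascript
// Loose check: does the text appear anywhere in the URL?
function urlMentions( href, text ) {
    return href.indexOf( text ) >= 0;
}

// Stricter check: is there a query parameter with this name?
function urlHasParam( href, name ) {
    return new URL( href ).searchParams.has( name );
}

const editUrl = 'https://en.wikipedia.org/w/index.php?title=Geology&action=edit';
const articleUrl = 'https://en.wikipedia.org/wiki/Transaction';

console.log( urlMentions( editUrl, 'action' ) );    // true
console.log( urlMentions( articleUrl, 'action' ) ); // true ("Transaction" contains "action")
console.log( urlHasParam( editUrl, 'action' ) );    // true
console.log( urlHasParam( articleUrl, 'action' ) ); // false
```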
Gets part of the URL.
The substr method returns the part of the string it is applied to that begins at the provided start index and is the provided number of characters long: substr(start, length). If only a start index is provided, the substring runs from that index to the end of the string. In this case, the string is window.location.href (that is, the URL). Note that 0 represents the first character of the string. (substr is considered a legacy method; substring(start, end) and slice(start, end) take an end index instead of a length.)
So, window.location.href.substr(0,6) would return the first 6 characters of the URL.
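Because substr takes a length while substring takes an end index, the two can give different results from the same arguments. A short comparison on a sample URL string (the URL is just an example):

```javascript
const url = 'https://en.wikipedia.org/wiki/Geology';

// substr(start, length)
console.log( url.substr( 0, 6 ) );     // 'https:' (6 characters from index 0)
console.log( url.substr( 8, 2 ) );     // 'en'     (2 characters from index 8)

// substring(start, end): the character at the end index is not included
console.log( url.substring( 0, 6 ) );  // 'https:'
console.log( url.substring( 8, 10 ) ); // 'en'

// Same arguments as substr(8, 2), different meaning:
// substring swaps its arguments when start > end, giving characters 2 to 7.
console.log( url.substring( 8, 2 ) );  // 'tps://'
```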
That's not particularly useful, as we probably want to manipulate the string based on what is in it. For example...
window.location.href.substr(0, window.location.href.indexOf('#'))
What that returns is the beginning of the URL up to, but not including, the # character. (If the URL has no #, indexOf returns -1 and the result is an empty string.) We can in turn use that in concatenation. The following line of code concatenates (adds) ?action=edit to the substring, and then replaces the URL with it:
window.location = window.location.href.substr(0, window.location.href.indexOf('#'))+"?action=edit";
This jumps to the edit page for the current page, as if we clicked on "Edit".
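The substr approach above depends on the URL containing a #. A sketch of a more robust alternative uses the standard URL object to drop any fragment and set the action parameter; editUrlFor is a hypothetical helper name, and in a real user script you would still assign its result to window.location:

```javascript
function editUrlFor( href ) {
    const url = new URL( href );
    url.hash = '';                            // drop any #section fragment
    url.searchParams.set( 'action', 'edit' ); // add or replace ?action=...
    return url.toString();
}

console.log( editUrlFor( 'https://en.wikipedia.org/wiki/Geology#History' ) );
// 'https://en.wikipedia.org/wiki/Geology?action=edit'
console.log( editUrlFor( 'https://en.wikipedia.org/w/index.php?title=Geology' ) );
// 'https://en.wikipedia.org/w/index.php?title=Geology&action=edit'

// In the browser: window.location = editUrlFor( window.location.href );
```

Unlike plain string concatenation, this also copes with URLs that already have a query string (it adds &action=edit rather than a second ?).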
This line assigns the variable redlinks to an empty array (represented by opening and closing square brackets, as in var redlinks = [];).
Arrays are ordered lists of items.
We created this array to store all the redlinks that are on the page. (See below.)
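The following line of code declares the variable "a" and assigns to it all the elements in the document with the tag "<a>". Strictly speaking, getElementsByTagName returns an HTMLCollection (a live, array-like list) rather than a true array, but you can loop through it by index just like an array:
var a = document.getElementsByTagName('a');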
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Loops_and_iteration#for_statement
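Putting these pieces together, here is a sketch of a for loop that walks through the links and collects the titles of red links, which MediaWiki marks with the class new. The function name collectRedlinkTitles and the fake link objects (used so it can be tried outside a browser) are hypothetical; in a real script you would pass it document.getElementsByTagName('a'):

```javascript
// Works on anything array-like whose items have getAttribute(),
// such as the HTMLCollection from document.getElementsByTagName('a').
function collectRedlinkTitles( links ) {
    var redlinks = [];
    for ( var i = 0; i < links.length; i++ ) {
        // class can hold several space-separated names, so split it
        var classes = ( links[ i ].getAttribute( 'class' ) || '' ).split( /\s+/ );
        if ( classes.indexOf( 'new' ) !== -1 ) {
            redlinks.push( links[ i ].getAttribute( 'title' ) );
        }
    }
    return redlinks;
}

// Hypothetical stand-in for a DOM <a> element, for trying this outside a browser.
function fakeLink( attributes ) {
    return {
        getAttribute: function ( name ) {
            return name in attributes ? attributes[ name ] : null;
        }
    };
}

// Sample data only.
var sampleLinks = [
    fakeLink( { title: 'Geology' } ),
    fakeLink( { 'class': 'new', title: 'Redlink 2 (page does not exist)' } ),
    fakeLink( { 'class': 'mw-redirect', title: 'Geo' } )
];
console.log( collectRedlinkTitles( sampleLinks ) ); // [ 'Redlink 2 (page does not exist)' ]
```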
The getAttribute method returns the value of the specified attribute of the element it is applied to (with a dot, for example someElement.getAttribute('attribute')). This allows elements to be processed by a particular attribute, such as their class.
https://www.w3schools.com/jsref/met_element_getattribute.asp
https://www.w3schools.com/html/html_attributes.asp
This didn't work:
localStorage.OLUtils_redlinks = JSON.stringify(redlinks);
So I used this, and it worked:
var jsonString = JSON.stringify(redlinks);
localStorage.OLUtils_redlinks = jsonString;
Difference between JSON.stringify and JSON.parse
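Following the advice in the discussion below (use setItem/getItem, and prefix keys with olutils_), here is a sketch of saving and loading the array with JSON.stringify and JSON.parse. The storage object is passed in as a parameter so the functions can be tried with a small in-memory stand-in (MemoryStorage, hypothetical); in the browser you would pass window.localStorage. The function names are also hypothetical, and the try/catch covers the cases where saving fails because storage is disabled or full:

```javascript
var REDLINKS_KEY = 'olutils_redlinks';

function saveRedlinks( storage, redlinks ) {
    try {
        storage.setItem( REDLINKS_KEY, JSON.stringify( redlinks ) ); // array -> string
        return true;
    } catch ( err ) {
        return false; // e.g. storage disabled or quota exceeded
    }
}

function loadRedlinks( storage ) {
    var json = storage.getItem( REDLINKS_KEY ); // null if nothing saved yet
    return json === null ? [] : JSON.parse( json ); // string -> array
}

// Minimal in-memory stand-in with the same setItem/getItem methods.
class MemoryStorage {
    constructor() { this.items = new Map(); }
    setItem( key, value ) { this.items.set( key, String( value ) ); }
    getItem( key ) { return this.items.has( key ) ? this.items.get( key ) : null; }
}

var storage = new MemoryStorage();
console.log( loadRedlinks( storage ) ); // []
saveRedlinks( storage, [ 'Redlink 2', 'Redlink 10' ] );
console.log( loadRedlinks( storage ) ); // [ 'Redlink 2', 'Redlink 10' ]
```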
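alert() is short for "window.alert()".
This command makes a message box appear, displaying a message and an OK button. The script will not continue until the OK button is pressed.
The message is included within the parentheses. It can be a string, a variable, or an object. Whatever you pass is converted to a string first: a variable's value is displayed, an array is displayed as its items separated by commas, but a plain object is displayed as [object Object] unless you convert it yourself first (for example, with JSON.stringify).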
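Since alert() displays its argument as a string, the string conversion decides what you see. String() performs the same conversion for values like these, so it can be used to preview the message without a browser (the sample values are arbitrary):

```javascript
var redlinks = [ 'Redlink 2', 'Redlink 10' ];
var settings = { iterations: 3 };

console.log( String( 'Removed entry' ) ); // 'Removed entry'
console.log( String( 42 ) );              // '42'
console.log( String( redlinks ) );        // 'Redlink 2,Redlink 10'
console.log( String( settings ) );        // '[object Object]'
console.log( JSON.stringify( settings ) ); // '{"iterations":3}'

// In the browser, alert( JSON.stringify( settings ) ) shows the object's contents.
```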
if (document.title.indexOf("Outline ") != -1) {}
Let's say a script works for one person, but not another. Or it's working on two machines, but after one is cold booted, it doesn't work on that one.
How would one find the dependencies required by the script? The Transhumanist 12:16, 12 February 2017 (UTC)
localStorage.setItem('foo', 'bar'), not localStorage.foo = 'bar'. If you use the API in a non-standard way I wouldn't be surprised if there were differences between the way the various browsers handle it. — Mr. Stradivarius ♪ talk ♪ 13:18, 12 February 2017 (UTC)
localStorage.foo = 'bar' syntax is fine (although the setItem syntax is preferred). That link does give some other suggestions as to things that could be wrong, though - localStorage might not be implemented on old browsers, it might be disabled by users, or it might be full. — Mr. Stradivarius ♪ talk ♪ 15:01, 12 February 2017 (UTC)
olutils_ (so the current key would be olutils_redlinks), to reduce the chance of clashes between your data and other localStorage data saved by MediaWiki or by other gadgets. — Mr. Stradivarius ♪ talk ♪ 13:24, 12 February 2017 (UTC)
------------------ End of copy ----------------
I'm working on a script (User:The Transhumanist/OLUtils.js) to remove redlinks from outlines, and I've run into a problem with regular expressions:
var nodeScoop2 = new RegExp('('+RegExp.quote(redlinks)+')','i');
var matchString2 = wpTextbox1.value.match(nodeScoop2);
alert(matchString2);
The above returns two matches, when I was expecting one. The second one is coming from the nested RegExp constructor.
Is there another way to specify a variable within a regular expression? If so, what?
Also, I can't find any documentation on the plus signs as used here. Can you explain them, or point me to an explanation?
What would the RegExp look like in literal notation?
Thank you. The Transhumanist 11:07, 5 May 2017 (UTC)
new RegExp, this regexp in literal notation: /^Hello\s+/gi is entirely equivalent to new RegExp('^Hello\\s+', 'gi'). Note the double escaping! This is because character escapes in regular expression are processed separately from character escapes in strings.
/(apple)/i. That's very basic stuff, so you should probably be doing some reading on Mozilla's MDN site (or some other JS learning resource).
var nodeScoop2 = new RegExp('\\)+')\\s*\\]\\]','i');
RegExp(RegExp.quote(redlinks),'i') and see if it works. Syockit (talk) 12:57, 5 May 2017 (UTC)
Wow. It's been many moons since anyone has asked me for JS help- I thought I'd become just a mostly-faded memory for a few editors. With that being said, Syockit is right as far as I can tell in that the parentheses create a capturing group. I'm not entirely sure why they're there at all- I'd use the same nodeScoop2 you currently have without the parentheses around the RegExp.quote; i.e. try:
var nodeScoop2 = new RegExp('\\)+'\\s*\\]\\]','i');
Best, Kangaroopowah 20:09, 5 May 2017 (UTC)
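Several points from this thread can be checked directly: the literal and constructor forms build the same pattern (note the doubled backslash in the string), a pair of parentheses adds a capturing group (which is why match() returned two entries), and a title has to be escaped before it is embedded in a RegExp. RegExp.quote is not a built-in JavaScript function; escapeRegExp below is a hypothetical replacement using the escaping pattern commonly shown on MDN:

```javascript
// Literal and constructor forms of the same regex.
var literal = /^Hello\s+/gi;
var constructed = new RegExp( '^Hello\\s+', 'gi' ); // '\\s' in a string is \s in the regex
console.log( literal.source === constructed.source ); // true
console.log( literal.flags === constructed.flags );   // true

// Parentheses create a capturing group, so match() returns the whole
// match plus the group: two entries instead of one.
console.log( 'Apple pie'.match( /(apple)/i ).length ); // 2
console.log( 'Apple pie'.match( /apple/i ).length );   // 1

// Escape characters that have special meaning in regular expressions.
function escapeRegExp( text ) {
    return text.replace( /[.*+?^${}()|[\]\\]/g, '\\$&' );
}

var title = 'C++ (language)';
var linkPattern = new RegExp( '\\[\\[' + escapeRegExp( title ) + '\\]\\]', 'i' );
console.log( linkPattern.test( '* [[C++ (language)]]' ) ); // true
```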
title to links and class new for red links.
$("a.new").before(function(){ return this.textContent }).remove();
The function passed to before returns what is to remain after link removal. The this refers to the currently iterated element due to jQuery's design. If we want to completely remove a link, make the function return nothing then. The following example completely removes red category links and treats other red links as usual.
$("a.new").before(function(){
    if (!this.title.startsWith("Category:"))
        return this.textContent;
}).remove();
Sorry, but I don't understand what you are trying to achieve. If you want to remove red links from the DOM (in the generated code of the view), then you can use Javascript (faster) or jQuery (slower) to remove or replace all of them eventually at once, or do more things on each of them in a loop. With Javascript you need to use one of "getElementsByClassName" (for example applied to class="new") or "getElementsByTagName" for all <a> elements, and then you can apply styles ('_color_', '_cursor_', …) or replace them with your own content such as their "innerHTML" values. With jQuery >= 1.2 you can use something like $(".new").replaceWith(function() { return $(this).text(); }); or $(".new").replaceWith(function() { return this.innerHTML; });, while with jQuery >= 1.4 you can use the unwrap function like this: $(".new").contents().unwrap();. jQuery seems to be shorter, but this is because you do not see the whole code that is behind the execution of it, and it is much slower than doing it in native Javascript (when it is well written, of course). All of them, Javascript and jQuery, should be wrapped into a document ready function (via Javascript or jQuery), a setTimeout function, or both. If you need to store their values, then you can create a for or a while loop for each of them and then do whatever you want to. Of course, if you are working on the source code, then the above does not apply at all. About the regex, I need more about the data, plus tests and examples. The reason for its multiple matches has been well explained above. Just a note, if you are sending and parsing a huge quantity of data, for example the whole content of an article, then something like PERL is always the faster and the better solution possible because it was conceived for reporting of the big log files such as those generated by a server. AWK and sed are also good with this. Unfortunately, I do not think that they are available here. –pjoef (talk • contribs) 12:18, 6 May 2017 (UTC)
if (document.title.indexOf("Outline ") != -1) {.
The sample I posted at the beginning of this thread was simplified to show the problem that it was returning 2 matches instead of the expected 1. So, I thought the script might do unexpected replacements, but that has not happened (yet). But I've run into other problems...
The regex from the script is more involved than the sample, and is for matching the line the key topic (redlinks) is included on plus the whole next line:
var nodeScoop2 = new RegExp('\\n((\\*)+)*?\\))+'\\s*\\]\\].*?\\n(.*?\\n)','i');
The reason the whole next line is included is because I'd like to delete entries based upon the type of line that follows (or more accurately, does not follow). If the entry is not followed by a child, then it gets deleted, but should be kept if it does have a child. The weird thing is, that the part matching the whole next line is in the 4th set of parentheses, so you would expect $4 to back reference that. In practice, it is $3 that accesses that capturing group. And I don't know why. Though the solution (ignoring the parentheses around the embedded RegExp, when counting the capturing groups) seems to be working. But, I've run into a worse problem...
// Here is the regular expression for matching the scoop target (to "scoop up" the redlinked entry with direct (non-piped) link, plus the whole next line)
var nodeScoop2 = new RegExp('\\n((\\*)+)*?\\))+'\\s*\\]\\].*?\\n(.*?\\n)','i');
// To actualize the search string above, we create a variable with method:
var matchString2 = wpTextbox1.value.match(nodeScoop2);
alert(matchString2); // for testing
// Declare match patterns
var patt1 = new RegExp(":");
var patt2 = new RegExp(" – ");
var patt3 = /$1\*/;
// Here's the fun part. We use a big set of nested ifs to determine if matchString2 does not match criteria. If it does not match, delete the entry:
// If matchString2 isn't empty
if (matchString2 !== null) {
// If has no coloned annotation (that is, does not have a ":")
if (patt1.test(matchString2) === false) {
// If has no hyphenated annotation (that is, does not have " – ")
if (patt2.test(matchString2) === false) {
// ...and if the succeeding line is not a child (that is, does not have more asterisks)
if (patt3.test(matchString2) === false) {
// ... then replace nodeScoop2 with the last line in it, thereby removing the end node entry
wpTextbox1.value = wpTextbox1.value.replace(nodeScoop2,"\n$3");
incrementer++;
alert("removed entry");
}
}
}
}
The problem is patt3. I'm trying to check for the asterisks at the beginning of the second line. If there is one more asterisk on that line than in the line before it, it means it is a child. In which case I do not want to delete the parent. But, the above code deletes the parents anyways.
In the example below, $1 should match the asterisk at the beginning of the parent line, and $1\* (patt3) should match the asterisks at the beginning of the child line. But it doesn't seem to be working. And when I add an alert to test for the value of patt3 or $1, the script crashes!
* Parent
** Child
If $1 includes asterisks in it, does it return those asterisks escaped?
Any ideas on how to solve my patt3 problem? The Transhumanist 12:14, 6 May 2017 (UTC)
\\* in a RegExp constructor or in this way /\*. –pjoef (talk • contribs) 12:26, 6 May 2017 (UTC)
var nodeScoop2 = new RegExp('\\n((\\*)+)*?\\))+'\\s*\\]\\].*?\\n(.*?\\n)','i');
if ($1) …. –pjoef (talk • contribs) 09:34, 7 May 2017 (UTC)
li.
$("a.new").replaceWith(function(){
if (this.title.startsWith("Category:"))
return null;
if (this.matches("li > :only-child"))
return null;
return this.textContent;
});
I got your message. It looks like you may have gotten the help you need. When working with RegExp, I like to try them on some sample strings to see what each one is actually matching, and what it's returning. There's a great website for doing that: regex101. Nathanm mn (talk) 16:12, 6 May 2017 (UTC)
Here is a sample item list:
What we want to do is remove the list entries for which the topic is a redlink, but which do not have annotations, and which do not have children. Then we delete redlinked categories, and delink whatever redlinks are left over: those will, by definition, be embedded, such as redlink 1 and redlink 3. Redlink 3 is embedded by virtue of having children.
Redlink 2 is a dead end. It is an end node in the tree structure that contains only a redlink. It gets deleted.
The script goes through the list multiple times, until it no longer finds dead end redlinks. This is because when it removes a redlinked end node, that may cause its redlinked parent to become a dead end node (such as when it has no other children). Multiple iterations catch these. So the entire branch starting with Redlink 10 will be deleted.
Here is the problem I've run into: the script currently and erroneously deletes the Redlink 3 list item, because $1\* or $1\\* do not seem to be identifying the Redlink 4 list item as having more asterisks in the wikisource than the Redlink 3 list item. I do not know why. What should happen is that Redlink 3 would be retained because of Redlink 4, and after Redlink 4 is removed, then Redlink 3 is checked again and is kept by virtue of having Psychology as a child. But, when Redlink 3 is deleted in error, it makes Psychology a child of Geology, thus ruining the tree structure.
All this processing is to be done in the editor, so that the redlinked entries are actually removed from the article.
I'm stuck! I look forward to your replies. The Transhumanist 23:00, 6 May 2017 (UTC)
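The iteration described in this message can be sketched as follows. Instead of back-referencing $1 inside a second regex, this version measures each line's asterisk depth and compares it numerically with the next line's depth. It is a sketch under simplifying assumptions, not the script's actual code: redlinks is assumed to be an array of page titles already known to be red, an entry counts as a bare redlink only if its line is nothing but asterisks and one unpiped [[Title]] link, and all function names are hypothetical. delinkRedlinks covers the final step (delinking what is left and deleting red category links):

```javascript
// Number of leading asterisks (0 for lines that are not list items).
function bulletDepth( line ) {
    var m = /^(\*+)/.exec( line );
    return m ? m[ 1 ].length : 0;
}

// True if the line is only bullets plus a single unpiped link to a red title.
function isBareRedlinkEntry( line, redlinkSet ) {
    var m = /^\*+\s*\[\[([^\]|]+)\]\]\s*$/.exec( line );
    return m !== null && redlinkSet.has( m[ 1 ].trim() );
}

// Repeatedly remove redlinked end nodes until a full pass removes nothing.
function removeRedlinkedEndNodes( wikitext, redlinks ) {
    var redlinkSet = new Set( redlinks );
    var lines = wikitext.split( '\n' );
    var removedAny;
    do {
        removedAny = false;
        var kept = [];
        for ( var i = 0; i < lines.length; i++ ) {
            var depth = bulletDepth( lines[ i ] );
            var nextDepth = i + 1 < lines.length ? bulletDepth( lines[ i + 1 ] ) : 0;
            var hasChild = nextDepth > depth; // next line is indented further
            if ( depth > 0 && !hasChild && isBareRedlinkEntry( lines[ i ], redlinkSet ) ) {
                removedAny = true; // drop this end node
                continue;
            }
            kept.push( lines[ i ] );
        }
        lines = kept;
    } while ( removedAny );
    return lines.join( '\n' );
}

// Final step: delink remaining red links; delete red category links outright.
function delinkRedlinks( wikitext, redlinks ) {
    var redlinkSet = new Set( redlinks );
    return wikitext.replace( /\[\[([^\]|]+)(?:\|([^\]]*))?\]\]/g, function ( whole, target, label ) {
        var title = target.trim();
        if ( !redlinkSet.has( title ) ) {
            return whole; // not red: leave the link alone
        }
        if ( title.indexOf( 'Category:' ) === 0 ) {
            return ''; // red category link: delete it
        }
        return label !== undefined ? label : target; // keep the visible text
    } );
}

var sample = [
    '* [[Geology]]',
    '** [[Redlink 3]]',
    '*** [[Redlink 4]]',
    '*** [[Psychology]]',
    '** [[Redlink 2]]',
    '** [[Redlink 1]] – an annotated entry',
    '* [[Redlink 10]]',
    '** [[Redlink 11]]',
    '[[Category:Red category]]'
].join( '\n' );
var red = [ 'Redlink 1', 'Redlink 2', 'Redlink 3', 'Redlink 4',
    'Redlink 10', 'Redlink 11', 'Category:Red category' ];
console.log( delinkRedlinks( removeRedlinkedEndNodes( sample, red ), red ) );
// * [[Geology]]
// ** Redlink 3
// *** [[Psychology]]
// ** Redlink 1 – an annotated entry
// (plus an empty last line where the red category link was)
```

Redlink 3 survives because Psychology is still indented beneath it, and the whole Redlink 10 branch disappears over two passes, as described above.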
RegExp.$1 (which will be a string containing the match), not just $1 – except for within String.replace function, when just $1 is used in the replacement string. Secondly, with regex literals, what you type is literally what you get as the regex string. So var patt3 = /$1\*/; will literally be interpreted as /$1\*/ (where $ asserts position at the end of the string; 1 matches the character 1; \* matches the character *).
var patt3 = new RegExp("\\*{"+(RegExp.$1.length+1)+"}"); which, for example, will give you the regex /\*{3}/ when the RegExp.$1 match is "**" - Evad37 04:59, 7 May 2017 (UTC)
var patt3 = new RegExp("$1\*");. Why won't that work? (That was the first thing I tried, before going literal). The Transhumanist 23:14, 7 May 2017 (UTC)
$1 as part of a string doesn't have any special meaning, except within the string .replace function. So var patt3 = new RegExp("$1\*"); would give you the regex /$1*/. To use the actual match instead of $1, you would use var patt3 = new RegExp(RegExp.$1 + "\*"); which would e.g. give you the regex /***/ for a match "**". To actually get valid regex, the match would have to be escaped (note also that the single backslash in "\*" doesn't get preserved unless it is double-escaped as "\\*"). - Evad37 23:55, 7 May 2017 (UTC)
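Evad37's approach can be sketched without the legacy RegExp.$1 property by taking the captured asterisks from the array that exec() returns and building the child pattern from their count. isChildLine is a hypothetical helper name; note that it checks for at least one more asterisk than the parent, so any deeper line counts as a child:

```javascript
function isChildLine( parentLine, nextLine ) {
    var parentMatch = /^(\*+)/.exec( parentLine );
    if ( parentMatch === null ) {
        return false; // the parent line is not a list item
    }
    var parentStars = parentMatch[ 1 ]; // what $1 would hold, e.g. '*'
    // Build ^\*{n+1} from the count; "\\*" in the string becomes \* in the regex.
    var childPattern = new RegExp( '^\\*{' + ( parentStars.length + 1 ) + '}' );
    return childPattern.test( nextLine );
}

console.log( isChildLine( '* Parent', '** Child' ) );                // true
console.log( isChildLine( '** [[Redlink 2]]', '* [[Redlink 10]]' ) ); // false
console.log( isChildLine( '* Geology', '* Chemistry' ) );            // false

// For comparison, the original attempt: "$1" is just text in a string,
// and the lone backslash before * is dropped by the string literal.
console.log( new RegExp( '$1\*' ).source ); // '$1*'
```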