User talk:The Transhumanist/StripSearchSorted.js

This is the workshop support page for the user script StripSearchSorted.js. Comments and requests concerning the program are most welcome. Please post discussion threads below the section titled Discussions. Thank you. By the way, the various scripts I have written are listed at the bottom of the page.[1]

Script's workshop

This is the work area for developing the script and its documentation. The talk page portion of this page starts at #Discussions, below.

Description / instruction manual

This script is operational, but there is a quirk in wikEd: When results are copied/pasted into wikEd, the results are erroneously double spaced. Clicking on undo in wikEd reverts it to single spaced as initially intended. (I don't know why. If you do, please tell me.)

StripSearchSorted.js: provides a menu item to strip search results down to bare page names, sort them alphabetically, and add bullet list wikicode formatting for easy copying and pasting into articles. The menu item is a toggle switch that turns this function on and off, and remembers its status for all searches. By default, just by being installed, the script removes from search results the redirected entries and members of matching categories (as they don't match the search string), even if you don't use the menu item. For Vector skin only.

In other words, when the menu item is turned on, this script reduces the search results to a list of links. It strips out the data between the page names, including that annoying "from redirect" note. It adds * [[]] to each entry and sorts them so they look like this:

* [[Brad Pitt]]
* [[Clint Eastwood]]
* [[Dwayne Johnson]]
* [[John Wayne]]
* [[Tom Cruise]]

The formatting makes it easier to copy and paste the links from search results into articles.

Once installed, the menu item "SR sort" will appear in the sidebar tools menu, showing which action it is ready to perform (either "turn on" or "turn off").

How to install this script

Important: this script was developed for use with the Vector skin (it's Wikipedia's default skin), and might not work with other skins. See the top of your Preferences appearance page, to be sure Vector is the chosen skin for your account.

To install this script, add this line to your vector.js page:

importScript("User:The Transhumanist/StripSearchSorted.js");

Save the page and bypass your cache to make sure the changes take effect. By the way, only logged-in users can install scripts.

Known issues

Quirk in wikEd: When results are copied/pasted into wikEd, the results are erroneously double spaced. Clicking on undo in wikEd reverts it to single spaced as initially intended. (I don't know why. If you do, please tell me.)

Explanatory notes (source code walk-through)

This section explains the source code, in detail. It is for JavaScript programmers, and for those who want to learn how to program in JavaScript. Hopefully, this will enable you to adapt existing source code into new user scripts with greater ease, and perhaps even compose user scripts from scratch.

You can only use so many comments in the source code before you start to choke or bury the programming itself. So, I've put short summaries in the source code, and have provided in-depth explanations here.

My intention is threefold:

  1. to thoroughly document the script so that even relatively new JavaScript programmers can understand what it does and how it works, including the underlying programming conventions. This is so that the components and approaches can be modified, or used again and again elsewhere, with confidence. (I often build scripts by copying and pasting code that I don't fully understand, which often leads to getting stuck.) To prevent getting stuck, the notes below include extensive interpretations, explanations, instructions, examples, and links to relevant documentation and tutorials. Hopefully, this will help both you and me grok the source code and the language it is written in (JavaScript).
  2. to refresh my memory of exactly how the script works, in case I don't look at the source code for weeks or months.
  3. to document my understanding, so that it can be corrected. If you see that I have a misconception about something, please let me know!

In addition to plain vanilla JavaScript code, this script relies heavily on the jQuery library.

If you have any comments or questions, feel free to post them at the bottom of this page under Discussions. Be sure to {{ping}} me when you do.

General approach

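The script uses the jQuery method .hide() for stripping elements out of view by class name. (.hide() doesn't delete the elements; it sets their CSS display property to none, so they stay in the page and can be shown again.) Here's an example of stripping out elements with the class name "searchalttitle":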

$( ".searchalttitle" ).hide();

Learn about methods at https://www.w3schools.com/js/js_object_methods.asp

Learn about .hide at http://api.jquery.com/hide/

Aliases

An alias is a short name that refers to the same thing as another name. Another term for "alias" is "shortcut". In the script, the following aliases are used:

$ is the alias for jQuery (the jQuery library)

mw is the alias for mediaWiki (the MediaWiki JavaScript library)

These two aliases are set up like this:

( function ( mw, $ ) {}( mediaWiki, jQuery ) );

That also happens to be a "bodyguard function", which is explained in the section below...

Bodyguard function

The bodyguard function assigns an alias to a name within the function, and reserves that alias for that purpose only (for example, if you want "t" to be interpreted only as "transhumanist" inside the function).

Since the script uses jQuery, we want to defend jQuery's alias, the "$". The bodyguard function makes it so that "$" means only "jQuery" inside the function, even if it means something else outside the function. That is, it prevents other JavaScript libraries from overwriting the $() shortcut for jQuery within the function. It does this via scoping: the function's parameter $ is a local variable, so it shadows any $ defined outside the function.

The bodyguard function is used like a wrapper, with the alias-containing source code inside it, typically, wrapping the whole rest of the script. Here's what a jQuery bodyguard function looks like:

( function($) {
    // you put the body of the script here
} ) ( jQuery );

See also: bodyguard function solution.

To extend that to lock in "mw" to mean "mediaWiki", use the following (this is what the script uses):

( function(mw, $) {
    // you put the body of the script here
} ) (mediaWiki, jQuery); // the global is mediaWiki (capital W); "mediawiki" would be undefined
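To see the scoping at work, here is a minimal sketch that runs in plain JavaScript (for example, in Node.js). The objects fakeJQuery and otherLibrary are hypothetical stand-ins, not real libraries. Whatever is passed in as the argument is what $ means inside the wrapper, regardless of what $ means outside it.

```javascript
// Hypothetical stand-ins for two libraries that both want to use the name "$".
var fakeJQuery = { name: 'jQuery' };
var otherLibrary = { name: 'otherLibrary' };

// Outside the wrapper, another library has claimed "$".
var $ = otherLibrary;

// The bodyguard function: its parameter "$" is a new local variable,
// so inside the function "$" refers to whatever was passed in.
var insideName = ( function ( $ ) {
    return $.name;
} )( fakeJQuery );

console.log( insideName ); // "jQuery"
console.log( $.name );     // "otherLibrary" (the outer $ is untouched)
```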

For the best explanation of the bodyguard function I've found so far, see: Solving "$(document).ready is not a function" and other problems (Long live Spartacus!)

The ready() event listener/handler

The ready() event listener/handler makes the rest of the script wait until the page (and its DOM) is loaded and ready to be worked on. If the script tries to do its thing before the page is loaded, there won't be anything there for it to work on (for example, a script that adds a menu item with mw.util.addPortletLink would have nowhere to put it), and the script will fail.

In jQuery, it looks like this: $( document ).ready(function() {});

You can do that in jQuery shorthand, like this:

$().ready( function() {} );

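Or even like this (as of jQuery 3.0, this short form is the recommended one; the other forms still work but are deprecated):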

$(function() {});

The part of the script that is being made to wait goes inside the curly brackets. But you would generally start that on the next line, and put the closing curly bracket, closing parenthesis, and semicolon on a line of their own, like this:

$(function() {
    // Body of function (or even the rest of the script) goes here, such as a click handler.
});

This is all explained further at the jQuery page for .ready()

For the plain vanilla version see: http://docs.jquery.com/Tutorials:Introducing_$(document).ready()
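For comparison, here is a sketch of the same idea without jQuery, using the standard DOMContentLoaded event. The helper name onDomReady is hypothetical (it is not part of the script or of MediaWiki). It checks document.readyState first, because a user script may be loaded after DOMContentLoaded has already fired, and a listener added at that point would never run.

```javascript
// Hypothetical helper: run `callback` once the DOM is ready.
// `doc` defaults to the page's document; it is a parameter only so the
// function can be tried outside a browser with a stand-in object.
function onDomReady( callback, doc ) {
    doc = doc || document;
    if ( doc.readyState === 'loading' ) {
        // The page is still being parsed: wait for the DOMContentLoaded event.
        doc.addEventListener( 'DOMContentLoaded', callback );
    } else {
        // 'interactive' or 'complete': the DOM is already available.
        callback();
    }
}

// Usage in a user script:
// onDomReady( function () {
//     // Body of function (or the rest of the script) goes here.
// } );
```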

Activation filters

I didn't know what else to call these. I wanted the program to only work when intended, and only on intended pages (search result pages). So, I applied the conditional, if, as follows...

Vector skin activation filter

I use the Vector skin, and haven't tested the script on any other skin, so the script basically says "if the Vector skin is in use, do what's between the curly brackets" (which includes the rest of the main program; note that functions, aka subroutines, follow after the main program).

	// Only activate on Vector skin
	if ( mw.config.get( 'skin' ) === 'vector' ) {
		// The rest of the script goes here
	}
mw.config.get( 'skin' )

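This looks up the value of the skin configuration variable (the internal name of the skin currently in use). MediaWiki passes configuration variables like this one to every page, and scripts read them with mw.config.get().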

Comparison operators

"===" is the strict equality operator: it means "equal value and equal type". Unlike "==", it does not convert types before comparing.
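A few comparisons that show the difference (runnable in any JavaScript console or in Node.js):

```javascript
// Strict equality (===): true only if both value and type match.
console.log( 'vector' === 'vector' ); // true
console.log( 1 === '1' );             // false: a number vs. a string

// Loose equality (==) converts types before comparing, which can surprise:
console.log( 1 == '1' );              // true
console.log( 0 == '' );               // true
console.log( null == undefined );     // true
```

This is why === is the usual choice for checks like the skin test above: 'vector' only matches the exact string 'vector'.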

Page title activation filter

		// Run this script only if " - Search results - Wikipedia" is in the page title
		if (document.title.indexOf(" - Search results - Wikipedia") != -1) {
			// The rest of the script goes here
		}
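The title check depends on the page title text, which differs between wikis and interface languages (" - Search results - Wikipedia" is specific to the English Wikipedia with an English interface). One possible alternative, sketched below, is MediaWiki's wgCanonicalSpecialPageName configuration variable, which holds the canonical (untranslated) name of the special page being viewed, and false on ordinary pages. The helper name isSearchPage is hypothetical; it takes the config object as a parameter so it can be tried with a stand-in outside the browser.

```javascript
// Hypothetical helper: true when the current page is Special:Search.
// `config` is anything with a get( name ) method, such as mw.config.
function isSearchPage( config ) {
    // On special pages this is the canonical name (e.g. 'Search');
    // on ordinary pages it is false.
    return config.get( 'wgCanonicalSpecialPageName' ) === 'Search';
}

// In the script, this would take the place of the document.title check:
// if ( isSearchPage( mw.config ) ) {
//     // The rest of the script goes here
// }
```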

Prep work

There is no prep work in this script. (Prep work would be things like declaring global variables.)

Core program

This is the part that controls the main flow of the script (decides what to do under what circumstances):

            if ( mw.config.get( 'skin' ) === 'vector' ) {
                $( function() {

                    // hide elements by class per http://api.jquery.com/hide
                    $( ".searchalttitle" ).hide();
                    $( ".searchresult" ).hide();
                    $( ".mw-search-result-data" ).hide();

                } );
            }

So, what this does is three things:

First, it checks if the Vector skin is being used and runs the rest of the script only if it is.

Then, $( function() { ... } ) makes the rest wait until the page's DOM is ready (see the ready() event listener/handler section above).

Finally, it applies the jQuery method .hide() to all elements that have any of these 3 classes: searchalttitle, searchresult, or mw-search-result-data.

To use a jQuery method, you append it to the end of a jQuery object (the result of a selector such as $( ".searchalttitle" )), as is done with .hide() 3 times above. Don't forget the parentheses, and be sure to end your statements with a semicolon.

Learn more about .hide at http://api.jquery.com/hide/

Strip out the sister project results

                    // hide elements of Results from sister projects (per http://api.jquery.com/hide)
                    $( ".iw-headline" ).hide();
                    $( ".iw-results" ).hide();
                    $( ".iw-resultset" ).hide();
                    $( ".iw-result__title" ).hide();
                    $( ".iw-result__content" ).hide();
                    $( ".iw-result__footer" ).hide();

I went through the page source looking for the classes of the data displayed in the right-hand column, and inserted them into the code above. (I assume "iw" stands for "interwiki".)

Add wiki formatting to the list items
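A sketch of this step, reduced to plain JavaScript so it can be tried outside the browser: take the page titles as an array of strings, sort them, and wrap each one in * [[ ]]. The function name toWikiList is hypothetical; in the script itself, the titles are collected from the search results with jQuery's .map() and .get(), as worked out in the Discussions below.

```javascript
// Hypothetical helper: turn page titles into bulleted wikilinks,
// sorted alphabetically, one per line.
function toWikiList( titles ) {
    return titles
        .slice()                 // copy, so the caller's array is not reordered
        .sort()                  // default sort compares UTF-16 code units
        .map( function ( title ) {
            return '* [[' + title + ']]';
        } )
        .join( '\n' );
}

console.log( toWikiList( [ 'Tom Cruise', 'Brad Pitt', 'John Wayne' ] ) );
// * [[Brad Pitt]]
// * [[John Wayne]]
// * [[Tom Cruise]]
```

The default .sort() orders by character code, so uppercase letters sort before lowercase ones; passing a comparator such as function ( a, b ) { return a.localeCompare( b ); } is one alternative if that matters.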

Change log

  • 2017-10-27
  • 2017-11-05
    • Evad37 provided the sequence for sorting the search results
  • 2017-11-09
    • Add toggle switch (dual menu item)
    • Apply class of "Stripped" to the modified results, so that they can be removed to make way for original results
    • Make switch swap out results between original and modified, and vice versa
  • 2018-01-20
    • Added TrueMatch function (intitle bug workaround)
      • Evad37 provided the 2 key lines

Task list

Bug reports

Desired/completed features

Completed features are marked with Done.

Improvements that would be nice:

  • True Match (built-in intitle fix) – intitle doesn't work right: it ignores common words, so results turn up without the specified search term. This feature would discard all the results that don't match the search term (which the search feature should have done in the first place). (And since the results will all be in an array anyway, this should be easy to implement.)

Development notes

Implementing True Match

Run the function if Title includes "intitle:"

Parse the title with regex to get the intitle string. The string may be a single word or a phrase within double quotation marks, so use the regex pipe (|) for "or".

Then keep only the search results that include that string. One way to do this is to use a regex that inverse-matches via a negative lookahead: ^((?!hi there).*)$ (with the m flag, so that ^ and $ apply to each line) will match any line not containing "hi there". Those are the lines we want to remove.

See annotationToggle for how to wrap entries in classed span tags, and then hide those spans. But do it with jQuery instead.
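A sketch of the steps above, assuming the search query is available as a plain string (in the browser it could be read from the search box or the page title; that part is not shown). getIntitleTerm and keepMatching are hypothetical names. The regex uses the pipe to accept either a double-quoted phrase or a single word after intitle:. Instead of the negative-lookahead pattern, it keeps the matching titles directly, which has the same effect as removing the non-matching ones.

```javascript
// Hypothetical helper: pull the term out of a query such as
//   intitle:"of Boston"   or   intitle:Boston
// Returns null when the query has no intitle: part.
function getIntitleTerm( query ) {
    // Group 1: a double-quoted phrase; group 2: a single unquoted word.
    var match = /intitle:(?:"([^"]+)"|(\S+))/i.exec( query );
    if ( match === null ) {
        return null;
    }
    return match[ 1 ] !== undefined ? match[ 1 ] : match[ 2 ];
}

// Hypothetical helper: keep only the titles that contain the term
// (case-insensitive substring match; word boundaries are ignored).
function keepMatching( titles, term ) {
    if ( term === null ) {
        return titles; // no intitle: in the search, so keep everything
    }
    var needle = term.toLowerCase();
    return titles.filter( function ( title ) {
        return title.toLowerCase().indexOf( needle ) !== -1;
    } );
}

var term = getIntitleTerm( 'intitle:"of Boston" history' );
console.log( term ); // "of Boston"
console.log( keepMatching( [ 'History of Boston', 'Boston Harbor' ], term ) );
// [ 'History of Boston' ]
```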

Adding the wikicode

Evad37 nailed it in discussion below

The elements that I wish to change have the class mw-search-result-heading.

Each one has an anchor element within it. Perhaps those can be sandwiched with the desired wikicode (between the double square brackets).

removing the redirected entries

Evad37 nailed it in discussion below

Maybe using .splice could work, if regex could be applied somehow.

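     for (var i = 0; i < x.length; i++) {
        // if the current array item matches "searchalttitle":
            // x.splice(i, 1);  // remove just this one item (x.splice(i) would remove everything from i onward)
            // i--;             // step back one (note that "i = i--" leaves i unchanged)
    }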

In the loop above, splicing (removing) the current item would shift the next item into its position. When the loop iterates to the next item, it will have inadvertently skipped one. After splicing, you'd have to decrement i by one.

Or use forEach, and...

push all non-matches in a new array, and at the end of forEach replace the original array with the new one.

Or, using a standard for loop...

iterate over the array index and decrease the loop index i-- whenever you find a match
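A sketch of two of the approaches above, run on a plain array of strings. The sample data and the "(redirect)" test are made up for illustration; in the real script the items would come from the search results. The first approach splices in place and steps the index back after each removal; the second uses the built-in Array.prototype.filter(), which builds the new array of non-matches for you.

```javascript
// Made-up sample data: entries containing "(redirect)" should be removed.
var results = [ 'Genre', 'Genre (redirect)', 'Genre art', 'Music (redirect)' ];

// Approach 1: splice in place, then step the index back after each removal,
// so the item that shifts into position i is not skipped.
var inPlace = results.slice(); // work on a copy
for ( var i = 0; i < inPlace.length; i++ ) {
    if ( inPlace[ i ].indexOf( '(redirect)' ) !== -1 ) {
        inPlace.splice( i, 1 ); // remove exactly one item at index i
        i--;
    }
}

// Approach 2: keep the non-matches in a new array.
var filtered = results.filter( function ( item ) {
    return item.indexOf( '(redirect)' ) === -1;
} );

console.log( inPlace );  // [ 'Genre', 'Genre art' ]
console.log( filtered ); // [ 'Genre', 'Genre art' ]
```

The filter() version avoids index bookkeeping altogether, which is why it is usually the simpler choice.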

more solutions

Improve the way the script hides

Seems like you could hide each entire search result and then unhide the element of interest, which is the pagename. --Izno (talk) 13:17, 29 September 2017 (UTC)

Get rid of the extraneous linefeeds

The search results are double spaced, which shows up as a blank line between each list item when you cut and paste to an edit window.

First, it might help to be able to see the control characters (like linefeed, \n). One way to look for them is with this:

// Inspect the raw text, so you can look for \n linefeeds
$(".mw-search-results").each(function(index) {
    let mwsr_text = JSON.stringify($(this).text());
    alert(mwsr_text);
});

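This showed the text, but didn't show the linefeeds (\n). Logically, they must be there: the linefeed characters don't show up in the editor I cut and pasted them into, but the editor's search/replace is still able to find/replace them. Therefore, it might be possible to use regex in JS to get rid of them on the web page. (One caveat: JSON.stringify() escapes newline characters, so any \n in the text should have appeared in the alert as \n. If none appeared, the blank lines may instead be added by the browser when it turns the list's block elements into plain text during copying.)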

So, I tried the following code to remove linefeeds (\n), but it didn't work.

var str = $(".mw-search-results").html();
var regex = /\n/gi;
$(".mw-search-results").html(str.replace(regex, ""));

I tried it on \s, and it got rid of the linefeeds along with all the other white space characters. That suggests the linefeeds may be specifically accessible.
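Since /\n/ found nothing but /\s/ did, one way to narrow it down is to list exactly which whitespace characters the text contains. In JavaScript, \s matches not only space, \t, \n, \r, \f and \v but also other Unicode whitespace such as the non-breaking space (U+00A0). The helper listWhitespace is hypothetical; in the browser it could be called on something like $( ".mw-search-results" ).text().

```javascript
// Hypothetical helper: list every whitespace character in a string as a
// U+XXXX code, so invisible characters (\n, \r, non-breaking space, ...) can be told apart.
function listWhitespace( text ) {
    var found = [];
    for ( var i = 0; i < text.length; i++ ) {
        if ( /\s/.test( text[ i ] ) ) {
            var hex = text.charCodeAt( i ).toString( 16 ).toUpperCase();
            found.push( 'U+' + hex.padStart( 4, '0' ) );
        }
    }
    return found;
}

console.log( listWhitespace( 'a\r\nb\u00a0c' ) ); // [ 'U+000D', 'U+000A', 'U+00A0' ]
```

Once the culprit is known, a narrower regex (for example /\r/g or /\u00a0/g) can target it without turning everything else to mush.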

Discussions

Post messages below.

Script to format search results as a list of page names with bullet list wikicode provided

(Originally posted to User:Evad37).

I've written a script called StripSearch.js that unclutters search results to make them bare lists of page names.

Now I'm writing a sequel to it called StripSearchInWikicode.js.

I would like the output of search results to look like this:

* [[Benjamin Franklin]]
* [[Larry Page]]
* [[Carl Sagan]]
* [[Hillary Clinton]]
* [[Warren Oates]]

...for easy copying and pasting into articles.

I'm having trouble manipulating the elements of class "mw-search-result-heading".

I gather that you put them into an array like this:

var x = document.getElementsByClassName("mw-search-result-heading");

I'd like to subject the items in that array to a regex, using the jQuery .each method, or the .each function, but I don't know how. The documentation is confusing as hell.

I think the search string (<a.*a>) and replacement string * [[$1]] ought to work.

Any pointers would be most appreciated.

Sincerely, The Transhumanist 12:58, 29 September 2017 (UTC)

@The Transhumanist: You don't really need anything that complicated – you can just insert content before and after each element with class "mw-search-result-heading" using jQuery's prepend and append methods:
$(".mw-search-result-heading").prepend('* [[').append(']]');
just about does the trick. - Evad37 [talk] 13:51, 29 September 2017 (UTC)
Or even better
$(".mw-search-result-heading").children().before('* [[').after(']]');
(this avoids leaving a space before the ]]) - Evad37 [talk] 13:53, 29 September 2017 (UTC)
You are right, the first method would be perfect if it didn't insert an extraneous space.
The second method inserts * [[]] unexpectedly on the same line after various entries, like this (searched for "genre"):
* [[Genre]]
* [[Genre art]]
* [[Rapping]] * [[]]
* [[Pop music]] * [[]]
* [[Trap music]] * [[]]
Is there a way to apply regex, to avoid both problems?
Another feature I would love the script to have is to strip redirected entries out of the search results. Those are the mw-search-result-heading entries that include searchalttitle inside their divs. I would like to remove just those instances of mw-search-result-heading.
Adding that feature would probably also solve the bug in the second method you presented above.
Would .not work for this, to hide divs with the class mw-search-result-heading except for those that do not contain searchalttitle?
Unfortunately, I don't know how to apply regex to facilitate matches for this type of thing. I can construct regex strings, I just don't know how to put them into play.
Forgoing jQuery, I think a for loop could be set up like this:
    // Strip out redirected entries
    var x = document.getElementsByClassName("mw-search-result-heading");
    for (i = 0; i < x.length; i++) {
        // somehow remove this entry if
        // it contains element of class "searchalttitle"
    }
But I don't know how to write the guts.
By the way, the script failed when I ran it with that empty for loop, and it failed when I tried sorting the array, like this:
    // Sort the search results
    var x = document.getElementsByClassName("mw-search-result-heading");
    x.sort();
It's enough to make one's head spin. :) The Transhumanist 20:50, 29 September 2017 (UTC)
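The x.sort() failure has a straightforward cause: document.getElementsByClassName() returns an HTMLCollection, not an array. It is array-like (it has a length and numbered items) but has none of the array methods such as sort(), so calling x.sort() throws a TypeError. Array.from() copies an array-like into a real array. A minimal sketch, using a plain object as a stand-in for the collection so it runs outside the browser:

```javascript
// Stand-in for what getElementsByClassName() returns: array-like, not an array.
var arrayLike = { length: 3, 0: 'Tom Cruise', 1: 'Brad Pitt', 2: 'John Wayne' };

console.log( typeof arrayLike.sort ); // "undefined", so arrayLike.sort() would throw

// Copy into a real array first; then the array methods are available.
var titles = Array.from( arrayLike ).sort();
console.log( titles ); // [ 'Brad Pitt', 'John Wayne', 'Tom Cruise' ]

// In the browser, sort by the elements' text rather than the elements themselves:
// var headings = Array.from( document.getElementsByClassName( 'mw-search-result-heading' ) );
// var names = headings.map( function ( el ) { return el.textContent; } ).sort();
```

Sorting the array does not reorder anything on the page; the sorted titles still have to be written back into the page, which is what the later .map()/.get()/.sort() chain on this page does.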
Loops and regex aren't always the best tools, especially when working with collections of elements. jQuery has several ways to filter and refine results. One way would be to only apply * [[]] to the first-child elements within .mw-search-result-heading like so:
$(".mw-search-result-heading").children().filter(':first-child').before('* [[').after(']]');
Another way, like you alluded to above, is to first remove the searchalttitle elements, and then the * [[]] can be added safely:
$(".searchalttitle").remove();
$(".mw-search-result-heading").children().filter(':first-child').before('* [[').after(']]');
Or to remove instances of mw-search-result-heading which contain searchalttitle you can use .has():
$(".mw-search-result-heading").has(".searchalttitle").remove();
$(".mw-search-result-heading").children().before('* [[').after(']]');
Which can also be written slightly more succinctly like so:
$(".mw-search-result-heading").has(".searchalttitle").remove().end().children().before('* [[').after(']]');
Note that you can use .hide() instead of .remove() if you want to be able to show those elements again at some point. - Evad37 [talk] 02:52, 30 September 2017 (UTC)

Wow. You make it look so easy. So, you chain methods to a selector. Nice. That sure is convenient. jQuery is simpler than I thought. When you chain methods to a selector, they work on all the elements it matches. I was doing that with hide, but was just copying the examples and didn't really grasp the underlying structure. Thank you. And in retrospect, with loops and regex, it looks like I was trying to conduct surgery with an ice cream scoop. :)

I try to follow along in the documentation during these discussions, so that I can grasp the jargon. While doing so, I noticed this:

$(".searchalttitle").remove();
$(".mw-search-result-heading").children().filter(':first-child').before('* [[').after(']]');

can be refactored to this:

$(".searchalttitle").remove();
$(".mw-search-result-heading").children(':first-child').before('* [[').after(']]');

It seems to work!

The script is now operational, thanks to you. But I came across an unforeseen obstacle. The results look great on the search results page, but when you copy and paste them into an edit page, there is a blank line between all the entries. That requires the user to regex them all out in wikEd. I'd like to eliminate that manual operation by removing the blank lines in the search results.

Also, when we remove the .mw-search-result-heading entries that contain .searchalttitle, additional blank lines are left behind. Is that a clue that can help us track those newlines (\n) down?

It is not apparent where the newlines are inserted in the page source for the search results page. So, I assume they are specified on a style sheet somewhere. What is the most effective way to hunt down the style sheet which defines a particular class used on a Wikipedia page? The Transhumanist 21:24, 30 September 2017 (UTC)

It all seems to be very much browser dependent. Chrome gives me the expected result:
There is a page named "Genre" on Wikipedia
* [[Genre]]
* [[Yuri (genre)]]
* [[Film genre]]
* [[Literary genre]]
* [[Harem (genre)]]
* [[Genre studies]]
* [[Music genre]]
* [[Western (genre)]]
* [[Bara (genre)]]
* [[Genre fiction]]
* [[Biblical genre]]
* [[Epic (genre)]]
* [[Genre art]]
* [[Thriller (genre)]]
Firefox adds spaces at the start of each line:
 There is a page named "Genre" on Wikipedia

    * [[Genre]]
    * [[Yuri (genre)]]
    * [[Film genre]]
    * [[Literary genre]]
    * [[Harem (genre)]]
    * [[Genre studies]]
    * [[Music genre]]
    * [[Bara (genre)]]
    * [[Western (genre)]]
    * [[Genre fiction]]
    * [[Biblical genre]]
    * [[Epic (genre)]]
    * [[Genre art]]
    * [[Thriller (genre)]] 
IE adds several newlines between each item:
There is a page named "Genre" on Wikipedia

* [[Genre]] 




* [[Yuri (genre)]] 




* [[Film genre]] 




* [[Harem (genre)]] 




* [[Literary genre]] 




* [[Genre studies]] 







* [[Music genre]] 




* [[Bara (genre)]] 




* [[Western (genre)]] 




* [[Genre fiction]] 











* [[Biblical genre]] 




* [[Epic (genre)]] 



* [[Thriller (genre)]] 







* [[Genre art]] 

That's all on Windows 7. And you're presumably using some other browser/OS combination. Not really sure what the solution is though. - Evad37 [talk] 03:28, 1 October 2017 (UTC)
Since the removed items each leave behind a newline, my guess is that it's one newline per div. But what div? There is other formatting there, including alternating background colors, and a solid border between entries. If I can remove the divs that the removed entries were in, that might get rid of some of the extraneous new lines. The rest I won't know until I get a look at the style sheets. But I can't find the style sheets. Is there a way to trace a class back to the style sheet it is defined on? The Transhumanist 04:54, 1 October 2017 (UTC)
I got rid of the blank lines for the removed items by changing one of your lines of code to this:
// nuke "li" instead of ".mw-search-result-heading"
$("li").has(".searchalttitle").remove();
I'm wondering why the double spacing (extra newline) between list items doesn't show up in the page source. In wikEd, newline characters ("\n") are invisible, but its regex feature finds/replaces them anyway. Maybe the same concept can be applied. The Transhumanist 05:40, 1 October 2017 (UTC)
I tried this to get rid of each \n, and it didn't work:
var str = $(".mw-search-results").html();
var regex = /\n/gi;
$(".mw-search-results").html(str.replace(regex, ""));
But then I tried it on \s instead, and it got rid of the extra linefeeds (along with all other white space, turning the entries to mush -- separated list items of mush! This shows that the extraneous linefeeds are potentially specifically accessible.). Any ideas? The Transhumanist 08:35, 1 October 2017 (UTC)
Tracing styles: A lot of browsers have Web development tools ("dev tools" or "inspectors" or similar) that can show what styles an element currently has, and where they come from (e.g. in Chrome you can right-click on an area you're interested in and select Inspect).
Regex: \s matches [\r\n\t\f\v ] plus other Unicode whitespace such as the non-breaking space (\u00a0), so one of those should work. There are various regex-testing websites you could use to test, analyse, explain, and experiment with regex patterns – I use https://regex101.com/ (just need to make sure the 'flavor' is javascript), but there are others out there.
I'm wondering why the double spacing (extra newline) between list items doesn't show up in the page source. – Since I didn't have the problem with Chrome on Win 7, and FF/IE had different problems to what you're describing, I think it's basically down to browser bugs (or "features") – possibly MediaWiki is serving up (or the JavaScript modification is making) non-standard/non-compliant code, and the browsers have to decide for themselves how to handle it (thus some insert phantom spaces, others don't). - Evad37 [talk] 13:21, 4 October 2017 (UTC)

Fixing the double-spacing problem, and sorting it too

User:The Transhumanist/StripSearchInWikicode.js – the recent script you helped me on, which strips WP search results down to a bare list of links, and inserts wikilink formatting for ease of insertion of those links into lists. This is useful for gathering links for outlines. It still has the interlaced CR/LFs problem. Aside from that, I'd like this script to sort its results. So, if you know how, or know someone who knows how, please let me know.

The Transhumanist, I've had a thought on how to fix the stripsearch script: What it should do is make an array containing the search result titles - which can be sorted and otherwise manipulated using standard array methods - and then remove all the search result stuff, and rebuild the links from the array in the format you want. jQuery's .map() or .get() functions should be able to make the array. - Evad37 [talk] 00:34, 27 October 2017 (UTC)
Thank you for the guidance. How would you "rebuild the links from the array"? The Transhumanist 02:19, 27 October 2017 (UTC)
You can make links from page titles using code like I've got in User:Evad37/extra.js's makeLink function. But in your case you need to also surround the link with *[[ and ]], and have the whole thing within a block tag like <div> or <p>. Do that for each item in the array, and then you can add them all to (or next to) an element on the page using a jQuery method like .before(), .after(), .prepend(), or .append(), each of which can take an array as the input. - Evad37 [talk] 02:40, 27 October 2017 (UTC)

Adding a filter to StripSearchSorted.js

Originally posted to Evad37's talk page:

There's a really annoying design flaw in WP's search's intitle feature. Common words like "of" are ignored, even though they are included within a quoted phrase. So, intitle:"of Boston" is interpreted as just intitle:Boston. And the search results are filled with non-matching results. To make matters worse, the search results include matches of the phrase in the contents of pages, watering the results down even more to include pages that don't even have "Boston" in the title. What I need is for results to strictly match the term provided after "intitle:".

For StripSearchSorted.js, you wrote a long sequence of chained methods (which I modified ever so slightly):

// Replace the search results by hiding the original results and use .after to insert a modified version of those results
            $('ul.mw-search-results').hide().after(
                $('<div id="Stripped"></div>').append(
                    $('ul.mw-search-results')
                    .children()
                    .map( function() {
                        return $(this).find('a').text();
                    })
                    .get()
                    .sort()
                    .map( function(title) {
                        return $('<div></div>').append(
                            '* [[',
                            $('<a>').attr({
                                'href':'https://en.wikipedia.org/wiki/'+mw.util.wikiUrlencode(title),
                                'target':'_blank'
                            }).text(title),
                            ']]'
                        );
                    })
                )   
            );

Is it possible to continue adding to this chain in order to filter the array down to elements that only include the intitle search string?

Assume we've put the search string into a variable, say var intitleString;

After the closing parenthesis (included below), the .filter chain continuation might look something like this:

).filter(function () { return this.
})

The problem is, I don't know what to put after "this." to match intitleString. I know regex generally speaking, but I don't know how to include it in a chain, or how to match a variable with it.

By the way, would this nuke the array if intitle wasn't specified in the search? Can an if control structure be put in a chain? (Like: If "intitle" is in the title, do this). The Transhumanist 12:05, 7 December 2017 (UTC)

Filtering is possible, but it's easier to do the filtering before the .map(), because at that stage you have a plain array of strings (each of which is a title), rather than an array of jQuery objects (which you have to drill down into to get the title string). When filtering on a plain JavaScript array, don't use this (that only works with jQuery objects) – the basic syntax is
newArray = oldArray.filter(function(arrayElement) {
    // do stuff, and return true (or a truthy value) to keep the array element,
    // or return false (or a falsey value) to remove the array element
});
To check if a string contains a test string, you can do string.indexOf(testString), which returns -1 if not found, or the position where it is found. To convert to a true/false value, you just do string.indexOf(testString) !== -1. That's for case-sensitive results, and doesn't care about word boundaries. To do more advanced matching, you have to make a regex object, and then test for a match using regex.test(string), which returns true or false accordingly.
So putting it all together, elsewhere in your code you make your regex pattern var intitlePatt, then
// ... same as code block above ...
                    .get()
                    .filter(function(title) {
                        return intitlePatt.test(title);
                    })
                    .sort()
// ... same as code block above ...
To stop things blowing up, you just have to make sure everything passes the filter when there's no intitle: in the search, i.e. set intitleString = '' or intitlePatt = /./ for that case. You can't really have control structures in a chain: you would have to store the intermediate value of the chain in a variable, then put in the control structure, and resume the chain from the intermediate variable, like this:
var intermediateFoo = $(foo).bar().baz();
if (condition1) {
    intermediateFoo.qux().foobar().barbaz();
} else {
    intermediateFoo.barbaz();
}
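A runnable version of the store-then-branch pattern, using plain arrays instead of jQuery. The function name processTitles and its arguments are hypothetical; the point is only that the chain is split at a variable, an if decides what happens next, and the chain resumes from that variable.

```javascript
// searchTitles and intitle are hypothetical stand-ins for the script's data.
function processTitles(searchTitles, intitle) {
    // First part of the chain: trim whitespace from each title.
    var intermediate = searchTitles.map(function (t) { return t.trim(); });
    var result;
    if (intitle) {
        // Only filter when an intitle: phrase was given.
        result = intermediate
            .filter(function (t) { return t.indexOf(intitle) !== -1; })
            .sort();
    } else {
        // No phrase: skip the filter and just sort.
        result = intermediate.sort();
    }
    return result;
}
```

For example, processTitles([' B of X ', 'A', 'C of X'], 'of X') returns ['B of X', 'C of X'], and processTitles(['b', 'a'], null) returns ['a', 'b'].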
- Evad37 [talk] 02:34, 8 December 2017 (UTC)
So, let me see if I got this straight...
You store the search's intitle value in a variable, and if there isn't one, the variable's value would just be null.
Then, in the chain, filter out non-matching entries. If the variable has a null value, meaning that intitle wasn't included in the search, all entries would match.
Is that correct? The Transhumanist 21:50, 10 December 2017 (UTC)
Not quite... null doesn't match anything, so no entries would match. To get all entries to match, you either have to set the variable to something that does actually match any entry (intitleString = '' or intitlePatt = /./ depending on whether you use indexOf or regex matching inside the filter); or else have an explicit check inside the filter which will just return true if the variable is null. - Evad37 [talk] 03:02, 11 December 2017 (UTC)
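A quick way to see the difference between the "match everything" values and null. The titles are sample data. One detail worth knowing: indexOf converts its argument to a string, so indexOf(null) actually searches for the text "null", which is why it matches nothing in ordinary titles.

```javascript
var titles = ['History of Australia', 'Flag of Canada'];   // sample data

// indexOf('') is 0 for any string, so every title passes.
var passAllString = titles.filter(function (t) { return t.indexOf('') !== -1; });

// /./ matches any single character, so every non-empty title passes.
var passAllRegex = titles.filter(function (t) { return /./.test(t); });

// null is converted to the string "null", so no title here passes.
var passNull = titles.filter(function (t) { return t.indexOf(null) !== -1; });
```

passAllString and passAllRegex both keep both titles; passNull is empty.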
So, there is no way to match null in regex? So you can't match null or whatever the string is, using the pipe character? The Transhumanist 04:26, 11 December 2017 (UTC)
If you want to check if a variable is null or undefined, just do someVar == null (gives true if someVar is null/undefined, false otherwise). You can combine this with other logical tests using || , && , ! as usual. - Evad37 [talk] 04:37, 11 December 2017 (UTC)
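Combining that null check with the filter looks roughly like this. filterByIntitle is a hypothetical helper name; because || short-circuits, indexOf is never called when intitle is null or undefined, so a missing phrase keeps every entry.

```javascript
// Hypothetical helper: keep every title when intitle is null/undefined,
// otherwise keep only titles containing intitle.
function filterByIntitle(titles, intitle) {
    return titles.filter(function (title) {
        // intitle == null is true for both null and undefined
        return intitle == null || title.indexOf(intitle) !== -1;
    });
}
```

For example, filterByIntitle(['a of b', 'c'], null) keeps both entries, while filterByIntitle(['a of b', 'c'], 'of') keeps only 'a of b'.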

Adding TrueMatch to StripSearchSorted

Originally posted to Evad37's talk page

I'm in the process of trying to fix the intitle bug in Wikipedia's search, by providing the solution as a function within StripSearchSorted.js.

The intitle bug is that when you enter a search phrase containing a common word in WP's search box (like this: intitle:"in Germany"), the titles in the search results don't all contain that phrase.

I'm almost done, but I can't figure out how to get :contains to accept a variable:

        function TrueMatch() {
            // The purpose of this function is to filter out non-matches

            // Activation filter:
            // Run this function only if 'intitle:"' is in the page title
            // Notice the lone " after intitle:
            if (document.title.indexOf('intitle:"') != -1) {

                // Body of function
                // Create variable with page title
                var docTitle = document.title;

                // Display on screen for checking
                //alert ( docTitle );

                // Extract the intitle search string from the html page title
                // We want the part between the quotation marks
                var regexIntitle = new RegExp('intitle:"(.+?)(")(.*)','i');
                var intitle;
                intitle = docTitle.replace(regexIntitle,"$1");
                //alert ( intitle );
               
                // Filter out search results that do not match
                $( "li" ).not( 'li:contains(" + intitle + ")' ).remove();               
            }
        }

It works fine up until that last line. I want to remove all li elements that do not contain the text in the intitle variable. The Transhumanist 07:51, 19 January 2018 (UTC)
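As a side note on the extraction step in TrueMatch: replace() only swaps out the matched part, so any text before intitle: would stay in the result, and a title without a match comes back unchanged. A sketch using match() instead returns just the quoted phrase, or null when there is none. The sample page title format below is an assumption for testing, and extractIntitle is a hypothetical name.

```javascript
// Hypothetical helper: return the phrase between the quotes after intitle:,
// or null if the page title contains no intitle:"..." phrase.
function extractIntitle(pageTitle) {
    var m = pageTitle.match(/intitle:"([^"]+)"/i);
    return m ? m[1] : null;
}
```

With an assumed title of 'intitle:"of Australia" - Search results - Wikipedia' this returns 'of Australia'; for a title with no intitle: phrase it returns null, which also pairs naturally with the == null check discussed earlier.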

Instead of passing a single string for the selector, you need to build a string up around the variable:
$( "li" ).not( 'li:contains("' + intitle + '")' ).remove();
The 'li:contains("' + intitle + '")' gets processed first, and the result is passed through to .not(). Or if you wanted to be a bit more explicit, you could do something like
var intitle_selector = 'li:contains("' + intitle + '")';
$( "li" ).not( intitle_selector ).remove();
- Evad37 [talk] 13:59, 19 January 2018 (UTC)
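The difference between the two selector strings can be checked without any page at all, since it is just string building. In the original version the whole thing sits inside one pair of single quotes, so the word intitle is literal text rather than the variable.

```javascript
var intitle = 'of Australia';   // example value

// Everything is inside one pair of single quotes, so no variable is used:
var literal = 'li:contains(" + intitle + ")';

// Close the string, concatenate the variable, then open a new string:
var built = 'li:contains("' + intitle + '")';
```

Here literal is li:contains(" + intitle + ") and built is li:contains("of Australia").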
I tried both methods you posted above. I tested it on intitle:"of Australia". The script runs, and strips out the details as it is supposed to. And it is sorting the results. But it isn't removing the non-matches. It's like it's matching everything, and therefore removing nothing. (When it matches nothing, like in my version above, it removes everything, leaving the results blank). I reactivated the alerts, and those show up fine. It's still the last line that isn't working. When you replace it with $( "li" ).remove();, it removes all results. The Transhumanist 02:16, 20 January 2018 (UTC)
Testing further on the "of Australia" search...
$( "li").not('li:contains( "of" )').remove(); resulted in blank results (i.e., none).
$( "li").not('li:contains( of )').remove(); removed nothing (i.e., results unaffected).
So, it looks like the first one is matching nothing, causing all li elements to be removed, while the second one is matching everything, causing no li elements to be removed. The Transhumanist 03:11, 20 January 2018 (UTC)


I stared at the current page source, and discovered spans with the class "searchmatch", the contents of which appear to have been causing false matches. So, I blasted those with:
   // First, strip out the searchmatch class elements (they match).
  $( 'li').find( '.searchmatch').remove();
Then, with the above line in place, I tested your solution again, but it didn't work:
$( "li" ).not( 'li:contains("' + intitle + '")' ).remove();
The results turned up blank, which means it removed everything.
Ironically, doing the opposite works:
$( 'li:contains("' + intitle + '")').remove();
Unfortunately, this removes precisely the entries the user wants to keep. The Transhumanist 08:26, 20 January 2018 (UTC)
I think we need to be more specific, and target the main link of each result, since the value of intitle will be somewhere within the li, just not necessarily in the title. Plus we can limit the searching of lis to just the search results, rather than the whole page:
// Mark true results with a class
$('.mw-search-results').find('li').has( 'div > a:contains("' + intitle + '")' ).addClass('truematch');
// Remove other results
$('.mw-search-results').find('li').not('.truematch').remove();
Which basically means: In the mw-search-results, find the lis which have a div that itself has (as a direct child element) an a that contains the text intitle, and add the class truematch to those lis. Then, in the mw-search-results, find the lis which do not have the class truematch, and remove them. - Evad37 [talk] 09:25, 20 January 2018 (UTC)
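The mark-then-remove idea can be modelled on plain objects to see why targeting the main link matters. Each object below is a hypothetical stand-in for one result li: title plays the role of the text of its main link, snippet the rest of the li. The phrase appears in Canberra's snippet but not its title, so it is dropped, just as the truematch version does on the real page.

```javascript
// Hypothetical stand-ins for search-result <li> elements.
var results = [
    { title: 'History of Australia', snippet: 'The history of Australia ...' },
    { title: 'Canberra', snippet: 'capital city of Australia ...' },
    { title: 'Flag of Australia', snippet: 'national flag ...' }
];
var intitle = 'of Australia';

// Pass 1: mark results whose title (not the snippet) contains the phrase.
results.forEach(function (r) {
    r.truematch = r.title.indexOf(intitle) !== -1;
});

// Pass 2: keep only the marked results.
var kept = results.filter(function (r) { return r.truematch; });
```

kept ends up holding the 'History of Australia' and 'Flag of Australia' entries.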
That did the trick. It works beautifully. Thank you.
This leads the way to the development of two related programs:
  1. StripSearchFilter.js – will allow the user to enter additional search terms to filter down the results, including a term to keep or a term to discard. It can be used multiple times to further refine the result.
  2. SearchSuite.js – will put selected features on their own switches so they can be turned on and off. Like the details stripping, and the inserted wikicode. It will also include the search filter feature mentioned above.
I'll keep you posted. The Transhumanist 11:28, 20 January 2018 (UTC)
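For the keep/discard idea in the planned StripSearchFilter.js, one possible shape for the core step is sketched below. refine() and its keep/discard options are hypothetical names, not part of any existing script; each call returns a new, narrower array, so it can be applied repeatedly.

```javascript
// Hypothetical refinement step: keep titles containing options.keep (if given)
// and drop titles containing options.discard (if given).
function refine(titles, options) {
    return titles.filter(function (title) {
        var keepOk = !options.keep || title.indexOf(options.keep) !== -1;
        var discardOk = !options.discard || title.indexOf(options.discard) === -1;
        return keepOk && discardOk;
    });
}

var titles = ['History of Australia', 'Flag of Australia', 'History of Austria'];
// Each call narrows the previous result.
var step1 = refine(titles, { keep: 'History' });
var step2 = refine(step1, { discard: 'Austria' });
```

step1 keeps the two History titles; step2 then drops 'History of Austria', leaving ['History of Australia'].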