Wikipedia talk:Database queries


Requests for further queries


Those who want an example of how a particular search is done (for example, a search they would like to make but are unable to) can add a request here; those who know how can add it as an example on the subject page, or simply delete the request if they know the search is impossible.


Also interesting: how do I know all the tables' names? It's only possible to do SELECTs, so I can't even do something like "SHOW TABLES" or "DESCRIBE (Table X)" (assuming it's MySQL we use). Jeronimo

You'll probably only need the cur table, which contains the current versions of the articles. "DESCRIBE cur" returns:
Field             Type                 Null  Key  Default  Extra
cur_id            int(8) unsigned            PRI           auto_increment
cur_namespace     tinyint(2) unsigned        MUL  0
cur_title         varchar(255) binary        MUL
cur_text          mediumtext
cur_comment       tinyblob
cur_user          int(5) unsigned                 0
cur_user_text     varchar(255) binary
cur_timestamp     varchar(14) binary         MUL
cur_restrictions  tinyblob
cur_counter       bigint(20) unsigned             0
cur_ind_title     varchar(255)               MUL
cur_ind_text      mediumtext                 MUL
cur_is_redirect   tinyint(1) unsigned             0
cur_minor_edit    tinyint(1) unsigned             0
cur_is_new        tinyint(1) unsigned             0
For details and explanations, see schema.doc. --Brion 01:48 Aug 28, 2002 (PDT)
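For readers new to the feature, a minimal example query using the schema above (the title 'Main_Page' is just an example; note that titles are stored with underscores instead of spaces) would be:

SELECT cur_text FROM cur WHERE cur_namespace=0 AND cur_title='Main_Page'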


What is the query to list all pages in the wikipedia: namespace? I tried "SELECT cur_title FROM cur WHERE cur_namespace=4", but it doesn't seem to do the right thing. -- Tarquin 18:20 Dec 2, 2002 (UTC)

Perhaps what you want is:

SELECT CONCAT("Wikipedia:",cur_title) AS cur_title FROM cur
WHERE cur_namespace=4 AND cur_is_redirect=0
ORDER BY cur_title

? --Brion 18:39 Dec 2, 2002 (UTC)

Possibly. I'm trying to get a list of all Wikipedia: namespace articles on the Fr: so I can get an idea of what info is already there & work on the FAQ. I got: "Les requêtes autres que "SELECT" sont réservées aux développeurs de Wikipédia" ("Queries other than SELECT are reserved for Wikipédia developers") -- could one of the devs generate a "liste de pages de l'espace Wikipedia:" (list of pages in the Wikipedia: namespace) page there please? -- Tarquin 19:37 Dec 2, 2002 (UTC)


Make sure you put that all on one line. --Brion 19:42 Dec 2, 2002 (UTC)
I thought I had ... but I must have zapped a space or something. Merci Brion. :-) -- Tarquin 19:50 Dec 2, 2002 (UTC)

Error for short pages:
A database query syntax error has occurred. This could be because of an illegal search query (see Searching Wikipedia), or it may indicate a bug in the software. The last attempted database query was:

SELECT cur_title, LENGTH(cur_text) AS len FROM cur WHERE cur_namespace=0 AND cur_is_redirect=0 ORDER BY LENGTH(cur_text) LIMIT 0, 200

from within function "wfSpecialShortpages". MySQL returned error "1030: Got error 127 from table handler". 

---mav
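For reference, MySQL error 127 from the table handler usually means the underlying (MyISAM) table file has become corrupted, not that the query itself was wrong. Fixing it is a job for a developer with direct database access, roughly along these lines (this cannot be done through Special:Asksql, which only allows SELECTs):

CHECK TABLE cur    -- reports whether the table is corrupted
REPAIR TABLE cur   -- rebuilds the table if it is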

I would like a query to find protected pages. giskart 23:14 Jan 23, 2003 (UTC)

SELECT cur_namespace,cur_title FROM cur WHERE cur_restrictions != ''
--Brion 23:19 Jan 23, 2003 (UTC)
Thanks giskart 23:37 Jan 23, 2003 (UTC)

To make a list of active Wikipedians I would like a query that gives back all registered users that have made at least X edits in the last Y days. giskart 13:06 Feb 2, 2003 (UTC)

Hmm, doing this exactly is kinda tricky, but approximately, sure. :) Something like these; you have to query the current and old revisions separately, at least for now:


SELECT cur_user_text, COUNT(*) AS count
FROM cur
WHERE cur_timestamp > '20030127000000' AND cur_user != 0
GROUP BY cur_user
HAVING count > 5
ORDER BY count DESC

SELECT old_user_text, COUNT(*) AS count
FROM old
WHERE old_timestamp > '20030127000000' AND old_user != 0
GROUP BY old_user
HAVING count > 5
ORDER BY count DESC
You'll have to manually construct a timestamp with the cutoff date you want; the above is 27 January 2003 at midnight. Use a smaller count than the 'real' cutoff, as edits will be split between the two tables. --Brion 18:10 Feb 2, 2003 (UTC)
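If you'd rather see the two lists merged into one result, a possible variant (just a sketch: it needs a MySQL version with UNION support, and a user can still appear twice, once per table, since summing the two counts per user would require a derived table) is:

SELECT cur_user_text AS user_text, COUNT(*) AS count
FROM cur
WHERE cur_timestamp > '20030127000000' AND cur_user != 0
GROUP BY cur_user_text
HAVING count > 5
UNION ALL
SELECT old_user_text, COUNT(*) AS count
FROM old
WHERE old_timestamp > '20030127000000' AND old_user != 0
GROUP BY old_user_text
HAVING count > 5
ORDER BY count DESC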
I have used this one on the French Wiki (see http://fr.wikipedia.org/w/wiki.phtml?title=Wikipedia:Statistiques%20participants):
SELECT rc_user_text, COUNT(*) AS count
FROM `recentchanges` WHERE rc_user !=0
AND rc_timestamp > '20030209000000' AND rc_timestamp < '20030218000000'
AND rc_bot = 0 GROUP BY rc_user_text ORDER BY rc_user_text

--User:Athymik


I would like to make a link to the corresponding French page [[1]], but the page is protected. :o( Yann 10:32 Feb 26, 2003 (UTC)


I would like to get a "most creative user" page for the French Wiki, listing the users who have created the most pages. How do I do that? Thanks, Yann
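One possible way to approximate this (a sketch only, using the cur schema shown further up) is to count new, non-redirect pages per user. Note that cur_is_new only marks pages whose current revision is still the first one; pages that have been edited since creation would have to be counted through the old table, as discussed further down this page.

SELECT cur_user_text, COUNT(*) AS created
FROM cur
WHERE cur_is_new=1 AND cur_is_redirect=0 AND cur_user != 0
GROUP BY cur_user_text
ORDER BY created DESC
LIMIT 50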


I would not advise doing an 'articles that have a link to themselves' query... I think this one bogs down the server the most. ilyanep 23:34 17 Jun 2003 (UTC)


Added new query: finds all pages without bolded words that don't contain the word "disambiguation", so it leaves out disambiguation pages :) ilyanep 17:31 18 Jun 2003 (UTC)


Would playing with queries be much worse for the server than normal searches, even if the number of results is limited? Just wondering how paranoid to be about queries. (Haven't tried any queries yet, and don't know anything about MySQL; I've just seen a bit of the uncompressed Danish database dump, to see what it looks like.)

Is there a query for finding all the pages that should have been reverted, but that everyone missed, or something like that? (Perhaps, all articles, where the last edit halved the size of the article, without adding the word "copyright violation", or something like that? Or where the last edit added strange sentences with no vowels?) Although, if there was a query for finding bad edits, suppose it would have already been used... كسيپ Cyp 20:18 20 Jun 2003 (UTC)

Please be very paranoid about queries, especially if you are not very familiar with SQL. Even with a return limit, queries can take minutes or even hours if no indices are used and the server has to look through our whole CUR or OLD tables. This will bog down the whole wiki. I am personally opposed to having an open SQL capability, but it's still there-- so please be careful. --Eloquence 20:21 20 Jun 2003 (UTC)
In that case, I suppose it's better to try to download the database some time, and bog down my whole computer, instead of the whole wiki... If I get around to it, that is... Don't know how likely it is that I'll actually download it and try, or whether I'll just think about it occasionally...
No need to find out, unless you know the answer off-hand, as I'm not planning on trying any time soon, but would the old-article database also contain the new articles, or would both have to be downloaded to get articles with history? And how big would the whole thing be, uncompressed, anyway? كسيپ Cyp 20:32 20 Jun 2003 (UTC)
The OLD table contains all previous (i.e. non-current) revisions of articles and is only of interest if you want to study the history. New pages are stored in the CUR table. See http://download.wikipedia.org for the file size.--Eloquence 21:07 20 Jun 2003 (UTC)

Moin! Please place a link to the German version of Wikipedia:Database queries as well (German: Wikipedia:Datenbank-Abfragen). Since it is protected I wasn't able to do it myself. Thx! -- pit 16:57 24 Jun 2003 (UTC)


The link to sourceforge on the bottom of this page does not work. Andre Engels 17:22 26 Jun 2003 (UTC)

It's a 503 error, meaning the server is down or the connection couldn't be established, so it's likely to be back once they fix things up. I know it was working a day or two ago; I was using it myself then. But I don't think cvs.sourceforge.net is going to go away on a permanent basis anytime soon! -- John Owens 18:39 26 Jun 2003 (UTC)



I am interested in two queries. Please help me with them; if I do them alone, I might make a mistake and bring Wikipedia down :-(

I need to have a list of all pages (names of pages) included in the meta space (on fr wiki), which have been created after a certain date.

I also need a query giving me the list of all pages (names of pages) included in the encyclopedia space, which contain a certain string of text.

Thanks

User:anthere

For the first; if you only need the last few days:

SELECT rc_timestamp,rc_namespace,rc_title from recentchanges where rc_new=1 and rc_minor=0 and rc_namespace=4

If you need to date from some arbitrary time in the past, that could be a little trickier.

For a phrase search:

SELECT cur_namespace,cur_title FROM searchindex,cur WHERE si_page=cur_id and MATCH(si_text) AGAINST('"some phrase"' IN BOOLEAN MODE)

Note that you put the phrase inside double quotes (") and _that_ inside single quotes ('). (At some point this will be built into the regular search feature, but I haven't finished fixing it up.) If you've got a string that's not working in the search engine, you can also try a LIKE search, but it will likely be slower:

SELECT cur_namespace,cur_title FROM cur WHERE cur_text LIKE '%some phrase%'

--Brion 20:07 26 Jun 2003 (UTC)

I'll try that. I hope the second one is ok when the string contains characters such as "!", for example.
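For what it's worth, '!' is not special in either kind of search: the boolean-mode full-text operators are characters like +, -, * and the double quotes, and in a LIKE pattern only '%' and '_' are wildcards and would need a backslash. A hypothetical example of escaping a literal percent sign:

SELECT cur_namespace,cur_title FROM cur WHERE cur_text LIKE '%100\% libre%'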

For the first query, that is for a longer time span, say 1 or 2 months for example. I am trying to do a big clean-up of our meta space, as many links are quite hidden and useless. However, I would like to be able to update the list with the new meta pages from time to time. Now, if I have an old list (with the link order changed) and a new list, it would be quite painful to compare both to find new articles. Hence, either I find a way to compare an old list and a new list, or I need to query only the new pages since the last update. Tips? :-(

Oh well that's easy! Put both lists in alphabetical order. (If you have a spreadsheet program like Microsoft Excel, this should be easy: paste the list so each line is a row, all in one column, and find the 'sort rows' option. Or use a programmer's text editor. :) Then do a diff between them: you can do this in the wiki by saving first one, then the other into one page, then showing the diff between the two versions. Any new entries will stand right out as new lines in the diff.
Doing this directly in a query is somewhat complicated because of how the tables are split up. To find the creation date of a page you need to look in the 'old' table for the oldest edit for that page (if, that is, it's been edited more than once!) So we try to find the newest oldest edits.... WARNING: THIS MAY BE RATHER SLOW:
select min(old_timestamp),old_namespace,old_title as creation from old where old_namespace=4 group by old_title order by creation desc
Those that have only been edited once are easier to pick out, because the cur table specially marks edits that are new:
select cur_timestamp as creation,cur_namespace,cur_title from cur where cur_namespace=4 and cur_is_new=1 and cur_is_redirect=0 order by creation desc
--Brion 05:14 27 Jun 2003 (UTC)
Perhaps an easier way to do this might be by using the cur_id - if you sort the results by cur_id, you get the newest pages last (or first, if you sort descending). As far as I know, this works as long as the pages are created after the switch to Phase III. Please someone correct me if I'm wrong. Andre Engels 12:50 27 Jun 2003 (UTC)
Maybe yes, maybe no. :) cur_id may be out of order if, for instance, a page is undeleted. Certain types of move might do it too, though I think that should mostly just hit renames. However it will work for most cases... Try:
select cur_timestamp,cur_namespace,cur_title from cur where cur_namespace=4 and cur_is_redirect=0 order by cur_id desc
--Brion 17:53 27 Jun 2003 (UTC)

Thanks to both of you. I'll experiment (I'll try to be gentle). Here is the page for which I need this fr:Wikipédia:Pages méta. Still a work in progress :-)


Since I can't edit the page maybe someone else wants to put the link http://www.wikipedia.org/upload/sqllog_mFhyRe6 on it. BTW I'd like to have this for the German Wikipedia. Maybe it puts more pressure on people not to experiment with this feature (well, I guess developers can still make queries that aren't shown in the log, but I hope they know what they're doing ;-) --Kurt Jansson 16:26, 19 Aug 2003 (UTC)

Cancelling queries


If a query is taking too long to run and you close the browser window, is the query cancelled fairly quickly, or does it run to completion anyway? If it does not cancel itself, then we must be much more careful in constructing queries (e.g. using indexes wherever possible, i.e. those created via indexes.sql). Nanobug 15:11, 19 Sep 2003 (UTC)
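One way to check beforehand whether a query can use an index is MySQL's EXPLAIN (a sketch only; I don't know whether Special:Asksql accepts EXPLAIN, but it works against a local copy of the database). If the 'key' column of the output is NULL, MySQL will have to scan the whole table:

EXPLAIN SELECT cur_title FROM cur WHERE cur_namespace=4 AND cur_is_redirect=0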

Want advice on a heavy query


There is this query getting the list of "articles containing a link to itself". I would like to run it for the Japanese Wikipedia and get the list so that we (Japanese Wikipedians) can fix them. But I am quite hesitant to impose 24 seconds' worth of database load. Does anyone have a suggestion for when I should run it (time of day, day of week, etc.)? Or should I not run it at all? Tomos 05:19, 21 Nov 2003 (UTC)

Now I have a copy of ja.WP on my machine from a dump. I could run it anytime you want, but the query doesn't return anything from the 29 Jan. dump... Suisui 06:41, 12 Feb 2004 (UTC)
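For reference, the self-link query being discussed presumably looks something like the sketch below (this is a guess, not the exact query from the subject page; the REPLACE is there because cur_title stores underscores where the link text in the article would have spaces). Because the pattern depends on each row's own title, no index can help, which is why it is so expensive:

SELECT cur_namespace,cur_title
FROM cur
WHERE cur_namespace=0
AND cur_text LIKE CONCAT('%[[', REPLACE(cur_title,'_',' '), '%')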

SELECT cur_namespace,cur_title FROM cur WHERE cur_text REGEXP '\\(.*[^\[]http:[^ \\(]*[^\] ]\\)' ORDER BY cur_title LIMIT 100

Runs quite fine on German WP. It will give some false positives but I've used it with pyWikipediaBot which is a bit better and ignores the false positives. -- TomK32 10:21, 14 Dec 2003 (CET)


Is old_id in table "old" a primary key? (use DESCRIBE `old` and look under Key column for "PRI") r3m0t 21:48, 3 Mar 2004 (UTC)


Please link to [[ja:Wikipedia:データーベースクエリ]]

  • Done -- User:Docu

Does anyone know when we get to use it again? -- User:Docu

Protection


As special:Asksql no longer exists, does this page need to remain protected? Sysops are no more likely to know correct SQL than other users, so I see no need to limit editing to sysops. Angela. 00:56, Apr 29, 2004 (UTC)

I concur with Angela and have unprotected the page. UninvitedCompany 20:09, 1 Jul 2004 (UTC)
Special:Asksql is back. For now. Does the page need protecting again? Angela. 01:49, Jul 24, 2004 (UTC)
Perhaps. If the queries are read-only and have a 30 second timeout, the potential for mischief would appear to me to be limited. Perhaps the developers could answer best. UninvitedCompany 03:30, 24 Jul 2004 (UTC)

Queries don't work


When I do a query, it brings up a totally blank page with just the site notice on it. Ilγαηερ (Tαlκ) 03:19, 24 Sep 2004 (UTC)


Well, I guess it's gone for sure. -- User:Docu

Administrator queries


Who is allowed to enable administrator queries? Right now, I'm not able to open Special:Asksql ("page does not exist").

Thanks, --Abdull 11:26, 26 Mar 2005 (UTC)

Outdated


This page might be significantly outdated...or I might be doing something wrong. It looks like Wikisign can't run queries on the English Wikipedia, because (if I'm interpreting correctly) the database is no longer exported in SQL format. If this is true, then it should be mentioned in the article. The instructions on how to query from one's own computer would also need to include how to query the XML and/or how to convert it to SQL. Ardric47 05:32, 28 March 2006 (UTC)