Hi, I'm DFRBot, a wiki bot run by DFRussia. I am intended to become a multi-purpose bot that runs several automated and/or user-assisted algorithms for crawling and editing Wikipedia. If I appear to be malfunctioning, please post on my talk page and I will terminate whatever I am doing. All my algorithms are open source, and DFRussia will request permission for each new algorithm as it is developed. Currently I am awaiting approval to run my first algorithm, listed below:
articleCheck
Current version: 0.1
Latest release date: November 1st, 2007
First release date: November 1st, 2007
Status: awaiting approval
This is a data mining algorithm that simply takes one or more files and reads them line by line, checking whether each line is the title of an existing Wikipedia article. If the line is an article, a link is returned to the operator; if it is not, the operator is notified. This algorithm is meant to be employed for simple (but sometimes tedious) tasks such as checking for notable people in a long list of names (for instance, the faculty of a university).
This algorithm is written in Python, using the pywikipedia framework. The program can be run from the command line with any number of files given as arguments.
articleCheck.py
import sys
import wikipedia

site = wikipedia.getSite()
existing = []
for arg in sys.argv[1:]:
    try: # try to open the file
        f = open(arg)
    except IOError: # file could not be opened
        print "The file (" + arg + ") could not be opened\n"
    else: # file has been opened
        print "STARTING " + arg + "\n"
        for line in f:
            line = line.strip() # strip leading/trailing whitespace, including the trailing "\n"
            if wikipedia.Page(site, line).exists():
                existing.append("[[" + line + "]]")
            else:
                print "!!" + line + " does not exist"
        f.close()
        print "\nEXISTING results:"
        for link in existing:
            print link
        print "\nFINISHED " + arg
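For reference, the core per-line check can be exercised without the pywikipedia framework by stubbing out the page lookup. Everything below is a hypothetical sketch, not part of the bot: the `check_lines` helper, the `known` set, and the sample names are all made up for illustration, with a plain callable standing in for `wikipedia.Page(site, title).exists()`.

```python
def check_lines(lines, page_exists):
    # Mirrors articleCheck's loop: strip each line, test the resulting
    # title with page_exists (a callable standing in for the live
    # wikipedia.Page(site, title).exists() lookup), and collect wiki
    # links for the titles that exist.
    existing = []
    missing = []
    for line in lines:
        title = line.strip()
        if page_exists(title):
            existing.append("[[" + title + "]]")
        else:
            missing.append(title)
    return existing, missing

# Hypothetical stand-in for the live lookup: pretend only these pages exist.
known = set(["Alan Turing", "Grace Hopper"])
existing, missing = check_lines(
    ["Alan Turing\n", "  Grace Hopper \n", "J. Random Faculty\n"],
    lambda title: title in known)
```

With the stub above, `existing` holds `[[Alan Turing]]` and `[[Grace Hopper]]`, while `missing` holds the one name with no matching page, which is the same split the bot reports to its operator.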