The AWW package¶

A framework for controlling web robots.

Controller¶

The program’s controller relays communication between the view and the model. The functions here give a summary of what the program can do.

aww.controller.aww_print(some_str)¶

Ideally, all output should happen through this function, or through log_write() in the model. This function redirects the output, based on which mode the program is running in.

Parameters:	some_str – Output string

aww.controller.bot_get()¶

Get the names and descriptions of all registered robots.

Returns:	dictionary – Robot names and descriptions

aww.controller.bot_run(bot_name)¶

Imports the program code for the specified robot, if its name is registered in the database, and calls its aww_run() function.

Parameters:	bot_name – The robot to run

aww.controller.bot_run_url(bot_name, url)¶

Imports the program code for the specified robot, if its name is registered in the database, and calls its aww_run() function, with the given URLs as an argument.

Parameters:

bot_name – The robot to run
url – A single URL, or a list containing several

aww.controller.bot_run_with_task_urls(bot_name, task_name)¶

Acquires the URLs associated with the given task. Imports the program code for the specified robot, if its name is registered in the database, and calls its aww_run() function, with the URLs as an argument.

Parameters:

bot_name – The robot to run
task_name – A task containing URLs

aww.controller.dataset_export(set_name, format)¶

Writes a dataset to file in the output folder, using set_name and the current day as the filename.

The Excel format (xlsx) was originally intended to be supported here, but was left out, because it appears to require additional libraries. More info here: http://www.python-excel.org/

Parameters:

set_name – The set to export
format – The file format to use (txt, sql, xml, html, xlsx)

aww.controller.dataset_peek(set_name)¶

Prints 10 entries from a dataset.

Parameters:	set_name – The name of the dataset

aww.controller.datasets_get()¶

Get the names and descriptions of all the datasets.

Returns:	dictionary – Entries with dataset names and corresponding descriptions

aww.controller.exit_aww()¶: Stops the scheduler, and exits the program.

aww.controller.get_argument(arg, argv)¶

If the string arg is found in the list argv, the elements that follow it, up to the next string starting with a hyphen (-), are returned.

Parameters:

arg – the argument to be extracted
argv – a list of arguments

Returns:

string – the argument
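
The extraction described above can be sketched as follows. This is a minimal illustration, not the code in aww.controller: the name get_argument_sketch is hypothetical, and both the space-joined return value and the None result for a missing argument are assumptions.

```python
def get_argument_sketch(arg, argv):
    """Return the elements following arg in argv, up to the next
    element that starts with a hyphen, joined into one string."""
    if arg not in argv:
        return None  # assumption: a missing argument yields None
    start = argv.index(arg) + 1
    values = []
    for element in argv[start:]:
        if element.startswith("-"):
            break  # the next option begins here
        values.append(element)
    return " ".join(values)  # assumption: elements are joined with spaces


argv = ["aww", "-run", "linkbot", "http://example.com", "-daemon"]
result = get_argument_sketch("-run", argv)
print(result)
```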

aww.controller.open_gui()¶: Opens the graphical interface.

aww.controller.run_command(cmd)¶

Receives a command as a string, parses it, to retrieve a command object, then executes the function described by that command object.

Parameters:	cmd – A string containing a command

aww.controller.run_daemon()¶: Instead of opening a user interface, the scheduler is started directly. In addition output is redirected to the log file.

aww.controller.scheduler_is_running()¶

Confirm whether the task scheduler is running.

Returns:	boolean – True if the scheduler is running

aww.controller.scheduler_start()¶: Starts the task scheduler.

aww.controller.scheduler_stop()¶: Stops the task scheduler.

aww.controller.tab_print(str1, str2, str2_offset)¶

Prints two strings, starting the second one at the given offset.

Parameters:

str1 – The first string
str2 – The second string
str2_offset – Integer describing the offset for the second string
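
One way to obtain this column layout is to pad the first string. The sketch below assumes that the offset is counted in characters from the start of the line and that an overlong first string is not truncated; tab_format_sketch is a hypothetical name.

```python
def tab_format_sketch(str1, str2, str2_offset):
    # Pad str1 with spaces so that str2 starts at column str2_offset.
    # Assumption: if str1 is longer than the offset, nothing is cut off.
    return str1.ljust(str2_offset) + str2


line = tab_format_sketch("linkbot", "Collects links", 12)
print(line)
```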

aww.controller.table_truncate(table_name)¶

Delete all entries from a table

Parameters:	table_name – The name of the table

aww.controller.task_add_url(task_name, url)¶

For every task created there exists a (possibly empty) list of URLs. This function appends URLs to such lists.

Parameters:

task_name – The name of the task
url – The URL that will be appended

aww.controller.task_create(task_name, command)¶

Creates a task, and adds it to the table tasks. A task is a tuple containing a task name, execution frequency, and a command to be executed.

Parameters:

task_name – The name of the task
command – The command to be run on task execution

aww.controller.task_get()¶

Get information about all the tasks.

Returns:	list of lists – the lists are on the form (bot_name, task_name, frequency)

aww.controller.task_get_urls(task_name)¶

Retrieves a list of URLs stored for this task.

Parameters:	task_name – The name of the task
Returns:	list – URLs belonging to the given task, or None, if the list is empty

aww.controller.task_import_urls(task_name, file_name)¶

Reads URLs from a text file, and saves them to a task. In the text file each line should contain one URL.

Parameters:

task_name – The name of the task
file_name – A path to the file that will be read

aww.controller.task_remove(task_name)¶

Delete a task from the database.

Parameters:	task_name – The name of the task

aww.controller.task_remove_url(task_name, url)¶

Removes a URL from a list of URLs belonging to a task.

Parameters:

task_name – The name of the task
url – The URL to be removed

aww.controller.task_run(task_name)¶

This function retrieves the command string belonging to the given task. It then parses it with help of the commandline module, to get a command object, then executes the function described by that command object.

Parameters:	task_name – The name of the task

aww.controller.task_set_frequency(task_name, frequency)¶

If the given task exists, its frequency is set to the specified value.

Parameters:

task_name – The name of the task
frequency – A string of the form <minute hour dom month>, where * means every

aww.controller.visualize(dataset_name, viz_name, browser=False, show=True, export=None, gifcopy=False)¶

Imports the program code for the specified visualization, if its name is registered in the database, and calls its aww_run() function.

Parameters:

dataset_name – The name of the dataset that should be visualized
viz_name – The name of the visualization that should be used
browser – (optional, boolean) Open visualization in a web browser
show – (optional, boolean) Display the visualization through Easyviz
export – (optional, string) A path to export the visualization to
gifcopy – (optional, boolean) Write the output to a GIF-file

Returns:

string – filename of exported graphics

aww.controller.viz_get()¶

Get the names and descriptions of all registered visualizations.

Returns:	dictionary – Visualization names and descriptions

Model¶

Anything concerning SQL happens here, via SQLite.

(SQLite supports the data types: null, integer, real, text, blob, but in the current implementation only text and integer are used.)

Most database-related functions contain exception handling, mostly with generic exceptions. Catching generic exceptions can be considered bad practice, but the goal here is a robust program. The worst-case scenario should therefore be that a function returns None, rather than raising an exception.
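
The pattern described above might look like the following sketch. It uses Python's standard sqlite3 module with an in-memory database; table_length_sketch is a hypothetical name, and the real functions take no connection argument.

```python
import sqlite3


def table_length_sketch(conn, table_name):
    """Return the number of rows in table_name, or None on any failure."""
    try:
        # Table names cannot be bound as SQL parameters, so the name is
        # interpolated; it must come from trusted code, not user input.
        cursor = conn.execute(f"SELECT COUNT(*) FROM {table_name}")
        return cursor.fetchone()[0]
    except Exception:
        # Deliberately generic: the caller sees None, never an exception.
        return None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (task_name TEXT, frequency TEXT)")
existing = table_length_sketch(conn, "tasks")
missing = table_length_sketch(conn, "no_such_table")
print(existing, missing)
```

The trade-off is that callers must check for None, since the cause of a failure is no longer visible to them.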

aww.model.bot_exists(bot_name)¶

Determine whether information about a robot exists in the database.

Parameters:	bot_name – The name of the robot
Returns:	boolean – True if the robot was found

aww.model.bot_get()¶

Get the names and descriptions of all registered robots.

Returns:	dictionary – Robot names and descriptions

aww.model.bot_get_default_urls(bot_name)¶

All tasks have a list of URLs. This function returns the URLs of the default task for the given robot.

(The default task is the first task found where execution frequency equals ‘not set’)

Suggestion for improvement: Tasks are no longer associated with robots, so this function is no longer useful. URLs must be given by specifying a task, or by typing them in manually. This function should probably be removed.

Parameters:	bot_name – The name of the robot
Returns:	list – default URLs for a robot

aww.model.bot_register(bot_info)¶

Saves the name and a description of a robot in the database.

(Datasets for the robots are created through other functions.)

Parameters:	bot_info – instance of robots.robot_tools.Robot_info

aww.model.dataset_create(bot_name, set_info)¶

Creates a table in the database, named in the form bot_name_set_name, then adds information about the dataset to the table datasets.

Parameters:

bot_name – The name of the robot that owns the dataset
set_info – An object of robots.robot_tools.Dataset_info

aww.model.dataset_exists(set_name)¶

Determine whether a dataset exists in the database. This is different from the function table_exists(). Here we only go through the list of datasets returned by datasets_get().

Parameters:	set_name – The name of the dataset
Returns:	boolean – True if the dataset exists

aww.model.dataset_set_description(set_name, set_description)¶

Add a description of an existing dataset.

Parameters:

set_name – Name of dataset
set_description – Description of dataset

aww.model.dataset_write_as_html(dataset, set_name)¶

Write a list of tuples to an HTML file.

Parameters:

dataset – A list of tuples from the dataset
set_name – The name of the dataset, used for choosing a file name

aww.model.dataset_write_as_sql(dataset, set_name)¶

Write a table (as SQL) to file.

Suggestion for improvement: param dataset is not used, so remove it

Parameters:

dataset – A list of tuples from the dataset
set_name – The name of the dataset, used for choosing a file name

aww.model.dataset_write_as_txt(dataset, set_name)¶

Write a list of tuples to a text file.

Parameters:

dataset – A list of tuples from the dataset
set_name – The name of the dataset, used for choosing a file name

aww.model.dataset_write_as_xml(dataset, set_name)¶

Write a list of tuples to an XML file.

TODO: This function has not been implemented

Parameters:

dataset – A list of tuples from the dataset
set_name – The name of the dataset, used for choosing a file name

aww.model.datasets_get()¶

Get the names and descriptions of all the datasets.

Returns:	dictionary – Entries with dataset names and corresponding descriptions

aww.model.get_conn()¶

This returns a connection to the database. This function should be called every time the database is accessed, to avoid race conditions between threads, and because days could potentially pass between function calls.

Returns:	A pysqlite connection object

aww.model.get_free_filename(wanted_name, file_extension)¶

Returns the argument, possibly with a (nr)-suffix, to make sure we do not overwrite an existing file.

Suggestion for improvement of code: should not need to take the parameter file_extension, but rather work with the complete filename contained in the parameter wanted_name.

Parameters:

wanted_name – a suggestion for a file name
file_extension – a file extension to go with the file name

Returns:

string – the wanted file name, but possibly modified
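
A minimal sketch of such a collision-avoiding name generator is shown below. The exact suffix layout, here name(1).ext, and the extension being passed without a leading dot are assumptions; free_filename_sketch is a hypothetical name.

```python
import os
import tempfile


def free_filename_sketch(wanted_name, file_extension):
    # Assumption: file_extension is given without a leading dot.
    candidate = f"{wanted_name}.{file_extension}"
    nr = 1
    while os.path.exists(candidate):
        # Assumption: the (nr) suffix goes directly before the extension.
        candidate = f"{wanted_name}({nr}).{file_extension}"
        nr += 1
    return candidate


with tempfile.TemporaryDirectory() as folder:
    base = os.path.join(folder, "links")
    first = free_filename_sketch(base, "txt")
    open(first, "w").close()  # occupy the first name
    second = free_filename_sketch(base, "txt")
    first_name = os.path.basename(first)
    second_name = os.path.basename(second)

print(first_name, second_name)
```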

aww.model.get_main_folder()¶

This returns a string, that is a path to the program’s main folder. There is a folder hierarchy inside it, but anything written to disk by the program ends up somewhere within this folder. The main folder should automatically be located at the bottom level of the user’s home directory.

Returns:	string – the folder path

aww.model.get_output_folder()¶

Returns a folder within the main folder, where exported datasets and visualizations are stored.

Returns:	string – the folder path

aww.model.log_write(some_str)¶

The model is not meant to print output for users to see, but can write to a log file instead.

Parameters:	some_str – The text that should be written to file

aww.model.refresh_robots()¶: This is called automatically on startup, to ensure that all robots are available. This happens by going through the variable bot_list, in the __init__ file of the robots sub-package, and adding any unknown robots.

aww.model.refresh_visualizations()¶: This is called automatically on startup, to ensure that all visualizations are available. This happens by going through the variable viz_list, in the __init__ file of the visualizations sub-package, and adding any unknown visualizations.

aww.model.setup_database()¶: This is called automatically on startup. It creates a file for the database, if missing, as well as all the tables required for basic program functionality.

aww.model.table_contains(table_name, column_name, value)¶

Check for existence of a value in a table.

Parameters:

table_name – The name of the table
column_name – The name of the column
value – The value to be found

Returns:

boolean – True if the value was found

aww.model.table_dump(set_name, filename)¶

Writes the contents of the table corresponding to set_name to filename.

Suggestion for improvement: Should check that the filename is free.

Parameters:

set_name – The name of the table
filename – The name of the file to write to

aww.model.table_exists(table_name)¶

Checks if a table exists in the database

Parameters:	table_name – The name of the table
Returns:	boolean – True if the table was found

aww.model.table_get_as_list(table_name)¶

This function returns all the data in the set. For large sets, the data should be extracted in another way.

Suggestion for improvement: Could we return a subset defined by a time interval?

Parameters:	table_name – The name of the table
Returns:	list – A list with all the tuples in the table

aww.model.table_get_column_names(dataset_name)¶

Retrieves the column names for the given table.

Suggestion for improvement: it could instead return a dictionary including column descriptions, but descriptions are not saved in the system

Parameters:	dataset_name – The name of the table
Returns:	list – A list with all the column names in the table

aww.model.table_insert(table_name, tuples, column_names=None)¶

Insertion of multiple tuples into database. The tuples must all contain the same number of elements. If values for all table columns are not provided, then column names must also be specified.

Parameters:

table_name – The name of the table
tuples – List of entries to be inserted
column_names – Optional list of column names
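
The two calling styles can be illustrated with Python's standard sqlite3 module. This is a sketch under assumptions: the helper name and the explicit conn argument are hypothetical, and the table is created in memory only for the example.

```python
import sqlite3


def table_insert_sketch(conn, table_name, tuples, column_names=None):
    if not tuples:
        return
    # One "?" placeholder per element; all tuples must have equal length.
    placeholders = ", ".join("?" for _ in tuples[0])
    columns = ""
    if column_names is not None:
        columns = " (" + ", ".join(column_names) + ")"
    # Table and column names cannot be bound as parameters, so they are
    # interpolated; they must come from trusted code, not user input.
    sql = f"INSERT INTO {table_name}{columns} VALUES ({placeholders})"
    conn.executemany(sql, tuples)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (url TEXT, status INTEGER, note TEXT)")
# Values for every column: no column names needed.
table_insert_sketch(conn, "links", [("http://a.example", 200, "ok")])
# Values for only some columns: column names are required.
table_insert_sketch(conn, "links", [("http://b.example", 404)], ["url", "status"])
rows = conn.execute("SELECT url, status, note FROM links ORDER BY url").fetchall()
print(rows)
```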

aww.model.table_insert_special(table_name, tuple1, column_names=None)¶

Insertion of single tuples into database. If values for all columns are not provided, then column names must also be specified.

The function name contains special because it returns the rowid of the inserted entry. (This also means that it can only take single entries, and not lists of entries)

Parameters:

table_name – The name of the table
tuple1 – One entry to be inserted
column_names – Optional list of column names
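
With Python's standard sqlite3 module, the rowid of a single inserted row is available as the cursor's lastrowid attribute, which is presumably what makes the one-row restriction necessary. A small sketch with a hypothetical table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (url TEXT, status INTEGER)")

# lastrowid refers to the row written by the most recent single INSERT.
cursor = conn.execute("INSERT INTO pages VALUES (?, ?)", ("http://a.example", 200))
first_rowid = cursor.lastrowid
cursor = conn.execute("INSERT INTO pages VALUES (?, ?)", ("http://b.example", 200))
second_rowid = cursor.lastrowid
conn.commit()
print(first_rowid, second_rowid)
```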

aww.model.table_is_empty(table_name)¶

Check if there are any entries present in a table.

Parameters:	table_name – The name of the table
Returns:	boolean – True if the table is empty

aww.model.table_length(dataset_name)¶

Parameters:	dataset_name – The name of the table
Returns:	int – number of rows in table

aww.model.table_peek(table_name)¶

Retrieves 10 entries from a table, to give an impression of the table’s structure and content.

Parameters:	table_name – The name of the table
Returns:	list of tuples – Rows from the given table

aww.model.table_pop(table_name)¶

Returns one tuple from table_name. Order of entries is not considered. The entry is removed from the table.

When crawling large collections of URLs, this type of functionality makes it possible to use the database as a queue.

Parameters:	table_name – The name of the table
Returns:	tuple – The first entry found
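
Using a table as a queue in this way could be sketched as below, with Python's standard sqlite3 module. The helper name, the explicit conn argument, and the None result for an empty table are assumptions made for the example.

```python
import sqlite3


def table_pop_sketch(conn, table_name):
    # Fetch any one row together with its rowid; order is not considered.
    row = conn.execute(f"SELECT rowid, * FROM {table_name} LIMIT 1").fetchone()
    if row is None:
        return None  # assumption: an empty table yields None
    conn.execute(f"DELETE FROM {table_name} WHERE rowid = ?", (row[0],))
    conn.commit()
    return row[1:]  # drop the rowid, keep the stored columns


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE url_queue (url TEXT)")
conn.executemany(
    "INSERT INTO url_queue VALUES (?)",
    [("http://a.example",), ("http://b.example",)],
)

popped = []
while True:
    entry = table_pop_sketch(conn, "url_queue")
    if entry is None:
        break
    popped.append(entry[0])

remaining = conn.execute("SELECT COUNT(*) FROM url_queue").fetchone()[0]
print(popped, remaining)
```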

aww.model.table_truncate(table_name)¶

Delete all content from a table.

Parameters:	table_name – The name of the table

aww.model.task_add_url(task_name, url)¶

For every task created there exists a (possibly empty) list of URLs. This function appends URLs to such lists.

Parameters:

task_name – The name of the task
url – The URL that will be appended

aww.model.task_create(task_name, command)¶

Creates a task, and adds it to the table tasks. A task is a tuple containing a task name, execution frequency, and a command to be executed.

Parameters:

task_name – The name of the task
command – The command to be run on task execution

aww.model.task_exists(task_name)¶

Determine whether information about a task exists in the database.

Parameters:	task_name – The name of the task
Returns:	boolean – True if the task was found

aww.model.task_get()¶

Get information about all the tasks.

Returns:	list of lists – the lists are on the form (bot_name, task_name, frequency)

aww.model.task_get_urls(task_name)¶

Retrieves a list of URLs stored for this task.

Parameters:	task_name – The name of the task
Returns:	list – URLs belonging to the given task, or None, if the list is empty

aww.model.task_import_urls(task_name, file_name)¶

Reads URLs from a text file, and saves them to a task. In the text file each line should contain one URL.

Parameters:

task_name – The name of the task
file_name – A path to the file that will be read

Returns:

int – number of urls added
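
Reading the one-URL-per-line format can be sketched as follows. Skipping blank lines and stripping surrounding whitespace are assumptions, and read_urls_sketch is a hypothetical helper that only reads the file; storing the URLs on a task is left out.

```python
import os
import tempfile


def read_urls_sketch(file_name):
    urls = []
    with open(file_name, encoding="utf-8") as url_file:
        for line in url_file:
            url = line.strip()
            if url:  # assumption: blank lines are skipped
                urls.append(url)
    return urls


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "urls.txt")
    with open(path, "w", encoding="utf-8") as url_file:
        url_file.write("http://a.example/\n\nhttp://b.example/page\n")
    urls = read_urls_sketch(path)
    added = len(urls)

print(added, urls)
```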

aww.model.task_remove(task_name)¶

Delete a task from the database.

Parameters:	task_name – The name of the task

aww.model.task_remove_url(task_name, url)¶

Removes a URL from a list of URLs belonging to a task.

Parameters:

task_name – The name of the task
url – The URL to be removed

aww.model.task_set_frequency(task_name, frequency)¶

If the given task exists, its frequency is set to the specified value.

Parameters:

task_name – The name of the task
frequency – A string of the form <minute hour dom month>, where * means every

aww.model.viz_exists(viz_name)¶

Determine whether information about a visualization exists in the database.

Parameters:	viz_name – The name of the visualization
Returns:	boolean – True if the visualization is found

aww.model.viz_get()¶

Get the names and descriptions of all registered visualizations.

Returns:	dictionary – Visualization names and descriptions

aww.model.viz_register(viz_name, description)¶

Save the name and description of a visualization in the database.

Parameters:

viz_name – The name of the visualization
description – A description of the visualization

Scheduler¶

This module enables scheduling of tasks. It utilizes, and is utilized by, the controller module. When start_scheduler() is run, a loop is entered, which periodically checks the current time against the execution frequencies of the tasks, and executes any tasks with a matching execution time.

aww.scheduler.frequency_to_timestamp(freq)¶

This function returns one of Python's datetime objects, representing the next point in time that the given frequency describes.

Parameters:	freq – A time frequency on the form: ‘minute hour dom month’ (* means every)
Returns:	datetime.datetime – a point in time described by the freq parameter
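
A simple way to compute such a timestamp is to step forward minute by minute until every field matches. The sketch below is an illustration only: the extra now parameter is hypothetical (it makes the example reproducible), the current minute is assumed to count as a possible match, and only plain numbers and * are handled.

```python
from datetime import datetime, timedelta


def frequency_to_timestamp_sketch(freq, now):
    fields = freq.split()
    if len(fields) != 4:
        raise ValueError("expected 'minute hour dom month'")
    attributes = ("minute", "hour", "day", "month")
    # Assumption: the current minute itself is a valid match.
    candidate = now.replace(second=0, microsecond=0)
    # Give up after roughly four years (covers 29 February).
    limit = candidate + timedelta(days=4 * 366)
    while candidate <= limit:
        if all(
            field == "*" or int(field) == getattr(candidate, attribute)
            for field, attribute in zip(fields, attributes)
        ):
            return candidate
        candidate += timedelta(minutes=1)
    return None  # e.g. "0 0 30 2" never matches


now = datetime(2024, 3, 15, 10, 42, 30)
every_minute = frequency_to_timestamp_sketch("* * * *", now)
daily = frequency_to_timestamp_sketch("30 8 * *", now)
monthly = frequency_to_timestamp_sketch("0 0 1 *", now)
print(every_minute, daily, monthly)
```

Stepping by minutes is slow for frequencies far in the future, but it keeps the matching rule easy to verify.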

aww.scheduler.is_running()¶

Confirm whether the task scheduler is running.

Returns:	boolean – True if the scheduler is running

aww.scheduler.it_is_time(task_time, now)¶

Compares two datetime.datetime objects, to see if they are equal.

Parameters:

task_time – datetime.datetime object to be compared
now – datetime.datetime object describing current time

Returns:

boolean – True if the objects are equal, with a precision level of minutes
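
Minute-level comparison can be sketched by discarding seconds and microseconds before comparing. The function name below is hypothetical.

```python
from datetime import datetime


def it_is_time_sketch(task_time, now):
    # Truncate both values to whole minutes before comparing.
    return (task_time.replace(second=0, microsecond=0)
            == now.replace(second=0, microsecond=0))


same_minute = it_is_time_sketch(datetime(2024, 3, 15, 8, 30, 0),
                                datetime(2024, 3, 15, 8, 30, 59))
next_minute = it_is_time_sketch(datetime(2024, 3, 15, 8, 30, 0),
                                datetime(2024, 3, 15, 8, 31, 0))
print(same_minute, next_minute)
```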

aww.scheduler.next_activations()¶

Iterates over the task queue and returns all tasks that should be executed during the current minute.

Returns:	list – A list of tasks

aww.scheduler.print_queue()¶: Prints all tasks in the task queue.

aww.scheduler.refresh_queue()¶: Checks if the current minute has changed since the last time the function was called. If we are in a new minute, all tasks are retrieved from the model, and their timestamps are regenerated.

aww.scheduler.run()¶: Loops forever. Refreshes the task queue every minute.

aww.scheduler.start_scheduler()¶: Creates a new thread and uses it to start the scheduling loop.

aww.scheduler.stop_scheduler()¶

Stops the task scheduler.

It breaks the scheduling loop by negating a boolean, so that run() returns in another thread.
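
The flag-based shutdown can be sketched as follows. This is a generic illustration of the technique with hypothetical names; the real module uses module-level functions, and its loop does scheduling work where this one only counts iterations.

```python
import threading
import time


class SchedulerSketch:
    def __init__(self):
        self._running = False
        self._thread = None
        self.ticks = 0

    def run(self):
        # Loops until stop() flips the flag from another thread.
        while self._running:
            self.ticks += 1  # stand-in for refreshing and running tasks
            time.sleep(0.01)

    def start(self):
        self._running = True
        self._thread = threading.Thread(target=self.run, daemon=True)
        self._thread.start()

    def stop(self):
        self._running = False
        self._thread.join(timeout=1.0)  # wait for run() to return

    def is_running(self):
        return self._thread is not None and self._thread.is_alive()


scheduler = SchedulerSketch()
scheduler.start()
time.sleep(0.05)
was_running = scheduler.is_running()
scheduler.stop()
print(was_running, scheduler.is_running())
```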

Commandline¶

The command line provides functionality by parsing commands, and their arguments, and then calling appropriate functions in controller.py

When parsing a command returns a Command object, a function referred to by it is called. Ideally that will be a function in the controller module, but often it is a private function in this module, where arguments to the command are sorted out, before calling the appropriate function in the controller.

class aww.commandline.Command(c_name, description)¶

class Command¶

Information about a single command

aww.commandline.complete(text, state)¶: A pointer to this function is passed to the readline module to enable tab completion.

aww.commandline.get_commands()¶: Builds the list of commands. Each Command object contains its own description, arguments, and which function to call.

aww.commandline.open_commandline()¶: Loops and parses commands. When the loop ends control is returned, which should cause the program to exit.

aww.commandline.parse_input(user_input)¶

Goes through the list of possible commands and compares them with the parameter user_input. If a matching command is found, it is returned.

Parameters:	user_input – The text to be used for comparison
Returns:	Command – A command matching the input
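
The lookup might be sketched like this. The matching rule (first word of the input equals the command name), the commands argument, and the function attribute are all assumptions; only the constructor signature Command(c_name, description) comes from this page.

```python
class Command:
    def __init__(self, c_name, description):
        self.c_name = c_name
        self.description = description
        self.function = None  # hypothetical: the callable to execute


def parse_input_sketch(user_input, commands):
    words = user_input.split()
    if not words:
        return None
    for command in commands:
        # Assumption: a command matches when its name is the first word.
        if command.c_name == words[0]:
            return command
    return None


commands = [
    Command("run", "Run a robot"),
    Command("peek", "Show some entries from a dataset"),
]
match = parse_input_sketch("peek linkbot_links", commands)
no_match = parse_input_sketch("unknown", commands)
print(match.c_name, no_match)
```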

GUI¶

The GUI module has little functionality, only graphical components that relay input to functions in the controller module.

class aww.gui.AWW_GUI(master=None)¶

Instantiating this class opens the graphical interface. This is normally done by the function open_gui().

refresh_menus()¶: Updates scheduler info, and drop down menus.

aww.gui.open_gui()¶: Creates an instance of the class AWW_GUI, and redirects I/O to it.

Hook¶

The hook is meant to be imported by extensions that require access to functionality in the model. Usually this means robots or visualizations.

Often parameters are given as extension_name and set_name, which are later concatenated into a table name before a function in the model is called.

aww.hook.bot_register(bot_info)¶

Register information about a robot in the database.

Parameters:	bot_info – instance of robots.robot_tools.Robot_info

aww.hook.dataset_contains(bot_name, set_name, column_name, value)¶

Check for existence of at least one instance of value in the column column_name in a dataset.

Parameters:

bot_name – The name of the robot that created the dataset
set_name – The name of the dataset
column_name – The column to be examined
value – The value to look for

Returns:

boolean – True if the value was found at least once

aww.hook.dataset_get(bot_name, set_name)¶

Retrieve the entire dataset as a list of tuples.

This function may not be suitable for large datasets.

Parameters:

bot_name – The name of the robot that created the dataset
set_name – The name of the dataset

Returns:

list – a list of tuples containing dataset entries

aww.hook.dataset_insert(bot_name, set_name, tuples, column_names=None)¶

Insert one or more tuples. If column_names is None, then the content of the tuples must be ordered the same way as when the dataset was created.

Parameters:

bot_name – The name of the robot that created the dataset
set_name – The name of the dataset the tuples will be inserted into
tuples – List of tuples to be inserted
column_names – Optional list of column names

aww.hook.dataset_insert_special(bot_name, set_name, tuple1, column_names=None)¶

This function takes only one tuple for insertion. The function is special in that it returns an integer, representing the resulting rowid of the insertion.

Parameters:

bot_name – The name of the robot that created the dataset
set_name – The name of the dataset the tuple will be inserted into
tuple1 – One entry to be inserted
column_names – Optional list of column names

Returns:

int – rowid of the resulting table entry

aww.hook.dataset_is_empty(bot_name, set_name)¶

Check whether there are any entries in a dataset.

Parameters:

bot_name – The name of the robot that created the dataset
set_name – The name of the dataset

Returns:

boolean – True if there is nothing in the dataset.

aww.hook.dataset_pop(bot_name, set_name)¶

Gets an entry from the dataset. The entry is deleted from the dataset at the same time.

Parameters:

bot_name – The name of the robot that created the dataset
set_name – The name of the dataset

Returns:

tuple – an entry from the dataset, or None

aww.hook.get_default_urls(bot_name)¶

All tasks have a (possibly empty) list of URLs. This function returns the URLs of the default task for the given robot.

(The default task is the first task found where execution frequency equals ‘not set’)

Suggestion for improvement: Tasks are no longer associated to robots. This function is not useful. URLs must be given by specifying a task, or typing them in manually.

Parameters:	bot_name – The name of the robot
Returns:	list – default URLs for a robot

aww.hook.purge(bot_name, set_name)¶

Delete all entries from a dataset.

Parameters:

bot_name – The name of the robot that created the dataset
set_name – The name of the dataset

aww.hook.viz_register(viz_name, description)¶

Save the name and description of a visualization in the database.

Parameters:

viz_name – The name of the visualization
description – A description of the visualization

The robots sub package¶

The robots sub-package contains modules representing web robots. In addition it contains a module named robot_tools, with functionality meant to be used by web robots. That way, functionality does not have to be re-created every time a new robot is made.

class aww.robots.robot_tools.CustomRedirectHandler¶: By default this handler is added to HTTP-open requests. It throws exceptions instead of following redirects. The exceptions can be caught in order to handle redirects in a different place.
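
The idea of raising instead of following redirects can be sketched with Python 3's standard urllib.request. This is not the AWW handler: both class names below are hypothetical, and the example calls the handler directly so that no network access is needed.

```python
import urllib.request


class RedirectNoticeSketch(Exception):
    """Hypothetical exception carrying the redirect target."""

    def __init__(self, code, new_url):
        super().__init__(f"{code} redirect to {new_url}")
        self.code = code
        self.new_url = new_url


class NoRedirectHandlerSketch(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Raise instead of building a follow-up request.
        raise RedirectNoticeSketch(code, newurl)


# Passing the subclass replaces the default redirect handler in the opener.
opener = urllib.request.build_opener(NoRedirectHandlerSketch)

# Simulate what the opener would do on a 301 response.
handler = NoRedirectHandlerSketch()
request = urllib.request.Request("http://example.com/old")
caught = None
try:
    handler.redirect_request(request, None, 301, "Moved", {}, "http://example.com/new")
except RedirectNoticeSketch as notice:
    caught = notice

print(caught)
```

A robot using such an opener can catch the exception and decide for itself whether the new address should be visited.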

class aww.robots.robot_tools.Dataset_info(name, description)¶: Holds information about a dataset. Fields is a list of lists on the form [[name, type, description],...]

exception aww.robots.robot_tools.HttpError(message, http_code=0, local_code=0, url='')¶: Custom exception for passing on HTTP-related information

class aww.robots.robot_tools.Robot_info(name, description)¶: Holds information about a robot. An object of this class can be passed to the model when a robot is registered in the database.

class aww.robots.robot_tools.TrafficHandler(agent_string)¶

This handler should be added to all openers for requests made with AWW’s robots. It registers the traffic in global variables, and waits if the traffic load is high.

If conditions are not right, it waits a while. If that does not help, it throws an exception.

If there is only one thread, waiting is a good option. If multithreading is introduced, this functionality might cause race conditions.

aww.robots.robot_tools.debug_print(dlevel, s)¶

Writes the given string to debug_info.txt if the dlevel parameter is smaller than or equal to the value of the global variable debug_level.

Parameters:

dlevel – The debug level controlling output of the given string
s – The string to be output

aww.robots.robot_tools.get_local_network_load()¶

Returns bytes downloaded during the current minute. Indirectly causes counters to reset if current_minute has changed.

Returns:	int – No. of bytes downloaded during current minute

aww.robots.robot_tools.get_local_request_count()¶

Returns outgoing HTTP requests during the current minute. Indirectly causes counters to reset if current_minute has changed.

Returns:	int – No. of requests during current minute

aww.robots.robot_tools.get_remote_network_load(url)¶

Get the number of HTTP requests that have been made for a specific hostname during the current minute.

Parameters:	url – A URL used for lookup
Returns:	int – No. of HTTP requests to the given hostname during the current minute

aww.robots.robot_tools.get_standard_fields()¶: Returns a list of lists with information about columns that datasets usually will include: timestamp, url, http_status

aww.robots.robot_tools.get_timestamp()¶

Get current date and time as a string.

Returns:	string – Current time, in the format <YYYY_MM_DD HH:MM:SS>

aww.robots.robot_tools.get_timestamp_excel()¶

In Excel, time is sometimes represented as the number of days since the beginning of the year 1900, plus a fraction describing the time of day. This function returns the current time in that format.

Returns:	float – Current date and time, based on Excel’s format
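
The conversion can be sketched as follows. Counting from 30 December 1899 reproduces Excel's serial numbers for dates from 1 March 1900 onward, because Excel treats 1900 as a leap year. The function name and the explicit moment argument are hypothetical; the real function takes no arguments and uses the current time.

```python
from datetime import datetime

# Day zero for Excel serial numbers (valid from 1900-03-01 onward).
EXCEL_EPOCH = datetime(1899, 12, 30)


def to_excel_serial_sketch(moment):
    delta = moment - EXCEL_EPOCH
    # Whole days plus the fraction of the day that has passed.
    return delta.days + delta.seconds / 86400.0


noon = to_excel_serial_sketch(datetime(2000, 1, 1, 12, 0, 0))
print(noon)
```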

aww.robots.robot_tools.impolite_open(url)¶

Considers traffic load, but not robots.txt. Useful, for example, for downloading robots.txt files, and some other resources located at the web root (which is sometimes disallowed in robots.txt).

Aside from not consulting the robots.txt file, this function makes all the same considerations as polite_open() before making the HTTP request.

Parameters:	url – The HTTP address to request
Returns:	file object – The file that was retrieved

aww.robots.robot_tools.increase_remote_request_count(url)¶

This function is used for logging the number of HTTP requests to individual hostnames during the current minute.

Parameters:	url – The URL to update

aww.robots.robot_tools.make_well_formed(url)¶

Ensures that the given URL starts with http://. If this function is called with a relative path, the result will not make sense.

Parameters:	url – A URL to be checked, and possibly modified
Returns:	string – A valid representation of the given URL
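
A minimal sketch of this normalization is shown below. Leaving https:// URLs untouched and stripping surrounding whitespace are assumptions, and the function name is hypothetical. The last call shows why relative paths give a meaningless result.

```python
def make_well_formed_sketch(url):
    url = url.strip()
    # Assumption: https:// URLs are also accepted unchanged.
    if url.startswith("http://") or url.startswith("https://"):
        return url
    return "http://" + url


fixed = make_well_formed_sketch("example.com/index.html")
unchanged = make_well_formed_sketch("http://example.com/")
nonsense = make_well_formed_sketch("images/logo.png")  # relative path
print(fixed, unchanged, nonsense)
```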

aww.robots.robot_tools.polite_get_header(url)¶

A version of polite_open() that uses fewer resources by only retrieving the header of the web page returned for the url argument, if any.

Parameters:	url – The address to use for the HTTP request
Returns:	file object – The header that was retrieved

aww.robots.robot_tools.polite_open(url)¶

Considers robots.txt. Uses an opener that considers traffic load and includes the user agent string. Sleeps when traffic is high locally or remotely. Might cause race conditions when multi-threading.

This function will raise HttpErrors (defined in this module) and possibly other exceptions. These should be handled by all robots that use the function.

Parameters:	url – The HTTP address to request
Returns:	file object – The file that was retrieved

aww.robots.robot_tools.print_traffic_info()¶: Prints the values of some traffic related global values to stdout.

aww.robots.robot_tools.reset_local_network_counters()¶: We measure our local load in bytes downloaded and http-requests made. Both are reset once a minute (if get functions are called)

aww.robots.robot_tools.robotstxt_allow(url)¶

Consults the robots.txt file for the given URL, and confirms whether a request can be sent to that address.

Parameters:	url – The web address to check
Returns:	boolean – True if the URL can be downloaded
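
Such a check can be illustrated with Python 3's standard urllib.robotparser. The sketch below parses rules from a string instead of downloading them, so it runs without network access; the rules and the agent string aww-sketch-bot are made up for the example.

```python
import urllib.robotparser

# A made-up robots.txt, parsed from memory rather than fetched.
rules = """User-agent: *
Disallow: /private/
""".splitlines()

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules)

allowed = parser.can_fetch("aww-sketch-bot", "http://example.com/index.html")
blocked = parser.can_fetch("aww-sketch-bot", "http://example.com/private/data.html")
print(allowed, blocked)
```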

aww.robots.robot_tools.same_host(url1, url2)¶

Returns True if the urls have the same host, otherwise False.

Parameters:

url1 – A URL for comparison
url2 – A URL for comparison

Returns:

boolean – True if the hostname is the same for both URLs

aww.robots.robot_tools.url_to_site(url)¶

Attempts to extract the hostname from the specified URL.

Parameters:	url – A URL to analyze
Returns:	string – The hostname of the given URL
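
Hostname extraction and comparison can be sketched with Python 3's standard urllib.parse. The function names are hypothetical, and the sketch assumes that the URLs include a scheme such as http://; without one, urlparse finds no hostname.

```python
from urllib.parse import urlparse


def url_to_site_sketch(url):
    # hostname is lower-cased and excludes any port number.
    return urlparse(url).hostname


def same_host_sketch(url1, url2):
    return url_to_site_sketch(url1) == url_to_site_sketch(url2)


site = url_to_site_sketch("http://Example.com:8080/a/b.html")
same = same_host_sketch("http://example.com/a", "http://example.com/b?x=1")
different = same_host_sketch("http://example.com/", "http://example.org/")
print(site, same, different)
```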

The visualizations sub package¶

The visualization sub-package is meant to contain visualizations for collected datasets. In addition, it contains a module named visualization_tools, with functionality that can be useful when creating visualizations.

aww.visualizations.visualization_tools.png2gif(png_filename, gif_filename)¶

Use functionality from the easyviz package to convert a file.

If png_filename is specified as several files (using * notation), an animated gif should be made.

Parameters:

png_filename – The file to read from
gif_filename – The file to write to
