The AWW package

A framework for controlling web robots.

Controller

The program’s controller relays communication between the view and the model. The functions here give a summary of what the program can do.

aww.controller.aww_print(some_str)

Ideally, all output should happen through this function, or through log_write() in the model. This function redirects the output, based on which mode the program is running in.

Parameters:some_str – Output string
aww.controller.bot_get()

Get the names and descriptions of all registered robots.

Returns:dictionary – Robot names and descriptions
aww.controller.bot_run(bot_name)

Imports the program code for the specified robot, if its name is registered in the database, and calls its aww_run() function.

Parameters:bot_name – The robot to run
aww.controller.bot_run_url(bot_name, url)

Imports the program code for the specified robot, if its name is registered in the database, and calls its aww_run() function, with the given URLs as an argument.

Parameters:
  • bot_name – The robot to run
  • url – A single URL, or a list containing several
aww.controller.bot_run_with_task_urls(bot_name, task_name)

Acquires the URLs associated with the given task. Imports the program code for the specified robot, if its name is registered in the database, and calls its aww_run() function, with the URLs as an argument.

Parameters:
  • bot_name – The robot to run
  • task_name – A task containing URLs
aww.controller.dataset_export(set_name, format)

Writes a dataset to file in the output folder, using set_name and the current day as the filename.

The Excel format (xlsx) was originally intended to be supported here, but was left out because it appears to require additional libraries. More info here: http://www.python-excel.org/

Parameters:
  • set_name – The set to export
  • format – The file format to use (txt, sql, xml, html, xlsx)
aww.controller.dataset_peek(set_name)

Prints 10 entries from a dataset.

Parameters:set_name – The name of the dataset
aww.controller.datasets_get()

Get the names and descriptions of all the datasets.

Returns:dictionary – Entries with dataset names and corresponding descriptions
aww.controller.exit_aww()

Stops the scheduler, and exits the program.

aww.controller.get_argument(arg, argv)

If the string arg is found in the list argv, the elements that follow it, up to the next string starting with a hyphen (-), are returned.

Parameters:
  • arg – the argument to be extracted
  • argv – a list of arguments
Returns:

string – the argument
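
The extraction rule described for get_argument() can be sketched as follows. This is only an illustration, not the package's code: the name get_argument_sketch is hypothetical, and both joining the elements with spaces and returning None when arg is missing are assumptions, since the documentation only says that a string is returned.

```python
def get_argument_sketch(arg, argv):
    """Return the elements following `arg` in `argv`, up to the next
    element that starts with a hyphen, joined by spaces.

    Joining with spaces and returning None when `arg` is absent are
    assumptions made for this sketch.
    """
    if arg not in argv:
        return None
    start = argv.index(arg) + 1
    values = []
    for element in argv[start:]:
        if element.startswith("-"):
            break
        values.append(element)
    return " ".join(values)


example = get_argument_sketch("-u", ["aww", "-u", "a.com", "b.com", "-t", "x"])
print(example)
```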

aww.controller.open_gui()

Opens the graphical interface.

aww.controller.run_command(cmd)

Receives a command as a string, parses it, to retrieve a command object, then executes the function described by that command object.

Parameters:cmd – A string containing a command
aww.controller.run_daemon()

Instead of opening a user interface, the scheduler is started directly. In addition, output is redirected to the log file.

aww.controller.scheduler_is_running()

Confirm whether the task scheduler is running.

Returns:boolean – True if the scheduler is running
aww.controller.scheduler_start()

Starts the task scheduler.

aww.controller.scheduler_stop()

Stops the task scheduler.

aww.controller.tab_print(str1, str2, str2_offset)

Prints two strings, starting the second one at the given offset.

Parameters:
  • str1 – The first string
  • str2 – The second string
  • str2_offset – Integer describing the offset for the second string
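
A minimal sketch of this kind of column alignment, assuming the offset is counted in characters from the start of the line. The helper name tab_format is hypothetical, and what the real function does when str1 is longer than the offset is not documented.

```python
def tab_format(str1, str2, str2_offset):
    """Build the line that a tab_print-like function might print."""
    # ljust pads str1 with spaces up to str2_offset characters; if str1 is
    # already longer, no padding is added (an assumption for this sketch).
    return str1.ljust(str2_offset) + str2


line = tab_format("crawler", "Collects links", 12)
print(line)
```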
aww.controller.table_truncate(table_name)

Delete all entries from a table.

Parameters:table_name – The name of the table
aww.controller.task_add_url(task_name, url)

For every task created there exists a (possibly empty) list of URLs. This function appends URLs to such lists.

Parameters:
  • task_name – The name of the task
  • url – The URL that will be appended
aww.controller.task_create(task_name, command)

Creates a task, and adds it to the table tasks. A task is a tuple containing a task name, execution frequency, and a command to be executed.

Parameters:
  • task_name – The name of the task
  • command – The command to be run on task execution
aww.controller.task_get()

Get information about all the tasks.

Returns:list of lists – the lists are of the form (bot_name, task_name, frequency)
aww.controller.task_get_urls(task_name)

Retrieves a list of URLs stored for this task.

Parameters:task_name – The name of the task
Returns:list – URLs belonging to the given task, or None, if the list is empty
aww.controller.task_import_urls(task_name, file_name)

Reads URLs from a text file, and saves them to a task. In the text file each line should contain one URL.

Parameters:
  • task_name – The name of the task
  • file_name – A path to the file that will be read
aww.controller.task_remove(task_name)

Delete a task from the database.

Parameters:task_name – The name of the task
aww.controller.task_remove_url(task_name, url)

Removes a URL from the list of URLs belonging to a task.

Parameters:
  • task_name – The name of the task
  • url – The URL to be removed
aww.controller.task_run(task_name)

This function retrieves the command string belonging to the given task. It then parses it with help of the commandline module, to get a command object, then executes the function described by that command object.

Parameters:task_name – The name of the task
aww.controller.task_set_frequency(task_name, frequency)

If the given task exists, its frequency is set to the specified value.

Parameters:
  • task_name – The name of the task
  • frequency – A string of the form <minute hour dom month>, where * means every
aww.controller.visualize(dataset_name, viz_name, browser=False, show=True, export=None, gifcopy=False)

Imports the program code for the specified visualization, if its name is registered in the database, and calls its aww_run() function.

Parameters:
  • dataset_name – The name of the dataset that should be visualized
  • viz_name – The name of the visualization that should be used
  • browser – (optional, boolean) Open visualization in a web browser
  • show – (optional, boolean) Display the visualization through Easyviz
  • export – (optional, string) A path to export the visualization to
  • gifcopy – (optional, boolean) Write the output to a GIF-file
Returns:

string – filename of exported graphics

aww.controller.viz_get()

Get the names and descriptions of all registered visualizations.

Returns:dictionary – Visualization names and descriptions

Model

Anything concerning SQL happens here, via SQLite.

(SQLite supports the data types: null, integer, real, text, blob, but in the current implementation only text and integer are used.)

Most database-related functions contain exception handling, mostly with generic exceptions. Catching generic exceptions can be considered bad practice, but the goal here is a robust program. The worst-case scenario should therefore be that a function returns None rather than raising an exception.
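
The error-handling policy can be illustrated with a small decorator. This is a sketch under assumptions, not the model's actual code: none_on_error and count_rows are hypothetical names, an in-memory SQLite database is used, and printing stands in for writing to the log file.

```python
import functools
import sqlite3


def none_on_error(func):
    """Decorator: return None instead of propagating any exception."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as error:  # deliberately generic, as described above
            # A real implementation would presumably write to the log file.
            print("database error in %s: %s" % (func.__name__, error))
            return None
    return wrapper


@none_on_error
def count_rows(conn, table_name):
    # Table names cannot be bound as SQL parameters, so the name is
    # interpolated; it must come from a trusted source.
    return conn.execute("SELECT COUNT(*) FROM %s" % table_name).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (task_name TEXT, frequency TEXT)")
existing = count_rows(conn, "tasks")          # table exists
missing = count_rows(conn, "no_such_table")   # query fails, None is returned
```

The trade-off is that callers must check for None, since a failed query and a missing result can look the same.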

aww.model.bot_exists(bot_name)

Determine whether information about a robot exists in the database.

Parameters:bot_name – The name of the robot
Returns:boolean – True if the robot was found
aww.model.bot_get()

Get the names and descriptions of all registered robots.

Returns:dictionary – Robot names and descriptions
aww.model.bot_get_default_urls(bot_name)

All tasks have a list of URLs. This function returns the URLs of the default task for the given robot.

(The default task is the first task found where execution frequency equals ‘not set’)

Suggestion for improvement: Tasks are no longer associated with robots, so this function is no longer useful. URLs must be given by specifying a task, or by typing them in manually. This function should probably be removed.

Parameters:bot_name – The name of the robot
Returns:list – default URLs for a robot
aww.model.bot_register(bot_info)

Saves the name and a description of a robot in database.

(Datasets for the robots are created through other functions.)

Parameters:bot_info – instance of robots.robot_tools.Robot_info
aww.model.dataset_create(bot_name, set_info)

Creates a table in the database with a name of the form bot_name_set_name, then adds information about the dataset to the table datasets.

Parameters:
  • bot_name – The name of the robot that owns the dataset
  • set_info – An object of robots.robot_tools.Dataset_info
aww.model.dataset_exists(set_name)

Determine whether a dataset exists in the database. This is different from the function table_exists(). Here we only go through the list of datasets returned by datasets_get().

Parameters:set_name – The name of the dataset
Returns:boolean – True if the dataset exists
aww.model.dataset_set_description(set_name, set_description)

Add a description of an existing dataset.

Parameters:
  • set_name – Name of dataset
  • set_description – Description of dataset
aww.model.dataset_write_as_html(dataset, set_name)

Write a list of tuples to an HTML file.

Parameters:
  • dataset – A list of tuples from the dataset
  • set_name – The name of the dataset, used for choosing a file name
aww.model.dataset_write_as_sql(dataset, set_name)

Write a table (as SQL) to file.

Suggestion for improvement: The parameter dataset is not used and could be removed.

Parameters:
  • dataset – A list of tuples from the dataset
  • set_name – The name of the dataset, used for choosing a file name
aww.model.dataset_write_as_txt(dataset, set_name)

Write a list of tuples to a text file.

Parameters:
  • dataset – A list of tuples from the dataset
  • set_name – The name of the dataset, used for choosing a file name
aww.model.dataset_write_as_xml(dataset, set_name)

Write a list of tuples to an XML file.

TODO: This function has not been implemented

Parameters:
  • dataset – A list of tuples from the dataset
  • set_name – The name of the dataset, used for choosing a file name
aww.model.datasets_get()

Get the names and descriptions of all the datasets.

Returns:dictionary – Entries with dataset names and corresponding descriptions
aww.model.get_conn()

Returns a connection to the database. This function should be called every time the database is accessed, to avoid race conditions between threads, and because days could potentially pass between function calls.

Returns:A pysqlite connection object
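
The connection-per-access pattern might look like the following sketch, written against Python 3's standard sqlite3 module (the successor of pysqlite). The database path and the name get_conn_sketch are made up for the example. One reason the pattern fits threaded code is that sqlite3 connections are, by default, only usable in the thread that created them.

```python
import os
import sqlite3
import tempfile

# Hypothetical database location, used only for this sketch.
DB_PATH = os.path.join(tempfile.mkdtemp(), "aww_sketch.db")


def get_conn_sketch():
    """Open a fresh connection for each database access."""
    return sqlite3.connect(DB_PATH)


# Each access opens, uses, and closes its own connection.
conn = get_conn_sketch()
conn.execute("CREATE TABLE IF NOT EXISTS bots (name TEXT, description TEXT)")
conn.execute("INSERT INTO bots VALUES (?, ?)", ("crawler", "Collects links"))
conn.commit()
conn.close()

conn = get_conn_sketch()
rows = conn.execute("SELECT name, description FROM bots").fetchall()
conn.close()
```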
aww.model.get_free_filename(wanted_name, file_extension)

Returns the argument, possibly with a (nr)-suffix, to make sure we do not overwrite an existing file.

Suggestion for improvement of code: should not need to take the parameter file_extension, but rather work with the complete filename contained in the parameter wanted_name.

Parameters:
  • wanted_name – a suggestion for a file name
  • file_extension – a file extension to go with the file name
Returns:

string – the wanted file name, but possibly modified
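
A possible shape for this logic is sketched below. The exact suffix format, and whether file_extension includes the leading dot, are assumptions; free_filename_sketch is a hypothetical name.

```python
import os
import tempfile


def free_filename_sketch(wanted_name, file_extension):
    """Return wanted_name + extension, adding a (nr) suffix if it is taken."""
    candidate = wanted_name + file_extension
    nr = 1
    while os.path.exists(candidate):
        candidate = "%s(%d)%s" % (wanted_name, nr, file_extension)
        nr += 1
    return candidate


folder = tempfile.mkdtemp()
base = os.path.join(folder, "links_2013_05_01")
first = free_filename_sketch(base, ".txt")
open(first, "w").close()                      # occupy the first name
second = free_filename_sketch(base, ".txt")
```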

aww.model.get_main_folder()

This returns a string, that is a path to the program’s main folder. There is a folder hierarchy inside it, but anything written to disk by the program ends up somewhere within this folder. The main folder should automatically be located at the bottom level of the user’s home directory.

Returns:string – the folder path
aww.model.get_output_folder()

Returns a folder within the main folder, where exported datasets and visualizations are stored.

Returns:string – the folder path
aww.model.log_write(some_str)

The model is not meant to print output for users to see, but can write to a log file instead.

Parameters:some_str – The text that should be written to file
aww.model.refresh_robots()

This is called automatically on startup, to ensure that all robots are available. This happens by going through the variable bot_list, in the __init__ file of the robots sub-package, and adding any unknown robots.

aww.model.refresh_visualizations()

This is called automatically on startup, to ensure that all visualizations are available. This happens by going through the variable viz_list, in the __init__ file of the visualizations sub-package, and adding any unknown visualizations.

aww.model.setup_database()

This is called automatically on startup. It creates a file for the database, if missing, as well as all the tables required for basic program functionality.

aww.model.table_contains(table_name, column_name, value)

Check for existence of a value in a table.

Parameters:
  • table_name – The name of the table
  • column_name – The name of the column
  • value – The value to be found
Returns:

boolean – True if the value was found

aww.model.table_dump(set_name, filename)

Writes the contents of the table corresponding to set_name to filename.

Suggestion for improvement: Should check that the filename is free.

Parameters:
  • set_name – The name of the table
  • filename – The name of the file to write to
aww.model.table_exists(table_name)

Checks if a table exists in the database

Parameters:table_name – The name of the table
Returns:boolean – True if the table was found
aww.model.table_get_as_list(table_name)

This function returns all the data in the set. For large sets the data should be extracted in another way.

Suggestion for improvement: Could we return a subset defined by a time interval?

Parameters:table_name – The name of the table
Returns:list – A list with all the tuples in the table
aww.model.table_get_column_names(dataset_name)

Retrieves the column names for the given table.

Suggestion for improvement: it could instead return a dictionary including column descriptions, but descriptions are not saved in the system

Parameters:dataset_name – The name of the table
Returns:list – A list with all the column names in the table
aww.model.table_insert(table_name, tuples, column_names=None)

Insertion of multiple tuples into database. The tuples must all contain the same number of elements. If values for all table columns are not provided, then column names must also be specified.

Parameters:
  • table_name – The name of the table
  • tuples – List of entries to be inserted
  • column_names – Optional list of column names
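
A sketch of how such a multi-row insert could be built with sqlite3's executemany(). The function name, the explicit conn parameter, and the example table are assumptions for illustration. Values are bound as parameters, while table and column names have to be interpolated into the SQL text.

```python
import sqlite3


def table_insert_sketch(conn, table_name, tuples, column_names=None):
    """Insert several tuples, optionally into a subset of the columns."""
    placeholders = ", ".join("?" for _ in tuples[0])
    columns = ""
    if column_names is not None:
        columns = " (%s)" % ", ".join(column_names)
    # Table and column names cannot be bound as parameters; values can.
    sql = "INSERT INTO %s%s VALUES (%s)" % (table_name, columns, placeholders)
    conn.executemany(sql, tuples)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE links (timestamp TEXT, url TEXT, http_status INTEGER)")
table_insert_sketch(
    conn, "links", [("2013_05_01 12:00:00", "http://a.example/", 200)])
table_insert_sketch(
    conn, "links", [("http://b.example/",)], column_names=["url"])
rows = conn.execute(
    "SELECT url, http_status FROM links ORDER BY url").fetchall()
```

Columns left out of column_names receive NULL, which is why the second row has no http_status.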
aww.model.table_insert_special(table_name, tuple1, column_names=None)

Insertion of single tuples into database. If values for all columns are not provided, then column names must also be specified.

The function name contains special because it returns the rowid of the inserted entry. (This also means that it can only take single entries, and not lists of entries)

Parameters:
  • table_name – The name of the table
  • tuple1 – One entry to be inserted
  • column_names – Optional list of column names
aww.model.table_is_empty(table_name)

Check if there are any entries present in a table.

Parameters:table_name – The name of the table
Returns:boolean – True if the table is empty
aww.model.table_length(dataset_name)
Parameters:dataset_name – The name of the table
Returns:int – number of rows in table
aww.model.table_peek(table_name)

Retrieves 10 entries from a table, to give an impression of the table’s structure and content.

Parameters:table_name – The name of the table
Returns:list of tuples – Rows from the given table
aww.model.table_pop(table_name)

Returns one tuple from table_name. Order of entries is not considered. The entry is removed from the table.

When crawling large collections of URLs, this type of functionality makes it possible to use the database as a queue.

Parameters:table_name – The name of the table
Returns:tuple – The first entry found
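
Using a table as a queue could be sketched like this. Selecting by SQLite's implicit rowid is one possible approach and is not confirmed by the documentation; returning None for an empty table is also an assumption here.

```python
import sqlite3


def table_pop_sketch(conn, table_name):
    """Remove one arbitrary row from the table and return it, or None."""
    row = conn.execute(
        "SELECT rowid, * FROM %s LIMIT 1" % table_name).fetchone()
    if row is None:
        return None
    conn.execute("DELETE FROM %s WHERE rowid = ?" % table_name, (row[0],))
    conn.commit()
    return row[1:]  # drop the rowid again


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE url_queue (url TEXT)")
conn.executemany("INSERT INTO url_queue VALUES (?)",
                 [("http://a.example/",), ("http://b.example/",)])

# Drain the queue, as a crawler might do.
popped = []
entry = table_pop_sketch(conn, "url_queue")
while entry is not None:
    popped.append(entry[0])
    entry = table_pop_sketch(conn, "url_queue")
```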
aww.model.table_truncate(table_name)

Delete all content from a table.

Parameters:table_name – The name of the table
aww.model.task_add_url(task_name, url)

For every task created there exists a (possibly empty) list of URLs. This function appends URLs to such lists.

Parameters:
  • task_name – The name of the task
  • url – The URL that will be appended
aww.model.task_create(task_name, command)

Creates a task, and adds it to the table tasks. A task is a tuple containing a task name, execution frequency, and a command to be executed.

Parameters:
  • task_name – The name of the task
  • command – The command to be run on task execution
aww.model.task_exists(task_name)

Determine whether information about a task exists in the database.

Parameters:task_name – The name of the task
Returns:boolean – True if the task was found
aww.model.task_get()

Get information about all the tasks.

Returns:list of lists – the lists are of the form (bot_name, task_name, frequency)
aww.model.task_get_urls(task_name)

Retrieves a list of URLs stored for this task.

Parameters:task_name – The name of the task
Returns:list – URLs belonging to the given task, or None, if the list is empty
aww.model.task_import_urls(task_name, file_name)

Reads URLs from a text file, and saves them to a task. In the text file each line should contain one URL.

Parameters:
  • task_name – The name of the task
  • file_name – A path to the file that will be read
Returns:

int – number of urls added
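
The expected file format can be illustrated with a short reader. This sketch only covers the parsing step; skipping blank lines is an assumption, and read_urls_sketch is a hypothetical name. Saving the URLs to a task is left out.

```python
import os
import tempfile


def read_urls_sketch(file_name):
    """Return the non-empty lines of a text file, one URL per line."""
    urls = []
    with open(file_name) as url_file:
        for line in url_file:
            url = line.strip()
            if url:  # skipping blank lines is an assumption
                urls.append(url)
    return urls


path = os.path.join(tempfile.mkdtemp(), "urls.txt")
with open(path, "w") as url_file:
    url_file.write("http://a.example/\n\nhttp://b.example/page\n")

urls = read_urls_sketch(path)
count = len(urls)   # corresponds to the number of URLs added
```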

aww.model.task_remove(task_name)

Delete a task from the database.

Parameters:task_name – The name of the task
aww.model.task_remove_url(task_name, url)

Removes a URL from the list of URLs belonging to a task.

Parameters:
  • task_name – The name of the task
  • url – The URL to be removed
aww.model.task_set_frequency(task_name, frequency)

If the given task exists, its frequency is set to the specified value.

Parameters:
  • task_name – The name of the task
  • frequency – A string of the form <minute hour dom month>, where * means every
aww.model.viz_exists(viz_name)

Determine whether information about a visualization exists in the database.

Parameters:viz_name – The name of the visualization
Returns:boolean – True if the visualization is found
aww.model.viz_get()

Get the names and descriptions of all registered visualizations.

Returns:dictionary – Visualization names and descriptions
aww.model.viz_register(viz_name, description)

Save name and description of a visualization in database.

Parameters:
  • viz_name – The name of the visualization
  • description – A description of the visualization

Scheduler

This module enables scheduling of tasks. It uses, and is used by, the controller module. When start_scheduler() is run, a loop is entered that periodically checks the current time against the execution frequencies of the tasks, and executes any task with a matching execution time.

aww.scheduler.frequency_to_timestamp(freq)

This function returns one of Python’s datetime objects, representing the next point in time that the given frequency describes.

Parameters:freq – A time frequency of the form: ‘minute hour dom month’ (* means every)
Returns:datetime.datetime – a point in time described by the freq parameter
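
One way to turn a frequency string into the next matching point in time is a simple minute-by-minute search, sketched below. It is not necessarily how the scheduler does it. The sketch assumes that each field is either * or a single number, and that the current minute itself counts as a match; the function names are hypothetical.

```python
from datetime import datetime, timedelta


def field_matches(field, value):
    """True if a frequency field is '*' or equals the given number."""
    return field == "*" or int(field) == value


def next_activation_sketch(freq, now):
    """Return the first minute at or after `now` that matches `freq`."""
    minute, hour, dom, month = freq.split()
    candidate = now.replace(second=0, microsecond=0)
    # Searching minute by minute is slow but simple; a little over one
    # year covers every combination that occurs annually.
    for _ in range(366 * 24 * 60 + 1):
        if (field_matches(minute, candidate.minute)
                and field_matches(hour, candidate.hour)
                and field_matches(dom, candidate.day)
                and field_matches(month, candidate.month)):
            return candidate
        candidate += timedelta(minutes=1)
    return None  # for example '0 0 30 2', which never occurs


start = datetime(2024, 1, 1, 15, 0, 42)
daily = next_activation_sketch("30 14 * *", start)
every_minute = next_activation_sketch("* * * *", start)
yearly = next_activation_sketch("0 0 1 3", start)
```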
aww.scheduler.is_running()

Confirm whether the task scheduler is running.

Returns:boolean – True if the scheduler is running
aww.scheduler.it_is_time(task_time, now)

Compares two datetime.datetime objects, to see if they are equal.

Parameters:
  • task_time – datetime.datetime object to be compared
  • now – datetime.datetime object describing current time
Returns:

boolean – True if the objects are equal, with a precision level of minutes
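
Comparing with minute precision can be done by discarding seconds and microseconds before testing for equality. A minimal sketch with a hypothetical function name:

```python
from datetime import datetime


def same_minute(task_time, now):
    """Compare two datetimes, ignoring seconds and microseconds."""
    return (task_time.replace(second=0, microsecond=0)
            == now.replace(second=0, microsecond=0))


due = same_minute(datetime(2024, 1, 2, 14, 30),
                  datetime(2024, 1, 2, 14, 30, 59))
not_due = same_minute(datetime(2024, 1, 2, 14, 30),
                      datetime(2024, 1, 2, 14, 31))
```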

aww.scheduler.next_activations()

Iterates over the task queue and returns all tasks that should be executed during the current minute.

Returns:list – A list of tasks
aww.scheduler.print_queue()

Prints all tasks in the task queue.

aww.scheduler.refresh_queue()

Checks if the current minute has changed since the last time the function was called. If we are in a new minute, all tasks are retrieved from the model, and their timestamps are regenerated.

aww.scheduler.run()

Loops forever. Refreshes the task queue every minute.

aww.scheduler.start_scheduler()

Creates a new thread and uses it to start the scheduling loop.

aww.scheduler.stop_scheduler()

Stops the task scheduler.

It breaks the scheduling loop by negating a boolean, so that run() returns in another thread.
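
The start/stop mechanism can be sketched with a module-level flag and a worker thread. All names here are hypothetical, and the short sleep stands in for the real loop's work of refreshing the queue and running due tasks.

```python
import threading
import time

keep_running = False   # hypothetical module-level flag
ticks = 0


def run_sketch():
    """Scheduling loop: runs until keep_running becomes False."""
    global ticks
    while keep_running:
        ticks += 1          # the real loop would refresh and run tasks here
        time.sleep(0.01)


def start_sketch():
    """Set the flag and start the loop in a new thread."""
    global keep_running
    keep_running = True
    thread = threading.Thread(target=run_sketch)
    thread.daemon = True    # do not keep the program alive on exit
    thread.start()
    return thread


def stop_sketch():
    """Clear the flag; the loop notices it on its next iteration."""
    global keep_running
    keep_running = False


worker = start_sketch()
time.sleep(0.05)
stop_sketch()
worker.join(1.0)            # run_sketch() returns once it sees the flag
stopped = not worker.is_alive()
```

Stopping is not immediate with this design: the loop only ends after the current iteration has finished.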

Commandline

The command line provides functionality by parsing commands and their arguments, and then calling the appropriate functions in controller.py.

When parsing a command returns a Command object, the function it refers to is called. Ideally that is a function in the controller module, but often it is a private function in this module, where the arguments to the command are sorted out before the appropriate function in the controller is called.

class aww.commandline.Command(c_name, description)
class Command

Information about a single command

aww.commandline.complete(text, state)

A pointer to this function is passed to the readline module to enable tab completion.

aww.commandline.get_commands()

Builds the list of commands. Each Command object contains its own description, arguments, and which function to call.

aww.commandline.open_commandline()

Loops and parses commands. When the loop ends control is returned, which should cause the program to exit.

aww.commandline.parse_input(user_input)

Goes through the list of possible commands and compares them with the parameter user_input. If a matching command is found, it is returned.

Parameters:user_input – The text to be used for comparison
Returns:Command – A command matching the input
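
The lookup might work along the lines of the following sketch. CommandSketch is a hypothetical stand-in for the Command class; its function attribute, and matching on the first word of the input, are assumptions not confirmed by the documentation.

```python
class CommandSketch(object):
    """Hypothetical stand-in for a Command object."""

    def __init__(self, c_name, description, function):
        self.c_name = c_name
        self.description = description
        self.function = function


def parse_input_sketch(user_input, commands):
    """Return the command whose name is the first word of the input."""
    words = user_input.split()
    if not words:
        return None
    for command in commands:
        if command.c_name == words[0]:
            return command
    return None


commands = [
    CommandSketch("bots", "List registered robots", lambda: "listing bots"),
    CommandSketch("exit", "Leave the program", lambda: "bye"),
]

match = parse_input_sketch("bots", commands)
result = match.function() if match else None
unknown = parse_input_sketch("frobnicate", commands)
```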

GUI

The GUI module has little functionality, only graphical components that relay input to functions in the controller module.

class aww.gui.AWW_GUI(master=None)

Instantiating this class opens the graphical interface. This is normally done by the function open_gui().

refresh_menus()

Updates scheduler info, and drop down menus.

aww.gui.open_gui()

Creates an instance of the class AWW_GUI, and redirects I/O to it.

Hook

The hook is meant to be imported by extensions that require access to functionality in the model. Usually this means robots or visualizations.

Often parameters are given as extension_name and set_name, which are later concatenated into a table name before a function in the model is called.

aww.hook.bot_register(bot_info)

Register information about a robot in the database.

Parameters:bot_info – instance of robots.robot_tools.Robot_info
aww.hook.dataset_contains(bot_name, set_name, column_name, value)

Check for existence of at least one instance of value in the column column_name in a dataset.

Parameters:
  • bot_name – The name of the robot that created the dataset
  • set_name – The name of the dataset
  • column_name – The column to be examined
  • value – The value to look for
Returns:

boolean – True if the value was found at least once

aww.hook.dataset_get(bot_name, set_name)

Retrieve the entire dataset as a list of tuples.

This function may not be suitable for large datasets.

Parameters:
  • bot_name – The name of the robot that created the dataset
  • set_name – The name of the dataset
Returns:

list – a list of tuples containing dataset entries

aww.hook.dataset_insert(bot_name, set_name, tuples, column_names=None)

Insert one or more tuples. If column_names is None, then the content of the tuples must be ordered the same way as when the dataset was created.

Parameters:
  • bot_name – The name of the robot that created the dataset
  • set_name – The name of the dataset the tuples will be inserted into
  • tuples – List of tuples to be inserted
  • column_names – Optional list of column names
aww.hook.dataset_insert_special(bot_name, set_name, tuple1, column_names=None)

This function takes only one tuple for insertion. The function is special in that it returns an integer, representing the resulting rowid of the insertion.

Parameters:
  • bot_name – The name of the robot that created the dataset
  • set_name – The name of the dataset the tuple will be inserted into
  • tuple1 – One entry to be inserted
  • column_names – Optional list of column names
Returns:

int – rowid of the resulting table entry

aww.hook.dataset_is_empty(bot_name, set_name)

Check whether there are any entries in a dataset.

Parameters:
  • bot_name – The name of the robot that created the dataset
  • set_name – The name of the dataset
Returns:

boolean – True if there is nothing in the dataset.

aww.hook.dataset_pop(bot_name, set_name)

Gets an entry from the dataset. The entry is deleted from the dataset at the same time.

Parameters:
  • bot_name – The name of the robot that created the dataset
  • set_name – The name of the dataset
Returns:

tuple – an entry from the dataset, or None

aww.hook.get_default_urls(bot_name)

All tasks have a (possibly empty) list of URLs. This function returns the URLs of the default task for the given robot.

(The default task is the first task found where execution frequency equals ‘not set’)

Suggestion for improvement: Tasks are no longer associated with robots, so this function is not useful. URLs must be given by specifying a task, or by typing them in manually.

Parameters:bot_name – The name of the robot
Returns:list – default URLs for a robot
aww.hook.purge(bot_name, set_name)

Delete all entries from a dataset.

Parameters:
  • bot_name – The name of the robot that created the dataset
  • set_name – The name of the dataset
aww.hook.viz_register(viz_name, description)

Save name and description of a visualization in database.

Parameters:
  • viz_name – The name of the visualization
  • description – A description of the visualization

The robots sub package

The robots sub-package contains modules representing web robots. In addition it contains a module named robot_tools, with functionality meant to be used by web robots. That way, functionality does not have to be re-created every time a new robot is made.

class aww.robots.robot_tools.CustomRedirectHandler

By default this handler is added to HTTP-open requests. It throws exceptions instead of following redirects. The exceptions can be caught in order to handle redirects in a different place.
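
In Python 3's urllib.request, a handler with this behaviour could be sketched by overriding redirect_request(). The class names and the exception type below are hypothetical; the package itself may use a different mechanism or raise a different exception.

```python
import urllib.request


class RedirectNotFollowed(Exception):
    """Hypothetical exception carrying the redirect target."""

    def __init__(self, code, new_url):
        Exception.__init__(self, "redirect %d to %s" % (code, new_url))
        self.code = code
        self.new_url = new_url


class RaisingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Raise instead of following HTTP redirects."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        raise RedirectNotFollowed(code, newurl)


# An opener built with the handler uses it in place of the default one.
opener = urllib.request.build_opener(RaisingRedirectHandler())

# The handler can be exercised directly, without network access:
request = urllib.request.Request("http://example.com/old")
try:
    RaisingRedirectHandler().redirect_request(
        request, None, 301, "Moved", {}, "http://example.com/new")
    caught = None
except RedirectNotFollowed as error:
    caught = error
```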

class aww.robots.robot_tools.Dataset_info(name, description)

Holds information about a dataset. Fields is a list of lists of the form [[name, type, description],...]

exception aww.robots.robot_tools.HttpError(message, http_code=0, local_code=0, url='')

Custom exception for passing on HTTP-related information

class aww.robots.robot_tools.Robot_info(name, description)

Holds information about a robot. An object of this class can be passed to the model when a robot is registered in the database.

class aww.robots.robot_tools.TrafficHandler(agent_string)

This handler should be added to all openers for requests made with AWW’s robots. It registers the traffic in global variables, and waits if the traffic load is high.

If conditions are not right, it waits a while; if that does not help, it throws an exception.

If there is only one thread, waiting is a good option. If multithreading is introduced, this functionality might cause race conditions.

aww.robots.robot_tools.debug_print(dlevel, s)

Writes the given string to debug_info.txt if the dlevel parameter is less than or equal to the value of the global variable debug_level.

Parameters:
  • dlevel – The debug level controlling output of the given string
  • s – The string to be output
aww.robots.robot_tools.get_local_network_load()

Returns bytes downloaded during the current minute. Indirectly causes counters to reset if current_minute has changed.

Returns:int – No. of bytes downloaded during current minute
aww.robots.robot_tools.get_local_request_count()

Returns outgoing HTTP requests during the current minute. Indirectly causes counters to reset if current_minute has changed.

Returns:int – No. of requests during current minute
aww.robots.robot_tools.get_remote_network_load(url)

Get the number of HTTP requests that have been made for a specific hostname during the current minute.

Parameters:url – A URL used for lookup
Returns:int – No. of HTTP requests to the given hostname during the current minute
aww.robots.robot_tools.get_standard_fields()

Returns a list of lists with information about columns that datasets usually will include: timestamp, url, http_status

aww.robots.robot_tools.get_timestamp()

Get current date and time as a string.

Returns:string – Current time, in the format <YYYY_MM_DD HH:MM:SS>
aww.robots.robot_tools.get_timestamp_excel()

In Excel, time is sometimes represented as the number of days since the beginning of the year 1900, plus a fraction describing the time of day. This function returns the current time in that format.

Returns:float – Current date and time, based on Excel’s format
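
The conversion can be sketched as below for an arbitrary datetime. The sketch uses 30 December 1899 as its epoch, which matches Excel's serial numbers for dates after February 1900, because Excel's 1900 date system treats 1900 as a leap year. Whether the package uses the same epoch is an assumption.

```python
from datetime import datetime

# Serial numbers for dates after February 1900 line up with this epoch,
# because Excel's 1900 date system counts a non-existent 29 February 1900.
EXCEL_EPOCH = datetime(1899, 12, 30)


def excel_serial_sketch(moment):
    """Convert a datetime to days since the epoch, with a day fraction."""
    delta = moment - EXCEL_EPOCH
    return delta.days + delta.seconds / 86400.0


noon = excel_serial_sketch(datetime(2000, 1, 1, 12, 0, 0))
print(noon)
```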
aww.robots.robot_tools.impolite_open(url)

Considers traffic load, but not robots.txt. Useful, for example, for downloading robots.txt files and some other resources located at the web root (which is sometimes disallowed in robots.txt).

Aside from not consulting the robots.txt file, this function makes all the same considerations as polite_open() before making the HTTP request.

Parameters:url – The HTTP address to request
Returns:file object – The file that was retrieved
aww.robots.robot_tools.increase_remote_request_count(url)

This function is used for logging the number of HTTP requests to individual hostnames during the current minute.

Parameters:url – The URL to update
aww.robots.robot_tools.make_well_formed(url)

Ensures that the given URL starts with http://. If this function is called with a relative path, the result will not make sense.

Parameters:url – A URL to be checked, and possibly modified
Returns:string – A valid representation of the given URL
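
A minimal sketch of such a normalisation step. Leaving https:// URLs untouched and stripping surrounding whitespace are assumptions, since the documentation only mentions http://.

```python
def make_well_formed_sketch(url):
    """Prefix the URL with http:// unless a scheme is already present."""
    url = url.strip()
    if url.startswith("http://") or url.startswith("https://"):
        return url
    return "http://" + url


fixed = make_well_formed_sketch("www.example.com/index.html")
untouched = make_well_formed_sketch("http://www.example.com/")
```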
aww.robots.robot_tools.polite_get_header(url)

A version of polite_open() that uses fewer resources by only retrieving the head of the web page returned for the url argument, if any.

Parameters:url – The address to use for the HTTP request
Returns:file object – The header that was retrieved
aww.robots.robot_tools.polite_open(url)

Considers robots.txt. Uses an opener that considers traffic load, and includes the user agent string. Sleeps when traffic is high locally or remotely. Might create race conditions when multi-threading.

This function will raise HttpErrors (defined in this module) and possibly other exceptions. These should be handled by all robots that use the function.

Parameters:url – The HTTP address to request
Returns:file object – The file that was retrieved
aww.robots.robot_tools.print_traffic_info()

Prints the values of some traffic related global values to stdout.

aww.robots.robot_tools.reset_local_network_counters()

We measure our local load in bytes downloaded and HTTP requests made. Both are reset once a minute (if the get functions are called).

aww.robots.robot_tools.robotstxt_allow(url)

Consults the robots.txt file for the given URL, and confirms whether a request can be sent to that address.

Parameters:url – The web address to check
Returns:boolean – True if the URL can be downloaded
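
The kind of check involved can be illustrated with Python 3's standard urllib.robotparser module. Whether AWW uses this module is not stated. The rules are supplied inline so that the example runs without network access, and the user agent string is made up.

```python
import urllib.robotparser

# Rules are supplied inline so that the sketch runs without network access.
rules = [
    "User-agent: *",
    "Disallow: /private/",
]

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules)

# "aww-sketch" is a made-up user agent string.
allowed = parser.can_fetch("aww-sketch", "http://example.com/index.html")
blocked = parser.can_fetch(
    "aww-sketch", "http://example.com/private/data.html")
```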
aww.robots.robot_tools.same_host(url1, url2)

Returns True if the urls have the same host, otherwise False.

Parameters:
  • url1 – A URL for comparison
  • url2 – A URL for comparison
Returns:

boolean – True if the hostname is the same for both URLs

aww.robots.robot_tools.url_to_site(url)

Attempts to extract the hostname from the specified URL.

Parameters:url – A URL to analyze
Returns:string – The hostname of the given URL
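
Hostname extraction and comparison could be sketched with Python 3's urllib.parse. The function names are hypothetical. Treating URLs without a recognisable hostname as never being on the same host is a design choice of this sketch, not documented behaviour.

```python
from urllib.parse import urlparse


def url_to_site_sketch(url):
    """Return the hostname part of a URL, or an empty string if none."""
    return urlparse(url).hostname or ""


def same_host_sketch(url1, url2):
    """True if both URLs have a hostname and the hostnames are equal."""
    host1 = url_to_site_sketch(url1)
    host2 = url_to_site_sketch(url2)
    return bool(host1) and host1 == host2


site = url_to_site_sketch("http://www.example.com:8080/a/b.html")
same = same_host_sketch("http://www.example.com/a",
                        "http://www.example.com/b?q=1")
different = same_host_sketch("http://www.example.com/",
                             "http://example.org/")
```

Note that urlparse() only finds a hostname when the URL includes a scheme, so relative paths yield an empty string here.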

The visualizations sub package

The visualization sub-package is meant to contain visualizations for collected datasets. In addition, it contains a module named visualization_tools, with functionality that can be useful when creating visualizations.

aww.visualizations.visualization_tools.png2gif(png_filename, gif_filename)

Uses functionality from the easyviz package to convert a file.

If png_filename is specified as several files (using * notation), an animated gif should be made.

Parameters:
  • png_filename – The file to read from
  • gif_filename – The file to write to
