Code docs

class ananse.Ananse

A Python package to partially automate search term selection and write search strategies for systematic reviews

create_dtm(doc, min_len=2, max_len=3, **kwargs)

This method creates a Document-Term Matrix

Parameters

min_len – minimum keyword length
max_len – maximum keyword length
doc – a list of article title, abstract or any article property
keywords – a list of keywords to use for the Document-Term Matrix
dfm_type – whether the DTM should be created from document tokens or from a restricted list of keywords; options: token or keywords

Returns

a multidimensional array holding the Document-Term Matrix, and a list of terms (the matrix columns)
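
To make the shape of that return value concrete, here is a minimal sketch of a keyword-restricted Document-Term Matrix in plain Python. It is not the package's implementation: the helper name build_dtm is hypothetical, and recording binary presence (rather than counts) by simple substring matching is an assumption.

```python
def build_dtm(docs, keywords):
    """Return (matrix, terms): one row per document, one column per keyword."""
    terms = list(keywords)
    matrix = []
    for text in docs:
        lowered = text.lower()
        # Assumption: 1 if the keyword phrase occurs in the document, else 0
        matrix.append([1 if term.lower() in lowered else 0 for term in terms])
    return matrix, terms


docs = [
    "Fire ecology of boreal forests",
    "Boreal forests and climate change",
]
matrix, terms = build_dtm(docs, ["boreal forests", "fire ecology", "climate change"])
# matrix is [[1, 1, 0], [1, 0, 1]]; terms labels the columns
```

Each row corresponds to one entry of doc and each column to one term, which is the layout an incidence matrix needs.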

create_network(im, keywords, draw_graph=False, save_network=False, save_directory=None)

This method creates a graph when given a Document-Term Matrix in the form of an incidence matrix

Parameters

im – the incidence matrix
keywords – keywords for labelling
draw_graph – if True, the graph is drawn
save_network – if True, saves the graph to a .png
save_directory – the path to a directory where the graph will be saved if save_network is set to True

Returns

a networkx graph
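
The idea of turning an incidence matrix into a keyword graph can be sketched without networkx. In the sketch below, two keywords are linked when they appear in the same document, and the edge weight counts the documents they share. The helper name cooccurrence_edges is hypothetical, and whether the package weights edges this way is an assumption.

```python
from itertools import combinations


def cooccurrence_edges(im, keywords):
    """Return {(term_a, term_b): weight}, weight = number of shared documents."""
    edges = {}
    for row in im:
        # Keywords present in this document, in column order
        present = [keywords[j] for j, value in enumerate(row) if value]
        for a, b in combinations(present, 2):
            edges[(a, b)] = edges.get((a, b), 0) + 1
    return edges


im = [
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 1],
]
keywords = ["boreal forests", "fire ecology", "climate change"]
edges = cooccurrence_edges(im, keywords)
```

The resulting dictionary can be loaded into any graph library as a weighted edge list.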

deduplicate_dataframe(DataFrame, columns)

This method deduplicates a DataFrame based on the given columns. It treats only the first occurrence of a row as unique and deletes the other duplicates (inplace=True)

Parameters

DataFrame – a pandas DataFrame to be deduplicated
columns – a list of fields to check for duplicate values when deduplicating the DataFrame

Returns

the DataFrame with duplicate rows removed, depending on the arguments passed
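
The keep-first rule described above can be illustrated on plain dictionaries. This is a stand-in sketch, not the package's code: the helper name deduplicate_rows and the sample records are hypothetical. In pandas, the equivalent behaviour is presumably obtained with drop_duplicates(subset=columns, keep="first").

```python
def deduplicate_rows(rows, columns):
    """Keep the first row for each distinct combination of values in `columns`."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row.get(col) for col in columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


rows = [
    {"title": "Fire ecology", "year": 2019, "source": "db_a"},
    {"title": "Fire ecology", "year": 2019, "source": "db_b"},
    {"title": "Soil carbon", "year": 2020, "source": "db_a"},
]
unique = deduplicate_rows(rows, ["title", "year"])
# The db_b copy of "Fire ecology" is dropped; the first occurrence survives
```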

dtm_to_dataframe(dtm, keywords, doc)

This method creates a data frame of a Document-Term Matrix

Parameters

dtm – a multidimensional array of a Document-Term Matrix
keywords – a list of keywords to use for the Document-Term Matrix
doc – a list of article title, abstract or any article property

Returns

a data frame of a Document-Term Matrix

extract_terms(DataFrame, min_len=2, max_len=4)

This method uses the RAKE Algorithm to extract keywords from the text column of the DataFrame of naive search results.

Parameters

DataFrame – a pandas DataFrame of naive search results
min_len – minimum keyword length
max_len – maximum keyword length

Returns

a list combining the extracted keywords and the author keywords
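
For readers unfamiliar with RAKE, the sketch below shows the core of the algorithm in plain Python: text is split into candidate phrases at punctuation and stopwords, each word is scored as degree divided by frequency, and a phrase's score is the sum of its word scores. This is an illustration only. The function name rake_phrases and the tiny stopword list are hypothetical, and treating min_len and max_len as phrase lengths in words is an assumption.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list; real RAKE uses a much larger one
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "is", "are", "with"}


def rake_phrases(text, min_len=2, max_len=4):
    """Return candidate phrases of min_len..max_len words, best score first."""
    phrases = []
    # Punctuation and stopwords both act as phrase boundaries
    for fragment in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)

    # Word frequency and degree (total length of the phrases containing the word)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)

    scored = {}
    for phrase in phrases:
        if min_len <= len(phrase) <= max_len:
            scored[" ".join(phrase)] = sum(degree[w] / freq[w] for w in phrase)
    # Highest score first; ties broken alphabetically
    return sorted(scored, key=lambda p: (-scored[p], p))


text = "Fire ecology of boreal forests. Effects of fire on boreal forest soils."
candidates = rake_phrases(text)
# candidates is ["boreal forest soils", "boreal forests", "fire ecology"]
```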

find_cutoff(g, method, importance_method, degrees=2, knot_num=1, percent=0.2, diagnostics=False)

This method finds the cutoff for a graph network using either the cumulative or the spline method of cutting off the degree distribution

Parameters

g – graph
method – method of finding cutoff
importance_method – method to use to check node importance
degrees – spline degree
knot_num – spline number of knots
percent – cutoff percentage for cumulative method

Returns

cutoff strengths
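
One plausible reading of the cumulative method is sketched below: node strengths are sorted from strongest to weakest, and the cutoff is the strength at which the running total first reaches the requested share of the overall strength. The exact rule the package applies is not stated here, so the function name cumulative_cutoff and this interpretation of percent are assumptions.

```python
def cumulative_cutoff(strengths, percent=0.2):
    """Strength of the last node needed for the strongest nodes to hold
    `percent` of the total strength (None for an empty input)."""
    ordered = sorted(strengths, reverse=True)
    target = percent * sum(ordered)
    running = 0
    for strength in ordered:
        running += strength
        if running >= target:
            return strength
    return ordered[-1] if ordered else None


strengths = [10, 6, 3, 1]  # hypothetical node strengths, total 20
half = cumulative_cutoff(strengths, percent=0.5)            # 10 alone reaches 10 of 20
three_quarters = cumulative_cutoff(strengths, percent=0.75)  # 10 + 6 reaches 15 of 20
```

Nodes whose strength is at or above the returned value would then be kept as candidate keywords.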

find_knots(x, y, degrees, knot_num=1)

This method finds the knots for two sets of values

Parameters

x – x values
y – y values
degrees – degrees of the spline
knot_num – number of knots of the spline

Returns

t = knots, c = spline coefficients, k = B-spline order

fit_spline(t, c, k)

This method fits a B-spline from t (knots), c (spline coefficients) and k (B-spline order)

Parameters

t – knots
c – spline coefficients
k – B-spline order

Returns

fitted B-spline

get_centrality(g, method)

This method evaluates the node importance of a graph

Parameters

g – the graph whose node importance is to be evaluated
method – the method for finding node importance: degree, closeness, betweenness or eigenvalue

Returns

a dictionary containing nodes with their importance
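
As an illustration of the simplest of these measures, the sketch below computes an unnormalised degree for each node from an edge list and returns it as a node-to-importance dictionary. The helper name degree_importance and the sample edges are hypothetical. Graph libraries such as networkx typically normalise degree centrality, so the values here will differ from theirs by a constant factor.

```python
def degree_importance(edges):
    """Return {node: number of edges touching the node}."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return degree


edges = [
    ("boreal forests", "fire ecology"),
    ("boreal forests", "climate change"),
    ("fire ecology", "climate change"),
    ("boreal forests", "soil carbon"),
]
importance = degree_importance(edges)
# "boreal forests" touches three edges, "soil carbon" only one
```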

get_keywords(g, importance_method, cutoff_strength, save_keywords=True, save_directory=None, draw_reduced_graph=False)

Parameters

g – graph
importance_method – method to use to check node importance
cutoff_strength – the point at which to cut off the graph
save_keywords – if True, saves the keywords to a .csv
save_directory – the path to a directory where suggested keywords will be saved if save_keywords is set to True
draw_reduced_graph – if True, draws the reduced graph

Returns

suggested keywords for final review

import_naive_results(path, save_dataset=False, save_directory=None, clean_dataset=False)

This method imports the search results from a specified path

Parameters

clean_dataset – if True, de-duplicates search results after importing
save_dataset – if True, saves the full search results to a .csv
save_directory – the path to a directory where search results will be saved if save_dataset is set to True
path – path containing the naive search results files

Returns

a pandas data frame consisting of assembled search results

make_importance(g, importance_method)

This method creates a data frame of node names with their importance and their rank (index) from a graph

Parameters

g – graph
importance_method – method to use to check node importance

Returns

a data frame of rank, node importance and node name

plot_degree_distribution(g, save_plot=False, save_directory=None)

This method plots a distribution of the graph degree

Parameters

g – graph whose degree distribution is to be plotted
save_plot – if True, saves the plot to a .png
save_directory – the path to a directory where the plot will be saved if save_plot is set to True

Returns

plot_degree_histogram(g, save_plot=False, save_directory=None)

This method plots a histogram of the graph degree

Parameters

g – graph whose degree distribution is to be plotted
save_plot – if True, saves the plot to a .png
save_directory – the path to a directory where the plot will be saved if save_plot is set to True

Returns

remove_punctuations(raw_string)

This method removes all symbols and numbers from a string

Parameters: raw_string – string with characters and numbers
Returns: cleaned string
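
A cleaning step of this kind is commonly written with a regular expression. The sketch below is one way to do it, not the package's code: the function name strip_symbols_and_digits is hypothetical, and both restricting the kept characters to ASCII letters and replacing removed characters with a space (rather than deleting them) are assumptions.

```python
import re


def strip_symbols_and_digits(raw_string):
    """Keep only ASCII letters and single spaces."""
    # Replace anything that is not a letter or whitespace with a space
    cleaned = re.sub(r"[^A-Za-z\s]", " ", raw_string)
    # Collapse runs of whitespace and trim the ends
    return re.sub(r"\s+", " ", cleaned).strip()


cleaned = strip_symbols_and_digits("COVID-19: a review (2020)!")
# cleaned is "COVID a review"
```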
