Code docs

class ananse.Ananse[source]

A Python package to partially automate search term selection and write search strategies for systematic reviews.

create_dtm(doc, min_len=2, max_len=3, **kwargs)[source]

This method creates a Document-Term Matrix

Parameters
  • min_len – minimum keyword length

  • max_len – maximum keyword length

  • doc – a list of article titles, abstracts or any other article property

  • keywords – a list of keywords to use for the Document-Term Matrix

  • dfm_type – whether the DTM should be built from document tokens or from a restricted list of keywords; options: token or keywords

Returns

a multidimensional array of a Document-Term Matrix and a list of terms (columns)
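The sketch below shows one way a binary document-term matrix could be built from n-grams of min_len to max_len words. It is a minimal standard-library illustration, not ananse's implementation: the names ngrams and sketch_dtm, the lowercase letters-only tokenizer and the 0/1 cell values are all assumptions.

```python
import re


def ngrams(tokens, min_len, max_len):
    """Yield every run of min_len..max_len consecutive tokens as one phrase."""
    for n in range(min_len, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def sketch_dtm(docs, min_len=2, max_len=3, keywords=None):
    """Return (matrix, terms): one row per document, one column per term.

    Hypothetical helper: cells are 1 if the term occurs in the document, else 0.
    """
    tokenized = [re.findall(r"[a-z]+", doc.lower()) for doc in docs]
    doc_terms = [set(ngrams(tokens, min_len, max_len)) for tokens in tokenized]
    if keywords is None:
        # "token" mode (assumed): columns are all n-grams seen in the documents
        terms = sorted(set().union(*doc_terms))
    else:
        # "keywords" mode (assumed): columns are restricted to the given list
        terms = list(keywords)
    matrix = [[1 if term in present else 0 for term in terms]
              for present in doc_terms]
    return matrix, terms


docs = ["Forest fire ecology", "Fire ecology of birds"]
matrix, terms = sketch_dtm(docs, min_len=2, max_len=2)
```

Here each row of matrix corresponds to one entry of docs and each column to one entry of terms, which mirrors the (array, list of terms) pair described under Returns.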

create_network(im, keywords, draw_graph=False, save_network=False, save_directory=None)[source]

This method creates a graph when given a Document-Term Matrix in the form of an incidence matrix

Parameters
  • im – the incidence matrix

  • keywords – keywords for labelling

  • draw_graph – if True, the graph is drawn

  • save_network – if True, saves the graph to a .png

  • save_directory – the path to a directory where the graph will be saved if save_network is set to True

Returns

a networkx graph
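As a rough illustration of how an incidence matrix can become a keyword graph, the sketch below counts, for every pair of keywords, the number of documents in which both appear. Whether ananse projects the matrix this way is an assumption, and cooccurrence_edges is a hypothetical name. The resulting weights could then be loaded into a networkx graph, for example with Graph.add_weighted_edges_from.

```python
from itertools import combinations


def cooccurrence_edges(im, keywords):
    """Return {(kw_a, kw_b): weight}; weight = number of rows containing both.

    Rows of im are documents, columns follow the order of keywords.
    """
    edges = {}
    for row in im:
        present = [kw for kw, flag in zip(keywords, row) if flag]
        # Pairs come out in keyword order, so each edge has a single key.
        for a, b in combinations(present, 2):
            edges[(a, b)] = edges.get((a, b), 0) + 1
    return edges


im = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]
keywords = ["fire", "ecology", "birds"]
edges = cooccurrence_edges(im, keywords)
```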

deduplicate_dataframe(DataFrame, columns)[source]

This method deduplicates a DataFrame based on certain columns. It treats the first occurrence of a row as unique and deletes the other duplicates (inplace=True).

Parameters
  • DataFrame – a pandas DataFrame to be deduplicated

  • columns – a list of fields to check for duplicate values when deduplicating the DataFrame

Returns

a DataFrame with duplicate rows removed, depending on the arguments passed
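The "first occurrence wins" rule can be sketched without pandas on a list of dictionaries. This is only an illustration of the semantics, with the hypothetical name keep_first_occurrence. In pandas the comparable call is DataFrame.drop_duplicates(subset=columns, keep="first", inplace=True), though whether ananse uses exactly that call is an assumption.

```python
def keep_first_occurrence(rows, columns):
    """Keep the first row for each distinct combination of values in columns."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[col] for col in columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


rows = [
    {"title": "Fire ecology", "year": 2019, "source": "Scopus"},
    {"title": "Fire ecology", "year": 2019, "source": "Web of Science"},
    {"title": "Bird surveys", "year": 2020, "source": "Scopus"},
]
deduplicated = keep_first_occurrence(rows, ["title", "year"])
```

The second record is dropped because it matches the first on title and year, even though its source differs.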

dtm_to_dataframe(dtm, keywords, doc)[source]

This method creates a data frame of a Document-Term Matrix

Parameters
  • dtm – a multidimensional array of a Document-Term Matrix

  • keywords – a list of keywords to use for the Document-Term Matrix

  • doc – a list of article titles, abstracts or any other article property

Returns

a data frame of a Document-Term Matrix

extract_terms(DataFrame, min_len=2, max_len=4)[source]

This method uses the RAKE Algorithm to extract keywords from the text column of the DataFrame of naive search results.

Parameters
  • DataFrame – a pandas DataFrame of naive search results

  • min_len – minimum keyword length

  • max_len – maximum keyword length

Returns

a list consisting of a combination of extracted keywords and author keywords
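For readers unfamiliar with RAKE, the sketch below walks through its core idea: split text into candidate phrases at stopwords and punctuation, score each word as degree divided by frequency, and score a phrase as the sum of its word scores. It is a simplified illustration, not the code ananse calls. The tiny STOPWORDS set and the names candidate_phrases and rake_sketch are assumptions.

```python
import re
from collections import defaultdict

# Deliberately tiny, hypothetical stopword list for illustration only.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to",
             "is", "are", "with", "by"}


def candidate_phrases(text):
    """Split text into word lists, breaking at punctuation, digits and stopwords."""
    phrases = []
    for fragment in re.split(r"[^a-zA-Z\s]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases


def rake_sketch(text, min_len=2, max_len=4):
    """Return (phrase, score) pairs for phrases of min_len..max_len words."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrences, counting the word itself
    scores = {}
    for phrase in phrases:
        if min_len <= len(phrase) <= max_len:
            scores[" ".join(phrase)] = sum(degree[w] / freq[w] for w in phrase)
    # Highest score first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


text = ("Effects of prescribed fire on forest bird communities. "
        "Prescribed fire is a management tool.")
ranked = rake_sketch(text)
```

With the default limits, the single-word candidate "effects" is discarded because it is shorter than min_len.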

find_cutoff(g, method, importance_method, degrees=2, knot_num=1, percent=0.2, diagnostics=False)[source]

This method finds the cutoff for a graph network using either the cumulative or the spline method of cutting off the degree distribution

Parameters
  • g – graph

  • method – method of finding cutoff

  • importance_method – method to use to check node importance

  • degrees – spline degree

  • knot_num – spline number of knots

  • percent – cutoff percentage for cumulative method

Returns

cutoff strengths
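One plausible reading of the cumulative method is sketched below: sort node strengths from strongest to weakest and return the strength at which the running total first reaches the requested share of the overall strength. This interpretation is an assumption, as is the name cumulative_cutoff; the actual rule used by find_cutoff may differ.

```python
def cumulative_cutoff(strengths, percent=0.2):
    """Return the strength at which the strongest nodes first cover `percent`
    of the total strength (assumed interpretation of the cumulative method)."""
    ordered = sorted(strengths, reverse=True)
    target = percent * sum(ordered)
    running = 0.0
    for strength in ordered:
        running += strength
        if running >= target:
            return strength
    return None  # only reached for an empty input


strengths = [10, 8, 5, 4, 2, 1]
cutoff = cumulative_cutoff(strengths, percent=0.5)
```

In this example the total strength is 30, so the target is 15; the two strongest nodes (10 and 8) are enough to reach it, and the cutoff is the second of those strengths.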

find_knots(x, y, degrees, knot_num=1)[source]

This method finds the knots for two sets of values

Parameters
  • x – x values

  • y – y values

  • degrees – degrees of the spline

  • knot_num – number of knots of the spline

Returns

t = knots, c = spline coefficients, k = B-spline order

fit_spline(t, c, k)[source]

This method fits t = knots, c = spline coefficients, k = B-spline order to a B-spline

Parameters
  • t – knots

  • c – spline coefficients

  • k – B-spline order

Returns

fitted B-spline
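To make the (t, c, k) triple concrete, the sketch below evaluates a B-spline from knots, coefficients and degree using the Cox-de Boor recursion. It is a pure-Python illustration with hypothetical names (basis, eval_bspline) and treats k as the polynomial degree, the convention used by scipy.interpolate, where splrep returns such a triple and BSpline(t, c, k) builds the curve. Whether ananse relies on those scipy calls is an assumption.

```python
def basis(i, k, t, x):
    """Cox-de Boor recursion: i-th B-spline basis function of degree k at x."""
    if k == 0:
        # Half-open interval, so x equal to the last knot evaluates to 0 here.
        return 1.0 if t[i] <= x < t[i + 1] else 0.0
    left = right = 0.0
    if t[i + k] != t[i]:  # skip terms with a zero-width knot span
        left = (x - t[i]) / (t[i + k] - t[i]) * basis(i, k - 1, t, x)
    if t[i + k + 1] != t[i + 1]:
        right = ((t[i + k + 1] - x) / (t[i + k + 1] - t[i + 1])
                 * basis(i + 1, k - 1, t, x))
    return left + right


def eval_bspline(t, c, k, x):
    """Evaluate the spline sum_i c[i] * B_{i,k}(x)."""
    n = len(t) - k - 1  # number of basis functions
    return sum(c[i] * basis(i, k, t, x) for i in range(n))


# Quadratic spline on [0, 1) with clamped knots (equivalent to a Bezier curve).
t = [0, 0, 0, 1, 1, 1]
c = [0.0, 1.0, 0.0]
k = 2
value = eval_bspline(t, c, k, 0.5)
```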

get_centrality(g, method)[source]

This method evaluates the node importance of a graph

Parameters
  • g – a graph from which you find its node importance

  • method – the method for finding the node importance: degree, closeness, betweenness or eigenvalue

Returns

a dictionary containing nodes with their importance
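As an illustration of the kind of dictionary this returns, the sketch below computes degree centrality on a plain adjacency mapping: each node's number of neighbours divided by n - 1, which is the normalization networkx's degree_centrality also applies. The adjacency-dict representation and the function name are assumptions, and only the degree option is shown.

```python
def degree_centrality(adjacency):
    """Map each node to degree / (n - 1) for an undirected adjacency mapping."""
    n = len(adjacency)
    if n <= 1:
        # Edge-case choice made for this sketch only.
        return {node: 0.0 for node in adjacency}
    return {node: len(neighbours) / (n - 1)
            for node, neighbours in adjacency.items()}


adjacency = {
    "fire": {"ecology", "birds"},
    "ecology": {"fire"},
    "birds": {"fire"},
}
importance = degree_centrality(adjacency)
```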

get_keywords(g, importance_method, cutoff_strength, save_keywords=True, save_directory=None, draw_reduced_graph=False)[source]

Parameters
  • g – graph

  • importance_method – method to use to check node importance

  • cutoff_strength – where to cut off the graph

  • save_keywords – if True, saves the keywords to a .csv

  • save_directory – path to a directory where suggested keywords will be saved if save_keywords is set to True

  • draw_reduced_graph – if True, draws the reduced graph

Returns

suggested keywords for final review
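The selection step can be pictured as a simple filter: keep every node whose importance is at least the cutoff, then optionally write the result as CSV. The sketch below assumes that reading; select_keywords, keywords_to_csv and the single "keyword" column are hypothetical, and an in-memory buffer stands in for a file under save_directory.

```python
import csv
import io


def select_keywords(importance, cutoff_strength):
    """Return the nodes whose importance is at least cutoff_strength."""
    return sorted(node for node, value in importance.items()
                  if value >= cutoff_strength)


def keywords_to_csv(keywords):
    """Serialize keywords as one-column CSV text (header name is assumed)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["keyword"])
    writer.writerows([kw] for kw in keywords)
    return buffer.getvalue()


importance = {"fire ecology": 12, "bird survey": 7, "management tool": 3}
suggested = select_keywords(importance, cutoff_strength=7)
csv_text = keywords_to_csv(suggested)
```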

import_naive_results(path, save_dataset=False, save_directory=None, clean_dataset=False)[source]

This method imports the search results from a specified path

Parameters
  • clean_dataset – if True, de-duplicates search results after importing

  • save_dataset – if True, saves the full search results to a .csv

  • save_directory – the path to a directory where search results will be saved if save_dataset is set to True

  • path – path containing the naive search results files

Returns

a pandas data frame consisting of assembled search results
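A minimal sketch of assembling search results from a directory follows. It assumes the exports are CSV files with a header row, which the documentation above does not state; the real method may accept other formats and returns a pandas data frame rather than a list. read_search_results and the source_file field are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def read_search_results(path):
    """Read every .csv file in path and return all rows as dictionaries."""
    rows = []
    for file in sorted(Path(path).glob("*.csv")):
        with file.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                row["source_file"] = file.name  # remember where the row came from
                rows.append(row)
    return rows


# Demonstration with two small, made-up export files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.csv").write_text(
        "title,abstract\nFire ecology,Text A\n", encoding="utf-8")
    Path(tmp, "b.csv").write_text(
        "title,abstract\nBird surveys,Text B\n", encoding="utf-8")
    results = read_search_results(tmp)
```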

make_importance(g, importance_method)[source]

This method creates a data frame made up of node names with their importance and their rank (index) from a graph

Parameters
  • g – graph

  • importance_method – method to use to check node importance

Returns

a data frame of rank, node importance and node name
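The shape of such a table can be sketched with plain dictionaries: sort the nodes by importance and number them. The ascending order, the rank starting at 1 and the name importance_table are assumptions made for illustration; the actual data frame may be ordered differently.

```python
def importance_table(importance):
    """Return rows of rank, importance and node name, weakest node first."""
    ordered = sorted(importance.items(), key=lambda item: (item[1], item[0]))
    return [
        {"rank": rank, "importance": value, "node": node}
        for rank, (node, value) in enumerate(ordered, start=1)
    ]


table = importance_table({"fire": 1.0, "ecology": 0.5, "birds": 0.5})
```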

plot_degree_distribution(g, save_plot=False, save_directory=None)[source]

This method plots a distribution of the graph degree

Parameters
  • g – graph whose degree distribution is to be plotted

  • save_plot – if True, saves the plot to a .png

  • save_directory – the path to a directory where the plot will be saved if save_plot is set to True

Returns

plot_degree_histogram(g, save_plot=False, save_directory=None)[source]

This method plots a histogram of the graph degree

Parameters
  • g – graph whose degree distribution is to be plotted

  • save_plot – if True, saves the plot to a .png

  • save_directory – the path to a directory where the plot will be saved if save_plot is set to True

Returns

remove_punctuations(raw_string)[source]

This method removes all symbols and numbers from a string

Parameters
  • raw_string – string with characters and numbers

Returns

cleaned string
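A cleaning step of this kind can be sketched with a regular expression. The version below replaces every character that is not an ASCII letter or whitespace with a space and then collapses repeated whitespace. Both of those choices, and the name strip_symbols_and_numbers, are assumptions; the real method may treat accented letters or spacing differently.

```python
import re


def strip_symbols_and_numbers(raw_string):
    """Remove symbols and digits, keeping ASCII letters separated by single spaces."""
    letters_only = re.sub(r"[^A-Za-z\s]", " ", raw_string)
    return re.sub(r"\s+", " ", letters_only).strip()


cleaned = strip_symbols_and_numbers("Fire-ecology: 2 studies (2019)!")
```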