Calculate the covariance of two numerical columns of a DataFrame.
The DataFrame
the column names
the covariance of the two columns.
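As an illustration of the quantity being returned, here is a minimal sketch in plain Python. It assumes the sample covariance (divisor n - 1), which the text above does not state explicitly, and the function name `sample_covariance` is hypothetical rather than part of any library.

```python
def sample_covariance(xs, ys):
    """Sample covariance of two equal-length numeric sequences.

    Assumption: the divisor is n - 1 (sample covariance), not n.
    """
    if len(xs) != len(ys):
        raise ValueError("both columns must have the same number of rows")
    n = len(xs)
    if n < 2:
        raise ValueError("at least two rows are required")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from each column's mean.
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1)
```

A positive result means the two columns tend to increase together, a negative result means one tends to decrease as the other increases.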
Generate a table of frequencies for the elements of two columns.
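A frequency table of this kind counts how often each pair of values occurs together. The sketch below shows the idea in plain Python; the row representation (a list of dicts) and the function name `pair_frequencies` are assumptions made for illustration only.

```python
from collections import Counter


def pair_frequencies(rows, col1, col2):
    """Count how often each (col1 value, col2 value) pair occurs.

    `rows` is assumed to be a list of dicts keyed by column name.
    """
    return Counter((row[col1], row[col2]) for row in rows)


# Hypothetical sample data.
rows = [
    {"color": "red", "size": "S"},
    {"color": "red", "size": "M"},
    {"color": "blue", "size": "S"},
    {"color": "red", "size": "S"},
]
table = pair_frequencies(rows, "color", "size")
```

Pairs that never occur are simply absent from the counter; looking them up returns 0.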
Calculates the approximate quantiles of multiple numerical columns of a DataFrame in one pass.
The result of this algorithm has the following deterministic bound: if the DataFrame has N elements and we request the quantile at probability p up to error err, then the algorithm will return a sample x from the DataFrame so that the *exact* rank of x is close to (p * N). More precisely,
floor((p - err) * N) <= rank(x) <= ceil((p + err) * N).
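The bound can be checked directly for a candidate value. The following sketch is an illustration only: it assumes rank(x) means the number of elements less than or equal to x (the text does not define rank for duplicated values), and the helper name `rank_within_bound` is hypothetical.

```python
import bisect
import math


def rank_within_bound(values, x, p, err):
    """Return True if the exact rank of x satisfies the stated bound.

    Assumption: rank(x) is the count of elements <= x.
    """
    ordered = sorted(values)
    n = len(ordered)
    lower = math.floor((p - err) * n)
    upper = math.ceil((p + err) * n)
    rank = bisect.bisect_right(ordered, x)
    return lower <= rank <= upper
```

For example, with the values 1 through 100, p = 0.5 and err = 0.05, any returned value whose rank lies roughly between 45 and 55 is an acceptable answer for the median.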
This method implements a variation of the Greenwald-Khanna algorithm (with some speed optimizations). The algorithm was first presented in "Space-efficient Online Computation of Quantile Summaries" by Greenwald and Khanna.
the dataframe
numerical columns of the dataframe
a list of quantile probabilities. Each number must belong to [0, 1]. For example, 0 is the minimum, 0.5 is the median, and 1 is the maximum.
The relative target precision to achieve (>= 0). If set to zero, the exact quantiles are computed, which could be very expensive. Note that values greater than 1 are accepted but give the same result as 1.
for each column, returns the requested approximations
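To make the shape of the parameters and the result concrete, here is a minimal sketch of the exact case (error 0) in plain Python. It sorts each column instead of using the Greenwald-Khanna summary, so it is not the method's implementation; the function name `exact_quantiles` and the choice of the element at rank ceil(p * N) are assumptions.

```python
import math


def exact_quantiles(columns, probabilities):
    """For each column, return one value per requested probability.

    Swapped-in technique: full sort per column (the zero-error case),
    not the Greenwald-Khanna algorithm.
    """
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must belong to [0, 1]")
    result = []
    for values in columns:
        ordered = sorted(values)
        n = len(ordered)
        if n == 0:
            result.append([])
            continue
        picks = []
        for p in probabilities:
            # 1-based rank ceil(p * n), clamped so p = 0 yields the minimum.
            rank = max(1, math.ceil(p * n))
            picks.append(ordered[rank - 1])
        result.append(picks)
    return result


quantiles = exact_quantiles([[3, 1, 2, 5, 4], [10, 20, 30, 40]], [0.0, 0.5, 1.0])
```

The outer list has one entry per column and each inner list has one entry per probability, in the order requested.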
Calculate the Pearson Correlation Coefficient for the given columns.
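For reference, the Pearson correlation coefficient is the covariance of the two columns divided by the product of their standard deviations. The sketch below computes it in plain Python; the function name `pearson_correlation` is hypothetical, and raising an error for a constant column is a design choice of this sketch, not documented behavior.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys):
        raise ValueError("both columns must have the same number of rows")
    n = len(xs)
    if n < 2:
        raise ValueError("at least two rows are required")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cross = sum(a * b for a, b in zip(dx, dy))
    sq_x = sum(a * a for a in dx)
    sq_y = sum(b * b for b in dy)
    if sq_x == 0 or sq_y == 0:
        # Assumption of this sketch: a constant column has no defined correlation.
        raise ValueError("correlation is undefined for a constant column")
    return cross / math.sqrt(sq_x * sq_y)
```

The result always lies in [-1, 1], with 1 and -1 indicating a perfect increasing or decreasing linear relationship.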