:mod:`dfply.reshape`
====================

.. py:module:: dfply.reshape


Module Contents
---------------

.. function:: arrange(df, *args, **kwargs)

   Calls `pandas.DataFrame.sort_values` to sort a DataFrame according to
   the given criteria.

   See
   http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_values.html
   for the full list of keyword arguments accepted by `sort_values`;
   `arrange` accepts the same ones.

   Args:
       *args: Symbolic, string, integer or lists of those types indicating
           columns to sort the DataFrame by.

   Kwargs:
       **kwargs: Any keyword arguments will be passed through to the pandas
           `DataFrame.sort_values` function.


.. function:: rename(df, **kwargs)

   Renames columns, where keyword argument values are the current names of
   columns and keys are the new names.

   Args:
       df (:obj:`pandas.DataFrame`): DataFrame passed in via the `>>` pipe.

   Kwargs:
       **kwargs: key:value pairs where keys are new names for columns and
           values are current names of columns.


.. function:: gather(df, key, values, *args, **kwargs)

   Melts the specified columns in your DataFrame into two key:value columns.

   Args:
       key (str): Name of identifier column.
       values (str): Name of column that will contain values for the key.
       *args (str, int, symbolic): Columns to "melt" into the new key and
           value columns. If no args are specified, all columns are melted
           into the key and value columns.

   Kwargs:
       add_id (bool): Boolean value indicating whether to add a `"_ID"`
           column that will preserve information about the original rows
           (useful for being able to re-widen the data later).

   Example::

       diamonds >> gather('variable', 'value', ['price', 'depth','x','y','z']) >> head(5)

          carat      cut color clarity  table variable  value
       0   0.23    Ideal     E     SI2   55.0    price  326.0
       1   0.21  Premium     E     SI1   61.0    price  326.0
       2   0.23     Good     E     VS1   65.0    price  327.0
       3   0.29  Premium     I     VS2   58.0    price  334.0
       4   0.31     Good     J     SI2   58.0    price  335.0


.. function:: convert_type(df, columns)

   Helper function that attempts to convert columns into their appropriate
   data type.


.. function:: spread(df, key, values, convert=False)

   Transforms a "long" DataFrame into a "wide" format using a key and value
   column.

   If you have a mixed datatype column in your long-format DataFrame then the
   default behavior is for the spread columns to be of type `object`, or
   string. If you want to try to convert dtypes when spreading, you can set
   the `convert` keyword argument in `spread` to True.

   Args:
       key (str, int, or symbolic): Label for the key column.
       values (str, int, or symbolic): Label for the values column.

   Kwargs:
       convert (bool): Boolean indicating whether or not to try to convert
           the spread columns to more appropriate data types.

   Example::

       widened = elongated >> spread(X.variable, X.value)
       widened >> head(5)

           _ID carat clarity color        cut depth price table     x     y     z
       0     0  0.23     SI2     E      Ideal  61.5   326    55  3.95  3.98  2.43
       1     1  0.21     SI1     E    Premium  59.8   326    61  3.89  3.84  2.31
       2    10   0.3     SI1     J       Good    64   339    55  4.25  4.28  2.73
       3   100  0.75     SI1     D  Very Good  63.2  2760    56   5.8  5.75  3.65
       4  1000  0.75     SI1     D      Ideal  62.3  2898    55  5.83   5.8  3.62


.. function:: separate(df, column, into, sep='[\\W_]+', remove=True, convert=False, extra='drop', fill='right')

   Splits a column into multiple columns.

   Args:
       df (pandas.DataFrame): DataFrame passed in through the pipe.
       column (str, symbolic): Label of column to split.
       into (list): List of string names for new columns.

   Kwargs:
       sep (str or list): If a string, the regex string used to split the
           column. If a list, a list of integer positions to split strings
           on.
       remove (bool): Boolean indicating whether to remove the original
           column.
       convert (bool): Boolean indicating whether the new columns should be
           converted to the appropriate type.
       extra (str): either `'drop'`, where split pieces beyond the specified
           new columns are dropped, or `'merge'`, where the final split piece
           contains the remainder of the original column.
       fill (str): either `'right'`, where `np.nan` values are filled in the
           right-most columns for missing pieces, or `'left'`, where `np.nan`
           values are filled in the left-most columns.


.. function:: unite(df, colname, *args, **kwargs)

   Does the inverse of `separate`, joining columns together by a specified
   separator.

   Any columns that are not strings will be converted to strings.

   Args:
       df (pandas.DataFrame): DataFrame passed in through the pipe.
       colname (str): the name of the new joined column.
       *args: list of columns to be joined, which can be strings, symbolic,
           or integer positions.

   Kwargs:
       sep (str): the string separator to join the columns with.
       remove (bool): Boolean indicating whether or not to remove the
           original columns.
       na_action (str): can be one of `'maintain'` (the default),
           `'ignore'`, or `'as_string'`. The default will make the new column
           row a `NaN` value if any of the original column cells at that row
           contained `NaN`. `'ignore'` will treat any `NaN` value as an empty
           string during joining. `'as_string'` will convert any `NaN` value
           to the string `'nan'` prior to joining.
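
The three `na_action` modes described for `unite` can be sketched row by row in plain Python, without pandas or dfply. This is an illustration only, not dfply's implementation: the helper names `is_missing` and `unite_row` are hypothetical, the default separator `'_'` is an assumption, and under `'ignore'` the sketch skips a missing cell outright rather than leaving an empty piece between two separators, which should be checked against the library.

```python
import math


def is_missing(value):
    """Hypothetical helper: treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def unite_row(values, sep="_", na_action="maintain"):
    """Join the cells of one row into a single string.

    Illustrative stand-in for what `unite` does per row; the default
    separator used here is an assumption, not taken from dfply.
    """
    if na_action == "maintain":
        # Any missing cell makes the whole joined value missing.
        if any(is_missing(v) for v in values):
            return float("nan")
        return sep.join(str(v) for v in values)
    if na_action == "ignore":
        # Assumption: missing cells are skipped, so no doubled separator.
        return sep.join(str(v) for v in values if not is_missing(v))
    if na_action == "as_string":
        # Missing cells become the literal string 'nan'.
        return sep.join("nan" if is_missing(v) else str(v) for v in values)
    raise ValueError("na_action must be 'maintain', 'ignore', or 'as_string'")


row = ["Ideal", float("nan"), "SI2"]
maintained = unite_row(row)                         # NaN
ignored = unite_row(row, na_action="ignore")        # 'Ideal_SI2'
as_string = unite_row(row, na_action="as_string")   # 'Ideal_nan_SI2'
```

Non-string cells are passed through `str()` before joining, which mirrors the statement above that columns that are not strings are converted to strings.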