dfply.reshape
Module Contents
-
dfply.reshape.arrange(df, *args, **kwargs)
Calls pandas.DataFrame.sort_values to sort a DataFrame according to the given criteria.
arrange accepts the same keyword arguments as sort_values. For the full list, see: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_values.html
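Since arrange is described as a wrapper around sort_values, the underlying pandas call shows what the keyword arguments do. The sketch below uses plain pandas only; the toy data is hypothetical, and the dfply call shown in the comment is an assumed equivalent rather than verified output.

```python
import pandas as pd

# Hypothetical toy data, not taken from the dfply documentation.
df = pd.DataFrame({
    "cut": ["Good", "Ideal", "Good"],
    "price": [335, 326, 327],
})

# A pipe call such as `df >> arrange(X.cut, X.price, ascending=False)` is
# assumed to forward `ascending=False` to sort_values, roughly like this:
result = df.sort_values(by=["cut", "price"], ascending=False)

print(result)
```

Any other sort_values keyword (for example na_position) would be passed the same way.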
-
dfply.reshape.rename(df, **kwargs)
Renames columns, where keyword argument values are the current names of columns and keys are the new names.
- Args:
  - df (pandas.DataFrame): DataFrame passed in via the >> pipe.
- Kwargs:
  - **kwargs: key:value pairs where keys are the new names for columns and values are the current names of columns.
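The direction of the mapping (new name as key, current name as value) is the reverse of what pandas.DataFrame.rename expects. A minimal sketch in plain pandas, with hypothetical column names, makes the inversion explicit; it illustrates the documented behavior and is not dfply's own implementation.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({"x": [1, 2], "y": [3, 4]})

# Keyword arguments in the documented form: new_name=current_name,
# as in `df >> rename(width="x", height="y")`.
kwargs = {"width": "x", "height": "y"}

# pandas expects {current_name: new_name}, so the mapping is inverted first.
renamed = df.rename(columns={old: new for new, old in kwargs.items()})

print(list(renamed.columns))
```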
-
dfply.reshape.gather(df, key, values, *args, **kwargs)
Melts the specified columns in your DataFrame into two key:value columns.
- Args:
  - key (str): Name of identifier column.
  - values (str): Name of column that will contain values for the key.
  - *args (str, int, symbolic): Columns to “melt” into the new key and value columns. If no args are specified, all columns are melted into the key and value columns.
- Kwargs:
  - add_id (bool): Boolean value indicating whether to add a “_ID” column that will preserve information about the original rows (useful for being able to re-widen the data later).
- Example:
    diamonds >> gather('variable', 'value', ['price', 'depth', 'x', 'y', 'z']) >> head(5)

       carat      cut color clarity  table variable  value
    0   0.23    Ideal     E     SI2   55.0    price  326.0
    1   0.21  Premium     E     SI1   61.0    price  326.0
    2   0.23     Good     E     VS1   65.0    price  327.0
    3   0.29  Premium     I     VS2   58.0    price  334.0
    4   0.31     Good     J     SI2   58.0    price  335.0
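For readers more familiar with pandas, the melt described here is close to pandas.melt. The sketch below is a rough pandas analogue on hypothetical toy data, not dfply's implementation; the explicit "_ID" column stands in for what add_id=True is described as adding, and numbering rows from 0 is an assumption.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the diamonds example.
wide = pd.DataFrame({
    "carat": [0.23, 0.21],
    "price": [326, 326],
    "depth": [61.5, 59.8],
})

# Assumed stand-in for add_id=True: a row identifier kept alongside the data.
with_id = wide.assign(_ID=range(len(wide)))

# Melt "price" and "depth" into a key column and a value column.
long = pd.melt(
    with_id,
    id_vars=["_ID", "carat"],
    value_vars=["price", "depth"],
    var_name="variable",
    value_name="value",
)

print(long)
```

Each original row now appears once per melted column, and "_ID" records which original row each long-format row came from.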
-
dfply.reshape.convert_type(df, columns)
Helper function that attempts to convert columns into their appropriate data type.
-
dfply.reshape.spread(df, key, values, convert=False)
Transforms a “long” DataFrame into a “wide” format using a key and value column.
If you have a mixed datatype column in your long-format DataFrame, then the default behavior is for the spread columns to be of type object (string). If you want to try to convert dtypes when spreading, set the convert keyword argument in spread to True.
- Args:
  - key (str, int, or symbolic): Label for the key column.
  - values (str, int, or symbolic): Label for the values column.
- Kwargs:
  - convert (bool): Boolean indicating whether or not to try to convert the spread columns to more appropriate data types.
- Example:
    widened = elongated >> spread(X.variable, X.value)
    widened >> head(5)

        _ID  carat  clarity  color        cut  depth  price  table     x     y     z
    0     0   0.23      SI2      E      Ideal   61.5    326     55  3.95  3.98  2.43
    1     1   0.21      SI1      E    Premium   59.8    326     61  3.89  3.84  2.31
    2    10    0.3      SI1      J       Good     64    339     55  4.25  4.28  2.73
    3   100   0.75      SI1      D  Very Good   63.2   2760     56   5.8  5.75  3.65
    4  1000   0.75      SI1      D      Ideal   62.3   2898     55  5.83   5.8  3.62
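The dtype issue described for mixed columns can be reproduced in plain pandas. The sketch below pivots a hypothetical long-format frame whose value column mixes strings and integers, then attempts a per-column numeric conversion. The conversion loop is an assumed approximation of what convert=True aims for, not dfply's convert_type implementation.

```python
import pandas as pd

# Hypothetical long-format data; "value" mixes strings and integers,
# so pandas stores it with the object dtype.
long = pd.DataFrame({
    "_ID": [0, 1, 0, 1],
    "variable": ["cut", "cut", "price", "price"],
    "value": ["Ideal", "Premium", 326, 326],
})

# Rough pandas analogue of spreading "variable"/"value" into wide columns.
wide = long.pivot(index="_ID", columns="variable", values="value").reset_index()
wide.columns.name = None

# Without conversion, even the numeric-looking column is object-typed.
print(wide.dtypes)

# Assumed approximation of convert=True: try a numeric conversion per column
# and leave the column unchanged when it cannot be parsed.
converted = wide.copy()
for col in ["cut", "price"]:
    try:
        converted[col] = pd.to_numeric(converted[col])
    except (ValueError, TypeError):
        pass

print(converted.dtypes)
```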
-
dfply.reshape.separate(df, column, into, sep='[\W_]+', remove=True, convert=False, extra='drop', fill='right')
Splits columns into multiple columns.
- Args:
  - df (pandas.DataFrame): DataFrame passed in through the pipe.
  - column (str, symbolic): Label of column to split.
  - into (list): List of string names for new columns.
- Kwargs:
  - sep (str or list): If a string, the regex string used to split the column. If a list, a list of integer positions to split strings on.
  - remove (bool): Boolean indicating whether to remove the original column.
  - convert (bool): Boolean indicating whether the new columns should be converted to the appropriate type.
  - extra (str): Either ‘drop’, where split pieces beyond the specified new columns are dropped, or ‘merge’, where the final split piece contains the remainder of the original column.
  - fill (str): Either ‘right’, where np.nan values are filled in the right-most columns for missing pieces, or ‘left’, where np.nan values are filled in the left-most columns.
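The interaction of sep, extra, and fill is easiest to see on a single string. The helper below, split_value, is hypothetical and uses only the standard library; it sketches the semantics described above for one cell, under the assumption that ‘merge’ corresponds to limiting the number of splits. It uses None where the documentation says np.nan and is not dfply's implementation.

```python
import re


def split_value(text, n, sep=r"[\W_]+", extra="drop", fill="right"):
    """Hypothetical helper: split one string into exactly n pieces."""
    if extra == "merge":
        # Limiting the split count leaves the remainder in the last piece.
        pieces = re.split(sep, text, maxsplit=n - 1)
    else:
        # 'drop': split fully, then discard pieces beyond the first n.
        pieces = re.split(sep, text)[:n]
    # Pad with None (standing in for np.nan) on the requested side.
    padding = [None] * (n - len(pieces))
    return pieces + padding if fill == "right" else padding + pieces


print(split_value("a_b-c", 2))                 # extra pieces dropped
print(split_value("a_b-c", 2, extra="merge"))  # remainder kept in last piece
print(split_value("a", 3))                     # padded on the right
print(split_value("a", 3, fill="left"))        # padded on the left
```

The default pattern `[\W_]+` splits on any run of non-word characters or underscores, which is why both "_" and "-" act as separators here.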
-
dfply.reshape.unite(df, colname, *args, **kwargs)
Does the inverse of separate, joining columns together by a specified separator.
Any columns that are not strings will be converted to strings.
- Args:
  - df (pandas.DataFrame): DataFrame passed in through the pipe.
  - colname (str): The name of the new joined column.
  - *args: List of columns to be joined, which can be strings, symbolic, or integer positions.
- Kwargs:
  - sep (str): The string separator to join the columns with.
  - remove (bool): Boolean indicating whether or not to remove the original columns.
  - na_action (str): Can be one of ‘maintain’ (the default), ‘ignore’, or ‘as_string’. The default will make the new column row a NaN value if any of the original column cells at that row contained NaN. ‘ignore’ will treat any NaN value as an empty string during joining. ‘as_string’ will convert any NaN value to the string ‘nan’ prior to joining.
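The three na_action modes can be compared on a single row. The helper below, join_row, is hypothetical and standard-library only. It reads ‘ignore’ as skipping NaN cells entirely, so no doubled separator appears; that reading is an assumption, since the description above could also be taken to leave an empty piece between separators. It is a sketch of the documented behavior, not dfply's implementation.

```python
import math


def is_nan(value):
    """True only for float NaN values."""
    return isinstance(value, float) and math.isnan(value)


def join_row(values, sep, na_action="maintain"):
    """Hypothetical helper: join one row's cells into a single string."""
    if na_action == "maintain" and any(is_nan(v) for v in values):
        # Any NaN in the row makes the joined result NaN.
        return float("nan")
    if na_action == "ignore":
        # Assumption: NaN cells are skipped rather than joined as "".
        values = [v for v in values if not is_nan(v)]
    # Remaining modes ('as_string', or no NaN present): stringify everything.
    return sep.join(str(v) for v in values)


row = ["a", float("nan"), "c"]
print(join_row(row, "-", na_action="maintain"))
print(join_row(row, "-", na_action="ignore"))
print(join_row(row, "-", na_action="as_string"))
print(join_row(["a", 1], "-"))  # non-string cells are converted to strings
```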