dynamo_pandas
- dynamo_pandas.get_df(*, table, keys=None, attributes=None, dtype=None, boto3_kwargs={})
Get items from a table into a dataframe.
- Parameters
table (str) – Name of the DynamoDB table.
keys (list[dict]) – List of keys to get where each key is represented by a dictionary.
attributes (list[str]) – Names of the item attributes to return as dataframe columns. If None (default), all attributes are returned.
dtype (data type or dict of column names -> data type) – Use a numpy.dtype or Python type to cast entire pandas object to the same type. Alternatively, use {col: dtype, …}, where col is a column label and dtype is a numpy.dtype or Python type to cast one or more of the DataFrame’s columns to column-specific types.
boto3_kwargs (dict) – Keyword arguments to pass to the underlying boto3.resource('dynamodb') function call (see boto3 docs for details).
- Returns
A dataframe where each item from the table matching the requested keys is represented by a row and its attributes by columns.
- Return type
pandas.DataFrame
Examples
>>> df = get_df(
...     table="players",
...     keys=[{"player_id": "player_three"}, {"player_id": "player_one"}]
... )
>>> print(df)
   bonus_points     player_id            last_play  rating        play_time
0             4  player_three  2021-01-21 10:22:43     2.5  1 days 14:01:19
1             3    player_one  2021-01-18 22:47:23     4.3  2 days 17:41:55
By default, the data types of the returned dataframe are basic pandas/numpy types:
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   bonus_points  1 non-null      float64
 1   player_id     2 non-null      object
 2   last_play     2 non-null      object
 3   rating        2 non-null      float64
 4   play_time     2 non-null      object
dtypes: float64(2), object(3)
memory usage: 208.0+ bytes
The dtype parameter can be used to specify the data types of the different columns:
>>> df = get_df(
...     table="players",
...     keys=keys(player_id=["player_two", "player_four"]),
...     dtype={
...         "bonus_points": "Int8",
...         "last_play": "datetime64[ns, UTC]",
...         # "play_time": "timedelta64[ns]"  # See note below.
...     }
... )
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   bonus_points  1 non-null      Int8
 1   player_id     2 non-null      object
 2   last_play     2 non-null      datetime64[ns, UTC]
 3   rating        2 non-null      float64
 4   play_time     2 non-null      object
dtypes: Int8(1), datetime64[ns, UTC](1), float64(1), object(2)
memory usage: 196.0+ bytes
Note
Due to a known bug in pandas, timedelta strings cannot currently be converted back to timedelta64 type via the dtype parameter. Use the pandas.to_timedelta function instead:
>>> df.play_time = pd.to_timedelta(df.play_time)
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   bonus_points  1 non-null      Int8
 1   player_id     2 non-null      object
 2   last_play     2 non-null      datetime64[ns, UTC]
 3   rating        2 non-null      float64
 4   play_time     2 non-null      timedelta64[ns]
dtypes: Int8(1), datetime64[ns, UTC](1), float64(1), object(1), timedelta64[ns](1)
memory usage: 196.0+ bytes
Omitting the keys parameter performs a scan of the table and returns all the items:
>>> df = get_df(table="players")
>>> print(df)
   bonus_points     player_id            last_play  rating        play_time
0           4.0  player_three  2021-01-21 10:22:43     2.5  1 days 14:01:19
1           NaN   player_four  2021-01-22 13:51:12     4.8  0 days 03:45:49
2           3.0    player_one  2021-01-18 22:47:23     4.3  2 days 17:41:55
3           1.0    player_two  2021-01-19 19:07:54     3.8  0 days 22:07:34
Specifying item attributes via the attributes parameter returns only the columns corresponding to the specified attributes:
>>> df = get_df(table="players", attributes=["player_id", "rating"])
>>> print(df)
      player_id  rating
0  player_three     2.5
1   player_four     4.8
2    player_one     4.3
3    player_two     3.8
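The keys parameter takes one dictionary per item. For a table whose primary key combines a partition key and a sort key, each dictionary would need to carry both attributes. The following is a minimal sketch of building such a list in plain Python. The helper composite_keys and the sort key name game_id are hypothetical and are not part of dynamo_pandas.

```python
from itertools import product


def composite_keys(partition_name, partition_values, sort_name, sort_values):
    """Build one key dict per (partition value, sort value) combination.

    Hypothetical helper for illustration only; not part of dynamo_pandas.
    """
    return [
        {partition_name: p, sort_name: s}
        for p, s in product(partition_values, sort_values)
    ]


# "game_id" is an assumed sort key name used only for this example.
key_list = composite_keys(
    "player_id", ["player_one", "player_two"], "game_id", [1, 2]
)
print(key_list)
```

A list built this way has the same shape as the keys argument shown in the examples above, with one extra attribute per dictionary.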
- dynamo_pandas.keys(**kwargs)
Generate a list of key dictionaries from the partition key attribute name and a list of values. This can simplify the generation of keys to use with the get_df function when only a partition key is used.
- Parameters
**kwargs – A single keyword argument corresponding to the partition key name with a value corresponding to the list of key values to return.
- Returns
A list of key dictionaries.
- Return type
list[dict]
Examples
Assuming we have a table with player_id as the partition key, we can generate the list of keys from the list of players:
>>> key_list = keys(player_id=["player_two", "player_three", "player_four"])
>>> print(key_list)
[{'player_id': 'player_two'}, {'player_id': 'player_three'}, {'player_id': 'player_four'}]
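The behavior described for keys can be reproduced in a few lines of plain Python. This is a sketch of the documented input/output contract, not the library's actual implementation. The name partition_keys is hypothetical, and raising ValueError when more than one keyword argument is passed is an assumption.

```python
def partition_keys(**kwargs):
    """Return one {name: value} dict per value of the single keyword argument.

    Hypothetical stand-in that mirrors the documented behavior of keys();
    the error handling below is an assumption, not confirmed library behavior.
    """
    if len(kwargs) != 1:
        raise ValueError("expected exactly one keyword argument (the partition key name)")
    name, values = next(iter(kwargs.items()))
    return [{name: value} for value in values]


key_list = partition_keys(player_id=["player_two", "player_three", "player_four"])
print(key_list)
```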
- dynamo_pandas.put_df(df, *, table, boto3_kwargs={})
Put rows of a dataframe as items into a table. If the item(s) do not exist in the table they are created, otherwise the existing items are replaced with the new ones.
- Parameters
df (pandas.DataFrame) – Dataframe of items to add/update in the table. The dataframe must, at a minimum, contain columns that correspond to the table’s primary key attribute(s).
table (str) – Name of the DynamoDB table.
boto3_kwargs (dict) – Keyword arguments to pass to the underlying boto3.client('dynamodb') function call (see boto3 docs for details).
Examples
Assume we have the following dataframe:
>>> print(players_df)
      player_id            last_play        play_time  rating  bonus_points
0    player_one  2021-01-18 22:47:23  2 days 17:41:55     4.3             3
1    player_two  2021-01-19 19:07:54  0 days 22:07:34     3.8             1
2  player_three  2021-01-21 10:22:43  1 days 14:01:19     2.5             4
3   player_four  2021-01-22 13:51:12  0 days 03:45:49     4.8          <NA>
The following will add or update the corresponding items in the table named players:
>>> put_df(players_df, table="players")
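Because existing items are replaced rather than merged, an attribute that is present in the table but absent from the new row would not survive the write. The sketch below models that semantic with an in-memory dictionary. It assumes replacement works as in DynamoDB's PutItem operation and does not reflect how put_df communicates with DynamoDB. The function put_items and the item player_five are hypothetical.

```python
def put_items(table, items, key_attr):
    """Model "create or replace" semantics on a dict keyed by primary key.

    Hypothetical in-memory model for illustration; not part of dynamo_pandas.
    """
    for item in items:
        # The whole stored item is overwritten; nothing is merged.
        table[item[key_attr]] = dict(item)
    return table


# Existing state of the modeled table.
table = {
    "player_one": {"player_id": "player_one", "rating": 4.3, "bonus_points": 3},
}

put_items(
    table,
    [
        {"player_id": "player_one", "rating": 4.5},   # replaces the existing item
        {"player_id": "player_five", "rating": 3.1},  # creates a new item
    ],
    key_attr="player_id",
)
print(table["player_one"])
```

In this model the replaced item no longer has a bonus_points attribute, so a dataframe passed to put_df should presumably contain every column that needs to be kept, not only the changed ones.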