Last active
March 15, 2023 01:33
List S3 objects with depth (ChatGPT generated)
import boto3


def list_s3_objects(bucket_name, prefix='', depth=0, page_size=100, start_after='', max_pages=10):
    """
    Lists objects in an S3 bucket at a certain "depth" (number of slashes in the key),
    optionally filtered by a prefix, and paginates results.
    Returns a list of dictionaries containing object metadata.

    Args:
        bucket_name (str): Name of the S3 bucket.
        prefix (str): Prefix to filter objects by (default '').
        depth (int): Number of slashes a key must contain (default 0, meaning keys with no slashes).
        page_size (int): Maximum number of objects to request per page (default 100).
        start_after (str): Object key to start listing after (default '').
        max_pages (int): Maximum number of pages to retrieve (default 10).

    Returns:
        list: List of dictionaries containing object metadata.
    """
    s3 = boto3.client('s3')
    paginator = s3.get_paginator('list_objects_v2')
    # StartAfter is a list_objects_v2 request parameter that takes an object key.
    # StartingToken expects a pagination token from an earlier call, not a key.
    params = {'Bucket': bucket_name, 'Prefix': prefix}
    if start_after:
        params['StartAfter'] = start_after
    page_iterator = paginator.paginate(
        PaginationConfig={'PageSize': page_size},
        **params,
    )
    objects = []
    page_count = 0
    for page in page_iterator:
        for obj in page.get('Contents', []):
            # Keep only keys with exactly `depth` slashes.
            if obj['Key'].count('/') == depth:
                objects.append(obj)
        # Count pages, not objects, so max_pages caps the number of requests.
        page_count += 1
        if page_count >= max_pages:
            break
    return objects
Example:
"Called with depth=2, list_s3_objects lists only the objects in the bucket named by bucket_name whose keys contain exactly two slashes and match the prefix given by prefix. It requests up to page_size objects per page, starts after the key given by start_after (if any), and reads at most max_pages pages. The function returns a list of dictionaries, each containing metadata about one S3 object (such as its key, size, and last modified date)."
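The depth rule can be tried without touching S3. Below is a minimal sketch that applies the same `key.count('/') == depth` test to a plain list of keys. The helper name `keys_at_depth` and the sample keys are hypothetical, made up for illustration, and are not part of the gist.

```python
def keys_at_depth(keys, depth):
    """Return the keys containing exactly `depth` slashes (same test as the gist)."""
    return [key for key in keys if key.count('/') == depth]


# Hypothetical keys, invented for this illustration.
sample_keys = [
    'readme.txt',
    'logs/app.log',
    'logs/2023/03/app.log',
    'data/raw/file.csv',
    'data/raw/',
]

print(keys_at_depth(sample_keys, 0))  # ['readme.txt']
print(keys_at_depth(sample_keys, 2))  # ['data/raw/file.csv', 'data/raw/']
```

Note that a "folder" placeholder key such as `data/raw/` also has two slashes, so it is returned alongside real objects at depth 2. Callers who want files only would likely need to drop keys ending in `/`.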