Last active
November 1, 2024 06:31
import sys


def get_size(obj, seen=None):
    """Recursively finds size of objects"""
    size = sys.getsizeof(obj)
    if seen is None:
        seen = set()
    obj_id = id(obj)
    if obj_id in seen:
        return 0
    # Important: mark as seen *before* entering recursion to gracefully handle
    # self-referential objects
    seen.add(obj_id)
    if isinstance(obj, dict):
        size += sum([get_size(v, seen) for v in obj.values()])
        size += sum([get_size(k, seen) for k in obj.keys()])
    elif hasattr(obj, '__dict__'):
        size += get_size(obj.__dict__, seen)
    elif hasattr(obj, '__iter__') and not isinstance(obj, (str, bytes, bytearray)):
        size += sum([get_size(i, seen) for i in obj])
    return size
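As a small illustration of why the recursion is needed: `sys.getsizeof` reports only the container's own footprint, not the objects it refers to. The sketch below uses a made-up `payload` list (not part of the gist) to show the gap between the shallow size and the size of the elements.

```python
import sys

# Hypothetical example data: a list holding three large strings.
payload = ["x" * 10_000 for _ in range(3)]

# Size of the list object itself: header plus references to its items.
shallow = sys.getsizeof(payload)

# Size of the string objects the list refers to, measured one by one.
elements = sum(sys.getsizeof(s) for s in payload)

print(shallow, elements)
```

The first number stays small no matter how long the strings are, which is the gap `get_size` closes by walking into containers.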
Is this not weird?
>>> from pysize import get_size
>>> get_size(b)
152
>>> get_size(b[0])
28
>>> get_size(b[1])
28
>>> len(b)
300
I've been happily using this code for a long time, but I just encountered a use case where this breaks down: a class built over a simple namedtuple data core. This pattern is desirable for certain multi-processing/cloud computing contexts.
from __future__ import print_function
from collections import namedtuple
import sys
import numpy as np
from pysize import get_size  # assumes the gist is saved as pysize.py

my_tup = namedtuple('MyNamedTuple', ['Array', 'Name'])

class my_class(my_tup):
    def __init__(self, *args):
        # The fields are already set by the namedtuple's __new__;
        # on Python 3, object.__init__ rejects extra arguments.
        super(my_class, self).__init__()
    # Add workhorse functions...

dat_tuple = my_tup(np.zeros([1000, 1000]), 'long name' * 10)
dat_obj = my_class(np.zeros([1000, 1000]), 'long name' * 10)
print(get_size(dat_tuple), get_size(dat_obj))
These sizes should be almost the same, but they are not.
8000946 360
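The problem is that dat_obj has an empty __dict__ while its data is only reachable through __iter__. get_size checks for __dict__ first, so it takes that branch and never visits the tuple's elements.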
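To see the difference directly, here is a minimal sketch with a made-up `Point` namedtuple and a subclass `PointWithMethods` (both hypothetical names, not from the code above). It assumes a current Python 3 release, where plain namedtuple instances have no `__dict__` at all, while a subclass that does not declare `__slots__` gets an empty one.

```python
from collections import namedtuple

# Hypothetical example type, standing in for my_tup above.
Point = namedtuple("Point", ["x", "y"])


class PointWithMethods(Point):
    """Subclass without __slots__, so each instance gets its own __dict__."""

    def norm1(self):
        return abs(self.x) + abs(self.y)


plain = Point(1, 2)
sub = PointWithMethods(1, 2)

print(hasattr(plain, "__dict__"))  # does the plain namedtuple have a __dict__?
print(sub.__dict__)                # the subclass instance's attribute dict
print(list(sub))                   # the data is still only reachable by iterating
```

Declaring `__slots__ = ()` in the subclass would presumably avoid the extra `__dict__`, but that changes the class rather than the size function.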
Here is the fix I made. It doesn't come out exactly the same, but it's a lot closer than before:
def get_size2(obj, seen=None):
    """Recursively finds size of objects"""
    size = sys.getsizeof(obj)
    if seen is None:
        seen = set()
    obj_id = id(obj)
    if obj_id in seen:
        return 0
    # Important: mark as seen *before* entering recursion to gracefully handle
    # self-referential objects
    seen.add(obj_id)
    # Recurse into get_size2 (not get_size) so nested objects get the same fix
    if isinstance(obj, dict):
        size += sum([get_size2(v, seen) for v in obj.values()])
        size += sum([get_size2(k, seen) for k in obj.keys()])
    elif hasattr(obj, '__iter__') and not isinstance(obj, (str, bytes, bytearray)):
        size += sum([get_size2(i, seen) for i in obj])
        # Iterable objects can also carry attributes in a __dict__
        if hasattr(obj, '__dict__'):
            size += get_size2(obj.__dict__.values(), seen)
    elif hasattr(obj, '__dict__'):
        size += get_size2(obj.__dict__, seen)
    return size
print(get_size2(dat_tuple), get_size2(dat_obj))
8000671 8000647
There is a Python module called Pympler that provides similar functionality, along with other features such as tracking the memory consumption of the instances of a specific class.
https://pympler.readthedocs.io/en/latest/
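For comparison, a minimal sketch of Pympler's deep-size helper, `asizeof.asizeof`. Pympler is a third-party package (`pip install pympler`), so the import is guarded here; the `data` dictionary is made up for illustration.

```python
import sys

try:
    from pympler import asizeof  # third-party: pip install pympler
except ImportError:
    asizeof = None  # Pympler not installed; the comparison below is skipped

# Hypothetical example data with nested contents.
data = {"name": "long name" * 10, "values": list(range(1000))}

# Shallow size: the dict object only, without its keys and values.
shallow = sys.getsizeof(data)

if asizeof is not None:
    # Deep size as estimated by Pympler, including referenced objects.
    deep = asizeof.asizeof(data)
    print(shallow, deep)
```

Pympler's numbers will not necessarily match get_size exactly, since the two may count alignment and shared objects differently.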
Amazing code ! thanks