Description
Timing select_dtypes on DataFrames with a varying number of columns.
import numpy as np
import pandas as pd
from timeit import default_timer as tic
ns = [0, 10, 100, 1_000, 10_000]
times = []
for n in ns:
    df = pd.DataFrame(np.random.randn(10, n))
    t0 = tic()
    df.select_dtypes(include='int')
    t1 = tic()
    times.append([t1 - t0])
df = pd.DataFrame(times, columns=['include'], index=ns)
df.plot()
This looks O(n) in the number of columns. I think that can be improved (to whatever the cost of a set intersection is).
Edit: maybe it's O(log(n)), I never took CS :)
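To put a rough number on the scaling instead of eyeballing the plot, one option is to fit a line to log(time) against log(n). This is only a sketch: `scaling_exponent` is a hypothetical helper, not part of pandas, and it assumes the timings are dominated by the column-dependent work rather than fixed overhead.

```python
import numpy as np


def scaling_exponent(sizes, times):
    """Slope of log(time) vs log(size) from a least-squares line fit.

    A slope near 1 suggests roughly linear growth, near 0 roughly constant.
    Hypothetical helper for this issue, not a pandas API.
    """
    sizes = np.asarray(sizes, dtype=float)
    times = np.asarray(times, dtype=float)
    # log is undefined at 0, so drop points such as n = 0
    mask = (sizes > 0) & (times > 0)
    slope, _intercept = np.polyfit(np.log(sizes[mask]), np.log(times[mask]), 1)
    return float(slope)
```

With the benchmark above, this could be called as `scaling_exponent(ns, df['include'])` once `df` holds the timings. A single run per size is noisy, so the result is indicative at best; repeating each measurement and taking the minimum would likely give a steadier estimate.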