pandas.DataFrame Query
A pandas.DataFrame Query is a Pandas query (a Python-based tabular data structure query) on a pandas.DataFrame structure.
- Context:
- It can be a pandas.DataFrame.join Query.
- …
- Example(s)
- Query an array:
dataValue = df.loc[indexValue, 'col2'] # single cell, selected by label
seriesRecord = df.iloc[7] # one row as a Series, selected by position
df.loc[(df.colI=='valX') & (df.colJ=='valY'), ['colK','colL']] # combine conditions with &, not `and`
df[df['colX'].str.contains("strS")]
mask = df['count'] > 2
df[mask]
df.textItem.apply(lambda s: s.split(' ')).str.len() # token count
df.groupby(["colX","colY"]).count()
# Group By Query
pd.DataFrame({'count' : df.groupby(["colX","colY"]).size()}).reset_index()
pd.DataFrame({'count' : df.groupby(["colX","colY"]).size()}).reset_index().query('(count>4321)')
g = df.groupby(['col1'])
g.count().sort_values('col2', ascending=False)
g.filter(lambda x: x['col1'].count() > minCount)
# Roll-Up Query
srs_tokenCount = df.col2.apply(lambda x: pd.Series(x.lower().split(" ")).value_counts()).sum(axis=0)
- Query an array's structure:
rows, cols = df.shape
rows = len(df.index)
- Query an array's metadata:
df.columns
df.dtypes
- Iterate over an array.
for index, row in df.iterrows():
    print(row['colY'], row['colX'])
- Selection by Label
dfl = pd.DataFrame(np.random.randn(5,4), columns=list('ABCD'), index=pd.date_range('20130101',periods=5))
dfl.loc['20130102':'20130104']
- Selection by Position
df1 = pd.DataFrame(np.random.randn(6,4), index=list(range(0,12,2)), columns=list(range(0,8,2)))
df1.iloc[1:5, 2:4]
- Selection by Callable
df1 = pd.DataFrame(np.random.randn(6, 4), index=list('abcdef'), columns=list('ABCD'))
df1.loc[lambda df: df.A > 0, :]
df1.iloc[:, lambda df: [0, 1]]
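The three selection styles listed above (by label, by position, by callable) can be compared side by side on one small frame. This is a minimal sketch, not taken from the pandas documentation: the frame contents and the names by_label, by_position and by_callable are assumptions chosen for illustration, and fixed integer values replace np.random.randn so that the results are reproducible.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: 5 daily rows, columns A..D, values 0..19.
dfl = pd.DataFrame(
    np.arange(20).reshape(5, 4),
    columns=list("ABCD"),
    index=pd.date_range("20130101", periods=5),
)

# Selection by label: both slice endpoints are included.
by_label = dfl.loc["20130102":"20130104"]

# Selection by position: the end position is excluded.
by_position = dfl.iloc[1:4, 2:4]

# Selection by callable: the callable receives the frame and returns a boolean mask.
by_callable = dfl.loc[lambda df: df.A > 4, :]

print(by_label.shape)
print(by_position.shape)
print(by_callable["A"].tolist())
```

The label slice returns three rows because .loc includes the end label, whereas the .iloc slice 1:4 stops before position 4.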
- Counter-Example(s)
- a pandas.Series Query, on a pandas.Series.
- a numpy.ndarray Query, on a numpy.ndarray.
- a SciPy Sparse Array Query.
- a Python Array Query.
- a Python List Query.
- a Perl Associative Array Query.
- a Perl Array Query.
- a SQL Table Query.
- an R DataFrame Query.
- See: pandas.DataFrame Attribute, Pandas DataFrame Operation.
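The boolean-mask and group-by queries listed under Example(s) can be sketched end to end on a small frame. The data below is hypothetical. The column names colX, colY and textItem reuse the placeholders from the examples, while filtered, token_counts, counts and frequent are names assumed here for illustration.

```python
import pandas as pd

# Hypothetical toy data; column names mirror the placeholders used in the examples.
df = pd.DataFrame({
    "colX": ["a", "a", "a", "b", "b", "c"],
    "colY": ["u", "u", "v", "u", "u", "v"],
    "textItem": ["one two", "three", "four five six", "seven", "eight nine", "ten"],
})

# Boolean-mask filter: conditions are combined element-wise with &, not `and`.
filtered = df[(df.colX == "a") & (df.colY == "u")]

# Token count per row: split on whitespace, then take the length of each list.
token_counts = df.textItem.str.split().str.len()

# Group By Query: one row per (colX, colY) pair together with its row count.
counts = df.groupby(["colX", "colY"]).size().reset_index(name="count")

# Keep only the groups that contain more than one row.
frequent = counts.query("count > 1")

print(filtered)
print(token_counts.tolist())
print(frequent)
```

Passing name="count" to reset_index is one way to get the same shape as wrapping the group sizes in a new DataFrame, in a single step.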
References
2016
- http://pandas.pydata.org/pandas-docs/stable/indexing.html#selection-by-label
dfl = pd.DataFrame(np.random.randn(5,4), columns=list('ABCD'), index=pd.date_range('20130101',periods=5))
dfl.loc['20130102':'20130104']
- http://pandas.pydata.org/pandas-docs/stable/indexing.html#selection-by-position
df1 = pd.DataFrame(np.random.randn(6,4), index=list(range(0,12,2)), columns=list(range(0,8,2)))
df1.iloc[1:5, 2:4]
- http://pandas.pydata.org/pandas-docs/stable/indexing.html#selection-by-callable
df1 = pd.DataFrame(np.random.randn(6, 4), index=list('abcdef'), columns=list('ABCD'))
df1.loc[lambda df: df.A > 0, :]
df1.iloc[:, lambda df: [0, 1]]