pandas.DataFrame Operation
A pandas.DataFrame Operation is a Python-based tabular data structure operation for a pandas.DataFrame.
- Context:
- It can (typically) involve a pandas.DataFrame Method.
- It can range from being a pandas.DataFrame Management Operation to being a pandas.DataFrame Query.
- Example(s)
- Create an empty array:
import numpy as np
import pandas as pd
df = pd.DataFrame(columns=['col1','col2'])
df = pd.DataFrame(np.zeros(0, dtype=[('col1', 'i8'),('col2', 'S50')]))
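As a minimal sketch of the empty-array case, the following builds one empty frame from column names alone and one from empty typed Series. The names `empty_untyped` and `empty_typed` and the chosen dtypes are illustrative assumptions, not part of the original examples.

```python
import pandas as pd

# Empty frame defined only by its column names (no rows).
empty_untyped = pd.DataFrame(columns=["col1", "col2"])

# Empty frame whose columns carry explicit dtypes.
# The dtypes chosen here are assumptions for illustration.
empty_typed = pd.DataFrame(
    {
        "col1": pd.Series(dtype="int64"),
        "col2": pd.Series(dtype="object"),
    }
)

print(empty_untyped.shape)
print(empty_typed.dtypes)
```

Declaring dtypes up front can avoid type surprises when rows are appended later.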
- Create a prepopulated array:
df = pd.DataFrame([2,7,9]) # with auto-generated column name and index names.
df = pd.DataFrame({'col1' : [1,2,3,4], 'col2' : pd.Series([1., 2., 3., 4.]) }) # with auto-generated index keys.
df = pd.DataFrame({'col1' : pd.Series([1., 2., 3.], index=['row3', 'row2', 'row1']), 'col2' : pd.Series([1., 2., 3., 4.], index=['row1', 'row2', 'row3', 'row4']) }) # with explicit index keys.
df = pd.DataFrame(np.random.randn(10, 2), columns=['colA', 'colB'])
mydata = [
{'col0':'A', 'col1':'EB', 'col2':1.1},
{'col0':'B', 'col1':'EB', },
{'col0':'C', 'col1':'PG', 'col2':2.4},
{'col0':'D', 'col1':'PG', 'col2':'7.0'},
]
df = pd.DataFrame(mydata)
df.set_index('col0', inplace=True)
mydata = [{"str 1"}, {"str 2"},]
pd.DataFrame(mydata, columns=['colA'])
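The construction examples above rely on two alignment behaviours that are easy to miss: a key absent from one record becomes a missing value, and Series passed in a dict are aligned on their index labels rather than on position. A small sketch, with `df_records` and `aligned` as hypothetical names:

```python
import pandas as pd

# List of records: the second record has no 'col2' key.
records = [
    {"col0": "A", "col1": "EB", "col2": 1.1},
    {"col0": "B", "col1": "EB"},
    {"col0": "C", "col1": "PG", "col2": 2.4},
]
df_records = pd.DataFrame(records).set_index("col0")

# Dict of Series: rows are matched by index label, not by position.
aligned = pd.DataFrame(
    {
        "col1": pd.Series([1.0, 2.0, 3.0], index=["row3", "row2", "row1"]),
        "col2": pd.Series([1.0, 2.0, 3.0, 4.0],
                          index=["row1", "row2", "row3", "row4"]),
    }
)

print(df_records)
print(aligned)
```

In `aligned`, the label `row4` exists only in `col2`, so its `col1` cell is missing.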
- Update array rows (or add array column):
df['colX'] = df['colX'].str.replace("\n","< BR >")
df.loc[0] = ["val_0_A", 5.7]
df.loc[1] = ["val_1_A", 8.8]
df.loc[2] = ["val_2_A", -0.2]
df_tmp = df.ColX.str.extract('us-(....)[-]?(.*)', expand=False)
df_tmp.columns = ['ColX_A','ColX_B']
df.loc[:,'ColX_A'] = df_tmp['ColX_A']
df.loc[:,'ColX_B'] = df_tmp['ColX_B']
df_yyyymm = df_dates['dtimeCol'].map(lambda x: 100*x.year + x.month).astype('int')
df_yyyywoy = df_dates['dtimeCol'].map(lambda x: 100*x.year + x.weekofyear).astype('int')
df3.loc[:,'yyyywoy'] = pd.Series(df_yyyywoy, index=df3.index)
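The update examples can be tried end to end on a small frame. The sketch below assumes hypothetical sample values for `ColX` and `dtimeCol`, and uses the `.dt` accessor as one possible alternative to mapping a lambda over each timestamp.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame(
    {
        "ColX": ["us-east-1a", "us-west-2b"],
        "dtimeCol": pd.to_datetime(["2014-03-15", "2014-11-02"]),
    }
)

# Split 'ColX' into two new columns with a two-group regular expression.
parts = df["ColX"].str.extract(r"us-(....)[-]?(.*)", expand=True)
parts.columns = ["ColX_A", "ColX_B"]
df["ColX_A"] = parts["ColX_A"]
df["ColX_B"] = parts["ColX_B"]

# Derive an integer year-month key (e.g. 201403) from the datetime column.
df["yyyymm"] = df["dtimeCol"].dt.year * 100 + df["dtimeCol"].dt.month

print(df)
```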
- Query an array:
dataValue = df.loc[indexValue, 'col2']
seriesRecord = df.iloc[7]
df[ ['colK','colL'] ][(df.colI=='valX') & (df.colJ=='valY')]
df[df['colX'].str.contains("strS")]
mask = df['count'] > 2
df[mask]
df.textItem.apply(lambda s: s.split(' ')).str.len() # token count
df.groupby(["colX","colY"]).count() # Group By Query
pd.DataFrame({'count' : df.groupby(["colX","colY"]).size()}).reset_index()
pd.DataFrame({'count' : df.groupby(["colX","colY"]).size()}).reset_index().query('(count>4321)')
g = df.groupby(['col1'])
g.count().sort_values('col2', ascending=False)
g.filter(lambda x: x['col1'].count() > minCount) # Roll-Up Query
srs_tokenCount = df.col2.apply(lambda x: pd.Series(x.lower().split(" ")).value_counts()).sum(axis=0)
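The query patterns above combine boolean masks with group-by aggregation. The following sketch runs them on a small hypothetical frame; the column `val` and the thresholds are assumptions chosen so that the results are easy to check by hand.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame(
    {
        "colX": ["a", "a", "b", "b", "b"],
        "colY": ["u", "u", "u", "v", "v"],
        "val": [1, 2, 3, 4, 5],
    }
)

# Element-wise AND uses `&`; each comparison needs its own parentheses.
selected = df.loc[(df["colX"] == "b") & (df["colY"] == "v"), ["val"]]

# Group By Query: number of rows per (colX, colY) pair, as a flat table.
counts = df.groupby(["colX", "colY"]).size().reset_index(name="count")

# Keep only the pairs that occur at least twice.
frequent = counts[counts["count"] >= 2]

# Roll-Up Query: keep the original rows of groups with three or more members.
rolled = df.groupby("colX").filter(lambda grp: len(grp) >= 3)

print(selected)
print(counts)
print(rolled)
```

Python's `and` operator cannot combine two boolean Series, which is why the mask uses `&`.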
- Add columns to an array:
df['col4'] = df['col3'].str.len() # characters count
df['col5'] = df.col3.apply(lambda s: s.split(' > ')) # array with tokenized string
- Delete array rows:
g = df.groupby(['col1'])
df = g.filter(lambda x: x['col2'].count() >= 1)
df.index = range(0, len(df))
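The row-deletion example removes whole groups whose `col2` values are all missing, then renumbers the index. A sketch under assumed sample data, alongside a row-level alternative using `dropna` (a different technique from the group filter, shown for comparison):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame(
    {
        "col1": ["x", "x", "y", "z"],
        "col2": [1.0, np.nan, np.nan, 4.0],
    }
)

# Group-level deletion: count() ignores missing values, so group 'y'
# (whose only 'col2' value is missing) is dropped entirely.
kept = df.groupby("col1").filter(lambda grp: grp["col2"].count() >= 1)
kept = kept.reset_index(drop=True)

# Row-level deletion: drop each row whose 'col2' value is missing.
no_missing = df.dropna(subset=["col2"]).reset_index(drop=True)

print(kept)
print(no_missing)
```

`reset_index(drop=True)` has the same effect here as assigning `range(0, len(df))` to the index.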
- Query an array's structure:
rows, cols = df.shape
rows = len(df.index)
- Query an array's metadata:
df.columns
df.dtypes
- Modify an array's structure:
df['c1'] = df['c1'].astype(float)
df['c2'] = df['c2'].astype(object)
df['dtimeCol'] = df['c2'].astype('datetime64[ns]')
df.index = np.random.permutation(range(0, len(df))) # randomly reorder array.
df.sort_index(inplace=True)
if not df.empty: # delete all records
    df = df[0:0]
if 'colX' in df.columns: # remove a single column
    df = df.drop(columns='colX')
df.rename(columns={'oldColName':'newColName'}, inplace=True)
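The structural changes above (type conversion, column removal, renaming, reordering, and emptying) can be sketched on a small hypothetical frame. The sample values, the new column name `dateText`, and the use of `sample(frac=1)` for shuffling are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame(
    {
        "c1": ["1.5", "2.5"],
        "c2": ["2014-01-01", "2014-06-30"],
        "colX": [0, 1],
    }
)

df["c1"] = df["c1"].astype(float)           # text to float
df["dtimeCol"] = pd.to_datetime(df["c2"])   # text to datetime
df = df.drop(columns="colX")                # remove a single column
df = df.rename(columns={"c2": "dateText"})  # rename a column

# Randomly reorder the rows; random_state makes the shuffle repeatable.
shuffled = df.sample(frac=1, random_state=0)

# Delete all records but keep the column layout.
emptied = df.iloc[0:0]

print(df.dtypes)
print(emptied.columns.tolist())
```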
- Iterate over an array:
for index, row in df.iterrows():
    print(row['colY'], row['colX'])
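As a sketch of row iteration, the following collects the same pair of values with `iterrows()` and with `itertuples()`, which yields named tuples and is often the faster of the two. The sample frame and the list names are hypothetical.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({"colX": [1, 2, 3], "colY": ["a", "b", "c"]})

# iterrows() yields (index label, row as a Series) pairs.
pairs_iterrows = []
for index, row in df.iterrows():
    pairs_iterrows.append((row["colY"], row["colX"]))

# itertuples() yields one named tuple per row; columns become attributes.
pairs_itertuples = [(t.colY, t.colX) for t in df.itertuples(index=False)]

print(pairs_iterrows)
print(pairs_itertuples)
```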
- Delete an array:
import gc
del df
gc.collect()
- Counter-Example(s)
- a pandas.Series Operation, on a pandas.Series.
- a numpy.ndarray Operation, on a numpy.ndarray.
- a SciPy Sparse Array Operation.
- a Python Array Operation.
- a Python List Operation.
- a Perl Associative Array Operation.
- a Perl Array Operation.
- a SQL Table Operation.
- an R Array Operation (R DataFrame Operation).
- See: pandas.DataFrame Attribute.
References
2014
- http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.html
abs()
Return an object with absolute value taken.
add(other[, axis, level, fill_value])
Binary operator add with support to substitute a fill_value for missing data in
…
describe([percentile_width])
Generate various summary statistics of each column, excluding
…
from_csv(path[, header, sep, index_col, ...])
Read delimited file into DataFrame
…
get_value(index, col)
Quickly retrieve single value at passed column and index
get_values()
same as values (but handles sparseness conversions)
groupby([by, axis, level, as_index, sort, ...])
Group series using mapper (dict or key function, apply given function
…
isnull()
Return a boolean same-sized object indicating if the values are null
…
median([axis, skipna, level, numeric_only])
Return the median of the values for the requested axis
…
rename_axis(mapper[, axis, copy, inplace])
Alter index and / or columns using input function or functions.
…
select(crit[, axis])
Return data corresponding to axis labels matching criteria
…
tail([n])
Returns last n rows
…
to_excel(excel_writer[, sheet_name, na_rep, ...])
Write DataFrame to a excel sheet
…
transpose()
Transpose index and columns
…
sort([columns, column, axis, ascending, inplace])
Sort DataFrame either by labels (along either axis) or by the values in
sort_index([axis, by, ascending, inplace, kind])
Sort DataFrame either by labels (along either axis) or by the values in
sortlevel([level, axis, ascending, inplace])
Sort multilevel index by chosen axis and primary level.
squeeze()
squeeze length 1 dimensions
stack([level, dropna])
Pivot a level of the (possibly hierarchical) column labels, returning a
…
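Several of the methods in this 2014 listing can be exercised on a small frame. The sketch below assumes a recent pandas release, in which sorting by column values is done with `sort_values` rather than the `sort` method listed above; the frames `df` and `other` are hypothetical sample data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({"a": [-1.0, 2.0, np.nan], "b": [4.0, -5.0, 6.0]})
other = pd.DataFrame({"a": [10.0, 10.0, 10.0], "b": [1.0, 1.0, 1.0]})

absolute = df.abs()                    # element-wise absolute value
summed = df.add(other, fill_value=0)   # a cell missing on one side counts as 0
nulls = df.isnull()                    # boolean frame marking missing cells
medians = df.median()                  # per-column median, skipping missing values
last_two = df.tail(2)                  # last two rows
flipped = df.transpose()               # swap index and columns
by_b = df.sort_values("b", ascending=False)  # assumed replacement for sort()

print(df.describe())
print(medians)
```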