pandas.DataFrame Operation: Difference between revisions

From GM-RKB
m (Text replacement - "]] *** " to "]]. *** ")
m (Text replacement - "]]s **" to "]]s. **")
 
Line 9: Line 9:
** [[Create a prepopulated array]]:
** [[Create a prepopulated array]]:
*** <code>df = pd.DataFrame([2,7,9])</code>, with [[auto-gen]] [[column name]] and [[index name]]s.
*** <code>df = pd.DataFrame([2,7,9])</code>, with [[auto-gen]] [[column name]] and [[index name]]s.
*** <code>df = pd.DataFrame({'col1' : Series([1, 2, 3]), 'col2' : Series([1., 2., 3., 4.]) })</code>, with [[auto-gen]] [[index key]]s
*** <code>df = pd.DataFrame({'col1' : Series([1, 2, 3]), 'col2' : Series([1., 2., 3., 4.]) })</code>, with [[auto-gen]] [[index key]]s.
*** <code>df = pd.DataFrame({'col1' : Series([1., 2., 3.], index=['row3', 'row2', 'row1']), 'col2' : Series([1., 2., 3., 4.], index=['row1', 'row2', 'row3', 'row4']) })</code>, with explicit [[index key]]s
*** <code>df = pd.DataFrame({'col1' : Series([1., 2., 3.], index=['row3', 'row2', 'row1']), 'col2' : Series([1., 2., 3., 4.], index=['row1', 'row2', 'row3', 'row4']) })</code>, with explicit [[index key]]s.
*** <code>df = pd.DataFrame([[np]].random.randn(10, 2), columns=['colA', 'colB'])</code>
*** <code>df = pd.DataFrame([[np]].random.randn(10, 2), columns=['colA', 'colB'])</code>
*** <code>mydata = [<BR>&nbsp; &nbsp; &nbsp; {'col0':'A', 'col1':'EB', 'col2':1.1},<BR>&nbsp; &nbsp; &nbsp;  {'col0':'B', 'col1':'EB', },<BR>&nbsp; &nbsp; &nbsp;  {'col0':'C', 'col1':'PG', 'col2':2.4},<BR>&nbsp; &nbsp; &nbsp;  {'col0':'D', 'col1':'PG', 'col2':'7.0'},<BR>] <BR>df = pd.DataFrame(mydata) <BR> df.set_index('col0', inplace=True)</code>
*** <code>mydata = [<BR>&nbsp; &nbsp; &nbsp; {'col0':'A', 'col1':'EB', 'col2':1.1},<BR>&nbsp; &nbsp; &nbsp;  {'col0':'B', 'col1':'EB', },<BR>&nbsp; &nbsp; &nbsp;  {'col0':'C', 'col1':'PG', 'col2':2.4},<BR>&nbsp; &nbsp; &nbsp;  {'col0':'D', 'col1':'PG', 'col2':'7.0'},<BR>] <BR>df = pd.DataFrame(mydata) <BR> df.set_index('col0', inplace=True)</code>
Line 34: Line 34:
*** <code>df['col4'] = df['col3'].str.len() # [[characters count]]</code>
*** <code>df['col4'] = df['col3'].str.len() # [[characters count]]</code>
*** <code>df['col5'] = df.col3.apply(lambda s: s.split(' > ')) # [[array]] with [[tokenized string]]</code>
*** <code>df['col5'] = df.col3.apply(lambda s: s.split(' > ')) # [[array]] with [[tokenized string]]</code>
** [[Delete array row]]s
** [[Delete array row]]s.
*** <code>g = df.groupby(['col1']) <BR> df = g.filter(lambda x: x['col2'].count() >= 1)  <BR>  df.index = range(0, len(df))</code>
*** <code>g = df.groupby(['col1']) <BR> df = g.filter(lambda x: x['col2'].count() >= 1)  <BR>  df.index = range(0, len(df))</code>
** [[Query an array's structure]]:
** [[Query an array's structure]]:

Latest revision as of 15:36, 24 July 2023

A pandas.DataFrame Operation is a Python-based tabular data structure operation for a pandas.DataFrame.
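As a minimal sketch of the kinds of operations this entry covers (creating a prepopulated DataFrame, then deleting rows by group), the following may help. The variable names and the sample records are illustrative assumptions, not part of the entry, and the behavior noted in the comments assumes a reasonably recent pandas release.

```python
import pandas as pd

# Built from a plain list: the column name and the row index are auto-generated.
df_list = pd.DataFrame([2, 7, 9])

# Built from Series with explicit, differently ordered index keys.
# pandas aligns on the keys, so col1 has a missing value at 'row4'.
df_aligned = pd.DataFrame({
    "col1": pd.Series([1.0, 2.0, 3.0], index=["row3", "row2", "row1"]),
    "col2": pd.Series([1.0, 2.0, 3.0, 4.0], index=["row1", "row2", "row3", "row4"]),
})

# Built from a list of dicts (illustrative data): the record without
# 'col2' gets a missing value in that column.
records = [
    {"col0": "A", "col1": "EB", "col2": 1.1},
    {"col0": "B", "col1": "EB"},
    {"col0": "C", "col1": "PG", "col2": 2.4},
    {"col0": "D", "col1": "PG", "col2": 7.0},
]
df_records = pd.DataFrame(records)

# Delete rows by group: keep only the col1 groups that have at least two
# non-null col2 values, then renumber the remaining rows from 0.
filtered = (
    df_records.groupby("col1")
    .filter(lambda g: g["col2"].count() >= 2)
    .reset_index(drop=True)
)
```

Here `count()` ignores missing values, so the `EB` group (one non-null `col2` value) is dropped while the `PG` group is kept.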



References

2014

  • http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.html
    • abs() Return an object with absolute value taken.
    • add(other[, axis, level, fill_value]) Binary operator add with support to substitute a fill_value for missing data in
    • describe([percentile_width]) Generate various summary statistics of each column, excluding
    • from_csv(path[, header, sep, index_col, ...]) Read delimited file into DataFrame
    • get_value(index, col) Quickly retrieve single value at passed column and index
    • get_values() same as values (but handles sparseness conversions)
    • groupby([by, axis, level, as_index, sort, ...]) Group series using mapper (dict or key function, apply given function
    • isnull() Return a boolean same-sized object indicating if the values are null
    • median([axis, skipna, level, numeric_only]) Return the median of the values for the requested axis
    • rename_axis(mapper[, axis, copy, inplace]) Alter index and / or columns using input function or functions.
    • select(crit[, axis]) Return data corresponding to axis labels matching criteria
    • tail([n]) Returns last n rows
    • to_excel(excel_writer[, sheet_name, na_rep, ...]) Write DataFrame to an Excel sheet
    • transpose() Transpose index and columns
    • sort([columns, column, axis, ascending, inplace]) Sort DataFrame either by labels (along either axis) or by the values in
    • sort_index([axis, by, ascending, inplace, kind]) Sort DataFrame either by labels (along either axis) or by the values in
    • sortlevel([level, axis, ascending, inplace]) Sort multilevel index by chosen axis and primary level.
    • squeeze() squeeze length 1 dimensions
    • stack([level, dropna]) Pivot a level of the (possibly hierarchical) column labels, returning a
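
A few of the methods listed above can be tried as follows. This is a hedged sketch on a small made-up DataFrame, assuming a recent pandas release; it only uses methods from the list that, to my knowledge, are still available. Others in the 2014 list (for example from_csv, get_value, select, sort and sortlevel) appear to have been removed from later releases, so they are left out here.

```python
import pandas as pd

# Illustrative data (an assumption, not from the reference): one missing value in column 'a'.
df = pd.DataFrame(
    {"a": [-1.0, 2.0, None], "b": [4.0, -5.0, 6.0]},
    index=["z", "x", "y"],
)

absolute = df.abs()          # element-wise absolute value
missing = df.isnull()        # boolean mask, True where a value is null
medians = df.median()        # per-column median; missing values are skipped by default
last_two = df.tail(2)        # last two rows
flipped = df.transpose()     # swap index and columns
by_label = df.sort_index()   # sort rows by index label
summary = df.describe()      # per-column summary statistics

# add() with fill_value: a value missing on one side is replaced by
# fill_value before the addition, instead of propagating as missing.
other = pd.DataFrame(
    {"a": [10.0, 10.0, 10.0], "b": [1.0, 1.0, 1.0]},
    index=["z", "x", "y"],
)
total = df.add(other, fill_value=0)
```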