Table of contents
- 1. pd.read_csv()
- 2. DataFrame.drop()
- 3. pd.get_dummies()
- pandas official documentation: https://pandas.pydata.org/pandas-docs/stable/index.html
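1. pd.read_csv()
- pd.read_csv() reads a CSV (Comma Separated Values) file and returns its contents as a DataFrame object.
- CSV is a common data storage format: the data is stored as plain text, each line is one record, and the fields are separated by commas (or another delimiter).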
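- Basic usage:
    pd.read_csv(file_path, sep=',')
    1) file_path: path of the file to read (pandas names this first parameter filepath_or_buffer)
    2) sep: field delimiter of the CSV file; defaults to a comma
- More advanced usage: see https://blog.csdn.net/weixin_47139649/article/details/126744842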
The full signature, as listed in the type stub of an older pandas release (squeeze, prefix, mangle_dupe_cols, error_bad_lines and warn_bad_lines were removed in pandas 2.0):
    read_csv(
        reader: FilePathOrBuffer,
        *,
        sep: str = ...,
        delimiter: str | None = ...,
        header: int | Sequence[int] | str = ...,
        names: Sequence[str] | None = ...,
        index_col: int | str | Sequence | Literal[False] | None = ...,
        usecols: int | str | Sequence | None = ...,
        squeeze: bool = ...,
        prefix: str | None = ...,
        mangle_dupe_cols: bool = ...,
        dtype: str | Mapping[str, Any] | None = ...,
        engine: str | None = ...,
        converters: Mapping[int | str, (*args, **kwargs) -> Any] | None = ...,
        true_values: Sequence[Scalar] | None = ...,
        false_values: Sequence[Scalar] | None = ...,
        skipinitialspace: bool = ...,
        skiprows: Sequence | int | (*args, **kwargs) -> Any | None = ...,
        skipfooter: int = ...,
        nrows: int | None = ...,
        na_values=...,
        keep_default_na: bool = ...,
        na_filter: bool = ...,
        verbose: bool = ...,
        skip_blank_lines: bool = ...,
        parse_dates: bool | List[int] | List[str] = ...,
        infer_datetime_format: bool = ...,
        keep_date_col: bool = ...,
        date_parser: (*args, **kwargs) -> Any | None = ...,
        dayfirst: bool = ...,
        cache_dates: bool = ...,
        iterator: Literal[True],
        chunksize: int | None = ...,
        compression: str | None = ...,
        thousands: str | None = ...,
        decimal: str | None = ...,
        lineterminator: str | None = ...,
        quotechar: str = ...,
        quoting: int = ...,
        doublequote: bool = ...,
        escapechar: str | None = ...,
        comment: str | None = ...,
        encoding: str | None = ...,
        dialect: str | None = ...,
        error_bad_lines: bool = ...,
        warn_bad_lines: bool = ...,
        delim_whitespace: bool = ...,
        low_memory: bool = ...,
        memory_map: bool = ...,
        float_precision: str | None = ...
    )
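To make a few of these parameters concrete, here is a minimal sketch that reads semicolon-separated data. The data, the column names and the in-memory io.StringIO buffer (standing in for a real file path) are assumptions made for illustration only.

```python
import io

import pandas as pd

# Hypothetical in-memory data standing in for a file on disk;
# the fields are separated by ';' instead of the default ','.
raw = io.StringIO(
    "id;name;score\n"
    "1;alice;90.5\n"
    "2;bob;78.0\n"
    "3;carol;85.25\n"
)

df = pd.read_csv(
    raw,
    sep=";",                  # non-default field delimiter
    usecols=["id", "score"],  # keep only these columns
    dtype={"id": "int64"},    # force the dtype of one column
    nrows=2,                  # read only the first two data rows
)

print(df)
```

Passing a path string instead of the buffer works the same way; the buffer is used here only so the sketch runs without any file on disk.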
2. DataFrame.drop()
- Removes the specified rows or columns from a DataFrame, or the specified elements from a Series.
    DataFrame.drop(labels=None, *, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
    1) labels: the column names or row index labels to drop; a single value (int/str) or a list
    2) axis: the direction to drop along; 0 or 'index' drops rows, 1 or 'columns' drops columns
    3) index: the row index labels to drop (index=labels is equivalent to labels, axis=0)
    4) columns: the column names to drop (columns=labels is equivalent to labels, axis=1)
    5) inplace: bool; True modifies the object in place, False returns a new DataFrame; defaults to False
- Example:
    import pandas as pd

    # Create a simple DataFrame
    df = pd.DataFrame({
        'A': [1, 2, 3],
        'B': [4, 5, 6],
        'C': [7, 8, 9]
    })

    # Drop column 'A'
    df_dropped = df.drop('A', axis=1)
    # This is equivalent to:
    df_dropped_equiv = df.drop(columns='A')

    # Drop the row whose index label is 1
    df_dropped_row = df.drop(1, axis=0)
    # This is equivalent to:
    df_dropped_row_equiv = df.drop(index=1)
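The remaining parameters can be sketched in the same way. The snippet below is an illustration on the same toy DataFrame, not an exhaustive tour: it drops several labels at once, uses errors='ignore' for a label that does not exist, and shows that inplace=True returns None. The variable names are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

# Drop several columns at once by passing a list.
only_a = df.drop(columns=["B", "C"])

# Drop rows and columns in the same call.
trimmed = df.drop(index=[0, 2], columns="C")

# errors="ignore" skips labels that do not exist instead of raising KeyError.
unchanged = df.drop(columns="Z", errors="ignore")

# inplace=True modifies the object itself and returns None.
work = df.copy()
result = work.drop(columns="A", inplace=True)

print(only_a.columns.tolist(), trimmed.shape, result)
```

Because the default is inplace=False, the original df keeps all three columns throughout.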
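3. pd.get_dummies()
- pd.get_dummies() converts categorical variables into one-hot (dummy) variables, i.e. it performs one-hot encoding. It is typically used in data preprocessing; in recommender systems, for example, categorical variables can be one-hot encoded and then passed on to an embedding step.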
    pandas.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False)
    1) data: the categorical data to convert; a Series or a DataFrame
    2) prefix: str; the prefix prepended to the names of the generated columns, as in the example below
- Example:
    import numpy as np
    import pandas as pd

    data = pd.DataFrame({
        'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
        'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
        'C': np.random.randn(8),
        'D': np.random.randn(8)
    })

    # dtype=int yields 0/1 columns; since pandas 2.0 the default dtype is bool (True/False)
    dummy_data = pd.get_dummies(data['A'], prefix='A', dtype=int)

    '''
    The resulting dummy_data is:
       A_bar  A_foo
    0      0      1
    1      1      0
    2      0      1
    3      1      0
    4      0      1
    5      1      0
    6      0      1
    7      0      1
    '''
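The columns and drop_first parameters from the signature are not covered by the example above. The following is a minimal sketch, assuming a made-up DataFrame with two categorical columns and one numeric column. It passes dtype=int (a parameter available in recent pandas releases, although not listed in the signature quoted above) so that the dummy columns hold 0/1.

```python
import pandas as pd

# Hypothetical data: two categorical columns and one numeric column.
df = pd.DataFrame({
    "color": ["red", "green", "red", "blue"],
    "size": ["S", "M", "M", "S"],
    "price": [10.0, 12.5, 9.0, 11.0],
})

# Encode only the listed columns; a list of prefixes maps onto them in order.
# Columns that are not encoded (here: price) are kept and come first.
encoded = pd.get_dummies(df, columns=["color", "size"], prefix=["c", "s"], dtype=int)

# drop_first=True drops the first category of each encoded column,
# leaving k-1 dummy columns for k categories.
reduced = pd.get_dummies(df, columns=["color"], drop_first=True, dtype=int)

print(encoded.columns.tolist())
print(reduced.columns.tolist())
```

Dropping one level with drop_first avoids perfectly collinear dummy columns, which matters for linear models; for tree models or embeddings the full encoding is usually fine.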