Pandas Data Reading + Indexing and Calculations

2024/7/5 8:49:54
  1. Read a CSV file

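    import pandas
    food_info = pandas.read_csv('food_info.csv')  # returns a DataFrame
    print(type(food_info))
    print(food_info.dtypes)  # the data type of each column
    help(pandas.read_csv)    # help() prints the docs itself and returns None, so no print() around it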
    

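Because food_info.csv itself is not included here, the sketch below feeds `read_csv` a few made-up rows through `io.StringIO` (the column layout and values are hypothetical, not the real file). It shows that the result is a DataFrame and how the column dtypes are inferred.

```python
import io

import pandas as pd

# Hypothetical sample: made-up rows in the same layout as food_info.csv (not real data).
csv_text = """NDB_No,Shrt_Desc,Water_(g),Energ_Kcal
1001,FOOD_A,15.5,700
1002,FOOD_B,0.5,880
1003,FOOD_C,40.0,350
"""

# read_csv accepts any file-like object, so an in-memory StringIO stands in for the file.
food_info = pd.read_csv(io.StringIO(csv_text))

print(type(food_info))   # the DataFrame class
print(food_info.dtypes)  # whole numbers -> int64, decimals -> float64
```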

  2. Show the first rows (5 by default)

    food_info.head()  # first 5 rows; displayed automatically in Jupyter, use print() in a script
    


    food_info.head(3)  # first 3 rows
    

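A small sketch of `head()` on a hypothetical 8-row frame (the real food_info is much larger). Besides the default of 5 and an explicit count, it shows that a negative `n` returns every row except the last `n`.

```python
import pandas as pd

# Hypothetical 8-row frame; the real food_info has many more rows.
df = pd.DataFrame({"NDB_No": range(1001, 1009)})

first_five = df.head()          # default n=5
first_three = df.head(3)        # first 3 rows
all_but_last_two = df.head(-2)  # negative n: every row except the last 2

print(first_three)
```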

  3. Show the last rows (5 by default) and the shape

    food_info.tail()  # last 5 rows
    


    print(food_info.shape)  # (number of rows, number of columns)
    

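A minimal sketch of `tail()` and `shape` on a hypothetical 8-row, 2-column frame. Note that `tail()` keeps the original index labels rather than renumbering from 0, and `shape` is an ordinary tuple that can be unpacked.

```python
import pandas as pd

# Hypothetical 8-row, 2-column frame.
df = pd.DataFrame({"NDB_No": range(1001, 1009), "Water_(g)": [10.0] * 8})

last_five = df.tail()      # default n=5
last_two = df.tail(2)      # the original index labels (6 and 7) are kept
n_rows, n_cols = df.shape  # shape is a plain (rows, columns) tuple

print(last_two)
print(n_rows, n_cols)
```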

  4. Read a specific row:

    print(food_info.loc[0])  # the row whose index label is 0, returned as a Series
    

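`loc` selects by index label, while `iloc` selects by position. With the default 0-based index they coincide, but they diverge once the index changes (for example after sorting or filtering). The sketch below uses a hypothetical frame with non-default labels to make the difference visible.

```python
import pandas as pd

# Hypothetical frame with non-default index labels to show loc vs iloc.
df = pd.DataFrame(
    {"NDB_No": [1001, 1002, 1003], "Water_(g)": [15.0, 16.0, 0.5]},
    index=[10, 20, 30],
)

row_by_label = df.loc[20]      # row whose label is 20 -> Series
row_by_position = df.iloc[0]   # first row by position -> Series
two_rows = df.loc[[10, 30]]    # a list of labels -> DataFrame

print(row_by_label)
```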

  5. Slicing:

    print(food_info.loc[3:6])  # label-based slice: rows 3 through 6, both ends included
    

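A quick comparison, assuming a hypothetical 10-row frame with the default index: a `loc` slice includes its end label, while an `iloc` slice stops before its end position, like ordinary Python slicing.

```python
import pandas as pd

# Hypothetical 10-row frame with the default index 0..9.
df = pd.DataFrame({"NDB_No": range(1001, 1011)})

by_label = df.loc[3:6]      # label slice: includes both 3 and 6 -> 4 rows
by_position = df.iloc[3:6]  # position slice: stops before 6 -> 3 rows

print(by_label.index.tolist(), by_position.index.tolist())
```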

  6. Select a column by name

    ndb_col = food_info['NDB_No']  # a single column name returns a Series
    print(ndb_col)
    

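The bracket form decides the return type: one name gives a Series, a list (even a one-element list) gives a DataFrame. A sketch on a hypothetical two-row frame:

```python
import pandas as pd

# Hypothetical two-row frame.
df = pd.DataFrame({"NDB_No": [1001, 1002], "Water_(g)": [15.5, 0.5]})

ndb_col = df["NDB_No"]      # single brackets -> Series named "NDB_No"
ndb_frame = df[["NDB_No"]]  # double brackets -> one-column DataFrame

print(type(ndb_col).__name__, type(ndb_frame).__name__)
```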

  7. Select several columns

    columns = ['Zinc_(mg)', 'Copper_(mg)']
    zinc_copper = food_info[columns]  # a list of names returns a DataFrame
    print(zinc_copper)
    

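Two behaviors worth knowing when selecting with a list, shown on a hypothetical frame: the columns come back in the order of the list, and a name that does not exist raises `KeyError` instead of being skipped.

```python
import pandas as pd

# Hypothetical frame; Iron_(mg) is deliberately absent.
df = pd.DataFrame({
    "Zinc_(mg)": [0.09, 0.05],
    "Copper_(mg)": [0.0, 0.01],
    "Water_(g)": [15.5, 0.5],
})

subset = df[["Copper_(mg)", "Zinc_(mg)"]]  # columns returned in list order

missing_error = None
try:
    df[["Zinc_(mg)", "Iron_(mg)"]]  # Iron_(mg) is not a column here
except KeyError as err:
    missing_error = err

print(list(subset.columns), repr(missing_error))
```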

  8. Pick the columns whose names end with a suffix: endswith()

    col_names = food_info.columns.tolist()
    print(col_names)
    gram_columns = []
    
    # keep only the columns measured in grams, i.e. whose names end in "(g)"
    for c in col_names:
        if c.endswith('(g)'):
            gram_columns.append(c)
    gram_df = food_info[gram_columns]
    print(gram_df.head(3))
    

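The loop above can be condensed into a list comprehension, or replaced by the vectorized `Index.str.endswith`, which returns a boolean mask over the column labels. A sketch on a hypothetical one-row frame:

```python
import pandas as pd

# Hypothetical one-row frame with mixed units.
df = pd.DataFrame({
    "NDB_No": [1001],
    "Water_(g)": [15.5],
    "Protein_(g)": [0.9],
    "Iron_(mg)": [0.02],
})

# The loop from the text, condensed into a list comprehension.
gram_columns = [c for c in df.columns if c.endswith("(g)")]

# Vectorized alternative: a boolean mask built from the column labels.
mask = df.columns.str.endswith("(g)")
gram_df = df.loc[:, mask]

print(gram_df.head(3))
```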

  9. Arithmetic on a selected column

    print(food_info['Iron_(mg)'])
    div_1000 = food_info['Iron_(mg)'] / 1000  # divides every value in the column (mg -> g)
    print(div_1000)
    

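Dividing a Series by a scalar is applied element by element without a loop; the index and the Series name are preserved, and missing values (NaN) simply stay missing. The `.div()` method is an equivalent spelling. The values below are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical iron values in milligrams, including one missing value.
iron_mg = pd.Series([0.02, 0.16, np.nan], name="Iron_(mg)")

iron_g = iron_mg / 1000           # element-wise, index and name preserved
iron_g_method = iron_mg.div(1000)  # same operation via the method form

print(iron_g)
```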

  10. Multiply one column by another and add a new column

    # element-wise product, row by row
    water_energy = food_info["Water_(g)"] * food_info["Energ_Kcal"]
    iron_grams = food_info["Iron_(mg)"] / 1000  # milligrams -> grams
    print(food_info.shape)
    # assigning to a new column name appends that column
    food_info['Iron_(g)'] = iron_grams
    print(food_info.shape)  # one more column than before
    

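A sketch on a hypothetical frame: column-by-column multiplication, adding a column by assignment, and one detail that is easy to miss. Arithmetic between two Series aligns on index labels, not on position, so labels present in only one operand produce NaN (the `s1`/`s2` pair is a made-up illustration).

```python
import pandas as pd

# Hypothetical two-row frame.
df = pd.DataFrame({
    "Water_(g)": [15.5, 0.5],
    "Energ_Kcal": [700, 880],
    "Iron_(mg)": [0.02, 0.0],
})

water_energy = df["Water_(g)"] * df["Energ_Kcal"]  # row-by-row product

before = df.shape
df["Iron_(g)"] = df["Iron_(mg)"] / 1000  # a new name appends a column
after = df.shape

# Index alignment: only label 1 exists in both Series, so labels 0 and 2 become NaN.
s1 = pd.Series([1, 2], index=[0, 1])
s2 = pd.Series([10, 20], index=[1, 2])
product = s1 * s2

print(before, after)
print(product)
```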

  11. Find a column's maximum and normalize by it

    # The "Vit_A_IU" column ranges from 0 to 100000, while the "Fiber_TD_(g)" column ranges from 0 to 79.
    # For certain calculations, columns like "Vit_A_IU" can have a greater effect on the result
    # simply because of the scale of their values.
    # The largest value in the "Energ_Kcal" column:
    max_calories = food_info["Energ_Kcal"].max()
    # Divide the values in "Energ_Kcal" by the largest value, so the largest becomes 1.
    normalized_calories = food_info["Energ_Kcal"] / max_calories
    normalized_protein = food_info["Protein_(g)"] / food_info["Protein_(g)"].max()
    normalized_fat = food_info["Lipid_Tot_(g)"] / food_info["Lipid_Tot_(g)"].max()
    food_info["Normalized_Protein"] = normalized_protein
    food_info["Normalized_Fat"] = normalized_fat
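The per-column lines above can be done in one step: dividing a DataFrame by a Series of column maxima divides each column by its own maximum. The sketch also shows min-max scaling, a different technique not used in the text, which maps each column onto [0, 1] even when the minimum is not 0. The frame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical frame with columns on very different scales.
df = pd.DataFrame({
    "Energ_Kcal": [700, 880, 0],
    "Protein_(g)": [0.9, 0.3, 10.0],
    "Lipid_Tot_(g)": [80.0, 99.5, 0.0],
})
cols = ["Energ_Kcal", "Protein_(g)", "Lipid_Tot_(g)"]

# Max normalization (as in the text): each column divided by its own maximum.
normalized = df[cols] / df[cols].max()

# Min-max scaling (an alternative): (x - min) / (max - min) per column.
min_max = (df[cols] - df[cols].min()) / (df[cols].max() - df[cols].min())

print(normalized)
print(min_max)
```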
    

