CY Wu

This is me, someone who never stops learning.

Kaggle sharing: Fortune 1000

Introduction

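The Fortune 1000 lists the 1,000 largest companies in the United States, ranked by revenue, across sectors such as retailing, technology, energy, financials, and health care. The data include revenue, profit, number of employees, whether the CEO is a woman, and more. From the Fortune 1000 we can look at how revenue and profit relate to sector and to number of employees. The data also show the share of woman CEOs in each sector.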


Data

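The data are available from Fortune.com [1], and the dataset can also be downloaded easily from Kaggle for analysis [2]. Many thanks to the Kagglers who analyzed and organized this data and shared their code. Studying one bronze-medal analysis greatly sped up my understanding of data analysis and helped me build a solid foundation [3].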

Importing packages
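The import and loading cell is not shown in the post. A minimal sketch, assuming the analysis uses pandas and that the Kaggle CSV has been saved locally; the file name `Fortune_1000.csv` and the helper `load_fortune` are hypothetical:

```python
import pandas as pd


def load_fortune(path_or_buffer) -> pd.DataFrame:
    """Load the Fortune 1000 CSV into a DataFrame.

    `path_or_buffer` can be a file path (e.g. the hypothetical
    "Fortune_1000.csv") or any file-like object pd.read_csv accepts.
    """
    return pd.read_csv(path_or_buffer)
```

Usage: `df = load_fortune("Fortune_1000.csv")` followed by `print(df.info())` gives a summary like the first one below. `df.info()` prints its report and returns `None`, which is why each summary ends with a `None` line.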

Data cleaning and checks

                    RangeIndex: 1000 entries, 0 to 999
                    Data columns (total 18 columns):
                     #   Column             Non-Null Count  Dtype  
                    ---  ------             --------------  -----  
                     0   company            1000 non-null   object 
                     1   rank               1000 non-null   int64  
                     2   rank_change        1000 non-null   float64
                     3   revenue            1000 non-null   float64
                     4   profit             997 non-null    float64
                     5   num. of employees  999 non-null    float64
                     6   sector             1000 non-null   object 
                     7   city               1000 non-null   object 
                     8   state              1000 non-null   object 
                     9   newcomer           1000 non-null   object 
                     10  ceo_founder        1000 non-null   object 
                     11  ceo_woman          1000 non-null   object 
                     12  profitable         1000 non-null   object 
                     13  prev_rank          1000 non-null   object 
                     14  CEO                1000 non-null   object 
                     15  Website            1000 non-null   object 
                     16  Ticker             951 non-null    object 
                     17  Market Cap         969 non-null    object 
                    dtypes: float64(4), int64(1), object(13)
                    memory usage: 140.8+ KB
                    None
                
                    Index: 996 entries, 0 to 999
                    Data columns (total 18 columns):
                    #   Column             Non-Null Count  Dtype  
                    ---  ------             --------------  -----  
                    0   company            996 non-null    object 
                    1   rank               996 non-null    int64  
                    2   rank_change        996 non-null    float64
                    3   revenue            996 non-null    float64
                    4   profit             996 non-null    float64
                    5   num. of employees  996 non-null    float64
                    6   sector             996 non-null    object 
                    7   city               996 non-null    object 
                    8   state              996 non-null    object 
                    9   newcomer           996 non-null    bool   
                    10  ceo_founder        996 non-null    bool   
                    11  ceo_woman          996 non-null    bool   
                    12  profitable         996 non-null    bool   
                    13  prev_rank          996 non-null    object 
                    14  ceo                996 non-null    object 
                    15  website            996 non-null    object 
                    16  ticker             996 non-null    object 
                    17  market cap         966 non-null    object 
                    dtypes: bool(4), float64(4), int64(1), object(9)
                    memory usage: 120.6+ KB
                    None
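Comparing the two `df.info()` summaries shows what the cleaning did: the row count fell from 1000 to 996, which matches dropping the rows with a missing `profit` (3 rows) or `num. of employees` (1 row); the column names became lower-case; `newcomer`, `ceo_founder`, `ceo_woman` and `profitable` changed from `object` to `bool`; and `ticker` no longer has gaps. The original cleaning code is not shown, so this is only a sketch that reproduces those changes. It assumes the four flag columns hold "yes"/"no" strings, and the ticker placeholder "N/A" is hypothetical:

```python
import pandas as pd

FLAG_COLUMNS = ["newcomer", "ceo_founder", "ceo_woman", "profitable"]


def clean_fortune(raw: pd.DataFrame) -> pd.DataFrame:
    """Reproduce the changes visible between the two df.info() summaries.

    Assumptions (not confirmed by the post):
    - the four flag columns contain the strings "yes"/"no";
    - missing tickers are filled with the hypothetical placeholder "N/A".
    """
    # Lower-case the column names, e.g. "CEO" -> "ceo", "Market Cap" -> "market cap".
    df = raw.rename(columns=str.lower)
    # Drop rows without a profit or an employee count; copy to get an independent frame.
    df = df.dropna(subset=["profit", "num. of employees"]).copy()
    # Turn "yes"/"no" strings into real booleans (anything else becomes False).
    for col in FLAG_COLUMNS:
        df[col] = df[col].str.lower().eq("yes")
    # Fill missing tickers (e.g. companies without a listed ticker).
    df["ticker"] = df["ticker"].fillna("N/A")
    return df
```

`market cap` is left untouched here, which is consistent with it still having 966 non-null values after cleaning.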
                
                    company  rank  rank_change  revenue  profit  num. of employees  \
                    249  Lam Research   250         54.0  14626.2  3908.5            14100.0   
                    
                             sector     city state  newcomer  ceo_founder  ceo_woman  profitable  \
                    249  Technology  Fremont    CA     False        False      False        True   
                    
                        prev_rank                ceo                      website ticker  \
                    249     304.0  Timothy M. Archer  https://www.lamresearch.com   LRCX   
                    
                        market cap  
                    249    74996.7  
                
                    company              False
                    rank                 False
                    rank_change          False
                    revenue              False
                    profit               False
                    num. of employees    False
                    sector               False
                    city                 False
                    state                False
                    newcomer             False
                    ceo_founder          False
                    ceo_woman            False
                    profitable           False
                    prev_rank            False
                    ceo                  False
                    website              False
                    ticker               False
                    market cap            True
                    dtype: bool
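The single-row preview and the missing-value flags above match standard pandas calls. Assuming the cleaned DataFrame is named `df`, the preview can be printed with `df.loc[[249]]` (Lam Research, rank 250), and the True/False list comes from `df.isnull().any()`. The helper names below are illustrative, not from the original notebook:

```python
import pandas as pd


def check_missing(df: pd.DataFrame) -> pd.Series:
    """Per-column flag: True if the column still has at least one missing value."""
    return df.isnull().any()


def missing_counts(df: pd.DataFrame) -> pd.Series:
    """Number of missing values per column, largest first, only for columns with gaps."""
    counts = df.isnull().sum()
    return counts[counts > 0].sort_values(ascending=False)
```

On the cleaned data, `check_missing(df)` flags only `market cap`, and `missing_counts(df)` should report 30 missing values for it (996 rows, 966 non-null).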
                
                       state  company_count
                    3     CA            131
                    40    TX             97
                    31    NY             86
                    13    IL             62
                    32    OH             54
                    35    PA             45
                    8     FL             38
                    42    VA             34
                    9     GA             34
                    18    MA             33
                    21    MI             30
                    25    NC             29
                    22    MN             27
                    29    NJ             23
                    5     CT             23
                    39    TN             22
                    23    MO             21
                    45    WI             21
                    4     CO             21
                    2     AZ             20
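The state table counts how many companies are headquartered in each state. The left-hand numbers (2 for AZ, 3 for CA, 40 for TX) look like positions in an alphabetical list of states, which suggests a `groupby` followed by `reset_index`. A sketch under that assumption; `companies_per_state` is a hypothetical helper:

```python
import pandas as pd


def companies_per_state(df: pd.DataFrame, top_n: int = 20) -> pd.DataFrame:
    """Count companies per headquarters state and keep the top_n states."""
    counts = (
        df.groupby("state")["company"]
        .count()
        # The new 0..k-1 index follows the alphabetical order of the states.
        .reset_index(name="company_count")
    )
    return counts.sort_values("company_count", ascending=False).head(top_n)
```

The default sort is not stable, so tied states (VA and GA both have 34) may come out in either order; passing `kind="stable"` to `sort_values` makes the tie order deterministic.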
                
                                             sector  ceo_woman
                  16                      Retailing         14
                  6                      Financials         12
                  4                          Energy          9
                  9                     Health Care          9
                  17                     Technology          7
                  0             Aerospace & Defense          4
                  11             Household Products          4
                  13                      Materials          4
                  12                    Industrials          4
                  18                 Transportation          2
                  10  Hotels, Restaurants & Leisure          2
                  8       Food, Beverages & Tobacco          2
                  7              Food & Drug Stores          2
                  2               Business Services          2
                  1                         Apparel          1
                  14                          Media          1
                  15         Motor Vehicles & Parts          1
                  5      Engineering & Construction          1
                  3                       Chemicals          1
                  19                    Wholesalers          1
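The sector table counts woman CEOs per sector. Once `ceo_woman` is boolean, summing it per sector gives the count, and averaging it gives the share of woman CEOs that the introduction mentions. A sketch; the `share` column is an addition that does not appear in the output above, and the helper name is hypothetical:

```python
import pandas as pd


def woman_ceos_per_sector(df: pd.DataFrame) -> pd.DataFrame:
    """Count and share of woman CEOs per sector, most woman CEOs first."""
    summary = (
        df.groupby("sector")["ceo_woman"]
        # Sum of booleans = number of True values; mean = fraction of True values.
        .agg(ceo_woman="sum", share="mean")
        .reset_index()
    )
    return summary.sort_values("ceo_woman", ascending=False)
```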
                

Next: Visualization

Woman CEOs: the five most profitable woman-led companies (profit and revenue in USD millions)
                        company                 profit   revenue  ceo_woman  ceo
                    43  Citigroup              21952.0   79865.0       True  Jane Fraser
                    90  Oracle                 13746.0   40479.0       True  Safra A. Catz
                    33  United Parcel Service  12890.0   97287.0       True  Carol B. Tomé
                    24  General Motors         10019.0  127004.0       True  Mary T. Barra
                     3  CVS Health              7910.0  292111.0       True  Karen Lynch
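The woman-led companies above are sorted by profit. A sketch of the selection, assuming the cleaned boolean `ceo_woman` column; the helper name is hypothetical:

```python
import pandas as pd

COLUMNS = ["company", "profit", "revenue", "ceo_woman", "ceo"]


def top_woman_led(df: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """The n most profitable companies whose CEO is a woman."""
    # A boolean mask keeps the original row labels (e.g. 43 for Citigroup).
    woman_led = df.loc[df["ceo_woman"], COLUMNS]
    return woman_led.sort_values("profit", ascending=False).head(n)
```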
Top 5 sectors by total profit (USD millions)
                        sector         profit
                     6  Financials   556635.2
                    17  Technology   459503.3
                     9  Health Care  209442.4
                     4  Energy       139828.6
                    16  Retailing    126835.7
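The sector totals above sum `profit` over all companies in each sector and keep the five largest. A sketch with a hypothetical helper name:

```python
import pandas as pd


def top_sectors_by_profit(df: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Total profit per sector, keeping the n most profitable sectors."""
    totals = df.groupby("sector")["profit"].sum().reset_index()
    return totals.sort_values("profit", ascending=False).head(n)
```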
Figures: top 5 sectors by profit percentage; mean revenue per sector; relationship between revenue and number of employees.
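The three charts are not reproduced here. A sketch of how they could be drawn with matplotlib, assuming the cleaned DataFrame. Two choices are assumptions rather than the author's: the pie shows each sector's percentage of the top five sectors' combined profit, and the scatter title reports the Pearson correlation between revenue and employees. The helper name is hypothetical:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_overview(df: pd.DataFrame, top_n: int = 5):
    """Draw the three overview charts side by side and return the Figure."""
    fig, (ax_pie, ax_bar, ax_scatter) = plt.subplots(1, 3, figsize=(18, 6))

    # 1) Profit share among the top_n sectors (pie wedges need non-negative totals).
    top = df.groupby("sector")["profit"].sum().nlargest(top_n)
    ax_pie.pie(top.values, labels=top.index, autopct="%1.1f%%")
    ax_pie.set_title(f"Top {top_n} sectors: profit percentage")

    # 2) Mean revenue per sector; ascending sort puts the largest bar at the top.
    mean_rev = df.groupby("sector")["revenue"].mean().sort_values()
    ax_bar.barh(mean_rev.index, mean_rev.values)
    ax_bar.set_title("Mean revenue per sector")
    ax_bar.set_xlabel("Revenue (USD millions)")

    # 3) Revenue vs. number of employees, with the Pearson correlation in the title.
    r = df["revenue"].corr(df["num. of employees"])
    ax_scatter.scatter(df["num. of employees"], df["revenue"], s=10, alpha=0.5)
    ax_scatter.set_xlabel("Number of employees")
    ax_scatter.set_ylabel("Revenue (USD millions)")
    ax_scatter.set_title(f"Revenue vs. employees (Pearson r = {r:.2f})")

    fig.tight_layout()
    return fig
```

Usage: `plot_overview(df).savefig("overview.png")`, where the output file name is arbitrary.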

Sources

[1] Fortune (Fortune.com)
[2] 2022 Fortune 1000 (Kaggle dataset)
[3] Fortune 1000 EDA (Kaggle notebook, bronze medal)
[4] Fortune1000 EDA&Clustering