12/6/22, 2:56 PM Milestone1 - Jupyter Notebook

localhost:8888/notebooks/Downloads/Milestone1.ipynb 1/11

Milestone 1 - Dataset: Used cars in USA

A deep dive into the used car market and industry trends over the past few years.

This dataset is imported from https://www.kaggle.com/datasets/aishwaryamuthukumar/cars-dataset-audi-bmw-ford-hyundai-skoda-vw. Its purpose is to present information on the used cars available in the US market. The data has 10 variables and about 72k data points. As the source page shows, the dataset has only two discussions and one comment about its use, so it appears fresh and largely unanalyzed. It was authored by three Kaggle users.

In [28]:

```python
# Importing the dataset and describing it
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("./cars_dataset.csv")
print("Datapoints = " + str(len(df)) + "\n")
print("Data variables = " + str(len(df.columns)) + "\n")
print("Column names: \n" + str(list(df.columns.values)))
df.describe()
```

```
Datapoints = 72435

Data variables = 10

Column names: 
['model', 'year', 'price', 'transmission', 'mileage', 'fuelType', 'tax', 'mpg', 'engineSize', 'Make']
```

Out[28]:

| | year | price | mileage | tax | mpg | engineSize |
|---|---|---|---|---|---|---|
| count | 72435.000000 | 72435.000000 | 72435.000000 | 72435.000000 | 72435.000000 | 72435.000000 |
| mean | 2017.073666 | 16580.158708 | 23176.517057 | 116.953407 | 55.852480 | 1.635650 |
| std | 2.101252 | 9299.028754 | 21331.515562 | 64.045533 | 17.114391 | 0.561535 |
| min | 1996.000000 | 495.000000 | 1.000000 | 0.000000 | 0.300000 | 0.000000 |
| 25% | 2016.000000 | 10175.000000 | 7202.500000 | 30.000000 | 47.900000 | 1.200000 |
| 50% | 2017.000000 | 14495.000000 | 17531.000000 | 145.000000 | 55.400000 | 1.600000 |
| 75% | 2019.000000 | 20361.000000 | 32449.000000 | 145.000000 | 62.800000 | 2.000000 |
| max | 2020.000000 | 145000.000000 | 323000.000000 | 580.000000 | 470.800000 | 6.600000 |

The imported dataset contains model-level details for used cars. It covers about 72k cars, each described by 10 variables: 4 categorical (model, transmission, fuelType, Make) and 6 quantitative (year, price, mileage, tax, mpg, engineSize).

Let's plot a box plot of each numerical column, grouped by manufacturer (Make).

In [29]:

```python
# One box plot per numerical column, with one box per manufacturer
for column in ["year", "price", "mileage", "tax", "mpg", "engineSize"]:
    sns.boxplot(x="Make", y=column, data=df)
    plt.show()
```
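The 4/6 split between categorical and quantitative variables, and the hard-coded column list in the loop above, can both be derived from the column dtypes. Below is a minimal sketch using pandas `select_dtypes`; the rows are made up and only mimic the schema of `cars_dataset.csv` (in the notebook, `df` comes from `pd.read_csv`).

```python
import pandas as pd

# Made-up rows that only mimic the cars_dataset.csv schema
df = pd.DataFrame({
    "model": ["A1", "A6", "Fiesta"],
    "year": [2017, 2016, 2019],
    "price": [12500, 16500, 9000],
    "transmission": ["Manual", "Automatic", "Manual"],
    "mileage": [15735, 36203, 5000],
    "fuelType": ["Petrol", "Diesel", "Petrol"],
    "tax": [150.0, 20.0, 145.0],
    "mpg": [55.4, 64.2, 58.9],
    "engineSize": [1.4, 2.0, 1.0],
    "Make": ["audi", "audi", "Ford"],
})

# Quantitative columns have a numeric dtype; the remaining columns are categorical
numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)
print(categorical_cols)
```

On the real data, `numeric_cols` could stand in for the literal list in the box-plot loop, so a new numerical column would be plotted automatically.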


Data cleaning steps:

1. Dropping missing data
2. Filtering outliers
3. Selecting a subset of the data
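Whether anything is missing can be checked directly before dropping. This sketch runs on a small made-up frame; it also shows that `dropna()` with its default axis removes rows, while `dropna(axis="columns")` removes whole columns and leaves the row count unchanged.

```python
import numpy as np
import pandas as pd

# Made-up frame with a couple of gaps, standing in for the real df
df = pd.DataFrame({
    "model": ["A1", None, "A3"],
    "price": [12500, 16500, np.nan],
    "mileage": [15735, 36203, 1998],
})

# Number of missing values in each column
missing_per_column = df.isna().sum()
print(missing_per_column)

# Default axis: drop every row that contains a NaN
rows_kept = len(df.dropna())
# axis="columns": drop every column that contains a NaN (row count is untouched)
cols_kept = df.dropna(axis="columns").columns.tolist()
print(rows_kept, cols_kept)
```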

In [30]:

```python
# Drop any column that contains missing values.
# Note: axis='columns' removes columns, not rows, so len(df) (the row count) is unchanged either way
df = df.dropna(axis='columns')
len(df)
```

Out[30]: 72435

The data seems to be clean. Every column survives `dropna(axis='columns')` (all 10 are selected again in In [33]), so no column contains missing values.

Now let's filter outliers at both ends. Using the `quantile` function, we keep mileage values strictly between the 5th and 95th percentiles and mpg values strictly between the 10th and 90th percentiles.

In [31]:

```python
# Keep cars whose mileage lies strictly between the 5th and 95th percentiles
mileage_hi = df["mileage"].quantile(0.95)
mileage_low = df["mileage"].quantile(0.05)
df = df[(df["mileage"] < mileage_hi) & (df["mileage"] > mileage_low)]
len(df)
```

Out[31]: 65038

In [32]:

```python
# Keep cars whose mpg lies strictly between the 10th and 90th percentiles
mpg_hi = df["mpg"].quantile(0.9)
mpg_low = df["mpg"].quantile(0.1)
df = df[(df["mpg"] < mpg_hi) & (df["mpg"] > mpg_low)]
len(df)
```

Out[32]: 50190

Re-arranging the variables

In [33]:

```python
# Reorder the columns and give two of them clearer names
df = df[['Make', 'model', 'transmission', 'year', 'fuelType', 'engineSize', 'mpg', 'mileage', 'price', 'tax']]
df = df.rename(columns={'year': 'Year_of_Manufacture', 'Make': 'Manufacturer'})
print(df.head(5))
```

```
  Manufacturer model transmission  Year_of_Manufacture fuelType  engineSize  \
0         audi    A1       Manual                 2017   Petrol         1.4
1         audi    A6    Automatic                 2016   Diesel         2.0
2         audi    A1       Manual                 2016   Petrol         1.4
3         audi    A4    Automatic                 2017   Diesel         2.0
4         audi    A3       Manual                 2019   Petrol         1.0

    mpg  mileage  price    tax
0  55.4    15735  12500  150.0
1  64.2    36203  16500   20.0
2  55.4    29946  11000   30.0
3  67.3    25952  16800  145.0
4  49.6     1998  17300  145.0
```

Next, we reduce the dataset to the subset of cars manufactured in 2017 or later.

In [34]:

```python
# Keep only cars manufactured from 2017 onwards
df_new = df.loc[df['Year_of_Manufacture'] >= 2017]
print("Total cars manufactured in last 3yrs is " + str(len(df_new)))
```

```
Total cars manufactured in last 3yrs is 34383
```
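The two outlier filters in In [31] and In [32] repeat the same pattern, which a small helper could capture. This is a sketch only: `filter_quantiles` is a hypothetical name introduced here, and the values are made up. Like the notebook, it keeps values strictly between the two quantiles.

```python
import pandas as pd

def filter_quantiles(df, column, low_q, high_q):
    """Keep rows whose `column` lies strictly between its low_q and high_q quantiles."""
    low = df[column].quantile(low_q)
    high = df[column].quantile(high_q)
    return df[(df[column] > low) & (df[column] < high)]

# Made-up mileage/mpg values, not taken from the real dataset
df = pd.DataFrame({
    "mileage": [1, 10, 20, 30, 40, 50, 60, 70, 80, 1000],
    "mpg": [0.3, 40, 45, 50, 55, 60, 65, 70, 75, 470],
})

# Same order as the notebook: mileage first, then mpg on what remains
df = filter_quantiles(df, "mileage", 0.05, 0.95)
df = filter_quantiles(df, "mpg", 0.10, 0.90)
print(len(df))
```

Because the mpg quantiles are computed after the mileage filter has been applied, the order of the two calls affects the result, just as it does in In [31] and In [32].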


In [35]:

```python
# Box plots of the numerical columns for cars built in 2017 or later
for column in ["Year_of_Manufacture", "price", "mileage", "tax", "mpg", "engineSize"]:
    sns.boxplot(x="Manufacturer", y=column, data=df_new)
    plt.show()
```
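The line inside each box marks the median, so the same comparison can also be read off as numbers. Here is a sketch using `groupby(...).median()` on made-up rows in the renamed schema; on the real data it would be applied to `df_new`.

```python
import pandas as pd

# Made-up rows in the renamed schema (Manufacturer, price, mileage)
df_new = pd.DataFrame({
    "Manufacturer": ["audi", "audi", "Ford", "Ford", "Ford"],
    "price": [20000, 24000, 10000, 12000, 14000],
    "mileage": [5000, 15000, 8000, 12000, 30000],
})

# Median of each numerical column per manufacturer (groups are sorted by name)
medians = df_new.groupby("Manufacturer")[["price", "mileage"]].median()
print(medians)
```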


Cars manufactured by the car companies from 2017 to 2020

```python
print("Following are the number of cars manufactured by the car companies from 2017 to 2020: \n")
df_new['Manufacturer'].value_counts()
```

In [36]:

```python
df_new.describe()
```

Out[36]:

| | Year_of_Manufacture | engineSize | mpg | mileage | price | tax |
|---|---|---|---|---|---|---|
| count | 34383.000000 | 34383.000000 | 34383.000000 | 34383.000000 | 34383.000000 | 34383.00000 |
| mean | 2018.042637 | 1.490521 | 54.645386 | 15259.372510 | 16666.185557 | 134.85996 |
| std | 0.930195 | 0.486528 | 7.338323 | 11367.604925 | 6673.895175 | 36.45610 |
| min | 2017.000000 | 0.000000 | 40.300000 | 1001.000000 | 5275.000000 | 0.00000 |
| 25% | 2017.000000 | 1.000000 | 48.700000 | 6098.500000 | 11390.000000 | 145.00000 |
| 50% | 2018.000000 | 1.500000 | 55.400000 | 12580.000000 | 15500.000000 | 145.00000 |
| 75% | 2019.000000 | 2.000000 | 60.100000 | 21587.000000 | 20450.000000 | 145.00000 |
| max | 2020.000000 | 3.000000 | 68.800000 | 65601.000000 | 92000.000000 | 265.00000 |

In [37]:

```python
# Number of cars per year of manufacture (rows) and manufacturer (columns)
pd.crosstab(columns=df_new['Manufacturer'], index=df_new['Year_of_Manufacture'])
```

Out[37]:

| Year_of_Manufacture | BMW | Ford | Hyundai | audi | skoda | toyota | vw |
|---|---|---|---|---|---|---|---|
| 2017 | 1200 | 4208 | 1049 | 1590 | 1363 | 918 | 2565 |
| 2018 | 639 | 3686 | 702 | 680 | 787 | 609 | 1308 |
| 2019 | 2100 | 2341 | 470 | 1650 | 1430 | 782 | 3026 |
| 2020 | 198 | 83 | 70 | 303 | 112 | 55 | 459 |

In [38]:

```python
# Number of cars per year of manufacture (rows) and fuel type (columns)
pd.crosstab(columns=df_new['fuelType'], index=df_new['Year_of_Manufacture'])
```

Out[38]:

| Year_of_Manufacture | Diesel | Hybrid | Other | Petrol |
|---|---|---|---|---|
| 2017 | 4389 | 97 | 43 | 8364 |
| 2018 | 1859 | 54 | 11 | 6487 |
| 2019 | 4589 | 206 | 49 | 6955 |
| 2020 | 448 | 75 | 12 | 745 |

In [39]:

```python
# Number of cars per manufacturer (rows) and fuel type (columns)
pd.crosstab(columns=df_new['fuelType'], index=df_new['Manufacturer'])
```

Out[39]:

| Manufacturer | Diesel | Hybrid | Other | Petrol |
|---|---|---|---|---|
| BMW | 2807 | 15 | 0 | 1315 |
| Ford | 2219 | 11 | 0 | 8088 |
| Hyundai | 592 | 84 | 0 | 1615 |
| audi | 1845 | 0 | 0 | 2378 |
| skoda | 933 | 0 | 5 | 2754 |
| toyota | 114 | 283 | 63 | 1904 |
| vw | 2775 | 39 | 47 | 4497 |

Hybrid cars by manufacturer

In [40]:

```python
# Select hybrid cars.
# Note: this filters df (all cleaned years), not df_new (2017 onwards)
df_new_hybrid = df.loc[df['fuelType'] == "Hybrid"]
print("Total Hybrid cars manufactured in last 3yrs is " + str(len(df_new_hybrid)))
df_new_hybrid['Manufacturer'].value_counts()
```

```
Total Hybrid cars manufactured in last 3yrs is 543
```

Out[40]:

```
toyota     389
Hyundai     84
vw          39
BMW         17
Ford        14
Name: Manufacturer, dtype: int64
```

In [41]:

```python
df_new_hybrid.describe()
```

Out[41]:

| | Year_of_Manufacture | engineSize | mpg | mileage | price | tax |
|---|---|---|---|---|---|---|
| count | 543.000000 | 543.000000 | 543.000000 | 543.000000 | 543.00000 | 543.000000 |
| mean | 2018.018416 | 2.054328 | 56.172192 | 16566.740331 | 23088.94291 | 101.740331 |
| std | 1.514881 | 0.430454 | 6.810315 | 15221.367119 | 5388.89180 | 55.393267 |
| min | 2009.000000 | 0.000000 | 40.400000 | 1026.000000 | 5750.00000 | 0.000000 |
| 25% | 2017.000000 | 1.600000 | 51.100000 | 4602.500000 | 19970.00000 | 20.000000 |
| 50% | 2019.000000 | 2.000000 | 55.400000 | 10003.000000 | 22461.00000 | 135.000000 |
| 75% | 2019.000000 | 2.500000 | 61.400000 | 25623.000000 | 26490.00000 | 140.000000 |
| max | 2020.000000 | 3.000000 | 68.800000 | 65601.000000 | 40995.00000 | 195.000000 |

Because `df` rather than `df_new` is filtered here, the 543 hybrids span 2009-2020 (see `min` above); the In [39] crosstab counts 432 hybrids built from 2017 to 2020.
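To restrict the hybrid selection to cars built from 2017 onwards, as in `df_new`, both conditions can be combined in one boolean mask (filtering `df_new` on `fuelType` alone would be equivalent). This is a sketch on made-up rows; in the notebook the input would be the cleaned `df`.

```python
import pandas as pd

# Made-up rows in the renamed schema
df = pd.DataFrame({
    "Manufacturer": ["toyota", "toyota", "Hyundai", "vw", "toyota"],
    "Year_of_Manufacture": [2009, 2018, 2019, 2017, 2020],
    "fuelType": ["Hybrid", "Hybrid", "Hybrid", "Diesel", "Hybrid"],
})

# Both conditions must hold: built in 2017 or later AND hybrid
recent_hybrid = df.loc[(df["Year_of_Manufacture"] >= 2017) & (df["fuelType"] == "Hybrid")]
print(len(recent_hybrid))
print(recent_hybrid["Manufacturer"].value_counts())
```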


In [42]:

```python
# Horizontal box plots of the hybrid cars, one box per manufacturer
for column in ["Year_of_Manufacture", "price", "mileage", "tax", "mpg", "engineSize"]:
    sns.boxplot(y="Manufacturer", x=column, data=df_new_hybrid)
    plt.show()
```


Diesel cars by manufacturer

In [43]:

```python
# Select diesel cars.
# Note: like the hybrid cell, this filters df (all cleaned years), not df_new (2017 onwards)
df_new_diesel = df.loc[df['fuelType'] == "Diesel"]
print("Total Diesel cars manufactured in last 3yrs is " + str(len(df_new_diesel)))
df_new_diesel['Manufacturer'].value_counts()
```

```
Total Diesel cars manufactured in last 3yrs is 18126
```

Out[43]:

```
BMW        4675
vw         4092
Ford       3383
audi       3333
skoda      1250
Hyundai    1098
toyota      295
Name: Manufacturer, dtype: int64
```

In [44]:

```python
df_new_diesel.describe()
```

Out[44]:

| | Year_of_Manufacture | engineSize | mpg | mileage | price | tax |
|---|---|---|---|---|---|---|
| count | 18126.000000 | 18126.000000 | 18126.000000 | 18126.000000 | 18126.000000 | 18126.000000 |
| mean | 2016.995862 | 2.014278 | 56.168233 | 25296.426239 | 18900.331347 | 125.326879 |
| std | 1.762329 | 0.367310 | 7.152138 | 17093.255171 | 6773.250091 | 52.623628 |
| min | 2005.000000 | 0.000000 | 40.400000 | 1001.000000 | 2395.000000 | 0.000000 |
| 25% | 2016.000000 | 2.000000 | 50.400000 | 9736.000000 | 14295.000000 | 125.000000 |
| 50% | 2017.000000 | 2.000000 | 56.500000 | 23631.500000 | 17750.000000 | 145.000000 |
| 75% | 2019.000000 | 2.000000 | 62.800000 | 37647.750000 | 22495.000000 | 145.000000 |
| max | 2020.000000 | 3.000000 | 67.300000 | 65612.000000 | 63000.000000 | 265.000000 |

As with the hybrid cars, `df` rather than `df_new` is filtered here, so the 18,126 diesel cars span 2005-2020 (see `min` above); the In [39] crosstab counts 11,285 diesel cars built from 2017 to 2020.
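Instead of filtering one fuel type at a time, you can compare every manufacturer's fuel mix in a single call by normalising the crosstab from In [39]. This sketch uses made-up rows; `normalize="index"` is a documented `pd.crosstab` option that divides each row by its total.

```python
import pandas as pd

# Made-up rows; the real input would be df_new from In [34]
df_new = pd.DataFrame({
    "Manufacturer": ["BMW", "BMW", "BMW", "BMW", "toyota", "toyota"],
    "fuelType": ["Diesel", "Diesel", "Diesel", "Petrol", "Hybrid", "Petrol"],
})

# Each manufacturer's row becomes a set of fuel-type shares that sum to 1
fuel_share = pd.crosstab(index=df_new["Manufacturer"], columns=df_new["fuelType"], normalize="index")
print(fuel_share.round(2))
```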


In [45]:

```python
# Horizontal box plots of the diesel cars, one box per manufacturer
for column in ["Year_of_Manufacture", "price", "mileage", "tax", "mpg", "engineSize"]:
    sns.boxplot(x=column, y="Manufacturer", data=df_new_diesel)
    plt.show()
```


Let's plot

In [46]:

```python
import seaborn as sns
```

In [47]:

```python
# Number of hybrid cars per manufacturer
sns.countplot(x="Manufacturer", data=df_new_hybrid)
```

Out[47]: <AxesSubplot:xlabel='Manufacturer', ylabel='count'>

The plot above shows how strongly each manufacturer has committed to hybrid cars. Toyota clearly anticipated this trend and invested the most in the hybrid market: 389 of the 543 hybrids (about 72%) are Toyotas.

In [48]:

```python
# Number of diesel cars per manufacturer
sns.countplot(x="Manufacturer", data=df_new_diesel)
```

Out[48]: <AxesSubplot:xlabel='Manufacturer', ylabel='count'>

The plot above shows that every manufacturer sells diesel vehicles. Toyota has invested the least in diesel (295 cars), while BMW has the most (4,675 diesel cars).
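You can back the countplot comparisons with proportions. Here is a sketch using `value_counts(normalize=True)` on made-up labels. Applied to `df_new_hybrid['Manufacturer']`, it would give each manufacturer's share of hybrids, for example Toyota's 389 / 543 ≈ 0.72 from Out[40].

```python
import pandas as pd

# Made-up manufacturer labels, one per car
makes = pd.Series(["toyota"] * 7 + ["Hyundai"] * 2 + ["vw"])

# Share of all cars per manufacturer: the countplot bar heights divided by the total
share = makes.value_counts(normalize=True)
print(share)
```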

