12/6/22, 2:56 PM Milestone1 - Jupyter Notebook
localhost:8888/notebooks/Downloads/Milestone1.ipynb 1/11
Milestone 1 - Dataset - Used cars in USA
A deep dive into the used car market and industry trends of the past few years
This dataset is imported from Kaggle: https://www.kaggle.com/datasets/aishwaryamuthukumar/cars-dataset-audi-bmw-ford-hyundai-skoda-vw
The purpose of the dataset is to present information on the used cars available in the US market. It has 10 variables and about 72k datapoints. As the source link shows, the dataset has only two discussions and one comment about its use, so it appears fresh and largely unanalyzed. It was authored by 3 Kaggle users.

In [28]:
# Importing the dataset and describing it
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("./cars_dataset.csv")
print("Datapoints = " + str(len(df)) + "\n")
print("Data variables = " + str(len(df.columns)) + "\n")
print("Column names: \n" + str(list(df.columns.values)))
df.describe()

Datapoints = 72435
Data variables = 10
Column names: ['model', 'year', 'price', 'transmission', 'mileage', 'fuelType', 'tax', 'mpg', 'engineSize', 'Make']
Out[28]:
               year         price       mileage           tax           mpg    engineSize
count  72435.000000  72435.000000  72435.000000  72435.000000  72435.000000  72435.000000
mean    2017.073666  16580.158708  23176.517057    116.953407     55.852480      1.635650
std        2.101252   9299.028754  21331.515562     64.045533     17.114391      0.561535
min     1996.000000    495.000000      1.000000      0.000000      0.300000      0.000000
25%     2016.000000  10175.000000   7202.500000     30.000000     47.900000      1.200000
50%     2017.000000  14495.000000  17531.000000    145.000000     55.400000      1.600000
75%     2019.000000  20361.000000  32449.000000    145.000000     62.800000      2.000000
max     2020.000000 145000.000000 323000.000000    580.000000    470.800000      6.600000

The dataset imported above contains global car model details for about 72k used cars. Each car has 10 variables: 4 categorical (model, transmission, fuelType, Make) and 6 quantitative (year, price, mileage, tax, mpg, engineSize). describe() summarizes only the 6 quantitative columns.

Let's draw a box plot of each quantitative column, grouped by make:
In [29]:
for column in ["year", "price", "mileage", "tax", "mpg", "engineSize"]:
    sns.boxplot(x="Make", y=column, data=df)
    plt.show()
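Each box in these plots is drawn from per-make quartiles. As a numeric companion to the plots, here is a minimal sketch, assuming pandas only, that computes the quartiles, the Tukey fences (Q1 - 1.5 * IQR and Q3 + 1.5 * IQR, matching the default whis=1.5 of seaborn's boxplot) and the number of points outside them for each group. `box_stats` is a hypothetical helper, not part of the notebook, and the toy prices are made up.

```python
import pandas as pd


def box_stats(frame: pd.DataFrame, group_col: str, value_col: str,
              whis: float = 1.5) -> pd.DataFrame:
    """Quartiles and Tukey fences of value_col for each group in group_col.

    Values outside [lower_fence, upper_fence] are the ones a box plot with
    whiskers at whis * IQR draws as individual outlier points.
    """
    grouped = frame.groupby(group_col)[value_col]
    stats = pd.DataFrame({
        "q1": grouped.quantile(0.25),
        "median": grouped.median(),
        "q3": grouped.quantile(0.75),
    })
    iqr = stats["q3"] - stats["q1"]
    stats["lower_fence"] = stats["q1"] - whis * iqr
    stats["upper_fence"] = stats["q3"] + whis * iqr
    # Look up each row's group fences, then count values falling outside them.
    lower = frame[group_col].map(stats["lower_fence"])
    upper = frame[group_col].map(stats["upper_fence"])
    outside = (frame[value_col] < lower) | (frame[value_col] > upper)
    stats["n_outliers"] = outside.groupby(frame[group_col]).sum()
    return stats


# Made-up prices for two makes, for illustration only.
toy = pd.DataFrame({
    "Make": ["audi"] * 5 + ["vw"] * 5,
    "price": [10, 12, 14, 16, 100, 8, 9, 10, 11, 12],
})
stats = box_stats(toy, "Make", "price")
print(stats)
```

On the notebook's data, a call such as `box_stats(df, "Make", "mileage")` would list the per-make fences. The quantiles here use pandas' default linear interpolation, so the numbers are a sketch and may differ slightly from what a particular plotting backend draws.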
Data cleaning using:
1. Dropping missing data
2. Filtering outliers
3. Filtering down to a smaller subset of the data

In [30]:
# dropna(axis='columns') removes every column that contains a missing value
df = df.dropna(axis='columns')
# len(df) counts rows, so it is unchanged by a column-wise drop
len(df)
Out[30]: 72435

The data seems to be pretty clean: no column was dropped (all 10 columns are still selected in In [33] below), so there are no missing values.

Now we filter outliers at both ends with quantile functions, applied to the mileage and mpg columns: for mileage we keep values strictly between the 5th and 95th percentiles, and for mpg strictly between the 10th and 90th percentiles.

In [31]:
mileage_hi = df["mileage"].quantile(0.95)
mileage_low = df["mileage"].quantile(0.05)
df = df[(df["mileage"] < mileage_hi) & (df["mileage"] > mileage_low)]
len(df)
Out[31]: 65038

In [32]:
mpg_hi = df["mpg"].quantile(0.9)
mpg_low = df["mpg"].quantile(0.1)
df = df[(df["mpg"] < mpg_hi) & (df["mpg"] > mpg_low)]
len(df)
Out[32]: 50190

Rearranging the variables
In [33]:
# Reorder the columns and give two of them more descriptive names
df = df[['Make', 'model', 'transmission', 'year', 'fuelType', 'engineSize', 'mpg', 'mileage', 'price', 'tax']]
df = df.rename(columns={'year': 'Year_of_Manufacture', 'Make': 'Manufacturer'})
print(df.head(5))

  Manufacturer model transmission  Year_of_Manufacture fuelType  engineSize   mpg  mileage  price    tax
0         audi    A1       Manual                 2017   Petrol         1.4  55.4    15735  12500  150.0
1         audi    A6    Automatic                 2016   Diesel         2.0  64.2    36203  16500   20.0
2         audi    A1       Manual                 2016   Petrol         1.4  55.4    29946  11000   30.0
3         audi    A4    Automatic                 2017   Diesel         2.0  67.3    25952  16800  145.0
4         audi    A3       Manual                 2019   Petrol         1.0  49.6     1998  17300  145.0

Narrowing the dataset to the subset of cars manufactured in 2017 or later
In [34]:
# Keep cars from model year 2017 onward (2017-2020 in this data)
df_new = df.loc[df['Year_of_Manufacture'] >= 2017]
print("Total cars manufactured in last 3yrs is " + str(len(df_new)))
Total cars manufactured in last 3yrs is 34383
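The cleaning steps above can be wrapped in small reusable helpers. This is a minimal sketch, assuming pandas and NumPy; `count_missing` and `trim_quantiles` are hypothetical names, and the data is made up. It counts missing values directly with `isna().sum()` (a row count such as `len(df)` cannot reveal them after a column-wise drop), drops incomplete rows, and then applies the same strict quantile filter used for mileage and mpg.

```python
import numpy as np
import pandas as pd


def count_missing(frame: pd.DataFrame) -> pd.Series:
    """Missing values per column; a direct check, unlike comparing len(df)."""
    return frame.isna().sum()


def trim_quantiles(frame: pd.DataFrame, column: str,
                   low: float, high: float) -> pd.DataFrame:
    """Keep rows strictly between the low and high quantiles of column,
    mirroring the mileage (0.05/0.95) and mpg (0.1/0.9) filters."""
    lo_val = frame[column].quantile(low)
    hi_val = frame[column].quantile(high)
    return frame[(frame[column] > lo_val) & (frame[column] < hi_val)]


# Made-up data for illustration: mileage 1..100 and one missing mpg value.
toy = pd.DataFrame({
    "mileage": np.arange(1, 101),
    "mpg": np.linspace(30.0, 70.0, 100),
})
toy.loc[0, "mpg"] = np.nan

missing = count_missing(toy)
clean = toy.dropna()  # row-wise drop, unlike dropna(axis='columns')
trimmed = trim_quantiles(clean, "mileage", 0.05, 0.95)
print(missing.to_dict())
print(len(clean), len(trimmed))
```

Dropping rows rather than columns keeps every variable available for the later plots, at the cost of losing the incomplete cars; which trade-off is right depends on how many rows are affected.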
In [35]:
for column in ["Year_of_Manufacture", "price", "mileage", "tax", "mpg", "engineSize"]:
    sns.boxplot(x="Manufacturer", y=column, data=df_new)
    plt.show()
Cars manufactured by the car companies from 2017-2020
print("Following are the number of cars manufactured by the car companies in this decade: \n")
df_new['Manufacturer'].value_counts()

In [36]:
df_new.describe()
Out[36]:
      Year_of_Manufacture    engineSize           mpg       mileage         price           tax
count        34383.000000  34383.000000  34383.000000  34383.000000  34383.000000   34383.00000
mean          2018.042637      1.490521     54.645386  15259.372510  16666.185557     134.85996
std              0.930195      0.486528      7.338323  11367.604925   6673.895175      36.45610
min           2017.000000      0.000000     40.300000   1001.000000   5275.000000       0.00000
25%           2017.000000      1.000000     48.700000   6098.500000  11390.000000     145.00000
50%           2018.000000      1.500000     55.400000  12580.000000  15500.000000     145.00000
75%           2019.000000      2.000000     60.100000  21587.000000  20450.000000     145.00000
max           2020.000000      3.000000     68.800000  65601.000000  92000.000000     265.00000

In [37]:
pd.crosstab(columns=df_new['Manufacturer'], index=df_new['Year_of_Manufacture'])
Out[37]:
Manufacturer          BMW  Ford  Hyundai  audi  skoda  toyota    vw
Year_of_Manufacture
2017                 1200  4208     1049  1590   1363     918  2565
2018                  639  3686      702   680    787     609  1308
2019                 2100  2341      470  1650   1430     782  3026
2020                  198    83       70   303    112      55   459

In [38]:
pd.crosstab(columns=df_new['fuelType'], index=df_new['Year_of_Manufacture'])
Out[38]:
fuelType              Diesel  Hybrid  Other  Petrol
Year_of_Manufacture
2017                    4389      97     43    8364
2018                    1859      54     11    6487
2019                    4589     206     49    6955
2020                     448      75     12     745

In [39]:
pd.crosstab(columns=df_new['fuelType'], index=df_new['Manufacturer'])
Out[39]:
fuelType       Diesel  Hybrid  Other  Petrol
Manufacturer
BMW              2807      15      0    1315
Ford             2219      11      0    8088
Hyundai           592      84      0    1615
audi             1845       0      0    2378
skoda             933       0      5    2754
toyota            114     283     63    1904
vw               2775      39     47    4497

Hybrid cars manufactured by the car companies (the subset below is drawn from df, so it spans all model years; the 2017-2020 counts by fuel type are in Out[38] and Out[39])
In [40]:
# Note: this filters df (every model year in the cleaned data), not df_new, so
# despite the "last 3yrs" wording it includes hybrids from as early as 2009.
# Filtering df_new instead gives the 2017-2020 hybrids (432 per Out[38]).
df_new_hybrid = df.loc[df['fuelType'] == "Hybrid"]
print("Total Hybrid cars manufactured in last 3yrs is " + str(len(df_new_hybrid)))
df_new_hybrid['Manufacturer'].value_counts()
Total Hybrid cars manufactured in last 3yrs is 543
Out[40]:
toyota     389
Hyundai     84
vw          39
BMW         17
Ford        14
Name: Manufacturer, dtype: int64

In [41]:
df_new_hybrid.describe()
Out[41]:
      Year_of_Manufacture    engineSize           mpg       mileage         price           tax
count          543.000000    543.000000    543.000000    543.000000     543.00000    543.000000
mean          2018.018416      2.054328     56.172192  16566.740331   23088.94291    101.740331
std              1.514881      0.430454      6.810315  15221.367119    5388.89180     55.393267
min           2009.000000      0.000000     40.400000   1026.000000    5750.00000      0.000000
25%           2017.000000      1.600000     51.100000   4602.500000   19970.00000     20.000000
50%           2019.000000      2.000000     55.400000  10003.000000   22461.00000    135.000000
75%           2019.000000      2.500000     61.400000  25623.000000   26490.00000    140.000000
max           2020.000000      3.000000     68.800000  65601.000000   40995.00000    195.000000
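Because In [40] filters `df` rather than `df_new`, its subset silently includes pre-2017 cars. One way to avoid that kind of mismatch is to put the fuel and year conditions into a single boolean mask. This is a minimal sketch; `fuel_subset` is a hypothetical helper (not from the notebook) and the rows are made up.

```python
from typing import Optional

import pandas as pd


def fuel_subset(frame: pd.DataFrame, fuel: str,
                min_year: Optional[int] = None) -> pd.DataFrame:
    """Rows whose fuelType equals fuel, optionally limited to
    Year_of_Manufacture >= min_year, selected with one combined mask."""
    mask = frame["fuelType"] == fuel
    if min_year is not None:
        mask = mask & (frame["Year_of_Manufacture"] >= min_year)
    return frame.loc[mask]


# Made-up rows for illustration only.
toy = pd.DataFrame({
    "Manufacturer": ["toyota", "toyota", "Hyundai", "BMW", "Ford"],
    "Year_of_Manufacture": [2009, 2018, 2019, 2016, 2020],
    "fuelType": ["Hybrid", "Hybrid", "Hybrid", "Diesel", "Hybrid"],
})

all_years = fuel_subset(toy, "Hybrid")
recent = fuel_subset(toy, "Hybrid", min_year=2017)
print(len(all_years), len(recent))  # 4 3
```

Applied to the notebook's data, `fuel_subset(df, "Hybrid", min_year=2017)` should select the same rows as filtering `df_new`, which according to the Hybrid column of Out[38] is 432 cars rather than 543. The same call with `"Diesel"` covers the diesel section.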
In [42]:
for column in ["Year_of_Manufacture", "price", "mileage", "tax", "mpg", "engineSize"]:
    sns.boxplot(y="Manufacturer", x=column, data=df_new_hybrid)
    plt.show()
Diesel cars manufactured by the car companies (the subset below is drawn from df, so it spans all model years; the 2017-2020 counts by fuel type are in Out[38] and Out[39])

In [43]:
# Note: this filters df (every model year in the cleaned data), not df_new, so
# despite the "last 3yrs" wording it includes diesels from as early as 2005.
# Filtering df_new instead gives the 2017-2020 diesels (11285 per Out[38]).
df_new_diesel = df.loc[df['fuelType'] == "Diesel"]
print("Total Diesel cars manufactured in last 3yrs is " + str(len(df_new_diesel)))
df_new_diesel['Manufacturer'].value_counts()
Total Diesel cars manufactured in last 3yrs is 18126
Out[43]:
BMW        4675
vw         4092
Ford       3383
audi       3333
skoda      1250
Hyundai    1098
toyota      295
Name: Manufacturer, dtype: int64

In [44]:
df_new_diesel.describe()
Out[44]:
      Year_of_Manufacture    engineSize           mpg       mileage         price           tax
count        18126.000000  18126.000000  18126.000000  18126.000000  18126.000000  18126.000000
mean          2016.995862      2.014278     56.168233  25296.426239  18900.331347    125.326879
std              1.762329      0.367310      7.152138  17093.255171   6773.250091     52.623628
min           2005.000000      0.000000     40.400000   1001.000000   2395.000000      0.000000
25%           2016.000000      2.000000     50.400000   9736.000000  14295.000000    125.000000
50%           2017.000000      2.000000     56.500000  23631.500000  17750.000000    145.000000
75%           2019.000000      2.000000     62.800000  37647.750000  22495.000000    145.000000
max           2020.000000      3.000000     67.300000  65612.000000  63000.000000    265.000000
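The crosstabs in Out[37] to Out[39] report raw counts, which mix the size of each year's (or make's) fleet with its fuel preference. `pd.crosstab` can add row and column totals with `margins=True`, or turn each row into shares with `normalize="index"`. A minimal sketch on made-up rows, assuming pandas only:

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "Year_of_Manufacture": [2017, 2017, 2017, 2018, 2018, 2019],
    "fuelType": ["Diesel", "Petrol", "Petrol", "Hybrid", "Petrol", "Diesel"],
})

# Raw counts with row and column totals (the "All" margins).
counts = pd.crosstab(index=toy["Year_of_Manufacture"], columns=toy["fuelType"],
                     margins=True)

# Share of each fuel type within a year: every row sums to 1.
shares = pd.crosstab(index=toy["Year_of_Manufacture"], columns=toy["fuelType"],
                     normalize="index")

print(counts)
print(shares.round(3))
```

Worked from the counts in Out[38], this kind of normalization shows hybrids rising from about 0.75% of the 2017 cars in df_new (97 of 12,893) to about 5.9% of the 2020 cars (75 of 1,280), although the 2020 sample is much smaller.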
In [45]:
for column in ["Year_of_Manufacture", "price", "mileage", "tax", "mpg", "engineSize"]:
    sns.boxplot(x=column, y="Manufacturer", data=df_new_diesel)
    plt.show()
Let's plot the number of hybrid and diesel cars for each manufacturer.

In [46]:
import seaborn as sns

In [47]:
sns.countplot(x="Manufacturer", data=df_new_hybrid)
Out[47]: <AxesSubplot:xlabel='Manufacturer', ylabel='count'>

The plot above shows how much interest each manufacturer has shown in hybrid cars. Toyota accounts for most of the hybrids (389 of 543), which suggests it anticipated the shift and invested in the hybrid market earlier and more heavily than the others.

In [48]:
sns.countplot(x="Manufacturer", data=df_new_diesel)
Out[48]: <AxesSubplot:xlabel='Manufacturer', ylabel='count'>

The plot above shows that all the companies invested in diesel vehicles. Toyota invested the least in diesel (295 cars), while BMW has the most diesel cars (4675).
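`sns.countplot` shows raw counts, so a make with more listings overall can look more committed to a fuel type than it really is. Dividing by each manufacturer's total gives a fairer comparison. This is a minimal sketch, assuming pandas only; `fuel_share_by_make` is a hypothetical helper and the rows are made up.

```python
import pandas as pd


def fuel_share_by_make(frame: pd.DataFrame, fuel: str) -> pd.Series:
    """Fraction of each manufacturer's cars whose fuelType equals fuel."""
    is_fuel = frame["fuelType"].eq(fuel)
    # The mean of a boolean Series is the fraction of True values.
    return is_fuel.groupby(frame["Manufacturer"]).mean().sort_values(ascending=False)


# Made-up rows for illustration only.
toy = pd.DataFrame({
    "Manufacturer": ["toyota", "toyota", "toyota", "BMW", "BMW", "Ford", "Ford"],
    "fuelType": ["Hybrid", "Hybrid", "Petrol", "Diesel", "Diesel", "Hybrid", "Petrol"],
})
share = fuel_share_by_make(toy, "Hybrid")
print(share)
```

Worked from Out[39], Toyota's 283 hybrids are about 12% of its 2,364 cars from 2017 onward, versus about 3.7% (84 of 2,291) for Hyundai, so the normalized view supports the same conclusion as the raw counts.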