Part 3: Connecting House Sales In the Same Year¶
Instead of grouping all the sales made of the same house in the same year, we can analyse the total number of time the same house was sold in the same year.
To do this, we return to our original data, inner join each year to itself, remove any rows where the second date of transfer is before the first then remove any duplicates of the first date of transfer.
This returns us with a table that has each sale of the same house in order, pairing consecutive sales together.
In [52]:
data.head()
Out[52]:
In [53]:
# Seperate each year into its own table
data1995_2 = data.loc[data['Year']==1995]
data1996_2 = data.loc[data['Year']==1996]
data1997_2 = data.loc[data['Year']==1997]
data1998_2 = data.loc[data['Year']==1998]
data1999_2 = data.loc[data['Year']==1999]
data2000_2 = data.loc[data['Year']==2000]
data2001_2 = data.loc[data['Year']==2001]
data2002_2 = data.loc[data['Year']==2002]
data2003_2 = data.loc[data['Year']==2003]
data2004_2 = data.loc[data['Year']==2004]
data2005_2 = data.loc[data['Year']==2005]
data2006_2 = data.loc[data['Year']==2006]
data2007_2 = data.loc[data['Year']==2007]
data2008_2 = data.loc[data['Year']==2008]
data2009_2 = data.loc[data['Year']==2009]
data2010_2 = data.loc[data['Year']==2010]
data2011_2 = data.loc[data['Year']==2011]
data2012_2 = data.loc[data['Year']==2012]
data2013_2 = data.loc[data['Year']==2013]
data2014_2 = data.loc[data['Year']==2014]
data2015_2 = data.loc[data['Year']==2015]
data2016_2 = data.loc[data['Year']==2016]
data2017_2 = data.loc[data['Year']==2017]
In [98]:
dfs2 = [data1995_2, data1996_2,data1996_2,data1997_2,
data1998_2,data1999_2,data2000_2,data2001_2,data2002_2,
data2003_2,data2004_2,data2005_2,data2006_2,data2007_2,data2008_2,
data2009_2,data2010_2,data2011_2,data2012_2,data2013_2,data2014_2,
data2015_2,data2016_2,data2017_2]
d2={}
for j in range(0,23):
d2["data_" +str(j+1995) +"_"+str(j+1995)] = dfs2[j].merge(dfs2[j], on = ['Post Code', 'Address 1','Address 3','Property Type'], how = 'inner')
d2["data_" +str(j+1995) +"_"+str(j+1995)] = d2["data_" +str(j+1995) +"_"+str(j+1995)].loc[(d2["data_" +str(j+1995) +"_"+str(j+1995)]['Date of Transfer_x'] < d2["data_" +str(j+1995) +"_"+str(j+1995)]['Date of Transfer_y'] ),]
d2["data_" +str(j+1995) +"_"+str(j+1995)] = d2["data_" +str(j+1995) +"_"+str(j+1995)].sort_values(by = ['Date of Transfer_x'])
d2["data_" +str(j+1995) +"_"+str(j+1995)] = d2["data_" +str(j+1995) +"_"+str(j+1995)].drop_duplicates(subset = ['Date of Transfer_x'])
d2["data_" +str(j+1995) +"_"+str(j+1995)] = d2["data_" +str(j+1995) +"_"+str(j+1995)][['Year_x','AdjustedPrice_x', 'Date of Transfer_x', 'Year_y','AdjustedPrice_y','Date of Transfer_y', 'LondonCheck_x']]
In [99]:
(d2["data_2017_2017"])
Out[99]:
In [103]:
HouseCountComparisonSameYear = pd.DataFrame([])
for j in range(1996,2018):
#print(str(i) + "-" +str(j) +": " + str(len(d["data_"+str(i)+"_"+str(j)])))
count = len(d2["data_"+str(j)+"_"+str(j)])
HouseCountComparisonSameYear2 = pd.DataFrame([[j,j,count]])
HouseCountComparisonSameYear = HouseCountComparisonSameYear.append(HouseCountComparisonSameYear2)
HouseCountComparisonSameYear.columns = ['Year1', 'Year2', 'Count']
In [105]:
HouseCountComparisonSameYear
Out[105]:
In [107]:
HouseCountComparisonSameYearMatrix = HouseCountComparisonSameYear.pivot("Year1", "Year2")
HouseCountComparisonSameYearMatrix
Out[107]:
In [122]:
with sns.axes_style("white"):
ax = sns.heatmap(HouseCountComparisonSameYearMatrix, linewidth=0.2, cmap = "BuPu")
ax.set_title('Number of times the same house was sold in the same year')
ax.set_ylabel('First Year')
ax.set_xlabel('Second Year')
plt.show()
Lastly we combine both comparison tables into one and create our final heat map¶
In [111]:
HouseCountComparisonSameALLYEARS = pd.concat([HouseCountComparisonSameYear,HouseCountComparison])
In [121]:
HouseCountComparisonSameALLYEARSMatrix = HouseCountComparisonSameALLYEARS.pivot("Year1", "Year2")
HouseCountComparisonSameALLYEARSMatrix
Out[121]:
In [133]:
with sns.axes_style("white"):
ax = sns.heatmap(HouseCountComparisonSameALLYEARSMatrix, linewidth=0.2, cmap = "BuPu")
ax.set_title('Number of times the same house was sold in a second year')
ax.set_ylabel('First Year')
ax.set_xlabel('Second Year')
plt.show()