Part 2: Connecting House Sales Between Years¶
Combine multiple sales of the same house in a year into one taking the mean of prices¶
Originally we simply sorted by the date of transfer and took the first occurance of the house in each year. Instead, we now group all sales of the same house in each year into one row and take the mean adjusted price of the sales.
In [18]:
# Old method of simply taking first occurence
#data2 = data.sort_values(by=['Date of Transfer'])
#data3 = data2.drop_duplicates(subset = ['Post Code', 'Address 1','Address 3','Property Type', 'Year'])
# New method of taking the mean of all sales of the same house in that year
data2 = data.groupby(['Post Code', 'Address 1','Address 3','Property Type', 'Year', 'LondonCheck']).mean().reset_index()
In [19]:
data2.head()
Out[19]:
In [20]:
# Seperate each year into its own table
data1995 = data2.loc[data2['Year']==1995]
data1996 = data2.loc[data2['Year']==1996]
data1997 = data2.loc[data2['Year']==1997]
data1998 = data2.loc[data2['Year']==1998]
data1999 = data2.loc[data2['Year']==1999]
data2000 = data2.loc[data2['Year']==2000]
data2001 = data2.loc[data2['Year']==2001]
data2002 = data2.loc[data2['Year']==2002]
data2003 = data2.loc[data2['Year']==2003]
data2004 = data2.loc[data2['Year']==2004]
data2005 = data2.loc[data2['Year']==2005]
data2006 = data2.loc[data2['Year']==2006]
data2007 = data2.loc[data2['Year']==2007]
data2008 = data2.loc[data2['Year']==2008]
data2009 = data2.loc[data2['Year']==2009]
data2010 = data2.loc[data2['Year']==2010]
data2011 = data2.loc[data2['Year']==2011]
data2012 = data2.loc[data2['Year']==2012]
data2013 = data2.loc[data2['Year']==2013]
data2014 = data2.loc[data2['Year']==2014]
data2015 = data2.loc[data2['Year']==2015]
data2016 = data2.loc[data2['Year']==2016]
data2017 = data2.loc[data2['Year']==2017]
In [21]:
data1995.head()
Out[21]:
Inner join a year based on the address features to match the same house between years if it appears in both years.¶
In [22]:
data_1995_2017 = data1995.merge(data2017, on = ['Post Code', 'Address 1','Address 3',
'Property Type'], how = 'inner')
data_1995_2017 = data_1995_2017[['Year_x','AdjustedPrice_x', 'Year_y','AdjustedPrice_y', 'LondonCheck_x']]
data_1995_2017.head()
Out[22]:
We then calculate the regression of the prices where x is the price of the first year and y is the price of the second year. We also now seperate by whether the houses are in London or the Rest of the UK.¶
Unfortunately, the plot creates so many points that it is difficult to load correctly and is extremely cluttered. We therefore need to consider more practical ways to visualise with the high density of houses in the lower price brackets.¶
In [23]:
x1 = data_1995_2017['AdjustedPrice_x'].loc[data_1995_2017['LondonCheck_x']=="GREATER LONDON"]
y1 = data_1995_2017['AdjustedPrice_y'].loc[data_1995_2017['LondonCheck_x']=="GREATER LONDON"]
slope1, intercept1, r_value1, p_value1, std_err1 = stats.linregress(x1,y1)
line1 = slope1*x1+intercept1
x2 = data_1995_2017['AdjustedPrice_x'].loc[data_1995_2017['LondonCheck_x']=="Rest of UK"]
y2 = data_1995_2017['AdjustedPrice_y'].loc[data_1995_2017['LondonCheck_x']=="Rest of UK"]
slope2, intercept2, r_value2, p_value2, std_err2 = stats.linregress(x2,y2)
line2 = slope2*x2+intercept2
# The number of rows in greater london is .. percent of the number of rows in the rest of the UK data
1-((len(x2)-len(x1))/len(x2))
Out[23]:
In [24]:
trace0 = go.Scatter(
x = data_1995_2017['AdjustedPrice_x'].loc[data_1995_2017['LondonCheck_x']=="GREATER LONDON"],
y = data_1995_2017['AdjustedPrice_y'].loc[data_1995_2017['LondonCheck_x']=="GREATER LONDON"],
name = 'London',
mode = 'markers',
marker = dict(
size = 10,
color = 'rgba(152, 0, 0, .8)',
line = dict(
width = 2,
color = 'rgb(0, 0, 0)'
)
)
)
trace1 = go.Scatter(
x = data_1995_2017['AdjustedPrice_x'].loc[data_1995_2017['LondonCheck_x']=="Rest of UK"],
y = data_1995_2017['AdjustedPrice_y'].loc[data_1995_2017['LondonCheck_x']=="Rest of UK"],
name = 'Rest of UK',
mode = 'markers',
marker = dict(
size = 10,
color = 'rgba(255, 182, 193, .9)',
line = dict(
width = 2,
color = 'rgb(0, 0, 0)'
)
)
)
trace2 = go.Scatter(
x=x1,
y=line1,
mode='lines',
marker=go.Marker(color='rgba(152, 0, 0, .8)'),
name='London Fit'
)
trace3 = go.Scatter(
x=x2,
y=line2,
mode='lines',
marker=go.Marker(color='rgba(255, 182, 193, .9)'),
name='Rest of UK Fit'
)
datascatter = [trace0, trace1, trace2, trace3]
layout_scatter = dict(title = 'Comparison of the Adjusted Price of the Same House in 1995 Sold Again in 2017 <br> Seperated by whether in London or Rest of UK',
yaxis = dict(title = 'Price 1995 (£)',zeroline = False, range = [0,10000000]),
xaxis = dict(title = 'Price 2017 (£)',zeroline = False, range = [0,10000000])
)
fig = dict(data=datascatter, layout=layout_scatter)
#iplot(fig, filename='1995 to 2017', validate = False)
Handling Large Amounts of Data in One Plot¶
Instead of reducing the data further, we can apply R’s Hexbins which group the points and then shades the hexagons based on the amount in each to illustrate the denser areas.
In [25]:
%load_ext rpy2.ipython
In [26]:
%%R -i x1,y1 -w 5 -h 5 --units in -r 200
library(hexbin)
hbin <- hexbin(x1,y1,xbins = 40)
plot(hbin)
GG plot also has a Hexbin visual that allows us more customisation options and can improve our aesthetics for analysis¶
In [27]:
%%R -i x1,y1,x2,y2,line1,line2 -w 5 -h 4 --units in -r 200
#install.packages("plotly")
library(ggplot2)
library(plotly)
df <- data.frame(x1,y1, line1)
df2 <- data.frame(x2,y2,line2)
hbin2 <- ggplot() +
stat_binhex(data=df,aes(x=x1,y=y1, alpha=log10(..count..)), fill="#A833FF", line = "#fffff", bins = 75) +
stat_binhex(data=df2,aes(x=x2,y=y2, alpha=log10(..count..)), fill="#33FFA6", bins = 75) +
guides(fill = FALSE, alpha=FALSE) +
ggtitle('Comparison of the Adjusted Price of the Same House in 1995 Sold Again in 2017 \n Seperated by whether in London or Rest of UK')+
scale_x_continuous(name="Price 1995 (£)", labels = scales::comma) +
scale_y_continuous(name="Price 2017 (£)", labels = scales::comma)+
theme(panel.background = element_rect(fill='black' ,colour='#ababab'),
panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
title = element_text(size=7)) +
geom_line(color = '#ee77ee', data = df, aes(x=x1,y=line1)) +
geom_line(color = '#36e3e3', data = df2, aes(x=x2,y=line2)) +
annotate(geom = 'text', label = 'London', color = '#A833FF', x=2000000, y = 7500000)+
annotate(geom = 'text', label = 'Rest of UK', color = '#33FFA6', x=3500000, y = 8000000)
plot(hbin2)
In [28]:
%%R
# Or we can use Plot.ly to make this a bit clearer
hbin3 <-(ggplotly(hbin2))
#hbin3
Create dictionary of data tables connecting each year to all possible future years¶
In [29]:
dfs = [data1995,data1996,data1997,data1998,data1999,data2000,data2001,data2002,data2003,data2004,data2005,data2006,
data2007,data2008,data2009,data2010,data2011,data2012,data2013,data2014,data2015,data2016,data2017]
d={}
for i in range(0,22):
for j in range(1,23):
if j <= i:
continue
else:
d["data_" +str(i+1995) +"_"+str(j+1995)] = dfs[i].merge(dfs[j], on = ['Post Code', 'Address 1','Address 3','Property Type'], how = 'inner')
d["data_" +str(i+1995) +"_"+str(j+1995)] = d["data_" +str(i+1995) +"_"+str(j+1995)][['Year_x','AdjustedPrice_x', 'Year_y','AdjustedPrice_y', 'LondonCheck_x']]
In [30]:
d["data_2016_2017"].head()
Out[30]:
In [31]:
HouseCountComparison = pd.DataFrame([])
for i in range(1995,2017):
for j in range(1996,2018):
if j <= i:
continue
else:
#print(str(i) + "-" +str(j) +": " + str(len(d["data_"+str(i)+"_"+str(j)])))
count = len(d["data_"+str(i)+"_"+str(j)])
HouseCountComparison2 = pd.DataFrame([[i,j,count]])
HouseCountComparison = HouseCountComparison.append(HouseCountComparison2)
HouseCountComparison.columns = ['Year1', 'Year2', 'Count']
In [32]:
HouseCountComparison.head()
Out[32]:
In [33]:
HouseCountComparisonMatrix = HouseCountComparison.pivot("Year1", "Year2")
HouseCountComparisonMatrix
Out[33]:
In [34]:
with sns.axes_style("white"):
ax = sns.heatmap(HouseCountComparisonMatrix, linewidth=0.2, cmap = "BuPu")
ax.set_title('Number of times the same house was sold on consecutive years')
ax.set_ylabel('First Year')
ax.set_xlabel('Second Year')
plt.show()
Create and Save a plot for each year comprison¶
In [35]:
for i in range(1995,2017):
for j in range(1996,2018):
if j <= i:
continue
else:
dataY2YforPlot = (d["data_"+str(i)+"_"+str(j)])
In [36]:
for i in range(1995,2017):
for j in range(1996,2018):
if j <= i:
continue
else:
#"data_"+str(i)+"_"+str(j) + " = " + "d[" + " 'data_" + str(i) + "_" + str(j) + "']"
print("data_"+str(i)+"_"+str(j)+",")
In [37]:
data_1995_1996 = d[ 'data_1995_1996']
data_1995_1997 = d[ 'data_1995_1997']
data_1995_1998 = d[ 'data_1995_1998']
data_1995_1999 = d[ 'data_1995_1999']
data_1995_2000 = d[ 'data_1995_2000']
data_1995_2001 = d[ 'data_1995_2001']
data_1995_2002 = d[ 'data_1995_2002']
data_1995_2003 = d[ 'data_1995_2003']
data_1995_2004 = d[ 'data_1995_2004']
data_1995_2005 = d[ 'data_1995_2005']
data_1995_2006 = d[ 'data_1995_2006']
data_1995_2007 = d[ 'data_1995_2007']
data_1995_2008 = d[ 'data_1995_2008']
data_1995_2009 = d[ 'data_1995_2009']
data_1995_2010 = d[ 'data_1995_2010']
data_1995_2011 = d[ 'data_1995_2011']
data_1995_2012 = d[ 'data_1995_2012']
data_1995_2013 = d[ 'data_1995_2013']
data_1995_2014 = d[ 'data_1995_2014']
data_1995_2015 = d[ 'data_1995_2015']
data_1995_2016 = d[ 'data_1995_2016']
data_1995_2017 = d[ 'data_1995_2017']
data_1996_1997 = d[ 'data_1996_1997']
data_1996_1998 = d[ 'data_1996_1998']
data_1996_1999 = d[ 'data_1996_1999']
data_1996_2000 = d[ 'data_1996_2000']
data_1996_2001 = d[ 'data_1996_2001']
data_1996_2002 = d[ 'data_1996_2002']
data_1996_2003 = d[ 'data_1996_2003']
data_1996_2004 = d[ 'data_1996_2004']
data_1996_2005 = d[ 'data_1996_2005']
data_1996_2006 = d[ 'data_1996_2006']
data_1996_2007 = d[ 'data_1996_2007']
data_1996_2008 = d[ 'data_1996_2008']
data_1996_2009 = d[ 'data_1996_2009']
data_1996_2010 = d[ 'data_1996_2010']
data_1996_2011 = d[ 'data_1996_2011']
data_1996_2012 = d[ 'data_1996_2012']
data_1996_2013 = d[ 'data_1996_2013']
data_1996_2014 = d[ 'data_1996_2014']
data_1996_2015 = d[ 'data_1996_2015']
data_1996_2016 = d[ 'data_1996_2016']
data_1996_2017 = d[ 'data_1996_2017']
data_1997_1998 = d[ 'data_1997_1998']
data_1997_1999 = d[ 'data_1997_1999']
data_1997_2000 = d[ 'data_1997_2000']
data_1997_2001 = d[ 'data_1997_2001']
data_1997_2002 = d[ 'data_1997_2002']
data_1997_2003 = d[ 'data_1997_2003']
data_1997_2004 = d[ 'data_1997_2004']
data_1997_2005 = d[ 'data_1997_2005']
data_1997_2006 = d[ 'data_1997_2006']
data_1997_2007 = d[ 'data_1997_2007']
data_1997_2008 = d[ 'data_1997_2008']
data_1997_2009 = d[ 'data_1997_2009']
data_1997_2010 = d[ 'data_1997_2010']
data_1997_2011 = d[ 'data_1997_2011']
data_1997_2012 = d[ 'data_1997_2012']
data_1997_2013 = d[ 'data_1997_2013']
data_1997_2014 = d[ 'data_1997_2014']
data_1997_2015 = d[ 'data_1997_2015']
data_1997_2016 = d[ 'data_1997_2016']
data_1997_2017 = d[ 'data_1997_2017']
data_1998_1999 = d[ 'data_1998_1999']
data_1998_2000 = d[ 'data_1998_2000']
data_1998_2001 = d[ 'data_1998_2001']
data_1998_2002 = d[ 'data_1998_2002']
data_1998_2003 = d[ 'data_1998_2003']
data_1998_2004 = d[ 'data_1998_2004']
data_1998_2005 = d[ 'data_1998_2005']
data_1998_2006 = d[ 'data_1998_2006']
data_1998_2007 = d[ 'data_1998_2007']
data_1998_2008 = d[ 'data_1998_2008']
data_1998_2009 = d[ 'data_1998_2009']
data_1998_2010 = d[ 'data_1998_2010']
data_1998_2011 = d[ 'data_1998_2011']
data_1998_2012 = d[ 'data_1998_2012']
data_1998_2013 = d[ 'data_1998_2013']
data_1998_2014 = d[ 'data_1998_2014']
data_1998_2015 = d[ 'data_1998_2015']
data_1998_2016 = d[ 'data_1998_2016']
data_1998_2017 = d[ 'data_1998_2017']
data_1999_2000 = d[ 'data_1999_2000']
data_1999_2001 = d[ 'data_1999_2001']
data_1999_2002 = d[ 'data_1999_2002']
data_1999_2003 = d[ 'data_1999_2003']
data_1999_2004 = d[ 'data_1999_2004']
data_1999_2005 = d[ 'data_1999_2005']
data_1999_2006 = d[ 'data_1999_2006']
data_1999_2007 = d[ 'data_1999_2007']
data_1999_2008 = d[ 'data_1999_2008']
data_1999_2009 = d[ 'data_1999_2009']
data_1999_2010 = d[ 'data_1999_2010']
data_1999_2011 = d[ 'data_1999_2011']
data_1999_2012 = d[ 'data_1999_2012']
data_1999_2013 = d[ 'data_1999_2013']
data_1999_2014 = d[ 'data_1999_2014']
data_1999_2015 = d[ 'data_1999_2015']
data_1999_2016 = d[ 'data_1999_2016']
data_1999_2017 = d[ 'data_1999_2017']
data_2000_2001 = d[ 'data_2000_2001']
data_2000_2002 = d[ 'data_2000_2002']
data_2000_2003 = d[ 'data_2000_2003']
data_2000_2004 = d[ 'data_2000_2004']
data_2000_2005 = d[ 'data_2000_2005']
data_2000_2006 = d[ 'data_2000_2006']
data_2000_2007 = d[ 'data_2000_2007']
data_2000_2008 = d[ 'data_2000_2008']
data_2000_2009 = d[ 'data_2000_2009']
data_2000_2010 = d[ 'data_2000_2010']
data_2000_2011 = d[ 'data_2000_2011']
data_2000_2012 = d[ 'data_2000_2012']
data_2000_2013 = d[ 'data_2000_2013']
data_2000_2014 = d[ 'data_2000_2014']
data_2000_2015 = d[ 'data_2000_2015']
data_2000_2016 = d[ 'data_2000_2016']
data_2000_2017 = d[ 'data_2000_2017']
data_2001_2002 = d[ 'data_2001_2002']
data_2001_2003 = d[ 'data_2001_2003']
data_2001_2004 = d[ 'data_2001_2004']
data_2001_2005 = d[ 'data_2001_2005']
data_2001_2006 = d[ 'data_2001_2006']
data_2001_2007 = d[ 'data_2001_2007']
data_2001_2008 = d[ 'data_2001_2008']
data_2001_2009 = d[ 'data_2001_2009']
data_2001_2010 = d[ 'data_2001_2010']
data_2001_2011 = d[ 'data_2001_2011']
data_2001_2012 = d[ 'data_2001_2012']
data_2001_2013 = d[ 'data_2001_2013']
data_2001_2014 = d[ 'data_2001_2014']
data_2001_2015 = d[ 'data_2001_2015']
data_2001_2016 = d[ 'data_2001_2016']
data_2001_2017 = d[ 'data_2001_2017']
data_2002_2003 = d[ 'data_2002_2003']
data_2002_2004 = d[ 'data_2002_2004']
data_2002_2005 = d[ 'data_2002_2005']
data_2002_2006 = d[ 'data_2002_2006']
data_2002_2007 = d[ 'data_2002_2007']
data_2002_2008 = d[ 'data_2002_2008']
data_2002_2009 = d[ 'data_2002_2009']
data_2002_2010 = d[ 'data_2002_2010']
data_2002_2011 = d[ 'data_2002_2011']
data_2002_2012 = d[ 'data_2002_2012']
data_2002_2013 = d[ 'data_2002_2013']
data_2002_2014 = d[ 'data_2002_2014']
data_2002_2015 = d[ 'data_2002_2015']
data_2002_2016 = d[ 'data_2002_2016']
data_2002_2017 = d[ 'data_2002_2017']
data_2003_2004 = d[ 'data_2003_2004']
data_2003_2005 = d[ 'data_2003_2005']
data_2003_2006 = d[ 'data_2003_2006']
data_2003_2007 = d[ 'data_2003_2007']
data_2003_2008 = d[ 'data_2003_2008']
data_2003_2009 = d[ 'data_2003_2009']
data_2003_2010 = d[ 'data_2003_2010']
data_2003_2011 = d[ 'data_2003_2011']
data_2003_2012 = d[ 'data_2003_2012']
data_2003_2013 = d[ 'data_2003_2013']
data_2003_2014 = d[ 'data_2003_2014']
data_2003_2015 = d[ 'data_2003_2015']
data_2003_2016 = d[ 'data_2003_2016']
data_2003_2017 = d[ 'data_2003_2017']
data_2004_2005 = d[ 'data_2004_2005']
data_2004_2006 = d[ 'data_2004_2006']
data_2004_2007 = d[ 'data_2004_2007']
data_2004_2008 = d[ 'data_2004_2008']
data_2004_2009 = d[ 'data_2004_2009']
data_2004_2010 = d[ 'data_2004_2010']
data_2004_2011 = d[ 'data_2004_2011']
data_2004_2012 = d[ 'data_2004_2012']
data_2004_2013 = d[ 'data_2004_2013']
data_2004_2014 = d[ 'data_2004_2014']
data_2004_2015 = d[ 'data_2004_2015']
data_2004_2016 = d[ 'data_2004_2016']
data_2004_2017 = d[ 'data_2004_2017']
data_2005_2006 = d[ 'data_2005_2006']
data_2005_2007 = d[ 'data_2005_2007']
data_2005_2008 = d[ 'data_2005_2008']
data_2005_2009 = d[ 'data_2005_2009']
data_2005_2010 = d[ 'data_2005_2010']
data_2005_2011 = d[ 'data_2005_2011']
data_2005_2012 = d[ 'data_2005_2012']
data_2005_2013 = d[ 'data_2005_2013']
data_2005_2014 = d[ 'data_2005_2014']
data_2005_2015 = d[ 'data_2005_2015']
data_2005_2016 = d[ 'data_2005_2016']
data_2005_2017 = d[ 'data_2005_2017']
data_2006_2007 = d[ 'data_2006_2007']
data_2006_2008 = d[ 'data_2006_2008']
data_2006_2009 = d[ 'data_2006_2009']
data_2006_2010 = d[ 'data_2006_2010']
data_2006_2011 = d[ 'data_2006_2011']
data_2006_2012 = d[ 'data_2006_2012']
data_2006_2013 = d[ 'data_2006_2013']
data_2006_2014 = d[ 'data_2006_2014']
data_2006_2015 = d[ 'data_2006_2015']
data_2006_2016 = d[ 'data_2006_2016']
data_2006_2017 = d[ 'data_2006_2017']
data_2007_2008 = d[ 'data_2007_2008']
data_2007_2009 = d[ 'data_2007_2009']
data_2007_2010 = d[ 'data_2007_2010']
data_2007_2011 = d[ 'data_2007_2011']
data_2007_2012 = d[ 'data_2007_2012']
data_2007_2013 = d[ 'data_2007_2013']
data_2007_2014 = d[ 'data_2007_2014']
data_2007_2015 = d[ 'data_2007_2015']
data_2007_2016 = d[ 'data_2007_2016']
data_2007_2017 = d[ 'data_2007_2017']
data_2008_2009 = d[ 'data_2008_2009']
data_2008_2010 = d[ 'data_2008_2010']
data_2008_2011 = d[ 'data_2008_2011']
data_2008_2012 = d[ 'data_2008_2012']
data_2008_2013 = d[ 'data_2008_2013']
data_2008_2014 = d[ 'data_2008_2014']
data_2008_2015 = d[ 'data_2008_2015']
data_2008_2016 = d[ 'data_2008_2016']
data_2008_2017 = d[ 'data_2008_2017']
data_2009_2010 = d[ 'data_2009_2010']
data_2009_2011 = d[ 'data_2009_2011']
data_2009_2012 = d[ 'data_2009_2012']
data_2009_2013 = d[ 'data_2009_2013']
data_2009_2014 = d[ 'data_2009_2014']
data_2009_2015 = d[ 'data_2009_2015']
data_2009_2016 = d[ 'data_2009_2016']
data_2009_2017 = d[ 'data_2009_2017']
data_2010_2011 = d[ 'data_2010_2011']
data_2010_2012 = d[ 'data_2010_2012']
data_2010_2013 = d[ 'data_2010_2013']
data_2010_2014 = d[ 'data_2010_2014']
data_2010_2015 = d[ 'data_2010_2015']
data_2010_2016 = d[ 'data_2010_2016']
data_2010_2017 = d[ 'data_2010_2017']
data_2011_2012 = d[ 'data_2011_2012']
data_2011_2013 = d[ 'data_2011_2013']
data_2011_2014 = d[ 'data_2011_2014']
data_2011_2015 = d[ 'data_2011_2015']
data_2011_2016 = d[ 'data_2011_2016']
data_2011_2017 = d[ 'data_2011_2017']
data_2012_2013 = d[ 'data_2012_2013']
data_2012_2014 = d[ 'data_2012_2014']
data_2012_2015 = d[ 'data_2012_2015']
data_2012_2016 = d[ 'data_2012_2016']
data_2012_2017 = d[ 'data_2012_2017']
data_2013_2014 = d[ 'data_2013_2014']
data_2013_2015 = d[ 'data_2013_2015']
data_2013_2016 = d[ 'data_2013_2016']
data_2013_2017 = d[ 'data_2013_2017']
data_2014_2015 = d[ 'data_2014_2015']
data_2014_2016 = d[ 'data_2014_2016']
data_2014_2017 = d[ 'data_2014_2017']
data_2015_2016 = d[ 'data_2015_2016']
data_2015_2017 = d[ 'data_2015_2017']
data_2016_2017 = d[ 'data_2016_2017']
In [39]:
AllYearsforCSVExport =[data_1995_1996,
data_1995_1997,
data_1995_1998,
data_1995_1999,
data_1995_2000,
data_1995_2001,
data_1995_2002,
data_1995_2003,
data_1995_2004,
data_1995_2005,
data_1995_2006,
data_1995_2007,
data_1995_2008,
data_1995_2009,
data_1995_2010,
data_1995_2011,
data_1995_2012,
data_1995_2013,
data_1995_2014,
data_1995_2015,
data_1995_2016,
data_1995_2017,
data_1996_1997,
data_1996_1998,
data_1996_1999,
data_1996_2000,
data_1996_2001,
data_1996_2002,
data_1996_2003,
data_1996_2004,
data_1996_2005,
data_1996_2006,
data_1996_2007,
data_1996_2008,
data_1996_2009,
data_1996_2010,
data_1996_2011,
data_1996_2012,
data_1996_2013,
data_1996_2014,
data_1996_2015,
data_1996_2016,
data_1996_2017,
data_1997_1998,
data_1997_1999,
data_1997_2000,
data_1997_2001,
data_1997_2002,
data_1997_2003,
data_1997_2004,
data_1997_2005,
data_1997_2006,
data_1997_2007,
data_1997_2008,
data_1997_2009,
data_1997_2010,
data_1997_2011,
data_1997_2012,
data_1997_2013,
data_1997_2014,
data_1997_2015,
data_1997_2016,
data_1997_2017,
data_1998_1999,
data_1998_2000,
data_1998_2001,
data_1998_2002,
data_1998_2003,
data_1998_2004,
data_1998_2005,
data_1998_2006,
data_1998_2007,
data_1998_2008,
data_1998_2009,
data_1998_2010,
data_1998_2011,
data_1998_2012,
data_1998_2013,
data_1998_2014,
data_1998_2015,
data_1998_2016,
data_1998_2017,
data_1999_2000,
data_1999_2001,
data_1999_2002,
data_1999_2003,
data_1999_2004,
data_1999_2005,
data_1999_2006,
data_1999_2007,
data_1999_2008,
data_1999_2009,
data_1999_2010,
data_1999_2011,
data_1999_2012,
data_1999_2013,
data_1999_2014,
data_1999_2015,
data_1999_2016,
data_1999_2017,
data_2000_2001,
data_2000_2002,
data_2000_2003,
data_2000_2004,
data_2000_2005,
data_2000_2006,
data_2000_2007,
data_2000_2008,
data_2000_2009,
data_2000_2010,
data_2000_2011,
data_2000_2012,
data_2000_2013,
data_2000_2014,
data_2000_2015,
data_2000_2016,
data_2000_2017,
data_2001_2002,
data_2001_2003,
data_2001_2004,
data_2001_2005,
data_2001_2006,
data_2001_2007,
data_2001_2008,
data_2001_2009,
data_2001_2010,
data_2001_2011,
data_2001_2012,
data_2001_2013,
data_2001_2014,
data_2001_2015,
data_2001_2016,
data_2001_2017,
data_2002_2003,
data_2002_2004,
data_2002_2005,
data_2002_2006,
data_2002_2007,
data_2002_2008,
data_2002_2009,
data_2002_2010,
data_2002_2011,
data_2002_2012,
data_2002_2013,
data_2002_2014,
data_2002_2015,
data_2002_2016,
data_2002_2017,
data_2003_2004,
data_2003_2005,
data_2003_2006,
data_2003_2007,
data_2003_2008,
data_2003_2009,
data_2003_2010,
data_2003_2011,
data_2003_2012,
data_2003_2013,
data_2003_2014,
data_2003_2015,
data_2003_2016,
data_2003_2017,
data_2004_2005,
data_2004_2006,
data_2004_2007,
data_2004_2008,
data_2004_2009,
data_2004_2010,
data_2004_2011,
data_2004_2012,
data_2004_2013,
data_2004_2014,
data_2004_2015,
data_2004_2016,
data_2004_2017,
data_2005_2006,
data_2005_2007,
data_2005_2008,
data_2005_2009,
data_2005_2010,
data_2005_2011,
data_2005_2012,
data_2005_2013,
data_2005_2014,
data_2005_2015,
data_2005_2016,
data_2005_2017,
data_2006_2007,
data_2006_2008,
data_2006_2009,
data_2006_2010,
data_2006_2011,
data_2006_2012,
data_2006_2013,
data_2006_2014,
data_2006_2015,
data_2006_2016,
data_2006_2017,
data_2007_2008,
data_2007_2009,
data_2007_2010,
data_2007_2011,
data_2007_2012,
data_2007_2013,
data_2007_2014,
data_2007_2015,
data_2007_2016,
data_2007_2017,
data_2008_2009,
data_2008_2010,
data_2008_2011,
data_2008_2012,
data_2008_2013,
data_2008_2014,
data_2008_2015,
data_2008_2016,
data_2008_2017,
data_2009_2010,
data_2009_2011,
data_2009_2012,
data_2009_2013,
data_2009_2014,
data_2009_2015,
data_2009_2016,
data_2009_2017,
data_2010_2011,
data_2010_2012,
data_2010_2013,
data_2010_2014,
data_2010_2015,
data_2010_2016,
data_2010_2017,
data_2011_2012,
data_2011_2013,
data_2011_2014,
data_2011_2015,
data_2011_2016,
data_2011_2017,
data_2012_2013,
data_2012_2014,
data_2012_2015,
data_2012_2016,
data_2012_2017,
data_2013_2014,
data_2013_2015,
data_2013_2016,
data_2013_2017,
data_2014_2015,
data_2014_2016,
data_2014_2017,
data_2015_2016,
data_2015_2017,
data_2016_2017]
In [40]:
for i in range(0,len(AllYearsforCSVExport)):
AllYearsforCSVExport[i].to_csv('E:\\Documents\\UKHousePrices\\YearToYearDataALL\\' + str(i) + '.csv')
In [41]:
data_2016_2017.head()
Out[41]:
We now import each data table into our R magic for Hexbin plots¶
In [42]:
%R -i data_1995_1996,data_1995_1997,data_1995_1998,data_1995_1999,data_1995_2000,data_1995_2001,data_1995_2002,data_1995_2003,data_1995_2004,data_1995_2005,data_1995_2006,data_1995_2007,data_1995_2008,data_1995_2009,data_1995_2010,data_1995_2011,data_1995_2012,data_1995_2013,data_1995_2014,data_1995_2015,data_1995_2016,data_1995_2017,data_1996_1997,data_1996_1998,data_1996_1999,data_1996_2000,data_1996_2001,data_1996_2002,data_1996_2003,data_1996_2004,data_1996_2005,data_1996_2006,data_1996_2007,data_1996_2008,data_1996_2009,data_1996_2010,data_1996_2011,data_1996_2012,data_1996_2013,data_1996_2014,data_1996_2015,data_1996_2016,data_1996_2017,data_1997_1998,data_1997_1999,data_1997_2000,data_1997_2001,data_1997_2002,data_1997_2003,data_1997_2004,data_1997_2005,data_1997_2006,data_1997_2007,data_1997_2008,data_1997_2009,data_1997_2010,data_1997_2011,data_1997_2012,data_1997_2013,data_1997_2014,data_1997_2015,data_1997_2016,data_1997_2017,data_1998_1999,data_1998_2000,data_1998_2001,data_1998_2002,data_1998_2003,data_1998_2004,data_1998_2005,data_1998_2006,data_1998_2007,data_1998_2008,data_1998_2009,data_1998_2010,data_1998_2011,data_1998_2012,data_1998_2013,data_1998_2014,data_1998_2015,data_1998_2016,data_1998_2017,data_1999_2000,data_1999_2001,data_1999_2002,data_1999_2003,data_1999_2004,data_1999_2005,data_1999_2006,data_1999_2007,data_1999_2008,data_1999_2009,data_1999_2010,data_1999_2011,data_1999_2012,data_1999_2013,data_1999_2014,data_1999_2015,data_1999_2016,data_1999_2017,data_2000_2001,data_2000_2002,data_2000_2003,data_2000_2004,data_2000_2005,data_2000_2006,data_2000_2007,data_2000_2008,data_2000_2009,data_2000_2010,data_2000_2011,data_2000_2012,data_2000_2013,data_2000_2014,data_2000_2015,data_2000_2016,data_2000_2017,data_2001_2002,data_2001_2003,data_2001_2004,data_2001_2005,data_2001_2006,data_2001_2007,data_2001_2008,data_2001_2009,data_2001_2010,data_2001_2011,data_2001_2012,data_2001_2013,data_2001_2014,data_2001_2015,data_2001_2016,data_2001_2017,data_2002_2003,data_2002_2004,data_2002_2005,data_2002_2006,data_2002_2007,data_2002_2008,data_2002_2009,data_2002_2010,data_2002_2011,data_2002_2012,data_2002_2013,data_2002_2014,data_2002_2015,data_2002_2016,data_2002_2017,data_2003_2004,data_2003_2005,data_2003_2006,data_2003_2007,data_2003_2008,data_2003_2009,data_2003_2010,data_2003_2011,data_2003_2012,data_2003_2013,data_2003_2014,data_2003_2015,data_2003_2016,data_2003_2017,data_2004_2005,data_2004_2006,data_2004_2007,data_2004_2008,data_2004_2009,data_2004_2010,data_2004_2011,data_2004_2012,data_2004_2013,data_2004_2014,data_2004_2015,data_2004_2016,data_2004_2017,data_2005_2006,data_2005_2007,data_2005_2008,data_2005_2009,data_2005_2010,data_2005_2011,data_2005_2012,data_2005_2013,data_2005_2014,data_2005_2015,data_2005_2016,data_2005_2017,data_2006_2007,data_2006_2008,data_2006_2009,data_2006_2010,data_2006_2011,data_2006_2012,data_2006_2013,data_2006_2014,data_2006_2015,data_2006_2016,data_2006_2017,data_2007_2008,data_2007_2009,data_2007_2010,data_2007_2011,data_2007_2012,data_2007_2013,data_2007_2014,data_2007_2015,data_2007_2016,data_2007_2017,data_2008_2009,data_2008_2010,data_2008_2011,data_2008_2012,data_2008_2013,data_2008_2014,data_2008_2015,data_2008_2016,data_2008_2017,data_2009_2010,data_2009_2011,data_2009_2012,data_2009_2013,data_2009_2014,data_2009_2015,data_2009_2016,data_2009_2017,data_2010_2011,data_2010_2012,data_2010_2013,data_2010_2014,data_2010_2015,data_2010_2016,data_2010_2017,data_2011_2012,data_2011_2013,data_2011_2014,data_2011_2015,data_2011_2016,data_2011_2017,data_2012_2013,data_2012_2014,data_2012_2015,data_2012_2016,data_2012_2017,data_2013_2014,data_2013_2015,data_2013_2016,data_2013_2017,data_2014_2015,data_2014_2016,data_2014_2017,data_2015_2016,data_2015_2017,data_2016_2017
In [132]:
%%R -w 4 -h 4 --units in -r 200
data_tables_Years <- list(data_1995_1996,data_1995_1997,data_1995_1998,data_1995_1999,data_1995_2000,data_1995_2001,data_1995_2002,data_1995_2003,data_1995_2004,data_1995_2005,data_1995_2006,data_1995_2007,data_1995_2008,data_1995_2009,data_1995_2010,data_1995_2011,data_1995_2012,data_1995_2013,data_1995_2014,data_1995_2015,data_1995_2016,data_1995_2017)
#data_1996_1997,data_1996_1998,data_1996_1999,data_1996_2000,data_1996_2001,data_1996_2002,data_1996_2003,data_1996_2004,data_1996_2005,data_1996_2006,data_1996_2007,data_1996_2008,data_1996_2009,data_1996_2010,data_1996_2011,data_1996_2012,data_1996_2013,data_1996_2014,data_1996_2015,data_1996_2016,data_1996_2017
#data_1997_1998,data_1997_1999,data_1997_2000,data_1997_2001,data_1997_2002,data_1997_2003,data_1997_2004,data_1997_2005,data_1997_2006,data_1997_2007,data_1997_2008,data_1997_2009,data_1997_2010,data_1997_2011,data_1997_2012,data_1997_2013,data_1997_2014,data_1997_2015,data_1997_2016,data_1997_2017
#data_1998_1999,data_1998_2000,data_1998_2001,data_1998_2002,data_1998_2003,data_1998_2004,data_1998_2005,data_1998_2006,data_1998_2007,data_1998_2008,data_1998_2009,data_1998_2010,data_1998_2011,data_1998_2012,data_1998_2013,data_1998_2014,data_1998_2015,data_1998_2016,data_1998_2017
#data_1999_2000,data_1999_2001,data_1999_2002,data_1999_2003,data_1999_2004,data_1999_2005,data_1999_2006,data_1999_2007,data_1999_2008,data_1999_2009,data_1999_2010,data_1999_2011,data_1999_2012,data_1999_2013,data_1999_2014,data_1999_2015,data_1999_2016,data_1999_2017
#data_2000_2001,data_2000_2002,data_2000_2003,data_2000_2004,data_2000_2005,data_2000_2006,data_2000_2007,data_2000_2008,data_2000_2009,data_2000_2010,data_2000_2011,data_2000_2012,data_2000_2013,data_2000_2014,data_2000_2015,data_2000_2016,data_2000_2017
#data_2001_2002,data_2001_2003,data_2001_2004,data_2001_2005,data_2001_2006,data_2001_2007,data_2001_2008,data_2001_2009,data_2001_2010,data_2001_2011,data_2001_2012,data_2001_2013,data_2001_2014,data_2001_2015,data_2001_2016,data_2001_2017
#data_2002_2003,data_2002_2004,data_2002_2005,data_2002_2006,data_2002_2007,data_2002_2008,data_2002_2009,data_2002_2010,data_2002_2011,data_2002_2012,data_2002_2013,data_2002_2014,data_2002_2015,data_2002_2016,data_2002_2017
#data_2003_2004,data_2003_2005,data_2003_2006,data_2003_2007,data_2003_2008,data_2003_2009,data_2003_2010,data_2003_2011,data_2003_2012,data_2003_2013,data_2003_2014,data_2003_2015,data_2003_2016,data_2003_2017
#data_2004_2005,data_2004_2006,data_2004_2007,data_2004_2008,data_2004_2009,data_2004_2010,data_2004_2011,data_2004_2012,data_2004_2013,data_2004_2014,data_2004_2015,data_2004_2016,data_2004_2017
#data_2005_2006,data_2005_2007,data_2005_2008,data_2005_2009,data_2005_2010,data_2005_2011,data_2005_2012,data_2005_2013,data_2005_2014,data_2005_2015,data_2005_2016,data_2005_2017,data_2006_2007
#data_2006_2008,data_2006_2009,data_2006_2010,data_2006_2011,data_2006_2012,data_2006_2013,data_2006_2014,data_2006_2015,data_2006_2016,data_2006_2017
#data_2007_2008,data_2007_2009,data_2007_2010,data_2007_2011,data_2007_2012,data_2007_2013,data_2007_2014,data_2007_2015,data_2007_2016,data_2007_2017
#data_2008_2009,data_2008_2010,data_2008_2011,data_2008_2012,data_2008_2013,data_2008_2014,data_2008_2015,data_2008_2016,data_2008_2017
#data_2009_2010,data_2009_2011,data_2009_2012,data_2009_2013,data_2009_2014,data_2009_2015,data_2009_2016,data_2009_2017
#data_2010_2011,data_2010_2012,data_2010_2013,data_2010_2014,data_2010_2015,data_2010_2016,data_2010_2017
#data_2011_2012,data_2011_2013,data_2011_2014,data_2011_2015,data_2011_2016,data_2011_2017
#data_2012_2013,data_2012_2014,data_2012_2015,data_2012_2016,data_2012_2017
#data_2013_2014,data_2013_2015,data_2013_2016,data_2013_2017
#data_2014_2015,data_2014_2016,data_2014_2017
#data_2015_2016,data_2015_2017
#data_2016_2017
for (j in data_tables_Years){
dataLondon <- j[ which(j$LondonCheck_x == 'GREATER LONDON'),]
dataRestofUK <- j[ which(j$LondonCheck_x == 'Rest of UK'),]
x1 = dataLondon$AdjustedPrice_x
y1 = dataLondon$AdjustedPrice_y
linmodel1 = lm(y2~x2)
coeffs1 = summary(linmodel1)
intercept1 <- coeffs1$coefficients[1]
slope1 <- coeffs1$coefficients[2]
line1 <- (slope1*x1) + intercept1
df_plot1 <- data.frame(x1,y1, line1)
x2 = dataRestofUK$AdjustedPrice_x
y2 = dataRestofUK$AdjustedPrice_y
linmodel2 = lm(y2~x2)
coeffs2 = summary(linmodel2)
intercept2 <- coeffs2$coefficients[1]
slope2 <- coeffs2$coefficients[2]
line2 <- (slope2*x2) + intercept2
df_plot2 <- data.frame(x2,y2, line2)
hbinALL <- ggplot() +
stat_binhex(data=dataLondon,aes(x=x1,y=y1, alpha=0.5), fill='#A833FF',line = '#fffff',bins = 75) +
stat_binhex(data=dataRestofUK,aes(x=x2,y=y2, alpha=0.5), fill='#33FFA6', bins = 75) +
guides(fill = FALSE, alpha=FALSE) +
ggtitle( paste('Price of the Same House in', toString(unique(dataLondon$Year_x)) ,'Sold Again in' , toString(unique(dataLondon$Year_y)), sep = ' ' ) ) +
scale_x_continuous(name=paste('Price', toString(unique(dataLondon$Year_x)), '£', sep = ' '), labels = scales::comma, limits = c(0,2500000)) +
scale_y_continuous(name=paste('Price', toString(unique(dataLondon$Year_y)), '£', sep = ' '), labels = scales::comma, limits = c(0,10000000)) +
theme(panel.background = element_rect(fill='black' ,colour='#ababab'),
panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
text = element_text(size=8
)) +
# geom_line(color = '#ee77ee', data = df_plot1, aes(x=x1,y=line1),
# title = element_text(size=2)) +
# geom_line(color = '#36e3e3', data = df_plot2, aes(x=x2,y=line2)) +
annotate(geom = 'text', label = 'London', color = '#A833FF', x=500000, y = 9000000) +
annotate(geom = 'text', label = 'Rest of UK', color = '#33FFA6', x= 2000000, y = 9000000)
#plot(hbinALL)
#hbinAll <- ggplotly(hbintest)
}