|
Alyz Solutions
|
|
Generating Graphs and Charts with Rpy2
|
|
Table of Contents
One of the many powerful and useful parts of R is its graphics engine. Combined with Rpy2, which allows full access to the R engine from within a Python program, it becomes a breeze for anyone to generate sophisticated graphics to better illustrate any set of data.
While there are many excellent sources of information on generating R graphics [1], examples of using Rpy2 to generate graphics are difficult to find. We hope that the examples below will help anyone who wants to quickly get up to speed generating quality charts and graphics in python.
The U.S Government releases a large quantity of economic data on a regular basis on their web sites. In particular, interest rate data is released each business day and consumer price information is released monthly, with historical data readily available for download. This large treasure trove of information is ideal for generating graphs and charts that can help the casual reader to process and understand the information with minimal effort.
The intermediate data files in each of the examples below are stored in JSON format to remove the need for additional code to parse the data from text files. These examples require python 2.6 or later to be installed, as json was added as a standard library component in this version. If an older version of python is used, then a compatible json library must be imported.
[ NOTE: an earlier version of this document using rpy is available at Graphing Interest Rates with Rpy ]
Each of the examples assume that the rpy2 and json libraries have been properly imported into the workspace. This is done with the following code:
import json
from rpy2 import robjects
from rpy2.robjects.vectors import IntVector, FloatVector, StrVector
from rpy2.robjects.packages import importr
from rpy2.rinterface import NA_Real
base = importr("base")
stats = importr("stats")
grdevices = importr("grDevices")
graphics = importr("graphics")
The U.S. Government releases Consumer Price Index data monthly. The most commonly used series in that data is the Consumer Price Index for All Urban Consumers. This data (cpiu.data) can be be used to generate the above line graph very easily with the following python program:
data = json.loads(open("cpiu.data").read())
years = [ x[0] + (x[1] - 1)/ 12.0 for x in data ]
cpiu = [ x[2] for x in data ]
grdevices.png("r_cpiu.png")
graphics.plot(years, cpiu, xlab="Year", ylab="CPI-U", col="blue", type="l", tck=1)
graphics.title("Consumer Price Index (CPI-U)")
grdevices.dev_off()
One of the many files distributed by the U.S. Government on the Consumer Price Index site provides the Components of the CPI-U. After extracting the summary information (cpiuparts.data) from this file, we can create the above pie chart to illustrate the different components and their proportions:
data = json.loads(open("cpiuparts.data").read())
names = [ "%s (%.03f%%)" % (x[0],x[1]) for x in data ]
pcts = [ x[1] for x in data ]
grdevices.png("r_pie.png", width=600, height=350)
graphics.par(mar=FloatVector([1, 0, 2, 12]), cex=0.7)
graphics.pie(FloatVector(pcts), labels=names)
graphics.title("Components of the CPI-U")
grdevices.dev_off()
The Consumer Price Index can be used to compute inflation rates, which are simply the annualized rates of increase in the value of the index. After computing the annualized inflation rates (inflation.data), the above graph can be generated as follows:
data = json.loads(open("inflation.data").read())
years = [ x[0] + (x[1] - 1)/ 12.0 for x in data ]
monthlypct = [ x[2] for x in data ]
annualpct = [ x[3] for x in data ]
grdevices.png("r_inflation.png", width=640, height=480)
graphics.par(mar=FloatVector([2.5, 2.5, 2.5, 1.5]))
graphics.par(mgp=FloatVector([1.5, 0.5, 0]))
graphics.plot(years, monthlypct, xlab="", ylab="", col="blue", type="n")
for i in range(-24,16):
if i % 5 == 0:
graphics.abline(h=i, col="black")
else:
graphics.abline(h=i, col="#d0d0d0")
graphics.title("Inflation Rates")
graphics.legend(2000, 14.8, [ "monthly inflation rate", "annual inflation rate" ],
fill=StrVector([ "blue", "red" ]))
graphics.lines(years, monthlypct, col="blue", type="h")
graphics.lines(years, annualpct, col="red")
grdevices.dev_off()
SLGS are securities offered for sale by the U.S. Treasury to issuers of state and local government tax-exempt debt. The issuers invest their tax-exempt debt proceeds in SLGS instead of regular Treasury Bills, Bonds, and Notes because of the flexibility the SLGS provide for reducing the yields earned in order to comply with the yield restriction or arbitrage rebate provisions of the Internal Revenue Code. The maximum SLGS rates allowed for subscription of these securities are published each business day by the Bureau of the Public Debt. After cleaning up the data (slgs.data), the above interest rate yield curve can be plotted with the following code:
data = json.loads(open("slgs.data").read())
slgsdate = data["date"]
years = [ x[0] / 12.0 for x in data["rates"] ]
rates = [ x[1] for x in data["rates"] ]
grdevices.png("r_slgs.png", width=500, height=400)
graphics.par(mar=FloatVector([3, 3, 2.5, 1.5]))
graphics.par(mgp=FloatVector([1.5, 0.5, 0]))
graphics.plot([0,40],[0,6.0],xlab="Years", ylab="Maximum Rate (%)", type="n", xaxs="i", yaxs="i")
for i in [ 1,2,3,4,5,6,7,8,9,10,20,30 ]:
graphics.abline(v=i, col="#e0e0e0")
for i in [ 0, 40 ]:
graphics.abline(v=i, col="black")
for i in range(0,61):
if i % 5 == 0:
graphics.abline(h=i/10.0, col="black")
else:
graphics.abline(h=i/10.0, col="#e0e0e0")
graphics.title("SLGS Table - %s" % (slgsdate,))
graphics.lines(years,rates,type="l",col="blue")
grdevices.dev_off()
The Federal Reserve provides daily interest rate data for a variety of instruments. In particular, the data contains benchmark yields for the nominal Treasury yield curve and benchmark rates for the Treasury Inflation Protected Security (TIPS) yield curve. The spread between the two yield curves shows the inflation expectation of the market. The above graph was generated by the following python code using this data (treascurve.data):
data = json.loads(open("treascurve.data").read())
years = [ x / 12.0 for x in data["months"]]
tips_years = [ x / 12.0 for x in data["tips_months"]]
grdevices.png("r_treascurve.png", width=500, height=400)
graphics.par(mar=FloatVector([3, 3, 2.5, 1.5]))
graphics.par(mgp=FloatVector([1.5, 0.5, 0]))
graphics.plot([0,30],[0,6.0],xlab="Years", ylab="Rate (%)", type="n", xaxs="i", yaxs="i")
for i in range(0,61):
if i % 10 != 0:
graphics.abline(h=i/10.0, col="#e0e0e0")
for i in range(1,30):
graphics.abline(v=i, col="#e0e0e0")
for i in [ 0, 30 ]:
graphics.abline(v=i, col="black")
for i in range(0,61):
if i % 10 == 0:
graphics.abline(h=i/10.0, col="black")
graphics.title("Treasury Yield Curve - %s" % (data["date"],))
graphics.lines(years,data["yields"],type="l",col="blue")
graphics.lines(tips_years,data["tips_yields"],type="l",col="green")
graphics.arrows(10, data["tips_ten_year_rate"], 10, data["ten_year_rate"], col="red", length=0.1)
graphics.arrows(10, data["ten_year_rate"], 10, data["tips_ten_year_rate"], col="red", length=0.1)
graphics.text(10.2, (data["ten_year_rate"] + data["tips_ten_year_rate"]) / 2,
"spread = %.3lf%%" % (data["ten_year_rate"] - data["tips_ten_year_rate"],), adj=0, col="red")
graphics.legend(18, 1.7, [ "Nominal yields", "TIPS yields", "Inflation spread" ],
fill=StrVector([ "blue", "green", "red" ]))
grdevices.dev_off()
The Federal Reserve provides historical Treasury yield curve data in several different formats. For example, monthly yields for each length of Treasury security can be downloaded separately from 1 month, 3 month, 6 month, 1 year, 2 year, 3 year, 5 year, 7 year, and 10 year. Additionally, monthly yields for each length of TIPS can be downloaded from 5 year TIPS, 7 year TIPS, 10 year TIPS, and 20 year TIPS. We collect all of this data into a single file (historical.data) for further analysis.
(click on image to rotate)
We can generate a 3-D plot of each of the regular (non-TIPS) historical yield data points. As anyone can see from this plot, overall interest rates are extremely low when compared to the historical yield curves since 1982. The python code to generate eight views from eight evenly spaced rotated angles for this 3D graph follows:
import operator
start_year = 1982
x = [ start_year + i / 12.0 for i in range((2011 - start_year) * 12) ]
y = [ 1, 3, 6, 12, 24, 36, 60, 84, 120 ]
yindex = {}
for i,term in enumerate(y):
yindex[term] = i
zval = []
for i in range(len(y)):
zval.append([NA_Real] * len(x))
data = json.loads(open("historical.data").read())
for term,rates in data["treasury"].iteritems():
j = yindex.get(int(term))
if j is not None:
for entry in rates:
if entry[0] >= start_year:
i = (entry[0] - start_year) * 12 + (entry[1] - 1)
if i < len(x):
zval[j][i] = entry[2]
z = base.array( FloatVector(reduce(operator.add, zval)), dim=[len(x),len(y)] )
for angle in [ 15, 60, 105, 150, 195, 240, 285, 330 ]:
grdevices.png("r_historical%03d.png" % (angle,), width=500, height=500)
graphics.par(mar=FloatVector([0, 0, 1.5, 0]))
graphics.par(mgp=FloatVector([0, 0, 0]))
xv = FloatVector(x)
yv = FloatVector(y)
zv = FloatVector(z)
graphics.persp(xv, yv, zv, theta = angle, phi=20, col="#c0c0ff",
xlab="years (1982-present)", ylab="term (0-10yr)",
zlab="rate (0-17%)")
graphics.title("Historical Treasury Yield Curves (1982-present)")
grdevices.dev_off()
We can plot a brief history of 10 year TIPS rates and their corresponding inflation spreads versus the regular 10 year Treasury rates:
import operator
data = json.loads(open("historical.data").read())
years = [ x[0] + (x[1] - 1) / 12.0 for x in data["treasury"]["120"]
if x[0] >= 2003]
rates = [ x[2] for x in data["treasury"]["120"] if x[0] >= 2003]
tips_rates = [ x[2] for x in data["tips"]["120"] if x[0] >= 2003]
spread = map(operator.sub, rates, tips_rates)
grdevices.png("r_10yrtipsrates.png", width=500, height=400)
graphics.par(mar=FloatVector([3, 3, 2.5, 1.5]))
graphics.par(mgp=FloatVector([1.5, 0.5, 0]))
graphics.plot(years, spread, xlab="Year", ylab="Yield / Inflation Spread (%)",
col="red", type="l", tck=1)
graphics.lines(years,tips_rates,type="l",col="green")
graphics.legend(2003.0, 2.7, [ "TIPS yield", "Inflation spread" ],
fill=StrVector([ "green", "red" ]))
graphics.title("Historical Yields/Spreads on 10-year TIPS")
grdevices.dev_off()
We can extract a subset of the data corresponding to the 10-year yields and plot that on a simple line graph:
data = json.loads(open("historical.data").read())
years = [ x[0] + (x[1] - 1) / 12.0 for x in data["treasury"]["120"] ]
rates = [ x[2] for x in data["treasury"]["120"] ]
grdevices.png("r_10yrhistory.png", width=500, height=400)
graphics.plot(years, rates, xlab="Year", ylab="10-year Yield (%)",
col="blue", type="l", tck=1)
graphics.title("Historical 10-year Treasury Yields")
grdevices.dev_off()
Continuing with the same data used in the previous graph, we plot a histogram that illustrates the probability of the 10-year yield falling on any given interval based on the historic yields. On top of that we layer in blue the density function that smooths the histogram into a density curve. Additionally, we plot in red a standard lognormal density curve using the historic mean of the logs of the data points and the historic standard deviation of the logs of the data points. This gives a rough picture of whether standard lognormal density functions may be useful for the purpose of forecasting future interest rates. The code to plot this follows:
import math
data = json.loads(open("historical.data").read())
rates = [ x[2] for x in data["treasury"]["120"] ]
logrates = [ math.log(rate) for rate in rates ]
ratesv = FloatVector(rates)
mean = base.mean(ratesv)
meanlog = base.mean(FloatVector(logrates))
sdlog = base.sqrt(stats.var(FloatVector(logrates)))
xlabel = "Rate (mean=%.03f%%, meanlog=%.03f%%, sdlog=%.03f)" % (
mean[0],meanlog[0],sdlog[0])
title = "Distribution of Monthly 10-year yields"
grdevices.png("r_10yrdistrib.png", width=500, height=400)
graphics.par(mar=FloatVector([3, 3, 2.5, 1.5]))
graphics.par(mgp=FloatVector([1.5, 0.5, 0]))
graphics.hist(ratesv, nclass=50, xlab=xlabel, main=title, probability=True,
col="yellow")
graphics.lines(stats.density(ratesv), col="blue")
x = base.seq(2.0, 15.0, 0.5)
graphics.lines(x, stats.dlnorm(x, meanlog=meanlog[0], sdlog=sdlog[0]), lty=3, col="red")
grdevices.dev_off()
| [1] | Good references include: R Graphics, An Introduction to R. |