DK // Factoids and occasional bits of useful information.
via TechCrunch:
Started by three engineers and math whizzes from MIT and Cornell (Manish Jhunjhunwala, Adam Donovan, and Cem Ozkaynak) who did time at McKinsey and UBS bank, Trefis breaks down a stock price by the contribution of a company’s major products and businesses. For instance, 51.3 percent of Apple’s stock price is attributed to the iPhone, 25.5 percent to the Macintosh, and only 7.7 percent to iTunes and iPhone apps. Don’t agree? You can change the underlying assumptions by simply dragging lines on charts forecasting the future price of the iPhone, its market share going out to 2016, and so forth. Every time you change an assumption, the price target changes accordingly.So let's take a company we all love to hate, AT&T. The screenshot above shows how Trefis decomposes the company's stock price. You can click through to get a more in-depth breakdown of AT&T's business. There's also a social component to the service where subscribers can contribute their own customized models. There aren't that many companies to choose from, but Trefis is still in the free period. I imagine users will have to pay for full access in the future. In any case, it seems like a neat toy.
Garry (one of Posterous' founders), highlights the latest offering from Palantir - Palantir Finance. It looks like it has pretty powerful charting tools. I've signed up for an account and will report back once I've fiddled with it. I'm excited to explore the data exploration capabilities of this new tool (and, of course, whether there's an API).
In a previous post, I complained about the DTCC's CDS data website and the one week lifespan of the data published there. For those of you who don't know, the DTCC clears and settles a massive number of transactions every day for multiple asset classes. It's one of those financial institutions that doesn't get much press but underpins the entire capital market.
Anyway, the recent crisis motivated the DTCC to publish weekly CDS (single name, index, and tranche) exposure data. A good idea, until one realizes the data goes up in smoke when the next week's data arrives. Although DTCC recently added links to data for "a week ago", "a month ago", and "a year ago," it's still pretty inconvenient. So, if you want the data, you have to parse it yourself. I originally wanted to write a smart parser that would dynamically react to whatever format it encountered...I came to my senses and adopted a simpler approach. The approach thus far:
This would've been much easier if all the tables were exactly the same format. Unfortunately, that's never the case. An extra cell here or there, or weird characters, can throw things off. This isn't an issue if you are parsing individual pieces of data or a single table. But what if you need to parse ten, 20, 100, etc. tables? It can get ugly fast. The DTCC data is broken into 23 pages, some of which have multiple tables. Luckily, most of my pain was self-inflicted (hey, I'm a parsing virgin). I only had to account for a few different table formats in the end.
'''
SQLAlchemy ORM declarative example.
'''
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Table, Column, Integer, String, DECIMAL
from sqlalchemy import MetaData, create_engine, ForeignKey
from sqlalchemy.orm import relation, backref, sessionmaker, scoped_session
from matplotlib.finance import quotes_historical_yahoo
import datetime
import os
path = os.path.expanduser('~') + \
'/Dev/Data/AssetPrices/SQLite/assetprices_SA.sqlite'
engine = create_engine('sqlite:////' + path, echo=True)
Base = declarative_base(bind=engine)
date1 = datetime.datetime(2009,1,1)
date2 = datetime.datetime.now()
#DEFINE TABLES#
class Asset(Base):
__tablename__ = 'assets'
asset_id = Column(Integer, primary_key=True)
ticker = Column(String, unique=True)
tag = Column(String)
prices = relation('Price', order_by='Price.gregorian_day', backref='assets')
def __init__(self, ticker, tag):
self.ticker = ticker
self.tag = tag
def __repr__(self):
return "<Asset('%s', '%s')>" % (self.ticker, self.tag)
class Price(Base):
__tablename__ = 'prices'
price_id = Column(Integer, primary_key=True)
asset_id = Column(Integer, ForeignKey('assets.asset_id'))
gregorian_day = Column(Integer)
date_string = Column(String)
year = Column(Integer)
month = Column(Integer)
day = Column(Integer)
px_open = Column(DECIMAL)
px_close = Column(DECIMAL)
px_high = Column(DECIMAL)
px_low = Column(DECIMAL)
volume = Column(Integer)
#asset = relation(Asset, backref=backref('prices',
#order_by=gregorian_day))
def __init__(self, gregorian_day, date_string, year, month, day,
px_open, px_close, px_high, px_low, volume):
self.gregorian_day = gregorian_day
self.date_string = date_string
self.year = year
self.month = month
self.day = day
self.px_open = px_open
self.px_close = px_close
self.px_high = px_high
self.px_low = px_low
self.volume = volume
def __repr__(self):
return "<Price('%s', '%s', '%s','%s','%s','%s','%s','%s','%s','%s')>" \
% (self.gregorian_day, self.date_string, self.year, self.month,
self.day, self.px_open, self.px_close, self.px_high,
self.px_low, self.volume)
#CREATE DB TABLES#
Base.metadata.create_all(engine)
#PACKAGE RAW DATA#
def package_data(db=None, ticker=None, tag='stock', start=None, end=None):
'''
package_data() uses quotes_historical_yahoo() to create a data set for a
given stock's price history. Date_string, Year, month, and day fields are
included for added flexibility. Returns a dictionary of tuples.
'''
raw_quotes = quotes_historical_yahoo(ticker, start, end) #list of tuples
data = []
for quote in raw_quotes:
date_raw = datetime.datetime.fromordinal(int(quote[0]))
year, month, day = date_raw.year, date_raw.month, date_raw.day
date_string = date_raw.strftime("%Y-%m-%d")
record = (ticker, tag, quote[0], date_string, year, month, day,
quote[1], quote[2], quote[3], quote[4], quote[5])
data.append(record)
headers = ('ticker',
'tag',
'gregorian_day',
'date_string',
'year',
'month',
'day',
'px_open',
'px_close',
'px_high',
'px_low',
'volume')
return {'data':data, 'headers':headers}
Executing this code essentially sets up the schema for an sqlite database and provides a package_data() function that will pull in data for a given ticker and date range. The "echo=True" parameter in "engine = create_engine('sqlite:////' + path, echo=True)" will print out the SQL statements SQLA generates to the terminal. Executing the code yields:>>>
Evaluating SAscratchcode.py
2009-08-14 00:58:57,862 INFO sqlalchemy.engine.base.Engine.0x...b9b0 PRAGMA table_info("assets")
2009-08-14 00:58:57,863 INFO sqlalchemy.engine.base.Engine.0x...b9b0 ()
2009-08-14 00:58:57,863 INFO sqlalchemy.engine.base.Engine.0x...b9b0 PRAGMA table_info("prices")
2009-08-14 00:58:57,863 INFO sqlalchemy.engine.base.Engine.0x...b9b0 ()
2009-08-14 00:58:57,864 INFO sqlalchemy.engine.base.Engine.0x...b9b0
CREATE TABLE assets (
asset_id INTEGER NOT NULL,
ticker VARCHAR,
tag VARCHAR,
PRIMARY KEY (asset_id),
UNIQUE (ticker)
)2009-08-14 00:58:57,864 INFO sqlalchemy.engine.base.Engine.0x...b9b0 ()
2009-08-14 00:58:57,865 INFO sqlalchemy.engine.base.Engine.0x...b9b0 COMMIT
2009-08-14 00:58:57,866 INFO sqlalchemy.engine.base.Engine.0x...b9b0
CREATE TABLE prices (
price_id INTEGER NOT NULL,
asset_id INTEGER,
gregorian_day INTEGER,
date_string VARCHAR,
year INTEGER,
month INTEGER,
day INTEGER,
px_open NUMERIC(10, 2),
px_close NUMERIC(10, 2),
px_high NUMERIC(10, 2),
px_low NUMERIC(10, 2),
volume INTEGER,
PRIMARY KEY (price_id),
FOREIGN KEY(asset_id) REFERENCES assets (asset_id)
) 2009-08-14 00:58:57,866 INFO sqlalchemy.engine.base.Engine.0x...b9b0 ()
2009-08-14 00:58:57,868 INFO sqlalchemy.engine.base.Engine.0x...b9b0 COMMIT
>>> GOOG=Asset('GOOG', 'stock')
>>> GOOG
<Asset('GOOG', 'stock')>
>>> GOOG.ticker
'GOOG'
>>> GOOG.prices
[]
>>> raw = package_data(ticker='GOOG', start=date1, end=date2)
>>> raw['headers']
('ticker', 'tag', 'gregorian_day', 'date_string', 'year', 'month', 'day', 'px_open', 'px_close', 'px_high', 'px_low', 'volume')
>>> GOOG.prices = [Price(record[2],record[3],record[4], record[5],record[6],record[7],record[8],record[9],record[10],record[11]) for record in raw['data']]
>>> GOOG.prices
[<Price('733409.0', '2009-01-02', '2009','1','2','308.6','321.32','321.82','305.5','3610500')>,
<Price('733412.0', '2009-01-05', '2009','1','5','321.0','328.05','331.24','315.0','4889000')>,...]
>>> Session = scoped_session(sessionmaker(engine))
>>> session = Session()
>>> session.add(GOOG)
>>> session.commit()
2009-08-14 01:10:15,424 INFO sqlalchemy.engine.base.Engine.0x...b9b0 BEGIN
2009-08-14 01:10:15,425 INFO sqlalchemy.engine.base.Engine.0x...b9b0 INSERT INTO assets (ticker, tag) VALUES (?, ?)
2009-08-14 01:10:15,425 INFO sqlalchemy.engine.base.Engine.0x...b9b0 ['GOOG', 'stock']
2009-08-14 01:10:15,471 INFO sqlalchemy.engine.base.Engine.0x...b9b0 INSERT INTO prices (asset_id, gregorian_day, date_string, year, month, day, px_open, px_close, px_high, px_low, volume) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
2009-08-14 01:10:15,471 INFO sqlalchemy.engine.base.Engine.0x...b9b0 [1, 733409.0, '2009-01-02', 2009, 1, 2, '308.6', '321.32', '321.82', '305.5', 3610500]
2009-08-14 01:10:15,472 INFO sqlalchemy.engine.base.Engine.0x...b9b0 INSERT INTO prices (asset_id, gregorian_day, date_string, year, month, day, px_open, px_close, px_high, px_low, volume) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
2009-08-14 01:10:15,472 INFO sqlalchemy.engine.base.Engine.0x...b9b0 [1, 733412.0, '2009-01-05', 2009, 1, 5, '321.0', '328.05', '331.24', '315.0', 4889000]
...(and it continues on)...
>>> GOOG.prices[0]
2009-08-14 01:18:57,881 INFO sqlalchemy.engine.base.Engine.0x...b9b0 BEGIN
2009-08-14 01:18:57,883 INFO sqlalchemy.engine.base.Engine.0x...b9b0 SELECT assets.asset_id AS assets_asset_id, assets.ticker AS assets_ticker, assets.tag AS assets_tag
FROM assets
WHERE assets.asset_id = ?
2009-08-14 01:18:57,883 INFO sqlalchemy.engine.base.Engine.0x...b9b0 [1]
2009-08-14 01:18:57,885 INFO sqlalchemy.engine.base.Engine.0x...b9b0 SELECT prices.price_id AS prices_price_id, prices.asset_id AS prices_asset_id, prices.gregorian_day AS prices_gregorian_day, prices.date_string AS prices_date_string, prices.year AS prices_year, prices.month AS prices_month, prices.day AS prices_day, prices.px_open AS prices_px_open, prices.px_close AS prices_px_close, prices.px_high AS prices_px_high, prices.px_low AS prices_px_low, prices.volume AS prices_volume
FROM prices
WHERE ? = prices.asset_id ORDER BY prices.gregorian_day
2009-08-14 01:18:57,885 INFO sqlalchemy.engine.base.Engine.0x...b9b0 [1]
<Price('733409', '2009-01-02', '2009','1','2','308.6','321.32','321.82','305.5','3610500')>
>>> GOOG.prices[0].px_high
Decimal("321.82")
>>> test_run = [(record.date_string, record.px_close) for record in GOOG.prices[0:10]]
>>> test_run
[(u'2009-01-02', Decimal("321.32")), (u'2009-01-05', Decimal("328.05")), (u'2009-01-06', Decimal("334.06")), (u'2009-01-07', Decimal("322.01")), (u'2009-01-08', Decimal("325.19")), (u'2009-01-09', Decimal("315.07")), (u'2009-01-12', Decimal("312.69")), (u'2009-01-13', Decimal("314.32")), (u'2009-01-14', Decimal("300.97")), (u'2009-01-15', Decimal("298.99"))]
Of course, I forgot to attach the code to the previous post.
The previous post showed how matplotlib can pull market data from Yahoo. Using some python-fu, you can easily create CSV files for given stocks. CSV files are great as they are easy to parse and don't require a lot of overhead (in terms of setting things up, you can just open the file directly). Of course, Yahoo, Google, and others offer decent tools to view stock prices, so creating CSV files might be a bit redundant. Furthermore, if you really want to go whole hog and pull prices for thousands of assets, I imagine one would have to think carefully about an appropriate filesystem.
On the other hand, python includes a built in wrapper for sqlite, the ubiquitous file-based database that's embedded in so many mobile and desktop applications. Sqlite runs on any platform and is a completely open source, flyweight piece of software (which explains it's popularity). This post will outline how to create a simple two table sqlite database to store stock prices. First if all, sqlite has it's own interpreter you can access from the command line, so if you're an sql cowboy, you can instantly execute sql statements from the prompt to explore a given database. Here are a few examples: The .help command lists some of the sqlite specific commands available at the command line.
stocks = {'data':[('GOOG', 'stock'), ('AAPL','stock')],
'headers':('ticker', 'tag')}
raw_quotes = quotes_historical_yahoo(ticker, start, end) #list of tuples
data = []
for quote in raw_quotes:
date_raw = datetime.datetime.fromordinal(int(quote[0]))
year, month, day = date_raw.year, date_raw.month, date_raw.day
date_string = str(year)+'-'+str(month)+'-'+str(day)
record = (f_key, quote[0], date_string, year, month, day,
quote[1], quote[2], quote[3], quote[4], quote[5])
data.append(record)
headers = ('asset_id',
'gregorian_day',
'date_string',
'year',
'month',
'day',
'open',
'close',
'high',
'low',
'volume')
return {'data':data, 'headers':headers}
I use a fair amount of string substitution in the code, which isn't strictly recommended, but I'm not really concerned with security. Rather than create two separate functions for adding stocks to the asset table and market data to the prices table, I used string operations to expand or contract the substitution marks to match the number of headers. I'm sure there are more elegant ways to do it, but, hey, it works. Finally, I'll plug the Sqlite Manager add-on for Firefox. It's a basic and easy way to inspect sqlite files.I thought matplotlib was purely a visualization tool, but the rabbit hole is deeper than I thought. One handy module I recently discovered is matplotlib.finance. It isn't featured in the documentation (as far as I know), but contains functions that allow the user to pull stock prices from yahoo as a list of tuples or as array objects.