DK // Factoids and occasional bits of useful information.
As noted in a previous post, your.flowingdata.com (yfd) is a handy way to collect personal data. I've been collecting sleep, diaper, etc. data on my newborn son. Although yfd now allows users to calculate durations between specified events, the visualization of the information isn't quite to my liking and it's clear that errors in the data can make for some odd durations (e.g., my son slept for two days!). Numpy to the rescue!
For those of you who don't know, numpy is python's powerful array package. Rather than loop myself to death, I thought it made more sense to use of numpy's powerful slicing and masking features to clean up the data. These features make it easy to find data entry errors.
I use the Enthought python distribution for convenience sake (and because I can't resist all those libraries -- most of which I'll never use). Below you'll find some screenshots that step through my little script. Refer to the complete code here. (Well, it's just a start really). The code is probably a bit verbose for what it does, but we all start somewhere.
The first step is getting the data into an array you can manipulate. For your reference, your.flowingdata yields data that looks like this:

array([('gmorning', '', '', '2009-10-24 23:45:36'),('gmorning', '', '', '2009-09-30 18:15:04'), ('gnight', '', '', '2009-09-23 21:00:03'), ('gmorning', '', '', '2009-09-23 19:15:03')])
I won't step through all the code here since it's available above, but you get the idea. One thing to watch out for: datetimes. I spent a lot of time trying to figure out the best way to handle the timestamps included with the yfd event data. There are ways to convert strings to ordinal numbers into datetime objects and back again, but really I wanted to manipulate the datetime objects directly to take advantage of numpy's array slicing and arithmetic. Luckily, numpy allows object types (technically, they are called 'dtypes'). This allows you to subtract one timestamp array from another to get the elapsed time without any conversions (though you'll have to convert at some point if you want to generate a human-readable string). Here's an example of the array you'll get at the end (heads -> sleep duration, start time, end time):
Personal data capture is a meme that's gaining momentum. Products such as Nike+ and, more recently, Fitbit, target those who would like to monitor daily exercise and other activities. Websites that allow users to manually track how they use their time have also started to pop-up. For those of us that like to procrastinate, these monitoring tools can help by providing regular feedback. Watching a little line move in the right direction can be pretty motivating.
Of course, I don't use any of these services. For myself.Nevertheless, as a new father, I've found that your.flowingdata.com is an easy and useful way to track the activities of my newborn son! The service uses tweets to capture pretty much any kind of data you'd care to record. There are electronic products (e.g., Itsbeen, basically a stopwatch on steroids) that help new parents keep track of when the baby last slept, ate, poo'ed, etc. They do not, however, capture that data for analysis. My wife and I would like to see the historical data to see if we can tease out some insights about our son (e.g., how much sleep does he need before he gets cranky?). We tried using an iPhone app called Blogger that helps parents keep track of these things, but it wasn't immediate enough. We ended-up writing down events on the nursery mirror with a dry erase pen, but I really wanted to track things via a single button press. By the time I've finished dodging multiple salvos of pee and poo, multiple diaper changes due to said peeing and pooing, spit-up, puking, and sundry other lovely activities (a testament to how much I love you, boy), I can't remember anything that's happened in the last five minutes, let alone the last hour or two. So far, your.flowingdata.com has been the answer. your.flowingdata.com ('yfd') is a service based on Twitter. Users send direct messages to 'yfd' and can visit the site for simple visualizations. Users can also download tab-delimited files with all the data. But wait, there's more! One kind soul also created a simple yfd iPhone application that allows users to send an update (e.g. 'd yfd gnight') via a single button press. Each button can be customized as well. I have no use for Twitter, but yfd got me to open an account. We're still figuring out what we want to record, but the service's flexibility and ease-of-use makes it much more likely we'll actually use it. yfd isn't perfect. There's no built-in way to, for example, calculate the time that has elapsed between two actions (e.g. going to sleep and waking up). One has to download the data and calculate durations manually (or create a script to do it). There are other visualizations available, though. As I mentioned, I find it's much more important to make it easy to capture data for something like this. If it's a pain to capture the data, there won't be anything to analyze on the back-end anyway. So, if you have absolutely no interest in personal fitness, time tracking, etc., you may want to check out your.flowingdata.com...for the children.UPDATE: yfd has been updated to allow the calculation of durations between defined actions. I'd love to be able to aggregate these durations over a given time period (i.e. daily, weekly, monthly, etc.) in the form of a bar chart or something. yfd does visualize the data, but in a slightly different way. Best if you just check it out through the "Explore" link on the yfd site.
'''
SQLAlchemy ORM declarative example.
'''
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Table, Column, Integer, String, DECIMAL
from sqlalchemy import MetaData, create_engine, ForeignKey
from sqlalchemy.orm import relation, backref, sessionmaker, scoped_session
from matplotlib.finance import quotes_historical_yahoo
import datetime
import os
path = os.path.expanduser('~') + \
'/Dev/Data/AssetPrices/SQLite/assetprices_SA.sqlite'
engine = create_engine('sqlite:////' + path, echo=True)
Base = declarative_base(bind=engine)
date1 = datetime.datetime(2009,1,1)
date2 = datetime.datetime.now()
#DEFINE TABLES#
class Asset(Base):
__tablename__ = 'assets'
asset_id = Column(Integer, primary_key=True)
ticker = Column(String, unique=True)
tag = Column(String)
prices = relation('Price', order_by='Price.gregorian_day', backref='assets')
def __init__(self, ticker, tag):
self.ticker = ticker
self.tag = tag
def __repr__(self):
return "<Asset('%s', '%s')>" % (self.ticker, self.tag)
class Price(Base):
__tablename__ = 'prices'
price_id = Column(Integer, primary_key=True)
asset_id = Column(Integer, ForeignKey('assets.asset_id'))
gregorian_day = Column(Integer)
date_string = Column(String)
year = Column(Integer)
month = Column(Integer)
day = Column(Integer)
px_open = Column(DECIMAL)
px_close = Column(DECIMAL)
px_high = Column(DECIMAL)
px_low = Column(DECIMAL)
volume = Column(Integer)
#asset = relation(Asset, backref=backref('prices',
#order_by=gregorian_day))
def __init__(self, gregorian_day, date_string, year, month, day,
px_open, px_close, px_high, px_low, volume):
self.gregorian_day = gregorian_day
self.date_string = date_string
self.year = year
self.month = month
self.day = day
self.px_open = px_open
self.px_close = px_close
self.px_high = px_high
self.px_low = px_low
self.volume = volume
def __repr__(self):
return "<Price('%s', '%s', '%s','%s','%s','%s','%s','%s','%s','%s')>" \
% (self.gregorian_day, self.date_string, self.year, self.month,
self.day, self.px_open, self.px_close, self.px_high,
self.px_low, self.volume)
#CREATE DB TABLES#
Base.metadata.create_all(engine)
#PACKAGE RAW DATA#
def package_data(db=None, ticker=None, tag='stock', start=None, end=None):
'''
package_data() uses quotes_historical_yahoo() to create a data set for a
given stock's price history. Date_string, Year, month, and day fields are
included for added flexibility. Returns a dictionary of tuples.
'''
raw_quotes = quotes_historical_yahoo(ticker, start, end) #list of tuples
data = []
for quote in raw_quotes:
date_raw = datetime.datetime.fromordinal(int(quote[0]))
year, month, day = date_raw.year, date_raw.month, date_raw.day
date_string = date_raw.strftime("%Y-%m-%d")
record = (ticker, tag, quote[0], date_string, year, month, day,
quote[1], quote[2], quote[3], quote[4], quote[5])
data.append(record)
headers = ('ticker',
'tag',
'gregorian_day',
'date_string',
'year',
'month',
'day',
'px_open',
'px_close',
'px_high',
'px_low',
'volume')
return {'data':data, 'headers':headers}
Executing this code essentially sets up the schema for an sqlite database and provides a package_data() function that will pull in data for a given ticker and date range. The "echo=True" parameter in "engine = create_engine('sqlite:////' + path, echo=True)" will print out the SQL statements SQLA generates to the terminal. Executing the code yields:>>>
Evaluating SAscratchcode.py
2009-08-14 00:58:57,862 INFO sqlalchemy.engine.base.Engine.0x...b9b0 PRAGMA table_info("assets")
2009-08-14 00:58:57,863 INFO sqlalchemy.engine.base.Engine.0x...b9b0 ()
2009-08-14 00:58:57,863 INFO sqlalchemy.engine.base.Engine.0x...b9b0 PRAGMA table_info("prices")
2009-08-14 00:58:57,863 INFO sqlalchemy.engine.base.Engine.0x...b9b0 ()
2009-08-14 00:58:57,864 INFO sqlalchemy.engine.base.Engine.0x...b9b0
CREATE TABLE assets (
asset_id INTEGER NOT NULL,
ticker VARCHAR,
tag VARCHAR,
PRIMARY KEY (asset_id),
UNIQUE (ticker)
)2009-08-14 00:58:57,864 INFO sqlalchemy.engine.base.Engine.0x...b9b0 ()
2009-08-14 00:58:57,865 INFO sqlalchemy.engine.base.Engine.0x...b9b0 COMMIT
2009-08-14 00:58:57,866 INFO sqlalchemy.engine.base.Engine.0x...b9b0
CREATE TABLE prices (
price_id INTEGER NOT NULL,
asset_id INTEGER,
gregorian_day INTEGER,
date_string VARCHAR,
year INTEGER,
month INTEGER,
day INTEGER,
px_open NUMERIC(10, 2),
px_close NUMERIC(10, 2),
px_high NUMERIC(10, 2),
px_low NUMERIC(10, 2),
volume INTEGER,
PRIMARY KEY (price_id),
FOREIGN KEY(asset_id) REFERENCES assets (asset_id)
) 2009-08-14 00:58:57,866 INFO sqlalchemy.engine.base.Engine.0x...b9b0 ()
2009-08-14 00:58:57,868 INFO sqlalchemy.engine.base.Engine.0x...b9b0 COMMIT
>>> GOOG=Asset('GOOG', 'stock')
>>> GOOG
<Asset('GOOG', 'stock')>
>>> GOOG.ticker
'GOOG'
>>> GOOG.prices
[]
>>> raw = package_data(ticker='GOOG', start=date1, end=date2)
>>> raw['headers']
('ticker', 'tag', 'gregorian_day', 'date_string', 'year', 'month', 'day', 'px_open', 'px_close', 'px_high', 'px_low', 'volume')
>>> GOOG.prices = [Price(record[2],record[3],record[4], record[5],record[6],record[7],record[8],record[9],record[10],record[11]) for record in raw['data']]
>>> GOOG.prices
[<Price('733409.0', '2009-01-02', '2009','1','2','308.6','321.32','321.82','305.5','3610500')>,
<Price('733412.0', '2009-01-05', '2009','1','5','321.0','328.05','331.24','315.0','4889000')>,...]
>>> Session = scoped_session(sessionmaker(engine))
>>> session = Session()
>>> session.add(GOOG)
>>> session.commit()
2009-08-14 01:10:15,424 INFO sqlalchemy.engine.base.Engine.0x...b9b0 BEGIN
2009-08-14 01:10:15,425 INFO sqlalchemy.engine.base.Engine.0x...b9b0 INSERT INTO assets (ticker, tag) VALUES (?, ?)
2009-08-14 01:10:15,425 INFO sqlalchemy.engine.base.Engine.0x...b9b0 ['GOOG', 'stock']
2009-08-14 01:10:15,471 INFO sqlalchemy.engine.base.Engine.0x...b9b0 INSERT INTO prices (asset_id, gregorian_day, date_string, year, month, day, px_open, px_close, px_high, px_low, volume) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
2009-08-14 01:10:15,471 INFO sqlalchemy.engine.base.Engine.0x...b9b0 [1, 733409.0, '2009-01-02', 2009, 1, 2, '308.6', '321.32', '321.82', '305.5', 3610500]
2009-08-14 01:10:15,472 INFO sqlalchemy.engine.base.Engine.0x...b9b0 INSERT INTO prices (asset_id, gregorian_day, date_string, year, month, day, px_open, px_close, px_high, px_low, volume) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
2009-08-14 01:10:15,472 INFO sqlalchemy.engine.base.Engine.0x...b9b0 [1, 733412.0, '2009-01-05', 2009, 1, 5, '321.0', '328.05', '331.24', '315.0', 4889000]
...(and it continues on)...
>>> GOOG.prices[0]
2009-08-14 01:18:57,881 INFO sqlalchemy.engine.base.Engine.0x...b9b0 BEGIN
2009-08-14 01:18:57,883 INFO sqlalchemy.engine.base.Engine.0x...b9b0 SELECT assets.asset_id AS assets_asset_id, assets.ticker AS assets_ticker, assets.tag AS assets_tag
FROM assets
WHERE assets.asset_id = ?
2009-08-14 01:18:57,883 INFO sqlalchemy.engine.base.Engine.0x...b9b0 [1]
2009-08-14 01:18:57,885 INFO sqlalchemy.engine.base.Engine.0x...b9b0 SELECT prices.price_id AS prices_price_id, prices.asset_id AS prices_asset_id, prices.gregorian_day AS prices_gregorian_day, prices.date_string AS prices_date_string, prices.year AS prices_year, prices.month AS prices_month, prices.day AS prices_day, prices.px_open AS prices_px_open, prices.px_close AS prices_px_close, prices.px_high AS prices_px_high, prices.px_low AS prices_px_low, prices.volume AS prices_volume
FROM prices
WHERE ? = prices.asset_id ORDER BY prices.gregorian_day
2009-08-14 01:18:57,885 INFO sqlalchemy.engine.base.Engine.0x...b9b0 [1]
<Price('733409', '2009-01-02', '2009','1','2','308.6','321.32','321.82','305.5','3610500')>
>>> GOOG.prices[0].px_high
Decimal("321.82")
>>> test_run = [(record.date_string, record.px_close) for record in GOOG.prices[0:10]]
>>> test_run
[(u'2009-01-02', Decimal("321.32")), (u'2009-01-05', Decimal("328.05")), (u'2009-01-06', Decimal("334.06")), (u'2009-01-07', Decimal("322.01")), (u'2009-01-08', Decimal("325.19")), (u'2009-01-09', Decimal("315.07")), (u'2009-01-12', Decimal("312.69")), (u'2009-01-13', Decimal("314.32")), (u'2009-01-14', Decimal("300.97")), (u'2009-01-15', Decimal("298.99"))]
I recently talked to someone who was interested in integrating the different MS-Office products programmatically. The obvious solution is VBA, since it's built-in. I have no desire to learn VBA, but Python does offer the win32 COM interface. I'd almost forgotten since I've been using a Mac for a while. Anyway, I ran across this short tutorial on using COM and Python to automate the creation of powerpoint slides. I used COM with excel a while back, but it was slow (and thus turned to the very nice xlwt/xlrd combo to manipulate excel files). Nevertheless, I can see it coming in handy if you are constantly updating slides with essentially the same, but more recent, data.
UPDATE: I recently learned of reStructuredText, which many python tools use to create documentation from plain text files. There are tools such as S5, Bruce, and rst2pdf that facilitate the creation or display of presentations in different ways.
Of course, I forgot to attach the code to the previous post.