[kwlug-disc] Python

William Park opengeometry at yahoo.ca
Tue Jan 10 00:43:09 EST 2012


On Mon, Jan 09, 2012 at 10:49:07PM -0500, Steve Izma wrote:
> On Sun, Jan 08, 2012 at 11:41:28PM -0500, William Park wrote:
> > Subject: Re: [kwlug-disc] Python
> > 
> > "manipulating spreadsheet" caught my eye.  What do you mean?
> > Are you parsing the XML files that Excel saves?
> > Or are you parsing the CSV format?
> 
> I'm referring to the need to manipulate data at a workplace
> where the main tool people use for keeping track of data is a
> spreadsheet. In my experience, for example, people are more
> likely to use spreadsheets than databases for keeping track
> of things like mailing addresses. I blame this abuse of tools
> entirely on the refusal of most businesses and institutions to
> properly train their staff.

The obvious question is: what should they be trained on?  That is not
an easy question to answer in terms of functionality/complexity and
cost/benefit.

> 
> But the advantage of this is that at least people are using a
> structured format for their data. If they only used word
> processors, it would be a lot harder to do something
> programmatically (and thus analytically) with the contents.

Definitely.  You don't have to write, test, and debug code just to
extract the data.  The data have already been parsed out for you.

> 
> So I've found that Python works really well with structured data:
> spreadsheets, XML/SGML, databases, filesystems. Python also has
> an abundance of libraries (as pointed out in another message) for
> handling various file formats, so I use these to actually get at
> the data.
> 
> Another example is sales reports. Almost all the bookstores and
> book wholesalers we deal with use spreadsheets to report sales to
> us. But they all set up their columns in different ways (usually
> they stick to their own layouts in subsequent reports, but not
> always). At least everyone uses a book's ISBN to identify it, so
> I use Python to find the ISBNs, collate the related sales
> data from various sources, and run various calculations on the
> collection, then output the data either into another spreadsheet
> or (more likely) to groff for a clean presentation. This is way
> easier than trying to put a bunch of these spreadsheets together
> into another one and then add the formulas necessary to get
> summaries. Good grief! I'd have to use a mouse in that case!
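
A minimal sketch of the ISBN-based collation described above, assuming
each report has first been exported to CSV.  The function names
(`is_isbn13`, `collate`), the column headers, and the sample rows are
all made up for illustration; real reports would need their own
quantity column names and probably ISBN-10 handling as well.

```python
import csv
import io
import re
from collections import defaultdict

# Assumption: ISBN-13 only, i.e. 13 digits starting with 978 or 979.
ISBN13 = re.compile(r"^97[89]\d{10}$")


def is_isbn13(text):
    """Return the bare 13-digit ISBN if text looks like one, else None."""
    digits = re.sub(r"[\s-]", "", str(text or ""))
    return digits if ISBN13.match(digits) else None


def collate(reports):
    """Sum units sold per ISBN across several CSV reports.

    Each report is a (csv_text, quantity_header) pair.  The ISBN column
    is found by scanning every cell in a row, since each seller puts it
    in a different place.
    """
    totals = defaultdict(int)
    for csv_text, qty_header in reports:
        for row in csv.DictReader(io.StringIO(csv_text)):
            isbn = next((i for i in map(is_isbn13, row.values()) if i), None)
            if isbn is not None:
                totals[isbn] += int(row[qty_header])
    return dict(totals)


# Two hypothetical reports with different column layouts.
store_a = "ISBN,Title,Units\n978-0-306-40615-7,Example Book,3\n"
store_b = "Qty Sold,Book,EAN\n2,Example Book,9780306406157\n"

totals = collate([(store_a, "Units"), (store_b, "Qty Sold")])
print(totals)
```

Keying everything on the normalized ISBN is what lets the column order
vary from one seller to the next.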

It caught my eye because a "spreadsheet" is the basis of an
ERP/accounting application called Appgen (www.appgen.com).  There are
good reasons why you have never heard of it.  It's a very old Unix
application that needs to be rewritten and repackaged.

An Appgen file is a DBM database (i.e. key=value) where each "value" is
a spreadsheet.  Much like multiple worksheets in Excel, a file can hold
millions of spreadsheets.  There are files for items (IV), sales (CR),
vendors (AP), customers (AR), purchases (PO), orders (OE), etc.  People
get hung up on the user interface, but underneath it's just moving data
from cell to cell.  After working on Appgen for a bit, and despite my
initial thoughts, I've come to realize that a "spreadsheet" is not that
bad.  If you know Excel, then you can grasp the flow of data through the
ERP system.
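
To make the key=value idea concrete, here is a rough Python sketch using
the standard `dbm.dumb` module.  It is not Appgen's actual file format:
the key `ITEM-001`, the cell layout, and the JSON encoding are all
assumptions for illustration only.

```python
import dbm.dumb
import json
import os
import tempfile

# Hypothetical layout: one DBM file per module, one small grid per key.
# This is NOT Appgen's real on-disk format, only the key=value idea.
path = os.path.join(tempfile.mkdtemp(), "IV")

with dbm.dumb.open(path, "c") as db:
    # Key: item number.  Value: rows of cells, serialized as JSON text.
    db["ITEM-001"] = json.dumps(
        [["desc", "Widget"], ["qty", 12], ["price", 2.5]]
    )

with dbm.dumb.open(path, "r") as db:
    # DBM values come back as bytes, so decode before parsing.
    sheet = json.loads(db["ITEM-001"].decode("utf-8"))

# "Moving data from cell to cell": read two cells, compute a third value.
qty, price = sheet[1][1], sheet[2][1]
extended = qty * price
print(sheet[0][1], extended)
```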

Hence my interest in your use of Python in what seems to be a
spreadsheet-related business case.
-- 
William



