CompSciWeek6

= Reading (shared with Week 7) =
 * Beginning Python - skim chapters 8-14 (use as reference material)
   * See especially urlopen on p. 300, and forks and threads on p. 304
 * Beginning Python - Chapter 15 (Web services)

= Class 1: Effective Design =
 * Structured Code: Bioinformatics example from the AOS Book
 * Code Testing
 * Source Code Versioning
 * basic git

= Class 2: Using HPC Resources =
 * Accessing binaries and libraries, using modules
 * Using scratch space
 * Submitting a job script
 * Managing queued jobs
 * Advanced scripting tips and tricks
 * awk

= Homework 4 (Due Fri., Oct. 10) =
Please email the completed homework with the subject line "SciComp HW4, (your name)".


 * 1) Write example functions that use the advanced function notation from Beginning Python, Ch. 6 (see especially the example on p. 124):
   * 2) f(arg=default): the function should do nothing if it is called as f(), and it should call arg.set_price(12) if it is called as f(type("InvItem", (), {"set_price": (lambda a, b: b)})())
   * 3) f(*arg): the function should return the number of arguments passed in the call f('a', 'b', 1, 5, {'t': [4]})
   * 4) f(**args): the function should return the value associated with the key "agent" in the call f(auto="DB5", lno=31337, agent="007")
 * 5) Write an example python class to represent a general inventory item.  It should store its own name, and must contain the following methods: getCount, returning the (arbitrary, fixed) number of items in inventory, and getPrice, which computes the price using the formula price = price0 - k*log(count), where price0 and k are arbitrary, fixed variables belonging to the object.
 * 6) The article "Working with Big Data in Bioinformatics" describes software that reads many short strings and increments counters for each string. Their code consists of a fast C++ library, a Python wrapper, and Python scripts. For each of the following routines, state which of those three categories you would place it in, and why:
   * 7) A class that creates C++ objects representing counters for sequence data and that contains methods for converting the counts to NumPy arrays.
   * 8) A script that plots the k-mer counts in a subset of the data.
   * 9) A function that reads and parses files containing genomic sequence data.
   * 10) A script that installs the complete Khmer package (compiling the C++ library, copying the Python package, etc.).
 * 11) Explain (without trying to solve their problems) why each of the following quotes from the article might be relevant to the performance of their code:
   * 12) "We expected the highest traffic to be in the k-mer counting logic."
   * 13) "Redundant calls to the toupper function were present in the highest traffic regions of the code."
   * 14) "Input of genomic reads was performed line-by-line and on demand and without any readahead tuning."
   * 15) "A copy-by-value of the genomic read struct [was] performed for every parsed and valid genomic read."

= Code Examples =

Power function with run time logarithmic in n (i.e., linear in the number of bits of n):
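The original listing is not reproduced on this page. Below is a minimal sketch of the usual technique, exponentiation by squaring, assuming an integer (or float) base and a non-negative integer exponent; the function name `power` is our own illustrative choice. Each loop iteration consumes one bit of n, so the loop runs about log2(n) times, which is where the "linear in the number of bits" bound comes from.

```python
def power(base, n):
    """Return base**n for integer n >= 0 using exponentiation by squaring.

    The loop runs once per bit of n, so the number of multiplications is
    O(log n), i.e. linear in the number of bits of n.
    """
    if n < 0:
        raise ValueError("n must be a non-negative integer")
    result = 1
    while n > 0:
        if n & 1:        # lowest bit of n is set: fold the current square in
            result *= base
        base *= base     # square the base for the next bit
        n >>= 1          # drop the lowest bit
    return result


if __name__ == "__main__":
    print(power(2, 10))  # 1024
    print(power(3, 5))   # 243
```

Working through the bits from the low end avoids recursion; a recursive version (square power(base, n // 2), then multiply by base once more if n is odd) needs the same O(log n) number of multiplications.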

Testing the power-function module with Python's doctest:
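The module under test is not reproduced here, so this sketch assumes it exposes a function `power(base, n)` (a hypothetical name) and repeats a minimal version of it so the block runs on its own. The doctest examples live in the docstring: each `>>>` line is executed and its output is compared with the line that follows, and the expected exception is matched by its final line.

```python
import doctest


def power(base, n):
    """Return base**n for a non-negative integer n by repeated squaring.

    >>> power(2, 10)
    1024
    >>> power(5, 0)
    1
    >>> power(-3, 3)
    -27
    >>> power(2, -1)
    Traceback (most recent call last):
        ...
    ValueError: n must be a non-negative integer
    """
    if n < 0:
        raise ValueError("n must be a non-negative integer")
    result = 1
    while n > 0:
        if n & 1:
            result *= base
        base *= base
        n >>= 1
    return result


if __name__ == "__main__":
    # Runs every example found in this module's docstrings;
    # prints nothing when all of them pass.
    doctest.testmod()
```

Running the file with `-v` on the command line prints a report for each example, because `doctest.testmod()` picks up that flag from `sys.argv` when `verbose` is not given.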

Using the python-geocoder-0.2 interface to Google's web API to get distances:
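The python-geocoder 0.2 code itself is not shown here, and its API is not reproduced. As a hedged stand-in, the sketch below uses only the Python 3 standard library (`urllib.request`; in Python 2 the equivalent lived in `urllib`/`urllib2`). Several things are assumptions: the Geocoding endpoint URL, the JSON layout `results[0]["geometry"]["location"]`, the need for an API key, a mean Earth radius of 6371 km, and that "distance" means the straight-line great-circle (haversine) distance rather than a driving distance. The network call is kept separate so the parsing and distance functions can be tried on a hand-written response.

```python
import json
import math
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Google's Geocoding web service endpoint (JSON output).
# Current versions of the service require an API key.
GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"
EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def geocode(address, api_key):
    """Fetch (lat, lng) for an address; needs network access and a valid key."""
    query = urlencode({"address": address, "key": api_key})
    with urlopen(GEOCODE_URL + "?" + query) as response:
        return parse_location(json.loads(response.read().decode("utf-8")))


def parse_location(payload):
    """Extract (lat, lng) of the first match from a decoded geocoding response.

    Assumes the layout {"status": "OK", "results": [{"geometry":
    {"location": {"lat": ..., "lng": ...}}}]}.
    """
    if payload.get("status") != "OK" or not payload.get("results"):
        raise ValueError("geocoding failed: %s" % payload.get("status"))
    location = payload["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


def haversine_km(point_a, point_b):
    """Great-circle distance in km between two (lat, lng) pairs given in degrees."""
    lat1, lng1 = map(math.radians, point_a)
    lat2, lng2 = map(math.radians, point_b)
    dlat, dlng = lat2 - lat1, lng2 - lng1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlng / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


if __name__ == "__main__":
    # Offline demo: a hand-written response in the assumed format, not real API output.
    sample = {"status": "OK",
              "results": [{"geometry": {"location": {"lat": 0.0, "lng": 90.0}}}]}
    # A quarter of the equator: 6371 * pi / 2, roughly 10007.5 km.
    print(haversine_km((0.0, 0.0), parse_location(sample)))
```

Calling `geocode(...)` on two addresses and passing both results to `haversine_km` gives a straight-line distance; road distances would need a different Google service, which this sketch does not cover.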