Difference between revisions of "CompSciWeek6"
From Predictive Chemistry
Revision as of 10:44, 30 September 2014
- Beginning Python - skim chapters 8-14 (use as reference material)
  - see especially urlopen on p. 300, forks and threads on p. 304
- Beginning Python - Chapter 15 (Web services)
Class 1: Effective Design
- Structured Code, Bioinformatics example from AOS Book
- Code Testing
- Source Code Versioning
- basic git
Class 2: Using HPC Resources
- Accessing binaries and libraries, using modules
- Using scratch space
- Submitting a job script
- Managing queued jobs
- Advanced scripting tips and tricks
  - awk
Homework 4 (Due Fri., Oct. 10)
Please email the completed homework with the subject line "SciComp HW4, (your name)".
- Write example functions that use the advanced function notation from Beginning Python, Ch. 6 (see especially the example on p. 124).
  - f(arg=default): the function should do nothing when called as f(), and should call arg.set_price(12) when called as f(type("InvItem", (), {"set_price":(lambda a,b: b)})())
  - f(*arg): the function should return the number of arguments passed in the call f('a', 'b', 1, 5, {'t': [4]})
  - f(**args): the function should return the value associated with the key "agent" in the call f(auto="DB5", lno=31337, agent="007")
- Write an example Python class to represent a general inventory item. It should store its own name, and must contain the following methods: getCount(), returning the (arbitrary, fixed) number of items in inventory, and getPrice(), which computes the price using the formula price = price0 - k*log(count), where price0 and k are arbitrary, fixed values belonging to the object.
- The article "Working with Big Data in Bioinformatics" (http://www.aosabook.org/en/posa/working-with-big-data-in-bioinformatics.html) describes software that reads lots of small strings and increments some counters for each string. The overall structure of their code contains a fast C++ library, a Python wrapper, and Python scripts. For each of the following routines, state which of those three categories you would place it in, and why.
  - A class that creates C++ objects representing counters for sequence data and that contains methods for translating the counts to numpy arrays.
  - A script that creates a plot of the k-mer counts in a subset of the data.
  - A function reading and parsing files containing genomic sequence data.
  - A script installing the complete Khmer package (compiling the C++ library, copying the Python package, etc.)
- Explain (without trying to solve their problems) why each of the following quotes from the article might be relevant to the performance of their code:
  - "We expected the highest traffic to be in the k-mer counting logic."
  - "Redundant calls to the toupper function were present in the highest traffic regions of the code."
  - "Input of genomic reads was performed line-by-line and on demand and without any readahead tuning."
  - "A copy-by-value of the genomic read struct [was] performed for every parsed and valid genomic read."
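As a rough illustration of the notation the first two problems refer to, the sketch below shows one possible shape for a default argument, a variable-length argument list, keyword arguments, and a small class using the price formula. It is not a model answer: the names f_default, f_count, f_lookup, and InventoryItem, the choice of None as the default, and the reading of log as the natural logarithm (math.log) are all assumptions made for the example.

```python
import math


def f_default(arg=None):
    """Default argument: do nothing for f_default(), else call set_price(12).

    Using None as the default is an assumption of this sketch.
    """
    if arg is None:
        return None
    return arg.set_price(12)


def f_count(*args):
    """Variable-length positional arguments arrive as a tuple."""
    return len(args)


def f_lookup(**kwargs):
    """Keyword arguments arrive as a dict; return the "agent" entry if present."""
    return kwargs.get("agent")


class InventoryItem(object):
    """Hypothetical inventory item; all attribute names are illustrative."""

    def __init__(self, name, count, price0, k):
        self.name = name
        self.count = count      # fixed number of items (must be > 0 for log)
        self.price0 = price0    # base price
        self.k = k              # discount coefficient

    def getCount(self):
        return self.count

    def getPrice(self):
        # price = price0 - k*log(count); natural log is assumed here
        return self.price0 - self.k * math.log(self.count)


# The throwaway object from the first problem: set_price(self, b) returns b.
item = type("InvItem", (), {"set_price": (lambda a, b: b)})()
result_default = f_default(item)                              # 12
result_count = f_count('a', 'b', 1, 5, {'t': [4]})            # 5
result_agent = f_lookup(auto="DB5", lno=31337, agent="007")   # "007"
widget = InventoryItem("widget", 1, 10.0, 2.0)
widget_price = widget.getPrice()                              # 10.0, since log(1) == 0
```

A count of zero or less would make math.log raise ValueError, so a fuller version might validate count in the constructor.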