Difference between revisions of "CompSciWeek6"

From Predictive Chemistry

Revision as of 10:44, 30 September 2014

Reading (shared with Week 7)

  • Beginning Python - skim chapters 8-14 (use as reference material)
    • see especially urlopen on p. 300, forks and threads on p. 304
  • Beginning Python - Chapter 15 (Web services)

Class 1: Effective Design

  • Structured Code, Bioinformatics example from the AOSA Book
  • Code Testing
  • Source Code Versioning
    • basic git

Class 2: Using HPC Resources

  • Accessing binaries and libraries, using modules
  • Using scratch space
  • Submitting a job script
  • Managing queued jobs
  • Advanced scripting tips and tricks
    • awk

Homework 4 (Due Fri., Oct. 10)

Please email the completed homework with the subject line "SciComp HW4, (your name)"

  1. Write example functions that use the advanced function notation from Beginning Python, Ch. 6 (see especially the example on p. 124).
    1. f(arg=default): the function should do nothing if the function is called as f(), and it should call arg.set_price(12) if it is called as f(type("InvItem", (), {"set_price":(lambda a,b: b)})())
    2. f(*arg): the function should return the number of arguments passed in the call f('a', 'b', 1, 5, {'t': [4]})
    3. f(**args): the function should return the value associated with the key "agent" in the call f(auto="DB5", lno=31337, agent="007")
  2. Write an example Python class to represent a general inventory item. It should store its own name, and must contain the following methods: getCount(), returning the (arbitrary, fixed) number of items in inventory, and getPrice(), which computes the price using the formula price = price0 - k*log(count), where price0 and k are arbitrary, fixed variables belonging to the object.
  3. The article "Working with Big Data in Bioinformatics" (http://www.aosabook.org/en/posa/working-with-big-data-in-bioinformatics.html) describes software that reads lots of small strings and increments some counters for each string. The overall structure of their code contains a fast C++ library, a Python wrapper, and Python scripts. Describe which of those three categories you would place each of the following routines in, and why.
    1. A class that creates C++ objects representing counters for sequence data and that contains methods for translating the counts to numpy arrays.
    2. A script that creates a plot of the k-mer counts in a subset of the data.
    3. A function reading and parsing files containing genomic sequence data.
    4. A script installing the complete Khmer package (compiling the C++ library, copying the Python package, etc.)
  4. Explain (without trying to solve their problems) why each of the following quotes from the article might be relevant to the performance of their code:
    1. "We expected the highest traffic to be in the k-mer counting logic."
    2. "Redundant calls to the toupper function were present in the highest traffic regions of the code."
    3. "Input of genomic reads was performed line-by-line and on demand and without any readahead tuning."
    4. "A copy-by-value of the genomic read struct [was] performed for every parsed and valid genomic read."
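
As a warm-up for problem 1, the sketch below illustrates the three calling conventions (default arguments, *args, and **kwargs) and unpacks the three-argument type() call that appears in problem 1.1. It is not a solution: the names describe, total, lookup_color, and dynamic_item are hypothetical and chosen only for illustration, and the functions deliberately do something different from what the homework asks.

```python
# All names below are hypothetical; none of them come from the assignment.

# type(name, bases, namespace) builds a class on the fly. The expression in
# problem 1.1 is therefore roughly equivalent to writing this class by hand
# and creating one instance of it.
class InvItem(object):
    def set_price(self, price):
        return price

# Same idea, written as the one-liner used in the problem statement.
dynamic_item = type("InvItem", (), {"set_price": (lambda a, b: b)})()


def describe(label="unnamed"):
    """Default argument: label falls back to "unnamed" when omitted."""
    return "item: " + label


def total(*values):
    """*values collects every positional argument into a tuple."""
    return sum(values)


def lookup_color(**options):
    """**options collects every keyword argument into a dict."""
    return options.get("color")


print(dynamic_item.set_price(12))          # behaves like InvItem().set_price(12)
print(describe())                          # uses the default value
print(describe("bolt"))                    # overrides the default
print(total(1, 2, 3))                      # three positional arguments
print(lookup_color(size=3, color="red"))   # two keyword arguments
```

Inside total, values is an ordinary tuple, and inside lookup_color, options is an ordinary dict, so the usual tuple and dict operations (len(), indexing by key, and so on) apply to them.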