Difference between revisions of "CompSciWeek5"
From Predictive Chemistry
Latest revision as of 10:04, 29 September 2014
Reading:
- Beginning Python, Chapters 7-8 and 16 (on Testing)
- Bioinformatics Code Design (http://www.aosabook.org/en/posa/working-with-big-data-in-bioinformatics.html)
Classes 1 and 2: Code walkthrough
- Structuring a list min/max problem over multiple timescales
- Building and using a structured nearest neighbor graph
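As a rough illustration of the second topic, a neighbor graph can be stored as a dictionary that maps each node number to the set of its neighbors. The sketch below is only one possible layout; the helper name `build_graph` and its arguments are hypothetical and do not come from the course code.

```python
# Hypothetical helper, shown only to illustrate the dict-of-sets layout.
def build_graph(n_nodes, pairs):
    """Return {node: set(neighbors)} for an undirected graph."""
    # Every node gets an (initially empty) neighbor set.
    graph = {i: set() for i in range(n_nodes)}
    for i, j in pairs:
        # Store each bond in both directions.
        graph[i].add(j)
        graph[j].add(i)
    return graph

# One water molecule: atom 1 is O, atoms 0 and 2 are H.
G = build_graph(3, [(1, 0), (1, 2)])
print(sorted(G[1]))  # neighbors of the oxygen: [0, 2]
print(len(G[0]))     # number of bonds on the first hydrogen: 1
```

With this layout, looking up the neighbors of an atom is a single dictionary access, and a set keeps each neighbor from being stored twice.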
Example Codes
List Min/Max
See CompSciWeek1.
Using Graphs
<source lang="python">
#!/usr/bin/env python

from numpy import array, sum, reshape

# input - list of names
#       - (3n) x 3 coordinate array
# output - Graph G
def make_G(names, x):
    assert x.shape[0]%3 == 0
    assert x.shape[1] == 3
    assert len(names) == len(x)

    print(x.shape)
    n = len(x)//3
    G = {}
    # D2[i,j] = squared distance between atoms i and j
    D2 = sum((reshape(x, (1,3*n,3)) - reshape(x, (3*n,1,3)))**2, -1)
    print(D2)
    # add bonds to G!
    for i,n in enumerate(names):
        print(i,n)
        G[i] = set()
        if n != 'O': continue
        for j,m in enumerate(names):
            print(j,m)
            if m != 'H': continue
            # all O-H distances, 1 at a time
            if D2[i,j] < 1.1:
                G[i].add(j)
                G[j].add(i) # Question for class: why does this cause an error?
    return G

# input - list of names (1 O per 2 H-s)
#       - graph from atom number to atom numbers
# output - list of atom numbers ordered O, H, H, O, H, H, ...
def outp_graph(names, G):
    out = []
    for i,n in enumerate(names):
        if n != "O":
            continue
        out.append(i) # 'O' number
        bonds = G[i]
        assert len(bonds) == 2, "Bad number (%d) of O-bonds."%(len(bonds))

        for j in bonds: # 'H' numbers
            out.append(j)
    return out

names = ['H', 'O', 'H', 'H', 'H', 'O']
x = array([[1,0,0],[0,0,0],[-1,0,0]])
G = make_G(names[:3], x)
print(G)
#G = {0:{1}, 1:{0,4, 1}, 2:{5}, 3:{5}, 4:{1}, 5:{2,3} }
#l = outp_graph(names, G)
#print(l)
</source>
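The `D2` line in the example above relies on NumPy broadcasting to compute every pairwise squared distance at once. The following sketch unpacks that one line for the same three atoms and checks it against explicit loops; the names `diff` and `D2_loops` are introduced here for illustration only.

```python
import numpy as np

# Same three atoms as the example above: H, O, H along the x axis.
x = np.array([[1, 0, 0], [0, 0, 0], [-1, 0, 0]], dtype=float)
m = len(x)

# Shapes (1, m, 3) and (m, 1, 3) broadcast to (m, m, 3):
# diff[i, j] = x[j] - x[i] for every pair of atoms.
diff = x.reshape(1, m, 3) - x.reshape(m, 1, 3)

# Summing the squares over the last axis leaves an (m, m) matrix.
D2 = (diff ** 2).sum(axis=-1)

# Cross-check against explicit loops over all pairs.
D2_loops = np.zeros((m, m))
for i in range(m):
    for j in range(m):
        D2_loops[i, j] = sum((x[i] - x[j]) ** 2)

print(D2)
print(np.allclose(D2, D2_loops))
```

Because `D2` holds squared distances, the test `D2[i,j] < 1.1` in `make_G` corresponds to a distance below roughly 1.05 in whatever length unit the coordinates use.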