Saturday, May 22, 2010

What is Python and why Python in Bioinformatics?

Python is a general-purpose high-level programming language whose design philosophy emphasizes code readability. Python aims to combine "remarkable power with very clear syntax", and its standard library is large and comprehensive. Its use of indentation for block delimiters is unusual among popular programming languages.


Because scientists have long relied on the open availability of each other's research results, it was only natural that they would turn to Open Source software when it came time to apply computer processes to the study of biological processes. One of the first Open Source languages to gain popularity among biologists was Perl. Perl gained a foothold in bioinformatics based on its strong text processing facilities, which were ideally suited to analyzing early sequence data. To its credit, Perl has a history of successful use in bioinformatics and is still a very useful tool for biological research.
In comparison to Perl, Python is a relative newcomer to bioinformatics, but is steadily gaining in popularity. A few of the reasons for this popularity are the:
  • Readability of Python code
  • Ability to develop applications quickly
  • Powerful standard library of functionality
  • Scalability from very small to very large programs
The Python language was designed to be as simple and accessible as possible, without giving up any of the power needed to develop sophisticated applications. Python's clean, consistent syntax leaves it free from the subtleties and nuances that can make other languages difficult to learn and programs written in those languages difficult to comprehend.
Python's dynamic nature adds to its accessibility. For example, Python doesn't require you to declare variables before you use them, and the same variable can refer to objects of different types over the course of its existence. Python can also be used interactively, allowing you to familiarize yourself with the language or with any Python module in an interactive session where each command produces immediate results.
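As a small illustration of this dynamic behaviour, the sketch below rebinds one name to a string, then a list, then an integer. The names `item`, `history` and `type_names` are made up for this example.

```python
def type_names(values):
    """Return the type name of each object in values."""
    return [type(value).__name__ for value in values]

# No declaration is needed; the name simply refers to whatever is assigned.
item = "ATGGCC"          # a string
history = [item]

item = list(item)        # the same name now refers to a list of characters
history.append(item)

item = len(item)         # and now to an integer
history.append(item)

print(type_names(history))
```

Typing the same statements one by one at the interactive prompt shows each result immediately.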
Python also has excellent support for the object-oriented style of programming. The basic idea is that object-orientation often provides a better way to organize the data and functionality within your programs. As the data and analytical techniques used in bioinformatics have become more complex, the value of object-oriented language features has risen.
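A minimal sketch of what this can look like for sequence data is given below. The class `DnaSequence` and its methods are hypothetical names chosen for illustration, not part of any library; the point is only that the data (the bases) and the operations on it live together.

```python
class DnaSequence:
    """Hypothetical example class: a named DNA sequence with a few operations."""

    # Translation table mapping each base to its complement.
    _COMPLEMENT = str.maketrans("ACGT", "TGCA")

    def __init__(self, name, bases):
        self.name = name
        self.bases = bases.upper()

    def gc_content(self):
        """Fraction of bases that are G or C (0.0 for an empty sequence)."""
        if not self.bases:
            return 0.0
        gc_count = self.bases.count("G") + self.bases.count("C")
        return gc_count / len(self.bases)

    def reverse_complement(self):
        """Return a new DnaSequence holding the reverse complement."""
        rc_bases = self.bases.translate(self._COMPLEMENT)[::-1]
        return DnaSequence(self.name + "_rc", rc_bases)


seq = DnaSequence("example", "atgc")
print(seq.gc_content())
print(seq.reverse_complement().bases)
```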
In addition, Python integrates well with systems written in other languages, such as C, C++, Java and Fortran. One of the main benefits of C is speed. When a programmer needs an algorithm to run as fast as possible, they can code it in C or C++ and make it available to Python as an extension module. To the programmer, these are indistinguishable from pure Python modules. Similar utilities exist that make the large body of scientific algorithms coded in Fortran accessible to Python programs.
Java has become popular as a cross-platform and Web development language. The Python interpreter is now available in two variations: one version written in C, and the other version, known as Jython, written in Java. Jython allows Java programmers to write programs using the Python syntax and dynamic language features, and it allows Python programmers to use existing code developed in Java. These are just a few examples of the many ways Python is able to leverage and extend existing code written in other languages.
So while Perl is better established in the bioinformatics community, many biologists and bioinformaticians are also turning to Python as it gains in popularity.
Source:
Beginning Python for Bioinformatics

You can download this from:
http://groups.google.com/group/bio-bio-1/files?upload=1

Saturday, March 27, 2010

Interesting Books on BioInformatics

http://biowww.net/biobooks_1_sequence-analysis.html


Courses at a South Korean university:
http://bi.snu.ac.kr/Courses/bio02/bio02_2.html


South Korea is doing great R&D work in bioinformatics.

Thursday, March 12, 2009

Some Important Links of Cell Biology

I think understanding bioinformatics depends on the concepts of cell biology (cell anatomy, cell function, chromosome structure, etc.). So here I provide some important links where you can learn these concepts:

http://www.johnkyrk.com/index.html
http://www.biology-online.org/1/1_cell.htm

If anyone is interested in the "Origins of Life on Earth":

http://www.biology-online.org/10/1_first_life.htm

In future posts I will discuss the key topics of cell biology that you need to understand basic bioinformatics.

Wednesday, March 11, 2009

Domain: a compact, local, semi-independent folding unit presumed to have arisen through gene fusion and gene duplication events. Domains are not necessarily formed from a contiguous region of the amino acid sequence. They may be discrete entities joined by a flexible linking region, and may also exchange chains with neighbouring domains. The combination of domains within a protein determines its overall function and stable structure.

Analogues: non-homologous proteins with similar folds or functional sites, which are believed to have arisen through convergent evolution.

ORF: Open Reading Frame. A series of DNA codons, including a 5' initiation codon and a 3' termination codon, that encodes a putative (presumed) gene. To qualify, a DNA sequence must contain a translation start codon (usually ATG) and must not contain any of the stop codons (usually TAA, TAG, TGA) in phase with the ATG for quite some length (at least 300 nucleotides separating the start and stop codons).
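The definition above can be sketched as a simple scan. This is a minimal illustration, not a production gene finder: it looks only at the three forward frames, takes the first ATG after the previous stop as the start, and measures `min_length` from the A of ATG to the end of the stop codon (a simplification of the 300-nucleotide rule). The function name `find_orfs` and its parameters are assumptions made for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_length=300):
    """Return a list of (start, end, frame) tuples for forward-strand ORFs.

    start is the index of the A in ATG, end is the index just past the
    stop codon, and frame is 0, 1 or 2.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the current candidate start codon, if any
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_length:
                    orfs.append((start, i + 3, frame))
                start = None  # begin looking for the next ORF in this frame
    return orfs

# A made-up toy sequence; min_length is lowered so the short ORF is reported.
toy = "CC" + "ATG" + "GCC" * 4 + "TAA" + "G"
print(find_orfs(toy, min_length=9))
```

With the default `min_length=300` the toy sequence yields no ORFs, which is the intended effect of the length filter.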

ORF-finding protocols can probably identify about 85% of protein-coding regions correctly. A variety of situations occur frequently in which a more sophisticated approach is needed. One such approach is taken by GeneMark, which addresses:
  • finding very short proteins,
  • resolving ambiguous cases where overlapping ORFs are predicted in different reading frames,
  • pinpointing the exact start codon (the most distal ATG is not always the correct one).
Six Frame Translation: Translation of a stretch of DNA taking into account three forward translations and three reverse translations arising from three possible reading frames of an uncharacterised stretch of DNA.
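A six-frame translation can be sketched in a few lines of Python. The example assumes the standard genetic code and an unambiguous A/C/G/T sequence; the function names are made up for illustration, and any codon containing other letters is reported as "X".

```python
from itertools import product

# Standard genetic code, listed in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(dna):
    """Translate complete codons from the start of dna; unknown codons become X."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )

def six_frame_translation(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames

for label, protein in sorted(six_frame_translation("ATGGCCTAA").items()):
    print(label, protein)
```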
Destination of Proteins:

Proteins have to reach the right destination in the organism or within the cell to carry out their functions correctly. As a protein is translated, the peptide chain may expose a variety of highly specific sequence signals.
One such signal is a "ZIP code", which the cell uses to direct the protein to the appropriate compartment (inside or outside the cell). This process generally involves transporting the protein across one or several membranes, and is also referred to as translocation. Possible destinations of a protein include:
  • getting attached to the cell membrane,
  • being secreted outside the cell,
  • being transported into the periplasm (in the case of bacteria),
  • being transported to the mitochondria or any other organelle,
  • being transported into the cell nucleus.
So, to understand a protein's function, it is important to know the compartment in which it finally ends up. This proven or predicted information is recorded in protein databases such as Swiss-Prot, PDB, etc.

A newly synthesized peptide chain is converted into a functional protein when the chain folds into a compact and stable 3-D structure. The final structure of a protein generally consists of several relatively independent domains.

Most natural proteins are made of a combination of 1 to 10 domains picked from a set of a few thousand. Domains are identifiable by their scaffold sequence signatures (a motif in a protein means a stretch of amino acid text that remains recognisable despite a zillion years of divergent evolution). The domain architecture underlying a particular protein sequence provides hints about its possible 3-D structure and its potential biochemical or cellular functions.
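As a rough illustration of searching for a sequence signature, the sketch below scans a protein for the PROSITE N-glycosylation site pattern N-{P}-[ST]-{P} using a regular expression. This is a short motif rather than a whole domain, and real domain recognition usually relies on profiles rather than a single pattern; the function name `find_motif` and the test sequence are made up for this example.

```python
import re

# PROSITE-style pattern N-{P}-[ST]-{P} written as a regular expression.
# The lookahead (?=...) lets overlapping matches be reported as well.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")

def find_motif(protein, pattern=N_GLYCOSYLATION):
    """Return (position, matched_text) for every occurrence of the pattern."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(protein.upper())]

# A made-up peptide, not a real protein.
print(find_motif("MKNGTAANPSANASV"))
```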

The recognition and definition of protein domains is a major research topic in bioinformatics.

Tuesday, March 10, 2009

Steps in producing mature proteins:

  • Replication - formation of single strands of DNA from double-stranded DNA.
  • Transcription - formation of the primary transcript from a single strand of DNA.
Primary transcripts consist of introns (non-coding regions, where promoter regions, regulatory elements and protein binding sites may be predicted) and exons (coding regions, which are usually small, about 150 bp long on average, and whose splice-site sequences are known).
  • Splicing - formation of mRNA from the primary transcript by removing introns.
  • Translation - the native/nascent protein is formed from mRNA.
  • Post-translational modification - the mature protein is formed.
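The transcription, splicing and translation steps above can be sketched on a toy sequence. This is a simplified model under stated assumptions: the input is the coding strand (so transcription just replaces T with U), the exon coordinates are given by hand rather than predicted, and the standard genetic code is used. All names and the toy gene are made up for illustration.

```python
from itertools import product

# Standard genetic code in UCAG order ("*" marks a stop codon).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def transcribe(coding_strand):
    """Primary transcript: same sequence as the coding strand, with U for T."""
    return coding_strand.upper().replace("T", "U")

def splice(primary_transcript, exons):
    """Join the exon slices; exons is a list of (start, end) half-open indices."""
    return "".join(primary_transcript[start:end] for start, end in exons)

def translate(mrna):
    """Translate from the first base until a stop codon or the end of the mRNA."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "X")  # X = unknown codon
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

# Toy gene: exon 1, an intron starting with GT and ending with AG, exon 2.
gene = "ATGGCC" + "GTAAGTTTTCAG" + "AAATAA"
exons = [(0, 6), (18, 24)]  # hand-written coordinates, not predicted

primary = transcribe(gene)
mrna = splice(primary, exons)
print(primary)
print(mrna)
print(translate(mrna))
```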
Protein maturation (post-translational modification) includes any combination of the following stages:
- cuts within the amino acid chain,
- removal of a fragment of the amino acid chain (e.g. insulin),
- chemical modification of specific amino acids (e.g. methylation),
- addition of lipid molecules (e.g. myristoylation),
- addition of glycosidic (sugar) molecules (e.g. glycosylation).