Mining protein interaction data and its context from the scientific literature
PhD studentship in bioinformatics/text mining, University of Manchester

Re-posted 21.05.2010


The School of Computer Science invites applications for a four-year BBSRC-funded CASE studentship commencing in the academic year 2010/11.
The studentship is open to UK/EU applicants and will pay tuition fees in addition to a starting stipend of
   £15,790 pa for UK students,
   £13,000 pa for EU students.
It will also involve a research placement with the industrial CASE partner, Pfizer Global R&D in Sandwich, Kent.

The project will involve research on the context of protein interaction data from the scientific literature. The main archive of life sciences literature currently contains more than 17 million references and grows by approximately 2,000 articles every day. This biomedically relevant information is invaluable and represents a rich source of knowledge. However, its current size, let alone its future size, is making it virtually impossible for individual scientists to keep pace with publications in their own area, let alone related ones.
This has led to the generation of secondary databases that mine specific information from the published literature. For example, much emphasis has been placed on mining text (often manually) to identify protein interactions. However, little attempt is made to capture the context of such information, such as how reliable it is or what the nature of the interaction is. This project will study the way findings, experiments and knowledge about protein interactions are presented in the literature, and in particular how contextual information that details a protein interaction is encoded and presented. To do this we will implement a state-of-the-art text mining framework to extract protein interaction contextual information from full-text articles, and to link and contrast it with data in other (structured) resources. The aim is to support informed decisions for understanding the complexity of interactions and for identifying potential drug targets.
To be relevant to the industrial partner (Pfizer R&D), the focus will be on pharmaceutically relevant protein interaction data sets, for example those involving pathogens such as HIV, hepatitis viruses and malaria. The knowledge extracted will be characterised by quantitative measures that may be indicative of its quality or relevance for a specific interaction (bibliometrics such as number of citations and mentions; peaks and changes over time; association with specific entities such as experimental methods, model systems, drug associations, outcomes, etc.). Importantly, the general framework developed for placing biomedical 'facts' in context will be applicable to other text mining domains.

Qualifications and experience

Applicants should ideally have experience in computational biology, bioinformatics, computer science or a related subject area. Knowledge of a programming language and text and/or data mining would be a distinct advantage.

For details on how to apply, go to link.
If you require further details, please contact:
   Dr. David Robertson, Faculty of Life Sciences, University of Manchester or
   Dr. Goran Nenadic, School of Computer Science, University of Manchester.