Bi/BE/CS 183, Winter 2024

Introduction to Computational Biology and Bioinformatics


  • Richard Murray (CDS/BE)
  • Lectures: MWF 11-11:55a, room TBD
  • Office hours: Wed, 3-3:45 pm, Annenberg treehouse lounge

Teaching Assistants

  • Tara Chari, Meichen Fang
  • Office hours: Mon/Tue 4-5 pm, Chen 240A

This is the course homepage for Bi/BE/CS 183, Winter 2024. This course closely follows the Winter 2023 course.

Catalog Description

Bi/BE/CS 183. Introduction to Computational Biology and Bioinformatics. 9 units (3-0-6): second term. Prerequisites: Bi 8, CS 2, Ma 3; or BE/Bi 103 a; or instructor's permission. Biology is becoming an increasingly data-intensive science. Many of the data challenges in the biological sciences are distinct from those in other scientific disciplines because of the complexity involved. This course will introduce key computational, probabilistic, and statistical methods that are common in computational biology and bioinformatics. We will integrate these theoretical aspects to discuss solutions to common challenges that recur throughout bioinformatics, including algorithms and heuristics for tackling DNA sequence alignments, phylogenetic reconstructions, evolutionary analysis, and population and human genetics. We will discuss these topics in conjunction with common applications, including the analysis of high-throughput DNA sequencing data sets and the analysis of gene expression from RNA-Seq data sets.

Lecture Schedule

Date Topic Reading Homework
Week 1

3 Jan
5 Jan

Course Introduction
  • Overview of computational biology
  • Logistics for the course
  • Overview of scRNA-seq
  • Wi 2024 lecture slides: Wed, Fri
  • Wi 2023 lecture slides:
    • Lecture 1: Introduction to computational biology of single-cell RNA-seq
    • Lecture 2: Single-cell RNA-seq technology
HW #1

Out: 3 Jan
Due: 10 Jan
Solns (Caltech only)

Week 2

8 Jan
10 Jan
12 Jan

Correlation and regression
  • Linear and logistic regression, least squares
  • Random variables, covariance, correlation
  • Exploratory data analysis
HW #2

Out: 10 Jan
Due: 17 Jan
Solns (Caltech only)

Week 3

15 Jan
17 Jan
19 Jan

Dimensionality reduction
  • Singular value decomposition (SVD), principal components analysis (PCA)
  • Clustering and data visualization (PCA, t-SNE, UMAP)
  • Wi 2024 lecture slides: Wed, Fri
  • Wi 2023 lecture slides:
HW #3

Out: 17 Jan
Due: 24 Jan
Solns (Caltech only)

Week 4

22 Jan
24 Jan
26 Jan*

Expectation maximization (EM)
  • Maximum likelihood estimation (MLE)
  • Clustering via EM
  • Gaussian mixture models
  • Read alignment via EM
HW #4

Out: 24 Jan
Due: 31 Jan
Solns (Caltech only)

Week 5

29 Jan
31 Jan
2 Feb

Read alignment and modeling counts
  • String algorithms, suffix trees
  • Modeling counts, zero-inflated negative binomial distributions
HW #5

Out: 31 Jan
Due: 7 Feb
Solns (Caltech only)

Week 6

5 Feb
7 Feb
9 Feb

Transformation of non-normal distributions
  • Generalized linear models (GLMs)
  • Variance stabilization, normalization, log1p transformations
HW #6

Out: 7 Feb
Due: 14 Feb
Solns (Caltech only)

Week 7

12 Feb
14 Feb
16 Feb

Differential analysis
  • Differential expression
  • Hypothesis testing, multiple testing
  • Selective inference, aggregation
HW #7

Out: 14 Feb
Due: 21 Feb
Solns (Caltech only)

Week 8

19 Feb
21 Feb
23 Feb*

Hidden Markov models
  • Parameter estimation (Baum-Welch)
  • State estimation (Viterbi algorithm)
  • Dynamic programming
  • Global and local alignment (Needleman-Wunsch, Smith-Waterman)
HW #8

Out: 21 Feb
Due: 28 Feb
Solns (Caltech only)

Week 9

26 Feb
28 Feb
1 Mar

Markov processes
  • Continuous-time Markov chains (CTMC)
  • Stochastic simulation algorithm (SSA)
  • Chemical master equation (CME)
  • Bursty gene expression
HW #9

Out: 28 Feb
Due: 6 Mar
Solns (Caltech only)

Week 10

4 Mar
6 Mar
8 Mar

Machine learning
  • Auto-encoders, function approximation, backpropagation, classification
  • Large language models, RoseTTAFold, ESM2
Final (GradeScope)

Out: 8 Mar
Due: 15 Mar


Grading

The final grade will be based on homework sets and a final exam:

  • Homework (70%): Homework sets will be handed out weekly and are due on Wednesdays by 11 am via GradeScope. Each student is allowed up to two extensions of no more than 2 days each over the course of the term. Homework turned in after Friday at 11 am, or late once both extensions have been used, will not be accepted without a note from the health center or the Dean. Python code is considered part of your solution and should be printed and turned in with the problem set (whether the problem asks for it or not). For Colab notebooks, first use Runtime --> Run all to execute all code cells, then use File --> Print to save the notebook and its outputs to a PDF. The lowest homework set grade will be dropped when computing your final grade.
  • Final exam (30%): The final exam will be handed out on the last day of class (8 Mar) and is due at the end of finals week (15 Mar). It will be an open-book exam, and computers will be allowed.
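
As a rough illustration of how the weights above combine, here is a minimal Python sketch. It is not an official grade calculator. It assumes, beyond what this page states, that every homework set carries equal weight and that all scores are already expressed as fractions between 0 and 1. The function name `final_grade` is hypothetical.

```python
def final_grade(homework_scores, final_exam_score):
    """Combine scores (each a fraction in [0, 1]) into a course grade.

    Assumptions not stated on this page: every homework set carries
    equal weight, and scores are already normalized to [0, 1].
    """
    if len(homework_scores) < 2:
        raise ValueError("need at least two homework scores to drop one")
    # Drop the single lowest homework score, then average the rest.
    kept = sorted(homework_scores)[1:]
    homework_average = sum(kept) / len(kept)
    # Weights from the grading policy: 70% homework, 30% final exam.
    return 0.7 * homework_average + 0.3 * final_exam_score
```

For example, `final_grade([0.5, 1.0, 1.0], 0.8)` drops the 0.5, averages the remaining two sets, and weights that average against the exam score.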

Collaboration Policy

Collaboration on homework assignments is encouraged. You may consult outside reference materials, other students, the TAs, or the instructor, but you may not consult homework solutions from prior years, and you must cite any use of material from outside references. All solutions that are handed in should be written up individually and should reflect your own understanding of the subject matter at the time of writing. Any computer code that is used to solve homework problems is considered part of your writeup and should be written individually (you can share ideas, but not code).

No collaboration is allowed on the final exam.

Course Text and References

There is no course textbook, but the slides from the prior year's course serve as a reference for much of the material in the course:

The following additional references may also be useful:

  • TBD
  • TBD

Note: the only sources listed here are those that allow free access to online versions. Additional textbooks that are not freely available can be obtained from the library.