Curriculum: Data Science & Machine Learning for Biotechnology Professionals

Curriculum & Course Descriptions

This course offers an interdisciplinary approach to computer programming by exploring programming concepts and techniques in the context of biology, chemistry and biochemistry. The course aims to teach students how to use programming as a tool to solve complex problems in various domains, to analyze and visualize data and to automate repetitive tasks. Students will learn fundamental programming concepts such as variables, loops, conditionals and functions using Python. No previous coding experience is needed. The course will also emphasize good programming practices, including code organization, documentation, testing and debugging. By the end of the course, students will be able to apply their programming skills to computational biology problems, communicate effectively with experts from different fields, and appreciate the role of programming in interdisciplinary research and innovation.  

Course Learning Outcomes ("CLOs")

When a student has completed this course, they will be able to…

  • Define and implement the basic building blocks of programming with Python (variables, control statements, loops, functions and more). Recognize and use essential scientific Python tools such as Google Colab, NumPy and pandas.
  • Interpret computational problems and solve them in Python.
  • Describe how scientists use programming to work with large datasets and to simulate scientific phenomena.

Instructor Contact

Jennifer Nelson  
kerekesj@sfsu.edu

This course focuses on two key areas of data science: data structures and data visualization. Students will learn how to organize and manipulate data using various data structures such as arrays, lists, trees and graphs. They will also learn about the importance of data visualization in data analysis and exploration and learn how to use popular visualization tools such as Matplotlib and Seaborn. Throughout the course, students will work on hands-on projects that allow them to apply their knowledge to real-world datasets and gain practical experience in using data structures and visualization techniques to analyze and communicate data effectively. By the end of the course, students will be able to select and apply appropriate data structures and visualization techniques for various data types and analysis goals and communicate their findings through effective visualizations.

Prerequisites: CSC 9005 

Course Learning Outcomes ("CLOs") 

When a student has completed this course, they will be able to…

  • Create and modify data structures for data ingestion and visualization. Learn the fundamentals of data structures like lists, tuples, dictionaries and trees to create and manipulate data structures in Python. Understand how to use data structures to represent and visualize data using appropriate tools and libraries. (CLO I)
  • Develop fast and efficient algorithms for data acquisition and processing. Understand the basics of data processing algorithms and how to optimize them. Develop skills to identify and remove bottlenecks in data processing pipelines. (CLO II)
  • Execute data science applications through use cases relevant to everyday problems. Learn how to collect and preprocess data for specific use cases using object-oriented programming techniques. Develop methods to store, process and analyze data, such as FASTA data, for these use cases. (CLO III)

Instructor Contact 

Hossein R. Saray
rsaray@sfsu.edu 

This course provides an introduction to machine learning and data science, with a focus on their applications in personalized medicine. Students will learn the basic concepts and computational techniques of machine learning and data science, including linear and logistic regression, classification, decision trees, random forests and gradient-boosted trees. Throughout the course, students will work on projects that apply these techniques to medical data, such as electronic health records and genomic data, with the goal of improving patient care. We will work with patient or health record data and genomic data to predict heart disease, Alzheimer’s disease and antibiotic resistance. We will also discuss ethical questions regarding machine learning and medical research, including questions concerning privacy and racism. By the end of the course, students will have developed a practical understanding of machine learning and data science and their applications in personalized medicine, as well as the ability to critically evaluate the results of machine learning models and communicate their findings effectively to different stakeholders.

Prerequisites: CSC 9006

Course Learning Outcomes ("CLOs") 

When a student has completed this course, they will be able to…

  • Implement, compare and contrast five common supervised machine learning models (decision trees, boosted trees, random forests, regression and deep learning) and describe and interpret their results. (CLO I)
  • Recognize and describe common types of genetic units such as genes, DNA, chromosomes and polymorphisms. Compare and contrast the use of common types of genomic data such as whole genome sequences, alignments, short reads and SNPs. Apply machine learning and data science to genomic data and describe how machine learning and genomic data can be used to diagnose patients, choose treatments and predict outcomes. (CLO II)
  • Apply and explain data science and machine learning code in Python. Write code that applies common packages such as pandas, scikit-learn and Matplotlib to tabular datasets. Write code to implement basic exploratory data science approaches to tabular data, specifically summarizing data, visualizing data and basic data wrangling. Implement methods to deal with missing data. Describe and practice methods for troubleshooting and for overcoming blockers and frustration. (CLO III)
  • Use standard methods (written and visual) for communication of data sets (genetic and non-genetic), machine learning models, results, interpretation and caveats. Create reports and plots that are standard in the field. Annotate code so others can use it. (CLO IV)
  • Recognize and describe possible ethical issues around machine learning and genetics in medicine such as privacy, fairness, accountability and racism. Describe and provide examples of FAIR data (Findability, Accessibility, Interoperability and Reuse of digital assets). (CLO V)

Instructor Contact 

Andrew Scott
ats@sfsu.edu 

This course provides hands-on learning opportunities through a term project that applies classic machine learning (ML) algorithms and techniques to a biology or biochemistry problem of the student’s choosing. The skills in computer programming, data visualization and machine learning that students were introduced to in the three prior courses will be reinforced and mastered in this course. Students will learn how to frame biology or biochemistry problems as applied ML projects, how to develop ML solutions and iteratively improve them, how to evaluate ML models, and how to develop an implementation plan for such a project. This course is designed for students who have completed an introductory course in machine learning or have equivalent experience.

Prerequisites: CSC 9007

Course Learning Outcomes ("CLOs") 

When a student has completed this course, they will be able to…

  • Identify and frame real-life problems with substantial scope and complexity as applied machine learning projects (CLO I)
  • Utilize open-source toolkits, packages and libraries for machine learning algorithms, data visualization, and general programming for a project (CLO II)
  • Measure, compare and analyze the performance of the implemented machine learning solutions (CLO III)

Instructor Contact 

Anagha Kulkarni
ak@sfsu.edu

This course is designed for students who have a solid understanding of machine learning concepts and techniques and are interested in their applications to medical image analysis. The development and integration of AI tools aim to optimize a radiologist’s workflow and help prioritize patient care by reducing workloads, improving reproducibility and potentially enabling the discovery of new quantitative biomarkers. Beyond clinical practice, AI can also support drug development processes and clinical trials in a variety of ways. This course explores the application of state-of-the-art deep learning models to biomedical image analysis: identifying objects such as features within an image, segmenting those objects and classifying images according to disease type. The course introduces key medical imaging technologies and data types, begins with an overview of topics central to medical image analysis and deep-learning-based image analysis, and culminates in two hands-on case studies. Through hands-on projects, students will gain practical experience in developing and evaluating machine learning models for medical image analysis, as well as in addressing challenges such as limited data, class imbalance and interpretability. By the end of the course, students will have a comprehensive understanding of advanced machine learning techniques and their applications in medical image analysis, as well as the ability to critically evaluate and improve the performance of machine learning models in this domain.

Prerequisites: CSC 9008 

Course Learning Outcomes ("CLOs") 

When a student has completed this course, they will be able to…

  • Explain the fundamentals of biomedical imaging, with a focus on different imaging technologies (X-ray, CT, PET, MRI, US) and their applications in clinical practice and clinical research. Recognize current challenges and some ways that machine learning can address them (CLO1).
  • Understand the basics of image processing, such as slices and 3D volumes, regions of interest, overlays and masks, and segmentation basics (CLO2).
  • Understand why deep learning/CNNs are specifically useful for medical imaging problems and what kinds of problems deep learning is currently used for in medical imaging (e.g. diagnosis, prognosis, segmentation, automatic labeling and image retrieval, quantifying change) (CLO3).
  • Understand medical images in 2D vs. 3D, the different file formats (DICOM, NIfTI) and the unique challenges in applying CNNs to them (CLO4).
  • Apply state-of-the-art deep learning models to biomedical image analysis: identifying objects such as features within an image, segmenting those objects and classifying images according to disease type (CLO5).

Instructor Contact 

Ilmi Yoon
ilmi@sfsu.edu 
