BEGIN:VCALENDAR
VERSION:2.0
PRODID:researchseminars.org
CALSCALE:GREGORIAN
X-WR-CALNAME:researchseminars.org
BEGIN:VEVENT
SUMMARY:Anders Ståhlberg & Serik Sagitov (Chalmers & University of Gothen
 burg)
DTSTART:20221006T131500Z
DTEND:20221006T140000Z
DTSTAMP:20260422T155019Z
UID:gbgstats/2
DESCRIPTION:Title: <a href="https://researchseminars.org/talk/gbgstats/2/"
 >Counting molecular identifiers in sequencing using a multitype branching 
 process with immigration</a>\nby Anders Ståhlberg & Serik Sagitov (Chalme
 rs & University of Gothenburg) as part of Gothenburg statistics seminar\n\
 nLecture held in MVL14.\n\nAbstract\nDetection of extremely rare variant a
 lleles\, such as tumour DNA\, within a complex mixture of DNA molecules is
  experimentally challenging due to sequencing errors. Barcoding of target 
 DNA molecules in library construction for next-generation sequencing provi
 des a way to identify and bioinformatically remove polymerase-induced erro
 rs. During the barcoding procedure involving $t$ consecutive PCR cycles\, 
 the DNA molecules become barcoded by unique molecular identifiers (UMI). D
 ifferent library construction protocols utilise different values of $t$. T
 he effect of a larger $t$ and imperfect PCR amplifications is poorly descr
 ibed. \n\nThis paper proposes a branching process with growing immigration
  as a model describing the random outcome of $t$ cycles of PCR barcoding
 . Our model discriminates between five different amplification rates $r_1$
 \, $r_2$\, $r_3$\, $r_4$\, $r$ for different types of molecules associated
  with the PCR barcoding procedure. We study this model by focussing on $C_
 t$\, the number of clusters of molecules sharing the same UMI\, as well
  as $C_t(m)$\, the number of UMI clusters of size $m$. Our main finding i
 s a remarkable asymptotic pattern valid for moderately large $t$. It turns
  out that $E(C_t(m))/E(C_t)\\approx 2^{-m}$ for $m=1\,2\,\\ldots$\, rega
 rdless of the underlying parameters $(r_1\,r_2\,r_3\,r_4\,r)$. The knowled
 ge of the quantities $C_t$ and $C_t(m)$ as functions of the experimental p
 arameters $t$ and $(r_1\,r_2\,r_3\,r_4\,r)$ will help the users to draw mo
 re adequate conclusions from the outcomes of different sequencing protocol
 s.\n
LOCATION:https://researchseminars.org/talk/gbgstats/2/
END:VEVENT
END:VCALENDAR
