BEGIN:VCALENDAR
VERSION:2.0
PRODID:researchseminars.org
CALSCALE:GREGORIAN
X-WR-CALNAME:researchseminars.org
BEGIN:VEVENT
SUMMARY:Matthew Lee (University of Bristol)
DTSTART;VALUE=DATE-TIME:20201015T130000Z
DTEND;VALUE=DATE-TIME:20201015T140000Z
DTSTAMP;VALUE=DATE-TIME:20231209T114959Z
UID:Essex-DataScience/1
DESCRIPTION:Title: EpiViz: an implementation of Circos plots for epidemiologists\nby Matthew Lee (University of Bristol) as part of (ED-3S) Essex Data Sc
ience Seminar Series\n\n\nAbstract\nEpidemiology studies predominantly foc
us on single exposure and single outcome associations. However\, biologica
l pathways involve numerous processes and identifying meaningful intermedi
ate associations that can be taken forward for further analysis is complex
. This is particularly the case for studies involving metabolomics data\,
as effects rarely occur in isolation. Gaining a global overview of hundreds
of exposure/outcome associations may therefore aid downstream analyses. Vi
sual inspection is one of the main modes of understanding global exposure/
outcome associations. EpiViz is a wrapper that makes producing Circos plot
s simple and efficient for those new to programming and data visualisation
.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/1/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Godwin Osuntoki (University of Essex)
DTSTART;VALUE=DATE-TIME:20201022T130000Z
DTEND;VALUE=DATE-TIME:20201022T140000Z
DTSTAMP;VALUE=DATE-TIME:20231209T114959Z
UID:Essex-DataScience/2
DESCRIPTION:Title: Bayesian Analysis of chromosomal interactions in Hi-C data using
the hidden Markov random field model\nby Godwin Osuntoki (University
of Essex) as part of (ED-3S) Essex Data Science Seminar Series\n\n\nAbstra
ct\nThere are different biological methods that have been developed over t
he years for analysis of the 3D structure of the DNA. Few computational an
d statistical methods have\, however\, been developed to analyze data gen
erated using the Hi-C method. We follow statistical methodology to explore
the Hi-C data. The Hi-C data is well suited to be analyzed using a finite
mixture model. The Potts model\, a hidden Markov random field model\, was
employed to analyze the hidden (latent) components. The hidden components
through the Potts model can be categorized into k components (k = 2\,3…
\,K). Using the Metropolis-within-Gibbs approach to analyze the data\, the
proposed method was able to detect interactions (short and long range) an
d loops. A large part of the significant interactions that we detect are f
ound within Topological Associated Domains\, which is one of the 3D struct
ures known to occur in Hi-C data.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/2/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Nosheen Faiz (University of Essex)
DTSTART;VALUE=DATE-TIME:20201105T140000Z
DTEND;VALUE=DATE-TIME:20201105T150000Z
DTSTAMP;VALUE=DATE-TIME:20231209T114959Z
UID:Essex-DataScience/4
DESCRIPTION:Title: Assessing how feature selection and hyper-parameters influence o
ptimal trees ensemble and random projection\nby Nosheen Faiz (Universi
ty of Essex) as part of (ED-3S) Essex Data Science Seminar Series\n\n\nAbs
tract\nOur work investigates the effect of feature selection on three meth
ods: Random Forest (Breiman 2001)\, Optimal Trees Ensemble (Khan et al 201
6) and Random Projection (Canning and Samworth 2017) in high dimensional s
ettings. To this end\, LASSO has been considered for selecting the most im
portant features based on training data for dimension reduction. Additiona
lly\, the influence of various hyper-parameters regulating the three metho
ds has also been assessed. Analysis on several benchmark datasets is given
to illustrate the phenomena. The results reveal that feature selection im
proves the predictive performance of the Random Forest and Random Projecti
on methods in addition to reducing the computational burden. The performan
ce of Optimal Trees Ensemble is less influenced by feature selection.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/4/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Peng Liu (University of Essex)
DTSTART;VALUE=DATE-TIME:20201112T140000Z
DTEND;VALUE=DATE-TIME:20201112T150000Z
DTSTAMP;VALUE=DATE-TIME:20231209T114959Z
UID:Essex-DataScience/5
DESCRIPTION:Title: Ordering and Inequalities for Mixtures on Risk Aggregation\n
by Peng Liu (University of Essex) as part of (ED-3S) Essex Data Science Se
minar Series\n\n\nAbstract\nAggregation sets\, which represent model uncer
tainty due to unknown dependence\, are an important object in the study of
robust risk aggregation. In this talk\, we investigate ordering relations
between two aggregation sets for which the sets of marginals are related
by two simple operations: distribution mixtures and quantile mixtures. Int
uitively\, these operations "homogenize" marginal distributions by maki
ng them similar. As a general conclusion from our results\, more "homogen
eous" marginals lead to a larger aggregation set\, and thus more severe mo
del uncertainty\, although the situation for quantile mixtures is much mor
e complicated than that for distribution mixtures. \nWe proceed to study
inequalities on the worst-case values of risk measures in risk aggregatio
n\, which represent conservative calculation of regulatory capital. Among
other results\, we obtain an order relation on VaR under quantile mixture
for marginal distributions with monotone densities. Numerical results are
presented to visualize the theoretical results and further inspire some c
onjectures.\nFinally\, we discuss the connection of our results to joint m
ixability and to merging p-values in multiple hypothesis testing.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/5/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Tolulope Fadina (University of Essex)
DTSTART;VALUE=DATE-TIME:20210225T140000Z
DTEND;VALUE=DATE-TIME:20210225T150000Z
DTSTAMP;VALUE=DATE-TIME:20231209T114959Z
UID:Essex-DataScience/6
DESCRIPTION:Title: Symmetric measures of variability induced by risk measures\n
by Tolulope Fadina (University of Essex) as part of (ED-3S) Essex Data Sci
ence Seminar Series\n\n\nAbstract\nGeneral measures of variability induced
by risk measures are investigated for their potential applications to ris
k management. We focus on the three classes of variability measures ge
nerated by the Value-at-Risk\, Expected Shortfall\, and the Expectiles. Th
eir properties are explored\, and we obtain a characterization result on g
eneral model spaces. Convergence properties and asymptotic normality of th
e empirical variability measures estimators are established. An applicatio
n of the variability measures to financial data is also investigated.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/6/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Ioana Olan (University of Cambridge)
DTSTART;VALUE=DATE-TIME:20201126T140000Z
DTEND;VALUE=DATE-TIME:20201126T150000Z
DTSTAMP;VALUE=DATE-TIME:20231209T114959Z
UID:Essex-DataScience/7
DESCRIPTION:Title: Detecting the hierarchical structure of the cell nucleus\nby
Ioana Olan (University of Cambridge) as part of (ED-3S) Essex Data Scienc
e Seminar Series\n\n\nAbstract\nChromatin consists of DNA wrapped around h
istones and forms complex three-dimensional structures within the cell nuc
leus with various degrees of compaction. Genes have been shown to be repre
ssed by their proximity to the nuclear periphery or activated by being in
contact with special regulatory regions called enhancers. Thus the relativ
e positioning of genes and their interactions with other regions are very
important in determining whether they are expressed or not. Interactions b
etween pairs of genomic regions have been studied using assays such as Hi-
C\, which generate large matrices estimating interaction frequencies. We u
se such interaction estimates as weights in a network whose nodes are equa
lly sized genomic regions and perform nested community detection in order
to resolve the relative positioning of genomic regions of interest and mod
el the interior of the cell nucleus. Our biological model is cellular sene
scence\, a phenotype associated with dramatic changes in its chromatin int
eractions network relative to normal cells. Senescence corresponds to perm
anent cell cycle arrest and has been shown to act as a protective barrier
against tumourigenesis.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/7/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Josh Bull (University of Oxford)
DTSTART;VALUE=DATE-TIME:20201203T140000Z
DTEND;VALUE=DATE-TIME:20201203T150000Z
DTSTAMP;VALUE=DATE-TIME:20231209T114959Z
UID:Essex-DataScience/8
DESCRIPTION:Title: Can maths tell us how to win at Fantasy Football?\nby Josh B
ull (University of Oxford) as part of (ED-3S) Essex Data Science Seminar S
eries\n\n\nAbstract\nFantasy Football is an online game played by millions
of people every year\, in which players attempt to predict the outcome of
football matches over the course of a season. To the surprise of everyone
(including myself)\, I was lucky enough to be crowned the winner of the 2
019-20 Fantasy Premier League\, one of the largest competitions in the UK.
As a researcher in Mathematical Oncology at the University of Oxford\, pe
ople have asked me whether I used maths to win – while I followed some s
trategies at the time\, I didn’t have any proof that they were in some s
ense mathematically optimal. However\, mathematical modelling is a tool wh
ich is capable of exploring exactly these kinds of questions: how can we i
dentify the best strategies to tackle complex problems? What types of data
are important to consider\, and how should we use them to inform our deci
sions? In this talk\, I’ll analyse how different quantitative approaches
can be used to tackle key questions in Fantasy Football\, and identify th
e strengths and weaknesses of these frameworks. Finally\, I’ll address t
he question: Can maths tell us how to win at Fantasy Football?\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/8/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Osama Mahmoud (University of Essex)
DTSTART;VALUE=DATE-TIME:20210211T140000Z
DTEND;VALUE=DATE-TIME:20210211T150000Z
DTSTAMP;VALUE=DATE-TIME:20231209T114959Z
UID:Essex-DataScience/9
DESCRIPTION:Title: Slope-Hunter: A robust method for index-event bias correction in
genome-wide association studies of conditional analyses\nby Osama Mah
moud (University of Essex) as part of (ED-3S) Essex Data Science Seminar S
eries\n\n\nAbstract\nBackground: Studying genetic associations with progno
sis (e.g. survival\, subsequent events) is problematic due to selection bi
as - also termed index event bias or collider bias - whereby selection on
disease status can induce associations between causes of incidence with pr
ognosis. A current method for adjusting genetic associations for this bias
assumes there is no genetic correlation between incidence and prognosis\,
which may not be a plausible assumption.\n\nMethods: We propose an altern
ative\, the ‘Slope-Hunter’ approach\, which is unbiased even when ther
e is genetic correlation between incidence and prognosis. Our approach has
two stages. First\, we use cluster-based techniques to identify: variants
affecting neither incidence nor prognosis (these should not suffer bias a
nd only a random sub-sample of them are retained in the analysis)\; varian
ts affecting prognosis only (excluded from the analysis). Second\, we fit
a cluster-based model to identify the class of variants only affecting inc
idence\, and use this class to estimate the adjustment factor. The underly
ing assumption of our approach is that variants affecting o
nly incidence explain more variation in incidence than any group of varian
ts with unique effects\, e.g. via the same exposure\, on both incidence and pr
ognosis.\n\nResults: Simulation studies showed that our ap
proach eliminates the bias and outperforms alternatives in the presence of
genetic correlation\, and performs as well as alternatives under no genet
ic correlation when its assumption is satisfied. We applied the ‘Slope-H
unter’ method to a study of fasting blood insulin levels (FI) conditiona
l on body mass index (BMI)\, estimated the index event bias\, and adjusted
conditional associations of the lead variants with FI. Our estimates sugg
ested that there were common causes of BMI and FI of concordant directions
of effect\, in line with the previously observed association between
obesity and insulin resistance.\n\nConclusions: Our approach is unbiased
even in the presence of genetic correlation between incidence and progres
sion when the underlying assumptions hold. Bias-adjusting methods should b
e used to carry out causal analyses when conditioning on incidence.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/9/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Yanchun Bao (University of Essex)
DTSTART;VALUE=DATE-TIME:20201217T140000Z
DTEND;VALUE=DATE-TIME:20201217T150000Z
DTSTAMP;VALUE=DATE-TIME:20231209T114959Z
UID:Essex-DataScience/10
DESCRIPTION:Title: Estimating mode effects from a sequential mixed-modes experimen
t\nby Yanchun Bao (University of Essex) as part of (ED-3S) Essex Data
Science Seminar Series\n\n\nAbstract\nThe large-scale household panel stud
y Understanding Society (The U.K. Household Longitudinal Study UKHLS) has\
, until recently\, used interviewers to administer its questionnaires\, bu
t is now in the process of allowing individuals to participate using the w
eb. Survey data are known to be affected by survey mode so a sequential mo
de-effects experiment was carried out to evaluate the impact of this ch
ange on the panel. In this talk we present a novel estimator and analysis
strategy to quantify the impact of mode across a wide range of variables\,
with large mode effects on the covariance of a pair of variables used to
indicate an increased risk that statistical analyses involving this pair w
ill be affected.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/10/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Rafal Kulakowski (University of Essex)
DTSTART;VALUE=DATE-TIME:20210204T140000Z
DTEND;VALUE=DATE-TIME:20210204T150000Z
DTSTAMP;VALUE=DATE-TIME:20231209T114959Z
UID:Essex-DataScience/11
DESCRIPTION:by Rafal Kulakowski (University of Essex) as part of (ED-3S) E
ssex Data Science Seminar Series\n\nAbstract: TBA\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/11/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Yassir Rabhi (University of Essex)
DTSTART;VALUE=DATE-TIME:20201210T140000Z
DTEND;VALUE=DATE-TIME:20201210T150000Z
DTSTAMP;VALUE=DATE-TIME:20231209T114959Z
UID:Essex-DataScience/12
DESCRIPTION:Title: Copulas and measures of dependence under length-biased sampling
and informative censoring\nby Yassir Rabhi (University of Essex) as p
art of (ED-3S) Essex Data Science Seminar Series\n\n\nAbstract\nLength-bia
sed data are often encountered in cross-sectional surveys and prevalent-co
hort studies on disease durations. Under length-biased sampling subjects w
ith longer disease durations have a greater chance of being observed. As a resu
lt\, covariate values linked to the longer survivors are favoured by the s
ampling mechanism. When the sampled durations are also subject to right ce
nsoring\, the censoring is informative. Modelling dependence structure wit
hout adjusting for these issues leads to biased results. In this talk\, I
will present a study on copulas for modelling dependence when the collecte
d data are length-biased and account for both informative censoring and co
variate bias. I will address the nonparametric estimation of the bivariate
distribution\, copula function and its density\, and Kendall and Spearman
measures for right-censored length-biased data. The proposed estimator of
the bivariate CDF is a Hadamard-differentiable functional of two MLEs\, K
aplan-Meier and empirical CDF\, and inherits their efficiencies. Based on
this estimator\, we devise estimators for copula function and a local-poly
nomial estimator for copula density that accounts for boundary bias. In ad
dition\, I will introduce estimators for Kendall and Spearman measures. Th
e weak convergence of the estimators will also be discussed. The proposed
method is then applied to analyse a set of right-censored length-biased da
ta on survival with dementia\, collected as part of a nationwide study in
Canada.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/12/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Carolin Strobl (Universität Zürich)
DTSTART;VALUE=DATE-TIME:20201119T140000Z
DTEND;VALUE=DATE-TIME:20201119T150000Z
DTSTAMP;VALUE=DATE-TIME:20231209T114959Z
UID:Essex-DataScience/13
DESCRIPTION:Title: A Statistician’s Botanical Garden - The Ideas behind Trees\,
Model-Based Trees and Random Forests\nby Carolin Strobl (Universität
Zürich) as part of (ED-3S) Essex Data Science Seminar Series\n\n\nAbstrac
t\nClassification and regression trees\, model-based trees and random fore
sts are powerful statistical methods from the field of machine learning. T
hey have been shown to achieve a high prediction accuracy\, especially in
big data applications with many predictor variables and complex associatio
n patterns (such as nonlinear and higher-order interaction effects). While
individual trees are easy to interpret\, random forests are "black box" p
rediction methods. They do\, however\, provide variable importance measure
s\, that are being used to judge the relevance of the individual predictor
variables. The aim of this presentation is to introduce the rationale beh
ind trees\, model-based trees and random forests\, to illustrate their pot
ential for high-dimensional data exploration\, e.g.\, in psychological res
earch\, but also to point out limitations and potential pitfalls in their
practical application.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/13/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Shenggang Hu (University of Essex)
DTSTART;VALUE=DATE-TIME:20221013T130000Z
DTEND;VALUE=DATE-TIME:20221013T140000Z
DTSTAMP;VALUE=DATE-TIME:20231209T114959Z
UID:Essex-DataScience/14
DESCRIPTION:Title: Statistical disaggregation - a Monte Carlo approach for imputat
ion under constraints\nby Shenggang Hu (University of Essex) as part o
f (ED-3S) Essex Data Science Seminar Series\n\nLecture held in NTC.1.04.\n
\nAbstract\nStatistical disaggregation has become more and more important
for smart energy systems. A typical example of such disaggregation problem
s is to learn energy consumption for a higher resolution level (data recor
ded at higher frequency) based on data at a lower resolution (data recorde
d at lower frequency). Constrained models are often used in such problems
and they are often very useful compared to their unconstrained counterpart
s in terms of reducing uncertainty and leading to an improvement of the ov
erall performance. However\, these constrained models usually are not expr
essible as ordinary distributions due to their intractable density functio
ns which makes it hard to conduct further analysis. This paper introduces
a novel constrained Monte Carlo sampling algorithm based on Langevin diffu
sions and rejection sampling to solve the problem of sampling from constra
ined models. This new method is then applied to a statistical disaggregati
on problem for an electricity consumption dataset. Our approach provides
excellent accuracy of data imputation\, based on our simulation studies an
d data analysis. The new method is also justified theoretically.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/14/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Prof Christian Martin Hennig (University of Bologna\, UCL)
DTSTART;VALUE=DATE-TIME:20221103T140000Z
DTEND;VALUE=DATE-TIME:20221103T150000Z
DTSTAMP;VALUE=DATE-TIME:20231209T114959Z
UID:Essex-DataScience/15
DESCRIPTION:Title: Advances in using cluster analysis for species delimitation
\nby Prof Christian Martin Hennig (University of Bologna\, UCL) as part of
(ED-3S) Essex Data Science Seminar Series\n\nLecture held in STEM 3.1.\n\
nAbstract\nBiological species are often delimited based on genetic multilo
cus data using methods for inferring phylogenetic trees or model- or dista
nce-based cluster analysis. A major problem here is that genetic dissimila
rity does not only arise from separated species\, but also if subpopulatio
ns of a species live in geographically distant areas without genetic excha
nge. In any case\, be it using partitioning cluster analysis or hierarchic
al trees\, it is a hard problem to decide the number of species\, and whet
her groups that are candidates for being species actually belong together.
I will discuss the use of some new approaches for clustering and est
imating the number of clusters for this problem\, focusing particularly on
testing whether observed genetic heterogeneity within a species candidate
group can be explained by geographical distance rather than consisting of
separate species. This requires hypothesis testing in a distance-distance
regression model. I will also discuss the integration of such a testing r
outine in a fully automated method for species delimitation.\n\nReference\
n\nHausdorf\, B\, Hennig\, C. Species delimitation and geography. Mol Ecol
Resour. 2020\; 20: 950–960. https://doi.org/10.1111/1755-0998.13184\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/15/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Dr Johan van der Molen (University of Cambridge)
DTSTART;VALUE=DATE-TIME:20221124T140000Z
DTEND;VALUE=DATE-TIME:20221124T150000Z
DTSTAMP;VALUE=DATE-TIME:20231209T114959Z
UID:Essex-DataScience/16
DESCRIPTION:Title: Dirichlet process mixture inconsistency for the number of compo
nents: how worried should we be in practice?\nby Dr Johan van der Mole
n (University of Cambridge) as part of (ED-3S) Essex Data Science Seminar
Series\n\nLecture held in STEM 3.1.\n\nAbstract\nBayesian nonparametric mi
xture models are widely used for model-based clustering due to their flexi
bility and conceptual simplicity\, as well as the availability of efficie
nt sampling methods for performing inference. However\, recent work has es
tablished that such models have undesirable asymptotic properties regardin
g the estimation of the number of clusters. For instance\, Dirichlet Proce
ss Mixtures (DPMs) have been shown to be inconsistent for the number of cl
usters\, and overestimation of the number of clusters has been observed in
practice for finite samples. Finite mixtures with a prior on the number o
f components - also known as Mixtures of Finite Mixtures (MFMs) - have bee
n suggested as an asymptotically consistent alternative\, but the effects
of model misspecification can still result in asymptotic inconsistency a
nd poor estimation of the number of clusters in practice. \n\nHere we spec
ifically focus on estimation of the number of clusters in Bayesian nonpara
metric mixtures in practice\, including the impact of Markov chain Monte C
arlo (MCMC) post-processing algorithms for summarisation and identificatio
n of a final representative summary clustering. We consider practical scen
arios of low to moderate dimension\, through both simulation studies and a
pplications to real biomolecular data. In the situations we consider\, we
confirm that even when the parametric form of the mixture component distri
butions is correctly specified\, DPMs lead to mild overestimation of the n
umber of clusters for finite samples. However\, we also demonstrate that t
his can be corrected by common summarisation methods\, suggesting that app
lications of DPMs in practice may be more robust than the theory might sug
gest. We show that\, for both DPMs and MFMs\, mixture component density mi
sspecification typically leads to more dramatic overestimation\, with DPMs
providing slightly worse estimates than MFMs\, but with the common patter
n of “true” clusters in the data being split into smaller subclusters
due to additional mixture components being required to flexibly capture fe
atures of the data inadequately described by the misspecified models. We c
onsider implications for high-dimensional data analysis\, in which simplif
ying assumptions that are commonly made in practice for computational trac
tability (e.g. assuming a diagonal covariance matrix for Gaussian mixture
components) are also expected to result in model misspecification. As part
of our work\, we compare popular MCMC post-processing algorithms for iden
tifying a final summary clustering\, and show that although some of them h
ave a positive impact on results\, others can introduce severe overestimat
ion of the number of clusters\, even when the underlying posterior distrib
ution from which samples are being drawn is centred on the true number of
clusters. This is joint work with Yannis Chaumeny\, Paul Kirk\, Anthony Da
vidson.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/16/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Dr Alexei Vernitski (University of Essex)
DTSTART;VALUE=DATE-TIME:20221027T130000Z
DTEND;VALUE=DATE-TIME:20221027T140000Z
DTSTAMP;VALUE=DATE-TIME:20231209T114959Z
UID:Essex-DataScience/17
DESCRIPTION:Title: Using machine learning to solve mathematical problems and to se
arch for examples and counterexamples in pure maths research\nby Dr Al
exei Vernitski (University of Essex) as part of (ED-3S) Essex Data Science
Seminar Series\n\nLecture held in STEM 3.1.\n\nAbstract\nOur recent resea
rch can be generally described as applying state-of-the-art technologies o
f machine learning to suitable mathematical problems. We use both reinforc
ement learning and supervised learning (underpinned by deep learning). As
to mathematical problems we consider\, they include learning to untangle a
braid (this problem is not unlike the problem of solving the Rubik's cube)\
, learning to find the parity of a permutation (as compared to the classic
al problem of deep learning of learning to find the parity bit of a binary
array)\, comparing mathematical mistakes made by artificial intelligence
with those made by human mathematicians\, etc.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/17/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Qiuyi Hong (University of Essex)
DTSTART;VALUE=DATE-TIME:20221117T140000Z
DTEND;VALUE=DATE-TIME:20221117T150000Z
DTSTAMP;VALUE=DATE-TIME:20231209T114959Z
UID:Essex-DataScience/18
DESCRIPTION:Title: A Bilevel Game-Theoretic Decision-Making Framework for Strategic
Retailers in Both Local and Wholesale Electricity Markets\nby Qiuyi H
ong (University of Essex) as part of (ED-3S) Essex Data Science Seminar Se
ries\n\nLecture held in STEM 3.1.\n\nAbstract\nIn this talk we propose a b
ilevel game-theoretic model for multiple strategic retailers participating
in both wholesale and local electricity markets while considering custome
rs’ switching behaviours. At the upper level\, each retailer maximizes i
ts own profit by making optimal offering decisions in the retail market an
d bidding decisions in the day-ahead wholesale (DAW) and local power excha
nge (LPE) markets. The interaction among multiple strategic retailers is f
ormulated using the Bertrand competition model. For the lower level\, ther
e are three optimisation problems. First\, the customers’ welfare maximi
sation problem with their switching behaviors is formulated to capture the
demand responses from customers. Second\, a market-clearing problem is fo
rmulated for the independent system operator (ISO) in the DAW market. Thir
d\, a novel LPE market is developed for retailers to facilitate their powe
r balancing. In addition\, the bilevel multi-leader multi-follower Stackel
berg game forms an equilibrium problem with equilibrium constraints (EPEC)
problem\, which is solved by the diagonalization algorithm. Numerical res
ults demonstrate the feasibility and effectiveness of the EPEC model and t
he importance of modeling customers’ switching behaviors. We corroborate
that incentivising customers’ switching behaviors and increasing the nu
mber of retailers facilitates retail competition\, which results in reduci
ng strategic retailers’ retail prices and profits. Moreover\, the relati
onship between customers’ switching behaviors and welfare is reflected b
y a balance between the electricity purchasing cost (i.e.\, electricity pr
ice) and the electricity consumption level.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/18/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Mateo Salles (University of Essex)
DTSTART;VALUE=DATE-TIME:20230209T140000Z
DTEND;VALUE=DATE-TIME:20230209T150000Z
DTSTAMP;VALUE=DATE-TIME:20231209T114959Z
UID:Essex-DataScience/19
DESCRIPTION:Title: Supervised Learning for Untangling Braids\nby Mateo Salles
(University of Essex) as part of (ED-3S) Essex Data Science Seminar Series
\n\nLecture held in STEM 3.1.\n\nAbstract\nUntangling a braid is a typical
multi-step process\, and reinforcement learning can be used to train an a
gent to untangle braids. Here we present another approach. Starting from t
he untangled braid\, we produce a dataset of braids using breadth-first se
arch and then apply behavioral cloning to train an agent on the output of
this search. As a result\, the (inverses of) steps predicted by the agent
turn out to be an unexpectedly good method of untangling braids\, includin
g those braids which did not feature in the dataset.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/19/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Dr. Peng Liu (University of Kent)
DTSTART;VALUE=DATE-TIME:20230504T130000Z
DTEND;VALUE=DATE-TIME:20230504T140000Z
DTSTAMP;VALUE=DATE-TIME:20231209T114959Z
UID:Essex-DataScience/20
DESCRIPTION:Title: Optimal Smooth Approximation for Quantile Matrix Factorisation
\nby Dr. Peng Liu (University of Kent) as part of (ED-3S) Essex Data Sc
ience Seminar Series\n\nLecture held in STEM 3.1.\n\nAbstract\nMatrix Fact
orisation (MF) is essential to many estimation tasks. Most existing matrix
factorisation methods focus on least squares matrix factorisation (LSMF)\
, which aims to minimise a smooth L2 loss between observations and their d
ependent matrix measurement variables. In reality\, however\, L1 loss and
check loss are widely used in regression to deal with outliers or observat
ions contaminated by skewed or heavy-tailed noise. Although under certain
conditions\, linear convergence to the global optimality can be establishe
d for matrix factorisation under the L2 loss\, there is a lack of provably
efficient algorithms for solving matrix factorisation under non-smooth lo
sses. In this paper\, we investigate Quantile Matrix Factorization (QMF)\,
the counterpart of Quantile Regression in matrix estimation\, that adopts
a tunable check loss and introduces robustness to matrix estimation for s
kewed and heavy tailed observations\, which are prevalent in reality. To d
eal with the non-smooth loss\, we propose Nesterov smoothed QMF (NsQMF)\,
extending Nesterov’s optimal smooth approximation technique to the matri
x factorisation setting. We then present an alternating minimization algor
ithm to solve the smooth NsQMF efficiently. We mathematically prove that s
olving the smoothed NsQMF is equivalent to solving the original non-smooth
QMF problem and that our proposed algorithm achieves linear convergence t
o the global optimality of QMF. Numerical evaluations verify our theoretic
al findings and demonstrate that NsQMF significantly outperforms the commo
nly used LSMF and prior approximate smoothing heuristics for QMF under var
ious noise distributions.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/20/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Dr. Xiaochuan Yang (University of Brunel)
DTSTART;VALUE=DATE-TIME:20230525T130000Z
DTEND;VALUE=DATE-TIME:20230525T140000Z
DTSTAMP;VALUE=DATE-TIME:20231209T114959Z
UID:Essex-DataScience/21
DESCRIPTION:Title: Some recent progress in random geometric graphs: beyond the sta
ndard regimes\nby Dr. Xiaochuan Yang (University of Brunel) as part of
(ED-3S) Essex Data Science Seminar Series\n\nLecture held in STEM 3.1.\n\
nAbstract\nI will survey some recent joint works with Mathew Penrose (Bath
) on the cluster structure of random geometric graphs in a regime that is
less discussed in the literature. The statistics of interest include the
number of k-components\, the number of components\, the number of vertice
s in the giant component\, and the connectivity threshold. We show LLN and
normal/Poisson approximation by Stein's method.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/21/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Dr. Yufei Zhang (London School of Economics & Political Science)
DTSTART;VALUE=DATE-TIME:20230511T130000Z
DTEND;VALUE=DATE-TIME:20230511T140000Z
DTSTAMP;VALUE=DATE-TIME:20231209T114959Z
UID:Essex-DataScience/22
DESCRIPTION:Title: Exploration-exploitation trade-off for continuous-time reinforc
ement learning\nby Dr. Yufei Zhang (London School of Economics & Polit
ical Science) as part of (ED-3S) Essex Data Science Seminar Series\n\nLect
ure held in STEM 3.1.\n\nAbstract\nRecently\, reinforcement learning (RL)
has attracted substantial research interest. Much of the attention and su
ccess\, however\, has been for the discrete-time setting. Continuous-time
RL\, despite its natural analytical connection to stochastic controls\, ha
s been largely unexplored\, with limited progress. In particular\, chara
cterising sample efficiency for continuous-time RL algorithms remains a ch
allenging and open problem.\n\nIn this talk\, we develop a framework to an
alyse model-based reinforcement learning in the episodic setting. We then
apply it to optimise exploration-exploitation trade-off for linear-convex
RL problems\, and report sublinear (or even logarithmic) regret bounds for
a class of learning algorithms inspired by filtering theory. The approach
is probabilistic\, involving analysing learning efficiency using concentr
ation inequalities for correlated continuous-time observations\, and apply
ing stochastic control theory to quantify the performance gap between appl
ying greedy policies derived from estimated and true models.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/22/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Prof. Chenggui Yuan (Swansea University)
DTSTART;VALUE=DATE-TIME:20230601T130000Z
DTEND;VALUE=DATE-TIME:20230601T140000Z
DTSTAMP;VALUE=DATE-TIME:20231209T114959Z
UID:Essex-DataScience/24
DESCRIPTION:Title: Numerical solutions of SDEs with irregular coefficients\nby
Prof. Chenggui Yuan (Swansea University) as part of (ED-3S) Essex Data Sc
ience Seminar Series\n\nLecture held in STEM 3.1.\n\nAbstract\nStochastic
differential equations (SDEs) with irregular coefficients have been widely
studied. In this talk\, I will discuss the strong convergence and the we
ak convergence of SDEs with irregular coefficients. The convergence rate
will be investigated under different irregular conditions on coefficients.
\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/24/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Dr. Robert Gaunt (The University of Manchester)
DTSTART;VALUE=DATE-TIME:20230615T130000Z
DTEND;VALUE=DATE-TIME:20230615T140000Z
DTSTAMP;VALUE=DATE-TIME:20231209T114959Z
UID:Essex-DataScience/25
DESCRIPTION:Title: Normal approximation for the posterior in exponential families
\nby Dr. Robert Gaunt (The University of Manchester) as part of (ED-3S)
Essex Data Science Seminar Series\n\nLecture held in STEM 3.1.\n\nAbstrac
t\nIn this talk I'll introduce quantitative Bernstein-von Mises type bound
s on the normal approximation of the posterior distribution in exponential
family models when centering either around the posterior mode or around t
he maximum likelihood estimator. Our bounds\, obtained through a version o
f Stein’s method\, are non-asymptotic\, and data dependent\; they are of
the correct order both in the total variation and Wasserstein distances\,
as well as for approximations for expectations of smooth functions of the
posterior. All our results are valid for univariate and multivariate post
eriors alike\, and do not require a conjugate prior setting. We illustrate
our findings on a variety of exponential family distributions\, including
Poisson\, multinomial and normal distribution with unknown mean and varia
nce. The resulting bounds have an explicit dependence on the prior distrib
ution and on sufficient statistics of the data from the sample\, and thus
provide insight into how these factors may affect the quality of the norma
l approximation.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/25/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Dr. Arthur Maheo (Amazon)
DTSTART;VALUE=DATE-TIME:20230622T130000Z
DTEND;VALUE=DATE-TIME:20230622T140000Z
DTSTAMP;VALUE=DATE-TIME:20231209T114959Z
UID:Essex-DataScience/26
DESCRIPTION:Title: Benders decomposition for public transportation\nby Dr. Art
hur Maheo (Amazon) as part of (ED-3S) Essex Data Science Seminar Series\n\
nLecture held in STEM 3.1.\n\nAbstract\nCanberra (Australia) wants to desi
gn a transportation network combining high-frequency buses with on-demand
taxis. The resulting hub-and-shuttle network design problem is a large\, d
ifficult mixed-integer program. We identified how to decompose the problem
– design first\, route second – and used a modern Benders decompositi
on on the resulting formulation.\nThis new approach is orders of magnitude
faster\, allowing us to solve full instances where a standard approach ca
n only do small ones.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/26/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Prof Boris Mirkin (National Research University Higher School of E
conomics)
DTSTART;VALUE=DATE-TIME:20231006T120000Z
DTEND;VALUE=DATE-TIME:20231006T130000Z
DTSTAMP;VALUE=DATE-TIME:20231209T114959Z
UID:Essex-DataScience/27
DESCRIPTION:Title: Anomalous clustering at various data formats\nby Prof Boris
Mirkin (National Research University Higher School of Economics) as part
of (ED-3S) Essex Data Science Seminar Series\n\nLecture held in 1N1.4.1.\n
\nAbstract\nAnomalous clustering is a method for extracting clusters one-b
y-one. It is an extension of the Principal Component Analysis method to z
ero-one matrix factorization settings. After a brief overview of various v
ersions of the method\, including its extensions to similarity data\, sp
atial data\, and fuzzy clustering\, I am going to concentrate on a most r
ecent development\, a triple-stage application of the approach to the anal
ysis of spatial-temporal patterns in a coastal oceanic phenomenon of upwel
ling (see Nascimento et al. 2023).\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/27/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Dr Jin Zhu (LSE)
DTSTART;VALUE=DATE-TIME:20231019T130000Z
DTEND;VALUE=DATE-TIME:20231019T140000Z
DTSTAMP;VALUE=DATE-TIME:20231209T114959Z
UID:Essex-DataScience/28
DESCRIPTION:Title: A Tuning-Free Algorithm for Sparsity-Constraint Optimization\nby Dr Jin Zhu (LSE) as part of (ED-3S) Essex Data Science Seminar Serie
s\n\nLecture held in STEM 3.1.\n\nAbstract\nSparsity-constraint optimizati
on has wide applicability in signal processing\, statistics\, and machine
learning. Existing fast algorithms must burdensomely tune parameters\, suc
h as the step size or the implementation of precise stop criteria\, which
may be challenging to determine in practice. To address this issue\, we de
velop an algorithm named sparsity-constraint optimization via splicing ite
ration (SCOPE) to optimize nonlinear differentiable objective functions with
strong convexity and smoothness in low dimensional subspaces. Algorithmic
ally\, the SCOPE algorithm converges effectively without tuning parameters
. Theoretically\, SCOPE has a linear convergence rate and converges to a s
olution that recovers the true support set when it correctly specifies the
sparsity. We also develop parallel theoretical results without restricted
-isometry-property-type conditions. We apply SCOPE’s versatility and pow
er to solve sparse quadratic optimization\, learn sparse classifiers\, and
recover sparse Markov networks for binary variables. The numerical result
s on these specific tasks reveal that SCOPE perfectly identifies the true
support set with a 10–1000x speedup over the standard exact solver\, conf
irming SCOPE’s algorithmic and theoretical merits. Our open-source Pytho
n package scope based on C++ implementation is publicly available on GitHu
b\, reaching a ten-fold speedup over the competing convex relaxation methods
implemented by the cvxpy library.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/28/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Dr Shenggang Hu (University of Warwick)
DTSTART;VALUE=DATE-TIME:20231026T130000Z
DTEND;VALUE=DATE-TIME:20231026T140000Z
DTSTAMP;VALUE=DATE-TIME:20231209T114959Z
UID:Essex-DataScience/29
DESCRIPTION:Title: Differential Privacy of Bayesian Posterior under Contamination
\nby Dr Shenggang Hu (University of Warwick) as part of (ED-3S) Essex D
ata Science Seminar Series\n\nLecture held in STEM 3.1.\n\nAbstract\nIn re
cent years\, differential privacy has been adopted by tech-companies and g
overnmental agencies as the standard for measuring privacy in algorithms.
We study the level of differential privacy in Bayesian posterior sampling
setups. As opposed to the common privatization approach of injecting Lapla
ce/Gaussian noise into the output\, Huber's contamination model is conside
red\, where we replace at random the data points with samples from a heavy
-tailed distribution. The derived bound for the differential privacy level
in our approach matches the existing literature while lifting the restric
tion on bounded observation space. We further consider the effect of sampl
e size on privacy level and conclude that asymptotically the contamination
approach is fully private at no cost of information loss.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/29/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Prof Wolfgang Hardle (Humboldt-Universität zu Berlin\, Germany)
DTSTART;VALUE=DATE-TIME:20240118T140000Z
DTEND;VALUE=DATE-TIME:20240118T150000Z
DTSTAMP;VALUE=DATE-TIME:20231209T114959Z
UID:Essex-DataScience/30
DESCRIPTION:Title: Data Science in a Math-Less Digital Society\nby Prof Wolfga
ng Hardle (Humboldt-Universität zu Berlin\, Germany) as part of (ED-3S) E
ssex Data Science Seminar Series\n\nLecture held in STEM 3.1.\n\nAbstract\
nIn an increasingly digital and data-driven world\, the importance of data
science cannot be overstated. Data science\, by itself\, carries a "push
to analyse" button\, though\, that lets the analyst forget about the "
math behind the machine learning tools".\n\nWe cover a few examples\, whe
re data science needs math in order to be understood and applied.\n\nBy th
e end of this talk\, attendees will gain a fresh perspective on data scien
ce's role in a math-less digital society. They will leave with practical i
nsights\, tools\, and strategies to leverage data effectively\, fostering
a culture of data-driven decision-making that transcends mathematical barr
iers.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/30/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Dr Dimitra Kosta (University of Edinburgh)
DTSTART;VALUE=DATE-TIME:20231123T134500Z
DTEND;VALUE=DATE-TIME:20231123T144500Z
DTSTAMP;VALUE=DATE-TIME:20231209T114959Z
UID:Essex-DataScience/31
DESCRIPTION:Title: Maximum likelihood estimation of toric Fano varieties\nby D
r Dimitra Kosta (University of Edinburgh) as part of (ED-3S) Essex Data Sc
ience Seminar Series\n\nLecture held in Zoom.\n\nAbstract\nI will talk abo
ut the maximum likelihood estimation problem for several classes of toric
Fano models. I will start by exploring the maximum likelihood degree for a
ll 2-dimensional Gorenstein toric Fano varieties. I will show that the ML
degree is equal to the degree of the surface in every case except for the
quintic del Pezzo surface with two ordinary double points and provide expl
icit expressions that allow one to compute the maximum likelihood estimate
in closed form whenever the ML degree is less than 5. I will explore the
reasons for the ML degree drop using A-discriminants and intersection theo
ry. If there is time\, I will discuss toric Fano varieties associate
d to 3-valent phylogenetic trees and their ML degree.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/31/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Dr Xiaochun Meng (Sussex)
DTSTART;VALUE=DATE-TIME:20240125T140000Z
DTEND;VALUE=DATE-TIME:20240125T150000Z
DTSTAMP;VALUE=DATE-TIME:20231209T114959Z
UID:Essex-DataScience/32
DESCRIPTION:by Dr Xiaochun Meng (Sussex) as part of (ED-3S) Essex Data Sci
ence Seminar Series\n\nLecture held in STEM 3.1.\nAbstract: TBA\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/32/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Dr Richard Mann (Leeds)
DTSTART;VALUE=DATE-TIME:20240201T140000Z
DTEND;VALUE=DATE-TIME:20240201T150000Z
DTSTAMP;VALUE=DATE-TIME:20231209T114959Z
UID:Essex-DataScience/33
DESCRIPTION:by Dr Richard Mann (Leeds) as part of (ED-3S) Essex Data Scien
ce Seminar Series\n\nLecture held in STEM 3.1.\nAbstract: TBA\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/33/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Dr Jinyu Tian (Macau University of Science and Technology)
DTSTART;VALUE=DATE-TIME:20231214T140000Z
DTEND;VALUE=DATE-TIME:20231214T150000Z
DTSTAMP;VALUE=DATE-TIME:20231209T114959Z
UID:Essex-DataScience/34
DESCRIPTION:by Dr Jinyu Tian (Macau University of Science and Technology)
as part of (ED-3S) Essex Data Science Seminar Series\n\nLecture held in ST
EM 3.1.\nAbstract: TBA\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/34/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Dr Hong Duong (University of Birmingham)
DTSTART;VALUE=DATE-TIME:20231130T140000Z
DTEND;VALUE=DATE-TIME:20231130T150000Z
DTSTAMP;VALUE=DATE-TIME:20231209T114959Z
UID:Essex-DataScience/35
DESCRIPTION:Title: Model Reduction of Complex Systems\nby Dr Hong Duong (Unive
rsity of Birmingham) as part of (ED-3S) Essex Data Science Seminar Series\
n\nLecture held in STEM 3.1.\n\nAbstract\nComplex systems in nature and in
applications (such as molecular systems\, crowd dynamics\, swarming\, opi
nion formation\, just to name a few) are often described by systems of sto
chastic differential equations (SDEs) and partial differential equations (
PDEs). It is often analytically impossible or computationally prohibitivel
y expensive to deal with the full models due to their high dimensionality
(degrees of freedom\, number of involved parameters\, etc.). It is thus of
great importance to approximate such large and complex systems by simpler
and lower dimensional ones\, while still preserving the essential informa
tion from the original model. This procedure is referred to as model reduc
tion or coarse-graining in the literature. In this talk\, I will present m
ethods for qualitative and quantitative coarse-graining of several SDEs an
d PDEs\, in the presence or absence of a scale-separation.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/35/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Dr Yuyu Chen (University of Melbourne)
DTSTART;VALUE=DATE-TIME:20231116T130000Z
DTEND;VALUE=DATE-TIME:20231116T140000Z
DTSTAMP;VALUE=DATE-TIME:20231209T114959Z
UID:Essex-DataScience/36
DESCRIPTION:Title: Diversification of infinite-mean Pareto distributions\nby D
r Yuyu Chen (University of Melbourne) as part of (ED-3S) Essex Data Scienc
e Seminar Series\n\nLecture held in Zoom.\n\nAbstract\nWe show the perhaps
surprising inequality that the weighted average of negatively dependent s
uper-Pareto random variables\, possibly caused by triggering events\, is l
arger than one such random variable in the sense of first-order stochastic
dominance. The class of super-Pareto distributions is extremely heavy-tai
led and it includes the class of infinite-mean Pareto distributions. We di
scuss several implications of this result via an equilibrium analysis in a
risk exchange market. First\, diversification of super-Pareto losses incr
eases portfolio risk\, and thus a diversification penalty exists. Second\,
agents with super-Pareto losses will not share risks in a market equilibr
ium. Third\, transferring losses from agents bearing super-Pareto losses t
o external parties without any losses may arrive at an equilibrium which b
enefits every party involved. The empirical studies show that our new ineq
uality can be observed empirically for real datasets that fit well with ex
tremely heavy tails.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/36/
END:VEVENT
END:VCALENDAR