BEGIN:VCALENDAR
VERSION:2.0
PRODID:researchseminars.org
CALSCALE:GREGORIAN
X-WR-CALNAME:researchseminars.org
BEGIN:VEVENT
SUMMARY:Matthew Lee (University of Bristol)
DTSTART;VALUE=DATE-TIME:20201015T130000Z
DTEND;VALUE=DATE-TIME:20201015T140000Z
DTSTAMP;VALUE=DATE-TIME:20240614T055001Z
UID:Essex-DataScience/1
DESCRIPTION:Title: EpiViz: an implementation of Circos plots for epidemiologists\nby Matthew Lee (University of Bristol) as part of (ED-3S) Essex Data Sc
ience Seminar Series\n\n\nAbstract\nEpidemiology studies predominantly foc
us on single exposure and single outcome associations. However\, biologica
l pathways involve numerous processes and identifying meaningful intermedi
ate associations that can be taken forward for further analysis is complex
. This is particularly the case for studies involving metabolomics data\,
as effects rarely occur in isolation. Gaining a global overview of hundreds
of exposure/outcome associations may therefore aid downstream analyses. Vi
sual inspection is one of the main modes of understanding global exposure/
outcome associations. EpiViz is a wrapper that makes producing Circos plot
s simple and efficient for those new to programming and data visualisation
.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/1/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Godwin Osuntoki (University of Essex)
DTSTART;VALUE=DATE-TIME:20201022T130000Z
DTEND;VALUE=DATE-TIME:20201022T140000Z
DTSTAMP;VALUE=DATE-TIME:20240614T055001Z
UID:Essex-DataScience/2
DESCRIPTION:Title: Bayesian Analysis of chromosomal interactions in Hi-C data using
the hidden Markov random field model\nby Godwin Osuntoki (University
of Essex) as part of (ED-3S) Essex Data Science Seminar Series\n\n\nAbstra
ct\nThere are different biological methods that have been developed over t
he years for analysis of the 3D structure of the DNA. Few computational an
d statistical methods have\, however\, been developed to analyse data gen
erated using the Hi-C method. We follow statistical methodology to explore
the Hi-C data. The Hi-C data is well suited to be analyzed using a finite
mixture model. The Potts model\, a hidden Markov random field model\, was
employed to analyze the hidden (latent) components. The hidden components
through the Potts model can be categorized into k components (k = 2\,3…
\,K). Using the Metropolis-within-Gibbs approach to analyze the data\, the
proposed method was able to detect interactions (short and long range) an
d loops. A large part of the significant interactions that we detect are f
ound within Topologically Associating Domains\, which are among the 3D struct
ures known to occur in Hi-C data.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/2/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Nosheen Faiz (University of Essex)
DTSTART;VALUE=DATE-TIME:20201105T140000Z
DTEND;VALUE=DATE-TIME:20201105T150000Z
DTSTAMP;VALUE=DATE-TIME:20240614T055001Z
UID:Essex-DataScience/4
DESCRIPTION:Title: Assessing how feature selection and hyper-parameters influence o
ptimal trees ensemble and random projection\nby Nosheen Faiz (Universi
ty of Essex) as part of (ED-3S) Essex Data Science Seminar Series\n\n\nAbs
tract\nOur work investigates the effect of feature selection on three meth
ods: Random Forest (Breiman 2001)\, Optimal Trees Ensemble (Khan et al 201
6) and Random Projection (Canning and Samworth 2017) in high dimensional s
ettings. To this end\, LASSO has been considered for selecting the most im
portant features based on training data for dimension reduction. Additiona
lly\, the influence of various hyper-parameters regulating the three metho
ds has also been assessed. Analysis on several benchmark datasets is given
to illustrate the phenomena. The results reveal that feature selection im
proves the predictive performance of the Random Forest and Random Projecti
on methods in addition to reducing the computational burden. The performan
ce of Optimal Trees Ensemble is less influenced by feature selection.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/4/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Peng Liu (University of Essex)
DTSTART;VALUE=DATE-TIME:20201112T140000Z
DTEND;VALUE=DATE-TIME:20201112T150000Z
DTSTAMP;VALUE=DATE-TIME:20240614T055001Z
UID:Essex-DataScience/5
DESCRIPTION:Title: Ordering and Inequalities for Mixtures on Risk Aggregation\n
by Peng Liu (University of Essex) as part of (ED-3S) Essex Data Science Se
minar Series\n\n\nAbstract\nAggregation sets\, which represent model uncer
tainty due to unknown dependence\, are an important object in the study of
robust risk aggregation. In this talk\, we investigate ordering relations
between two aggregation sets for which the sets of marginals are related
by two simple operations: distribution mixtures and quantile mixtures. Int
uitively\, these operations "homogenize" marginal distributions by maki
ng them similar. As a general conclusion from our results\, more "homogen
eous" marginals lead to a larger aggregation set\, and thus more severe mo
del uncertainty\, although the situation for quantile mixtures is much mor
e complicated than that for distribution mixtures. \nWe proceed to study
inequalities on the worst-case values of risk measures in risk aggregatio
n\, which represent conservative calculation of regulatory capital. Among
other results\, we obtain an order relation on VaR under quantile mixture
for marginal distributions with monotone densities. Numerical results are
presented to visualize the theoretical results and further inspire some c
onjectures.\nFinally\, we discuss the connection of our results to joint m
ixability and to merging p-values in multiple hypothesis testing.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/5/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Tolulope Fadina (University of Essex)
DTSTART;VALUE=DATE-TIME:20210225T140000Z
DTEND;VALUE=DATE-TIME:20210225T150000Z
DTSTAMP;VALUE=DATE-TIME:20240614T055001Z
UID:Essex-DataScience/6
DESCRIPTION:Title: Symmetric measures of variability induced by risk measures\n
by Tolulope Fadina (University of Essex) as part of (ED-3S) Essex Data Sci
ence Seminar Series\n\n\nAbstract\nGeneral measures of variability induced
by risk measures are investigated for their potential applications to ris
k management. We focus on the three classes of variability measures ge
nerated by the Value-at-Risk\, Expected Shortfall\, and the Expectiles. Th
eir properties are explored\, and we obtain a characterization result on g
eneral model spaces. Convergence properties and asymptotic normality of th
e empirical variability measures estimators are established. An applicatio
n of the variability measures to financial data is also investigated.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/6/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Ioana Olan (University of Cambridge)
DTSTART;VALUE=DATE-TIME:20201126T140000Z
DTEND;VALUE=DATE-TIME:20201126T150000Z
DTSTAMP;VALUE=DATE-TIME:20240614T055001Z
UID:Essex-DataScience/7
DESCRIPTION:Title: Detecting the hierarchical structure of the cell nucleus\nby
Ioana Olan (University of Cambridge) as part of (ED-3S) Essex Data Scienc
e Seminar Series\n\n\nAbstract\nChromatin consists of DNA wrapped around h
istones and forms complex three-dimensional structures within the cell nuc
leus with various degrees of compaction. Genes have been shown to be repre
ssed by their proximity to the nuclear periphery or activated by being in
contact with special regulatory regions called enhancers. Thus the relativ
e positioning of genes and their interactions with other regions are very
important in determining whether they are expressed or not. Interactions b
etween pairs of genomic regions have been studied using assays such as Hi-
C\, which generate large matrices estimating interaction frequencies. We u
se such interaction estimates as weights in a network whose nodes are equa
lly sized genomic regions and perform nested community detection in order
to resolve the relative positioning of genomic regions of interest and mod
el the interior of the cell nucleus. Our biological model is cellular sene
scence\, a phenotype associated with dramatic changes in its chromatin int
eractions network relative to normal cells. Senescence corresponds to perm
anent cell cycle arrest and has been shown to act as a protective barrier
against tumourigenesis.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/7/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Josh Bull (University of Oxford)
DTSTART;VALUE=DATE-TIME:20201203T140000Z
DTEND;VALUE=DATE-TIME:20201203T150000Z
DTSTAMP;VALUE=DATE-TIME:20240614T055001Z
UID:Essex-DataScience/8
DESCRIPTION:Title: Can maths tell us how to win at Fantasy Football?\nby Josh B
ull (University of Oxford) as part of (ED-3S) Essex Data Science Seminar S
eries\n\n\nAbstract\nFantasy Football is an online game played by millions
of people every year\, in which players attempt to predict the outcome of
football matches over the course of a season. To the surprise of everyone
(including myself)\, I was lucky enough to be crowned the winner of the 2
019-20 Fantasy Premier League\, one of the largest competitions in the UK.
As a researcher in Mathematical Oncology at the University of Oxford\, pe
ople have asked me whether I used maths to win – while I followed some s
trategies at the time\, I didn’t have any proof that they were in some s
ense mathematically optimal. However\, mathematical modelling is a tool wh
ich is capable of exploring exactly these kinds of questions: how can we i
dentify the best strategies to tackle complex problems? What types of data
are important to consider\, and how should we use them to inform our deci
sions? In this talk\, I’ll analyse how different quantitative approaches
can be used to tackle key questions in Fantasy Football\, and identify th
e strengths and weaknesses of these frameworks. Finally\, I’ll address t
he question: Can maths tell us how to win at Fantasy Football?\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/8/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Osama Mahmoud (University of Essex)
DTSTART;VALUE=DATE-TIME:20210211T140000Z
DTEND;VALUE=DATE-TIME:20210211T150000Z
DTSTAMP;VALUE=DATE-TIME:20240614T055001Z
UID:Essex-DataScience/9
DESCRIPTION:Title: Slope-Hunter: A robust method for index-event bias correction in
genome-wide association studies of conditional analyses\nby Osama Mah
moud (University of Essex) as part of (ED-3S) Essex Data Science Seminar S
eries\n\n\nAbstract\nBackground: Studying genetic associations with progno
sis (e.g. survival\, subsequent events) is problematic due to selection bi
as - also termed index event bias or collider bias - whereby selection on
disease status can induce associations between causes of incidence with pr
ognosis. A current method for adjusting genetic associations for this bias
assumes there is no genetic correlation between incidence and prognosis\,
which may not be a plausible assumption.\n\nMethods: We propose an altern
ative\, the ‘Slope-Hunter’ approach\, which is unbiased even when ther
e is genetic correlation between incidence and prognosis. Our approach has
two stages. First\, we use cluster-based techniques to identify: variants
affecting neither incidence nor prognosis (these should not suffer bias a
nd only a random sub-sample of them are retained in the analysis)\; varian
ts affecting prognosis only (excluded from the analysis). Second\, we fit
a cluster-based model to identify the class of variants only affecting inc
idence\, and use this class to estimate the adjustment factor. The underl
ying assumption of our approach is that variants affecting o
nly incidence explain more variation in incidence than any group of varian
ts with unique effects\, e.g. via same exposure\, on both incidence and pr
ognosis.\n\nResults: Simulation studies showed that our ap
proach eliminates the bias and outperforms alternatives in the presence of
genetic correlation\, and performs as well as alternatives under no genet
ic correlation when its assumption is satisfied. We applied the ‘Slope-H
unter’ method to a study of fasting blood insulin levels (FI) conditiona
l on body mass index (BMI)\, estimated the index event bias\, and adjusted
conditional associations of the lead variants with FI. Our estimates sugg
ested that there were common causes of BMI and FI of concordant directions
of effect\, in line with the previously observed association between
obesity and insulin resistance.\n\nConclusions: Our approach is unbiased
even in the presence of genetic correlation between incidence and progres
sion when the underlying assumptions hold. Bias-adjusting methods should b
e used to carry out causal analyses when conditioning on incidence.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/9/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Yanchun Bao (University of Essex)
DTSTART;VALUE=DATE-TIME:20201217T140000Z
DTEND;VALUE=DATE-TIME:20201217T150000Z
DTSTAMP;VALUE=DATE-TIME:20240614T055001Z
UID:Essex-DataScience/10
DESCRIPTION:Title: Estimating mode effects from a sequential mixed-modes experimen
t\nby Yanchun Bao (University of Essex) as part of (ED-3S) Essex Data
Science Seminar Series\n\n\nAbstract\nThe large-scale household panel stud
y Understanding Society (The U.K. Household Longitudinal Study UKHLS) has\
, until recently\, used interviewers to administer its questionnaires\, bu
t is now in the process of allowing individuals to participate using the w
eb. Survey data are known to be affected by survey mode so a sequential mo
de-effects experiment was carried out to evaluate the impact of this ch
ange on the panel. In this talk we present a novel estimator and analysis
strategy to quantify the impact of mode across a wide range of variables\,
with large mode effects on the covariance of a pair of variables used to
indicate an increased risk that statistical analyses involving this pair w
ill be affected.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/10/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Rafal Kulakowski (University of Essex)
DTSTART;VALUE=DATE-TIME:20210204T140000Z
DTEND;VALUE=DATE-TIME:20210204T150000Z
DTSTAMP;VALUE=DATE-TIME:20240614T055001Z
UID:Essex-DataScience/11
DESCRIPTION:by Rafal Kulakowski (University of Essex) as part of (ED-3S) E
ssex Data Science Seminar Series\n\nAbstract: TBA\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/11/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Yassir Rabhi (University of Essex)
DTSTART;VALUE=DATE-TIME:20201210T140000Z
DTEND;VALUE=DATE-TIME:20201210T150000Z
DTSTAMP;VALUE=DATE-TIME:20240614T055001Z
UID:Essex-DataScience/12
DESCRIPTION:Title: Copulas and measures of dependence under length-biased sampling
and informative censoring\nby Yassir Rabhi (University of Essex) as p
art of (ED-3S) Essex Data Science Seminar Series\n\n\nAbstract\nLength-bia
sed data are often encountered in cross-sectional surveys and prevalent-co
hort studies on disease durations. Under length-biased sampling subjects w
ith longer disease durations have a greater chance of being observed. As a resu
lt\, covariate values linked to the longer survivors are favoured by the s
ampling mechanism. When the sampled durations are also subject to right ce
nsoring\, the censoring is informative. Modelling dependence structure wit
hout adjusting for these issues leads to biased results. In this talk\, I
will present a study on copulas for modelling dependence when the collecte
d data are length-biased and account for both informative censoring and co
variate bias. I will address the nonparametric estimation of the bivariate
distribution\, copula function and its density\, and Kendall and Spearman
measures for right-censored length-biased data. The proposed estimator of
the bivariate CDF is a Hadamard-differentiable functional of two MLEs\, K
aplan-Meier and empirical CDF\, and inherits their efficiencies. Based on
this estimator\, we devise estimators for copula function and a local-poly
nomial estimator for copula density that accounts for boundary bias. In ad
dition\, I will introduce estimators for Kendall and Spearman measures. Th
e weak convergence of the estimators will also be discussed. The proposed
method is then applied to analyse a set of right-censored length-biased da
ta on survival with dementia\, collected as part of a nationwide study in
Canada.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/12/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Carolin Strobl (Universität Zürich)
DTSTART;VALUE=DATE-TIME:20201119T140000Z
DTEND;VALUE=DATE-TIME:20201119T150000Z
DTSTAMP;VALUE=DATE-TIME:20240614T055001Z
UID:Essex-DataScience/13
DESCRIPTION:Title: A Statistician’s Botanical Garden - The Ideas behind Trees\,
Model-Based Trees and Random Forests\nby Carolin Strobl (Universität
Zürich) as part of (ED-3S) Essex Data Science Seminar Series\n\n\nAbstrac
t\nClassification and regression trees\, model-based trees and random fore
sts are powerful statistical methods from the field of machine learning. T
hey have been shown to achieve a high prediction accuracy\, especially in
big data applications with many predictor variables and complex associatio
n patterns (such as nonlinear and higher-order interaction effects). While
individual trees are easy to interpret\, random forests are "black box" p
rediction methods. They do\, however\, provide variable importance measure
s\, that are being used to judge the relevance of the individual predictor
variables. The aim of this presentation is to introduce the rationale beh
ind trees\, model-based trees and random forests\, to illustrate their pot
ential for high-dimensional data exploration\, e.g.\, in psychological res
earch\, but also to point out limitations and potential pitfalls in their
practical application.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/13/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Shenggang Hu (University of Essex)
DTSTART;VALUE=DATE-TIME:20221013T130000Z
DTEND;VALUE=DATE-TIME:20221013T140000Z
DTSTAMP;VALUE=DATE-TIME:20240614T055001Z
UID:Essex-DataScience/14
DESCRIPTION:Title: Statistical disaggregation - a Monte Carlo approach for imputat
ion under constraints\nby Shenggang Hu (University of Essex) as part o
f (ED-3S) Essex Data Science Seminar Series\n\nLecture held in NTC.1.04.\n
\nAbstract\nStatistical disaggregation has become more and more important
for smart energy systems. A typical example of such disaggregation problem
s is to learn energy consumption for a higher resolution level (data recor
ded at higher frequency) based on data at a lower resolution (data recorde
d at lower frequency). Constrained models are often used in such problems
and they are often very useful compared to their unconstrained counterpart
s in terms of reducing uncertainty and leading to an improvement of the ov
erall performance. However\, these constrained models usually are not expr
essible as ordinary distributions due to their intractable density functio
ns which makes it hard to conduct further analysis. This paper introduces
a novel constrained Monte Carlo sampling algorithm based on Langevin diffu
sions and rejection sampling to solve the problem of sampling from constra
ined models. This new method is then applied to a statistical disaggregati
on problem for an electricity consumption dataset. Our approach provides
excellent accuracy of data imputation\, based on our simulation studies an
d data analysis. The new method is also justified theoretically.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/14/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Prof Christian Martin Hennig (University of Bologna\, UCL)
DTSTART;VALUE=DATE-TIME:20221103T140000Z
DTEND;VALUE=DATE-TIME:20221103T150000Z
DTSTAMP;VALUE=DATE-TIME:20240614T055001Z
UID:Essex-DataScience/15
DESCRIPTION:Title: Advances in using cluster analysis for species delimitation
\nby Prof Christian Martin Hennig (University of Bologna\, UCL) as part of
(ED-3S) Essex Data Science Seminar Series\n\nLecture held in STEM 3.1.\n\
nAbstract\nBiological species are often delimited based on genetic multilo
cus data using methods for inferring phylogenetic trees or model- or dista
nce-based cluster analysis. A major problem here is that genetic dissimila
rity does not only arise from separated species\, but also if subpopulatio
ns of a species live in geographically distant areas without genetic excha
nge. In any case\, be it using partitioning cluster analysis or hierarchic
al trees\, it is a hard problem to decide the number of species\, and whet
her groups that are candidates for being species actually belong together.
I will discuss the use of some new approaches for clustering and est
imating the number of clusters for this problem\, focusing particularly on
testing whether observed genetic heterogeneity within a species candidate
group can be explained by geographical distance rather than consisting of
separate species. This requires hypothesis testing in a distance-distance
regression model. I will also discuss the integration of such a testing r
outine in a fully automated method for species delimitation.\n\nReference\
n\nHausdorf\, B\, Hennig\, C. Species delimitation and geography. Mol Ecol
Resour. 2020\; 20: 950–960. https://doi.org/10.1111/1755-0998.13184\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/15/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Dr Johan van der Molen (University of Cambridge)
DTSTART;VALUE=DATE-TIME:20221124T140000Z
DTEND;VALUE=DATE-TIME:20221124T150000Z
DTSTAMP;VALUE=DATE-TIME:20240614T055001Z
UID:Essex-DataScience/16
DESCRIPTION:Title: Dirichlet process mixture inconsistency for the number of compo
nents: how worried should we be in practice?\nby Dr Johan van der Mole
n (University of Cambridge) as part of (ED-3S) Essex Data Science Seminar
Series\n\nLecture held in STEM 3.1.\n\nAbstract\nBayesian nonparametric mi
xture models are widely used for model-based clustering due to their flexi
bility and conceptual simplicity\, as well as the availability of efficie
nt sampling methods for performing inference. However\, recent work has es
tablished that such models have undesirable asymptotic properties regardin
g the estimation of the number of clusters. For instance\, Dirichlet Proce
ss Mixtures (DPMs) have been shown to be inconsistent for the number of cl
usters\, and overestimation of the number of clusters has been observed in
practice for finite samples. Finite mixtures with a prior on the number o
f components - also known as Mixtures of Finite Mixtures (MFMs) - have bee
n suggested as an asymptotically consistent alternative\, but the effects
of model misspecification can still result in asymptotic inconsistency a
nd poor estimation of the number of clusters in practice. \n\nHere we spec
ifically focus on estimation of the number of clusters in Bayesian nonpara
metric mixtures in practice\, including the impact of Markov chain Monte C
arlo (MCMC) post-processing algorithms for summarisation and identificatio
n of a final representative summary clustering. We consider practical scen
arios of low to moderate dimension\, through both simulation studies and a
pplications to real biomolecular data. In the situations we consider\, we
confirm that even when the parametric form of the mixture component distri
butions is correctly specified\, DPMs lead to mild overestimation of the n
umber of clusters for finite samples. However\, we also demonstrate that t
his can be corrected by common summarisation methods\, suggesting that app
lications of DPMs in practice may be more robust than the theory might sug
gest. We show that\, for both DPMs and MFMs\, mixture component density mi
sspecification typically leads to more dramatic overestimation\, with DPMs
providing slightly worse estimates than MFMs\, but with the common patter
n of “true” clusters in the data being split into smaller subclusters
due to additional mixture components being required to flexibly capture fe
atures of the data inadequately described by the misspecified models. We c
onsider implications for high-dimensional data analysis\, in which simplif
ying assumptions that are commonly made in practice for computational trac
tability (e.g. assuming a diagonal covariance matrix for Gaussian mixture
components) are also expected to result in model misspecification. As part
of our work\, we compare popular MCMC post-processing algorithms for iden
tifying a final summary clustering\, and show that although some of them h
ave a positive impact on results\, others can introduce severe overestimat
ion of the number of clusters\, even when the underlying posterior distrib
ution from which samples are being drawn is centred on the true number of
clusters. This is joint work with Yannis Chaumeny\, Paul Kirk\, Anthony Da
vidson.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/16/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Dr Alexei Vernitski (University of Essex)
DTSTART;VALUE=DATE-TIME:20221027T130000Z
DTEND;VALUE=DATE-TIME:20221027T140000Z
DTSTAMP;VALUE=DATE-TIME:20240614T055001Z
UID:Essex-DataScience/17
DESCRIPTION:Title: Using machine learning to solve mathematical problems and to se
arch for examples and counterexamples in pure maths research\nby Dr Al
exei Vernitski (University of Essex) as part of (ED-3S) Essex Data Science
Seminar Series\n\nLecture held in STEM 3.1.\n\nAbstract\nOur recent resea
rch can be generally described as applying state-of-the-art technologies o
f machine learning to suitable mathematical problems. We use both reinforc
ement learning and supervised learning (underpinned by deep learning). As
for the mathematical problems we consider\, they include learning to untangle a
braid (this problem is not unlike the problem of solving the Rubik's Cube)\
, learning to find the parity of a permutation (as compared to the classic
al problem of deep learning of learning to find the parity bit of a binary
array)\, comparing mathematical mistakes made by artificial intelligence
with those made by human mathematicians\, etc.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/17/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Qiuyi Hong (University of Essex)
DTSTART;VALUE=DATE-TIME:20221117T140000Z
DTEND;VALUE=DATE-TIME:20221117T150000Z
DTSTAMP;VALUE=DATE-TIME:20240614T055001Z
UID:Essex-DataScience/18
DESCRIPTION:Title: A Bilevel Game-Theoretic Decision-Making Framework for Strategic
Retailers in Both Local and Wholesale Electricity Markets\nby Qiuyi H
ong (University of Essex) as part of (ED-3S) Essex Data Science Seminar Se
ries\n\nLecture held in STEM 3.1.\n\nAbstract\nIn this talk we propose a b
ilevel game-theoretic model for multiple strategic retailers participating
in both wholesale and local electricity markets while considering custome
rs’ switching behaviours. At the upper level\, each retailer maximizes i
ts own profit by making optimal offering decisions in the retail market an
d bidding decisions in the day-ahead wholesale (DAW) and local power excha
nge (LPE) markets. The interaction among multiple strategic retailers is f
ormulated using the Bertrand competition model. For the lower level\, ther
e are three optimisation problems. First\, the customers’ welfare maximi
sation problem with their switching behaviors is formulated to capture the
demand responses from customers. Second\, a market-clearing problem is fo
rmulated for the independent system operator (ISO) in the DAW market. Thir
d\, a novel LPE market is developed for retailers to facilitate their powe
r balancing. In addition\, the bilevel multi-leader multi-follower Stackel
berg game forms an equilibrium problem with equilibrium constraints (EPEC)
problem\, which is solved by the diagonalization algorithm. Numerical res
ults demonstrate the feasibility and effectiveness of the EPEC model and t
he importance of modeling customers’ switching behaviors. We corroborate
that incentivising customers’ switching behaviors and increasing the nu
mber of retailers facilitates retail competition\, which results in reduci
ng strategic retailers’ retail prices and profits. Moreover\, the relati
onship between customers’ switching behaviors and welfare is reflected b
y a balance between the electricity purchasing cost (i.e.\, electricity pr
ice) and the electricity consumption level.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/18/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Mateo Salles (University of Essex)
DTSTART;VALUE=DATE-TIME:20230209T140000Z
DTEND;VALUE=DATE-TIME:20230209T150000Z
DTSTAMP;VALUE=DATE-TIME:20240614T055001Z
UID:Essex-DataScience/19
DESCRIPTION:Title: Supervised Learning for Untangling Braids\nby Mateo Salles
(University of Essex) as part of (ED-3S) Essex Data Science Seminar Series
\n\nLecture held in STEM 3.1.\n\nAbstract\nUntangling a braid is a typical
multi-step process\, and reinforcement learning can be used to train an a
gent to untangle braids. Here we present another approach. Starting from t
he untangled braid\, we produce a dataset of braids using breadth-first se
arch and then apply behavioral cloning to train an agent on the output of
this search. As a result\, the (inverses of) steps predicted by the agent
turn out to be an unexpectedly good method of untangling braids\, includin
g those braids which did not feature in the dataset.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/19/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Dr. Peng Liu (University of Kent)
DTSTART;VALUE=DATE-TIME:20230504T130000Z
DTEND;VALUE=DATE-TIME:20230504T140000Z
DTSTAMP;VALUE=DATE-TIME:20240614T055001Z
UID:Essex-DataScience/20
DESCRIPTION:Title: Optimal Smooth Approximation for Quantile Matrix Factorisation
\nby Dr. Peng Liu (University of Kent) as part of (ED-3S) Essex Data Sc
ience Seminar Series\n\nLecture held in STEM 3.1.\n\nAbstract\nMatrix Fact
orisation (MF) is essential to many estimation tasks. Most existing matrix
factorisation methods focus on least squares matrix factorisation (LSMF)\
, which aims to minimise a smooth L2 loss between observations and their d
ependent matrix measurement variables. In reality\, however\, L1 loss and
check loss are widely used in regression to deal with outliers or observat
ions contaminated by skewed or heavy-tailed noise. Although under certain
conditions\, linear convergence to the global optimality can be establishe
d for matrix factorisation under the L2 loss\, there is a lack of provably
efficient algorithms for solving matrix factorisation under non-smooth lo
sses. In this paper\, we investigate Quantile Matrix Factorization (QMF)\,
the counterpart of Quantile Regression in matrix estimation\, that adopts
a tunable check loss and introduces robustness to matrix estimation for s
kewed and heavy tailed observations\, which are prevalent in reality. To d
eal with the non-smooth loss\, we propose Nesterov smoothed QMF (NsQMF)\,
extending Nesterov’s optimal smooth approximation technique to the matri
x factorisation setting. We then present an alternating minimization algor
ithm to solve the smooth NsQMF efficiently. We mathematically prove that s
olving the smoothed NsQMF is equivalent to solving the original non-smooth
QMF problem and that our proposed algorithm achieves linear convergence t
o the global optimality of QMF. Numerical evaluations verify our theoretic
al findings and demonstrate that NsQMF significantly outperforms the commo
nly used LSMF and prior approximate smoothing heuristics for QMF under var
ious noise distributions.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/20/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Dr. Xiaochuan Yang (University of Brunel)
DTSTART;VALUE=DATE-TIME:20230525T130000Z
DTEND;VALUE=DATE-TIME:20230525T140000Z
DTSTAMP;VALUE=DATE-TIME:20240614T055001Z
UID:Essex-DataScience/21
DESCRIPTION:Title: Some recent progress in random geometric graphs: beyond the sta
ndard regimes\nby Dr. Xiaochuan Yang (University of Brunel) as part of
(ED-3S) Essex Data Science Seminar Series\n\nLecture held in STEM 3.1.\n\
nAbstract\nI will survey some recent joint works with Mathew Penrose (Bath
) on the cluster structure of random geometric graphs in a regime that is
less discussed in the literature. The statistics of interest include the
number of k-components\, the number of components\, the number of vertice
s in the giant component\, and the connectivity threshold. We show LLN and
normal/Poisson approximation by Stein's method.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/21/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Dr. Yufei Zhang (London School of Economics & Political Science)
DTSTART;VALUE=DATE-TIME:20230511T130000Z
DTEND;VALUE=DATE-TIME:20230511T140000Z
DTSTAMP;VALUE=DATE-TIME:20240614T055001Z
UID:Essex-DataScience/22
DESCRIPTION:Title: Exploration-exploitation trade-off for continuous-time reinforc
ement learning\nby Dr. Yufei Zhang (London School of Economics & Polit
ical Science) as part of (ED-3S) Essex Data Science Seminar Series\n\nLect
ure held in STEM 3.1.\n\nAbstract\nRecently\, reinforcement learning (RL)
has attracted substantial research interest. Much of the attention and su
ccess\, however\, has been for the discrete-time setting. Continuous-time
RL\, despite its natural analytical connection to stochastic controls\, ha
s been largely unexplored and with limited progress. In particular\, chara
cterising sample efficiency for continuous-time RL algorithms remains a ch
allenging and open problem.\n\nIn this talk\, we develop a framework to an
alyse model-based reinforcement learning in the episodic setting. We then
apply it to optimise exploration-exploitation trade-off for linear-convex
RL problems\, and report sublinear (or even logarithmic) regret bounds for
a class of learning algorithms inspired by filtering theory. The approach
is probabilistic\, involving analysing learning efficiency using concentr
ation inequalities for correlated continuous-time observations\, and apply
ing stochastic control theory to quantify the performance gap between appl
ying greedy policies derived from estimated and true models.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/22/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Prof. Chenggui Yuan (Swansea University)
DTSTART;VALUE=DATE-TIME:20230601T130000Z
DTEND;VALUE=DATE-TIME:20230601T140000Z
DTSTAMP;VALUE=DATE-TIME:20240614T055001Z
UID:Essex-DataScience/24
DESCRIPTION:Title: Numerical solutions of SDEs with irregular coefficients\nby
Prof. Chenggui Yuan (Swansea University) as part of (ED-3S) Essex Data Sc
ience Seminar Series\n\nLecture held in STEM 3.1.\n\nAbstract\nStochastic
differential equations (SDEs) with irregular coefficients have been widely
studied. In this talk\, I will discuss the strong convergence and the we
ak convergence of SDEs with irregular coefficients. The convergence rate
will be investigated under different irregular conditions on coefficients.
\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/24/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Dr. Robert Gaunt (The University of Manchester)
DTSTART;VALUE=DATE-TIME:20230615T130000Z
DTEND;VALUE=DATE-TIME:20230615T140000Z
DTSTAMP;VALUE=DATE-TIME:20240614T055001Z
UID:Essex-DataScience/25
DESCRIPTION:Title: Normal approximation for the posterior in exponential families
\nby Dr. Robert Gaunt (The University of Manchester) as part of (ED-3S)
Essex Data Science Seminar Series\n\nLecture held in STEM 3.1.\n\nAbstrac
t\nIn this talk I'll introduce quantitative Bernstein-von Mises type bound
s on the normal approximation of the posterior distribution in exponential
family models when centering either around the posterior mode or around t
he maximum likelihood estimator. Our bounds\, obtained through a version o
f Stein’s method\, are non-asymptotic\, and data dependent\; they are of
the correct order both in the total variation and Wasserstein distances\,
as well as for approximations for expectations of smooth functions of the
posterior. All our results are valid for univariate and multivariate post
eriors alike\, and do not require a conjugate prior setting. We illustrate
our findings on a variety of exponential family distributions\, including
Poisson\, multinomial and normal distribution with unknown mean and varia
nce. The resulting bounds have an explicit dependence on the prior distrib
ution and on sufficient statistics of the data from the sample\, and thus
provide insight into how these factors may affect the quality of the norma
l approximation.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/25/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Dr. Arthur Maheo (Amazon)
DTSTART;VALUE=DATE-TIME:20230622T130000Z
DTEND;VALUE=DATE-TIME:20230622T140000Z
DTSTAMP;VALUE=DATE-TIME:20240614T055001Z
UID:Essex-DataScience/26
DESCRIPTION:Title: Benders decomposition for public transportation\nby Dr. Art
hur Maheo (Amazon) as part of (ED-3S) Essex Data Science Seminar Series\n\
nLecture held in STEM 3.1.\n\nAbstract\nCanberra (Australia) wants to desi
gn a transportation network combining high-frequency buses with on-demand
taxis. The resulting hub-and-shuttle network design problem is a large\, d
ifficult mixed-integer program. We identified how to decompose the problem
– design first\, route second – and used a modern Benders decompositi
on on the resulting formulation.\nThis new approach is orders of magnitude
faster\, allowing us to solve full instances where a standard approach ca
n only do small ones.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/26/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Prof Boris Mirkin (National Research University Higher School of E
conomics)
DTSTART;VALUE=DATE-TIME:20231006T120000Z
DTEND;VALUE=DATE-TIME:20231006T130000Z
DTSTAMP;VALUE=DATE-TIME:20240614T055001Z
UID:Essex-DataScience/27
DESCRIPTION:Title: Anomalous clustering at various data formats\nby Prof Boris
Mirkin (National Research University Higher School of Economics) as part
of (ED-3S) Essex Data Science Seminar Series\n\nLecture held in 1N1.4.1.\n
\nAbstract\nAnomalous clustering is a method for extracting clusters one-b
y-one. It is an extension of the Principal Component Analysis method to z
ero-one matrix factorization settings. After a brief overview of various v
ersions of the method\, including its extensions to similarity data\, sp
atial data\, and fuzzy clustering\, I am going to concentrate on a most r
ecent development\, a triple-stage application of the approach to the anal
ysis of spatial-temporal patterns in a coastal oceanic phenomenon of upwel
ling (see Nascimento et al. 2023).\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/27/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Dr Jin Zhu (LSE)
DTSTART;VALUE=DATE-TIME:20231019T130000Z
DTEND;VALUE=DATE-TIME:20231019T140000Z
DTSTAMP;VALUE=DATE-TIME:20240614T055001Z
UID:Essex-DataScience/28
DESCRIPTION:Title: A Tuning-Free Algorithm for Sparsity-Constraint Optimization\nby Dr Jin Zhu (LSE) as part of (ED-3S) Essex Data Science Seminar Serie
s\n\nLecture held in STEM 3.1.\n\nAbstract\nSparsity-constraint optimizati
on has wide applicability in signal processing\, statistics\, and machine
learning. Existing fast algorithms must burdensomely tune parameters\, suc
h as the step size or the implementation of precise stop criteria\, which
may be challenging to determine in practice. To address this issue\, we de
velop an algorithm named sparsity-constraint optimization via splicing ite
ration (SCOPE) to optimize nonlinear differentiable objective functions with
strong convexity and smoothness in low dimensional subspaces. Algorithmic
ally\, the SCOPE algorithm converges effectively without tuning parameters
. Theoretically\, SCOPE has a linear convergence rate and converges to a s
olution that recovers the true support set when it correctly specifies the
sparsity. We also develop parallel theoretical results without restricted
-isometry-property-type conditions. We apply SCOPE’s versatility and pow
er to solve sparse quadratic optimization\, learn sparse classifiers\, and
recover sparse Markov networks for binary variables. The numerical result
s on these specific tasks reveal that SCOPE perfectly identifies the true
support set with a 10–1000 speedup over the standard exact solver\, conf
irming SCOPE’s algorithmic and theoretical merits. Our open-source Pytho
n package scope based on C++ implementation is publicly available on GitHu
b\, reaching a ten-fold speedup on the competing convex relaxation methods
implemented by the cvxpy library.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/28/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Dr Shenggang Hu (University of Warwick)
DTSTART;VALUE=DATE-TIME:20231026T130000Z
DTEND;VALUE=DATE-TIME:20231026T140000Z
DTSTAMP;VALUE=DATE-TIME:20240614T055001Z
UID:Essex-DataScience/29
DESCRIPTION:Title: Differential Privacy of Bayesian Posterior under Contamination
\nby Dr Shenggang Hu (University of Warwick) as part of (ED-3S) Essex D
ata Science Seminar Series\n\nLecture held in STEM 3.1.\n\nAbstract\nIn re
cent years\, differential privacy has been adopted by tech companies and g
overnmental agencies as the standard for measuring privacy in algorithms.
We study the level of differential privacy in Bayesian posterior sampling
setups. As opposed to the common privatization approach of injecting Lapla
ce/Gaussian noise into the output\, Huber's contamination model is conside
red\, where we replace at random the data points with samples from a heavy
-tailed distribution. The derived bound for the differential privacy level
in our approach matches the existing literature while lifting the restric
tion on bounded observation space. We further consider the effect of sampl
e size on privacy level and conclude that asymptotically the contamination
approach is fully private at no cost of information loss.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/29/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Prof Wolfgang Hardle (Humboldt-Universität zu Berlin\, Germany)
DTSTART;VALUE=DATE-TIME:20240118T140000Z
DTEND;VALUE=DATE-TIME:20240118T150000Z
DTSTAMP;VALUE=DATE-TIME:20240614T055001Z
UID:Essex-DataScience/30
DESCRIPTION:Title: Data Science in a Math-Less Digital Society\nby Prof Wolfga
ng Hardle (Humboldt-Universität zu Berlin\, Germany) as part of (ED-3S) E
ssex Data Science Seminar Series\n\nLecture held in STEM 3.1.\n\nAbstract\
nIn an increasingly digital and data-driven world\, the importance of data
science cannot be overstated. Data science\, by itself\, carries a "push
to analyse" button\, though\, that lets the analyst forget about the "
math behind the machine learning tools".\n\nWe cover a few examples\, whe
re data science needs math in order to be understood and applied.\n\nBy th
e end of this talk\, attendees will gain a fresh perspective on data scien
ce's role in a math-less digital society. They will leave with practical i
nsights\, tools\, and strategies to leverage data effectively\, fostering
a culture of data-driven decision-making that transcends mathematical barr
iers.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/30/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Dr Dimitra Kosta (University of Edinburgh)
DTSTART;VALUE=DATE-TIME:20231123T134500Z
DTEND;VALUE=DATE-TIME:20231123T144500Z
DTSTAMP;VALUE=DATE-TIME:20240614T055001Z
UID:Essex-DataScience/31
DESCRIPTION:Title: Maximum likelihood estimation of toric Fano varieties\nby D
r Dimitra Kosta (University of Edinburgh) as part of (ED-3S) Essex Data Sc
ience Seminar Series\n\nLecture held in Zoom.\n\nAbstract\nI will talk abo
ut the maximum likelihood estimation problem for several classes of toric
Fano models. I will start by exploring the maximum likelihood degree for a
ll 2-dimensional Gorenstein toric Fano varieties. I will show that the ML
degree is equal to the degree of the surface in every case except for the
quintic del Pezzo surface with two ordinary double points and provide expl
icit expressions that allow one to compute the maximum likelihood estimate
in closed form whenever the ML degree is less than 5. I will explore the
reasons for the ML degree drop using A-discriminants and intersection theo
ry. If there is time\, I will discuss toric Fano varieties associated wi
th 3-valent phylogenetic trees and their ML degree.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/31/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Dr Richard Mann (University of Leeds)
DTSTART;VALUE=DATE-TIME:20240201T140000Z
DTEND;VALUE=DATE-TIME:20240201T150000Z
DTSTAMP;VALUE=DATE-TIME:20240614T055001Z
UID:Essex-DataScience/33
DESCRIPTION:Title: Collective decision-making by rational agents\nby Dr Richar
d Mann (University of Leeds) as part of (ED-3S) Essex Data Science Seminar
Series\n\nLecture held in STEM 3.1.\n\nAbstract\nThe decisions made by ot
hers are a valuable source of social information about the world\, because
they may have knowledge that we lack. This means that when one agent make
s a given choice\, it can induce others to do so as well. In this talk I w
ill describe a theory of rational agents who optimally utilise the social
information provided by others\, and explore the dynamics this produces at
the individual and group level. In particular\, I will show how the impli
cit beliefs such agents hold about the physical and social environment sha
pe their response to each other\, and how changes to the environment that
conflict with these beliefs can dramatically alter collective behaviour an
d impact the success of groups.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/33/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Dr Jinyu Tian (Macau University of Science and Technology)
DTSTART;VALUE=DATE-TIME:20231214T140000Z
DTEND;VALUE=DATE-TIME:20231214T150000Z
DTSTAMP;VALUE=DATE-TIME:20240614T055001Z
UID:Essex-DataScience/34
DESCRIPTION:Title: Discreteness Problem in Adversarial Machine Learning\nby Dr
Jinyu Tian (Macau University of Science and Technology) as part of (ED-3S
) Essex Data Science Seminar Series\n\n\nAbstract\nAdversarial examples (A
Es) of deep neural networks (DNNs) are receiving ever-increasing attention
because they help in understanding the mechanism of DNNs and provide a no
vel perspective of the ethics of deep learning applications. In many real
scenarios\, AEs have to be discrete (e.g. digital images). Most existing w
orks achieve the discreteness relying on the discretization of continuous
AEs. Unfortunately\, they cannot sufficiently control the spatial differen
ce before and after discretizing continuous AEs\, which leads to two
side effects: degrading the attack capability of the obtained discrete AEs
or introducing extra distortion.\n\nIn this work\, we propose an adve
rsarial attack called Discrete Attack (DATK) to produce continuous AEs tig
htly close to their discrete counterparts. Owing to the negligible spatial d
istance between them\, the expected discrete AEs perform with the same pow
erful attack capability as the continuous AEs without an extra distortion
overhead. More precisely\, the proposed DATK generates AEs from a novel per
spective by directly modeling adversarial perturbations (APs) as discrete
random variables. The AE generation problem thus reduces to the estimation
of the distribution of discrete APs. Since this problem is typically non-d
ifferentiable\, we relax it with the proposed reparameterization tricks and ob
tain an approximated continuous distribution of discrete APs. Our theoreti
cal proof shows that\, by virtue of the continuous APs sampled from the appro
ximated distribution\, the spatial distance between the resultant continuo
us AEs and their discrete counterparts is tightly bounded\, which signifi
cantly overcomes the side-effects caused by the discretization. Extensive
results over ImageNet\, CIFAR-10 and TU Berlin Sketch demonstrate the super
iority of our method when attacking representative DNNs including VGG19\,
ResNet50\, DenseNet121 and MobileNetV2. It is also verified that our DATK
is more robust against the state-of-the-art adversarial detection methods.\
n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/34/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Dr Hong Duong (University of Birmingham)
DTSTART;VALUE=DATE-TIME:20231130T140000Z
DTEND;VALUE=DATE-TIME:20231130T150000Z
DTSTAMP;VALUE=DATE-TIME:20240614T055001Z
UID:Essex-DataScience/35
DESCRIPTION:Title: Model Reduction of Complex Systems\nby Dr Hong Duong (Unive
rsity of Birmingham) as part of (ED-3S) Essex Data Science Seminar Series\
n\nLecture held in STEM 3.1.\n\nAbstract\nComplex systems in nature and in
applications (such as molecular systems\, crowd dynamics\, swarming\, opi
nion formation\, just to name a few) are often described by systems of sto
chastic differential equations (SDEs) and partial differential equations (
PDEs). It is often analytically impossible or computationally prohibitivel
y expensive to deal with the full models due to their high dimensionality
(degrees of freedom\, number of involved parameters\, etc.). It is thus of
great importance to approximate such large and complex systems by simpler
and lower dimensional ones\, while still preserving the essential informa
tion from the original model. This procedure is referred to as model reduc
tion or coarse-graining in the literature. In this talk\, I will present m
ethods for qualitative and quantitative coarse-graining of several SDEs an
d PDEs\, in the presence or absence of a scale-separation.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/35/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Dr Yuyu Chen (University of Melbourne)
DTSTART;VALUE=DATE-TIME:20231116T130000Z
DTEND;VALUE=DATE-TIME:20231116T140000Z
DTSTAMP;VALUE=DATE-TIME:20240614T055001Z
UID:Essex-DataScience/36
DESCRIPTION:Title: Diversification of infinite-mean Pareto distributions\nby D
r Yuyu Chen (University of Melbourne) as part of (ED-3S) Essex Data Scienc
e Seminar Series\n\nLecture held in Zoom.\n\nAbstract\nWe show the perhaps
surprising inequality that the weighted average of negatively dependent s
uper-Pareto random variables\, possibly caused by triggering events\, is l
arger than one such random variable in the sense of first-order stochastic
dominance. The class of super-Pareto distributions is extremely heavy-tai
led and it includes the class of infinite-mean Pareto distributions. We di
scuss several implications of this result via an equilibrium analysis in a
risk exchange market. First\, diversification of super-Pareto losses incr
eases portfolio risk\, and thus a diversification penalty exists. Second\,
agents with super-Pareto losses will not share risks in a market equilibr
ium. Third\, transferring losses from agents bearing super-Pareto losses t
o external parties without any losses may arrive at an equilibrium which b
enefits every party involved. The empirical studies show that our new ineq
uality can be observed empirically for real datasets that fit well with ex
tremely heavy tails.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/36/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Dr Xiaochun Meng (University of Bath)
DTSTART;VALUE=DATE-TIME:20240509T130000Z
DTEND;VALUE=DATE-TIME:20240509T140000Z
DTSTAMP;VALUE=DATE-TIME:20240614T055001Z
UID:Essex-DataScience/37
DESCRIPTION:Title: Angular Combining of Forecasts of Probability Distributions
\nby Dr Xiaochun Meng (University of Bath) as part of (ED-3S) Essex Data S
cience Seminar Series\n\nLecture held in STEM 3.1.\n\nAbstract\nWhen multi
ple forecasts are available for a probability distribution\, forecast comb
ining enables a pragmatic synthesis of the information to extract the wisd
om of the crowd. A linear opinion pool has been widely used\, whereby the
combining is applied to the probability predictions of the distributional
forecasts. However\, it has been argued that this will tend to deliver ove
rdispersed distributional forecasts\, prompting the combination to be appl
ied\, instead\, to the quantile predictions of the distributional forecast
s. Results from different applications are mixed\, leaving it as an empiri
cal question whether to combine probabilities or quantiles. In this paper\
, we present an alternative approach. Looking at the distributional foreca
sts\, combining the probability forecasts can be viewed as vertical combin
ing\, with quantile forecast combining seen as horizontal combining. Our p
roposal is to allow combining to take place on an angle between the extrem
e cases of vertical and horizontal combining. We term this angular combini
ng. The angle is a parameter that can be optimized using a proper scoring
rule. For implementation\, we provide a pragmatic numerical approach and a
simulation algorithm. Among our theoretical results\, we show that\, as w
ith vertical and horizontal averaging\, angular averaging results in a dis
tribution with mean equal to the average of the means of the distributions
that are being combined. We also show that angular averaging produces a d
istribution with lower variance than vertical averaging\, and\, under cert
ain assumptions\, greater variance than horizontal averaging. We provide e
mpirical support for angular combining using weekly distributional forecas
ts of Covid mortality.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/37/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Mahendra Singh Rajpoot (University of Essex)
DTSTART;VALUE=DATE-TIME:20240125T140000Z
DTEND;VALUE=DATE-TIME:20240125T150000Z
DTSTAMP;VALUE=DATE-TIME:20240614T055001Z
UID:Essex-DataScience/38
DESCRIPTION:Title: Large Language Models: A Stepping Stone for AGI!\nby Mahend
ra Singh Rajpoot (University of Essex) as part of (ED-3S) Essex Data Scien
ce Seminar Series\n\nLecture held in STEM 3.1.\n\nAbstract\nIn the rapidly
evolving landscape of Artificial Intelligence (AI)\, Large Language Model
s (LLMs) have emerged as a transformative force\, showcasing remarkable ca
pabilities in natural language understanding and generation. This presenta
tion delves into the pivotal role that LLMs play as a stepping stone towar
ds achieving Artificial General Intelligence (AGI). We explore the fundame
ntal principles\, applications\, and underlying mechanisms that propel LLM
s while contemplating their implications for the broader goal of AGI. The
talk will navigate through recent advancements\, challenges\, and ethical
considerations in harnessing the potential of LLMs\, ultimately envisionin
g their contribution to the evolution of comprehensive artificial intellig
ence.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/38/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Dr Yi Zhang (University of Birmingham)
DTSTART;VALUE=DATE-TIME:20240425T130000Z
DTEND;VALUE=DATE-TIME:20240425T140000Z
DTSTAMP;VALUE=DATE-TIME:20240614T055001Z
UID:Essex-DataScience/40
DESCRIPTION:Title: On discounted Markov decision processes and their extensions\nby Dr Yi Zhang (University of Birmingham) as part of (ED-3S) Essex Data
Science Seminar Series\n\nLecture held in 4SW.6.28.\n\nAbstract\nThe theo
ry for discounted Markov decision processes (MDPs) has been well developed
. In this talk we review some basic results concerning their occupation me
asures\, which are convenient for the studies of optimal control problems
with constraints. After that\, we discuss the possibility of their extensi
ons to more general models (uniformly absorbing MDPs\, absorbing MDPs\, or
more general MDPs with total criteria). The studies of absorbing MDPs hav
e been active recently.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/40/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Dr Kareemah Chopra (University of Essex)
DTSTART;VALUE=DATE-TIME:20240208T140000Z
DTEND;VALUE=DATE-TIME:20240208T150000Z
DTSTAMP;VALUE=DATE-TIME:20240614T055001Z
UID:Essex-DataScience/41
DESCRIPTION:Title: [Cancelled] The Bunching Behaviour of Cows\nby Dr Kareemah
Chopra (University of Essex) as part of (ED-3S) Essex Data Science Seminar
Series\n\nLecture held in STEM 3.1.\n\nAbstract\nBunching behavior in cat
tle may occur for several reasons including enabling social interactions\,
a response to stress or danger\, or due to shared interest in resources s
uch as feeding or watering areas. There is evidence in pasture grazed catt
le that bunching may occur more frequently at higher ambient temperatures\
, possibly due to sharing of fly-load or to seek shade from the direct sun
under heat stress conditions. Here we demonstrate how bunching behavior i
s associated with higher ambient temperatures in a barn-housed UK dairy he
rd. A real-time local positioning system (RTLS) was used\, as part of a pr
ecision livestock farming (PLF) approach\, to track the spatial position a
nd activity of a commercial dairy herd (c100 cows) in a freestall barn con
tinuously at high temporal resolution for 4 months between August and November
2014. Bunching was determined using 4 different spatial measures determin
ed on an hourly basis: herd full and core range size\, mean herd inter-cow
distance (ICD)\, and mean herd nearest neighbor distance (NND). For hourl
y mean ambient temperatures above 20°C\, the herd showed higher bunching
behavior with increasing ambient temperature (i.e.\, reduced full and core
range size\, ICD\, and NND). Aggregated space-use intensity was found to
positively correlate with localized variations in temperature across the b
arn (as measured by animal mounted sensors)\, but the level of correlation
decreased at higher ambient barn temperatures. Bunching behavior may incr
ease localized temperatures experienced by individuals and hence may be a
maladaptive behavioral response in housed dairy cattle\, which are known t
o suffer heat stress at higher temperatures. Our study is the first to use
high-resolution positional data to provide evidence of associations betwe
en bunching behavior and higher ambient temperatures for a barn-housed dai
ry herd in a temperate region (UK). Further studies are needed to explore
the exact mechanisms for this response to inform both welfare and producti
on management.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/41/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Professor Richard J. Samworth (University of Cambridge)
DTSTART;VALUE=DATE-TIME:20240229T140000Z
DTEND;VALUE=DATE-TIME:20240229T150000Z
DTSTAMP;VALUE=DATE-TIME:20240614T055001Z
UID:Essex-DataScience/42
DESCRIPTION:Title: Isotonic subgroup selection\nby Professor Richard J. Samwor
th (University of Cambridge) as part of (ED-3S) Essex Data Science Seminar
Series\n\nLecture held in STEM 3.1.\n\nAbstract\nGiven a sample of covari
ate-response pairs\, we consider the subgroup selection problem of identif
ying a subset of the covariate domain where the regression function exceed
s a pre-determined threshold. We introduce a computationally-feasible appr
oach for subgroup selection in the context of multivariate isotonic regres
sion based on martingale tests and multiple testing procedures for logical
ly-structured hypotheses. Our proposed procedure satisfies a non-asymptoti
c\, uniform Type I error rate guarantee with power that attains the minima
x optimal rate up to poly-logarithmic factors. Extensions cover classifica
tion\, isotonic quantile regression and heterogeneous treatment effect se
ttings.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/42/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Professor Edward Rochead (Defence Science and Technology Laborator
y)
DTSTART;VALUE=DATE-TIME:20240307T140000Z
DTEND;VALUE=DATE-TIME:20240307T150000Z
DTSTAMP;VALUE=DATE-TIME:20240614T055001Z
UID:Essex-DataScience/43
DESCRIPTION:Title: The Alliance for Data Science Professionals\nby Professor E
dward Rochead (Defence Science and Technology Laboratory) as part of (ED-3
S) Essex Data Science Seminar Series\n\nLecture held in STEM 3.1.\n\nAbstr
act\nThe talk will begin by introducing the Alliance\, its members and how
it was formed. It will then explain how individuals can become accredited
as Advanced Data Science Professionals and also describe the plans being
formed to accredit degrees. It is expected that the discussion would focus
on how the AfDSP can work with academic colleagues and ensure accreditati
on is attractive and meaningful to them\, and also consider how it may fee
d into the employability of graduates in relevant disciplines.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/43/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Laurel Ariane Regibeau-Rockett (Stanford University)
DTSTART;VALUE=DATE-TIME:20240321T140000Z
DTEND;VALUE=DATE-TIME:20240321T150000Z
DTSTAMP;VALUE=DATE-TIME:20240614T055001Z
UID:Essex-DataScience/44
DESCRIPTION:Title: Hurricanes as heat engines\nby Laurel Ariane Regibeau-Rocke
tt (Stanford University) as part of (ED-3S) Essex Data Science Seminar Ser
ies\n\nLecture held in STEM 3.1.\n\nAbstract\nHurricanes are dangerous and
destructive atmospheric phenomena\, frequently causing loss of lives worl
dwide. Improving our understanding of hurricanes can help improve hurrican
e forecasts and projections of their response to climate change. One conce
ptual model of the hurricane\, which has supported major advancements in h
urricane science\, is the conceptualization of the hurricane as a heat eng
ine. This theoretical framework supports research at the intersection of
physics\, mathematics\, and atmospheric science. In this seminar\, we will
review this important theoretical model and some of its applications\, to
gether with possible directions of future research in this interdisciplina
ry domain.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/44/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Professor Mariachiara Di Cesare (University of Essex)
DTSTART;VALUE=DATE-TIME:20240314T140000Z
DTEND;VALUE=DATE-TIME:20240314T150000Z
DTSTAMP;VALUE=DATE-TIME:20240614T055001Z
UID:Essex-DataScience/45
DESCRIPTION:Title: Institute of Public Health and Wellbeing opportunities to enhan
ce research for all\nby Professor Mariachiara Di Cesare (University of
Essex) as part of (ED-3S) Essex Data Science Seminar Series\n\nLecture he
ld in STEM 3.1.\n\nAbstract\nThe IPHW\, established in 2022\, represents a
major strategic innovation for the University of Essex\, bringing togethe
r our community of experts to provide pioneering leadership in the product
ion of world-class research\, knowledge exchange and impact. Working with
regional\, national\, and international partners\, the IPHW is driven by a
collective goal of creating a healthier and fairer society. During this s
eminar we will discuss the IPHW mission\, vision\, and strategy and look a
t opportunities to enhance interdisciplinary research in the field of heal
th and wellbeing with a special focus on data science.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/45/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Dr Maria Brigida Ferraro (Sapienza University of Rome)
DTSTART;VALUE=DATE-TIME:20240530T130000Z
DTEND;VALUE=DATE-TIME:20240530T140000Z
DTSTAMP;VALUE=DATE-TIME:20240614T055001Z
UID:Essex-DataScience/46
DESCRIPTION:Title: Two-mode clustering in a fuzzy setting: methods and cluster val
idity indices\nby Dr Maria Brigida Ferraro (Sapienza University of Rom
e) as part of (ED-3S) Essex Data Science Seminar Series\n\nLecture held in
STEM 3.1.\n\nAbstract\nThe aim of clustering is to find a partition of th
e rows (e.g. objects) of a data matrix based on the values assumed on a se
t of variables (columns). Two objects belong to the same cluster if the co
rresponding rows are close to each other according to a certain metric bas
ed on all the variables. However\, it can be reasonable to seek clusters s
uch that objects assigned to the same cluster are close to each other with
respect to a subset of variables. The research interest can also be reve
rsed\, i.e.\, the goal is to find clusters of variables close to each othe
r in terms of a subset of objects. Standard clustering algorithms are not
adequate to accomplish these tasks. For this purpose\, two-mode clustering
methods have been introduced. Two-mode clustering consists in simultaneou
sly partitioning modes (e.g.\, objects and variables) of an observed two-m
ode data matrix.\n\nIn the literature\, two-mode clustering methods have b
een extensively studied and extended along various directions. Most of th
em are based on the classical approach to clustering\, i.e.\, the objects
(or the variables) are either assigned or not to the clusters. A more powe
rful and flexible exploratory approach is represented by introducing fuzzi
ness in the clustering process. In this case\, the objects (or the variabl
es) are no longer either assigned or not to the clusters\, but belong to t
he clusters with the so-called (fuzzy) membership degrees taking values in
the interval [0\,1]. A high membership degree\, close to 1\, recognizes a
n object (or variable) strongly assigned to a cluster\, i.e.\, an object (
or variable) very close to the corresponding cluster prototype.\n\nStartin
g from the Double k-Means\, we propose a class of two-mode clustering algo
rithms in a fuzzy framework\, including some robust proposals\, taking in
to account that\, in this case\, different kinds of outliers exist and sh
ould be considered. In addition\, in order to evaluate the two fuzzy part
itions and to choose the optimal numbers of clusters\, new cluster validit
y indices are introduced. The proposed measures are defined in terms of t
he compactness within each cluster and separation between clusters. Starti
ng from some well-known indices in standard fuzzy clustering\, some gener
alizations to the two-mode case are addressed. The adequacy of the propos
als is checked by means of simulation and real-case studies.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/46/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Dr Mohamed Bader (University of Portsmouth)
DTSTART;VALUE=DATE-TIME:20240627T130000Z
DTEND;VALUE=DATE-TIME:20240627T140000Z
DTSTAMP;VALUE=DATE-TIME:20240614T055001Z
UID:Essex-DataScience/47
DESCRIPTION:by Dr Mohamed Bader (University of Portsmouth) as part of (ED-
3S) Essex Data Science Seminar Series\n\nLecture held in STEM 3.1.\nAbstra
ct: TBA\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/47/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Professor Guy Nason (Imperial College London)
DTSTART;VALUE=DATE-TIME:20240516T130000Z
DTEND;VALUE=DATE-TIME:20240516T140000Z
DTSTAMP;VALUE=DATE-TIME:20240614T055001Z
UID:Essex-DataScience/48
DESCRIPTION:Title: Network Time Series\nby Professor Guy Nason (Imperial Colle
ge London) as part of (ED-3S) Essex Data Science Seminar Series\n\nLecture
held in STEM 3.1.\n\nAbstract\nA network time series is a multivariate ti
me series where the individual series are known to be linked by some under
lying network structure. Sometimes this network is known a priori\, and so
metimes the network has to be created\, often inferred from the multivaria
te series itself. Network time series are becoming increasingly common\, l
ong\, and collected over a large number of variables. We are particularly
interested in network time series whose network structure changes over tim
e.\n\nWe describe some recent developments in the modeling of network time
series via generalized network autoregressive (GNAR) process models. Thes
e models use regular autoregressive links between a variable and its past
and between a variable and the past of its neighbours. GNAR models are hig
hly parsimonious and\, hence\, work well for short series or those afflict
ed by worrying amounts of missing data. For the same reason\, they tend no
t to overfit and often exhibit excellent forecasting performance\, especia
lly when compared to alternatives such as vector autoregressive models.\n\
nThis talk explains the GNAR model and some interesting variants. We intro
duce some new tools for model selection and exhibit their use on epidemic
and economic data.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/48/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Dr Kareemah Chopra (University of Essex)
DTSTART;VALUE=DATE-TIME:20240502T130000Z
DTEND;VALUE=DATE-TIME:20240502T140000Z
DTSTAMP;VALUE=DATE-TIME:20240614T055001Z
UID:Essex-DataScience/50
DESCRIPTION:Title: The Bunching Behaviour of Cows\nby Dr Kareemah Chopra (Univ
ersity of Essex) as part of (ED-3S) Essex Data Science Seminar Series\n\n\
nAbstract\nBunching behavior in cattle may occur for several reasons inclu
ding enabling social interactions\, a response to stress or danger\, or du
e to shared interest in resources such as feeding or watering areas. There
is evidence in pasture grazed cattle that bunching may occur more frequen
tly at higher ambient temperatures\, possibly due to sharing of fly-load o
r to seek shade from the direct sun under heat stress conditions. Here we
demonstrate how bunching behavior is associated with higher ambient temper
atures in a barn-housed UK dairy herd. A real-time local positioning syste
m (RTLS) was used\, as part of a precision livestock farming (PLF) approac
h\, to track the spatial position and activity of a commercial dairy herd
(c. 100 cows) in a freestall barn continuously at high temporal resolution f
or 4 months between August and November 2014. Bunching was determined using 4
different spatial measures determined on an hourly basis: herd full and co
re range size\, mean herd inter-cow distance (ICD)\, and mean herd nearest
neighbor distance (NND). For hourly mean ambient temperatures above 20°C
\, the herd showed higher bunching behavior with increasing ambient temper
ature (i.e.\, reduced full and core range size\, ICD\, and NND). Aggregate
d space-use intensity was found to positively correlate with localized var
iations in temperature across the barn (as measured by animal mounted sens
ors)\, but the level of correlation decreased at higher ambient barn tempe
ratures. Bunching behavior may increase localized temperatures experienced
by individuals and hence may be a maladaptive behavioral response in hous
ed dairy cattle\, which are known to suffer heat stress at higher temperat
ures. Our study is the first to use high-resolution positional data to pro
vide evidence of associations between bunching behavior and higher ambient
temperatures for a barn-housed dairy herd in a temperate region (UK). Fur
ther studies are needed to explore the exact mechanisms for this response
to inform both welfare and production management.\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/50/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Anusa Suwanwong (University of Essex)
DTSTART;VALUE=DATE-TIME:20240620T130000Z
DTEND;VALUE=DATE-TIME:20240620T140000Z
DTSTAMP;VALUE=DATE-TIME:20240614T055001Z
UID:Essex-DataScience/51
DESCRIPTION:Title: A Gene Selection Method for Classification with Three Classes U
sing Proportional Overlapping Scores\nby Anusa Suwanwong (University o
f Essex) as part of (ED-3S) Essex Data Science Seminar Series\n\nLecture h
eld in STEM 3.1.\nAbstract: TBA\n
LOCATION:https://researchseminars.org/talk/Essex-DataScience/51/
END:VEVENT
END:VCALENDAR