4. Extracting the top quark mass from the b jet energy spectrum

Sebastian Wuchterl edited this page Jun 7, 2023 · 1 revision

At this point, you are familiar with everything needed to extract the top-quark mass from the b-jet energy spectrum. A small group will set up code to calibrate the measured b-jet energy peak against the expected one (from which the top-quark mass can easily be extracted) and evaluate the performance of the method, while two other groups will work on systematic uncertainties.

Calibration and fit performance

In principle, the top-quark mass could be directly extracted from the b-jet energy peak through

$m_t= E_{b,peak}+\sqrt{m_W^2 - m_b^2 + E^2_{b,peak}}$

(the Particle Data Group gives $m_W = (80.385 \pm 0.015)$ GeV and $m_b = (4.18 \pm 0.03)$ GeV).

Compute $m_t$ using your fitted peak position.
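As a minimal sketch, the conversion can be done directly in Python; the peak value used below (67.6 GeV) is only a placeholder, to be replaced by your own fitted result:

```python
import math

# PDG input values (GeV)
M_W = 80.385
M_B = 4.18

def top_mass_from_peak(e_peak):
    """Convert a b-jet energy peak position (GeV) into a top-quark mass (GeV),
    using m_t = E_peak + sqrt(m_W^2 - m_b^2 + E_peak^2)."""
    return e_peak + math.sqrt(M_W**2 - M_B**2 + e_peak**2)

# placeholder value: replace with the peak position from your own fit
e_peak = 67.6
print('m_t = %.1f GeV' % top_mass_from_peak(e_peak))
```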

However, after extracting the top-quark mass from the simulation, you will see that it is not exactly the same as the generated mass.

Question: what can bias this relation?

Question: Given the top mass of the nominal sample, 172.5 GeV, what is the predicted value of $E_{b,peak}$?

Calibration

Different sources of bias will affect the position of the energy peak. Therefore, you need to compute the b-jet energy peak from simulations for several top-quark masses in order to precisely relate the b-jet energy peak measured with the previously defined selection criteria to the expected b-jet energy peak. At 8 TeV, this led to the following figure:

[Figure: calibration of the measured b-jet energy peak at 8 TeV]

The nominal samples that have been used to check the data to simulation agreement have been generated for a top-quark mass of 172.5 GeV. Samples for the top-pair process are also available for top-quark masses of 169.5 GeV and 175.5 GeV. The other samples may be treated as independent of the top-quark mass.

Simulation Samples

| Top-quark mass | Sample | $\sigma$ [pb] |
| --- | --- | --- |
| 169.5 GeV | /TT_TuneCUETP8M1_mtop1695_13TeV-powheg-pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM | 903 |
| 175.5 GeV | /TT_TuneCUETP8M1_mtop1755_13TeV-powheg-pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM | 767 |

For each mass point, you may consider completing the fitPeak.py script to:

  1. generate pseudo-experiments from the 'bjetenls' distribution with Poissonian fluctuations
  2. fit each pseudo-experiment to get a distribution of peak positions
  3. take the mean of all the peak positions to complete the calibration curve

Question: What is the main point of using pseudo-experiments?

A class named TRandom3 is implemented in ROOT, so if you want to apply a Poisson fluctuation to a given bin, you can simply do:

import math
from ROOT import TRandom3

random3 = TRandom3()
random3.SetSeed(1)
# scale the bin content up to a raw count, fluctuate it, and scale back down
fluctuation = random3.PoissonD(y * math.exp(x)) / math.exp(x)

Here, y is the bin content, and x is the bin center for each bin of the bjetenls energy distribution.
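The fluctuation step can also be illustrated without ROOT. The sketch below uses numpy in place of TRandom3.PoissonD, and the bin contents are hypothetical stand-ins for the real bjetenls histogram:

```python
import numpy as np

rng = np.random.default_rng(1)

def fluctuate_bin(y, x, rng):
    """Poisson-fluctuate one bin of the bjetenls distribution.

    y is the bin content of 1/E dN/dlog(E) and x is the bin centre in log(E),
    so y * exp(x) approximates a raw event count, which can be Poisson-
    fluctuated and then scaled back.
    """
    counts = y * np.exp(x)
    return rng.poisson(counts) / np.exp(x)

# toy bins standing in for the real histogram (hypothetical values)
bin_centres = np.linspace(3.2, 6.8, 20)
bin_contents = 50.0 * np.exp(-0.5 * ((bin_centres - 4.2) / 0.5) ** 2)

# one pseudo-experiment: fluctuate every bin independently
pseudo = [fluctuate_bin(y, x, rng) for y, x in zip(bin_contents, bin_centres)]
```

Repeating the last line many times and fitting each fluctuated spectrum gives the distribution of peak positions described in the steps above.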

Checkpoint: Your macro may look like fitNcalibrate/answers/fitPeak_pseudo.py.

When you have the values for the 3 different mass points, you can use the

peakplotter.py

macro to get the relation between the b-jet energy peak measured and the calibrated energy.
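Conceptually, this relation is a straight-line fit through the three mass points. A numpy sketch, where the (measured, predicted) peak pairs are hypothetical placeholders for your own fitted values at $m_t$ = 169.5, 172.5 and 175.5 GeV:

```python
import numpy as np

# hypothetical (measured peak, predicted peak) pairs for the three mass points
e_measured  = np.array([66.1, 67.6, 69.0])   # GeV, from the fits
e_predicted = np.array([66.4, 67.6, 68.8])   # GeV, from the m_t -> E_b,peak relation

# linear calibration: E_predicted = slope * E_measured + offset
slope, offset = np.polyfit(e_measured, e_predicted, 1)

def calibrate(e_peak):
    """Apply the calibration curve to a measured peak position (GeV)."""
    return slope * e_peak + offset
```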

Once you have a calibration curve to correct the b-jet energy peak, you may be tempted to convert the b-jet energy peak you got from data into a top-quark mass. Do not do so until the end of the exercise! The Top Mass Working Group indeed has a blinding policy... and you have many other things to do before then 😉

Fit performance and uncertainty

The calibration fit introduces a systematic uncertainty due to the uncertainties of the fit parameters, which you need to evaluate. To assess this uncertainty, the covariance between the two fit parameters is required.
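Assuming a linear calibration $f(E) = aE + b$, the covariance matrix of $(a, b)$ can be propagated with the usual Jacobian formula. The covariance values below are hypothetical placeholders for what your own fit returns:

```python
import numpy as np

def calibration_uncertainty(e_peak, cov):
    """Propagate the fit-parameter covariance through a linear calibration
    f(E) = a*E + b.  The Jacobian w.r.t. (a, b) is (E, 1), so
    sigma^2 = E^2 var(a) + var(b) + 2 E cov(a, b)."""
    jac = np.array([e_peak, 1.0])
    return float(np.sqrt(jac @ cov @ jac))

# hypothetical covariance matrix of (a, b) from the calibration fit
cov = np.array([[1.0e-4, -6.0e-3],
                [-6.0e-3, 4.0e-1]])
sigma = calibration_uncertainty(67.6, cov)
```

Note that the off-diagonal term matters: the anti-correlation between slope and offset can substantially reduce the propagated uncertainty compared to treating the parameters independently.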

You also need to evaluate the performance of the method. One way to do so is to generate pseudo-experiments (for example, $N_\mathrm{pe}=1000$) from expected b-jet energy distributions for several given top-quark masses (for example, 171.5 and 173.5 GeV). The number of events in each pseudo-experiment should follow a Poisson distribution whose expected value is the number of events observed in data ($N_\mathrm{evt}$). Each toy is then fitted with the PDF, and the b-jet energy peak value is corrected using the calibration curve.

You get 1,000 calibrated b-jet energy peak measurements for a generated top-quark mass of 171.5 GeV, and the same for 173.5 GeV.

As a closure test, you can have a look at

$\langle E_\mathrm{b,peak}^\mathrm{calibrated} \rangle = f(E_\mathrm{b,peak}^\mathrm{predicted})$

The slope should be 1.

You can also compute the pull distribution

$\left(E_\mathrm{b,peak}^\mathrm{calibrated}-E_\mathrm{b,peak}^\mathrm{predicted}\right)/\Delta E_\mathrm{b,peak}^\mathrm{calibrated}$

for each predicted energy peak position, i.e., each generated top-quark mass.

If the mean and width of the pull distributions, as functions of the predicted energy peak position, are flat and around 0 and 1 respectively, then the method is unbiased.
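As a numerical illustration of an unbiased pull (all numbers are hypothetical toys, not the analysis inputs):

```python
import numpy as np

rng = np.random.default_rng(42)

e_predicted = 67.6   # hypothetical predicted peak position (GeV)
sigma_fit = 0.5      # hypothetical per-toy fit uncertainty (GeV)
n_pe = 1000

# toy calibrated peak values scattered around the predicted one
e_calibrated = rng.normal(e_predicted, sigma_fit, n_pe)

# pull: (calibrated - predicted) / uncertainty
pulls = (e_calibrated - e_predicted) / sigma_fit

print('pull mean  = %.2f' % pulls.mean())
print('pull width = %.2f' % pulls.std())
```

For an unbiased method with correctly estimated uncertainties, the printed mean and width should be compatible with 0 and 1.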

It is also interesting to compute the energy peak position uncertainty as a function of the predicted peak position.

Systematic uncertainties

Different sources of systematic uncertainties affect the top-quark mass measurement. They can be separated into two categories:

Experimental uncertainties:

  • The jet energy resolution is corrected in simulations to match the one observed in data (already in the input files that have been provided to you). It also has to be varied within its uncertainty, stored in "Jet_uncs[ijet][0]".
  • The jet energy scale (JES) is potentially different for data and simulations. There are several kinds of jet energy corrections (JEC), which must be varied within their uncertainties. Within CMS we can group uncertainties as "absolute scale" (1 - 3), "high pt extrapolation" (4 - 6), "time dependence" (7), "relative corrections" (8 - 19), "pileup" (20 - 25), and "flavor" (26). For combinations with other LHC experiments, uncertainty #3 is referred to as the "in-situ correlation" uncertainty and #16 as the "inter-calibration" uncertainty. We will calculate 5 groups of uncertainty: 3, 16, 20-25, 26, and all others. Details can be found in the dedicated TWiki page.
  • Pileup collisions may add jets in reconstructed events. Simulations are reweighted so that the vertex multiplicity is the same in simulations and data. The weights need to be varied within their uncertainty. In the data tree, the nominal/down variation/up variation are stored as PUWeights.
  • The b-tagging, trigger and lepton selection efficiencies may differ between data and simulations. Scale factors are applied to simulation. They need to be varied within their uncertainty. In the data tree, the nominal/down variation/up variation are stored as LepSelEffWeights.
  • The calibration of the sensitivity of the b-jet energy peak to the top-quark mass is also a source of systematic uncertainty. It will be computed by the group responsible for the calibration.
  • The aim is to select top-pair events. All the other events that pass the selection criteria are considered as background. Background processes are normalized to their cross sections, which are known with limited accuracy. Thus, the background normalizations need to be varied (by 25% for single top, 100% for W+jets, and 50% for Drell-Yan and diboson).

Theoretical uncertainties:

  • The parton distribution functions need to be varied following the PDF4LHC recommendations. A method is described in AN 2009/048.
  • The renormalization and factorization scale (Q) needs to be varied by factors 1/2 and 2.
  • Different matrix element (ME) generators often predict different kinematics. This can be taken into account by comparing the top-quark mass obtained for two different ME generators, associated with the same parton shower modeling.
  • It has been observed in several analyses at 8 and 13 TeV that the top-quark $p_T$ is not perfectly reproduced in simulations. Simulations are not corrected for top-quark measurements, but they need to be reweighted so that the top-quark $p_T$ distribution matches the one observed in data, in order to consider this mis-modeling as a source of systematic uncertainty. Although this method should be robust against the top-quark boost, the measurement is actually impacted by the top-quark $p_T$ uncertainty.
  • The underlying event cannot be described by perturbative QCD. Generators are tuned to data with a limited accuracy. To take this into account, you need to compare different tunes. As a cross-check, you can also compare different parton shower generators.
  • Color reconnection cannot be described by perturbative QCD either. Here again, you need to compare different tunes for which only the color reconnection parameters vary.

Question: Which sources do you expect to be dominant? Why?

In the following figure, you can see the effect of the main sources of systematic uncertainty on the b-jet energy distribution (log scale):

[Figure: impact of the main systematic uncertainties on the b-jet energy distribution]

For each source of systematic uncertainty, the corresponding variation of the b-jet energy is added to (subtracted from) the nominal value to obtain two bjetenls distributions, which are then fitted as usual. The calibrated b-jet peak values give an upper and a lower bound on the top-quark mass; half the difference between these bounds is an estimate of the top-mass uncertainty due to that source. This has to be repeated for each source of systematic uncertainty. If the sources are uncorrelated, the individual uncertainties can be added in quadrature to obtain a single uncertainty value.
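The symmetrisation and quadrature sum described above can be sketched as follows; the mass values are hypothetical placeholders, not results of the exercise:

```python
import math

def symmetrised_unc(m_up, m_down):
    """Half the spread between the up- and down-varied mass results (GeV)."""
    return abs(m_up - m_down) / 2.0

# hypothetical top-mass values from up/down varied bjetenls fits (GeV)
sources = {
    'JER':    (172.8, 172.2),
    'JES':    (173.1, 171.9),
    'pileup': (172.6, 172.4),
}

uncs = {name: symmetrised_unc(up, down) for name, (up, down) in sources.items()}

# uncorrelated sources add in quadrature
total = math.sqrt(sum(u**2 for u in uncs.values()))
print('total systematic uncertainty: %.2f GeV' % total)
```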

Try to evaluate as many systematic uncertainties as possible, focusing as a priority on the most relevant ones. Several simulation samples are available to help you:

Simulation samples:

| Uncertainty source | Sample | $\sigma$ [pb] |
| --- | --- | --- |
| $Q^2$ scale (parton shower) | /TT_TuneCUETP8M2T4{down,up}_13TeV-powheg-pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM | 831.76 |
| ISR | /TT_TuneCUETP8M2T4_13TeV-powheg-isr{down,up}-pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM | 831.76 |
| FSR | /TT_TuneCUETP8M2T4_13TeV-powheg-fsr{down,up}-pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM | 831.76 |

A skeleton is available in the analyzeNplot folder. Let's open systBJetEnergyPeak.py. The main() function is at the end, as usual, but the interesting part is the runBJetEnergyPeak function at line 14. The histograms for the different systematic variations are defined and then filled in a loop over the events.

To run this code:

python systBJetEnergyPeak.py -i /eos/user/c/cmsdas/2023/long-ex-top/ -j data/samples_Run2016_25ns.json -o systematics -n 2

As before, -i represents the input directory, -j the path for the json file, and -o the subfolder in which the output files will be saved.

You may consider completing this code by adding the histograms where they are defined

    histos={ 
        # nominal (for xcheck)
        'bjetenls_nominal':ROOT.TH1F('bjetenls_nominal',';log(E);  1/E dN_{b jets}/dlog(E)',20,3.,7.),
        # JER
        'bjetenls_jer_up':ROOT.TH1F('bjetenls_jer_up',';log(E);  1/E dN_{b jets}/dlog(E)',20,3.,7.),
        'bjetenls_jer_down':ROOT.TH1F('bjetenls_jer_down',';log(E);  1/E dN_{b jets}/dlog(E)',20,3.,7.),
        # JEC: uncorrelated group
        # JEC: in-situ correlation group
        # JEC: inter-calibration
        # JEC: pile-up
        # JEC: flavour
        # Other Systematics
        }

and the necessary lines to fill them.

Checkpoint: Your macro may look like analyzeNplot/answers/systBJetEnergyPeak.py.