# Difference between revisions of "Walkthrough on PCA through the command line"

Line 1: | Line 1: | ||

[[PCA]] computations through the command line are governed through ''PCA workflow'' objects. We describe here how to create and handle them: | [[PCA]] computations through the command line are governed through ''PCA workflow'' objects. We describe here how to create and handle them: | ||

− | + | ||

= Creation of a synthetic data set= | = Creation of a synthetic data set= | ||

Line 55: | Line 55: | ||

The correlation matrix is diagonalised. The eigenvectors are used to expressed as the particles as combinations of weights. | The correlation matrix is diagonalised. The eigenvectors are used to expressed as the particles as combinations of weights. | ||

<tt>wb.steps.items.eigentable.compute(); </tt> | <tt>wb.steps.items.eigentable.compute(); </tt> | ||

− | These weights are ordered in descending order relative to their impact on the variance of the set, ideally a particle should be represented by its few components on this basis. The weights are stored in a | + | These weights are ordered in descending order relative to their impact on the variance of the set, ideally a particle should be represented by its few components on this basis. The weights are stored in a regular ''Dynamo'' table. First eigencomponent of a particle goes into column 41. |

== Eigenvolumes == | == Eigenvolumes == | ||

Line 76: | Line 76: | ||

== Eigencomponents == | == Eigencomponents == | ||

+ | |||

+ | === Predefined exploring tools=== | ||

+ | You can use a general browser for ''Dynamo'' tables: | ||

+ | <tt>wb.exploreGUI;</tt> | ||

+ | [[ File:walkthroughPCACommandLine_generalToolForExploringTables.png |thumb|center| 600px|general tool for exploring tables]] | ||

+ | |||

+ | Advanced users: This is just a wrapper to the function <tt>dpktbl.gui</tt> applied on the eigentable produced by the workflow. | ||

+ | |||

+ | For instance, you can check how two eigencomponents relate to each other: | ||

+ | |||

+ | [[ File:walkthroughPCACommandLine_scatterTabTunedToShowCobehaviourOfFirstTwoEigencomponents.png |thumb|center| 600px|scatter tab tuned to show co-behaviour of first two eigencomponents]] | ||

+ | |||

+ | Pressing the [Scatter 2D] button will create this interactive plot | ||

+ | [[ File:walkthroughPCACommandLine_scatterplotEigencomponents.png |thumb|center| 600px|scatterplot eigencomponents]] | ||

+ | Where each point represents a different particle. | ||

+ | Right-click on it to create a "lasso", a tool to hand-draw sets of particles. | ||

+ | [[ File:walkthroughPCACommandLine_rightClickToGetTheOptionsToManuallySelectParticlesThroughALassoTool.png |thumb|center| 600px|right click to get the options to manually select particles through a lasso tool]] | ||

+ | [[ File:walkthroughPCACommandLine_lassoParticles.png |thumb|center| 200px|lasso particles]] | ||

+ | |||

+ | Right clicking on the "lassoed" particles give you the option of saving the information on the selected set of particles. | ||

+ | [[ File:walkthroughPCACommandLine_rightClickOnTheLassoToGetOptionsOnThatSubsetOfParticles.png |thumb|center| 600px|right click on the lasso to get options on that subset of particles]] | ||

+ | |||

+ | === Custom approach=== | ||

+ | You can use your own methods to visualize the eigencomponents. They can be accessed through: | ||

+ | <tt>m = w.getEigencomponents();</tt> | ||

+ | will produce a matrix <tt>m</tt> where each column represents an eigenvector and each row a particle. Thus, to see how a particular eigencomponent distributes among the particles, you can just write: | ||

+ | |||

+ | <tt>plot(m(:,i),'.');</tt> | ||

+ | |||

+ | [[ File:walkthroughPCACommandLine_firstEigencomponentOfEachParticle.png |thumb|center| 600px|first eigencomponent of each particle]] | ||

=== Series of plots=== | === Series of plots=== | ||

Line 88: | Line 118: | ||

gui.shg(); </nowiki> | gui.shg(); </nowiki> | ||

+ | this will produce ten plots (as i=1:10) collected in a single GUI. Each plot is called a "frame", and you can view them sequentially or in sets, just play with the layout given by the rows and columns, and use the [Refresh] icon on the top left of the GUI. | ||

+ | |||

+ | [[ File:walkthroughPCACommandLine_slidingMontageOnTheDistributionOfSeveralEigencomponent.png |thumb|center| 600px|sliding montage on the distribution of several eigencomponent]] | ||

=== Series of histograms === | === Series of histograms === | ||

− | + | [[ File:walkthroughPCACommandLine_slidingMontageOfSetsOfEigencomponent.png |thumb|center| 600px|sliding montage of sets of eigencomponent]] | |

== Eigenvolumes == | == Eigenvolumes == | ||

<tt>eigSet=wb.getEigenvolume(1:30);</tt> | <tt>eigSet=wb.getEigenvolume(1:30);</tt> | ||

+ | This creates a cell array (arbitrarily called <tt>eigSet</tt>). We can visualise it through: | ||

<tt>mbvol.groups.montage(eigSet); </tt> | <tt>mbvol.groups.montage(eigSet); </tt> | ||

+ | [[ File:walkthroughPCACommandLine_eigenvolumes.png |thumb|center| 600px|eigenvolumes ]] | ||

+ | |||

+ | This plot is showing the true relative intensity of the eigenvolumes. In order to compare them, we can show the normalised eigenvolumes instead: | ||

<tt>mbvol.groups.montage(dynamo_normalize_roi(eigSet)); </tt> | <tt>mbvol.groups.montage(dynamo_normalize_roi(eigSet)); </tt> | ||

− | + | [[ File:walkthroughPCACommandLine_eigenvolumesAfterNomrmalization.png |thumb|center| 600px|eigenvolumes after nomrmalization]] | |

==Correlation of tilts== | ==Correlation of tilts== | ||

It is a good idea to check if some eigenvolumes correlate strongly with the tilt. | It is a good idea to check if some eigenvolumes correlate strongly with the tilt. | ||

<tt>wb.show.correlationEigenvectorTilt(1:10) </tt> | <tt>wb.show.correlationEigenvectorTilt(1:10) </tt> | ||

+ | [[ File:walkthroughPCACommandLine_correlationOfEigencomponensWithTilt.png |thumb|center| 600px|correlation of eigencomponens with tilt]] | ||

+ | |||

+ | Remember that each particle is accessible through right clicking on it. | ||

+ | [[ File:walkthroughPCACommandLine_rightClickOnEachPointToAccessTheParticle.png |thumb|center| 600px|right click on each point to access the particle]] | ||

+ | |||

In this plot, each point represents a particle in your data set. We see that in this particular experiment, eigencomponent 3 seems to have been "corrupted by the missing wedge" | In this plot, each point represents a particle in your data set. We see that in this particular experiment, eigencomponent 3 seems to have been "corrupted by the missing wedge" | ||

==TSNE reduction== | ==TSNE reduction== | ||

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

− | |||

[[ File:walkthroughPCACommandLine_tsneClustering.png |thumb|center| 600px|tsne clustering]] | [[ File:walkthroughPCACommandLine_tsneClustering.png |thumb|center| 600px|tsne clustering]] | ||

[[ File:walkthroughPCACommandLine_rightClickOnTheAxesForAnAutomatedClustering.png |thumb|center| 600px|right click on the axes for an automated clustering]] | [[ File:walkthroughPCACommandLine_rightClickOnTheAxesForAnAutomatedClustering.png |thumb|center| 600px|right click on the axes for an automated clustering]] |

## Revision as of 16:51, 3 April 2020

PCA computations through the command line are governed through *PCA workflow* objects. We describe here how to create and handle them:

# Creation of a synthetic data set

dtutorial ttest128 -M 64 -N 64 -linear_tags 1 -tight 1

This generates a set of 128 particles where 64 are slightly closer than the other 64. The particle subtomogram are randomly oriented, but the alignment parameters are known.

# Creation of a workflow

## Input elements

The input of a PCA workflow are:

- a set of particles (called
*data container*in this article) - a table that expreses the alignment
- a mask that indicates the area of each alignment particle that will be taken into account during the classification procedure.

### Data

dataFolder = 'ttest128/data';

### Table

tableFile = 'ttest128/real.tbl';

### Mask

We create a cylindrical mask with the dimensions of the particles (40 pixels) mask = dcylinder([20,20],40);

### Syntax

We decide a name for the workflow itself, for instance

name = 'classtest128';

Now we are ready to create the workflow:

wb = dpkpca.new(name,'t',tableFile,'d',dataFolder,'m',mask);

This creates an workflow object (arbitrarily called wb in the workspace during the current session). It also creates a folder called classtest128.PCA where results will be stored as they are produced.

## Mathematical parameters

The main parameters that can be chosen in this area are:

- bandpass
- symmetry
- binning level (to accelerate the computations)

## Computational parameters

The main burden of the PCA computation is the creation of the cross correlation matrix.

### Computing device

PCA computations can be run on GPUs of on CPUs, in both cases in parallel.

### Size of parallel blocks

The

# Running

In this workflow we run the steps one by one to discuss them. In real workflows, you can use the `run` methods to just launch all steps sequentially.

## Prealigning

wb.steps.items.prealign.compute();

## Correlation matrix

All pairs of correlations are computed in blocks, as described above

wb.steps.items.ccmatrix.compute();

## Eigentable

The correlation matrix is diagonalised. The eigenvectors are used to expressed as the particles as combinations of weights.

wb.steps.items.eigentable.compute();

These weights are ordered in descending order relative to their impact on the variance of the set, ideally a particle should be represented by its few components on this basis. The weights are stored in a regular *Dynamo* table. First eigencomponent of a particle goes into column 41.

## Eigenvolumes

The eigenvectors are expressed as three=dimensional volumes.

wb.steps.items.eigenvolumes.compute();

## TSNE reduction

TSNE remaps the particles into 2D maps which can be visualised and operated interactively.

wb.steps.items.tsene.compute();

# Visualization

Computed elements have been stored in the workflow folder. Some of them () can be directly access through workflow tools.

## Correlation matrix

`m=wb.getCCMatrix();`

`figure;dshow(cmm);h=gca();h.YDir = 'reverse'; `

## Eigencomponents

### Predefined exploring tools

You can use a general browser for *Dynamo* tables:

wb.exploreGUI;

Advanced users: This is just a wrapper to the function `dpktbl.gui` applied on the eigentable produced by the workflow.

For instance, you can check how two eigencomponents relate to each other:

Pressing the [Scatter 2D] button will create this interactive plot

Where each point represents a different particle. Right-click on it to create a "lasso", a tool to hand-draw sets of particles.

Right clicking on the "lassoed" particles give you the option of saving the information on the selected set of particles.

### Custom approach

You can use your own methods to visualize the eigencomponents. They can be accessed through:

m = w.getEigencomponents();

will produce a matrix `m` where each column represents an eigenvector and each row a particle. Thus, to see how a particular eigencomponent distributes among the particles, you can just write:

plot(m(:,i),'.');

### Series of plots

To check all the eigencomponents, it is a good idea to do some scripting. The script below uses a handy *Dynamo* trick to create several plots in the same figure.

gui = mbgraph.montage(); for i=1:10 plot(m(:,i),'.','Parent',gui.gca); % gui.gca captures the gui.step; end gui.first(); gui.shg();

this will produce ten plots (as i=1:10) collected in a single GUI. Each plot is called a "frame", and you can view them sequentially or in sets, just play with the layout given by the rows and columns, and use the [Refresh] icon on the top left of the GUI.

### Series of histograms

## Eigenvolumes

eigSet=wb.getEigenvolume(1:30);

This creates a cell array (arbitrarily called `eigSet`). We can visualise it through:

mbvol.groups.montage(eigSet);

This plot is showing the true relative intensity of the eigenvolumes. In order to compare them, we can show the normalised eigenvolumes instead:

mbvol.groups.montage(dynamo_normalize_roi(eigSet));

## Correlation of tilts

It is a good idea to check if some eigenvolumes correlate strongly with the tilt.

wb.show.correlationEigenvectorTilt(1:10)

Remember that each particle is accessible through right clicking on it.

In this plot, each point represents a particle in your data set. We see that in this particular experiment, eigencomponent 3 seems to have been "corrupted by the missing wedge"