Latent Dirichlet Allocation

Latent Dirichlet Allocation (LDA) is located under Textual Analysis > Topic Modeling > Latent Dirichlet Allocation. Drag and drop the algorithm onto the canvas to use it. Click the algorithm to view and select its modeling properties.

Properties of Latent Dirichlet Allocation

The following table describes the properties of Latent Dirichlet Allocation.

Field	Description	Remark
Run	It allows you to run the node.	-
Explore	It allows you to explore the successfully executed node.	-
Vertical Ellipses	The available options are: Run till node, Run from node, Publish as a model, Publish code.	-
Task Name	It is the name of the task selected on the workbook canvas.	You can click the text field to edit the name of the task, as required. Spaces are not allowed in the Task Name.
Corpus	A corpus is a large, structured collection of text. This field displays the categorical and text columns present in the dataset.	You can select only one variable.
Number of Topics	Enter the required number of topics to be extracted from the corpus.	The default value is 5.
Advanced
Coherence Method	It evaluates the quality and interpretability of topics.	The available methods are c_v and u_mass. The default method is c_v.
Topic Range	You can specify the number of topics that the model can discover and represent.	The default value is 10.
Chunk Size	It represents the number of documents to be used in each training chunk.	The default value is 2000.
Passes	It refers to the number of times the entire corpus is processed during training.	The default value is 1.
Iterations	It specifies the maximum number of iterations allowed for each pass.	The default value is 50.
Random State	Enter a number to seed the random number generator used for initializing the model.	-
Alpha	It controls the sparsity of the topic distribution in each document.	The default value is alpha='symmetric', which means every topic has the same prior weight in each document.
Gamma Threshold	It allows you to control the threshold for the topic.	The default value is 0.0001, which means topics with a probability below this threshold are not assigned to words in the corpus.
Decay	It controls how quickly the learning rate decreases during online learning.	The default value is 0.5, which means the learning rate is halved after processing each chunk.
Minimum Probability	It filters out the topics with probabilities lower than the assigned value.	The default value is 0.01, which means topics with probabilities less than 0.01 are filtered out.
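
For readers who work with the published code, the defaults in the table can be collected in one place. The following is a minimal sketch, not the node's actual implementation. The class `LdaNodeProperties`, its methods, and the task name `LDA_Task` are hypothetical. The keyword names returned by `to_model_kwargs` follow the parameters of gensim's `LdaModel`; that the node is backed by gensim is an assumption based on the matching property names and defaults.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LdaNodeProperties:
    """Hypothetical container holding the defaults listed in the properties table."""
    task_name: str = "LDA_Task"          # hypothetical name; spaces are not allowed
    num_topics: int = 5                  # Number of Topics
    coherence_method: str = "c_v"        # Coherence Method: c_v or u_mass
    topic_range: int = 10                # Topic Range
    chunk_size: int = 2000               # Chunk Size
    passes: int = 1                      # Passes
    iterations: int = 50                 # Iterations
    random_state: Optional[int] = None   # Random State (no default listed)
    alpha: str = "symmetric"             # Alpha
    gamma_threshold: float = 0.0001      # Gamma Threshold (default as listed)
    decay: float = 0.5                   # Decay
    minimum_probability: float = 0.01    # Minimum Probability

    def validate(self) -> None:
        """Check the constraints that the table states explicitly."""
        if any(ch.isspace() for ch in self.task_name):
            raise ValueError("Task Name must not contain spaces")
        if self.coherence_method not in ("c_v", "u_mass"):
            raise ValueError("Coherence Method must be 'c_v' or 'u_mass'")
        if self.num_topics < 1:
            raise ValueError("Number of Topics must be at least 1")

    def to_model_kwargs(self) -> dict:
        """Return training options keyed by gensim LdaModel parameter names.

        Assumption: the node passes these values through unchanged.
        """
        return {
            "num_topics": self.num_topics,
            "chunksize": self.chunk_size,
            "passes": self.passes,
            "iterations": self.iterations,
            "random_state": self.random_state,
            "alpha": self.alpha,
            "gamma_threshold": self.gamma_threshold,
            "decay": self.decay,
            "minimum_probability": self.minimum_probability,
        }


# Usage: start from the defaults and override only what you need.
props = LdaNodeProperties(num_topics=8, random_state=42)
props.validate()
model_kwargs = props.to_model_kwargs()
```

Coherence Method and Topic Range are kept out of `to_model_kwargs` because they describe how topics are evaluated rather than how a single model is trained.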

Example of the Latent Dirichlet Allocation

LDA (Latent Dirichlet Allocation) is a popular method for topic modeling. In this example, we apply LDA to the BBC News dataset. Before connecting LDA to the dataset, we prepare the data using various data preparation algorithms and build the workflow. Refer to the workflow shown below:

In the Properties pane, the following values were selected:

After the successful execution of the algorithm, we obtain the following result:

The result page displays:

Coherence Scores vs. Number of Topics
List of Assigned Topics
Intertopic Distance Map
Top-30 Most Salient Terms
λ Value Table
WordCloud Chart
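
To make the Coherence Scores vs. Number of Topics chart easier to read, the sketch below shows one way a u_mass style score can be computed and used to compare candidate topic counts. It is an illustration under stated assumptions, not the product's implementation: the function names are hypothetical, the documents and topics are toy data, and the score is averaged over word pairs, which may differ from how the tool aggregates it. The c_v method is not shown.

```python
import math


def u_mass_coherence(top_words, documents):
    """Average u_mass score over the word pairs of one topic.

    For each pair (w_m, w_l) with w_l ranked above w_m, the score is
    log((D(w_m, w_l) + 1) / D(w_l)), where D counts the documents that
    contain the given word(s).
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        return sum(1 for d in doc_sets if all(w in d for w in words))

    scores = []
    for m in range(1, len(top_words)):
        for l in range(m):
            d_l = doc_freq(top_words[l])
            if d_l == 0:
                continue  # skip words that never occur in the corpus
            co_occurrences = doc_freq(top_words[m], top_words[l])
            scores.append(math.log((co_occurrences + 1) / d_l))
    return sum(scores) / len(scores) if scores else 0.0


def best_topic_count(candidates, documents):
    """Return the topic count whose topics have the highest mean coherence."""
    def mean_coherence(topics):
        return sum(u_mass_coherence(t, documents) for t in topics) / len(topics)

    return max(candidates, key=lambda k: mean_coherence(candidates[k]))


# Toy corpus: two sports documents and two business documents.
docs = [
    ["goal", "match", "team"],
    ["team", "match", "coach"],
    ["market", "shares", "profit"],
    ["profit", "market", "bank"],
]

# Toy candidates: topic count -> top words of each topic.
candidates = {
    2: [["match", "team"], ["market", "profit"]],
    3: [["match", "profit"], ["team", "bank"], ["market", "coach"]],
}

best = best_topic_count(candidates, docs)
```

In this toy data the two-topic candidate groups words that appear in the same documents, so it scores higher than the three-topic candidate, whose topics mix sports and business terms. u_mass scores are typically negative, and values closer to zero indicate more coherent topics.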
