The following table describes the fields available for Latent Dirichlet Allocation.
Field | Description | Remark |
Run | It allows you to run the node. | - |
Explore | It allows you to explore the successfully executed node. | - |
Vertical Ellipses | The available options are | - |
Task Name | It is the name of the task selected on the workbook canvas. | |
Corpus | A corpus is a large, structured collection of text. This field lists the categorical and text columns present in the dataset. | You can select only one variable. |
Number of Topics | Enter the required number of topics to be extracted from the corpus. | The default value is 5. |
Advanced | | |
Coherence Method | It evaluates the quality and interpretability of topics. | |
Topic Range | You can specify the number of topics that the model can discover and represent. | The default value is 10. |
Chunk Size | It represents the number of documents to be used in each training chunk. | The default value is 2000. |
Passes | It refers to the number of times the entire corpus is processed during training. | The default value is 1. |
Iterations | It specifies the maximum number of iterations allowed for each pass. | The default value is 50. |
Random State | Enter a number (seed) to control the random number generator used to initialize the model. | - |
Alpha | It controls the sparsity of the topic distribution in each document. | The default value is alpha='symmetric', which means all topics are treated as equally likely in the corpus. |
Gamma Threshold | It allows you to control the threshold for the topic. | The default value is 0.0001, which means topics with a probability less than this value are not assigned to words in the corpus. |
Decay | It controls how quickly the learning rate decreases in online learning. | The default value is 0.5, which means the learning rate is halved after each chunk is processed. |
Minimum Probability | It filters out topics with probabilities lower than the assigned value. | The default value is 0.01, which means topics with probabilities less than 0.01 are filtered out. |
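
The way Chunk Size, Passes, Alpha, and Minimum Probability work together can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the product's implementation. The `LdaSettings` class and the helper function names are hypothetical, and the document count and topic probabilities are made up. Reading `symmetric` as a prior of 1 / (number of topics) for every topic is an assumption borrowed from common LDA libraries.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class LdaSettings:
    """Hypothetical container; defaults are the ones listed in the table."""
    number_of_topics: int = 5
    chunk_size: int = 2000
    passes: int = 1
    iterations: int = 50
    random_state: Optional[int] = None
    minimum_probability: float = 0.01


def updates_per_run(n_documents: int, settings: LdaSettings) -> int:
    """Count chunk-level training steps: chunks per pass times passes."""
    chunks_per_pass = math.ceil(n_documents / settings.chunk_size)
    return chunks_per_pass * settings.passes


def symmetric_alpha(number_of_topics: int) -> List[float]:
    """Assumed meaning of alpha='symmetric': equal prior weight per topic."""
    return [1.0 / number_of_topics] * number_of_topics


def filter_topics(
    topic_probs: List[Tuple[int, float]], minimum_probability: float
) -> List[Tuple[int, float]]:
    """Drop topics whose probability is below the Minimum Probability value."""
    return [(t, p) for t, p in topic_probs if p >= minimum_probability]


if __name__ == "__main__":
    settings = LdaSettings()

    # A made-up corpus of 4500 documents splits into 3 chunks of up to 2000.
    print(updates_per_run(4500, settings))  # 3

    # Five topics, each with the same prior weight.
    print(symmetric_alpha(settings.number_of_topics))  # [0.2, 0.2, 0.2, 0.2, 0.2]

    # Made-up topic probabilities for one document; topic 2 falls below 0.01.
    doc_topics = [(0, 0.62), (1, 0.375), (2, 0.005)]
    print(filter_topics(doc_topics, settings.minimum_probability))
    # [(0, 0.62), (1, 0.375)]
```

Raising Passes multiplies the number of training steps, while raising Chunk Size reduces the number of steps per pass, so the two values are usually tuned together.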
LDA (Latent Dirichlet Allocation) is a popular method for topic modeling. In this example, we apply LDA to the BBC News dataset. Before connecting LDA to the BBC News dataset, we prepare the data using various data preparation algorithms and build the workflow. Refer to the workflow shown below:
In the Properties pane, the following values were selected:
After the successful execution of the algorithm, we obtain the following result:
The result page displays: