Last updated: 2020-11-05


Knit directory: ebpmf_data_analysis/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(20200511) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 1efe752. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    analysis/ebpmf_bg_tutorial_cache/
    Ignored:    analysis/ebpmf_wbg_model_intro_cache/
    Ignored:    analysis/ebpmf_wbg_simulate_big_data2_cache/
    Ignored:    analysis/ebpmf_wbg_simulate_big_data3_cache/
    Ignored:    analysis/ebpmf_wbg_simulate_big_data_cache/
    Ignored:    analysis/ebpmf_wbg_simulation_cache/
    Ignored:    analysis/investigate_np_ebpmf_wbg_cache/
    Ignored:    analysis/pmf_greedy_experiment_cache/
    Ignored:    analysis/sla_data_analysis_k10_cache/
    Ignored:    data/.DS_Store
    Ignored:    output/.DS_Store
    Ignored:    topicView-app/.DS_Store

Untracked files:
    Untracked:  analysis/draft.Rmd
    Untracked:  analysis/draft2.Rmd
    Untracked:  analysis/ebpmf_wbg_simulation_big.Rmd
    Untracked:  analysis/ebpmf_wbg_simulation_big2_more.Rmd
    Untracked:  analysis/heatmap.Rmd
    Untracked:  analysis/investigate_largeK.Rmd
    Untracked:  analysis/investigate_news_topics.Rmd
    Untracked:  analysis/summary_sla_news_nips.Rmd
    Untracked:  analysis/test.R
    Untracked:  data/sim/init_random.sim_bg_block_n1100_p2100_K50.Rds
    Untracked:  output/sim/v0.4.5/exper2/init_random.sim_bg_block_n1100_p2100_K50.Rds
    Untracked:  output/sim/v0.4.5/exper2/sim_bg_block_n1100_p2100_K50_ebpmf_wbg_K50_maxiter100_init_random.Rds
    Untracked:  output/sim/v0.4.5/exper2/sim_bg_block_n1100_p2100_K50_ebpmf_wbg_K50_maxiter10_from_truth.Rds
    Untracked:  output/sim/v0.4.5/exper2/sim_bg_block_n1100_p2100_K50_ebpmf_wbg_K50_maxiter10_init_random.Rds
    Untracked:  output/sim/v0.4.5/exper2/sim_bg_block_n1100_p2100_K50_ebpmf_wbg_K50_maxiter10_init_random2.Rds
    Untracked:  output/sim/v0.4.5/exper2/sim_bg_block_n1100_p2100_K50_ebpmf_wbg_K50_maxiter10_pmf_bg_K50_maxiter10_from_truth_scaled.Rds
    Untracked:  output/sim/v0.4.5/exper2/sim_bg_block_n1100_p2100_K50_ebpmf_wbg_K50_maxiter10_pmf_bg_K50_maxiter10_from_truth_scaled0.Rds
    Untracked:  output/sim/v0.4.5/exper2/sim_bg_block_n1100_p2100_K50_ebpmf_wbg_K50_maxiter10_pmf_bg_K50_maxiter10_from_truth_scaled1.Rds
    Untracked:  output/sim/v0.4.5/exper2/sim_bg_block_n1100_p2100_K50_ebpmf_wbg_K50_maxiter1_from_truth.Rds
    Untracked:  output/sim/v0.4.5/exper2/sim_bg_block_n1100_p2100_K50_ebpmf_wbg_K50_maxiter20_init_random.Rds
    Untracked:  output/sim/v0.4.5/exper2/sim_bg_block_n1100_p2100_K50_ebpmf_wbg_K50_maxiter2_from_truth.Rds
    Untracked:  output/sim/v0.4.5/exper2/sim_bg_block_n1100_p2100_K50_ebpmf_wbg_K50_maxiter30_init_random.Rds
    Untracked:  output/sim/v0.4.5/exper2/sim_bg_block_n1100_p2100_K50_ebpmf_wbg_K50_maxiter3_from_truth.Rds
    Untracked:  output/sim/v0.4.5/exper2/sim_bg_block_n1100_p2100_K50_ebpmf_wbg_K50_maxiter40_init_random.Rds
    Untracked:  output/sim/v0.4.5/exper2/sim_bg_block_n1100_p2100_K50_ebpmf_wbg_K50_maxiter4_from_truth.Rds
    Untracked:  output/sim/v0.4.5/exper2/sim_bg_block_n1100_p2100_K50_ebpmf_wbg_K50_maxiter50_init_random.Rds
    Untracked:  output/sim/v0.4.5/exper2/sim_bg_block_n1100_p2100_K50_ebpmf_wbg_K50_maxiter5_from_truth.Rds
    Untracked:  output/sim/v0.4.5/exper2/sim_bg_block_n1100_p2100_K50_ebpmf_wbg_K50_maxiter5_pmf_bg_K50_maxiter10_from_truth_scaled.Rds
    Untracked:  output/sim/v0.4.5/exper2/sim_bg_block_n1100_p2100_K50_ebpmf_wbg_K50_maxiter5_pmf_bg_K50_maxiter10_from_truth_scaled0.Rds
    Untracked:  output/sim/v0.4.5/exper2/sim_bg_block_n1100_p2100_K50_ebpmf_wbg_K50_maxiter5_pmf_bg_K50_maxiter10_from_truth_scaled1.Rds
    Untracked:  output/sim/v0.4.5/exper2/sim_bg_block_n1100_p2100_K50_ebpmf_wbg_K50_maxiter60_init_random.Rds
    Untracked:  output/sim/v0.4.5/exper2/sim_bg_block_n1100_p2100_K50_ebpmf_wbg_K50_maxiter6_from_truth.Rds
    Untracked:  output/sim/v0.4.5/exper2/sim_bg_block_n1100_p2100_K50_ebpmf_wbg_K50_maxiter70_init_random.Rds
    Untracked:  output/sim/v0.4.5/exper2/sim_bg_block_n1100_p2100_K50_ebpmf_wbg_K50_maxiter7_from_truth.Rds
    Untracked:  output/sim/v0.4.5/exper2/sim_bg_block_n1100_p2100_K50_ebpmf_wbg_K50_maxiter80_init_random.Rds
    Untracked:  output/sim/v0.4.5/exper2/sim_bg_block_n1100_p2100_K50_ebpmf_wbg_K50_maxiter8_from_truth.Rds
    Untracked:  output/sim/v0.4.5/exper2/sim_bg_block_n1100_p2100_K50_ebpmf_wbg_K50_maxiter90_init_random.Rds
    Untracked:  output/sim/v0.4.5/exper2/sim_bg_block_n1100_p2100_K50_ebpmf_wbg_K50_maxiter9_from_truth.Rds
    Untracked:  script/Rplots.pdf
    Untracked:  script/init_ebpmf_wbg_from_pmf_bg.R
    Untracked:  script/init_ebpmf_wbg_random.R
    Untracked:  script/save_volcano_plot.R
    Untracked:  topicView-app/app_utils.R
    Untracked:  topicView-app/data/
    Untracked:  topicView-app/output/
    Untracked:  topicView-app/rsconnect/

Unstaged changes:
    Modified:   analysis/ebpmf_wbg_simulate_big_data2.Rmd
    Deleted:    analysis/sla_data_analysis_k10.Rmd
    Deleted:    analysis/sla_data_analysis_k5.Rmd
    Deleted:    analysis/sla_data_analysis_k50.Rmd
    Modified:   code/misc.R
    Modified:   code/util.R
    Deleted:    data/SLA/SCC2016/Code/APL/compCM.m
    Deleted:    data/SLA/SCC2016/Code/APL/compMuI.m
    Deleted:    data/SLA/SCC2016/Code/APL/compParamErr2.m
    Deleted:    data/SLA/SCC2016/Code/APL/cpl4c.m
    Deleted:    data/SLA/SCC2016/Code/APL/cplEstimParam.m
    Deleted:    data/SLA/SCC2016/Code/APL/cpl_basic_demo_PJ.m
    Deleted:    data/SLA/SCC2016/Code/APL/cpl_demo.m
    Deleted:    data/SLA/SCC2016/Code/APL/cpl_demo2a.m
    Deleted:    data/SLA/SCC2016/Code/APL/dcBlkMod.m
    Deleted:    data/SLA/SCC2016/Code/APL/dcBlkMod2.m
    Deleted:    data/SLA/SCC2016/Code/APL/dcBlkMod3.m
    Deleted:    data/SLA/SCC2016/Code/APL/dcbm_nmi_beta_D.m
    Deleted:    data/SLA/SCC2016/Code/APL/dcbm_nmi_lambda_D.m
    Deleted:    data/SLA/SCC2016/Code/APL/dcbm_time_vs_n_D.m
    Deleted:    data/SLA/SCC2016/Code/APL/genDCBlkMod.c
    Deleted:    data/SLA/SCC2016/Code/APL/genDCBlkMod.mexa64
    Deleted:    data/SLA/SCC2016/Code/APL/genDCBlkMod2.m
    Deleted:    data/SLA/SCC2016/Code/APL/initLabel5b.m
    Deleted:    data/SLA/SCC2016/Code/BCPL/ProfileLike.m
    Deleted:    data/SLA/SCC2016/Code/BCPL/calCri1.m
    Deleted:    data/SLA/SCC2016/Code/BCPL/calCri2.m
    Deleted:    data/SLA/SCC2016/Code/BCPL/mutiExp.m
    Deleted:    data/SLA/SCC2016/Code/MatlabCode.m
    Deleted:    data/SLA/SCC2016/Code/NewmanSM/NewmanSM.m
    Deleted:    data/SLA/SCC2016/Code/coauthorThresh2GiantAdj.txt
    Deleted:    data/SLA/SCC2016/Code/coauthorThresh2GiantCommLabelK2Matlab.txt
    Deleted:    data/SLA/SCC2016/Code/functions.R
    Deleted:    data/SLA/SCC2016/Code/main.R
    Deleted:    data/SLA/SCC2016/Data/authorList.txt
    Deleted:    data/SLA/SCC2016/Data/authorPaperBiadj.txt
    Deleted:    data/SLA/SCC2016/Data/paperCitAdj.txt
    Deleted:    data/SLA/SCC2016/Data/paperList.txt
    Deleted:    data/SLA/SCC2016/ReadMe.txt
    Modified:   data/sim/docword.sim_bg_block_n1100_p2100_K50.txt
    Deleted:    data/sim/init.sim_bg_block_n1100_p2100_K50.Rds
    Modified:   data/sim/truth.sim_bg_block_n1100_p2100_K50.Rds
    Deleted:    data/uci_BoW.sh
    Deleted:    data/uci_BoW/docword.kos.txt
    Deleted:    data/uci_BoW/readme.txt
    Deleted:    data/uci_BoW/vocab.kos.txt
    Deleted:    output/sim/v0.4.5/fit_sim_bg_block_n1100_p2100_K50_ebpmf_wbg_maxiter_5000.Rout
    Deleted:    output/sim/v0.4.5/fit_sim_bg_block_n1100_p2100_K50_ebpmf_wbg_maxiter_5000_from_truth.Rout
    Deleted:    output/sim/v0.4.5/sim_bg_block_n1100_p2100_K50_ebpmf_wbg_K50_maxiter3.Rds
    Deleted:    output/sim/v0.4.5/sim_bg_block_n1100_p2100_K50_ebpmf_wbg_K50_maxiter5000.Rds
    Deleted:    output/sim/v0.4.5/sim_bg_block_n1100_p2100_K50_ebpmf_wbg_K50_maxiter5000_from_truth.Rds
    Deleted:    output/sim/v0.4.5/sim_bg_block_n1100_p2100_K50_ebpmf_wbg_K50_maxiter50_from_truth2.Rds
    Deleted:    output/uci_BoW/v0.3.8/fit_kos_ebpmf_bg_K20_maxiter_1000.Rout
    Deleted:    output/uci_BoW/v0.3.8/fit_kos_ebpmf_bg_K20_maxiter_500.Rout
    Deleted:    output/uci_BoW/v0.3.8/fit_kos_ebpmf_bg_K20_maxiter_5000.Rout
    Deleted:    output/uci_BoW/v0.3.8/kos_ebpmf_bg_K20_maxiter1000.Rds
    Deleted:    output/uci_BoW/v0.3.8/kos_ebpmf_bg_K20_maxiter500.Rds
    Deleted:    output/uci_BoW/v0.3.8/kos_ebpmf_bg_K20_maxiter5000.Rds
    Deleted:    output/uci_BoW/v0.3.8/kos_ebpmf_bg_K2_maxiter10.Rds
    Deleted:    output/uci_BoW/v0.3.9/fit_kos_ebpmf_bg_K100_maxiter_5000.Rout
    Deleted:    output/uci_BoW/v0.3.9/fit_kos_ebpmf_bg_K20_maxiter_5000.Rout
    Deleted:    output/uci_BoW/v0.3.9/fit_kos_ebpmf_bg_K50_maxiter_5000.Rout
    Deleted:    output/uci_BoW/v0.3.9/fit_kos_ebpmf_bg_initLF_K100_maxiter_5000.Rout
    Deleted:    output/uci_BoW/v0.3.9/fit_kos_ebpmf_bg_initLF_K20_maxiter_5000.Rout
    Deleted:    output/uci_BoW/v0.3.9/fit_kos_ebpmf_bg_initLF_K300_maxiter_1000.Rout
    Deleted:    output/uci_BoW/v0.3.9/fit_kos_ebpmf_bg_initLF_K500_maxiter_1000.Rout
    Deleted:    output/uci_BoW/v0.3.9/fit_kos_ebpmf_bg_initLF_K50_maxiter_5000.Rout
    Deleted:    output/uci_BoW/v0.3.9/fit_kos_pmf_initLF_K100_maxiter_5000.Rout
    Deleted:    output/uci_BoW/v0.3.9/fit_kos_pmf_initLF_K20_maxiter_5000.Rout
    Deleted:    output/uci_BoW/v0.3.9/fit_kos_pmf_initLF_K300_maxiter_1000.Rout
    Deleted:    output/uci_BoW/v0.3.9/fit_kos_pmf_initLF_K500_maxiter_5000.Rout
    Deleted:    output/uci_BoW/v0.3.9/fit_kos_pmf_initLF_K50_maxiter_5000.Rout
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_K100_maxiter1000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_K100_maxiter1500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_K100_maxiter2000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_K100_maxiter500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_K20_maxiter1000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_K20_maxiter1500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_K20_maxiter2000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_K20_maxiter2500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_K20_maxiter3000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_K20_maxiter3500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_K20_maxiter4000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_K20_maxiter4500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_K20_maxiter500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_K20_maxiter5000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_K50_maxiter1000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_K50_maxiter1500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_K50_maxiter2000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_K50_maxiter2500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_K50_maxiter3000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_K50_maxiter3500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_K50_maxiter4000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_K50_maxiter4500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_K50_maxiter500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K100_maxiter1000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K100_maxiter1500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K100_maxiter2000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K100_maxiter2500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K100_maxiter500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K20_maxiter10.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K20_maxiter1000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K20_maxiter1500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K20_maxiter2000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K20_maxiter2500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K20_maxiter3000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K20_maxiter3500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K20_maxiter4000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K20_maxiter4500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K20_maxiter5.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K20_maxiter500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K20_maxiter5000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K300_maxiter100.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K300_maxiter1000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K300_maxiter200.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K300_maxiter300.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K300_maxiter400.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K300_maxiter500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K300_maxiter600.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K300_maxiter700.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K300_maxiter800.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K300_maxiter900.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K50_maxiter1000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K50_maxiter1500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K50_maxiter2000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K50_maxiter2500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K50_maxiter3000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K50_maxiter3500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K50_maxiter4000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K50_maxiter4500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K50_maxiter500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K50_maxiter5000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_init_nmf_K100_iter50.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_init_nmf_K20_iter50.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_init_nmf_K300_iter50.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_init_nmf_K500_iter50.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_init_nmf_K50_iter50.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K100_maxiter1000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K100_maxiter1500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K100_maxiter2000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K100_maxiter2500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K100_maxiter3000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K100_maxiter3500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K100_maxiter4000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K100_maxiter4500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K100_maxiter500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K100_maxiter5000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K20_maxiter10.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K20_maxiter1000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K20_maxiter1500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K20_maxiter2000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K20_maxiter2500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K20_maxiter3000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K20_maxiter3500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K20_maxiter4000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K20_maxiter4500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K20_maxiter5.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K20_maxiter500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K20_maxiter5000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K300_maxiter100.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K300_maxiter1000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K300_maxiter200.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K300_maxiter300.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K300_maxiter400.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K300_maxiter500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K300_maxiter600.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K300_maxiter700.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K300_maxiter800.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K300_maxiter900.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K50_maxiter1000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K50_maxiter1500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K50_maxiter2000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K50_maxiter2500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K50_maxiter3000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K50_maxiter3500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K50_maxiter4000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K50_maxiter4500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K50_maxiter500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K50_maxiter5000.Rds
    Deleted:    output/uci_BoW/v0.4.2/fit_kos_ebpmf_wbg_initLF_K100_maxiter_5000.Rout
    Deleted:    output/uci_BoW/v0.4.2/fit_kos_ebpmf_wbg_initLF_K20_maxiter_5000.Rout
    Deleted:    output/uci_BoW/v0.4.2/fit_kos_ebpmf_wbg_initLF_K50_maxiter_5000.Rout
    Deleted:    output/uci_BoW/v0.4.2/fit_kos_ebpmf_wbg_initL_K100_maxiter_5000.Rout
    Deleted:    output/uci_BoW/v0.4.2/fit_kos_ebpmf_wbg_initL_K20_maxiter_5000.Rout
    Deleted:    output/uci_BoW/v0.4.2/fit_kos_ebpmf_wbg_initL_K50_maxiter_5000.Rout
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initL50_K20_maxiter10.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initL50_K20_maxiter1000.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initL50_K20_maxiter1500.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initL50_K20_maxiter2000.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initL50_K20_maxiter2500.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initL50_K20_maxiter3000.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initL50_K20_maxiter3500.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initL50_K20_maxiter4000.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initL50_K20_maxiter4500.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initL50_K20_maxiter500.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initL50_K20_maxiter5000.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initL50_K50_maxiter1000.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initL50_K50_maxiter1500.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initL50_K50_maxiter2000.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initL50_K50_maxiter2500.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initL50_K50_maxiter3000.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initL50_K50_maxiter3500.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initL50_K50_maxiter4000.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initL50_K50_maxiter4500.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initL50_K50_maxiter500.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initL50_K50_maxiter5000.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K100_maxiter1000.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K100_maxiter1500.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K100_maxiter2000.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K100_maxiter2500.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K100_maxiter500.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K20_maxiter10.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K20_maxiter1000.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K20_maxiter1500.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K20_maxiter2000.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K20_maxiter2500.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K20_maxiter3000.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K20_maxiter3500.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K20_maxiter4000.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K20_maxiter4500.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K20_maxiter500.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K20_maxiter5000.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K3_maxiter10.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K50_maxiter1000.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K50_maxiter1500.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K50_maxiter2000.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K50_maxiter2500.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K50_maxiter3000.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K50_maxiter3500.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K50_maxiter4000.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K50_maxiter4500.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K50_maxiter500.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K50_maxiter5000.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_init_nmf_K100_iter50.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_init_nmf_K20_iter50.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_init_nmf_K300_iter50.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_init_nmf_K3_iter50.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_init_nmf_K500_iter50.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_init_nmf_K50_iter50.Rds
    Deleted:    output/uci_BoW/v0.4.4/fit_kos_np_ebpmf_wbg_initLF_K100_maxiter_5000.Rout
    Deleted:    output/uci_BoW/v0.4.4/fit_kos_np_ebpmf_wbg_initLF_K20_maxiter_5000.Rout
    Deleted:    output/uci_BoW/v0.4.4/fit_kos_np_ebpmf_wbg_initLF_K50_maxiter_5000.Rout
    Deleted:    output/uci_BoW/v0.4.4/kos_np_ebpmf_wbg_initLF50_K100_maxiter500.Rds
    Deleted:    output/uci_BoW/v0.4.4/kos_np_ebpmf_wbg_initLF50_K20_maxiter500.Rds
    Deleted:    output/uci_BoW/v0.4.4/kos_np_ebpmf_wbg_initLF50_K50_maxiter500.Rds
    Modified:   topicView-app/app.R

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/ebpmf_wbg_simulation_big2_2.Rmd) and HTML (docs/ebpmf_wbg_simulation_big2_2.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd 1efe752 zihao12 2020-11-05 ebpmf_wbg_simulation_big2_2.Rmd

Introduction

  • What I did:
    • I ran ebpmf_wbg on a bigger simulated dataset (\(n = 1100, p = 2100, K = 50\)).
    • The signal from \(L, F\) is weak compared to that from \(l_0 L, f_0 F\): \(\text{Cor}(L_{k}, L_{k^{'}}) \approx - 0.009\) but \(\text{Cor}(l_0 L_{k}, f_0 L_{k^{'}}) \approx - 0.9\).
    • I compared pmf_bg and ebpmf_wbg on this dataset. (Note that pmf_bg has the same objective as pmf, except that I separate \(L\) into \(l_0, L\) and update them separately (likewise for \(F\)). Equivalently, it is ebpmf_wbg without the adaptive shrinkage part.)
  • What I found:
    • Because the signal from the deviations \(L, F\) is too weak, both methods find solutions that ignore it yet achieve a better objective value.
    • Even when initialized from the truth, pmf_bg learns a messy structure.
    • When initialized sufficiently close to the truth, ebpmf_wbg recovers the structure well, and the estimate of g also makes sense.
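
As a reference for the note on pmf_bg above, the shared objective can be written out explicitly. This is a sketch under the assumption that pmf denotes maximum-likelihood Poisson matrix factorization; \(\bar{L}, \bar{F}\) are notation introduced here for the rescaled factors and do not correspond to names in the code.

\[
\ell(l_0, f_0, L, F) = \sum_{i,j} \left( X_{ij} \log \Lambda_{ij} - \Lambda_{ij} \right) + \text{const}, \qquad \Lambda = \text{diag}(l_0) \, L F^t \, \text{diag}(f_0) = \bar{L} \bar{F}^t,
\]

where \(\bar{L} = \text{diag}(l_0) L\) and \(\bar{F} = \text{diag}(f_0) F\). Under this assumption, every split of \(\bar{L}\) into \(l_0\) and \(L\) gives the same likelihood, so the likelihood alone does not determine how much of the signal is assigned to the background and how much to the deviation.
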
rm(list = ls())
knitr::opts_chunk$set(message = FALSE, warning = FALSE, autodep = TRUE)
library(ggplot2)
library(gridExtra)
library(Matrix)
source("code/misc.R")
source("code/util.R")
data_dir = "output/sim/v0.4.5/exper2"
data_name = "sim_bg_block_n1100_p2100_K50"

Load data and models

## load data
X = read_sim_bag_of_words(sprintf("%s/docword.%s.txt", data_dir, data_name))
truth = readRDS(sprintf("%s/truth.%s.Rds", data_dir, data_name))
n = nrow(X); p = ncol(X); K = ncol(truth$L)

## load wbg models
wbg_from_truth = load_model_ebpmf(data_dir = data_dir, data_name = data_name, 
                                      method_name = "ebpmf_wbg_K50_maxiter5000_from_truth")
wbg_from_pmf_truth = load_model_ebpmf(data_dir = data_dir, data_name = data_name, 
                                  method_name="ebpmf_wbg_K50_maxiter500_pmf_bg_K50_maxiter10_from_truth_scaled0")
## NOTE: same method_name as wbg_from_pmf_truth above, so both objects hold the same fit
wbg_from_pmf_truth_scaled = load_model_ebpmf(data_dir = data_dir, data_name = data_name, 
                                  method_name="ebpmf_wbg_K50_maxiter500_pmf_bg_K50_maxiter10_from_truth_scaled0")

wbg_from_random = load_model_ebpmf(data_dir = data_dir, data_name = data_name, 
                                  method_name="ebpmf_wbg_K50_maxiter100_init_random")

## load pmf_bg model (maxiter 1000, initialized from truth)
pmf_bg_from_truth = load_model_pmf(data_dir = data_dir, data_name = data_name, 
                                  method_name = "pmf_bg_K50_maxiter1000_from_truth")

## load pmf_bg model (maxiter 10, initialized from truth)
pmf_bg_from_truth_iter10 = load_model_pmf(data_dir = data_dir, data_name = data_name, 
                                  method_name = "pmf_bg_K50_maxiter10_from_truth")
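
The loaders used above live in code/util.R and are not shown on this page. As a rough sketch of what reading a docword file could involve, the function below assumes the UCI bag-of-words layout (three header lines giving the number of documents, words and nonzero entries, followed by "docID wordID count" triplets). `read_docword_sketch` is a hypothetical name, not the project's `read_sim_bag_of_words`, and the actual file layout may differ.

```r
library(Matrix)

## Hypothetical reader, NOT the project's read_sim_bag_of_words().
## Assumes the UCI bag-of-words layout: header lines D, W, NNZ,
## then one "docID wordID count" triplet per line with 1-based ids.
read_docword_sketch <- function(path) {
  header <- scan(path, nlines = 3, quiet = TRUE)
  triplets <- read.table(path, skip = 3,
                         col.names = c("doc", "word", "count"))
  sparseMatrix(i = triplets$doc, j = triplets$word, x = triplets$count,
               dims = c(header[1], header[2]))
}

## small self-contained example: 3 documents, 4 words, 4 nonzero counts
tmp <- tempfile(fileext = ".txt")
writeLines(c("3", "4", "4", "1 1 2", "1 3 1", "2 2 5", "3 4 1"), tmp)
X_small <- read_docword_sketch(tmp)
```

The result is a documents-by-words sparse matrix, the same orientation as `X` in this analysis, where `n = nrow(X)` counts documents and `p = ncol(X)` counts words.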

What does the data look like

  • The data is generated in bigger_simulated_dataset.
  • Data model is \(X \sim \text{Pois}(\Lambda); \Lambda_{ij} = l_{i0} f_{j0} \sum_k l_{ik} f_{jk}\). We call
    • \(l_0, f_0\) background frequency for loading and factor
    • \(L, F\) deviation for loading and factor
    • \(\tilde{\Lambda} := L F^t\) deviation for the mean
  • \(n = 1100, p = 2100, K = 50\)
  • The last 100 words and documents are frequent words/docs
  • Each topic \(k = 1, \dots, K\) has 20 top words and 10 top documents (100 times more deviation). I arrange them so that \(\tilde{\Lambda}\) has a block structure: \(K\) blocks within rows \(1, \dots, 1000\) and columns \(1, \dots, 2000\), plus a block for frequent words/documents at the end.
  • The signal from frequent words is much stronger than that from top words/documents (in \(X\), entries in the frequent-word block are more than 10 times those in the blocks for top words/documents). This seems to be intrinsic to this type of data: if the amplifying factors for background frequent words and for top words/documents are the same, the signal from the background is \(K\) times stronger.
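
The data model and the \(K\)-times remark above can be illustrated on a toy example. This is a minimal sketch, not the script that generated the dataset: all sizes, the baseline value 1 for the deviations, the amplifying factor `a` and the background rate `base` are assumptions chosen for illustration.

```r
## Toy instance of the data model; all sizes and values here are hypothetical.
set.seed(1)
K_toy <- 5
n_blk <- 4; n_top <- 2     # documents per block; the first n_top are top documents
p_blk <- 6; p_top <- 3     # words per block; the first p_top are top words
n_freq <- 2; p_freq <- 2   # frequent documents / words placed at the end
n_toy <- K_toy * n_blk + n_freq
p_toy <- K_toy * p_blk + p_freq
a <- 10                    # one amplifying factor for background and deviation
base <- 0.05               # baseline background rate for documents

## deviations: 1 everywhere, a for the top documents / words of each topic
L_toy <- matrix(1, n_toy, K_toy)
F_toy <- matrix(1, p_toy, K_toy)
for (k in 1:K_toy) {
  L_toy[(k - 1) * n_blk + 1:n_top, k] <- a
  F_toy[(k - 1) * p_blk + 1:p_top, k] <- a
}

## background frequencies: amplified for the frequent documents / words
l0_toy <- rep(base, n_toy); l0_toy[n_toy - n_freq + 1:n_freq] <- base * a
f0_toy <- rep(1, p_toy);    f0_toy[p_toy - p_freq + 1:p_freq] <- a

## Lambda_ij = l_i0 * f_j0 * sum_k l_ik * f_jk, then X ~ Pois(Lambda)
Lambda_toy <- (l0_toy %o% f0_toy) * (L_toy %*% t(F_toy))
X_toy <- matrix(rpois(n_toy * p_toy, Lambda_toy), n_toy, p_toy)

## for an ordinary document, compare the gain from a frequent word with
## the gain from a top word, both relative to an ordinary word
i_ord <- n_top + 1; j_ord <- p_top + 1
gain_top  <- Lambda_toy[i_ord, 1]     - Lambda_toy[i_ord, j_ord]
gain_freq <- Lambda_toy[i_ord, p_toy] - Lambda_toy[i_ord, j_ord]
ratio <- gain_freq / gain_top
```

Under these assumptions a top word raises the mean of an ordinary document by `base * (a - 1)`, because only one of the \(K\) terms in the sum is amplified, while a frequent word raises it by `base * (a - 1) * K_toy`, because the background multiplies the whole sum. So `ratio` should equal `K_toy` up to floating-point error.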

Truth (\(l_0, f_0, L, F\) and deviation matrix)

par(mfrow = c(2,2))
plot(truth$l0, log = "y", main = "l0 (truth)")
plot(truth$f0, log = "y", main = "f0 (truth)")

k = 12
plot(truth$L[,k], log = "y", main = sprintf("%dth loading", k))
plot(truth$F[,k], log = "y", main = sprintf("%dth factor", k))

Deviation matrix (block for top words and docs)

image(truth$L[1:50,] %*% t(truth$F[1:100,]), main = "deviation matrix (one block)")

Show some blocks in \(X\)

with topic words/documents, plus neighbors

X[1:15, 1:30]
15 x 30 sparse Matrix of class "dgCMatrix"
                                                                 
 [1,] 1 . . 1 . . . 3 . . 2 1 . . 1 1 . . 1 . . . . . . . . . . .
 [2,] . 1 1 . 2 1 2 . . 1 . . . 2 . 1 . 2 . 1 . . . . . . . . . .
 [3,] . 1 . . . . 1 . 1 1 1 . 3 . . . . . 3 . . . . . . . . . . .
 [4,] . 2 2 1 . 1 . . . 1 1 . . 2 1 . 3 . . 2 . . . . . . . . . .
 [5,] 2 1 1 . 1 . . 1 1 1 . . 1 . . . . 1 . . . . . . . . . . . .
 [6,] 2 . 1 . 2 1 . 1 . 1 1 . 1 . . 1 1 1 2 . . . . . . . . . . .
 [7,] 1 . . 1 . 1 . . . . . . . . . . 1 . . . . . . . . . . . . .
 [8,] 2 3 . . . 1 . 1 . . 1 . . . 1 . . . 1 . . . . . . . . . . .
 [9,] 1 . . 1 . . 1 . 1 1 . . 1 1 . 1 2 . . . . . . . . . . . . .
[10,] 1 . 1 . . 1 . 1 1 . . 2 . . . . 2 . . 1 . . . . . . . . . .
[11,] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
[12,] . . . . . . . . . . . . . . . . 1 . . . . . . . . . . . . .
[13,] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
[14,] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
[15,] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

with frequent words/documents

X[(n-10):n, (p-20):p]
11 x 21 sparse Matrix of class "dgCMatrix"
                                                                    
 [1,] 16 16 22 25 16 26 29 15 27 27 22 16 19 19 18 22 26 32 20 24 18
 [2,] 20 16 34 23 28 35 31 22 30 24 21 28 29 19 20 20 29 31 12 20 19
 [3,] 17 14 33 31 17 31 21 22 18 26 28 27 22 29 15 21 30 25 25 29 17
 [4,] 23 22 20 17 19 27 18 12 23 27 34 18 16 14  9 15 24 23 12 13 24
 [5,] 27 26 24 21 22 29 23 19 38 31 25 21 26 20 21 30 16 35 25 18 30
 [6,] 15 19 16 26 11 21 29 16 25 31 23 26 26 19 28 15 37 34 22 27 17
 [7,] 20 17 25 27 18 24 24 34 34 28 22 17 28 29 23 27 26 32 25 22 16
 [8,] 13 14 21 18 14 15  8 13 25 16 12 30 13  8 20 19 26 16 18 16 17
 [9,] 16 10 29 28 24 25 25 16 31 25 13 19 14 18 20 24 24 25 16 16 24
[10,] 20 15 36 30 29 23 26 26 34 25 25 29 16 25 16 19 30 32 25 24 21
[11,] 12 16 25 24 13 25 22 26 23 25 19 15 23 23 12 11 23 29 21 20 14

with frequent documents and ordinary words

X[(n-10):n, 1:20]
11 x 20 sparse Matrix of class "dgCMatrix"
                                             
 [1,] 2 1 . 1 2 . 2 . 2 . . . . . . 1 . 2 1 3
 [2,] 1 3 . . 1 . 1 1 . 1 2 . . 1 1 . 1 . 1 .
 [3,] 2 3 1 2 1 . . . 1 . . . . . . 1 1 . 1 1
 [4,] . 1 2 . 2 . 1 1 1 2 1 . . . . 1 2 1 1 1
 [5,] 3 1 1 2 . . . . . . 1 . . 1 . 2 1 . 1 .
 [6,] 1 1 . 1 2 . 3 . 3 . . . 1 . 1 . 1 . . 1
 [7,] . 1 1 1 1 1 1 3 . 1 . 1 . . . . 1 . . 2
 [8,] 1 . . . . . . . 1 . . 1 . 1 1 . . . . .
 [9,] . . 2 2 1 . 1 1 . . 1 . 1 . . . . 1 . 1
[10,] . . . 1 1 . 1 2 1 . . . . 1 2 1 1 . . 1
[11,] . . 2 . 3 2 . 2 . . . 1 4 1 . . . . . 1

with top words and ordinary documents

X[20:30, 1:20]
11 x 20 sparse Matrix of class "dgCMatrix"
                                             
 [1,] . . . . . . . . . . . . . . . . . . . .
 [2,] . . . . . . . . . . . . . . . . . . . .
 [3,] . . . . . . . . . . . . . . . . . . . .
 [4,] . . . . . . . . . . . . . . . . . . . .
 [5,] . . . . . . . . . . . . . . . . . . . .
 [6,] . . . . . . . . . . . . . . . . . . . .
 [7,] . 1 . . . . . . . . . . . . . . . . . .
 [8,] . . . . . . . . . . . . . . . . . . . .
 [9,] . . . . . . . . . . . 1 . . . . . . . .
[10,] . . . . . . . . . . . . . . . . . . . .
[11,] . . . . . . . . . . . . . . . . . . . .

with ordinary documents and ordinary words

X[50:60, 50:70]
11 x 21 sparse Matrix of class "dgCMatrix"
                                               
 [1,] . . . . . . . . . . . . . . . . . . . . .
 [2,] . . . . . . . . . . . . . . . . . . . . .
 [3,] . . . . . . . . . . . . . . . . . . . . .
 [4,] . . . . . . . . . . . . . . . . . . . . .
 [5,] . . . . . . . . . . . . . . . . . . . . .
 [6,] . . . . . . . . . . . . . . . . . . . . .
 [7,] . . . . . . . . . . . . . . . . . . . . .
 [8,] . . . . . . . . . . . . . . . . . . . . .
 [9,] . . . . . . . . . . . . . . . . . . . . .
[10,] . . . . . . . . . . . . . . . . . . . . .
[11,] . . . . . . . . . . . . . . . . . . . . .
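
The printed blocks above can also be summarised numerically by the fraction of nonzero entries in each block. The sketch below does this on a toy matrix, not on the data used in this analysis: `row_scale`, `col_scale`, `block_density` and the Poisson rates are hypothetical names and values chosen only so that the last rows and columns are "frequent".

```r
library(Matrix)

set.seed(1)
n <- 200
p <- 100

# Hypothetical toy data: the last 11 rows and last 21 columns are "frequent".
row_scale <- c(rep(1, n - 11), rep(20, 11))
col_scale <- c(rep(1, p - 21), rep(20, 21))
rate <- 0.05 * outer(row_scale, col_scale)

# Poisson counts stored as a sparse matrix (rate is read column by column).
X_toy <- Matrix(rpois(n * p, lambda = as.vector(rate)),
                nrow = n, ncol = p, sparse = TRUE)

# Fraction of nonzero entries in a block (hypothetical helper).
block_density <- function(M) nnzero(M) / prod(dim(M))

dens_frequent <- block_density(X_toy[(n - 10):n, (p - 20):p])
dens_mixed    <- block_density(X_toy[(n - 10):n, 1:20])
dens_ordinary <- block_density(X_toy[50:60, 50:70])

round(c(frequent = dens_frequent, mixed = dens_mixed, ordinary = dens_ordinary), 2)
```

The same `block_density` call could be applied to the corresponding blocks of `X` to put a number on how dense each corner is.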

pmf_bg from truth

The results get messier as pmf_bg runs longer (10 vs. 1000 iterations). Why?

par(mfrow = c(2,2))
k = 13
plot(pmf_bg_from_truth_iter10$L[,k], main = sprintf("10th iter: %dth loading", k), ylab = "loading")
plot(pmf_bg_from_truth$L[,k], main = sprintf("1000th iter: %dth loading", k), ylab = "loading")

plot(pmf_bg_from_truth_iter10$F[,k], main = sprintf("10th iter: %dth factor", k), ylab = "factor")
plot(pmf_bg_from_truth$F[,k], main = sprintf("1000th iter: %dth factor", k), ylab = "factor")

ebpmf_wbg from truth

The posterior means of \(L, F\) are good.

par(mfrow = c(2,2))
k = 13
plot(wbg_from_truth$qg$qls_mean[,k], log = "y", main = sprintf(" %dth loading", k), ylab = "loading")
plot(wbg_from_truth$qg$qfs_mean[,k], log = "y", main = sprintf(" %dth factor", k), ylab = "factor")

k = 29
plot(wbg_from_truth$qg$qls_mean[,k], log = "y", main = sprintf(" %dth loading", k), ylab = "loading")
plot(wbg_from_truth$qg$qfs_mean[,k], log = "y", main = sprintf(" %dth factor", k), ylab = "factor")

The prior g makes sense:

  • g has weights on two components, one with small \(\phi\), the other with large \(\phi\)

  • the weights of big \(\phi\) almost equal the proportion of top words/documents for \(L, F\).

## pi = 0.01 for phi = 100, and 0.99 for phi = 0.001 (truth: around 0.01 are top doc)
g = wbg_from_truth$qg$gls
Pi_L = get_prior_summary(g, log10 = TRUE, return_matrix = TRUE)

## pi around 0.01 for phi = 100, and 0.99 for phi = 0.001 (truth: around 0.01 are top words)
g = wbg_from_truth$qg$gfs
Pi_F = get_prior_summary(g, log10 = TRUE, return_matrix = TRUE)

ebpmf_wbg from close to truth

Above we see that pmf_bg gets messy when initialized from the truth. I use the pmf_bg fit after 10 iterations as the initialization for ebpmf-wbg (I also tried the 1000-iteration pmf_bg fit as initialization, but the result was not very good).

The posterior means of \(L, F\) make a few mistakes, due to the initialization.

par(mfrow = c(2,2))
k = 19
plot(pmf_bg_from_truth_iter10$L[,k], main = sprintf("init: %dth loading", k), ylab = "loading")
plot(wbg_from_pmf_truth$qg$qls_mean[,k], main = sprintf("wbg: %dth loading", k), ylab = "loading")

plot(pmf_bg_from_truth_iter10$F[,k], main = sprintf("init iter: %dth factor", k), ylab = "factor")
plot(wbg_from_pmf_truth$qg$qfs_mean[,k], main = sprintf("wbg: %dth factor", k), ylab = "factor")

The prior g still mostly makes sense. The proportions are mostly good.
(Note that topics 15 and 29 are down-weighted; their g_F differ slightly from the others.)

g = wbg_from_pmf_truth$qg$gls
Pi_L = get_prior_summary(g, log10 = TRUE, return_matrix = TRUE)

g = wbg_from_pmf_truth$qg$gfs
Pi_F = get_prior_summary(g, log10 = TRUE, return_matrix = TRUE)

Where ebpmf-wbg can go wrong

  • When the initialization is not good enough, ebpmf-wbg can go wrong.

  • When initialized with \(l_0\), \(f_0\) from a rank-1 model (scaled properly), and \(L, F\) uniform with mean 1, the model ignores the signal from \(L, F\) and g puts all its mass on very small \(\phi\). As a result, the E-loglik is somewhat worse, but the KL divergence is much smaller. This gives the model a higher ELBO than starting from the truth. It converges after a couple of iterations.

  • When initialized from a pmf fit that completely misses the structure, the final model also does not make sense and has a low ELBO (not shown).
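
For illustration, the rank-1 plus uniform initialization described in the second bullet might look like the sketch below. It is not the code used for this analysis: the toy matrix, the names `l0`, `f0`, `L_init`, `F_init`, and the convention that `f0` sums to 1 are all assumptions, and the source does not say what "scaled properly" means exactly. The only fact relied on is that the rank-1 Poisson maximum-likelihood fit has rates `rowSums(X)[i] * colSums(X)[j] / sum(X)`.

```r
set.seed(1)
n <- 50
p <- 40
K <- 5

# Hypothetical toy count matrix standing in for the real data.
X_toy <- matrix(rpois(n * p, lambda = 2), nrow = n, ncol = p)

# Rank-1 Poisson fit: rate[i, j] = rowSums(X)[i] * colSums(X)[j] / sum(X).
# Putting the whole scale into l0 (so that f0 sums to 1) is an assumed convention.
f0 <- colSums(X_toy) / sum(X_toy)
l0 <- rowSums(X_toy)

# L and F drawn from Uniform(0, 2), which has mean 1.
L_init <- matrix(runif(n * K, min = 0, max = 2), nrow = n, ncol = K)
F_init <- matrix(runif(p * K, min = 0, max = 2), nrow = p, ncol = K)

# The rank-1 part alone reproduces the row and column totals of the data.
rank1_rate <- outer(l0, f0)
all.equal(rowSums(rank1_rate), rowSums(X_toy))
all.equal(colSums(rank1_rate), colSums(X_toy))
```

Because `L_init` and `F_init` carry no structure, any signal has to be learned during fitting; the bullet above reports that in this case g instead shrinks \(L, F\) towards a constant.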

compare_df = data.frame(cbind(as.numeric(wbg_from_random$summary), 
                              as.numeric(wbg_from_truth$summary),
                              as.numeric(wbg_from_pmf_truth$summary)), 
                        row.names = names(wbg_from_random$summary))
colnames(compare_df) <- c("from_random", "from_truth", "from_pmf")
round(compare_df)
             from_random from_truth from_pmf
ELBO             -364607    -370581  -371185
KL                     7      16476    17404
E_loglik         -364600    -354105  -353781
runtime_iter           9          7        8
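
The rounded values in the table above are consistent with the usual decomposition ELBO = E_loglik - KL, which makes the trade-off explicit: the random start loses on E_loglik but gains more from its near-zero KL. The sketch below re-enters the printed numbers by hand (the name `summary_tab` is hypothetical; the original objects are not needed) and checks the decomposition.

```r
# Values copied from the rounded table above.
summary_tab <- data.frame(
  from_random = c(-364607, 7, -364600),
  from_truth  = c(-370581, 16476, -354105),
  from_pmf    = c(-371185, 17404, -353781),
  row.names   = c("ELBO", "KL", "E_loglik")
)

# Recompute the ELBO from its two terms and compare with the reported row.
elbo_recomputed <- unlist(summary_tab["E_loglik", ]) - unlist(summary_tab["KL", ])
elbo_gap <- elbo_recomputed - unlist(summary_tab["ELBO", ])
elbo_gap

# Differences between the random start and the start from the truth.
diff_random_vs_truth <- summary_tab$from_random - summary_tab$from_truth
names(diff_random_vs_truth) <- rownames(summary_tab)
diff_random_vs_truth
```

Since the table is rounded, a gap of at most 1 would still be consistent with the decomposition.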

sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS  10.15.7

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] pheatmap_1.0.12 Matrix_1.2-17   gridExtra_2.3   ggplot2_3.3.0  
[5] workflowr_1.6.2

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5         RColorBrewer_1.1-2 compiler_3.5.1     pillar_1.4.4      
 [5] later_1.1.0.1      git2r_0.26.1       tools_3.5.1        digest_0.6.25     
 [9] lattice_0.20-38    evaluate_0.14      lifecycle_0.2.0    tibble_3.0.1      
[13] gtable_0.3.0       pkgconfig_2.0.3    rlang_0.4.6        yaml_2.2.0        
[17] xfun_0.8           withr_2.2.0        stringr_1.4.0      dplyr_0.8.1       
[21] knitr_1.28         fs_1.3.1           vctrs_0.3.0        rprojroot_1.3-2   
[25] grid_3.5.1         tidyselect_0.2.5   glue_1.4.1         R6_2.4.1          
[29] rmarkdown_2.1      purrr_0.3.4        magrittr_1.5       whisker_0.3-2     
[33] backports_1.1.7    scales_1.1.1       promises_1.1.1     htmltools_0.5.0   
[37] ellipsis_0.3.1     assertthat_0.2.1   colorspace_1.4-1   httpuv_1.5.4      
[41] stringi_1.4.3      munsell_0.5.0      crayon_1.3.4