{"id":471,"date":"2020-05-01T09:30:00","date_gmt":"2020-05-01T00:30:00","guid":{"rendered":"https:\/\/gri.jp\/media\/entry\/471"},"modified":"2023-06-13T14:49:21","modified_gmt":"2023-06-13T05:49:21","slug":"r-tabulizer-purrr","status":"publish","type":"post","link":"https:\/\/gri.jp\/media\/entry\/471","title":{"rendered":"Extracting tables embedded in a PDF as data frames, with R"},"content":{"rendered":"<p>I suspect many of us often find ourselves wanting to pull data out of a table embedded in a PDF. And even if you have never been in that situation, it turns out to be easy, so I thought I would give it a try. I'll use R.<\/p>\n<p>This was prompted by the tweet below from Kan Nishida of Exploratory. The approach uses a library called tabulizer.<\/p>\n<blockquote class=\"twitter-tweet\" data-width=\"500\" data-dnt=\"true\">\n<p lang=\"en\" dir=\"ltr\">In R, these 5 lines are all you need to extract table data from 85 pages long PDF. 
<a href=\"https:\/\/twitter.com\/hashtag\/rstats?src=hash&amp;ref_src=twsrc%5Etfw\">#rstats<\/a><\/p>\n<p>library(tabulizer)<br \/>library(purer)<br \/>library(data.table)<\/p>\n<p>pdf &lt;- extract_tables(&quot;file.pdf&quot;)<br \/>df &lt;- map(pdf, function(x){<a href=\"https:\/\/t.co\/Bz4BWtBUrJ\">https:\/\/t.co\/Bz4BWtBUrJ<\/a>.frame(x)})<br \/>rbindlist(df, fill=TRUE)<\/p>\n<p>&mdash; Kan Nishida \ud83c\uddfa\ud83c\uddf8\u2764\ufe0f\ud83c\uddef\ud83c\uddf5 (@KanAugust) <a href=\"https:\/\/twitter.com\/KanAugust\/status\/1252380559449309186?ref_src=twsrc%5Etfw\">April 20, 2020<\/a><\/p><\/blockquote>\n<p><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/p>\n<p><!--more--><\/p>\n<p>This article is also a useful reference.<\/p>\n<div class=\"linkcard\"><table border=\"1\" cellspacing=\"0\" cellpadding=\"4\"><tbody><tr><td>Something has been stirring at Cookpad since around March this year. If you work in the IT industry, you have probably heard rumors like this more than once or twice. Some reports even say that running the service has become difficult because a large number of senior staff have left...<br><a class=\"lkc-link no_icon\" href=\"https:\/\/qiita.com\/21-Hidetaka-Ko\/items\/4e8977797cbfaab081e3\" target=\"_blank\" rel=\"external noopenner\">\"This R Package Is Amazing, 2016 Edition\", provisional No. 1: using the tabulizer package, ...<\/a> - 
Qiita<\/td><\/tr><\/tbody><\/table><\/div>\n<p>Let's try extracting data from a WHO situation report on the novel coronavirus (COVID-19).<\/p>\n<div class=\"linkcard\"><table border=\"1\" cellspacing=\"0\" cellpadding=\"4\"><tbody><tr><td><br><a class=\"lkc-link no_icon\" href=\"https:\/\/www.who.int\/docs\/default-source\/coronaviruse\/situation-reports\/20200424-sitrep-95-covid-19.pdf\" target=\"_blank\" rel=\"external noopenner\">https:\/\/www.who.int\/docs\/default-source\/coronaviruse\/situation-reports\/202004...<\/a> - www.who.int<\/td><\/tr><\/tbody><\/table><\/div>\n<img decoding=\"async\" class=\"hatena-fotolife aligncenter\" title=\"f:id:gri-blog:20200428214542p:plain\" src=\"\/media\/wp\/wp-content\/uploads\/2021\/08\/20200428214542.png\" alt=\"f:id:gri-blog:20200428214542p:plain\" \/>\n<p>Extracting the tables from the PDF and storing them in a data frame takes only this much code:<\/p>\n<pre class=\"code R\" data-lang=\"R\" data-unlink=\"\">library(tabulizer)\r\nlibrary(purrr)\r\n# extract_tables() returns a list of character matrices, one per detected table.\r\n# as.data.frame() names the columns V1, V2, ...; map_dfr() row-binds the results (missing columns are filled with NA).\r\ndf_combined &lt;- tabulizer::extract_tables(\"20200424-sitrep-95-covid-19.pdf\") %&gt;%\r\npurrr::map_dfr(as.data.frame)<\/pre>\n<p>The resulting data frame looks like this.<br \/>\n<img decoding=\"async\" class=\"hatena-fotolife aligncenter\" title=\"f:id:gri-blog:20200428215504p:plain\" src=\"\/media\/wp\/wp-content\/uploads\/2021\/08\/20200428215504.png\" alt=\"f:id:gri-blog:20200428215504p:plain\" \/><\/p>\n<p>While we're at it, let's tidy it up.<br \/>\nFirst, replace the carriage-return line breaks and the accented letters (from French, Portuguese and similar place names) with plain characters.<\/p>\n<pre 
class=\"code R\" data-lang=\"R\" data-unlink=\"\">library(dplyr)\r\nlibrary(stringr)\r\ndf_combined %&gt;%\r\nmutate(V1 = str_replace(V1, \"\\r\", \" \")) %&gt;%\r\nmutate(V1 = str_replace(V1, \"\u00e3\", \"a\")) %&gt;%\r\nmutate(V1 = str_replace(V1, \"\u00ed\", \"i\")) %&gt;%\r\nmutate(V1 = str_replace(V1, \"\u00e9\", \"e\")) %&gt;%\r\nmutate(V1 = str_replace(V1, \"\u00f4\", \"o\")) %&gt;%\r\nmutate(V1 = str_replace(V1, \"\u00e7\", \"c\")) %&gt;%<\/pre>\n<p>Where a country name has been split across two rows, as with Laos, rebuild the full name. To be safe, each fix checks the previous row as well as the current one, so that a fragment such as \"Republic\" is only rewritten when it follows \"Central African\". The other split names besides Laos are fixed the same way.<\/p>\n<pre class=\"code R\" data-lang=\"R\" data-unlink=\"\">  mutate(V1_prev = lag(V1, n = 1)) %&gt;%  # the string in the previous row\r\nmutate(V1 = ifelse(V1_prev == \"Lao People's\" &amp; V1 == \"Democratic Republic\", \"Lao People's Democratic Republic\", V1)) %&gt;%\r\nmutate(V1 = ifelse(V1_prev == \"(Commonwealth of\" &amp; V1 == \"the)\", \"Northern Mariana Islands (Commonwealth of the)\", V1)) %&gt;%\r\nmutate(V1 = ifelse(V1_prev == \"United Republic of\" &amp; V1 == \"Tanzania\", \"United Republic of Tanzania\", V1)) %&gt;%\r\nmutate(V1 = ifelse(V1_prev == \"Central African\" &amp; V1 == \"Republic\", \"Central African Republic\", V1)) %&gt;%\r\nmutate(V1 = ifelse(V1_prev == \"Sao Tome and\" &amp; V1 == \"Principe\", \"Sao Tome and Principe\", V1)) %&gt;%<\/pre>\n<p>Simpler cases are handled with regular-expression replacements keyed on part of the country name.<\/p>\n<pre class=\"code R\" data-lang=\"R\" data-unlink=\"\">  mutate(V1 = gsub(\"Kosovo.*\", \"Kosovo\", V1)) %&gt;%\r\nmutate(V1 = gsub(\".*d\u2019Ivoire\", \"Cote d'Ivoire\", V1)) %&gt;%\r\nmutate(V1 = gsub(\".*conveyance 
\\\\(Diamond.*\", \"International conveyance (Diamond Princess)\", V1)) %&gt;%<\/pre>\n<p>Besides the country-name column, also fix the Transmission classification column, where some cells came out as just \"transmission\" instead of \"Community transmission\".<\/p>\n<pre class=\"code R\" data-lang=\"R\" data-unlink=\"\">  mutate(V6 = str_replace(V6, \"\\r\", \" \")) %&gt;%\r\nmutate(V6 = ifelse(V6 == \"transmission\", \"Community transmission\", V6)) %&gt;%<\/pre>\n<p>Remove the unneeded rows, such as headings and the grand total.<\/p>\n<pre class=\"code R\" data-lang=\"R\" data-unlink=\"\">  filter(V1 != \"\") %&gt;%\r\nfilter(V1 != \"Grand total\") %&gt;%\r\nfilter(V1 != \"Territory\/Area  \u2020\") %&gt;%\r\nfilter(V2 != \"\") %&gt;%<\/pre>\n<p>The working column created along the way (V1_prev) is no longer needed, so keep only the original columns.<\/p>\n<pre class=\"code R\" data-lang=\"R\" data-unlink=\"\">  select(c(\"V1\", \"V2\", \"V3\", \"V4\", \"V5\", \"V6\", \"V7\")) %&gt;%<\/pre>\n<p>Finally, give the columns their proper names.<\/p>\n<pre class=\"code R\" data-lang=\"R\" data-unlink=\"\">  rename(\"Reporting Country\/Territory\/Area\"=V1\r\n,\"Total confirmed cases\"=V2\r\n,\"Total confirmed new cases\"=V3\r\n,\"Total deaths\"=V4\r\n,\"Total new deaths\"=V5\r\n,\"Transmission classification\"=V6\r\n,\"Days since last reported case\"=V7)<\/pre>\n<p>With that, the data frame is now fairly clean.<br \/>\n<img decoding=\"async\" class=\"hatena-fotolife aligncenter\" title=\"f:id:gri-blog:20200428214548p:plain\" src=\"\/media\/wp\/wp-content\/uploads\/2021\/08\/20200428214548.png\" alt=\"f:id:gri-blog:20200428214548p:plain\" \/><\/p>\n<p>The final code looks like this.<\/p>\n<pre class=\"code R\" data-lang=\"R\" 
data-unlink=\"\">library(tabulizer)\r\nlibrary(purrr)\r\nlibrary(dplyr)\r\nlibrary(stringr)\r\ndf_combined &lt;- tabulizer::extract_tables(\"1_downloadFiles\/20200424-sitrep-95-covid-19.pdf\") %&gt;%\r\npurrr::map_dfr(as.data.frame)\r\ndf_combined &lt;- df_combined %&gt;%\r\nmutate(V1 = str_replace(V1, \"\\r\", \" \")) %&gt;%\r\nmutate(V1 = str_replace(V1, \"\u00e3\", \"a\")) %&gt;%\r\nmutate(V1 = str_replace(V1, \"\u00ed\", \"i\")) %&gt;%\r\nmutate(V1 = str_replace(V1, \"\u00e9\", \"e\")) %&gt;%\r\nmutate(V1 = str_replace(V1, \"\u00f4\", \"o\")) %&gt;%\r\nmutate(V1 = str_replace(V1, \"\u00e7\", \"c\")) %&gt;%\r\nmutate(V1_prev = lag(V1, n = 1)) %&gt;%\r\nmutate(V1 = ifelse(V1_prev == \"Lao People's\" &amp; V1 == \"Democratic Republic\", \"Lao People's Democratic Republic\", V1)) %&gt;%\r\nmutate(V1 = ifelse(V1_prev == \"(Commonwealth of\" &amp; V1 == \"the)\", \"Northern Mariana Islands (Commonwealth of the)\", V1)) %&gt;%\r\nmutate(V1 = ifelse(V1_prev == \"United Republic of\" &amp; V1 == \"Tanzania\", \"United Republic of Tanzania\", V1)) %&gt;%\r\nmutate(V1 = ifelse(V1_prev == \"Central African\" &amp; V1 == \"Republic\", \"Central African Republic\", V1)) %&gt;%\r\nmutate(V1 = ifelse(V1_prev == \"Sao Tome and\" &amp; V1 == \"Principe\", \"Sao Tome and Principe\", V1)) %&gt;%\r\nmutate(V1 = gsub(\"Kosovo.*\", \"Kosovo\", V1)) %&gt;%\r\nmutate(V1 = gsub(\".*d\u2019Ivoire\", \"Cote d'Ivoire\", V1)) %&gt;%\r\nmutate(V1 = gsub(\".*conveyance \\\\(Diamond.*\", \"International conveyance (Diamond Princess)\", V1)) %&gt;%\r\nmutate(V6 = str_replace(V6, \"\\r\", \" \")) %&gt;%\r\nmutate(V6 = ifelse(V6 == \"transmission\", \"Community transmission\", V6)) %&gt;%\r\nfilter(V1 != \"\") %&gt;%\r\nfilter(V1 != \"Grand total\") %&gt;%\r\nfilter(V1 != \"Territory\/Area  \u2020\") %&gt;%\r\nfilter(V2 != \"\") %&gt;%\r\nselect(c(\"V1\", \"V2\", \"V3\", \"V4\", \"V5\", \"V6\", \"V7\")) %&gt;%\r\nrename(\"Reporting Country\/Territory\/Area\"=V1\r\n,\"Total confirmed 
cases\"=V2\r\n,\"Total confirmed new cases\"=V3\r\n,\"Total deaths\"=V4\r\n,\"Total new deaths\"=V5\r\n,\"Transmission classification\"=V6\r\n,\"Days since last reported case\"=V7)\r\nwrite.csv(df_combined, \"20200424-sitrep-95-covid-19.csv\", row.names = FALSE)<\/pre>\n<p>There were two rows for Congo, and I wondered whether WHO had made a mistake, but it turns out there are two countries called Congo: the Republic of the Congo and the Democratic Republic of the Congo. I didn't know that.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I suspect many of us often find ourselves wanting to pull data out of a table embedded in a PDF. And even if you have never been in that situation, it turns out to be easy, so I thought I would give it a try. I'll use R<\/p>\n","protected":false},"author":22,"featured_media":9451,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4,127],"tags":[],"class_list":["post-471","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-datascience","category-sns"],"acf":[],"meta_field":{"_edit_lock":["1686635260:44"],"_edit_last":["44"],"_wp_page_template":["default"],"memo":["\u306a\u3057"],"_pv_count":["a:24:{i:7;i:33;i:10;i:44;i:4;i:97;i:2;i:133;i:1;i:50;i:3;i:95;i:23;i:24;i:17;i:59;i:18;i:58;i:8;i:35;i:12;i:50;i:15;i:63;i:0;i:29;i:19;i:32;i:22;
i:38;i:21;i:27;i:16;i:59;i:6;i:46;i:14;i:52;i:13;i:51;i:20;i:26;i:5;i:40;i:11;i:40;i:9;i:32;}"],"pv_count":["1213"],"hidden_toppage":["0"],"_hidden_toppage":["field_61933136630d2"],"note_url":[""],"_note_url":["field_61243c8278b90"],"_thumbnail_id":["9451"],"_oembed_7fd26ad101518165db7c9d631a614c80":["{{unknown}}"],"_oembed_73e374738cf6a361dcb414abe4739d39":["{{unknown}}"],"_oembed_73dbdd86e5b34f7ffc577848de0f75b2":["<blockquote class=\"twitter-tweet\" data-width=\"496\" data-dnt=\"true\"><p lang=\"en\" dir=\"ltr\">In R, these 5 lines are all you need to extract table data from 85 pages long PDF. <a href=\"https:\/\/twitter.com\/hashtag\/rstats?src=hash&amp;ref_src=twsrc%5Etfw\">#rstats<\/a><br><br>library(tabulizer)<br>library(purer)<br>library(data.table)<br><br>pdf &lt;- extract_tables(&quot;file.pdf&quot;)<br>df &lt;- map(pdf, function(x){<a href=\"https:\/\/t.co\/Bz4BWtBUrJ\">https:\/\/t.co\/Bz4BWtBUrJ<\/a>.frame(x)})<br>rbindlist(df, fill=TRUE)<\/p>&mdash; Kan Nishida \ud83c\uddfa\ud83c\uddf8\u2764\ufe0f\ud83c\uddef\ud83c\uddf5 (@KanAugust) <a href=\"https:\/\/twitter.com\/KanAugust\/status\/1252380559449309186?ref_src=twsrc%5Etfw\">April 20, 2020<\/a><\/blockquote><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script>"],"_oembed_time_73dbdd86e5b34f7ffc577848de0f75b2":["1686635329"],"_oembed_0a3bc13a07c660fe13152555638f7d09":["<blockquote class=\"twitter-tweet\" data-width=\"500\" data-dnt=\"true\"><p lang=\"en\" dir=\"ltr\">In R, these 5 lines are all you need to extract table data from 85 pages long PDF. 
<a href=\"https:\/\/twitter.com\/hashtag\/rstats?src=hash&amp;ref_src=twsrc%5Etfw\">#rstats<\/a><br><br>library(tabulizer)<br>library(purer)<br>library(data.table)<br><br>pdf &lt;- extract_tables(&quot;file.pdf&quot;)<br>df &lt;- map(pdf, function(x){<a href=\"https:\/\/t.co\/Bz4BWtBUrJ\">https:\/\/t.co\/Bz4BWtBUrJ<\/a>.frame(x)})<br>rbindlist(df, fill=TRUE)<\/p>&mdash; Kan Nishida \ud83c\uddfa\ud83c\uddf8\u2764\ufe0f\ud83c\uddef\ud83c\uddf5 (@KanAugust) <a href=\"https:\/\/twitter.com\/KanAugust\/status\/1252380559449309186?ref_src=twsrc%5Etfw\">April 20, 2020<\/a><\/blockquote><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script>"],"_oembed_time_0a3bc13a07c660fe13152555638f7d09":["1686635336"],"_oembed_ba1278c26b80a9f5cc360a9c00dfe1a2":["{{unknown}}"],"_oembed_fb2ba81c917b73a625ec14518d3f5cde":["{{unknown}}"],"_oembed_4e49dc94c554f503b95e1fbf04e63b25":["<blockquote class=\"twitter-tweet\" data-width=\"500\" data-dnt=\"true\"><p lang=\"en\" dir=\"ltr\">In R, these 5 lines are all you need to extract table data from 85 pages long PDF. 
<a href=\"https:\/\/twitter.com\/hashtag\/rstats?src=hash&amp;ref_src=twsrc%5Etfw\">#rstats<\/a><br><br>library(tabulizer)<br>library(purer)<br>library(data.table)<br><br>pdf &lt;- extract_tables(&quot;file.pdf&quot;)<br>df &lt;- map(pdf, function(x){<a href=\"https:\/\/t.co\/Bz4BWtBUrJ\">https:\/\/t.co\/Bz4BWtBUrJ<\/a>.frame(x)})<br>rbindlist(df, fill=TRUE)<\/p>&mdash; Kan Nishida \ud83c\uddfa\ud83c\uddf8\u2764\ufe0f\ud83c\uddef\ud83c\uddf5 (@KanAugust) <a href=\"https:\/\/twitter.com\/KanAugust\/status\/1252380559449309186?ref_src=twsrc%5Etfw\">April 20, 2020<\/a><\/blockquote><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script>"],"_oembed_time_4e49dc94c554f503b95e1fbf04e63b25":["1712348561"],"_oembed_b4327383e3ffa7709e30e45ab7f58242":["{{unknown}}"],"_oembed_fc8cfd84aa5e26070fc97535c12fd72a":["{{unknown}}"]},"_links":{"self":[{"href":"https:\/\/gri.jp\/media\/wp-json\/wp\/v2\/posts\/471","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/gri.jp\/media\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/gri.jp\/media\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/gri.jp\/media\/wp-json\/wp\/v2\/users\/22"}],"replies":[{"embeddable":true,"href":"https:\/\/gri.jp\/media\/wp-json\/wp\/v2\/comments?post=471"}],"version-history":[{"count":6,"href":"https:\/\/gri.jp\/media\/wp-json\/wp\/v2\/posts\/471\/revisions"}],"predecessor-version":[{"id":29238,"href":"https:\/\/gri.jp\/media\/wp-json\/wp\/v2\/posts\/471\/revisions\/29238"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/gri.jp\/media\/wp-json\/wp\/v2\/media\/9451"}],"wp:attachment":[{"href":"https:\/\/gri.jp\/media\/wp-json\/wp\/v2\/media?parent=471"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/gri.jp\/media\/wp-json\/wp\/v2\/categories?post=471"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/gri.jp\/media\/wp-json\/wp\/v2\/tags?post=471"}],"curies":[{"name":"wp","href":"ht
tps:\/\/api.w.org\/{rel}","templated":true}]}}