This function takes HTML input, given either as the path to a file or as an HTML string, and converts it to CSV format. If the input is an HTML string, the output is printed to the console. If the input is a file path, a CSV file is generated with the same name as the input file but with a .csv extension.

Usage

html_to_csv(file, n_tokens_limit = 2000, ...)

Arguments

file

A character string giving either the path to a file with HTML input or the HTML itself (both forms are shown in the Examples).

n_tokens_limit

An integer representing the maximum number of tokens allowed in the input text. Defaults to 2000.

...

Additional arguments passed down to lower-level functions.

Value

If the input is an HTML string, the function returns the output as a character string and prints it to the console. If the input is a file path, the function returns the output as a data frame and generates a CSV file with the same name as the input file but with a .csv extension.
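
The naming rule for the generated file can be illustrated with base R. This is only a sketch of the rule as described above, not the package's actual code; `csv_path_for` is a hypothetical helper introduced here for illustration.

```r
# Hypothetical helper (not part of the package): derive the CSV path
# that corresponds to an input file by swapping its extension for .csv.
csv_path_for <- function(file) {
  # tools::file_path_sans_ext() drops the final extension, keeping any directory part
  paste0(tools::file_path_sans_ext(file), ".csv")
}

csv_path_for("example.html")
# [1] "example.csv"
```

Under this assumption, calling html_to_csv("example.html") would be expected to write example.csv alongside the input file.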

Examples

if (FALSE) {
# Example 1: HTML string
html_string <- '<table>
 <tr>
   <th>firstName</th>
   <th>lastName</th>
 </tr>
 <tr>
   <td>John</td>
   <td>Doe</td>
 </tr>
 <tr>
   <td>Anna</td>
   <td>Smith</td>
 </tr>
 <tr>
   <td>Peter</td>
   <td>Jones</td>
 </tr>
</table>'
html_to_csv(html_string)

# Example 2: HTML file
html_file <- "example.html"
html_to_csv(html_file)
}