Hi Hugging Face community!
I am trying to parse an HTML file and extract data from it as JSON.
For example, I want to crawl a web page listing events, such as this one: “La programmation | Bataclan - Bataclan”,
then search the HTML to find all gigs and generate JSON structured like this:
{
  "name": "name of the event",
  "date": "a timestamp",
  "url": "url of event",
  "artists": [
    {
      "name": "name of artist",
      "style": "style of artist",
      "url": "url of artist"
    }
  ]
}
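To make the target shape concrete, here is a minimal Python sketch of the schema I have in mind, whatever ends up doing the extraction. The class names `Event` and `Artist`, the helper `events_to_json`, the ISO 8601 date string, and the `example.com` URLs are all placeholders I made up for illustration; none of them come from the Bataclan page.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Artist:
    name: str
    style: str
    url: str


@dataclass
class Event:
    name: str
    date: str  # assumption: the timestamp is stored as an ISO 8601 string
    url: str
    artists: List[Artist] = field(default_factory=list)


def events_to_json(events: List[Event]) -> str:
    """Serialize a list of events to a JSON array (hypothetical helper)."""
    # asdict() recurses into nested dataclasses, so artists become plain dicts.
    return json.dumps([asdict(e) for e in events], ensure_ascii=False, indent=2)


# Placeholder values only; a real run would fill these from the parsed HTML.
example = Event(
    name="name of the event",
    date="2024-01-01T20:00:00",
    url="https://example.com/event",
    artists=[
        Artist(
            name="name of artist",
            style="style of artist",
            url="https://example.com/artist",
        )
    ],
)

print(events_to_json([example]))
```

Since a page lists several gigs, the sketch serializes a list of events rather than a single object.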
I’d appreciate some expert advice. Is the Text2TextGeneration pipeline the best choice for this type of task?