score:10
Accepted answer
The data is in a script tag. You can get the script tag using bs4 and a regex. You could also extract the data itself with a regex, but I like using js2xml to parse JS functions into an XML tree:
from bs4 import BeautifulSoup
import requests
import re
import js2xml
soup = BeautifulSoup(requests.get("http://www.worldweatheronline.com/brussels-weather-averages/be.aspx").content, "html.parser")
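# Find the inline script that builds the temperature chart.
# Note: newer Beautiful Soup releases prefer string= over text= for this argument.
script = soup.find("script", text=re.compile("Highcharts.Chart")).text
# For precipitation data, match the other chart's container instead:
# script = soup.find("script", text=re.compile("precipchartcontainer")).text
parsed = js2xml.parse(script)
# pretty_print returns a string; print() works on both Python 2 and 3.
print(js2xml.pretty_print(parsed))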
That gives you:
<program>
<functioncall>
<function>
<identifier name="$"/>
</function>
<arguments>
<funcexpr>
<identifier/>
<parameters/>
<body>
<var name="chart"/>
<functioncall>
<function>
<dotaccessor>
<object>
<functioncall>
<function>
<identifier name="$"/>
</function>
<arguments>
<identifier name="document"/>
</arguments>
</functioncall>
</object>
<property>
<identifier name="ready"/>
</property>
</dotaccessor>
</function>
<arguments>
<funcexpr>
<identifier/>
<parameters/>
<body>
<assign operator="=">
<left>
<identifier name="chart"/>
</left>
<right>
<new>
<dotaccessor>
<object>
<identifier name="Highcharts"/>
</object>
<property>
<identifier name="Chart"/>
</property>
</dotaccessor>
<arguments>
<object>
<property name="chart">
<object>
<property name="renderTo">
<string>tempchartcontainer</string>
</property>
<property name="type">
<string>spline</string>
</property>
</object>
</property>
<property name="credits">
<object>
<property name="enabled">
<boolean>false</boolean>
</property>
</object>
</property>
<property name="colors">
<array>
<string>#FF8533</string>
<string>#4572A7</string>
</array>
</property>
<property name="title">
<object>
<property name="text">
<string>Average Temperature (°c) Graph for Brussels</string>
</property>
</object>
</property>
<property name="xAxis">
<object>
<property name="categories">
<array>
<string>January</string>
<string>February</string>
<string>March</string>
<string>April</string>
<string>May</string>
<string>June</string>
<string>July</string>
<string>August</string>
<string>September</string>
<string>October</string>
<string>November</string>
<string>December</string>
</array>
</property>
<property name="labels">
<object>
<property name="rotation">
<number value="270"/>
</property>
<property name="y">
<number value="40"/>
</property>
</object>
</property>
</object>
</property>
<property name="yAxis">
<object>
<property name="title">
<object>
<property name="text">
<string>Temperature (°c)</string>
</property>
</object>
</property>
</object>
</property>
<property name="tooltip">
<object>
<property name="enabled">
<boolean>true</boolean>
</property>
</object>
</property>
<property name="plotOptions">
<object>
<property name="spline">
<object>
<property name="dataLabels">
<object>
<property name="enabled">
<boolean>true</boolean>
</property>
</object>
</property>
<property name="enableMouseTracking">
<boolean>false</boolean>
</property>
</object>
</property>
</object>
</property>
<property name="series">
<array>
<object>
<property name="name">
<string>Average High Temp (°c)</string>
</property>
<property name="color">
<string>#FF8533</string>
</property>
<property name="data">
<array>
<number value="6"/>
<number value="8"/>
<number value="11"/>
<number value="14"/>
<number value="19"/>
<number value="21"/>
<number value="23"/>
<number value="23"/>
<number value="19"/>
<number value="15"/>
<number value="9"/>
<number value="6"/>
</array>
</property>
</object>
<object>
<property name="name">
<string>Average Low Temp (°c)</string>
</property>
<property name="color">
<string>#4572A7</string>
</property>
<property name="data">
<array>
<number value="2"/>
<number value="2"/>
<number value="4"/>
<number value="6"/>
<number value="10"/>
<number value="12"/>
<number value="14"/>
<number value="14"/>
<number value="11"/>
<number value="8"/>
<number value="5"/>
<number value="2"/>
</array>
</property>
</object>
</array>
</property>
</object>
</arguments>
</new>
</right>
</assign>
</body>
</funcexpr>
</arguments>
</functioncall>
</body>
</funcexpr>
</arguments>
</functioncall>
</program>
So to get all the data:
In [28]: from bs4 import BeautifulSoup
In [29]: import requests
In [30]: import re
In [31]: import js2xml
In [32]: from itertools import repeat
In [33]: from pprint import pprint as pp
In [34]: soup = BeautifulSoup(requests.get("http://www.worldweatheronline.com/brussels-weather-averages/be.aspx").content, "html.parser")
In [35]: script = soup.find("script", text=re.compile("Highcharts.Chart")).text
In [36]: parsed = js2xml.parse(script)
In [37]: data = [d.xpath(".//array/number/@value") for d in parsed.xpath("//property[@name='data']")]
In [38]: categories = parsed.xpath("//property[@name='categories']//string/text()")
In [39]: output = list(zip(repeat(categories), data))
In [40]: pp(output)
[(['January',
'February',
'March',
'April',
'May',
'June',
'July',
'August',
'September',
'October',
'November',
'December'],
['6', '8', '11', '14', '19', '21', '23', '23', '19', '15', '9', '6']),
(['January',
'February',
'March',
'April',
'May',
'June',
'July',
'August',
'September',
'October',
'November',
'December'],
['2', '2', '4', '6', '10', '12', '14', '14', '11', '8', '5', '2'])]
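The XPath results come back as strings, and each series is paired with the full month list. As a minimal sketch of one way to reshape that, the snippet below builds a month-to-value mapping per series. The lists are copied from the session output above, the series labels are taken from the XML dump and assumed to be in the same order as the data arrays, and `to_table` is a hypothetical helper name, not part of js2xml or bs4.

```python
# Values copied from the session output above (XPath returns strings).
categories = ['January', 'February', 'March', 'April', 'May', 'June',
              'July', 'August', 'September', 'October', 'November', 'December']
data = [
    ['6', '8', '11', '14', '19', '21', '23', '23', '19', '15', '9', '6'],
    ['2', '2', '4', '6', '10', '12', '14', '14', '11', '8', '5', '2'],
]
# Series labels as they appear in the XML dump; assumed to match the
# order of the data arrays.
names = ["Average High Temp (°c)", "Average Low Temp (°c)"]


def to_table(names, categories, data):
    """Hypothetical helper: map each series name to {month: int value}."""
    # int() assumes whole-number values, as in the output above;
    # use float() instead if the page ever serves decimals.
    return {
        name: dict(zip(categories, (int(v) for v in values)))
        for name, values in zip(names, data)
    }


table = to_table(names, categories, data)
print(table["Average High Temp (°c)"]["July"])
```

With the table in this shape, a lookup such as `table["Average Low Temp (°c)"]["January"]` reads directly as "average low in January".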
As I said, you could just use a regex, but I find js2xml more reliable because stray whitespace and similar formatting differences won't break it.
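For comparison, here is a rough sketch of the regex-only route. `SAMPLE_SCRIPT` is a hypothetical, trimmed-down stand-in for the page's inline script (the real one is longer and its exact formatting is an assumption), and `DATA_RE` and `extract_series` are illustrative names. It only handles flat numeric arrays, which is part of why a real parser is the safer choice.

```python
import json
import re

# Hypothetical, trimmed-down stand-in for the page's inline script.
SAMPLE_SCRIPT = """
chart = new Highcharts.Chart({
    xAxis: { categories: ['January', 'February', 'March'] },
    series: [{ name: 'Average High Temp', data: [6, 8, 11] },
             { name: 'Average Low Temp', data: [2, 2, 4] }]
});
"""

# Match "data: [ ... ]" and capture what sits between the brackets.
# \s* tolerates some whitespace variation, but nested arrays or
# comments inside the array would defeat this pattern.
DATA_RE = re.compile(r"data\s*:\s*\[([^\]]*)\]")


def extract_series(script):
    """Return one list of numbers per flat `data: [...]` array in script."""
    # A flat array of numbers is also valid JSON, so json.loads can parse it.
    return [json.loads("[" + body + "]") for body in DATA_RE.findall(script)]


print(extract_series(SAMPLE_SCRIPT))
```

The pattern is tied to the literal key `data` and to bracket placement, so a change in how the page writes its chart options could silently return nothing.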
Source: stackoverflow.com