Australian Digital Observatory Application Programming Interface (ADO-API)
Version 1.0.0
Introduction
The Australian Digital Observatory Application Programming Interface (API) provides programmatic access to social media data and collections held in the Melbourne eResearch Group node of the project. The API is RESTful, encrypted, and requires authorization to access its services. It is also synchronous, hence responses to queries reflect the latest state of the data.
All in all, the functionality of the API can be broken down into five areas:
1. Aggregation
Aggregation over the number of documents in the collection can be performed at various aggregation levels. This includes by time period (day/month/year), seasonality, language, and by place (where the information is available).
2. Term frequency
Aggregation can be performed over a number of days to attain term frequency for specific terms by day.
3. Term Similarity
Term similarity can also be queried, for a given term on a certain day, based on word2vec models built on a daily basis.
4. Topic Modelling
Topic modelling is performed daily on the social media collections using BERTopic. The API can be queried to interact with these results, and to build and retrieve network graphs from consecutive days of clustering.
5. Text Search
A full-text search query that returns an array of social media post IDs. This feature allows selecting posts based on author ID, text, hashtags, date, and language (or a combination of these fields).
NOTE: All times are in UTC, hence every aggregation or selection by date and time may be off by about 10 hours for posters based on the eastern seaboard of Australia. Since there is no reliable way of knowing the location of the poster at the time of posting, UTC time is used in the API to minimize bias.
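As an illustration of the offset, the sketch below converts a UTC timestamp to Australian Eastern time using Python's standard `zoneinfo` module (Python 3.9+); the example date is arbitrary.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9

# A post timestamped late in the UTC day...
utc_time = datetime(2021, 7, 10, 22, 30, tzinfo=timezone.utc)

# ...falls on the next calendar day in Australian Eastern Standard Time (UTC+10).
local_time = utc_time.astimezone(ZoneInfo("Australia/Sydney"))

print(utc_time.date())    # 2021-07-10
print(local_time.date())  # 2021-07-11
```

Selecting posts by local date therefore requires shifting the requested UTC date range accordingly.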
Authorization and Authentication (A&A)
Each resource in the API requires authorization by supplying a JSON web token (JWT), and a user has to go through an authentication process to acquire the JWT.
The procedure to authenticate is based on JWT and requires the following steps to be executed:
- A user has to send a POST request to the `/login` endpoint, using the basic authentication scheme, with the API key that has been requested from the ADO Project. This request returns either an error or a JWT that contains the roles the API key holder is entitled to. Below is an example of how authentication is performed on the command line, where `<key string>` is the API key granted to the user:
export API_DEVELOPERUSER_KEY=<key string>
JWT=$(curl -XPOST -u "apikey:${API_DEVELOPERUSER_KEY}" https://api.ado.eresearch.unimelb.edu.au/login)
- The JWT is then passed in the `Authorization` header (as in `Bearer <JWT token>`) to all subsequent requests.
curl -XGET 'https://api.ado.eresearch.unimelb.edu.au/version'\
--header "Authorization: Bearer ${JWT}"; echo
The JWT is only valid for 24 hours, after which another JWT has to be requested to continue accessing the API.
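Because of this expiry, long-running clients usually re-authenticate automatically. Below is a minimal sketch of such a wrapper (the `AdoClient` class and its refresh margin are illustrative, not part of the API):

```python
import time

import requests
from requests.auth import HTTPBasicAuth

LOGIN_URL = "https://api.ado.eresearch.unimelb.edu.au/login"

class AdoClient:
    """Minimal client that re-requests a JWT when the old one nears expiry."""

    def __init__(self, api_key, ttl_seconds=24 * 3600):
        self.api_key = api_key
        self.ttl = ttl_seconds
        self.jwt = None
        self.issued_at = 0.0

    def _login(self):
        # Basic-auth login, as described above.
        res = requests.post(LOGIN_URL, auth=HTTPBasicAuth("apikey", self.api_key))
        res.raise_for_status()
        self.jwt = res.text
        self.issued_at = time.time()

    def get(self, url, **kwargs):
        # Refresh the token shortly before the 24-hour window closes.
        if self.jwt is None or time.time() - self.issued_at > self.ttl - 60:
            self._login()
        headers = kwargs.pop("headers", {})
        headers["Authorization"] = f"Bearer {self.jwt}"
        return requests.get(url, headers=headers, **kwargs)
```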
A&A Example in Python
Below is an example of how to attain a JWT in Python (version 3.6 and over, as f-strings are used), using the `requests` library for HTTP calls, where `<key string>` is to be replaced with the API key.
import requests
from requests.auth import HTTPBasicAuth
# the user passes their API key as a string
API_KEY = '<key string>'
url = "https://api.ado.eresearch.unimelb.edu.au/login"
res = requests.post(url, auth=HTTPBasicAuth('apikey', API_KEY))
if res.ok:
jwt = res.text
Below is an example of how to use the JWT with the `requests.get()` function, to attain the API version.
url = 'https://api.ado.eresearch.unimelb.edu.au/version'
headers = {'Authorization': f"Bearer {jwt}"}
res = requests.get(url, headers = headers)
version = res.text
Aggregation
The aggregation endpoint is designed to perform aggregations over the collections to retrieve count data or sentiment data. Aggregation can be performed by a function of time, or by a number of descriptive properties inherent to the social media document data. The requests are synchronous and reflect the state of the data in the database, in near real-time.
Summary of a collection
The number of posts in a collection and the start and end dates of harvesting can be retrieved using:
GET /analysis/aggregate/collections/{collection}/summary
Examples:
cURL (command line)
curl -XGET 'https://api.ado.eresearch.unimelb.edu.au/analysis/aggregate/collections/twitter/summary'\
--header "Authorization: Bearer ${JWT}"
Python
import requests
url = 'https://api.ado.eresearch.unimelb.edu.au/analysis/aggregate/collections/twitter/summary'
headers = {'Authorization': f"Bearer {jwt}"}
res = requests.get(url, headers = headers)
result = res.json()
Sample Response
{"startDate":"2021-6-2", "endDate":"2023-3-22","count":2845093}
Aggregation by time
Day, month and year
Aggregation of count or sentiment can be performed to obtain the number of documents in a social media database by day, month or year as the aggregation level. The request type and endpoint for this function are:
GET /analysis/aggregate/collections/{collection}/aggregation
where ‘collection’ is a path parameter naming the target social media collection. The query parameters for the function are:
Parameter | type | Description | Required | Example |
---|---|---|---|---|
startDate | string (date-time) | Start date for data requested | Yes | 2021-09-16 |
endDate | string (date-time) | End date for data requested | Yes | 2021-09-21 |
aggregationLevel | string | One of day , month or year | Yes | day |
sentiment | boolean | Request sentiment statistics rather than count statistics | No | false |
The function defaults to a count request if `sentiment` is absent or `false`; otherwise it returns a `sentiment` property (the sum of the sentiments of individual posts) and `sentimentcount` instead of `count`, to avoid confusion.
Examples
Below are examples of how to aggregate twitter documents by day that have been published between `2021-07-01` and `2021-07-11`, with the output, in various programming languages.
cURL (command line)
curl -XGET 'https://api.ado.eresearch.unimelb.edu.au/analysis/aggregate/collections/twitter/aggregation?aggregationLevel=day&startDate=2021-07-01&endDate=2021-07-11'\
--header "Authorization: Bearer ${JWT}"
Python
import requests
url = 'https://api.ado.eresearch.unimelb.edu.au/analysis/aggregate/collections/twitter/aggregation'
# query parameters set in the dict below
qs_params = { 'startDate' : '2021-07-01' , 'endDate':'2021-07-11', 'aggregationLevel': 'day' }
headers = {'Authorization': f"Bearer {jwt}"}
res = requests.get(url, headers = headers, params=qs_params)
result = res.json()
Sample Response
Below is a sample response for an aggregation request with `startDate` as `2021-07-10` and `endDate` as `2021-07-15`, with `aggregationLevel` set to `day`.
[
{
"time": "2021-7-10",
"count": 285671
},
{
"time": "2021-7-11",
"count": 312456
},
{
"time": "2021-7-12",
"count": 294918
},
{
"time": "2021-7-13",
"count": 289360
},
{
"time": "2021-7-14",
"count": 305276
}
]
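Once parsed, the response is a plain list of objects, so summary statistics are straightforward to derive. The sketch below (hard-coding the sample response above as `result`) totals the counts and finds the busiest day:

```python
# `result` is the parsed JSON list returned by the aggregation endpoint above.
result = [
    {"time": "2021-7-10", "count": 285671},
    {"time": "2021-7-11", "count": 312456},
    {"time": "2021-7-12", "count": 294918},
    {"time": "2021-7-13", "count": 289360},
    {"time": "2021-7-14", "count": 305276},
]

total = sum(entry["count"] for entry in result)
peak = max(result, key=lambda entry: entry["count"])

print(total)         # 1487681
print(peak["time"])  # 2021-7-11
```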
Seasonality
Aggregation of count or sentiment can be performed to obtain a number of documents in a social media database by day of the week, as well as hour of the day within the day of the week, as the aggregation level. The request type and endpoint for this function are:
GET /analysis/aggregate/collections/{collection}/seasonality
where ‘collection’ is a path parameter naming the target social media collection. The query parameters for the function are:
Parameter | type | Description | Required | Example |
---|---|---|---|---|
startDate | string (date-time) | Start date for data requested | Yes | 2021-09-16 |
endDate | string (date-time) | End date for data requested | Yes | 2021-09-21 |
aggregationLevel | string | One of dayofweek or dayofweekhour | Yes | dayofweek |
sentiment | boolean | Request sentiment statistics rather than count statistics | No | false |
The function defaults to a count request if `sentiment` is absent or `false`; otherwise it returns the sentiment.
Examples
Below are examples of how to aggregate twitter documents by seasonality, set to the day of the week, that have been published between `2021-07-01` and `2021-07-11`, with the output, in various programming languages.
cURL (command line)
curl -XGET 'https://api.ado.eresearch.unimelb.edu.au/analysis/aggregate/collections/twitter/seasonality?aggregationLevel=dayofweek&startDate=2021-07-01&endDate=2021-07-11'\
--header "Authorization: Bearer ${JWT}"
Python
import requests
url = 'https://api.ado.eresearch.unimelb.edu.au/analysis/aggregate/collections/twitter/seasonality'
# query parameters set in the dict below
qs_params = { 'startDate' : '2021-07-01' , 'endDate':'2021-07-11', 'aggregationLevel': 'dayofweek' }
headers = {'Authorization': f"Bearer {jwt}"}
res = requests.get(url, headers = headers, params=qs_params)
result = res.json()
Sample Response
Below is a sample response for a seasonality request with `startDate` as `2021-07-01` and `endDate` as `2021-07-31`, with `aggregationLevel` set to `dayofweek`.
curl -XGET 'https://api.ado.eresearch.unimelb.edu.au/analysis/aggregate/collections/twitter/seasonality?aggregationLevel=dayofweek&startDate=2021-07-01&endDate=2021-07-31'\
--header "Authorization: Bearer ${JWT}"
[
{
"time": "friday",
"count": 1377965
},
{
"time": "monday",
"count": 1004482
},
{
"time": "tuesday",
"count": 922970
},
{
"time": "wednesday",
"count": 1220736
},
{
"time": "thursday",
"count": 1167768
},
{
"time": "saturday",
"count": 1240801
},
{
"time": "sunday",
"count": 937304
}
]
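Note the days of the week are not guaranteed to arrive in calendar order. A small sketch (hard-coding the sample response above) reorders them for plotting:

```python
# `result` is the parsed JSON from the seasonality request above; entries
# arrive in no particular order, so sort them into calendar order.
result = [
    {"time": "friday", "count": 1377965},
    {"time": "monday", "count": 1004482},
    {"time": "tuesday", "count": 922970},
    {"time": "wednesday", "count": 1220736},
    {"time": "thursday", "count": 1167768},
    {"time": "saturday", "count": 1240801},
    {"time": "sunday", "count": 937304},
]

WEEK = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]
ordered = sorted(result, key=lambda entry: WEEK.index(entry["time"]))

print([entry["time"] for entry in ordered])
# ['monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday']
```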
Aggregation by descriptive property
A fraction of the social media data (~7%) contains some geographic information about the origin of the post. However, the specificity of this origin can span from just the country, down to suburbs or landmarks. Therefore, the place information of a tweet was standardized to the following levels:
{
countrycode: (*|string|null),
statecode: (*|null),
gccsacode: (*|null),
salcode: (*|null)
}
where each of the levels was inferred from the location data present in the document. NOTE: the original data was not modified by the above standardization; the documents were transformed separately in CouchDB MapReduce views, hence the original location data as received from the harvesters is still as it was received. If one of the levels can’t be inferred, it is left as `null`.
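Since the levels are strictly nested, a client often wants the most specific level that is available. A minimal sketch (the helper name is illustrative, not part of the API):

```python
def finest_place_level(place):
    """Return the name and code of the most specific non-null place level.

    `place` follows the standardized shape above; any level may be None
    (the JSON null). Helper name is hypothetical, not part of the API.
    """
    for level in ("salcode", "gccsacode", "statecode", "countrycode"):
        if place.get(level) is not None:
            return level, place[level]
    return None, None

# A post geolocated only down to a greater capital city area:
place = {"countrycode": "au", "statecode": "2", "gccsacode": "2gmel", "salcode": None}
print(finest_place_level(place))  # ('gccsacode', '2gmel')
```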
Social media documents also contain the `language` property that can be aggregated over to retrieve the count or sentiment by language over a time period.
Language and place
Aggregation of count or sentiment can be performed to obtain a number of documents by language, with further aggregation possible to obtain the count or sentiment of language by place. The request type and endpoint for this function are:
GET /analysis/language/collections/{collection}
where ‘collection’ is a path parameter naming the target social media collection. The query parameters for the function are:
Parameter | type | Description | Required | Example |
---|---|---|---|---|
startDate | string (date-time) | Start date for data requested | Yes | 2021-09-16 |
endDate | string (date-time) | End date for data requested | Yes | 2021-09-21 |
aggregationLevel | string | One of language , country , state , gccsa or suburb | Yes | language |
sentiment | boolean | Request sentiment statistics rather than count statistics | No | false |
Examples
Below are examples of how to aggregate twitter documents that have been published between `2021-07-01` and `2021-07-11` by language, with the output in various programming languages.
cURL
curl -XGET 'https://api.ado.eresearch.unimelb.edu.au/analysis/language/collections/twitter?aggregationLevel=language&startDate=2021-07-01&endDate=2021-07-11'\
--header "Authorization: Bearer ${JWT}"
Python
import requests
url = 'https://api.ado.eresearch.unimelb.edu.au/analysis/language/collections/twitter'
# query parameters set in the dict below
qs_params = { 'startDate' : '2021-07-01' , 'endDate':'2021-07-11', 'aggregationLevel': 'language' }
headers = {'Authorization': f"Bearer {jwt}"}
res = requests.get(url, headers = headers, params=qs_params)
result = res.json()
Sample Response
The language codes conform to the two-letter ISO 639-1 standard, with `und` denoting an undetermined language.
Aggregation Level: language
Below is an abridged sample response (NOTE: the actual response was much larger) for a language request with `startDate` as `2021-07-01` and `endDate` as `2021-07-11`, with `aggregationLevel` set to `language`:
curl -XGET 'https://api.ado.eresearch.unimelb.edu.au/analysis/language/collections/twitter?aggregationLevel=language&startDate=2021-07-01&endDate=2021-07-11'\
--header "Authorization: Bearer ${JWT}"
[
{
"language": "is",
"count": 1
},
{
"language": "pa",
"count": 1
},
{
"language": "sd",
"count": 1
},
{
"language": "si",
"count": 1
},
{
"language": "te",
"count": 1
},
{
"language": "vi",
"count": 1
},
{
"language": "bg",
"count": 1
},
{
"language": "bn",
"count": 1
},
{
"language": "kn",
"count": 1
},
{
"language": "mr",
"count": 1
},
{
"language": "sd",
"count": 1
}
]
Aggregation Level: state
Below is an abridged sample response (NOTE: the actual response was much larger) for a language request with `startDate` as `2021-07-27` and `endDate` as `2021-07-31`, with `aggregationLevel` set to `state`:
curl -XGET 'https://api.ado.eresearch.unimelb.edu.au/analysis/language/collections/twitter?aggregationLevel=state&startDate=2021-07-27&endDate=2021-07-31'\
--header "Authorization: Bearer ${JWT}"
[
{
"language": "ar",
"country": "au",
"state": "2",
"count": 25
},
{
"language": "ar",
"country": "au",
"state": "3",
"count": 22
},
{
"language": "ar",
"country": "au",
"state": "4",
"count": 3
},
{
"language": "ar",
"country": "au",
"state": "5",
"count": 19
},
{
"language": "ar",
"country": "au",
"state": "8",
"count": 2
},
{
"language": "ar",
"country": "sa",
"state": null,
"count": 2
}
]
Aggregation Level: gccsa
Below is an abridged sample response (NOTE: the actual response was much larger) for a language request with `startDate` as `2021-07-27` and `endDate` as `2021-07-31`, with `aggregationLevel` set to `gccsa`:
curl -XGET 'https://api.ado.eresearch.unimelb.edu.au/analysis/language/collections/twitter?aggregationLevel=gccsa&startDate=2021-07-27&endDate=2021-07-31'\
--header "Authorization: Bearer ${JWT}"
[
{
"language": "und",
"country": "au",
"state": "7",
"gccsa": "7rnte",
"count": 3
},
{
"language": "und",
"country": "au",
"state": "8",
"gccsa": "8acte",
"count": 29
},
{
"language": "und",
"country": "eg",
"state": null,
"gccsa": null,
"count": 1
},
{
"language": "und",
"country": "in",
"state": null,
"gccsa": null,
"count": 6
},
{
"language": "und",
"country": "pk",
"state": null,
"gccsa": null,
"count": 3
},
{
"language": "und",
"country": "th",
"state": null,
"gccsa": null,
"count": 1
},
{
"language": "und",
"country": "us",
"state": null,
"gccsa": null,
"count": 7
},
{
"language": "ur",
"country": null,
"state": null,
"gccsa": null,
"count": 136
},
{
"language": "ur",
"country": "au",
"state": "1",
"gccsa": "1gsyd",
"count": 1
},
{
"language": "ur",
"country": "au",
"state": "2",
"gccsa": "2gmel",
"count": 2
},
{
"language": "ur",
"country": "au",
"state": "3",
"gccsa": "3gbri",
"count": 3
},
{
"language": "ur",
"country": "au",
"state": "8",
"gccsa": "8acte",
"count": 1
},
{
"language": "vi",
"country": null,
"state": null,
"gccsa": null,
"count": 23
},
{
"language": "vi",
"country": "au",
"state": "1",
"gccsa": "1gsyd",
"count": 1
},
{
"language": "vi",
"country": "au",
"state": "4",
"gccsa": "4gade",
"count": 2
},
{
"language": "zh",
"country": null,
"state": null,
"gccsa": null,
"count": 1420
},
{
"language": "zh",
"country": "au",
"state": "1",
"gccsa": "1gsyd",
"count": 19
},
{
"language": "zh",
"country": "au",
"state": "1",
"gccsa": "1rnsw",
"count": 5
},
{
"language": "zh",
"country": "au",
"state": "2",
"gccsa": null,
"count": 1
},
{
"language": "zh",
"country": "au",
"state": "2",
"gccsa": "2gmel",
"count": 26
},
{
"language": "zh",
"country": "au",
"state": "3",
"gccsa": "3gbri",
"count": 6
},
{
"language": "zh",
"country": "au",
"state": "5",
"gccsa": "5gper",
"count": 27
},
{
"language": "zh",
"country": "au",
"state": "8",
"gccsa": "8acte",
"count": 1
}
]
Aggregation Level: suburb
Below is an abridged sample response (NOTE: the actual response was much larger) for a language request with `startDate` as `2021-07-27` and `endDate` as `2021-07-31`, with `aggregationLevel` set to `suburb`:
curl -XGET 'https://api.ado.eresearch.unimelb.edu.au/analysis/language/collections/twitter?aggregationLevel=suburb&startDate=2021-07-27&endDate=2021-07-31'\
--header "Authorization: Bearer ${JWT}"
[
{
"language": "und",
"country": "au",
"state": "5",
"gccsa": "5rwau",
"suburb": "50228",
"count": 1
},
{
"language": "und",
"country": "au",
"state": "5",
"gccsa": "5rwau",
"suburb": "50492",
"count": 1
},
{
"language": "und",
"country": "au",
"state": "5",
"gccsa": "5rwau",
"suburb": "50536",
"count": 1
},
{
"language": "und",
"country": "au",
"state": "5",
"gccsa": "5rwau",
"suburb": "50602",
"count": 1
},
{
"language": "und",
"country": "au",
"state": "5",
"gccsa": "5rwau",
"suburb": "51281",
"count": 3
},
{
"language": "und",
"country": "au",
"state": "6",
"gccsa": null,
"suburb": null,
"count": 5
},
{
"language": "und",
"country": "au",
"state": "6",
"gccsa": "6ghob",
"suburb": "60051",
"count": 1
},
{
"language": "und",
"country": "au",
"state": "6",
"gccsa": "6ghob",
"suburb": "60276",
"count": 20
},
{
"language": "und",
"country": "au",
"state": "6",
"gccsa": "6rtas",
"suburb": "60156",
"count": 1
},
{
"language": "und",
"country": "au",
"state": "6",
"gccsa": "6rtas",
"suburb": "60253",
"count": 1
},
{
"language": "und",
"country": "au",
"state": "6",
"gccsa": "6rtas",
"suburb": "60322",
"count": 3
},
{
"language": "und",
"country": "au",
"state": "7",
"gccsa": null,
"suburb": null,
"count": 1
},
{
"language": "und",
"country": "au",
"state": "7",
"gccsa": "7gdar",
"suburb": "70073",
"count": 11
},
{
"language": "und",
"country": "au",
"state": "7",
"gccsa": "7rnte",
"suburb": "70005",
"count": 1
},
{
"language": "und",
"country": "au",
"state": "7",
"gccsa": "7rnte",
"suburb": "70133",
"count": 1
},
{
"language": "und",
"country": "au",
"state": "7",
"gccsa": "7rnte",
"suburb": "70241",
"count": 1
},
{
"language": "und",
"country": "au",
"state": "8",
"gccsa": "8acte",
"suburb": "80003",
"count": 29
}
]
Place and Language
Aggregation of count or sentiment can be performed to obtain a number of documents by place, with further aggregation possible to obtain the count or sentiment of place by language. The request type and endpoint for this function are:
GET /analysis/place/collections/{collection}
where ‘collection’ is a path parameter naming the target social media collection. The query parameters for the function are:
Parameter | type | Description | Required | Example |
---|---|---|---|---|
startDate | string (date-time) | Start date for data requested | Yes | 2021-09-16 |
endDate | string (date-time) | End date for data requested | Yes | 2021-09-21 |
aggregationLevel | string | One of country , state , gccsa , suburb or language | Yes | language |
sentiment | boolean | Request sentiment statistics rather than count statistics | No | false |
The places are represented by codes according to the following standards:
Place level | Standard |
---|---|
Country | ISO-3166 two-letter alpha code |
State | Australian Statistical Geography Standard – States and Territories |
Greater Capital City Statistical Area | Australian Statistical Geography Standard – GCCSA |
Suburb or locality | Australian Statistical Geography Standard – Suburbs and Localities |
Examples
Below are examples of how to aggregate twitter documents that have been published between `2021-07-01` and `2021-07-11` by place, with the output in various programming languages.
cURL
curl -XGET 'https://api.ado.eresearch.unimelb.edu.au/analysis/place/collections/twitter?aggregationLevel=country&startDate=2021-07-01&endDate=2021-07-11'\
--header "Authorization: Bearer ${JWT}"; echo
Python
import requests
url = 'https://api.ado.eresearch.unimelb.edu.au/analysis/place/collections/twitter'
# query parameters set in the dict below
qs_params = { 'startDate' : '2021-07-01' , 'endDate':'2021-07-11', 'aggregationLevel': 'country' }
headers = {'Authorization': f"Bearer {jwt}"}
res = requests.get(url, headers = headers, params=qs_params)
result = res.json()
Sample Response
Aggregation Level: state
Below is an abridged sample response (NOTE: the actual response was much larger) for a place request with `startDate` as `2021-07-26` and `endDate` as `2021-07-31`, with `aggregationLevel` set to `state`:
curl -XGET 'https://api.ado.eresearch.unimelb.edu.au/analysis/place/collections/twitter?aggregationLevel=state&startDate=2021-07-26&endDate=2021-07-31'\
--header "Authorization: Bearer ${JWT}"
[
{
"country": "au",
"state": "1",
"count": 6693
},
{
"country": "au",
"state": "2",
"count": 6721
},
{
"country": "au",
"state": "3",
"count": 3228
},
{
"country": "au",
"state": "4",
"count": 1389
},
{
"country": "au",
"state": "5",
"count": 1601
},
{
"country": "au",
"state": "6",
"count": 445
},
{
"country": "au",
"state": "7",
"count": 202
},
{
"country": "au",
"state": "8",
"count": 457
}
]
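The numeric state codes in responses like the one above follow the ASGS states-and-territories numbering. The mapping below reflects the standard ASGS assignment, but should be verified against the ABS documentation before use:

```python
# ASGS state/territory codes (confirm against the ABS ASGS documentation).
ASGS_STATES = {
    "1": "New South Wales",
    "2": "Victoria",
    "3": "Queensland",
    "4": "South Australia",
    "5": "Western Australia",
    "6": "Tasmania",
    "7": "Northern Territory",
    "8": "Australian Capital Territory",
    "9": "Other Territories",
}

# Two entries from the sample response above.
result = [
    {"country": "au", "state": "1", "count": 6693},
    {"country": "au", "state": "2", "count": 6721},
]

for entry in result:
    print(ASGS_STATES.get(entry["state"], "unknown"), entry["count"])
# New South Wales 6693
# Victoria 6721
```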
Aggregation Level: gccsa
Below is an abridged sample response (NOTE: the actual response was much larger) for a place request with `startDate` as `2021-07-26` and `endDate` as `2021-07-31`, with `aggregationLevel` set to `gccsa`:
curl -XGET 'https://api.ado.eresearch.unimelb.edu.au/analysis/place/collections/twitter?aggregationLevel=gccsa&startDate=2021-07-26&endDate=2021-07-31'\
--header "Authorization: Bearer ${JWT}"
[
{
"country": "au",
"state": "5",
"gccsa": "5rwau",
"count": 184
},
{
"country": "au",
"state": "6",
"gccsa": null,
"count": 109
},
{
"country": "au",
"state": "6",
"gccsa": "6ghob",
"count": 195
},
{
"country": "au",
"state": "6",
"gccsa": "6rtas",
"count": 141
},
{
"country": "au",
"state": "7",
"gccsa": null,
"count": 20
},
{
"country": "au",
"state": "7",
"gccsa": "7gdar",
"count": 135
},
{
"country": "au",
"state": "7",
"gccsa": "7rnte",
"count": 47
},
{
"country": "au",
"state": "8",
"gccsa": null,
"count": 6
},
{
"country": "au",
"state": "8",
"gccsa": "8acte",
"count": 451
},
{
"country": "au",
"state": "9",
"gccsa": "9oter",
"count": 1
}
]
Aggregation Level: suburb
curl -XGET 'https://api.ado.eresearch.unimelb.edu.au/analysis/place/collections/twitter?aggregationLevel=suburb&startDate=2021-07-26&endDate=2021-07-31'\
--header "Authorization: Bearer ${JWT}"
[
{
"country": "au",
"state": "6",
"gccsa": "6rtas",
"suburb": "60322",
"count": 89
},
{
"country": "au",
"state": "6",
"gccsa": "6rtas",
"suburb": "60367",
"count": 5
},
{
"country": "au",
"state": "6",
"gccsa": "6rtas",
"suburb": "60426",
"count": 1
},
{
"country": "au",
"state": "6",
"gccsa": "6rtas",
"suburb": "60427",
"count": 1
},
{
"country": "au",
"state": "6",
"gccsa": "6rtas",
"suburb": "60437",
"count": 3
},
{
"country": "au",
"state": "6",
"gccsa": "6rtas",
"suburb": "60634",
"count": 1
},
{
"country": "au",
"state": "6",
"gccsa": "6rtas",
"suburb": "60675",
"count": 1
},
{
"country": "au",
"state": "6",
"gccsa": "6rtas",
"suburb": "60702",
"count": 2
},
{
"country": "au",
"state": "6",
"gccsa": "6rtas",
"suburb": "60717",
"count": 1
},
{
"country": "au",
"state": "6",
"gccsa": "6rtas",
"suburb": "60749",
"count": 6
},
{
"country": "au",
"state": "6",
"gccsa": "6rtas",
"suburb": "60758",
"count": 3
},
{
"country": "au",
"state": "7",
"gccsa": null,
"suburb": null,
"count": 20
},
{
"country": "au",
"state": "7",
"gccsa": "7gdar",
"suburb": "70073",
"count": 135
},
{
"country": "au",
"state": "7",
"gccsa": "7rnte",
"suburb": "70005",
"count": 21
},
{
"country": "au",
"state": "7",
"gccsa": "7rnte",
"suburb": "70108",
"count": 23
},
{
"country": "au",
"state": "7",
"gccsa": "7rnte",
"suburb": "70133",
"count": 1
},
{
"country": "au",
"state": "7",
"gccsa": "7rnte",
"suburb": "70241",
"count": 1
},
{
"country": "au",
"state": "7",
"gccsa": "7rnte",
"suburb": "70251",
"count": 1
},
{
"country": "au",
"state": "8",
"gccsa": null,
"suburb": null,
"count": 6
},
{
"country": "au",
"state": "8",
"gccsa": "8acte",
"suburb": "80003",
"count": 420
},
{
"country": "au",
"state": "8",
"gccsa": "8acte",
"suburb": "80012",
"count": 25
},
{
"country": "au",
"state": "8",
"gccsa": "8acte",
"suburb": "80083",
"count": 1
},
{
"country": "au",
"state": "8",
"gccsa": "8acte",
"suburb": "80092",
"count": 4
},
{
"country": "au",
"state": "8",
"gccsa": "8acte",
"suburb": "80113",
"count": 1
},
{
"country": "au",
"state": "9",
"gccsa": "9oter",
"suburb": "90003",
"count": 1
},
{
"country": "br",
"state": null,
"gccsa": null,
"suburb": null,
"count": 4
}
]
Aggregation Level: language
Below is an abridged sample response (NOTE: the actual response was much larger) for a place request with `startDate` as `2021-07-26` and `endDate` as `2021-07-31`, with `aggregationLevel` set to `language`:
[
{
"country": "au",
"state": "8",
"gccsa": "8acte",
"suburb": "80003",
"language": "fr",
"count": 2
},
{
"country": "au",
"state": "8",
"gccsa": "8acte",
"suburb": "80003",
"language": "hi",
"count": 1
},
{
"country": "au",
"state": "8",
"gccsa": "8acte",
"suburb": "80003",
"language": "in",
"count": 7
},
{
"country": "au",
"state": "8",
"gccsa": "8acte",
"suburb": "80003",
"language": "ja",
"count": 1
},
{
"country": "au",
"state": "8",
"gccsa": "8acte",
"suburb": "80003",
"language": "no",
"count": 1
},
{
"country": "au",
"state": "8",
"gccsa": "8acte",
"suburb": "80003",
"language": "pl",
"count": 1
},
{
"country": "au",
"state": "8",
"gccsa": "8acte",
"suburb": "80003",
"language": "th",
"count": 3
},
{
"country": "au",
"state": "8",
"gccsa": "8acte",
"suburb": "80003",
"language": "tl",
"count": 2
},
{
"country": "au",
"state": "8",
"gccsa": "8acte",
"suburb": "80003",
"language": "und",
"count": 29
},
{
"country": "au",
"state": "8",
"gccsa": "8acte",
"suburb": "80003",
"language": "ur",
"count": 1
},
{
"country": "au",
"state": "8",
"gccsa": "8acte",
"suburb": "80003",
"language": "zh",
"count": 1
},
{
"country": "au",
"state": "8",
"gccsa": "8acte",
"suburb": "80012",
"language": "en",
"count": 25
},
{
"country": "au",
"state": "8",
"gccsa": "8acte",
"suburb": "80083",
"language": "en",
"count": 1
},
{
"country": "au",
"state": "8",
"gccsa": "8acte",
"suburb": "80092",
"language": "en",
"count": 4
},
{
"country": "au",
"state": "8",
"gccsa": "8acte",
"suburb": "80113",
"language": "en",
"count": 1
},
{
"country": "au",
"state": "9",
"gccsa": "9oter",
"suburb": "90003",
"language": "en",
"count": 1
}
]
Term Frequency
The count frequency of terms present in a social media collection, within a certain date range, can also be analyzed. The terms are stemmed using the nltk PorterStemmer function, so the words that can be queried are stem words. E.g. “Likes”, “liked”, “likely” and “liking” will all be reduced to “like” after stemming.
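The stemming behaviour can be reproduced locally. A sketch using the nltk package (assumed installed via `pip install nltk`; no corpus download is required for the Porter stemmer):

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
for word in ["Likes", "liked", "likely", "liking"]:
    # stem() lowercases by default, so "Likes" also maps to "like"
    print(word, "->", stemmer.stem(word))
# Likes -> like
# liked -> like
# likely -> like
# liking -> like
```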
All terms
The terms available, along with their aggregated count, in a time period can be queried at the following endpoint:
GET /analysis/terms/collections/{collection}
where ‘collection’ is a path parameter naming the target social media collection. The query parameters for the function are:
Parameter | type | Description | Required | Example |
---|---|---|---|---|
startDate | string (date-time) | Start date for data requested | Yes | 2021-09-16 |
endDate | string (date-time) | End date for data requested | Yes | 2021-09-21 |
The results from this endpoint can also be a precursor to the Specific Terms request, to determine which stem words are available to be queried for daily count statistics.
Examples
cURL
curl -XGET 'https://api.ado.eresearch.unimelb.edu.au/analysis/terms/collections/twitter?startDate=2021-07-26&endDate=2021-08-10'\
--header "Authorization: Bearer ${JWT}"
Python
import requests
url = 'https://api.ado.eresearch.unimelb.edu.au/analysis/terms/collections/twitter'
# query parameters set in the dict below
qs_params = { 'startDate' : '2021-07-26' , 'endDate':'2021-08-10'}
headers = {'Authorization': f"Bearer {jwt}"}
res = requests.get(url, headers = headers, params=qs_params)
result = res.json()
Sample response
Below is an abridged sample response (NOTE: the actual response was much larger) for an All terms request with `startDate` as `2021-07-26` and `endDate` as `2021-08-10`:
curl -XGET 'https://api.ado.eresearch.unimelb.edu.au/analysis/terms/collections/twitter?startDate=2021-07-26&endDate=2021-08-10'\
--header "Authorization: Bearer ${JWT}"
{
"terms": {
"number": 29444,
"nurs": 3984,
"nut": 1987,
"octob": 2378,
"odd": 1525,
"off": 2559,
"offer": 6118,
"offic": 10772,
"offici": 3868,
"oil": 2963,
"okay": 5562,
"old": 3689,
"olymp": 25792,
"omg": 9915,
"one": 24539,
"onli": 1945,
"onlin": 8532,
"open": 4987,
"oper": 4458,
"opinion": 9746,
"opportun": 8799,
"opposit": 3761,
"option": 9490,
"orang": 1640,
"order": 13610,
"organ": 1620,
"organis": 3635,
"origin": 2704,
"other": 18474,
"out": 4424,
"outbreak": 6269,
"outcom": 3295,
"outdoor": 1078,
"outfit": 1252,
"over": 879,
"owner": 3989,
"pacif": 589,
"pack": 3184,
"packag": 2538,
"page": 7929,
"pain": 5848,
"paint": 2334,
"pair": 2111,
"pandem": 5462,
"panel": 2095,
"pant": 1879,
"paper": 11548,
"parent": 9057,
"park": 6372,
"parliament": 2676,
"part": 29335,
"parti": 15366,
"particip": 1335,
"partner": 5619,
"pass": 5302,
"passion": 2211,
"passport": 2642,
"past": 2968,
"path": 2811,
"patienc": 1316,
"patient": 5572,
"pay": 8529,
"payment": 5798,
"pcr": 1563,
"peac": 3594,
"peak": 993,
"penalti": 2638,
"pension": 587,
"peopl": 173508,
"perfect": 3930,
"perform": 7532,
"period": 6783,
"permiss": 246,
}
}
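The response is a single `terms` object mapping stems to counts, so finding the most frequent stems is straightforward. A sketch over a few entries from the sample above:

```python
# `result` is the parsed JSON from the All terms request (abridged here).
result = {"terms": {"peopl": 173508, "part": 29335, "number": 29444,
                    "olymp": 25792, "one": 24539}}

# Top 3 stems by frequency.
top = sorted(result["terms"].items(), key=lambda item: item[1], reverse=True)[:3]
print(top)  # [('peopl', 173508), ('number', 29444), ('part', 29335)]
```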
Specific terms
The daily frequencies of specific terms of interest can be queried at the following endpoint:
GET /analysis/terms/collections/{collection}/term
where ‘collection’ is a path parameter naming the target social media collection. The query parameters for the function are:
Parameter | type | Description | Required | Example |
---|---|---|---|---|
startDate | string (date-time) | Start date for data requested | Yes | 2021-09-16 |
endDate | string (date-time) | End date for data requested | Yes | 2021-09-21 |
terms | string | List of terms in comma separated format | Yes | covid,scomo,vaccin |
Examples
cURL
curl -XGET 'https://api.ado.eresearch.unimelb.edu.au/analysis/terms/collections/twitter/term?startDate=2021-07-13&endDate=2021-07-31&terms=scomo,vaccin'\
--header "Authorization: Bearer ${JWT}"; echo
Python
import requests
url = 'https://api.ado.eresearch.unimelb.edu.au/analysis/terms/collections/twitter/term'
# query parameters set in the dict below
qs_params = { 'startDate' : '2021-07-13' , 'endDate':'2021-07-31', 'terms' : 'scomo,vaccin' }
headers = {'Authorization': f"Bearer {jwt}"}
res = requests.get(url, headers = headers, params=qs_params)
result = res.json()
Sample response
Below is an abridged sample response (NOTE: the actual response was much larger) for a Specific terms request with `startDate` as `2021-07-13` and `endDate` as `2021-07-31`, with the `terms` set to `scomo,vaccin`:
curl -XGET 'https://api.ado.eresearch.unimelb.edu.au/analysis/terms/collections/twitter/term?startDate=2021-07-13&endDate=2021-07-31&terms=scomo,vaccin'\
--header "Authorization: Bearer ${JWT}"
{
"scomo": [
{
"date": "2021-7-13",
"count": 132
},
{
"date": "2021-7-14",
"count": 142
},
{
"date": "2021-7-15",
"count": 198
},
{
"date": "2021-7-16",
"count": 148
},
{
"date": "2021-7-17",
"count": 118
},
{
"date": "2021-7-18",
"count": 107
},
{
"date": "2021-7-19",
"count": 138
},
{
"date": "2021-7-20",
"count": 134
},
{
"date": "2021-7-21",
"count": 225
},
{
"date": "2021-7-22",
"count": 229
},
{
"date": "2021-7-23",
"count": 227
},
{
"date": "2021-7-24",
"count": 162
},
{
"date": "2021-7-25",
"count": 164
},
{
"date": "2021-7-26",
"count": 133
},
{
"date": "2021-7-28",
"count": 139
},
{
"date": "2021-7-29",
"count": 144
},
{
"date": "2021-7-30",
"count": 191
}
],
"vaccin": [
{
"date": "2021-7-13",
"count": 3262
},
{
"date": "2021-7-14",
"count": 3175
},
{
"date": "2021-7-15",
"count": 3080
},
{
"date": "2021-7-16",
"count": 3193
},
{
"date": "2021-7-17",
"count": 2858
},
{
"date": "2021-7-18",
"count": 2910
},
{
"date": "2021-7-19",
"count": 4156
},
{
"date": "2021-7-20",
"count": 3978
},
{
"date": "2021-7-21",
"count": 4492
},
{
"date": "2021-7-22",
"count": 5197
},
{
"date": "2021-7-23",
"count": 6336
},
{
"date": "2021-7-24",
"count": 6047
},
{
"date": "2021-7-25",
"count": 5163
},
{
"date": "2021-7-26",
"count": 4124
},
{
"date": "2021-7-27",
"count": 3680
},
{
"date": "2021-7-28",
"count": 4039
},
{
"date": "2021-7-29",
"count": 4578
},
{
"date": "2021-7-30",
"count": 4599
}
]
}
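A response in this shape is easy to post-process client-side. Below is a minimal sketch (pure Python, not part of the API) that finds each term's peak day; the `sample` dict abridges the response above:

```python
def peak_days(result):
    # Map each term to the (date, count) pair with the highest count.
    peaks = {}
    for term, series in result.items():
        # Each series is a list of {"date": ..., "count": ...} objects.
        top = max(series, key=lambda point: point["count"])
        peaks[term] = (top["date"], top["count"])
    return peaks

sample = {
    "scomo": [
        {"date": "2021-7-21", "count": 225},
        {"date": "2021-7-22", "count": 229},
    ]
}
print(peak_days(sample))  # {'scomo': ('2021-7-22', 229)}
```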
Term Similarity
In the active natural language processing pipeline on the ADO cluster, word embedding models are built daily using word2vec, on a corpus derived from a single day's worth of text data. Word embedding models can be used to measure how similar words are to each other in the embedding space, by computing the cosine similarity of their word vectors. Similarities change across different days, depending on the semantic relationships of the terms (often how frequently they appear together).
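The similarity measure itself is straightforward to sketch. Below is a toy cosine-similarity computation in pure Python; the vectors are made-up examples, not actual model output (the deployed models use 100-dimensional vectors, per the metadata endpoint):

```python
import math

def cosine_similarity(u, v):
    # cos(u, v) = (u . v) / (|u| * |v|)
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Parallel vectors are maximally similar (1.0); orthogonal ones score 0.0.
print(round(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]), 4))  # 1.0
```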
Terms available
To list the terms available for querying in a specific word embedding model, the following endpoint can be queried:
GET /analysis/nlp/collections/{collection}/days/{day}/terms
where collection is a path parameter specifying the target social media collection, and day is the day of the model requested:
Examples
cURL
curl -XGET 'https://api.ado.eresearch.unimelb.edu.au/analysis/nlp/collections/twitter/days/2021-07-09/terms'\
--header "Authorization: Bearer ${JWT}"; echo
Python
import requests
day = '2021-07-09'
collection = 'twitter'
url = f'https://api.ado.eresearch.unimelb.edu.au/analysis/nlp/collections/{collection}/days/{day}/terms'
headers = {'Authorization': f"Bearer {jwt}"}
res = requests.get(url, headers = headers)
result = res.json()
Query word embedding models
To query a model and retrieve the top 25 most similar terms, along with their cosine similarities, the following endpoint can be queried:
GET /analysis/nlp/collections/{collection}/days/{day}/terms/{term}
where collection is a path parameter specifying the target social media collection, day is a path parameter in the format YYYY-MM-DD, and term is the term being queried.
Examples
cURL
curl -XGET 'https://api.ado.eresearch.unimelb.edu.au/analysis/nlp/collections/twitter/days/2021-07-09/terms/vaccin'\
--header "Authorization: Bearer ${JWT}"
Python
import requests
term = 'vaccin'
day = '2021-07-09'
url = f'https://api.ado.eresearch.unimelb.edu.au/analysis/nlp/collections/twitter/days/{day}/terms/{term}'
headers = {'Authorization': f"Bearer {jwt}"}
res = requests.get(url, headers = headers)
result = res.json()
Sample Response
Below is a sample response for the term vaccin
on the day 2021-07-09
:
curl -XGET 'https://api.ado.eresearch.unimelb.edu.au/analysis/nlp/collections/twitter/days/2021-07-09/terms/vaccin'\
--header "Authorization: Bearer ${JWT}"
{
"agenda": 0.5740534067153931,
"approv": 0.6987619400024414,
"australian": 0.6622416973114014,
"blame": 0.6074656844139099,
"blood": 0.6090492010116577,
"booster": 0.8681329488754272,
"bullshit": 0.6017955541610718,
"cabinet": 0.5635865330696106,
"campaign": 0.6179795265197754,
"capac": 0.5706251859664917,
"claim": 0.5647196769714355,
"clinic": 0.7478079199790955,
"concern": 0.5908010005950928,
"condit": 0.5645681023597717,
"conspiraci": 0.6220813393592834,
"coronaviru": 0.6344245672225952,
"coverag": 0.6052238941192627,
"covid": 0.7316592931747437,
"crisi": 0.6347731351852417,
"damag": 0.5770517587661743,
"danger": 0.6578422784805298,
"deal": 0.6643673181533813,
"death": 0.57118159532547,
"delta": 0.5919604301452637,
"disabl": 0.5580296516418457,
"disast": 0.608578085899353,
"diseas": 0.7518317699432373,
"doctor": 0.7485677003860474,
"dose": 0.7945512533187866,
"effect": 0.6603673696517944,
"emerg": 0.6338611841201782,
"evid": 0.5946788787841797,
"facil": 0.6528164148330688,
"fact": 0.5785788297653198,
"fail": 0.5803477168083191,
"failur": 0.7815086245536804,
"fed": 0.6356635093688965,
"feder": 0.5594338178634644,
"figur": 0.7420264482498169,
"flu": 0.7349779605865479,
"gov": 0.6872922778129578,
"govern": 0.5914908647537231,
"govt": 0.6604992151260376,
"headlin": 0.6021615266799927,
"hospitalis": 0.6953837275505066,
"hub": 0.723802387714386,
"hunt": 0.5952842235565186,
"icu": 0.6137961149215698,
"ill": 0.620633602142334,
"immun": 0.8382552266120911,
"incompet": 0.5905853509902954,
"increas": 0.592781662940979,
"infect": 0.6805194616317749,
"jab": 0.8495683670043945,
"lack": 0.5874945521354675,
"lie": 0.5710851550102234,
"major": 0.5966668128967285,
"mass": 0.8208097219467163,
"medicin": 0.6375464200973511,
"million": 0.7198325991630554,
"msm": 0.6013255715370178,
"outbreak": 0.5976635813713074,
"pandem": 0.6687020659446716,
"patient": 0.5810028314590454,
"pfizer": 0.900665283203125,
"phase": 0.6531299352645874,
"popul": 0.7427656054496765,
"proof": 0.5772579312324524,
"propaganda": 0.5715232491493225,
"protect": 0.7308518886566162,
"quarantin": 0.6936514973640442,
"rate": 0.6851494312286377,
"report": 0.6093935966491699,
"respons": 0.631423830986023,
"risk": 0.68557208776474,
"roll": 0.7304665446281433,
"rollout": 0.912151038646698,
"scientist": 0.5725030303001404,
"scomo": 0.6933631896972656,
"seem": 0.6128174066543579,
"shot": 0.6573461294174194,
"spin": 0.5988937020301819,
"spread": 0.5883970856666565,
"stat": 0.5837931632995605,
"statement": 0.654644250869751,
"strain": 0.6624536514282227,
"suppli": 0.8358686566352844,
"symptom": 0.7199208736419678,
"target": 0.5763600468635559,
"theori": 0.5651994347572327,
"thousand": 0.572020411491394,
"total": 0.5963312387466431,
"treatment": 0.6949573755264282,
"trial": 0.5938171744346619,
"variant": 0.6742241978645325,
"vax": 0.9031803011894226,
"ventil": 0.6273638010025024,
"viru": 0.6889223456382751,
"which": 0.6370202302932739,
"zero": 0.6220378279685974
}
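Since the response is a flat term-to-similarity mapping, the most similar terms can be ranked client-side. A minimal sketch (pure Python; the `sample` dict abridges the response above):

```python
def top_similar(result, n=5):
    # Sort the term -> cosine-similarity mapping, highest first,
    # and keep the n best matches.
    return sorted(result.items(), key=lambda kv: kv[1], reverse=True)[:n]

sample = {"pfizer": 0.9007, "rollout": 0.9122, "vax": 0.9032, "agenda": 0.5741}
print(top_similar(sample, n=2))  # [('rollout', 0.9122), ('vax', 0.9032)]
```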
Topic Modelling
Topic modelling is performed daily on a corpus derived from the previous day’s social media posts. As a result, a list of clusters is obtained for every day, which represents the topics of that particular day. In the metadata of each topic cluster there is the following information:
• the size of the cluster
• the list of document ids in the cluster
• the top 30 terms (by frequency) of the cluster
• the aggregate pairwise similarity of the 30 terms, computed using a word embedding model (word2vec) built with the same corpus
A variety of API endpoints allow access to this data.
Topics
To get the list of topics on certain days (in a date range), with the top 30 terms representing each topic, the following endpoint can be queried.
GET /analysis/nlp/collections/{collection}/topics
where collection is a path parameter specifying the target social media collection. The query parameters for the endpoint are:
Parameter | type | Description | Required | Example |
---|---|---|---|---|
startDate | string (date-time) | Start date for data requested | Yes | 2021-09-16 |
endDate | string (date-time) | End date for data requested | Yes | 2021-09-21 |
fullResult | boolean | when set to true, it returns the whole clustering information, otherwise returns only the top terms and size of each cluster | No | false |
Examples
cURL
curl -XGET 'https://api.ado.eresearch.unimelb.edu.au/analysis/nlp/collections/twitter/topics?startDate=2021-07-15&endDate=2021-07-31'\
--header "Authorization: Bearer ${JWT}"
Python
import requests
url = 'https://api.ado.eresearch.unimelb.edu.au/analysis/nlp/collections/twitter/topics'
# query parameters set in the dict below
headers = {'Authorization': f"Bearer {jwt}"}
qs_params = { 'startDate' : '2021-07-15' , 'endDate':'2021-07-31' }
res = requests.get(url, headers = headers, params=qs_params)
result = res.json()
Sample responses
Below is an abridged sample response (NOTE: the actual response was much larger) for a topics request with startDate as 2021-07-06 and endDate as 2021-07-07:
[
{
"time": "twitter-2021-7-6",
"topics": [
{
"size": 13426,
"terms": [
[
"time",
0.012539179088579358
],
[
"thing",
0.011031433589385515
],
[
"thank",
0.010893080954586857
],
[
"day",
0.01079276048562502
],
[
"way",
0.009899755014715397
],
[
"peopl",
0.009862197831996721
],
[
"year",
0.009700065374291758
],
[
"work",
0.009599968190570853
],
[
"pleas",
0.009439287725906211
],
[
"love",
0.00923899830888478
],
[
"someth",
0.008347818972842945
],
[
"famili",
0.007857277204932738
],
[
"man",
0.007826952365885256
],
[
"life",
0.007707197935138655
],
[
"look",
0.00760704576238854
],
[
"number",
0.00748684770119196
],
[
"hope",
0.007459941187635528
],
[
"chang",
0.0073872271226581845
],
[
"lol",
0.007273909313434026
],
[
"world",
0.007218640153030318
],
[
"today",
0.0070560603622170735
],
[
"lot",
0.007020157362037514
],
[
"stori",
0.0067819740134623194
],
[
"point",
0.006586016841722677
],
[
"someon",
0.006336134860858228
],
[
"amp",
0.006335191555513297
],
[
"person",
0.006326825382022919
],
[
"mate",
0.0063065568290590086
],
[
"idea",
0.00630524951478842
],
[
"issu",
0.006262870208386203
]
]
},
{
"size": 1184,
"terms": [
[
"been",
0.6629800970648556
],
[
"and",
0.6447700257307555
],
[
"tood",
0.5790615196460484
],
[
"wbk",
0.5790615196460484
],
[
"ofoot",
0.5790615196460484
],
[
"lme",
0.5790615196460484
],
[
"lulz",
0.5058207004015077
],
[
"aint",
0.4717656588171083
],
[
"edit",
0.3298835450938238
],
[
"note",
0.30500326203449424
],
[
"new",
0.24546509924968438
],
[
"good",
0.23786193215159798
],
[
"other",
0.2128574234233451
]
]
},
{
"size": 854,
"terms": [
[
"women",
0.02896990327024129
],
[
"peopl",
0.01671489963171787
],
[
"labor",
0.015499202570304927
],
[
"lnp",
0.014917234756772927
],
[
"govern",
0.014581957505262429
],
[
"men",
0.01409398595674428
],
[
"parti",
0.013829154562138842
],
[
"vote",
0.012737353913619187
],
[
"woman",
0.012643179591426178
],
[
"thing",
0.012518401784242076
],
[
"elect",
0.012349628904950012
],
[
"amp",
0.01229898149785756
],
[
"countri",
0.011066587380819054
],
[
"noth",
0.01101237688074447
],
[
"time",
0.010930784687508228
],
[
"someon",
0.010913677044621818
],
[
"way",
0.010883379274568714
],
[
"person",
0.01087435730958338
],
[
"book",
0.010083183625184377
],
[
"point",
0.009814435710351012
],
[
"sex",
0.009678021925846197
],
[
"media",
0.009644183199768645
],
[
"polici",
0.00942604534487092
],
[
"word",
0.009384505091836968
],
[
"job",
0.00929449252100335
],
[
"world",
0.008890181955987965
],
[
"right",
0.008671508417990654
],
[
"leader",
0.008594574935355337
],
[
"power",
0.00858276775345682
],
[
"minist",
0.008506955957547122
]
]
}
]
},
{
"time": "twitter-2021-7-7",
"topics": [
{
"size": 13625,
"terms": [
[
"peopl",
0.04954222129345652
],
[
"time",
0.03852017067849832
],
[
"thank",
0.036208638709126315
],
[
"year",
0.03078951789567827
],
[
"day",
0.029598754995594615
],
[
"thing",
0.028388628668967766
],
[
"way",
0.025430383221641064
],
[
"amp",
0.023461890502198015
],
[
"vaccin",
0.022571555371053823
],
[
"game",
0.02133101156090405
],
[
"work",
0.020237805815050747
],
[
"someth",
0.016794161708378788
],
[
"man",
0.015534358266290738
],
[
"world",
0.015384118574431803
],
[
"someon",
0.015120125645947224
],
[
"school",
0.015006561551544784
],
[
"look",
0.014854740985397878
],
[
"point",
0.01470245590150101
],
[
"love",
0.0145879336603674
],
[
"life",
0.0145879336603674
],
[
"week",
0.014473144192518831
],
[
"everyon",
0.014049920157173034
],
[
"govern",
0.013700852268937852
],
[
"lol",
0.013661907987933455
],
[
"noth",
0.013622931506343448
],
[
"lot",
0.013309946606863058
],
[
"anyth",
0.013270675101068469
],
[
"women",
0.013231370219479292
],
[
"hope",
0.013152659838993398
],
[
"home",
0.013034340863974179
]
]
},
{
"size": 851,
"terms": [
[
"just",
1.0704923609828432
],
[
"second",
1.0102780475251398
],
[
"forgot",
0.6540451718630429
],
[
"twice",
0.6199901302786435
],
[
"dam",
0.5975586478372293
],
[
"shouldn",
0.5808043526185023
],
[
"more",
0.5674263062543589
],
[
"mon",
0.5674263062543589
],
[
"been",
0.5154157357510538
],
[
"ill",
0.4865581302646353
],
[
"not",
0.43945345020516285
],
[
"other",
0.3852927434460453
]
]
}
]
}
]
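A response in this shape can be flattened client-side. Below is a minimal sketch (pure Python, not part of the API) reducing each cluster to its day, size, and top terms; the `sample` list abridges the response above:

```python
def summarise_topics(response, n_terms=3):
    # Reduce a topics response to (day, cluster size, top-n terms) tuples.
    rows = []
    for day in response:
        for topic in day["topics"]:
            # Each entry in "terms" is a [term, weight] pair.
            terms = [term for term, _weight in topic["terms"][:n_terms]]
            rows.append((day["time"], topic["size"], terms))
    return rows

sample = [
    {"time": "twitter-2021-7-6",
     "topics": [{"size": 13426,
                 "terms": [["time", 0.0125], ["thing", 0.0110], ["thank", 0.0109]]}]}
]
print(summarise_topics(sample))
# [('twitter-2021-7-6', 13426, ['time', 'thing', 'thank'])]
```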
Below is an abridged sample response (NOTE: the actual response was much larger) for a topics request with startDate as 2021-07-06 and endDate as 2021-07-07, with fullResult set to true:
Topic Posts
The social media posts/documents assigned to a particular topic cluster can also be retrieved at the following endpoint:
POST /analysis/nlp/collections/{collection}/topicposts
where collection is a path parameter specifying the target social media collection. There are no query parameters for this endpoint. However, the request body has to be an array of topic ids (as strings) in the format ‘yyyymmdd-t’ (where t is the topic number for the day specified by yyyymmdd), such as:
[
'20210922-1', '20210923-4', '20210924-5'
]
and 'Content-Type: application/json' has to be passed in the request headers.
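The ‘yyyymmdd-t’ convention can be made explicit with a small helper (illustrative only, not part of the API):

```python
from datetime import date

def topic_id(day, topic_number):
    # 'yyyymmdd-t': the day of the clustering run, then the topic number.
    return f"{day.strftime('%Y%m%d')}-{topic_number}"

# Assemble a request body for the topicposts endpoint.
body = [topic_id(date(2021, 9, 22), 1), topic_id(date(2021, 9, 23), 4)]
print(body)  # ['20210922-1', '20210923-4']
```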
Examples
cURL
curl -XPOST 'https://api.ado.eresearch.unimelb.edu.au/analysis/nlp/collections/twitter/topicposts'\
--data '["20211010-1", "20211008-2", "20211008-1"]'\
--header 'Content-Type:application/json'\
--header "Authorization: Bearer ${JWT}"
Python
import requests
url = 'https://api.ado.eresearch.unimelb.edu.au/analysis/nlp/collections/twitter/topicposts'
# topic ids passed as the JSON request body
headers = {'Authorization': f"Bearer {jwt}", 'Content-Type': 'application/json'}
data = ["20211010-1", "20211008-2", "20211008-1"]
res = requests.post(url, headers = headers, json=data)
result = res.json()
Sample responses
Below is an abridged response from a topicposts request for the clusters with ids "20211010-1", "20211008-2", "20211008-1" (clusters on separate days):
curl -XPOST 'https://api.ado.eresearch.unimelb.edu.au/analysis/nlp/collections/twitter/topicposts'\
--data '["20211010-1", "20211008-2", "20211008-1"]'\
--header 'Content-Type:application/json'\
--header "Authorization: Bearer ${JWT}"
[
"1204822986919952384",
"1265315022663753728",
"1293965389089722369",
"1316470333747101701",
"1345038179070476289",
"1350585250506805248",
"1354391784479019008",
"1363294990395301890",
"1374312193785614337",
"1374420691743739909",
"1404490430214029315",
"1406973122193002504",
"1408094835203002369",
"1411106951111622664",
"1413489152708993027",
"1414266348306370568",
"1414309380229672973",
"1418268245493157892",
"1418588497414340610",
"1418614308049723396",
"1423995484469927941",
"1425970355793895427",
"1426007838242050054",
"1429482874500177928",
"1431703502078812168",
"1431741358411173889",
"1432438057274327056",
"1432668449780731906",
"1433138299435167747",
"1433658094538698767",
"1435668001009946624",
"1436039149790892034",
"1438050872068562945",
"1438193702686527490",
"1438680701696679939",
"1438793907207163908",
"1439684661551280128",
"1440300321624911880",
"1440346508780470281"
]
Topic Groupings
A network graph was chosen as the data structure to capture the relationships between topic clusters on consecutive days. In brief, the makeup of the network graph is the following:
• A node is a topic cluster on a given day. Nodes are qualified via the size and similarity of their topic clusters, and are only included in the graph if these are above a defined threshold.
• An edge is an intersection between two nodes on consecutive days. An edge only exists if the number of intersecting terms is above the threshold specified to the function via the API. The higher the threshold, the sparser the network graph.
The network graph can be retrieved at the following endpoint:
GET /analysis/nlp/collections/{collection}/topicgroupings
where collection is a path parameter specifying the target social media collection. The query parameters for the endpoint are:
Parameter | type | Description | Required | Example |
---|---|---|---|---|
startDate | string (date-time) | Start date for data requested | Yes | 2021-09-16 |
endDate | string (date-time) | End date for data requested | Yes | 2021-09-21 |
threshold | integer | minimum number of common terms to define a grouping between two clusters | Yes | 15 |
Examples
cURL
curl -XGET 'https://api.ado.eresearch.unimelb.edu.au/analysis/nlp/collections/twitter/topicgroupings?startDate=2021-07-10&endDate=2021-07-20&threshold=12'\
--header "Authorization: Bearer ${JWT}"
Python
import requests
url = 'https://api.ado.eresearch.unimelb.edu.au/analysis/nlp/collections/twitter/topicgroupings'
# query parameters set in the dict below
qs_params = { 'startDate' : '2021-07-10' , 'endDate':'2021-07-20', 'threshold' : 12 }
headers = {'Authorization': f"Bearer {jwt}"}
res = requests.get(url, headers = headers, params=qs_params)
result = res.json()
Sample responses
Below is a sample response for a topicgroupings request, with startDate as 2021-07-10, endDate as 2021-07-20, and threshold as 12:
curl -XGET 'https://api.ado.eresearch.unimelb.edu.au/analysis/nlp/collections/twitter/topicgroupings?startDate=2021-07-10&endDate=2021-07-20&threshold=12'\
--header "Authorization: Bearer ${JWT}"
{
"directed": true,
"graph": {},
"links" : [
{
"common_terms": [
"band",
"concert",
"track",
"artist",
"audio",
"sound",
"tune",
"singer",
"lyric",
"album",
"music",
"rock",
"listen",
"feat",
"song",
"metal",
"guitar",
"soundtrack",
"vocal",
"remix",
"danc",
"courtesi",
"ti",
"sing",
"playlist"
],
"sequences": [
1
],
"source": "20210710-1",
"target": "20210711-1",
"weight": 25
}
],
"multigraph": false,
"nodes" : [
{
"day": 0,
"id": "20210710-1",
"sequences": [
1
],
"size": 2257,
"terms": [
"song",
"music",
"album",
"sound",
"audio",
"band",
"playlist",
"tune",
"rock",
"lyric",
"theme",
"guitar",
"listen",
"remix",
"soundtrack",
"feat",
"track",
"sing",
"artist",
"metal",
"concert",
"vocal",
"singer",
"courtesi",
"punk",
"version",
"danc",
"piano",
"ti",
"give"
]
},
{
"day": 1,
"id": "20210711-1",
"sequences": [
1
],
"size": 1940,
"terms": [
"song",
"music",
"album",
"playlist",
"sound",
"audio",
"tune",
"band",
"rock",
"radio",
"soundtrack",
"listen",
"guitar",
"jam",
"lyric",
"singer",
"track",
"remix",
"feat",
"ti",
"concert",
"drum",
"artist",
"microphon",
"metal",
"vocal",
"danc",
"sing",
"app",
"courtesi"
]
}
]
}
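The response is a node-link representation of the graph (a directed flag, a nodes array, and a links array with source/target ids), so it can be walked directly. Below is a minimal pure-Python sketch (not part of the API) that follows a cluster forward to the cluster(s) it feeds into on the next day; the `sample` dict abridges the response above:

```python
def successors(graph, node_id):
    # Follow the directed links from one topic cluster to the
    # cluster(s) it groups with on the following day.
    return [link["target"] for link in graph["links"] if link["source"] == node_id]

sample = {
    "directed": True,
    "links": [{"source": "20210710-1", "target": "20210711-1", "weight": 25}],
    "nodes": [{"id": "20210710-1"}, {"id": "20210711-1"}],
}
print(successors(sample, "20210710-1"))  # ['20210711-1']
```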
Topic modelling metadata
To retrieve the model parameters, corpus, and processing time of the topic modelling performed on the cluster, the following endpoint can be queried.
GET /analysis/nlp/collections/{collection}/metadata
where collection is a path parameter specifying the target social media collection. The query parameters for the endpoint are:
Parameter | type | Description | Required | Example |
---|---|---|---|---|
startDate | string (date-time) | Start date for data requested | Yes | 2021-09-16 |
endDate | string (date-time) | End date for data requested | Yes | 2021-09-21 |
Examples
cURL
curl -XGET 'https://api.ado.eresearch.unimelb.edu.au/analysis/nlp/collections/twitter/metadata?startDate=2021-07-13&endDate=2021-07-31'\
--header "Authorization: Bearer ${JWT}"
Python
import requests
url = 'https://api.ado.eresearch.unimelb.edu.au/analysis/nlp/collections/twitter/metadata'
# query parameters set in the dict below
qs_params = { 'startDate' : '2021-07-10' , 'endDate':'2021-07-20'}
headers = {'Authorization': f"Bearer {jwt}"}
res = requests.get(url, headers = headers, params=qs_params)
result = res.json()
Sample response
curl -XGET 'https://api.ado.eresearch.unimelb.edu.au/analysis/nlp/collections/twitter/metadata?startDate=2021-07-13&endDate=2021-07-15'\
--header "Authorization: Bearer ${JWT}"
[
{
"id": "twitter-20210713",
"corpus": {
"start": "2021-07-13T00:00:00+11:00",
"end": "2021-07-13T23:59:59+11:00",
"days": 1,
"ndocuments": 237902
},
"processing": {
"start": "2021-11-25T17:32:53+11:00",
"end": "2021-11-25T17:57:44+11:00",
"procminutes": 24,
"corpussize": 142944,
"dictionarysize": 42127,
"parameters": {
"w2v": {
"wv_vector_size": 100,
"wv_min_count": 100,
"wv_window_size": 5,
"wv_training_algorithm": 0
},
"clustering": {
"bert_model": "w2v",
"bert_top_n_words": 30,
"bert_min_topic_size": 100
}
}
}
},
{
"id": "twitter-20210714",
"corpus": {
"start": "2021-07-14T00:00:00+11:00",
"end": "2021-07-14T23:59:59+11:00",
"days": 1,
"ndocuments": 252765
},
"processing": {
"start": "2021-11-25T17:32:52+11:00",
"end": "2021-11-25T18:02:45+11:00",
"procminutes": 29,
"corpussize": 150183,
"dictionarysize": 43002,
"parameters": {
"w2v": {
"wv_vector_size": 100,
"wv_min_count": 100,
"wv_window_size": 5,
"wv_training_algorithm": 0
},
"clustering": {
"bert_model": "w2v",
"bert_top_n_words": 30,
"bert_min_topic_size": 100
}
}
}
},
{
"id": "twitter-20210715",
"corpus": {
"start": "2021-07-15T00:00:00+11:00",
"end": "2021-07-15T23:59:59+11:00",
"days": 1,
"ndocuments": 227384
},
"processing": {
"start": "2021-11-25T17:32:54+11:00",
"end": "2021-11-25T18:02:02+11:00",
"procminutes": 29,
"corpussize": 135061,
"dictionarysize": 39341,
"parameters": {
"w2v": {
"wv_vector_size": 100,
"wv_min_count": 100,
"wv_window_size": 5,
"wv_training_algorithm": 0
},
"clustering": {
"bert_model": "w2v",
"bert_top_n_words": 30,
"bert_min_topic_size": 100
}
}
}
}
]
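The per-run metadata lends itself to quick summaries. A minimal sketch (pure Python, not part of the API; the `sample` list abridges the response above):

```python
def summarise_runs(result):
    # Total corpus documents and average processing time across all runs.
    total_docs = sum(run["corpus"]["ndocuments"] for run in result)
    avg_minutes = sum(run["processing"]["procminutes"] for run in result) / len(result)
    return total_docs, avg_minutes

sample = [
    {"corpus": {"ndocuments": 237902}, "processing": {"procminutes": 24}},
    {"corpus": {"ndocuments": 252765}, "processing": {"procminutes": 29}},
]
print(summarise_runs(sample))  # (490667, 26.5)
```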
Text Search
The text search query follows the Lucene syntax. The fields available are:
- text (tokenized text of the social media post)
- hashtags (text of hashtags)
- author (author’s id)
- date (date of posting expressed as YYYYMMDD, in UTC)
- language (the language the post is expressed in)
For instance, a valid expression could be: hashtags:booksthatmademe AND author:"22250517" AND text:helped AND language:"en" AND date:"20210718"
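A compound query like the example above can be assembled programmatically. Below is a small sketch; the helper is hypothetical (not part of the API) and assumes the field values need no Lucene escaping:

```python
def lucene_and(**fields):
    # Join field:value pairs with AND, quoting each value for an exact match.
    return " AND ".join(f'{field}:"{value}"' for field, value in fields.items())

query = lucene_and(hashtags="booksthatmademe", author="22250517", language="en")
print(query)  # hashtags:"booksthatmademe" AND author:"22250517" AND language:"en"
```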
GET /analysis/textsearch/collections/{collection}
where ‘collection’ is a path parameter of the target social media collection. The query parameters for the function are:
Parameter | type | Description | Required | Example |
---|---|---|---|---|
query | string | Full-text query | Yes | climate* |
The output is returned in pages of 200 IDs each. Pagination is managed via the x-ado-bookmark header: it is returned as a response header by each request, and has to be added as a request header to the next request to retrieve the subsequent page, until an empty ID array is returned.
The total number of rows returned by the query is contained in the x-ado-totalrows response header.
Examples
cURL (command-line)
curl -XGET 'https://api.ado.eresearch.unimelb.edu.au/analysis/textsearch/collections/twitter?query=hashtags:climate*'\
--header "Authorization: Bearer ${JWT}"
Python
import requests
url = 'https://api.ado.eresearch.unimelb.edu.au/analysis/textsearch/collections/twitter'
# query parameters set in the dict below
qs_params = { 'query' : 'hashtags:climate*' }
headers = {'Authorization': f"Bearer {jwt}"}
res = requests.get(url, headers = headers, params=qs_params)
result = res.json()
Sample Response
[
"1440864320926011394",
"1505702676994015232"
]
Pagination (Python)
import requests
url = 'https://api.ado.eresearch.unimelb.edu.au/analysis/textsearch/collections/twitter'
qs_params = { 'query' : 'hashtags:climate*' }
headers = {'Authorization': f"Bearer {jwt}"}
res = requests.get(url, headers = headers, params=qs_params)
result1 = res.json()
# Second page: pass back the bookmark returned by the first request
headers = {'Authorization': f"Bearer {jwt}", 'x-ado-bookmark': res.headers['x-ado-bookmark']}
res = requests.get(url, headers = headers, params=qs_params)
result2 = res.json()
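The two manual requests above generalise to a loop that keeps feeding the returned bookmark back until an empty page arrives, per the termination rule described earlier. A sketch (not part of the API; the `get` argument is injected so the paging logic can be exercised without a live connection, pass requests.get in real use):

```python
def fetch_all_ids(get, url, query, jwt):
    # Page through text-search results 200 IDs at a time, feeding each
    # response's x-ado-bookmark header into the next request; stop when
    # an empty ID array comes back.
    headers = {'Authorization': f"Bearer {jwt}"}
    ids = []
    while True:
        res = get(url, headers=headers, params={'query': query})
        page = res.json()
        if not page:
            return ids
        ids.extend(page)
        headers['x-ado-bookmark'] = res.headers['x-ado-bookmark']

# Real use: all_ids = fetch_all_ids(requests.get, url, 'hashtags:climate*', jwt)
```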