
Algolia limits the size of a record for performance reasons. The limit depends on your plan. If a record exceeds your plan’s limit, Algolia returns the error message "Record is too big". Before you consider upgrading your plan, try reducing the size of your records.
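To spot oversized records before sending them, you can estimate a record's size locally. The following is a minimal sketch, assuming the size is close to the length of the record's compact UTF-8 JSON encoding. The exact way Algolia measures a record may differ, and `record_size_bytes`, `is_too_big`, and `limit_bytes` are hypothetical names: pass in whatever limit applies to your plan.

```python
import json


def record_size_bytes(record: dict) -> int:
    """Approximate a record's size as the byte length of its compact UTF-8 JSON."""
    # Assumption: compact separators and raw (non-escaped) Unicode are a
    # reasonable stand-in for the size of the record on the wire.
    encoded = json.dumps(record, ensure_ascii=False, separators=(",", ":"))
    return len(encoded.encode("utf-8"))


def is_too_big(record: dict, limit_bytes: int) -> bool:
    """Return True when the estimated size exceeds the given limit."""
    # limit_bytes is supplied by the caller; it isn't a value taken from Algolia.
    return record_size_bytes(record) > limit_bytes
```

Treat the result as an estimate and leave some headroom below your plan's limit.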

Reduce the size of your records before sending them to Algolia by removing any attributes you don’t need.

If you need to index long documents, split your data into several records rather than indexing an entire document in one record.
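One way to split a long document is to group its paragraphs into chunks and create one record per chunk. The following is a minimal sketch under stated assumptions: paragraphs are separated by blank lines, `max_chars` is an arbitrary chunk size, and the attribute names `document_id`, `title`, `content`, and `chunk_index` are hypothetical. The `objectID` pattern is one possible convention for giving each chunk a unique identifier.

```python
def split_into_records(doc_id: str, title: str, body: str, max_chars: int = 1000) -> list[dict]:
    """Split a document body into several smaller records, one per chunk."""
    # Assumption: paragraphs are separated by blank lines.
    paragraphs = [p.strip() for p in body.split("\n\n") if p.strip()]

    chunks: list[str] = []
    current = ""
    for paragraph in paragraphs:
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if current and len(candidate) > max_chars:
            # Adding this paragraph would exceed the chunk size: close the chunk.
            chunks.append(current)
            current = paragraph
        else:
            # Note: a single paragraph longer than max_chars is kept whole.
            current = candidate
    if current:
        chunks.append(current)

    # Each record repeats the shared attributes and carries one chunk of text.
    return [
        {
            "objectID": f"{doc_id}-{index}",
            "document_id": doc_id,
            "title": title,
            "content": chunk,
            "chunk_index": index,
        }
        for index, chunk in enumerate(chunks)
    ]
```

Repeating `document_id` and `title` in every chunk lets you relate the chunks of one document to each other when you display results.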

Removing unused attributes

You might not need to index every attribute from your data sources. Indexing everything increases the size of your records, and much of that data may not contribute to your search experience. For example, imagine you’re developing a website to display tweets on specific topics. Twitter may send you a lot of data, most of which is useless for searching.

Before removing attributes

An excerpt of the API response for a single tweet might look like this:

[
  {
    "text": "Good morning #LaraconEU ! Swing by the Algolia booth for some vintage video games, awesome stickers, and lightning-fast search 😎",
    "truncated": false,
    "in_reply_to_user_id": null,
    "in_reply_to_status_id": null,
    "pictures": true,
    "pictures_urls": [
      "https://pbs.twimg.com/media/Dl1UftbX4AAF1Fj.jpg",
      "https://pbs.twimg.com/media/Dl1Um6OXsAAa1JS.jpg"
    ],
    "source": "<a href=\"http://twitter.com/\" rel=\"nofollow\">Twitter for Web</a>",
    "in_reply_to_screen_name": null,
    "in_reply_to_status_id_str": null,
    "retweeted": false,
    "retweet_count": 10,
    "like_count": 24,
    "place": null,
    "geolocated": false
  }
]

You can reduce this record’s size by:

  • Only indexing attributes that will help build your search experience
  • Reusing an attribute for different purposes. For example, retweeted=false is the same as retweet_count=0
  • Including attributes only when the values of other attributes make them relevant. For example, if you want to use the pictures attribute to display a gallery (of the images in the pictures_urls array), only include it when the pictures_urls array isn’t empty.
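The reductions listed above can be sketched as a small transformation applied to each tweet before indexing. This is an illustrative sketch, not a prescribed implementation: `trim_tweet` is a hypothetical helper, and which attributes to keep is an assumption that depends on your own search experience.

```python
def trim_tweet(tweet: dict) -> dict:
    """Keep only the attributes used for searching, displaying, and ranking."""
    record = {
        "text": tweet["text"],
        # retweet_count also tells you whether the tweet was retweeted
        # (a count of 0 means it wasn't), so the retweeted flag is dropped.
        "retweet_count": tweet.get("retweet_count", 0),
        "like_count": tweet.get("like_count", 0),
    }

    # Conditional attribute: only include the gallery when there are pictures.
    pictures_urls = tweet.get("pictures_urls") or []
    if pictures_urls:
        record["pictures_urls"] = pictures_urls

    return record
```

Every other attribute in the API response, such as source or place, is left out of the record.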

After removing attributes

You can cherry-pick only the information you need for searching, displaying, and ranking, and leave everything else out. After transforming the data, a record could look like this:

[
  {
    "text": "Good morning #LaraconEU ! Swing by the Algolia booth for some vintage video games, awesome stickers, and lightning-fast search 😎",
    "pictures_urls": [
      "https://pbs.twimg.com/media/Dl1UftbX4AAF1Fj.jpg",
      "https://pbs.twimg.com/media/Dl1Um6OXsAAa1JS.jpg"
    ],
    "retweet_count": 10,
    "like_count": 24
  }
]

Before reworking the data, the record contained information that bloated it without helping with search, display, or ranking. After selecting only the necessary data, you’ve reduced the number of attributes and the record size without hurting search quality.
