RightFind™ XML for Mining FAQ

What does XML for Mining do?


Researchers struggle to gain access to full-text articles for mining. When they do get the full text, they must contend with multiple formats and inconsistent license terms, all of which inhibit text mining efforts. To address these issues, RightsDirect created XML for Mining, which provides a single source of full-text XML content for text mining researchers.

RightsDirect’s XML for Mining solution enables you to quickly identify and download a collection (or corpus) of full-text articles from multiple publishers through a single source. Content and metadata are normalized to make it easy to integrate XML-formatted content into your existing text mining workflow and tools. (When we normalize content, we ensure the XML has consistently applied tags and formatting across publishers, conforming to the JATS standard.) You can access XML for Mining through a web-based user interface or a RESTful API for workflow integrations and custom application development.

How does the service work?

RightsDirect collects content in XML format from multiple publishers and normalizes the content and metadata to the JATS (Journal Article Tag Suite) standard format for mining. (For more information on JATS, visit http://dtd.nlm.nih.gov/.) XML for Mining enables you to search the full text of all the content in the solution to identify an article corpus for mining. Results are checked against your company’s subscriptions. The full text of subscribed articles is available for download, along with the abstracts and metadata of articles to which your company is not subscribed.


How does RightsDirect know whether my company subscribes to a particular journal?


The librarian or information manager at your company stores credentials for the various publisher platforms in XML for Mining. The product uses that information to automatically obtain your company’s subscription information from the publishers’ websites on your behalf. XML for Mining automatically downloads and updates subscription holdings from the publishers’ websites and constructs an A-Z title list to ensure copyright compliance.


Does XML for Mining allow full-text downloads from the collection of available articles?


Your company’s holdings information determines whether each article in a project is treated as subscribed. Downloads of unsubscribed articles contain only metadata, including abstracts, and do not contain the full text of the article. Downloads of subscribed articles contain the full text. Each organization is limited to a maximum number of total unique full-text downloads, based on the usage plan it chooses.

How does XML for Mining work with text mining software like Linguamatics I2E or IBM Watson?


XML for Mining is specifically designed to let you obtain machine-readable content formatted in XML for loading into text mining software such as Linguamatics I2E or IBM Watson. RightsDirect uses the JATS format for its XML files, enabling text mining software to easily ingest the content from XML for Mining. Where we have an integration partnership with a text mining software product (for example, Linguamatics I2E), users can easily export their corpus of articles directly into the text mining tool.
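
For teams loading the XML into their own tools rather than a partner product, the following is a minimal sketch of reading one JATS-style article with Python's standard library. The sample document is hypothetical and heavily trimmed; it uses common JATS element names (article-title, abstract, body), but files delivered by XML for Mining may carry namespaces or additional structure not shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed JATS-style article used only for illustration.
SAMPLE_JATS = """<article>
  <front>
    <article-meta>
      <title-group>
        <article-title>Example article title</article-title>
      </title-group>
      <abstract><p>Example abstract text.</p></abstract>
    </article-meta>
  </front>
  <body>
    <sec><title>Introduction</title><p>First paragraph.</p><p>Second paragraph.</p></sec>
  </body>
</article>"""


def text_of(element):
    """Return all text inside an element, with whitespace collapsed."""
    if element is None:
        return ""
    return " ".join("".join(element.itertext()).split())


def extract_fields(jats_xml):
    """Pull the title, abstract, and body paragraphs out of one article."""
    root = ET.fromstring(jats_xml)
    body = root.find("body")
    return {
        "title": text_of(root.find("front/article-meta/title-group/article-title")),
        "abstract": text_of(root.find("front/article-meta/abstract")),
        # Assumption: a metadata-only record has no <body>, so this list is empty.
        "paragraphs": [text_of(p) for p in body.iter("p")] if body is not None else [],
    }


fields = extract_fields(SAMPLE_JATS)
print(fields["title"])
print(len(fields["paragraphs"]), "body paragraphs")
```

Because the tags are applied consistently across publishers, the same paths should in principle work for every article in a corpus, which is the practical benefit of normalization.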


What is a unique download?


A unique download refers to an individual full-text article contained in a download archive (compressed in .zip or .tar.gz format) that has not been included in any previously created download archive.

Download archives are created when a user selects “Create Download” from the Project Results page. Once a download archive is created, the number of unique full-text articles contained in it is counted against the client’s available download balance. Note that these archived articles are counted against the client organization’s download balance even if the user never downloads the archive itself.
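
Since an archive may arrive in either compression format, a script that processes it can detect the format instead of relying on the file extension. The sketch below uses only Python's standard library. The function name is hypothetical, and it assumes the article files inside the archive end in .xml, which this FAQ does not specify.

```python
import tarfile
import zipfile


def list_xml_members(archive_path):
    """Return the sorted names of .xml files inside a .zip or .tar.gz archive."""
    if zipfile.is_zipfile(archive_path):
        with zipfile.ZipFile(archive_path) as archive:
            names = archive.namelist()
    elif tarfile.is_tarfile(archive_path):
        # Mode "r:*" lets tarfile detect gzip compression on its own.
        with tarfile.open(archive_path, "r:*") as archive:
            names = archive.getnames()
    else:
        raise ValueError(f"Not a .zip or .tar.gz archive: {archive_path}")
    # Assumption: article files use the .xml extension.
    return sorted(name for name in names if name.lower().endswith(".xml"))
```

Calling list_xml_members on a downloaded archive gives a quick inventory of its articles before anything is extracted.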

How are unique downloads counted?

Your organization’s license allows for a specific number of unique full-text downloads each year. Every full-text article contained in a download archive, whether it is subscribed, Open Access, or purchased, counts against the organization’s available downloads UNLESS a user within the organization has downloaded it previously while the organization has been continuously subscribed to XML for Mining.

Abstracts present in a download archive (such as those for unsubscribed content) do not count against your organization’s available downloads. You can see your unique download balance within the XML for Mining administrative interface.
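
The counting rule can be illustrated with a short sketch. This is not RightsDirect's implementation; the function name, article IDs, and the starting balance of 1,000 are hypothetical and chosen only to show the arithmetic. It also assumes the organization has been continuously subscribed, so earlier archives still count as prior downloads.

```python
def count_unique_downloads(archive_articles, previously_archived):
    """Return how many articles in a new archive count against the balance.

    archive_articles: iterable of (article_id, has_full_text) pairs.
    previously_archived: set of article IDs already included, in full text,
        in an earlier archive created by anyone in the organization.
    """
    newly_counted = {
        article_id
        for article_id, has_full_text in archive_articles
        # Abstract-only records never count; repeats of earlier downloads are free.
        if has_full_text and article_id not in previously_archived
    }
    return len(newly_counted)


# Hypothetical archive: A was downloaded before, C is abstract-only.
archive = [("A", True), ("B", True), ("C", False), ("D", True)]
charged = count_unique_downloads(archive, previously_archived={"A"})
remaining = 1000 - charged
print(charged, remaining)
```

In this example only B and D are charged, so the balance drops by two even though the archive holds four records.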

If I download an item twice, is it counted?


No. An article previously downloaded by anyone in the organization does not count against your allowable downloads. It will be included in the download archive so you can download and mine against it again without an additional charge.

I received an email that my download archive is expiring. What should I do?

If you have not downloaded the content, follow the link in the email notification to download the archive file. Download archives are available for 90 days after the creation date. After that time we remove the archive from our system and you must create a new archive.

Creating a duplicate download archive will not affect your download balance, because the content was already counted as downloaded. Any full-text content that was not included in a previous download archive will be debited against your available unique download balance.
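
To track the 90-day availability window yourself, the date arithmetic is straightforward. The sketch below assumes the window is counted in calendar days from the creation date; whether the archive is still retrievable on the final day itself is not stated in this FAQ. The function names and the example date are hypothetical.

```python
from datetime import date, timedelta

# Archives are available for 90 days after the creation date.
ARCHIVE_LIFETIME = timedelta(days=90)


def expiry_date(created_on):
    """Return the date 90 days after the archive was created."""
    return created_on + ARCHIVE_LIFETIME


def days_remaining(created_on, today):
    """Return the number of days left before expiry, never below zero."""
    return max((expiry_date(created_on) - today).days, 0)


created = date(2024, 1, 15)
print(expiry_date(created))
print(days_remaining(created, today=date(2024, 4, 1)))
```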

If I have a Linguamatics license and create an I2E index, are the indexed articles counted as downloads?

Yes. Articles to be indexed on the I2E server are first put into a download archive, at which point we count the unique downloads against your download balance. The indexing step itself does not change the download count.

Can clients purchase full-text XML-formatted articles to which they are not subscribed?

Yes. Clients can either download metadata and abstracts at no cost for unsubscribed articles, or purchase the full-text XML article.

If a client purchases unsubscribed articles for one project, must they purchase them again for use on other projects?

No. After a user has purchased an article, that article is available in full-text XML to all other users and for all other projects at no additional charge.

How can I use downloaded content that is subscribed, unsubscribed, or purchased?

You can make the following uses of the XML content obtained through XML for Mining:

  • Store the XML files internally for the term of the XML for Mining agreement, and create and store up to two additional copies solely as reasonably necessary to text mine the content.
  • Perform text and data mining across the content using your own tools or third-party software.
  • Integrate the results of your text mining into an internal database for internal business purposes.

You may not:

  • Externally distribute or make available any copies of the content;
  • Create, license, sell, distribute, or otherwise make available a database or other product comprised, in large part, of your text mining output and intended for use by persons outside your organization;
  • Use downloaded content to create a library or collection to substitute for subscriptions or purchases of the content that your organization ordinarily relies on;
  • Reformat the downloaded content into PDF, EPS, DOC or other formats intended for reading, or for uses other than text and data mining.

Note: The above terms do not apply to Open Access (OA) articles obtained through the XML for Mining solution, which in each case are subject to the specific terms of the relevant OA license.