# Golden Retriever API Documentation

## Overview

The Golden Retriever API provides endpoints for managing collections and their files, searching and answering questions over a collection, querying logs, and running chat sessions. This documentation describes the request and response format of each endpoint.

### Base URL

`https://platform.aisolutionslab.net/golden-retriever/api`

---

## Endpoints

---

### Get Models

#### Get Models Request

- **URL**: `/get_models/`
- **Method**: `GET`
- **Headers**:
  - `Authorization: Bearer token`

#### Get Models Response

- **Type**: JSON
- **Content**:

```json
{
  "embedding_models": [
    {"model_id": "st/msmarco-bert-base-dot-v5", "name": "st/msmarco-bert-base-dot-v5: 768"},
    {"model_id": "st/intfloat/e5-base-v2", "name": "E5 Base: Dimensions: 768"},
    {"model_id": "st/distilbert-multilingual-nli-stsb-quora-ranking", "name": "Multilingual Distill BERT Model - Dimensions: 768"}
  ],
  "qa_models": [
    {"model_id": "openai/gpt-4-turbo", "name": "gpt4 Turbo"},
    {"model_id": "openai/gpt-4", "name": "GPT4"},
    {"model_id": "br/meta.llama3-70b-instruct-v1:0", "name": "Llama-3-70b-chat (AWS Bedrock)"},
    ...
  ],
  "temperature_settings": [
    {"name": "Precise", "value": 0},
    {"name": "Balanced", "value": 0.7},
    {"name": "Creative", "value": 1}
  ]
}
```

---

### Upload File to Collection

#### Upload File Request

- **URL**: `/upload/?collection_id=test_0o3yhxax`
- **Method**: `POST`
- **Headers**:
  - `Content-Type: multipart/form-data; boundary=----WebKitFormBoundarytecd6ss3iKJxTure`
- **Body**:
  - Form data containing the file to be uploaded and any associated metadata.
  - `uploaded_files[]`: (binary)
  - `data`: `{}`

#### Upload File Response

- **Type**: JSON
- **Content**:

```json
{
  "errors": [],
  "success": [
    "File 1d8be1cd7ed9_totest.png has been uploaded successfully"
  ],
  "results": {
    "date": "Fri, 17 May 2024 03:52:54 GMT",
    "metadata": {
      "access": "private",
      "collection_id": "test_0o3yhxax",
      "conceptual_answer_generation": true,
      "context_length": "2",
      "created": "2024-05-17 03:52:02",
      ...
    },
    "run_time": "0.0 sec",
    "status": "success"
  }
}
```

---

### Get Files from Collection

#### Get Files Request

- **URL**: `/get_files/?collection_id=test_0o3yhxax&page=0&docs_per_page=20`
- **Method**: `GET`
- **Headers**:
  - `Authorization: Bearer token`

#### Get Files Response

- **Type**: JSON
- **Content**:

```json
{
  "collection_id": "test_0o3yhxax",
  "date": "Fri, 17 May 2024 03:57:03 GMT",
  "docs_per_page": 20,
  "hits": [
    {
      "__source": "ec58d2321760_1d8be1cd7ed9_testfile.txt",
      "_conv_date": "2024-05-17 03:52:55",
      "_end_char": 99,
      "_ext": ".txt",
      "_file_path": "ec58d2321760_1d8be1cd7ed9_testfile.txt",
      "_id": "91805602200",
      "_name": "testfile",
      "_num_chars": 99,
      "_start_char": 0,
      "_text": "Sample text content for the test file. This is a placeholder text used for demonstration."
    }
  ],
  "page": 0,
  "run_time": "0.0 sec",
  "success": true,
  "total_num_results": 1
}
```

Each entry in `hits` describes a file retrieved from the specified collection, including metadata such as the file path (`_file_path`), the file extension (`_ext`), and the file's text content (`_text`). The response also carries the pagination fields `page` and `docs_per_page`, and the total number of results available in `total_num_results`.
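
The pagination fields described above can be used to walk through a whole collection. The following is a minimal sketch, not an official client: `get_files_url`, `iter_hits`, and the caller-supplied `fetch_json` function are hypothetical names, and the sketch assumes that pages are zero-indexed (as the example request with `page=0` suggests) and that `total_num_results` counts hits across all pages.

```python
from typing import Callable, Iterator
from urllib.parse import urlencode

BASE_URL = "https://platform.aisolutionslab.net/golden-retriever/api"


def get_files_url(collection_id: str, page: int, docs_per_page: int = 20) -> str:
    """Build the /get_files/ URL with the query parameters from the example."""
    query = urlencode(
        {"collection_id": collection_id, "page": page, "docs_per_page": docs_per_page}
    )
    return f"{BASE_URL}/get_files/?{query}"


def iter_hits(fetch_json: Callable[[str], dict], collection_id: str,
              docs_per_page: int = 20) -> Iterator[dict]:
    """Yield every hit, requesting pages until total_num_results is reached.

    fetch_json is a hypothetical caller-supplied function that performs the
    authenticated GET request and returns the decoded JSON body.
    """
    page, seen = 0, 0  # assumption: pages are zero-indexed
    while True:
        body = fetch_json(get_files_url(collection_id, page, docs_per_page))
        hits = body.get("hits", [])
        yield from hits
        seen += len(hits)
        # Also stop on an empty page, so an inconsistent total cannot loop forever.
        if not hits or seen >= body.get("total_num_results", 0):
            break
        page += 1
```

Passing the transport in as `fetch_json` keeps the paging logic independent of the HTTP library and of how the bearer token is attached.
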

---

### List Collections

#### List Collections Request

- **URL**: `/list_collections/`
- **Method**: `GET`
- **Headers**:
  - `Authorization: Bearer token`

#### List Collections Response

- **Type**: JSON
- **Content**:

```json
{
  "__allowed_actions": {
    "delete": true,
    "edit": true,
    "file_delete": true,
    "file_upload": true,
    "view": true
  },
  "access": "public",
  "chats": {
    "col_chat_5pdacr29": {
      "date": "2024-05-01 07:25:27.735249",
      "title": "What are the Agent Orange dates for Thailand?"
    }
  },
  "collection_id": "m-21-md",
  "conceptual_answer_generation": false,
  "context_length": "2",
  "count": 21649,
  "created": "2024-04-25 17:41:44",
  "description": "",
  "display_keys": "",
  "faiss_clusters": "20",
  "faiss_index_type": "FlatIP",
  "file_count": 409,
  ...
}
```

---

### Get Collection Details

#### Get Collection Request

- **URL**: `/get_collection/`
- **Method**: `GET`
- **Headers**:
  - `Authorization: Bearer token`
- **Query String Parameters**:
  - `collection_id`: The unique identifier for the collection.

#### Get Collection Response

- **Type**: JSON
- **Content**:

```json
{
  "__allowed_actions": {
    "delete": true,
    "edit": true,
    "file_delete": true,
    "file_upload": true,
    "view": true
  },
  "access": "private",
  "chats": {
    "col_chat_09e8a8us": {"title": "title 1"},
    "col_chat_0o1k519h": {"title": "title 2"},
    "col_chat_zk3n30u1": {"title": "title 3"}
  },
  "collection_id": "m-21-default_e5_correct",
  "conceptual_answer_generation": true,
  "context_length": "3",
  "count": 18948,
  "created": "2024-04-09 14:44:13",
  "description": "",
  "display_keys": "",
  "faiss_clusters": "20",
  "faiss_index_type": "FlatIP",
  "file_count": 409,
  "updated": "2024-04-18 10:45:36",
  "users": {
    "guest": "author",
    "user1": "editor"
  },
  ...
}
```

---

### Search

#### Search Request

- **URL**: `/search/`
- **Method**: `POST`
- **Headers**:
  - `Authorization: Bearer token`
- **Payload**:

```json
{
  "collection_id": "m-21-default_e5_correct",
  "query": "Hi",
  "model_id": "openai/gpt-3.5-turbo-16k",
  "temperature": "0.7"
}
```

#### Search Response

- **Type**: JSON
- **Content**:

```json
{
  "collection_id": "m-21-default_e5_correct",
  "date": "Thu, 18 Apr 2024 11:18:20 GMT",
  "hits": [
    {
      "distance": 0.8029217720031738,
      "sections": [
        {
          "content": "\nFebruary 15, 2024",
          "title": "Change Date"
        },
        ...
      ],
      "title": "title 1",
      "url": "https://www.dummyurl/uri",
      "__source": "",
      "_conv_date": "2024-04-09 14:44:22",
      "_end_char": 62501,
      "_ext": ".txt",
      "_id": "69840161000",
      "_name": "",
      "_num_chars": 1901,
      "_start_char": 60600,
      "_text": "dummy text"
    },
    ...
  ],
  "query": "Hi",
  "run_time": "2.2 sec",
  "success": true,
  "total_num_results": 18
}
```

---

### Search with filters

Search filters narrow the results of a search (semantic, keyword, or hybrid) to documents whose metadata matches the `{key: value}` pairs given in the `filters` parameter. This lets you limit a search, answer, or chat request to a particular set of documents. By default, `filters` is `None`, so no filtering is applied.
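
As an illustration of the `filters` format, the sketch below assembles a search payload from a plain Python mapping. The helper name `build_search_payload` is hypothetical, and joining several accepted values for one key with commas is an assumption read off the `"source_site": "cnn,abcnews"` value in the request example of this section; the exact matching semantics are not specified here.

```python
import json
from typing import Iterable, Mapping, Optional, Union

FilterValue = Union[str, Iterable[str]]


def build_search_payload(collection_id: str, query: str,
                         filters: Optional[Mapping[str, FilterValue]] = None) -> dict:
    """Return a /search/ payload; `filters` is omitted when none are given."""
    payload = {"collection_id": collection_id, "query": query}
    if filters:
        payload["filters"] = {
            # Assumption: several values for one key are sent comma-separated.
            key: value if isinstance(value, str) else ",".join(value)
            for key, value in filters.items()
        }
    return payload


payload = build_search_payload(
    "news",
    "political science",
    {"source_site": ["cnn", "abcnews"], "author": "john smith"},
)
print(json.dumps(payload, indent=2))
```
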
#### Search Request

- **URL**: `/search/`
- **Method**: `POST`
- **Headers**:
  - `Authorization: Bearer token`
- **Payload**:

```json
{
  "collection_id": "news",
  "query": "political science",
  "filters": {"source_site": "cnn,abcnews", "author": "john smith"}
}
```

#### Search Response

- **Type**: JSON
- **Content**:

```json
{
  "collection_id": "news",
  "date": "2025-02-12 14:40:38.606595",
  "success": true,
  "query": "political science",
  "filters": {"source_site": "cnn,abcnews", "author": "john smith"},
  "total_num_results": 2,
  "hits": [
    {
      "_file_path": "357118574b05_ec2ae37e7bb9_739y4zq5z7zw2ere.txt",
      "__source": "357118574b05_ec2ae37e7bb9_739y4zq5z7zw2ere.txt",
      "_name": "739y4zq5z7zw2ere",
      "_ext": ".txt",
      "_start_char": 0,
      "_end_char": 19,
      "_num_chars": 19,
      "_conv_date": "2025-02-12 14:33:25",
      "title": "Political Science 1",
      "source_site": "cnn",
      "_text": "political science 1",
      "_id": "62309797871800",
      "distance": 0.9028787488745843
    },
    {
      "_file_path": "1752d769b4eb_a0d4843205d7_viua7egdswzj3azs.txt",
      "__source": "1752d769b4eb_a0d4843205d7_viua7egdswzj3azs.txt",
      "_name": "viua7egdswzj3azs",
      "_ext": ".txt",
      "_start_char": 0,
      "_end_char": 19,
      "_num_chars": 19,
      "_conv_date": "2025-02-12 14:33:25",
      "title": "Political Science 2",
      "source_site": "abcnews",
      "_text": "political science 2",
      "_id": "15959800540900",
      "distance": 0.8928195380405719
    }
  ],
  "run_time": "0.09 sec"
}
```

---

### Answer

#### Answer Request

- **URL**: `/answer/`
- **Method**: `POST`
- **Headers**:
  - `Authorization: Bearer token`
- **Payload**:

```json
{
  "collection_id": "m-21-default_e5_correct",
  "question": "Hi",
  "model_id": "openai/gpt-3.5-turbo-16k",
  "temperature": "0.7",
  "results_in_context": 2
}
```

#### Answer Response

- **Type**: JSON
- **Content**:

```json
{
  "answer": "Answer Text",
  "collection_id": "m-21-default_e5_correct",
  "date": "Thu, 18 Apr 2024 11:18:24 GMT",
  "hits": [
    {
      "__source": "",
      "_conv_date": "2024-04-09 14:44:22",
      "_end_char": 62501,
      "_ext": ".txt",
      "_id": "69840161000",
      "_name": "",
      "_num_chars": 1901,
      "_start_char": 60600,
      "_text": "Section dummy text",
      "distance": 0.8029217720031738,
      "sections": [
        {
          "content": "content text of section",
          "title": "section title"
        },
        ...
      ],
      "title": "M21-1-Part-IV-Subpart-i-Chapter-2-Section-A-Examination-Requests-Overview",
      "url": "https://www.dummyurl.com/uri"
    },
    ...
  ],
  "q_id": "9r6uxg1b",
  "question": "user question placeholder",
  "run_time": "3.53 sec",
  "success": true
}
```

---

### Get QA Log

#### QA Log Request

- **URL**: `/get_qa_log/`
- **Method**: `POST`
- **Headers**:
  - `Authorization: Bearer token`
- **Payload**:

```json
{
  "collection_id": "m-21-default_e5_correct",
  "log_type": "saved_queries"
}
```

#### QA Log Response

- **Type**: JSON
- **Content**:

```json
{
  "collection_id": "m-21-default_e5_correct",
  "date": "Thu, 18 Apr 2024 10:45:49 GMT",
  "logs": [],
  "messages": [],
  "run_time": "0.0 sec",
  "status": "success"
}
```

### Get QA Log (Updated for `chat_logs`)

#### QA Chat Log Request

- **URL**: `/get_qa_log/`
- **Method**: `POST`
- **Headers**:
  - `Authorization: Bearer token`
- **Payload**:

```json
{
  "collection_id": "m-21-default_e5_correct",
  "log_type": "chat_logs",
  "chat_id": "col_chat_0o1k519h"
}
```

#### Response for `chat_logs`

- **Type**: JSON
- **Content**:

```json
{
  "collection_id": "m-21-default_e5_correct",
  "date": "Thu, 18 Apr 2024 11:32:17 GMT",
  "logs": [
    {
      "chat_id": "col_chat_0o1k519h",
      "date": "2024-04-17 03:21:54.069381",
      "doc_id": null,
      "system_prompt": "Given the following contexts, answer the question from any of the provided contexts or chat history and only from provided contexts or chat history. The answer should be formatted for readability. ...",
      "user": "guest"
    },
    ...
  ],
  ...
}
```

---

### Chat

#### Chat Overview

The Chat API endpoint allows users to start and continue conversations with an AI model. It maintains the continuity of a conversation through the `chat_id` and `messages` fields.
#### Chat Request

- **URL**: `/chat/`
- **Method**: `POST`
- **Headers**:
  - `Authorization: Bearer token`
- **Payload**:

```json
{
  "collection_id": "m-21-default_e5_correct",
  "chat_id": null, // Initially null for new conversations
  "question": "hi",
  "model_id": "openai/gpt-3.5-turbo-16k",
  "temperature": "0.7",
  "results_in_context": 2
}
```

#### Response

- **Type**: JSON
- **Content**:

```json
{
  "answer": "Hello! How can I assist you today?",
  "chat_id": "col_chat_vfg3z6u8", // Unique identifier for the conversation
  "collection_id": "m-21-default_e5_correct",
  "date": "Thu, 18 Apr 2024 10:53:31 GMT",
  "doc_id": null,
  "hits": [...],
  "messages": [
    {
      "content": "hi",
      "role": "user"
    }
  ],
  "model_id": "openai/gpt-3.5-turbo-16k",
  "q_id": "930d7ek3",
  "question": "hi",
  "run_time": "4.1 sec",
  "status": "success",
  "suggested_follow_up_question": "What types of examinations are routinely performed by specialists?"
}
```

#### Context Management

- **Initial Request**: For a new conversation, send `chat_id` as `null`. The server generates a new `chat_id` for the conversation.
- **Subsequent Requests**: To maintain the context of the conversation, include the `chat_id` received in the initial or previous response in every subsequent request. This allows the API to retrieve the conversation history and maintain continuity.
- **Messages**: The `messages` array acts as the memory of the conversation. It should include all previous exchanges in the conversation to give the AI model the context it needs to generate relevant responses. Each message should capture the content of the exchange and the role (e.g., "user" or "AI").

#### Usage Notes

- Ensure that the `chat_id` and `messages` are stored and managed appropriately in client applications to maintain the state of the conversation.
- Use the `results_in_context` parameter to specify how many previous interactions the model should consider when generating a response.
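
The context-management rules above can be sketched as a small client-side wrapper that starts with `chat_id` set to `null` and reuses the server-assigned value afterwards. `ChatSession` and the injected `post_json` transport are hypothetical names, not part of the API. The request example does not show a `messages` field, so this sketch only stores the `messages` returned by the server rather than sending them back; that choice is an assumption.

```python
from typing import Callable, Optional


class ChatSession:
    """Keeps the chat_id returned by /chat/ so follow-up questions share context."""

    def __init__(self, post_json: Callable[[str, dict], dict], collection_id: str,
                 model_id: str = "openai/gpt-3.5-turbo-16k"):
        # Hypothetical transport: takes (path, payload) and returns decoded JSON.
        self._post_json = post_json
        self.collection_id = collection_id
        self.model_id = model_id
        self.chat_id: Optional[str] = None  # None is sent as null on the first request
        self.messages: list = []

    def ask(self, question: str) -> str:
        """Send one question and remember the conversation state from the reply."""
        payload = {
            "collection_id": self.collection_id,
            "chat_id": self.chat_id,
            "question": question,
            "model_id": self.model_id,
            "temperature": "0.7",
            "results_in_context": 2,
        }
        body = self._post_json("/chat/", payload)
        self.chat_id = body["chat_id"]  # reuse on every subsequent request
        self.messages = body.get("messages", self.messages)
        return body["answer"]
```

Keeping the state in one object means the calling code never has to thread `chat_id` through by hand, which is the most common way to lose conversation context.
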

---

### Get Job Status Update

#### Get Job Status Update for a `job_id`

- **URL**: `/job_status/`
- **Method**: `GET`
- **Headers**:
  - `Authorization: Bearer token`
- **Query String Parameters**:
  - `collection_id`: The unique identifier for the collection.
  - `job_id`: The `job_id` returned by the upload API.

#### Get Job Status Update Response

- **Type**: JSON
- **Content**:

```json
{
  "job_id": "6rgxmy1n",
  "user": "ahmoham@us.ibm.com",
  "status": "ready",
  "collection_id": "MCAT_test",
  "updated": "2024-11-18 15:24:00",
  "ids": {
    "64835656200": {
      "messages": [
        {
          "date": "2024-11-18 15:08:27",
          "message": "Processing document: e9005ccbf3de_e7f337395e20_M211_Part_VIII_Subpart_iv_Chapter_6_Section_C__Authorization_of_Awards_Under_38_U.S.C._1151.json"
        },
        {
          "date": "2024-11-18 15:08:27",
          "message": "Document Chunking: Split Method: char_split, Chunk Size: 600"
        },
        {
          "date": "2024-11-18 15:08:27",
          "message": "Creating Embeddings"
        },
        {
          "date": "2024-11-18 15:08:28",
          "message": "Saving in DB"
        },
        {
          "date": "2024-11-18 15:08:29",
          "message": "Completed Processing Document"
        }
      ],
      "status": "ready"
    }
  },
  "messages": [
    {
      "date": "2024-11-18 15:08:24",
      "message": "Started Document Conversion"
    },
    {
      "date": "2024-11-18 15:08:27",
      "message": "Completed Document Conversion"
    },
    {
      "date": "2024-11-18 15:24:00",
      "message": "Completed Processing Documents"
    }
  ]
}
```

---

### Upload

#### Upload Overview

The Upload API endpoint uploads files to a specified collection on the AISolutionsLab platform. It supports file chunking, several splitting methods, and metadata that is either specific to each file or shared by all documents in the request.
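
Since the job status endpoint in the previous section reports progress for the `job_id` of an upload, a client will typically poll it after uploading. The sketch below shows one way to do that. The function names and the injected `fetch_status` callable are hypothetical, and treating `"ready"` as the only terminal status is an assumption based on the example response; other status values are not documented here.

```python
import time
from typing import Callable


def document_statuses(job: dict) -> dict:
    """Map each document id under "ids" to its status string."""
    return {
        doc_id: info.get("status")
        for doc_id, info in job.get("ids", {}).items()
        # Skip any non-object entry, in case job-level fields sit beside the ids.
        if isinstance(info, dict)
    }


def wait_until_ready(fetch_status: Callable[[], dict], max_polls: int = 30,
                     delay_sec: float = 2.0,
                     sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the job reports "ready" or max_polls is exhausted.

    fetch_status is a hypothetical caller-supplied function that performs the
    authenticated GET on /job_status/ and returns the decoded JSON body.
    """
    for attempt in range(max_polls):
        job = fetch_status()
        if job.get("status") == "ready":  # assumption: "ready" means finished
            return job
        if attempt < max_polls - 1:
            sleep(delay_sec)
    raise TimeoutError(f"job not ready after {max_polls} polls")
```

The `sleep` parameter exists only so the delay can be replaced in tests; production code can leave it at its default.
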
#### Upload Request

- **URL**: `/upload/`
- **Method**: `POST`
- **Headers**:
  - `Authorization: Bearer token`
- **Payload**:

```json
{
  "collection_id": "M-Cat-All_vgp32bd6",
  "chunking": true,
  "split_method": "char_split",
  "chunk_size": 600,
  "add_keys": "",
  "uploaded_files[]": ["/path/to/file1.json", "/path/to/file2.json"],
  "metadata": "/path/to/metadata.json",
  "data": {"author": "John Doe", "year": "2023"}
}
```

#### Split Methods Options

- `char_split`: Splits the file based on a specified number of characters.
- `line_split`: Splits the file based on lines.
- `paragraph_split`: Splits the file based on paragraphs.
- `word_split`: Splits the file based on words.

#### Response

- **Type**: JSON
- **Content**:

```json
{
  "errors": [],
  "results": {
    "date": "Thu, 12 Sep 2024 12:24:24 GMT",
    "metadata": {
      "access": "private",
      "collection_id": "uploaded_json_j1l8ks3a",
      "conceptual_answer_generation": false
    },
    "run_time": "0.01 sec",
    "status": "success",
    "success": [
      "File upload successful",
      "File upload successful",
      ...
    ]
  }
}
```

#### File Handling

- **Chunking**: Files can be split into smaller chunks based on the specified `chunk_size`. This is particularly useful for large files.
- **Metadata**: Metadata can be specified per file through a metadata file, or for the whole request through the `data` JSON object. The contents of `data` are added to all documents uploaded in the request.

#### Usage Notes

- The `metadata` file should be structured as an array in which each element holds the metadata for one uploaded file, in the same order as the files.
- The `data` JSON object adds universal metadata that applies to every document uploaded in the request.

---

## Usage

To interact with the API, send an HTTP request to the relevant endpoint using the method listed for it (`GET` or `POST`), together with the appropriate headers, parameters, and payload. Responses are returned in JSON format and include the requested data along with status information such as `status` or `success` and `run_time`.
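
As a sketch of how a request to any of the endpoints above might be assembled with the Python standard library, the helper below builds a `urllib.request.Request` with the bearer token attached. `build_request` is a hypothetical helper, and sending `POST` payloads as a JSON body with `Content-Type: application/json` is an assumption: the endpoint descriptions show JSON payloads but do not state the content type, and file uploads use multipart form data instead.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://platform.aisolutionslab.net/golden-retriever/api"


def build_request(path: str, token: str, params: Optional[dict] = None,
                  payload: Optional[dict] = None) -> Request:
    """Build a GET request (no payload) or a POST request (with payload)."""
    url = BASE_URL + path
    if params:
        url += "?" + urlencode(params)
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if payload is not None:
        # Assumption: POST bodies are JSON-encoded.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    method = "POST" if payload is not None else "GET"
    return Request(url, data=data, headers=headers, method=method)


# Building the request objects does not contact the server.
details_request = build_request(
    "/get_collection/", "token", params={"collection_id": "m-21-md"}
)
search_request = build_request(
    "/search/", "token", payload={"collection_id": "news", "query": "political science"}
)
```

A request built this way can be sent with `urllib.request.urlopen(details_request)` and the body decoded with `json.load`.
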

## Security

Ensure that all interactions with the API are secured and that sensitive information is handled properly to prevent unauthorized access. Use secure methods to transmit files and metadata to protect data integrity and privacy.