File Explorer
This document provides a step-by-step guide to the File Explorer of File Manager, which allows Authors to manage, organize, and analyze files and folders across various data lake systems, such as HDFS, S3, NFS, and more.
Getting Started
Connect to Data Lakes: Use connectors or upload tools to add data lakes like S3, HDFS, Google Drive, NFS, and more.
Catalog Data Lakes: Cataloging allows viewing files and folders within the Data Catalog (optional for some levels).
Explore Files and Folders: The File Explorer displays detailed information about files and folders within a data lake connection.
Key Features
Manage Files and Folders: Upload, download, delete, and organize files and folders within data lakes. These operations are available for NFS connections.
Cataloging: Categorize files and folders for better visibility and data profiling.
Supported File Formats: Upload and manage various file formats, such as CSV, JSON, Parquet, and more. Configure allowed upload formats as needed.
Data Lake OvalSight: Provides a high-level overview of a data lake's structure, size, and file distribution.
Folder OvalSight: Analyze folder contents, including file types, sizes, and overall structure. Accessible from the File Manager or Data Catalog.
List View: Navigate data folders and subfolders visually and access detailed Folder Analysis information.
Data Lake Search: Search for files and folders across an entire data lake connection using keywords.
System Settings: Administrators and Authors can set the maximum upload file size, define allowed file types for cataloging, and control the number of file entries shown per page.
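The kind of summary that folder analysis describes (file types, sizes, overall structure) can be approximated for a local folder with a short script. This is a minimal sketch, not OvalEdge's implementation: the function name summarize_folder and the shape of the returned dictionary are hypothetical, and it reads only the local file system.

```python
import os
from collections import Counter
from pathlib import Path


def summarize_folder(root):
    """Walk a local folder and return file count, total bytes, and counts per extension."""
    by_extension = Counter()
    total_bytes = 0
    file_count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # Files without a suffix are grouped under "(none)".
            ext = path.suffix.lower() or "(none)"
            by_extension[ext] += 1
            total_bytes += path.stat().st_size
            file_count += 1
    return {
        "files": file_count,
        "bytes": total_bytes,
        "by_extension": dict(by_extension),
    }
```

Calling summarize_folder on a directory returns the number of files, their combined size in bytes, and a count per file extension, which mirrors the file distribution view at a small scale.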
Catalog Data Lakes
Cataloging data lakes before using File Explorer is essential, because it allows Authors to view all files and folders within the lake. There are two methods for adding data:
Connectors: OvalEdge integrates with data lake systems like Hadoop, Amazon S3, Google Drive, and more. These connectors are available on the "Connectors" page.
Upload Tools: For an NFS connection, Authors can also upload files and folders directly using the "Upload File" or "Upload Folder" tools.
Crawl with Connectors
Navigate: Go to Administration > Crawler.
Add Connection: Select the file system (NFS/S3/HDFS/Azure/Drive) and enter the database name.
Provide Credentials: Enter and validate the connection details in the "Manage Connection" window, then save the connection.
Crawl Data: Click "Crawl/Profile" to initiate the process. Upon successful completion, folders and files will appear in the File Explorer.
Example:
In S3, the top-level folder "Hospital" (Level 1) is cataloged automatically and is visible in the Data Catalog.
Deeper folders are not cataloged automatically. To view them in the Data Catalog, manually catalog "Departments" (Level 2) or "General Medicine" (Level 3) from the File Explorer.
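The level rule in this example can be sketched in a few lines. This is an illustration under stated assumptions, not product code: the function names are hypothetical, and the default of one automatically cataloged level simply mirrors the S3 example above.

```python
from pathlib import PurePosixPath


def folder_level(path):
    """Return the depth of a folder path: 'Hospital' is 1, 'Hospital/Departments' is 2."""
    return len(PurePosixPath(path.strip("/")).parts)


def needs_manual_cataloging(path, auto_cataloged_levels=1):
    """True when the folder sits below the automatically cataloged levels.

    auto_cataloged_levels=1 is an assumption taken from the example above.
    """
    return folder_level(path) > auto_cataloged_levels
```

With these definitions, "Hospital" does not need manual cataloging, while "Hospital/Departments" and "Hospital/Departments/General Medicine" do.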
Upload Files via NFS
Authors can upload files and folders directly to the NFS data lake connection.
Access Upload: In File Explorer, select the NFS data lake and click the 9-Dots icon to access the "Upload" option.
Choose File/Folder: Select "File" or "Folder" on the upload page.
Browse and Upload: Browse the computer directory to select the file or folder, then initiate the upload.
Create Directory (Optional): Use the 9-Dots icon to create a new directory if needed.
Verify and Finish: A successful upload will highlight the file in green. Click "Finish" to complete.
Supported File Formats for Upload
The File Explorer supports specific file types. Authors can configure these types through the "config.file.types.to.be.cataloged" setting in System Settings (OTHERS tab). List the valid file formats in this setting so that files of those types are accepted for upload.
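A check against an allowed-formats list can be sketched as follows. The value format of "config.file.types.to.be.cataloged" is not documented here, so this sketch assumes a comma-separated list of extensions; both function names are hypothetical and are not part of OvalEdge.

```python
def parse_allowed_types(setting_value):
    """Split a comma-separated list such as 'csv, json,.Parquet' into a set of extensions.

    Assumes a comma-separated value; the real setting format may differ.
    """
    return {
        item.strip().lstrip(".").lower()
        for item in setting_value.split(",")
        if item.strip()
    }


def is_upload_allowed(filename, allowed_types):
    """Check the final extension of filename against the allowed set, ignoring case."""
    _, dot, ext = filename.rpartition(".")
    return bool(dot) and ext.lower() in allowed_types
```

For example, with the value "csv, json, parquet", a file named "Report.CSV" passes the check, while "notes.txt" and a file with no extension do not.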
Supported File Formats
Once uploaded and cataloged in the Data Catalog, the following file formats can be profiled:
CSV (.csv): Comma Separated Values stores tabular data, with each line representing a record and commas separating fields.
JSON (.json): JavaScript Object Notation stores simple data structures for easy data interchange between applications and servers.
Parquet (.parquet): Apache Parquet is a columnar file format that is efficient for storing and processing large datasets.
ORC (.orc): Optimized Row Columnar files are used in the Hadoop ecosystem for structured data storage.
XLSX (.xlsx): Microsoft Excel Open XML Spreadsheet format.
XLS (.xls): Legacy Microsoft Excel binary spreadsheet format containing rows and columns of data.
Avro (.avro): Apache Avro is a data serialization framework for efficient data exchange with features like schema evolution.
Gzip (.gz): Compressed files using the gzip algorithm for reduced size and faster transmission.
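To make the CSV and gzip descriptions concrete, the sketch below decompresses gzip data and parses it as CSV using only the Python standard library. It is an illustration, not how OvalEdge profiles files: the function name is hypothetical, and the sample rows are invented data that reuse names from the earlier example.

```python
import csv
import gzip
import io


def read_gzipped_csv(raw_bytes):
    """Decompress gzip bytes and parse the content as CSV with a header row."""
    text = gzip.decompress(raw_bytes).decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


# Build a small sample in memory (hypothetical data, for illustration only).
sample = gzip.compress(b"department,beds\nGeneral Medicine,40\nCardiology,25\n")
rows = read_gzipped_csv(sample)
```

Each element of rows is one record keyed by the header fields. Values are returned as strings, so "beds" would need an explicit conversion before numeric use.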
Copyright © 2025, OvalEdge LLC, Peachtree Corners, GA USA

