Question
Capgemini
ES
Last activity: 16 Oct 2018 12:03 EDT
Reading content of a MS Word file
Hi,
The requirement is to find text in a Word file and save it in a context variable so that it can be used in later automations. I have already added the microsoftWord connector to the toolbox and managed to open an MS Word file, but I don't know how to read the content of the file.
I'd appreciate any suggestions.
Regards,
Héctor
Accepted Solution
Pegasystems Inc.
US
My automation can be done as a script as well - this will be faster to execute on long documents. You will need to do what Mike demonstrated with adding the reference. Here is the script and the automation.
Pegasystems Inc.
US
You need to understand a little bit of the Word Object Model to work with a document. The connector exposes a property named WordDocument which is your starting point - this is a wrapper to the actual document. In the example I show below we next get the Paragraph collection from the Document. As we loop through each Paragraph we compare the Range.Text (the text of the paragraph) with your search phrase. This should get you started - there are several other ways you can do this but this should get you smaller chunks of text to compare and return.
Edit: change the start value of the loop to 1 (Word collections are 1-based) and update the Limit value to count + 1.
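For anyone who prefers to see the paragraph loop as code, here is a minimal sketch of the same idea in C#. It assumes the Microsoft.Office.Interop.Word reference has been added and that you already have the underlying Document object; the class name WordSearch and the method name FindParagraphContaining are made up for illustration and are not part of the connector.

```csharp
using System;
using Word = Microsoft.Office.Interop.Word;

public static class WordSearch
{
    // Hypothetical helper: returns the text of the first paragraph that
    // contains the search phrase, or an empty string if nothing matches.
    public static string FindParagraphContaining(Word.Document doc, string phrase)
    {
        Word.Paragraphs paragraphs = doc.Paragraphs;

        // Word collections are 1-based, so the loop runs from 1 to Count inclusive.
        for (int i = 1; i <= paragraphs.Count; i++)
        {
            string text = paragraphs[i].Range.Text;

            if (text != null &&
                text.IndexOf(phrase, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                // Paragraph text ends with a paragraph mark ("\r"); strip it
                // (and a trailing end-of-cell marker, if any) before returning.
                return text.TrimEnd('\r', '\n', '\a');
            }
        }

        return string.Empty;
    }
}
```

The returned string is what you would then store in the context variable. The comparison here ignores case, which is an assumption on my part; switch to StringComparison.Ordinal if you need an exact match.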
capgemini
IN
Hi Jeff,
I used the above flow to read the document content, but I got the error below.
Please help me with this.
Error executing link in Automation: SearchForWordAndUpdate - From: SearchForWordAndUpdate.paragraphsProxy1.GetItem() To: SearchForWordAndUpdate.paragraphProxy1.Range.Properties
Could not get PropertyInfo for Property: paragraphProxy1.Range on Instance: Automator-8D4E7098912F56F\TypeProxy-8D4E73740516C63
1986
US
Hi Pradeep,
Were you able to resolve the issue you mentioned above? I am getting the same error. Please let me know if you know the resolution.
Thanks
Integrity Systems, Inc.
US
This will require a script most likely. You can use a script to perform complex operations on a Word document. Here is an example of a script that reads tables from a document.
First add a script component to your solution. You will need to add the interop dll for Word as a reference to the script. To do this, right click on the script component and select "Edit Reference". Choose the Microsoft.Office.Interop.Word reference from the GAC.
Now create a script: either double-click the Script component, or right-click it and select "Edit Script".
Click the green + button at the top of the Script editor to add a new script.
Click the Validate button to validate the script. Now click the green + button again to add a second script.
Enter the following values in the specified fields:
Name: GetData
Parameters: Microsoft.Office.Interop.Word.Document worddoc, out string message, out Hashtable header, out System.Data.DataTable instructors, out System.Data.DataTable students
Method body:
// Initialize the out parameters
message = string.Empty;
header = new Hashtable();
instructors = new System.Data.DataTable();
students = new System.Data.DataTable();

// Word collections are 1-based: Tables[1] is the first table in the document
Microsoft.Office.Interop.Word.Table headerTable = worddoc.Tables[1];
Microsoft.Office.Interop.Word.Table instructorTable = worddoc.Tables[2];
Microsoft.Office.Interop.Word.Table studentTable = worddoc.Tables[3];

// Process header table: each row holds two name/value pairs (columns 1-2 and 3-4).
// Note: Hashtable.Add throws if the same name appears twice.
for (int row = 1; row <= headerTable.Rows.Count; row++)
{
    string name1 = CleanString(headerTable.Cell(row, 1).Range.Text);
    string value1 = CleanString(headerTable.Cell(row, 2).Range.Text);
    header.Add(name1, value1);
    string name2 = CleanString(headerTable.Cell(row, 3).Range.Text);
    string value2 = CleanString(headerTable.Cell(row, 4).Range.Text);
    header.Add(name2, value2);
}

// Process instructor table: row 1 is the column header, so data starts at row 2
instructors.Columns.Add("Index");
instructors.Columns.Add("Name");
instructors.Columns.Add("WWID");
for (int row = 2; row <= instructorTable.Rows.Count; row++)
{
    DataRow newRow = instructors.NewRow();
    newRow["Index"] = CleanString(instructorTable.Cell(row, 1).Range.Text);
    newRow["Name"] = CleanString(instructorTable.Cell(row, 2).Range.Text);
    newRow["WWID"] = CleanString(instructorTable.Cell(row, 3).Range.Text);
    instructors.Rows.Add(newRow);
}

// Process student table: same layout as the instructor table
students.Columns.Add("Index");
students.Columns.Add("Name");
students.Columns.Add("WWID");
for (int row = 2; row <= studentTable.Rows.Count; row++)
{
    DataRow newRow = students.NewRow();
    newRow["Index"] = CleanString(studentTable.Cell(row, 1).Range.Text);
    newRow["Name"] = CleanString(studentTable.Cell(row, 2).Range.Text);
    newRow["WWID"] = CleanString(studentTable.Cell(row, 3).Range.Text);
    students.Rows.Add(newRow);
}
Validate the script. When it is valid, click the OK button to close the Script editor.
Now you can use the script from an automation.
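The GetData body above calls a CleanString helper whose definition does not appear in this thread. As a rough sketch of what such a helper might look like (this is my assumption, not the original script): the Range.Text of a Word table cell ends with a paragraph mark followed by an end-of-cell marker ("\r\a"), so the helper mainly needs to strip those characters. The class name WordText and the Main method are only there to make the sample stand on its own.

```csharp
using System;

public static class WordText
{
    // Hypothetical stand-in for the CleanString helper used by GetData.
    // Removes the end-of-cell marker, turns paragraph marks into spaces,
    // and trims surrounding whitespace.
    public static string CleanString(string raw)
    {
        if (string.IsNullOrEmpty(raw))
        {
            return string.Empty;
        }

        return raw.Replace("\a", string.Empty)  // end-of-cell marker (char 7)
                  .Replace("\r", " ")           // paragraph marks inside the cell
                  .Trim();
    }

    public static void Main()
    {
        // Cell text as Word returns it: value + "\r\a"
        Console.WriteLine("[" + CleanString("Course Name\r\a") + "]");
        // prints "[Course Name]"
    }
}
```

In the Script editor you would presumably enter only the method body, with a string parameter and a string return type, as the first script before adding GetData.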
Capgemini
ES
1986
US
Hi Jeff,
When I tried the automation exactly as described in your post, I got the error below.
Error executing link in Automation: WRC_P_ReadWordContent - From: WRC_P_ReadWordContent.paragraphs.GetItem() To: WRC_P_ReadWordContent.paragraph.Range.Properties
Could not get PropertyInfo for Property: paragraph.Range on Instance: Automator-8D5D24E3FEB2475\TypeProxy-8D5D24FCED9C6D8
Please find the attached automation screenshot and let me know whether I am doing anything wrong.
Thanks,
-
Achref Mabrouk